@sergeyklay
Contributor

@sergeyklay sergeyklay commented Aug 18, 2025

The docstring for the dask_ml.cluster.KMeans class currently omits 'random' as a valid option for the init parameter. However, this option is fully implemented in the underlying k_init function and is a more scalable alternative to the default 'k-means||', which can overwhelm the scheduler on large datasets.

These changes make the documentation accurately reflect the implementation and help users make better-informed decisions when choosing an initialization strategy for large-scale clustering tasks.

Also covers #918
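
For readers unfamiliar with the option: in dask-ml it is selected by passing init='random' to KMeans. The snippet below is only a conceptual sketch in plain Python, not dask-ml's k_init code. It assumes 'random' means drawing the initial centers uniformly from the data rows, as in scikit-learn's KMeans. The helper name random_init and the toy points are hypothetical.

```python
import random


def random_init(points, n_clusters, seed=None):
    """Pick n_clusters distinct rows of `points` uniformly at random.

    Hypothetical helper for illustration only; it is not dask-ml's k_init.
    """
    if n_clusters > len(points):
        raise ValueError("n_clusters cannot exceed the number of points")
    rng = random.Random(seed)
    # A single sampling step: no distances between data and candidate
    # centers are computed, unlike k-means++-style seeding.
    indices = sorted(rng.sample(range(len(points)), n_clusters))
    return [points[i] for i in indices]


# Toy 2-D data (made up for this example).
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (9.8, 0.1), (10.0, 0.3)]
centers = random_init(points, n_clusters=3, seed=0)
```

The trade-off in this sketch is that selection is a single cheap sampling step, but the chosen centers may be poorly spread, so the clustering iterations that follow may need more work to converge.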

@sergeyklay
Contributor Author

@TomAugspurger Could you please take a look?

Member

@TomAugspurger TomAugspurger left a comment

Thanks!

@TomAugspurger TomAugspurger merged commit fcc8111 into dask:main Sep 27, 2025
2 participants