Conversation

@TomAugspurger
Member

Implements k-means||, a scalable alternative to k-means++ for initializing k-means.
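For readers unfamiliar with the technique, here is a rough single-machine NumPy sketch of k-means|| seeding as I understand it from the paper. It is not the code in this PR: the function names, the default oversampling factor of `2 * k`, and the fixed five sampling rounds are assumptions made for illustration, and the dense pairwise-distance helper only suits small arrays.

```python
import numpy as np


def sq_dist_to_nearest(X, centers):
    """Squared Euclidean distance from each row of X to its nearest center."""
    # Broadcast (n, 1, d) against (1, m, d) to get an (n, m) distance table.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1), d2.argmin(axis=1)


def kmeans_parallel_init(X, k, oversampling=None, n_rounds=5, seed=0):
    """Illustrative k-means|| seeding; returns k rows of X as initial centers."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    ell = 2 * k if oversampling is None else oversampling  # assumed default

    # Step 1: start from one point chosen uniformly at random.
    centers = X[rng.integers(n)][None, :]

    # Step 2: in each round, sample every point independently with probability
    # proportional to its squared distance to the current candidate set.
    for _ in range(n_rounds):
        d2, _ = sq_dist_to_nearest(X, centers)
        cost = d2.sum()
        if cost == 0:
            break
        probs = np.minimum(1.0, ell * d2 / cost)
        picked = rng.random(n) < probs
        centers = np.vstack([centers, X[picked]])

    # Step 3: weight each candidate by the number of points nearest to it.
    _, nearest = sq_dist_to_nearest(X, centers)
    weights = np.bincount(nearest, minlength=len(centers)).astype(float)

    # Step 4: reduce the candidates to k centers with weighted k-means++ seeding.
    chosen = [rng.choice(len(centers), p=weights / weights.sum())]
    for _ in range(k - 1):
        d2, _ = sq_dist_to_nearest(centers, centers[chosen])
        scores = weights * d2
        total = scores.sum()
        if total > 0:
            p = scores / total
        else:
            # Degenerate case: every candidate coincides with a chosen center.
            p = np.full(len(centers), 1.0 / len(centers))
        chosen.append(rng.choice(len(centers), p=p))
    return centers[chosen]
```

The point of the oversampling rounds is that each one is a single pass over the data that can run in parallel across partitions, whereas k-means++ needs k sequential passes.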

except ImportError:
    pass
else:
    coloredlogs.install()
Member

mrocklin@carbon:~/workspace/dask-ml$ python benchmarks/k_means_kdd.py 
Traceback (most recent call last):
  File "benchmarks/k_means_kdd.py", line 12, in <module>
    import coloredlogs
ModuleNotFoundError: No module named 'coloredlogs'
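One way to make the dependency optional, assuming the import that fails at line 12 is unguarded, is to wrap it and fall back to the standard library. This is only a sketch: the `logging.basicConfig` fallback is my assumption, not something taken from the benchmark script.

```python
import logging

logger = logging.getLogger(__name__)

try:
    import coloredlogs
except ImportError:
    # ModuleNotFoundError is a subclass of ImportError, so a missing
    # package ends up here. Fall back to plain logging (assumed behavior).
    logging.basicConfig(level=logging.INFO)
else:
    # Only reached when the import succeeded.
    coloredlogs.install()
```

With a guard like this the script runs whether or not `coloredlogs` is installed.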

else:
    logger.info("Uploading to cloud storage")
    upload(local, fs)
path = "dask-data/kddcup/kdd.parq/"
Member

I tried running this with a dask cluster on localhost and ran into errors here.

Member Author

Whoops, missing the s3://.

Some context: the KDD Cup dataset is the largest dataset used in the k-means|| paper, but it isn't that big. I can cluster the entire dataset just fine on my laptop.

I just set up a cluster to benchmark k-means on the airlines dataset.

Member

We shouldn't necessarily need to engage S3 here though, no? I wanted to run this locally just to see the diagnostic dashboard (things look great, by the way) and ran into issues here.

Member Author

Oh I remember now. I assumed that using the distributed scheduler implied a remote cluster. I'll have to refactor it a bit more then.
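A possible shape for that refactor, as a sketch only: pick the data location from the scheduler address instead of assuming that a distributed scheduler means a remote cluster. The helper names and `LOCAL_PATH` are hypothetical; the remote path is the one from this thread with the `s3://` prefix added.

```python
from urllib.parse import urlparse

# Bucket path discussed above, with the missing s3:// prefix.
REMOTE_PATH = "s3://dask-data/kddcup/kdd.parq/"
# Hypothetical local location, for illustration only.
LOCAL_PATH = "data/kdd.parq/"


def is_local_scheduler(address):
    """True when no address is given or it points at this machine."""
    if not address:
        return True
    # Accept both "tcp://host:port" and bare "host:port".
    url = address if "://" in address else "tcp://" + address
    return urlparse(url).hostname in {"localhost", "127.0.0.1", "::1"}


def data_path(scheduler_address=None):
    """Read from local disk for local schedulers, from S3 otherwise."""
    if is_local_scheduler(scheduler_address):
        return LOCAL_PATH
    return REMOTE_PATH
```

For example, `data_path("tcp://127.0.0.1:8786")` returns the local path, so a localhost cluster never touches S3.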

@TomAugspurger
Member Author

TomAugspurger commented Oct 9, 2017 via email

@TomAugspurger
Member Author

For some reason, Travis is picking up a2541e4 as the merge commit, which is for #15. Opening a new PR.

@TomAugspurger TomAugspurger deleted the k-means branch October 18, 2017 12:50
TomAugspurger pushed a commit to TomAugspurger/dask-ml that referenced this pull request Oct 17, 2019