Gist showing some comparisons on PCA: https://gist.github.com/dced18b10aa28cba434771a37a3576f7
Dask is slower on small datasets (1,000 x 500), but quite a bit faster on multi-GB datasets. For a 100,000 x 5,000 array of doubles (4GB), dask took 11s vs. 35s for scikit-learn, using the 'randomized' solvers (svd_compressed for dask, Halko et al. for scikit-learn).
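For readers unfamiliar with the 'randomized' family of solvers, here is a minimal NumPy sketch of the general idea (random projection, optional power iterations, then an exact SVD of the small projected matrix). It is an illustration under assumptions, not the code in dask's `svd_compressed` or in scikit-learn: the function name `randomized_svd`, its parameters, and the matrix sizes are all made up for this example.

```python
import numpy as np


def randomized_svd(a, k, n_oversamples=10, n_power_iter=2, seed=0):
    """Approximate the top-k SVD of `a` with a randomized range finder.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not the API of dask or scikit-learn.
    """
    rng = np.random.default_rng(seed)
    n_features = a.shape[1]
    # Project onto a random subspace slightly larger than k.
    omega = rng.standard_normal((n_features, k + n_oversamples))
    q, _ = np.linalg.qr(a @ omega)
    # Power iterations sharpen the basis when singular values decay slowly.
    for _ in range(n_power_iter):
        q, _ = np.linalg.qr(a.T @ q)
        q, _ = np.linalg.qr(a @ q)
    # Solve the small problem exactly, then map back to the original space.
    b = q.T @ a
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    return (q @ u_small)[:, :k], s[:k], vt[:k]


# Small synthetic rank-10 matrix (sizes are illustrative, not the 4GB case).
rng = np.random.default_rng(42)
x = rng.standard_normal((1000, 10)) @ rng.standard_normal((10, 500))
x -= x.mean(axis=0)  # PCA operates on centered data

u, s, vt = randomized_svd(x, k=5)
s_exact = np.linalg.svd(x, compute_uv=False)[:5]

# Relative error of the approximate singular values against the exact ones.
rel_err = np.abs(s - s_exact) / s_exact
```

On real data the spectrum rarely drops off this cleanly, so `rel_err` will depend on the oversampling and the number of power iterations; both are the usual knobs for trading speed against accuracy in this kind of solver.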
Nov 8, 2017
Scikit-learn's solvers seem to have a higher degree of precision. In most of the tests, I needed to adjust the tolerance from something like
I wish I had kept better notes on the patches I made to the scikit-learn test suite as I was going through. I have another branch that redid everything in a more careful way, but it was taking a while to redo it all. TomAugspurger@07bfcb2 shows what the required differences for accuracy were like. The other commits in that branch do things like remove ARPACK-specific solvers, remove tests where