Fixing sign stability in incremental PCA #742

eric-czech · 2020-09-21T16:27:37Z

Hey @TomAugspurger, this includes some small changes to address the wrinkle I mentioned at #737 (comment). The "u vs v" choice in svd_flip depends on whether you plan to use the right or the left singular vectors, and there is no way (that I can find) to correct both simultaneously. Ideally, all PCA functions would use a "v-based" correction, since the principal components are equivalent to the right singular vectors. I chose the "u-based" correction as the default in dask, though, since that's what sklearn does. That leads to the somewhat awkward scenario at this level where you have to skip the correction in dask and then apply a similar one.

It would be most convenient for this project (and for what we're trying to do) if a "v-based" correction were the default. I added that in dask/dask#6658 and wanted to point that out here. That would obviate the need for any kind of SVD-related sign correction in dask-ml after the next dask release.

FYI: The code before my changes was calling sklearn's svd_flip with u_based_decision=False only for incremental PCA, to get the stability in signs that all PCA functions should probably have. I can't see any reason why that shouldn't be what we do across the board.
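The sign ambiguity discussed above can be illustrated with a minimal sketch, assuming NumPy only. `flip_signs` is a hypothetical helper written for this example; it mirrors the idea behind svd_flip but is not the dask or sklearn implementation. It shows that flipping the sign of a (u column, v row) pair leaves the reconstruction unchanged, and that a "v-based" correction pins the signs of the right singular vectors (the principal components) regardless of which signs the solver happened to return.

```python
import numpy as np


def flip_signs(u, v, u_based=True):
    """Hypothetical helper: make the largest-magnitude entry of each
    singular vector positive, deciding from u's columns or from v's rows."""
    if u_based:
        idx = np.argmax(np.abs(u), axis=0)
        signs = np.sign(u[idx, np.arange(u.shape[1])])
    else:
        idx = np.argmax(np.abs(v), axis=1)
        signs = np.sign(v[np.arange(v.shape[0]), idx])
    # The same sign is applied to column i of u and row i of v,
    # so the product u @ diag(s) @ v is unchanged.
    return u * signs, v * signs[:, np.newaxis]


rng = np.random.default_rng(0)
x = rng.normal(size=(20, 4))
u, s, v = np.linalg.svd(x, full_matrices=False)

# An equally valid SVD: flip the signs of components 1 and 3.
flips = np.array([1.0, -1.0, 1.0, -1.0])
u2, v2 = u * flips, v * flips[:, np.newaxis]
assert np.allclose((u2 * s) @ v2, x)

# The v-based correction maps both factorizations to the same components.
u_a, v_a = flip_signs(u, v, u_based=False)
u_b, v_b = flip_signs(u2, v2, u_based=False)
assert np.allclose(v_a, v_b)
```

After the v-based correction, the largest-magnitude entry in each row of `v_a` is positive by construction, but nothing guarantees the same for the columns of `u_a`, which is why the two conventions cannot generally be satisfied at once.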

eric-czech · 2020-09-21T17:33:54Z

Also FYI, all the build failures appear to be unrelated to these changes.

TomAugspurger · 2020-09-22T11:31:22Z

dask/dask#6658 is in now. Would this PR be a good place to start using it for Dask>2.27?

eric-czech · 2020-09-22T13:16:29Z

> Would this PR be a good place to start using it for Dask>2.27?

Yep, I'll add the extra logic for it.

Do you have a guideline in your head for how long you like to support older releases of dask before it would become appropriate to increment the current dask version bound at 2.4.0 to something more recent? I can see the tests and code for the different PCA implementations cleaning up nicely after that, but I wasn't sure if increasing that bound is deceptively complicated given https://github.com/dask/dask-ml/blob/master/ci/environment-3.6.yaml#L8. Perhaps py3.6 support is difficult beyond dask 2.4?

TomAugspurger · 2020-09-22T14:36:01Z

It's more "when it becomes annoying" to support older versions of Dask than a specific timespan. It's pinned to 2.4 in that file since we test all of the minimum supported versions on a single job (Python, Dask, scikit-learn, etc.)

eric-czech · 2020-09-22T17:10:41Z

Hey @TomAugspurger, it isn't pretty but this should take care of it once dask 2.28 is out: d3c8b5f.

TomAugspurger

I'm pushing a commit to change the comparison a bit, so that our job testing against Dask master hits this (right now the version on master is 2.27.0+7..., which is strictly larger than 2.27 but less than 2.28).
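The ordering described here can be checked with a short sketch, assuming the `packaging` library is used for version parsing. The development version string and the two function names below are made up for illustration; the real suffix is elided in the comment above, and this is not necessarily the exact check in dask_ml/_compat.py.

```python
from packaging.version import Version

# Hypothetical development version string (the real suffix is elided above).
dev = Version("2.27.0+7.gabc1234")


def at_least_2_28(version):
    """Check that misses development builds made after the 2.27.0 release."""
    return version >= Version("2.28.0")


def newer_than_2_27(version):
    """Check that also accepts development builds made after 2.27.0."""
    return version > Version("2.27.0")


# A build with a local version label sorts after 2.27.0 but before 2.28.0.
assert Version("2.27.0") < dev < Version("2.28.0")
assert not at_least_2_28(dev)
assert newer_than_2_27(dev)
```

Comparing against the last release rather than the upcoming one lets a CI job that installs Dask from master exercise the new code path before 2.28 is published.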

dask_ml/_compat.py

TomAugspurger · 2020-09-24T14:36:39Z

Just the unrelated hyperband timeouts. Thanks!

I'll write up a changelog and get a release out today.

eric-czech · 2020-09-25T00:12:44Z

Thanks again @TomAugspurger.

eric-czech added 3 commits September 21, 2020 11:21

Update svd tests for svd_flip

4b591b9

Updating PCA for svd_flip and unknown array shapes

20a64b1

Add svd_flip based on right singular vectors in incremental PCA

44df0ca

Add dask >= 2.28 logic in incremental PCA

d3c8b5f

eric-czech mentioned this pull request Sep 24, 2020

Add compute flag to TruncatedSVD #743

Merged

TomAugspurger reviewed Sep 24, 2020

Update dask_ml/_compat.py

0d7d492

TomAugspurger merged commit bef33d6 into dask:master Sep 24, 2020

eric-czech mentioned this pull request Oct 2, 2020

PCA for short-fat arrays #731

Closed
