Fixing sign stability in incremental PCA #742
Conversation
Also FYI, all the build failures appear to be unrelated to these changes.
dask/dask#6658 is in now. Would this PR be a good place to start using it for Dask>2.27?
Yep, I'll add the extra logic for it. Do you have a guideline for how long you like to support older releases of dask before it becomes appropriate to bump the current dask version bound at 2.4.0 to something more recent? I can see the tests and code for the different PCA implementations cleaning up nicely after that, but I wasn't sure if increasing that bound is deceptively complicated given https://github.com/dask/dask-ml/blob/master/ci/environment-3.6.yaml#L8. Perhaps py3.6 support is difficult beyond dask 2.4?
It's more "when it becomes annoying" to support older versions of Dask than a specific timespan. It's pinned to 2.4 in that file since we test all of the minimum supported versions (Python, Dask, scikit-learn, etc.) on a single job.
Hey @TomAugspurger, it isn't pretty, but this should take care of it once dask 2.28 is out: d3c8b5f.
I'm pushing a commit to change the comparison a bit, so that our job testing against Dask master hits this (right now the version on master is 2.27.0+7..., which is strictly larger than 2.27 but less than 2.28).
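For anyone following along, that ordering comes from PEP 440 local version identifiers, which versioneer-generated development builds use. A quick sketch with the `packaging` library (the version string here, including the fake commit hash, is made up for illustration):

```python
from packaging.version import Version

# Versioneer-style dev builds carry a local version segment such as
# "2.27.0+7.g1234abc" (7 commits past the 2.27.0 tag; the hash is
# invented for this example). PEP 440 orders such a build strictly
# after the tagged release but before the next one, so a ">= 2.28.0"
# gate would never fire on a job testing against Dask master.
dev = Version("2.27.0+7.g1234abc")
print(dev > Version("2.27.0"))   # True
print(dev < Version("2.28.0"))   # True
print(dev >= Version("2.28.0"))  # False
```

This is why the comparison in the version gate needs to be loosened so that dev builds of the next release also take the new code path.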
Just the unrelated hyperband timeouts. Thanks! I'll write up a changelog and get a release out today.
Thanks again @TomAugspurger.
Hey @TomAugspurger, this includes some small changes to address the wrinkle I mentioned at #737 (comment). The "u vs v" choice in svd_flip depends on whether you plan to use the right or left singular vectors, and there is no way (that I can find) to correct both simultaneously. Ideally, all PCA functions would use a "v-based" correction, since the principal components are equivalent to the right singular vectors. I chose the "u-based" correction as the default in dask, though, since that's what sklearn does, which leads to the somewhat awkward scenario at this level where you have to skip the correction in dask and then apply a similar one. It would be most convenient for this project (and what we're trying to do) if a "v-based" correction were the default. I added that in dask/dask#6658 and wanted to point it out here. That would obviate the need for any svd-related sign correction in dask-ml after the next dask release.
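To make the u-vs-v distinction concrete, here is a minimal numpy-only sketch of the sign convention behind sklearn's svd_flip (an illustrative reimplementation, not the actual sklearn or dask code): force the largest-magnitude entry of each singular vector to be positive, reading that entry from u or from v depending on the flag. Note that either choice pins the signs, but it pins them relative to a different set of vectors.

```python
import numpy as np

def svd_flip_sketch(u, v, u_based_decision=True):
    # Deterministic sign convention, mirroring the idea behind
    # sklearn.utils.extmath.svd_flip: make the largest-magnitude
    # entry of each singular vector positive.
    if u_based_decision:
        # Inspect the columns of u (left singular vectors).
        max_rows = np.argmax(np.abs(u), axis=0)
        signs = np.sign(u[max_rows, range(u.shape[1])])
    else:
        # Inspect the rows of v (right singular vectors).
        max_cols = np.argmax(np.abs(v), axis=1)
        signs = np.sign(v[range(v.shape[0]), max_cols])
    # Flip matching columns of u and rows of v together, so u @ diag(s) @ v
    # is unchanged.
    return u * signs, v * signs[:, np.newaxis]

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))
u, s, v = np.linalg.svd(x, full_matrices=False)

# SVD signs are arbitrary: (-u, -v) is an equally valid factorization.
# A v-based correction maps both to the same output.
u2, v2 = svd_flip_sketch(u, v, u_based_decision=False)
u3, v3 = svd_flip_sketch(-u, -v, u_based_decision=False)
assert np.allclose(u2, u3) and np.allclose(v2, v3)
```

Because each sign flip is applied to a column of u and the corresponding row of v together, the reconstruction is preserved either way; the flag only controls which matrix the convention is anchored to, which is why correcting one side can still leave the other side's signs depending on the input.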
FYI: before my changes, the code was using sklearn's svd_flip with u_based_decision=False only for incremental PCA to get the sign stability that all PCA functions should probably have. I can't see any reason why that shouldn't be what we do across the board.