Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added PCA subspace comparison methods and fixed n_components selection #2613

Merged
merged 7 commits into from May 18, 2020

Conversation

lilyminium
Copy link
Member

@lilyminium lilyminium commented Mar 11, 2020

Fixes #2623

This could be helpful for looking at subspaces.

Changes made in this Pull Request:

  • method to calculate the root mean square inner product of subspaces
  • method to calculate the cumulative overlap of a vector in a subspace
  • fixed n_component selection of principal components as it was necessary for the subspace comparisons

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@codecov
Copy link

codecov bot commented Mar 11, 2020

Codecov Report

Merging #2613 into develop will increase coverage by 0.00%.
The diff coverage is 98.61%.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #2613   +/-   ##
========================================
  Coverage    91.12%   91.13%           
========================================
  Files          175      175           
  Lines        23642    23708   +66     
  Branches      3090     3100   +10     
========================================
+ Hits         21544    21606   +62     
- Misses        1480     1482    +2     
- Partials       618      620    +2     
Impacted Files Coverage Δ
package/MDAnalysis/analysis/pca.py 96.68% <98.61%> (+1.39%) ⬆️
package/MDAnalysis/coordinates/GRO.py 93.46% <0.00%> (-1.97%) ⬇️
package/MDAnalysis/core/selection.py 99.47% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 375245c...5729efa. Read the comment docs.

@lilyminium lilyminium changed the title Added PCA subspace comparison methods Added PCA subspace comparison methods and fixed n_components selection Apr 11, 2020
if self._calculated:
if n is None:
n = len(self._variance)
self.variance = self._variance[:n]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to permanently change self.variance?

In a recent discussion with @ianmkenney and @VOD555 we found it confusing that the "total" variance might only be calculated from a subset of components. The docs were not very clear on this to start with. This can be confusing if you want to plot how much each PC contributes to the total observed variance but only selected, say, n=5.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, hence the addition of the new and static attributes _variance and _p_components. The cumulative variance is also calculated from the total _variance so that now it does not add up to 1 if n_components is not None. TBH I don't really get the idea of throwing away information that was expensive to compute and is already in memory, so setting n_components is more of a convenient way to truncate or extend the visible variance, cumulated_variance, and p_components. It does not alter their values. We could suggest to users that they manually set self._variance = self.variance if they want to save space.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should fix the issue that @ianmkenney and @VOD555 saw – guys, if you still see a problem here, please say something now.

n = len(self._variance)
self.variance = self._variance[:n]
self.cumulated_variance = (np.cumsum(self._variance) /
np.sum(self._variance))[:n]
Copy link
Member

@orbeckst orbeckst May 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, this line should fix the issue that @ianmkenney and @VOD555 saw.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes suggested but I'll approve it already.

package/MDAnalysis/analysis/pca.py Outdated Show resolved Hide resolved
package/MDAnalysis/analysis/pca.py Show resolved Hide resolved
@orbeckst orbeckst mentioned this pull request May 14, 2020
@orbeckst
Copy link
Member

@lilyminium Please merge when the CI is happy. Either squash-merge or consolidate your history yourself if you think it makes more sense as separate commits.

@lilyminium lilyminium merged commit 39b6385 into MDAnalysis:develop May 18, 2020
@lilyminium lilyminium deleted the rmsip branch May 18, 2020 05:16
@lilyminium
Copy link
Member Author

Thanks @orbeckst!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PCA p_components and cumulated_variance incorrect when n_components is specified
3 participants