Expected behavior
p_components is a matrix whose columns are the first n principal components, where n is the n_components value specified at initialization. cumulated_variance is a vector whose k-th entry is the proportion of the total variance explained by the k-th principal component together with all the ones that precede it, regardless of how many PCs are kept.
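In formula form (the symbols are introduced here for illustration): with the eigenvalues of the 3N x 3N covariance matrix sorted as λ_1 ≥ λ_2 ≥ ... ≥ λ_3N, where N is the number of atoms, the k-th entry c_k of cumulated_variance should be

```math
c_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{3N} \lambda_i}, \qquad k = 1, \dots, n_{\mathrm{components}}
```

The denominator runs over all 3N eigenvalues, not only the n_components that are kept; normalizing by the kept eigenvalues alone would force the last entry to 1.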
Actual behavior
When n_components is specified, p_components is a matrix with n_components rows. This is incorrect, because the principal components (the eigenvectors) are the column vectors of the eigenvector matrix, so taking its first rows does not select the leading components. In addition, the values of cumulated_variance change depending on n_components, and the last value is always 1.
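Both symptoms can be reproduced without a trajectory. The sketch below runs the same steps on a synthetic covariance matrix twice: once with the slicing described above and once with the slicing from the proposed fix. The helpers buggy_conclude and fixed_conclude are hypothetical stand-ins, not MDAnalysis API, and the buggy version is inferred from the reported symptoms rather than copied from pca.py.

```python
import numpy as np


def buggy_conclude(cov, n_components):
    """Hypothetical reconstruction of the reported behavior (not pca.py verbatim)."""
    e_vals, e_vects = np.linalg.eig(cov)
    sort_idx = np.argsort(e_vals)[::-1]                  # descending eigenvalues
    variance = e_vals[sort_idx][:n_components]           # sliced before normalizing
    p_components = e_vects[:n_components, sort_idx]      # first ROWS: (n_components, n_features)
    cumulated_variance = np.cumsum(variance) / np.sum(variance)  # last entry is always 1
    return p_components, variance, cumulated_variance


def fixed_conclude(cov, n_components):
    """Same steps, with the slicing order and axis from the proposed fix."""
    e_vals, e_vects = np.linalg.eig(cov)
    sort_idx = np.argsort(e_vals)[::-1]
    variance = e_vals[sort_idx]
    cumulated_variance = np.cumsum(variance) / np.sum(variance)  # uses ALL eigenvalues
    p_components = e_vects[:, sort_idx[:n_components]]   # first COLUMNS: (n_features, n_components)
    return p_components, variance[:n_components], cumulated_variance[:n_components]


# Synthetic, centered data with decreasing spread per feature.
rng = np.random.default_rng(0)
n_frames, n_features, n_components = 200, 12, 3
X = rng.normal(size=(n_frames, n_features)) * np.linspace(3.0, 0.5, n_features)
X -= X.mean(axis=0)
cov = X.T @ X / (n_frames - 1)

pb, vb, cb = buggy_conclude(cov, n_components)
pf, vf, cf = fixed_conclude(cov, n_components)
print(pb.shape, pf.shape)   # (3, 12) (12, 3)
print(cb[-1], cf[-1])       # buggy: 1 up to rounding; fixed: below 1
```

In the fixed version each column of pf satisfies cov @ v = λ v for its eigenvalue, which is what the rows returned by the buggy slicing fail to do in general.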
Code to reproduce the behavior
This uses my own trajectory, but it illustrates the point. The system has 238 atoms (714 coordinates), and I select 5 components. p_components is a 5x714 matrix, where it should instead be 714x5. cumulated_variance reaches 1 even though the first 5 components do not explain 100% of the variance, but only about 62%.
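A quick sanity check can catch both symptoms from the arrays an analysis returns. The helper check_pca_output is hypothetical (not part of MDAnalysis). Its last check is a heuristic, since a real system could in principle have nearly all of its variance in the first few components.

```python
import numpy as np


def check_pca_output(p_components, cumulated_variance, n_atoms, n_components):
    """Hypothetical regression check for the shapes and values described in this issue."""
    n_coords = 3 * n_atoms  # x, y, z per atom, e.g. 238 atoms -> 714 coordinates
    problems = []
    if p_components.shape != (n_coords, n_components):
        problems.append(f"p_components has shape {p_components.shape}, "
                        f"expected {(n_coords, n_components)}")
    if len(cumulated_variance) != n_components:
        problems.append("cumulated_variance has the wrong length")
    elif n_components < n_coords and np.isclose(cumulated_variance[-1], 1.0):
        # Heuristic: reaching exactly 1 with far fewer PCs than coordinates is suspicious.
        problems.append("cumulated_variance reaches 1 with fewer components than coordinates")
    return problems


# Illustrative inputs only (zeros and made-up cumulative values, not real PCA output).
print(check_pca_output(np.zeros((5, 714)), np.linspace(0.2, 1.0, 5), 238, 5))
print(check_pca_output(np.zeros((714, 5)), np.linspace(0.2, 0.62, 5), 238, 5))  # []
```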
Proposed fix for _conclude in pca.py:

    def _conclude(self):
        self.cov /= self.n_frames - 1
        e_vals, e_vects = np.linalg.eig(self.cov)
        sort_idx = np.argsort(e_vals)[::-1]  # descending eigenvalues
        self.variance = e_vals[sort_idx]
        self.cumulated_variance = (np.cumsum(self.variance) /
                                   np.sum(self.variance))  # calculated before variance slice
        self.variance = self.variance[:self.n_components]
        self.cumulated_variance = self.cumulated_variance[:self.n_components]
        self.p_components = e_vects[:, sort_idx[:self.n_components]]  # all rows, slice of columns
        self._calculated = True
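Because the covariance matrix is symmetric, the same steps could also be written with np.linalg.eigh instead of np.linalg.eig. This is a swapped-in alternative, not what the issue proposes: eigh is specialized for symmetric input, guarantees real eigenvalues and orthonormal eigenvectors (as columns), and returns eigenvalues in ascending order, so the order still has to be reversed. conclude_eigh is a hypothetical standalone function that takes an already-normalized covariance matrix.

```python
import numpy as np


def conclude_eigh(cov, n_components):
    """Hypothetical eigh-based variant of the proposed _conclude.

    cov is assumed to be the already-normalized, symmetric covariance matrix.
    """
    # eigh: real eigenvalues in ascending order, orthonormal eigenvectors as columns.
    e_vals, e_vects = np.linalg.eigh(cov)
    order = np.argsort(e_vals)[::-1]                              # descending
    variance = e_vals[order]
    cumulated_variance = np.cumsum(variance) / np.sum(variance)   # before slicing
    p_components = e_vects[:, order[:n_components]]               # columns are the PCs
    return p_components, variance[:n_components], cumulated_variance[:n_components]


rng = np.random.default_rng(42)
data = rng.normal(size=(100, 6)) * np.array([4.0, 3.0, 2.0, 1.0, 0.5, 0.25])
data -= data.mean(axis=0)
cov = data.T @ data / (data.shape[0] - 1)

p, var, cum = conclude_eigh(cov, n_components=2)
print(p.shape)  # (6, 2)
```

As with eig, the sign of each eigenvector is arbitrary, so columns from the two routines may differ by a factor of -1.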
The documentation at line 136 should also read "p_components: array, (n_atoms * 3, n_components)" instead of "p_components: array, (n_components, n_atoms * 3)".
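One consequence of the documented (n_atoms * 3, n_components) layout is that projecting centered coordinates onto the components is a single matrix product. The sketch below uses plain NumPy on random coordinates (it does not call MDAnalysis) and np.linalg.eigh for the symmetric covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n_frames, n_atoms, n_components = 50, 4, 2
coords = rng.normal(size=(n_frames, n_atoms * 3))  # one flattened (x, y, z, ...) row per frame
centered = coords - coords.mean(axis=0)
cov = centered.T @ centered / (n_frames - 1)

e_vals, e_vects = np.linalg.eigh(cov)              # symmetric input, so eigh
order = np.argsort(e_vals)[::-1]
p_components = e_vects[:, order[:n_components]]    # shape (n_atoms * 3, n_components)

# With components as columns, projecting every frame is one matrix product.
projected = centered @ p_components                # shape (n_frames, n_components)
print(p_components.shape, projected.shape)         # (12, 2) (50, 2)

# The variance along each projected axis equals the matching eigenvalue.
print(np.allclose(projected.var(axis=0, ddof=1), e_vals[order[:n_components]]))  # True
```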
#2613)
- fixes #2623 and now correctly computes cumulated variance
- adds root mean square inner product and cumulative overlap method as ways to compare subspaces
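The two subspace-comparison measures mentioned above have commonly used definitions: the root mean square inner product (RMSIP) of two sets of D orthonormal vectors a_i and b_j is sqrt((1/D) Σ_i Σ_j (a_i · b_j)^2), and the cumulative overlap of a vector v with a set of orthonormal vectors b_j is sqrt(Σ_j (v̂ · b_j)^2), where v̂ is v normalized. The minimal NumPy sketch below follows those definitions; the function names and signatures are hypothetical and not necessarily those added to MDAnalysis. Both expect components stored as columns, i.e. the (n_atoms * 3, n_components) layout proposed in this issue.

```python
import numpy as np


def rmsip(a, b):
    """Root mean square inner product of two subspaces (hypothetical helper).

    a, b: arrays of shape (n_features, n_vectors) with orthonormal columns.
    Returns a value in [0, 1]: 1 for identical subspaces, 0 for orthogonal ones.
    """
    if a.shape != b.shape:
        raise ValueError("both subspaces need the same shape")
    overlaps = a.T @ b  # (n_vectors, n_vectors) matrix of dot products a_i . b_j
    return float(np.sqrt(np.sum(overlaps ** 2) / a.shape[1]))


def cumulative_overlap(basis, vector):
    """Overlap of one vector with the span of basis's orthonormal columns (hypothetical)."""
    v = vector / np.linalg.norm(vector)
    return float(np.sqrt(np.sum((basis.T @ v) ** 2)))


eye = np.eye(4)
a, b = eye[:, :2], eye[:, 2:]      # two orthogonal 2-D subspaces of R^4
print(rmsip(a, a), rmsip(a, b))    # 1.0 0.0
```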
Current version of MDAnalysis
MDAnalysis 0.21.1
Python 3.7.5
Ubuntu 18.04
Proposed solution
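The _conclude function (lines 272-281) of pca.py should be replaced with the fixed version given above, which computes cumulated_variance from all eigenvalues before variance is sliced, and keeps all rows of the eigenvector matrix while slicing its columns.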