2 PCA implementations that give same results but different from Python scikit-learn implementation ... #81
Comments
Maybe this is an odd suggestion, but to debug this can we write a simpler test, say, with the identity matrix as ( -1/sqrt(2), 1/sqrt(2) ) which, interestingly, with [[-0.70710678 0.70710678] |
Hi, @SergeStinckwich. I looked a little deeper into the failing test for the Jacobi transformation form of PCA and noted that transposing and negating the matrix of eigenvectors makes the test pass: The scikit-learn library, from what I can see, uses an SVD implementation that involves flipping signs. For this test, then, does it make sense to compare the Jacobi matrix of eigenvectors directly with the one they compute via SVD? Might we instead assert that the negated transpose of the Jacobi matrix equals the SVD one? Please forgive my ignorance; I'm not too familiar with this domain, so I'm sure I'm missing something! |
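An eigenvector is only defined up to its sign, so a test that compares two PCA implementations can reasonably accept either sign for each component. A minimal sketch of such a check in Python with NumPy follows. The helper name `same_up_to_sign` and the sample values are hypothetical and are not taken from PolyMath or scikit-learn.

```python
import numpy as np


def same_up_to_sign(a, b, tol=1e-6):
    """Return True if each row of `a` equals the matching row of `b` or its negation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        return False
    for row_a, row_b in zip(a, b):
        matches = np.allclose(row_a, row_b, atol=tol)
        matches_negated = np.allclose(row_a, -row_b, atol=tol)
        if not (matches or matches_negated):
            return False
    return True


# Illustrative values only (not the output of either library).
s = 1 / np.sqrt(2)
expected = np.array([[s, s], [-s, s]])
flipped = np.array([[-s, -s], [-s, s]])  # first component negated

print(same_up_to_sign(expected, flipped))         # True
print(same_up_to_sign(expected, expected[::-1]))  # False: rows swapped, not just negated
```

If one implementation returns components as columns rather than rows, transpose it before the comparison.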
@SergeStinckwich I noted a similar thing with PolyMath's PCA implementation using SVD, where transposing and negating the returned matrix from This phenomenon is interesting, I think, perhaps worth investigating. Also I note that |
Thank you for spending time investigating this further. I have forgotten some of the implementation details, so I will need some time to recall what I did here. |
Thank you for your help on this, @SergeStinckwich. The code is really well-written compared to the Python library. I read this paper on the subject and your work was very easy to follow from that. So far the difference between your work and the Python one is just |
Hi, @SergeStinckwich, Given that in SciKit-Learn,
(the first two columns from PolyMath match SciKit-Learn's) the first thing to note is the |
On further investigation, when we compute the |
In Mathematica, if we use SVD:
m = {{-1, -1}, {2, -1}, {-3, -2}, {1, 1}, {2, 1}, {3, 2}}
{{-1, -1}, {2, -1}, {-3, -2}, {1, 1}, {2, 1}, {3, 2}}
{u, w, v} = SingularValueDecomposition[N[m]]
{{{0.227413, -0.184384, -0.602751, 0.246542, 0.356209,
0.602751}, {-0.204296, -0.949274, 0.0746245,
0.109693, -0.184317, -0.0746245}, {0.598729, -0.113805, 0.705625,
0.128753, 0.165622, 0.294375}, {-0.227413, 0.184384, 0.106997,
0.942819, -0.0498161, -0.106997}, {-0.371316, -0.070579,
0.187377, -0.0715718, 0.884194, -0.187377}, {-0.598729, 0.113805,
0.294375, -0.128753, -0.165622, 0.705625}}, {{6.01037, 0.}, {0.,
1.96863}, {0., 0.}, {0., 0.}, {0., 0.}, {0.,
0.}}, {{-0.86491, -0.501927}, {-0.501927, 0.86491}}}
v
{{-0.86491, -0.501927}, {-0.501927, 0.86491}} |
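For cross-checking, the same decomposition can be sketched with NumPy. This assumes `numpy.linalg.svd`, which returns the transposed right singular vectors (`vt`) rather than `v`, and whose signs may differ from Mathematica's. The comparison below therefore looks at magnitudes only.

```python
import numpy as np

# Same matrix as in the Mathematica session above.
m = np.array([[-1, -1], [2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)

# NumPy returns the singular values as a 1-D array and V transposed.
u, w, vt = np.linalg.svd(m, full_matrices=True)
v = vt.T

print(np.round(w, 5))          # singular values
print(np.round(np.abs(v), 5))  # magnitudes of V, independent of sign convention
```

The singular values and the magnitudes of `v` should agree with the Mathematica output; only the signs of individual columns are free to differ.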
Action points now:
|
I tried Wolfram Alpha and our V matrix matches theirs. |
@SergeStinckwich Regarding the first of our action points, finding another example to use as an acceptance test: I tried the example in section 3.1 of the PCA tutorial. The scikit-learn PCA output matches the output published in that tutorial almost exactly:
Sadly, this is not the case for either PolyMath implementation. However, as we've discovered, the problem lies a bit further up, in how we compute the eigenvectors. |
@SergeStinckwich Out of curiosity, I used the mean centred data from the PCA tutorial:
and for the SVD-based implementation of PCA, PolyMath's output is correct up to negation: Similarly, for the Jacobi implementation, our output is again correct up to negation: |
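"Correct up to negation" is what one would expect when the eigenvector route and the SVD route are both right. A rough NumPy sketch of that relationship follows. It reuses the matrix `m` from the Mathematica session earlier in the thread as sample data (an assumption made for illustration, not the tutorial's data) and is not PolyMath's code.

```python
import numpy as np

x = np.array([[-1, -1], [2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
xc = x - x.mean(axis=0)  # mean-centre each column
n = len(xc)

# Route 1: eigenvectors of the sample covariance matrix.
cov = xc.T @ xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]       # largest variance first
pcs_eig = eigvecs[:, order].T           # rows are components

# Route 2: right singular vectors of the centred data.
_, s, vt = np.linalg.svd(xc, full_matrices=False)
pcs_svd = vt                            # rows are components

# |dot product| of matching unit vectors is 1 when they agree up to sign.
alignment = np.abs(np.sum(pcs_eig * pcs_svd, axis=1))
variances_match = np.allclose(s**2 / (n - 1), eigvals[order])

print(alignment)
print(variances_match)
```

Each alignment value should be close to 1 even when the two routes return opposite signs, which is why a direct element-wise comparison can fail while both results are valid.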
On point 3, Scikit-Learn uses scipy for the SVD part of the PCA. In turn scipy delegates to LAPACK driver routines and according to the documentation, by default, it uses |
We have 2 implementations of PCA:
They give the same results, but those results differ from the ones scikit-learn gives in Python:
pca.components_
returns:
pca.transform(X)
returns:
I tried to implement a flipsvd method like this one: https://github.com/scikit-learn/scikit-learn/blob/4c65d8e615c9331d37cbb6225c5b67c445a5c959/sklearn/utils/extmath.py#L609
but it has not worked so far.
Please have a look at the tests in PMPrincipalComponentAnalyserTest.
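For reference, here is a minimal sketch of a deterministic sign-flipping step written in the spirit of the linked scikit-learn helper. It is not a copy of that code. The function name `flip_signs` and the choice to base the decision on the columns of `u` are assumptions made for illustration.

```python
import numpy as np


def flip_signs(u, vt):
    """Make the largest-magnitude entry of each column of `u` positive.

    The same sign is applied to the matching row of `vt`, so the product
    u @ diag(s) @ vt is unchanged.
    """
    max_rows = np.argmax(np.abs(u), axis=0)
    signs = np.sign(u[max_rows, np.arange(u.shape[1])])
    return u * signs, vt * signs[:, np.newaxis]


# Sample data chosen for illustration only.
x = np.array([[-1, -1], [2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
xc = x - x.mean(axis=0)

u, s, vt = np.linalg.svd(xc, full_matrices=False)
u_fixed, vt_fixed = flip_signs(u, vt)

# Flipping does not change the decomposition itself.
reconstructed = (u_fixed * s) @ vt_fixed
print(np.allclose(reconstructed, xc))  # True

# Starting from the opposite signs leads to the same normalised result.
_, vt_from_negated = flip_signs(-u, -vt)
print(np.allclose(vt_from_negated, vt_fixed))  # True
```

Applying the same normalisation to both implementations before comparing them removes the sign ambiguity without changing the subspace either one describes.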