
Loadings Different (negative) vs SPSS/R Psych Lib results #89

Closed
Db-pckr opened this issue Aug 23, 2021 · 11 comments · Fixed by #114
@Db-pckr

Db-pckr commented Aug 23, 2021

Based on a correlation matrix, the loadings calculated with factor_analyzer differ from those produced by SPSS: they appear to be multiplied by -1. The communalities are more or less equal.

Example Matrix below (12x12):

1.00 | 0.53 | 0.26 | 0.14 | 0.18 | 0.24 | 0.24 | 0.22 | 0.20 | 0.21 | 0.21 | 0.36
0.53 | 1.00 | 0.33 | 0.34 | 0.39 | 0.51 | 0.50 | 0.42 | 0.27 | 0.43 | 0.35 | 0.52
0.26 | 0.33 | 1.00 | 0.22 | 0.28 | 0.24 | 0.27 | 0.28 | 0.09 | 0.16 | 0.03 | 0.18
0.14 | 0.34 | 0.22 | 1.00 | 0.56 | 0.47 | 0.49 | 0.34 | 0.28 | 0.37 | 0.27 | 0.29
0.18 | 0.39 | 0.28 | 0.56 | 1.00 | 0.55 | 0.59 | 0.49 | 0.25 | 0.43 | 0.30 | 0.40
0.24 | 0.51 | 0.24 | 0.47 | 0.55 | 1.00 | 0.80 | 0.55 | 0.30 | 0.51 | 0.49 | 0.55
0.24 | 0.50 | 0.27 | 0.49 | 0.59 | 0.80 | 1.00 | 0.56 | 0.31 | 0.58 | 0.50 | 0.56
0.22 | 0.42 | 0.28 | 0.34 | 0.49 | 0.55 | 0.56 | 1.00 | 0.27 | 0.37 | 0.32 | 0.42
0.20 | 0.27 | 0.09 | 0.28 | 0.25 | 0.30 | 0.31 | 0.27 | 1.00 | 0.55 | 0.28 | 0.29
0.21 | 0.43 | 0.16 | 0.37 | 0.43 | 0.51 | 0.58 | 0.37 | 0.55 | 1.00 | 0.52 | 0.51
0.21 | 0.35 | 0.03 | 0.27 | 0.30 | 0.49 | 0.50 | 0.32 | 0.28 | 0.52 | 1.00 | 0.55
0.36 | 0.52 | 0.18 | 0.29 | 0.40 | 0.55 | 0.56 | 0.42 | 0.29 | 0.51 | 0.55 | 1.00

Code:

fa = FactorAnalyzer(method='minres', n_factors=1, rotation=None,
                    is_corr_matrix=True, bounds=(0.005, 1))
fa.fit(fa_df)
print(fa.loadings_)

Result:
[[-0.3883726 ]
[-0.66186571]
[-0.32924641]
[-0.55939523]
[-0.66587561]
[-0.81117504]
[-0.8457027 ]
[-0.6317671 ]
[-0.44712954]
[-0.69460134]
[-0.58560195]
[-0.69916123]]

SPSS produces almost the same result, except that each value is multiplied by -1 (i.e., the values are positive), while get_communalities() returns mostly the same values ("mostly" only because SPSS rounds values on display).

Any idea what I am missing, or what the issue is?

Thanks

@Db-pckr
Author

Db-pckr commented Sep 1, 2021

Update: tested against the R psych library; same issue. I realized that the problem seems to occur when the correlation matrix is all positive. If even one value in it is negative, the results match. Maybe this helps debug...

@Db-pckr Db-pckr changed the title Loadings Different (negative) vs SPSS result Loadings Different (negative) vs SPSS/R Psych Lib results Sep 1, 2021
@jbiggsets
Collaborator

Thanks for the follow up! I will look into this very soon.

@jbiggsets
Collaborator

> Update: Tested vs R Psych library, same issue. Realized that the problem seems to occurr when the correlation matrix is all positive. If even 1 value from it is negative, the results match. Maybe this helps debug...

I can't seem to reproduce this issue. For example, using the data provided above (read in as the string data), here are my results:

import pandas as pd
from factor_analyzer import FactorAnalyzer

df = pd.DataFrame([[float(val) for val in row.split(' | ')]
                   for row in data.strip().split('\n')])

fa = FactorAnalyzer(method='minres',
                    n_factors=1,
                    rotation=None,
                    bounds=(0.005, 1),
                    is_corr_matrix=True).fit(df)
print(fa.loadings_)
[[0.3879858 ]
 [0.66334567]
 [0.32897377]
 [0.55966426]
 [0.66396016]
 [0.81430826]
 [0.8469053 ]
 [0.63367546]
 [0.44783303]
 [0.69420312]
 [0.58345214]
 [0.6963522 ]]

This matches R's psych library. Let me know if I'm missing something!

@Db-pckr
Author

Db-pckr commented Oct 7, 2021

I'm using:
pandas 1.2.4
numpy 1.20.2
python 3.8.10

If you're getting the expected results, my honest guess is that it comes down to an older numpy version and how it is used internally in factor_analyzer.
Thanks!

@celip38

celip38 commented Aug 31, 2022

I encounter the same issue (negative factor loadings):

I'm using:
pandas 1.4.3
numpy 1.21.5
python 3.9.12

Which package versions do you suggest to avoid this, please?

Thanks!

@desilinguist
Member

@celip38 please share your data, if possible, so we can try to reproduce the issue.

@Db-pckr
Author

Db-pckr commented Sep 2, 2022

@desilinguist You can use the data I presented above to test this problem.

@desilinguist
Member

desilinguist commented Sep 3, 2022

Thanks @Db-pckr. I can replicate this on my end too with the latest numpy library.

I poked around a bit and found that numpy.linalg.eigh(), used for the eigenvalue decomposition, was returning an all-negative first eigenvector for this correlation matrix, whereas the more general, but less efficient, numpy.linalg.eig() returns an all-positive first eigenvector, viz.

With eigh():

array([[-0.17816009],
       [-0.30460323],
       [-0.15106223],
       [-0.25699352],
       [-0.3048854 ],
       [-0.37392409],
       [-0.3888924 ],
       [-0.2909789 ],
       [-0.20564148],
       [-0.31877273],
       [-0.26791673],
       [-0.31975958]])

and, with eig():

array([[0.17816009],
       [0.30460323],
       [0.15106223],
       [0.25699352],
       [0.3048854 ],
       [0.37392409],
       [0.3888924 ],
       [0.2909789 ],
       [0.20564148],
       [0.31877273],
       [0.26791673],
       [0.31975958]])

However, neither is incorrect because, as we know, if $v$ is an eigenvector, then so is $\alpha v$, where $\alpha$ is any scalar $\neq 0$. It follows that the signs on factor loadings are essentially meaningless, because all they do is flip the (already arbitrary) interpretation of the latent factor.
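The sign ambiguity is easy to verify directly: for any eigenpair returned by eigh(), the negated eigenvector satisfies the same equation. A minimal sketch with an illustrative all-positive matrix (not the one from this issue):

```python
import numpy as np

# A small all-positive correlation-like matrix, for illustration only.
A = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])

vals, vecs = np.linalg.eigh(A)
lam = vals[-1]        # largest eigenvalue (eigh sorts ascending)
v = vecs[:, -1]       # its eigenvector

# Both v and -v satisfy A @ v == lam * v, so the sign that eigh()
# (or eig()) happens to return is an arbitrary implementation detail.
assert np.allclose(A @ v, lam * v)
assert np.allclose(A @ (-v), lam * (-v))
```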

So, while we could replace eigh() with eig() to force the results to match what SPSS and R do, I am not convinced that we need to do that since this is not really a bug.
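In the meantime, anyone who wants the SPSS/R sign convention can flip the sign of each factor column after fitting. A sketch (the helper flip_loading_signs is hypothetical, not part of the factor_analyzer API, and flipping so the largest-magnitude loading in each column is positive is just one common convention):

```python
import numpy as np

def flip_loading_signs(loadings):
    """Flip each factor's sign so that the loading with the largest
    absolute value in that column is positive."""
    loadings = np.asarray(loadings, dtype=float).copy()
    for j in range(loadings.shape[1]):
        col = loadings[:, j]
        if col[np.argmax(np.abs(col))] < 0:
            loadings[:, j] = -col
    return loadings

# All-negative single-factor loadings (as in this issue) become positive.
flipped = flip_loading_signs([[-0.39], [-0.66], [-0.85]])
# -> [[0.39], [0.66], [0.85]]
```

Applied to fa.loadings_, this would make the output above match SPSS and R regardless of which eigenvector sign the underlying decomposition returns.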

@jbiggsets any thoughts?

@jbiggsets
Collaborator

Yes, I would be inclined not to change this, since it doesn't really strike me as a bug and we use eigh() pretty consistently throughout. Maybe we can mention it in the documentation?

@desilinguist
Member

Adding to the documentation sounds like a good idea. I'll do that!

@celip38

celip38 commented Sep 5, 2022

Thanks a lot!

@desilinguist desilinguist added this to To do in Release 0.4.1 Sep 5, 2022
@desilinguist desilinguist self-assigned this Sep 5, 2022
@desilinguist desilinguist moved this from To do to In progress in Release 0.4.1 Sep 8, 2022
Release 0.4.1 automation moved this from In progress to Done Sep 8, 2022