
Loadings Different (negative) vs SPSS/R Psych Lib results #89

Closed
Db-pckr opened this issue Aug 23, 2021 · 11 comments · Fixed by #114
@Db-pckr

Db-pckr commented Aug 23, 2021

Based on a correlation matrix, the loadings calculated with factor_analyzer differ from those produced by SPSS: they appear to be multiplied by -1. The communalities are more or less equal.

Example Matrix below (12x12):

1.00 | 0.53 | 0.26 | 0.14 | 0.18 | 0.24 | 0.24 | 0.22 | 0.20 | 0.21 | 0.21 | 0.36
0.53 | 1.00 | 0.33 | 0.34 | 0.39 | 0.51 | 0.50 | 0.42 | 0.27 | 0.43 | 0.35 | 0.52
0.26 | 0.33 | 1.00 | 0.22 | 0.28 | 0.24 | 0.27 | 0.28 | 0.09 | 0.16 | 0.03 | 0.18
0.14 | 0.34 | 0.22 | 1.00 | 0.56 | 0.47 | 0.49 | 0.34 | 0.28 | 0.37 | 0.27 | 0.29
0.18 | 0.39 | 0.28 | 0.56 | 1.00 | 0.55 | 0.59 | 0.49 | 0.25 | 0.43 | 0.30 | 0.40
0.24 | 0.51 | 0.24 | 0.47 | 0.55 | 1.00 | 0.80 | 0.55 | 0.30 | 0.51 | 0.49 | 0.55
0.24 | 0.50 | 0.27 | 0.49 | 0.59 | 0.80 | 1.00 | 0.56 | 0.31 | 0.58 | 0.50 | 0.56
0.22 | 0.42 | 0.28 | 0.34 | 0.49 | 0.55 | 0.56 | 1.00 | 0.27 | 0.37 | 0.32 | 0.42
0.20 | 0.27 | 0.09 | 0.28 | 0.25 | 0.30 | 0.31 | 0.27 | 1.00 | 0.55 | 0.28 | 0.29
0.21 | 0.43 | 0.16 | 0.37 | 0.43 | 0.51 | 0.58 | 0.37 | 0.55 | 1.00 | 0.52 | 0.51
0.21 | 0.35 | 0.03 | 0.27 | 0.30 | 0.49 | 0.50 | 0.32 | 0.28 | 0.52 | 1.00 | 0.55
0.36 | 0.52 | 0.18 | 0.29 | 0.40 | 0.55 | 0.56 | 0.42 | 0.29 | 0.51 | 0.55 | 1.00

Code:

fa = FactorAnalyzer(method='minres', n_factors=1, rotation=None,
                    is_corr_matrix=True, bounds=(0.005, 1))
fa.fit(fa_df)
print(fa.loadings_)

Result:
[[-0.3883726 ]
[-0.66186571]
[-0.32924641]
[-0.55939523]
[-0.66587561]
[-0.81117504]
[-0.8457027 ]
[-0.6317671 ]
[-0.44712954]
[-0.69460134]
[-0.58560195]
[-0.69916123]]

SPSS produces almost the same result, except that each value is multiplied by -1 (i.e., the values are positive), while get_communalities() returns mostly the same values ("mostly" only because SPSS rounds values on display).

Any idea what I am missing, or what the issue is?

Thanks

@Db-pckr
Author

Db-pckr commented Sep 1, 2021

Update: tested against the R psych library; same issue. I realized that the problem seems to occur when the correlation matrix is all positive. If even one value in it is negative, the results match. Maybe this helps debug...

@Db-pckr Db-pckr changed the title Loadings Different (negative) vs SPSS result Loadings Different (negative) vs SPSS/R Psych Lib results Sep 1, 2021
@jbiggsets
Collaborator

Thanks for the follow up! I will look into this very soon.

@jbiggsets
Collaborator

> Update: Tested vs R Psych library, same issue. Realized that the problem seems to occurr when the correlation matrix is all positive. If even 1 value from it is negative, the results match. Maybe this helps debug...

I can't seem to reproduce this issue. For example, using the data provided above (read in as the string data), here are my results:

import pandas as pd
from factor_analyzer import FactorAnalyzer

df = pd.DataFrame([[float(val) for val in row.split(' | ')]
                   for row in data.strip().split('\n')])

fa = FactorAnalyzer(method='minres',
                    n_factors=1,
                    rotation=None,
                    bounds=(0.005, 1),
                    is_corr_matrix=True).fit(df)
print(fa.loadings_)
[[0.3879858 ]
 [0.66334567]
 [0.32897377]
 [0.55966426]
 [0.66396016]
 [0.81430826]
 [0.8469053 ]
 [0.63367546]
 [0.44783303]
 [0.69420312]
 [0.58345214]
 [0.6963522 ]]

This matches R's psych library. Let me know if I'm missing something!

@Db-pckr
Author

Db-pckr commented Oct 7, 2021

I'm using:
pandas 1.2.4
numpy 1.20.2
python 3.8.10

If you're getting the expected results, my honest guess is that it comes down to an older numpy version and how it is used internally in factor_analyzer.
Thanks!

@celip38

celip38 commented Aug 31, 2022

I encounter the same issue (negative factor loadings):

I'm using:
pandas 1.4.3
numpy 1.21.5
python 3.9.12

Which package versions do you suggest to avoid this, please?

Thanks!

@desilinguist
Member

@celip38 please share your data, if possible, so we can try to reproduce the issue.

@Db-pckr
Author

Db-pckr commented Sep 2, 2022

@desilinguist You can use the data I presented above to test this problem.

@desilinguist
Member

desilinguist commented Sep 3, 2022

Thanks @Db-pckr. I can replicate this on my end too with the latest numpy library.

I poked around a bit and found that numpy.linalg.eigh(), used for the eigenvalue decomposition, was returning an all-negative first eigenvector for this correlation matrix, whereas the more general, but less efficient, numpy.linalg.eig() returns an all-positive first eigenvector, viz.

With eigh():

array([[-0.17816009],
       [-0.30460323],
       [-0.15106223],
       [-0.25699352],
       [-0.3048854 ],
       [-0.37392409],
       [-0.3888924 ],
       [-0.2909789 ],
       [-0.20564148],
       [-0.31877273],
       [-0.26791673],
       [-0.31975958]])

and, with eig():

array([[0.17816009],
       [0.30460323],
       [0.15106223],
       [0.25699352],
       [0.3048854 ],
       [0.37392409],
       [0.3888924 ],
       [0.2909789 ],
       [0.20564148],
       [0.31877273],
       [0.26791673],
       [0.31975958]])

However, neither is incorrect because, as we know, if $v$ is an eigenvector, then so is $\alpha v$, where $\alpha$ is any scalar $\neq 0$. It follows that the signs on factor loadings are essentially meaningless, because all they do is flip the (already arbitrary) interpretation of the latent factor.
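The sign ambiguity is easy to verify directly: for any eigenpair returned by eigh(), the negated eigenvector satisfies the same equation. A minimal sketch with an illustrative all-positive matrix (not the one from this issue):

```python
import numpy as np

# A small all-positive correlation-like matrix, for illustration only.
A = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])

vals, vecs = np.linalg.eigh(A)
lam = vals[-1]        # largest eigenvalue (eigh sorts ascending)
v = vecs[:, -1]       # its eigenvector

# Both v and -v satisfy A @ v == lam * v, so the sign that eigh()
# (or eig()) happens to return is an arbitrary implementation detail.
assert np.allclose(A @ v, lam * v)
assert np.allclose(A @ (-v), lam * (-v))
```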

So, while we could replace eigh() with eig() to force the results to match what SPSS and R do, I am not convinced that we need to do that since this is not really a bug.
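In the meantime, anyone who wants the SPSS/R sign convention can flip the sign of each factor column after fitting. A sketch (the helper flip_loading_signs is hypothetical, not part of the factor_analyzer API, and flipping so the largest-magnitude loading in each column is positive is just one common convention):

```python
import numpy as np

def flip_loading_signs(loadings):
    """Flip each factor's sign so that the loading with the largest
    absolute value in that column is positive."""
    loadings = np.asarray(loadings, dtype=float).copy()
    for j in range(loadings.shape[1]):
        col = loadings[:, j]
        if col[np.argmax(np.abs(col))] < 0:
            loadings[:, j] = -col
    return loadings

# All-negative single-factor loadings (as in this issue) become positive.
flipped = flip_loading_signs([[-0.39], [-0.66], [-0.85]])
# -> [[0.39], [0.66], [0.85]]
```

Applied to fa.loadings_, this would make the output above match SPSS and R regardless of which eigenvector sign the underlying decomposition returns.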

@jbiggsets any thoughts?

@jbiggsets
Collaborator

Yes, I would be inclined not to change this, since it doesn't really strike me as a bug and we use eigh() pretty consistently throughout. Maybe we can mention it in the documentation?

@desilinguist
Member

Adding to the documentation sounds like a good idea. I'll do that!

@celip38

celip38 commented Sep 5, 2022

Thanks a lot!

@desilinguist desilinguist added this to To do in Release 0.4.1 Sep 5, 2022
@desilinguist desilinguist self-assigned this Sep 5, 2022
@desilinguist desilinguist moved this from To do to In progress in Release 0.4.1 Sep 8, 2022
Release 0.4.1 automation moved this from In progress to Done Sep 8, 2022