### Sample program for Factor Analysis  

#### Import libraries  

In [7]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from factor_analyzer import FactorAnalyzer


#### Parameters  

In [None]:
csv_in = 'subjects5.csv'

# Render inline figures at high resolution
%config InlineBackend.figure_formats = {'png', 'retina'}

#### Read CSV data  

In [None]:
df = pd.read_csv(csv_in, delimiter=',', skiprows=0, header=0)
print(df.shape)
print(df.info())
display(df.head())

#### Factor analysis  

In [None]:
fa = FactorAnalyzer(n_factors=2, rotation='varimax', method='ml')
#fa = FactorAnalyzer(n_factors=2, rotation='varimax', method='minres')
fa.fit(df.values)

#### Correlation matrix  

In [None]:
df_corr = df.corr(method='pearson')
display(df_corr)

#### Eigenvalues  

In [None]:
eigen_org, eigen_cf = fa.get_eigenvalues()
ser_eigen_org = pd.Series(eigen_org)
ser_eigen_cf = pd.Series(eigen_cf)
print(ser_eigen_org)
print(ser_eigen_cf)

#### Scree plot  

In [None]:
x = np.arange(1, len(ser_eigen_org) + 1)
plt.plot(x, ser_eigen_org, marker='o')
plt.xlabel('Eigenvalue No.')
plt.ylabel('Eigenvalue')
plt.show()

**Number of factors: two appears appropriate, because the third eigenvalue is below 1 and the scree plot levels off from the third eigenvalue onward.**  
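
The "eigenvalue greater than 1" rule used here (often called the Kaiser criterion) can be sketched on its own. The snippet below is only an illustration: `kaiser_n_factors` and `corr_demo` are hypothetical names, and the correlation matrix is made up rather than taken from `subjects5.csv`.

```python
import numpy as np

def kaiser_n_factors(corr):
    """Return how many eigenvalues of a correlation matrix exceed 1,
    together with the eigenvalues sorted in descending order."""
    # eigvalsh is meant for symmetric matrices and returns ascending order
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]
    return int(np.sum(eigenvalues > 1.0)), eigenvalues

# Hypothetical correlation matrix: two pairs of strongly correlated variables
corr_demo = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
])

n_keep, eig_demo = kaiser_n_factors(corr_demo)
print(n_keep)    # number of eigenvalues above 1
print(eig_demo)  # eigenvalues, largest first
```

The rule is a heuristic, so it is usually read together with the scree plot rather than applied alone.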

#### Factor loadings  

In [None]:
loadings = fa.loadings_
df_loadings = pd.DataFrame(loadings, index=df.columns,
                           columns=['Factor1','Factor2'])
display(df_loadings)

#### Factor scores  

In [None]:
scores = fa.transform(df)
df_scores = pd.DataFrame(scores, columns=['Factor1','Factor2'])
print(df_scores.shape)
display(df_scores.head())

#### Uniquenesses (proportion of variance due to unique factors)  

In [None]:
uniqueness = fa.get_uniquenesses()
ser_uniqueness = pd.Series(uniqueness, index=df.columns)
print(ser_uniqueness)

**Uniqueness is not particularly high for any variable,  
so every variable is influenced to some extent by the common factors.**
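
For standardized variables, uniqueness is one minus the communality, where the communality is the row-wise sum of squared loadings. A minimal sketch of that relationship, using a hypothetical loadings matrix (`loadings_demo` is made up and is not the output of `fa.loadings_`):

```python
import numpy as np

# Hypothetical loadings: 3 variables (rows) x 2 factors (columns)
loadings_demo = np.array([
    [0.8, 0.1],
    [0.7, 0.3],
    [0.2, 0.9],
])

# Communality: share of each variable's variance explained by the common factors
communality_demo = (loadings_demo ** 2).sum(axis=1)

# Uniqueness: the remaining share, attributed to the unique factor
uniqueness_demo = 1.0 - communality_demo
print(uniqueness_demo)
```

A variable with uniqueness close to 1 would be almost unrelated to the common factors.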

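#### Contribution of each common factor  
- Variance: sum of squared factor loadings (factor contribution)  
- Proportion Var: proportion of variance explained  
- Cumulative Var: cumulative proportion of variance explained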

In [None]:
fa_var = fa.get_factor_variance()
df_fa_var = pd.DataFrame(fa_var,
                         index=['var', 'prop_var', 'cum_var'],
                         columns=['Factor1', 'Factor2'])
display(df_fa_var)

**More than 78% of total variance can be explained by Factor1 and Factor2**  
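
The three rows of this table can be reproduced by hand from a loadings matrix. The sketch below assumes standardized variables, so the total variance equals the number of variables; `loadings_demo` is a hypothetical matrix, not the one fitted above.

```python
import numpy as np

# Hypothetical loadings: 3 variables (rows) x 2 factors (columns)
loadings_demo = np.array([
    [0.8, 0.1],
    [0.7, 0.3],
    [0.2, 0.9],
])
n_vars = loadings_demo.shape[0]

# Variance: column-wise sum of squared loadings
var_demo = (loadings_demo ** 2).sum(axis=0)

# Proportion Var: each factor's share of the total variance
prop_var_demo = var_demo / n_vars

# Cumulative Var: running total of the proportions
cum_var_demo = np.cumsum(prop_var_demo)

print(var_demo)
print(prop_var_demo)
print(cum_var_demo)
```

The last entry of the cumulative row is the figure quoted when stating how much of the total variance the retained factors explain.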

In [None]:
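# slightly modified from biplot() in pca_and_biplot.ipynb of DM-08
def biplot_fa(score_2d, loadings, load_labels=None):
    """Plot factor scores (numbered points) and loadings (red arrows) together."""
    plt.figure(figsize=(10,10))
    r1 = 1.5   # scale factor for the loading arrows
    r2 = 1.01  # places each label slightly beyond its arrow tip
    if load_labels is None:
        load_labels = range(len(loadings))
    # One arrow per variable, pointing along its (Factor1, Factor2) loadings
    for i, coef in enumerate(loadings):
        plt.arrow(0, 0, coef[0]*r1, coef[1]*r1, color='r')
        plt.text(coef[0]*r1*r2, coef[1]*r1*r2, load_labels[i],
                 color='b', fontsize=20)
    # One point per sample, drawn with its row number as the marker
    for i in range(len(score_2d)):
        m = '${}$'.format(i)
        plt.scatter(score_2d[i,0], score_2d[i,1], marker=m, s=500, c='k')
    plt.xlabel('F_1')
    plt.ylabel('F_2')
    plt.grid()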

In [None]:
biplot_fa(scores, loadings, load_labels=df.columns)

**Based on the variables with loadings of at least 0.5 and on the biplot,  
F_1 represents language ability and F_2 represents science ability.**
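
Reading off which variables load on which factor can also be done programmatically. The sketch below applies the 0.5 cutoff to a hypothetical loadings table; the row labels `subject_a` to `subject_d` and all values are illustrative and are not the columns of `subjects5.csv`.

```python
import pandas as pd

# Hypothetical loadings table with the same layout as df_loadings
df_demo = pd.DataFrame(
    {'Factor1': [0.85, 0.78, 0.15, 0.22],
     'Factor2': [0.10, 0.25, 0.90, 0.74]},
    index=['subject_a', 'subject_b', 'subject_c', 'subject_d'],
)

threshold = 0.5  # cutoff for a loading to count as salient

# For each factor, list the variables whose absolute loading reaches the cutoff
salient_demo = {
    factor: df_demo.index[df_demo[factor].abs() >= threshold].tolist()
    for factor in df_demo.columns
}
print(salient_demo)
```

Absolute values are used so that strongly negative loadings would also be picked up.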

**No.5: good at science, weak in language  
No.10: weak in both science and language  
No.34, 35, 36, 37, 46: good at language, weak in science  
No.40: good at both science and language  
etc.**