# Chapter 11 - Exploratory Factor Analysis

In [2]:
import numpy as np
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

## Bartlett's Sphericity Test

Bartlett's sphericity test assesses whether the correlation matrix of a dataset is an identity matrix, that is, whether the variables are uncorrelated with one another. It is often used to check whether data are suitable for factor analysis or PCA.

    H0: the correlation matrix of the dataset is an identity matrix. 

In [4]:
pca_data = np.array([[0, 5.90], [.90, 5.40], [1.80, 4.40], [2.60, 4.60], [3.30, 3.50], 
                     [4.40, 3.70], [5.20, 2.80], [6.10, 2.80], [6.50, 2.40], [7.40, 1.50]])
pca_data

array([[0. , 5.9],
       [0.9, 5.4],
       [1.8, 4.4],
       [2.6, 4.6],
       [3.3, 3.5],
       [4.4, 3.7],
       [5.2, 2.8],
       [6.1, 2.8],
       [6.5, 2.4],
       [7.4, 1.5]])

In [5]:
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity

# Returns the chi-square statistic and its p-value
chi_square_value, p_value = calculate_bartlett_sphericity(pca_data)
chi_square_value, p_value

(23.01289493462775, 1.6091842267707586e-06)

#### Chi-square:

The chi-square statistic measures how far the sample correlation matrix is from an identity matrix. For a given number of variables and observations, a larger value is stronger evidence that the variables are correlated and that the dataset is suitable for PCA or factor analysis. Whether the difference is statistically significant is judged from the p-value.

#### P-value:

A low p-value (typically less than 0.05 or 0.01) indicates strong evidence against the null hypothesis, meaning the correlation matrix is unlikely to be an identity matrix and the variables are meaningfully correlated. Here the p-value is very small (about 1.61e-06), so we reject H0 and have reason to proceed with the factor analysis.
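
The statistic and p-value above can be reproduced by hand. The following is a minimal sketch, assuming the usual form of Bartlett's statistic, -(n - 1 - (2p + 5)/6) · ln|R| with p(p - 1)/2 degrees of freedom, where R is the sample correlation matrix. The closed-form p-value used here is valid only because this dataset has two variables, which gives a single degree of freedom.

```python
import math
import numpy as np

# Same ten observations as pca_data above.
pca_data = np.array([[0, 5.90], [.90, 5.40], [1.80, 4.40], [2.60, 4.60], [3.30, 3.50],
                     [4.40, 3.70], [5.20, 2.80], [6.10, 2.80], [6.50, 2.40], [7.40, 1.50]])

n, p = pca_data.shape                        # n observations, p variables
corr = np.corrcoef(pca_data, rowvar=False)   # p x p sample correlation matrix

# Bartlett's statistic: -(n - 1 - (2p + 5) / 6) * ln|R|
chi_square = -(n - 1 - (2 * p + 5) / 6) * math.log(np.linalg.det(corr))
dof = p * (p - 1) // 2                       # degrees of freedom

# With p = 2 there is one degree of freedom, and the upper tail of a
# chi-square(1) variable has the closed form erfc(sqrt(x / 2)).
assert dof == 1
p_value = math.erfc(math.sqrt(chi_square / 2))

print(chi_square, p_value)
```

For more than two variables the p-value would need a general chi-square survival function instead of the `erfc` shortcut.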

## Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy

The Kaiser-Meyer-Olkin (KMO) test is a measure of sampling adequacy used to assess whether the dataset is suitable for factor analysis or principal component analysis (PCA). The KMO value ranges from 0 to 1, with higher values indicating that factor analysis is more appropriate for the data.

Here's how to interpret the KMO value:

* KMO values between 0.8 and 1.0 suggest that the dataset is very appropriate for factor analysis or PCA (high sampling adequacy).
* KMO values between 0.6 and 0.8 suggest that the data is adequate but not excellent for factor analysis.
* KMO values between 0.5 and 0.6 suggest that the data might be suitable for factor analysis but might require some caution.
* KMO values below 0.5 indicate that the data is not appropriate for factor analysis.

In [8]:
from factor_analyzer.factor_analyzer import calculate_kmo

kmo_all, kmo_model = calculate_kmo(pca_data)  # kmo_all: KMO per variable; kmo_model: overall KMO
kmo_model

0.49999999999999994
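
The value of 0.5 is a consequence of having only two variables. The sketch below assumes the standard definition of the overall KMO: the sum of squared off-diagonal correlations divided by that sum plus the sum of squared off-diagonal partial correlations. The helper name `kmo_overall` is hypothetical and is not part of `factor_analyzer`.

```python
import numpy as np

def kmo_overall(data):
    """Overall KMO for an (n_samples, n_variables) array (hypothetical helper)."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations from the inverse correlation matrix:
    # a_ij = -inv_ij / sqrt(inv_ii * inv_jj)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)   # mask that ignores the diagonal
    r2 = np.sum(corr[off_diag] ** 2)                # squared correlations
    a2 = np.sum(partial[off_diag] ** 2)             # squared partial correlations
    return r2 / (r2 + a2)

pca_data = np.array([[0, 5.90], [.90, 5.40], [1.80, 4.40], [2.60, 4.60], [3.30, 3.50],
                     [4.40, 3.70], [5.20, 2.80], [6.10, 2.80], [6.50, 2.40], [7.40, 1.50]])

kmo = kmo_overall(pca_data)
print(kmo)
```

With two variables there is nothing to partial out, so the partial correlation equals the ordinary correlation and the ratio is 0.5 regardless of the data. The KMO thresholds listed above therefore say little about this toy dataset; they become informative with three or more variables.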

In [9]:
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(rotation=None,     # Initialize the FactorAnalyzer object with no rotation (i.e., use the unrotated factor solution)
                    method='minres',  # Use the 'minres' method (Minimum Residual method) for factor extraction
                    n_factors=1)      # Specify the number of factors to extract (1 factor in this case)
fa.fit(pca_data)

In [10]:
fa.loadings_  # Factor loadings

array([[-0.98816761],
       [ 0.98816761]])

#### Interpretation of the Loadings:

    The factor loading of -0.98816761 for the first feature means that Feature 1 has a strong negative correlation with the extracted factor.

    The factor loading of 0.98816761 for the second feature means that Feature 2 has a strong positive correlation with the same extracted factor.

In [12]:
fa.get_communalities()  # Communalities for each variable in the factor analysis model

array([0.97647522, 0.97647522])

A communality is the proportion of the variance in a variable that is explained by the extracted factors. It measures how well the factors in the model account for that variable.

Communality values range from 0 to 1:

    A high communality (close to 1) means that the variable is well represented by the factors, i.e., most of its variance is explained by the factors.

    A low communality (close to 0) means that the variable is not well represented by the factors and that a large part of its variance is unexplained by the factors.

Feature 1 and Feature 2 each have a communality of `0.97647522`, meaning that **approximately 97.6%** of the variance in each feature is explained by the extracted factor.

The remaining 2.4% of the variance in each feature is not explained by the extracted factor and is called the "unique variance" (the part of the variance that the factor does not capture).
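
As a quick check, the communalities can be recomputed from the loadings printed above: each communality is the sum of the squared loadings in that variable's row. The sketch below copies the loadings from the output rather than refitting the model.

```python
import numpy as np

# Loadings copied from the fa.loadings_ output above (one factor, two features).
loadings = np.array([[-0.98816761],
                     [ 0.98816761]])

communalities = np.sum(loadings ** 2, axis=1)   # row-wise sum of squared loadings
uniqueness = 1 - communalities                  # share of variance left unexplained

print(communalities)
print(uniqueness)
```

The sign of a loading does not affect the communality, which is why both features get the same value.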

## Exploratory Factor Analysis on USA Arrests Data

In [15]:
url = ('https://raw.githubusercontent.com/jordaan-c/Unsupervised-Learning-Final-Project/refs/heads/main/UsArrests.csv')

data = pd.read_csv(url)

data.head()

Unnamed: 0,City,Murder,Assault,UrbanPop,Rape
0,Alabama,13.2,236,58,21.2
1,Alaska,10.0,263,48,44.5
2,Arizona,8.1,294,80,31.0
3,Arkansas,8.8,190,50,19.5
4,California,9.0,276,91,40.6


In [16]:
df = data.drop(columns=['City'])  # Alternative: df = pd.DataFrame(data, columns=['Murder', 'Assault', 'UrbanPop', 'Rape'])

df.head()

Unnamed: 0,Murder,Assault,UrbanPop,Rape
0,13.2,236,58,21.2
1,10.0,263,48,44.5
2,8.1,294,80,31.0
3,8.8,190,50,19.5
4,9.0,276,91,40.6


In [17]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()  # Initialize a StandardScaler object to standardize the features of the dataset (mean=0, variance=1)
scaler.fit(df)             # Fit the StandardScaler to the dataset 'df', calculating the mean and standard deviation for each feature

In [18]:
scaled_data = scaler.transform(df)  # Transform the data in 'df' using the calculated mean and standard deviation from the scaler
scaled_data[:5] 

array([[ 1.25517927,  0.79078716, -0.52619514, -0.00345116],
       [ 0.51301858,  1.11805959, -1.22406668,  2.50942392],
       [ 0.07236067,  1.49381682,  1.00912225,  1.05346626],
       [ 0.23470832,  0.23321191, -1.08449238, -0.18679398],
       [ 0.28109336,  1.2756352 ,  1.77678094,  2.08881393]])
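
The transformation itself is a column-wise z-score. The sketch below applies it by hand to the five rows shown by `df.head()`. Because it uses only those five rows, its numbers differ from the output above, which is based on the full dataset.

```python
import numpy as np

# The five rows shown by df.head(): Murder, Assault, UrbanPop, Rape.
sample = np.array([[13.2, 236, 58, 21.2],
                   [10.0, 263, 48, 44.5],
                   [ 8.1, 294, 80, 31.0],
                   [ 8.8, 190, 50, 19.5],
                   [ 9.0, 276, 91, 40.6]])

mean = sample.mean(axis=0)   # per-column mean
std = sample.std(axis=0)     # per-column population standard deviation (ddof=0), as StandardScaler uses
z = (sample - mean) / std    # z-scores: each column now has mean 0 and variance 1

print(z.mean(axis=0).round(12))
print(z.std(axis=0))
```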

In [19]:
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer()         # Initialize the FactorAnalyzer object to perform factor analysis

fa.set_params(n_factors=4,    # Set the number of factors to extract (4 factors)
              rotation=None)  # Specify that no rotation should be applied to the factor solution (unrotated solution)

fa.fit(scaled_data)           # Fit the FactorAnalyzer model to the scaled data, extracting 4 factors without rotation

In [20]:
fa.loadings_

array([[ 0.84370394, -0.37474146, -0.07321271,  0.        ],
       [ 0.91937036, -0.10367227,  0.17283121,  0.        ],
       [ 0.33167926,  0.54884826,  0.06269644,  0.        ],
       [ 0.78479778,  0.29235871, -0.15025673,  0.        ]])

#### Feature 1 (first row):

    Feature 1 (Murder) has a loading of 0.84370394 on Factor 1, a strong positive association with that factor.

    Its loading on Factor 2 is -0.37474146, a moderate negative association.

    Its loadings on the remaining factors are negligible: about -0.07 on Factor 3 and exactly 0 on Factor 4, so these factors have little or no influence on it.
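
The same reading can be applied to every row. The sketch below copies the loadings from the output above and labels the rows with the column order of `df` (Murder, Assault, UrbanPop, Rape). It reports, for each variable, the factor with the largest absolute loading.

```python
import numpy as np

variables = ["Murder", "Assault", "UrbanPop", "Rape"]

# Loadings copied from the fa.loadings_ output above (rows: variables, columns: factors).
loadings = np.array([[ 0.84370394, -0.37474146, -0.07321271,  0.        ],
                     [ 0.91937036, -0.10367227,  0.17283121,  0.        ],
                     [ 0.33167926,  0.54884826,  0.06269644,  0.        ],
                     [ 0.78479778,  0.29235871, -0.15025673,  0.        ]])

# Index of the factor with the largest absolute loading in each row.
dominant = np.abs(loadings).argmax(axis=1)

for name, row, k in zip(variables, loadings, dominant):
    print(f"{name:<9} loads most strongly on Factor {k + 1} ({row[k]:+.2f})")
```

Murder, Assault and Rape load mainly on Factor 1, while UrbanPop loads mainly on Factor 2.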

In [22]:
# get_eigenvalues() returns two arrays of eigenvalues (original and common-factor), not eigenvectors
original_ev, common_factor_ev = fa.get_eigenvalues()

original_ev    # Eigenvalues of the original correlation matrix

array([2.48024158, 0.98976515, 0.35656318, 0.17343009])

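    The first eigenvalue is about 2.48. The four standardized variables have a total variance of 4, so this corresponds to about 62% of the total variance.
    The second eigenvalue is about 0.99, or about 25% of the total variance.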
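
Eigenvalues are easier to read as proportions. The sketch below copies the eigenvalues from the output above and divides each by their sum, which is 4 because there are four standardized variables.

```python
import numpy as np

# Eigenvalues copied from the output above.
eigenvalues = np.array([2.48024158, 0.98976515, 0.35656318, 0.17343009])

proportion = eigenvalues / eigenvalues.sum()   # share of total variance per eigenvalue
cumulative = np.cumsum(proportion)             # running total of those shares

for i, (ev, prop, cum) in enumerate(zip(eigenvalues, proportion, cumulative), start=1):
    print(f"{i}: eigenvalue={ev:.3f}  proportion={prop:.1%}  cumulative={cum:.1%}")
```

The first two eigenvalues together account for roughly 87% of the total variance.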

In [24]:
fa.get_communalities()

array([0.85762759, 0.88586043, 0.41517638, 0.72395827])

Communality is the proportion of variance in each feature that is explained by the extracted factors. It indicates how well the factors explain the variance in each of the original variables.

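    Feature 1 (Murder) has a communality of 0.86, meaning that approximately 86% of the variance in Feature 1 is explained by the extracted factors.

    Feature 3 (UrbanPop) has the lowest communality, 0.42, so the factors explain less than half of its variance.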

**Varimax** is an orthogonal rotation method that makes the factor loadings more interpretable by maximizing, for each factor, the variance of its squared loadings across the variables.

Without rotation, factor analysis can produce factors that are hard to interpret because many features contribute to each factor. Varimax rotation simplifies the picture by pushing each factor toward strong loadings on only a few features and near-zero loadings on the others.

In [27]:
fa_varimax = FactorAnalyzer(rotation='varimax')  # Varimax rotation; n_factors is left at its default of 3
fa_varimax.fit(scaled_data)                      # Fit the FactorAnalyzer model to the scaled data

In [28]:
fa_varimax.loadings_

array([[ 0.91516486,  0.02762848,  0.13905951],
       [ 0.88036791,  0.32020407, -0.09100618],
       [ 0.05909459,  0.64142576, -0.0160376 ],
       [ 0.56292624,  0.59517677,  0.22986286]])
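
One way to see what the rotation changed is to compare the quantity varimax tries to increase: the variance of the squared loadings within each factor, summed over factors. The sketch below evaluates this raw criterion on the loadings printed above, using the three non-zero columns of the unrotated solution. The helper `varimax_criterion` is hypothetical, and the library may normalize the loadings internally before rotating, so this is an illustration rather than the exact objective it optimizes.

```python
import numpy as np

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings (hypothetical helper)."""
    return np.var(loadings ** 2, axis=0).sum()

# First three columns of the unrotated loadings printed earlier (the fourth is all zeros).
unrotated = np.array([[ 0.84370394, -0.37474146, -0.07321271],
                      [ 0.91937036, -0.10367227,  0.17283121],
                      [ 0.33167926,  0.54884826,  0.06269644],
                      [ 0.78479778,  0.29235871, -0.15025673]])

# Varimax-rotated loadings printed above.
rotated = np.array([[ 0.91516486,  0.02762848,  0.13905951],
                    [ 0.88036791,  0.32020407, -0.09100618],
                    [ 0.05909459,  0.64142576, -0.0160376 ],
                    [ 0.56292624,  0.59517677,  0.22986286]])

crit_unrotated = varimax_criterion(unrotated)
crit_rotated = varimax_criterion(rotated)
print(crit_unrotated, crit_rotated)
```

The criterion is larger for the rotated loadings. This matches the rotated matrix, where Murder and Assault load almost entirely on the first factor and UrbanPop almost entirely on the second.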

In [29]:
fa_varimax.get_communalities()

array([0.85762759, 0.88586043, 0.41517638, 0.72395827])
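
These communalities are identical to the ones from the unrotated solution. An orthogonal rotation redistributes variance among the factors, but it changes neither the total explained for each variable nor the correlations implied by the model. The sketch below checks both properties on the loadings printed in this section, copied from the outputs rather than refitted.

```python
import numpy as np

# First three columns of the unrotated loadings (the fourth column is all zeros).
unrotated = np.array([[ 0.84370394, -0.37474146, -0.07321271],
                      [ 0.91937036, -0.10367227,  0.17283121],
                      [ 0.33167926,  0.54884826,  0.06269644],
                      [ 0.78479778,  0.29235871, -0.15025673]])

# Varimax-rotated loadings.
rotated = np.array([[ 0.91516486,  0.02762848,  0.13905951],
                    [ 0.88036791,  0.32020407, -0.09100618],
                    [ 0.05909459,  0.64142576, -0.0160376 ],
                    [ 0.56292624,  0.59517677,  0.22986286]])

# Communalities: row-wise sums of squared loadings.
comm_unrotated = (unrotated ** 2).sum(axis=1)
comm_rotated = (rotated ** 2).sum(axis=1)

# Common part of the correlation matrix implied by each solution: L @ L.T
implied_unrotated = unrotated @ unrotated.T
implied_rotated = rotated @ rotated.T

same_communalities = np.allclose(comm_unrotated, comm_rotated, atol=1e-6)
same_implied = np.allclose(implied_unrotated, implied_rotated, atol=1e-6)
print(same_communalities, same_implied)
```

Rotation is therefore a choice about interpretability and does not change how well the model fits.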