## Factor Analysis (FA)

### Overview

Factor Analysis (FA) is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. FA aims to uncover the underlying relationships between observed variables by identifying these latent factors.

### Mathematical Foundations

#### 1. **Model Assumptions**

Given a dataset $X \in \mathbb{R}^{n \times p}$, FA assumes that the observed variables can be expressed as linear combinations of a few underlying factors plus noise:

$$ X = F L^T + \epsilon $$

where:
- $L \in \mathbb{R}^{p \times k}$ is the factor loading matrix.
- $F \in \mathbb{R}^{n \times k}$ is the matrix of factors.
- $\epsilon \in \mathbb{R}^{n \times p}$ is the matrix of error terms (unique factors); the variances of its columns are called the specific variances.

Here, $k$ is the number of factors, and typically $k \ll p$.
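To make the shapes concrete, here is a minimal NumPy sketch that simulates data from the model written in row form, $X = F L^T + \epsilon$. The loading values, specific variances, sample size, and random seed are hypothetical choices made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, k = 500, 6, 2  # observations, observed variables, factors

# Hypothetical loading matrix L (p x k): the first three variables load on
# factor 1, the last three on factor 2.
L = np.array([
    [0.9, 0.0],
    [0.8, 0.0],
    [0.7, 0.0],
    [0.0, 0.9],
    [0.0, 0.8],
    [0.0, 0.7],
])

# Hypothetical specific variances, chosen so each variable has unit variance.
psi = 1.0 - np.sum(L**2, axis=1)

F = rng.standard_normal((n, k))                    # latent factors, n x k
eps = rng.standard_normal((n, p)) * np.sqrt(psi)   # error terms, n x p

X = F @ L.T + eps                                  # observed data, n x p
print(X.shape)  # (500, 6)
```

Each row of `X` is one observation; each column is one observed variable built from the same two latent factors.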

#### 2. **Covariance Structure**

When the factors have unit variance and are uncorrelated with each other and with the error terms, the covariance matrix of the observed variables decomposes into:

$$ \Sigma = LL^T + \Psi $$

where:
- $\Sigma$ is the covariance matrix of the observed data.
- $L$ is the factor loading matrix.
- $\Psi$ is a diagonal matrix representing the variances of the error terms.
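The decomposition can be checked numerically. The sketch below simulates a large sample from a hypothetical two-factor model (the loadings, specific variances, and seed are assumptions made for the example) and compares the sample covariance with $LL^T + \Psi$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical loadings (4 variables, 2 factors) and specific variances.
L = np.array([[0.9, 0.0],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.0, 0.8]])
psi = np.array([0.2, 0.3, 0.4, 0.3])

# Factors with identity covariance, errors independent of the factors.
F = rng.standard_normal((n, 2))
eps = rng.standard_normal((n, 4)) * np.sqrt(psi)
X = F @ L.T + eps

sigma_model = L @ L.T + np.diag(psi)       # covariance implied by the model
sigma_sample = np.cov(X, rowvar=False)     # covariance estimated from data

max_gap = np.abs(sigma_sample - sigma_model).max()
print(max_gap)  # small, and shrinks as n grows
```

The off-diagonal entries of $\Sigma$ come entirely from $LL^T$; $\Psi$ only adds to the diagonal, which is why FA is described as modeling the shared variance.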

#### 3. **Estimation of Factor Loadings**

FA involves estimating the factor loading matrix $L$ and the specific variances $\Psi$. The common methods for estimation include:

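- **Maximum Likelihood Estimation (MLE)**: Chooses $L$ and $\Psi$ to maximize the likelihood of the observed data, assuming the factors and errors are normally distributed.
- **Principal Component Factor Analysis (PCFA)**: Takes initial loadings from the leading eigenvectors of the sample covariance (or correlation) matrix, scaled by the square roots of their eigenvalues; iterated variants then refine them by re-estimating the communalities.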
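As a rough illustration of the principal component approach, the sketch below computes loadings from the eigen-decomposition of a correlation matrix. It shows only the initial, non-iterated step; the refinement step and MLE are not implemented here. The function name `pc_factor_loadings` and the example matrix are hypothetical.

```python
import numpy as np

def pc_factor_loadings(R, k):
    """Non-iterated principal component estimate of loadings and specific variances.

    R is a (p x p) covariance or correlation matrix, k the number of factors.
    """
    eigvals, eigvecs = np.linalg.eigh(R)        # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]         # indices of the k largest
    # Scale each leading eigenvector by the square root of its eigenvalue.
    loadings = eigvecs[:, top] * np.sqrt(eigvals[top])
    # Whatever variance the k factors leave unexplained is assigned to Psi.
    specific_var = np.diag(R) - np.sum(loadings**2, axis=1)
    return loadings, specific_var

# Hypothetical correlation matrix built from a known two-factor structure.
L_true = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                   [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
R = L_true @ L_true.T + np.diag(1.0 - np.sum(L_true**2, axis=1))

L_hat, psi_hat = pc_factor_loadings(R, k=2)
residual = R - (L_hat @ L_hat.T + np.diag(psi_hat))
print(L_hat.round(2))   # column signs are arbitrary (eigenvector sign ambiguity)
print(psi_hat.round(2))
```

By construction this estimate reproduces the diagonal of `R` exactly, while the off-diagonal `residual` is generally non-zero. The loadings will also not match `L_true` exactly, because the principal component method does not separate specific variance from shared variance before extracting eigenvectors.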

### Example

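Consider a dataset with three observed variables influenced by two underlying factors. This small case is used only to show the shapes of the matrices involved: with three variables, a two-factor model has more free parameters than the covariance matrix has distinct entries, so a real analysis would need more observed variables per factor.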

1. **Data Matrix**

   Suppose we have a data matrix $X$:

   $$
   X = \begin{bmatrix}
   x_{11} & x_{12} & x_{13} \\
   x_{21} & x_{22} & x_{23} \\
   \vdots & \vdots & \vdots \\
   x_{n1} & x_{n2} & x_{n3}
   \end{bmatrix}
   $$

2. **Estimate Factor Loadings**

   Using MLE or PCFA, we estimate the factor loading matrix $L$ and the specific variances $\Psi$.

   $$
   L = \begin{bmatrix}
   l_{11} & l_{12} \\
   l_{21} & l_{22} \\
   l_{31} & l_{32}
   \end{bmatrix}, \quad
   \Psi = \begin{bmatrix}
   \psi_{11} & 0 & 0 \\
   0 & \psi_{22} & 0 \\
   0 & 0 & \psi_{33}
   \end{bmatrix}
   $$

3. **Factor Scores**

   Compute the factor scores $F$ for each observation, for example with the regression method (assuming the columns of $X$ are centered):

   $$
   \hat{F} = X (L L^T + \Psi)^{-1} L
   $$
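A minimal sketch of the score computation follows, using the regression estimator, which in row form is $\hat{F} = X_c \Sigma^{-1} L$ with $\Sigma = LL^T + \Psi$ and $X_c$ the column-centered data. For simplicity the code plugs in the true (hypothetical) loadings and specific variances rather than estimates, and it uses six variables instead of three so that both factors are well measured.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Hypothetical loadings and specific variances (unit-variance variables).
L = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
              [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
psi = 1.0 - np.sum(L**2, axis=1)

# Simulate data so the true factors are known and can be compared.
F_true = rng.standard_normal((n, 2))
X = F_true @ L.T + rng.standard_normal((n, 6)) * np.sqrt(psi)

Xc = X - X.mean(axis=0)                 # center each column
Sigma = L @ L.T + np.diag(psi)          # model-implied covariance, p x p
W = np.linalg.solve(Sigma, L)           # Sigma^{-1} L, without forming the inverse
F_hat = Xc @ W                          # estimated factor scores, n x k

# Correlation between true and estimated scores, one value per factor.
corr = [np.corrcoef(F_true[:, j], F_hat[:, j])[0, 1] for j in range(2)]
print(np.round(corr, 2))
```

The estimated scores correlate strongly with the true factors but do not equal them: the errors in $X$ cannot be removed completely, so factor scores are always estimates.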

### When to Use FA

- **Exploratory data analysis**: To identify underlying factors that explain the observed correlations.
- **Data reduction**: To reduce the number of observed variables by modeling them with fewer factors.
- **Psychometrics and social sciences**: Commonly used to analyze survey and test data.

### How to Use FA

1. **Standardize the data** if necessary to have zero mean and unit variance.
2. **Determine the number of factors** to extract using criteria such as the scree plot or the Kaiser criterion (retain factors whose correlation-matrix eigenvalues exceed 1).
3. **Choose the estimation method** (e.g., MLE, PCFA).
4. **Estimate the factor loading matrix** $L$ and specific variances $\Psi$.
5. **Compute factor scores** for each observation.
6. **Interpret the factors** to understand the underlying structure.
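The first two steps above can be sketched as follows: standardize the data, then count the eigenvalues of the correlation matrix that exceed 1 (the Kaiser criterion). The simulated data, loadings, and seed are hypothetical, and the Kaiser rule is only one heuristic among several.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# Hypothetical data generated from two latent factors.
L = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
              [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
psi = 1.0 - np.sum(L**2, axis=1)
X = rng.standard_normal((n, 2)) @ L.T + rng.standard_normal((n, 6)) * np.sqrt(psi)

# Step 1: standardize each column to zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Step 2: eigenvalues of the correlation matrix, largest first.
R = np.corrcoef(Z, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

# Kaiser criterion: keep factors with eigenvalue greater than 1.
n_factors = int(np.sum(eigvals > 1.0))
print(eigvals.round(2))
print(n_factors)
```

Plotting `eigvals` against their rank gives the scree plot; the "elbow" where the curve flattens is the other common way to pick the number of factors.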

### Advantages

- **Identifies latent variables**: Helps to uncover hidden factors that explain observed correlations.
- **Data reduction**: Reduces the dimensionality of the data while retaining the variance that the observed variables share.
- **Improves understanding**: Provides insights into the structure and relationships within the data.

### Disadvantages

- **Assumes linear relationships**: Observed variables are modeled as linear combinations of factors, so nonlinear dependence is not captured.
- **Factor indeterminacy**: The loadings are determined only up to an orthogonal rotation, so a rotation (e.g., varimax) is usually applied to make the factors interpretable.
- **Complexity**: Choosing the number of factors and interpreting them can be challenging.
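The indeterminacy can be seen directly: for any orthogonal matrix $Q$, the rotated loadings $LQ$ imply the same covariance, because $(LQ)(LQ)^T = LQQ^TL^T = LL^T$. The short sketch below checks this with a hypothetical loading matrix and an arbitrary 30-degree rotation.

```python
import numpy as np

# Hypothetical loading matrix (4 variables, 2 factors).
L = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.2, 0.7],
              [0.1, 0.8]])

# An arbitrary 2 x 2 rotation matrix (orthogonal: Q @ Q.T = I).
theta = np.deg2rad(30.0)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

L_rot = L @ Q

same_cov = np.allclose(L @ L.T, L_rot @ L_rot.T)   # shared covariance unchanged
same_loadings = np.allclose(L, L_rot)              # loadings themselves differ
print(same_cov, same_loadings)  # True False
```

Since the data cannot distinguish `L` from `L_rot`, rotation criteria pick the version that is easiest to interpret rather than the one that fits best.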

### Assumptions

1. **Linearity**: The observed variables are linear combinations of the factors plus error.
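2. **Independence**: The factors are uncorrelated with each other and with the error terms, and the error terms are uncorrelated with one another (so $\Psi$ is diagonal).
3. **Normality**: The factors and errors, and hence the observed variables, are normally distributed (assumed in MLE).
4. **Identifiability**: The number of factors must be smaller than the number of observed variables; more precisely, the model has non-negative degrees of freedom only when $(p - k)^2 \ge p + k$.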

### Conclusion

Factor Analysis (FA) is a powerful technique for identifying underlying factors that explain observed correlations among variables. By modeling observed data as linear combinations of latent factors, FA helps in data reduction and exploratory data analysis. While it requires careful consideration of assumptions and interpretation, FA's ability to reveal hidden structures makes it a valuable tool in various fields such as psychology, social sciences, and finance.