# Factor Analysis
**Raphael Kreft** - 23.05.2022

In this notebook, we will explore a technique that is widely used across areas such as machine learning, data analysis and drug discovery. During our exploration of factor analysis, we will use the following questions as guidelines:

- What is factor analysis?
- What are the relationships between covariance matrix, factor analysis, and principal component analysis (PCA)?
- What do we mean by loadings?
- Why are factors orthogonal to each other? What is the consequence?
- How can we use factor analysis as a generative model?
- What is the relationship between factor analysis and autoencoder?
- How can you explain factor analysis to a high-school student?

## What is Factor Analysis?

Factor analysis was originally invented to analyse intelligence tests. Based on the results of IQ tests, the psychologist Charles Spearman deduced that a good portion of the test results could be explained by just one personal factor, the so-called g-factor (general factor). The idea that one or more factors, which are not explicitly present in the data, can accurately explain the observed data was developed further and became an important technique in the field of descriptive statistics, where we consider the following setting:

Consider a dataset $X$ containing samples $x_i,\; i = 1, \dots, N$, where $N$ is the size of the dataset. The dataset has several columns, each of which holds a specific piece of information, for example temperature, height or price. Each sample has a value for every column. You can imagine a dataset as a table in which each row contains one sample and each column contains the values of a specific variable. The following example shows a dataset about penguins:

In [1]:
import seaborn as sns

dataset = sns.load_dataset('penguins')
dataset

| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 339 | Gentoo | Biscoe | NaN | NaN | NaN | NaN | NaN |
| 340 | Gentoo | Biscoe | 46.8 | 14.3 | 215.0 | 4850.0 | Female |
| 341 | Gentoo | Biscoe | 50.4 | 15.7 | 222.0 | 5750.0 | Male |
| 342 | Gentoo | Biscoe | 45.2 | 14.8 | 212.0 | 5200.0 | Female |


The entries of one column can be interpreted as samples from an underlying random variable.

As mentioned above, the goal of factor analysis is to find hidden factors behind explicitly observed data. We assume the existence of a set of hidden variables that explains the observed correlations and interrelationships between the observed variables.

**An example:** Imagine you are a big company that offers online content streaming. Like all other hip tech companies, you want a recommendation system that suggests to users which video they could watch next. For this purpose, you are given a dataset containing the names of people, the movies they watched and when they watched them. Factor analysis can then be used to determine and isolate factors that predispose a person to like a specific kind of video. A question one can ask is: are weekday, time and age all good measures of the watch time a person has per week?

### General Approach to Factor Analysis

In general, factor analysis consists of two steps:
1. **Factor extraction**
2. **Factor rotation**

While the goal of factor extraction is to find the latent factors, factor rotation aims to make the factors easier to interpret by simplifying their structure.
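
The two steps can be sketched with NumPy. This is only an illustration under stated assumptions: the data is synthetic, extraction is done with principal components of the correlation matrix, and the rotation is varimax (one common orthogonal rotation). The helper name `varimax` and all parameter values are chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): 6 observed variables driven by 2 hidden factors.
n_samples, n_factors = 500, 2
true_loadings = np.array([
    [0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
    [0.1, 0.8], [0.0, 0.9], [0.2, 0.7],
])
hidden = rng.normal(size=(n_samples, n_factors))
noise = 0.4 * rng.normal(size=(n_samples, true_loadings.shape[0]))
X = hidden @ true_loadings.T + noise

# Step 1, extraction: keep the leading eigenvectors of the correlation matrix
# and scale them by the square root of their eigenvalues.
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)              # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1][:n_factors]     # indices of the largest ones
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])


def varimax(L, max_iter=100, tol=1e-8):
    """Step 2, rotation: find an orthogonal rotation that pushes each
    loading towards either a large or a near-zero value."""
    p, k = L.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        column_sums = np.diag((rotated ** 2).sum(axis=0))
        gradient = L.T @ (rotated ** 3 - rotated @ column_sums / p)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt                          # closest orthogonal matrix
        new_objective = s.sum()
        if new_objective < objective * (1 + tol):  # no relevant improvement
            break
        objective = new_objective
    return L @ rotation, rotation


rotated_loadings, rotation = varimax(loadings)
print(np.round(rotated_loadings, 2))
```

Because the rotation matrix is orthogonal, rotating does not change how much of each variable is explained by the factors; it only redistributes the loadings so that they are easier to read.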

## What are the relationships between covariance matrix, factor analysis and PCA?

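The covariance of two variables $x$ and $y$ describes how much the two variables vary together:
$$\sigma(x, y) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - x_{mean})(y_i - y_{mean})$$
Here $n$ is the number of samples, and $x_{mean}$ and $y_{mean}$ are the sample means of $x$ and $y$.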

A **covariance matrix** $C \in \mathbb{R}^{d \times d}$, where $d$ is the number of variables (columns) in a dataset, contains the pairwise covariances between all variables. For a dataset with two variables $x$ and $y$, the covariance matrix looks like this:
$$C = \left[ {\begin{array}{cc}
    \sigma(x,x) & \sigma(x,y) \\
    \sigma(y,x) & \sigma(y,y) \\
  \end{array} } \right]$$
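
As a small worked example, the covariance formula and the $2 \times 2$ covariance matrix can be computed directly. The data below is synthetic and the helper `covariance` is a name chosen for this sketch; `np.cov` is used only to cross-check the hand-written version.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): y depends linearly on x plus some noise.
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)
X = np.column_stack([x, y])


def covariance(a, b):
    """Sample covariance with the 1 / (n - 1) normalisation."""
    return ((a - a.mean()) * (b - b.mean())).sum() / (len(a) - 1)


# Build the covariance matrix entry by entry.
C = np.array([
    [covariance(x, x), covariance(x, y)],
    [covariance(y, x), covariance(y, y)],
])

# np.cov uses the same 1 / (n - 1) normalisation by default.
C_numpy = np.cov(X, rowvar=False)
print(C)
```

The diagonal entries are the variances of the individual variables, and the matrix is symmetric because $\sigma(x, y) = \sigma(y, x)$.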

Exactly these covariances describe the interrelationships between the variables. Factor analysis aims at finding (fewer) latent variables, or factors, that can explain and model these interrelationships.

There are **different methods to tackle factor extraction, which differ in the assumptions they make about variance**. Starting from the total variance of a dataset, we differentiate between common variance and unique variance.
1. Common variance: the variance that is shared by a set of variables.
2. Unique variance: the variance inherent to a single variable. Here we further differentiate between specific variance and error variance.
![Diagram showing variance types](https://stats.oarc.ucla.edu/wp-content/uploads/2018/05/fig02d.png)
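
Written as an equation, the split can be sketched as follows. The symbols are introduced only for this illustration and are not used elsewhere in the notebook: $\lambda_{jk}$ is the loading of variable $x_j$ on factor $k$, $m$ is the number of factors, and $\psi_j$ is the unique variance of $x_j$. Assuming uncorrelated factors with unit variance:

$$\operatorname{Var}(x_j) = \underbrace{\sum_{k=1}^{m} \lambda_{jk}^2}_{\text{common variance}} + \underbrace{\psi_j}_{\text{unique variance}}$$

For a standardised variable the left-hand side is 1, so the common part (often called the communality) and the unique part are fractions that add up to one.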

One method to find factors is **Principal Component Analysis (PCA)**. PCA is a well-known unsupervised machine learning technique that is often referred to as a dimensionality reduction technique. PCA outputs the directions of maximal variance in the dataset, also called principal components. The principal components are then taken as the latent factors. PCA assumes that all variance is common variance and does not model unique variance.

Another technique that also accounts for Unique Variance is **Common Factor Analysis**.
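
To make the link between PCA and the covariance matrix concrete, here is a minimal sketch on synthetic data (the data and variable names are assumptions for illustration). The principal components are the eigenvectors of the covariance matrix, and the eigenvalues are the variances along those directions. Note that the whole covariance matrix is decomposed, including its diagonal, so unique variance is not separated out.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (assumption): three variables driven by one hidden variable.
latent = rng.normal(size=(300, 1))
X = latent @ np.array([[1.0, 0.8, 0.5]]) + 0.3 * rng.normal(size=(300, 3))

# Centre the data and compute its covariance matrix.
X_centered = X - X.mean(axis=0)
C = np.cov(X_centered, rowvar=False)

# Eigenvectors of C are the principal components,
# eigenvalues are the variances along them.
eigvals, eigvecs = np.linalg.eigh(C)          # ascending order
order = np.argsort(eigvals)[::-1]             # sort from largest to smallest
eigvals, components = eigvals[order], eigvecs[:, order]

# Project the data onto the principal components.
scores = X_centered @ components
explained_ratio = eigvals / eigvals.sum()
print(np.round(explained_ratio, 3))
```

Since the three variables share one strong hidden cause, the first principal component carries most of the total variance.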


## What do we mean by loadings?

A factor loading **measures how strong the connection between a specific factor and an observed variable is**. For standardised variables and orthogonal factors, it is the correlation between a latent factor and an observed variable. If a loading is high, the factor is strongly correlated with the variable the loading belongs to. Loadings should not be confused with factor scores, which are the values a factor takes for the individual samples.

Loadings are an important measure that allows us to interpret the results of factor analysis. To pick up the example from above: given a high loading between the observed variable watch time and a latent factor, we can be fairly confident that this factor strongly influences the observed variable.
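
The reading of loadings as correlations can be checked numerically. The sketch below assumes principal-component extraction on standardised, synthetic data; the variable names and the mixing matrix are illustrative assumptions. Each loading is an eigenvector entry scaled by the square root of its eigenvalue, and it matches the correlation between the variable and the factor.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data (assumption): 4 observed variables, 2 hidden factors.
latent = rng.normal(size=(400, 2))
mixing = np.array([[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.1, 0.7]])
X = latent @ mixing.T + 0.5 * rng.normal(size=(400, 4))

# Standardise every column to zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.cov(Z, rowvar=False)                   # correlation matrix of X

# Extract two factors and compute their loadings.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])

# Values of the two factors for every sample.
scores = Z @ eigvecs[:, order]

# Correlation between each observed variable (rows) and each factor (columns).
corr = np.corrcoef(np.hstack([Z, scores]), rowvar=False)[:4, 4:]
print(np.round(loadings, 2))
```

In this setup `corr` and `loadings` agree up to floating-point error, which is why a loading close to 1 or -1 signals a strong link between a variable and a factor.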

## Why are factors orthogonal to each other? What is the consequence?

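If the factors were not orthogonal to each other, there would be a correlation between them. The factors could then themselves be explained by a further factor. Since we consider the factors to be independent of each other, we need them to be orthogonal. A consequence is that each factor's contribution to the variance of an observed variable can be considered separately: the common variance of a variable is simply the sum of its squared loadings.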
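
For factors obtained with PCA, orthogonality and its consequence can be verified directly. The following sketch uses synthetic data (an assumption for illustration): the component directions are orthonormal, the resulting scores are uncorrelated, and the total variance splits into a sum of per-factor variances.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data (assumption): 4 correlated columns.
X = rng.normal(size=(250, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)

# Principal components and the scores of every sample on them.
eigvals, components = np.linalg.eigh(np.cov(Xc, rowvar=False))
scores = Xc @ components

# Orthonormal directions: the Gram matrix is the identity.
gram = components.T @ components

# Uncorrelated factors: the covariance of the scores is diagonal.
score_cov = np.cov(scores, rowvar=False)
off_diagonal = score_cov - np.diag(np.diag(score_cov))

# Consequence: the total variance is the sum of the per-factor variances.
total_variance = np.trace(np.cov(Xc, rowvar=False))
print(np.round(score_cov, 3))
```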

## How can we use factor analysis as a generative model?

Factor analysis aims at finding a set of factors that can model the explicitly observed variables and samples in a dataset.
After obtaining a set of factors, we can use linear combinations of them to generate new samples. In the case of PCA, we can use a linear combination of principal components either to reconstruct samples from the dataset or to generate new samples by choosing the coefficients of the linear combination ourselves.

In linear algebra terms, the principal components form an orthogonal basis of the vector space in which the data lives.
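
Generation with a factor model can be sketched as follows. All parameter values here (`loadings`, `mean`, `unique_variances`) are hypothetical and chosen by hand for the example; in practice they would come from a fitted model. Each new sample is the mean plus a linear combination of randomly drawn factor values plus variable-specific noise.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical model parameters (assumptions): 4 observed variables, 2 factors.
loadings = np.array([[1.0, 0.0], [0.8, 0.3], [0.0, 1.2], [0.4, 0.9]])
mean = np.array([10.0, 5.0, 0.0, -2.0])
unique_variances = np.array([0.2, 0.1, 0.3, 0.25])


def sample(n):
    """Draw n new samples from the factor model."""
    factors = rng.normal(size=(n, loadings.shape[1]))    # hidden factor values
    noise = rng.normal(size=(n, loadings.shape[0])) * np.sqrt(unique_variances)
    return mean + factors @ loadings.T + noise


X_new = sample(50_000)

# The covariance implied by the model versus the one measured on the samples.
model_cov = loadings @ loadings.T + np.diag(unique_variances)
sample_cov = np.cov(X_new, rowvar=False)
print(np.round(sample_cov - model_cov, 2))
```

With enough samples, the covariance of the generated data approaches the covariance implied by the loadings and unique variances, which is what makes the factors a model of the observed interrelationships.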

## What is the relationship between factor analysis and autoencoders?

Autoencoders are a type of neural network architecture whose goal is to learn a representation of data. They consist of three parts: the encoder, the latent space representation and the decoder.

To train the network, it is fed with data samples, e.g. images. The encoder compresses the input into a latent representation. From this representation, the decoder tries to reconstruct the original input. With more samples and training time, the reconstruction error decreases and the reconstructed output increasingly resembles the original input.

Autoencoders are applicable in a variety of areas, such as compression, denoising and generative modelling (for example, variational autoencoders).

![Autoencoder Architecture](https://www.compthree.com/images/blog/ae/ae.png)

Similar to factor analysis, autoencoders learn a latent representation. Autoencoders learn efficient encodings that represent high-dimensional data in a low-dimensional space while losing as little information as possible. Because encoder and decoder can be non-linear, they can outperform PCA on highly non-linear and complex data; a linear autoencoder trained with squared reconstruction error, in contrast, learns the same subspace as PCA.
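
The connection to PCA can be illustrated by writing PCA in encoder/decoder form. This is not a trained neural network, only a sketch of the linear special case on synthetic data: the function names `encode` and `decode` and the choice of two latent dimensions are assumptions for this example. The encoder projects onto the leading principal components, and the decoder maps the code back with the same (tied) weights.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic data (assumption): 5 observed variables, mostly 2-dimensional.
X = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 5))
X = X + 0.1 * rng.normal(size=(300, 5))
mean = X.mean(axis=0)

# Principal components, sorted from largest to smallest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(np.cov(X - mean, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                      # size of the latent representation
V = eigvecs[:, :k]         # shared weights of encoder and decoder


def encode(x):
    """Compress samples into k latent values."""
    return (x - mean) @ V


def decode(z):
    """Reconstruct samples from their latent values."""
    return z @ V.T + mean


codes = encode(X)
X_hat = decode(codes)

# Total squared reconstruction error, normalised by n - 1 like np.cov.
reconstruction_error = ((X - X_hat) ** 2).sum() / (len(X) - 1)
print(reconstruction_error, eigvals[k:].sum())
```

The reconstruction error equals the sum of the discarded eigenvalues, so the information lost by the bottleneck is exactly the variance along the dropped directions. A non-linear autoencoder replaces the two matrix products with learned non-linear functions.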