# Chapter 5 - Dimensionality Reduction Methods

## Segment 1 - Exploratory factor analysis

## Background 
#### Factor Analysis
- Used to identify factors in the original data that can serve as features in models
    - In other words, explore a data set to find the underlying causes that explain why the data behaves the way it does
    - These factors are also known as latent variables:
        - important and inferred, but not directly observable
        - estimated as a linear combination of the observed variables
- Assumptions
    - features are metric (measured on a numeric scale)
    - features are continuous or ordinal
    - features are correlated with one another, with r > 0.3 as a rule of thumb
    - sample size is greater than 100, with more than 5 samples per feature
    - samples are homogeneous
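- Factor loadings:
    - values close to 1 or -1: the factor (latent variable) has a strong influence on the variable
    - values close to 0: the factor has a weak influence on the variable
    - values above 1 in magnitude: can occur when the data is not standardized, because the loadings are then on the scale of the original features rather than correlations; they still indicate that the factor and the variable are strongly related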
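The correlation and sample-size assumptions above can be checked before fitting anything. The following is a minimal sketch on the iris data used later in this segment; it assumes the r > 0.3 rule refers to the absolute value of the pairwise Pearson correlation, and the variable names (`enough_samples`, `share_above`, and so on) are illustrative rather than part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

n_samples, n_features = df.shape

# Sample-size rules of thumb: more than 100 samples, more than 5 per feature
enough_samples = n_samples > 100
enough_per_feature = n_samples / n_features > 5

# Pairwise Pearson correlations between features
corr = df.corr()

# Share of off-diagonal correlations whose absolute value exceeds 0.3
# (assumption: the rule of thumb applies to |r|, not signed r)
off_diag = corr.abs().values[~np.eye(n_features, dtype=bool)]
share_above = (off_diag > 0.3).mean()

print(corr.round(2))
print("more than 100 samples:", enough_samples)
print("more than 5 samples per feature:", enough_per_feature)
print("share of feature pairs with |r| > 0.3:", round(share_above, 2))
```

If only a few feature pairs clear the threshold, the features share little common variance and factor analysis is unlikely to find meaningful latent variables.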

In [1]:
import pandas as pd
import numpy as np

import sklearn
from sklearn.decomposition import FactorAnalysis

from sklearn import datasets

### Factor analysis on iris dataset

In [2]:
iris = datasets.load_iris()

X = iris.data  # feature matrix: 150 samples x 4 features
variable_names = iris.feature_names

X[0:10]  # first 10 rows

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1]])

In [4]:
# Initialize a FactorAnalysis object and fit the model to the data
factor = FactorAnalysis().fit(X)

# Each row of components_ is a factor; each column is an original variable
DF = pd.DataFrame(factor.components_, columns=variable_names)


In [5]:
DF

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 0.706989 | -0.158005 | 1.654236 | 0.70085 |
| 1 | 0.115161 | 0.159635 | -0.044321 | -0.01403 |
| 2 | -0.0 | 0.0 | 0.0 | 0.0 |
| 3 | -0.0 | 0.0 | 0.0 | -0.0 |
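The loadings in this output are on the scale of the raw measurements in centimeters, which is why one of them is larger than 1. As a hedged sketch, the fit can be repeated on standardized features so that the loadings are easier to compare against the "close to 1 / -1" guideline. The `StandardScaler` step and the names `X_std`, `factor_std`, and `loadings_std` are additions for illustration and do not appear in the original notebook.

```python
import pandas as pd
from sklearn import datasets
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()

# Rescale every feature to zero mean and unit variance
# (an added preprocessing step, not part of the original notebook)
X_std = StandardScaler().fit_transform(iris.data)

# Same model as before, fitted on the standardized data
factor_std = FactorAnalysis().fit(X_std)

# Rows are factors, columns are the original variables
loadings_std = pd.DataFrame(factor_std.components_, columns=iris.feature_names)
print(loadings_std.round(3))
```

With unit-variance inputs, each loading can be read roughly as a correlation between the factor and the variable, which makes the strong and weak cases directly comparable across features.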


#### Interpretation:
 - factor 1 (row 0) has high loadings on three of the variables: sepal length, petal length, and petal width
 - the other factors have loadings at or near zero
 - drop the other factors and use factor 1 in later analysis
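Keeping only the first factor for later analysis could look like the sketch below. It refits the model with `n_components=1` and uses `fit_transform` to get one factor score per sample; the names `fa` and `scores` are illustrative, and the refit is one possible way to carry the factor forward, not necessarily the approach the original notebook takes.

```python
from sklearn import datasets
from sklearn.decomposition import FactorAnalysis

X = datasets.load_iris().data

# Fit a model with a single factor and compute each sample's factor score
fa = FactorAnalysis(n_components=1, random_state=0)
scores = fa.fit_transform(X)

print(scores.shape)          # (150, 1): one score per sample
print(fa.components_.shape)  # (1, 4): loadings of the single factor
```

The `scores` column can then stand in for the four original measurements as a single feature in a downstream model.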