In [None]:
''' Factor analysis is a statistical method for discovering hidden factors that explain
why the observed data behaves the way it does: the factors are present in the dataset
but cannot be observed directly. It models each observed feature as a linear combination
of a few such factors plus feature-specific noise.
Factors are latent variables: they are meaningful, but they are inferred rather than
measured. Each factor is a compact, synthetic summary of several correlated features,
which removes extra dimensionality and redundant information from the dataset.

Factor analysis assumptions
1. Features are metric (measured on a numeric scale)
2. Features are either continuous or ordinal
3. Features are correlated: pairs of features show a correlation of |r| > 0.3
4. There are more than 100 observations and more than 5 observations per feature
5. The sample is homogeneous '''
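The sample-size and correlation assumptions can be checked directly on the iris data before fitting anything. This is a minimal sketch, assuming the "r > 0.3" rule is read as an absolute pairwise correlation |r| > 0.3; the threshold is the rule of thumb from the list above, and the variable names are illustrative only.

```python
import numpy as np
from sklearn import datasets

# Load the iris measurements (rows = observations, columns = features)
X = datasets.load_iris().data
n_obs, n_features = X.shape

# Assumption 4: more than 100 observations and more than 5 per feature
enough_rows = n_obs > 100
enough_per_feature = n_obs / n_features > 5

# Assumption 3: count feature pairs whose absolute correlation exceeds 0.3
corr = np.corrcoef(X, rowvar=False)          # features are in columns
upper = np.triu_indices(n_features, k=1)     # each pair counted once
strong_pairs = int(np.sum(np.abs(corr[upper]) > 0.3))

print(f"observations={n_obs}, features={n_features}")
print(f"n > 100: {enough_rows}, n/p > 5: {enough_per_feature}")
print(f"pairs with |r| > 0.3: {strong_pairs} of {len(upper[0])}")
```

Assumptions 1, 2 and 5 concern how the data was measured and sampled, so they are judged from the study design rather than computed.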

In [None]:
''' Below, we use factor analysis to uncover latent variables in the iris
dataset. A latent variable is a hidden variable that influences how the observed data behaves.

--- Iris dataset ---
Iris species (labels): Setosa, Versicolor, Virginica
Attributes (predictive features): sepal length, sepal width, petal length, petal width

The goal is to reduce dimensionality by finding the combinations of features that
explain most of the shared variance in the dataset. These combinations are our
factors, or latent variables. '''

In [2]:
import numpy as np
import pandas as pd

import sklearn
from sklearn.decomposition import FactorAnalysis
from sklearn import datasets

In [3]:
# Load the iris data: X is the feature matrix we run factor analysis on
iris = datasets.load_iris()
X = iris.data

variable_names = iris.feature_names  # names of the four features
X[:10]  # preview the first 10 observations

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1]])

In [5]:
# n_components defaults to None, so one factor is fitted per feature (4 here)
factor = FactorAnalysis().fit(X)
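The fitted model stores the loadings in `components_` and the per-feature noise in `noise_variance_`. Under the factor analysis model the feature covariance is the loadings' cross product plus a diagonal noise term, and scikit-learn's `get_covariance()` returns exactly that matrix. A minimal sketch that checks this relationship (the names `W`, `psi` and `model_cov` are illustrative):

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import FactorAnalysis

X = datasets.load_iris().data

# Default n_components=None: one factor per feature
factor = FactorAnalysis().fit(X)
W = factor.components_          # loadings, shape (n_components, n_features)
psi = factor.noise_variance_    # feature-specific noise variances

print(W.shape)  # (4, 4)

# FA model covariance: W^T W + diag(psi)
model_cov = W.T @ W + np.diag(psi)
print(np.allclose(model_cov, factor.get_covariance()))  # True
```

Because the default keeps as many factors as features, some fitted factors can carry no loading at all, which is worth keeping in mind when reading the loadings table.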

In [6]:

# Loadings table: one row per factor, one column per feature
df = pd.DataFrame(factor.components_, columns=variable_names)
print(df)

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0           0.706989         -0.158005           1.654236           0.70085
1           0.115161          0.159635          -0.044321          -0.01403
2          -0.000000          0.000000           0.000000           0.00000
3          -0.000000          0.000000           0.000000          -0.00000
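In the table above, petal length has the largest loading on factor 0, partly because it has the widest numeric range of the four features. A common follow-up, not part of the original analysis, is to standardize the features and apply a varimax rotation so each factor loads on fewer features. A minimal sketch, assuming two factors and scikit-learn 0.24 or later (where `FactorAnalysis` gained the `rotation` parameter); the column names `factor_1` and `factor_2` are hypothetical labels.

```python
import pandas as pd
from sklearn import datasets
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()

# Standardize so no feature dominates the loadings because of its scale
X_std = StandardScaler().fit_transform(iris.data)

# Assumed setup: 2 factors with varimax rotation (needs scikit-learn >= 0.24)
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0).fit(X_std)

# Transpose so rows are features and columns are factors
loadings = pd.DataFrame(fa.components_.T,
                        index=iris.feature_names,
                        columns=["factor_1", "factor_2"])
print(loadings.round(3))
```

Rotation does not change how well the model fits; it only redistributes the loadings to make each factor easier to interpret.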
