## Dimensionality Reduction Methods (Linear)

- Principal Component Analysis (PCA) (Linear)
- Independent Component Analysis (ICA) (Linear)
- Factor Analysis (Linear)
- Linear Discriminant Analysis (LDA) (Linear)

PCA (Principal Component Analysis): Widely used for dimensionality reduction on continuous data, PCA rotates the data onto a new set of orthogonal axes ordered by decreasing variance and keeps only the first few. These axes are the principal components. Each one is a linear combination of the original features, not one of the original features itself.

Factor Analysis: a technique for reducing a large number of observed variables to a smaller number of latent factors. The observations are assumed to be generated by a linear transformation of lower-dimensional latent factors plus Gaussian noise, which makes it possible to ask which underlying causes matter most.
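The generative model described above (latent factors, a linear map, added Gaussian noise) can be sketched with scikit-learn's `FactorAnalysis`. The data here is synthetic and the sizes and variable names (`latent`, `loadings`, `observed`) are assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Hypothetical setup: 2 latent factors drive 6 observed variables.
n_samples, n_factors, n_features = 500, 2, 6
latent = rng.normal(size=(n_samples, n_factors))         # latent factors
loadings = rng.normal(size=(n_factors, n_features))      # linear transformation
noise = 0.1 * rng.normal(size=(n_samples, n_features))   # Gaussian noise
observed = latent @ loadings + noise                     # what we actually measure

fa = FactorAnalysis(n_components=n_factors, random_state=0)
scores = fa.fit_transform(observed)  # estimated factor scores, one row per sample

print(scores.shape)              # (n_samples, n_factors)
print(fa.components_.shape)      # estimated loadings: (n_factors, n_features)
print(fa.noise_variance_.shape)  # one noise variance per observed variable
```

The recovered factors are only identified up to rotation, so `fa.components_` should not be expected to match `loadings` entry by entry.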

LDA (Linear Discriminant Analysis): a supervised method that projects data so that class separability is maximised. Examples from the same class are projected close together, and examples from different classes are projected far apart.
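As a minimal sketch of that projection, the example below applies scikit-learn's `LinearDiscriminantAnalysis` to the iris dataset. Unlike the other methods in this list, LDA needs the class labels. The choice of iris is an assumption made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# LDA yields at most (number of classes - 1) discriminant axes,
# so 2 is the maximum for the 3 iris classes.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)  # labels y are required

print(X_lda.shape)
```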

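ICA (Independent Component Analysis): expresses the dataset as a linear mixture of statistically independent components. It is the standard tool for Blind Source Separation, of which the "cocktail party problem" (separating individual voices from recordings of a crowded room) is the classic example.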


***

## PCA

#### When should you use PCA?

- Do you want to reduce the number of variables, but aren’t able to identify variables to completely remove from consideration?
- Do you want to ensure your variables are uncorrelated with one another?
- Are you comfortable making your independent variables less interpretable?

If you answered “yes” to all three questions, then PCA is a good method to use. If you answered “no” to the third question, you should not use PCA.

Some more particular use cases for PCA include:

- When latent features are driving the patterns in data.
- For dimensionality reduction.
- To visualize high-dimensional data.
- To reduce the noise.
- As a pre-processing step to improve the performance of other algorithms.

By identifying which “directions” are most “important,” we can compress or project our data into a smaller space by dropping the “directions” that are the “least important.”
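To make "directions" and "importance" concrete, here is a from-scratch sketch using NumPy: the directions are the eigenvectors of the covariance matrix, and their importance is the corresponding eigenvalue (the variance along that direction). The iris dataset and the choice of `k = 2` are assumptions for illustration, and this is not necessarily how a library computes PCA internally.

```python
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data                  # shape (150, 4)
X_centered = X - X.mean(axis=0)       # PCA works on mean-centered data

cov = np.cov(X_centered, rowvar=False)    # 4x4 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues

# Sort directions from most to least important (largest variance first).
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                      # number of directions to keep
X_reduced = X_centered @ eigvecs[:, :k]    # project onto the top-k directions
kept = eigvals[:k].sum() / eigvals.sum()   # share of total variance retained

print(X_reduced.shape)
print(f"variance retained: {kept:.3f}")
```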

#### Shortcomings of PCA

- If the number of variables is large, the principal components become hard to interpret.
- PCA is most suitable when the variables are linearly related to one another.
- PCA is sensitive to large outliers.
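The outlier sensitivity is easy to see on a small synthetic example. In the sketch below, the data and the position of the outlier are made up for illustration: a single extreme point is enough to swing the first principal component from the x-axis to the y-axis.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic 2-D data (an assumption for illustration): wide spread
# along x, narrow spread along y, so the main direction is the x-axis.
clean = np.column_stack([
    rng.normal(scale=3.0, size=200),
    rng.normal(scale=0.3, size=200),
])

# The same data plus one extreme point far out along the y-axis.
with_outlier = np.vstack([clean, [0.0, 100.0]])

clean_axis = PCA(n_components=1).fit(clean).components_[0]
outlier_axis = PCA(n_components=1).fit(with_outlier).components_[0]

# The sign of a principal axis is arbitrary, so compare absolute values.
print("first component, clean data:  ", np.round(np.abs(clean_axis), 2))
print("first component, with outlier:", np.round(np.abs(outlier_axis), 2))
```

Because PCA maximises variance, and variance grows with the square of the distance from the mean, a few extreme points can dominate the result. Inspecting or removing outliers before fitting is a common precaution.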

In [1]:
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.decomposition import FastICA

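In [2]:
from sklearn.datasets import load_iris
# load_iris() returns a Bunch (a dict-like container), not an array.
# Passing the Bunch itself to pca.fit() raises:
#   TypeError: float() argument must be a string or a number, not 'Bunch'
data = load_iris().data  # feature matrix, shape (150, 4)
# To use your own data instead: data = pd.read_csv('...')

In [3]:
pca = PCA(n_components=2)  # keep the first two principal components (directions)
pca.fit(data)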

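In [None]:
pca.components_      # principal axes, shape (n_components, n_features)
pca.transform(data)  # project the data onto those axes

In [None]:
pca.explained_variance_ratio_  # fraction of total variance per component (the eigenvalues are in pca.explained_variance_)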
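One common use of `explained_variance_ratio_` is choosing how many components to keep. The sketch below fits PCA with all components on the iris data and picks the smallest number whose cumulative ratio reaches a threshold. The 95% threshold is an arbitrary assumption, not a rule.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

pca_full = PCA().fit(X)  # no n_components: keep every component
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

threshold = 0.95  # assumed target share of variance
# Index of the first cumulative value >= threshold, converted to a count.
n_keep = int(np.searchsorted(cumulative, threshold) + 1)

print(np.round(cumulative, 3))
print("components needed:", n_keep)
```

scikit-learn can also do this selection directly: passing a float between 0 and 1, as in `PCA(n_components=0.95)`, keeps just enough components to explain that share of the variance.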

***

## Independent Component Analysis (ICA)


ICA is a dimensionality reduction method similar to PCA in the sense that it takes a set of features and produces a different set that is useful in some way.
ICA assumes that the observed features are linear mixtures of underlying source signals that are statistically independent and non-Gaussian, and it looks for the linear transformation that recovers those sources. Recovering the sources without knowing how they were mixed is called Blind Source Separation.
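A small "cocktail party" sketch shows the idea. Two synthetic source signals are mixed by a matrix that ICA never sees, and `FastICA` is asked to recover them from the mixtures alone. The signals, the mixing matrix `A`, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent, non-Gaussian sources (hypothetical "speakers").
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)               # sinusoid
s2 = np.sign(np.sin(3 * t))      # square wave
S = np.c_[s1, s2]                # true sources, shape (2000, 2)

# Each "microphone" records a different linear mix of the sources.
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])       # mixing matrix, unknown to ICA
X = S @ A.T                      # observed mixtures, shape (2000, 2)

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)     # estimated sources

# Recovered components come back in arbitrary order, sign and scale,
# so compare them with the true sources via absolute correlation.
corr = np.abs(np.corrcoef(S_est.T, S.T)[:2, 2:])
print(np.round(corr, 2))  # rows: estimated components, columns: true sources
```

Each true source should correlate strongly with exactly one estimated component, but which one, and with which sign, is not determined by the method.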


#### Drawbacks of ICA
ICA cannot uncover non-linear relationships in the dataset. It also does not tell us anything about the order of the independent components or how many of them are relevant.

## PCA or ICA?

PCA vs ICA:

- PCA removes correlations, but not higher-order dependence.
- ICA removes correlations and higher-order dependence.
- PCA: some components are more important than others (as measured by their eigenvalues).
- ICA: all components are equally important.
- PCA: vectors are orthogonal.
- ICA: vectors are not necessarily orthogonal.
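Two of these differences can be checked numerically. The sketch below, which assumes the iris data as an example, verifies that the PCA axes are orthonormal and measures the angle between the two ICA directions (the columns of `mixing_`), which nothing constrains to be a right angle. The `max_iter=1000` setting is only a precaution against convergence warnings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FastICA

X = load_iris().data

pca = PCA(n_components=2).fit(X)
ica = FastICA(n_components=2, random_state=0, max_iter=1000).fit(X)

# Rows of pca.components_ are orthonormal, so this product is the identity.
pca_gram = pca.components_ @ pca.components_.T
print(np.round(pca_gram, 6))

# Columns of ica.mixing_ are the ICA directions in feature space.
# A cosine away from 0 means the two directions are not orthogonal.
a, b = ica.mixing_[:, 0], ica.mixing_[:, 1]
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print("cosine between ICA directions:", round(cosine, 3))

# PCA ranks its components by variance; ICA provides no such ranking.
print(pca.explained_variance_ratio_)
```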

In [None]:
ica = FastICA(n_components=2) # specify num components to keep
ica.fit_transform(data) #transform and fit