# Random Projection 

![r_p_1.png](pics/r_p_1.png)

![r_p_2.png](pics/r_p_2.png)

![r_p_3.png](pics/r_p_3.png)

![r_p_4.png](pics/r_p_4.png)

Paper: [Random projection in dimensionality reduction: Applications to image and text data](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.76.8124&rep=rep1&type=pdf)

This paper examines using Random Projection to reduce the dimensionality of image and text data. It shows that Random Projection is a computationally simple method of dimensionality reduction that still preserves the similarities between data vectors to a high degree. The paper demonstrates this on real-world datasets, including noisy and noiseless images of natural scenes and text documents from a newsgroup corpus.
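
A minimal sketch of the idea, assuming NumPy and a dense Gaussian random matrix (one common choice; the matrix construction, sizes, and helper names here are illustrative, not taken from the paper). It projects 20 random points from 5000 down to 500 dimensions and compares pairwise distances before and after.

```python
import numpy as np

def gaussian_random_projection(X, k, seed=0):
    """Map the rows of X (n_samples x d) to k dimensions with a random Gaussian matrix.

    Entries of R are drawn i.i.d. from N(0, 1/k), so E[||x R||^2] = ||x||^2:
    squared lengths, and hence pairwise distances, are preserved on average.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.normal(loc=0.0, scale=1.0 / np.sqrt(k), size=(d, k))
    return X @ R

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    sq_norms = (X ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives caused by rounding

rng = np.random.default_rng(42)
X = rng.normal(size=(20, 5000))          # 20 hypothetical points in 5000 dimensions
Y = gaussian_random_projection(X, k=500)

iu = np.triu_indices(len(X), k=1)        # each distinct pair counted once
ratios = pairwise_distances(Y)[iu] / pairwise_distances(X)[iu]
print(Y.shape)                           # (20, 500)
print(ratios.min(), ratios.max())        # projected distances relative to the originals
```

The ratios should all sit close to 1 even though the projection matrix was never fitted to the data, which is what makes the method so cheap compared with PCA.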

Paper: [Random Projections for k-means Clustering](https://papers.nips.cc/paper/3901-random-projections-for-k-means-clustering.pdf)

This paper uses Random Projection as an efficient dimensionality reduction step before running k-means clustering on a dataset of 400 face images, each 64 × 64 pixels.
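
The same idea as a preprocessing step for k-means, sketched with NumPy. Assumptions: synthetic, well-separated Gaussian clusters stand in for the face images (4096 = 64 × 64 dimensions, 90 points rather than 400), the projection uses random ±1/√k entries, and `kmeans` is a plain Lloyd's-algorithm helper written only for illustration (it does not handle empty clusters).

```python
import numpy as np

def sign_random_projection(X, k, seed=0):
    """Project rows of X to k dimensions with a random matrix of entries +-1/sqrt(k)."""
    rng = np.random.default_rng(seed)
    R = rng.choice([-1.0, 1.0], size=(X.shape[1], k)) / np.sqrt(k)
    return X @ R

def kmeans(X, n_clusters, n_iter=100):
    """Plain Lloyd's k-means with a deterministic farthest-first initialisation."""
    centers = [X[0]]
    for _ in range(1, n_clusters):
        # next center = the point farthest from all centers chosen so far
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # assignment step: nearest center for every point
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # update step: move each center to the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(n_clusters)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(0)
true_centers = rng.normal(scale=10.0, size=(3, 4096))     # 3 hypothetical clusters
X = np.repeat(true_centers, 30, axis=0) + rng.normal(size=(90, 4096))
true_labels = np.repeat(np.arange(3), 30)

Z = sign_random_projection(X, k=100)    # 4096 -> 100 dimensions before clustering
labels, _ = kmeans(Z, n_clusters=3)
```

Clustering runs on 100 coordinates per point instead of 4096, and because distances are roughly preserved, well-separated clusters stay separated after projection.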

![r_p_5.png](pics/r_p_5.png)

![r_p_6.png](pics/r_p_6.png)


# Independent Component Analysis

ICA is a method similar to PCA: it takes a set of features and produces a different set that is useful in some way. The difference is in the goal. PCA works to maximize variance, while ICA assumes that the features are mixtures of independent sources and tries to isolate the independent sources that are mixed in the dataset.

Let's look at the cocktail party example:

![i_c_a_1.png](pics/i_c_a_1.png)
![i_c_a_2.png](pics/i_c_a_2.png)
![i_c_a_3.png](pics/i_c_a_3.png)
![i_c_a_4.png](pics/i_c_a_4.png)

This is a type of problem called **blind source separation**.

## ICA Algorithm

![ica_alg_1.png](pics/ica_alg_1.png)

![ica_alg_2.png](pics/ica_alg_2.png)

The goal is to find the unmixing matrix W that best recovers the independent sources from the observed mixtures.
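
The cocktail party setup can be written as a linear mixing model $x = As$. A minimal NumPy sketch with two hypothetical sources and a hypothetical mixing matrix `A`: if `A` were known, $W^{T} = A^{-1}$ would undo the mixing exactly. In blind source separation `A` is unknown, so ICA has to estimate W from the mixtures alone.

```python
import numpy as np

t = np.linspace(0, 1, 1000)
# Two hypothetical "speakers" (rows of S): a sine wave and a square wave
S = np.vstack([np.sin(2 * np.pi * 5 * t),
               np.sign(np.sin(2 * np.pi * 3 * t))])

# Hypothetical mixing matrix: each microphone (row) hears both speakers
A = np.array([[1.0, 0.5],
              [0.7, 1.0]])
X = A @ S                      # what the two microphones record, shape (2, 1000)

# With A known, W^T = A^{-1} recovers the sources exactly
W = np.linalg.inv(A).T
S_recovered = W.T @ X
print(np.allclose(S_recovered, S))   # True
```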

Paper: [Independent component analysis: algorithms and applications](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.322.679&rep=rep1&type=pdf) (pdf)

![ica_alg_3.png](pics/ica_alg_3.png)

### Assumptions

1. The components are statistically independent
2. Components have non-Gaussian distributions (very important here)

Non-Gaussianity of the components is the key to estimating ICA. Without it, we could not recover the original signals: if they were Gaussian, any rotation of their whitened mixture would have exactly the same distribution, so nothing would distinguish the original source directions.
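
A quick numerical check of why Gaussian sources are a problem, assuming NumPy and unit-variance sources (so the data is already white, and the mixing that remains after whitening is a rotation). For two independent Gaussian sources, the projection onto every direction looks the same. For two independent uniform sources, the excess kurtosis (a simple non-Gaussianity measure, used here in place of negentropy) changes with the angle, so the source directions can be singled out.

```python
import numpy as np

def excess_kurtosis(y):
    """Fourth standardized moment minus 3; it is 0 for a Gaussian."""
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

def kurtosis_along(S, angle):
    """Excess kurtosis of 2-D data S projected onto the unit direction at `angle`."""
    w = np.array([np.cos(angle), np.sin(angle)])
    return excess_kurtosis(w @ S)

rng = np.random.default_rng(0)
n = 200_000
gauss = rng.normal(size=(2, n))                             # independent Gaussian sources
unif = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))    # independent unit-variance uniform sources

angles = [0.0, np.pi / 8, np.pi / 4]
print([round(kurtosis_along(gauss, a), 2) for a in angles])  # near 0 in every direction
print([round(kurtosis_along(unif, a), 2) for a in angles])   # theory: -1.2, -0.9, -0.6
```

Mixing the uniform sources at 45° pushes the kurtosis halfway toward the Gaussian value of 0, which is the Central Limit Theorem effect that ICA exploits.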

So, building from here:

1. The Central Limit Theorem tells us that the distribution of a sum of independent variables tends towards a Gaussian distribution. A mixture of independent sources is such a sum, so it is "more Gaussian" than the individual sources.
2. Knowing that, we take W to be the matrix that maximizes the non-Gaussianity of $W^{T}x$: the least Gaussian projections are the ones closest to the original independent sources. Since non-Gaussianity is the quantity the whole algorithm maximizes, we need a way to measure it.
3. One such measure is **negentropy** (it comes from Information Theory). FastICA maximizes an approximation of negentropy using the fixed-point update $w^{+} = E\{x\,g(w^{T}x)\} - E\{g'(w^{T}x)\}\,w$, followed by normalizing $w^{+}$ to unit length, where $g$ is a nonlinearity such as $\tanh$ and $g'$ is its derivative. You don't need to know all these details, as long as you understand the non-Gaussianity assumption.
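
A minimal one-unit FastICA sketch in NumPy that implements the fixed-point update above with $g(u) = \tanh(u)$. It whitens the data first and uses deflation (Gram-Schmidt against the components already found) to extract several components. This is an illustrative implementation, not a reference one; the test sources (one uniform, one Laplace) and the mixing matrix are assumptions.

```python
import numpy as np

def whiten(X):
    """Center X (n_signals x n_samples) and transform it to identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    eigvals, E = np.linalg.eigh(np.cov(Xc))
    return E @ np.diag(eigvals ** -0.5) @ E.T @ Xc

def fast_ica(X, n_components, n_iter=200, tol=1e-8, seed=0):
    """One-unit FastICA with deflation, using g(u) = tanh(u)."""
    rng = np.random.default_rng(seed)
    Z = whiten(X)
    W = np.zeros((n_components, Z.shape[0]))      # row p holds w_p^T
    for p in range(n_components):
        w = rng.normal(size=Z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            g = np.tanh(w @ Z)                     # g(w^T x) for every sample
            g_prime = 1.0 - g ** 2                 # derivative of tanh
            # Fixed-point update: w+ = E{x g(w^T x)} - E{g'(w^T x)} w
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
            # Deflation: remove the part along components already found
            w_new -= W[:p].T @ (W[:p] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol   # same direction up to sign
            w = w_new
            if converged:
                break
        W[p] = w
    return W @ Z                                   # estimated sources, one per row

rng = np.random.default_rng(1)
S = np.vstack([rng.uniform(-np.sqrt(3), np.sqrt(3), 5000),      # sub-Gaussian source
               rng.laplace(scale=1 / np.sqrt(2), size=5000)])   # super-Gaussian source
A = np.array([[1.0, 0.5],
              [0.7, 1.0]])                                     # hypothetical mixing matrix
X = A @ S
S_est = fast_ica(X, n_components=2)
```

The recovered rows match the sources only up to sign, scale, and order, which is an inherent ambiguity of ICA.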

![ica_quiz_1.png](pics/ica_quiz_1.png)
![ica_quiz_2.png](pics/ica_quiz_2.png)

## Applications

Paper: [Independent Component Analysis of Electroencephalographic Data](http://papers.nips.cc/paper/1091-independent-component-analysis-of-electroencephalographic-data.pdf)

This paper is an example of how ICA is used to perform blind source separation on EEG scan data. On the left are the readings of 14 channels from an EEG scan that lasted 4.5 seconds; on the right are the independent components extracted from that dataset:

![eeg-ica.png](pics/eeg-ica.png)

Paper: [Applying Independent Component Analysis to Factor Model in Finance](https://pdfs.semanticscholar.org/a34b/e08a20eba7523600203a32abb026a8dd85a3.pdf) [PDF]