## 1. What are the main motivations for reducing a dataset's dimensionality? What are the main drawbacks?

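The main motivations are to speed up the training of the models that come next, to reduce the risk of overfitting that high-dimensional datasets bring, to save space (compression), and, when reducing down to two or three dimensions, to make the data possible to visualize.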

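The main drawback is that some information is lost, because the discarded dimensions carry part of the dataset's variance. For the same reason, the original dataset cannot be recovered exactly, and the reconstruction error grows with the amount of variance that was discarded. Dimensionality reduction can also be computationally expensive, and it adds one more step to the pipeline.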

## 2. What is the curse of dimensionality?

It is the name given to the problems that arise when a dataset has many dimensions (features). The main ones are:

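1. Exponential growth in data volume. As the number of dimensions increases, the volume of the feature space grows exponentially, so much more data is needed to cover it with the same density.
2. Data sparsity. As the number of dimensions increases, data density decreases: instances end up far from each other, which makes it harder for models to find patterns and makes predictions on new instances less reliable.
3. Computational cost. As the number of dimensions increases, training and distance computations become more expensive, and some algorithms become impractical.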
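A quick way to see data sparsity is to measure how far apart random points are. The sketch below samples random point pairs in the unit hypercube and averages their Euclidean distance; `mean_pairwise_distance` is a hypothetical helper written for this illustration. For large `n_dims`, the average distance grows roughly like the square root of `n_dims / 6`, so in 1,000 dimensions two random points are about 13 units apart even though every coordinate lies between 0 and 1.

```python
import numpy as np


def mean_pairwise_distance(n_dims, n_pairs=2_000, seed=42):
    """Average Euclidean distance between random point pairs in the unit hypercube.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    a = rng.random((n_pairs, n_dims))  # first point of each pair
    b = rng.random((n_pairs, n_dims))  # second point of each pair
    return np.linalg.norm(a - b, axis=1).mean()


for n_dims in (2, 3, 100, 1000):
    print(n_dims, round(mean_pairwise_distance(n_dims), 2))
```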

## 3. Once a dataset's dimensionality has been reduced, is it possible to reverse the operation? If so, how? If not, why?

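Only approximately, and only for some algorithms. Once information has been discarded, the original dataset cannot be recovered exactly. PCA has a simple reverse transformation: the projection is a matrix product, `X_d = X @ W_d` (on centered data), so we can map back by multiplying by the transposed matrix, `X_recovered = X_d @ W_d.T`, and adding back the mean. The result is close to the original but not equal to it, because the variance along the discarded components is lost. Other algorithms, such as t-SNE, provide no reverse transformation at all.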
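Here is a minimal sketch of the PCA round trip with scikit-learn, on made-up toy data (3 hidden factors spread over 10 features, plus a little noise). It checks that `inverse_transform` matches the hand-written "multiply by the transpose, add the mean" formula, and measures the reconstruction error that remains.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Toy data (assumption): 200 samples, 10 correlated features built from 3 hidden factors
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(200, 10))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)  # projection: (X - mean) @ W_d
X_recovered = pca.inverse_transform(X_reduced)

# The same reverse transformation by hand: X_d @ W_d.T + mean
W_d = pca.components_.T  # shape (10, 2)
X_manual = X_reduced @ W_d.T + pca.mean_

# Information along the discarded components is gone, so the error is not zero
reconstruction_mse = np.mean((X - X_recovered) ** 2)
print(np.allclose(X_recovered, X_manual), reconstruction_mse)
```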

## 4. Can PCA be used to reduce the dimensionality of a highly nonlinear dataset?

Yes. PCA can reduce the dimensionality of most datasets, even highly nonlinear ones, because it can at least get rid of useless dimensions. However, if there are no useless dimensions, as with the Swiss roll, PCA loses too much information: projecting squashes the layers of the roll onto each other instead of "unrolling" it.
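The sketch below illustrates only the first half of that answer. It pads a 3D Swiss roll with 7 near-constant "useless" features (a made-up setup for illustration) and lets PCA keep 99% of the variance: PCA drops the padding but keeps all 3 dimensions of the roll, since it cannot unroll it.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA

# 3D Swiss roll without noise
X, t = make_swiss_roll(n_samples=1000, noise=0.0, random_state=42)

# Hypothetical padding: 7 extra features with almost no variance
rng = np.random.default_rng(42)
X_padded = np.hstack([X, 1e-3 * rng.normal(size=(1000, 7))])

# Keep the smallest number of components explaining at least 99% of the variance
pca = PCA(n_components=0.99)
pca.fit(X_padded)
print(X_padded.shape[1], "->", pca.n_components_)
```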

## 5. Suppose you perform PCA on a 1,000-dimensional dataset, setting the explained variance ratio to 95%. How many dimensions will the resulting dataset have?

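It depends on the dataset. PCA keeps the smallest number of dimensions whose explained variance ratios add up to at least 95%. If the points lie almost on a line, a single dimension may be enough; if they are scattered randomly across all 1,000 dimensions, nearly all of them may be needed.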

Example:

```python
# Import NumPy, Matplotlib and PCA from scikit-learn
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Random data: 1,000 samples x 1,000 features drawn from N(0, 0.7^2)
# (standard deviation 0.7, i.e. variance 0.49)
data = np.random.normal(0, 0.7, (1000, 1000))

pca = PCA()
pca.fit(data)
cumsum = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of dimensions whose cumulative explained variance reaches 95%
min_variance = 0.95
d = np.argmax(cumsum >= min_variance) + 1
print(d)

# Plot the cumulative explained variance against the number of dimensions
# (x starts at 1), with dashed lines marking the 95% threshold and the chosen d
dims = np.arange(1, len(cumsum) + 1)
plt.plot(dims, cumsum)
plt.xlabel('Dimensions')
plt.ylabel('Explained Variance')
plt.title('Elbow Graph')
plt.axhline(y=min_variance, color='r', linestyle='--')
plt.axvline(x=d, color='r', linestyle='--')

plt.show()
```
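scikit-learn can also pick the number of dimensions for us: passing a float between 0.0 and 1.0 as `n_components` tells `PCA` to keep enough components to reach that explained variance ratio. This sketch, on made-up toy data that mostly lives in a 3D subspace of a 50D space, checks that the shortcut agrees with the manual `cumsum` computation.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Toy data (assumption): 500 samples close to a 3D subspace of a 50D space
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 50)) + 0.01 * rng.normal(size=(500, 50))

# Option 1: compute d by hand from the cumulative explained variance
cumsum = np.cumsum(PCA().fit(X).explained_variance_ratio_)
d = np.argmax(cumsum >= 0.95) + 1

# Option 2: let PCA choose d directly from a variance ratio
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print(d, pca.n_components_, X_reduced.shape)
```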

## 6. In what cases would you use vanilla PCA, Incremental PCA, Randomized PCA, or Kernel PCA?

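Vanilla PCA is the default choice when the dataset fits in memory. Incremental PCA is useful for large datasets that do not fit in memory, and for online tasks where PCA must be applied as new instances arrive, because it can be fed one mini-batch at a time. Randomized PCA is useful when you want to reduce dimensionality considerably and the dataset fits in memory: it approximates the first principal components much faster than full PCA. Kernel PCA is useful for nonlinear datasets; since it is unsupervised, there is no obvious performance measure, so different kernels and hyperparameters are usually compared, for example through the performance of a downstream model or the reconstruction error.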
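A minimal sketch of the four variants in scikit-learn, on stand-in data (random features for the linear variants, a Swiss roll for Kernel PCA). The batch loop only imitates out-of-core training; with real data the batches would come from disk (for example through `np.memmap`). The `gamma` value for the RBF kernel is an arbitrary guess that would normally be tuned.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA, IncrementalPCA, KernelPCA

rng = np.random.default_rng(42)
X_big = rng.normal(size=(1000, 50))  # stand-in for a larger dataset
X_roll, _ = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

# Vanilla PCA: full SVD, fine when the data fits in memory
X_vanilla = PCA(n_components=10, svd_solver="full").fit_transform(X_big)

# Randomized PCA: stochastic approximation of the first components only
X_rand = PCA(n_components=10, svd_solver="randomized", random_state=42).fit_transform(X_big)

# Incremental PCA: feed mini-batches with partial_fit
inc_pca = IncrementalPCA(n_components=10)
for batch in np.array_split(X_big, 10):
    inc_pca.partial_fit(batch)
X_inc = inc_pca.transform(X_big)

# Kernel PCA: nonlinear projection through an RBF kernel (gamma is a guess)
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04).fit_transform(X_roll)
```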

## 7. How can you evaluate the performance of a dimensionality reduction algorithm on your dataset?

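A good dimensionality reduction algorithm removes many dimensions without losing too much information. One way to measure this is to apply the reverse transformation and compute the reconstruction error between the original and the recovered data.
If we are using dimensionality reduction as a preprocessing step before another model, we can instead evaluate that second model: if the reduction did not lose too much information, it should perform about as well as on the original dataset.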
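Both approaches can be sketched with scikit-learn. Here the small digits dataset bundled with scikit-learn is used as a stand-in, and logistic regression is an arbitrary choice of downstream model.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)  # 8x8 digit images, 64 features

# 1) Reconstruction error: project, reverse the projection, compare
pca = PCA(n_components=0.95)
X_recovered = pca.inverse_transform(pca.fit_transform(X))
reconstruction_mse = np.mean((X - X_recovered) ** 2)

# 2) Downstream performance: same classifier with and without the reduction step
clf = LogisticRegression(max_iter=5000)
score_raw = cross_val_score(clf, X, y, cv=3).mean()
score_pca = cross_val_score(make_pipeline(PCA(n_components=0.95), clf), X, y, cv=3).mean()
print(pca.n_components_, reconstruction_mse, score_raw, score_pca)
```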

## 8. Does it make any sense to chain two different dimensionality reduction algorithms?

Yes, it makes sense. A common example is using PCA first to quickly get rid of a large number of useless dimensions, then applying a much slower algorithm such as LLE. Using LLE alone would likely give similar results, but it would take much longer.
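A minimal sketch of that chain as a scikit-learn pipeline, again on the bundled digits dataset as a stand-in. The `n_neighbors=10` value is an arbitrary choice for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)

pca_lle = make_pipeline(
    PCA(n_components=0.95, random_state=42),  # fast: drop low-variance dimensions
    LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42),  # slow: unroll the rest
)
X_2d = pca_lle.fit_transform(X)
print(X.shape, "->", X_2d.shape)
```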



## 9. Load the MNIST dataset (introduced in Chapter 3), and split it into a training set and a test set (take the first 60,000 instances for training, and the remaining 10,000 for testing). Train a Random Forest classifier on the dataset and time how long it takes, then evaluate the resulting model on the test set. Next, use PCA to reduce the dataset's dimensionality, with an explained variance ratio of 95%. Train a new Random Forest classifier on the reduced dataset and see how long it takes. Was training much faster? Next, evaluate the classifier on the test set: how does it compare to the previous classifier?
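A possible sketch of the experiment. `compare_rf_with_pca` is a hypothetical helper that times the Random Forest training on the raw and on the PCA-reduced training set and returns both test accuracies. To keep the check offline, it runs on scikit-learn's small digits dataset, which is a stand-in and not MNIST. For the exercise itself, one option is `fetch_openml("mnist_784", version=1, as_frame=False)` from `sklearn.datasets`, passing the first 60,000 instances as the training set and the remaining 10,000 as the test set.

```python
import time

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def compare_rf_with_pca(X_train, y_train, X_test, y_test, variance=0.95, seed=42):
    """Train a Random Forest on raw and PCA-reduced data; return timings and accuracies.

    Hypothetical helper for illustration only.
    """
    results = {}

    # Baseline: Random Forest on the original features
    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    start = time.perf_counter()
    rf.fit(X_train, y_train)
    results["raw_time"] = time.perf_counter() - start
    results["raw_acc"] = accuracy_score(y_test, rf.predict(X_test))

    # PCA is fitted on the training set only, then applied to the test set
    pca = PCA(n_components=variance)
    X_train_red = pca.fit_transform(X_train)
    X_test_red = pca.transform(X_test)
    results["n_components"] = pca.n_components_

    # Same classifier on the reduced data; only the Random Forest fit is timed
    rf_red = RandomForestClassifier(n_estimators=100, random_state=seed)
    start = time.perf_counter()
    rf_red.fit(X_train_red, y_train)
    results["pca_time"] = time.perf_counter() - start
    results["pca_acc"] = accuracy_score(y_test, rf_red.predict(X_test_red))
    return results


# Quick offline check on the digits dataset (a stand-in, not MNIST)
X, y = load_digits(return_X_y=True)
results = compare_rf_with_pca(X[:1500], y[:1500], X[1500:], y[1500:])
print(results)
```

Whether training gets faster and how accuracy changes depends on the data, so the answer should come from running the helper on MNIST rather than from the digits check above.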

## 10. Use t-SNE to reduce the MNIST dataset down to two dimensions and plot the result using Matplotlib. You can use a scatterplot using 10 different colors to represent each image's target class. Alternatively, you can write colored digits at the location of each instance, or even plot scaled-down versions of the digit images themselves (if you plot all digits, the visualization will be too cluttered, so you should either draw a random sample or plot an instance only if no other instance has already been plotted at a close distance). You should obtain a nice visualization of the MNIST dataset in two dimensions. Try using other dimensionality reduction algorithms such as PCA, LLE, or MDS and compare the resulting visualizations.
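A possible starting point, again using the bundled digits dataset as an offline stand-in for MNIST. `embed_2d` and `plot_embedding` are hypothetical helpers: the first reduces the data to two dimensions with t-SNE or PCA, the second draws a scatterplot with 10 colors (one per class) and writes each digit at its class centroid. A random sample keeps both the plot and the t-SNE runtime manageable. Other reducers from `sklearn.manifold`, such as `LocallyLinearEmbedding` or `MDS`, could be added to `embed_2d` in the same way for the comparison.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def embed_2d(X, method="tsne", seed=42):
    """Reduce X to two dimensions with t-SNE or PCA (hypothetical helper)."""
    if method == "tsne":
        return TSNE(n_components=2, init="pca", random_state=seed).fit_transform(X)
    return PCA(n_components=2).fit_transform(X)


def plot_embedding(X_2d, y, title):
    """Scatterplot with one color per class, plus each digit written at its class centroid."""
    fig, ax = plt.subplots(figsize=(7, 6))
    ax.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="tab10", s=5)
    for digit in range(10):
        cx, cy = X_2d[y == digit].mean(axis=0)
        ax.text(cx, cy, str(digit), fontsize=16, weight="bold")
    ax.set_title(title)
    ax.set_axis_off()
    return fig


# Random sample of the digits dataset (a stand-in, not MNIST)
X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(42)
idx = rng.choice(len(X), size=1000, replace=False)
X_sample, y_sample = X[idx], y[idx]

embeddings = {method: embed_2d(X_sample, method) for method in ("tsne", "pca")}
for method, X_2d in embeddings.items():
    plot_embedding(X_2d, y_sample, method.upper())
plt.show()
```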
