# Dimensionality Reduction Exercise

In this exercise, you will be asked to build several Machine Learning models, while understanding the value of PCA dimensionality reduction. Make sure your code is readable, functional, documented and that you give elaborate explanations and some plots to go with your code.

## Load the MNIST dataset attached to this exercise (it is already divided into train and test sets, load both)

In [None]:
# your code here
import pandas as pd
df_train = pd.read_csv('/Users/danfinel/Downloads/mnist_train.csv')
df_test = pd.read_csv('/Users/danfinel/Downloads/mnist_test.csv')

In [None]:
df_train.shape,df_test.shape

## 1. Build a classifier of your choice on the given data (your features are the pixels), and evaluate it. Elaborate on the performance of your model.

In [3]:
X_train = df_train.drop(columns = ['label'])
y_train = df_train.label
X_test = df_test.drop(columns = ['label'])
y_test = df_test.label

In [4]:
# Train a decision tree on the raw pixel features and evaluate it on the test set
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
dt = DecisionTreeClassifier()
dt.fit(X_train,y_train)
y_pred_dt = dt.predict(X_test)  # predictions of the decision tree
print(classification_report(y_test,y_pred_dt))

              precision    recall  f1-score   support

           0       0.91      0.93      0.92       980
           1       0.96      0.96      0.96      1135
           2       0.86      0.86      0.86      1032
           3       0.82      0.86      0.84      1010
           4       0.88      0.87      0.87       982
           5       0.84      0.82      0.83       892
           6       0.90      0.89      0.90       958
           7       0.92      0.90      0.91      1028
           8       0.82      0.80      0.81       974
           9       0.84      0.85      0.85      1009

    accuracy                           0.88     10000
   macro avg       0.88      0.88      0.88     10000
weighted avg       0.88      0.88      0.88     10000



Based on these metrics, the model performs well: overall accuracy is 0.88, and precision and recall are at least 0.80 for every digit. The digit 1 is the easiest class (0.96 precision and recall), while 8 is the hardest (0.82 precision, 0.80 recall).
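To see which digits the tree confuses with each other, a confusion matrix can complement the report above. The following is a minimal sketch, not the notebook's own code: it assumes scikit-learn's small bundled digits dataset (`load_digits`, 8x8 images) as a stand-in for the MNIST CSV files, and the names `X_demo`, `y_demo`, `tree` and `cm` are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): bundled 8x8 digits, not the MNIST CSVs of the exercise
X_demo, y_demo = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Rows are true digits, columns are predicted digits;
# off-diagonal cells count the confusions between two digits
cm = confusion_matrix(y_te, tree.predict(X_te))
print(cm)
```

In the notebook, the same call would take `y_test` and the decision tree's test predictions.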

## 2. Perform a PCA dimensionality reduction on the data, and re-train the same model on the new top k PCA-ed features. Evaluate the new model and elaborate on the performance of your model, and compare it to the performance of model without PCA.
## The value of k is for you to choose, but it must be pretty small.  Try some different numbers, and explain why you chose that number.

In [5]:
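# Try several small values of k and keep the one with the best test accuracy
from sklearn.decomposition import PCA
ks = [3,4,5,6,7,8,9,10]
best_acc = 0
best_k = 0
for k in ks:
    # PCA is unsupervised: fit on the training pixels only, then project both sets
    pca = PCA(n_components = k)
    pca.fit(X_train)
    X_train_transformed = pca.transform(X_train)
    X_test_transformed = pca.transform(X_test)

    # Re-train the same model (a decision tree) on the k PCA features
    dt2 = DecisionTreeClassifier()
    dt2.fit(X_train_transformed,y_train)
    y_pred_k = dt2.predict(X_test_transformed)
    acc = dt2.score(X_test_transformed,y_test)

    # Keep the predictions of the best k, so that later reports match best_k
    if acc > best_acc:
        best_acc = acc
        best_k = k
        y_pred_dt_pca = y_pred_k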

In [6]:
best_k,best_acc

(10, 0.8233)

Among the values tested (k from 3 to 10), k = 10 gives the highest test accuracy (0.8233), so we use k = 10.
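Another common way to justify a choice of k is to look at how much of the total variance the first k components retain. The sketch below is illustrative only: it assumes scikit-learn's bundled digits dataset (`load_digits`) as a stand-in for the MNIST CSV files, and `X_demo`, `pca_demo` and `cumulative_variance` are hypothetical names.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Stand-in data (assumption): bundled 8x8 digits, not the MNIST CSVs of the exercise
X_demo, _ = load_digits(return_X_y=True)

# Fit PCA once with the largest k of interest
pca_demo = PCA(n_components=10).fit(X_demo)

# explained_variance_ratio_[i] is the share of total variance captured by component i,
# so the cumulative sum gives the share retained by the first k components
cumulative_variance = np.cumsum(pca_demo.explained_variance_ratio_)
for k, cum in enumerate(cumulative_variance, start=1):
    print(f"k={k:2d}  cumulative explained variance = {cum:.3f}")
```

If the curve is still rising steeply at the largest k tested, that suggests that more components would keep adding information.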

In [7]:
print(classification_report(y_test,y_pred_dt_pca))

              precision    recall  f1-score   support

           0       0.88      0.88      0.88       980
           1       0.96      0.97      0.96      1135
           2       0.86      0.84      0.85      1032
           3       0.77      0.79      0.78      1010
           4       0.77      0.77      0.77       982
           5       0.74      0.78      0.76       892
           6       0.89      0.89      0.89       958
           7       0.86      0.84      0.85      1028
           8       0.76      0.73      0.75       974
           9       0.72      0.72      0.72      1009

    accuracy                           0.82     10000
   macro avg       0.82      0.82      0.82     10000
weighted avg       0.82      0.82      0.82     10000



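The results are clearly lower than those of the model without PCA: accuracy drops from 0.88 to 0.82, and precision and recall are lower for most digits. Only 1, 2 and 6 stay at roughly the same level, and the largest drop is for 9 (f1-score from 0.85 to 0.72).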

## 3. Compare the model metrics that you got from question 2 to a model with a random subset of regular features:
- Use the same number of features k as you used in question 2.
- The features used are the regular pixel features, without PCA.
- But instead of using all 784 such features, use a random subset of k of them (the same k as in question 2).

Elaborate on your findings.

In [8]:
# Pick best_k random pixel columns (no PCA) and train the same model on them
import numpy as np
import random

random_indexes = np.sort(random.sample(range(X_train.shape[1]),best_k))
X_train_random = X_train.iloc[:,random_indexes]
X_test_random = X_test.iloc[:,random_indexes]
dt3 = DecisionTreeClassifier()
dt3.fit(X_train_random,y_train)
y_pred_random_dt = dt3.predict(X_test_random)
print(classification_report(y_test,y_pred_random_dt))


              precision    recall  f1-score   support

           0       0.53      0.50      0.51       980
           1       0.27      0.92      0.42      1135
           2       0.34      0.25      0.29      1032
           3       0.38      0.46      0.41      1010
           4       0.17      0.06      0.09       982
           5       0.26      0.09      0.14       892
           6       0.23      0.08      0.12       958
           7       0.42      0.52      0.46      1028
           8       0.20      0.10      0.14       974
           9       0.33      0.16      0.22      1009

    accuracy                           0.33     10000
   macro avg       0.31      0.31      0.28     10000
weighted avg       0.31      0.33      0.29     10000



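The results are far worse with random features: accuracy falls to 0.33, against 0.82 with 10 PCA components. This shows the strength of PCA: each component is a weighted linear combination of all 784 pixels, chosen to capture as much variance as possible, whereas in question 3 each feature is a single raw pixel (a weight of 1 on one pixel and 0 on all the others). Many MNIST pixels, especially near the image borders, are almost always 0 and carry little information, so a random subset of 10 pixels can easily miss most of the useful signal.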

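However, PCA is unsupervised: it ignores the labels, and keeping only 10 components discards part of the information in the pixels. For this supervised problem, the model trained on all 784 untransformed features therefore remains more accurate (0.88 against 0.82).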
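The three-way comparison discussed above (all features, k PCA components, k random pixels) can be sketched in one place. This is a hedged illustration, not the notebook's code: it assumes scikit-learn's bundled digits dataset (`load_digits`, 64 pixels per image) in place of the MNIST CSV files, repeats the random draw a few times to reduce the effect of a lucky or unlucky subset, and uses hypothetical names such as `full_acc`, `pca_acc` and `random_accs`.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): bundled 8x8 digits, not the MNIST CSVs of the exercise
X_demo, y_demo = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)
k = 10

# Baseline: decision tree on all pixel features
full_acc = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr).score(X_te, y_te)

# k PCA components: each one is a weighted combination of the pixels
pca_demo = PCA(n_components=k).fit(X_tr)
pca_tree = DecisionTreeClassifier(random_state=0).fit(pca_demo.transform(X_tr), y_tr)
pca_acc = pca_tree.score(pca_demo.transform(X_te), y_te)

# k random raw pixels, repeated over several draws
rng = np.random.default_rng(0)
random_accs = []
for _ in range(5):
    cols = rng.choice(X_tr.shape[1], size=k, replace=False)
    tree = DecisionTreeClassifier(random_state=0).fit(X_tr[:, cols], y_tr)
    random_accs.append(tree.score(X_te[:, cols], y_te))

print(f"all features:    {full_acc:.3f}")
print(f"{k} PCA features: {pca_acc:.3f}")
print(f"{k} random pixels (mean of 5 draws): {np.mean(random_accs):.3f}")
```

The numbers on this smaller dataset will differ from the MNIST results reported here, so the sketch is only meant to show how the comparison can be set up.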