# Homework 5
## Due Nov. 20

## 1. Principal Components Analysis (40 pts)

In class, we explored eigendigits, a more information-rich basis for representing the handwritten digits of the MNIST dataset (see course lecture notes 12).  A similar procedure can be performed for any standardized image dataset.  In this problem, we will find *eigenfaces*, which are much as they sound: the principal components of a face dataset.  The faces that we will use come from the 'Labeled Faces in the Wild' (LFW) dataset, which can be downloaded via the scikit-learn module as follows:

In [2]:
import matplotlib.pyplot as plt
%matplotlib notebook
import numpy as np
from sklearn.datasets import fetch_lfw_people

# Download Labeled Faces in the Wild (only people with at least 50 images)
lfw = fetch_lfw_people(min_faces_per_person=50, resize=0.7)

# Interrogate the data for the size of the images (h,w) 
m, h, w = lfw.images.shape

# For our purposes, as in MNIST, we will use a flattened version of the pixels
X = lfw.data
n = X.shape[1]

y = lfw.target


Since the second problem in this homework deals with classification, let's split the LFW data into a training and test dataset.

In [4]:
# sklearn.cross_validation was removed from scikit-learn; train_test_split lives in model_selection
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33)

### 1.1
Using either your own implementation or the implementation given by scikit-learn, **perform a principal components analysis on the LFW data, retaining sufficient components to explain 95% of the total data variance (10pts).**  To prove that your PCA is successfully capturing this level of variability, **Generate a scree plot showing the cumulative explained variance as a function of number of principal components (10pts).**
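For those choosing the "own implementation" route, the following is a minimal sketch of PCA via the singular value decomposition, not a reference solution. The function name `pca_svd` and the synthetic matrix `X_demo` are illustrative assumptions; the synthetic data merely stands in for the flattened face matrix.

```python
import numpy as np

def pca_svd(X, variance_threshold=0.95):
    """Return (mean, components, cumulative_ratio), keeping just enough
    components to explain at least `variance_threshold` of the variance."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # center each feature
    # Rows of Vt are the principal directions; S holds the singular values
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    cumulative = np.cumsum(ratio)
    # Smallest k whose cumulative ratio reaches the threshold
    k = int(np.searchsorted(cumulative, variance_threshold)) + 1
    k = min(k, len(cumulative))
    return mean, Vt[:k], cumulative[:k]

# Synthetic stand-in for the flattened image matrix (NOT the LFW data)
rng = np.random.default_rng(0)
scales = np.array([10, 5, 2, 1, 1, 0.5, 0.5, 0.1, 0.1, 0.1])
X_demo = rng.normal(size=(200, 10)) * scales

mean_demo, components_demo, cumulative_demo = pca_svd(X_demo)
print(components_demo.shape)
print(cumulative_demo)
```

Plotting `cumulative_demo` against the component index gives the cumulative scree plot requested above.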

In [7]:
from sklearn.decomposition import PCA
from sklearn import decomposition

fig,axs = plt.subplots(nrows=5,ncols=5)
for r in axs:
    for ax in r:
        ax.imshow(X_train[np.random.randint(len(X_train)),:].reshape((h,w)))

        
plt.show()

pca = decomposition.PCA(0.95,copy=True,whiten=False)
pca.fit(X_train)
Z = pca.transform(X_train)

fig,axs = plt.subplots(nrows=5,ncols=2)
counter = 0
for r in axs:
    for ax in r:
        ax.imshow(pca.components_[counter,:].reshape((h,w)))
        counter+=1        
plt.show()

cumulative_variance_ratio = np.cumsum(pca.explained_variance_ratio_)

print (cumulative_variance_ratio)




[ 0.20977249  0.34458441  0.41215873  0.46787429  0.51248666  0.53949986
  0.56300579  0.58343786  0.60251493  0.61995609  0.63499762  0.64936303
  0.6606122   0.67134488  0.68175483  0.69103994  0.69983778  0.70834543
  0.71628476  0.72371864  0.73057107  0.73712632  0.74290777  0.7483859
  0.75384436  0.75892338  0.76387141  0.76870204  0.77328332  0.77746646
  0.78144377  0.78520031  0.7888461   0.79228639  0.79564337  0.79895336
  0.80217638  0.8052799   0.80827576  0.81120962  0.81411982  0.81689812
  0.81955747  0.82217949  0.82468596  0.82716425  0.82959382  0.83194376
  0.83421351  0.83646602  0.83867762  0.84084912  0.84297974  0.84506916
  0.84706844  0.84906119  0.85099541  0.85291127  0.8547644   0.85658583
  0.85836629  0.86010503  0.86182637  0.86344511  0.86503925  0.86659414
  0.86812749  0.8696501   0.87114785  0.87262962  0.87408072  0.87550791
  0.87690569  0.87827185  0.8795989   0.88091544  0.88219579  0.88346729
  0.88470207  0.88593243  0.8871369   0.88830793  0.

In [8]:

plt.plot(1-cumulative_variance_ratio)
plt.ylabel('Unexplained variance remaining')
plt.show()
174/(h*w)


0.03076923076923077

### 1.2
**Visualize the first five eigenfaces by reshaping the resulting principal components into images and plotting them (10pts).**

In [9]:
fig,axs = plt.subplots(nrows=5,ncols=2)
fig.set_size_inches(6,15)

data_index = 1044
for i,n_components in enumerate([1,5,10,50,150]):
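    # Reconstruct from the first n_components PCA coefficients of this training
    # image (Z = pca.transform(X_train)), not from its raw pixel values
    coeffs = Z[data_index,:n_components]
    X_reconstructed = np.dot(coeffs, pca.components_[:n_components])
    X_reconstructed = X_reconstructed + pca.mean_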
    X_reconstructed = X_reconstructed.reshape((h,w))

    axs[i,0].imshow(X_train[data_index,:].reshape((h,w)))
    axs[i,1].imshow(X_reconstructed)
    axs[i,1].set_title("L="+str(n_components))
plt.show()


### 1.3 (Turn in the following question on Moodle)
(10pts) Consider the following two datasets (X_1 and X_2), each with three dimensions.  How many principal components do you expect each to have?  How do you know?

In [25]:
X_1 = np.array([[ 0.24658525,  0.846718  ,  0.29263623],
       [ 1.94365644, -0.78759333,  0.81430956],
       [ 0.54530612,  1.33540717,  0.53973449],
       [ 0.08287299,  1.41172682,  0.32378186],
       [ 1.16505735, -0.76913387,  0.4287019 ],
       [ 0.80324671, -0.3969266 ,  0.32223804],
       [ 0.30891776, -0.62816207,  0.02882647],
       [ 0.68643482,  0.95395446,  0.5340083 ],
       [-1.1862806 ,  1.80433744, -0.23227281],
       [ 1.31814933, -0.86135592,  0.48680348],
       [-0.01355747,  0.52411544,  0.09804436],
       [-0.94016758, -1.00530368, -0.67114452],
       [-1.53088917, -0.79227508, -0.9238996 ],
       [ 0.87683622, -1.29639414,  0.17913928],
       [-0.34180964, -0.21053314, -0.21301145],
       [-0.40673884, -0.89787012, -0.38294344],
       [-0.74792211,  1.42602549, -0.08875596],
       [-0.10994822, -1.34930993, -0.3248361 ],
       [-0.09104714, -0.87550541, -0.22062465],
       [-0.18231387,  0.51312677,  0.01146842],
       [ 1.48119305, -0.77899653,  0.58479722],
       [ 0.67944609,  0.31732884,  0.40318881],
       [ 0.8137745 ,  2.09032765,  0.82495278],
       [-0.81678612,  0.9302194 , -0.22234918],
       [ 1.32824051,  0.88054246,  0.84022875]])

X_2 = np.array([[ 9.97897650e-01,  2.99369295e+00,  4.49053943e-01],
       [-9.19396971e-01, -2.75819091e+00, -4.13728637e-01],
       [ 2.63408733e-01,  7.90226199e-01,  1.18533930e-01],
       [ 1.23229118e-01,  3.69687353e-01,  5.54531030e-02],
       [ 1.00365433e+00,  3.01096300e+00,  4.51644450e-01],
       [-9.73346396e-01, -2.92003919e+00, -4.38005878e-01],
       [-2.48058203e-01, -7.44174610e-01, -1.11626192e-01],
       [-1.14257767e+00, -3.42773300e+00, -5.14159949e-01],
       [-4.51139403e-01, -1.35341821e+00, -2.03012732e-01],
       [-2.01440713e-01, -6.04322138e-01, -9.06483206e-02],
       [ 1.15492027e+00,  3.46476081e+00,  5.19714121e-01],
       [ 6.86908285e-02,  2.06072486e-01,  3.09108728e-02],
       [-4.73749885e-01, -1.42124965e+00, -2.13187448e-01],
       [ 1.11404171e+00,  3.34212513e+00,  5.01318770e-01],
       [-8.13230322e-01, -2.43969096e+00, -3.65953645e-01],
       [-8.70193912e-01, -2.61058174e+00, -3.91587261e-01],
       [ 6.55592608e-01,  1.96677783e+00,  2.95016674e-01],
       [ 2.88671096e-02,  8.66013288e-02,  1.29901993e-02],
       [ 6.64698327e-01,  1.99409498e+00,  2.99114247e-01],
       [ 4.83556414e-01,  1.45066924e+00,  2.17600386e-01],
       [ 2.74450530e-01,  8.23351591e-01,  1.23502739e-01],
       [-2.38147337e-03, -7.14442010e-03, -1.07166302e-03],
       [ 1.39721186e+00,  4.19163557e+00,  6.28745336e-01],
       [-1.44265778e+00, -4.32797333e+00, -6.49196000e-01],
       [-7.39010087e-01, -2.21703026e+00, -3.32554539e-01]])

from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot(*X_1.T,'k.')

from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot(*X_2.T,'k.')
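One way to check an answer to this kind of question numerically is to inspect the singular values of the centered data: directions with (near-)zero singular value carry no variance. The sketch below uses made-up point sets rather than `X_1` and `X_2`; the helper name `count_significant_components` and the tolerance `1e-8` are arbitrary illustrative choices.

```python
import numpy as np

def count_significant_components(X, tol=1e-8):
    """Count principal components whose singular value is non-negligible
    relative to the largest one (tol is an illustrative cutoff)."""
    Xc = X - X.mean(axis=0)
    S = np.linalg.svd(Xc, compute_uv=False)
    return int(np.sum(S > tol * S[0]))

rng = np.random.default_rng(1)

# 25 points on a line through the origin in 3D
t = rng.normal(size=(25, 1))
line = t * np.array([[1.0, -2.0, 0.5]])

# 25 points on a plane spanned by two vectors
s = rng.normal(size=(25, 2))
plane = s @ np.array([[1.0, 0.0, 1.0],
                      [0.0, 1.0, 1.0]])

# 25 points scattered in all three dimensions
cloud = rng.normal(size=(25, 3))

print(count_significant_components(line))
print(count_significant_components(plane))
print(count_significant_components(cloud))
```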


## 2. Logistic (actually Softmax) Regression (30pts)

To explore the use of logistic regression, we will again use the LFW data.  For this problem, fit the PCA-transformed training data using a logistic regression model.  To begin with, you'll need to transform both the training and test datasets into the PCA basis.

In [11]:
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

Now that we have our transformed features, we can implement logistic regression.  Rather than come up with our own, let's use the sklearn implementation, which is quite good:

In [27]:
from sklearn.linear_model import LogisticRegression

### 2.1
Before you begin, there are a few keyword arguments that you will want to provide to this function.  First, you'll want to give it the keyword argument multi_class='multinomial'.  This causes the classifier to perform true softmax regression, rather than a one-vs-rest scheme that fits a separate binary logistic regression for each class.  Second, you should set the keyword argument solver='lbfgs'.  **Fit a logistic regression model to your training data.**
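For reference, the softmax function that turns a vector of per-class linear scores into class probabilities can be sketched as follows. This is an illustration of the math only, not sklearn's internal implementation; the `scores` array is hypothetical.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Hypothetical linear scores for 2 samples and 3 classes
scores = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
probs = softmax(scores)

print(probs.sum(axis=1))      # each row sums to 1
print(probs.argmax(axis=1))   # predicted class for each sample
```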

In [24]:

logreg = LogisticRegression(penalty = 'l2', solver= 'lbfgs', multi_class= 'multinomial').fit(X_train_pca,y_train)



### 2.2
**After fitting your model, classify the test set, and print a confusion matrix and the overall accuracy**

In [23]:
y_test_hats = logreg.predict(X_test_pca)

ConfusionMatrix = np.zeros(shape = (12,12))
for i in range(len(y_test)):
    k = y_test[i]
    l = y_test_hats[i]
    ConfusionMatrix[k,l] +=1
print(ConfusionMatrix)    

t = np.trace(ConfusionMatrix)
s = ConfusionMatrix.sum()
accuracy = t/s 
print(accuracy)

[[  14.    4.    0.    4.    0.    0.    0.    0.    1.    0.    1.    0.]
 [   2.   57.    4.    3.    0.    1.    0.    1.    0.    1.    4.    3.]
 [   3.    1.   27.    6.    0.    0.    1.    3.    0.    0.    1.    0.]
 [   1.    7.    6.  135.    6.    4.    0.    2.    1.    2.    2.    5.]
 [   2.    1.    1.    2.   30.    0.    0.    1.    0.    0.    0.    3.]
 [   1.    0.    0.    2.    0.   17.    0.    0.    0.    0.    1.    3.]
 [   0.    1.    2.    2.    1.    1.    7.    1.    0.    0.    0.    0.]
 [   0.    2.    1.    0.    0.    0.    0.   12.    0.    0.    0.    1.]
 [   0.    1.    1.    2.    0.    0.    0.    0.   10.    1.    0.    3.]
 [   0.    1.    0.    0.    0.    1.    1.    0.    0.   18.    0.    1.]
 [   0.    2.    0.    2.    1.    1.    0.    0.    0.    0.   13.    0.]
 [   1.    2.    0.    2.    3.    1.    2.    1.    0.    0.    2.   34.]]
0.726213592233
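The loop above hard-codes 12 classes. As a possible alternative, the sketch below builds the same kind of count matrix in a vectorized way and infers the number of classes from the labels. The helper name `confusion_matrix_counts` and the toy labels are illustrative assumptions.

```python
import numpy as np

def confusion_matrix_counts(y_true, y_pred, n_classes=None):
    """Rows index the true class, columns the predicted class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if n_classes is None:
        n_classes = int(max(y_true.max(), y_pred.max())) + 1
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly when (true, pred) pairs repeat
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Toy labels, purely illustrative
y_true_demo = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred_demo = np.array([0, 1, 1, 1, 2, 0, 2])

cm_demo = confusion_matrix_counts(y_true_demo, y_pred_demo)
accuracy_demo = np.trace(cm_demo) / cm_demo.sum()
print(cm_demo)
print(accuracy_demo)
```

scikit-learn also ships `sklearn.metrics.confusion_matrix` and `sklearn.metrics.accuracy_score`, which can be used to cross-check a hand-rolled matrix.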


### 2.3
By default, sklearn applies regularization to this problem, penalizing large parameter values.  You can control the degree of regularization with the C keyword argument, e.g. C=1e-4 (or C=1e-6 or whatever); C is the inverse of the regularization strength, so smaller values regularize more strongly.  However, it is not always clear what the best regularization should be.  To deal with this problem, sklearn offers the function LogisticRegressionCV, which automatically runs [k-fold cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)#k-fold_cross-validation) on a user-specified range of regularization parameters, and selects the one which minimizes the classification error.  Otherwise, it works just like LogisticRegression.  ***Use this function to determine the optimal value of the regularization parameter C, and report your classification accuracy with this new value.  Does regularization appreciably improve your classification accuracy?***

*HINT 1: A good range to check over is from $C=1$ to $C=10^{-11}$.  You'll want the regularization values that you test to be distributed logarithmically, e.g. via the numpy logspace command.*  
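As a small illustration of the hint, the snippet below generates one candidate value per decade over the suggested range. The variable name `Cs_demo` and the choice of 12 points are assumptions made for this example.

```python
import numpy as np

# 12 values, one per decade, from 10^0 down to 10^-11
Cs_demo = np.logspace(0, -11, num=12)
print(Cs_demo)

# Consecutive values differ by a constant factor rather than a constant step
ratios = Cs_demo[1:] / Cs_demo[:-1]
print(ratios)
```

A linear grid over the same interval would place nearly all of its points close to 1 and barely sample the small values of C.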

In [31]:
from sklearn.linear_model import LogisticRegressionCV
CS = np.logspace(0.0 , -11.0, num = 20)
# Pass the grid by keyword: recent scikit-learn versions make these arguments keyword-only
logregcv = LogisticRegressionCV(Cs=CS, cv=5, multi_class='multinomial', penalty='l2', solver='lbfgs').fit(X_train_pca,y_train)

y_test_hatscv = logregcv.predict(X_test_pca)

ConfusionMatrixcv = np.zeros(shape = (12,12))
for i in range(len(y_test)):
    k = y_test[i]
    l = y_test_hatscv[i]
    ConfusionMatrixcv[k,l] +=1
print(ConfusionMatrixcv)    

t = np.trace(ConfusionMatrixcv)
s = ConfusionMatrixcv.sum()
accuracycv = t/s 
print(accuracycv)

[[  14.    4.    1.    2.    0.    1.    0.    1.    0.    0.    1.    0.]
 [   3.   64.    0.    5.    0.    1.    0.    0.    0.    0.    1.    2.]
 [   1.    1.   28.    9.    0.    0.    0.    2.    0.    0.    0.    1.]
 [   0.    5.    1.  158.    6.    1.    0.    0.    0.    0.    0.    0.]
 [   0.    1.    1.    3.   27.    0.    0.    1.    1.    0.    1.    5.]
 [   0.    1.    0.    1.    1.   17.    1.    0.    0.    0.    1.    2.]
 [   0.    0.    3.    4.    0.    0.    6.    1.    0.    0.    0.    1.]
 [   0.    2.    1.    0.    0.    0.    1.   11.    0.    0.    0.    1.]
 [   0.    0.    0.    2.    2.    0.    0.    0.    8.    1.    0.    5.]
 [   0.    1.    0.    1.    1.    0.    1.    0.    0.   18.    0.    0.]
 [   0.    1.    0.    1.    0.    0.    0.    0.    0.    0.   17.    0.]
 [   0.    2.    0.    4.    3.    0.    1.    0.    0.    0.    0.   38.]]
0.788349514563


## 3. This time in color (20 pts)(GRAD STUDENTS ONLY)

In the above two problems, we ignored one of the feature dimensions: color.  In fact, every element of the LFW dataset is a color image (previously we averaged the bands).  Rerun the above process of performing a PCA and classifying via logistic regression, but this time import the data using

In [36]:
# Download Labeled Faces in the Wild (only people with at least 50 images)
lfw = fetch_lfw_people(min_faces_per_person=50, resize=0.7, color=True)
# Interrogate the data for the size of the images (h,w) 




You will have to modify your code to account for data with different dimensions.  You will also have to think a bit about how to display principal component arrays, since when displaying a 3-band image matplotlib expects either 8-bit integer values in [0, 255] or floating-point values in [0, 1], and principal components are signed and not in either range.  **Can you achieve better classification accuracy using the color dataset?**
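One possible way to make a principal component displayable as a 3-band image is to rescale it linearly to [0, 1] before plotting. The sketch below assumes nothing about the LFW data; the helper name `to_displayable` and the tiny random component are hypothetical.

```python
import numpy as np

def to_displayable(component, shape):
    """Linearly rescale a flat component to [0, 1] and reshape to an image."""
    img = np.asarray(component, dtype=float).reshape(shape)
    lo, hi = img.min(), img.max()
    if hi == lo:                      # constant image: avoid dividing by zero
        return np.zeros(shape)
    return (img - lo) / (hi - lo)

# Hypothetical signed component for a tiny 4x3 RGB image
rng = np.random.default_rng(2)
component_demo = rng.normal(scale=0.05, size=4 * 3 * 3)
img_demo = to_displayable(component_demo, (4, 3, 3))

print(img_demo.shape, img_demo.min(), img_demo.max())
```

With the notebook's variables this could be used as `ax.imshow(to_displayable(pca.components_[0], (h, w, c)))`. Note that min-max scaling per component is only a display choice; it discards the sign and relative magnitude across components.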

In [50]:
# Interrogate the data for the size of the images (h,w,c) 
m, h, w, c = lfw.images.shape

# For our purposes, as in MNIST, we will use a flattened version of the pixels
X = lfw.data
n = X.shape[1]
y = lfw.target

# sklearn.cross_validation was removed from scikit-learn; train_test_split lives in model_selection
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33)

from sklearn.decomposition import PCA
from sklearn import decomposition

fig,axs = plt.subplots(nrows=5,ncols=5)
for r in axs:
    for ax in r:
        ax.imshow(X_train[np.random.randint(len(X_train)),:].reshape((h,w,c)))

        
plt.show()

pca = decomposition.PCA(0.95,copy=True,whiten=False)
pca.fit(X_train)
Z = pca.transform(X_train)

fig,axs = plt.subplots(nrows=5,ncols=2)
counter = 0
for r in axs:
    for ax in r:
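        # Rescale the signed component to [0,1] so imshow can render it as RGB
        comp = pca.components_[counter,:].reshape((h,w,c))
        ax.imshow((comp-comp.min())/(comp.max()-comp.min()))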
        counter+=1        
plt.show()

cumulative_variance_ratio = np.cumsum(pca.explained_variance_ratio_)

print (cumulative_variance_ratio)



[ 0.19519595  0.31812741  0.38464434  0.44057202  0.48215182  0.50925687
  0.53488387  0.55636507  0.57545393  0.59338114  0.6101467   0.62429547
  0.63757851  0.64822876  0.65874113  0.66815735  0.67727824  0.68570717
  0.69347712  0.70105566  0.70833343  0.71520958  0.72144853  0.72718032
  0.73280815  0.73804255  0.74300588  0.74773518  0.75241103  0.75699617
  0.76136917  0.76557619  0.76954148  0.77346366  0.77718992  0.7806983
  0.78412757  0.78744081  0.79059274  0.79370042  0.79670504  0.79959793
  0.80232749  0.80504131  0.80773403  0.81037447  0.81290669  0.81539964
  0.81781229  0.82019137  0.82250354  0.82475442  0.82695039  0.82909387
  0.83120654  0.83328204  0.83529406  0.83728488  0.83923913  0.84115741
  0.8430152   0.84481253  0.84656468  0.84829367  0.85000745  0.85166232
  0.8532413   0.85480106  0.85634877  0.85787665  0.85937906  0.86084667
  0.86230252  0.86374123  0.86514518  0.86648291  0.86781663  0.86912465
  0.87042668  0.8717194   0.87298177  0.87423323  0.

In [54]:
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)


from sklearn.linear_model import LogisticRegressionCV
CS = np.logspace(0.0 , -11.0, num = 20)
# Pass the grid by keyword: recent scikit-learn versions make these arguments keyword-only
logregcv = LogisticRegressionCV(Cs=CS, cv=5, multi_class='multinomial', penalty='l2', solver='lbfgs').fit(X_train_pca,y_train)

y_test_hatscv = logregcv.predict(X_test_pca)

ConfusionMatrixcv = np.zeros(shape = (12,12))
for i in range(len(y_test)):
    k = y_test[i]
    l = y_test_hatscv[i]
    ConfusionMatrixcv[k,l] +=1
print(ConfusionMatrixcv)    

t = np.trace(ConfusionMatrixcv)
s = ConfusionMatrixcv.sum()
accuracycv = t/s 
print(accuracycv)

[[  17.    1.    2.    1.    0.    1.    0.    0.    0.    0.    1.    0.]
 [   4.   63.    0.    2.    0.    2.    0.    0.    0.    0.    0.    1.]
 [   0.    1.   32.    4.    3.    1.    0.    2.    0.    0.    1.    0.]
 [   0.    5.    2.  174.    3.    0.    0.    0.    0.    0.    0.    1.]
 [   0.    1.    1.    5.   29.    0.    1.    1.    1.    1.    0.    0.]
 [   0.    0.    0.    3.    1.   19.    0.    0.    0.    0.    1.    0.]
 [   1.    1.    1.    1.    1.    0.   11.    0.    0.    0.    1.    1.]
 [   1.    0.    0.    0.    0.    0.    0.   15.    0.    0.    0.    1.]
 [   0.    1.    0.    1.    1.    0.    0.    0.   10.    0.    0.    3.]
 [   0.    0.    0.    2.    0.    0.    0.    0.    1.   13.    0.    0.]
 [   0.    2.    1.    0.    0.    0.    0.    0.    0.    0.   11.    0.]
 [   1.    1.    1.    2.    1.    0.    1.    0.    2.    0.    1.   36.]]
0.834951456311
