### <p style="text-align: right;"> &#9989; Qingxuan Zheng
#### <p style="text-align: right;"> &#9989; Put your group member names here</p>

In [3]:
# imports for the day
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.datasets import fetch_lfw_people
import time
from sklearn.datasets import make_circles
from mpl_toolkits.mplot3d import Axes3D
from sklearn.model_selection import train_test_split

---
## 1. Review of Pre-Class assignment

We'll discuss any questions that came up as a class.

---
## 2. Non-linear SVM

If you played with the data a bit or looked around on the web, you should have been able to find a third dimension that makes the circle data separable.

Let's set the new Z coordinate to:

$$ Z = X^{2} + Y^{2} $$

for all the data in $X$ and $Y$. If you do this with `numpy`, it is a single line of code (no loop required).

<font size=8 color="#009600">&#9998;</font> Do this - Make a 2D circle data set, add the specified Z coordinate and plot it below.

In [4]:
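# your answer here
X, y = make_circles(n_samples=100, random_state=123, noise=0.05)

# new Z coordinate: Z = X^2 + Y^2, computed for every point in one vectorized line
Z = X[:, 0] ** 2 + X[:, 1] ** 2

# 3D scatter plot of (X, Y, Z), colored by class
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(X[:, 0], X[:, 1], Z, c=y)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')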


### 2.1 SVM on modified data set

Note that we have not added any data to our circle data, we simply created a new dimension **based** on the existing data. In so doing it seems fairly obvious that we can now separate that data. 

**Question:** If we were to run an SVM on the modified data, is the data linearly separable? What dimensionality would the separating element be?

<font size=8 color="#009600">&#9998;</font> If linearly separable, what dimension element is required? I think we need to make a subplot

----
## 3. How to find that "special" dimension

It seems like we just pulled that special Z dimension out of thin air. How is it possible to find such a dimension that might make non-linear data linearly separable? That is what SVM can do. 

The math needed to do this is beyond the scope of this course. But, here is the basic idea:
- we need to define a function $\phi$ that transforms the existing data into a new feature space (a function based on the existing values to generate the new dimension) that would allow us to do better separation.
   - we train and test in that new space, the result of applying $\phi$ into the feature space
- there is a process called <a href="https://en.wikipedia.org/wiki/Kernel_method"> the kernel trick </a> 
    - The kernel trick avoids the explicit mapping that is needed to get linear learning algorithms to learn a nonlinear function or decision boundary. You learned a little about that in the pre-class.
    
Wikipedia describes it well:

> Kernel methods can be thought of as instance-based learners: rather than learning some fixed set of parameters corresponding to the features of their inputs, they instead "remember" the i-th training example $(x_i, y_i)$ and learn for it a corresponding weight $w_i$. Prediction for unlabeled inputs, i.e., those not in the training set, is treated by the application of a similarity function **k**, called a kernel, between the unlabeled input $x'$ and each of the training inputs $x_i$
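The quote maps directly onto scikit-learn's `SVC`. A minimal sketch, assuming the circle data from section 2 and illustrative values `gamma=10.0` and `C=10`: an RBF-kernel SVM is fit on the raw 2D points (no hand-built Z column), and its decision function is then rebuilt by hand from the "remembered" training examples (`support_vectors_`), their weights (`dual_coef_`, which stores $y_i \alpha_i$), and `intercept_`. For a two-class problem, `decision_function` should equal `dual_coef_ @ K + intercept_`, where `K` holds kernel values between the support vectors and the new points.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_circles(n_samples=100, random_state=123, noise=0.05)

# RBF-kernel SVM trained directly on the raw 2D points.
# gamma is set explicitly (illustrative value) so it can be reused below.
gamma = 10.0
clf = SVC(kernel='rbf', gamma=gamma, C=10).fit(X, y)
print("training accuracy:", clf.score(X, y))

# The fitted model keeps some training examples (support vectors),
# each with one weight stored in dual_coef_.
print("support vectors kept:", clf.support_vectors_.shape[0], "of", len(X))

# Prediction = weighted sum of kernel similarities to those examples, plus b
x_new = np.array([[0.0, 0.1], [1.0, 0.0]])
K = rbf_kernel(clf.support_vectors_, x_new, gamma=gamma)   # shape (n_SV, 2)
manual = (clf.dual_coef_ @ K + clf.intercept_).ravel()
print(manual)
print(clf.decision_function(x_new))   # should match the line above
```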

As you saw in the pre-class, one such useful "kernel" is called the "radial basis function", 
$$ k(x^{i}, x^{j}) = \exp\left( -\frac{\| x^{i} - x^{j}\|^{2}}{2\sigma^{2}} \right) $$
where
$$ \gamma = \frac{1}{2\sigma^{2}} $$
making
$$ k(x^{i}, x^{j}) = \exp\left(-\gamma \|x^{i} - x^{j} \|^{2}\right) $$
and $\gamma$ is a parameter to be optimized (like `C` in the linear case).
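To check numerically that the $\sigma$ form and the $\gamma$ form agree when $\gamma = 1/(2\sigma^2)$, here is a small sketch. `rbf_sigma` and `rbf_gamma` are hypothetical helper names written only for this example; `sklearn.metrics.pairwise.rbf_kernel` is scikit-learn's own implementation of the $\gamma$ form.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf_sigma(xi, xj, sigma):
    """k(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))"""
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

def rbf_gamma(xi, xj, gamma):
    """k(xi, xj) = exp(-gamma * ||xi - xj||^2)"""
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

sigma = 0.5
gamma = 1.0 / (2.0 * sigma ** 2)    # gamma = 2.0 for sigma = 0.5

a = np.array([0.0, 0.0])
b = np.array([1.0, 1.0])            # ||a - b||^2 = 2

print(rbf_sigma(a, b, sigma))       # exp(-4), about 0.0183
print(rbf_gamma(a, b, gamma))       # same value
print(rbf_gamma(a, a, gamma))       # identical points give 1.0

# scikit-learn's pairwise helper uses the gamma form
K = rbf_kernel(np.vstack([a, b]), gamma=gamma)
print(K)                            # 2x2 matrix with 1.0 on the diagonal
```

The kernel is 1 for identical points and decays toward 0 as points move apart, which is why it acts as a similarity measure.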

----
## 4. Example using the digits dataset

Let's start with downloading a dataset called "digits" which is included in the sklearn library. That's right, `sklearn` comes with datasets all ready for us to use!

In [None]:
# from sklearn.datasets import fetch_lfw_people, load_digits

sk_data = load_digits();

In [None]:
#Cool slider to browse all of the images.
from ipywidgets import interact
def browse_images(images, labels, categories):
    n = len(images)
    def view_image(i):
        plt.imshow(images[i], cmap=plt.cm.gray_r, interpolation='nearest')
        plt.title('%s' % categories[labels[i]])
        plt.axis('off')
        plt.show()
    interact(view_image, i=(0,n-1))

In [None]:
browse_images(sk_data.images, sk_data.target, sk_data.target_names)

### 4.1 Getting the data

The `sklearn` data comes in a particular format we need to work with. Please take a look at <a href="https://scikit-learn.org/stable/datasets/index.html"> https://scikit-learn.org/stable/datasets/index.html </a> for a quick overview. Now let's inspect the digits arrays to find out their shapes (which helps when plotting the data with matplotlib). **Review the code below and make sure you know what it is doing.**

In [5]:
feature_vectors = sk_data.data
class_labels = sk_data.target
categories = sk_data.target_names

n_samples, n_features = feature_vectors.shape
N, h, w = sk_data.images.shape
n_classes = len(categories)


**Question**: Write some code to print out the number of samples, number of features, number of classes, and the shape of the image dimensions:

In [6]:
#Put your answer to the above question here
print(n_samples)
print(n_features)
print(n_classes)
print(N, h, w)


**Question**: As a group discuss the difference between the features, samples, and classes.  How do these relate to the shape of the image?  Write down a quick definition of each (the first one has been done for you):

 <font size=8 color="#009600">&#9998;</font>
 
 
1. **n_samples:** Total number of images in the digits dataset. 
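2. **n_features:** Number of features per sample; for the digits each feature is one pixel, so n_features = h × w.
3. **n_classes:** Number of distinct classes (target categories), here the ten digits 0-9.
4. **N:** Number of images in `sk_data.images`; it equals n_samples.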
5. **h:** The height of the image shape
6. **w:** The width of the image shape
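A quick sketch for the digits data (loaded again here so it runs on its own) that prints these quantities and confirms how they relate: each image is `h` by `w` pixels, and its feature vector is the same pixels flattened into one row, so `n_features = h * w`.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
N, h, w = digits.images.shape             # one 2D array per image
n_samples, n_features = digits.data.shape

print(N, h, w)                   # 1797 8 8
print(n_samples, n_features)     # 1797 64
print(len(digits.target_names))  # 10

# Each feature vector is its image flattened row by row: 64 = 8 * 8
assert n_features == h * w
assert np.array_equal(digits.data, digits.images.reshape(N, h * w))
```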


### 4.2 Distribution of classes in our data

Let's have a look at the distribution of samples across the target classes:

In [7]:
plt.figure(figsize=(14, 3))

y_unique = np.unique(class_labels)
counts = [(class_labels == i).sum() for i in y_unique]

plt.xticks(y_unique,  categories[y_unique])
locs, labels = plt.xticks()
plt.setp(labels, rotation=45, size=20)
_ = plt.bar(y_unique, counts)


**Question**: Does this seem like a good set of samples for training our machine learning algorithm? Why?

 <font size=8 color="#009600">&#9998;</font> Do This - I think it is not a good set of samples for training our machine learning algorithm because the size a each numner looks a sightly different.

### 4.3 Train an SVM classifier on the training dataset

Let's split the data into a training set and final testing set as we have done in the past.

We've used `train_test_split` before so do it here now. To stay in sync with the code below, create:
- `train_vectors` and `train_labels` for the training feature vectors and corresponding class labels
- `test_vectors` and `test_labels` for the test feature vectors and labels. 

Do that below:

In [8]:
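# your answer here
# test_size=0.25 is the default; random_state=42 is an arbitrary seed so the split is reproducible
train_vectors, test_vectors, train_labels, test_labels = train_test_split(
    feature_vectors, class_labels, test_size=0.25, random_state=42)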


### 4.4 Optimizing hyper-parameters

There are two parameters we might try to optimize to get better results:
- the C parameter of SVM. We discussed this in the previous in-class assignment
- the $\gamma$ parameter of rbf (our radial basis function kernel)

These tuning parameters are often called **hyper-parameters**. These are parameters that affect the algorithm's "learning" process.

### 4.4.1 The C parameter

Here's another way to describe `C`. There are two types of SVM classifiers: hard margin and soft margin. Most SVM implementations are soft margin because "soft" allows some points to be misclassified when defining the margin (hard margin does not). The **C** parameter of `svm.SVC` controls how "soft" to be:
- small values of C tolerate some misclassified training points, which reduces the effect of noisy outliers (a wider, softer margin)
- larger values of C push the classifier to classify even noisy points correctly, at the cost of a narrower margin and a higher risk of overfitting
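A small sketch of this trade-off on made-up ring data (the `make_circles` settings, `gamma=1.0`, and the two C values are illustrative assumptions). `n_support_` counts the support vectors per class, a rough measure of how many points sit on or inside the margin.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Illustrative data: two noisy, fairly well-separated rings
X, y = make_circles(n_samples=200, noise=0.1, factor=0.5, random_state=0)

results = {}
for C in [0.01, 1000]:
    clf = SVC(kernel='rbf', gamma=1.0, C=C).fit(X, y)
    n_sv = int(clf.n_support_.sum())   # total number of support vectors
    acc = clf.score(X, y)              # training accuracy
    results[C] = (n_sv, acc)
    print(f"C={C:>7}: support vectors={n_sv:3d}, training accuracy={acc:.2f}")
```

With a tiny C nearly every point ends up as a support vector (a very soft margin); with a large C only a few points near the boundary remain.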

### 4.4.2 The $\gamma$ parameter
The $\gamma$ value is part of the rbf (radial basis function) kernel. Intuitively, a small gamma value defines a Gaussian function with a large variance. In this case, two points can be considered similar even if they are far from each other. On the other hand, a large gamma value defines a Gaussian function with a small variance, and in this case two points are considered similar only if they are close to each other.
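A sketch of what this means in practice, assuming the digits data with an illustrative 75/25 split and `C=10`. With a moderate $\gamma$ the model generalizes; a very large $\gamma$ (1.0 is extreme for pixel values between 0 and 16) makes each training image similar only to itself, so the model memorizes the training set and does poorly on unseen images.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()
X_tr, X_te, y_tr, y_te = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

scores = {}
for gamma in [0.001, 1.0]:
    clf = SVC(kernel='rbf', gamma=gamma, C=10).fit(X_tr, y_tr)
    scores[gamma] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))
    print(f"gamma={gamma}: train={scores[gamma][0]:.2f}, test={scores[gamma][1]:.2f}")
```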

### 4.4.3 `GridSearchCV` and choosing hyper-parameters
It can be difficult to search for an optimal combination of a set of hyper-parameters. What is the best combination of $C$ and $\gamma$ for our digits data set? One way to handle this is to test every combination from a fixed set of values for each hyper-parameter and evaluate which combinations work best. In this way we trade time (the time to do all the testing) for better results.

We imported <a href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html"> `GridSearchCV` </a> from `sklearn.model_selection` at the top of the notebook. The idea behind it is pretty simple:
- provide a list of hyper-parameters and, for each hyper-parameter, a set of values you wish to test.
- `GridSearchCV` will **exhaustively** search, that is, try **all combinations** of the listed values, and return the results for the **best combination**.

The terms **exhaustively** and **all combinations** above are a kind of warning: considerable computational effort may be needed to evaluate which hyper-parameter combination is "best".
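To make "exhaustive" concrete, here is a quick count for the grid used in the training cell of section 4.5, assuming `GridSearchCV`'s default 5-fold cross-validation (the default in recent scikit-learn versions). `ParameterGrid` is the scikit-learn helper that enumerates every combination of a parameter grid. With the default `refit=True`, one more fit on the best combination follows.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the training cell of section 4.5
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1]}

n_combos = len(ParameterGrid(param_grid))   # 5 * 6 = 30
n_folds = 5                                 # assumed default cv=5
print(n_combos, "combinations x", n_folds, "folds =", n_combos * n_folds, "fits")
```

Adding one more value to either list adds another full row or column of fits, which is why grid sizes grow expensive quickly.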

#### How Long
The code below takes about 20 seconds on my rather newish Lenovo laptop. Be patient; it might take longer depending on your machine.

### 4.5 Training the classification model

In [9]:
###############################################################################
# Train a SVM classification model

start = time.time()

#make some temporary variables so you can change this easily
tmp_vectors = train_vectors
tmp_labels = train_labels

print("Fitting the classifier to the training set")
# a dictionary of hyperparameters: key is the name of the parameter, value is a list of values to test
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1], }
# make a classifier by searching over an RBF-kernel SVC and the parameter grid
# (gamma only has an effect with the 'rbf' kernel; a 'linear' kernel ignores it)
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid)

# fit every combination with cross-validation, then refit the best one on all training data
clf = clf.fit(tmp_vectors, tmp_labels)

# we have a "good" classifier (according to GridSearchCV); how does it look?
print("Best estimator found by grid search:")
print(clf.best_estimator_)

end = time.time()
print("Runtime", end - start)


&#9989; **DO THIS**: Explore the ```clf``` object. What functions does it have access to? Can you figure out which function you could use to input an unknown feature vector and make a class prediction?

In [10]:
## DO THIS, put your exploration code here.
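One possible way to explore, sketched on a small self-contained example (the grid and split here are illustrative, not the ones used above): `dir()` lists everything the object offers, `best_params_` and `best_score_` summarize the search, and `predict()` classifies new feature vectors with the refit best estimator. `predict()` expects a 2D array, so a single vector is passed as one row.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

digits = load_digits()
X_tr, X_te, y_tr, y_te = train_test_split(
    digits.data, digits.target, random_state=0)

# Small illustrative grid so this runs quickly
search = GridSearchCV(SVC(kernel='rbf'), {'C': [1, 10], 'gamma': [0.001, 0.01]})
search.fit(X_tr, y_tr)

# Public methods and attributes (names not starting with "_")
print([name for name in dir(search) if not name.startswith('_')])

print(search.best_params_)      # winning hyper-parameter combination
print(search.best_score_)       # its mean cross-validated accuracy

# predict() on one "unknown" feature vector, passed as a single row
one_vector = X_te[:1]           # shape (1, 64)
print(search.predict(one_vector), "actual:", y_te[0])
```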


### 4.6 Show the results of the classification on the testing dataset.

In [11]:
###############################################################################
# Quantitative evaluation of the model quality on the test set

#make some temporary variables so you can change this easily
predict_vectors = test_vectors
true_labels = test_labels

print("Predicting labels on the test set")
pred_labels = clf.predict(predict_vectors)

print(classification_report(true_labels, pred_labels))
print(confusion_matrix(true_labels, pred_labels, labels=range(n_classes)))



In [12]:
def plot_gallery(images, true_titles, pred_titles, h, w, n_row=5, n_col=5):
    """Helper function to plot a gallery of portraits"""
    plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray_r)
        plt.title('Pred='+str(categories[pred_titles[i]]), size=9)
        plt.xlabel('Actual='+str(categories[true_titles[i]]), size=9)
        plt.xticks(())
        plt.yticks(())

plot_gallery(test_vectors, test_labels, pred_labels, h,w)


**Question:** How well is the classifier doing with the digits dataset? Comment on what information the classification report and confusion matrix provide you.

<font size=8 color="#009600">&#9998;</font> The classifier doing great with the digits dataset. The classification report and confusion matrix provide me how the precision and actual number will match.

**Questions:** What if you created a new random training set from the images using the same fraction of images? What if you just used all of the data -- does it work better or worse? Why?

<font size=8 color="#009600">&#9998;</font> If you created a new random training set from the images using the same fraction of images, I think it will come up a different confusion matrix. If I just used all of the data, I think it will work worse because it may make more error on it.

---
## 5. Face Recognition

Now that we have completed the example for the digits dataset, let's do it again with some faces. Fortunately, scikit-learn comes with a face dataset in exactly the same format as the digits dataset, which means we should be able to simply swap one for the other. Here is the code for importing the faces data. It keeps only people with at least 50 face images and resizes each image to 40% of its original size.

Make sure `fetch_lfw_people` is imported; in this notebook it is already imported in the imports cell at the top.

```sk_data = fetch_lfw_people(min_faces_per_person=50, resize=0.4)```

&#9989; **DO THIS**:  Repeat the entire process using the face database imported with the command shown above. Answer the following questions.

**Note: you should not need to update any of the provided code as it all hinges on `sk_data`.**

In [14]:
# copy the code above with the modified data set and paste it here

sk_data = fetch_lfw_people(min_faces_per_person=50, resize=0.4)
sk_data



**Question:** How long did it take to train the face recognition classifier?

<font size=8 color="#009600">&#9998;</font> From the result that I run out, I couln't find out hoe long it take to train the face recognition classifier

**Question:** How well did the SVM algorithm work on the face recognition problem?  Can you think of real world applications where this level of face recognition may be acceptable?

<font size=8 color="#009600">&#9998;</font> 

**Question:** Why is the face recognition not working as well?  

<font size=8 color="#009600">&#9998;</font> 

**Question:** Give some example science problems where this type of machine learning classification (SVM) may be used.

<font size=8 color="#009600">&#9998;</font> 

-----
### Congratulations, we're done!

Now, you just need to submit this assignment by uploading it to the course <a href="https://d2l.msu.edu/">Desire2Learn</a> web page for today's submission folder (Don't forget to add your names in the first cell).


&#169; Copyright Michigan State University Board of Trustees