# Homework 10: Evaluating our Gesture Recognition NNs 🕸

Name: Chenye Qi

Student ID: 475337

Collaborators: 


## Instructions

In our _last_ homework (woohoo!), we will be analyzing and evaluating the gesture recognition data and models created in `Lab10`. This is a great opportunity to recap the **Data Science workflow** with all its major aspects: 

- exploratory data analysis (EDA) and data profiling
- machine learning workflow
- training, validation, testing data
- model comparison
- presenting results (creating plots)

It will be extremely helpful to review **Lab 10 (Gesture Recognition with Neural Networks)** first.

In general, you should feel free to import any package that we have previously used in class. Ensure that all plots have the necessary components that a plot should have (e.g. axes labels, a title, a legend).

Furthermore, in addition to recording your collaborators on this homework, please also remember to cite/indicate all external sources used when finishing this assignment. This includes peers, TAs, and links to online sources. Note that these citations will not free you from your obligation to submit your _own_ code and write-ups; however, they will be taken into account during the grading and regrading process.

### Submission instructions
* Submit this Python notebook, including your answers in the code cells, as your homework submission.
* **Feel free to add as many cells as you need to** — just make sure you don't change what we gave you. 
* **Does it spark joy?** Note that you will be partially graded on the presentation (_cleanliness, clarity, comments_) of your notebook so make sure you [Marie Kondo](https://lifehacker.com/marie-kondo-is-not-a-verb-1833373654) your notebook before submitting it.

## 1. Introduction

The data needed for this assignment can be found [here](https://wustl.box.com/s/q8mnl1o2zq2bh0ca5zajtk3msnu03ou8). All of it was gathered in `Homework 10 (Part I)`: 
- training
- validation
- augmented
- testing

Here are the neural network models trained on `training`:
- cse217_v1.h5 (still training; watch for announcement on Piazza)
- cse217_v2.h5 (still training; watch for announcement on Piazza)

Here are the neural network models trained on `augmented`:
- cse217_v1_augmented.h5 (still training; watch for announcement on Piazza)
- cse217_v2_augmented.h5 (still training; watch for announcement on Piazza)

Note that to train these models we used the `validation` dataset to determine when to stop the training process. 
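
The stopping rule mentioned above can be illustrated with a small, framework-free sketch. It assumes a patience-based rule on the validation loss, which is one common form of early stopping; the exact criterion used to train the `cse217` models is not stated here, and `stopping_epoch`, `patience`, and the loss values are hypothetical.

```python
def stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch whose weights would be kept.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; the best epoch seen so far is returned.
    """
    best_epoch = 0
    best_loss = float("inf")
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss stopped improving

    return best_epoch


# Hypothetical validation losses, one value per epoch.
losses = [0.90, 0.70, 0.65, 0.66, 0.68, 0.60]
print(stopping_epoch(losses, patience=2))
```

With `patience=2` the loop stops after two epochs without improvement and never sees the later, lower loss; a larger patience trades training time for a chance to find it. In Keras, this kind of rule is available as the `tensorflow.keras.callbacks.EarlyStopping` callback.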

## 2. Test Data Collection, Data Profiling, and Model Understanding

In this section, we will get a feel for our data.

### Problem 0

Following the instructions in `Lab10_DataAquisition`, take 15 images of rock, paper, and scissors gestures (cf. `1.1 How To Take The Pictures`) and scale them using the provided code (`1.2 Storing, Scaling, and Sharing the Images`). Store them in a folder called `testing` along with the already collected data.

In [1]:
from os import makedirs, mkdir
from os.path import exists

base = 'utility/data'
raw = f'{base}/raw'
dirs = ['rock', 'paper', 'scissors']

if not exists(raw):
    makedirs(raw, exist_ok=True)

for sign in dirs:
    path = f'{raw}/{sign}'
    
    if not exists(path):
        mkdir(path)

**Try this!** Store the images you took of rocks (✊), papers (🤚), and scissors (✌️) in the correct folders in `utility/data/raw`. Then, run the following cell to produce rescaled images, which will be stored in `utility/data/testing`.

In [2]:
import os
import warnings
from utility.util import load_image, resize_image, save_image


testing = f'{base}/testing'

for sign in dirs:
    path = f'{testing}/{sign}'
    
    if not exists(path):
        makedirs(path, exist_ok=True)

for path, _, files in os.walk(raw):
    sign = os.path.basename(path)

    for file in files:
        if '.DS_Store' not in file:
            input_path = f'{path}/{file}'
            output_path = f'{testing}/{sign}/{file}'

            # note! warnings about lossy conversion are ok
            image = load_image(input_path)
            image = resize_image(image, (500, 500))

            save_image(output_path, image)

utility/data/raw/paper/IMG_9660.jpg
utility/data/raw/paper/IMG_9661.jpg
utility/data/raw/paper/IMG_9663.jpg
utility/data/raw/paper/IMG_9662.jpg
utility/data/raw/paper/IMG_9659.jpg
utility/data/raw/rock/IMG_9653.jpg
utility/data/raw/rock/IMG_9650.jpg
utility/data/raw/rock/IMG_9651.jpg
utility/data/raw/rock/IMG_9648.jpg
utility/data/raw/rock/IMG_9649.jpg
utility/data/raw/scissors/IMG_9658.jpeg
utility/data/raw/scissors/IMG_9654.jpeg
utility/data/raw/scissors/IMG_9655.jpeg
utility/data/raw/scissors/IMG_9656.jpeg
utility/data/raw/scissors/IMG_9657.jpeg

### Problem 1

**Write-up!**  Report the number of images per class in each of the four datasets. Are the datasets balanced? No code submission required.
> Hint: For most of this you can use the code from `Lab10_Model` with light modifications. 
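
One way to get these counts without loading any images is to count the files in each class folder. The sketch below assumes the four datasets sit under `utility/data` with `rock`, `paper`, and `scissors` subfolders; the helper names `count_images` and `is_balanced` and the 10% tolerance are illustrative choices, not part of `Lab10_Model`.

```python
from pathlib import Path

CLASSES = ("rock", "paper", "scissors")


def count_images(dataset_dir, classes=CLASSES):
    """Count the files in each class subfolder of a dataset directory."""
    counts = {}
    for name in classes:
        class_dir = Path(dataset_dir) / name
        if class_dir.is_dir():
            # Skip hidden files such as .DS_Store.
            counts[name] = sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and not f.name.startswith(".")
            )
        else:
            counts[name] = 0  # a missing folder counts as empty
    return counts


def is_balanced(counts, tolerance=0.1):
    """Call a dataset balanced if the class sizes differ by at most
    `tolerance`, measured as a fraction of the largest class."""
    largest = max(counts.values())
    if largest == 0:
        return True
    return (largest - min(counts.values())) / largest <= tolerance


base = Path("utility/data")  # assumed location; adjust to your folder layout
for dataset in ("training", "validation", "augmented", "testing"):
    counts = count_images(base / dataset)
    verdict = "balanced" if is_balanced(counts) else "imbalanced"
    print(f"{dataset:10s} {counts} -> {verdict}")
```

What counts as "balanced" is a judgment call, so report the raw counts alongside whatever threshold you choose.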

### Problem 2
Now, let's look at our models. 

**Write-up!**  Compare the following statistics for all four models: 
- number of parameters
- number of convolutional layers
- number of dense layers
- size of the model (`.h5`) file 

What are the most surprising aspects of these statistics to you? 
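
The statistics listed above can be collected programmatically rather than read off each summary by hand. This is a minimal sketch, assuming a loaded Keras model that exposes `layers` and `count_params()`; the helper name `model_stats` is hypothetical, and layers are classified by class name (for example `Conv2D` and `Dense`), which is a simplification.

```python
import os


def model_stats(model, h5_path=None):
    """Collect parameter count, layer counts, and file size for one model."""
    layer_types = [type(layer).__name__ for layer in model.layers]
    stats = {
        "parameters": model.count_params(),
        # Any layer whose class name starts with "Conv" counts as convolutional.
        "conv_layers": sum(t.startswith("Conv") for t in layer_types),
        "dense_layers": layer_types.count("Dense"),
    }
    if h5_path is not None and os.path.exists(h5_path):
        stats["file_size_kb"] = os.path.getsize(h5_path) / 1024
    return stats
```

For example, `model_stats(cse217_v1, path_to_v1_file)` returns a dictionary for one model, where `path_to_v1_file` stands for wherever the `.h5` file is stored on your machine. Collecting one dictionary per model makes the four-way comparison easy to tabulate.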

In [4]:
from tensorflow.keras.models import load_model

cse217_v1 = load_model('/Users/apple/Desktop/s20/217a /hw10_part2/utility/models/model.v1.raw.h5', compile=False)
cse217_v2 = load_model('/Users/apple/Desktop/s20/217a /hw10_part2/utility/models/model.v2.raw.h5', compile=False)
cse217_v1_augmented = load_model('/Users/apple/Desktop/s20/217a /hw10_part2/utility/models/model.v1.augmented.h5', compile=False)
cse217_v2_augmented = load_model('/Users/apple/Desktop/s20/217a /hw10_part2/utility/models/model.v2.augmented.h5', compile=False)

cse217_v1.summary()
cse217_v2.summary()
cse217_v1_augmented.summary()
cse217_v2_augmented.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 126, 126, 5)       140       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 124, 124, 5)       230       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 62, 62, 5)         0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 60, 60, 5)         230       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 30, 30, 5)         0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 28, 28, 5)         230       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 14, 14, 5)         0
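
The `Param #` column in the summary above can be checked by hand. A 2-D convolution layer has one weight per kernel position, input channel, and filter, plus one bias per filter. The sketch below assumes square 3×3 kernels and a 3-channel input image, which is consistent with the listed output shapes and counts but is not stated explicitly in the summary; `conv2d_param_count` is a hypothetical helper.

```python
def conv2d_param_count(kernel_size, in_channels, filters, use_bias=True):
    """Trainable parameters of a 2-D convolution layer with square kernels."""
    weights = kernel_size * kernel_size * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases


print(conv2d_param_count(3, 3, 5))  # first layer: (3*3*3 + 1) * 5 = 140
print(conv2d_param_count(3, 5, 5))  # later layers: (3*3*5 + 1) * 5 = 230
```

Pooling layers have no trainable parameters, which is why they show 0 in the summary.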

## 3. Model Comparison: v1 vs v2

By now we should know all of the ins and outs about our datasets and models (right?). Let's evaluate and compare the models. 

### Problem 3

First, let's investigate which of the two versions, `cse217_v1` or `cse217_v2`, performs better in the non-augmented setting. You can use the code provided in the *updated version* of `Lab10_Model` under `5. Evaluate Neural Network on Validation Data` with light modifications. 

**Write-up** For both versions report the accuracy on all three datasets `training`, `validation`, and `testing` and summarize your findings. 
- Which model performs better? Justify your answer based on the presented accuracies. 
- Argue whether we can be happy with the performance of our model. If yes, justify why; if no, give suggestions on how to improve the performance. 
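
Accuracy here is the fraction of images whose most probable predicted class matches the true class. The sketch below computes it in one vectorized step. It assumes the labels are one-hot encoded and the columns are ordered rock, paper, scissors; the function name and the example arrays are hypothetical.

```python
import numpy as np


def accuracy_from_predictions(probabilities, one_hot_labels):
    """Fraction of rows where the most probable class matches the label."""
    predicted = np.argmax(probabilities, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == actual))


# Hypothetical network outputs for four images
# (assumed column order: rock, paper, scissors).
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
])
labels = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
])
print(accuracy_from_predictions(probs, labels))  # 3 of 4 correct: 0.75
```

Calling `model.predict` once on a whole batch of resized images and passing the result to a function like this is usually faster than predicting one image at a time.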

In the following cell, we provide an example of how to load the testing data. Note the dimensions of the dataset (especially the size of the images).

In [23]:
from utility.util import load_dataset

target_shape = (500, 500)
X_test_example, y_test_example = load_dataset('utility/data/testing', target_shape)

utility/data/testing/paper/IMG_9660.jpg
utility/data/testing/paper/IMG_9661.jpg
utility/data/testing/paper/IMG_9663.jpg
utility/data/testing/paper/IMG_9662.jpg
utility/data/testing/paper/IMG_9659.jpg
utility/data/testing/rock/IMG_9653.jpg
utility/data/testing/rock/IMG_9650.jpg
utility/data/testing/rock/IMG_9651.jpg
utility/data/testing/rock/IMG_9648.jpg
utility/data/testing/rock/IMG_9649.jpg
utility/data/testing/scissors/IMG_9658.jpeg
utility/data/testing/scissors/IMG_9654.jpeg
utility/data/testing/scissors/IMG_9655.jpeg
utility/data/testing/scissors/IMG_9656.jpeg
utility/data/testing/scissors/IMG_9657.jpeg


In [33]:
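import numpy as np
import skimage.transform  # import the submodule explicitly; older scikit-image versions do not load it with a bare `import skimage`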
def accuracy(X, y, model):
    acc = 0
    label =  y
    image = X
    patchsize = model.input_shape[1]
    for i in range(0,len(X)):
        image_tran = skimage.transform.resize(image[i], (patchsize,patchsize))
        outs = model.predict(np.array([image_tran]))
        predicted = np.argmax(outs)
        if label[i][predicted] == 1:
            acc+=1
    
    print("Number of pictures predicted correctly by model: %d" % acc)
    print("Number of picutres in the dataset: %d" % len(X))

    return acc/len(X)


# Compute Model Performance
target_shape = (500, 500)
X_train_example, y_train_example = load_dataset('utility/data/Raw_data/training', target_shape)
v1_training = accuracy(X_train_example, y_train_example, cse217_v1) 
print('Accuracy of v1_training =', v1_training)
v2_training = accuracy(X_train_example, y_train_example, cse217_v2) 
print('Accuracy of v2_training =', v2_training)

target_shape = (500, 500)
X_val_example, y_val_example = load_dataset('utility/data/Raw_data/validation', target_shape)
v1_val = accuracy(X_val_example, y_val_example, cse217_v1) 
print('Accuracy of v1_val =', v1_val)
v2_val = accuracy(X_val_example, y_val_example, cse217_v2) 
print('Accuracy of v2_val =', v2_val)

v1_test = accuracy(X_test_example, y_test_example, cse217_v1) 
print('Accuracy of v1_test =', v1_test)
v2_test = accuracy(X_test_example, y_test_example, cse217_v2) 
print('Accuracy of v2_test =', v2_test)

utility/data/Raw_data/training/paper/IMG_7436.jpeg
utility/data/Raw_data/training/paper/IMG_0996.JPG
utility/data/Raw_data/training/paper/IMG_5150.jpg
utility/data/Raw_data/training/paper/IMG_1849.jpg
utility/data/Raw_data/training/paper/IMG_0598.jpeg
utility/data/Raw_data/training/paper/paper5n.jpg
utility/data/Raw_data/training/paper/IMG_6636.jpeg
utility/data/Raw_data/training/paper/IMG_6266.jpeg
utility/data/Raw_data/training/paper/IMG_6504.jpg
utility/data/Raw_data/training/paper/paper2jche.JPG
utility/data/Raw_data/training/paper/IMG_7322.jpeg
utility/data/Raw_data/training/paper/20200420_184236.jpg
utility/data/Raw_data/training/paper/IMG_20200423_204825.jpg
utility/data/Raw_data/training/paper/Paper.jpg
utility/data/Raw_data/training/paper/paper3zzhao.jpg
utility/data/Raw_data/training/paper/IMG_1451.JPG
utility/data/Raw_data/training/paper/IMG_0997.JPG
utility/data/Raw_data/training/paper/IMG_5151.jpg
utility/data/Raw_data/training/paper/paper2zzhao.jpg
utility/data/Raw_data/t

### Problem 4

Now that we have summarized and analyzed the average performance of the models, let's look at individual images. 

**Write-up**  Using your own `testing` set and the better-performing version that you identified in the previous problem, which of the three classes is predicted correctly most often, and which classes are most frequently mistaken for which other classes? 

> Hint: you may use the visualization implemented in the *updated version* of  `Lab10_Model` under `5. Evaluate Neural Network on Validation Data` (last code cell).  
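
Beyond inspecting images one at a time, a confusion matrix summarizes which classes are mistaken for which. This is a minimal NumPy sketch, assuming integer labels with rock = 0, paper = 1, scissors = 2; the function names and the example labels are hypothetical and are not results from any of the models.

```python
import numpy as np

NAMES = ["rock", "paper", "scissors"]


def confusion_matrix(true_labels, predicted_labels, n_classes=3):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        matrix[t, p] += 1
    return matrix


def per_class_recall(matrix):
    """Share of each true class that was predicted correctly."""
    totals = matrix.sum(axis=1)
    return np.divide(np.diag(matrix), totals,
                     out=np.zeros(len(totals)), where=totals > 0)


# Hypothetical labels for six images.
true = [0, 0, 1, 1, 2, 2]
pred = [0, 0, 1, 2, 2, 0]

cm = confusion_matrix(true, pred)
print(cm)
for name, recall in zip(NAMES, per_class_recall(cm)):
    print(f"{name}: {recall:.2f}")
```

Off-diagonal entries answer the second half of the question: entry `[i, j]` counts images of class `i` that were predicted as class `j`.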

In [64]:
import pathlib
import warnings
import pandas as pd
import skimage.io
import skimage.transform
from tqdm import tqdm
def create_user_testdata(path2folder, foldername):
    dataset_directory = pathlib.Path(path2folder)

    # Now check the data
    ddir=dataset_directory/foldername
    cdirs={}
    cdirs.update({ddir/"rock":0,
                  ddir/"paper":1,
                  ddir/"scissors":2})

    names = ["rock", "paper", "scissors"]

    for cdir,cdir_class in cdirs.items():
        assert cdir.exists()==1, str(cdir)+' does not exist'
        print("Found directory {} containing class {}".format(cdir,names[cdir_class]))

    imagesize = 500
    dataset1=[]
    for cdir,cn in reversed(list(cdirs.items())):

        for f in tqdm(list(cdir.glob("*"))):
            try:
                im=skimage.io.imread(f)
                h,w=im.shape[0:2] # height, width
                sz=min(h,w)
                im=im[(h//2-sz//2):(h//2+sz//2),(w//2-sz//2):(w//2+sz//2),:] # defines the central square
                with warnings.catch_warnings():
                    warnings.simplefilter("ignore")
                    im=skimage.img_as_ubyte(skimage.transform.resize(im,(imagesize,imagesize))) # resize it to 500x500, whatever the original resolution
            except Exception:
                warnings.warn("ignoring "+str(f))
                continue

            dataset1.append({
                "file": f,
                "label": cn,
                "image": im
            })

    print("Done")

    dataset1 = pd.DataFrame(dataset1)
    dataset1["dn"] = dataset1["file"].apply(lambda x: x.parent.parts[-2])
    return dataset1


In [65]:
data_for_evaluation = "testing"
base_directory = pathlib.Path("utility/data")
dataset_test = create_user_testdata(base_directory,data_for_evaluation)

Found directory utility/data/testing/rock containing class rock
Found directory utility/data/testing/paper containing class paper
Found directory utility/data/testing/scissors containing class scissors

100%|██████████| 5/5 [00:00<00:00, 18.57it/s]
100%|██████████| 5/5 [00:00<00:00, 20.40it/s]
100%|██████████| 5/5 [00:00<00:00, 22.52it/s]

Done

In [68]:
# Show results by processing a single validation or testing image
import matplotlib.pyplot as plt
from ipywidgets import interact, fixed, widgets
names = ["rock", "paper", "scissors"]

%matplotlib inline
def resultsShow(i, data, model):
    guide = {0: "rock", 1: "paper", 2: "scissors"}
    d = data.iloc[i]
    im = d["image"]
    l = d["label"]
    fig,axs = plt.subplots(nrows=1,ncols=3,figsize=(15,5),gridspec_kw={'width_ratios':[1,1,0.5]})
    
    imt = skimage.transform.resize(im, (model.input_shape[1], model.input_shape[1]))
    axs[0].imshow(im)
    axs[0].set_title("Image (true class: {})".format(names[l]))
    
    axs[1].imshow(imt,interpolation="nearest")
    axs[1].set_title("Network input")
    
    outs = model.predict(np.array([imt]))
    predicted = np.argmax(outs)
    print(outs)
    print("predicted label, %s" % guide.get(predicted))
    print("actual label, %s"% guide.get(l))

    axs[2].bar(np.array(range(len(names)))-0.5, outs[0,:], 1, color="gray")
    axs[2].set_ylim([0,1])
    axs[2].set_xticks(range(len(names)))
    axs[2].set_xticklabels(names)
    axs[2].set_ylabel("probability")
    axs[2].set_xlabel("class")
    axs[2].set_title("Network output")
    fig.tight_layout()
    plt.show()
    #fig.savefig("out_{:05d}_{}.png".format(i,("ok" if predicted==l else "ko")))    
print("Results on individual {} inputs: ".format(dataset_test.loc[0].dn)) 
interact(resultsShow, i=widgets.IntSlider(min=0,max=len(dataset_test)-1, step=1, value=0, continuous_update=False), data=fixed(dataset_test.sample(len(dataset_test))), model=fixed(cse217_v2))



Results on individual testing inputs: 

## 4. Model Comparison: original vs augmented

Now, let's investigate whether data augmentation improves performance. 


### Problem 5

For each version, which model performs better: `cse217_vx` or `cse217_vx_augmented`? You can again use the code provided in the *updated version* of `Lab10_Model` under `5. Evaluate Neural Network on Validation Data` with light modifications. 

**Write-up** Report and compare the accuracy on all three datasets `training`, `validation`, and `testing` of the original and the augmented model for both versions. Summarize your findings. 
- Did data augmentation help? 
- Which of the two NN versions benefited or suffered more from data augmentation? 
- Give an explanation/guesstimate of why this is the case.

In [101]:
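import numpy as np
import skimage.transform  # import the submodule explicitly; older scikit-image versions do not load it with a bare `import skimage`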
def accuracy(X, y, model):
    acc = 0
    label =  y
    image = X
    patchsize = model.input_shape[1]
    for i in range(0,len(X)):
        image_tran = skimage.transform.resize(image[i], (patchsize,patchsize))
        outs = model.predict(np.array([image_tran]))
        predicted = np.argmax(outs)
        if label[i][predicted] == 1:
            acc+=1
    
    print("Number of pictures predicted correctly by model: %d" % acc)
    print("Number of picutres in the dataset: %d" % len(X))

    return acc/len(X)

# Compute Model Performance
v1_training_augmented = accuracy(X_train_example, y_train_example, cse217_v1_augmented) 
print("Model accuracy on cse217_v1_augmented training data:  ", v1_training_augmented)
v2_training_augmented = accuracy(X_train_example, y_train_example, cse217_v2_augmented) 
print("Model accuracy on cse217_v2_augmented training data:  ", v2_training_augmented)

v1_val_augmented = accuracy(X_val_example, y_val_example, cse217_v1_augmented) 
print("Model accuracy on cse217_v1_augmented validation data:  ", v1_val_augmented)
v2_val_augmented = accuracy(X_val_example, y_val_example, cse217_v2_augmented) 
print("Model accuracy on cse217_v2_augmented validation data:  ", v2_val_augmented)

v1_test_augmented = accuracy(X_test_example, y_test_example, cse217_v1_augmented) 
print("Model accuracy on cse217_v1_augmented test data:  ", v1_test_augmented)
v2_test_augmented = accuracy(X_test_example, y_test_example, cse217_v2_augmented) 
print("Model accuracy on cse217_v2_augmented test data:  ", v2_test_augmented)


Number of pictures predicted correctly by model: 514
Number of picutres in the dataset: 784
Model accuracy on cse217_v1_augmented training data:   0.6556122448979592
Number of pictures predicted correctly by model: 626
Number of picutres in the dataset: 784
Model accuracy on cse217_v2_augmented training data:   0.798469387755102
Number of pictures predicted correctly by model: 123
Number of picutres in the dataset: 195
Model accuracy on cse217_v1_augmented validation data:   0.6307692307692307
Number of pictures predicted correctly by model: 142
Number of picutres in the dataset: 195
Model accuracy on cse217_v2_augmented validation data:   0.7282051282051282
Number of pictures predicted correctly by model: 13
Number of picutres in the dataset: 15
Model accuracy on cse217_v1_augmented test data:   0.8666666666666667
Number of pictures predicted correctly by model: 12
Number of picutres in the dataset: 15
Model accuracy on cse217_v2_augmented test data:   0.8
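
When comparing these numbers, it helps to keep the dataset sizes in mind: on a 15-image test set, a single image changes the accuracy by about 6.7 percentage points. The sketch below tabulates the counts printed above and attaches a binomial standard error, sqrt(p(1 - p)/n), to each accuracy. This normal-approximation error is only a rough guide, especially for n = 15, and `accuracy_with_error` is a hypothetical helper.

```python
import math

# Correct/total counts copied from the output above.
results = {
    ("v1_augmented", "training"): (514, 784),
    ("v2_augmented", "training"): (626, 784),
    ("v1_augmented", "validation"): (123, 195),
    ("v2_augmented", "validation"): (142, 195),
    ("v1_augmented", "testing"): (13, 15),
    ("v2_augmented", "testing"): (12, 15),
}


def accuracy_with_error(correct, total):
    """Accuracy and its binomial standard error sqrt(p * (1 - p) / n)."""
    p = correct / total
    return p, math.sqrt(p * (1 - p) / total)


for (model, dataset), (correct, total) in results.items():
    acc, se = accuracy_with_error(correct, total)
    print(f"{model:13s} {dataset:10s} {acc:.3f} +/- {se:.3f}")
```

On the testing set, the two augmented models differ by one image (13 vs. 12 correct), which is smaller than one standard error, so the training and validation results carry more weight in the comparison.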


### Problem 6

Now, let's have some fun! 

Let's explore a _real-time_ version of the model you identified as performing best, running with your webcam. Open a new terminal window (on macOS you will need to use the built-in Terminal app) and navigate to the directory where you stored the model. Once there, run the following command, substituting the name of the file containing your model for `<model_name>`:

```
$ python(3) realtime.py <model_name>
```

Have fun!

Note that `realtime.py` uses OpenCV, so you might need to install it: 

- **opencv**: `pip(3) install opencv-python`


**Write-up**  Summarize the performance of our NN model. 
- When does it work well, and when does it have difficulty predicting the correct gesture? Consider angle, background, and distance in your answer.  
- Which of the three classes is predicted correctly most often, and which classes are most frequently mistaken for which other classes? 

And that's it! Remember to review your work and make sure it is well presented and organized. Not everything you coded up needs to remain in your submission; in fact, for this homework, we are not expecting any code submission. **[Does [this cell] spark joy?](https://i.kinja-img.com/gawker-media/image/upload/s--iW_3HGbT--/c_scale,dpr_2.0,f_auto,fl_progressive,q_80,w_800/oruf4oavtj5vpmvaquew.jpg)** You are always trying to communicate your findings to somebody, _maybe even yourself_.