# Homework 10: Evaluating our Gesture Recognition NNs 🕸

Name: Mahoto Sasaki

Student ID: 467695

Collaborators:


## Instructions

In our _last_ homework (woohoo!), we will be analyzing and evaluating the gesture recognition data and models created in `Lab10`. This is a great opportunity to recap the **Data Science workflow** with all its major aspects: 

- exploratory data analysis (EDA) and data profiling
- machine learning workflow
- training, validation, testing data
- model comparison
- presenting results (creating plots)

It will be extremely helpful to review **Lab 10 (Gesture Recognition with Neural Networks)** first.

In general, you should feel free to import any package that we have previously used in class. Ensure that all plots have the necessary components that a plot should have (e.g. axes labels, a title, a legend).

Furthermore, in addition to recording your collaborators on this homework, please also remember to cite/indicate all external sources used when finishing this assignment. This includes peers, TAs, and links to online sources. Note that these citations will not free you from your obligation to submit your _own_ code and write-ups. However, they will be taken into account during the grading and regrading process.

### Submission instructions
* Submit this Python notebook, including your answers in the code cells, as your homework submission.
* **Feel free to add as many cells as you need to** — just make sure you don't change what we gave you. 
* **Does it spark joy?** Note that you will be partially graded on the presentation (_cleanliness, clarity, comments_) of your notebook so make sure you [Marie Kondo](https://lifehacker.com/marie-kondo-is-not-a-verb-1833373654) your notebook before submitting it.

## 1. Introduction

The data needed for this assignment can be found [here](https://wustl.box.com/s/q8mnl1o2zq2bh0ca5zajtk3msnu03ou8). All of it was gathered in `Homework 10 (Part I)`: 
- training
- validation
- augmented
- testing

Here are the neural network models trained on `training`:
- cse217_v1.h5 (still training; watch for announcement on Piazza)
- cse217_v2.h5 (still training; watch for announcement on Piazza)

Here are the neural network models trained on `augmented`:
- cse217_v1_augmented.h5 (still training; watch for announcement on Piazza)
- cse217_v2_augmented.h5 (still training; watch for announcement on Piazza)

Note that to train these models we used the `validation` dataset to determine when to stop the training process. 
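
To illustrate how a validation set can decide when training stops, here is a minimal sketch of a patience-based early-stopping rule. It is only an assumption about what such a rule might look like. The actual criterion used for the `cse217` models is not stated in this notebook, and `stopping_epoch` and `patience` are hypothetical names.

```python
def stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs. If that never happens, the index
    of the last epoch is returned.
    """
    best_loss = float('inf')
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # new best validation loss: reset the counter
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1

        if epochs_without_improvement >= patience:
            return epoch

    return len(val_losses) - 1
```

Keras offers the same idea through its `EarlyStopping` callback, which monitors a validation metric and takes a `patience` argument.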

## 2. Test Data Collection, Data Profiling, and Model Understanding

In this section, we will get a feel for our data.

### Problem 0

Following the instructions in `Lab10_DataAquisition`, take 15 images of rock, paper, and scissors gestures (cf. `1.1 How To Take The Pictures`) and scale them using the provided code (`1.2 Storing, Scaling, and Sharing the Images`). Store them in a folder called `testing` along with the already collected data.

In [1]:
from os import makedirs, mkdir
from os.path import exists

base = 'utility/data'
raw = f'{base}/raw'
dirs = ['rock', 'paper', 'scissors']

if not exists(raw):
    makedirs(raw, exist_ok=True)

for sign in dirs:
    path = f'{raw}/{sign}'
    
    if not exists(path):
        mkdir(path)

**Try this!** Store the images you took of rocks (✊), papers (🤚), and scissors (✌️) in the correct folders in `utility/data/raw`. Then, run the following cell to produce rescaled images, which will be stored in `utility/data/testing`.

In [None]:
import os
import warnings
from utility.util import load_image, resize_image, save_image


testing = f'{base}/testing'

for sign in dirs:
    path = f'{testing}/{sign}'
    
    if not exists(path):
        makedirs(path, exist_ok=True)

for path, _, files in os.walk(raw):
    sign = os.path.basename(path)

    # skip `raw` itself and any folder that is not a gesture class,
    # since no matching output folder exists for them
    if sign not in dirs:
        continue

    for file in files:
        input_path = f'{path}/{file}'
        output_path = f'{testing}/{sign}/{file}'
        
        # note! warnings about lossy conversion are ok
        image = load_image(input_path)
        image = resize_image(image, (500, 500))

        save_image(output_path, image)

### Problem 1

**Write-up!**  Report the number of images per class in each of the four datasets. Are the datasets balanced? No code submission required.
> Hint: For most of this you can use the code from `Lab10_Model` with light modifications. 
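
If you prefer to count files directly, the following is a minimal sketch. It assumes that each dataset is a folder under `utility/data` with one subfolder per class (`rock`, `paper`, `scissors`), as is the case for `testing` above. The helper names `count_images` and `is_balanced`, and the 10% tolerance, are hypothetical choices rather than part of the lab code.

```python
from pathlib import Path


def count_images(dataset_dir, classes=('rock', 'paper', 'scissors')):
    """Count the non-hidden files in each class subfolder of `dataset_dir`."""
    counts = {}
    for sign in classes:
        folder = Path(dataset_dir) / sign
        if folder.is_dir():
            counts[sign] = sum(
                1 for f in folder.iterdir()
                if f.is_file() and not f.name.startswith('.')
            )
        else:
            counts[sign] = 0  # a missing folder is reported as empty
    return counts


def is_balanced(counts, tolerance=0.1):
    """Call a dataset balanced if the smallest class is within
    `tolerance` (as a fraction) of the largest class."""
    largest = max(counts.values())
    if largest == 0:
        return True
    return min(counts.values()) >= (1 - tolerance) * largest


# assumed folder layout: utility/data/<dataset>/<class>/<image files>
for name in ['training', 'validation', 'augmented', 'testing']:
    counts = count_images(f'utility/data/{name}')
    print(name, counts, 'balanced' if is_balanced(counts) else 'not balanced')
```

What counts as "balanced" is a judgment call, so state the threshold you used in your write-up.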

### Problem 2
Now, let's look at our models. 

**Write-up!**  Compare the following statistics for all four models: 
- number of parameters
- number of convolutional layers
- number of dense layers
- size of the model (`.h5`) file 

What are the most surprising aspects of these statistics to you? 
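
One possible way to gather these statistics is sketched below. It assumes the `.h5` files can be loaded with Keras via TensorFlow and that layers sit directly in `model.layers` (layers nested inside sub-models would not be counted). Layers are classified by class name, which is a simplification, and `model_statistics` and `load_and_summarize` are hypothetical helper names.

```python
import os
from collections import Counter


def model_statistics(model, path):
    """Collect the statistics listed above for one loaded Keras model."""
    # count layers by class name, e.g. 'Conv2D', 'Dense', 'Flatten'
    layer_types = Counter(type(layer).__name__ for layer in model.layers)
    return {
        'parameters': model.count_params(),
        'conv_layers': sum(n for name, n in layer_types.items() if 'Conv' in name),
        'dense_layers': layer_types['Dense'],
        'file_size_mb': os.path.getsize(path) / 1e6,
    }


def load_and_summarize(path):
    """Load a saved model from `path` and return its statistics."""
    # imported lazily so model_statistics works without TensorFlow installed
    from tensorflow.keras.models import load_model
    return model_statistics(load_model(path), path)
```

For example, `load_and_summarize('cse217_v1.h5')` would return a dictionary for the first model, assuming the file is in the current working directory. `model.summary()` prints the same layer and parameter information in table form.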

## 3. Model Comparison: v1 vs v2

By now we should know all of the ins and outs about our datasets and models (right?). Let's evaluate and compare the models. 

### Problem 3

First, let's investigate which of the two versions, `cse217_v1` or `cse217_v2`, performs better in the non-augmented setting. You can use the code provided in the *updated version* of `Lab10_Model` under `5. Evaluate Neural Network on Validation Data` with light modifications. 

**Write-up** For both versions report the accuracy on all three datasets `training`, `validation`, and `testing` and summarize your findings. 
- Which model performs better? Justify your answer based on the presented accuracies. 
- Argue whether we can be happy with the performance of our model. If yes, justify why; if no, give suggestions on how to improve the performance. 

In the following cell, we provide an example of how to load the `testing` dataset. Note the dimensions of the dataset (especially the size of the images).

In [None]:
from utility.util import load_dataset

target_shape = (500, 500)
X_test_example, y_test_example = load_dataset('utility/data/testing', target_shape)
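
Once a dataset is loaded as in the cell above, accuracy is the fraction of images whose predicted class matches the true label. The sketch below is one way to compute it without relying on the lab's evaluation code. It assumes the model outputs one row of class scores per image and that labels are either integers or one-hot rows. The actual label format returned by `load_dataset` is not shown here, and the function names are hypothetical.

```python
def predicted_classes(probabilities):
    """Index of the largest score in each row (one row per image)."""
    return [max(range(len(row)), key=lambda i: row[i]) for row in probabilities]


def class_indices(labels):
    """Accept integer labels or one-hot rows and return integer labels."""
    result = []
    for label in labels:
        try:
            # one-hot row: position of the largest entry
            result.append(max(range(len(label)), key=lambda i: label[i]))
        except TypeError:
            # plain integer label (has no len())
            result.append(int(label))
    return result


def accuracy(probabilities, labels):
    """Fraction of rows whose predicted class equals the true class."""
    predictions = predicted_classes(probabilities)
    truth = class_indices(labels)
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)
```

With a loaded Keras model, a call might look like `accuracy(model.predict(X_test_example), y_test_example)`. Repeating this for `training`, `validation`, and `testing` gives the three numbers per model that the write-up asks for.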

### Problem 4

Now that we have summarized and analyzed the average performance of the models, let's look at individual images. 

**Write-up**  Using your own `testing` set and the better-performing version that you identified in the previous problem, report which of the three classes is predicted correctly most often and which classes are most frequently mistaken for which other classes. 

> Hint: you may use the visualization implemented in the *updated version* of  `Lab10_Model` under `5. Evaluate Neural Network on Validation Data` (last code cell).  
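
As a complement to the lab's visualization, a confusion matrix summarizes which classes are mistaken for which. The following is a minimal sketch that assumes you already have the true and predicted labels as class names. How integer labels map to `rock`, `paper`, and `scissors` depends on `load_dataset` and should be checked, and the helper names are hypothetical.

```python
from collections import Counter

CLASSES = ('rock', 'paper', 'scissors')


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def print_confusion_matrix(counts, classes=CLASSES):
    """Print a table with true classes as rows and predictions as columns."""
    print('true \\ predicted'.ljust(18) + ''.join(c.rjust(10) for c in classes))
    for true in classes:
        row = ''.join(str(counts[(true, pred)]).rjust(10) for pred in classes)
        print(true.ljust(18) + row)


def per_class_recall(counts, classes=CLASSES):
    """Fraction of each true class that was predicted correctly."""
    recall = {}
    for true in classes:
        total = sum(counts[(true, pred)] for pred in classes)
        recall[true] = counts[(true, true)] / total if total else None
    return recall
```

Large off-diagonal entries show which gesture is confused with which, and the per-class recall shows which class is recognized most reliably.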

## 4. Model Comparison: original vs augmented

Now, let's investigate whether data augmentation improves performance. 


### Problem 5

For each version, which model performs better: `cse217_vx` or `cse217_vx_augmented`? You can again use the code provided in the *updated version* of `Lab10_Model` under `5. Evaluate Neural Network on Validation Data` with light modifications. 

**Write-up** Report and compare the accuracy on all three datasets `training`, `validation`, and `testing` of the original and the augmented model for both versions. Summarize your findings. 
- Did data augmentation help? 
- Which of the two NN versions benefited or suffered more from data augmentation? 
- Give an explanation/guesstimate why this is the case.
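
To make the comparison concrete, it can help to compute the accuracy difference between each augmented model and its original. The sketch below assumes you have collected your measured accuracies in a dictionary keyed by `(model_name, dataset_name)`. The function name `augmentation_effect` and this dictionary layout are hypothetical.

```python
def augmentation_effect(accuracies,
                        versions=('cse217_v1', 'cse217_v2'),
                        datasets=('training', 'validation', 'testing')):
    """Return accuracy(augmented) - accuracy(original) per version and dataset.

    `accuracies` maps (model_name, dataset_name) to an accuracy in [0, 1].
    A positive difference means the augmented model did better.
    """
    effect = {}
    for version in versions:
        for dataset in datasets:
            original = accuracies[(version, dataset)]
            augmented = accuracies[(f'{version}_augmented', dataset)]
            effect[(version, dataset)] = augmented - original
    return effect
```

Comparing the differences for `cse217_v1` and `cse217_v2` on `testing` shows which version gained or lost more from augmentation.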

### Problem 6

Now, let's have some fun! 

Let's explore a _real-time_ version of the model you identified as performing best, running with your webcam. Open a new terminal window (on macOS you will need to use the built-in Terminal app) and navigate to the directory where you stored the model. Once there, run the following command, substituting `<model_name>` for the name of the file containing your model:

```
$ python(3) realtime.py <model_name>
```

Have fun!

Note that `realtime.py` uses OpenCV, so you might need to install it: 

- **opencv**: `pip(3) install opencv-python`


**Write-up**  Summarize the performance of our NN model. 
- When does it work well, and when does it have difficulties in predicting the correct gesture? Consider angle, background, and distance in your answer.  
- Which of the three classes is predicted correctly most often, and which classes are most frequently mistaken for which other classes? 

And that's it! Remember to review your work and make sure it is well presented and organized. Not everything you coded up needs to remain in your submission; in fact, for this homework, we are not expecting any code submission. **[Does [this cell] spark joy?](https://i.kinja-img.com/gawker-media/image/upload/s--iW_3HGbT--/c_scale,dpr_2.0,f_auto,fl_progressive,q_80,w_800/oruf4oavtj5vpmvaquew.jpg)** You are always trying to communicate your findings to somebody, _maybe even yourself_.