# Lab 4: Reading Sign Language with Convolutional Networks
CNNs have revolutionized nearly every problem that takes an image as input, and the simplest of these is image classification.

This project involves using TensorFlow and Keras to build a program that recognizes numbers 0-9 in sign language by classifying images of hands signing those digits.

## Section 0: Download the data
The data is available on Kaggle, at https://www.kaggle.com/ardamavi/sign-language-digits-dataset/.
You'll need an account to download it; let me know if you can't do this.

Make a directory called `data`, then unzip the data files inside that directory.
Your final directory structure should contain files:
 - `.../lab_4_cnn/data/X.npy`
 - `.../lab_4_cnn/data/Y.npy`

## Section 1: Understand the data
I've taken care of loading the data for you.
Read through the code (especially comments) so you understand what it does, and check out the plots.

In [None]:
%matplotlib inline
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

In [None]:
x_all = np.load('data/X.npy')
y_all = np.load('data/Y.npy')

In [None]:
# For whatever reason, the data's labels aren't the actual
# numbers depicted. This box fixes that.
# Real-world data is usually messy; this is one example.

# Maps dataset-provided label to true label
label_map = {0:9, 1:0, 2:7, 3:6, 4:1, 5:8, 6:4, 7:3, 8:2, 9:5}

# Correct dataset labels
for row in range(y_all.shape[0]):
    dataset_label = np.where(y_all[row])[0][0]
    y_all[row, :] = np.zeros(10)
    y_all[row, label_map[dataset_label]] = 1

In [None]:
# Seed numpy rng for reproducibility
np.random.seed(1337)

# Shuffle features and targets together
# Credit for this technique to:
# https://stackoverflow.com/questions/4601373/
# better-way-to-shuffle-two-numpy-arrays-in-unison
rng_state = np.random.get_state()
np.random.shuffle(x_all)
np.random.set_state(rng_state)
np.random.shuffle(y_all)

# Add a dummy channel axis to input images
x_all = np.expand_dims(x_all, axis=-1)

# Center and rescale data to the range [-1, 1]
x_all = x_all - 0.5
x_all = x_all * 2

# Hold out the last 30% of the (shuffled) data as a test set
n_points = x_all.shape[0]
n_test = int(n_points * 0.3)
n_train = n_points - n_test
x_train, x_test = np.split(x_all, [n_train], axis=0)
y_train, y_test = np.split(y_all, [n_train], axis=0)

In [None]:
# Print important shapes in the dataset
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
print('y_train shape:', y_train.shape)
print('y_test shape:', y_test.shape)

In [None]:
### Plot the first example of each digit in the training set.
### You don't need to understand the code, just look at the output.

# Set up plots
plots = plt.subplots(nrows=2, ncols=5, figsize=(10, 4))
axes_list = plots[1].ravel()

for digit in range(10):
    axes = axes_list[digit]
    axes.set_axis_off()
    axes.set_title(digit)
  
    # Find the index of the first appearance of this digit
    idx = np.where(y_train[:, digit] == 1)[0][0]
    
    # Plot the image
    axes.imshow(x_train[idx, :, :, 0],
                cmap='gray')

## Section 2: Build a TensorFlow data pipeline
Set up any `tf.data.Dataset` and `tf.data.Iterator` objects you need.

I used two `Dataset`s and a single reinitializable `Iterator`, but there are multiple ways to solve this problem.
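
To make the goal concrete, here is a minimal NumPy sketch of the behaviour the pipeline needs to provide: shuffled minibatches for training, and an unshuffled pass for evaluation. It is not a `tf.data` solution; `iterate_batches` and the toy arrays are hypothetical names used only for illustration.

```python
import numpy as np

def iterate_batches(x, y, batch_size, shuffle, seed=0):
    """Yield (x_batch, y_batch) pairs, keeping features and labels aligned."""
    order = np.arange(x.shape[0])
    if shuffle:
        # Shuffle one index array so x and y stay in unison.
        np.random.RandomState(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

# Toy data: the label of each row equals its single feature value.
x_demo = np.arange(10).reshape(10, 1)
y_demo = np.arange(10)

batches = list(iterate_batches(x_demo, y_demo, batch_size=4, shuffle=True))
print([xb.shape[0] for xb, _ in batches])
```

The last batch is smaller than the others because 10 is not a multiple of 4; your TensorFlow pipeline will need to cope with the same situation.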

In [None]:
# Your code here

## Section 3: Build a model graph
We'll be building a fairly "traditional" image-processing CNN: a few layers of 2-D convolutions and max pooling, then flattening, dense layers, and an output layer.

Feel free to experiment with the model architecture.
With a simple network, expect around 90% accuracy.
(Logistic regression gets about 75%.)

### 3.1: Input tensors
Get a tensor for the input image and another for the correct label.
Note that the label is already one-hot encoded.

In [None]:
# Your code here

### 3.2: Helper function to make dense layers
Copy the `make_dense_layer()` function you wrote last week here, since this model will need dense layers too.

(In practice, we'd use `tf.layers.Dense`, which does basically the same thing, but using your code from last week gives you more flexibility to do things like plot histograms with minimal extra work.)

In [None]:
def make_dense_layer(prev_activations, dim_input, dim_output, 
                     do_activation=True, postfix=''):
    '''
    Adds a dense layer to the model graph.
    
    Parameters
    ----------
    prev_activations: tensor
        The activations of the previous layer, or 
        the input for the first dense layer.
    dim_input: int
        Number of features in the input representation.
    dim_output: int
        Number of features in the output representation.
        Equivalently, number of units in this layer.
    do_activation: bool
        Whether or not to apply ReLU activation.
    postfix: string
        Postfix on name and variable scopes in this layer.
        Used to simplify visualizations.
        
    Returns
    -------
    A tensor representing the activations of this layer.
    '''
    with tf.name_scope('dense' + postfix):
        with tf.variable_scope('dense' + postfix):
            pass # Define variables here
        pass # Define operations here

### 3.3: Helper function to make convolutional layers
Write a function, similar to `make_dense_layer()`, called `make_conv_layer()`, whose signature and scopes are defined in the stub below.
When called, it should:
 1. Add variables named `filters` and `biases`, of appropriate shapes, to the graph.
 2. Compute the 2-D discrete convolution of `filters` over `input_` using `tf.nn.conv2d`, using the correct filter size and strides, and add in the bias.
 3. If `do_activation`, apply ReLU activation using `tf.nn.relu`.
 4. If `add_summary`, create a separate 1-channel `tf.summary.image()` for each channel of the activation (pre-pooling). This will let us visualize the activation maps of various filters throughout training. Note that each image still needs a channel dimension, though it should be of size 1 here.
 5. If `pool_size > 1`, use `tf.nn.max_pool` to perform max pooling on the width and height axes.
 6. Return the activations if `do_activation`, or the pre-activation otherwise.
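
If it helps to see the arithmetic before writing the TensorFlow version, the convolution, bias, ReLU, and pooling steps above can be sketched in plain NumPy. This is a slow reference sketch, not the graph code you need to write: `same_padding` and `conv_layer` are hypothetical helpers, the layout is assumed to be `[batch, height, width, channels]`, and the pooling shortcut assumes the height and width divide evenly by `pool_size`.

```python
import numpy as np

def same_padding(size, filter_size, stride):
    """Output size and (before, after) zero padding for 'SAME'-style padding."""
    out = -(-size // stride)  # ceil(size / stride)
    total = max((out - 1) * stride + filter_size - size, 0)
    return out, total // 2, total - total // 2

def conv_layer(x, filters, biases, stride=1, pool_size=1):
    """x: [batch, height, width, in_channels].
    filters: [filter_height, filter_width, in_channels, n_filters]."""
    fh, fw, _, n_filters = filters.shape
    out_h, top, bottom = same_padding(x.shape[1], fh, stride)
    out_w, left, right = same_padding(x.shape[2], fw, stride)
    padded = np.pad(x, ((0, 0), (top, bottom), (left, right), (0, 0)),
                    mode='constant')
    out = np.zeros((x.shape[0], out_h, out_w, n_filters))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = padded[:, r:r + fh, c:c + fw, :]
            # Sum over the height, width, and input-channel axes of the patch.
            out[:, i, j, :] = np.tensordot(
                patch, filters, axes=([1, 2, 3], [0, 1, 2]))
    out = np.maximum(out + biases, 0.0)  # add bias, then ReLU
    if pool_size > 1:
        # Non-overlapping max pooling over the height and width axes.
        b, h, w, ch = out.shape
        out = out.reshape(b, h // pool_size, pool_size,
                          w // pool_size, pool_size, ch).max(axis=(2, 4))
    return out

x = np.random.RandomState(0).rand(2, 8, 8, 1)
filters = np.zeros((3, 3, 1, 4))
filters[1, 1, 0, 0] = 1.0  # filter 0 just copies its input channel
biases = np.zeros(4)

same = conv_layer(x, filters, biases)
pooled = conv_layer(x, filters, biases, pool_size=2)
print(same.shape, pooled.shape)
```

Note that the filters are not flipped before sliding, which matches the cross-correlation that deep-learning libraries conventionally call "convolution".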
 
Hints:
 - Read [the tf.nn.conv2d documentation](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d) extensively! It tells you what shapes your variables need to have.
 - Each filter has one kernel per input channel, and the layer has one bias per output channel
 - The `conv2d` strides should always be 1 in the batch and channel axes
 - In `max_pool`, the arguments `ksize` and `strides` will be the same
 - `padding` can be either 'SAME' or 'VALID'. I used 'SAME'. 
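
The shape bookkeeping in these hints can be checked with a few lines of plain Python. The helper names below are hypothetical, and the lists assume the default `[batch, height, width, channels]` layout; treat this as a sketch to check your reading of the `tf.nn.conv2d` documentation against, not as part of the TensorFlow API.

```python
import math

def conv_variable_shapes(filter_size, input_channels, n_filters):
    """Shapes of the `filters` and `biases` variables for one layer."""
    filters_shape = [filter_size, filter_size, input_channels, n_filters]
    biases_shape = [n_filters]
    return filters_shape, biases_shape

def conv_strides(stride):
    """Stride 1 on the batch and channel axes, `stride` on height and width."""
    return [1, stride, stride, 1]

def pool_window(pool_size):
    """Usable as both `ksize` and `strides` for non-overlapping max pooling."""
    return [1, pool_size, pool_size, 1]

def conv_output_size(size, filter_size, stride, padding):
    """Output height or width for the two padding modes."""
    if padding == 'SAME':
        return math.ceil(size / stride)
    if padding == 'VALID':
        return math.ceil((size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

print(conv_variable_shapes(filter_size=3, input_channels=1, n_filters=16))
print(conv_output_size(64, 3, 1, 'SAME'), conv_output_size(64, 3, 1, 'VALID'))
```

With 'SAME' padding and stride 1 the spatial size is unchanged, which makes it easier to keep track of shapes as you stack layers.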

In [None]:
def make_conv_layer(input_, input_channels, n_filters,
                    filter_size=3, stride=1, 
                    do_activation=True, pool_size=1,
                    add_summary=False, postfix=''):
    '''
    Adds a convolutional layer to the model graph.
    
    Parameters
    ----------
    input_: tensor
        The activations of the previous layer, or 
        the input for the first convolutional layer.
    input_channels: int
        Number of channels in the input representation.
    n_filters: int
        Number of channels in the output representation.
        Equivalently, number of filters in this layer.
    filter_size: int
        Width and height of each kernel in the layer's filters.
    stride: int
        Stride to use in the x and y directions for the
        convolution operation.
    do_activation: bool
        Whether or not to apply ReLU activation.
    pool_size: int
        If > 1, does max pooling of this size to the
        width and height axes of the activation.
    add_summary: bool
        Whether or not to log activations as summary images.
    postfix: string
        Postfix on name and variable scopes in this layer.
        Used to simplify visualizations.
        
    Returns
    -------
    A tensor representing the activations of this layer.
    '''
    with tf.name_scope('conv' + postfix):
        with tf.variable_scope('conv' + postfix):
            pass # Define variables here
        pass # Define operations here 

### 3.4: Make the feature-extraction layers
Now for the fun part.
Use `make_conv_layer()` and `make_dense_layer()` to make the main part of the model.
Set `add_summary=True` for at least one convolutional layer.

Hints:
 - Try some layers with convolution and max pooling, then flatten (using `tf.reshape`) and add dense layers
 - The first convolutional layer has 1 input channel
 - The input is 64x64, so do lots of downsampling with strides and max pooling before you switch to dense layers, to avoid having a huge number of parameters. (If you don't downsample at all, use 32 filters in the last convolutional layer, and use 128 units in the first dense layer, there will be 64\*64\*32\*128 = ~17 million weights in that dense layer alone! The whole model should ideally have fewer than 1 million parameters.)
 - If you're having trouble designing a model, try doing it in Keras first and visualizing the shapes and parameters with `model.summary()`
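
To budget parameters before building anything, you can count them by hand. The helpers below are hypothetical, and the "downsampled" case assumes an architecture (not necessarily the one you should use) that halves the width and height three times, from 64 down to 8, before flattening.

```python
def conv_params(filter_size, input_channels, n_filters):
    """Weights plus biases in one 2-D convolutional layer."""
    return filter_size * filter_size * input_channels * n_filters + n_filters

def dense_params(dim_input, dim_output):
    """Weights plus biases in one dense layer."""
    return dim_input * dim_output + dim_output

# No downsampling: a 64x64 map with 32 channels flattened into 128 units.
no_downsampling = dense_params(64 * 64 * 32, 128)

# Assumed alternative: an 8x8 map with 32 channels flattened into 128 units.
downsampled = dense_params(8 * 8 * 32, 128)

print(no_downsampling)  # 16777344
print(downsampled)      # 262272
print(conv_params(filter_size=3, input_channels=1, n_filters=32))  # 320
```

The convolutional layers are cheap by comparison, so the size of the flattened representation is what dominates the total.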

In [None]:
# Your code here

### 3.5: Compute logits
Use `make_dense_layer()` to make a final dense layer with `dim_output=10` and no activation to compute the final per-class logits.

In [None]:
# Your code here

### 3.6: Compute class probability for output
Use `tf.nn.softmax` to compute the class probabilities.
We will not use this for the loss, just for the output.
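
For reference, this is what softmax computes, sketched in NumPy on made-up logits. Subtracting the per-row maximum first is a standard trick to avoid overflow and does not change the result.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Two toy examples with three classes each.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
probs = softmax(logits)
print(probs)
print(probs.sum(axis=-1))
```

Each row sums to 1, and equal logits give equal probabilities.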

In [None]:
# Your code here

### 3.7: Compute cross-entropy loss
Use `tf.nn.softmax_cross_entropy_with_logits_v2()` to compute the per-example loss, then `tf.reduce_mean()` to compute the mean loss for the batch.

Add a summary scalar to plot loss in TensorBoard, and assign it to a variable since this time we'll need it later.
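
As a sanity check on what the loss means, here is a NumPy sketch of softmax cross-entropy computed directly from logits with one-hot labels. The function name and toy values are illustrative only; working from logits with a log-sum-exp is the numerically safer route, which is the usual reason for not taking the log of the softmax output.

```python
import numpy as np

def cross_entropy_from_logits(labels, logits):
    """Per-example cross-entropy between one-hot labels and softmax(logits)."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))
    return -np.sum(labels * log_probs, axis=-1)

labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
logits = np.array([[0.0, 0.0, 0.0],    # uniform guess over 3 classes
                   [0.0, 10.0, 0.0]])  # confident and correct

per_example = cross_entropy_from_logits(labels, logits)
mean_loss = per_example.mean()
print(per_example, mean_loss)
```

A uniform guess over `k` classes costs `log(k)`, so an untrained 10-class model should start with a loss near `log(10)`, roughly 2.3.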

In [None]:
# Your code here

### 3.8: Optimizer and gradients
Make an optimizer (I used `tf.train.MomentumOptimizer` with `learning_rate=1e-3` and `momentum=0.9`) and an operation to apply the gradients (either `optimizer.minimize()`, or compute and apply the gradients manually).
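
For intuition, the classical momentum update can be sketched in NumPy as below. This is my understanding of the rule (accumulate a decaying sum of gradients, then step along it), applied to a toy quadratic; `momentum_step` is a hypothetical helper, not a TensorFlow function.

```python
import numpy as np

def momentum_step(var, accum, grad, learning_rate=1e-3, momentum=0.9):
    """One momentum update: returns the new variable and accumulator."""
    accum = momentum * accum + grad
    var = var - learning_rate * accum
    return var, accum

# Toy problem: minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
accum = np.zeros_like(w)
start_loss = 0.5 * np.sum(w ** 2)

for _ in range(1000):
    w, accum = momentum_step(w, accum, grad=w)

end_loss = 0.5 * np.sum(w ** 2)
print(start_loss, end_loss)
```

With `momentum=0.9` the accumulator settles at roughly ten times a steady gradient, so the effective step is larger than `learning_rate` alone suggests.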

In [None]:
# Your code here

### 3.9: Predicted digit and per-batch accuracy
The model should predict the digit it assigns the highest probability.
Then, add a tensor which represents what fraction of the batch the model predicted correctly (its accuracy, or one minus the average 0/1 loss), and a summary operation for accuracy.

Finally, add a tensor to merge the summaries made so far.

Hint: to get the numerical value from the one-hot encoded label, use `tf.argmax`.
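
The same computation in NumPy, on made-up probabilities and labels, looks like this. The TensorFlow version has the same structure, with the TensorFlow equivalents of `argmax`, equality, and mean.

```python
import numpy as np

# Toy batch: 3 examples, 3 classes.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4]])
labels = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 1]])

predicted = np.argmax(probs, axis=1)   # class with the highest probability
true = np.argmax(labels, axis=1)       # undo the one-hot encoding
accuracy = np.mean(predicted == true)  # fraction of correct predictions
print(predicted, true, accuracy)
```

Here the first and third examples are classified correctly, so the accuracy is 2/3.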

In [None]:
# Your code here

## Section 4: Train the model
Same training loop as always, with one small modification: you don't want to save every summary every batch because the `tf.summary.image()` operations save images.
Run the loss summary operation every batch.


Only run the full `tf.summary.merge_all()` operation once every few batches, and when you do, use `feed_dict` to override your input tensors with a "batch" of a single example (the same one every time).
This will let you visualize in the TensorBoard Images tab how the activation maps of various filters on that one example change as the network trains.
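
The scheduling logic is independent of TensorFlow and can be sketched in plain Python. `plan_summaries` and the interval of 50 are assumptions for illustration; choose whatever interval keeps your log files a reasonable size.

```python
def plan_summaries(n_steps, full_every=50):
    """Yield (step, run_full) pairs.

    The loss summary runs on every step; `run_full` marks the steps on
    which to also run the merged summaries on the fixed single example.
    """
    for step in range(n_steps):
        yield step, step % full_every == 0

full_steps = [step for step, run_full in plan_summaries(200) if run_full]
print(full_steps)  # [0, 50, 100, 150]
```

In the real loop, the `run_full` branch is where you would pass the single fixed example through `feed_dict`.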

In [None]:
# Your code here

## Section 5: Visualization
Run TensorBoard, go to the Images tab, and look at how the final activation maps in various layers differ from each other, and (by dragging the slide bar at the top of each) how the activation map of a single filter develops as the network trains.

This is what some activation maps of my first-layer, size-5 convolutions look like:
![First-layer activations](./images/conv_outs.png)

I see some interesting results here.
The first and fourth filters seem to be activating on the background, the second detects the outside edges of the hand, and the third activates for sharp vertical gradients.

Second-layer activation maps are a little more abstract, but still mostly make sense:
![Second-layer activations](./images/conv_outs_2.png)
The first one is really interesting: it seems to detect areas of high complexity in the image.

My fourth-layer activation maps are too abstract to make any sense of:
![Fourth-layer activations](./images/conv_outs_3.png)