# Assignment 7: Intro To Keras With Feed Forward NNs and Theory Recap

### PART I - THEORY RECAP

Answer the review questions below. The questions are based on material from previous lectures and assignments, but some concepts were only briefly mentioned! If you can't find an answer in the lecture slides, a quick online search should help (i.e. always search online before panicking :) ).

### 1 - Linear vs Polynomial Regression
- Describe both Linear Regression and Polynomial Regression (3 lines or less each).

- Describe overfitting vs underfitting with respect to parameters.  


In [None]:
### YOUR ANSWER HERE - YOU MAY USE MARKDOWN, LATEX, CODE, DIAGRAMS, ETC

### 2 - Logistic Regression vs. Linear SVM
- Describe how logistic regression works (3 lines or less)
- Describe how linear SVM works. Mention the role(s) of:
    - support vectors
    - margin
    - slack variables
    - kernels
- Plot an example for SVM where the linear kernel is not enough to separate the data, but another kernel works

In [None]:
## YOUR ANSWER HERE - YOU MAY USE MARKDOWN, LATEX, CODE, DIAGRAMS, ETC

### 3 - Linear SVM vs k-NN
- K-Nearest Neighbours is a popular supervised learning algorithm. Explain the difference between supervised and unsupervised learning.
- K-NN is an example of a lazy learning algorithm. Why is it called that? What could be a use case? Justify using a lazy learning algorithm in that case.
- Outline the main steps of the k-NN algorithm. Use text, code, plots, diagrams, etc. as necessary.
- Plot an example dataset on which SVM classification works well but k-NN classification does not. Repeat for the reverse scenario.

In [None]:
## YOUR ANSWER HERE - YOU MAY USE MARKDOWN, LATEX, CODE, DIAGRAMS, ETC

### 5 - Ensemble Methods
- Explain bagging and boosting. Clearly illustrate the difference between these methods. When would you use each one?
- What is a decision tree? What is a random forest? Compare them and list 3 pros and cons of each.

In [None]:
## YOUR ANSWER HERE - YOU MAY USE MARKDOWN, LATEX, CODE, DIAGRAMS, ETC

### 6 - PCA
- Describe how PCA achieves dimensionality reduction. Outline the main steps of the algorithm.
- What is the importance of eigenvectors and eigenvalues in the PCA algorithm above?
- When we compute the covariance matrix in PCA, we have to subtract the mean. Why do we do this?

In [None]:
## YOUR ANSWER HERE - YOU MAY USE MARKDOWN, LATEX, CODE, DIAGRAMS, ETC

### PART II - FEED FORWARD NEURAL NETWORKS WITH KERAS

In this section, we will use the [Keras Deep Learning Library](https://keras.io/) to train a feed forward neural network. Please refer to the class demo and the documentation for instructions on how to use the library.

In [None]:
# Import the following libraries
%matplotlib inline

from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense
from keras.datasets import mnist
from keras.layers import Dropout
import numpy as np
import matplotlib.pyplot as plt

#### 1 - Load Dataset
We will be using the MNIST dataset, which contains images of handwritten digits. Keras already comes with this dataset, so we can load it directly with library functions as follows:

In [None]:
# Load MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

#### 2 - Inspect Data
Start by inspecting the data. Write code to output the following: 
- The shape of the training data
- The shape of the testing data
- The total number of outputs
- A list of the output classes
- Display the first image in the training data
- Display the first image in the testing data

In [None]:
### ======== YOUR CODE HERE ========== ###

# Print shape of training data
print('Training data shape : <PRINT SHAPE HERE>')
 
# Print shape of testing data    
print('Testing data shape : <PRINT SHAPE HERE>')
 
# Print the total number of outputs
print('Total number of outputs : <PRINT NUMBER HERE>')

# Print the list of output classes
print('Output classes : <PRINT CLASSES HERE>')
 
# Display the first image in the training data    
 
# Display the first image in the testing data  

#### 3 - Preprocess Data
As usual, we need to preprocess our data. Write code to do the following:
- Reshape the images from a 28x28 matrix to a flattened array of 784 elements (28x28=784), so each image can be fed into the network as a single feature vector
- Convert the datatype to 'float32' and then normalize the pixels so the values range between 0 and 1 (Hint: The current values range from 0 to 255).
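
The two preprocessing steps above can be sketched on a small stand-in array. This is a minimal illustration, not the required solution: the `images` array below is randomly generated (an assumption made so the snippet runs without downloading MNIST), and the variable names are hypothetical.

```python
import numpy as np

# Stand-in for MNIST images: 3 random 28x28 grayscale images with uint8 pixels
# (hypothetical data, used only to show the shapes and dtypes involved).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into one row of 784 values.
flat = images.reshape(images.shape[0], 28 * 28)

# Cast to float32 before dividing so the result stays float32.
scaled = flat.astype("float32") / 255.0

print(flat.shape, scaled.dtype, scaled.min(), scaled.max())
```

Dividing by 255 maps the original 0 to 255 pixel range onto 0 to 1, which is the range the instructions ask for.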

In [None]:
### ======== YOUR CODE HERE ========== ###

# Flatten the data to a 784 element array for both the train and test sets

# Change the data type to float32
 
# Normalize the data to a scale between 0 and 1

#### 4 - Convert Labels 
Convert the labels from integer to categorical (one-hot encoding). We have to do this conversion because that is the format required by Keras to perform multiclass classification. One-hot encoding converts the integer to an array of all zeros except for a 1 at the index of the integer.

For example, using a one-hot encoding for 10 classes, the integer 5 will be encoded as 0000010000
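
To make the encoding concrete, here is a minimal NumPy sketch of one-hot encoding. It illustrates the idea only; the helper `one_hot` is a hypothetical name and is not part of Keras (the assignment itself asks for `to_categorical`).

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Row i gets a 1 in the column given by labels[i].
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([5, 0, 9], num_classes=10)
print(example[0])
```

The first row corresponds to the integer 5 from the example above: the only non-zero entry sits at index 5, counting from 0.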

In [None]:
### ======== YOUR CODE HERE ========== ###

# Change the labels from integer to categorical data and store this in a new variable 
# (Hint: Use the to_categorical function in keras)
 
# Display the original and converted labels for the first item in the dataset

#### 5 - Create model

Create a sequential model with the following architecture:
- an input dense layer of 512 units using the ReLU activation function, with input dimension of 784
- a dense layer of 512 units with the ReLU activation function
- an output dense layer of 10 units with the softmax activation function
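
As a sanity check for the architecture above, the number of trainable parameters can be worked out by hand. The sketch below assumes each dense layer has one weight per input-unit pair plus one bias per unit; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

# Input dimension followed by the size of each dense layer.
layer_sizes = [784, 512, 512, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)
```

If the model is built as described, the total should agree with the parameter count that `model.summary()` reports.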

In [None]:
### ======== YOUR CODE HERE ========== ###

# Create sequential model

#### 6 - Compile Model

Compile the model with an **rmsprop optimizer, categorical_crossentropy loss, and accuracy metrics**. You can try other optimizers too such as sgd.
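
For intuition about what the loss measures, here is a minimal NumPy sketch of softmax followed by categorical cross-entropy. It follows the textbook definitions and is not Keras's internal implementation; the function names and the `eps` clipping value are assumptions made for this illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability; the result is unchanged.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average the per-sample losses.
    y_pred = np.clip(y_pred, eps, 1.0)
    return -(y_true * np.log(y_pred)).sum(axis=-1).mean()

logits = np.zeros((1, 10))      # a model with no preference among 10 classes
probs = softmax(logits)         # uniform probabilities of 0.1 each
target = np.eye(10)[[5]]        # one-hot label for the digit 5
loss = categorical_crossentropy(target, probs)

print(probs.sum(), loss)
```

A uniform prediction over 10 classes gives a loss of ln(10), roughly 2.30, which is a useful reference point for the loss of an untrained model.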

In [None]:
### ======== YOUR CODE HERE ========== ###

# Compile model

#### 7 - Train model
Fit the model and train for **20 epochs** and a **batch size of 256**.
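
To get a feel for what these settings mean, the number of weight updates can be estimated with a little arithmetic. The sketch assumes the full 60,000-image MNIST training split is used for training (no samples held out), with one update per batch.

```python
import math

n_train = 60000    # size of the standard MNIST training split
batch_size = 256
epochs = 20

# The last batch of each epoch is smaller when the split is not evenly divisible.
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch, total_updates)
```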

In [None]:
### ======== YOUR CODE HERE ========== ###

# Train model

#### 8 - Evaluate model
Report the loss and accuracy on the test data. (Hint: Use the built-in *model.evaluate* function in Keras)

In [None]:
### ======== YOUR CODE HERE ========== ###

# Report loss and accuracy on test data

#### 9 - Plot results
- Plot the loss curves for both training and validation
- Plot the accuracy curves for both training and validation

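Hint: Use the *.history* attribute of the object returned by *model.fit* to access these results. Validation values are only recorded if validation data was passed to *model.fit*.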

In [None]:
### ======== YOUR CODE HERE ========== ###

# Plot the Loss Curves
 
# Plot the Accuracy Curves

#### 10 - Regularization
If done properly, you will notice there is some overfitting in the model. To overcome this, we will use a regularization method called **dropout**, which you will learn about in the CNN lecture. This simply requires adding dropout layers to our model. To do so, repeat steps 5-9 with the following architecture:
- an input dense layer of 512 units using the ReLU activation function, with input dimension of 784
- a dropout layer with 0.5 dropout rate
- a dense layer of 512 units with the ReLU activation function
- a dropout layer with 0.5 dropout rate
- an output dense layer of 10 units with the softmax activation function
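
To illustrate what a dropout layer does during training, here is a minimal NumPy sketch of "inverted" dropout, where surviving activations are scaled up by 1 / (1 - rate) so their expected value is unchanged. The `dropout` function below is a hypothetical helper written for this illustration; in the assignment, use the Keras `Dropout` layer imported earlier.

```python
import numpy as np

def dropout(x, rate, rng):
    """Zero each element with probability `rate` and rescale the survivors."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 512), dtype="float32")
dropped = dropout(activations, rate=0.5, rng=rng)

print(np.unique(dropped), dropped.mean())
```

With a rate of 0.5, about half of the activations become 0 and the rest are doubled, so the mean stays close to the original value of 1. At test time no units are dropped.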

In [None]:
### ======== YOUR CODE HERE ========== ###

# Repeat step 5 with new architecture

# Repeat step 6 with new model

# Repeat step 7 with new model

# Repeat step 8 with new model

# Repeat step 9 with new model

#### BONUS QUESTION - Modified MNIST
The modified MNIST dataset consists of images that contain multiple handwritten digits with background noise. The label for each image is the largest digit it contains. Your challenge is to train a feed forward network to predict these labels. You can follow the same steps as in this assignment to create the feed forward neural network, but you will likely have to experiment with different data preprocessing techniques and network structures. You will also need to load the dataset yourself and put it into a form that Keras accepts.

The dataset can be downloaded from [here](https://techx.blob.core.windows.net/modified-mnist/modified-mnist.zip).