# Deep Learning Project 2 

* Due no later than Friday 3/8

## Multi-class classification via neural networks

In this project, you will learn how to implement all the basic components of a neural network, including forward propagation, gradient computation, and back propagation, following the development framework presented by Professor Andrew Ng for [binary classification](https://www.youtube.com/watch?v=eqEc66RFY0I&list=PLkDaE6sCZn6Ec-XTbcX1uRg2_u4xOEky0&index=7). The activities in this project will help you gain valuable intuition regarding several of the fundamental programming techniques powering neural network libraries. You will also develop an appreciation, through hands-on application, for some of the practical considerations involved in training a neural network.

As in Project 1, we will use the MNIST dataset to experiment with binary and multi-class classification problems. 

### Learning outcomes
After completing project 2, you will be able to:
* Implement neural networks that use cross-entropy loss for binary classification 
* Apply ideas originating in binary classification to multi-class classification problems
* Describe and apply activation functions, such as sigmoid and softmax
* Understand the role of parameter and hyperparameter initialization

### Multi-class classification: the MNIST dataset
The MNIST dataset consists of 70,000 gray-scale images (samples) of hand-written digits 0 through 9. The multi-class classification problem consists of classifying each sample accurately as belonging to one of ten classes. This dataset is divided into training (60,000) and test (10,000) datasets. 

### 1. Logistic regression using a neural network implementation framework (50% undergrad, 30% grad)
Prof. Andrew Ng's Coursera videos, assigned in Module 2, explain how logistic regression can be implemented as a single neuron that receives images as input and predicts their classification into one of two classes (cats vs. non-cats in his videos). He explains in detail how the process can be separated into a forward pass, calculation of a loss function, and numerical optimization using gradient descent in the back propagation step. 
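
The three steps described above (forward pass, loss, and a gradient descent update) can be sketched in NumPy as follows. This is a minimal illustration rather than the notebook's reference solution: the function name `propagate_sketch`, the toy data, and the learning rate of 0.1 are assumptions made for this example, and `X` is assumed to have shape (features, samples).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate_sketch(w, b, X, Y):
    """One forward/backward pass for logistic regression (hypothetical helper).

    w: (n_features, 1), b: scalar, X: (n_features, m), Y: (1, m) with 0/1 labels.
    """
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)                                   # forward pass, shape (1, m)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))   # binary cross-entropy
    dZ = A - Y                                                 # gradient of the cost w.r.t. z
    dw = (X @ dZ.T) / m                                        # shape (n_features, 1)
    db = np.sum(dZ) / m                                        # scalar
    return cost, dw, db

# Toy data: 3 features, 5 samples (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
Y = np.array([[1, 0, 1, 0, 1]])
w = np.zeros((3, 1))
b = 0.0

cost, dw, db = propagate_sketch(w, b, X, Y)

# One gradient descent step with an assumed learning rate of 0.1
w = w - 0.1 * dw
b = b - 0.1 * db
```

With zero-initialized weights every prediction is 0.5, so the first cost equals $\ln 2$, which is a convenient sanity check.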

Your job for this part of the project is to implement the "logistic regression with a neural network mindset" approach described by Professor Ng. For this, you will use a Jupyter notebook provided as part of his Coursera course. A zip file (Logistic Regression as a Neural Network.zip) containing this notebook, as well as the other files and folders needed, can be found as part of the Google Classroom assignment.

#### Implementation requirements (50%)
* The Jupyter notebook contains step-by-step implementation instructions. Follow these instructions carefully.
* Your code should use the vectorization techniques learned from Prof. Ng's videos. **Pay attention to the order of dimensions of the data matrix X: they are ordered as (features, samples).**
* You can use the cat/non-cat dataset to debug your implementation, but it's not required.

**Suggestions:** 
* Always keep the size of your matrices and vectors in mind to avoid confusion.
* Include some tests or sanity checks as you've seen in our homework assignment and in Prof. Ng's notebook.
* Avoid loops. Learn to use vectorization.
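
As a small illustration of these suggestions, the sketch below computes the pre-activation $z = w^T X + b$ both with explicit loops and with one vectorized expression, then checks that the shapes and values agree. The toy sizes and variable names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, m = 4, 6
X = rng.normal(size=(n_features, m))   # data matrix ordered as (features, samples)
w = rng.normal(size=(n_features, 1))   # weight column vector
b = 0.5

# Loop version: one multiply-add at a time (slow for large inputs)
z_loop = np.zeros((1, m))
for j in range(m):
    for i in range(n_features):
        z_loop[0, j] += w[i, 0] * X[i, j]
    z_loop[0, j] += b

# Vectorized version: a single matrix product plus a broadcast bias
z_vec = w.T @ X + b

# Sanity checks on shape and values
assert z_vec.shape == (1, m)
assert np.allclose(z_loop, z_vec)
```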

#### Application (20%)
Can we solve the 10-class MNIST classification problem using our binary classification logistic regression code? The answer is yes. Think about how to reframe the problem so that it can be solved via your binary classification code. Then explain how you are going to do this, discussing the pros and cons of your approach.  
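
One possible reframing, offered here as an assumption and not necessarily the approach this assignment expects, is one-vs-rest: train ten binary classifiers, where classifier $k$ separates digit $k$ from all other digits, and predict the class whose classifier reports the highest probability. The helper names below are hypothetical.

```python
import numpy as np

def one_vs_rest_labels(labels, num_classes=10):
    """Row k holds 1.0 where the label equals k and 0.0 elsewhere.

    labels: (m,) integer array. Returns an array of shape (num_classes, m).
    """
    classes = np.arange(num_classes)[:, np.newaxis]
    return (labels[np.newaxis, :] == classes).astype(float)

def combine_predictions(probabilities):
    """probabilities: (num_classes, m), row k produced by binary classifier k.

    Returns the index of the most confident classifier for each sample.
    """
    return np.argmax(probabilities, axis=0)

labels = np.array([3, 0, 9, 3])
Y_ovr = one_vs_rest_labels(labels)          # Y_ovr[k] is the 0/1 target for classifier k

# Made-up probabilities from three classifiers on two samples
probs = np.array([[0.1, 0.8],
                  [0.7, 0.3],
                  [0.2, 0.1]])
pred = combine_predictions(probs)
```

Each row of `Y_ovr` can be passed as the label vector to the binary logistic regression code, at the cost of training ten separate models.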


#### Results and Analysis (30%)
Solve the classification problem and obtain performance results. Be sure to specify (and experiment with) the value of your hyperparameter, the learning rate $\alpha$. These results should include plots of the cost function value, the training accuracy, and the test accuracy at each iteration (epoch) of your code. Finally, analyze the results you obtained. Here are a few things you might want to consider:

* How do the learning cost, the train accuracy, and the test accuracy curves change as a function of the learning rate, assuming a fixed number of iterations (say, 2000 iterations)?
* For what range of $\alpha$ does convergence "fail"? 
* For approximately what values of $\alpha$ do you obtain the best performance?
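
A learning-rate experiment of this kind could be organized as in the sketch below, which records the cost at every iteration for several values of $\alpha$. The function name `train_costs`, the synthetic two-feature data, the 200 iterations, and the chosen learning rates are all assumptions for illustration; the real experiment would use the MNIST arrays and your own training code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_costs(X, Y, alpha, num_iterations=200):
    """Run batch gradient descent and return the cost at each iteration."""
    n, m = X.shape
    w = np.zeros((n, 1))
    b = 0.0
    eps = 1e-12                                   # guards against log(0) in the cost only
    costs = []
    for _ in range(num_iterations):
        A = sigmoid(w.T @ X + b)
        A_safe = np.clip(A, eps, 1 - eps)
        costs.append(float(-np.mean(Y * np.log(A_safe) + (1 - Y) * np.log(1 - A_safe))))
        dZ = A - Y
        w = w - alpha * (X @ dZ.T) / m
        b = b - alpha * np.sum(dZ) / m
    return costs

# Synthetic, linearly separable toy problem (not MNIST)
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 100))
Y = (X[0:1, :] + X[1:2, :] > 0).astype(float)

# One cost curve per candidate learning rate
cost_curves = {alpha: train_costs(X, Y, alpha) for alpha in (0.001, 0.01, 0.1)}
```

Plotting each list in `cost_curves` against the iteration index (for example with `plt.plot`) gives one cost curve per learning rate; accuracy curves can be collected in the same loop.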

**Bonus (5%)**: The weight vector has the same size as the input images, so you can actually reshape it to look like one. What's happening to the weight vector "image" as the code converges? What is the algorithm *learning*?
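
A minimal sketch of the reshaping step is shown below. The random vector is only a stand-in for a trained weight vector, and the sketch assumes the images were flattened row by row, as `reshape(m, -1)` does by default.

```python
import numpy as np

height, width = 28, 28

# Stand-in for a trained weight vector of shape (784, 1); values are random here
rng = np.random.default_rng(0)
w = rng.normal(size=(height * width, 1))

# Undo the row-major flattening so each weight sits at its pixel position
w_image = w.reshape(height, width)

# In the notebook, the result can be displayed with, for example:
# plt.imshow(w_image, cmap="gray"); plt.colorbar()
```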


### 2. Extending the framework for multi-class classification (50% undergrad, 70% grad)

In this part, we want to solve the 10-class MNIST classification problem via a proper neural network with 10 outputs (i.e. a probability for each class). Here are a couple of ways to extend the idea behind binary classification for the purpose of multi-class classification.  
 * Option 1. Implement a neural network consisting of one layer containing 10 nodes. Each node will have its own weight vector and bias values. Then, replace the sigmoid activation function with a softmax activation function. The code must put together all 10 weight vectors $\mathbf{w}_i$ into a matrix $\mathbf{W}$ and all 10 bias values into a vector $\mathbf{b}$ and apply vectorization. Prof. Ng goes over how to do this in his C1W3 lectures. 
 * Option 2. Implement a two-layer neural network. The input image goes into all the nodes in the first layer (which contains, say, 100 nodes). The output produced by this layer then goes as input to a second layer consisting of 10 nodes and a softmax activation function. This is a much more powerful neural network because of the ability of the first layer (the hidden layer) to learn intermediate information about the problem. Prof. Ng's lectures also cover in detail how to obtain the partial derivatives and vectorize the code.
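
For orientation, the sketch below shows one forward/backward pass for the single-layer softmax network of Option 1, using a weight matrix of shape (10, 784) and a bias vector of shape (10, 1). The helper names, the random stand-in inputs, and the small constant added inside the logarithm are assumptions for this example, not part of the assignment's notebook.

```python
import numpy as np

def softmax(Z):
    """Column-wise softmax for Z of shape (classes, m)."""
    Z_shifted = Z - np.max(Z, axis=0, keepdims=True)   # subtract max for numerical stability
    expZ = np.exp(Z_shifted)
    return expZ / np.sum(expZ, axis=0, keepdims=True)

def one_hot(labels, num_classes=10):
    """Convert integer labels of shape (m,) to a (num_classes, m) 0/1 matrix."""
    Y = np.zeros((num_classes, labels.size))
    Y[labels, np.arange(labels.size)] = 1.0
    return Y

def softmax_layer_step(W, b, X, Y):
    """Forward pass, cross-entropy cost, and gradients for one softmax layer."""
    m = X.shape[1]
    A = softmax(W @ X + b)                         # (10, m) class probabilities
    cost = -np.sum(Y * np.log(A + 1e-12)) / m      # multi-class cross-entropy
    dZ = A - Y                                     # gradient w.r.t. the pre-activations
    dW = (dZ @ X.T) / m                            # (10, n_features)
    db = np.sum(dZ, axis=1, keepdims=True) / m     # (10, 1)
    return cost, dW, db

# Random stand-in for 5 flattened, normalized images (not real MNIST data)
rng = np.random.default_rng(0)
X = rng.random(size=(784, 5))
labels = np.array([0, 3, 3, 7, 9])

W = np.zeros((10, 784))
b = np.zeros((10, 1))
cost, dW, db = softmax_layer_step(W, b, X, one_hot(labels))
```

With zero-initialized parameters every class receives probability 0.1, so the initial cost is close to $\ln 10 \approx 2.303$. Option 2 inserts a hidden layer before this softmax layer and backpropagates `dZ` through it.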
 

**Undergraduate students** You can choose to implement option 1 or 2. (70%)

**Grad students** You must implement option 2. (70%)

As in part 1, solve the classification problem with your code, then obtain and analyze your results. (30%)


### What to turn in 

You will turn in this assignment via Google Classroom. Let me know if you have any issues so that I can fix those accordingly.

What to submit:
* Two Jupyter notebooks containing your code for Projects 1 and 2. Be sure to include the output generated by each cell.
* A report of at most 4 pages, in **PDF** format, with sections describing the methodology, the results, and the analysis.
    

### PART 1


In [2]:
#packages 
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from tensorflow.keras.datasets import mnist


In [3]:
#load data
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# The number of samples in the training and test sets
number_train = train_images.shape[0]
number_test = test_images.shape[0]
# The dimension of each image
height = train_images.shape[1]
width = train_images.shape[2]
unique_values, counts = np.unique(train_labels, return_counts=True)
_, counts_test = np.unique(test_labels, return_counts=True)
total_samples = counts + counts_test
print (f"Number of training samples = {number_train}")
print (f"Number of testing samples = {number_test}")
print (f"Original sample dimensions: {width} x {height}")
print(f"Labels (classes): {unique_values}")
print (f"Number of samples per class (training): {counts}")
print (f"Number of samples per class (testing): {counts_test}")
print(f"Total number of samples per class (training + testing): {total_samples}")


Number of training samples = 60000
Number of testing samples = 10000
Original sample dimensions: 28 x 28
Labels (classes): [0 1 2 3 4 5 6 7 8 9]
Number of samples per class (training): [5923 6742 5958 6131 5842 5421 5918 6265 5851 5949]
Number of samples per class (testing): [ 980 1135 1032 1010  982  892  958 1028  974 1009]
Total number of samples per class (training + testing): [6903 7877 6990 7141 6824 6313 6876 7293 6825 6958]


In [5]:
#Flattening the data

### START CODE HERE ### (≈ 2 lines of code)
train_set_x_flatten = train_images.reshape(train_images.shape[0], -1).T 
test_set_x_flatten = test_images.reshape(test_images.shape[0], -1).T 
### END CODE HERE ###

print ("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print ("train_set_y shape: " + str(train_labels.shape))
print ("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print ("test_set_y shape: " + str(test_labels.shape))


train_set_x_flatten shape: (784, 60000)
train_set_y shape: (60000,)
test_set_x_flatten shape: (784, 10000)
test_set_y shape: (10000,)


In [6]:
#Normalize data
train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.

<h5> Setting up Helper functions </h5>

In [8]:
# Function definition for sigmoid(x)
def sigmoid(x):
    ### START CODE HERE ###
    y = 1 / (1 + np.exp(-x))
    ### END CODE HERE ###
    return y

In [9]:
#check sigmoid function
print ("sigmoid([0, 2]) = " + str(sigmoid(np.array([0,2]))))

sigmoid([0, 2]) = [0.5        0.88079708]
