# (E8) Classification of MNIST Hand-written Digits
In this exercise, you will be given an example of [MNIST classification](http://yann.lecun.com/exdb/mnist/). 
You should be able to replicate the results given here if you have completed (E2)-(E5) correctly.

It is easiest to work with a Python IDE (integrated development environment) such as [PyCharm](https://www.jetbrains.com/pycharm/) and with [Anaconda](https://www.anaconda.com) installed. If you have neither, you can work on the assignment in Google Colab. In either case, you need to 1) fill in the blanks in the .py files, and 2) import the completed files (e.g., layer.py, optim.py, model.py) into this notebook. Here are some ways to go about the assignment:

#### Without Google Colab: Python IDE + Anaconda 
If you have a Python IDE and Anaconda installed, you can do one of the following:
- Edit the .py files in the IDE, then open the .ipynb file in the same IDE, where you can edit and run the code.
- If your IDE does not support running .ipynb files, open this notebook with Jupyter Notebook, which comes with Anaconda.

In both cases, you can simply import the .py files into this .ipynb file:
```python
from model import NeuralNetwork
```
 
#### With Google Colab
- Google Colab has an embedded code editor, so you can upload all the .py files to Google Colab and edit them there. Once the files are uploaded, double-click the file you want to edit. Please **make sure that you download up-to-date copies of your files frequently**, because the Colab runtime may restart and discard all of your files.
- If that feels cumbersome, you can instead complete the .py files in an online Python editor (e.g., see [repl.it](https://repl.it/languages/python3)). A plain text editor also works, but without syntax highlighting or error checking you are more prone to mistakes. Once you are done editing, you can either upload the files to Colab or follow the instructions below.
 
- If you have *git clone*d the assignment repository to a directory in your Google Drive (or you have the files stored in the Drive anyway), you can do the following:
```python
from google.colab import drive
drive.mount('/content/drive/')          # this will direct you to a link where you can get an authorization key
import sys
sys.path.append('/content/drive/My Drive/your-directory-where-the-python-files-exist')
```
Then you are good to go. When you change a .py file, make sure it is synced to the Drive, then re-run the lines above to get access to the latest version of the file; if the module was already imported, also reload it or restart the runtime, because Python keeps the imported version in memory. Note that you must pass the correct path to the *sys.path.append* method.
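Reloading can be done without restarting the runtime. Below is a minimal sketch using the standard-library `importlib`; the helper name `reload_module` is hypothetical, and the commented usage assumes that the assignment's model.py is already on `sys.path`.

```python
import importlib


def reload_module(name):
    """Import the module `name` if needed, then reload it from disk."""
    module = importlib.import_module(name)
    return importlib.reload(module)


# Usage with the assignment files (assumes model.py is on sys.path):
# model = reload_module("model")
# NeuralNetwork = model.NeuralNetwork
```

Objects created before the reload keep using the old class definitions, so re-create your network after reloading.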

Now, let's get started!

## Dataset
MNIST is one of the most frequently used datasets in machine learning. Of its 70,000 images (28x28 pixels each), 60,000 are used for training and 10,000 are reserved for testing. The images have a single channel (grayscale), and each pixel is an integer between 0 and 255. The labels are integers that indicate the digit written in the corresponding image. The class labels are often one-hot encoded during preprocessing.

The dataset is typically preprocessed as follows:

In [1]:
from sklearn.datasets import fetch_openml
import numpy as np
np.random.seed(100)                         # fix a random seed for reproducibility

# download the dataset (this will take some time)
# as_frame=False returns NumPy arrays, which the integer-array indexing below relies on
mnist = fetch_openml('mnist_784', as_frame=False, cache=False)
num_train = 60000
image = mnist.data
label = mnist.target.astype('int64')

# normalize pixel values to the [-0.5, 0.5] range
image = image / 255 - 0.5

# train test split
train_image, train_label, test_image, test_label = \
        image[:num_train], label[:num_train], image[num_train:], label[num_train:]

# One-hot encoding
train_label, test_label = np.eye(10)[train_label], np.eye(10)[test_label]
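
To see what the scaling and one-hot steps above do, here is a small sketch on made-up values. The arrays `pixels` and `labels` are illustrative only and are not taken from MNIST.

```python
import numpy as np

# Three made-up pixel intensities covering the full 0..255 range
pixels = np.array([0, 128, 255], dtype=np.float64)
scaled = pixels / 255 - 0.5          # 0 maps to -0.5 and 255 maps to 0.5

# Three made-up digit labels
labels = np.array([3, 0, 9])
one_hot = np.eye(10)[labels]         # row i of the identity matrix encodes class i

print(scaled)
print(one_hot)
```

Indexing the 10x10 identity matrix with the label array selects one row per sample, so each encoded label has a single 1 at the position of its class.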

## Logistic Regression
Let's define a linear neural network model with no hidden layers. Since we are solving a classification problem, we use the softmax output and the cross-entropy loss. Note that this reduces to multinomial (softmax) logistic regression!
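
For reference, here is a minimal NumPy sketch of a softmax output and a cross-entropy loss for one-hot targets. The names `stable_softmax` and `stable_cross_entropy` are hypothetical and are not the assignment's `softmax` or `CrossEntropyLoss`. Subtracting the row maximum and clipping before the logarithm are common safeguards against overflow and `log(0)`; they are assumptions of this sketch, not something taken from the assignment code.

```python
import numpy as np


def stable_softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def stable_cross_entropy(probs, targets, eps=1e-12):
    """Mean cross-entropy over the batch for one-hot `targets`."""
    clipped = np.clip(probs, eps, 1.0)   # keeps log() away from zero
    return -np.sum(targets * np.log(clipped), axis=1).mean()


# Extreme logits that would overflow a naive implementation
logits = np.array([[1000.0, 0.0, -1000.0]])
targets = np.array([[0.0, 0.0, 1.0]])
probs = stable_softmax(logits)
print(probs, stable_cross_entropy(probs, targets))
```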

In [2]:
# import files
from model import NeuralNetwork
from layer import FCLayer
from loss import CrossEntropyLoss
from optim import SGD, Adam, RMSProp
from utils import *

nn = NeuralNetwork()
nn.add(FCLayer(train_image.shape[1], train_label.shape[1], initialization='xavier', uniform=True))  # no hidden layers. direct mapping from input images to target labels

In [3]:
# Set loss and link to the model
loss = CrossEntropyLoss()
nn.set_loss(loss)

In [4]:
# Set hyperparameters
lr = 0.001                                  # learning rate
batch_size = 32                             # mini-batch size
epochs = 5                                  # number of epochs

In [5]:
# set optimizer and link to the model
optimizer = Adam(nn.parameters(), lr=lr)
nn.set_optimizer(optimizer)

In [6]:
inds = list(range(train_image.shape[0]))
N = train_image.shape[0]                               # number of training samples

loss_hist = []
for epoch in range(epochs):
    # randomly shuffle the training data at the beginning of each epoch
    inds = np.random.permutation(inds)
    x_train = train_image[inds]
    y_train = train_label[inds]

    epoch_loss = 0                          # running loss for this epoch (kept separate from the `loss` object above)
    for b in range(0, N, batch_size):
        # get the mini-batch
        x_batch = x_train[b: b + batch_size]
        y_batch = y_train[b: b + batch_size]

        # feed forward
        pred = nn.predict(x_batch)

        # Error
        epoch_loss += nn.loss(pred, y_batch) / N

        # Back propagation of errors
        nn.backward(pred, y_batch)

        # Update parameters
        nn.optimizer.step()

    # record loss per epoch
    loss_hist.append(epoch_loss)

    print()
    print("Epoch %d/%d\terror=%.5f" % (epoch + 1, epochs, epoch_loss), end='\t', flush=True)

    # Test accuracy
    pred = softmax(nn.predict(test_image, mode=False))
    y_pred, y_target = np.argmax(pred, axis=1), np.argmax(test_label, axis=1)
    accuracy = np.mean(y_pred == y_target)
    print("Test accuracy: {:.4f}".format(accuracy), end='')



Epoch 1/5	error=82.90611	Test accuracy: 0.1154
Epoch 2/5	error=225.91645	Test accuracy: 0.1175

  loss = -np.sum([target[i] * np.log(pred[i]) for i in range(len(target))])



Epoch 3/5	error=nan	Test accuracy: 0.1177
Epoch 4/5	error=nan	Test accuracy: 0.1181
Epoch 5/5	error=nan	Test accuracy: 0.1181
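
The shuffling and slicing done at the start of each epoch in the training loop can also be factored into a small generator. This is a sketch on toy arrays; the name `iterate_minibatches` and the use of `np.random.default_rng` are assumptions of this example, not part of the assignment files.

```python
import numpy as np


def iterate_minibatches(x, y, batch_size, rng):
    """Yield shuffled (x_batch, y_batch) pairs that cover the data exactly once."""
    order = rng.permutation(x.shape[0])
    for start in range(0, x.shape[0], batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]


# Toy data: 10 samples with 1 feature each, where the label equals the feature
rng = np.random.default_rng(100)
x = np.arange(10).reshape(10, 1)
y = np.arange(10)
batches = list(iterate_minibatches(x, y, batch_size=4, rng=rng))
print([len(y_batch) for _, y_batch in batches])
```

The last batch is smaller whenever the number of samples is not a multiple of the batch size, which matches the slicing behaviour of the loop above.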

## (E8) Your Turn: Non-linear Neural Network
Surprisingly, this linear model reaches more than 91% test accuracy when (E2)-(E5) are implemented correctly. If your run instead shows `nan` errors and accuracy near 12%, as in the log above, revisit your implementation before moving on. You can still improve the test performance by, for example, introducing nonlinear activation functions, changing the network architecture, adjusting the learning rate, training for more epochs, and/or using a different optimizer. **It's your turn to try different configurations of these!**

Experiment with more than 3 configurations of these to get better test performance, and report your trials by summarizing the configurations and performance in a **table**. (You can reach *at least* 96% accuracy fairly easily.)
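
One possible way to build the table is to collect each trial in a dictionary and render the list as Markdown. The helper `results_table` and the column names below are hypothetical, and the `TODO` entry is a placeholder to be replaced with the accuracy you actually measure.

```python
def results_table(trials):
    """Render a list of dicts (one per trial) as a Markdown table."""
    columns = list(trials[0].keys())
    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for trial in trials:
        lines.append("| " + " | ".join(str(trial[c]) for c in columns) + " |")
    return "\n".join(lines)


# Placeholder row: the settings match the logistic regression run above,
# and the accuracy is left for you to fill in from your own run.
trials = [
    {"hidden layers": "none", "activation": "-", "optimizer": "Adam",
     "lr": 0.001, "epochs": 5, "test accuracy": "TODO"},
]
print(results_table(trials))
```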