# (E8) Classification of MNIST Hand-written Digits
In this exercise, you will be given an example of [MNIST classification](http://yann.lecun.com/exdb/mnist/). 
You should be able to replicate the results given here if you have completed (E2)-(E5) correctly.

It is best to have a Python IDE (integrated development environment) such as [PyCharm](https://www.jetbrains.com/pycharm/) and [Anaconda](https://www.anaconda.com) installed, because they will make your life easier! If not, you may want to work on the assignment using Google Colab. In any case, what you need to do is 1) fill in the blanks in the .py files; and 2) import the files that you have completed (e.g., layer.py, optim.py, model.py, etc.). Here are some scenarios for how you might go about the assignment:

#### Without Google Colab: Python IDE + Anaconda 
If you have a Python IDE and Anaconda installed, you can do one of the following:
- Edit the .py files in the IDE. Then simply open the .ipynb file in the same IDE, where you can edit and run code.
- Your IDE might not support running .ipynb files. However, since you have installed Anaconda, you can just open this notebook using Jupyter Notebook.

In both of these cases, you can simply import .py files in this .ipynb file:
```python
from model import NeuralNetwork
```
 
#### With Google Colab
- Google Colab has an embedded code editor, so you could simply upload all .py files to Google Colab and edit them there. Once you upload the files, double-click the file that you want to edit. Please **make sure that you download up-to-date files frequently**; otherwise, the Colab runtime might restart unexpectedly and all your files might be lost.
- If the approach above feels cumbersome, you could instead use an online Python editor to complete the .py files (e.g., see [repl.it](https://repl.it/languages/python3)). You can also edit the files in a plain text editor, but such editors do not flag Python syntax errors, so you will be more prone to mistakes. Once you are done editing, you can either upload the files to Colab or follow the instructions below.
 
- If you have *git clone*d the assignment repository to a directory in your Google Drive (or you have the files stored in the Drive anyway), you can do the following:
```python
from google.colab import drive
drive.mount('/content/drive/')          # this will direct you to a link where you can get an authorization key
import sys
sys.path.append('/content/drive/My Drive/your-directory-where-the-python-files-exist')
```
Then, you are good to go. Whenever you change a .py file, make sure it is synced to the Drive, then re-run the lines above to get access to the latest version of the file. Note that you must pass the correct path to *sys.path.append*.
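
Within a running kernel, Python keeps already-imported modules in `sys.modules`, so repeating an `import` statement does not by itself re-execute an edited file. A minimal sketch of forcing a refresh with the standard-library `importlib.reload` is shown below; `demo_layer` is a throwaway module created only for this illustration, standing in for files such as layer.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway module in a temporary directory (a stand-in for layer.py, optim.py, ...)
tmp_dir = tempfile.mkdtemp()
module_path = Path(tmp_dir) / "demo_layer.py"
module_path.write_text("VERSION = 1\n")

# Make the directory importable, exactly as sys.path.append is used for the Drive folder
sys.path.append(tmp_dir)
importlib.invalidate_caches()
import demo_layer
print(demo_layer.VERSION)            # value from the first version of the file

# Simulate editing the file, then reload the module object that is already imported
module_path.write_text("VERSION = 'edited'\n")
importlib.reload(demo_layer)
print(demo_layer.VERSION)            # value from the edited file
```

Names imported with `from module import name` keep pointing at the old objects after a reload, so those import lines need to be re-run as well.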

Now, let's get started!

## Dataset
The MNIST dataset is one of the most frequently used datasets in machine learning. Of its 70,000 images (28x28 pixels each), 60,000 are used for training and 10,000 are reserved for testing. The images have a single channel (i.e., they are grayscale), and each pixel is an integer between 0 and 255. The labels are also integers, indicating the digit written in the corresponding image. The class labels are often one-hot encoded during preprocessing.
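
To make the two preprocessing steps concrete, here is a small sketch on a toy array rather than on MNIST itself. The array contents and the names `toy_image`, `toy_label`, and `num_classes` are made up for illustration only.

```python
import numpy as np

# Toy stand-in for MNIST: 3 "images" of 4 pixels each, with values in [0, 255]
toy_image = np.array([[0, 255, 128, 64],
                      [255, 255, 0, 0],
                      [10, 20, 30, 40]], dtype=np.float64)
toy_label = np.array([2, 0, 1])      # integer class labels
num_classes = 3                      # MNIST itself has 10 classes

# Scale pixels from [0, 255] to [-0.5, 0.5]
toy_scaled = toy_image / 255 - 0.5

# One-hot encode: row i of the identity matrix is the encoding of class i
toy_onehot = np.eye(num_classes)[toy_label]

print(toy_scaled.min(), toy_scaled.max())
print(toy_onehot)
```

Taking `np.argmax` along axis 1 of the one-hot matrix recovers the original integer labels, which is how the test accuracy is computed later in this notebook.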

Some simple preprocessing, as shown below, is normally applied to the dataset:

In [13]:
from sklearn.datasets import fetch_openml
import numpy as np
np.random.seed(100)                         # fix a random seed for reproducibility

# download the dataset (this will take some time)
mnist = fetch_openml('mnist_784', cache=False, as_frame=False)  # as_frame=False returns NumPy arrays instead of pandas objects
num_train = 60000
image = mnist.data
label = mnist.target.astype('int64')

# normalize pixel values to the [-0.5, 0.5] range
image = image / 255 - 0.5

# train test split
train_image, train_label, test_image, test_label = \
        image[:num_train], label[:num_train], image[num_train:], label[num_train:]

# One-hot encoding
train_label, test_label = np.eye(10)[train_label], np.eye(10)[test_label]

## Logistic Regression
Let's define a linear neural network model with no hidden layers. Since we are solving a classification problem, we use a softmax output and the cross-entropy loss. Note that this reduces to multinomial logistic regression (also known as softmax regression)!
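
As a rough illustration of what "linear map, softmax, cross-entropy" means numerically, the sketch below evaluates such a model on random toy data. The helpers `softmax_rows` and `mean_cross_entropy` are hypothetical names defined here; they are not the assignment's `softmax` or `CrossEntropyLoss`, whose exact conventions (for example, sum versus mean over the batch) may differ.

```python
import numpy as np

def softmax_rows(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_cross_entropy(probs, onehot, eps=1e-12):
    """Average negative log-probability assigned to the true class."""
    return -np.mean(np.sum(onehot * np.log(probs + eps), axis=1))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))          # 4 samples, 5 features
W = np.zeros((5, 3))                 # 3 classes; zero weights give a known starting point
b = np.zeros(3)
y = np.eye(3)[[0, 2, 1, 0]]          # one-hot targets

probs = softmax_rows(x @ W + b)      # linear map followed by softmax
ce = mean_cross_entropy(probs, y)
print(probs[0], ce)                  # uniform probabilities, loss close to log(3)
```

With all-zero weights every class receives probability 1/3, so the loss equals log(3); training lowers it by moving probability mass toward the true class.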

In [14]:
# import files
from model import NeuralNetwork
from layer import FCLayer
from loss import CrossEntropyLoss
from optim import SGD, Adam, RMSProp
from utils import *

nn = NeuralNetwork()
nn.add(FCLayer(train_image.shape[1], train_label.shape[1], initialization='xavier', uniform=True))  # no hidden layers. direct mapping from input images to target labels
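
For reference, the sketch below shows Xavier (Glorot) uniform initialization as it is commonly defined, with weights drawn from U(-limit, limit) where limit = sqrt(6 / (fan_in + fan_out)). Whether the assignment's `FCLayer` uses exactly this bound is an assumption; `xavier_uniform` is a hypothetical helper, not part of layer.py.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    """Draw a (fan_in, fan_out) weight matrix from U(-limit, limit)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(100)
W = xavier_uniform(784, 10, rng)     # 784 input pixels -> 10 classes
limit = np.sqrt(6.0 / (784 + 10))
print(W.shape, limit, np.abs(W).max())
```

The bound shrinks as layers get wider, which keeps the variance of the layer outputs roughly comparable to that of the inputs at the start of training.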

In [15]:
# Set loss and link to the model
loss = CrossEntropyLoss()
nn.set_loss(loss)

In [16]:
# Set hyperparameters
lr = 0.001                                  # learning rate
batch_size = 32                             # mini-batch size
epochs = 5                                  # number of epochs

In [17]:
# set optimizer and link to the model
optimizer = Adam(nn.parameters(), lr=lr)
nn.set_optimizer(optimizer)
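
As a reminder of what an Adam step does, here is a minimal sketch of the standard update rule applied to a one-dimensional quadratic. It uses the usual default hyperparameters; whether the assignment's `Adam` in optim.py matches it in every detail is an assumption, and `adam_step` is a hypothetical helper.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new parameter and both moment estimates."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(w) = (w - 3)^2 starting from w = 0; the gradient is 2 * (w - 3)
w = np.array([0.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
print(w)                                         # close to the minimizer w = 3
```

Because the step is divided by the root of the squared-gradient average, its size is governed mainly by `lr` rather than by the raw gradient magnitude.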

In [18]:
inds = list(range(train_image.shape[0]))
N = train_image.shape[0]                               # number of training samples

loss_hist = []
for epoch in range(epochs):
    # randomly shuffle the training data at the beginning of each epoch
    inds = np.random.permutation(inds)
    x_train = train_image[inds]
    y_train = train_label[inds]

    loss = 0
    for b in range(0, N, batch_size):
        # get the mini-batch
        x_batch = x_train[b: b + batch_size]
        y_batch = y_train[b: b + batch_size]

        # feed forward
        pred = nn.predict(x_batch)

        # Error
        loss += nn.loss(pred, y_batch) / N

        # Back propagation of errors
        nn.backward(pred, y_batch)

        # Update parameters
        nn.optimizer.step()

    # record loss per epoch
    loss_hist.append(loss)

    print()
    print("Epoch %d/%d\terror=%.5f" % (epoch + 1, epochs, loss), end='\t', flush=True)

    # Test accuracy
    pred = softmax(nn.predict(test_image, mode=False))
    y_pred, y_target = np.argmax(pred, axis=1), np.argmax(test_label, axis=1)
    accuracy = np.mean(y_pred == y_target)
    print("Test accuracy: {:.4f}".format(accuracy), end='')



Epoch 1/5	error=0.01474	Test accuracy: 0.8972
Epoch 2/5	error=0.00996	Test accuracy: 0.9128
Epoch 3/5	error=0.00990	Test accuracy: 0.9146
Epoch 4/5	error=0.00921	Test accuracy: 0.9168
Epoch 5/5	error=0.01014	Test accuracy: 0.9151

## (E8) Your Turn: Non-linear Neural Network
Surprisingly, this simple linear model achieved more than 91% test accuracy. However, you can definitely improve the test performance by, for example, introducing nonlinear activation functions, changing the network architecture, adjusting the learning rate, training for more epochs, and/or using a different optimizer. **It's your turn to try different configurations of these!**

Experiment with more than 3 configurations of these to get better test performance, and report your trials by summarizing the configurations and performance in a **table**. (You can achieve *at least* 96% accuracy pretty easily.)
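
One possible way to keep track of trials is to append a small record after each run and render the records as a Markdown table at the end. The sketch below assumes nothing about the assignment code; `to_markdown_table` is a hypothetical helper and the single record holds made-up placeholder values.

```python
def to_markdown_table(rows, columns):
    """Render a list of dicts as a Markdown table with the given column order."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(str(row[c]) for c in columns) + " |" for row in rows]
    return "\n".join([header, divider] + body)

columns = ["Layers", "Optimizer", "Epochs", "Learning rate", "Batch size", "Test accuracy"]
results = []   # append one dict per trial after its training loop finishes

# Placeholder entry with made-up values, only to show the shape of a record
results.append({"Layers": "FC", "Optimizer": "Adam", "Epochs": 1,
                "Learning rate": 0.001, "Batch size": 32, "Test accuracy": 0.5})

print(to_markdown_table(results, columns))
```

Recording the configuration next to the accuracy at the moment a run finishes avoids transcription mistakes when the table is assembled later.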

### Increasing epochs

In [41]:
from activation import Activation
from utils import *

nn = NeuralNetwork()

nn.add(FCLayer(train_image.shape[1], train_label.shape[1], initialization='xavier', uniform=True))  # no hidden layers. direct mapping from input images to target labels

# Set loss and link to the model
loss = CrossEntropyLoss()
nn.set_loss(loss)

# Set hyperparameters
lr = 0.001                                  # learning rate
batch_size = 32                             # mini-batch size
epochs = 10                                  # number of epochs

# set optimizer and link to the model
optimizer = Adam(nn.parameters(), lr=lr)
nn.set_optimizer(optimizer)

In [42]:
inds = list(range(train_image.shape[0]))
N = train_image.shape[0]                               # number of training samples

loss_hist_1 = []
for epoch in range(epochs):
    # randomly shuffle the training data at the beginning of each epoch
    inds = np.random.permutation(inds)
    x_train = train_image[inds]
    y_train = train_label[inds]

    loss = 0
    for b in range(0, N, batch_size):
        # get the mini-batch
        x_batch = x_train[b: b + batch_size]
        y_batch = y_train[b: b + batch_size]

        # feed forward
        pred = nn.predict(x_batch)

        # Error
        loss += nn.loss(pred, y_batch) / N

        # Back propagation of errors
        nn.backward(pred, y_batch)

        # Update parameters
        nn.optimizer.step()

    # record loss per epoch
    loss_hist_1.append(loss)

    print()
    print("Epoch %d/%d\terror=%.5f" % (epoch + 1, epochs, loss), end='\t', flush=True)

    # Test accuracy
    pred = softmax(nn.predict(test_image, mode=False))
    y_pred, y_target = np.argmax(pred, axis=1), np.argmax(test_label, axis=1)
    accuracy = np.mean(y_pred == y_target)
    print("Test accuracy: {:.4f}".format(accuracy), end='')




Epoch 1/10	error=0.01592	Test accuracy: 0.9083
Epoch 2/10	error=0.01015	Test accuracy: 0.9078
Epoch 3/10	error=0.00947	Test accuracy: 0.9141
Epoch 4/10	error=0.00902	Test accuracy: 0.9203
Epoch 5/10	error=0.00905	Test accuracy: 0.9142
Epoch 6/10	error=0.00839	Test accuracy: 0.9079
Epoch 7/10	error=0.00932	Test accuracy: 0.9177
Epoch 8/10	error=0.00865	Test accuracy: 0.9164
Epoch 9/10	error=0.00872	Test accuracy: 0.9175
Epoch 10/10	error=0.00876	Test accuracy: 0.9204

### Adding ReLU as activation function

In [29]:
from activation import Activation
from utils import *

nn = NeuralNetwork()

nn.add(FCLayer(train_image.shape[1], train_label.shape[1], initialization='xavier', uniform=True))  # no hidden layers. direct mapping from input images to target labels
nn.add(Activation(relu, relu_prime))

# Set loss and link to the model
loss = CrossEntropyLoss()
nn.set_loss(loss)

# Set hyperparameters
lr = 0.001                                  # learning rate
batch_size = 32                             # mini-batch size
epochs = 10                                  # number of epochs

# set optimizer and link to the model
optimizer = Adam(nn.parameters(), lr=lr)
nn.set_optimizer(optimizer)

In [30]:
inds = list(range(train_image.shape[0]))
N = train_image.shape[0]                               # number of training samples

loss_hist_relu = []                                    # separate name so the history in loss_hist_1 is not overwritten
for epoch in range(epochs):
    # randomly shuffle the training data at the beginning of each epoch
    inds = np.random.permutation(inds)
    x_train = train_image[inds]
    y_train = train_label[inds]

    loss = 0
    for b in range(0, N, batch_size):
        # get the mini-batch
        x_batch = x_train[b: b + batch_size]
        y_batch = y_train[b: b + batch_size]

        # feed forward
        pred = nn.predict(x_batch)

        # Error
        loss += nn.loss(pred, y_batch) / N

        # Back propagation of errors
        nn.backward(pred, y_batch)

        # Update parameters
        nn.optimizer.step()

    # record loss per epoch
    loss_hist_relu.append(loss)

    print()
    print("Epoch %d/%d\terror=%.5f" % (epoch + 1, epochs, loss), end='\t', flush=True)

    # Test accuracy
    pred = softmax(nn.predict(test_image, mode=False))
    y_pred, y_target = np.argmax(pred, axis=1), np.argmax(test_label, axis=1)
    accuracy = np.mean(y_pred == y_target)
    print("Test accuracy: {:.4f}".format(accuracy), end='')





Epoch 1/10	error=0.05199	Test accuracy: 0.4254
Epoch 2/10	error=0.04843	Test accuracy: 0.4347
Epoch 3/10	error=0.04989	Test accuracy: 0.4407
Epoch 4/10	error=0.04896	Test accuracy: 0.4397
Epoch 5/10	error=0.04811	Test accuracy: 0.4374
Epoch 6/10	error=0.04900	Test accuracy: 0.4445
Epoch 7/10	error=0.04808	Test accuracy: 0.4475
Epoch 8/10	error=0.04779	Test accuracy: 0.4481
Epoch 9/10	error=0.04826	Test accuracy: 0.4494
Epoch 10/10	error=0.04803	Test accuracy: 0.4477
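
A possible explanation for the much lower accuracy here (an interpretation; the notebook itself does not diagnose it) is that ReLU is applied to the output layer. Any class score that is non-positive before the activation is clamped to zero and receives zero gradient, so the true class cannot be pushed up for those samples. The sketch below illustrates the effect on a single made-up sample; `relu`, `relu_prime`, and `softmax_rows` are defined locally and are not the assignment's utils.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_prime(z):
    return (z > 0).astype(z.dtype)

def softmax_rows(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# One sample, three classes; the true class (index 0) has a negative pre-activation score
z = np.array([[-2.0, 1.0, -0.5]])
y = np.array([[1.0, 0.0, 0.0]])

a = relu(z)                          # both negative scores collapse to 0
p = softmax_rows(a)
grad_a = p - y                       # gradient of softmax cross-entropy w.r.t. a
grad_z = grad_a * relu_prime(z)      # chain rule through ReLU

print(p)
print(grad_z)                        # entries for the negative scores are exactly 0
```

In this toy case the true class and the other clamped class end up with identical probabilities, and no gradient flows back to separate them.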

### Adding Tanh as activation function

In [31]:
from activation import Activation
from utils import *

nn = NeuralNetwork()

nn.add(FCLayer(train_image.shape[1], train_label.shape[1], initialization='xavier', uniform=True))  # no hidden layers. direct mapping from input images to target labels
nn.add(Activation(tanh, tanh_prime))

# Set loss and link to the model
loss = CrossEntropyLoss()
nn.set_loss(loss)

# Set hyperparameters
lr = 0.001                                  # learning rate
batch_size = 32                             # mini-batch size
epochs = 10                                  # number of epochs

# set optimizer and link to the model
optimizer = Adam(nn.parameters(), lr=lr)
nn.set_optimizer(optimizer)

In [32]:
inds = list(range(train_image.shape[0]))
N = train_image.shape[0]                               # number of training samples

loss_hist_3 = []
for epoch in range(epochs):
    # randomly shuffle the training data at the beginning of each epoch
    inds = np.random.permutation(inds)
    x_train = train_image[inds]
    y_train = train_label[inds]

    loss = 0
    for b in range(0, N, batch_size):
        # get the mini-batch
        x_batch = x_train[b: b + batch_size]
        y_batch = y_train[b: b + batch_size]

        # feed forward
        pred = nn.predict(x_batch)

        # Error
        loss += nn.loss(pred, y_batch) / N

        # Back propagation of errors
        nn.backward(pred, y_batch)

        # Update parameters
        nn.optimizer.step()

    # record loss per epoch
    loss_hist_3.append(loss)

    print()
    print("Epoch %d/%d\terror=%.5f" % (epoch + 1, epochs, loss), end='\t', flush=True)

    # Test accuracy
    pred = softmax(nn.predict(test_image, mode=False))
    y_pred, y_target = np.argmax(pred, axis=1), np.argmax(test_label, axis=1)
    accuracy = np.mean(y_pred == y_target)
    print("Test accuracy: {:.4f}".format(accuracy), end='')






Epoch 1/10	error=0.03523	Test accuracy: 0.9009
Epoch 2/10	error=0.03193	Test accuracy: 0.9005
Epoch 3/10	error=0.03166	Test accuracy: 0.9038
Epoch 4/10	error=0.03205	Test accuracy: 0.9088
Epoch 5/10	error=0.03111	Test accuracy: 0.9054
Epoch 6/10	error=0.03119	Test accuracy: 0.9118
Epoch 7/10	error=0.03079	Test accuracy: 0.9144
Epoch 8/10	error=0.03083	Test accuracy: 0.9136
Epoch 9/10	error=0.03065	Test accuracy: 0.9087
Epoch 10/10	error=0.03056	Test accuracy: 0.9088

### Decreasing learning rate

In [35]:
from activation import Activation
from utils import *

nn = NeuralNetwork()

nn.add(FCLayer(train_image.shape[1], train_label.shape[1], initialization='xavier', uniform=True))  # no hidden layers. direct mapping from input images to target labels

# Set loss and link to the model
loss = CrossEntropyLoss()
nn.set_loss(loss)

# Set hyperparameters
lr = 0.0001                                  # learning rate
batch_size = 32                             # mini-batch size
epochs = 20                                  # number of epochs

# set optimizer and link to the model
optimizer = Adam(nn.parameters(), lr=lr)
nn.set_optimizer(optimizer)

In [36]:
inds = list(range(train_image.shape[0]))
N = train_image.shape[0]                               # number of training samples

loss_hist_4 = []
for epoch in range(epochs):
    # randomly shuffle the training data at the beginning of each epoch
    inds = np.random.permutation(inds)
    x_train = train_image[inds]
    y_train = train_label[inds]

    loss = 0
    for b in range(0, N, batch_size):
        # get the mini-batch
        x_batch = x_train[b: b + batch_size]
        y_batch = y_train[b: b + batch_size]

        # feed forward
        pred = nn.predict(x_batch)

        # Error
        loss += nn.loss(pred, y_batch) / N

        # Back propagation of errors
        nn.backward(pred, y_batch)

        # Update parameters
        nn.optimizer.step()

    # record loss per epoch
    loss_hist_4.append(loss)

    print()
    print("Epoch %d/%d\terror=%.5f" % (epoch + 1, epochs, loss), end='\t', flush=True)

    # Test accuracy
    pred = softmax(nn.predict(test_image, mode=False))
    y_pred, y_target = np.argmax(pred, axis=1), np.argmax(test_label, axis=1)
    accuracy = np.mean(y_pred == y_target)
    print("Test accuracy: {:.4f}".format(accuracy), end='')







Epoch 1/20	error=0.03436	Test accuracy: 0.8543
Epoch 2/20	error=0.01695	Test accuracy: 0.8822
Epoch 3/20	error=0.01476	Test accuracy: 0.8935
Epoch 4/20	error=0.01253	Test accuracy: 0.8993
Epoch 5/20	error=0.01170	Test accuracy: 0.9046
Epoch 6/20	error=0.01163	Test accuracy: 0.9047
Epoch 7/20	error=0.01098	Test accuracy: 0.9079
Epoch 8/20	error=0.01074	Test accuracy: 0.9095
Epoch 9/20	error=0.01025	Test accuracy: 0.9111
Epoch 10/20	error=0.01018	Test accuracy: 0.9124
Epoch 11/20	error=0.00965	Test

### Adding Sigmoid as activation function

In [46]:
from activation import Activation
from activation import Dropout

from utils import *

nn = NeuralNetwork()

nn.add(FCLayer(train_image.shape[1], train_label.shape[1], initialization='xavier', uniform=True))  # no hidden layers. direct mapping from input images to target labels
nn.add(Activation(sigmoid, sigmoid_prime))

# Set loss and link to the model
loss = CrossEntropyLoss()
nn.set_loss(loss)

# Set hyperparameters
lr = 0.0001                                  # learning rate
batch_size = 32                             # mini-batch size
epochs = 20                                  # number of epochs

# set optimizer and link to the model
optimizer = Adam(nn.parameters(), lr=lr)
nn.set_optimizer(optimizer)

In [47]:
inds = list(range(train_image.shape[0]))
N = train_image.shape[0]                               # number of training samples

loss_hist_sigmoid = []                                 # separate name so the history in loss_hist_4 is not overwritten
for epoch in range(epochs):
    # randomly shuffle the training data at the beginning of each epoch
    inds = np.random.permutation(inds)
    x_train = train_image[inds]
    y_train = train_label[inds]

    loss = 0
    for b in range(0, N, batch_size):
        # get the mini-batch
        x_batch = x_train[b: b + batch_size]
        y_batch = y_train[b: b + batch_size]

        # feed forward
        pred = nn.predict(x_batch)

        # Error
        loss += nn.loss(pred, y_batch) / N

        # Back propagation of errors
        nn.backward(pred, y_batch)

        # Update parameters
        nn.optimizer.step()

    # record loss per epoch
    loss_hist_sigmoid.append(loss)

    print()
    print("Epoch %d/%d\terror=%.5f" % (epoch + 1, epochs, loss), end='\t', flush=True)

    # Test accuracy
    pred = softmax(nn.predict(test_image, mode=False))
    y_pred, y_target = np.argmax(pred, axis=1), np.argmax(test_label, axis=1)
    accuracy = np.mean(y_pred == y_target)
    print("Test accuracy: {:.4f}".format(accuracy), end='')








Epoch 1/20	error=0.06086	Test accuracy: 0.8227
Epoch 2/20	error=0.05428	Test accuracy: 0.8547
Epoch 3/20	error=0.05229	Test accuracy: 0.8709
Epoch 4/20	error=0.05152	Test accuracy: 0.8787
Epoch 5/20	error=0.05156	Test accuracy: 0.8840
Epoch 6/20	error=0.05093	Test accuracy: 0.8873
Epoch 7/20	error=0.05060	Test accuracy: 0.8889
Epoch 8/20	error=0.05060	Test accuracy: 0.8910
Epoch 9/20	error=0.05020	Test accuracy: 0.8948
Epoch 10/20	error=0.04989	Test accuracy: 0.8939
Epoch 11/20	error=0.05012	Test

### Changing optimizer 

In [50]:
from activation import Activation
from activation import Dropout

from utils import *

nn = NeuralNetwork()

nn.add(FCLayer(train_image.shape[1], train_label.shape[1], initialization='xavier', uniform=True))  # no hidden layers. direct mapping from input images to target labels

# Set loss and link to the model
loss = CrossEntropyLoss()
nn.set_loss(loss)

# Set hyperparameters
lr = 0.0001                                  # learning rate
batch_size = 32                             # mini-batch size
epochs = 10                                  # number of epochs

# set optimizer and link to the model
optimizer = SGD(nn.parameters(), lr=lr, momentum=True)
nn.set_optimizer(optimizer)
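
For intuition about what a momentum term adds to plain SGD, the sketch below applies classical (heavy-ball) momentum to a one-dimensional quadratic. What the `momentum=True` flag does inside the assignment's `SGD` is not shown in this notebook, so the momentum coefficient `beta=0.9` and the helper `sgd_momentum_step` are assumptions made for illustration.

```python
import numpy as np

def sgd_momentum_step(param, grad, velocity, lr=0.01, beta=0.9):
    """Classical momentum: keep a decaying sum of past gradients and step along it."""
    velocity = beta * velocity - lr * grad
    return param + velocity, velocity

# Minimize f(w) = (w - 3)^2 starting from w = 0; the gradient is 2 * (w - 3)
w = np.array([0.0])
velocity = np.zeros_like(w)
for _ in range(500):
    grad = 2 * (w - 3)
    w, velocity = sgd_momentum_step(w, grad, velocity)
print(w)                             # close to the minimizer w = 3
```

Unlike Adam, the step size here scales directly with the gradient magnitude, so the same learning rate can behave quite differently under the two optimizers.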

In [51]:
inds = list(range(train_image.shape[0]))
N = train_image.shape[0]                               # number of training samples

loss_hist_5 = []
for epoch in range(epochs):
    # randomly shuffle the training data at the beginning of each epoch
    inds = np.random.permutation(inds)
    x_train = train_image[inds]
    y_train = train_label[inds]

    loss = 0
    for b in range(0, N, batch_size):
        # get the mini-batch
        x_batch = x_train[b: b + batch_size]
        y_batch = y_train[b: b + batch_size]

        # feed forward
        pred = nn.predict(x_batch)

        # Error
        loss += nn.loss(pred, y_batch) / N

        # Back propagation of errors
        nn.backward(pred, y_batch)

        # Update parameters
        nn.optimizer.step()

    # record loss per epoch
    loss_hist_5.append(loss)

    print()
    print("Epoch %d/%d\terror=%.5f" % (epoch + 1, epochs, loss), end='\t', flush=True)

    # Test accuracy
    pred = softmax(nn.predict(test_image, mode=False))
    y_pred, y_target = np.argmax(pred, axis=1), np.argmax(test_label, axis=1)
    accuracy = np.mean(y_pred == y_target)
    print("Test accuracy: {:.4f}".format(accuracy), end='')









Epoch 1/10	error=0.01653	Test accuracy: 0.8949
Epoch 2/10	error=0.01148	Test accuracy: 0.9111
Epoch 3/10	error=0.01085	Test accuracy: 0.9132
Epoch 4/10	error=0.00992	Test accuracy: 0.9139
Epoch 5/10	error=0.01034	Test accuracy: 0.9192
Epoch 6/10	error=0.00922	Test accuracy: 0.9185
Epoch 7/10	error=0.00925	Test accuracy: 0.9165
Epoch 8/10	error=0.00827	Test accuracy: 0.9199
Epoch 9/10	error=0.00828	Test accuracy: 0.9174
Epoch 10/10	error=0.00876	Test accuracy: 0.9198

### Changing batch size

In [53]:
from activation import Activation
from activation import Dropout

from utils import *

nn = NeuralNetwork()

nn.add(FCLayer(train_image.shape[1], train_label.shape[1], initialization='xavier', uniform=True))  # no hidden layers. direct mapping from input images to target labels

# Set loss and link to the model
loss = CrossEntropyLoss()
nn.set_loss(loss)

# Set hyperparameters
lr = 0.001                                  # learning rate
batch_size = 16                             # mini-batch size
epochs = 10                                  # number of epochs

# set optimizer and link to the model
optimizer = SGD(nn.parameters(), lr=lr, momentum=True)
nn.set_optimizer(optimizer)

In [54]:
inds = list(range(train_image.shape[0]))
N = train_image.shape[0]                               # number of training samples

loss_hist_6 = []
for epoch in range(epochs):
    # randomly shuffle the training data at the beginning of each epoch
    inds = np.random.permutation(inds)
    x_train = train_image[inds]
    y_train = train_label[inds]

    loss = 0
    for b in range(0, N, batch_size):
        # get the mini-batch
        x_batch = x_train[b: b + batch_size]
        y_batch = y_train[b: b + batch_size]

        # feed forward
        pred = nn.predict(x_batch)

        # Error
        loss += nn.loss(pred, y_batch) / N

        # Back propagation of errors
        nn.backward(pred, y_batch)

        # Update parameters
        nn.optimizer.step()

    # record loss per epoch
    loss_hist_6.append(loss)

    print()
    print("Epoch %d/%d\terror=%.5f" % (epoch + 1, epochs, loss), end='\t', flush=True)

    # Test accuracy
    pred = softmax(nn.predict(test_image, mode=False))
    y_pred, y_target = np.argmax(pred, axis=1), np.argmax(test_label, axis=1)
    accuracy = np.mean(y_pred == y_target)
    print("Test accuracy: {:.4f}".format(accuracy), end='')










Epoch 1/10	error=0.02338	Test accuracy: 0.8944
Epoch 2/10	error=0.02076	Test accuracy: 0.9075
Epoch 3/10	error=0.02029	Test accuracy: 0.9061
Epoch 4/10	error=0.01921	Test accuracy: 0.9111
Epoch 5/10	error=0.01853	Test accuracy: 0.9079
Epoch 6/10	error=0.01947	Test accuracy: 0.9127
Epoch 7/10	error=0.01867	Test accuracy: 0.9088
Epoch 8/10	error=0.01774	Test accuracy: 0.9094
Epoch 9/10	error=0.01804	Test accuracy: 0.9095
Epoch 10/10	error=0.02044	Test accuracy: 0.9109

### Changing layer structure

In [55]:
from activation import Activation
from activation import Dropout

from utils import *

nn = NeuralNetwork()
num_hidden = 50                             # number of units in the hidden layer

nn.add(FCLayer(train_image.shape[1], num_hidden, initialization='xavier', uniform=True))  # input images -> hidden layer
nn.add(Activation(tanh, tanh_prime))                                                       # non-linearity on the hidden layer
nn.add(FCLayer(num_hidden, train_label.shape[1], initialization='xavier', uniform=True))  # hidden layer -> class scores


# Set loss and link to the model
loss = CrossEntropyLoss()
nn.set_loss(loss)

# Set hyperparameters
lr = 0.001                                  # learning rate
batch_size = 16                             # mini-batch size
epochs = 10                                  # number of epochs

# set optimizer and link to the model
optimizer = SGD(nn.parameters(), lr=lr, momentum=True)
nn.set_optimizer(optimizer)
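
To get a feel for how much capacity the hidden layer adds, the sketch below counts parameters for the linear model and for a 784-50-10 network, then checks the shapes of one forward pass. It assumes each fully connected layer has a bias vector, which this notebook does not show; all names are local to the example.

```python
import numpy as np

n_in, n_hidden, n_out = 784, 50, 10

# Each fully connected layer has a weight matrix plus one bias per output unit (assumed)
linear_params = n_in * n_out + n_out
hidden_params = (n_in * n_hidden + n_hidden) + (n_hidden * n_out + n_out)
print(linear_params, hidden_params)                   # 7850 39760

# Shape check of the forward pass with randomly initialized parameters
rng = np.random.default_rng(0)
x = rng.uniform(-0.5, 0.5, size=(16, n_in))           # one mini-batch of 16 flattened images
W1, b1 = rng.normal(scale=0.05, size=(n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(scale=0.05, size=(n_hidden, n_out)), np.zeros(n_out)
hidden = np.tanh(x @ W1 + b1)                         # non-linear hidden representation
logits = hidden @ W2 + b2                             # class scores fed to softmax cross-entropy
print(hidden.shape, logits.shape)                     # (16, 50) (16, 10)
```

The hidden-layer model has roughly five times as many parameters as the linear one, and the tanh in between is what allows it to represent non-linear decision boundaries.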

In [None]:
inds = list(range(train_image.shape[0]))
N = train_image.shape[0]                               # number of training samples

loss_hist_hidden = []                                  # separate name so the history in loss_hist_6 is not overwritten
for epoch in range(epochs):
    # randomly shuffle the training data at the beginning of each epoch
    inds = np.random.permutation(inds)
    x_train = train_image[inds]
    y_train = train_label[inds]

    loss = 0
    for b in range(0, N, batch_size):
        # get the mini-batch
        x_batch = x_train[b: b + batch_size]
        y_batch = y_train[b: b + batch_size]

        # feed forward
        pred = nn.predict(x_batch)

        # Error
        loss += nn.loss(pred, y_batch) / N

        # Back propagation of errors
        nn.backward(pred, y_batch)

        # Update parameters
        nn.optimizer.step()

    # record loss per epoch
    loss_hist_hidden.append(loss)

    print()
    print("Epoch %d/%d\terror=%.5f" % (epoch + 1, epochs, loss), end='\t', flush=True)

    # Test accuracy
    pred = softmax(nn.predict(test_image, mode=False))
    y_pred, y_target = np.argmax(pred, axis=1), np.argmax(test_label, axis=1)
    accuracy = np.mean(y_pred == y_target)
    print("Test accuracy: {:.4f}".format(accuracy), end='')











Epoch 1/10	error=0.02047	Test accuracy: 0.9274
Epoch 2/10	error=0.01266	Test accuracy: 0.9399
Epoch 3/10	error=0.01020	Test accuracy: 0.9585
Epoch 4/10	error=0.00912	Test accuracy: 0.9515
Epoch 5/10	error=0.00880	Test accuracy: 0.9601
Epoch 6/10	error=0.00661	Test accuracy: 0.9600
Epoch 7/10	error=0.00710	Test accuracy: 0.9636

| Layers | Optimizer | Epochs | Learning Rate | Batch Size | Test Accuracy |
| --- | --- | --- | --- | --- | --- |
| FCN | Adam | 10 | 0.001 | 32 | 0.9213 |
| FCN, ReLU | Adam | 10 | 0.001 | 32 | 0.4477 |
| FCN, Tanh | Adam | 10 | 0.001 | 32 | 0.9088 |
| FCN | Adam | 10 | 0.0001 | 32 | 0.9129 |
| FCN | Adam | 20 | 0.0001 | 32 | 0.9175 |
| FCN, Sigmoid | Adam | 20 | 0.0001 | 32 | 0.9033 |
| FCN | SGD | 10 | 0.0001 | 32 | 0.9198 |
| FCN, Tanh, FCN | SGD | 10 | 0.001 | 16 | 0.9636 |