# (Optional) Colab Setup
If you aren't using Colab, you can delete the following code cells. They only help students mount Google Drive to access the other .py files and download the data, which is a little trickier on Colab than on a local machine running Jupyter.

In [1]:
# you will be prompted with a window asking to grant permissions
# from google.colab import drive
# drive.mount("/content/drive")

In [2]:
# fill in the path in your Google Drive in the string below. Note: do not escape slashes or spaces
# import os
# datadir = "/content/assignment1"
# if not os.path.exists(datadir):
#   !ln -s "/content/drive/My Drive/YOUR PATH HERE/assignment1/" $datadir
# os.chdir(datadir)
# !pwd

In [3]:
# downloading Fashion-MNIST
# import os
# os.chdir(os.path.join(datadir,"fashion-mnist/"))
# !chmod +x ./get_data.sh
# !./get_data.sh
# os.chdir(datadir)

# Imports

In [4]:
import random
import numpy as np
from data_process import get_FASHION_data, get_RICE_data
from scipy.spatial import distance
from models import Perceptron, SVM, Softmax, Logistic
from kaggle_submission import output_submission_csv
%matplotlib inline
np.random.seed(441)
# For auto-reloading external modules
# See http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

# Loading Fashion-MNIST

In the following cells we determine the number of images for each split and load the images.
<br />
TRAIN_IMAGES + VAL_IMAGES must lie in (0, 60000], and TEST_IMAGES = 10000.

In [5]:
# You can change these numbers for experimentation
# For submission we will use the default values 
TRAIN_IMAGES = 50000
VAL_IMAGES = 10000
normalize = True

In [6]:
data = get_FASHION_data(TRAIN_IMAGES, VAL_IMAGES, normalize=normalize)
X_train_fashion, y_train_fashion = data['X_train'], data['y_train']
X_val_fashion, y_val_fashion = data['X_val'], data['y_val']
X_test_fashion, y_test_fashion = data['X_test'], data['y_test']
n_class_fashion = len(np.unique(y_test_fashion))

# Loading Rice

In [7]:
# loads train / val / test splits of 60%, 20%, 20%
data = get_RICE_data()
X_train_RICE, y_train_RICE = data['X_train'], data['y_train']
X_val_RICE, y_val_RICE = data['X_val'], data['y_val']
X_test_RICE, y_test_RICE = data['X_test'], data['y_test']
n_class_RICE = len(np.unique(y_test_RICE))

print("Number of train samples: ", X_train_RICE.shape[0])
print("Number of val samples: ", X_val_RICE.shape[0])
print("Number of test samples: ", X_test_RICE.shape[0])

Number of train samples:  10911
Number of val samples:  3637
Number of test samples:  3637


### Get Accuracy

This function computes how well your model performs using accuracy as a metric.

In [8]:
def get_acc(pred, y_test):
    return np.sum(y_test == pred) / len(y_test) * 100
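
As a quick sanity check, the function can be run on a tiny hand-made example. The arrays below are hypothetical and only illustrate the expected behaviour: three of four predictions match, so the accuracy is 75%.

```python
import numpy as np

def get_acc(pred, y_test):
    # Percentage of predictions that match the ground-truth labels
    return np.sum(y_test == pred) / len(y_test) * 100

pred_example = np.array([0, 1, 2, 2])  # hypothetical predictions
y_example = np.array([0, 1, 1, 2])     # hypothetical ground-truth labels
acc_example = get_acc(pred_example, y_example)
print(acc_example)  # 75.0
```

Note that the element-wise comparison relies on both arguments being NumPy arrays of the same shape; with plain Python lists, `==` compares the lists as a whole and the result is not a per-sample accuracy.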

# Perceptron

Perceptron has 2 hyperparameters that you can experiment with:
### Learning rate
The learning rate controls how much we change the current weights of the classifier during each update. We set it at a default value of 0.5, but you should experiment with different values. Here is a guide to help you find a suitable learning rate:
- Try values ranging from 5.0 to 0.0005 to see the impact on model accuracy. 
- If the accuracy fluctuates a lot or diverges, the learning rate is too high. Try decreasing it by a factor of 10 (e.g. from 0.5 to 0.05). 
- If the accuracy is changing very slowly, the learning rate may be too low. Try increasing it by a factor of 10.
- You can also try adding a learning rate decay to slowly reduce the learning rate over each training epoch. For example, multiply the learning rate by 0.95 after each epoch.
- Plot training and validation accuracy over epochs for different learning rates. This will help you visualize the impact of the learning rate.
- [Here](https://towardsdatascience.com/https-medium-com-dashingaditya-rakhecha-understanding-learning-rate-dd5da26bb6de) is a detailed guide to learning rate.
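
The decay suggestion above can be sketched in a few lines. This is only an illustration: the helper name `decayed_lr` is hypothetical, the decay factor 0.95 is the example value from the list, and the candidate values simply span the suggested range from 5.0 to 0.0005 in factors of 10.

```python
def decayed_lr(initial_lr, decay, epoch):
    """Learning rate after `epoch` multiplicative decay steps (hypothetical helper)."""
    return initial_lr * decay ** epoch

# Learning rates to sweep over, spaced by factors of 10
candidate_lrs = [5.0, 0.5, 0.05, 0.005, 0.0005]

# Schedule for the default learning rate over the first four epochs
schedule = [decayed_lr(0.5, 0.95, epoch) for epoch in range(4)]
for epoch, lr_value in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr_value:.6f}")
```

Inside a training loop, the same effect is usually obtained by multiplying the current learning rate by the decay factor at the end of each epoch.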

### Number of Epochs
An epoch is a complete iterative pass over all of the data in the dataset. During an epoch we predict a label using the classifier and then update the weights of the classifier according to the perceptron update rule for each sample in the training set. You should try different values for the number of training epochs and report your results.
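
To make the idea of an epoch concrete, here is a minimal sketch of one pass over the data using one common variant of the multi-class perceptron update rule. It is an assumption for illustration, not the required implementation for **models/perceptron.py**: the function name `perceptron_epoch`, the weight layout, the absence of a bias term, and the toy data are all hypothetical.

```python
import numpy as np

def perceptron_epoch(W, X, y, lr):
    """One pass over the data with a multi-class perceptron update.

    W: (n_class, n_features) weight matrix, updated in place.
    Returns the number of samples that were misclassified during the pass.
    """
    mistakes = 0
    for x_i, y_i in zip(X, y):
        pred = int(np.argmax(W @ x_i))   # class with the highest score
        if pred != y_i:
            W[y_i] += lr * x_i           # pull the correct class toward x_i
            W[pred] -= lr * x_i          # push the wrongly predicted class away
            mistakes += 1
    return mistakes

# Tiny linearly separable toy problem (hypothetical data)
X_toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]])
y_toy = np.array([0, 1, 0, 1])
W_toy = np.zeros((2, 2))

for epoch in range(5):
    n_mistakes = perceptron_epoch(W_toy, X_toy, y_toy, lr=0.5)
    print(f"epoch {epoch}: {n_mistakes} mistakes")
```

On this toy problem the number of mistakes drops to zero after the first epoch, which is the behaviour to look for when choosing the number of epochs: further passes stop changing the weights once every training sample is classified correctly.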

You will implement the Perceptron classifier in **models/perceptron.py**.

The following code:
- Creates an instance of the Perceptron classifier class
- Calls the train function of the Perceptron class on the training data
- Uses the predict function to compute the training, validation, and testing accuracy


## Train Perceptron on Fashion-MNIST

In [70]:
lr = 0.05
n_epochs = 50

percept_fashion = Perceptron(n_class_fashion, lr, n_epochs)
percept_fashion.train(X_train_fashion, y_train_fashion)

Epoch 0, error: 0.37646, lr: 0.0485
Epoch 1, error: 0.32118, lr: 0.047045000000000003
Epoch 2, error: 0.31028, lr: 0.045633650000000005
Epoch 3, error: 0.30388, lr: 0.0442646405
Epoch 4, error: 0.29798, lr: 0.042936701285
Epoch 5, error: 0.29342, lr: 0.04164860024644999
Epoch 6, error: 0.2898, lr: 0.040399142239056496
Epoch 7, error: 0.28584, lr: 0.0391871679718848
Epoch 8, error: 0.28362, lr: 0.03801155293272825
Epoch 9, error: 0.28184, lr: 0.0368712063447464
Epoch 10, error: 0.27844, lr: 0.035765070154404006
Epoch 11, error: 0.27762, lr: 0.03469211804977188
Epoch 12, error: 0.2765, lr: 0.033651354508278726
Epoch 13, error: 0.27254, lr: 0.03264181387303036
Epoch 14, error: 0.27084, lr: 0.03166255945683945
Epoch 15, error: 0.27098, lr: 0.030712682673134265
Epoch 16, error: 0.26722, lr: 0.029791302192940235
Epoch 17, error: 0.26642, lr: 0.028897563127152026
Epoch 18, error: 0.26778, lr: 0.028030636233337465
Epoch 19, error: 0.26708, lr: 0.02718971714633734
Epoch 20, error: 0.26462, lr: 

In [73]:
pred_percept = percept_fashion.predict(X_train_fashion)
print('The training accuracy is given by: %f' % (get_acc(pred_percept, y_train_fashion)))

The training accuracy is given by: 80.110000


### Validate Perceptron on Fashion-MNIST

In [74]:
pred_percept = percept_fashion.predict(X_val_fashion)
print('The validation accuracy is given by: %f' % (get_acc(pred_percept, y_val_fashion)))

The validation accuracy is given by: 77.760000


### Test Perceptron on Fashion-MNIST

In [75]:
pred_percept = percept_fashion.predict(X_test_fashion)
print('The testing accuracy is given by: %f' % (get_acc(pred_percept, y_test_fashion)))

The testing accuracy is given by: 76.580000


### Perceptron_Fashion-MNIST Kaggle Submission

Once you are satisfied with your solution and test accuracy, output a file to submit your test set predictions to Kaggle for Assignment 1 Fashion-MNIST. Use the following code to do so:

In [26]:
# output_submission_csv('kaggle/perceptron_submission_fashion.csv', percept_fashion.predict(X_test_fashion))

## Train Perceptron on Rice

In [64]:
lr = 0.0001
n_epochs = 15

percept_RICE = Perceptron(n_class_RICE, lr, n_epochs)
percept_RICE.train(X_train_RICE, y_train_RICE)

Epoch 0, error: 0.2803592704610027, lr: 0.0001
Epoch 1, error: 0.26047108422692694, lr: 0.0001
Epoch 2, error: 0.23719182476399964, lr: 0.0001
Epoch 3, error: 0.18458436440289616, lr: 0.0001
Epoch 4, error: 0.14737420951333516, lr: 0.0001
Epoch 5, error: 0.0959582073137201, lr: 0.0001
Epoch 6, error: 0.08761800018330125, lr: 0.0001
Epoch 7, error: 0.08816790395014205, lr: 0.0001
Epoch 8, error: 0.05902300430757951, lr: 0.0001
Epoch 9, error: 0.057098341123636695, lr: 0.0001
Epoch 10, error: 0.04967464027128586, lr: 0.0001
Epoch 11, error: 0.047200073320502245, lr: 0.0001
Epoch 12, error: 0.03702685363394739, lr: 0.0001
Epoch 13, error: 0.038951516817890205, lr: 0.0001
Epoch 14, error: 0.032535972871414166, lr: 0.0001


In [65]:
pred_percept = percept_RICE.predict(X_train_RICE)
print('The training accuracy is given by: %f' % (get_acc(pred_percept, y_train_RICE)))

The training accuracy is given by: 99.193474


### Validate Perceptron on Rice

In [66]:
pred_percept = percept_RICE.predict(X_val_RICE)
print('The validation accuracy is given by: %f' % (get_acc(pred_percept, y_val_RICE)))

The validation accuracy is given by: 99.257630


### Test Perceptron on Rice

In [67]:
pred_percept = percept_RICE.predict(X_test_RICE)
print('The testing accuracy is given by: %f' % (get_acc(pred_percept, y_test_RICE)))

The testing accuracy is given by: 99.120154


# Support Vector Machines (with SGD)

Next, you will implement a "soft margin" SVM. In this formulation you will maximize the margin between positive and negative training examples and penalize margin violations using a hinge loss.

We will optimize the SVM loss using SGD. This means you must compute the gradient of the loss function with respect to the model weights. You will use this gradient to update the model weights.
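
As a rough illustration of what such a gradient step can look like, the sketch below performs one SGD update on a multi-class hinge loss for a single sample. Everything here is an assumption made for illustration: the function name `svm_sgd_step`, the margin of 1, the L2 penalty written as `(reg_const / 2) * ||W||^2`, and the choice to apply the full regularization gradient on every sample. The assignment's **models/svm.py** may define or scale these differently.

```python
import numpy as np

def svm_sgd_step(W, x, y, lr, reg_const):
    """One SGD step on a multi-class hinge loss for a single sample.

    W: (n_class, n_features) weights; x: (n_features,) sample; y: integer label.
    Returns an updated copy of W.
    """
    scores = W @ x
    margins = scores - scores[y] + 1.0   # hinge margins against the true class
    margins[y] = 0.0                     # the true class never violates itself
    violating = margins > 0

    grad = reg_const * W                 # gradient of (reg_const / 2) * ||W||^2
    grad[violating] += x                 # each violating class is pushed away from x
    grad[y] -= violating.sum() * x       # the true class is pulled toward x
    return W - lr * grad

# Hypothetical toy example: 3 classes, 2 features, no regularization
W_new = svm_sgd_step(np.zeros((3, 2)), np.array([1.0, 2.0]), y=0, lr=0.1, reg_const=0.0)
print(W_new)
```

With all-zero weights every wrong class violates the margin, so the true class row moves toward the sample and the other two rows move away from it.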

SVM optimized with SGD has 3 hyperparameters that you can experiment with:
- **Learning rate** - as defined above for the Perceptron, this parameter scales how much the weights change with each gradient update.
- **Epochs** - as defined above for the Perceptron.
- **Regularization constant** - Hyperparameter to determine the strength of regularization. In this case it is a coefficient on the term which maximizes the margin. You could try different values. The default value is set to 0.05.

You will implement the SVM using SGD in **models/svm.py**.

The following code:
- Creates an instance of the SVM classifier class
- Calls the train function of the SVM class on the training data
- Uses the predict function to compute the training, validation, and testing accuracy

## Train SVM on Fashion-MNIST

In [155]:
lr = 0.004
n_epochs = 50
reg_const = 0.03

svm_fashion = SVM(n_class_fashion, lr, n_epochs, reg_const)
svm_fashion.train(X_train_fashion, y_train_fashion)

epoch 1 / 50, error: 0.88572, lr: 0.004
epoch 2 / 50, error: 0.32577999999999996, lr: 0.00392
epoch 3 / 50, error: 0.28706, lr: 0.0038415999999999997
epoch 4 / 50, error: 0.26912, lr: 0.0037647679999999995
epoch 5 / 50, error: 0.25622, lr: 0.0036894726399999992
epoch 6 / 50, error: 0.24768, lr: 0.003615683187199999
epoch 7 / 50, error: 0.24097999999999997, lr: 0.0035433695234559992
epoch 8 / 50, error: 0.23582000000000003, lr: 0.003472502132986879
epoch 9 / 50, error: 0.23185999999999996, lr: 0.0034030520903271413
epoch 10 / 50, error: 0.22792, lr: 0.0033349910485205984
epoch 11 / 50, error: 0.22528000000000004, lr: 0.003268291227550186
epoch 12 / 50, error: 0.22306000000000004, lr: 0.0032029254029991823
epoch 13 / 50, error: 0.22065999999999997, lr: 0.003138866894939199
epoch 14 / 50, error: 0.21840000000000004, lr: 0.003076089557040415
epoch 15 / 50, error: 0.21667999999999998, lr: 0.0030145677658996064
epoch 16 / 50, error: 0.21475999999999995, lr: 0.0029542764105816143
epoch 17 / 5

In [156]:
pred_svm = svm_fashion.predict(X_train_fashion)
print('The training accuracy is given by: %f' % (get_acc(pred_svm, y_train_fashion)))

The training accuracy is given by: 80.868000


### Validate SVM on Fashion-MNIST

In [157]:
pred_svm = svm_fashion.predict(X_val_fashion)
print('The validation accuracy is given by: %f' % (get_acc(pred_svm, y_val_fashion)))

The validation accuracy is given by: 79.600000


### Test SVM on Fashion-MNIST

In [158]:
pred_svm = svm_fashion.predict(X_test_fashion)
print('The testing accuracy is given by: %f' % (get_acc(pred_svm, y_test_fashion)))

The testing accuracy is given by: 78.350000


### SVM_Fashion-MNIST Kaggle Submission

Once you are satisfied with your solution and test accuracy, output a file to submit your test set predictions to Kaggle for Assignment 1 Fashion-MNIST. Use the following code to do so:

In [154]:
# output_submission_csv('kaggle/svm_submission_fashion.csv', svm_fashion.predict(X_test_fashion))

## Train SVM on Rice

In [161]:
lr = 0.003
n_epochs = 50
reg_const = 0.05

svm_RICE = SVM(n_class_RICE, lr, n_epochs, reg_const)
svm_RICE.train(X_train_RICE, y_train_RICE)

epoch 1 / 50, error: 0.5477958024012464, lr: 0.003
epoch 2 / 50, error: 0.10374851067729818, lr: 0.00294
epoch 3 / 50, error: 0.03226102098799377, lr: 0.0028812
epoch 4 / 50, error: 0.03244432224360738, lr: 0.002823576
epoch 5 / 50, error: 0.03345247914948213, lr: 0.00276710448
epoch 6 / 50, error: 0.03299422601044821, lr: 0.0027117623904
epoch 7 / 50, error: 0.03299422601044821, lr: 0.002657527142592
epoch 8 / 50, error: 0.0328109247548346, lr: 0.00260437659974016
epoch 9 / 50, error: 0.03299422601044821, lr: 0.0025522890677453568
epoch 10 / 50, error: 0.03207771973238016, lr: 0.0025012432863904498
epoch 11 / 50, error: 0.03244432224360738, lr: 0.002451218420662641
epoch 12 / 50, error: 0.03235267161580058, lr: 0.002402194052249388
epoch 13 / 50, error: 0.03207771973238016, lr: 0.0023541501712044
epoch 14 / 50, error: 0.03189441847676655, lr: 0.002307067167780312
epoch 15 / 50, error: 0.03235267161580058, lr: 0.002260925824424706
epoch 16 / 50, error: 0.03262762349922099, lr: 0.002215

In [162]:
pred_svm = svm_RICE.predict(X_train_RICE)
print('The training accuracy is given by: %f' % (get_acc(pred_svm, y_train_RICE)))

The training accuracy is given by: 96.920539


### Validate SVM on Rice

In [78]:
pred_svm = svm_RICE.predict(X_val_RICE)
print('The validation accuracy is given by: %f' % (get_acc(pred_svm, y_val_RICE)))

The validation accuracy is given by: 99.862524


### Test SVM on Rice

In [79]:
pred_svm = svm_RICE.predict(X_test_RICE)
print('The testing accuracy is given by: %f' % (get_acc(pred_svm, y_test_RICE)))

The testing accuracy is given by: 99.862524


# Softmax Classifier (with SGD)

Next, you will train a Softmax classifier. This classifier consists of a linear function of the input data followed by a softmax function which outputs a vector of dimension C (number of classes) for each data point. Each entry of the softmax output vector corresponds to a confidence in one of the C classes, and like a probability distribution, the entries of the output vector sum to 1. We use a cross-entropy loss on this softmax output to train the model.

Check the following link as an additional resource on softmax classification: http://cs231n.github.io/linear-classify/#softmax

Once again we will train the classifier with SGD. This means you need to compute the gradients of the softmax cross-entropy loss function with respect to the weights and update the weights using this gradient. Check the following link to help with implementing the gradient updates: https://deepnotes.io/softmax-crossentropy
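
For orientation, here is a minimal sketch of the softmax function and of the cross-entropy gradient for a single sample, without the regularization term. The function names, the weight layout, and the toy values are hypothetical; subtracting the maximum score before exponentiating is a standard trick to avoid overflow and does not change the output.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    shifted = scores - np.max(scores)    # shifting by a constant leaves the result unchanged
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

def softmax_ce_grad(W, x, y):
    """Gradient of the cross-entropy loss w.r.t. W for one sample.

    W: (n_class, n_features); x: (n_features,); y: integer label.
    """
    probs = softmax(W @ x)
    probs[y] -= 1.0                      # predicted probabilities minus the one-hot label
    return np.outer(probs, x)            # same shape as W

# Hypothetical toy example: 3 classes, 2 features
W_demo = np.zeros((3, 2))
x_demo = np.array([1.0, 2.0])
probs_demo = softmax(W_demo @ x_demo)
grad_demo = softmax_ce_grad(W_demo, x_demo, y=1)
print(probs_demo)
print(grad_demo)
```

With all-zero weights the three classes get equal probability, and the gradient rows sum to zero across classes because the probabilities and the one-hot label both sum to 1.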

The softmax classifier has 3 hyperparameters that you can experiment with:
- **Learning rate** - As above, this controls how much the model weights are updated with respect to their gradient.
- **Number of Epochs** - As described for perceptron.
- **Regularization constant** - Hyperparameter to determine the strength of regularization. In this case, we minimize the L2 norm of the model weights as regularization, so the regularization constant is a coefficient on the L2 norm in the combined cross-entropy and regularization objective.

You will implement a softmax classifier using SGD in **models/softmax.py**.

The following code:
- Creates an instance of the Softmax classifier class
- Calls the train function of the Softmax class on the training data
- Uses the predict function to compute the training, validation, and testing accuracy

## Train Softmax on Fashion-MNIST

In [209]:
lr = 0.001
n_epochs = 24
reg_const = 0.2

softmax_fashion = Softmax(n_class_fashion, lr, n_epochs, reg_const)
softmax_fashion.train(X_train_fashion, y_train_fashion)

epoch: 1 / 24, error: 0.89044
epoch: 2 / 24, error: 0.25732
epoch: 3 / 24, error: 0.23106000000000004
epoch: 4 / 24, error: 0.21814
epoch: 5 / 24, error: 0.20996000000000004
epoch: 6 / 24, error: 0.20308000000000004
epoch: 7 / 24, error: 0.19628
epoch: 8 / 24, error: 0.18984
epoch: 9 / 24, error: 0.18272
epoch: 10 / 24, error: 0.17606
epoch: 11 / 24, error: 0.17098000000000002
epoch: 12 / 24, error: 0.16596
epoch: 13 / 24, error: 0.1612
epoch: 14 / 24, error: 0.15759999999999996
epoch: 15 / 24, error: 0.15361999999999998
epoch: 16 / 24, error: 0.15159999999999996
epoch: 17 / 24, error: 0.14873999999999998
epoch: 18 / 24, error: 0.14708
epoch: 19 / 24, error: 0.14624000000000004
epoch: 20 / 24, error: 0.14605999999999997
epoch: 21 / 24, error: 0.14632
epoch: 22 / 24, error: 0.14576
epoch: 23 / 24, error: 0.14542
epoch: 24 / 24, error: 0.14542


In [210]:
pred_softmax = softmax_fashion.predict(X_train_fashion)
print('The training accuracy is given by: %f' % (get_acc(pred_softmax, y_train_fashion)))

The training accuracy is given by: 85.448000


### Validate Softmax on Fashion-MNIST

In [211]:
pred_softmax = softmax_fashion.predict(X_val_fashion)
print('The validation accuracy is given by: %f' % (get_acc(pred_softmax, y_val_fashion)))

The validation accuracy is given by: 83.710000


### Test Softmax on Fashion-MNIST

In [212]:
pred_softmax = softmax_fashion.predict(X_test_fashion)
print('The testing accuracy is given by: %f' % (get_acc(pred_softmax, y_test_fashion)))

The testing accuracy is given by: 82.930000


### Softmax_Fashion-MNIST Kaggle Submission

Once you are satisfied with your solution and test accuracy, output a file to submit your test set predictions to Kaggle for Assignment 1 Fashion-MNIST. Use the following code to do so:

In [213]:
output_submission_csv('kaggle/softmax_submission_fashion.csv', softmax_fashion.predict(X_test_fashion))

## Train Softmax on Rice

In [168]:
lr = 0.5
n_epochs = 10
reg_const = 0.05

softmax_RICE = Softmax(n_class_RICE, lr, n_epochs, reg_const)
softmax_RICE.train(X_train_RICE, y_train_RICE)

epoch: 1 / 10
epoch: 2 / 10
epoch: 3 / 10
epoch: 4 / 10
epoch: 5 / 10
epoch: 6 / 10
epoch: 7 / 10
epoch: 8 / 10
epoch: 9 / 10
epoch: 10 / 10


In [169]:
pred_softmax = softmax_RICE.predict(X_train_RICE)
print('The training accuracy is given by: %f' % (get_acc(pred_softmax, y_train_RICE)))

The training accuracy is given by: 96.728073


### Validate Softmax on Rice

In [170]:
pred_softmax = softmax_RICE.predict(X_val_RICE)
print('The validation accuracy is given by: %f' % (get_acc(pred_softmax, y_val_RICE)))

The validation accuracy is given by: 96.755568


### Test Softmax on Rice

In [171]:
pred_softmax = softmax_RICE.predict(X_test_RICE)
print('The testing accuracy is given by: %f' % (get_acc(pred_softmax, y_test_RICE)))

The testing accuracy is given by: 97.195491


# Logistic Classifier

The Logistic Classifier has 3 hyperparameters that you can experiment with:
- **Learning rate** - as defined above for the Perceptron, this parameter scales how much the weights change with each gradient update.
- **Number of Epochs** - As described for perceptron.
- **Threshold** - The decision boundary of the classifier.
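
To show how the threshold acts as a decision boundary, the sketch below converts sigmoid outputs into binary predictions. It is an illustration under stated assumptions: the names `sigmoid` and `predict_with_threshold` are hypothetical, labels are assumed to be 0 and 1 (some implementations use -1 and +1 instead), and the weights and inputs are made up.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping real-valued scores into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_with_threshold(w, X, threshold=0.5):
    """Predict 1 where the sigmoid output exceeds the threshold, else 0."""
    probs = sigmoid(X @ w)
    return (probs > threshold).astype(int)

# Hypothetical weights and inputs giving scores of 2, -2 and 0
w_demo = np.array([1.0, -1.0])
X_demo = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])

preds_default = predict_with_threshold(w_demo, X_demo, threshold=0.5)
preds_low = predict_with_threshold(w_demo, X_demo, threshold=0.1)
print(preds_default)  # [1 0 0]
print(preds_low)      # [1 1 1]
```

Lowering the threshold makes the classifier predict the positive class more readily, which is why the same scores yield different labels in the two calls.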


You will implement the Logistic Classifier in **models/logistic.py**.

The following code:
- Creates an instance of the Logistic classifier class
- Calls the train function of the Logistic class on the training data
- Uses the predict function to compute the training, validation, and testing accuracy

### Train Logistic Classifier

In [None]:
learning_rate = 0.5
n_epochs = 10
threshold = 0.5

lr = Logistic(learning_rate, n_epochs, threshold)
lr.train(X_train_RICE, y_train_RICE)

In [None]:
pred_lr = lr.predict(X_train_RICE)
print('The training accuracy is given by: %f' % (get_acc(pred_lr, y_train_RICE)))

### Validate Logistic Classifier

In [None]:
pred_lr = lr.predict(X_val_RICE)
print('The validation accuracy is given by: %f' % (get_acc(pred_lr, y_val_RICE)))

### Test Logistic Classifier

In [None]:
pred_lr = lr.predict(X_test_RICE)
print('The testing accuracy is given by: %f' % (get_acc(pred_lr, y_test_RICE)))