Import libraries

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import zscore
import torch
import torch.nn as nn

Exercise #1:

Linear Classifier

In [3]:
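# Load the iris data; every column except the last is treated as a feature.
fulldataset = pd.read_csv('./iris.csv')

# Standardize each feature column and one-hot encode the 'variety' labels.
np_x = fulldataset.iloc[:, :-1].apply(zscore).to_numpy()
np_y = pd.get_dummies(fulldataset['variety']).to_numpy()

n_classes = 3
n_features = np_x.shape[1]

# The whole dataset is used for training; there is no held-out test set.
x_train = np_x
y_train = np_y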
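
To make the preprocessing concrete, here is a minimal sketch of what `zscore` and `pd.get_dummies` do on a tiny made-up frame. The values and the `toy` names are assumptions for illustration only, not rows from iris.csv.

```python
import pandas as pd
from scipy.stats import zscore

# Hypothetical toy frame standing in for iris.csv (values are made up).
toy = pd.DataFrame({
    "sepal.length": [5.0, 6.0, 7.0],
    "variety": ["Setosa", "Versicolor", "Setosa"],
})

# Standardize the feature column to zero mean and unit (population) variance.
toy_x = toy[["sepal.length"]].apply(zscore).to_numpy()

# One-hot encode the labels: one column per class, in sorted class order.
toy_y = pd.get_dummies(toy["variety"]).to_numpy().astype(int)

print(toy_x.ravel())
print(toy_y)
```

Each row of `toy_y` has a single 1 marking its class, which is the target layout the loss below compares against.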



Now we convert the inputs and the one-hot targets to tensors.

In [4]:
t_x_train = torch.tensor(x_train, requires_grad=False, dtype=torch.float64, device='cpu')
t_y_train = torch.tensor(y_train, requires_grad=False, dtype=torch.float64, device='cpu')

Initialize variables for Gradient Descent:

In [5]:
init_std_dev = 0.01
initialW = init_std_dev * np.random.randn(n_features, n_classes)
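
The draw above is unseeded, so the starting weights differ every time the notebook is run and the iteration counts reported further down can shift a little. If repeatable runs matter, a seeded generator is one option. This is a sketch under my own assumptions: the seed value is arbitrary, and the shapes are hard-coded for four features and three classes.

```python
import numpy as np

n_features, n_classes = 4, 3  # assumed shapes: four measurements, three varieties
init_std_dev = 0.01

# A seeded generator makes the starting weights repeatable across runs.
rng = np.random.default_rng(0)  # the seed 0 is an arbitrary choice
initialW = init_std_dev * rng.standard_normal((n_features, n_classes))

print(initialW.shape)
```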

Creating variables for weights:

In [6]:
W = torch.tensor(initialW,requires_grad=True,device='cpu');
b = torch.zeros((1,n_classes),requires_grad=True,device='cpu');

I've heard Adam is a strong default optimizer, so I'm going to stick with that.

I think the smartest way to compare learning rates is to train with each one until the training accuracy reaches a specified threshold. This way we don't have to vary the number of epochs AND the learning rate in each trial. We only vary the learning rate and count the iterations needed to reach the threshold.

To prevent runaway looping, I've capped the number of iterations the loop may run before it "gives up".
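
The stopping rule described above can be sketched as a small helper. This is only a sketch under my own assumptions: the name `iterations_to_accuracy` and the synthetic two-cluster data are hypothetical, and it uses `torch.sigmoid` in place of the hand-written sigmoid in the cell below.

```python
import torch

def iterations_to_accuracy(x, y_onehot, lr, tol=0.95, iteration_limit=100_000):
    """Train a sigmoid linear classifier with Adam until training accuracy >= tol.

    Returns (iterations_used, final_accuracy). Hypothetical helper for illustration.
    """
    n_features, n_classes = x.shape[1], y_onehot.shape[1]
    W = (0.01 * torch.randn(n_features, n_classes, dtype=x.dtype)).requires_grad_()
    b = torch.zeros(1, n_classes, dtype=x.dtype, requires_grad=True)
    optimizer = torch.optim.Adam([W, b], lr=lr)
    true_class = y_onehot.argmax(dim=1)

    i, accuracy = 0, 0.0
    while accuracy < tol and i < iteration_limit:
        optimizer.zero_grad()                      # clear previous gradients
        activations = torch.sigmoid(x @ W + b)     # model predictions
        risk = torch.mean((y_onehot - activations) ** 2)
        risk.backward()                            # gradients of risk w.r.t. W, b
        optimizer.step()                           # update W, b
        # training accuracy of the predictions computed in this iteration
        correct = (activations.argmax(dim=1) == true_class).sum().item()
        accuracy = correct / x.shape[0]
        i += 1
    return i, accuracy

# Synthetic, well-separated two-class data (made up for illustration).
torch.manual_seed(0)
x = torch.cat([torch.randn(50, 2) + 3.0, torch.randn(50, 2) - 3.0]).double()
y = torch.zeros(100, 2, dtype=torch.float64)
y[:50, 0] = 1.0
y[50:, 1] = 1.0

for lr in [0.5, 0.05]:
    n_iter, acc = iterations_to_accuracy(x, y, lr)
    print(lr, n_iter, acc)
```

Wrapping the loop in a function keeps each trial's weights and optimizer state separate, so only the learning rate differs between trials.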

In [7]:
learning_rate = [0.5, 0.05, 0.005, 0.0005]
for rate in learning_rate:
    W = torch.tensor(initialW,requires_grad=True,device='cpu');
    b = torch.zeros((1,n_classes),requires_grad=True,device='cpu');
    optimizer = torch.optim.Adam([W,b],lr=rate)
    iteration_limit = 100000; #Desired maximum iterations
    tol = 0.95 # Desired Accuracy
    i = 0
    accuracy = 0
    while accuracy < tol and i < iteration_limit:
        # clear previous gradient calculations
        optimizer.zero_grad();
        # calculate model predictions
        linear_predictions = torch.matmul(t_x_train,W)+b
        activations = 1.0 / (1.0 + torch.exp(-linear_predictions));
        #calculate loss
        prediction_error = t_y_train-activations
        risk = torch.mean(torch.pow(prediction_error,2));
        #calculate gradients of risk w.r.t. W,b and propagate them back
        risk.backward();
        # use the gradient to change W, b
        optimizer.step();
        # calculate accuracy (on the training set!)
        true_class = np.argmax(t_y_train.detach().cpu().numpy(),axis=1)
        pred_class = np.argmax(activations.detach().cpu().numpy(),axis=1)
        accuracy = np.count_nonzero(true_class == pred_class)/pred_class.shape[0];
        # absolute value of the mean signed residual (positive and negative residuals can cancel)
        prediction_error = np.abs(np.mean(t_y_train.detach().numpy()-activations.detach().numpy()))
        i = i+1

    print('End of loop results:')
    print('Completed in '+str(i)+' iterations with '+str(round(accuracy*100,4))+' percent accuracy and an error of '+str(round(prediction_error,4))+' on training data.')
    print('--------------------------------')

End of loop results:
Completed in 50 iterations with 95.3333 percent accuracy and an error of 0.0025 on training data.
--------------------------------
End of loop results:
Completed in 143 iterations with 95.3333 percent accuracy and an error of 0.0112 on training data.
--------------------------------
End of loop results:
Completed in 1400 iterations with 95.3333 percent accuracy and an error of 0.0119 on training data.
--------------------------------
End of loop results:
Completed in 8244 iterations with 95.3333 percent accuracy and an error of 0.0123 on training data.
--------------------------------


Aiming for 95% accuracy, the largest learning rate (0.5) reaches the threshold in 50 iterations. It's interesting to note that the largest learning rate also results in the lowest prediction error (0.0025).

This is due to how we've defined the stopping condition of our loop. The smaller the learning rate, the more iterations it takes to reach the same accuracy. However, a smaller learning rate takes finer steps, so in theory it can settle closer to the optimum.

For example, if the accuracy tolerance is set to 99.9%, you will see that the largest learning rate simply cannot attain an accuracy of 99%. However, the smaller learning rate does possess the granularity to attain such performance. As such, it is up to the programmer to decide what level of precision their problem requires.
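
The granularity argument can be illustrated with a toy problem. This is an analogy under my own assumptions, not the iris model and not Adam: plain fixed-step gradient descent on f(w) = |w|, with a hypothetical helper `descend_abs`. Because every step has size `lr`, the iterate can never settle closer to the minimum at 0 than roughly one step length.

```python
def descend_abs(w0, lr, steps):
    """Fixed-step gradient descent on f(w) = |w|; returns the final w."""
    w = w0
    for _ in range(steps):
        grad = (w > 0) - (w < 0)  # sign of w, the (sub)gradient of |w|
        w -= lr * grad
    return w

# Large step, few iterations / small step, few iterations / small step, many iterations.
for lr, steps in [(0.5, 10), (0.0005, 10), (0.0005, 3000)]:
    w = descend_abs(1.2, lr, steps)
    print(f"lr={lr}, steps={steps}: |w| = {abs(w):.4f}")
```

The large step gets near the minimum quickly but then bounces around it, while the small step needs thousands of iterations before it ends up within 0.0005 of the minimum.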

Warning: this cell takes a while, because each learning rate may run for up to 100,000 iterations. It will complete.

In [16]:
learning_rate = [0.5, 0.0005]
for rate in learning_rate:
    W = torch.tensor(initialW,requires_grad=True,device='cpu');
    b = torch.zeros((1,n_classes),requires_grad=True,device='cpu');
    optimizer = torch.optim.Adam([W,b],lr=rate)
    iteration_limit = 100000; #Desired maximum iterations
    tol = 0.9866 # Desired Accuracy
    i = 0
    accuracy = 0
    while accuracy < tol and i < iteration_limit:
        # clear previous gradient calculations
        optimizer.zero_grad();
        # calculate model predictions
        linear_predictions = torch.matmul(t_x_train,W)+b
        activations = 1.0 / (1.0 + torch.exp(-linear_predictions));
        #calculate loss
        prediction_error = t_y_train-activations
        risk = torch.mean(torch.pow(prediction_error,2));
        #calculate gradients of risk w.r.t. W,b and propagate them back
        risk.backward();
        # use the gradient to change W, b
        optimizer.step();
        # calculate accuracy (on the training set!)
        true_class = np.argmax(t_y_train.detach().cpu().numpy(),axis=1)
        pred_class = np.argmax(activations.detach().cpu().numpy(),axis=1)
        accuracy = np.count_nonzero(true_class == pred_class)/pred_class.shape[0];
        # absolute value of the mean signed residual (positive and negative residuals can cancel)
        prediction_error = np.abs(np.mean(t_y_train.detach().numpy()-activations.detach().numpy()))
        i = i+1

    print('End of loop results:')
    print('Completed in '+str(i)+' iterations with '+str(round(accuracy*100,4))+' percent accuracy and an error of '+str(round(prediction_error,4))+' on training data.')
    print('--------------------------------')

End of loop results:
Completed in 2662 iterations with 98.6667 percent accuracy and an error of 0.009 on training data.
--------------------------------
End of loop results:
Completed in 91323 iterations with 98.6667 percent accuracy and an error of 0.0083 on training data.
--------------------------------


As you can see, in this run the smaller learning rate needed ~91,000 iterations, versus 2,662 for the larger one. It does give a slightly lower prediction error (0.0083 versus 0.009), but the computational cost of that small gain in precision does not seem worth it.
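
One caveat when comparing the reported error values: the loops compute the absolute value of the mean signed residual, so positive and negative residuals can cancel each other out. The sketch below contrasts that quantity with the mean absolute error on made-up numbers. The arrays are assumptions for illustration, not model outputs from the runs above.

```python
import numpy as np

# Made-up targets and sigmoid outputs for two samples and two classes.
targets = np.array([[1.0, 0.0], [0.0, 1.0]])
outputs = np.array([[0.6, 0.4], [0.4, 0.6]])

residuals = targets - outputs

abs_of_mean = np.abs(np.mean(residuals))  # the quantity the loops report
mean_of_abs = np.mean(np.abs(residuals))  # mean absolute error

print(abs_of_mean, mean_of_abs)
```

Here every prediction is off by 0.4, yet the signed residuals sum to roughly zero, so the first number can look far better than the model really is.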