<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Neural-Network-Classifier" data-toc-modified-id="Neural-Network-Classifier-1">Neural Network Classifier</a></span></li><li><span><a href="#Apply-NeuralNetworkClassifier-to-Handwritten-Digits" data-toc-modified-id="Apply-NeuralNetworkClassifier-to-Handwritten-Digits-2">Apply <code>NeuralNetworkClassifier</code> to Handwritten Digits</a></span></li><li><span><a href="#Experiments" data-toc-modified-id="Experiments-3">Experiments</a></span></li><li><span><a href="#Grading" data-toc-modified-id="Grading-4">Grading</a></span></li><li><span><a href="#Extra-Credit" data-toc-modified-id="Extra-Credit-5">Extra Credit</a></span></li></ul></div>

# Neural Network Classifier

You may start with your `NeuralNetwork` class from A2, or start with the [implementation defined here](https://www.cs.colostate.edu/~anderson/cs545/notebooks/A2solution.tar) in which all functions meant be called by other functions in this class start with an underscore character. Implement the subclass `NeuralNetworkClassifier` that extends `NeuralNetwork` as discussed in class.  Your `NeuralNetworkClassifier` implementation should rely on inheriting functions from `NeuralNetwork` as much as possible. 

Your `neuralnetworks.py` file (notice it is plural) will now contain two classes, `NeuralNetwork` and `NeuralNetworkClassifier`.

In `NeuralNetworkClassifier` replace the `error_f` function with one called `_neg_log_likelihood_f` and pass it instead of `error_f` into the optimization functions.

In [1]:
%%writefile neuralnetworks.py

import numpy as np
import optimizers as opt
import sys  # for sys.float_info.epsilon

######################################################################
## class NeuralNetwork()
######################################################################

class NeuralNetwork():

    def __init__(self, n_inputs, n_hidden_units_by_layers, n_outputs):
        '''
        n_inputs: int
        n_hidden_units_by_layers: list of ints, or empty
        n_outputs: int
        '''

        self.n_inputs = n_inputs
        self.n_hidden_units_by_layers = n_hidden_units_by_layers
        self.n_outputs = n_outputs

        # Build list of shapes for weight matrices in each layera
        shapes = []
        n_in = n_inputs
        for nu in self.n_hidden_units_by_layers + [n_outputs]:
            shapes.append((n_in + 1, nu))
            n_in = nu

        self.all_weights, self.Ws = self._make_weights_and_views(shapes)
        self.all_gradients, self.Grads = self._make_weights_and_views(shapes)

        self.total_epochs = 0
        self.error_trace = []
        self.X_means = None
        self.X_stds = None
        self.T_means = None
        self.T_stds = None

    def _make_weights_and_views(self, shapes):
        '''
        shapes: list of pairs of ints for number of rows and columns
                in each layer
        Returns vector of all weights, and views into this vector
                for each layer
        '''
        all_weights = np.hstack([np.random.uniform(size=shape).flat
                                 / np.sqrt(shape[0])
                                 for shape in shapes])
        # Build list of views by reshaping corresponding elements
        # from vector of all weights into correct shape for each layer.
        views = []
        first_element = 0
        for shape in shapes:
            n_elements = shape[0] * shape[1]
            last_element = first_element + n_elements
            views.append(all_weights[first_element:last_element]
                         .reshape(shape))
            first_element = last_element

        return all_weights, views

    def __repr__(self):
        return f'NeuralNetwork({self.n_inputs}, ' + \
            f'{self.n_hidden_units_by_layers}, {self.n_outputs})'

    def __str__(self):
        s = self.__repr__()
        if self.total_epochs > 0:
            s += f'\n Trained for {self.total_epochs} epochs.'
            s += f'\n Final standardized training error {self.error_trace[-1]:.4g}.'
        return s
 
    def train(self, X, T, n_epochs, method='sgd', learning_rate=None, verbose=True):
        '''
        X: n_samples x n_inputs matrix of input samples, one per row
        T: n_samples x n_outputs matrix of target output values,
            one sample per row
        n_epochs: number of passes to take through all samples
            updating weights each pass
        method: 'sgd', 'adam', or 'scg'
        learning_rate: factor controlling the step size of each update
        '''

        # Setup standardization parameters
        # Setup standardization parameters
        if self.X_means is None:
            self.X_means = X.mean(axis=0)
            self.X_stds = X.std(axis=0)
            self.X_stds[self.X_stds == 0] = 1
            self.T_means = T.mean(axis=0)
            self.T_stds = T.std(axis=0)

        # Standardize X and T
        X = (X - self.X_means) / self.X_stds
        T = (T - self.T_means) / self.T_stds

        # Instantiate Optimizers object by giving it vector of all weights
        optimizer = opt.Optimizers(self.all_weights)

        _error_convert_f = lambda err: (np.sqrt(err) * self.T_stds)[0]

        if method == 'sgd':

            error_trace = optimizer.sgd(self._error_f, self._gradient_f,
                                        fargs=[X, T], n_epochs=n_epochs,
                                        learning_rate=learning_rate,
                                        error_convert_f=_error_convert_f,
                                        verbose=verbose)

        elif method == 'adam':

            error_trace = optimizer.adam(self._error_f, self._gradient_f,
                                         fargs=[X, T], n_epochs=n_epochs,
                                         learning_rate=learning_rate,
                                         error_convert_f=_error_convert_f,
                                         verbose=verbose)

        elif method == 'scg':

            error_trace = optimizer.scg(self._error_f, self._gradient_f,
                                        fargs=[X, T], n_epochs=n_epochs,
                                        error_convert_f=_error_convert_f,
                                        verbose=verbose)

        else:
            raise Exception("method must be 'sgd', 'adam', or 'scg'")

        self.total_epochs += len(error_trace)
        self.error_trace += error_trace

        # Return neural network object to allow applying other methods
        # after training, such as:    Y = nnet.train(X, T, 100, 0.01).use(X)

        return self

    def _forward(self, X):
        '''
        X assumed to be standardized and with first column of 1's
        '''
        self.Ys = [X]
        for W in self.Ws[:-1]:  # forward through all but last layer
            self.Ys.append(np.tanh(self.Ys[-1] @ W[1:, :] + W[0:1, :]))
        last_W = self.Ws[-1]
        self.Ys.append(self.Ys[-1] @ last_W[1:, :] + last_W[0:1, :])
        return self.Ys

    # Function to be minimized by optimizer method, mean squared error
    def _error_f(self, X, T):
        Ys = self._forward(X)
        mean_sq_error = np.mean((T - Ys[-1]) ** 2)
        return mean_sq_error

    # Gradient of function to be minimized for use by optimizer method
    def _gradient_f(self, X, T):
        # Assumes forward_pass just called with layer outputs saved in self.Ys.
        n_samples = X.shape[0]
        n_outputs = T.shape[1]

        # D is delta matrix to be back propagated
        D = -(T - self.Ys[-1]) / (n_samples * n_outputs)
        self._backpropagate(D)

        return self.all_gradients

    def _backpropagate(self, D):
        # Step backwards through the layers to back-propagate the error (D)
        n_layers = len(self.n_hidden_units_by_layers) + 1
        for layeri in range(n_layers - 1, -1, -1):
            # gradient of all but bias weights
            self.Grads[layeri][1:, :] = self.Ys[layeri].T @ D
            # gradient of just the bias weights
            self.Grads[layeri][0:1, :] = np.sum(D, axis=0)
            # Back-propagate this layer's delta to previous layer
            if layeri > 0:
                D = D @ self.Ws[layeri][1:, :].T * (1 - self.Ys[layeri] ** 2)

    def use(self, X):
        '''X assumed to not be standardized'''
        # Standardize X
        X = (X - self.X_means) / self.X_stds
        Ys = self._forward(X)
        # Unstandardize output Y before returning it
        return Ys[-1] * self.T_stds + self.T_means

    def get_error_trace(self):
        return self.error_trace


'''sub class neural Network classifier that extends NeuralNetwork '''    
class NeuralNetworkClassifier(NeuralNetwork):
    
    '''constructor which calls the super class constructor and intializes all the variables'''
    def __init__(self, n_inputs, n_hidden_units_by_layers, n_outputs):
        super().__init__(n_inputs, n_hidden_units_by_layers, n_outputs)
    
    def __str__(self):
        s = self.__repr__()
        if self.total_epochs > 0:
            s += f'\n Trained for {self.total_epochs} epochs.'
            s += f'\n Final data likelihood {self.error_trace[-1]:.4g}.'
        return s
    
    def __repr__(self):
        return f'NeuralNetworkClassifier({self.n_inputs}, ' + \
            f'{self.n_hidden_units_by_layers}, {self.n_outputs})'
    
    
    def _softmax(self, Y):
        '''Apply to final layer weighted sum outputs'''
        # Trick to avoid overflow
        maxY = Y.max()
        expY = np.exp(Y - maxY)
        denom = expY.sum(1).reshape((-1, 1))
        Y = expY / (denom + sys.float_info.epsilon)
        return Y
    
    def makeIndicatorVars(self,T):
        if T.ndim == 1:
            T = T.reshape((-1, 1))
        return (T == np.unique(T)).astype(int)
    
    
    def _neg_log_likelihood(self,X,T_Indicator):
        Ys=self._forward(X)
        Y_softMax = self._softmax(Ys[-1])
        return - np.mean(T_Indicator * np.log(Y_softMax + sys.float_info.epsilon))
    
    def _gradient_f(self,X,T_indicator):
        n_samples = X.shape[0]
        n_outputs = T_indicator.shape[1]
        G_I = self._softmax(self.Ys[-1])
        D = -(T_indicator - G_I) / (n_samples * n_outputs)
        self._backpropagate(D)
        return self.all_gradients
    
    
    def train(self, X, T, n_epochs, method='sgd', learning_rate=None, verbose=True):
        
        # calculate stadardization parameters for X
        if self.X_means is None:
            self.X_means = X.mean(axis=0)
            self.X_stds = X.std(axis=0)
            self.X_stds[self.X_stds == 0] = 1
        
        #for storing the reference of unique values in T and use them at last in use funtion to return back.
        self.T_unique = np.unique(T)

        # Standardize X
        X = (X - self.X_means) / self.X_stds
        
        #create indicator variable matrix for T and no need of standardizing T
        T_I = self.makeIndicatorVars(T)
        
        # Instantiate Optimizers object by giving it vector of all weights
        optimizer = opt.Optimizers(self.all_weights)

        #_error_convert_f = lambda err: (np.sqrt(err) * self.T_stds)[0]
        _error_convert_f = lambda nll: np.exp(-nll)          
            
        if method == 'adam':
            error_trace = optimizer.adam(self._neg_log_likelihood, self._gradient_f,fargs=[X, T_I], n_epochs=n_epochs,learning_rate=learning_rate,error_convert_f=_error_convert_f,verbose=verbose)

        elif method == 'scg':
            error_trace = optimizer.scg(self._neg_log_likelihood, self._gradient_f,fargs=[X, T_I], n_epochs=n_epochs,error_convert_f=_error_convert_f,verbose=verbose)

        elif method == 'sgd':
            error_trace = optimizer.sgd(self._neg_log_likelihood, self._gradient_f,fargs=[X, T_I], n_epochs=n_epochs,learning_rate=learning_rate,error_convert_f=_error_convert_f,verbose=verbose)

        else:
            raise Exception("method must be 'sgd', 'adam', or 'scg'")

        self.total_epochs += len(error_trace)
        self.error_trace += error_trace

        return self
    
    def use(self, X):
        X = (X - self.X_means) / self.X_stds
        Ys = self._forward(X)
        G_I = self._softmax(Ys[-1])
        out = np.argmax(G_I,axis=1).reshape(-1,1)
        return self.T_unique[out],G_I
    

Overwriting neuralnetworks.py


Here are some example tests.

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import numpy as np
import neuralnetworks as nn
import matplotlib.pyplot as plt
%matplotlib notebook

In [4]:
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
T = np.array([[0], [1], [1], [0]])
X, T

(array([[0, 0],
        [1, 0],
        [0, 1],
        [1, 1]]),
 array([[0],
        [1],
        [1],
        [0]]))

In [5]:
np.random.seed(111)
nnet = nn.NeuralNetworkClassifier(2, [10], 2)

In [6]:
print(nnet)

NeuralNetworkClassifier(2, [10], 2)


In [7]:
nnet.Ws

[array([[0.35343662, 0.09761247, 0.25175879, 0.4441339 , 0.17050614,
         0.08611927, 0.01297787, 0.24261672, 0.1378032 , 0.19494589],
        [0.57198811, 0.13725143, 0.0468766 , 0.38659388, 0.35867477,
         0.15834035, 0.26917306, 0.06833965, 0.04269942, 0.52006221],
        [0.4583945 , 0.48530311, 0.47066025, 0.57212805, 0.3332892 ,
         0.46982855, 0.24324799, 0.01584709, 0.26219591, 0.06081004]]),
 array([[0.24640112, 0.21037283],
        [0.17043996, 0.08268264],
        [0.30105116, 0.04162125],
        [0.18555481, 0.14624395],
        [0.12212025, 0.21945476],
        [0.09733206, 0.12076903],
        [0.09617199, 0.28559813],
        [0.27700099, 0.24538331],
        [0.01027463, 0.28443762],
        [0.28656819, 0.24319635],
        [0.14511079, 0.29148887]])]

The `softmax` function can produce errors if the denominator is close to zero.  Here is an implentation you may use to avoid some of those errors.  This assumes you have the following import in your `neuralnetworks.py` file.

`sys.float_info.epsilon` is also useful in your `_neg_log_likehood_f` function to avoid taking the `log` of zero.

In [8]:
import sys  # for sys.float_info.epsilon 

In [9]:
    def _softmax(self, Y):
        '''Apply to final layer weighted sum outputs'''
        # Trick to avoid overflow
        maxY = Y.max()
        expY = np.exp(Y - maxY)
        denom = expY.sum(1).reshape((-1, 1))
        Y = expY / (denom + sys.float_info.epsilon)
        return Y

Replace the `error_f` function with `neg_log_likelihood`.  If you add some print statements in `_neg_log_likelihood` functions, you can compare your output to the following results.

In [10]:
nnet.train(X, T, n_epochs=1, method='sgd', learning_rate=0.01)

sgd: Epoch 1 ObjectiveF=0.70718


NeuralNetworkClassifier(2, [10], 2)

In [11]:
print(nnet)

NeuralNetworkClassifier(2, [10], 2)
 Trained for 1 epochs.
 Final data likelihood 0.7072.


Now if you comment out those print statements, you can run for more epochs without tons of output.

In [12]:
np.random.seed(111)
nnet = nn.NeuralNetworkClassifier(2, [10], 2)

In [13]:
nnet.train(X, T, 100, method='scg')

SCG: Epoch 10 ObjectiveF=0.99632
SCG: Epoch 20 ObjectiveF=0.99996
SCG: Epoch 30 ObjectiveF=1.00000


NeuralNetworkClassifier(2, [10], 2)

The `use()` function returns two `numpy` arrays. The first one are the class predictions for eachs sample, containing values from the set of unique values in `T` passed into the `train()` function.

The second value are the probabilities of each class for each sample. This should a column for each unique value in `T`.

In [14]:
nnet.use(X)

(array([[0],
        [1],
        [1],
        [0]]),
 array([[9.99999991e-01, 9.38228261e-09],
        [1.13209730e-08, 9.99999989e-01],
        [8.63072013e-09, 9.99999991e-01],
        [9.99999990e-01, 9.87925278e-09]]))

In [15]:
def percent_correct(Y, T):
    return np.mean(T == Y) * 100

In [16]:
percent_correct(nnet.use(X)[0], T)

100.0

Works!  The XOR problem was used early in the history of neural networks as a problem that cannot be solved with a linear model.  Let's try it.  It turns out our neural network code can do this if we use an empty list for the hidden unit structure!

In [17]:
nnet = nn.NeuralNetworkClassifier(2, [], 2)
nnet.train(X, T, 100, method='scg')

NeuralNetworkClassifier(2, [], 2)

In [18]:
nnet.use(X)

(array([[0],
        [1],
        [0],
        [1]]),
 array([[0.5, 0.5],
        [0.5, 0.5],
        [0.5, 0.5],
        [0.5, 0.5]]))

In [19]:
percent_correct(nnet.use(X)[0], T)

50.0

A second way to evaluate a classifier is to calculate a confusion matrix. This shows the percent accuracy for each class, and also shows which classes are predicted in error.

Here is a function you can use to show a confusion matrix.

In [20]:
import pandas

def confusion_matrix(Y_classes, T):
    class_names = np.unique(T)
    table = []
    for true_class in class_names:
        row = []
        for Y_class in class_names:
            row.append(100 * np.mean(Y_classes[T == true_class] == Y_class))
        table.append(row)
    conf_matrix = pandas.DataFrame(table, index=class_names, columns=class_names)
    # cf.style.background_gradient(cmap='Blues').format("{:.1f} %")
    print('Percent Correct')
    return conf_matrix.style.background_gradient(cmap='Blues').format("{:.1f}")

In [21]:
confusion_matrix(nnet.use(X)[0], T)

Percent Correct


Unnamed: 0,0,1
0,50.0,50.0
1,50.0,50.0


# Apply `NeuralNetworkClassifier` to Handwritten Digits

Apply your `NeuralNetworkClassifier` to the [MNIST digits dataset](https://www.cs.colostate.edu/~anderson/cs545/notebooks/mnist.pkl.gz).

In [27]:
import pickle
import gzip

In [28]:
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')

Xtrain = train_set[0]
Ttrain = train_set[1].reshape(-1, 1)

Xval = valid_set[0]
Tval = valid_set[1].reshape(-1, 1)

Xtest = test_set[0]
Ttest = test_set[1].reshape(-1, 1)

print(Xtrain.shape, Ttrain.shape,  Xval.shape, Tval.shape,  Xtest.shape, Ttest.shape)

(50000, 784) (50000, 1) (10000, 784) (10000, 1) (10000, 784) (10000, 1)


In [24]:
28*28

784

In [25]:
def draw_image(image, label):
    plt.imshow(-image.reshape(28, 28), cmap='gray')
    plt.xticks([])
    plt.yticks([])
    plt.axis('off')
    plt.title(label)

In [26]:
plt.figure(figsize=(7, 7))
for i in range(100):
    plt.subplot(10, 10, i+1)
    draw_image(Xtrain[i], Ttrain[i,0])
plt.tight_layout()

<IPython.core.display.Javascript object>

In [27]:
nnet = nn.NeuralNetworkClassifier(784, [], 10)
nnet.train(Xtrain, Ttrain, n_epochs=40, method='scg')

SCG: Epoch 4 ObjectiveF=0.95819
SCG: Epoch 8 ObjectiveF=0.96994
SCG: Epoch 12 ObjectiveF=0.97146
SCG: Epoch 16 ObjectiveF=0.97146
SCG: Epoch 20 ObjectiveF=0.97146
SCG: Epoch 24 ObjectiveF=0.97273
SCG: Epoch 28 ObjectiveF=0.97421
SCG: Epoch 32 ObjectiveF=0.97475
SCG: Epoch 36 ObjectiveF=0.97494
SCG: Epoch 40 ObjectiveF=0.97496


NeuralNetworkClassifier(784, [], 10)

In [28]:
print(nnet)

NeuralNetworkClassifier(784, [], 10)
 Trained for 40 epochs.
 Final data likelihood 0.975.


In [29]:
[percent_correct(nnet.use(X)[0], T) for X, T in zip([Xtrain, Xval, Xtest], [Ttrain, Tval, Ttest])]

[93.138, 92.65, 92.28]

In [30]:
nnet = nn.NeuralNetworkClassifier(784, [20], 10)
nnet.train(Xtrain, Ttrain, n_epochs=40, method='scg')

SCG: Epoch 4 ObjectiveF=0.88560
SCG: Epoch 8 ObjectiveF=0.95820
SCG: Epoch 12 ObjectiveF=0.97325
SCG: Epoch 16 ObjectiveF=0.97961
SCG: Epoch 20 ObjectiveF=0.98322
SCG: Epoch 24 ObjectiveF=0.98575
SCG: Epoch 28 ObjectiveF=0.98762
SCG: Epoch 32 ObjectiveF=0.98896
SCG: Epoch 36 ObjectiveF=0.98981
SCG: Epoch 40 ObjectiveF=0.99067


NeuralNetworkClassifier(784, [20], 10)

In [31]:
[percent_correct(nnet.use(X)[0], T) for X, T in zip([Xtrain, Xval, Xtest],
                                                    [Ttrain, Tval, Ttest])]

[97.47200000000001, 94.21000000000001, 93.62]

# Experiments

For each method, try various hidden layer structures, learning rates, and numbers of epochs.  Use the validation percent accuracy to pick the best hidden layers, learning rates and numbers of epochs for each method (ignore learning rates for scg).  Report training, validation and test accuracy for your best validation results for each of the three methods.

Include plots of data likelihood versus epochs, and confusion matrices, for best results for each method.

Write at least 10 sentences about what you observe in the likelihood plots, the train, validation and test accuracies, and the confusion matrices.

In [32]:
nnet = nn.NeuralNetworkClassifier(784, [50,100], 10)
nnet.train(Xtrain, Ttrain, n_epochs=40, method='scg')

SCG: Epoch 4 ObjectiveF=0.80443
SCG: Epoch 8 ObjectiveF=0.80675
SCG: Epoch 12 ObjectiveF=0.80675
SCG: Epoch 16 ObjectiveF=0.82281
SCG: Epoch 20 ObjectiveF=0.83298
SCG: Epoch 24 ObjectiveF=0.84843
SCG: Epoch 28 ObjectiveF=0.90844
SCG: Epoch 32 ObjectiveF=0.94241
SCG: Epoch 36 ObjectiveF=0.96009
SCG: Epoch 40 ObjectiveF=0.97036


NeuralNetworkClassifier(784, [50, 100], 10)

In [33]:
import neuralnetworks as nn
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

errors = []
method_rhos_epoches_structure = [('sgd', 0.001, 50,[50,100]),
               ('sgd', 0.005, 10,[10,10]),
               ('sgd', 0.01, 5,[30,30,30]),
               ('sgd', 0.1, 100,[]),
               ('sgd', 0.05, 30,[10]),
               ('adam', 0.001,50,[20]),
               ('adam', 0.01,50,[20,10]),
               ('adam', 0.005,10,[30,30,30]),
               ('adam', 0.05,30,[]),
               ('adam', 0.01,35,[]),
               ('scg', None,50,[20]) ,
               ('scg', None,30,[10,10]),
               ('scg', None,30,[]),
                ('scg', None,10,[10]),
               ('scg', None,40,[30,30,30])]

for method, rho, epoches, structure in method_rhos_epoches_structure:
    nnet = nn.NeuralNetworkClassifier(784, structure, 10)
    print(f' Running {method} with number of Epoch {epoches} and learning rate of {rho} Structure of hidden layers ={structure}')
    nnet.train(Xtrain, Ttrain, epoches, method=method, learning_rate=rho)
    result=[percent_correct(nnet.use(X)[0], T) for X, T in zip([Xtrain, Xval, Xtest],[Ttrain, Tval, Ttest])]
    print(f' percentage correct for training set {result[0]}')
    print(f' percentage correct for validation set {result[1]}')
    print(f' percentage correct for test set {result[2]}')
    errors.append(nnet.get_error_trace())

 Running sgd with number of Epoch 50 and learning rate of 0.001 Structure of hidden layers =[50, 100]
sgd: Epoch 5 ObjectiveF=0.79195
sgd: Epoch 10 ObjectiveF=0.79247
sgd: Epoch 15 ObjectiveF=0.79318
sgd: Epoch 20 ObjectiveF=0.79395
sgd: Epoch 25 ObjectiveF=0.79474
sgd: Epoch 30 ObjectiveF=0.79550
sgd: Epoch 35 ObjectiveF=0.79622
sgd: Epoch 40 ObjectiveF=0.79689
sgd: Epoch 45 ObjectiveF=0.79751
sgd: Epoch 50 ObjectiveF=0.79807
 percentage correct for training set 14.416
 percentage correct for validation set 13.309999999999999
 percentage correct for test set 14.39
 Running sgd with number of Epoch 10 and learning rate of 0.005 Structure of hidden layers =[10, 10]
sgd: Epoch 1 ObjectiveF=0.79344
sgd: Epoch 2 ObjectiveF=0.79344
sgd: Epoch 3 ObjectiveF=0.79346
sgd: Epoch 4 ObjectiveF=0.79348
sgd: Epoch 5 ObjectiveF=0.79350
sgd: Epoch 6 ObjectiveF=0.79353
sgd: Epoch 7 ObjectiveF=0.79356
sgd: Epoch 8 ObjectiveF=0.79359
sgd: Epoch 9 ObjectiveF=0.79363
sgd: Epoch 10 ObjectiveF=0.79367
 perce

<h1>For SGD</h1>

From the above result the following structure seems to return good results for <b>sgd</b> :

Structure: []
learning_rate: 0.1
epoches : 40

percentage correct for training set 89.56400000000001<br>
percentage correct for validation set 91.22<br>
percentage correct for test set 88.61999999999999<br>


<h5>Confusion Matrix</h5>

In [34]:
import numpy as np
import neuralnetworks as nn
import matplotlib.pyplot as plt
%matplotlib notebook



nnet = nn.NeuralNetworkClassifier(784, [], 10)
nnet.train(Xtrain, Ttrain, n_epochs=100, method='sgd', learning_rate=0.1)
confusion_matrix(nnet.use(Xtest)[0], Ttest)

sgd: Epoch 10 ObjectiveF=0.91043
sgd: Epoch 20 ObjectiveF=0.94566
sgd: Epoch 30 ObjectiveF=0.95573
sgd: Epoch 40 ObjectiveF=0.95937
sgd: Epoch 50 ObjectiveF=0.96157
sgd: Epoch 60 ObjectiveF=0.96303
sgd: Epoch 70 ObjectiveF=0.96408
sgd: Epoch 80 ObjectiveF=0.96490
sgd: Epoch 90 ObjectiveF=0.96557
sgd: Epoch 100 ObjectiveF=0.96614
Percent Correct


Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,97.6,0.0,0.2,0.1,0.0,0.8,1.0,0.1,0.2,0.0
1,0.0,97.4,0.2,0.3,0.1,0.3,0.4,0.0,1.5,0.0
2,1.4,1.1,86.8,1.9,1.6,0.1,1.1,1.9,3.7,0.5
3,0.4,0.5,2.0,89.3,0.0,3.2,0.6,1.7,1.7,0.7
4,0.2,0.5,0.4,0.0,93.4,0.2,1.2,0.2,0.4,3.5
5,1.5,0.4,0.4,4.3,1.7,84.4,2.1,1.5,2.9,0.8
6,1.5,0.5,0.6,0.0,1.0,1.7,94.3,0.1,0.3,0.0
7,0.1,1.9,2.1,0.5,1.3,0.0,0.1,90.6,0.0,3.4
8,1.3,1.8,0.9,2.9,1.8,3.2,1.2,1.8,83.5,1.4
9,1.6,0.9,0.2,1.2,4.9,0.4,0.1,2.4,0.5,87.9


<h5>Plot of likelihood vs epoches</h5>

In [35]:
plt.clf()
error=[]
error.append(nnet.get_error_trace())
for e in error:
    plt.plot(e)
plt.ylabel('likelihood')
plt.xlabel('Epoch')
plt.legend("SGD")

<IPython.core.display.Javascript object>

<matplotlib.legend.Legend at 0x210f00d17f0>

<h1>For Adam</h1>

From the above result the following structure seems to return good results for <b>adam</b> :

Structure: [20,10]
learning_rate: 0.01
epoches : 50

percentage correct for training set 93.35000000000001<br>
percentage correct for validation set 91.0<br>
percentage correct for test set 94.11<br>


<h5>Confusion Matrix</h5>

In [36]:
import numpy as np
import neuralnetworks as nn
import matplotlib.pyplot as plt
%matplotlib notebook

nnet = nn.NeuralNetworkClassifier(784, [20,10], 10)
nnet.train(Xtrain, Ttrain, n_epochs=50, method='adam', learning_rate=0.01)
confusion_matrix(nnet.use(Xtest)[0], Ttest)

Adam: Epoch 5 ObjectiveF=0.81488
Adam: Epoch 10 ObjectiveF=0.83517
Adam: Epoch 15 ObjectiveF=0.85213
Adam: Epoch 20 ObjectiveF=0.86821
Adam: Epoch 25 ObjectiveF=0.88329
Adam: Epoch 30 ObjectiveF=0.89708
Adam: Epoch 35 ObjectiveF=0.90933
Adam: Epoch 40 ObjectiveF=0.92026
Adam: Epoch 45 ObjectiveF=0.92976
Adam: Epoch 50 ObjectiveF=0.93782
Percent Correct


Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,92.3,0.2,0.7,0.6,0.3,2.0,3.4,0.0,0.4,0.0
1,0.0,96.1,0.7,1.2,0.1,0.1,0.5,0.4,0.4,0.4
2,2.3,0.4,87.4,3.6,1.2,0.2,1.7,0.4,2.1,0.7
3,1.0,0.8,2.9,87.9,0.0,1.7,0.4,1.2,3.5,0.7
4,0.1,0.2,0.6,0.1,89.9,0.1,1.1,1.0,0.6,6.2
5,3.5,0.7,0.6,5.8,1.0,79.1,1.7,0.7,6.4,0.6
6,4.8,0.2,0.9,0.0,0.9,1.7,91.2,0.0,0.2,0.0
7,0.1,2.8,1.8,1.4,1.7,0.0,0.2,87.7,0.1,4.3
8,1.8,0.3,1.2,3.5,1.7,5.0,0.6,0.9,83.8,1.0
9,0.6,0.3,0.3,0.9,9.0,0.9,0.1,6.0,2.2,79.7


<h5>Plot of likelihood vs epoches</h5>

In [37]:
plt.clf()
error=[]
error.append(nnet.get_error_trace())
for e in error:
    plt.plot(e)
plt.ylabel('likelihood')
plt.xlabel('Epoch')
plt.legend("ADAM")

<IPython.core.display.Javascript object>

<matplotlib.legend.Legend at 0x210efd23100>

<h1>For SCG</h1>

From the above result the following structure seems to return good results for <b>scg</b> :

Structure: [10]
learning_rate: None
epoches : 50

percentage correct for training set 98.008<br>
percentage correct for validation set 94.28999999999999<br>
percentage correct for test set 93.87<br>


<h5>Confusion Matrix</h5>

In [38]:
import numpy as np
import neuralnetworks as nn
import matplotlib.pyplot as plt
%matplotlib notebook

nnet = nn.NeuralNetworkClassifier(784, [10], 10)
nnet.train(Xtrain, Ttrain, n_epochs=50, method='scg')
confusion_matrix(nnet.use(Xtest)[0], Ttest)

SCG: Epoch 5 ObjectiveF=0.92669
SCG: Epoch 10 ObjectiveF=0.96419
SCG: Epoch 15 ObjectiveF=0.97086
SCG: Epoch 20 ObjectiveF=0.97434
SCG: Epoch 25 ObjectiveF=0.97636
SCG: Epoch 30 ObjectiveF=0.97812
SCG: Epoch 35 ObjectiveF=0.97948
SCG: Epoch 40 ObjectiveF=0.98053
SCG: Epoch 45 ObjectiveF=0.98140
SCG: Epoch 50 ObjectiveF=0.98205
Percent Correct


Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,96.4,0.0,0.4,0.1,0.4,0.7,1.2,0.4,0.3,0.0
1,0.0,97.6,0.8,0.3,0.0,0.2,0.3,0.0,0.8,0.1
2,1.2,0.6,91.8,1.0,0.6,0.6,1.4,1.2,1.6,0.2
3,0.4,0.1,3.0,90.6,0.0,3.0,0.4,0.8,1.4,0.4
4,0.1,0.4,0.5,0.1,94.1,0.2,1.4,0.6,0.5,2.0
5,1.5,0.1,0.6,4.4,0.8,84.1,1.9,1.1,4.4,1.2
6,0.7,0.3,0.9,0.1,0.7,1.4,95.5,0.0,0.2,0.1
7,0.3,0.9,2.0,1.3,0.5,0.3,0.0,92.2,0.4,2.1
8,0.6,1.3,1.1,1.7,1.0,4.0,1.3,1.0,87.1,0.7
9,0.4,0.2,0.3,1.2,3.0,0.7,0.1,1.8,0.7,91.7


<h5>plot of likelihood vs epoches</h5>

In [39]:
plt.clf()
error=[]
error.append(nnet.get_error_trace())
for e in error:
    plt.plot(e)
plt.ylabel('likelihood')
plt.xlabel('Epoch')
plt.legend("SCG")

<IPython.core.display.Javascript object>

<matplotlib.legend.Legend at 0x210ee0fb1c0>

<h2>Discussion</h2>
    
    The confusion matrix helps us understand the accuracy of the classification model.It provides the view of the false positives,false negative, true negative, true positive. The rows indicate the probability of occurance of that particular cell.
    We can see the predicted probabilities in dark blue and the error probabilities in light blue in the confusion matrix.
    In often cases we find the training dataset provides the best results as the model is trained to that particular data and the weights are correctly optimized and adjusted for that dataset. But validation dataset is something which we use to determine the properties of the model like finding the suitable algorithm yeilding better results. The validation dataset is also yeilding similar results to the training data with some deviation for this dataset. From the validation result the scg is performing better compared to the other 2 algorithms. When trying to lower the learning rate as low as 0.001 the model is taking more number of epoches to reach the global maxima in gradient ascent which is not providing better results for all the algorithms . However when we run with a learning rate of 0.01 then we are able to yield better results in less number of epoches. The hidden layers also has a direct impact to the model. In case of sgd when running with no hidden layers the model is performing better than taking multiple hidden layers. For adam, the no hidden layers is performing poor and yeilding better result on multiple hidden layer. However, on increasing the hidden layers for adam is the again decresing the performing. So for adam it is advised to pick limited number of hidden layers and keeping the learning rate around 0.01 - 0.05 with 40-50 epoches for better results. For Scg, On increasing the hidden layers is reducing the performance of the model and keeping it limited to 1-2 layer with number of units less or equal to the number of the input neurons is yeilding better result. for scg the epoches are directly proportional to the result till one point and then gets saturated at one  point where increasing the epoches also doesn't promise for better results. On comparing the likelihood graphs the sgd and scg has arrived the maximum probability in less number of epoches however, adam is still increasing linearly and may provide better results after increasing the epoches.  

# Grading

Download [A3grader.tar](https://www.cs.colostate.edu/~anderson/cs545/notebooks/A3grader.tar), extract `A3grader.py` before running the following cell.

In [40]:
%run -i A3grader.py



Extracting python code from notebook named 'Mallampati-A3.ipynb' and storing in notebookcode.py
Removing all statements that are not function or class defs or import statements.

## Testing inheritance ####################################################################

    correct = issubclass(NeuralNetworkClassifier, NeuralNetwork)


--- 10/10 points. NeuralNetworkClassifier correctly extends NeuralNetwork.

## Testing inheritance ####################################################################

    import inspect
    forward_func = [f for f in inspect.classify_class_attrs(NeuralNetworkClassifier) if (f.name == 'forward' or f.name == '_forward')]
    correct = forward_func[0].defining_class == NeuralNetwork


--- 5/5 points. NeuralNetworkClassifier forward function correctly inherited from NeuralNetwork.

## Testing inheritance ####################################################################

    import inspect
    str_func = [f for f in inspect.classify_class_attrs(Neural

# Extra Credit

Repeat the above experiments with a different data set.  Randonly partition your data into training, validaton and test parts if not already provided.  Write in markdown cells descriptions of the data and your results.

Firstly we try to download data using the curl command.

In [49]:
!curl -O http://archive.ics.uci.edu/ml/machine-learning-databases/00537/sobar-72.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  4102  100  4102    0     0  22662      0 --:--:-- --:--:-- --:--:-- 22662


we try to read the data using the pandas

In [16]:
import pandas as pd
data = pd.read_csv('sobar-72.csv', delimiter=',')
data

Unnamed: 0,behavior_sexualRisk,behavior_eating,behavior_personalHygine,intention_aggregation,intention_commitment,attitude_consistency,attitude_spontaneity,norm_significantPerson,norm_fulfillment,perception_vulnerability,perception_severity,motivation_strength,motivation_willingness,socialSupport_emotionality,socialSupport_appreciation,socialSupport_instrumental,empowerment_knowledge,empowerment_abilities,empowerment_desires,ca_cervix
0,10,13,12,4,7,9,10,1,8,7,3,14,8,5,7,12,12,11,8,1
1,10,11,11,10,14,7,7,5,5,4,2,15,13,7,6,5,5,4,4,1
2,10,15,3,2,14,8,10,1,4,7,2,7,3,3,6,11,3,3,15,1
3,10,11,10,10,15,7,7,1,5,4,2,15,13,7,4,4,4,4,4,1
4,8,11,7,8,10,7,8,1,5,3,2,15,5,3,6,12,5,4,7,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
67,10,14,14,10,15,6,7,5,15,14,10,15,13,9,8,12,12,11,9,0
68,10,12,15,10,15,8,8,5,15,14,8,12,14,11,7,13,15,11,14,0
69,10,8,11,6,10,6,4,3,13,9,8,14,12,9,7,11,12,10,10,0
70,9,12,13,10,13,6,6,5,14,13,10,13,12,11,8,12,11,13,15,0


there are no strings to convert to classes . The dataset already has class defined in ca_cervix

In [2]:
cols = data.columns
data[cols] = data[cols].apply(pd.to_numeric, errors='coerce')

Now let us drop all the nulls 

In [17]:
data = data.dropna(axis=1,how='all')
data

Unnamed: 0,behavior_sexualRisk,behavior_eating,behavior_personalHygine,intention_aggregation,intention_commitment,attitude_consistency,attitude_spontaneity,norm_significantPerson,norm_fulfillment,perception_vulnerability,perception_severity,motivation_strength,motivation_willingness,socialSupport_emotionality,socialSupport_appreciation,socialSupport_instrumental,empowerment_knowledge,empowerment_abilities,empowerment_desires,ca_cervix
0,10,13,12,4,7,9,10,1,8,7,3,14,8,5,7,12,12,11,8,1
1,10,11,11,10,14,7,7,5,5,4,2,15,13,7,6,5,5,4,4,1
2,10,15,3,2,14,8,10,1,4,7,2,7,3,3,6,11,3,3,15,1
3,10,11,10,10,15,7,7,1,5,4,2,15,13,7,4,4,4,4,4,1
4,8,11,7,8,10,7,8,1,5,3,2,15,5,3,6,12,5,4,7,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
67,10,14,14,10,15,6,7,5,15,14,10,15,13,9,8,12,12,11,9,0
68,10,12,15,10,15,8,8,5,15,14,8,12,14,11,7,13,15,11,14,0
69,10,8,11,6,10,6,4,3,13,9,8,14,12,9,7,11,12,10,10,0
70,9,12,13,10,13,6,6,5,14,13,10,13,12,11,8,12,11,13,15,0


let's identify the target and source columns

In [19]:
T = data['ca_cervix']
X = data.to_numpy()[:,0:19]
T = T.to_numpy().reshape(-1,1)
X

array([[10, 13, 12, ..., 12, 11,  8],
       [10, 11, 11, ...,  5,  4,  4],
       [10, 15,  3, ...,  3,  3, 15],
       ...,
       [10,  8, 11, ..., 12, 10, 10],
       [ 9, 12, 13, ..., 11, 13, 15],
       [10, 14, 14, ..., 13, 15, 15]], dtype=int64)

In [5]:
T.shape

(72, 1)

In [20]:
Xtrain= X[:50,:]
Xtrain.shape
Ttrain=T[:50,:]
Ttrain.shape

(50, 1)

In [21]:
Xtest= X[50:,:]
Xtest.shape
Ttrain=T[50:,:]
Ttrain.shape

(22, 1)

Here we try to partition data into 2 parts based on division factor which is 0.8 i.e 80%

In [25]:
np.unique(Ttrain)

array([0], dtype=int64)

Now we run our data through different models and observe the lkelihood

In [15]:
np.unique(Ttrain)

array([0], dtype=int64)

In [30]:
import neuralnetworks as nn
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

errors = []
method_rhos_epoches_structure = [('sgd', 0.001, 50,[50,100]),
               ('sgd', 0.005, 10,[10,10]),
               ('sgd', 0.01, 5,[30,30,30]),
               ('sgd', 0.1, 100,[]),
               ('sgd', 0.05, 30,[10]),
               ('adam', 0.001,50,[20]),
               ('adam', 0.01,50,[20,10]),
               ('adam', 0.005,10,[30,30,30]),
               ('adam', 0.05,30,[]),
               ('adam', 0.01,35,[]),
               ('scg', None,50,[20]) ,
               ('scg', None,30,[10,10]),
               ('scg', None,30,[]),
                ('scg', None,10,[10]),
               ('scg', None,40,[30,30,30])]

for method, rho, epoches, structure in method_rhos_epoches_structure:
    nnet = nn.NeuralNetworkClassifier(Xtrain.shape[1], structure, Ttrain[1])
    print(f' Running {method} with number of Epoch {epoches} and learning rate of {rho} Structure of hidden layers ={structure}')
    nnet.train(Xtrain, Ttrain, epoches, method=method, learning_rate=rho)
    result=[percent_correct(nnet.use(X)[0], T) for X, T in zip([Xtrain, Xval, Xtest],[Ttrain, Tval, Ttest])]
    print(f' percentage correct for training set {result[0]}')
    print(f' percentage correct for validation set {result[1]}')
    print(f' percentage correct for test set {result[2]}')
    errors.append(nnet.get_error_trace())

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
