## Hands-on Challenge



We are going to implement gradient descent and train a model. Let's get started. 



In [None]:
import numpy as np

You will need all the code that we have written so far. Here are the base classes:



In [None]:
class FFNN:
    def __init__(self):
        self.net = []
        self.output = None
    def forward(self, x):
        for layer in self.net:
            x = layer.forward(x)
        self.output = x
        return x
    def backward(self, error):
        for layer in reversed(self.net):
            error = layer.backward(error)
    def zero_grad(self):
        for layer in self.net:
            layer.zero_grad()

class layer:
    def __init__(self, node_dim):
        self.input = np.zeros(node_dim)
        self.input_grad = np.zeros(node_dim)
        self.params = False
    def forward(self, x):
        self.input = x
    def backward(self, error):
        pass
    def parameters(self):
        pass
    def zero_grad(self):
        self.input_grad.fill(0.)

### Exercise: implement the backward pass



As covered last week, your layers must conform to the template below. You need to implement the `backward` and `parameters` methods.

**EXERCISE:** Do this for the `linear` and `sigmoid` layers.  
OPTIONAL: Do the same for any other layers.



In [None]:
class my_layer(layer):
    def __init__(self, node_dim):
        super(my_layer, self).__init__(node_dim)
        # if there are parameters: 
        # self.params = True

        # TODO instantiate parameters and also include gradient tensors of the same shape, ie.
        # self.parameter = np.array((some shape))
        # self.parameter_grad = np.zeros((same shape))

    def forward(self, x):
        self.input = x
        return # your results

    def backward(self, error):
        self.input_grad = None  # TODO compute the gradient of 'x', the layer's input

        # TODO compute any other gradients 

        # back propagate loss to previous layer 
        return self.input_grad

    def parameters(self):
        # TODO pair each parameter with its gradient and return an iterable of (parameter, gradient) tuples,
        # because SGD.step unpacks them as `for p, p_grad in layer.parameters()`. For example:
        return [(self.parameterA, self.parameterA_grad), (self.parameterB, self.parameterB_grad)]

Copy and paste your code here:



In [None]:
# TODO paste your code here
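
One way to check a `backward` implementation is to compare it against a finite-difference estimate. The sketch below is only an illustration under stated assumptions: `tanh_layer` and `numerical_input_grad` are hypothetical names that are not part of the framework, and the layer is written standalone here (in the notebook it would subclass `layer`). It uses tanh, one of the optional layers, so the `linear` and `sigmoid` exercises are left to you.

```python
import numpy as np

class tanh_layer:
    """Hypothetical parameter-free tanh activation following the layer template."""
    def __init__(self, node_dim):
        self.input = np.zeros(node_dim)
        self.input_grad = np.zeros(node_dim)
        self.params = False

    def forward(self, x):
        self.input = x
        return np.tanh(x)

    def backward(self, error):
        # chain rule: dL/dx = dL/dy * (1 - tanh(x)^2)
        self.input_grad = error * (1.0 - np.tanh(self.input) ** 2)
        return self.input_grad

def numerical_input_grad(layer_obj, x, error, h=1e-6):
    """Central finite-difference estimate of d(sum(error * forward(x))) / dx."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        plus = np.sum(error * layer_obj.forward(x + step))
        minus = np.sum(error * layer_obj.forward(x - step))
        grad[i] = (plus - minus) / (2 * h)
    return grad

x = np.array([0.5, -1.0, 2.0])
error = np.array([1.0, -0.5, 0.25])   # stand-in for the error arriving from the next layer

act = tanh_layer(3)
act.forward(x)
analytic = act.backward(error)
numeric = numerical_input_grad(act, x, error)
print(np.allclose(analytic, numeric, atol=1e-6))
```

If the two gradients disagree, the analytic `backward` is the more likely culprit. The same helper can be pointed at any layer whose `forward` and `backward` follow the template.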

### Exercise: implement a loss function



Here is the gradient descent code for our framework. 
**EXERCISE:** Implement one loss function and its derivative. I recommend mean-squared error (MSE).



In [None]:
class SGD():
    def __init__(self, net, batch_size, lr=0.05):
        # reference to your neural network
        self.net = net

        # the number of samples in your training set, per epoch
        self.N = batch_size

        # the learning rate
        self.lr = lr

        # small number, might come in handy.
        self.eps = np.finfo(float).eps

    # send back the mean loss through the neural network
    def backward(self, error):
        self.net.backward(error / self.N)


    # update any parameters with their respective gradients
    def step(self):
        for layer in self.net.net:
            if layer.params:
                for p, p_grad in layer.parameters():
                    p -= self.lr * p_grad


    # clear all the gradients for the next epoch
    def zero_grad(self):
        self.net.zero_grad()


    # TODO fill out one loss function and its derivative
    # mean squared error
    def MSE(self, y, yhat):
        pass

    # gradient of mean squared error
    def MSE_grad(self, y, yhat):
        pass

    # cross entropy
    def CE(self, y, yhat):
        pass

    # gradient of cross entropy
    def CE_grad(self, y, yhat):
        pass

    # logloss 
    def LogLoss(self, y, yhat):
        pass

    # gradient of logloss 
    def LogLoss_grad(self, y, yhat):
        pass
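
As a reference point for the recommended loss, here is a minimal standalone sketch of MSE and its gradient with respect to the prediction. It assumes `y` is the target and `yhat` the network output (matching the argument order of `MSE(self, y, yhat)`), and that the mean is taken over the output components. Other conventions exist, such as a factor of 1/2 in front, so treat this as one possible choice rather than the required answer. The function names `mse` and `mse_grad` are illustrative.

```python
import numpy as np

def mse(y, yhat):
    # mean of the squared differences over the output components
    return np.mean((yhat - y) ** 2)

def mse_grad(y, yhat):
    # derivative of mse with respect to yhat: 2 * (yhat - y) / n
    return 2.0 * (yhat - y) / y.size

y = np.array([0.0, 1.0, 0.0])        # one-hot target
yhat = np.array([0.2, 0.7, 0.1])     # example network output

print(mse(y, yhat))
print(mse_grad(y, yhat))
```

Note that this gradient is per sample. The `SGD.backward` method above additionally divides by `self.N`, which averages over the training samples.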

### Prepare data



We will use the iris dataset for our task. The labels will be converted to one-hot encoding, so our model needs an output vector of size 3, one entry per class (Setosa, Versicolour, Virginica).



In [None]:
def prep_data():
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    # load iris data, split and shuffle data sets
    iris = load_iris()

    x_train, x_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2, shuffle=True)

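    # Make sure all data are numpy arrays (load_iris already returns arrays, so this is a safeguard)
    x_train, x_test = np.asarray(x_train), np.asarray(x_test)
    y_train, y_test = np.asarray(y_train), np.asarray(y_test)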

    # Convert labels to one-hot representation
    y_test = np.eye(3)[y_test]
    y_train = np.eye(3)[y_train]

    return x_train, x_test, y_train, y_test


x_train, x_test, y_train, y_test = prep_data()
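
The one-hot conversion in `prep_data` relies on indexing the rows of an identity matrix with the integer labels. The small example below illustrates the trick with made-up labels. The variable names are illustrative only.

```python
import numpy as np

labels = np.array([0, 2, 1])          # integer class labels
one_hot = np.eye(3)[labels]           # row i of the identity matrix is the one-hot vector for class i
print(one_hot)

# argmax along each row recovers the original integer labels
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```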

### Exercise: Instantiate Model



Next we will instantiate a model. We will use linear layers for the trainable part. You can try any activation function, but the last layer should be either sigmoid, tanh, or softmax.  
There is no single right answer here; the architecture of the network is left open for you to select.



In [None]:
class ournet(FFNN):
    def __init__(self):
        super(ournet, self).__init__()
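        # NOTE: pass whatever constructor arguments your `linear` and `sigmoid` layers require;
        # the input has 4 features and the final layer must produce 3 outputs
        self.net.append(linear())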
        self.net.append(sigmoid())
        self.net.append(linear())
        self.net.append(sigmoid())
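
Whatever architecture you choose, the dimensions have to line up: four input features per sample and three outputs. The sketch below traces the shapes through a linear, sigmoid, linear, sigmoid stack using plain NumPy rather than the framework classes. The hidden width of 8, the weight initialisation, and all names here are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 8, 3       # n_hidden is an arbitrary choice

def sigmoid_fn(z):
    return 1.0 / (1.0 + np.exp(-z))

# weights and biases of the two linear layers
W1, b1 = rng.normal(scale=0.1, size=(n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(scale=0.1, size=(n_hidden, n_out)), np.zeros(n_out)

x = np.array([5.1, 3.5, 1.4, 0.2])    # one sample with four features (values for illustration)
h = sigmoid_fn(x @ W1 + b1)           # hidden activation
out = sigmoid_fn(h @ W2 + b2)         # output, one score per class

print(h.shape, out.shape)
```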

### Training



Now let's train our model and see how it does. You may need to play with the model architecture to get it right.



In [None]:
from tqdm import trange
from sklearn.utils import shuffle

model = ournet()
optimize = SGD(model, len(y_train), lr=0.5)

In [None]:
t = trange(150)
for epoch in t:
    for x, y in zip(x_train, y_train):
        out = model.forward(x)
        loss = optimize.MSE_grad(y, out)
        optimize.backward(loss)

    optimize.step()
    optimize.zero_grad()

    # shuffle data for next epoch
    x_train, y_train = shuffle(x_train, y_train)

    # Test the results for every epoch 
    correct = 0
    for x, y in zip(x_test, y_test):
        # Convert output to argmax
        out = np.eye(3)[np.argmax(model.forward(x))]
        if np.array_equal(y, out):
            correct += 1

    t.set_description(f"(acc={correct/len(y_test):.3f})")
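
The loop above calls `backward` for every sample but `step` only once per epoch, so it appears to rely on the layers accumulating their parameter gradients until `zero_grad` clears them. The toy example below sketches that accumulate, step, and clear pattern on a one-weight model fitting `y = 2x`. It is an illustration under that assumption, not the framework code, and all names in it are made up.

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs                          # targets generated with a true weight of 2

w = np.zeros(1)                        # the single trainable parameter
w_grad = np.zeros(1)                   # its gradient buffer
lr = 0.05
N = len(xs)

for epoch in range(200):
    for x, y in zip(xs, ys):
        yhat = w[0] * x                # forward pass
        error = 2.0 * (yhat - y)       # derivative of the squared error w.r.t. yhat
        w_grad += (error / N) * x      # accumulate the mean gradient over the epoch
    w -= lr * w_grad                   # one update per epoch, as in optimize.step()
    w_grad.fill(0.)                    # clear the buffer, as in optimize.zero_grad()

print(w)
```

If your layers overwrite their gradients instead of adding to them, only the last sample of each epoch would influence the update, which is worth checking when training stalls.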

Try some samples from the test set:



In [None]:
def predict(data_index):
    target_name = {0: 'setosa', 1: 'versicolor', 2: 'virginica'}
    out = np.argmax(model.forward(x_test[data_index]))
    # y_test is one-hot encoded, so convert it back to a class index first
    y_out = target_name[np.argmax(y_test[data_index])]
    print(f"idx: {data_index},  Prediction: {target_name[out]}   Actual: {y_out}")

In [None]:
predict(10)