# LELA60331 Week 9 Seminar

This week we are going to look at multiclass classification and multilayer networks.

### Multiclass classification problems

While logistic regression is great for binary classification tasks, many classification problems have more than two possible outcomes. We can simulate such a situation as follows. Here I have generalised sentiment analysis to a three-class problem: negative, neutral and positive.



In [None]:
import numpy as np
## Create simulated data
np.random.seed(10)
w1_center = (1, 3)
w2_center = (3, 1)
w3_center = (1, 1)
w4_center = (3, 3)

x=np.concatenate((np.random.normal(loc=w1_center,size=(20,2)),np.random.normal(loc=w2_center,size=(20,2)),np.random.normal(loc=w3_center,size=(10,2)),np.random.normal(loc=w4_center,size=(10,2))))
labs=np.repeat([0,1,2],[20,20,20],axis=0)
y=np.repeat(np.diag((1,1,1)),[20,20,20],axis=0)
x=x.T
y=y.T

In [None]:
import matplotlib.pyplot as plt
plt.scatter(x[0][labs==0], x[1][labs==0], marker='*', s=100)
plt.scatter(x[0][labs==1], x[1][labs==1], marker='o', s=100)
plt.scatter(x[0][labs==2], x[1][labs==2], marker='x', s=100)
plt.xlabel("log count of negative words")
plt.ylabel("log count of positive words")
plt.xlim((0,5))
plt.ylim((0,5))



### Softmax
In such circumstances we need to use multinomial logistic (aka softmax) regression.

In logistic regression we take the dot product between the feature vector for each data point and our weight vector. We then add the bias to give a single z value, which we feed through the sigmoid function. We need only one z value because there are only two outcomes and the following relationship holds:

$p(y=0|x) = 1 - p(y=1|x)$
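
The relationship between the two binary outcomes can be checked numerically. This is a minimal sketch: the `sigmoid` helper and the value of `z_example` are illustrative assumptions, not part of the seminar code.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z_example = 0.8               # hypothetical z value, chosen only for illustration
p_one = sigmoid(z_example)    # p(y=1|x)
p_zero = 1.0 - p_one          # p(y=0|x) follows from the same single z value
print(p_one, p_zero, p_one + p_zero)
```

Because the two probabilities must sum to 1, a single z value is enough to determine both.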

In multinomial regression we instead have a z value for each of our possible outcomes. We can use these collectively to calculate probabilities for each of our possible outcomes. For example, if we had three possible outcomes, 0, 1 or 2, then we would calculate their probabilities as follows:

$p(y=0|x) = \frac{\exp(z_{0})}{\sum_{i=0}^{2} \exp(z_i)}$ \\
$p(y=1|x) = \frac{\exp(z_{1})}{\sum_{i=0}^{2} \exp(z_i)}$ \\
$p(y=2|x) = \frac{\exp(z_{2})}{\sum_{i=0}^{2} \exp(z_i)}$ \\
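
The softmax calculation can be sketched in a few lines of numpy. The `softmax` helper and the z values in `z_demo` are hypothetical and chosen only for illustration. Subtracting the maximum before exponentiating is an added precaution against overflow and does not change the result.

```python
import numpy as np

def softmax(z):
    """Convert a vector of z values into probabilities that sum to 1."""
    # Subtracting the max is an added numerical-stability step (an assumption,
    # not something the seminar requires); the probabilities are unchanged.
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

z_demo = np.array([2.0, 1.0, 0.0])   # hypothetical z values for classes 0, 1 and 2
probs_demo = softmax(z_demo)
print(probs_demo)        # one probability per class
print(probs_demo.sum())  # the probabilities sum to 1
```

The class with the largest z value receives the largest probability, but every class keeps some probability mass.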


Problem 1: A fitted model might return the following weights. In Python, calculate the probabilities of each of the output classes for the following inputs. \\


a) x[0] (positive words) = 10, x[1] (negative words) = 3 \\
b) x[0] (positive words) = 3, x[1] (negative words) = 3 \\
c) x[0] (positive words) = 1, x[1] (negative words) = 6 \\


In [None]:
bias_negative=-0.82031125
bias_positive=-0.451126
bias_neutral = 1.27143725

weights_negative = np.array([-0.69900716, 1.81182487])
weights_positive = np.array([1.7979912 , -0.74611263])
weights_neutral = np.array([0.80449184, -0.07135976])

Note: for convenience you can print a float without scientific notation using the function np.format_float_positional, as in the following:

In [None]:
# use a separate name so that the data matrix x is not overwritten
small_value=1/783618
small_value

In [None]:
np.format_float_positional(small_value)

### Representing multinomial logistic regression problems

In multinomial logistic regression we have multiple outcome classes. In place of the single 0 or 1 that we used as the outcome in binary logistic regression, we represent the outcome as a one-hot vector: a vector of 0s containing a single 1, where each position corresponds to one of the output classes. \\

positive = [1,0,0] \\
negative = [0,1,0] \\
neutral = [0,0,1] \\
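
One common way to build such vectors from integer class labels is to index the rows of an identity matrix. This is a minimal sketch; `labels_demo` and the other names are hypothetical and are not used elsewhere in the seminar.

```python
import numpy as np

labels_demo = np.array([0, 2, 1, 0])   # hypothetical integer class labels
num_classes_demo = 3

# Row c of the identity matrix is the one-hot vector for class c,
# so indexing with the labels gives one one-hot row per datapoint.
one_hot_demo = np.eye(num_classes_demo, dtype=int)[labels_demo]
print(one_hot_demo)
```

Here each row is one datapoint. The seminar's y variable holds the same kind of information transposed, with one row per class and one column per datapoint.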

This is how the y variable looks in our simulated data:

In [None]:
y

In [None]:
y.T[1:20]

### Fitting multinomial logistic regression models

The relationship between this representation and the way we represent binary logistic regression is helpful in generalising the process of model fitting:

Instead of using a regression equation to predict a single z value, in an n-class classifier we use n regression equations to predict n z values. These are then converted to n probabilities using the softmax function.

When calculating loss and gradients in binary logistic regression we look at the difference between a single probability estimate and a single binary value for each datapoint, for each of the k weights/input features. The gradient that we use to update weight i is the mean of the following value over all datapoints:

$ g_i = (q - y) * x_{i} = (p(y = 1|x) - y) * x_{i}$

In multinomial logistic regression, we compare each of our n probabilities to the corresponding one of our n binary values when updating each of our n x k weights, where k is the number of input features. So for the three classes 0, 1 and 2 we would calculate the average of the following over all datapoints:

$ g_{i}^{0} = (q^{0} - y^{0}) * x_{i}    =     (p(y^{0} = 1|x) - y^{0}) * x_{i}$

$ g_{i}^{1} = (q^{1} - y^{1}) * x_{i}    =     (p(y^{1} = 1|x) - y^{1}) * x_{i}$  

$ g_{i}^{2} = (q^{2} - y^{2}) * x_{i}    =     (p(y^{2} = 1|x) - y^{2}) * x_{i}$  
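
As a worked example for a single datapoint, the per-class gradient terms can be laid out as a small matrix with one row per class and one column per feature. The probabilities, label and feature values below are hypothetical, and this sketch does not include the averaging over datapoints that the full fitting procedure needs.

```python
import numpy as np

q_demo = np.array([0.7, 0.2, 0.1])   # hypothetical softmax outputs for classes 0, 1, 2
y_demo = np.array([1, 0, 0])         # one-hot true label: the datapoint belongs to class 0
x_demo = np.array([2.0, 1.0])        # hypothetical feature vector with k = 2 features

# grad_demo[c, i] = (q^c - y^c) * x_i
grad_demo = np.outer(q_demo - y_demo, x_demo)
print(grad_demo)
```

The row for the true class is negative, so gradient descent raises its weights, while the rows for the other classes are positive, so their weights are lowered. Each column sums to zero because both the probabilities and the one-hot label sum to 1.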


Problem 2: Complete the code below so that it fits a softmax regression to our multiclass data.

In [None]:
np.random.seed(10)
n_iters = 2500
num_features=2
num_classes=3
num_samples = len(y[0])
weights = np.random.rand(num_classes,num_features)
bias=np.zeros(num_classes)
lr=0.1
logistic_loss=[]
z=np.zeros((num_samples,num_classes))
q=np.zeros((num_samples,num_classes))

for i in range(n_iters):
    z[:,0]=???
    z[:,1]=???
    z[:,2]=???

    q[:,0] = ???
    q[:,1] = ???
    q[:,2] = ???

    loss = sum(-(y[0]*np.log2(q[:,0])+(1-y[0])*np.log2(1-q[:,0])))/num_samples
    loss += sum(-(y[1]*np.log2(q[:,1])+(1-y[1])*np.log2(1-q[:,1])))/num_samples
    loss += sum(-(y[2]*np.log2(q[:,2])+(1-y[2])*np.log2(1-q[:,2])))/num_samples
    logistic_loss.append(loss)

    dw01 = ???
    dw02 = ???

    dw11 = ???
    dw12 = ???

    dw21 = ???
    dw22 = ???

    db0 = ???
    db1 = ???
    db2 = ???

    weights[0,0] = weights[0,0] - dw01*lr
    weights[0,1] = weights[0,1] - dw02*lr

    weights[1,0] = weights[1,0] - dw11*lr
    weights[1,1] = weights[1,1] - dw12*lr

    weights[2,0] = weights[2,0] - dw21*lr
    weights[2,1] = weights[2,1] - dw22*lr

    bias[0] = bias[0] - db0*lr
    bias[1] = bias[1] - db1*lr
    bias[2] = bias[2] - db2*lr

plt.plot(range(1,n_iters),logistic_loss[1:])
plt.xlabel("number of epochs")
plt.ylabel("loss")

Problem 3 (difficult addition to try in your own time if you feel able). Rewrite your softmax regression so that it calculates z, q, gradients and weights in single lines using operations over whole matrices rather than subsetting for particular rows, columns or elements. See assigned readings on linear algebra to remind you of numpy operations if needed.

In [None]:
np.random.seed(10)
n_iters = 2500
num_features=2
num_classes=3
num_samples = len(y[0])
weights = np.random.rand(num_classes,num_features)
bias=np.zeros(num_classes)
lr=0.1
logistic_loss=[]
z=np.zeros((num_samples,num_classes))
q=np.zeros((num_samples,num_classes))

for i in range(n_iters):
    z=???

    q = ???
    q = ???
    q = ???

    loss = sum(-(y[0]*np.log2(q[:,0])+(1-y[0])*np.log2(1-q[:,0])))/num_samples
    loss += sum(-(y[1]*np.log2(q[:,1])+(1-y[1])*np.log2(1-q[:,1])))/num_samples
    loss += sum(-(y[2]*np.log2(q[:,2])+(1-y[2])*np.log2(1-q[:,2])))/num_samples
    logistic_loss.append(loss)

    dw = ???
    db = ???


    weights = ???
    bias = ???

plt.plot(range(1,n_iters),logistic_loss[1:])
plt.xlabel("number of epochs")
plt.ylabel("loss")

### Multilayer neural networks

Problem 4: Given the weights and the input vector below, write the code to calculate the predicted output value y_hat.

In [None]:

import numpy as np
weights_0_1=np.array([[-.59,.75,-.95],[.34,-.17,.12],[-.72,-.6,.6]])
weights_1_2=np.array([[.93],[-.37],[.38]])
alpha=0.2
layer_0 = np.array([[ 0, 1, 1 ]])



Problem 5: Given the true label y=1 as specified below, write the code to perform the backwards pass and update the weights.

In [None]:
true_label = np.array([[1]])

Problem 6: Given the training data below, complete the code to train the network to solve the problem. You can use the code in the block below the training block to check that the model's predictions for the training inputs are approximately correct.

In [None]:
inputs = np.array( [[ 0, 0, 1 ],
                          [ 0, 1, 1 ],
                          [ 1, 0, 1 ],
                          [ 1, 1, 1 ] ] )

true_labels = np.array([ [0], [1], [1], [0]])

In [None]:
import matplotlib.pyplot as plt

alpha = 0.2
num_features=3
hidden_size = 3
np.random.seed(1)
weights_0_1 = np.random.rand(num_features,hidden_size)
np.random.seed(1)
weights_1_2 = np.random.rand(hidden_size,1)

loss = []
n_iters=1000
for iteration in range(n_iters):
   layer_2_error = 0
   for i in range(len(inputs)):
      layer_0 = inputs[i]
      ## Add forward pass
      ## Add backward pass and update weights

   loss.append(layer_2_error)
plt.plot(range(1,n_iters),loss[1:])
plt.xlabel("number of epochs")
plt.ylabel("loss")

In [None]:
for k in range(4):
  layer_0 = inputs[k]
  layer_1 = np.maximum(np.dot(layer_0,weights_0_1),0)
  layer_2 = np.dot(layer_1,weights_1_2)
  print(layer_2)