# Week 8 Seminar: From Binary Logistic Regression to Multiclass Multilayer Neural Networks - part 1





This week we are going to look at multiclass classification and start looking at multilayer networks.

### Multiclass classification problems

While logistic regression is great for binary classification tasks, many classification problems have more than two possible outcomes. We can simulate such a situation as follows. Here I have generalised sentiment analysis to a three-class problem: negative, neutral and positive.



In [1]:
import numpy as np
## Create simulated data
np.random.seed(10)
w1_center = (1, 3)
w2_center = (3, 1)
w3_center = (1, 1)
w4_center = (3, 3)

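# 60 points in total: 20 around w1, 20 around w2, and 10 each around w3 and w4.
# The last two clusters together form the third class.
x = np.concatenate((
    np.random.normal(loc=w1_center, size=(20, 2)),
    np.random.normal(loc=w2_center, size=(20, 2)),
    np.random.normal(loc=w3_center, size=(10, 2)),
    np.random.normal(loc=w4_center, size=(10, 2)),
))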
labs=np.repeat([0,1,2],[20,20,20],axis=0)
y=np.repeat(np.diag((1,1,1)),[20,20,20],axis=0)
x=x.T
y=y.T

In [None]:
import matplotlib.pyplot as plt

plt.scatter(x[0][labs==0], x[1][labs==0], marker='*', s=100)
plt.scatter(x[0][labs==1], x[1][labs==1], marker='o', s=100)
plt.scatter(x[0][labs==2], x[1][labs==2], marker='x', s=100)
plt.xlabel("log count of negative words")
plt.ylabel("log count of positive words")
plt.xlim((0,5))
plt.ylim((0,5))



### Softmax
In such circumstances we need to use multinomial logistic (aka softmax) regression.

In logistic regression we take the dot product between the feature vector for each data point and our weight vector. We then add the bias to give a single z value, which we feed through the sigmoid function. A single z value is enough because there are only two outcomes and the following relationship holds:

$p(y=0|x) = 1 - p(y=1|x)$
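
As a quick numerical check of the relationship above, here is a minimal sketch. The feature values, weights and bias are made up for illustration and do not come from any fitted model.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical feature vector, weights and bias (illustrative values only)
features = np.array([2.0, 1.0])
weights = np.array([0.5, -1.0])
bias = 0.25

z = np.dot(weights, features) + bias  # the single z value
p1 = sigmoid(z)                       # p(y=1|x)
p0 = 1.0 - p1                         # p(y=0|x), obtained from the complement
print(z, p1, p0)
```

Because the two probabilities must sum to 1, the second one carries no extra information, which is why binary logistic regression only needs one z.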

In multinomial regression we instead have a z value for each of our possible outcomes. We use these collectively to calculate a probability for each outcome. For example, if we had three possible outcomes, 0, 1 or 2, then we would calculate their probabilities as follows:

$p(y=0|x) = \frac{\exp(z_{0})}{\sum_{i=0}^{2} \exp(z_i)}$ \\
$p(y=1|x) = \frac{\exp(z_{1})}{\sum_{i=0}^{2} \exp(z_i)}$ \\
$p(y=2|x) = \frac{\exp(z_{2})}{\sum_{i=0}^{2} \exp(z_i)}$ \\
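
The formulas above can be sketched as a small function. This is one possible way to write it, not the only one; the z values below are made up and are not the answer to any of the problems. Subtracting the maximum z before exponentiating is a common trick to avoid overflow and does not change the result.

```python
import numpy as np

def softmax(z):
    """Turn a vector of z values into probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    exp_z = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return exp_z / exp_z.sum()

z_example = np.array([1.0, 2.0, 3.0])  # made-up z values for classes 0, 1 and 2
probs = softmax(z_example)
print(probs)        # one probability per class, largest for the largest z
print(probs.sum())  # the probabilities sum to 1
```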


Problem 1: A fitted model might return the weights given below. In Python, calculate the probabilities of each of the output classes for the following inputs. \\


a) x[0] (positive words) = 10, x[1] (negative words) = 3 \\
b) x[0] (positive words) = 3, x[1] (negative words) = 3 \\
c) x[0] (positive words) = 1, x[1] (negative words) = 6 \\


In [None]:
bias_negative=-0.82031125
bias_positive=-0.451126
bias_neutral = 1.27143725

weights_negative = np.array([-0.69900716, 1.81182487])
weights_positive = np.array([1.7979912 , -0.74611263])
weights_neutral = np.array([0.80449184, -0.07135976])

Note: very small probabilities are displayed in scientific notation by default. For convenience, you can print a float without scientific notation using the function np.format_float_positional, as in the following:

In [None]:
# Use a separate name so the simulated data in x is not overwritten
small=1/783618
small

In [None]:
np.format_float_positional(small)

### Representing multinomial logistic regression problems

In multinomial logistic regression we have multiple outcome classes. In place of the single 0 or 1 that we used as the outcome in binary logistic regression, we represent the outcome as a one-hot vector: a vector of 0s with a single 1, where each position in the vector corresponds to one of the output classes.

positive = [1,0,0] \
negative = [0,1,0] \
neutral = [0,0,1]
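
As an illustration of this encoding, the sketch below converts integer class labels into one-hot rows and back. The labels are made up, and the index order (0 = positive, 1 = negative, 2 = neutral) is assumed from the three vectors listed above.

```python
import numpy as np

# Made-up integer labels; assumed order: 0 = positive, 1 = negative, 2 = neutral
class_labels = np.array([0, 2, 1, 0])

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(3, dtype=int)[class_labels]
print(one_hot)

# argmax recovers the integer label from each one-hot row
recovered = one_hot.argmax(axis=1)
print(recovered)
```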

This is how the y variable looks in our simulated data:

In [None]:
y

In [None]:
y.T[1:20]

### Exclusive OR problem

Problem 2: Create the data for the AND, OR, and XOR functions. Fit a logistic regression to each of these problems using the code you have developed in previous weeks and then inspect the output. What do you see?

In [None]:
x=np.array([])
y=np.array([])

### Your first multilayer network

Problem 3: Enter the weights from the example multilayer network from the lecture and demonstrate that it can solve the XOR problem.
Remember that you have two sets of weights, those from layer 0 to layer 1 and those from layer 1 to layer 2, and that the former is a matrix, not a vector.
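
To show the shape of such a computation, here is a minimal sketch of a forward pass through a two-layer network. The weights are an assumption: they are one well-known ReLU solution to XOR (the one given in Goodfellow et al.'s Deep Learning textbook) and may differ from the weights used in the lecture. The names `W1`, `b1`, `w2`, `b2` and `xor_inputs` are chosen here for illustration.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: negative values become 0."""
    return np.maximum(0, z)

# Layer 0 -> layer 1: a 2x2 weight matrix (one row per hidden unit) and one bias per hidden unit
# (assumed weights, not necessarily those from the lecture)
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])

# Layer 1 -> layer 2: a weight vector and a single bias
w2 = np.array([1.0, -2.0])
b2 = 0.0

# All four XOR inputs, one per row
xor_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

hidden = relu(xor_inputs @ W1.T + b1)  # hidden layer activations, shape (4, 2)
outputs = hidden @ w2 + b2             # one output per input row
print(outputs)
```

The hidden layer maps the inputs (0,1) and (1,0) to the same point, which is what lets a single linear output unit separate the XOR classes.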
