# Multi-Layer Neural Network using Sigmoid & Bias on Toy Problem 3

In exercise 670, we built a multi-layer classifier for Toy Problem 3 and used **the ReLU as the activation function**.

We saw that the classifier could get to 97% accuracy on Toy Problem 3.

In exercise 680, we saw that if we **used a sigmoid** instead of the ReLU, the classifier would not learn anything.

Now we use a sigmoid again, but this time each layer also has a bias term in addition to its weights.

We will see that the bias gives the classifier more power: it learns the XOR function of Toy Problem 3.
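To see why the bias matters, here is a minimal sketch of a 2-2-1 sigmoid network with hand-picked weights and biases that computes XOR on binary inputs. It is written in plain Python rather than PyTorch. The function names and numbers are our own illustrative choices (a standard textbook construction). They are not the parameters learned in this notebook, and the inputs are {0, 1} pairs rather than Toy Problem 3's data. Without the bias terms (-10, 30 and -30 below), every hidden unit would output exactly sigmoid(0) = 0.5 at the input (0, 0), whatever its weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_net(x1, x2):
    # Hidden unit 1 behaves like OR: its bias of -10 keeps it off only for (0, 0).
    h1 = sigmoid(20 * x1 + 20 * x2 - 10)
    # Hidden unit 2 behaves like NAND: its bias of +30 keeps it on except for (1, 1).
    h2 = sigmoid(-20 * x1 - 20 * x2 + 30)
    # Output unit behaves like AND of the two hidden units (bias -30).
    return sigmoid(20 * h1 + 20 * h2 - 30)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xor_net(x1, x2)))  # prints the XOR truth table
```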

We've provided a utility class, `Data` (in `data_reader.py`), to load the training data; it works for all of the toy problems.

In [1]:
import torch
import torch.nn.functional as F
from data_reader import Data

data = Data("data/toy_problem_3_train.txt")

labels, features = data.get_sample()

print("Labels:\n"+str(labels))

print("Features:\n"+str(features))
    
target = torch.tensor(labels, dtype=torch.long)
#print("Labels Tensor:\n"+str(target))

features = torch.tensor(features, dtype=torch.float32)
#print("Features Tensor:\n"+str(features))

Labels:
[0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
Features:
[[-30, -49], [57, 97], [16, 5], [80, -1], [32, -49], [-73, -82], [28, -85], [89, -29], [44, -46], [83, -49]]


We initialize the weights and biases (one set of weights and biases per layer) randomly.
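The cell below stores each layer's parameters as a weight matrix and a one-row bias. Here is a rough sketch of what the two layers then compute, with plain Python lists standing in for tensors; `init`, `layer` and `forward` are hypothetical helper names, not part of `data_reader` or PyTorch. Each sample `x` with 2 features becomes `middle` hidden activations `sigmoid(x W1 + b1)`, and then 2 class scores `h W2 + b2`. The single bias row is added to every sample, which is what PyTorch's broadcasting does with a `1 × middle` bias tensor.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init(rows, cols):
    # Uniform values in [0, 1), similar to torch.rand(rows, cols).
    return [[random.random() for _ in range(cols)] for _ in range(rows)]

def layer(x, weights, bias):
    # One sample x times the weight matrix, plus the (single) bias row.
    return [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[0][j]
            for j in range(len(bias[0]))]

def forward(batch, w1, b1, w2, b2):
    logits = []
    for x in batch:
        hidden = [sigmoid(z) for z in layer(x, w1, b1)]  # `middle` activations
        logits.append(layer(hidden, w2, b2))            # 2 class scores
    return logits

middle = 4
w1, b1 = init(2, middle), init(1, middle)
w2, b2 = init(middle, 2), init(1, 2)

batch = [[-30, -49], [57, 97], [16, 5]]  # first three samples printed above
scores = forward(batch, w1, b1, w2, b2)
print(len(scores), len(scores[0]))  # 3 2
```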

In [2]:
middle = 4

weights1 = torch.nn.Parameter(torch.rand(2, middle))
print("Weights1 => "+str(weights1))

bias1 = torch.nn.Parameter(torch.rand(1, middle))
print("Bias1 => "+str(bias1))

weights2 = torch.nn.Parameter(torch.rand(middle, 2))
print("Weights2 => "+str(weights2))

bias2 = torch.nn.Parameter(torch.rand(1,2))
print("Bias2 => "+str(bias2))



Weights1 => Parameter containing:
tensor([[ 0.7704,  0.6615,  0.6316,  0.8493],
        [ 0.6410,  0.6116,  0.6457,  0.7928]])
Bias1 => Parameter containing:
tensor([[ 0.9962,  0.6279,  0.1134,  0.6034]])
Weights2 => Parameter containing:
tensor([[ 0.7390,  0.6564],
        [ 0.2585,  0.0130],
        [ 0.3158,  0.5311],
        [ 0.8083,  0.1721]])
Bias2 => Parameter containing:
tensor([[ 0.4459,  0.6267]])


The cell below performs about 1000 learning iterations (`range(1001)`), and we can re-run it as many times as we want to keep training.

Notice that the code for the learning iterations is almost identical to that of exercise 630, except that we use PyTorch's Adam optimizer (`torch.optim.Adam`) to nudge the weights and biases in the direction they must go.
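For reference, here is a minimal single-parameter sketch of the update that `torch.optim.Adam` applies with its default settings (betas=(0.9, 0.999), eps=1e-8, no weight decay). The `adam_minimize` helper and the toy loss (p - 3)² are hypothetical, chosen only for illustration. PyTorch applies the same rule element-wise to every tensor passed to the optimizer, here `weights1`, `weights2`, `bias1` and `bias2`.

```python
import math

def adam_minimize(grad, p, lr=0.1, steps=300, beta1=0.9, beta2=0.999, eps=1e-8):
    m, v = 0.0, 0.0  # running averages of the gradient and of its square
    for t in range(1, steps + 1):
        g = grad(p)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero start values
        v_hat = v / (1 - beta2 ** t)
        p -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return p

# Minimise (p - 3)^2, whose gradient is 2 * (p - 3), starting from p = 0.
p = adam_minimize(lambda q: 2 * (q - 3), 0.0)
print(p)
```

Because each step is divided by the square root of `v_hat`, the first steps are about `lr` in size regardless of how large the raw gradient is.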

In [3]:
optimizer = torch.optim.Adam([weights1, weights2, bias1, bias2], lr=0.01)

for i in range(1001):
    optimizer.zero_grad()   # zero the gradient buffers
    
    labels, features = data.get_sample(1000)
    
    features = torch.tensor(features, dtype=torch.float32)
    #print("Features: "+str(features))
    
    target = torch.tensor(labels, dtype=torch.long)
    #print("Target: "+str(target))
    
    result = features.mm(weights1) + bias1   # first layer: weights plus bias
    result1 = torch.sigmoid(result)          # sigmoid activation
    result2 = result1.mm(weights2) + bias2   # second layer: class scores (logits)
    
    loss = F.cross_entropy(result2, target)
    #print("Cross entropy loss: "+str(loss))

    loss.backward()
    
    optimizer.step()
        
    if i % 10 == 0:
        print("The loss is now "+str(loss.item()))

torch.save(weights1, "models/toy_problem_3_trained_deep_model_weights1.bin")
torch.save(weights2, "models/toy_problem_3_trained_deep_model_weights2.bin")
torch.save(bias1, "models/toy_problem_3_trained_deep_model_bias1.bin")
torch.save(bias2, "models/toy_problem_3_trained_deep_model_bias2.bin")

The loss is now 0.722926914691925
The loss is now 0.6985739469528198
The loss is now 0.6916922330856323
The loss is now 0.6909358501434326
The loss is now 0.6848089694976807
The loss is now 0.68853360414505
The loss is now 0.688930094242096
The loss is now 0.6906450986862183
The loss is now 0.6858261823654175
The loss is now 0.6870893836021423
The loss is now 0.6844713687896729
The loss is now 0.686631977558136
The loss is now 0.6817602515220642
The loss is now 0.6831540465354919
The loss is now 0.6796778440475464
The loss is now 0.6721691489219666
The loss is now 0.6748241782188416
The loss is now 0.671863853931427
The loss is now 0.6672892570495605
The loss is now 0.6181325316429138
The loss is now 0.611628532409668
The loss is now 0.5935599207878113
The loss is now 0.5560938715934753
The loss is now 0.5563618540763855
The loss is now 0.5367292165756226
The loss is now 0.5396347641944885
The loss is now 0.5033348798751831
The loss is now 0.5050330758094788
The loss is now 0.459596604

## The Loss

Observe the loss that is printed every 10 iterations.

Unlike in exercise 680, the loss does decrease this time: it falls from about 0.72 to about 0.46 over the iterations shown.

This tells us that the machine learning algorithm is learning.
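As a reference point for these numbers: for two classes, a network that gives both classes equal scores has a cross-entropy of ln 2 ≈ 0.693. The loss stays close to this value for roughly the first 200 iterations before it starts to fall. Below is a minimal plain-Python sketch of the batch-averaged cross-entropy that `F.cross_entropy` computes from raw scores (log-softmax, then the negative log-probability of the target class). The `cross_entropy` helper is our own illustration, not PyTorch code.

```python
import math

def cross_entropy(logits, targets):
    # Mean over the batch of -log(softmax(scores)[target]), F.cross_entropy's default reduction.
    total = 0.0
    for scores, target in zip(logits, targets):
        top = max(scores)  # subtract the max before exp() for numerical stability
        log_sum = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_sum - scores[target]
    return total / len(targets)

print(cross_entropy([[0.0, 0.0], [1.5, 1.5]], [0, 1]))  # equal scores: ln 2 ≈ 0.6931
print(cross_entropy([[4.0, -4.0]], [0]))                 # confident and correct: ≈ 0.0003
```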

## Parameters

We can now print the weights and the biases.

In [4]:
print("The first layer weights are now "+str(weights1.data))
print("and the second layer's weights are now "+str(weights2.data))
print("The first layer bias is now "+str(bias1.data))
print("and the second layer's bias is now "+str(bias2.data))

The first layer weights are now tensor([[ 0.0815,  0.0523,  0.0835,  0.2602],
        [ 0.0701,  0.0548,  0.4841,  0.7988]])
and the second layer's weights are now tensor([[-1.9810,  3.3764],
        [ 3.3219, -3.0504],
        [-1.2035,  2.0503],
        [ 2.1574, -1.1770]])
The first layer bias is now tensor([[ 4.7902, -4.3313,  5.1664, -5.1487]])
and the second layer's bias is now tensor([[ 2.1386, -1.0660]])


## Classifier Test - Toy Problem 3

We have just trained a multilayer classifier for Toy Problem 3.

It seems to be learning (the loss on the training data decreases).

Let us evaluate the performance of the classifier on the test data.
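The evaluation below takes, for each test sample, the class with the larger of the two scores, and counts how often it matches the label. `torch.max(result2, 1)` returns both the maximum values and their indices. Here is a plain-Python sketch of that step, with a hypothetical `accuracy` helper and made-up scores:

```python
def accuracy(logits, labels):
    # Predicted class = index of the largest score in each row (argmax).
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

scores = [[2.0, -1.0], [0.3, 0.9], [1.2, 0.4], [-0.5, 0.5]]
print(accuracy(scores, [0, 1, 1, 1]))  # 3 of 4 correct -> 0.75
```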

In [5]:
data = Data("data/toy_problem_3_test.txt")

weights1 = torch.load("models/toy_problem_3_trained_deep_model_weights1.bin")
print(weights1)
weights2 = torch.load("models/toy_problem_3_trained_deep_model_weights2.bin")
print(weights2)
bias1 = torch.load("models/toy_problem_3_trained_deep_model_bias1.bin")
print(bias1)
bias2 = torch.load("models/toy_problem_3_trained_deep_model_bias2.bin")
print(bias2)

labels, features = data.get_all()

features = torch.tensor(features, dtype=torch.float32)
#print(features)

target = torch.tensor(labels, dtype=torch.long)
#print(target)

with torch.no_grad():  # gradients are not needed for evaluation
    result = torch.mm(features, weights1) + bias1
    result1 = torch.sigmoid(result)
    result2 = torch.mm(result1, weights2) + bias2
#print(result2)

# The predicted class is the index of the larger score in each row
maxv, observed = torch.max(result2, 1)

total = len(labels)
correct = (observed == target).sum().item()
accuracy = correct / total
print("Accuracy: "+str(accuracy))

tensor([[ 0.0815,  0.0523,  0.0835,  0.2602],
        [ 0.0701,  0.0548,  0.4841,  0.7988]])
tensor([[-1.9810,  3.3764],
        [ 3.3219, -3.0504],
        [-1.2035,  2.0503],
        [ 2.1574, -1.1770]])
tensor([[ 4.7902, -4.3313,  5.1664, -5.1487]])
tensor([[ 2.1386, -1.0660]])
Accuracy: 0.843


As you can see, the accuracy (84.3%) is well above 50%.

With bias terms, the classifier fares much better than the sigmoid network without biases in exercise 680, which did not learn anything.

It tells us that the multi-layer neural network (with a bias term) **is able to learn the non-linear XOR function using the sigmoid activation function**.
