# Multi-Layer Neural Network using Sigmoid & Bias on Toy Problem 3

In exercise 670, we built a multi-layer classifier for Toy Problem 3 and used **the ReLU as the activation function**.

We saw that the classifier could get to 97% accuracy on Toy Problem 3.

In exercise 680, we saw that if we **used a sigmoid** instead of the ReLU, the classifier would not learn anything.

Now, we use a sigmoid again, but also use a bias in addition to the weights.

We will see that the bias gives the classifier more power: it can now learn the XOR function of Toy Problem 3.
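One way to see why a bias can matter with a sigmoid is the identity sigmoid(-z) = 1 - sigmoid(z). The sketch below assumes a bias-free network with the same two-layer shape as in this exercise (random weights, made-up input points) and checks that the score gap between the two classes at a point `x` and at its mirror image `-x` always adds up to the same constant. If a point and its mirror image share a label, as the XOR-style labels suggest, such a network cannot place both points of a mirrored pair on the correct side for both classes.

```python
import torch

torch.manual_seed(0)

# Bias-free two-layer network with random weights (illustrative only).
weights1 = torch.rand(2, 4)
weights2 = torch.rand(4, 2)

def score_gap(x):
    """Score of class 0 minus score of class 1, with no bias terms."""
    hidden = torch.sigmoid(x.mm(weights1))
    out = hidden.mm(weights2)
    return out[:, 0] - out[:, 1]

# Made-up points; small values keep the sigmoid away from saturation.
x = torch.tensor([[-0.86, 0.96], [0.38, -0.54], [-0.92, -0.88]])

gap_sum = score_gap(x) + score_gap(-x)
constant = (weights2[:, 0] - weights2[:, 1]).sum()

print(gap_sum)   # the same value for every point
print(constant)  # equals that value
```

Adding a bias to each layer breaks this symmetry, which is what the rest of this exercise relies on.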

We've provided a utility class 'Data' (in data_reader.py) to load the training data (it works for all the toy problems).

In [1]:
import torch
import torch.nn.functional as F
from data_reader import Data

data = Data("data/toy_problem_3_train.txt")

labels, features = data.get_sample()

print("Labels:\n"+str(labels))

print("Features:\n"+str(features))
    
target = torch.autograd.Variable(torch.LongTensor(labels))
#print("Labels Tensor:\n"+str(target))

features = torch.autograd.Variable(torch.Tensor(features))
#print("Features Tensor:\n"+str(features))

Labels:
[0, 0, 1, 1, 0, 1, 0, 0, 1, 1]
Features:
[[-92, -88], [-18, -16], [-86, 96], [-7, 8], [-29, -85], [38, -54], [23, 1], [-11, -49], [77, -44], [-12, 16]]
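If `data_reader.py` and the data files are not at hand, a stand-in along the following lines can be used to experiment. This is only a sketch: the class name `ToyXorData`, the value range, the default sample size of 10, and the labelling rule (label 1 when the two features have opposite signs) are assumptions inferred from the sample printed above, not the actual implementation of `Data`.

```python
import random

class ToyXorData:
    """Hypothetical stand-in for the provided Data class (not the real one)."""

    def __init__(self, low=-100, high=100, seed=0):
        self.rng = random.Random(seed)
        self.low = low
        self.high = high

    def get_sample(self, n=10):
        """Return (labels, features) as plain Python lists."""
        features = [
            [self.rng.randint(self.low, self.high),
             self.rng.randint(self.low, self.high)]
            for _ in range(n)
        ]
        # Assumed rule: label 1 when the two coordinates have opposite signs.
        labels = [1 if x * y < 0 else 0 for x, y in features]
        return labels, features

labels, features = ToyXorData().get_sample(5)
print("Labels:", labels)
print("Features:", features)
```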


We initialize the weights and biases (one set of weights and biases per layer) randomly.

In [2]:
middle = 4

weights1 = torch.nn.Parameter(torch.rand(2, middle))
print("Weights1 => "+str(weights1))

bias1 = torch.nn.Parameter(torch.rand(1, middle))
print("Bias1 => "+str(bias1))

weights2 = torch.nn.Parameter(torch.rand(middle, 2))
print("Weights2 => "+str(weights2))

bias2 = torch.nn.Parameter(torch.rand(1,2))
print("Bias2 => "+str(bias2))



Weights1 => Parameter containing:
 0.3077  0.7570  0.8574  0.7307
 0.1103  0.6138  0.7258  0.7506
[torch.FloatTensor of size 2x4]

Bias1 => Parameter containing:
 0.9492  0.6303  0.3055  0.0644
[torch.FloatTensor of size 1x4]

Weights2 => Parameter containing:
 0.0210  0.0807
 0.9764  0.7614
 0.7940  0.0360
 0.3574  0.7422
[torch.FloatTensor of size 4x2]

Bias2 => Parameter containing:
 0.0214  0.4595
[torch.FloatTensor of size 1x2]
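To check how these shapes fit together, here is a small sketch of the forward pass on a batch of three points taken from the sample printed earlier. The variable names `hidden` and `scores` are chosen here for illustration; the `1 x middle` bias is broadcast across all rows of the batch.

```python
import torch

middle = 4

# Three points from the training sample, as a 3 x 2 batch.
features = torch.tensor([[-92.0, -88.0], [-86.0, 96.0], [38.0, -54.0]])

weights1 = torch.rand(2, middle)
bias1 = torch.rand(1, middle)
weights2 = torch.rand(middle, 2)
bias2 = torch.rand(1, 2)

# (3 x 2) times (2 x 4) gives (3 x 4); the 1 x 4 bias is added to every row.
hidden = torch.sigmoid(features.mm(weights1) + bias1)

# (3 x 4) times (4 x 2) gives (3 x 2): one score per class for each point.
scores = hidden.mm(weights2) + bias2

print(hidden.shape)
print(scores.shape)
```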



The cell below runs 1001 learning iterations (about 1000), and it can be re-run as many times as we want to continue training.

Notice that the code for the learning iterations is almost identical to that of exercise 630, but here we use PyTorch's Adam optimizer class to nudge the weights and biases in the direction they must go.
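The three optimizer-related calls (`zero_grad`, `backward`, `step`) are easier to follow on a problem with a single parameter. The sketch below is not part of the exercise: it uses a made-up objective, `(w - 3)^2`, and an arbitrary learning rate, and lets Adam move `w` toward the minimum at 3.

```python
import torch

# A single trainable parameter, starting at 0.
w = torch.nn.Parameter(torch.tensor([0.0]))
optimizer = torch.optim.Adam([w], lr=0.1)

for step in range(500):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = ((w - 3.0) ** 2).sum()  # made-up objective with its minimum at w = 3
    loss.backward()                # compute d(loss)/dw
    optimizer.step()               # let Adam update w using that gradient

print(w.item())
```

The training loop in this exercise follows the same pattern, with the cross-entropy loss in place of the toy objective and four parameter tensors in place of `w`.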

In [3]:
optimizer = torch.optim.Adam([weights1, weights2, bias1, bias2], lr=0.01)

for i in range(1001):
    optimizer.zero_grad()   # zero the gradient buffers
    
    labels, features = data.get_sample(1000)
    
    features = torch.autograd.Variable(torch.Tensor(features))
    #print("Features: "+str(features))
    
    target = torch.autograd.Variable(torch.LongTensor(labels))
    #print("Target: "+str(target))
    
    result = features.mm(weights1) + bias1
    result1 = torch.sigmoid(result)  # F.sigmoid is deprecated in newer PyTorch
    result2 = result1.mm(weights2) + bias2
    
    loss = F.cross_entropy(result2, target)
    #print("Cross entropy loss: "+str(loss))

    loss.backward()
    
    optimizer.step()
        
    if i % 10 == 0:
        print("The loss is now "+str(loss.item()))  # loss.data[0] fails on 0-dim tensors in newer PyTorch

torch.save(weights1, "models/toy_problem_3_trained_deep_model_weights1.bin")
torch.save(weights2, "models/toy_problem_3_trained_deep_model_weights2.bin")
torch.save(bias1, "models/toy_problem_3_trained_deep_model_bias1.bin")
torch.save(bias2, "models/toy_problem_3_trained_deep_model_bias2.bin")

The loss is now 0.7097412943840027
The loss is now 0.694894552230835
The loss is now 0.6865906119346619
The loss is now 0.6885678172111511
The loss is now 0.6736515164375305
The loss is now 0.6638016104698181
The loss is now 0.6568567156791687
The loss is now 0.638590395450592
The loss is now 0.6303065419197083
The loss is now 0.6224498152732849
The loss is now 0.6118046045303345
The loss is now 0.5993473529815674
The loss is now 0.5863654017448425
The loss is now 0.5827643871307373
The loss is now 0.5612202882766724
The loss is now 0.560285210609436
The loss is now 0.5578759908676147
The loss is now 0.5514324307441711
The loss is now 0.5636094808578491
The loss is now 0.5348323583602905
The loss is now 0.5476241707801819
The loss is now 0.5421059727668762
The loss is now 0.5163143277168274
The loss is now 0.5056638121604919
The loss is now 0.48068585991859436
The loss is now 0.4549301266670227
The loss is now 0.4588942527770996
The loss is now 0.4278877377510071
The loss is now 0.4427

## The Loss

Observe the loss, which is printed once every 10 iterations.

This time the loss does decrease: from about 0.71 at the start to about 0.44 at the last value shown.

This tells us that the machine learning algorithm is learning.
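To put these numbers in context, the sketch below recomputes the cross-entropy by hand on a few made-up scores and shows the loss of a two-class classifier that gives both classes equal scores, which is ln 2 (about 0.693). The first loss printed above is close to that value, so the decrease that follows reflects actual learning. The scores and targets here are hypothetical.

```python
import math
import torch
import torch.nn.functional as F

# Made-up scores for three samples and their true classes.
scores = torch.tensor([[2.0, -1.0], [0.0, 0.0], [-0.5, 1.5]])
target = torch.tensor([0, 1, 1])

# Built-in loss, as used in the training loop.
built_in = F.cross_entropy(scores, target)

# The same quantity by hand: mean negative log-probability of the true class.
log_probs = F.log_softmax(scores, dim=1)
by_hand = -log_probs[torch.arange(3), target].mean()

# A classifier that scores both classes equally assigns probability 0.5 each.
uninformed = F.cross_entropy(torch.zeros(4, 2), torch.tensor([0, 1, 0, 1]))

print(built_in.item(), by_hand.item())
print(uninformed.item(), math.log(2))
```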

## Parameters

We can now print the weights and the biases.

In [4]:
print("The first layer weights are now "+str(weights1.data))
print("and the second layer's weights are now "+str(weights2.data))
print("The first layer bias is now "+str(bias1.data))
print("and the second layer's bias is now "+str(bias2.data))

The first layer weights are now 
 0.0454  0.0642  0.3122  0.2164
 0.0561  0.0568  0.2685  0.1900
[torch.FloatTensor of size 2x4]

and the second layer's weights are now 
-2.6972  2.7989
 3.8796 -2.1417
 1.4229 -0.5929
-0.5432  1.6427
[torch.FloatTensor of size 4x2]

The first layer bias is now 
 4.7242 -5.3183 -4.5289  4.4326
[torch.FloatTensor of size 1x4]

and the second layer's bias is now 
 2.3795 -1.8987
[torch.FloatTensor of size 1x2]
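As a worked example, the sketch below plugs the printed parameters into the forward pass for two points from the training sample shown earlier, `[-92, -88]` (label 0) and `[-86, 96]` (label 1). The values are the rounded ones from the printout, so the scores are approximate.

```python
import torch

# Rounded parameter values copied from the printout above.
weights1 = torch.tensor([[0.0454, 0.0642, 0.3122, 0.2164],
                         [0.0561, 0.0568, 0.2685, 0.1900]])
bias1 = torch.tensor([[4.7242, -5.3183, -4.5289, 4.4326]])
weights2 = torch.tensor([[-2.6972, 2.7989],
                         [3.8796, -2.1417],
                         [1.4229, -0.5929],
                         [-0.5432, 1.6427]])
bias2 = torch.tensor([[2.3795, -1.8987]])

# Two points from the training sample; their labels are 0 and 1.
points = torch.tensor([[-92.0, -88.0], [-86.0, 96.0]])

hidden = torch.sigmoid(points.mm(weights1) + bias1)  # hidden-layer activations
scores = hidden.mm(weights2) + bias2                 # one score per class
predicted = scores.argmax(dim=1)                     # class with the larger score

print(hidden)
print(scores)
print(predicted)
```

Comparing `predicted` with the labels 0 and 1 shows how the large learned biases shift the sigmoid units so that they respond differently in different regions of the input plane.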



## Classifier Test - Toy Problem 3

We have just trained a multilayer classifier for Toy Problem 3.

It seems to be learning (the loss on the training data decreases).

Let us evaluate the performance of the classifier on the test data.

In [6]:
data = Data("data/toy_problem_3_test.txt")

weights1 = torch.load("models/toy_problem_3_trained_deep_model_weights1.bin")
print(weights1)
weights2 = torch.load("models/toy_problem_3_trained_deep_model_weights2.bin")
print(weights2)
bias1 = torch.load("models/toy_problem_3_trained_deep_model_bias1.bin")
print(bias1)
bias2 = torch.load("models/toy_problem_3_trained_deep_model_bias2.bin")
print(bias2)

labels, features = data.get_all()

features = torch.autograd.Variable(torch.Tensor(features))
#print(features)

target = torch.autograd.Variable(torch.LongTensor(labels))
#print(target)

result = torch.mm(features, weights1) + bias1
result1 = torch.sigmoid(result)  # F.sigmoid is deprecated in newer PyTorch
result2 = torch.mm(result1, weights2) + bias2
#print(result2)

maxv, observed = torch.max(result2, 1)

total = 0
correct = 0
for i in range(len(labels)):
    total += 1
    #print(str(target.data[i]) + " " + str(observed.data[i]))
    if target.data[i] == observed.data[i]:
        correct += 1
accuracy = correct / total
print("Accuracy: "+str(accuracy))

Parameter containing:
 0.0454  0.0642  0.3122  0.2164
 0.0561  0.0568  0.2685  0.1900
[torch.FloatTensor of size 2x4]

Parameter containing:
-2.6972  2.7989
 3.8796 -2.1417
 1.4229 -0.5929
-0.5432  1.6427
[torch.FloatTensor of size 4x2]

Parameter containing:
 4.7242 -5.3183 -4.5289  4.4326
[torch.FloatTensor of size 1x4]

Parameter containing:
 2.3795 -1.8987
[torch.FloatTensor of size 1x2]

Accuracy: 0.861


As you can see, the accuracy (86.1%) is well above 50%.

With bias terms, the classifier fares better than the bias-free sigmoid network of exercise 680, which did not learn anything.

This tells us that the multi-layer neural network (with a bias term) **is able to learn the non-linear XOR function using the sigmoid activation function**.
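The accuracy loop in the test cell can also be written without an explicit Python loop. The sketch below uses made-up scores and targets to show the idea; `argmax` picks the class with the larger score, just as the second value returned by `torch.max(result2, 1)` does.

```python
import torch

# Made-up class scores for four samples, and their true classes.
scores = torch.tensor([[2.3, -1.9], [-0.8, 2.5], [0.1, 0.4], [1.2, 0.3]])
target = torch.tensor([0, 1, 0, 0])

# Predicted class = index of the larger score in each row.
observed = scores.argmax(dim=1)

# Fraction of predictions that match the targets.
accuracy = (observed == target).float().mean().item()
print("Accuracy: " + str(accuracy))
```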
