# ReLU & Bias on Toy Problem 3

In exercise 690, we saw that if we **used a sigmoid** with bias parameters, the classifier could learn the XOR function.

Now we replace the sigmoid with a ReLU activation, keeping the bias terms alongside the weights.

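We expect the bias terms to give the classifier more power: without them, each hidden unit's ReLU can only switch on or off along a line through the origin.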
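One way to see why the bias matters: a bias-free ReLU network satisfies f(a·x) = a·f(x) for any a > 0, so rescaling an input can never change its predicted class. The sketch below checks this numerically. The helper name `forward` and the random weights are illustrative assumptions, not part of the exercise code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def forward(x, w1, w2, b1=None, b2=None):
    """Two-layer ReLU network; biases are optional (hypothetical helper)."""
    hidden = x.mm(w1)
    if b1 is not None:
        hidden = hidden + b1
    out = F.relu(hidden).mm(w2)
    if b2 is not None:
        out = out + b2
    return out

x = torch.tensor([[48.0, 81.0], [-39.0, 75.0]])  # two points from the training sample
w1, w2 = torch.rand(2, 4), torch.rand(4, 2)
b1, b2 = torch.rand(1, 4), torch.rand(1, 2)

# Without biases, doubling the input doubles the scores,
# so the arg-max (the predicted class) cannot change.
no_bias_scales = torch.allclose(forward(2 * x, w1, w2), 2 * forward(x, w1, w2))

# With biases, this scaling property no longer holds in general.
with_bias_scales = torch.allclose(forward(2 * x, w1, w2, b1, b2),
                                  2 * forward(x, w1, w2, b1, b2))

print(no_bias_scales, with_bias_scales)
```

The bias terms break this scaling constraint, which is one concrete sense in which they add power.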

We've provided a utility class 'Data' (in data_reader.py) to load the training data (it works for all the toy problems).

In [1]:
import torch
import torch.nn.functional as F
from data_reader import Data

data = Data("data/toy_problem_3_train.txt")

labels, features = data.get_sample()

print("Labels:\n"+str(labels))

print("Features:\n"+str(features))
    
# torch.autograd.Variable has been deprecated since PyTorch 0.4; plain tensors suffice.
target = torch.LongTensor(labels)
#print("Labels Tensor:\n"+str(target))

features = torch.Tensor(features)
#print("Features Tensor:\n"+str(features))

Labels:
[0, 1, 0, 0, 1, 0, 1, 1, 0, 0]
Features:
[[48, 81], [-39, 75], [-48, -94], [81, 5], [-5, 50], [-92, -26], [-24, 22], [-10, 94], [-75, -92], [78, 62]]


We initialize the weights and biases (one set of weights and biases per layer) randomly, with values drawn uniformly from [0, 1) by `torch.rand`.

In [2]:
middle = 4

weights1 = torch.nn.Parameter(torch.rand(2, middle))
print("Weights1 => "+str(weights1))

bias1 = torch.nn.Parameter(torch.rand(1, middle))
print("Bias1 => "+str(bias1))

weights2 = torch.nn.Parameter(torch.rand(middle, 2))
print("Weights2 => "+str(weights2))

bias2 = torch.nn.Parameter(torch.rand(1,2))
print("Bias2 => "+str(bias2))



Weights1 => Parameter containing:
tensor([[ 0.5236,  0.9937,  0.8793,  0.5058],
        [ 0.6404,  0.0323,  0.4522,  0.2071]])
Bias1 => Parameter containing:
tensor([[ 0.5174,  0.8738,  0.8567,  0.4814]])
Weights2 => Parameter containing:
tensor([[ 0.4804,  0.7107],
        [ 0.9554,  0.6189],
        [ 0.0527,  0.9612],
        [ 0.8901,  0.3731]])
Bias2 => Parameter containing:
tensor([[ 0.8208,  0.0828]])


The cell below runs about 1000 learning iterations, and we can re-run it as many times as we want to keep training.

Notice that the code for the learning iterations is almost identical to that of exercise 630, except that we now use PyTorch's Adam optimizer class to nudge the weights in the direction they must go.

In [3]:
optimizer = torch.optim.Adam([weights1, weights2, bias1, bias2], lr=0.01)

for i in range(1001):
    optimizer.zero_grad()   # zero the gradient buffers
    
    labels, features = data.get_sample(1000)
    
    features = torch.Tensor(features)
    #print("Features: "+str(features))
    
    target = torch.LongTensor(labels)
    #print("Target: "+str(target))
    
    result = features.mm(weights1) + bias1
    result1 = F.relu(result)
    result2 = result1.mm(weights2) + bias2
    
    loss = F.cross_entropy(result2, target)
    #print("Cross entropy loss: "+str(loss))

    loss.backward()
    
    optimizer.step()
        
    if i % 10 == 0:
        print("The loss is now "+str(loss.item()))

torch.save(weights1, "models/toy_problem_3_trained_deep_model_weights1.bin")
torch.save(weights2, "models/toy_problem_3_trained_deep_model_weights2.bin")
torch.save(bias1, "models/toy_problem_3_trained_deep_model_bias1.bin")
torch.save(bias2, "models/toy_problem_3_trained_deep_model_bias2.bin")

The loss is now 9.415506362915039
The loss is now 3.044528007507324
The loss is now 1.3942745923995972
The loss is now 1.148698091506958
The loss is now 0.7427251935005188
The loss is now 0.5695450901985168
The loss is now 0.5076305270195007
The loss is now 0.4617869555950165
The loss is now 0.4477054178714752
The loss is now 0.42276179790496826
The loss is now 0.39894604682922363
The loss is now 0.3705463111400604
The loss is now 0.33704569935798645
The loss is now 0.28844645619392395
The loss is now 0.25356411933898926
The loss is now 0.24803462624549866
The loss is now 0.22278311848640442
The loss is now 0.2283671498298645
The loss is now 0.21863271296024323
The loss is now 0.21036402881145477
The loss is now 0.2114981859922409
The loss is now 0.1935742348432541
The loss is now 0.18528889119625092
The loss is now 0.17361214756965637
The loss is now 0.18092025816440582
The loss is now 0.16048189997673035
The loss is now 0.15003038942813873
The loss is now 0.15122634172439575
[...]
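For comparison, the same architecture (2 inputs, 4 ReLU hidden units, 2 output scores) can be sketched with PyTorch's built-in `torch.nn.Linear` layers, which bundle each weight matrix with its bias. This is an alternative formulation, not the exercise's own code. `nn.Linear` uses its own default initialization and stores weights as (out_features, in_features), so the numbers will differ from the run above.

```python
import torch
import torch.nn.functional as F

middle = 4

# Each Linear layer computes x @ W.T + b, the same structure as
# features.mm(weights1) + bias1 in the hand-written version.
model = torch.nn.Sequential(
    torch.nn.Linear(2, middle),
    torch.nn.ReLU(),
    torch.nn.Linear(middle, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# The ten sample points and labels printed by the first cell.
features = torch.tensor([[48, 81], [-39, 75], [-48, -94], [81, 5], [-5, 50],
                         [-92, -26], [-24, 22], [-10, 94], [-75, -92], [78, 62]],
                        dtype=torch.float32)
target = torch.tensor([0, 1, 0, 0, 1, 0, 1, 1, 0, 0])

# One training step: forward pass, loss, backward pass, parameter update.
optimizer.zero_grad()
scores = model(features)
loss = F.cross_entropy(scores, target)
loss.backward()
optimizer.step()

print("The loss is now " + str(loss.item()))
```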

## The Loss

Observe the loss, which is printed once every 10 iterations.

The loss decreases a lot more than it did when we used the sigmoid activation function.

The loss has not stopped decreasing after 1000 iterations (you can train for thousands more iterations and improve it further).
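As a reminder of what the printed number means, `F.cross_entropy` takes the raw scores of the last layer, applies a log-softmax, and averages the negative log-probability assigned to the correct class. A minimal sketch with made-up scores (the values are illustrative assumptions, not taken from the run above):

```python
import torch
import torch.nn.functional as F

# Hypothetical raw scores for three examples and two classes.
scores = torch.tensor([[2.0, 0.5], [0.1, 1.2], [3.0, 3.0]])
target = torch.tensor([0, 1, 0])

loss = F.cross_entropy(scores, target)

# The same quantity computed step by step.
log_probs = F.log_softmax(scores, dim=1)
manual = -log_probs[torch.arange(len(target)), target].mean()

print(loss.item(), manual.item())
```

A loss near zero therefore means the network assigns probability close to 1 to the correct class on most examples.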

## Parameters

We can now print the weights and the biases.

In [4]:
print("The first layer weights are now "+str(weights1.data))
print("and the second layer's weights are now "+str(weights2.data))
print("The first layer bias is now "+str(bias1.data))
print("and the second layer's bias is now "+str(bias2.data))

The first layer weights are now tensor([[-0.1011,  0.9002,  1.0588,  0.5838],
        [ 1.0819,  0.3983, -0.0719,  0.4183]])
and the second layer's weights are now tensor([[ 0.3297,  0.8613],
        [ 1.0011,  0.5731],
        [ 0.0570,  0.9569],
        [ 1.1003,  0.1630]])
The first layer bias is now tensor([[ 0.9092, -0.7460,  0.8659, -1.4666]])
and the second layer's bias is now tensor([[ 2.3198, -1.4162]])
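To make the printed parameters concrete, the sketch below plugs them (at the four-decimal precision shown) into the same forward pass and classifies the first two points of the sample printed earlier, whose labels are 0 and 1.

```python
import torch
import torch.nn.functional as F

# Trained parameters copied from the output above (four-decimal precision).
weights1 = torch.tensor([[-0.1011, 0.9002, 1.0588, 0.5838],
                         [1.0819, 0.3983, -0.0719, 0.4183]])
bias1 = torch.tensor([[0.9092, -0.7460, 0.8659, -1.4666]])
weights2 = torch.tensor([[0.3297, 0.8613],
                         [1.0011, 0.5731],
                         [0.0570, 0.9569],
                         [1.1003, 0.1630]])
bias2 = torch.tensor([[2.3198, -1.4162]])

# First two sample points; their labels in the sample are 0 and 1.
features = torch.tensor([[48.0, 81.0], [-39.0, 75.0]])

hidden = F.relu(features.mm(weights1) + bias1)
scores = hidden.mm(weights2) + bias2
predicted = scores.argmax(dim=1)

print(scores)
print(predicted)
```

With these rounded values, the predicted classes for the two points are 0 and 1, matching their labels.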


## Classifier Test - Toy Problem 3

We have just trained a multilayer classifier for Toy Problem 3.

It seems to be learning (the loss on the training data decreases).

Let us evaluate the performance of the classifier on the test data.

In [5]:
data = Data("data/toy_problem_3_test.txt")

weights1 = torch.load("models/toy_problem_3_trained_deep_model_weights1.bin")
print(weights1)
weights2 = torch.load("models/toy_problem_3_trained_deep_model_weights2.bin")
print(weights2)
bias1 = torch.load("models/toy_problem_3_trained_deep_model_bias1.bin")
print(bias1)
bias2 = torch.load("models/toy_problem_3_trained_deep_model_bias2.bin")
print(bias2)

labels, features = data.get_all()

features = torch.Tensor(features)
#print(features)

target = torch.LongTensor(labels)
#print(target)

result = torch.mm(features, weights1) + bias1
result1 = F.relu(result)
result2 = torch.mm(result1, weights2) + bias2
#print(result2)

maxv, observed = torch.max(result2, 1)

total = 0
correct = 0
for i in range(len(labels)):
    total += 1
    #print(str(target.data[i]) + " " + str(observed.data[i]))
    if target.data[i] == observed.data[i]:
        correct += 1
accuracy = correct / total
print("Accuracy: "+str(accuracy))

tensor([[-0.1011,  0.9002,  1.0588,  0.5838],
        [ 1.0819,  0.3983, -0.0719,  0.4183]])
tensor([[ 0.3297,  0.8613],
        [ 1.0011,  0.5731],
        [ 0.0570,  0.9569],
        [ 1.1003,  0.1630]])
tensor([[ 0.9092, -0.7460,  0.8659, -1.4666]])
tensor([[ 2.3198, -1.4162]])
Accuracy: 0.986


As you can see, the accuracy on the test data (0.986) is much better this time.

On this problem at least, ReLU seems to trump Sigmoid.
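
The counting loop in the test cell can also be written as a single tensor comparison. A minimal sketch, using small made-up tensors in place of `target` and `observed` (illustrative values only):

```python
import torch

# Hypothetical labels and predictions standing in for `target` and `observed`.
target = torch.tensor([0, 1, 0, 0, 1])
observed = torch.tensor([0, 1, 1, 0, 1])

# Element-wise equality gives a boolean tensor; its mean is the fraction correct.
accuracy = (observed == target).float().mean().item()
print("Accuracy: " + str(accuracy))
```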