# ReLU & Bias on Toy Problem 3

In exercise 690, we saw that if we **used a sigmoid** with bias parameters, the classifier could learn the XOR function.

Now, we use a ReLU instead of the sigmoid, with the bias terms in addition to the weights.

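We expect the bias to give the classifier more power: without it, every hidden unit's ReLU switches on along a line through the origin, whereas a bias lets each unit place that boundary anywhere.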
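To see what the bias adds, here is a minimal sketch of a single ReLU unit on a scalar input. The function `relu_unit` and the values chosen for `w` and `b` are illustrative assumptions, not part of the exercise code. Without a bias the unit can only switch on at x = 0; with a bias it switches on at x = -b/w.

```python
def relu_unit(x, w, b=0.0):
    """One hidden unit: ReLU applied to the affine map w*x + b."""
    return max(0.0, w * x + b)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]

# No bias: the unit is active exactly when x > 0.
no_bias = [relu_unit(x, w=1.0) for x in xs]

# Bias of -2 (an arbitrary illustrative value): the switching point
# moves to x = -b/w = 2.
with_bias = [relu_unit(x, w=1.0, b=-2.0) for x in xs]

print(no_bias)
print(with_bias)
```

In the exercise the inputs are two-dimensional, so the switching point becomes a line in the feature plane, but the idea is the same.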

We've provided a utility class 'Data' (in data_reader.py) to load the training data (it works for all the toy problems).

In [1]:
import torch
import torch.nn.functional as F
from data_reader import Data

data = Data("data/toy_problem_3_train.txt")

labels, features = data.get_sample()

print("Labels:\n"+str(labels))

print("Features:\n"+str(features))
    
target = torch.autograd.Variable(torch.LongTensor(labels))
#print("Labels Tensor:\n"+str(target))

features = torch.autograd.Variable(torch.Tensor(features))
#print("Features Tensor:\n"+str(features))

Labels:
[1, 1, 1, 1, 0, 0, 1, 0, 0, 1]
Features:
[[-58, 10], [56, -8], [78, -43], [53, -57], [-43, -14], [-9, -68], [97, -58], [40, 67], [-90, -40], [-97, 5]]


We initialize the weights and biases (one set of weights and biases per layer) randomly; `torch.rand` draws each value uniformly from [0, 1).

In [2]:
middle = 4

weights1 = torch.nn.Parameter(torch.rand(2, middle))
print("Weights1 => "+str(weights1))

bias1 = torch.nn.Parameter(torch.rand(1, middle))
print("Bias1 => "+str(bias1))

weights2 = torch.nn.Parameter(torch.rand(middle, 2))
print("Weights2 => "+str(weights2))

bias2 = torch.nn.Parameter(torch.rand(1,2))
print("Bias2 => "+str(bias2))



Weights1 => Parameter containing:
 0.4443  0.1887  0.1967  0.7059
 0.1097  0.0477  0.3916  0.3397
[torch.FloatTensor of size 2x4]

Bias1 => Parameter containing:
 0.5594  0.7939  0.2642  0.9598
[torch.FloatTensor of size 1x4]

Weights2 => Parameter containing:
 0.4195  0.0154
 0.1422  0.0492
 0.5783  0.0510
 0.7182  0.8030
[torch.FloatTensor of size 4x2]

Bias2 => Parameter containing:
 0.5795  0.2133
[torch.FloatTensor of size 1x2]



We can now run the cell below, which performs 1000 learning iterations, as many times as we want.

Notice that the code for the learning iterations is almost identical to that of exercise 630, but we've used the Adam optimizer class in PyTorch to nudge the weights and biases in the direction they must go.
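Written out, the computation inside each learning iteration can be summarised as follows. The symbols are introduced here for illustration only: $X$ is the batch of features, $W_1, b_1, W_2, b_2$ are the four parameters, $y_i$ is the label of sample $i$, and $N$ is the batch size. The mean over the batch assumes `F.cross_entropy` is called with its default reduction, as it is in this exercise.

$$
\begin{aligned}
H &= \max(0,\; X W_1 + b_1) \\
Z &= H W_2 + b_2 \\
\text{loss} &= -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp(Z_{i,\,y_i})}{\sum_{k} \exp(Z_{i,k})}
\end{aligned}
$$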

In [3]:
optimizer = torch.optim.Adam([weights1, weights2, bias1, bias2], lr=0.01)

for i in range(1001):
    optimizer.zero_grad()   # zero the gradient buffers
    
    labels, features = data.get_sample(1000)
    
    features = torch.autograd.Variable(torch.Tensor(features))
    #print("Features: "+str(features))
    
    target = torch.autograd.Variable(torch.LongTensor(labels))
    #print("Target: "+str(target))
    
    result = features.mm(weights1) + bias1
    result1 = F.relu(result)
    result2 = result1.mm(weights2) + bias2
    
    loss = F.cross_entropy(result2, target)
    #print("Cross entropy loss: "+str(loss))

    loss.backward()
    
    optimizer.step()
        
    if i % 10 == 0:
        # .item() needs PyTorch >= 0.4; older versions used loss.data[0]
        print("The loss is now "+str(loss.item()))

torch.save(weights1, "models/toy_problem_3_trained_deep_model_weights1.bin")
torch.save(weights2, "models/toy_problem_3_trained_deep_model_weights2.bin")
torch.save(bias1, "models/toy_problem_3_trained_deep_model_bias1.bin")
torch.save(bias2, "models/toy_problem_3_trained_deep_model_bias2.bin")

The loss is now 4.125748157501221
The loss is now 0.9919514656066895
The loss is now 0.5785403847694397
The loss is now 0.43390631675720215
The loss is now 0.3487151861190796
The loss is now 0.3022889792919159
The loss is now 0.2655614912509918
The loss is now 0.23335859179496765
The loss is now 0.22074243426322937
The loss is now 0.19434170424938202
The loss is now 0.18351207673549652
The loss is now 0.17960983514785767
The loss is now 0.16072426736354828
The loss is now 0.1588166058063507
The loss is now 0.15214304625988007
The loss is now 0.14005398750305176
The loss is now 0.1309504359960556
The loss is now 0.14025351405143738
The loss is now 0.12538421154022217
The loss is now 0.11924946308135986
The loss is now 0.11144011467695236
The loss is now 0.12391482293605804
The loss is now 0.10285323858261108
The loss is now 0.104105643928051
The loss is now 0.08798294514417648
The loss is now 0.10140535980463028
The loss is now 0.08999021351337433
The loss is now 0.09262876957654953

## The Loss

Observe the loss, which is printed every 10 iterations.

The loss decreases a lot more than it did when we used the sigmoid activation function.

The loss has not ceased decreasing after 1000 iterations (you can train it for thousands more iterations and improve it further).
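The printed loss is noisy and occasionally ticks up, presumably because each iteration calls `data.get_sample(1000)` and so sees a different sample. One simple way to check the trend is to smooth it. The sketch below applies a moving average to the loss values visible in the training output (rounded to four decimals). The helper `moving_average` and the window size of 5 are illustrative choices, not part of the exercise.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Loss values copied from the training output, rounded to four decimals.
losses = [
    4.1257, 0.9920, 0.5785, 0.4339, 0.3487, 0.3023, 0.2656,
    0.2334, 0.2207, 0.1943, 0.1835, 0.1796, 0.1607, 0.1588,
    0.1521, 0.1401, 0.1310, 0.1403, 0.1254, 0.1192, 0.1114,
    0.1239, 0.1029, 0.1041, 0.0880, 0.1014, 0.0900, 0.0926,
]

smoothed = moving_average(losses, window=5)

# Compare the raw sequence with the smoothed one.
raw_decreasing = all(a > b for a, b in zip(losses, losses[1:]))
smooth_decreasing = all(a > b for a, b in zip(smoothed, smoothed[1:]))
print("raw strictly decreasing:     ", raw_decreasing)
print("smoothed strictly decreasing:", smooth_decreasing)
```

On these values the raw sequence is not strictly decreasing, while the five-point average is, which supports the observation that the loss is still going down.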

## Parameters

We can now print the weights and the biases.

In [4]:
print("The first layer weights are now "+str(weights1.data))
print("and the second layer's weights are now "+str(weights2.data))
print("The first layer bias is now "+str(bias1.data))
print("and the second layer's bias is now "+str(bias2.data))

The first layer weights are now 
 0.4739 -0.0791  0.5436  1.2241
 0.7023  0.9879  0.4664 -0.0864
[torch.FloatTensor of size 2x4]

and the second layer's weights are now 
 0.6246 -0.1897
-0.4027  0.5942
 0.7463 -0.1171
 0.3969  1.1243
[torch.FloatTensor of size 4x2]

The first layer bias is now 
-1.3011  0.5464 -1.2566 -0.0181
[torch.FloatTensor of size 1x4]

and the second layer's bias is now 
 2.2892 -1.4964
[torch.FloatTensor of size 1x2]
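
To make the printed numbers concrete, the sketch below runs the forward pass by hand, in plain Python, on the ten training samples printed in the first cell of this exercise. The parameters are copied from the output above, so they are rounded to four decimals. The helper names `affine`, `relu`, and `predict` are illustrative, not part of the exercise code.

```python
# Trained parameters, copied from the printed output (rounded to 4 decimals).
W1 = [[0.4739, -0.0791, 0.5436, 1.2241],
      [0.7023, 0.9879, 0.4664, -0.0864]]
B1 = [-1.3011, 0.5464, -1.2566, -0.0181]
W2 = [[0.6246, -0.1897],
      [-0.4027, 0.5942],
      [0.7463, -0.1171],
      [0.3969, 1.1243]]
B2 = [2.2892, -1.4964]

def affine(x, weights, bias):
    """Compute x @ weights + bias for one row vector x."""
    return [
        sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]

def relu(v):
    """Element-wise max(0, v)."""
    return [max(0.0, a) for a in v]

def predict(x):
    """Return the index of the larger output score (class 0 or 1)."""
    scores = affine(relu(affine(x, W1, B1)), W2, B2)
    return scores.index(max(scores))

# The sample printed in the first cell of this exercise.
labels = [1, 1, 1, 1, 0, 0, 1, 0, 0, 1]
features = [[-58, 10], [56, -8], [78, -43], [53, -57], [-43, -14],
            [-9, -68], [97, -58], [40, 67], [-90, -40], [-97, 5]]

predictions = [predict(x) for x in features]
print(predictions)
print(predictions == labels)
```

With these rounded parameters, all ten samples are assigned their listed label. For several of the class-0 points every hidden unit is switched off, so the output is just the second-layer bias, whose first entry is the larger one.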



## Classifier Test - Toy Problem 3

We have just trained a multilayer classifier for Toy Problem 3.

It seems to be learning (the loss on the training data decreases).

Let us evaluate the performance of the classifier on the test data.

In [5]:
data = Data("data/toy_problem_3_test.txt")

weights1 = torch.load("models/toy_problem_3_trained_deep_model_weights1.bin")
print(weights1)
weights2 = torch.load("models/toy_problem_3_trained_deep_model_weights2.bin")
print(weights2)
bias1 = torch.load("models/toy_problem_3_trained_deep_model_bias1.bin")
print(bias1)
bias2 = torch.load("models/toy_problem_3_trained_deep_model_bias2.bin")
print(bias2)

labels, features = data.get_all()

features = torch.autograd.Variable(torch.Tensor(features))
#print(features)

target = torch.autograd.Variable(torch.LongTensor(labels))
#print(target)

result = torch.mm(features, weights1) + bias1
result1 = F.relu(result)
result2 = torch.mm(result1, weights2) + bias2
#print(result2)

maxv, observed = torch.max(result2, 1)

total = 0
correct = 0
for i in range(len(labels)):
    total += 1
    #print(str(target.data[i]) + " " + str(observed.data[i]))
    if target.data[i] == observed.data[i]:
        correct += 1
accuracy = correct / total
print("Accuracy: "+str(accuracy))

Parameter containing:
 0.4739 -0.0791  0.5436  1.2241
 0.7023  0.9879  0.4664 -0.0864
[torch.FloatTensor of size 2x4]

Parameter containing:
 0.6246 -0.1897
-0.4027  0.5942
 0.7463 -0.1171
 0.3969  1.1243
[torch.FloatTensor of size 4x2]

Parameter containing:
-1.3011  0.5464 -1.2566 -0.0181
[torch.FloatTensor of size 1x4]

Parameter containing:
 2.2892 -1.4964
[torch.FloatTensor of size 1x2]

Accuracy: 0.985


As you can see, the accuracy on the test data (98.5%) is much better this time.

On this problem at least, ReLU seems to trump sigmoid.
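
The counting loop in the test cell can also be written as a small reusable helper. The sketch below is plain Python; `accuracy` is a hypothetical helper name, and the two short lists are made-up illustrative values rather than output from the classifier.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the predicted class matches the label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy of an empty test set")
    correct = sum(1 for p, t in zip(predicted, expected) if p == t)
    return correct / len(expected)

# Made-up example: 3 of the 4 predictions match their labels.
example_predicted = [1, 0, 1, 1]
example_expected = [1, 0, 0, 1]
print(accuracy(example_predicted, example_expected))
```

The explicit checks guard against mismatched lengths and an empty test set, which would otherwise give a wrong result or a division by zero.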