# Multi-Layer Neural Network Classifier for Toy Problem 3

In exercise 630, we built a single layer classifier for Toy Problem 3.

We saw that for the single-layer neural network, the loss did not decrease despite the prolonged use of gradient descent.

The data used in Toy Problem 3 is a version of the XOR function. The categories in this data set are not linearly separable, so no single-layer neural network (or any other linear classifier) can do well on it.
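To make the non-separability claim concrete, here is a minimal sketch in plain Python. It assumes, based on the sample printed later in this exercise, that the label is 1 exactly when the two features have opposite signs. The four representative points and the coarse grid of candidate lines are illustrative choices, not part of the exercise.

```python
from itertools import product

# Four representative XOR-style points: (x, y, label).
# Assumption: the label is 1 when the signs of x and y differ, else 0.
points = [(1, 1, 0), (-1, -1, 0), (1, -1, 1), (-1, 1, 1)]

def count_correct(w1, w2, b):
    """Number of points the linear rule 'w1*x + w2*y + b > 0 -> class 1' gets right."""
    return sum(int((w1 * x + w2 * y + b > 0) == bool(label))
               for x, y, label in points)

# Try every linear classifier whose coefficients lie on a coarse grid.
grid = [v / 2 for v in range(-10, 11)]  # -5.0, -4.5, ..., 5.0
best = max(count_correct(w1, w2, b) for w1, w2, b in product(grid, repeat=3))

print("Best linear classifier on the grid gets", best, "of 4 points right")
```

The grid search is only illustrative, but the limit is general: getting both class-0 points right forces `b <= 0`, while getting both class-1 points right forces `b > 0`, so no single line classifies all four.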

Now let's build a multi-layer classifier for Toy Problem 3 using the ReLU non-linearity (a comparatively recent addition to neural networks, usually dated to around the year 2000) and see how it fares.

We've provided a utility class 'Data' (in data_reader.py) to load the training data (it works for all the toy problems).

In [1]:
import torch
import torch.nn.functional as F
from data_reader import Data

data = Data("data/toy_problem_3_train.txt")

labels, features = data.get_sample()

print("Labels:\n"+str(labels))

print("Features:\n"+str(features))
    
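# Convert the sample to tensors. The torch.autograd.Variable wrapper is
# unnecessary (deprecated since PyTorch 0.4): plain tensors work with autograd.
target = torch.LongTensor(labels)
#print("Labels Tensor:\n"+str(target))

features = torch.Tensor(features)
#print("Features Tensor:\n"+str(features))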

Labels:
[1, 0, 1, 0, 0, 0, 1, 1, 0, 1]
Features:
[[28, -24], [-40, -48], [-79, 12], [-28, -98], [-9, -91], [22, 64], [58, -6], [99, -16], [-6, -7], [-36, 43]]


We initialize the weights (one set of weights per layer) randomly.

In [2]:
middle = 4

weights1 = torch.nn.Parameter(torch.rand(2, middle))
print("Weights1 => "+str(weights1))

weights2 = torch.nn.Parameter(torch.rand(middle, 2))
print("Weights2 => "+str(weights2))


Weights1 => Parameter containing:
tensor([[ 0.8643,  0.9986,  0.7930,  0.6866],
        [ 0.1748,  0.6669,  0.2100,  0.0473]])
Weights2 => Parameter containing:
tensor([[ 0.9471,  0.9182],
        [ 0.0255,  0.2166],
        [ 0.6838,  0.3335],
        [ 0.8089,  0.7325]])
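
The forward pass through these two weight matrices can be traced by hand. The sketch below is plain Python rather than PyTorch so that the arithmetic is visible; the helper names `matvec` and `relu` are made up for the illustration, and the weights are the rounded values printed above.

```python
# Initial weights copied from the printed output (rounded to 4 decimals).
weights1 = [[0.8643, 0.9986, 0.7930, 0.6866],
            [0.1748, 0.6669, 0.2100, 0.0473]]
weights2 = [[0.9471, 0.9182],
            [0.0255, 0.2166],
            [0.6838, 0.3335],
            [0.8089, 0.7325]]

def matvec(row, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    return [sum(row[k] * matrix[k][j] for k in range(len(row)))
            for j in range(len(matrix[0]))]

def relu(vector):
    """Replace negative entries with zero."""
    return [max(0.0, v) for v in vector]

x = [28, -24]                        # first training example printed earlier (label 1)
hidden = relu(matvec(x, weights1))   # 4 hidden activations
scores = matvec(hidden, weights2)    # 2 output scores, one per class

print("hidden:", hidden)
print("scores:", scores)
```

With these untrained weights the network scores class 0 higher for this example, although its label is 1.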


The cell below performs about 1000 learning iterations, and it can be re-run as many times as we want.

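Notice that the code for the learning iterations is almost identical to that of the earlier single-layer exercise, except that there are now two weight matrices with a ReLU between them, and that we've used the Adam optimizer class in PyTorch (`torch.optim.Adam`) to nudge the weights in the direction they must go.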
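The cell below uses `torch.optim.Adam` with `lr=0.01`. As a rough illustration of what one optimizer step does, here is a plain-Python sketch of the standard Adam update for a single scalar weight. The default hyperparameters (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`) match PyTorch's documented defaults; the quadratic objective and the function name `adam_step` are made up for the example.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar weight; returns the new (w, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Hypothetical objective: f(w) = (w - 3)^2, with gradient 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
history = []
for t in range(1, 1001):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t)
    history.append(w)

print("after 1 step:", history[0])
print("after 1000 steps:", history[-1])
```

On the first step the bias-corrected ratio is essentially `grad / |grad|`, so the weight moves by about `lr` regardless of how large the gradient is.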

In [3]:
optimizer = torch.optim.Adam([weights1, weights2], lr=0.01)

for i in range(1001):
    optimizer.zero_grad()   # zero the gradient buffers

    # Fetch a batch of 1000 training examples.
    labels, features = data.get_sample(1000)

    # Variable wrappers are no longer needed (deprecated since PyTorch 0.4).
    features = torch.Tensor(features)
    #print("Features: "+str(features))

    target = torch.LongTensor(labels)
    #print("Target: "+str(target))

    # Forward pass: linear layer -> ReLU -> linear layer (no bias terms).
    result = features.mm(weights1)
    result1 = F.relu(result)
    result2 = result1.mm(weights2)

    # cross_entropy applies log-softmax to the raw scores internally.
    loss = F.cross_entropy(result2, target)
    #print("Cross entropy loss: "+str(loss))

    loss.backward()         # compute gradients for both weight matrices

    optimizer.step()        # update the weights

    if i % 10 == 0:
        print("The loss is now "+str(loss.item()))

torch.save(weights1, "models/toy_problem_3_trained_deep_model_weights1.bin")
torch.save(weights2, "models/toy_problem_3_trained_deep_model_weights2.bin")

The loss is now 2.756173849105835
The loss is now 0.6389186978340149
The loss is now 0.6298412680625916
The loss is now 0.6763184666633606
The loss is now 0.5558179616928101
The loss is now 0.49082499742507935
The loss is now 0.4915899336338043
The loss is now 0.46581557393074036
The loss is now 0.45589393377304077
The loss is now 0.44978952407836914
The loss is now 0.43294915556907654
The loss is now 0.4207729399204254
The loss is now 0.36652064323425293
The loss is now 0.31385186314582825
The loss is now 0.29051724076271057
The loss is now 0.2532366216182709
The loss is now 0.27499422430992126
The loss is now 0.2722074091434479
The loss is now 0.25757089257240295
The loss is now 0.25776299834251404
The loss is now 0.25943371653556824
The loss is now 0.24867649376392365
The loss is now 0.23981423676013947
The loss is now 0.2563263177871704
The loss is now 0.2491896003484726
The loss is now 0.23097574710845947
The loss is now 0.23793457448482513
The loss is now 0.22942709922790527
...

## The Loss

The loss, which is printed every 10 iterations, now decreases: from about 2.76 at the start to about 0.23 in the output shown above.

This tells us that the machine learning algorithm is now learning something.
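For a sense of scale, `F.cross_entropy` on a single example is the negative log of the softmax probability assigned to the correct class. A minimal plain-Python sketch follows; the helper name `cross_entropy` and the example scores are illustrative.

```python
import math

def cross_entropy(scores, target):
    """Negative log-softmax probability of the target class for one example."""
    top = max(scores)  # subtract the max for numerical stability
    log_sum = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_sum - scores[target]

chance = cross_entropy([0.0, 0.0], 0)      # equal scores: a coin flip between two classes
confident = cross_entropy([0.0, 3.0], 1)   # correct class scored 3 higher

print("loss at chance level:", chance)     # ln(2), about 0.693
print("loss when fairly confident:", confident)
```

Two equally scored classes give ln 2, about 0.693, so the early losses near 0.63 are barely better than guessing, while the later values near 0.23 indicate that most examples are being scored in favour of the correct class. By default `F.cross_entropy` averages this quantity over the batch.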

## Parameters

We can now print the weights.

In [4]:
print("The first layer weights are now "+str(weights1.data))
print("and the second layer's weights are now "+str(weights2.data))

The first layer weights are now tensor([[ 1.6524,  1.1334,  0.8918, -0.0659],
        [-0.0610,  0.6734,  0.4895,  1.1477]])
and the second layer's weights are now tensor([[ 0.5838,  1.2815],
        [ 0.3170, -0.0750],
        [ 0.9079,  0.1094],
        [ 0.4954,  1.0460]])


## Classifier Test - Toy Problem 3

We have just trained a multilayer classifier for Toy Problem 3.

It seems to be learning something (the loss on the training data has decreased to about 0.2).

Let us evaluate the performance of the classifier on the test data.

In [5]:
data = Data("data/toy_problem_3_test.txt")

weights1 = torch.load("models/toy_problem_3_trained_deep_model_weights1.bin")
print(weights1)
weights2 = torch.load("models/toy_problem_3_trained_deep_model_weights2.bin")
print(weights2)

labels, features = data.get_all()

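features = torch.Tensor(features)   # Variable wrappers are unnecessary since PyTorch 0.4
#print(features)

target = torch.LongTensor(labels)
#print(target)

# Forward pass with the trained weights: linear layer -> ReLU -> linear layer.
result = torch.mm(features, weights1)
result1 = F.relu(result)
result2 = torch.mm(result1, weights2)
#print(result2)

# The predicted class is the index of the larger output score in each row.
maxv, observed = torch.max(result2, 1)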

total = 0
correct = 0
for i in range(len(labels)):
    total += 1
    #print(str(target.data[i]) + " " + str(observed.data[i]))
    if target.data[i] == observed.data[i]:
        correct += 1
accuracy = correct / total
print("Accuracy: "+str(accuracy))

tensor([[ 1.6524,  1.1334,  0.8918, -0.0659],
        [-0.0610,  0.6734,  0.4895,  1.1477]])
tensor([[ 0.5838,  1.2815],
        [ 0.3170, -0.0750],
        [ 0.9079,  0.1094],
        [ 0.4954,  1.0460]])
Accuracy: 0.987


As you can see, the accuracy on the test data is about 98.7%.

This is a good score.

It tells us that the multi-layer neural network was able to learn the non-linear XOR function.
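
One way to see that this architecture can represent the function at all: assuming, as the printed training sample suggests, that the label is 1 exactly when the two features have opposite signs, a 2-4-2 ReLU network without bias terms can be written down by hand. The weights below are a hand-picked illustration in plain Python, not the weights that the training run found.

```python
# Hand-picked weights for a 2 -> 4 -> 2 ReLU network with no biases.
# Hidden units compute relu(x+y), relu(-x-y), relu(x-y), relu(y-x).
W1 = [[1, -1,  1, -1],
      [1, -1, -1,  1]]
# Class-0 score = |x+y|, class-1 score = |x-y|.
# |x-y| > |x+y| holds exactly when x*y < 0, i.e. when the signs differ.
W2 = [[1, 0],
      [1, 0],
      [0, 1],
      [0, 1]]

def predict(x, y):
    """Forward pass: linear layer, ReLU, linear layer, then pick the larger score."""
    pre = [x * W1[0][j] + y * W1[1][j] for j in range(4)]
    hidden = [max(0, p) for p in pre]
    scores = [sum(hidden[k] * W2[k][c] for k in range(4)) for c in range(2)]
    return 1 if scores[1] > scores[0] else 0

# The ten labelled examples printed at the top of this exercise.
labels = [1, 0, 1, 0, 0, 0, 1, 1, 0, 1]
features = [[28, -24], [-40, -48], [-79, 12], [-28, -98], [-9, -91],
            [22, 64], [58, -6], [99, -16], [-6, -7], [-36, 43]]

predictions = [predict(x, y) for x, y in features]
correct = sum(int(p == t) for p, t in zip(predictions, labels))
print("Correct:", correct, "of", len(labels))
```

The two ReLU pairs build the absolute values `|x+y|` and `|x-y|`, which is something no single linear layer can do, and four hidden units are enough for it.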