# Multi-Layer Neural Network Classifier for Toy Problem 3

In exercise 630, we built a single-layer classifier for Toy Problem 3.

We saw that for the single-layer neural network, the loss did not decrease despite the prolonged use of gradient descent.

The data used in Toy Problem 3 is a version of the XOR function. The categories in this dataset are not linearly separable, so it is impossible for a single-layer neural network (or any other linear classifier) to do well on it.
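
To make the claim about linear separability concrete, here is a small sketch that searches a coarse grid of linear rules on four XOR-style points. The four points and the grid are our own illustrative assumptions, not the Toy Problem 3 file, and a grid search is a sanity check rather than a proof.

```python
from itertools import product

# Four points of a sign-based XOR: label 1 when the two features have opposite signs.
# (Hypothetical stand-in for the Toy Problem 3 data, not read from the real file.)
points = [((1, 1), 0), ((-1, -1), 0), ((1, -1), 1), ((-1, 1), 1)]

def count_correct(w1, w2, b):
    """Number of points the rule 'predict 1 if w1*x + w2*y + b > 0' gets right."""
    return sum(int(w1 * x + w2 * y + b > 0) == label for (x, y), label in points)

# Try every combination of two weights and a bias on a coarse grid.
grid = [v / 2 for v in range(-8, 9)]  # -4.0, -3.5, ..., 4.0
best = max(count_correct(w1, w2, b) for w1, w2, b in product(grid, repeat=3))
print("Best linear rule gets", best, "of", len(points), "points right")
```

No rule on the grid classifies all four points correctly; the best ones get three out of four.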

Now let's build a multi-layer classifier for Toy Problem 3 using the ReLU non-linearity (which became a standard choice for deep networks only around 2010) and see how it fares.

We've provided a utility class 'Data' (in data_reader.py) to load the training data (it works for all the toy problems).

In [7]:
import torch
import torch.nn.functional as F
from data_reader import Data

data = Data("data/toy_problem_3_train.txt")

labels, features = data.get_sample()

print("Labels:\n"+str(labels))

print("Features:\n"+str(features))
    
target = torch.autograd.Variable(torch.LongTensor(labels))
#print("Labels Tensor:\n"+str(target))

features = torch.autograd.Variable(torch.Tensor(features))
#print("Features Tensor:\n"+str(features))

Labels:
[1, 1, 0, 1, 0, 1, 1, 1, 1, 1]
Features:
[[-53, 4], [37, -68], [70, 2], [88, -95], [-2, -64], [28, -66], [52, -88], [-29, 7], [-15, 52], [83, -42]]


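We initialize the weights randomly, one weight matrix per layer: a 2x4 matrix that maps the 2 input features to the 4 hidden units (the value of 'middle'), and a 4x2 matrix that maps the hidden units to the 2 class scores.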

In [8]:
middle = 4

weights1 = torch.nn.Parameter(torch.rand(2, middle))
print("Weights1 => "+str(weights1))

weights2 = torch.nn.Parameter(torch.rand(middle, 2))
print("Weights2 => "+str(weights2))


Weights1 => Parameter containing:
 0.2741  0.1079  0.5357  0.2768
 0.8693  0.0993  0.1424  0.7299
[torch.FloatTensor of size 2x4]

Weights2 => Parameter containing:
 0.2426  0.7439
 0.2633  0.9062
 0.5629  0.8794
 0.1872  0.8052
[torch.FloatTensor of size 4x2]
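
To see how these two matrices fit together, the sketch below runs the forward pass (multiply, ReLU, multiply) in plain Python on two of the samples printed earlier. The helper functions `relu` and `matmul` are our own illustrative names, and the weights are the rounded values from the printout above, so the numbers will differ slightly from what PyTorch computes.

```python
def relu(row):
    """Replace negative entries with 0."""
    return [max(0.0, v) for v in row]

def matmul(a, b):
    """Multiply an (n x k) list-of-lists by a (k x m) list-of-lists."""
    return [[sum(x * y for x, y in zip(a_row, b_col)) for b_col in zip(*b)]
            for a_row in a]

# Initial weights copied (rounded to 4 decimals) from the printout above.
weights1 = [[0.2741, 0.1079, 0.5357, 0.2768],
            [0.8693, 0.0993, 0.1424, 0.7299]]   # 2 x 4
weights2 = [[0.2426, 0.7439],
            [0.2633, 0.9062],
            [0.5629, 0.8794],
            [0.1872, 0.8052]]                   # 4 x 2

features = [[-53, 4], [70, 2]]                  # N x 2 (two of the samples shown earlier)

hidden = [relu(row) for row in matmul(features, weights1)]  # N x 4
scores = matmul(hidden, weights2)                           # N x 2, one score per class
print(scores)
```

With these initial weights, the sample [-53, 4] makes all four hidden units negative before the ReLU, so the ReLU zeroes them and both class scores are 0. The sample [70, 2] produces two positive scores.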



Each run of the cell below performs about 1000 learning iterations, and we can run it as many times as we want.

Notice that the code for the learning iterations is almost identical to that of the earlier exercise, except that we've used the Adam optimizer class in PyTorch (torch.optim.Adam) to nudge the weights in the direction they must go.

In [9]:
optimizer = torch.optim.Adam([weights1, weights2], lr=0.01)

for i in range(1001):
    optimizer.zero_grad()   # zero the gradient buffers
    
    labels, features = data.get_sample(1000)
    
    features = torch.autograd.Variable(torch.Tensor(features))
    #print("Features: "+str(features))
    
    target = torch.autograd.Variable(torch.LongTensor(labels))
    #print("Target: "+str(target))
    
    result = features.mm(weights1)
    result1 = F.relu(result)
    result2 = result1.mm(weights2)
    
    loss = F.cross_entropy(result2, target)
    #print("Cross entropy loss: "+str(loss))

    loss.backward()
    
    optimizer.step()
        
    if i % 10 == 0:
        # loss.item() needs PyTorch 0.4 or newer; older versions used loss.data[0]
        print("The loss is now "+str(loss.item()))

torch.save(weights1, "models/toy_problem_3_trained_deep_model_weights1.bin")
torch.save(weights2, "models/toy_problem_3_trained_deep_model_weights2.bin")

The loss is now 16.446395874023438
The loss is now 8.513433456420898
The loss is now 2.8546459674835205
The loss is now 0.860838770866394
The loss is now 0.7812145352363586
The loss is now 0.6594237089157104
The loss is now 0.4663209021091461
The loss is now 0.41401979327201843
The loss is now 0.38516300916671753
The loss is now 0.3501630425453186
The loss is now 0.3344234228134155
The loss is now 0.32852596044540405
The loss is now 0.31941601634025574
The loss is now 0.3146955370903015
The loss is now 0.30019524693489075
The loss is now 0.28402167558670044
The loss is now 0.3068307936191559
The loss is now 0.2830791175365448
The loss is now 0.2841053605079651
The loss is now 0.2849158048629761
The loss is now 0.29306915402412415
The loss is now 0.2820267677307129
The loss is now 0.2742811441421509
The loss is now 0.2826817035675049
The loss is now 0.2755924463272095
The loss is now 0.26690950989723206
The loss is now 0.25641873478889465
The loss is now 0.2779945731163025
...
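
The cell above uses `torch.autograd.Variable`, which newer PyTorch releases no longer require. As a rough sketch of how the same loop might look without it, the code below trains on synthetic data. The function `make_batch` is a hypothetical stand-in for `data.get_sample`, and its labelling rule (label 1 when the two features have opposite signs) is our assumption based on the ten samples printed earlier, not something read from the real data file.

```python
import math
import torch
import torch.nn.functional as F

def make_batch(n, generator):
    """Hypothetical stand-in for data.get_sample(n): random points, sign-XOR labels."""
    features = torch.randint(-100, 101, (n, 2), generator=generator).float()
    # Assumption: label is 1 when the two features have opposite signs.
    labels = ((features[:, 0] * features[:, 1]) < 0).long()
    return labels, features

def train(steps=200, middle=4, seed=0):
    gen = torch.Generator().manual_seed(seed)
    weights1 = torch.nn.Parameter(torch.rand(2, middle, generator=gen))
    weights2 = torch.nn.Parameter(torch.rand(middle, 2, generator=gen))
    optimizer = torch.optim.Adam([weights1, weights2], lr=0.01)
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()                      # zero the gradient buffers
        target, features = make_batch(1000, gen)   # plain tensors, no Variable wrapper
        scores = F.relu(features.mm(weights1)).mm(weights2)
        loss = F.cross_entropy(scores, target)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())                 # Python float for logging
    return weights1, weights2, losses

weights1, weights2, losses = train()
print("first loss:", losses[0], "last loss:", losses[-1])
```

Because the data here is synthetic and the run is shorter, the loss values will not match the ones printed above.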

## The Loss

The loss, which is printed every 10 iterations, now decreases: apart from small fluctuations, it falls from about 16.4 at the start to under 0.3 in the output shown above.

This tells us that the machine learning algorithm is now learning something.
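
To help interpret these numbers, here is a minimal plain-Python sketch of the quantity that `F.cross_entropy` computes with its default settings: the mean over the batch of the negative log of the softmax probability assigned to the correct class. The function name `cross_entropy` and the example scores are our own illustrations.

```python
import math

def cross_entropy(scores, targets):
    """Mean of -log(softmax(scores)[target]) over the batch."""
    total = 0.0
    for row, target in zip(scores, targets):
        m = max(row)  # subtract the max before exponentiating, for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[target]
    return total / len(targets)

print(cross_entropy([[0.0, 0.0]], [1]))    # equal scores: ln 2, about 0.693
print(cross_entropy([[0.0, 20.0]], [0]))   # confidently wrong: about 20
print(cross_entropy([[0.0, 20.0]], [1]))   # confidently right: close to 0
```

A two-class model that guesses 50/50 gets a loss of ln 2, about 0.693. A starting loss of 16.4 therefore points to confidently wrong predictions, and a loss below 0.3 is clearly better than guessing.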

## Parameters

We can now print the weights.

In [10]:
print("The first layer weights are now "+str(weights1.data))
print("and the second layer's weights are now "+str(weights2.data))

The first layer weights are now 
 0.2479  0.4143  0.7116 -0.0562
 0.5629 -0.0183  0.4606  0.7253
[torch.FloatTensor of size 2x4]

and the second layer's weights are now 
 0.5984  0.3880
-0.0612  1.2307
 1.0671  0.3753
 0.2092  0.7833
[torch.FloatTensor of size 4x2]



## Classifier Test - Toy Problem 3

We have just trained a multilayer classifier for Toy Problem 3.

It seems to be learning something (the loss on the training data has decreased to about 0.2).

Let us evaluate the performance of the classifier on the test data.

In [11]:
data = Data("data/toy_problem_3_test.txt")

weights1 = torch.load("models/toy_problem_3_trained_deep_model_weights1.bin")
print(weights1)
weights2 = torch.load("models/toy_problem_3_trained_deep_model_weights2.bin")
print(weights2)

labels, features = data.get_all()

features = torch.autograd.Variable(torch.Tensor(features))
#print(features)

target = torch.autograd.Variable(torch.LongTensor(labels))
#print(target)

result = torch.mm(features, weights1)
result1 = F.relu(result)
result2 = torch.mm(result1, weights2)
#print(result2)

maxv, observed = torch.max(result2, 1)

total = 0
correct = 0
for i in range(len(labels)):
    total += 1
    #print(str(target.data[i]) + " " + str(observed.data[i]))
    if target.data[i] == observed.data[i]:
        correct += 1
accuracy = correct / total
print("Accuracy: "+str(accuracy))

Parameter containing:
 0.2479  0.4143  0.7116 -0.0562
 0.5629 -0.0183  0.4606  0.7253
[torch.FloatTensor of size 2x4]

Parameter containing:
 0.5984  0.3880
-0.0612  1.2307
 1.0671  0.3753
 0.2092  0.7833
[torch.FloatTensor of size 4x2]

Accuracy: 0.977


As you can see, the accuracy is in the high 90s (97.7% on the test data).

This is a good score.

It tells us that the multi-layer neural network was able to learn the non-linear XOR function.
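
To see why a hidden layer of 4 ReLU units is enough for this kind of problem, here is a sketch with hand-picked weights that have the same shapes as the network above (2x4 and 4x2, no bias). These weights are an illustration only, not the values the network learned. They assume that the label is 1 when the two features have opposite signs, which matches the ten samples printed earlier but is otherwise our inference about the data.

```python
def relu(v):
    return max(0.0, v)

# Hand-picked weights (an illustration, not the trained values).
# The hidden units compute relu(x+y), relu(-x-y), relu(x-y), relu(y-x).
weights1 = [[1, -1, 1, -1],
            [1, -1, -1, 1]]   # 2 x 4
# Class 0 score = |x+y|, class 1 score = |x-y|.
weights2 = [[1, 0],
            [1, 0],
            [0, 1],
            [0, 1]]           # 4 x 2

def predict(x, y):
    hidden = [relu(x * weights1[0][j] + y * weights1[1][j]) for j in range(4)]
    scores = [sum(hidden[j] * weights2[j][k] for j in range(4)) for k in range(2)]
    return 0 if scores[0] >= scores[1] else 1

# The ten samples printed at the start of this exercise.
labels = [1, 1, 0, 1, 0, 1, 1, 1, 1, 1]
features = [[-53, 4], [37, -68], [70, 2], [88, -95], [-2, -64],
            [28, -66], [52, -88], [-29, 7], [-15, 52], [83, -42]]
predictions = [predict(x, y) for x, y in features]
print(predictions == labels)
```

The trick is that relu(v) + relu(-v) equals |v|, and |x-y| is larger than |x+y| exactly when x and y have opposite signs, because (x-y)^2 - (x+y)^2 = -4xy. A single linear layer cannot form these absolute values, but a ReLU layer can.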