# Sequential Models for Toy Problem 5

How do we train a classifier to count the non-zero values in a sequence of integers?

The data has already been generated. Each data point consists of features (a sequence of $1$s and $0$s) and an output label, which is the count of the non-zero features.

The sequences all have the same length: 20 numbers.

So the problem is modeled as a sequential classification task: the sequential classifier reads a sequence, encodes it in its hidden state, and then classifies that hidden state into one of 21 classes.  There are 21 possible outputs (0 through 20) because a count of 0 is also possible.
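To make the setup concrete, here is a minimal sketch of what one data point could look like. The actual generator and the file format read by `data_reader.Data` are not shown here, so `make_example` is a hypothetical helper and the fixed seed is only an assumption for repeatability.

```python
import random

SEQUENCE_LENGTH = 20  # from the text: every sequence has 20 numbers

def make_example(rng):
    """Return (label, features): a random 0/1 sequence and its count of non-zero values."""
    features = [rng.randint(0, 1) for _ in range(SEQUENCE_LENGTH)]
    label = sum(1 for value in features if value != 0)
    return label, features

rng = random.Random(0)  # fixed seed so the sketch is repeatable (an assumption)
label, features = make_example(rng)
print(features, "=>", label)

# The label is a class index between 0 and 20, so there are 21 classes in total.
num_classes = SEQUENCE_LENGTH + 1
```

Because the label doubles as a class index, it can be handed directly to a classification loss without any further encoding.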

The sequential classifier that we use for this task is the Recurrent Neural Network (RNN).

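The state of the RNN at the current time step, $s_{t}$, is calculated using the equation $s_{t} = \tanh(\mathrm{concat}(s_{t-1}, x_{t}) \, W + b)$, where $x_{t}$ is the input at step $t$ and $s_{t-1}$ is the state from the previous step.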

Essentially, we're passing the current inputs and the previous state through a linear transformation to obtain a pre-activation, then passing the pre-activation through a *tanh* activation function.
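The update can be traced by hand for a single example. The sketch below uses plain Python lists instead of tensors, places the state before the input in the concatenation (as the notebook's `torch.cat((state, features_at_current_step), 1)` does), and uses made-up weight values purely for illustration. `rnn_step` is a hypothetical helper, not part of the notebook.

```python
import math

num_hidden = 4  # same hidden size as the notebook code

def rnn_step(state, x_t, W, b):
    """One update for a single example: tanh(concat(state, x_t) W + b)."""
    joined = state + [x_t]  # concatenation, length num_hidden + 1
    new_state = []
    for j in range(len(b)):
        # Linear transformation: dot product with column j of W, plus the bias.
        pre_activation = sum(joined[i] * W[i][j] for i in range(len(joined))) + b[j]
        new_state.append(math.tanh(pre_activation))
    return new_state

# Illustrative weights (made up): W has shape (num_hidden + 1) x num_hidden.
W = [[0.1] * num_hidden for _ in range(num_hidden + 1)]
b = [0.0] * num_hidden

state = [0.0] * num_hidden   # the state starts at zero
for x_t in [1, 0, 1]:        # a short toy input sequence
    state = rnn_step(state, x_t, W, b)
print(state)
```

Every entry of the state stays strictly between -1 and 1 because of the *tanh*, so whatever the network learns about the running count has to be encoded within that bounded range.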

In [4]:
import torch
import torch.nn.functional as F
from data_reader import Data

data = Data("data/toy_problem_5_train.txt")

sequence_length = 20
num_output = sequence_length+1
num_hidden = 4

W = torch.nn.Parameter(torch.rand(num_hidden+1, num_hidden))
#print("Weights input & hidden to hidden => "+str(W))

b = torch.nn.Parameter(torch.rand(1, num_hidden))
#print("Bias to hidden => "+str(b))

V = torch.nn.Parameter(torch.rand(num_hidden, num_output))
#print("Weights hidden to output categories => "+str(V))

d = torch.nn.Parameter(torch.rand(1, num_output))
#print("Bias to output categories => "+str(d))

optimizer = torch.optim.Adam([W, b, V, d], lr=0.01)

for j in range(10001):
    optimizer.zero_grad()   # zero the gradient buffers
    
    labels, features = data.get_sample(1000)
    
    features = torch.autograd.Variable(torch.Tensor(features))
    #print("Features: "+str(features))
    
    target = torch.autograd.Variable(torch.LongTensor(labels))
    #print("Target: "+str(target))
    
    state = torch.autograd.Variable(torch.zeros(features.size()[0],num_hidden))
    #print("State: "+str(state))

    for i in range(sequence_length):
        features_at_current_step = torch.unsqueeze(features[:,i], 1)
        # This is the RNN's state update equation s[t] = tanh(concat(s[t-1], x[t]) * W + b)
        x = torch.cat((state, features_at_current_step), 1)
        state = torch.tanh(x.mm(W) + b)
    
    result = state.mm(V) + d
    
    loss = F.cross_entropy(result, target)
    #print("Cross entropy loss: "+str(loss))

    loss.backward()
    
    optimizer.step()
    
    if j % 100 == 0:
        print("The loss is now "+str(loss.item()))

print("Done training!")
#print("The input & hidden to hidden weights are now "+str(W.data))
#print("\tand the hidden to output weights are now "+str(V.data))

torch.save(W, "models/toy_problem_5_trained_sequential_deep_model_weights1.bin")
torch.save(b, "models/toy_problem_5_trained_sequential_deep_model_bias1.bin")
torch.save(V, "models/toy_problem_5_trained_sequential_deep_model_weights2.bin")
torch.save(d, "models/toy_problem_5_trained_sequential_deep_model_bias2.bin")

The loss is now 3.229327917098999
The loss is now 2.542915105819702
The loss is now 2.34072208404541
The loss is now 2.282954692840576
The loss is now 2.234914541244507
The loss is now 2.1748650074005127
The loss is now 2.1541709899902344
The loss is now 2.0981101989746094
The loss is now 2.0799953937530518
The loss is now 2.0007340908050537
The loss is now 1.6037726402282715
The loss is now 1.5092980861663818
The loss is now 1.4156628847122192
The loss is now 1.3740284442901611
The loss is now 1.3425984382629395
The loss is now 1.2781156301498413
The loss is now 1.2552376985549927
The loss is now 1.1923974752426147
The loss is now 1.2044429779052734
The loss is now 1.1664882898330688
The loss is now 1.1283648014068604
The loss is now 1.1008405685424805
The loss is now 1.0644724369049072
The loss is now 1.0676239728927612
The loss is now 1.0223077535629272
The loss is now 1.0172436237335205
The loss is now 0.9921795129776001
The loss is now 0.9601276516914368
The loss is now 0.95725303

The loss decreases steadily, with only minor fluctuations, from about 3.23 at the start to about 0.96 at the end of the output shown.  Now to test the model.
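Before testing, it helps to put those loss values on a scale. The sketch below computes the cross-entropy of a single example from raw scores in plain Python, then evaluates it for a model that scores all 21 classes equally. The helper `cross_entropy` is hypothetical and written for illustration. The notebook itself uses `F.cross_entropy`, which by default averages this quantity over the batch.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one example: -log(softmax(logits)[target])."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[target]

num_output = 21  # counts 0 through 20

# Equal scores for every class amount to a uniform guess.
uniform_loss = cross_entropy([0.0] * num_output, target=6)
print(round(uniform_loss, 4))  # ln(21), about 3.0445

# exp(-loss) is the probability given to the correct class
# (a geometric mean when the loss is averaged over a batch).
print(math.exp(-0.96))
```

Read this way, a loss slightly above 3 at the start of training is about what uniform guessing gives, while a loss near 0.96 corresponds to putting roughly 0.38 of the probability on the correct count in the geometric-mean sense.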

In [5]:
import torch
import torch.nn.functional as F
from data_reader import Data

data = Data("data/toy_problem_5_test.txt")

sequence_length = 20
num_output = sequence_length+1
num_hidden = 4

W = torch.load("models/toy_problem_5_trained_sequential_deep_model_weights1.bin")
#print(W)
V = torch.load("models/toy_problem_5_trained_sequential_deep_model_weights2.bin")
#print(V)
b = torch.load("models/toy_problem_5_trained_sequential_deep_model_bias1.bin")
#print(b)
d = torch.load("models/toy_problem_5_trained_sequential_deep_model_bias2.bin")
#print(d)

labels, features_list = data.get_all()

features = torch.autograd.Variable(torch.Tensor(features_list))
#print(features)

target = torch.autograd.Variable(torch.LongTensor(labels))
#print(target)

state = torch.autograd.Variable(torch.zeros(features.size()[0],num_hidden))
#print("State: "+str(state))

for i in range(sequence_length):
    features_at_current_step = torch.unsqueeze(features[:,i], 1)
    #print("Features at current step: "+str(features_at_current_step))
    x = torch.cat((state, features_at_current_step), 1)
    state = torch.tanh(x.mm(W) + b)

result = state.mm(V) + d

maxv, observed = torch.max(result, 1)

loss = F.cross_entropy(result, target)
print("Cross entropy loss: "+str(loss))

#print("Target: "+str(target))
#print("Result: "+str(observed))

total = 0
correct = 0
for i in range(len(labels)):
    total += 1
    print("Features: "+str(features_list[i])+" => Gold: "+str(target.data[i]) + " Obs: " + str(observed.data[i]))
    if target.data[i] == observed.data[i]:
        correct += 1
accuracy = correct / total
print("Accuracy: "+str(accuracy))

Cross entropy loss: Variable containing:
 0.4653
[torch.FloatTensor of size 1]

Features: [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0] => Gold: 6 Obs: 6
Features: [0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] => Gold: 14 Obs: 14
Features: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0] => Gold: 2 Obs: 2
Features: [1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0] => Gold: 4 Obs: 4
Features: [0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0] => Gold: 14 Obs: 14
Features: [1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0] => Gold: 5 Obs: 5
Features: [0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0] => Gold: 12 Obs: 12
Features: [0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0] => Gold: 10 Obs: 10
Features: [1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0] => Gold: 8 Obs: 8
Features: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0] => Gold: 4 Obs: 4
Features: [0, 1, 1, 0,

As you can see, the accuracy of this sequential model (an RNN) on this task is pretty impressive: in the high 90s, as a percentage.