## Using RNNs to add two binary strings ##

In this first lab, to get familiar with RNNs, we will explore the simple problem of teaching an RNN to add binary strings. Recall, like grade-school addition, binary addition moves from the right-most bit (least-significant bit or LSB) towards the left-most bit (most-significant) bit, with a carry bit passed from the previous addition.

Following is the "truth table" for a "full-adder" (i.e., carry-in, carry-out):

     i1   i2   carry-in  |  sum  carry-out
     --------------------+----------------
     0    0       0      |   0      0
     0    0       1      |   1      0
     0    1       0      |   1      0
     0    1       1      |   0      1
     1    0       0      |   1      0
     1    0       1      |   0      1
     1    1       0      |   0      1
     1    1       1      |   1      1

where, `i1` and `i2` are the input bits

The RNN is fed two bit-sequences and the target "sum" sequence.
The sequence is ordered from LSB to MSB, i.e., time-step 1 (t=1) corresponds to LSB, and the last time-step is the MSB.

For example:
If the bit strings 010 (integer value = 2) and 011 (integer value = 3) are to be added to produce the sum 101 (integer value 5), the following is the sequence of inputs and targets to the RNN when training:

    time | i1  i2 | output
    -----+--------+--------
     1   | 0    1 |   1
     2   | 1    1 |   0
     3   | 0    0 |   1

Note, in the example above, the "carry" bit is not explicitly provided as the input, and the RNN has to *learn* the concept of a carry-bit

### Overview ###
We will be using [PyTorch](http://pytorch.org) for implementation.

This question is planned as below:

    1. First we discuss how the training samples are generated
    2. Next, the we discuss the input and output format used for the RNN
    3. We set up the RNN network 
    4. We explore the effects of various parameters.

In [1]:
# coding: utf-8
# =============================================================================
# Make a simple RNN learn binray addition 
# Binary string pairs and the sum is generated for a given #numBits

# ==============================================================================


%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np
from time import sleep
import random
import sys
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
random.seed( 10 ) # set the random seed (for reproducibility)

## Preparing the Training Data ##

###   Radom binary strings of required length as training data ###
The function `getSample` below takes a string-length `L` as input and returns a training sample to be fed to the RNN.


For a given length `L`, a training sample is a 2-tuple of (`input`, `output`), where

* `input` is a tensor of size [`L+1x2`]:<br>
	* The second dimension of 2, corresponds to 2 inputs which are
		to be summed together.
	* The first row is for LSBs, and the last row correspond to MSBs.
	* The bit-strings are `L+1` due to a possible carry when adding two `L`-length bit strings.

* `output` is a tensor of size `L+1`, which is the sum of the inputs

In [2]:
def getSample(stringLength, testFlag=False):
    """
    Returns a random sample for bit-string addition.
    STRINGLENGTH: (int scalar) (one less than) length of the bit-string to return.
    TESTFLAG: (boolean) if True, the returned sample is printed.
    
    Returns:
        a 2-tuple of (Input,Output), where:
        INPUT: (L+1 x 2) dimensional tensor of the inputs, where L==STRINGLENGTH
        OUTPUT: (L+1) dimensional "target" vector, which is the binary sum of inputs.
    """
    lowerBound=pow(2,stringLength-1)
    upperBound=pow(2,stringLength)-1
    
    num1=random.randint(lowerBound,upperBound)
    num2=random.randint(lowerBound,upperBound)

    num3=num1+num2
    num3Binary=(bin(num3)[2:])

    num1Binary=(bin(num1)[2:])
    num2Binary=(bin(num2)[2:])

    if testFlag==1:
        print('input numbers and their sum  are', num1, ' ', num2, ' ', num3)
        print ('binary strings are', num1Binary, ' ' , num2Binary, ' ' , num3Binary)
    len_num1= (len(num1Binary))

    len_num2= (len(num2Binary))
    len_num3= (len(num3Binary))

    # since num3 will be the largest, we pad  other numbers with zeros to that num3_len
    num1Binary= ('0'*(len(num3Binary)-len(num1Binary))+num1Binary)
    num2Binary= ('0'*(len(num3Binary)-len(num2Binary))+num2Binary)

    # forming the input sequence
    # the input at first timestep is the least significant bits of the two input binary strings
    # x will be then a len_num3 ( or T ) * 2 array
    x=torch.zeros(len_num3,2)
    #x=np.zeros((len_num3,2),dtype=np.float32)
    for i in range(0, len_num3):
        x[i,0]=int(num1Binary[len_num3-1-i]) # note that MSB of the binray string should be the last input along the time axis
        x[i,1]=int(num2Binary[len_num3-1-i])
    #y=np.zeros((len_num3,1),dtype=np.float32)
    y=torch.zeros(len_num3,1).long()
    for i in range(0,len_num3):
        y[i,0]=int(num3Binary[len_num3-1-i])
    
    # target vector is the sum in binary
    return x,y 

### Model Input ###
As noted above, the inputs are `L+1x2` dimensional tensors, with the first row corresponding to LSB and the last to MSB. As addition proceeds from LSB to MSB, the `input` rows are fed  one-by-one, starting from the top, proceeding all the way to the last row.<br>
__Note__: This is not shown explicitly in the provided code, but is done internally.


### Model Output  ###
At each time-step the model needs to predict the target "sum".


## Training Loss ##
We use the `Mean-Squared-Error` loss function to compare the predicted value $y_t$, and the target $\tilde{y_t}$:

$$\mathcal{L}_t(y_t,\tilde{y_t}) = \left(y_t - \tilde{y_t}\right)^2$$

__Note__: The total loss for a given sequence is the sum of all the losses from each time-step.


The image below shows a schematic of the "unrolled" RNN for binary-addition:


![static/network architecture](static/binAdd.png)


## Model Implementation ##
The following class `Adder` implements the above RNN. We only give the forward-pass implementation.
The backward pass is calcuated automatically by PyTorch's auto-grad.

The only parameters passed to the model is the size (dimensionality) of the hidden state.
Bigger state, means higher capacity.


In [3]:
class Adder (nn.Module):
    def __init__(self,stateDim):
        super(Adder, self).__init__()
        self.stateDim = stateDim
        self.inputDim = 2  # two for the two inputs
        self.outputDim = 2  # one for the "score"
        # currently the model uses the 'LSTM' cell. You could try
        # others like: tanh, GRU. See: https://github.com/pytorch/examples/blob/master/word_language_model/model.py#L11
        self.lstm = nn.LSTM(self.inputDim, self.stateDim )
        self.outputLayer = nn.Linear(self.stateDim, self.outputDim)
        self.softmax = nn.Softmax()

    def forward(self, x):
        """
        X: [L,B,inputDim(=2)] dimensional input tensor
            L: Sequence length
            B: is the "batch" dimension. As we are training on 
               single examples, B = 1 for us.
        """
        lstmOut,_ = self.lstm(x)
        L,B,D  = lstmOut.size(0),lstmOut.size(1),lstmOut.size(2)
        lstmOut = lstmOut.contiguous() 
        # before  feeding to linear layer we squash one dimension
        lstmOut = lstmOut.view(L*B,D)
        pred = torch.sigmoid(self.outputLayer(lstmOut)) # project lstm states to "output"
        # reshape actiavtions to T*B*outputDim
        #squeeze ll remove the extra dummy dimension for B so pred would be 2D tensor
        pred = pred.view(L,B,-1).squeeze(1) 
        return pred

### Training the Network ###

The model is trained on bit-strings of length `stringLen` sampled randomly.

For simplicity, training code runs for a fixed number of epochs (or iterations). In practice, the training should be monitored with performance on a held-out or validation set, in order to avoid over-fitting.

We use the [`Adam` optimizer](https://arxiv.org/abs/1412.6980).

The model runs fast enough to train on the CPU itself (GPUs are not used).



In [4]:
# set here the size of the RNN state:
stateSize = 10
# set here the size of the binary strings to be used for training:
stringLen = 3

# create the model:
model = Adder(stateSize)
print ('Model initialized')

# create the loss-function:
# lossFunction = nn.MSELoss() # or 
lossFunction = nn.CrossEntropyLoss() #-- see question #2 below
# lossFunction = nn.NLLLoss()

# uncomment below to change the optimizers:
# optimizer = optim.SGD(model.parameters(), lr=3e-2, momentum=0.8)
optimizer = optim.Adam(model.parameters(),lr=0.01)
iterations = 500
min_epochs = 20
num_epochs,totalLoss = 0,float("inf")
while num_epochs < min_epochs:
    print("[epoch %d/%d] Avg. Loss for last 500 samples = %lf"%(num_epochs+1,min_epochs,totalLoss))
    num_epochs += 1
    totalLoss = 0
    for i in range(0,iterations):
        # get a new random training sample:
        x,y = getSample(stringLen)
        # zero the gradients from the previous time-step:
        model.zero_grad()
        #convert to torch tensor and variable:
        ## unsqueeze() is used to add the extra BATCH dimension:
        x=x.unsqueeze(1)
         
        seqLen = x.size(0)
        x = x.contiguous()
        
        # push the inputs through the RNN (this is the forward pass):
        pred = model(x)
        # compute the loss:
        y = y.squeeze(1)
#         print("X is:", x)
#         print("Predictions are:", pred)
#         print("Y is:", y)
        loss = lossFunction(pred,y)
#         print("Loss is:",loss)
        totalLoss += loss.item()
        optimizer.zero_grad()
        # perform the backward pass:
        loss.backward()
        # update the weights:
        optimizer.step()
    totalLoss=totalLoss/iterations
print('Training finished!')

Model initialized
[epoch 1/20] Avg. Loss for last 500 samples = inf
[epoch 2/20] Avg. Loss for last 500 samples = 0.500129
[epoch 3/20] Avg. Loss for last 500 samples = 0.345850
[epoch 4/20] Avg. Loss for last 500 samples = 0.347017
[epoch 5/20] Avg. Loss for last 500 samples = 0.343565
[epoch 6/20] Avg. Loss for last 500 samples = 0.343209
[epoch 7/20] Avg. Loss for last 500 samples = 0.345534
[epoch 8/20] Avg. Loss for last 500 samples = 0.343442
[epoch 9/20] Avg. Loss for last 500 samples = 0.348886
[epoch 10/20] Avg. Loss for last 500 samples = 0.337850
[epoch 11/20] Avg. Loss for last 500 samples = 0.346824
[epoch 12/20] Avg. Loss for last 500 samples = 0.346808
[epoch 13/20] Avg. Loss for last 500 samples = 0.343796
[epoch 14/20] Avg. Loss for last 500 samples = 0.345787
[epoch 15/20] Avg. Loss for last 500 samples = 0.341281
[epoch 16/20] Avg. Loss for last 500 samples = 0.342776
[epoch 17/20] Avg. Loss for last 500 samples = 0.341273
[epoch 18/20] Avg. Loss for last 500 samples

### Testing the model ###
We now test the trained model. We get random inputs again (as before),
push them through the RNN, getting a prediction (in the [0,1] range due to the sigmoid) at every time step.
We discretize the output to 0 or 1, by thresholding at 0.5.

In [5]:
def test_by_length(stringLen,n_samples=100,verbose=True):
    n_samples = min(n_samples,2**stringLen)
    total_correctBits, total_num_bits = 0,0
    for i in range(n_samples):
        x,y = getSample(stringLen,testFlag=verbose)
        x=x.unsqueeze(1)
        y = np.transpose(y)
        seqLen = x.size(0)
        x = x.contiguous()
        finalScores = model(x)#data.t().numpy()
#         print(finalScores)
        # to get the final predictions, threshold the output of RNN at 0.5:
        ## this needs to be changed when you switch to cross-entropy loss (see question #2).
#         bits = (finalScores > 0.5)#.astype(np.int32)
        y_pred = []
        for i in range(len(finalScores)):
            y_pred.append(finalScores[i].detach().numpy().argmax())
#         print("Bits are:",bits)
#         y_pred = bits[0,:]

        y_pred = np.transpose(y_pred) #change to row vector
        y_pred_flipped = np.flip(y_pred,0) # reverse the array
        print("Y pred is:", y_pred)
        print("Y is:", np.flip(y, 0))
#         print ('shape of y_pred is',y_pred.shape)
#         print ('shape of y is',y.size())
        #print ('length of y is', len(y))
        fullStringCorr=1
        bitsCorrectInCurrentSample=0
        # iterate through each bit position and check if the corresponding bits are same
        # TODO - could be done in a better fashion
        for i in range (y.size()[1]):
            if  y_pred[i]!=int(y[0,i]):                             
                fullStringCorr=0
            else:
                bitsCorrectInCurrentSample+=1
                total_correctBits+=1
        
        
        total_num_bits += len(y)
        if verbose:
            print('sum predicted by RNN is ',y_pred_flipped)
            print('bit-accuracy : %s'%(bitsCorrectInCurrentSample/(len(y)+0.0)))
            print(40*'*')
    accuracy = total_correctBits / (total_num_bits + 0.0)
    if verbose:
        print(40*'*')
        print('Final bit-accuracy for strings of length %d = %.3f'%(stringLen,accuracy))
        print(40*'*')
    return accuracy

## Testing Model Generalization Ability##
Recall that the model was trained on bit-strings of length 3.
We will now test the trained model on bit-strings of different lengths (other than 3):<br>
We will sweep the length-range from 2 to 20, and plot the bit-accuracy.


In [7]:
string_len = np.arange(2,20)
# set "verbose" to true to print out detailed information:
bit_accuracy = [test_by_length(l,verbose=True,n_samples=100) for l in string_len]
# plot the accuracy:
plt.plot(string_len,bit_accuracy)
plt.xlabel('string length'); plt.ylabel('bit-accuracy'); plt.xticks(string_len,string_len)
plt.ylim([0,1.1]); 


input numbers and their sum  are 3   2   5
binary strings are 11   10   101
Y pred is: [1 0 1]
Y is: [[1 0 1]]
sum predicted by RNN is  [1 0 1]
bit-accuracy : 3.0
****************************************
input numbers and their sum  are 3   2   5
binary strings are 11   10   101
Y pred is: [1 0 1]
Y is: [[1 0 1]]
sum predicted by RNN is  [1 0 1]
bit-accuracy : 3.0
****************************************
input numbers and their sum  are 2   2   4
binary strings are 10   10   100
Y pred is: [0 0 1]
Y is: [[0 0 1]]
sum predicted by RNN is  [1 0 0]
bit-accuracy : 3.0
****************************************
input numbers and their sum  are 2   3   5
binary strings are 10   11   101
Y pred is: [1 0 1]
Y is: [[1 0 1]]
sum predicted by RNN is  [1 0 1]
bit-accuracy : 3.0
****************************************
****************************************
Final bit-accuracy for strings of length 2 = 3.000
****************************************
input numbers and their sum  are 5   6   11
binary 

input numbers and their sum  are 23   23   46
binary strings are 10111   10111   101110
Y pred is: [0 1 1 1 0 1]
Y is: [[0 1 1 1 0 1]]
sum predicted by RNN is  [1 0 1 1 1 0]
bit-accuracy : 6.0
****************************************
input numbers and their sum  are 28   31   59
binary strings are 11100   11111   111011
Y pred is: [1 1 0 1 1 1]
Y is: [[1 1 0 1 1 1]]
sum predicted by RNN is  [1 1 1 0 1 1]
bit-accuracy : 6.0
****************************************
input numbers and their sum  are 18   19   37
binary strings are 10010   10011   100101
Y pred is: [1 0 1 0 0 1]
Y is: [[1 0 1 0 0 1]]
sum predicted by RNN is  [1 0 0 1 0 1]
bit-accuracy : 6.0
****************************************
input numbers and their sum  are 27   21   48
binary strings are 11011   10101   110000
Y pred is: [0 1 1 1 0 1]
Y is: [[0 0 0 0 1 1]]
sum predicted by RNN is  [1 0 1 1 1 0]
bit-accuracy : 2.0
****************************************
input numbers and their sum  are 16   16   32
binary strings are

input numbers and their sum  are 58   43   101
binary strings are 111010   101011   1100101
Y pred is: [1 0 1 0 1 1 1]
Y is: [[1 0 1 0 0 1 1]]
sum predicted by RNN is  [1 1 1 0 1 0 1]
bit-accuracy : 6.0
****************************************
input numbers and their sum  are 63   37   100
binary strings are 111111   100101   1100100
Y pred is: [0 1 1 1 1 0 1]
Y is: [[0 0 1 0 0 1 1]]
sum predicted by RNN is  [1 0 1 1 1 1 0]
bit-accuracy : 3.0
****************************************
input numbers and their sum  are 61   38   99
binary strings are 111101   100110   1100011
Y pred is: [1 1 0 1 1 0 1]
Y is: [[1 1 0 0 0 1 1]]
sum predicted by RNN is  [1 0 1 1 0 1 1]
bit-accuracy : 4.0
****************************************
input numbers and their sum  are 59   62   121
binary strings are 111011   111110   1111001
Y pred is: [1 0 1 1 1 1 1]
Y is: [[1 0 0 1 1 1 1]]
sum predicted by RNN is  [1 1 1 1 1 0 1]
bit-accuracy : 6.0
****************************************
input numbers and their s

sum predicted by RNN is  [1 0 1 1 1 0 1]
bit-accuracy : 4.0
****************************************
input numbers and their sum  are 47   33   80
binary strings are 101111   100001   1010000
Y pred is: [0 1 1 1 0 0 1]
Y is: [[0 0 0 0 1 0 1]]
sum predicted by RNN is  [1 0 0 1 1 1 0]
bit-accuracy : 3.0
****************************************
input numbers and their sum  are 54   58   112
binary strings are 110110   111010   1110000
Y pred is: [0 0 1 1 0 1 1]
Y is: [[0 0 0 0 1 1 1]]
sum predicted by RNN is  [1 1 0 1 1 0 0]
bit-accuracy : 4.0
****************************************
input numbers and their sum  are 60   32   92
binary strings are 111100   100000   1011100
Y pred is: [0 0 1 1 1 0 1]
Y is: [[0 0 1 1 1 0 1]]
sum predicted by RNN is  [1 0 1 1 1 0 0]
bit-accuracy : 7.0
****************************************
input numbers and their sum  are 62   37   99
binary strings are 111110   100101   1100011
Y pred is: [1 1 0 1 1 0 1]
Y is: [[1 1 0 0 0 1 1]]
sum predicted by RNN is  [1

binary strings are 1000101   1111000   10111101
Y pred is: [1 0 1 1 1 1 0 1]
Y is: [[1 0 1 1 1 1 0 1]]
sum predicted by RNN is  [1 0 1 1 1 1 0 1]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 111   100   211
binary strings are 1101111   1100100   11010011
Y pred is: [1 1 0 1 1 0 1 1]
Y is: [[1 1 0 0 1 0 1 1]]
sum predicted by RNN is  [1 1 0 1 1 0 1 1]
bit-accuracy : 7.0
****************************************
input numbers and their sum  are 113   89   202
binary strings are 1110001   1011001   11001010
Y pred is: [0 1 0 1 0 1 1 1]
Y is: [[0 1 0 1 0 0 1 1]]
sum predicted by RNN is  [1 1 1 0 1 0 1 0]
bit-accuracy : 7.0
****************************************
input numbers and their sum  are 65   75   140
binary strings are 1000001   1001011   10001100
Y pred is: [0 1 1 0 1 0 0 1]
Y is: [[0 0 1 1 0 0 0 1]]
sum predicted by RNN is  [1 0 0 1 0 1 1 0]
bit-accuracy : 5.0
****************************************
input numbers and their sum  are

Y pred is: [0 1 0 0 1 0 1 1]
Y is: [[0 1 0 0 1 0 1 1]]
sum predicted by RNN is  [1 1 0 1 0 0 1 0]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 65   113   178
binary strings are 1000001   1110001   10110010
Y pred is: [0 1 0 0 1 1 0 1]
Y is: [[0 1 0 0 1 1 0 1]]
sum predicted by RNN is  [1 0 1 1 0 0 1 0]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 82   76   158
binary strings are 1010010   1001100   10011110
Y pred is: [0 1 1 1 1 1 0 1]
Y is: [[0 1 1 1 1 0 0 1]]
sum predicted by RNN is  [1 0 1 1 1 1 1 0]
bit-accuracy : 7.0
****************************************
input numbers and their sum  are 97   67   164
binary strings are 1100001   1000011   10100100
Y pred is: [0 1 1 0 0 1 0 1]
Y is: [[0 0 1 0 0 1 0 1]]
sum predicted by RNN is  [1 0 1 0 0 1 1 0]
bit-accuracy : 7.0
****************************************
input numbers and their sum  are 118   126   244
binary strings are 1110110   1111

sum predicted by RNN is  [1 0 1 1 1 1 0 1]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 93   65   158
binary strings are 1011101   1000001   10011110
Y pred is: [0 1 1 1 1 0 0 1]
Y is: [[0 1 1 1 1 0 0 1]]
sum predicted by RNN is  [1 0 0 1 1 1 1 0]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 127   94   221
binary strings are 1111111   1011110   11011101
Y pred is: [1 0 1 1 1 1 1 1]
Y is: [[1 0 1 1 1 0 1 1]]
sum predicted by RNN is  [1 1 1 1 1 1 0 1]
bit-accuracy : 7.0
****************************************
input numbers and their sum  are 109   86   195
binary strings are 1101101   1010110   11000011
Y pred is: [1 1 0 1 1 1 0 1]
Y is: [[1 1 0 0 0 0 1 1]]
sum predicted by RNN is  [1 0 1 1 1 0 1 1]
bit-accuracy : 4.0
****************************************
input numbers and their sum  are 85   125   210
binary strings are 1010101   1111101   11010010
Y pred is: [0 1 0 1 1 1 1 1]
Y is: [[0 1

input numbers and their sum  are 198   145   343
binary strings are 11000110   10010001   101010111
Y pred is: [1 1 1 0 1 0 1 0 1]
Y is: [[1 1 1 0 1 0 1 0 1]]
sum predicted by RNN is  [1 0 1 0 1 0 1 1 1]
bit-accuracy : 9.0
****************************************
input numbers and their sum  are 202   174   376
binary strings are 11001010   10101110   101111000
Y pred is: [0 0 1 1 1 1 1 0 1]
Y is: [[0 0 0 1 1 1 1 0 1]]
sum predicted by RNN is  [1 0 1 1 1 1 1 0 0]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 185   146   331
binary strings are 10111001   10010010   101001011
Y pred is: [1 1 0 1 0 1 1 0 1]
Y is: [[1 1 0 1 0 0 1 0 1]]
sum predicted by RNN is  [1 0 1 1 0 1 0 1 1]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 159   186   345
binary strings are 10011111   10111010   101011001
Y pred is: [1 0 1 1 1 1 1 0 1]
Y is: [[1 0 0 1 1 0 1 0 1]]
sum predicted by RNN is  [1 0 1 1 1 1 1 0 1]
bit-

****************************************
input numbers and their sum  are 218   236   454
binary strings are 11011010   11101100   111000110
Y pred is: [0 1 1 0 1 1 0 1 1]
Y is: [[0 1 1 0 0 0 1 1 1]]
sum predicted by RNN is  [1 1 0 1 1 0 1 1 0]
bit-accuracy : 6.0
****************************************
input numbers and their sum  are 158   148   306
binary strings are 10011110   10010100   100110010
Y pred is: [0 1 0 1 1 1 0 0 1]
Y is: [[0 1 0 0 1 1 0 0 1]]
sum predicted by RNN is  [1 0 0 1 1 1 0 1 0]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 250   251   501
binary strings are 11111010   11111011   111110101
Y pred is: [1 0 1 0 1 1 1 1 1]
Y is: [[1 0 1 0 1 1 1 1 1]]
sum predicted by RNN is  [1 1 1 1 1 0 1 0 1]
bit-accuracy : 9.0
****************************************
input numbers and their sum  are 197   202   399
binary strings are 11000101   11001010   110001111
Y pred is: [1 1 1 1 0 0 0 1 1]
Y is: [[1 1 1 1 0 0 0 1 1]]
sum pred

Y is: [[0 1 0 1 1 0 1 1 0 1]]
sum predicted by RNN is  [1 0 1 1 1 1 1 1 1 0]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 494   478   972
binary strings are 111101110   111011110   1111001100
Y pred is: [0 0 1 1 1 1 0 1 1 1]
Y is: [[0 0 1 1 0 0 1 1 1 1]]
sum predicted by RNN is  [1 1 1 0 1 1 1 1 0 0]
bit-accuracy : 7.0
****************************************
input numbers and their sum  are 296   423   719
binary strings are 100101000   110100111   1011001111
Y pred is: [1 1 1 1 0 0 1 1 0 1]
Y is: [[1 1 1 1 0 0 1 1 0 1]]
sum predicted by RNN is  [1 0 1 1 0 0 1 1 1 1]
bit-accuracy : 10.0
****************************************
input numbers and their sum  are 419   482   901
binary strings are 110100011   111100010   1110000101
Y pred is: [1 0 1 0 0 0 1 1 1 1]
Y is: [[1 0 1 0 0 0 0 1 1 1]]
sum predicted by RNN is  [1 1 1 1 0 0 0 1 0 1]
bit-accuracy : 9.0
****************************************
input numbers and their sum  are 304   503 

Y pred is: [0 0 0 1 1 0 1 1 0 1]
Y is: [[0 0 0 1 1 0 1 0 0 1]]
sum predicted by RNN is  [1 0 1 1 0 1 1 0 0 0]
bit-accuracy : 9.0
****************************************
input numbers and their sum  are 275   389   664
binary strings are 100010011   110000101   1010011000
Y pred is: [0 1 1 1 0 1 0 1 0 1]
Y is: [[0 0 0 1 1 0 0 1 0 1]]
sum predicted by RNN is  [1 0 1 0 1 0 1 1 1 0]
bit-accuracy : 6.0
****************************************
input numbers and their sum  are 448   308   756
binary strings are 111000000   100110100   1011110100
Y pred is: [0 0 1 0 1 1 1 1 0 1]
Y is: [[0 0 1 0 1 1 1 1 0 1]]
sum predicted by RNN is  [1 0 1 1 1 1 0 1 0 0]
bit-accuracy : 10.0
****************************************
input numbers and their sum  are 482   259   741
binary strings are 111100010   100000011   1011100101
Y pred is: [1 0 1 0 0 1 1 1 0 1]
Y is: [[1 0 1 0 0 1 1 1 0 1]]
sum predicted by RNN is  [1 0 1 1 1 0 0 1 0 1]
bit-accuracy : 10.0
****************************************
input num

****************************************
input numbers and their sum  are 935   523   1458
binary strings are 1110100111   1000001011   10110110010
Y pred is: [0 1 1 1 1 1 1 0 1 0 1]
Y is: [[0 1 0 0 1 1 0 1 1 0 1]]
sum predicted by RNN is  [1 0 1 0 1 1 1 1 1 1 0]
bit-accuracy : 7.0
****************************************
input numbers and their sum  are 854   827   1681
binary strings are 1101010110   1100111011   11010010001
Y pred is: [1 0 1 1 0 1 1 1 0 1 1]
Y is: [[1 0 0 0 1 0 0 1 0 1 1]]
sum predicted by RNN is  [1 1 0 1 1 1 0 1 1 0 1]
bit-accuracy : 6.0
****************************************
input numbers and their sum  are 893   799   1692
binary strings are 1101111101   1100011111   11010011100
Y pred is: [0 1 1 1 1 1 1 1 0 1 1]
Y is: [[0 0 1 1 1 0 0 1 0 1 1]]
sum predicted by RNN is  [1 1 0 1 1 1 1 1 1 1 0]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 936   893   1829
binary strings are 1110101000   1101111101   11100100101
Y p

****************************************
input numbers and their sum  are 721   783   1504
binary strings are 1011010001   1100001111   10111100000
Y pred is: [0 1 1 1 1 1 0 1 1 0 1]
Y is: [[0 0 0 0 0 1 1 1 1 0 1]]
sum predicted by RNN is  [1 0 1 1 0 1 1 1 1 1 0]
bit-accuracy : 6.0
****************************************
input numbers and their sum  are 644   984   1628
binary strings are 1010000100   1111011000   11001011100
Y pred is: [0 0 1 1 1 1 0 0 1 1 1]
Y is: [[0 0 1 1 1 0 1 0 0 1 1]]
sum predicted by RNN is  [1 1 1 0 0 1 1 1 1 0 0]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 684   814   1498
binary strings are 1010101100   1100101110   10111011010
Y pred is: [0 1 0 1 1 0 1 1 1 0 1]
Y is: [[0 1 0 1 1 0 1 1 1 0 1]]
sum predicted by RNN is  [1 0 1 1 1 0 1 1 0 1 0]
bit-accuracy : 11.0
****************************************
input numbers and their sum  are 686   741   1427
binary strings are 1010101110   1011100101   10110010011
Y 

sum predicted by RNN is  [1 1 1 0 1 1 1 1 1 1 1 0]
bit-accuracy : 11.0
****************************************
input numbers and their sum  are 1478   1036   2514
binary strings are 10111000110   10000001100   100111010010
Y pred is: [0 1 0 1 1 0 1 1 1 1 0 1]
Y is: [[0 1 0 0 1 0 1 1 1 0 0 1]]
sum predicted by RNN is  [1 0 1 1 1 1 0 1 1 0 1 0]
bit-accuracy : 10.0
****************************************
input numbers and their sum  are 2021   1847   3868
binary strings are 11111100101   11100110111   111100011100
Y pred is: [0 1 1 1 1 0 1 1 0 1 1 1]
Y is: [[0 0 1 1 1 0 0 0 1 1 1 1]]
sum predicted by RNN is  [1 1 1 0 1 1 0 1 1 1 1 0]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 1271   1042   2313
binary strings are 10011110111   10000010010   100100001001
Y pred is: [1 0 1 1 0 1 1 1 0 0 0 1]
Y is: [[1 0 0 1 0 0 0 0 1 0 0 1]]
sum predicted by RNN is  [1 0 0 0 1 1 1 0 1 1 0 1]
bit-accuracy : 7.0
****************************************
input

input numbers and their sum  are 1539   1053   2592
binary strings are 11000000011   10000011101   101000100000
Y pred is: [0 1 1 1 1 1 0 0 0 1 0 1]
Y is: [[0 0 0 0 0 1 0 0 0 1 0 1]]
sum predicted by RNN is  [1 0 1 0 0 0 1 1 1 1 1 0]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 1729   1060   2789
binary strings are 11011000001   10000100100   101011100101
Y pred is: [1 0 1 0 0 1 1 1 1 0 0 1]
Y is: [[1 0 1 0 0 1 1 1 0 1 0 1]]
sum predicted by RNN is  [1 0 0 1 1 1 1 0 0 1 0 1]
bit-accuracy : 10.0
****************************************
input numbers and their sum  are 1743   1149   2892
binary strings are 11011001111   10001111101   101101001100
Y pred is: [0 1 1 1 1 1 0 1 1 0 0 1]
Y is: [[0 0 1 1 0 0 1 0 1 1 0 1]]
sum predicted by RNN is  [1 0 0 1 1 0 1 1 1 1 1 0]
bit-accuracy : 6.0
****************************************
input numbers and their sum  are 1456   1875   3331
binary strings are 10110110000   11101010011   110100000011
Y pre

binary strings are 101001111011   100010010110   1001100010001
Y pred is: [1 0 1 1 0 1 1 1 1 1 1 0 1]
Y is: [[1 0 0 0 1 0 0 0 1 1 0 0 1]]
sum predicted by RNN is  [1 0 1 1 1 1 1 1 0 1 1 0 1]
bit-accuracy : 6.0
****************************************
input numbers and their sum  are 3649   3391   7040
binary strings are 111001000001   110100111111   1101110000000
Y pred is: [0 1 1 1 1 1 1 1 0 1 1 1 1]
Y is: [[0 0 0 0 0 0 0 1 1 1 0 1 1]]
sum predicted by RNN is  [1 1 1 1 0 1 1 1 1 1 1 1 0]
bit-accuracy : 5.0
****************************************
input numbers and their sum  are 2644   2099   4743
binary strings are 101001010100   100000110011   1001010000111
Y pred is: [1 1 1 0 0 1 1 0 0 1 0 0 1]
Y is: [[1 1 1 0 0 0 0 1 0 1 0 0 1]]
sum predicted by RNN is  [1 0 0 1 0 0 1 1 0 0 1 1 1]
bit-accuracy : 10.0
****************************************
input numbers and their sum  are 2490   2055   4545
binary strings are 100110111010   100000000111   1000111000001
Y pred is: [1 0 1 1 1 1 1 0

binary strings are 110001111111   110100010111   1100110010110
Y pred is: [0 1 1 1 1 1 1 0 1 0 0 1 1]
Y is: [[0 1 1 0 1 0 0 1 1 0 0 1 1]]
sum predicted by RNN is  [1 1 0 0 1 0 1 1 1 1 1 1 0]
bit-accuracy : 9.0
****************************************
input numbers and their sum  are 3162   3310   6472
binary strings are 110001011010   110011101110   1100101001000
Y pred is: [0 0 1 1 1 1 0 1 1 0 0 1 1]
Y is: [[0 0 0 1 0 0 1 0 1 0 0 1 1]]
sum predicted by RNN is  [1 1 0 0 1 1 0 1 1 1 1 0 0]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 3112   3545   6657
binary strings are 110000101000   110111011001   1101000000001
Y pred is: [1 0 0 0 1 1 1 1 1 1 0 1 1]
Y is: [[1 0 0 0 0 0 0 0 0 1 0 1 1]]
sum predicted by RNN is  [1 1 0 1 1 1 1 1 1 0 0 0 1]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 2630   4074   6704
binary strings are 101001000110   111111101010   1101000110000
Y pred is: [0 0 1 1 1 0 0 1 

Y pred is: [0 1 1 1 1 1 1 0 1 1 1 1 0 1]
Y is: [[0 1 1 1 1 1 0 0 1 1 1 0 0 1]]
sum predicted by RNN is  [1 0 1 1 1 1 0 1 1 1 1 1 1 0]
bit-accuracy : 12.0
****************************************
input numbers and their sum  are 5319   6742   12061
binary strings are 1010011000111   1101001010110   10111100011101
Y pred is: [1 0 1 1 1 0 0 1 1 0 1 1 0 1]
Y is: [[1 0 1 1 1 0 0 0 1 1 1 1 0 1]]
sum predicted by RNN is  [1 0 1 1 0 1 1 0 0 1 1 1 0 1]
bit-accuracy : 12.0
****************************************
input numbers and their sum  are 7167   5231   12398
binary strings are 1101111111111   1010001101111   11000001101110
Y pred is: [0 1 1 1 1 1 1 1 1 1 1 1 0 1]
Y is: [[0 1 1 1 0 1 1 0 0 0 0 0 1 1]]
sum predicted by RNN is  [1 0 1 1 1 1 1 1 1 1 1 1 1 0]
bit-accuracy : 7.0
****************************************
input numbers and their sum  are 4264   5742   10006
binary strings are 1000010101000   1011001101110   10011100010110
Y pred is: [0 1 1 0 1 0 1 1 1 0 1 1 0 1]
Y is: [[0 1 1 0 1 

binary strings are 1000010111110   1110110110011   10111001110001
Y pred is: [1 0 1 1 0 1 1 0 1 1 0 1 1 1]
Y is: [[1 0 0 0 1 1 1 0 0 1 1 1 0 1]]
sum predicted by RNN is  [1 1 1 0 1 1 0 1 1 0 1 1 0 1]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 6830   7802   14632
binary strings are 1101010101110   1111001111010   11100100101000
Y pred is: [0 0 1 1 1 1 1 1 1 0 1 0 1 1]
Y is: [[0 0 0 1 0 1 0 0 1 0 0 1 1 1]]
sum predicted by RNN is  [1 1 0 1 0 1 1 1 1 1 1 1 0 0]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 4664   5227   9891
binary strings are 1001000111000   1010001101011   10011010100011
Y pred is: [1 1 0 0 1 1 1 1 0 1 1 1 0 1]
Y is: [[1 1 0 0 0 1 0 1 0 1 1 0 0 1]]
sum predicted by RNN is  [1 0 1 1 1 0 1 1 1 1 0 0 1 1]
bit-accuracy : 11.0
****************************************
input numbers and their sum  are 5980   7599   13579
binary strings are 1011101011100   1110110101111   1101010000

sum predicted by RNN is  [1 1 0 1 1 1 1 1 1 0 1 1 0 1 1]
bit-accuracy : 11.0
****************************************
input numbers and their sum  are 13357   11199   24556
binary strings are 11010000101101   10101110111111   101111111101100
Y pred is: [0 1 1 1 1 1 1 1 1 1 1 1 1 0 1]
Y is: [[0 0 1 1 0 1 1 1 1 1 1 1 1 0 1]]
sum predicted by RNN is  [1 0 1 1 1 1 1 1 1 1 1 1 1 1 0]
bit-accuracy : 13.0
****************************************
input numbers and their sum  are 10587   8738   19325
binary strings are 10100101011011   10001000100010   100101101111101
Y pred is: [1 0 1 1 1 1 1 1 0 1 1 0 1 0 1]
Y is: [[1 0 1 1 1 1 1 0 1 1 0 1 0 0 1]]
sum predicted by RNN is  [1 0 1 0 1 1 0 1 1 1 1 1 1 0 1]
bit-accuracy : 10.0
****************************************
input numbers and their sum  are 13771   14541   28312
binary strings are 11010111001011   11100011001101   110111010011000
Y pred is: [0 1 1 0 1 1 0 1 1 1 0 1 0 1 1]
Y is: [[0 0 0 1 1 0 0 1 0 1 1 1 0 1 1]]
sum predicted by RNN is  [

bit-accuracy : 10.0
****************************************
input numbers and their sum  are 8727   13613   22340
binary strings are 10001000010111   11010100101101   101011101000100
Y pred is: [0 1 1 1 1 1 1 0 0 1 1 1 0 0 1]
Y is: [[0 0 1 0 0 0 1 0 1 1 1 0 1 0 1]]
sum predicted by RNN is  [1 0 0 1 1 1 0 0 1 1 1 1 1 1 0]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 10220   16311   26531
binary strings are 10011111101100   11111110110111   110011110100011
Y pred is: [1 1 0 1 1 0 1 1 1 1 1 1 1 0 1]
Y is: [[1 1 0 0 0 1 0 1 1 1 1 0 0 1 1]]
sum predicted by RNN is  [1 0 1 1 1 1 1 1 1 0 1 1 0 1 1]
bit-accuracy : 8.0
****************************************
input numbers and their sum  are 8715   10761   19476
binary strings are 10001000001011   10101000001001   100110000010100
Y pred is: [0 1 1 0 1 0 0 0 0 0 1 1 1 0 1]
Y is: [[0 0 1 0 1 0 0 0 0 0 1 1 0 0 1]]
sum predicted by RNN is  [1 0 1 1 1 0 0 0 0 0 1 0 1 1 0]
bit-accuracy : 13.0
*********

binary strings are 100100001111001   110011011110111   1010111101110000
Y pred is: [0 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1]
Y is: [[0 0 0 0 1 1 1 0 1 1 1 1 0 1 0 1]]
sum predicted by RNN is  [1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 0]
bit-accuracy : 9.0
****************************************
input numbers and their sum  are 32283   30687   62970
binary strings are 111111000011011   111011111011111   1111010111111010
Y pred is: [0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1]
Y is: [[0 1 0 1 1 1 1 1 1 0 1 0 1 1 1 1]]
sum predicted by RNN is  [1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0]
bit-accuracy : 14.0
****************************************
input numbers and their sum  are 31698   26308   58006
binary strings are 111101111010010   110011011000100   1110001010010110
Y pred is: [0 1 1 0 1 0 0 1 1 1 1 1 1 0 1 1]
Y is: [[0 1 1 0 1 0 0 1 0 1 0 0 0 1 1 1]]
sum predicted by RNN is  [1 1 0 1 1 1 1 1 1 0 0 1 0 1 1 0]
bit-accuracy : 11.0
****************************************
input numbers and their sum  are 17776   23752   41528

sum predicted by RNN is  [1 0 1 1 1 0 1 1 1 1 1 1 0 1 0 1]
bit-accuracy : 14.0
****************************************
input numbers and their sum  are 31516   23505   55021
binary strings are 111101100011100   101101111010001   1101011011101101
Y pred is: [1 0 1 1 0 1 1 1 0 1 1 0 1 1 1 1]
Y is: [[1 0 1 1 0 1 1 1 0 1 1 0 1 0 1 1]]
sum predicted by RNN is  [1 1 1 1 0 1 1 0 1 1 1 0 1 1 0 1]
bit-accuracy : 15.0
****************************************
input numbers and their sum  are 26587   24605   51192
binary strings are 110011111011011   110000000011101   1100011111111000
Y pred is: [0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1]
Y is: [[0 0 0 1 1 1 1 1 1 1 1 0 0 0 1 1]]
sum predicted by RNN is  [1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 0]
bit-accuracy : 12.0
****************************************
input numbers and their sum  are 27882   20163   48045
binary strings are 110110011101010   100111011000011   1011101110101101
Y pred is: [1 0 1 1 0 1 0 1 1 1 0 1 1 1 0 1]
Y is: [[1 0 1 1 0 1 0 1 1 1 0 1 1 1 0 1]

sum predicted by RNN is  [1 0 1 1 0 1 0 1 1 1 0 0 0 1 1 1 0]
bit-accuracy : 6.0
****************************************
input numbers and their sum  are 41838   43668   85506
binary strings are 1010001101101110   1010101010010100   10100111000000010
Y pred is: [0 1 0 1 1 1 1 1 1 0 1 1 1 0 1 0 1]
Y is: [[0 1 0 0 0 0 0 0 0 1 1 1 0 0 1 0 1]]
sum predicted by RNN is  [1 0 1 0 1 1 1 0 1 1 1 1 1 1 0 1 0]
bit-accuracy : 9.0
****************************************
input numbers and their sum  are 33993   45305   79298
binary strings are 1000010011001001   1011000011111001   10011010111000010
Y pred is: [0 1 0 0 1 1 0 1 1 0 1 1 0 1 1 0 1]
Y is: [[0 1 0 0 0 0 1 1 1 0 1 0 1 1 0 0 1]]
sum predicted by RNN is  [1 0 1 1 0 1 1 0 1 1 0 1 1 0 0 1 0]
bit-accuracy : 11.0
****************************************
input numbers and their sum  are 60045   39186   99231
binary strings are 1110101010001101   1001100100010010   11000001110011111
Y pred is: [1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 1]
Y is: [[1 1 1 1 1

Y is: [[1 1 1 0 1 0 0 1 1 0 0 0 1 0 0 1 0 1]]
sum predicted by RNN is  [1 0 1 0 0 1 1 1 1 0 1 1 0 1 0 1 1 1]
bit-accuracy : 13.0
****************************************
input numbers and their sum  are 124099   78385   202484
binary strings are 11110010011000011   10011001000110001   110001011011110100
Y pred is: [0 1 1 0 1 1 1 1 1 0 1 1 0 0 1 1 0 1]
Y is: [[0 0 1 0 1 1 1 1 0 1 1 0 1 0 0 0 1 1]]
sum predicted by RNN is  [1 0 1 1 0 0 1 1 0 1 1 1 1 1 0 1 1 0]
bit-accuracy : 10.0
****************************************
input numbers and their sum  are 109762   129882   239644
binary strings are 11010110011000010   11111101101011010   111010100000011100
Y pred is: [0 0 1 1 1 1 0 1 1 1 1 0 1 0 1 1 1 1]
Y is: [[0 0 1 1 1 0 0 0 0 0 0 1 0 1 0 1 1 1]]
sum predicted by RNN is  [1 1 1 1 0 1 0 1 1 1 1 0 1 1 1 1 0 0]
bit-accuracy : 9.0
****************************************
input numbers and their sum  are 68697   89108   157805
binary strings are 10000110001011001   10101110000010100   1001101

****************************************
input numbers and their sum  are 76863   125483   202346
binary strings are 10010110000111111   11110101000101011   110001011001101010
Y pred is: [0 1 1 1 1 0 1 1 0 0 1 1 1 0 1 1 0 1]
Y is: [[0 1 0 1 0 1 1 0 0 1 1 0 1 0 0 0 1 1]]
sum predicted by RNN is  [1 0 1 1 0 1 1 1 0 0 1 1 0 1 1 1 1 0]
bit-accuracy : 9.0
****************************************
input numbers and their sum  are 82274   90315   172589
binary strings are 10100000101100010   10110000011001011   101010001000101101
Y pred is: [1 0 1 1 0 1 0 1 1 1 0 0 0 0 0 1 0 1]
Y is: [[1 0 1 1 0 1 0 0 0 1 0 0 0 1 0 1 0 1]]
sum predicted by RNN is  [1 0 1 0 0 0 0 0 1 1 1 0 1 0 1 1 0 1]
bit-accuracy : 15.0
****************************************
input numbers and their sum  are 116159   95133   211292
binary strings are 11100010110111111   10111001110011101   110011100101011100
Y pred is: [0 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1]
Y is: [[0 0 1 1 1 0 1 0 1 0 0 1 1 1 0 0 1 1]]
sum predicted by RNN is

binary strings are 101011010001111010   101110100110010100   1011001111000001110
Y pred is: [0 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1 0 1]
Y is: [[0 1 1 1 0 0 0 0 0 1 1 1 1 0 0 1 1 0 1]]
sum predicted by RNN is  [1 0 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1 1 0]
bit-accuracy : 13.0
****************************************
input numbers and their sum  are 191000   242036   433036
binary strings are 101110101000011000   111011000101110100   1101001101110001100
Y pred is: [0 0 1 1 0 1 1 1 0 1 1 0 1 0 1 1 1 0 1]
Y is: [[0 0 1 1 0 0 0 1 1 1 0 1 1 0 0 1 0 1 1]]
sum predicted by RNN is  [1 0 1 1 1 0 1 0 1 1 0 1 1 1 0 1 1 0 0]
bit-accuracy : 11.0
****************************************
input numbers and their sum  are 221806   238846   460652
binary strings are 110110001001101110   111010010011111110   1110000011101101100
Y pred is: [0 0 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1]
Y is: [[0 0 1 1 0 1 1 0 1 1 1 0 0 0 0 0 1 1 1]]
sum predicted by RNN is  [1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0 0]
bit-accuracy : 13.0
*******

****************************************
input numbers and their sum  are 190553   151339   341892
binary strings are 101110100001011001   100100111100101011   1010011011110000100
Y pred is: [0 1 1 0 1 1 1 0 1 1 1 0 1 1 0 1 1 0 1]
Y is: [[0 0 1 0 0 0 0 1 1 1 1 0 1 1 0 0 1 0 1]]
sum predicted by RNN is  [1 0 1 1 0 1 1 0 1 1 1 0 1 1 1 0 1 1 0]
bit-accuracy : 13.0
****************************************
input numbers and their sum  are 168712   211137   379849
binary strings are 101001001100001000   110011100011000001   1011100101111001001
Y pred is: [1 0 0 1 0 0 0 1 1 1 1 0 0 1 1 0 1 1 1]
Y is: [[1 0 0 1 0 0 1 1 1 1 0 1 0 0 1 1 1 0 1]]
sum predicted by RNN is  [1 1 1 0 1 1 0 0 1 1 1 1 0 0 0 1 0 0 1]
bit-accuracy : 13.0
****************************************
input numbers and their sum  are 169596   245622   415218
binary strings are 101001011001111100   111011111101110110   1100101010111110010
Y pred is: [0 1 0 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1]
Y is: [[0 1 0 0 1 1 1 1 1 0 1 0 1 0 1 0 0

input numbers and their sum  are 193529   164072   357601
binary strings are 101111001111111001   101000000011101000   1010111010011100001
Y pred is: [1 0 0 0 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1]
Y is: [[1 0 0 0 0 1 1 1 0 0 1 0 1 1 1 0 1 0 1]]
sum predicted by RNN is  [1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 0 0 1]
bit-accuracy : 16.0
****************************************
input numbers and their sum  are 173729   178556   352285
binary strings are 101010011010100001   101011100101111100   1010110000000011101
Y pred is: [1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 1]
Y is: [[1 0 1 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0 1]]
sum predicted by RNN is  [1 0 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1]
bit-accuracy : 11.0
****************************************
input numbers and their sum  are 254828   210620   465448
binary strings are 111110001101101100   110011011010111100   1110001101000101000
Y pred is: [0 0 0 1 1 1 1 1 1 0 1 1 0 1 1 1 0 1 1]
Y is: [[0 0 0 1 0 1 0 0 0 1 0 1 1 0 0 0 1 1 1]]
sum predicted by RNN is  [1 1 0 1 

binary strings are 1001111011001000000   1111110011100001101   11001101110101001101
Y pred is: [1 0 1 1 0 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1]
Y is: [[1 0 1 1 0 0 1 0 1 0 1 1 1 0 1 1 0 0 1 1]]
sum predicted by RNN is  [1 0 1 1 1 1 0 1 1 1 0 1 0 1 0 0 1 1 0 1]
bit-accuracy : 17.0
****************************************
input numbers and their sum  are 456038   475212   931250
binary strings are 1101111010101100110   1110100000001001100   11100011010110110010
Y pred is: [0 1 0 1 1 0 0 1 1 0 1 0 0 1 0 1 1 0 1 1]
Y is: [[0 1 0 0 1 1 0 1 1 0 1 0 1 1 0 0 0 1 1 1]]
sum predicted by RNN is  [1 1 0 1 1 0 1 0 0 1 0 1 1 0 0 1 1 0 1 0]
bit-accuracy : 14.0
****************************************
input numbers and their sum  are 362152   437378   799530
binary strings are 1011000011010101000   1101010110010000010   11000011001100101010
Y pred is: [0 1 0 1 0 1 0 0 1 1 0 1 1 0 1 0 1 1 0 1]
Y is: [[0 1 0 1 0 1 0 0 1 1 0 0 1 1 0 0 0 0 1 1]]
sum predicted by RNN is  [1 0 1 1 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0]


input numbers and their sum  are 372288   332189   704477
binary strings are 1011010111001000000   1010001000110011101   10101011111111011101
Y pred is: [1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1]
Y is:[1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 1 1 0 0 1]
bit-accuracy : 14.0
****************************************
input numbers and their sum  are 492869   292787   785656
binary strings are 1111000010101000101   1000111011110110011   10111111110011111000
Y pred is: [0 1 1 0 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1]
Y is: [[0 0 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 1]]
sum predicted by RNN is  [1 0 1 1 1 1 1 1 1 0 1 0 1 1 1 1 0 1 1 0]
bit-accuracy : 15.0
****************************************
input numbers and their sum  are 413365   322534   735899
binary strings are 1100100111010110101   1001110101111100110   10110011101010011011
Y pred is: [1 1 0 1 1 0 1 1 1 1 1 1 1 1 0 1 1 0 1 1]
Y is: [[1 1 0 1 1 0 0 1 0 1 0 1 1 1 0 0 1 1 0 1]]
sum predicted by RNN is  [1 1 0 1 1 0 1 1 1 1 1 1 1 1 0 1 1 0 1 1]
bit-accuracy

In [9]:
plt.plot(string_len,bit_accuracy)
plt.xlabel('string length'); plt.ylabel('bit-accuracy'); plt.xticks(string_len,string_len)
plt.ylim([0,1.1]); 
plt.show()

__Question 1:__ Recall that the model was trained on bit-strings of length 3. When tested on longer distances, the accuracy drops (check the plot above).

1. How does the state-size affect the accuracy for longer strings? Retrain the model with different state-sizes, specifically 50 and 100, and check how the accuracy for the longer strings is affected. Can you explain this trend?<br><br>

2. How does training on longer bit-sequences affect the generalization ability:
    * Train on 5 and 10 bit-strings, while keeping the state-size at 10. What do you observe?
    * Now, again train on bit-strings of length 5, but increase the state size to 100. What trend do you observe?
   

__Question 2:__ [Challenge question] In the current setup, the model predicts a single `score`, and uses the squared-loss. In the lectures we saw using training RNNs to predict output probabilities for each category (=2 in our case, one for 0, and one for 1), and using softmax followed by cross-entropy as the loss function:

Still, use affine (or linear) projection from hidden-state to output, but now regress to two "scores" (what should be dimensionality of `W_{hy}`?):

$$s_t = W_{hy}h_t + b_{hy}$$

This output now should converted to "probabilities" using `softmax`, and then errors should be back-propagated using the cross-entropy loss function:

$$y_t^i = \frac{\exp(s_t^i)}{\sum_{j=1}^{j=2} \exp(s_t^j)}$$

Note the "2" in the above equation is for 2 scores -- corresponding to 0 and 1 respectively.

The cross-entropy loss is:

$$\mathcal{L}_t(y_t,\tilde{y_t}) = - (1-\tilde{y_t})\log(y_t^1) - \tilde{y_t}\log(y_t^2) $$

where, $y_t$ are the predicted log-probabilties (or score above) and $\tilde{y_t}$ is the ground-truth target (i.e., either 0 or 1).

__Note__ : Make sure you understand how this equation corresponds to what you saw in the lecture.<br><br>


You might notice that the steps --- `softmax` followed by `cross-entropy` loss, first exponentiate and then take the log of the score. This is numerically unstable due to under/overflow (in the exponentiation). As this is a widely used operation, `PyTorch` combines these two steps in their [`nn.CrossEntropyLoss`](http://pytorch.org/docs/master/nn.html#crossentropyloss) function. So if we use this function, we do not have to explicitly use `softmax`.


__Part A__
1. Modify the model to predict two output probabilities -- for 0 and 1 respectively at each time-step.
2. Change the loss function from `nn.MSELoss` to `nn.CrossEntropyLoss`. Refer to the documentation of `nn.CrossEntropy` [here](http://pytorch.org/docs/master/nn.html#crossentropyloss).
3. Modify the "testing" code to get output from the model and verify that you are getting sensible outputs. 

__Part B__
1. In Q1 above, with the MSE loss, what was the minimum length of training bit-strings required to achieve perfect generalization?
2. What is the minimum length of training bit-strings required to achieve perferct generalization on longer test bit-string, when using the new model trained using the CrossEntropy loss function? Explain why / why not it is different.

Phew! That required some serious digging into PyTorch. But we are only getting started, marching on...
