Sequences: data whose elements are interconnected, so that the order in which they appear carries meaning: audio, video, text...
Using a classical (feed-forward) neural network to take or output sequential data is limiting, since it forces fixed size constraints on the input and output. Therefore, we use recurrent neural networks (RNNs).
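
To make the size argument concrete, here is a minimal NumPy sketch of a tanh recurrence. It is not the Keras implementation; the function name `simple_rnn_forward`, the weight names, and the toy sizes are assumptions made for illustration. The point is that the same weights are reused at every step, so the sequence length is free to vary.

```python
import numpy as np

def simple_rnn_forward(inputs, W_x, W_h, b):
    """Run a tanh recurrence over a sequence and return the final hidden state.

    inputs has shape (time_steps, n_features); time_steps may differ per call.
    """
    h = np.zeros(W_h.shape[0])
    for x_t in inputs:
        # the same weights are applied at every time step
        h = np.tanh(x_t @ W_x + h @ W_h + b)
    return h

rng = np.random.default_rng(0)
n_features, n_hidden = 11, 4  # toy sizes, chosen only for this sketch
W_x = rng.normal(size=(n_features, n_hidden))
W_h = rng.normal(size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

short_seq = rng.normal(size=(3, n_features))
long_seq = rng.normal(size=(7, n_features))
print(simple_rnn_forward(short_seq, W_x, W_h, b).shape)  # (4,)
print(simple_rnn_forward(long_seq, W_x, W_h, b).shape)   # (4,)
```

Both sequences end up as a vector of the same size, even though one has 3 steps and the other 7.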

Our objective is to build a neural net that can perform addition. It first needs to learn to extract meaning from the characters of an expression such as "12+34", and then learn how to sum the operands.

The input is a tensor. (Matrices are 2D tensors.) Tensors can represent data with any number of dimensions.

We will convert our data into one-hot encoded vectors: one vector per character, with a single 1 at that character's index.
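
As a small sketch of what one-hot encoding means for a single character (the helper name `one_hot` is made up for this note; the alphabet is the one the notebook defines further down):

```python
import numpy as np

alphabet = "0123456789+"  # ten digits plus the plus sign

def one_hot(char):
    """Return a vector of zeros with a single 1 at the character's index."""
    vec = np.zeros(len(alphabet))
    vec[alphabet.index(char)] = 1.0
    return vec

print(one_hot("3"))
print(one_hot("+"))
```

Every vector has 11 entries, one per possible character.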

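Model parts
(1) encoder: takes the data and outputs a single vector representation of the data through a single SimpleRNN layer with a number of hidden units. It is fed into the...
(2) decoder: a second SimpleRNN layer that returns an output at every time step, followed by a dense softmax layer that gives the character probabilities for each time step.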


" To achieve this single vector representation of the entire input, we will use the RepeatVector layer and specify the number of times it should repeat. "

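RepeatVector layer: reshapes the data by repeating the encoder's single output vector once for every output time step, so that shape (batch, hidden_units) becomes (batch, time_steps, hidden_units).
dense layer: a fully connected layer of neurons; this is the layer that outputs the predicted sequence. It follows the recurrent layers (SimpleRNN in this notebook, rather than LSTM) and is applied at every time step through TimeDistributed.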
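
The reshaping done by the repeat step can be mimicked in NumPy. This is only a sketch of the shape behaviour, not the Keras layer itself; `repeat_vector` is a hypothetical helper and the sizes are toy values.

```python
import numpy as np

def repeat_vector(batch, n):
    """Mimic the shape change: (batch, features) -> (batch, n, features)."""
    return np.repeat(batch[:, np.newaxis, :], n, axis=1)

context = np.arange(6, dtype=float).reshape(2, 3)  # 2 examples, 3 features
repeated = repeat_vector(context, 5)
print(repeated.shape)  # (2, 5, 3)
print(repeated[0])     # the first example's vector, copied 5 times
```

Each of the 5 time steps receives an identical copy of the context vector, which is what lets the decoder produce one output per step.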


In [6]:
#input should be a string, and output should be a string prediction too.

import numpy as np

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import TimeDistributed, Dense, Dropout, SimpleRNN, RepeatVector
from tensorflow.keras.callbacks import EarlyStopping, LambdaCallback
from termcolor import colored

In [14]:
#to generate our data, we will need to list all the possible constituents of the sequence: the digits 0 to 9 and a +.

"""
some notes:
(1) enumerate takes a list or string and outputs the index and its associated entry
"""

possible_chars = "0123456789+"

print(list(enumerate(possible_chars)))

char_to_index = dict((c,i) for i,c in enumerate(possible_chars))
index_to_char = dict((i,c) for i,c in enumerate(possible_chars))

print("If chars are the keys then our dictionary is:",char_to_index)
print("If indexes are the keys then our dictionary is:",index_to_char)


"""
now our inputs cannot be just strings - we need to do some data preprocessing. RNNs expect tensors as input! We will
(1) do one-hot encoding for each possible character - therefore, we will have 11 features (10 digits and the plus sign).
(2) our input will be a sequence of vectors of length 11, each representing a character in the input.
"""

[(0, '0'), (1, '1'), (2, '2'), (3, '3'), (4, '4'), (5, '5'), (6, '6'), (7, '7'), (8, '8'), (9, '9'), (10, '+')]
If chars are the keys then our dictionary is: {'0': 0, '1': 1, '2': 2, '3': 3, '4': 4, '5': 5, '6': 6, '7': 7, '8': 8, '9': 9, '+': 10}
If indexes are the keys then our dictionary is: {0: '0', 1: '1', 2: '2', 3: '3', 4: '4', 5: '5', 6: '6', 7: '7', 8: '8', 9: '9', 10: '+'}


In [22]:
#let's create a function to generate our data.

def generate_data():
    op1 = np.random.randint(0,100)
    op2 = np.random.randint(0,100)
    
    x = str(op1) + "+" + str(op2)
    label = str(op1+op2)
    
    return x,label
    
generate_data()
    

('49+51', '100')
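
Since `np.random.randint(0,100)` draws operands from 0 to 99, the longest possible input and label can be checked exhaustively. The sketch below does that with plain Python; the variable names are made up for this check.

```python
# enumerate every operand pair that generate_data() can produce (0..99 each)
lengths_in = [len(f"{a}+{b}") for a in range(100) for b in range(100)]
lengths_out = [len(str(a + b)) for a in range(100) for b in range(100)]

longest_input = max(lengths_in)   # reached by inputs such as "99+99"
longest_label = max(lengths_out)  # reached by sums such as "198"
print(longest_input, longest_label)  # 5 3
```

So 5 time steps are enough for every input, and the labels (at most 3 characters) fit as well.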

In [24]:
#now we develop our model.

#typical (feed-forward) neural networks would need the input padded to a fixed size AND would only detect
#which tokens are present, NOT the order in which they're linked.

#example: that is not terrible for a task like testing text for +ve and -ve connotations, but for sums the order matters.

#keras is a high-level API that uses tensorflow as its backend.

#SimpleRNN layer: a fully connected layer whose outputs are fed back into the network - uses tanh for activation by default

hidden_units = 128 
max_time_steps = 5 #first operand (2) + plus sign (1) + second operand (2)

model = Sequential([
    #this layer accepts inputs with any number of time steps (None = flexible length), but each constituent must have a size of 11
    #to reflect the one-hot encoding of a single character.
    #Purpose: outputs the "context vector" - we have taken an input, but how does each part of the input "relate"?
    SimpleRNN(hidden_units, input_shape=(None, len(possible_chars))), 
    #the encoder outputs only one context vector, so we duplicate it for the next layer to
    #have a copy of the context vector at each time step. 
    RepeatVector(max_time_steps),
    #now we do the decoder part of the architecture.
    #the first thing to do is to take these context vectors and, based on them, generate an output sequence. The output will 
    #be the probabilities of the various characters at each time step.
    SimpleRNN(hidden_units, return_sequences = True),
    TimeDistributed(Dense(len(possible_chars),activation='softmax'))
])

model.compile(loss = 'categorical_crossentropy', #used with classification
              optimizer = 'adam',
              metrics=['accuracy'] #reported during training; the optimizer minimizes the loss, not this metric
             )

model.summary()


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
simple_rnn_2 (SimpleRNN)     (None, 128)               17920     
_________________________________________________________________
repeat_vector_1 (RepeatVecto (None, 5, 128)            0         
_________________________________________________________________
simple_rnn_3 (SimpleRNN)     (None, 5, 128)            32896     
_________________________________________________________________
time_distributed (TimeDistri (None, 5, 11)             1419      
Total params: 52,235
Trainable params: 52,235
Non-trainable params: 0
_________________________________________________________________
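
The parameter counts in the summary can be reproduced by hand. A simple RNN layer has an input kernel, a recurrent kernel, and a bias, and a dense layer has a kernel and a bias. The helper names below are made up for this check.

```python
def simple_rnn_params(input_dim, units):
    # input kernel (input_dim x units) + recurrent kernel (units x units) + bias (units)
    return units * (input_dim + units + 1)

def dense_params(input_dim, units):
    # kernel (input_dim x units) + bias (units)
    return units * (input_dim + 1)

n_chars, hidden = 11, 128
encoder = simple_rnn_params(n_chars, hidden)  # first SimpleRNN
decoder = simple_rnn_params(hidden, hidden)   # second SimpleRNN
output = dense_params(hidden, n_chars)        # Dense inside TimeDistributed
print(encoder, decoder, output, encoder + decoder + output)  # 17920 32896 1419 52235
```

RepeatVector has no weights, which matches the 0 in the summary, and TimeDistributed shares one Dense layer across all 5 time steps, so it is counted once.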


In [60]:
#we vectorize and devectorize our examples.

def vectorize_example(x,y):
    #initialize the vector versions of the input and output.
    example = np.zeros((max_time_steps,len(possible_chars)))
    label = np.zeros((max_time_steps,len(possible_chars)))
    
    #number of leading positions to pad so that the text ends up right-aligned.
    start_x = max_time_steps - len(x)
    start_y = max_time_steps - len(y)
    
    #pad the start with the character '0' (a one-hot row at index 0), not with all-zero rows
    for k in range(start_x):
        example[k,char_to_index['0']] = 1
    for k in range(start_y):
        label[k,char_to_index['0']] = 1
        
    #one-hot encoding of our characters
    for i,c in enumerate(x): 
        example[i+start_x,char_to_index[c]] = 1
    for i,c in enumerate(y): 
        label[i+start_y,char_to_index[c]] = 1
        
    return example,label

e, l = generate_data()
print(e,l)
ev, lv = vectorize_example(e,l)
print(ev, lv)

1+70 71
[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]] [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
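
For comparison, the same encoding can be written more compactly by left-padding the string and indexing into an identity matrix. This is an alternative sketch, not the notebook's function; `vectorize_string` and `char_index` are hypothetical names.

```python
import numpy as np

alphabet = "0123456789+"
char_index = {c: i for i, c in enumerate(alphabet)}

def vectorize_string(s, length=5):
    """One-hot encode s after left-padding it with the character '0'."""
    padded = s.rjust(length, "0")
    indices = [char_index[c] for c in padded]
    # row i of the identity matrix is the one-hot vector for index i
    return np.eye(len(alphabet))[indices]

vec = vectorize_string("1+70")
print(vec.shape)                    # (5, 11)
print(vec.argmax(axis=1).tolist())  # [0, 1, 10, 7, 0]
```

The index sequence 0, 1, 10, 7, 0 corresponds to the rows printed above for the input `1+70`.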


In [62]:
def devectorize_example(example, label):
    string_example = ""
    string_label = ""
    
    for i in range(example.shape[0]):
        for j in range(example.shape[1]):
            if(example[i][j] == 1):
                string_example = string_example + index_to_char.get(j)
    
    for i in range(label.shape[0]):
        for j in range(label.shape[1]):
            if(label[i][j] == 1):
                string_label = string_label + index_to_char.get(j)
    
    return string_example,string_label    
    

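#checking...

string_example, string_label = devectorize_example(ev,lv)
print(string_example)



#a shorter version that decodes one array at a time. It takes the argmax of each row, so it also works on
#softmax predictions, which are never exactly 1. This definition replaces the two-argument one above.
def devectorize_example(example):
    result = [index_to_char[np.argmax(vec)] for vec in example]
    return ''.join(result)

devectorize_example(ev)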

01+70
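
To see why decoding by argmax matters for model outputs, here is a sketch with made-up probability rows. None of the entries equals exactly 1, so a test like `== 1` would find nothing, while argmax still picks the most likely character. The `decode` helper and the numbers are invented for illustration.

```python
import numpy as np

alphabet = "0123456789+"

def decode(rows):
    """Pick the most probable character in each row and join them."""
    return "".join(alphabet[i] for i in np.argmax(rows, axis=-1))

# three made-up softmax-like rows: 0.8 on one class, 0.02 on each of the other ten
fake_pred = np.full((3, len(alphabet)), 0.02)
fake_pred[0, 1] = 0.8
fake_pred[1, 9] = 0.8
fake_pred[2, 8] = 0.8
print(decode(fake_pred))  # 198
```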


In [66]:
def create_dataset(num_examples = 2000):
    x = np.zeros((num_examples,max_time_steps, len(possible_chars)))
    y = np.zeros((num_examples,max_time_steps, len(possible_chars)))
    
    for i in range(num_examples):
        e,l = generate_data()
        ev, lv = vectorize_example(e,l)
        x[i] = ev
        y[i] = lv
    return x,y

x,y = create_dataset()
print(x.shape,y.shape)
disp_x = devectorize_example(x[0])
disp_y = devectorize_example(y[0])
print(disp_x)
print(disp_y)

(2000, 5, 11) (2000, 5, 11)
039+9
00048


In [71]:
#now we train the model.
#print the validation accuracy at the end of each epoch.
l_cb = LambdaCallback(on_epoch_end = lambda epoch,logs : print('{:.2f}'.format(logs['val_accuracy']),end=" _ "))
#stop early once val_loss has not improved for 10 epochs in a row.
es_cb = EarlyStopping(monitor='val_loss', patience = 10)
model.fit(x,y,epochs = 500, batch_size = 256, validation_split = 0.2, verbose = False, callbacks=[es_cb,l_cb])

0.61 _ 0.62 _ 0.63 _ 0.64 _ 0.63 _ 0.65 _ 0.65 _ 0.66 _ 0.66 _ 0.67 _ 0.68 _ 0.68 _ 0.68 _ 0.69 _ 0.69 _ 0.71 _ 0.70 _ 0.71 _ 0.72 _ 0.72 _ 0.72 _ 0.72 _ 0.73 _ 0.73 _ 0.73 _ 0.74 _ 0.75 _ 0.74 _ 0.75 _ 0.75 _ 0.75 _ 0.75 _ 0.74 _ 0.76 _ 0.77 _ 0.76 _ 0.77 _ 0.77 _ 0.77 _ 0.78 _ 0.78 _ 0.79 _ 0.79 _ 0.80 _ 0.79 _ 0.80 _ 0.81 _ 0.81 _ 0.81 _ 0.81 _ 0.83 _ 0.83 _ 0.83 _ 0.83 _ 0.84 _ 0.85 _ 0.85 _ 0.86 _ 0.86 _ 0.86 _ 0.86 _ 0.87 _ 0.87 _ 0.87 _ 0.87 _ 0.86 _ 0.87 _ 0.88 _ 0.88 _ 0.88 _ 0.89 _ 0.89 _ 0.89 _ 0.90 _ 0.89 _ 0.90 _ 0.90 _ 0.90 _ 0.90 _ 0.90 _ 0.90 _ 0.90 _ 0.91 _ 0.91 _ 0.91 _ 0.92 _ 0.92 _ 0.92 _ 0.92 _ 0.92 _ 0.91 _ 0.93 _ 0.92 _ 0.92 _ 0.93 _ 0.93 _ 0.93 _ 0.93 _ 0.93 _ 0.92 _ 0.93 _ 0.93 _ 0.93 _ 0.94 _ 0.93 _ 0.94 _ 0.94 _ 0.94 _ 0.94 _ 0.94 _ 0.93 _ 0.94 _ 0.94 _ 0.94 _ 0.94 _ 0.94 _ 0.94 _ 0.95 _ 0.94 _ 0.94 _ 0.94 _ 0.95 _ 0.95 _ 0.95 _ 0.94 _ 0.94 _ 0.94 _ 0.95 _ 0.95 _ 0.95 _ 0.95 _ 0.95 _ 0.95 _ 0.95 _ 0.95 _ 0.95 _ 0.95 _ 0.95 _ 0.95 _ 0.95 _ 0.95 _ 0.95 _ 0.95 _

<tensorflow.python.keras.callbacks.History at 0x274a9e48788>
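
A note on reading these numbers: as far as I understand, with one softmax per time step the reported accuracy is averaged over individual characters, so it can be higher than the fraction of sums that are entirely correct. The sketch below shows the difference on made-up data; both helper names are hypothetical.

```python
import numpy as np

def per_char_accuracy(y_true, y_pred):
    """Fraction of individual time steps predicted correctly."""
    return float(np.mean(y_true.argmax(-1) == y_pred.argmax(-1)))

def exact_match_accuracy(y_true, y_pred):
    """Fraction of examples where every time step is correct."""
    return float(np.mean(np.all(y_true.argmax(-1) == y_pred.argmax(-1), axis=1)))

# toy data: 2 examples, 5 time steps; the second prediction gets one character wrong
true_idx = np.array([[0, 0, 0, 4, 8], [0, 0, 1, 9, 8]])
pred_idx = np.array([[0, 0, 0, 4, 8], [0, 0, 1, 9, 7]])
y_true = np.eye(11)[true_idx]
y_pred = np.eye(11)[pred_idx]

print(per_char_accuracy(y_true, y_pred))     # 0.9
print(exact_match_accuracy(y_true, y_pred))  # 0.5
```

The leading padding characters are easy to predict, which is one reason the per-character figure can look better than the number of fully correct answers.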

In [1]:
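#let's try on a test set...
#note: this cell needs the cells above to have been run in the same session first
#(it uses create_dataset, model, and the one-argument devectorize_example).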
x_test, y_test = create_dataset(15)
predictions = model.predict(x_test)

for i, pred in enumerate(predictions):
    pred_str = devectorize_example(pred)
    y_test_str = devectorize_example(y_test[i])
    x_test_str = devectorize_example(x_test[i])
    col = 'green' if pred_str == y_test_str else 'red'
    outstring = 'Input: {}, Out: {}, Pred: {}'.format(x_test_str, y_test_str, pred_str)
    print(colored(outstring, col))


NameError: name 'create_dataset' is not defined