<a href="https://colab.research.google.com/github/Varmai/Neural_Models/blob/main/NLP_Implementation_from_scratch_with_RNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project Implementation

1. Finish Shakespeare (but add more training examples where you're not just predicting the end of a sentence, but also the next words) and produce a nice-sounding sonnet. We'll read each other's sonnets in class.
2. Do a corpus in your own language.
3. Finish training addition with the TensorFlow RNN (one-hot encoded).
4. Train a *dense* network to add two numbers.
5. What's the difference between **dense** networks and **recurrent** networks, in your own words? Please be able to answer this question!


## 1. Finishing Shakespeare

### Importing required libraries

In [None]:
import csv
import itertools
import operator
import numpy as np
import nltk
import sys
from datetime import datetime

import matplotlib.pyplot as plt
%matplotlib inline

### Defining the RNN network

In [None]:
import numpy as np

class RNN:
    def __init__(self):
        self.W_xh = np.random.rand(2,3)
        self.W_hh = np.random.rand(3,3)
        self.W_hy = np.random.rand(3,3)
        self.h = np.zeros(3)

    def step(self, x):
        # update the hidden state
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh.T, x))

        # compute the output vector
        y = np.dot(self.W_hy, self.h)

        return y

rnn = RNN()
x = np.array([1,0])
y = rnn.step(x) # x is an input vector, y is the RNN's output vector
y

array([0.89841952, 0.89519482, 1.51522638])
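
The toy cell above keeps its hidden state in `self.h`, so repeated calls to `step` make each output depend on all earlier inputs. The following is a minimal sketch of that behaviour, assuming a seeded copy of the same class; the name `TinyRNN` and the seed are illustrative and not part of the original notebook.

```python
import numpy as np

class TinyRNN:
    """Hypothetical seeded copy of the toy RNN above, for illustration only."""
    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.W_xh = rng.random((2, 3))
        self.W_hh = rng.random((3, 3))
        self.W_hy = rng.random((3, 3))
        self.h = np.zeros(3)

    def step(self, x):
        # Same update as the original step: the new state mixes old state and input
        self.h = np.tanh(self.W_hh @ self.h + self.W_xh.T @ x)
        return self.W_hy @ self.h

a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])

rnn_ab = TinyRNN()
rnn_ab.step(a)
y_after_ab = rnn_ab.step(b)   # input b seen after a

rnn_bb = TinyRNN()
rnn_bb.step(b)
y_after_bb = rnn_bb.step(b)   # same final input b, different history

# The final input is identical, but the outputs differ because the
# hidden state still carries information about the earlier input.
print(np.allclose(y_after_ab, y_after_bb))
```

A dense layer given the same final input would produce the same output both times; the difference here comes only from the carried-over state.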

In [None]:
# Download NLTK model data (you need to do this once)
nltk.download("book")

[nltk_data] Downloading collection 'book'
[nltk_data]    | 
[nltk_data]    | Downloading package abc to
[nltk_data]    |     C:\Users\91833\AppData\Roaming\nltk_data...
[nltk_data]    |   Package abc is already up-to-date!
[nltk_data]    | Downloading package brown to
[nltk_data]    |     C:\Users\91833\AppData\Roaming\nltk_data...
[nltk_data]    |   Package brown is already up-to-date!
[nltk_data]    | Downloading package chat80 to
[nltk_data]    |     C:\Users\91833\AppData\Roaming\nltk_data...
[nltk_data]    |   Package chat80 is already up-to-date!
[nltk_data]    | Downloading package cmudict to
[nltk_data]    |     C:\Users\91833\AppData\Roaming\nltk_data...
[nltk_data]    |   Package cmudict is already up-to-date!
[nltk_data]    | Downloading package conll2000 to
[nltk_data]    |     C:\Users\91833\AppData\Roaming\nltk_data...
[nltk_data]    |   Package conll2000 is already up-to-date!
[nltk_data]    | Downloading package conll2002 to
[nltk_data]    |     C:\Users\91833\AppData\R

True

### Reading the data

In [None]:
import re
from nltk import tokenize

vocabulary_size = 3000

unknown_token = "UNKNOWN_TOKEN"
sentence_start_token = "SENTENCE_START"
sentence_end_token = "SENTENCE_END"

def clean_roman_numerals(text):
    pattern = r"\b(?=[MDCLXVIΙ])M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})([IΙ]X|[IΙ]V|V?[IΙ]{0,3})\b\.?"
    return re.sub(pattern, '&', text)

# Read the data and append SENTENCE_START and SENTENCE_END tokens
print( "Reading txt file...")
with open(r'data/shakespeare-sonnets.txt', 'r', encoding="cp1252") as f:
    text = f.read()

    text = text.replace(",",".")
    text = text.replace(":",".")
    text = text.replace(";",".")
    text = text.replace("?",".")
    text = text.replace("!",".")

    text = clean_roman_numerals(text)

    sentences = tokenize.sent_tokenize(text)

    # Append SENTENCE_START and SENTENCE_END
    sentences = ["%s %s %s" % (sentence_start_token, x, sentence_end_token) for x in sentences]

print(  "Parsed %d sentences." % (len(sentences)))

# Tokenize the sentences into words
tokenized_sentences = [nltk.word_tokenize(sent) for sent in sentences]

# Count the word frequencies
word_freq = nltk.FreqDist(itertools.chain(*tokenized_sentences))
print(  "Found %d unique words tokens." % len(word_freq.items()))

# Get the most common words and build index_to_word and word_to_index vectors
vocab = word_freq.most_common(vocabulary_size-1)
index_to_word = [x[0] for x in vocab]
index_to_word.append(unknown_token)
word_to_index = dict([(w,i) for i,w in enumerate(index_to_word)])

print("Using vocabulary size %d." % vocabulary_size)
print("The least frequent word in our vocabulary is '%s' and appeared %d times." % (vocab[-1][0], vocab[-1][1]))

# Replace all words not in our vocabulary with the unknown token
for i, sent in enumerate(tokenized_sentences):
    tokenized_sentences[i] = [w if w in word_to_index else unknown_token for w in sent]

print(  "\nExample sentence: '%s'" % sentences[0])
print(  "\nExample sentence after Pre-processing: '%s'" % tokenized_sentences[0])

Reading txt file...
Parsed 2766 sentences.
Found 3400 unique words tokens.
Using vocabulary size 3000.
The least frequent word in our vocabulary is 'miss' and appeared 1 times.

Example sentence: 'SENTENCE_START &

From fairest creatures we desire increase. SENTENCE_END'

Example sentence after Pre-processing: '['SENTENCE_START', '&', 'From', 'fairest', 'creatures', 'we', 'desire', 'increase', '.', 'SENTENCE_END']'


## Training and testing data

In [None]:
X_train = []
y_train = []
for idx,sent in enumerate(tokenized_sentences[:-1]):
    X_train.append([word_to_index[w] for w in sent])
    y_train.append([word_to_index[w] for w in tokenized_sentences[idx+1]][:len([word_to_index[w] for w in sent])])

In [None]:
# Sentences have different lengths, so store them as an object array
X_train = np.asarray(X_train, dtype=object)
y_train = np.asarray(y_train, dtype=object)


In [None]:
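# Inputs are every token except the last; targets are every token except the first.
# dtype=object is required because the sentences have different lengths.
X_train = np.asarray([[word_to_index[w] for w in sent[:-1]] for sent in tokenized_sentences], dtype=object)
y_train = np.asarray([[word_to_index[w] for w in sent[1:]] for sent in tokenized_sentences], dtype=object)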


## Increasing the number of training examples from 100 to 2000

In [None]:
x_example, y_example = X_train[2000], y_train[2000]
print ("x:\n%s\n%s" % (" ".join([index_to_word[x] for x in x_example]), x_example))
print ("\ny:\n%s\n%s" % (" ".join([index_to_word[x] for x in y_example]), y_example))

x:
SENTENCE_START Seems seeing .
[0, 2921, 554, 2]

y:
Seems seeing . SENTENCE_END
[2921, 554, 2, 1]
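
The printed example shows that `y` is `x` shifted one position to the left, so the target at each step is the next word. The following is a minimal sketch of that construction using the tokens from the example above; the variable names `x_tokens` and `y_tokens` are illustrative.

```python
tokens = ["SENTENCE_START", "Seems", "seeing", ".", "SENTENCE_END"]

# Inputs drop the last token; targets drop the first one.
x_tokens = tokens[:-1]
y_tokens = tokens[1:]

# At every position t, the target is the token that follows the input.
for x_tok, y_tok in zip(x_tokens, y_tokens):
    print(f"{x_tok:>15} -> {y_tok}")
```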


### The RNN network

In [None]:
class RNN:
    def __init__(self, word_dim, hidden_dim=100, bptt_truncate=4):
        # Assign instance variables
        self.word_dim = word_dim
        self.hidden_dim = hidden_dim
        self.bptt_truncate = bptt_truncate

        # Randomly initialize the network parameters
        self.U = np.random.uniform(-np.sqrt(1./word_dim), np.sqrt(1./word_dim), (hidden_dim, word_dim))
        self.V = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (word_dim, hidden_dim))
        self.W = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def forward_propagation(self, x):
    # The total number of time steps
    T = len(x)

    # During forward propagation we save all hidden states in s because we need them later.
    # We add one additional element for the initial hidden state, which we set to 0
    s = np.zeros((T + 1, self.hidden_dim))
    s[-1] = np.zeros(self.hidden_dim)

    # The outputs at each time step. Again, we save them for later.
    o = np.zeros((T, self.word_dim))

    # For each time step...
    for t in np.arange(T):
        # Note that we are indexing U by x[t]. This is the same as multiplying U with a one-hot vector.
        s[t] = np.tanh(self.U[:,x[t]] + self.W.dot(s[t-1]))
        o[t] = softmax(self.V.dot(s[t]))

    return [o, s]

def predict(self, x):
    # Perform forward propagation and return index of the highest score
    o, s = self.forward_propagation(x)
    return np.argmax(o, axis=1)
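
The comment in `forward_propagation` notes that indexing `U` by `x[t]` is equivalent to multiplying `U` by a one-hot vector. The following is a small check of that claim, assuming toy dimensions and a random matrix chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, word_dim = 4, 6          # small illustrative sizes
U = rng.uniform(-1, 1, (hidden_dim, word_dim))

word_index = 3
one_hot = np.zeros(word_dim)
one_hot[word_index] = 1.0

by_matmul = U.dot(one_hot)           # full matrix-vector product
by_indexing = U[:, word_index]       # just pick the matching column

print(np.allclose(by_matmul, by_indexing))  # True
```

Column indexing avoids building a 3000-element one-hot vector and a full matrix product at every time step.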

In [None]:
RNN.forward_propagation = forward_propagation

In [None]:
RNN.predict = predict

In [None]:
print ("x:\n%s\n%s" % (" ".join([index_to_word[x] for x in X_train[10]]), X_train[10]))

x:
SENTENCE_START Thou that art now the worlds fresh ornament .
[0, 101, 12, 57, 79, 4, 449, 351, 450, 2]


In [None]:
np.random.seed(17)
model = RNN(vocabulary_size)
o, s = model.forward_propagation(X_train[10])
print (o.shape, o)

(10, 3000) [[0.0003343  0.00033593 0.00033158 ... 0.00033856 0.00033291 0.00033345]
 [0.00033639 0.0003326  0.00032985 ... 0.0003355  0.0003329  0.00033173]
 [0.00033507 0.00033106 0.00033585 ... 0.00033147 0.00033188 0.00033758]
 ...
 [0.00033323 0.00033677 0.00033394 ... 0.00033089 0.00033101 0.00033737]
 [0.0003355  0.0003347  0.00033586 ... 0.00033185 0.00033528 0.00033442]
 [0.0003325  0.00032999 0.00033245 ... 0.0003323  0.00033009 0.00033284]]


In [None]:
predictions = model.predict(X_train[10])
print (predictions.shape, predictions)
print ("x:\n%s" % (" ".join([index_to_word[x] for x in predictions])))

(10,) [ 474  720 1253 2170 1863 1007  769 1203 2398 2383]
x:


### The loss function

In [None]:
def calculate_total_loss(self, x, y):
    L = 0
    # For each sentence...
    for i in np.arange(len(y)):
        o, s = self.forward_propagation(x[i])
        # We only care about our prediction of the "correct" words
        correct_word_predictions = o[np.arange(len(y[i])), y[i]]
        # Add to the loss based on how off we were
        L += -1 * np.sum(np.log(correct_word_predictions))
    return L

def calculate_loss(self, x, y):
    # Divide the total loss by the total number of target words.
    # Use the built-in sum: calling np.sum on a generator is deprecated.
    N = sum(len(y_i) for y_i in y)
    return self.calculate_total_loss(x, y) / N

In [None]:
RNN.calculate_total_loss = calculate_total_loss
RNN.calculate_loss = calculate_loss

In [None]:
# Limit to 1000 examples to save time
print ("Expected Loss for random predictions: %f" % np.log(vocabulary_size))
print ("Actual loss: %f" % model.calculate_loss(X_train[:1000], y_train[:1000]))

Expected Loss for random predictions: 8.006368


Actual loss: 8.006702
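
The expected value 8.006368 is what a model with no knowledge of the data should score: if every one of the 3000 words gets probability 1/3000, the average negative log-likelihood is `log(3000)`. The following is a minimal sketch of that calculation; the sentence length and target indices are arbitrary illustrative values.

```python
import numpy as np

vocabulary_size = 3000
num_words = 5                                  # arbitrary illustrative sentence length

# A model with no knowledge assigns probability 1/V to every word.
uniform_probs = np.full((num_words, vocabulary_size), 1.0 / vocabulary_size)
targets = np.array([0, 17, 42, 2999, 7])       # arbitrary target indices

# Average negative log-likelihood of the correct words, as in calculate_loss
correct_word_probs = uniform_probs[np.arange(num_words), targets]
loss = -np.sum(np.log(correct_word_probs)) / num_words

print("%f" % loss)  # 8.006368
```

An untrained model with small random weights produces nearly uniform outputs, which is why the actual loss starts so close to this value.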


## Training with Backpropagation Through Time

We iterate over all our training examples, and during each iteration we nudge the parameters in a direction that reduces the error.

These directions are given by the gradients of the loss: $\frac{\partial L}{\partial U}, \frac{\partial L}{\partial V}, \frac{\partial L}{\partial W}$.

We also need a *learning rate*, which defines how big a step we take in each iteration.

Because the weight parameters are shared by all time steps in the network, the gradient at each output depends not only on the calculations of the current time step, but also on those of the previous time steps.

The `bptt` function takes a training example $(x,y)$ as input and returns the gradients $\frac{\partial L}{\partial U}, \frac{\partial L}{\partial V}, \frac{\partial L}{\partial W}$.
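
Written out, one gradient step with learning rate $\eta$ (a symbol introduced here only for illustration) updates each parameter matrix by moving against its gradient:

$$
U \leftarrow U - \eta \frac{\partial L}{\partial U}, \qquad
V \leftarrow V - \eta \frac{\partial L}{\partial V}, \qquad
W \leftarrow W - \eta \frac{\partial L}{\partial W}
$$

This is the standard stochastic gradient descent rule, applied once per training example $(x,y)$.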

In [None]:
def bptt(self, x, y):
    T = len(y)

    # Perform forward propagation
    o, s = self.forward_propagation(x)

    # We accumulate the gradients in these variables
    dLdU = np.zeros(self.U.shape)
    dLdV = np.zeros(self.V.shape)
    dLdW = np.zeros(self.W.shape)
    delta_o = o
    delta_o[np.arange(len(y)), y] -= 1.

    # For each output backwards...
    for t in np.arange(T)[::-1]:
        dLdV += np.outer(delta_o[t], s[t].T)

        # Initial delta calculation
        delta_t = self.V.T.dot(delta_o[t]) * (1 - (s[t] ** 2))

        # Backpropagation through time (for at most self.bptt_truncate steps)
        for bptt_step in np.arange(max(0, t-self.bptt_truncate), t+1)[::-1]:

            # print "Backpropagation step t=%d bptt step=%d " % (t, bptt_step)
            dLdW += np.outer(delta_t, s[bptt_step-1])
            dLdU[:,x[bptt_step]] += delta_t

            # Update delta for next step
            delta_t = self.W.T.dot(delta_t) * (1 - s[bptt_step-1] ** 2)

    return [dLdU, dLdV, dLdW]

RNN.bptt = bptt

In [None]:
def gradient_check(self, x, y, h=0.001, error_threshold=0.01):

    # Calculate the gradients using backpropagation. We want to check whether these are correct.
    bptt_gradients = self.bptt(x, y)

    # List of all parameters we want to check.
    model_parameters = ['U', 'V', 'W']

    # Gradient check for each parameter
    for pidx, pname in enumerate(model_parameters):

        # Get the actual parameter value from the model, e.g. model.W
        parameter = operator.attrgetter(pname)(self)
        print("Performing gradient check for parameter %s with size %d." % (pname, np.prod(parameter.shape)))

        # Iterate over each element of the parameter matrix, e.g. (0,0), (0,1), ...
        it = np.nditer(parameter, flags=['multi_index'], op_flags=['readwrite'])
        while not it.finished:
            ix = it.multi_index

            # Save the original value so we can reset it later
            original_value = parameter[ix]

            # Estimate the gradient using (f(x+h) - f(x-h))/(2*h)
            parameter[ix] = original_value + h
            gradplus = self.calculate_total_loss([x],[y])
            parameter[ix] = original_value - h
            gradminus = self.calculate_total_loss([x],[y])
            estimated_gradient = (gradplus - gradminus)/(2*h)

            # Reset parameter to original value
            parameter[ix] = original_value

            # The gradient for this parameter calculated using backpropagation
            backprop_gradient = bptt_gradients[pidx][ix]

            # Calculate the relative error: (|x - y|/(|x| + |y|))
            relative_error = np.abs(backprop_gradient - estimated_gradient) / (
                                np.abs(backprop_gradient) + np.abs(estimated_gradient))

            # If the error is too large, fail the gradient check
            if relative_error > error_threshold:
                print( "Gradient Check ERROR: parameter=%s ix=%s" % (pname, ix))
                print( "+h Loss: %f" % gradplus)
                print( "-h Loss: %f" % gradminus)
                print( "Estimated_gradient: %f" % estimated_gradient)
                print( "Backpropagation gradient: %f" % backprop_gradient)
                print( "Relative Error: %f" % relative_error)
                return
            it.iternext()

        print( "Gradient check for parameter %s passed." % (pname))

RNN.gradient_check = gradient_check

In [None]:
grad_check_vocab_size = 100
np.random.seed(10)
model = RNN(grad_check_vocab_size, 10, bptt_truncate=1000)
model.gradient_check([0,1,2,3], [1,2,3,4])

Performing gradient check for parameter U with size 1000.
Gradient check for parameter U passed.
Performing gradient check for parameter V with size 1000.
Gradient check for parameter V passed.
Performing gradient check for parameter W with size 100.
Gradient check for parameter W passed.
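
The gradient check compares the backpropagated gradient against a central-difference estimate, one parameter element at a time. The following is a minimal sketch of the same idea on a much simpler function; `loss`, `analytic_grad`, and `numerical_grad` are illustrative names, and the test function is an assumption, not the RNN loss.

```python
import numpy as np

def loss(w):
    # Simple differentiable test function (illustrative, not the RNN loss)
    return np.sum(np.tanh(w) ** 2)

def analytic_grad(w):
    # d/dw tanh(w)^2 = 2 * tanh(w) * (1 - tanh(w)^2)
    t = np.tanh(w)
    return 2 * t * (1 - t ** 2)

def numerical_grad(f, w, h=1e-3):
    # Central difference (f(w+h) - f(w-h)) / (2h), one element at a time
    grad = np.zeros_like(w)
    for i in range(w.size):
        original = w[i]
        w[i] = original + h
        plus = f(w)
        w[i] = original - h
        minus = f(w)
        w[i] = original            # restore the parameter
        grad[i] = (plus - minus) / (2 * h)
    return grad

w = np.array([0.5, -1.2, 2.0])
g_true = analytic_grad(w)
g_est = numerical_grad(loss, w)

# Relative error |x - y| / (|x| + |y|), the same measure used above
relative_error = np.abs(g_true - g_est) / (np.abs(g_true) + np.abs(g_est))
print(relative_error.max() < 0.01)
```

Note that the relative error is undefined when both gradients are exactly zero, which can happen for parameter entries that never influence the loss.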



In [None]:
# Performs one step of SGD.
def numpy_sgd_step(self, x, y, learning_rate):
    # Calculate the gradients
    dLdU, dLdV, dLdW = self.bptt(x, y)

    # Change parameters according to gradients and learning rate
    self.U -= learning_rate * dLdU
    self.V -= learning_rate * dLdV
    self.W -= learning_rate * dLdW

RNN.sgd_step = numpy_sgd_step

In [None]:
def train_with_sgd(model, X_train, y_train, learning_rate=0.005, nepoch=100, evaluate_loss_after=5):
    # We keep track of the losses so we can plot them later
    losses = []
    num_examples_seen = 0

    for epoch in range(nepoch):

        # Optionally evaluate the loss
        if (epoch % evaluate_loss_after == 0):
            loss = model.calculate_loss(X_train, y_train)
            losses.append((num_examples_seen, loss))
            time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            print ("%s: Loss after num_examples_seen=%d epoch=%d: %f" % (time, num_examples_seen, epoch, loss))

            # Adjust the learning rate if loss increases
            if (len(losses) > 1 and losses[-1][1] > losses[-2][1]):
                learning_rate = learning_rate * 0.5
                print ("Setting learning rate to %f" % learning_rate)
            sys.stdout.flush()

        # For each training example...
        for i in range(len(y_train)):

            # One SGD step
            model.sgd_step(X_train[i], y_train[i], learning_rate)
            num_examples_seen += 1

    # Return the recorded losses so the caller can inspect or plot them
    return losses

### Description
- model: The RNN model instance
- X_train: The training data set
- y_train: The training data labels
- learning_rate: Initial learning rate for SGD
- nepoch: Number of times to iterate through the complete dataset
- evaluate_loss_after: Evaluate the loss after this many epochs
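
Besides these parameters, `train_with_sgd` halves the learning rate whenever the most recently evaluated loss is higher than the previous one. The following is a minimal sketch of that rule in isolation; the function name `adjust_learning_rate`, the `factor` argument, and the loss history are illustrative assumptions.

```python
def adjust_learning_rate(losses, learning_rate, factor=0.5):
    """Halve the learning rate when the latest recorded loss went up.

    `losses` is a list of (num_examples_seen, loss) tuples, as built in
    train_with_sgd. The function name and `factor` argument are illustrative.
    """
    if len(losses) > 1 and losses[-1][1] > losses[-2][1]:
        return learning_rate * factor
    return learning_rate

# Made-up loss history, for illustration only
history = [(0, 8.0), (100, 6.4)]
lr = adjust_learning_rate(history, 0.005)           # loss fell: rate unchanged

history.append((200, 6.9))
lr_after_rise = adjust_learning_rate(history, lr)   # loss rose: rate halved

print(lr, lr_after_rise)
```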

## Training with SGD

In [None]:
np.random.seed(17)

# Train on a small subset of the data to see what happens
model = RNN(vocabulary_size)
losses = train_with_sgd(model, X_train[:100], y_train[:100], nepoch=10, evaluate_loss_after=1)


2023-04-03 19:35:19: Loss after num_examples_seen=0 epoch=0: 8.006920
2023-04-03 19:36:07: Loss after num_examples_seen=100 epoch=1: 7.980236


In [None]:
def generate_sentence(model, senten_max_length):
    # We start the sentence with the start token
    new_sentence = [word_to_index[sentence_start_token]]

    # Repeat until we get an end token, keeping sentences shorter than senten_max_length words for now
    while (not new_sentence[-1] == word_to_index[sentence_end_token]) and len(new_sentence) < senten_max_length:
        # forward_propagation returns [o, s]; the last row of o is the
        # probability distribution over the next word
        o, _ = model.forward_propagation(new_sentence)
        next_word_probs = o[-1] / np.sum(o[-1])
        sampled_word = word_to_index[unknown_token]

        # We don't want to sample unknown words
        while sampled_word == word_to_index[unknown_token]:
            samples = np.random.multinomial(1, next_word_probs)
            sampled_word = np.argmax(samples)

        new_sentence.append(sampled_word)

    # Drop the start token and the final token before converting to words
    sentence_str = [index_to_word[x] for x in new_sentence[1:-1]]
    return sentence_str

## Generating our own Shakespeare sonnets

In [None]:
num_sentences = 10
senten_min_length = 7
senten_max_length = 20

for i in range(num_sentences):
    sent = []
    # We want long sentences, not sentences with one or two words
    while len(sent) < senten_min_length:
        sent = generate_sentence(model, senten_max_length)
    print (" ".join(sent))

will by your sweet & s thee shall am So one So true sweet but sweet to mine
shall thy thee eye by sweet than true by So by s true For true Or than but
one your not shall eyes will own mine eye & which thee own by The For own For
& own then her thee by sweet one The shall mine sweet thine eyes make not mine in
but all one & thee d one your mine s what am & by thee your have all
true sweet in Or her shall beauty own will beauty eye & then but mine beauty her So
her beauty by Or ‘ So thee beauty than true The own true mine then & Or her
one in not to which my beauty by true which The you than So ‘ & make than
by your to your own So thee So but make make your will to by & d So
by mine eyes thine then shall than all thy thee beauty & one their Or to her to


The output is not yet coherent: the model was trained on only 100 sentences for 10 epochs, so it needs more training data and more training before its sentences make sense.

## 2. Building a corpus in Hindi

In [None]:
import re
from nltk import tokenize

vocabulary_size = 3000

unknown_token = "UNKNOWN_TOKEN"
sentence_start_token = "SENTENCE_START"
sentence_end_token = "SENTENCE_END"

def clean_roman_numerals(text):
    pattern = r"\b(?=[MDCLXVIΙ])M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})([IΙ]X|[IΙ]V|V?[IΙ]{0,3})\b\.?"
    return re.sub(pattern, '&', text)

# Read the data and append SENTENCE_START and SENTENCE_END tokens
print( "Reading txt file...")
with open(r'C:\Users\91833\Downloads\Final Assignment_ Group 2\data\Hindi_Sonnets.txt','r', encoding="utf-8") as f:
    text = f.read()

    text = text.replace(",",".")
    text = text.replace(":",".")
    text = text.replace(";",".")
    text = text.replace("?",".")
    text = text.replace("!",".")

    text = clean_roman_numerals(text)

    sentences = tokenize.sent_tokenize(text)

    # Append SENTENCE_START and SENTENCE_END
    sentences = ["%s %s %s" % (sentence_start_token, x, sentence_end_token) for x in sentences]

print(  "Parsed %d sentences." % (len(sentences)))

# Tokenize the sentences into words
tokenized_sentences = [nltk.word_tokenize(sent) for sent in sentences]

# Count the word frequencies
word_freq = nltk.FreqDist(itertools.chain(*tokenized_sentences))
print(  "Found %d unique words tokens." % len(word_freq.items()))

# Get the most common words and build index_to_word and word_to_index vectors
vocab = word_freq.most_common(vocabulary_size-1)
index_to_word = [x[0] for x in vocab]
index_to_word.append(unknown_token)
word_to_index = dict([(w,i) for i,w in enumerate(index_to_word)])

print("Using vocabulary size %d." % vocabulary_size)
print("The least frequent word in our vocabulary is '%s' and appeared %d times." % (vocab[-1][0], vocab[-1][1]))

# Replace all words not in our vocabulary with the unknown token
for i, sent in enumerate(tokenized_sentences):
    tokenized_sentences[i] = [w if w in word_to_index else unknown_token for w in sent]

print(  "\nExample sentence: '%s'" % sentences[0])
print(  "\nExample sentence after Pre-processing: '%s'" % tokenized_sentences[0])

Reading txt file...
Parsed 6208 sentences.
Found 34479 unique words tokens.
Using vocabulary size 3000.
The least frequent word in our vocabulary is '66' and appeared 6 times.

Example sentence: 'SENTENCE_START 1	000 गज़ल (1) सोच ले तू किधर जा रहा है. SENTENCE_END'

Example sentence after Pre-processing: '['SENTENCE_START', '1', 'UNKNOWN_TOKEN', 'UNKNOWN_TOKEN', '(', '1', ')', 'सोच', 'ले', 'तू', 'UNKNOWN_TOKEN', 'जा', 'रहा', 'है', '.', 'SENTENCE_END']'


## Generating the training dataset

In [None]:
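# dtype=object is required because the sentences have different lengths
X_train = np.asarray([[word_to_index[w] for w in sent[:-1]] for sent in tokenized_sentences], dtype=object)
y_train = np.asarray([[word_to_index[w] for w in sent[1:]] for sent in tokenized_sentences], dtype=object)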


In [None]:
x_example, y_example = X_train[2000], y_train[2000]
print ("x:\n%s\n%s" % (" ".join([index_to_word[x] for x in x_example]), x_example))
print ("\ny:\n%s\n%s" % (" ".join([index_to_word[x] for x in y_example]), y_example))


x:
SENTENCE_START UNKNOWN_TOKEN UNKNOWN_TOKEN से UNKNOWN_TOKEN किसान के समक्ष आत्महत्या के अलावा कोई UNKNOWN_TOKEN नहीं है। UNKNOWN_TOKEN UNKNOWN_TOKEN UNKNOWN_TOKEN ( 2000 UNKNOWN_TOKEN ओलंपिक ) भारत के लिए ओलंपिक में व्यक्तिगत UNKNOWN_TOKEN जीतने वाली पहली भारतीय महिला हैं .
[1, 2999, 2999, 7, 2999, 750, 0, 1072, 1361, 0, 245, 51, 2999, 17, 11, 2999, 2999, 2999, 34, 2021, 2999, 2438, 33, 70, 0, 19, 2438, 4, 1602, 2999, 2700, 148, 322, 124, 201, 18, 3]

y:
UNKNOWN_TOKEN UNKNOWN_TOKEN से UNKNOWN_TOKEN किसान के समक्ष आत्महत्या के अलावा कोई UNKNOWN_TOKEN नहीं है। UNKNOWN_TOKEN UNKNOWN_TOKEN UNKNOWN_TOKEN ( 2000 UNKNOWN_TOKEN ओलंपिक ) भारत के लिए ओलंपिक में व्यक्तिगत UNKNOWN_TOKEN जीतने वाली पहली भारतीय महिला हैं . SENTENCE_END
[2999, 2999, 7, 2999, 750, 0, 1072, 1361, 0, 245, 51, 2999, 17, 11, 2999, 2999, 2999, 34, 2021, 2999, 2438, 33, 70, 0, 19, 2438, 4, 1602, 2999, 2700, 148, 322, 124, 201, 18, 3, 2]


In [None]:
print ("x:\n%s\n%s" % (" ".join([index_to_word[x] for x in X_train[10]]), X_train[10]))

x:
SENTENCE_START इस पर पूरे देश की UNKNOWN_TOKEN UNKNOWN_TOKEN हुई हैं .
[1, 20, 13, 380, 89, 5, 2999, 2999, 112, 18, 3]


## Training the model

In [None]:
np.random.seed(17)
model = RNN(vocabulary_size)
o, s = model.forward_propagation(X_train[10])
print (o.shape, o)

(11, 3000) [[0.00033152 0.00033647 0.00033349 ... 0.00033278 0.00033288 0.00033252]
 [0.00033625 0.00033148 0.00033669 ... 0.00032744 0.00033536 0.00033026]
 [0.00033022 0.00033356 0.0003333  ... 0.00033197 0.00033578 0.00032975]
 ...
 [0.00033787 0.00033199 0.00033517 ... 0.00033349 0.00033537 0.00033674]
 [0.00033306 0.00033255 0.00033407 ... 0.00033467 0.00033167 0.00033762]
 [0.00033662 0.00033116 0.00033269 ... 0.00033312 0.00033797 0.00033652]]


## Checking model predictions

In [None]:
predictions = model.predict(X_train[10])
print (predictions.shape, predictions)
print ("x:\n%s" % (" ".join([index_to_word[x] for x in predictions])))

(11,) [ 605  899 2592 2812 1432 2900 1279 1022  554  702  865]
x:
सफल रंग ट्रस्ट परिस्थितियों बचाने समर्पण ममता कवि बेटी इनमें युवाओं


In [None]:
# Limit to 2000 examples to save time
print ("Expected Loss for random predictions: %f" % np.log(vocabulary_size))
print ("Actual loss: %f" % model.calculate_loss(X_train[:2000], y_train[:2000]))

Expected Loss for random predictions: 8.006368


Actual loss: 8.006612


In [None]:
grad_check_vocab_size = 100
np.random.seed(10)
model = RNN(grad_check_vocab_size, 10, bptt_truncate=1000)
model.gradient_check([0,1,2,3], [1,2,3,4])

Performing gradient check for parameter U with size 1000.


Gradient check for parameter U passed.
Performing gradient check for parameter V with size 1000.
Gradient check for parameter V passed.
Performing gradient check for parameter W with size 100.
Gradient check for parameter W passed.


In [None]:
np.random.seed(17)
model = RNN(vocabulary_size)
%timeit model.sgd_step(X_train[10], y_train[10], 0.005)

54.1 ms ± 2.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [None]:
np.random.seed(17)

# Train on a small subset of the data to see what happens
model = RNN(vocabulary_size)
losses = train_with_sgd(model, X_train[:100], y_train[:100], nepoch=10, evaluate_loss_after=1)


2023-04-03 19:00:32: Loss after num_examples_seen=0 epoch=0: 8.006711
2023-04-03 19:00:42: Loss after num_examples_seen=100 epoch=1: 6.406952
2023-04-03 19:00:53: Loss after num_examples_seen=200 epoch=2: 5.678251
2023-04-03 19:01:05: Loss after num_examples_seen=300 epoch=3: 5.474514
2023-04-03 19:01:16: Loss after num_examples_seen=400 epoch=4: 5.340279
2023-04-03 19:01:27: Loss after num_examples_seen=500 epoch=5: 5.230854
2023-04-03 19:01:39: Loss after num_examples_seen=600 epoch=6: 5.128836
2023-04-03 19:01:50: Loss after num_examples_seen=700 epoch=7: 5.053961
2023-04-03 19:02:01: Loss after num_examples_seen=800 epoch=8: 5.008823
2023-04-03 19:02:12: Loss after num_examples_seen=900 epoch=9: 4.974727


## Generating text in our native language corpus

In [None]:
num_sentences = 7
senten_min_length = 10
senten_max_length = 20

for i in range(num_sentences):
    sent = []
    # We want long sentences, not sentences with one or two words
    while len(sent) < senten_min_length:
        sent = generate_sentence(model, senten_max_length)
    print (" ".join(sent))

लेकिन पुलिस आज हो पुलिस था। किसी कहा होने होने तरह जब में हैं। बाद ’ इसके इसके
इसके में तरह बाद बाद रूप पुलिस अपने लिए होने आप जब था। पुलिस उन्होंने था। ’ अपने
हैं। अपने अपने हैं। था। पुलिस ' सिंह में कहा कि जाता था। ' का वे आज हैं।
और आप कोई ’ दिया से सरकार होने से सरकार रूप आप में था। था। कहा बाद हैं।
लेकिन ' किसी तरह रूप रूप रूप लेकिन साल हो लिए तरह में आज कहा सरकार समय ’
सरकार गया रूप किसी समय होने कि ' जाने ‘ पुलिस कोई होने इसके तरह पुलिस अपने था।
हैं। इसके बाद पुलिस से किसी और । हैं। । किसी बाद । पुलिस हैं लेकिन इसके ’


## 3. Training addition with a TensorFlow RNN (one-hot encoded)

In [None]:
# Might need to upgrade numpy to resolve TF error
!pip install --upgrade numpy



## Importing required libraries

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
import numpy as np

## Defining character encoding class

In [None]:
class CharacterTable(object):
    """Given a set of characters:
    + Encode them to a one-hot integer representation
    + Decode the one-hot or integer representation to their character output
    + Decode a vector of probabilities to their character output
    """
    def __init__(self, chars):
        """Initialize character table.

        # Arguments
            chars: Characters that can appear in the input.
        """
        self.chars = sorted(set(chars))
        self.char_indices = dict((c, i) for i, c in enumerate(self.chars))
        self.indices_char = dict((i, c) for i, c in enumerate(self.chars))

    def encode(self, C, num_rows):
        """One-hot encode given string C.

        # Arguments
            C: string, to be encoded.
            num_rows: Number of rows in the returned one-hot encoding. This is
                used to keep the # of rows for each data the same.
        """
        x = np.zeros((num_rows, len(self.chars)))
        for i, c in enumerate(C):
            x[i, self.char_indices[c]] = 1
        return x

    def decode(self, x, calc_argmax=True):
        """Decode the given vector or 2D array to their character output.

        # Arguments
            x: A vector or a 2D array of probabilities or one-hot representations;
                or a vector of character indices (used with `calc_argmax=False`).
            calc_argmax: Whether to find the character index with maximum
                probability, defaults to `True`.
        """
        if calc_argmax:
            x = x.argmax(axis=-1)
        return ''.join(self.indices_char[i] for i in x)


class colors:
    ok = '\033[92m'
    fail = '\033[91m'
    close = '\033[0m'

In [None]:
words = 'abcdefghijklmnop'
wtable = CharacterTable(words)
wtable.encode('g', 1)

array([[0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
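
Encoding and decoding are inverse operations: each character becomes a row with a single 1, and `argmax` recovers the character index from that row. The following is a minimal standalone sketch of that round trip over the same 16-letter alphabet; the free functions `encode` and `decode` mirror the class methods but are written here only for illustration.

```python
import numpy as np

chars = sorted(set("abcdefghijklmnop"))
char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}

def encode(text, num_rows):
    # One row per character position; unused rows stay all-zero
    x = np.zeros((num_rows, len(chars)))
    for i, c in enumerate(text):
        x[i, char_indices[c]] = 1
    return x

def decode(x):
    # argmax recovers the index of the single 1 in each row
    return "".join(indices_char[i] for i in x.argmax(axis=-1))

encoded = encode("bag", 3)
print(encoded.shape)      # (3, 16)
print(decode(encoded))    # bag
```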

In [None]:
chars = '0123456789+ '
ctable = CharacterTable(chars)

## Generating the addition of numbers data

In [None]:
# Parameters for the model and dataset.
TRAINING_SIZE = 50000
DIGITS = 3
REVERSE = False

# Maximum length of input is 'int + int' (e.g., '34+78  '). Maximum length of
# int is DIGITS.
MAXLEN = DIGITS + 1 + DIGITS

# All the numbers, plus sign and space for padding.
chars = '0123456789+ '
ctable = CharacterTable(chars)

questions = []
expected = []
seen = set()
print('Generating data...')
while len(questions) < TRAINING_SIZE:
    f = lambda: int(''.join(np.random.choice(list('0123456789'))
                    for i in range(np.random.randint(1, DIGITS + 1))))
    a, b = f(), f()

    # Skip any addition questions we've already seen
    # Also skip any such that x+Y == Y+x (hence the sorting).
    key = tuple(sorted((a, b)))
    if key in seen:
        continue
    seen.add(key)

    # Pad the data with spaces such that it is always MAXLEN.
    q = '{}+{}'.format(a, b)
    query = q + ' ' * (MAXLEN - len(q))
    ans = str(a + b)

    # Answers can be of maximum size DIGITS + 1.
    ans += ' ' * (DIGITS + 1 - len(ans))
    if REVERSE:
        # Reverse the query, e.g., '12+345  ' becomes '  543+21'. (Note the
        # space used for padding.)
        query = query[::-1]

    # store
    questions.append(query)
    expected.append(ans)
    print(query, ans)

print('Total addition questions:', len(questions))

Generating data...
2+9     11  
3+0     3   
8+71    79  
9+440   449 
6+78    84  
50+80   130 
60+47   107 
489+2   491 
277+6   283 
36+15   51  
842+3   845 
0+0     0   
6+798   804 
3+5     8   
992+2   994 
2+7     9   
93+43   136 
523+776 1299
53+119  172 
17+5    22  
91+793  884 
58+9    67  
71+0    71  
2+832   834 
843+0   843 
47+214  261 
2+2     4   
783+36  819 
611+38  649 
9+9     18  
6+547   553 
873+4   877 
9+4     13  
9+66    75  
519+3   522 
4+840   844 
803+54  857 
17+15   32  
6+337   343 
512+81  593 
84+662  746 
0+974   974 
6+6     12  
96+2    98  
507+3   510 
82+675  757 
75+31   106 
59+0    59  
906+81  987 
867+82  949 
43+123  166 
27+752  779 
6+163   169 
7+6     13  
32+87   119 
529+48  577 
7+52    59  
126+92  218 
8+179   187 
502+421 923 
146+80  226 
1+69    70  
834+84  918 
9+573   582 
0+665   665 
91+30   121 
911+761 1672
764+7   771 
7+8     15  
66+4    70  
46+647  693 
9+55    64  
103+70  173 
69+55   124 
599+87  686 
349+82

In [None]:
questions[:3], expected[:3]

(['2+9    ', '3+0    ', '8+71   '], ['11  ', '3   ', '79  '])
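The padding and de-duplication rules used in the generation loop can be isolated into two small helpers. This is only an illustrative sketch: `make_example` and `dedup_key` are hypothetical names that do not appear in the notebook, and `digits=3` mirrors `DIGITS` above.

```python
def dedup_key(a, b):
    """Order-independent key, so that a+b and b+a count as the same question."""
    return tuple(sorted((a, b)))


def make_example(a, b, digits=3, reverse=False):
    """Return (query, answer), both padded with spaces to a fixed length."""
    maxlen = digits + 1 + digits              # 'int+int'
    q = '{}+{}'.format(a, b)
    query = q + ' ' * (maxlen - len(q))       # pad the question to maxlen
    ans = str(a + b)
    ans += ' ' * (digits + 1 - len(ans))      # a sum has at most digits + 1 characters
    if reverse:
        query = query[::-1]                   # padding ends up in front
    return query, ans


print(make_example(2, 9))
print(dedup_key(9, 2) == dedup_key(2, 9))
```

The first call reproduces the first entries of `questions` and `expected` shown above; the second shows why `9+2` would be skipped once `2+9` has been seen.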

In [None]:
ctable.encode(questions[0], MAXLEN)

array([[0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
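The matrix above follows from the character ordering. Assuming `CharacterTable` sorts its characters (which the output suggests, with the space at index 0 and `+` at index 1), each row holds a single 1 at the index of the corresponding character. A minimal NumPy sketch of that encoding, using the hypothetical helpers `one_hot_encode` and `one_hot_decode` rather than the notebook's class:

```python
import numpy as np

# Assumed ordering: [' ', '+', '0', '1', ..., '9']
CHARS = sorted(set('0123456789+ '))
CHAR_TO_INDEX = {c: i for i, c in enumerate(CHARS)}


def one_hot_encode(text, num_rows):
    """One row per character, one column per symbol in CHARS."""
    out = np.zeros((num_rows, len(CHARS)))
    for row, ch in enumerate(text):
        out[row, CHAR_TO_INDEX[ch]] = 1
    return out


def one_hot_decode(matrix):
    """Pick the most likely symbol in each row and join them."""
    return ''.join(CHARS[i] for i in matrix.argmax(axis=-1))


encoded = one_hot_encode('2+9    ', 7)
print(encoded.argmax(axis=-1))        # column index of the 1 in each row
print(repr(one_hot_decode(encoded)))  # round trip back to the padded string
```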

## Creating training and validation dataset

In [None]:
print('Vectorization into thought:')
x = np.zeros((len(questions), MAXLEN, len(chars)), dtype=bool)
y = np.zeros((len(questions), DIGITS + 1, len(chars)), dtype=bool)
for i, sentence in enumerate(questions):
    x[i] = ctable.encode(sentence, MAXLEN)
for i, sentence in enumerate(expected):
    y[i] = ctable.encode(sentence, DIGITS + 1)

# Shuffle (x, y) in unison as the later parts of x will almost all be larger digits.
indices = np.arange(len(y))
np.random.shuffle(indices)
x = x[indices]
y = y[indices]

# Explicitly set apart 10% for validation data that we never train over.
split_at = len(x) - len(x) // 10
(x_train, x_val) = x[:split_at], x[split_at:]
(y_train, y_val) = y[:split_at], y[split_at:]

print('Training Data:')
print(x_train.shape)
print(y_train.shape)
print()

print('Validation Data:')
print(x_val.shape)
print(y_val.shape)
print()

print('Example:')
print('The first row of input data is encoded internally as:')
print(x_train[0])
print()
print('The first row of output data is encoded internally as:')
print(y_train[0])
print()
print('These internal representations represent these signals:')
print(ctable.decode(x_train[0]))
print(ctable.decode(y_train[0]))

Vectorization into thought:
Training Data:
(45000, 7, 12)
(45000, 4, 12)

Validation Data:
(5000, 7, 12)
(5000, 4, 12)

Example:
The first row of input data is encoded internally as:
[[False False False False False False False False False  True False False]
 [False False False False False False False False False False  True False]
 [False False False False False False False False False  True False False]
 [False  True False False False False False False False False False False]
 [False False False False False False False  True False False False False]
 [ True False False False False False False False False False False False]
 [ True False False False False False False False False False False False]]

The first row of output data is encoded internally as:
[[False False False False False False False False False  True False False]
 [False False False False False False False False False False False  True]
 [False False False False  True False False False False False False False]
 [ True Fa

## Building the model

In [None]:
# Try replacing with GRU, or SimpleRNN.
RNN = layers.LSTM
HIDDEN_SIZE = 128
BATCH_SIZE = 128
LAYERS = 1

print('Build model...')
model = Sequential()

# "Encode" the input sequence using an RNN, producing an output of HIDDEN_SIZE.
# Note: In a situation where your input sequences have a variable length,
# use input_shape=(None, num_feature).
model.add(RNN(HIDDEN_SIZE, input_shape=(MAXLEN, len(chars))))

# As the decoder RNN's input, repeatedly provide with the last output of
# RNN for each time step. Repeat 'DIGITS + 1' times as that's the maximum
# length of output, e.g., when DIGITS=3, max output is 999+999=1998.
model.add(layers.RepeatVector(DIGITS + 1))

# The decoder RNN could be multiple layers stacked or a single layer.
for _ in range(LAYERS):
    # Setting return_sequences=True returns the output at every time step,
    # not only the last one, with shape (num_samples, timesteps, output_dim).
    # This is necessary because the TimeDistributed layer below expects the
    # first dimension (after the batch axis) to be the timesteps.
    model.add(RNN(HIDDEN_SIZE, return_sequences=True))

# Apply a dense layer to every temporal slice of the input. For each step
# of the output sequence, decide which character should be chosen.
# We require DIGITS + 1 output vectors for our result. We use the same fully
# connected layer (Dense) to output each vector. To apply the same layer
# DIGITS + 1 times, we wrap it in a TimeDistributed() wrapper layer.
model.add(layers.TimeDistributed(layers.Dense(len(chars), activation='softmax')))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()

Build model...
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm (LSTM)                 (None, 128)               72192     
                                                                 
 repeat_vector (RepeatVector  (None, 4, 128)           0         
 )                                                               
                                                                 
 lstm_1 (LSTM)               (None, 4, 128)            131584    
                                                                 
 time_distributed (TimeDistr  (None, 4, 12)            1548      
 ibuted)                                                         
                                                                 
Total params: 205,324
Trainable params: 205,324
Non-trainable params: 0
_________________________________________________________________
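The parameter counts in the summary can be checked by hand. The sketch below assumes the standard LSTM layout (four gates, each with an input kernel, a recurrent kernel, and a bias); the helper names `lstm_params` and `dense_params` are made up for this illustration.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with: input kernel + recurrent kernel + bias
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # weight matrix + bias vector
    return input_dim * units + units


encoder = lstm_params(input_dim=12, units=128)    # lstm: reads one-hot characters
decoder = lstm_params(input_dim=128, units=128)   # lstm_1: reads the repeated encoder output
head = dense_params(input_dim=128, units=12)      # time_distributed Dense, shared across steps

print(encoder, decoder, head, encoder + decoder + head)
```

`RepeatVector` has no weights, and `TimeDistributed` reuses one `Dense` layer for all four output positions, which is why the last layer contributes only 1,548 parameters.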


In [None]:
# Train the model each generation and show predictions against the validation
# dataset.
for iteration in range(1, 2):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    model.fit(x_train, y_train,
              batch_size=BATCH_SIZE,
              epochs=1,
              validation_data=(x_val, y_val))

    # Select 10 samples from the validation set at random so we can visualize
    # errors with green and red boxes
    for i in range(10):
        ind = np.random.randint(0, len(x_val))
        rowx, rowy = x_val[np.array([ind])], y_val[np.array([ind])]
        #preds = model.predict_classes(rowx, verbose=0)
        preds = np.argmax(model.predict(rowx), axis=-1)
        q = ctable.decode(rowx[0])
        correct = ctable.decode(rowy[0])
        print("debugging:")
        print(type(preds[0]))
        print("now decoding...:")
        guess = ctable.decode(preds[0], calc_argmax=False)
        print('Q', q[::-1] if REVERSE else q, end=' ')
        print('T', correct, end=' ')
        if correct == guess:
            print(colors.ok + '☑' + colors.close, end=' ')
        else:
            print(colors.fail + '☒' + colors.close, end=' ')
        print(guess)


--------------------------------------------------
Iteration 1




debugging:
<class 'numpy.ndarray'>
now decoding...:
Q 166+24  T 190  ☒ 125
debugging:
<class 'numpy.ndarray'>
now decoding...:
Q 372+236 T 608  ☒ 141
debugging:
<class 'numpy.ndarray'>
now decoding...:
Q 584+75  T 659  ☒ 105
debugging:
<class 'numpy.ndarray'>
now decoding...:
Q 212+6   T 218  ☒ 32
debugging:
<class 'numpy.ndarray'>
now decoding...:
Q 86+46   T 132  ☒ 105
debugging:
<class 'numpy.ndarray'>
now decoding...:
Q 44+854  T 898  ☒ 145
debugging:
<class 'numpy.ndarray'>
now decoding...:
Q 469+358 T 827  ☒ 100
debugging:
<class 'numpy.ndarray'>
now decoding...:
Q 798+8   T 806  ☒ 108
debugging:
<class 'numpy.ndarray'>
now decoding...:
Q 369+37  T 406  ☒ 105
debugging:
<class 'numpy.ndarray'>
now decoding...:
Q 917+0   T 917  ☒ 105


In [None]:
# Train the model each generation and show predictions against the validation
# dataset.
for iteration in range(1, 35):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    model.fit(x_train, y_train,
              batch_size=BATCH_SIZE,
              epochs=1,
              validation_data=(x_val, y_val))

    # Select 10 samples from the validation set at random so we can visualize
    # errors with green and red boxes
    for i in range(10):
        ind = np.random.randint(0, len(x_val))
        rowx, rowy = x_val[np.array([ind])], y_val[np.array([ind])]
        preds = model.predict(rowx, verbose=0)
        q = ctable.decode(rowx[0])
        correct = ctable.decode(rowy[0])
        print("debugging:")
        print(type(preds[0]))
        print("now decoding...:")
        print(preds[0])
        guess = ctable.decode(preds[0])
        print('Q', q[::-1] if REVERSE else q, end=' ')
        print('T', correct, end=' ')
        if correct == guess:
            print(colors.ok + '☑' + colors.close, end=' ')
        else:
            print(colors.fail + '☒' + colors.close, end=' ')
        print(guess)


--------------------------------------------------
Iteration 1
debugging:
<class 'numpy.ndarray'>
now decoding...:
[[7.82624556e-05 4.36326263e-05 1.33279653e-03 2.10833043e-01
  1.42211141e-02 1.01596676e-02 2.54823696e-02 9.09220800e-02
  2.21593589e-01 2.34960541e-01 1.27095595e-01 6.32772595e-02]
 [6.83165854e-05 1.54751035e-06 1.46040425e-01 6.79305270e-02
  9.47523862e-02 1.11276083e-01 1.63047329e-01 1.14250645e-01
  1.14176765e-01 7.11645037e-02 6.68049753e-02 5.04863635e-02]
 [2.49051745e-03 7.24309004e-08 1.04820728e-01 1.13754347e-01
  1.24125384e-01 1.05105110e-01 1.20043971e-01 1.02482505e-01
  8.81876722e-02 7.50904381e-02 8.57551321e-02 7.81441107e-02]
 [9.85353410e-01 1.37234997e-08 2.23254832e-03 9.16506106e-04
  2.40912172e-03 1.54318160e-03 1.42612192e-03 1.33055309e-03
  1.25429686e-03 1.24656886e-03 1.36406533e-03 9.23678454e-04]]
Q 56+778  T 834  ☒ 742
debugging:
<class 'numpy.ndarray'>
now decoding...:
[[1.93242071e-04 2.18885307e-05 1.65513798e-03 2.2

In [None]:
# evaluate the keras model
_, accuracy = model.evaluate(x_val, y_val)
print('Accuracy: %.2f' % (accuracy*100))

Accuracy: 98.46
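The accuracy reported above should be read with care: for an output of shape `(samples, 4, 12)`, the Keras `accuracy` metric is, to the best of our understanding, averaged over individual character positions, so it is not the fraction of sums answered completely correctly. The sketch below computes both quantities with NumPy on made-up toy data; `per_character_accuracy` and `exact_match_accuracy` are hypothetical helpers, not Keras functions.

```python
import numpy as np


def per_character_accuracy(y_true, y_prob):
    """Fraction of character positions predicted correctly."""
    return float(np.mean(y_true.argmax(-1) == y_prob.argmax(-1)))


def exact_match_accuracy(y_true, y_prob):
    """Fraction of answers in which every character is correct."""
    same = y_true.argmax(-1) == y_prob.argmax(-1)   # shape (samples, positions)
    return float(np.mean(same.all(axis=1)))


# Toy data, invented for illustration: 2 answers, 4 positions, 12 classes.
y_true = np.zeros((2, 4, 12))
y_true[0, np.arange(4), [3, 6, 9, 0]] = 1
y_true[1, np.arange(4), [2, 2, 0, 0]] = 1

y_prob = y_true.copy()
y_prob[1, 0] = 0
y_prob[1, 0, 5] = 1          # one wrong character in the second answer

print(per_character_accuracy(y_true, y_prob))
print(exact_match_accuracy(y_true, y_prob))
```

On the real data the same functions could be called as `exact_match_accuracy(y_val, model.predict(x_val))`, which would give a stricter number than the per-character figure.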


### Testing our model

In [None]:
x = np.random.randint(0, 100)
y = np.random.randint(0, 100)
z = x + y
print(x,y,z)
print()

x_plus_y_buffered = str(x) + '+' + str(y) + (7 - len(str(x) + '+' + str(y))) * ' '
x_plus_y_buffered

95 52 147



'95+52  '

Next, pad the expected sum to the output length of 4 characters so it can be compared with the model's prediction:

In [None]:
z_buffered = str(z) + (4 - len(str(z))) * ' '
z_buffered

'147 '

In [None]:
x_plus_y_encoded = ctable.encode(x_plus_y_buffered, 7)
x_plus_y_encoded

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

In [None]:
ctable.decode(x_plus_y_encoded)

'95+52  '

In [None]:
z_encoded = ctable.encode(z_buffered, 4)
z_encoded

array([[0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

In [None]:
ctable.decode(z_encoded)

'147 '

Convert the encodings to boolean arrays, since that is the dtype the RNN was trained on:

In [None]:
x_plus_y = str(x) + '+' + str(y)
x_plus_y_tf = np.zeros((len(x_plus_y_buffered), len(chars)), dtype=bool)
x_plus_y_tf

array([[False, False, False, False, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False, False]])

In [None]:
list(enumerate(x_plus_y_tf))

[(0,
  array([False, False, False, False, False, False, False, False, False,
         False, False, False])),
 (1,
  array([False, False, False, False, False, False, False, False, False,
         False, False, False])),
 (2,
  array([False, False, False, False, False, False, False, False, False,
         False, False, False])),
 (3,
  array([False, False, False, False, False, False, False, False, False,
         False, False, False])),
 (4,
  array([False, False, False, False, False, False, False, False, False,
         False, False, False])),
 (5,
  array([False, False, False, False, False, False, False, False, False,
         False, False, False])),
 (6,
  array([False, False, False, False, False, False, False, False, False,
         False, False, False]))]

In [None]:
ctable.encode('2', 1)

array([[0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.]])

In [None]:
for i, sentence in enumerate(x_plus_y_tf):
    x_plus_y_tf[i] = ctable.encode(x_plus_y_buffered[i], 1)
x_plus_y_tf

array([[False, False, False, False, False, False, False, False, False,
        False, False,  True],
       [False, False, False, False, False, False, False,  True, False,
        False, False, False],
       [False,  True, False, False, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False, False,  True, False,
        False, False, False],
       [False, False, False, False,  True, False, False, False, False,
        False, False, False],
       [ True, False, False, False, False, False, False, False, False,
        False, False, False],
       [ True, False, False, False, False, False, False, False, False,
        False, False, False]])

In [None]:
z_tf = np.zeros((len(z_buffered), len(chars)), dtype=bool)
z_tf

array([[False, False, False, False, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False, False]])

In [None]:
for i, sentence in enumerate(z_tf):
    z_tf[i] = ctable.encode(z_buffered[i], 1)
z_tf

array([[False, False, False,  True, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False,  True, False, False,
        False, False, False],
       [False, False, False, False, False, False, False, False, False,
         True, False, False],
       [ True, False, False, False, False, False, False, False, False,
        False, False, False]])

**Forward step** through our RNN. `Keras`' `predict()` API takes a *batch* of inputs, that is, an array with a leading batch dimension, so we arbitrarily double up our input:

In [None]:
x_plus_y_tf2 = np.array((x_plus_y_tf, x_plus_y_tf), dtype=bool)
x_plus_y_tf2

array([[[False, False, False, False, False, False, False, False, False,
         False, False,  True],
        [False, False, False, False, False, False, False,  True, False,
         False, False, False],
        [False,  True, False, False, False, False, False, False, False,
         False, False, False],
        [False, False, False, False, False, False, False,  True, False,
         False, False, False],
        [False, False, False, False,  True, False, False, False, False,
         False, False, False],
        [ True, False, False, False, False, False, False, False, False,
         False, False, False],
        [ True, False, False, False, False, False, False, False, False,
         False, False, False]],

       [[False, False, False, False, False, False, False, False, False,
         False, False,  True],
        [False, False, False, False, False, False, False,  True, False,
         False, False, False],
        [False,  True, False, False, False, False, False, False, False,

In [None]:
predict_x = model.predict(x_plus_y_tf2)



In [None]:
ctable.decode(predict_x[0])

'147 '
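Doubling the input works, but a batch containing a single example is enough. A minimal sketch of that alternative, using a zero array as a stand-in for `x_plus_y_tf` and hand-made probabilities as a stand-in for the model output (both are assumptions for illustration, not values from the trained model):

```python
import numpy as np

# Stand-in for x_plus_y_tf: one encoded question of shape (7, 12).
single = np.zeros((7, 12), dtype=bool)

# Add a leading batch axis instead of stacking the example twice.
batch = np.expand_dims(single, axis=0)
print(batch.shape)

# Stand-in for model.predict(batch): shape (1, 4, 12), one distribution per position.
probs = np.zeros((1, 4, 12))
probs[0, np.arange(4), [3, 6, 9, 0]] = 1.0

# Decoding takes the argmax per position and maps indices back to characters.
chars = sorted(set('0123456789+ '))
decoded = ''.join(chars[i] for i in probs[0].argmax(axis=-1))
print(repr(decoded))
```

With the real model, `model.predict(np.expand_dims(x_plus_y_tf, axis=0))` should return an array of shape `(1, 4, 12)` whose first element can be passed to `ctable.decode` as above.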

## 4. Training a Dense network to add two numbers

First, we train the network to add two numbers represented as one-hot encoded strings.

In [None]:
print('Vectorization into thought:')
x = np.zeros((len(questions), MAXLEN, len(chars)), dtype=bool)
y = np.zeros((len(questions), DIGITS + 1, len(chars)), dtype=bool)
for i, sentence in enumerate(questions):
    x[i] = ctable.encode(sentence, MAXLEN)
for i, sentence in enumerate(expected):
    y[i] = ctable.encode(sentence, DIGITS + 1)

# Shuffle (x, y) in unison as the later parts of x will almost all be larger digits.
indices = np.arange(len(y))
np.random.shuffle(indices)
x = x[indices]
y = y[indices]

# Explicitly set apart 10% for validation data that we never train over.
split_at = len(x) - len(x) // 10
(x_train, x_val) = x[:split_at], x[split_at:]
(y_train, y_val) = y[:split_at], y[split_at:]

print('Training Data:')
print(x_train.shape)
print(y_train.shape)
print()

print('Validation Data:')
print(x_val.shape)
print(y_val.shape)
print()

print('Example:')
print('The first row of input data is encoded internally as:')
print(x_train[0])
print()
print('The first row of output data is encoded internally as:')
print(y_train[0])
print()
print('These internal representations represent these signals:')
print(ctable.decode(x_train[0]))
print(ctable.decode(y_train[0]))

Vectorization into thought:
Training Data:
(45000, 7, 12)
(45000, 4, 12)

Validation Data:
(5000, 7, 12)
(5000, 4, 12)

Example:
The first row of input data is encoded internally as:
[[False False False  True False False False False False False False False]
 [False False False False False False False False False  True False False]
 [False False  True False False False False False False False False False]
 [False  True False False False False False False False False False False]
 [False False False False False False False False  True False False False]
 [False False False False False False False False False False False  True]
 [False False  True False False False False False False False False False]]

The first row of output data is encoded internally as:
[[False False False False False False False False False False  True False]
 [False False False False False False False False  True False False False]
 [False False  True False False False False False False False False False]
 [ True Fa

In [None]:
x_train[0]

array([[False, False, False, False, False, False, False, False, False,
        False, False,  True],
       [False, False, False, False, False, False, False, False, False,
         True, False, False],
       [False, False, False, False, False, False,  True, False, False,
        False, False, False],
       [False,  True, False, False, False, False, False, False, False,
        False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False,  True],
       [ True, False, False, False, False, False, False, False, False,
        False, False, False],
       [ True, False, False, False, False, False, False, False, False,
        False, False, False]])

In [None]:
x_train.shape

(45000, 7, 12)

In [None]:
y_train.shape

(45000, 4, 12)

## Creating the model


In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Reshape

# Define the model
model = Sequential()
model.add(Reshape(target_shape=(84,), input_shape=(7, 12)))
model.add(Dense(128, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(48, activation='relu'))
model.add(Dense(4 * 12, activation='sigmoid'))
model.add(Reshape(target_shape=(4, 12)))

## Training the Model

In [None]:
# Compile the model
model.compile(optimizer="adam", loss='binary_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(x_train, y_train, epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x2be86f790>

In [None]:
model.save('addition_model_string.h5')
from tensorflow.keras.models import load_model

# Load the model
loaded_model = load_model('addition_model_string.h5')

print("Model Loaded")


Model Loaded


## Testing our model
Next, we use the trained model to predict the output for some test data. We create a sample problem, encode it and then use the model's predict method to generate the predictions.

In [None]:
example = "150+150"
enc_example = ctable.encode(example, 7).astype(bool)
enc_example

# Add a batch dimension to the encoded example
x_test = enc_example.reshape(1, 7, 12)

# Make a prediction
y_pred = loaded_model.predict(x_test)

# Print the input and output shapes
print("Input shape:", x_test.shape)
print("Output shape:", y_pred.shape)

# Print the input and output values
print("Decoded Input", ctable.decode(x_test[0]))
print("Decoded Output", ctable.decode(y_pred[0]))

Input shape: (1, 7, 12)
Output shape: (1, 4, 12)
Decoded Input 150+150
Decoded Output 379 
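The decoded answer `379` is wrong for `150+150`, so the dense model has not learned the task reliably. One possible refinement, not used in this notebook and not verified to fix this case, is to treat each of the 4 output positions as its own 12-way classification: reshape the 48 outputs to `(4, 12)` and apply a softmax over the last axis, instead of 48 independent sigmoids. The NumPy sketch below only illustrates that normalisation on random logits; `positionwise_softmax` is a hypothetical helper.

```python
import numpy as np


def positionwise_softmax(flat_logits, positions=4, classes=12):
    """Reshape (batch, positions * classes) logits and normalise each position."""
    logits = flat_logits.reshape(-1, positions, classes)
    shifted = logits - logits.max(axis=-1, keepdims=True)   # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
flat = rng.normal(size=(1, 48))          # random stand-in for the last Dense layer's pre-activations

probs = positionwise_softmax(flat)
sigmoid = (1.0 / (1.0 + np.exp(-flat))).reshape(1, 4, 12)

print(probs.sum(axis=-1))      # each position is a proper distribution
print(sigmoid.sum(axis=-1))    # independent sigmoids are not normalised per position
```

In Keras this would roughly correspond to moving the final `Reshape` before a softmax activation and switching the loss to `categorical_crossentropy`, as in the recurrent model earlier.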


## Training a network to add two numbers as numeric values

In the following code, we generate some training data using numpy and then create a model using Keras' Sequential API. The model is then compiled and trained on the generated training data.

## Generating Training data

In [None]:
import numpy as np

# Generate training data
x_train = np.random.rand(10000, 2) * 100
y_train = x_train[:,0] + x_train[:,1]

print("x_train shape: ", x_train.shape)
print("y_train shape: ", y_train.shape)


x_train shape:  (10000, 2)
y_train shape:  (10000,)


We use numpy to generate a 2D array of 10,000 rows and 2 columns, where each element is a random float between 0 and 100. We then create a 1D array by summing the elements of the first and second column of the x_train array. We also print the shapes of both arrays for verification.
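Because the target is an exact linear function of the two inputs, a plain least-squares fit is a useful baseline before training a network. The following sketch regenerates data the same way (with a seeded generator, which is an assumption made here for reproducibility) and solves for the two weights directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Same construction as the training data: two random floats in [0, 100) per row.
x = rng.random((10000, 2)) * 100
y = x[:, 0] + x[:, 1]

# Solve min ||x @ w - y|| for w; no bias term is needed for pure addition.
weights, *_ = np.linalg.lstsq(x, y, rcond=None)
print(np.round(weights, 6))
```

The fitted weights come out as one for each input, which suggests that even a single linear unit could represent this task; the hidden layer in the model below is not strictly required.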



## Creating the model

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Create the model
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(2,)))
model.add(Dense(1))

print("Model Summary: ")
model.summary()


Model Summary: 
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_2 (Dense)             (None, 64)                192       
                                                                 
 dense_3 (Dense)             (None, 1)                 65        
                                                                 
Total params: 257
Trainable params: 257
Non-trainable params: 0
_________________________________________________________________


In this block, we create a model using the Keras Sequential API. We add a Dense layer with 64 units and a ReLU activation function, followed by another Dense layer with 1 unit. We also print a summary of the model's architecture.
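The parameter counts in the summary follow from the layer sizes: each `Dense` layer has one weight per input-unit pair plus one bias per unit. A short check, with `dense_params` as a hypothetical helper name:

```python
def dense_params(inputs, units):
    # weight matrix (inputs x units) plus one bias per unit
    return inputs * units + units


hidden = dense_params(inputs=2, units=64)    # first Dense layer
output = dense_params(inputs=64, units=1)    # final Dense layer

print(hidden, output, hidden + output)
```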



## Compiling and training the model

In [None]:
# Compile the model
model.compile(loss='mse', optimizer='adam')

# Train the model
model.fit(x_train, y_train, epochs=20, batch_size=32, validation_split=0.2)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x2beacdf10>

In [None]:
model.save('addition_model_integer.h5')
from tensorflow.keras.models import load_model

# Load the model
loaded_model = load_model('addition_model_integer.h5')

print("Model Loaded")


Model Loaded


## Testing our model

Next, we use the trained model to predict the output for some test data. We create a 2D array of test data using numpy and then use the model's predict method to generate the predictions.



In [None]:
import numpy as np

# Generate test data
x_test = np.array([[4, 5], [10, 20], [0, 1]])
y_test = loaded_model.predict(x_test)

print("Test Data: \n", x_test)
print("Predictions: \n", np.round(y_test))


Test Data: 
 [[ 4  5]
 [10 20]
 [ 0  1]]
Predictions: 
 [[ 9.]
 [30.]
 [ 1.]]


## 5. Difference between Dense and Recurrent Networks


* Connections: RNNs have recurrent connections that pass a hidden state from one time step to the next, whereas dense networks have only feedforward connections.

* Memory: The hidden state acts as a memory of previous inputs that an RNN can use when producing later outputs, while a dense network has no such state and treats each input independently of the others.

* Input and output dimensions: RNNs can accept inputs of varying length and produce outputs of varying length, whereas dense networks require fixed-size inputs and outputs.

* Training: Training RNNs can be slower than training dense networks, because time steps must be processed in order and errors are backpropagated through time.

* Application: RNNs are well-suited to sequence modeling tasks such as language modeling, speech recognition, and video analysis, while dense networks fit tasks with fixed-size, non-sequential inputs, such as the numeric addition example above. Feedforward networks are also used for image classification and object detection, although those tasks usually rely on convolutional rather than purely dense layers.
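The point about input dimensions can be made concrete with a small NumPy sketch. The weights are random and the sizes (2 features, 3 hidden units, 7 time steps) are arbitrary choices for illustration: the recurrent function reuses the same two matrices at every step and accepts any sequence length, while the dense function has a weight matrix tied to one fixed, flattened input size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense layer: weights sized for exactly 7 steps x 2 features, flattened.
W_dense = rng.normal(size=(3, 7 * 2))


def dense_forward(sequence):
    return np.tanh(W_dense @ sequence.reshape(-1))


# Recurrent layer: the same weights are applied at every time step.
W_xh = rng.normal(size=(3, 2))
W_hh = rng.normal(size=(3, 3))


def rnn_forward(sequence):
    h = np.zeros(3)                          # hidden state, carried across steps
    for x_t in sequence:
        h = np.tanh(W_hh @ h + W_xh @ x_t)
    return h


short_seq = rng.normal(size=(4, 2))
long_seq = rng.normal(size=(7, 2))

print(rnn_forward(short_seq).shape, rnn_forward(long_seq).shape)
print(dense_forward(long_seq).shape)

try:
    dense_forward(short_seq)
except ValueError:
    print('the dense layer cannot process a length-4 sequence')
```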