# Spelling correction using LSTMs

This is my undergraduate final project.
We train an RNN (an LSTM sequence-to-sequence model with attention) for spelling correction of Persian search queries.

## 1. Importing relevant data and libraries.

In this section we import the necessary libraries, load the datasets, and preprocess them for training.

### 1.1. Importing the libraries

In [44]:
import pandas as pd
import numpy as np
from sc_utils import preprocess_data, string_to_int, softmax
from keras.layers import Bidirectional, Concatenate, Permute, Dot, Input, LSTM, Multiply
from keras.layers import RepeatVector, Dense, Activation, Lambda
from keras.optimizers import Adam
from keras.utils import to_categorical
from keras.models import load_model, Model
import keras.backend as K
import tensorflow as tf

import matplotlib.pyplot as plt
%matplotlib inline

### 1.2. Importing and preprocessing the datasets

#### 1.2.1. FASpell
The FASpell dataset was developed to evaluate spell-checking algorithms. It contains pairs of misspelled Persian words and their corrected forms, similar to the ASpell dataset for English.

In [45]:
# Load the main FASpell file (tab-separated)
data_faspell = pd.read_csv('data/faspell_main.txt', sep='\t')
data_faspell.head()


|   | #misspelt | corrected | error-category |
|---|-----------|-----------|----------------|
| 0 | آاهي | آگاهي | 1 |
| 1 | آبات | آیات | 1 |
| 2 | آبباشد | آب باشد | 2 |
| 3 | آبد | آید | 1 |
| 4 | آبری | عابری | 0 |


In [46]:
data_faspell.drop('error-category', axis = 1, inplace=True)
data_faspell.rename({'#misspelt':'misspelt'}, axis = 1, inplace=True)

In [47]:
data_faspell.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4858 entries, 0 to 4857
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   misspelt   4858 non-null   object
 1   corrected  4858 non-null   object
dtypes: object(2)
memory usage: 76.0+ KB


In [48]:
data_faspell.head()

|   | misspelt | corrected |
|---|----------|-----------|
| 0 | آاهي | آگاهي |
| 1 | آبات | آیات |
| 2 | آبباشد | آب باشد |
| 3 | آبد | آید |
| 4 | آبری | عابری |


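#### 1.2.2. Context-sensitive errors

This is a real-world test set of context-sensitive spelling errors in Persian, i.e. errors where the misspelt form is itself a valid word (for example, فرهنگي "cultural" written in place of فرنگي in گوجه فرنگي "tomato"). It contains 1,100 context-sensitive errors.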

In [49]:
data_context = pd.read_csv('data/context_sensitive.txt', header = 0, 
                            names = ['corrected', 'misspelt', 'error_cat', 'sentence', 'nan'], 
                            sep = '\t').drop('nan', axis=1)
data_context.drop('error_cat', inplace=True, axis=1)
data_context.head()

|   | corrected | misspelt | sentence |
|---|-----------|----------|----------|
| 0 | فرنگي | فرهنگي | كاهش قيمت گوجه فرنگي و خيار در ميادين ميوه و ت... |
| 1 | فرنگي | فرهنگي | خوردن گوجه فرهنگي كه حرام است ! |
| 2 | فرنگي | فرهنگي | گوجه فرهنگي |
| 3 | فرنگي | فرهنگي | سيب زميني 58000 كيلو ، گوجه فرهنگي 18000 كيلو |
| 4 | فرنگي | فرهنگي | اجازه فروش كارخانه كشمش ، رب گوجه فرهنگي تاكست... |


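We now turn each row into a sentence-level pair: the original sentence becomes the misspelt input, and the same sentence with the misspelt word replaced by its correction becomes the target.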

In [50]:

data_context['corrected'] = data_context.apply(lambda x: x['sentence'].replace(x.misspelt, x.corrected), axis = 1)
data_context['misspelt'] = data_context['sentence']
data_context.drop('sentence', inplace=True, axis=1)
data_context.head()

|   | corrected | misspelt |
|---|-----------|----------|
| 0 | كاهش قيمت گوجه فرنگي و خيار در ميادين ميوه و ت... | كاهش قيمت گوجه فرنگي و خيار در ميادين ميوه و ت... |
| 1 | خوردن گوجه فرنگي كه حرام است ! | خوردن گوجه فرهنگي كه حرام است ! |
| 2 | گوجه فرنگي | گوجه فرهنگي |
| 3 | سيب زميني 58000 كيلو ، گوجه فرنگي 18000 كيلو | سيب زميني 58000 كيلو ، گوجه فرهنگي 18000 كيلو |
| 4 | اجازه فروش كارخانه كشمش ، رب گوجه فرنگي تاكستا... | اجازه فروش كارخانه كشمش ، رب گوجه فرهنگي تاكست... |
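Note that `str.replace` substitutes every occurrence of the misspelt string, including occurrences inside longer words. If that matters for this data, a whole-word replacement is one option. The sketch below is an alternative, not the method used above; the helper `replace_whole_word` is hypothetical and relies on regex word boundaries (`\b`), which Python's `re` module applies to Unicode letters, including Persian script.

```python
import re

def replace_whole_word(sentence: str, misspelt: str, corrected: str) -> str:
    """Hypothetical helper: replace `misspelt` only where it is a whole word.

    \\b is Unicode-aware for str patterns in Python 3. Caveat: the zero-width
    non-joiner (U+200C) is not a word character, so it also counts as a boundary.
    """
    pattern = r'\b' + re.escape(misspelt) + r'\b'
    # A function replacement inserts `corrected` literally (no escape parsing)
    return re.sub(pattern, lambda _: corrected, sentence)

# Plain str.replace also rewrites the start of "category"
print("cat category".replace("cat", "car"))
print(replace_whole_word("cat category", "cat", "car"))
print(replace_whole_word('خوردن گوجه فرهنگي كه حرام است !', 'فرهنگي', 'فرنگي'))

# Possible use on the DataFrame, in place of the str.replace lambda:
# data_context['corrected'] = data_context.apply(
#     lambda x: replace_whole_word(x['sentence'], x.misspelt, x.corrected), axis=1)
```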


#### 1.2.3. Synthetic
This is a large parallel dataset for Persian spell checking. The misspelled sentences and their correct forms were generated using a large confusion matrix gathered from many sources.

In [51]:
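# Line i of wrong_synthetic.txt is the misspelt form of line i of correct_synthetic.txt.
# Note: with its default header=0, read_csv treats the first line of each file as a header row.
# 'misspelt' comes first so that itertuples() yields (misspelt, corrected) pairs.
synthetic = pd.DataFrame({
    'misspelt': pd.read_csv('data/wrong_synthetic.txt').iloc[:, 0],
    'corrected': pd.read_csv('data/correct_synthetic.txt').iloc[:, 0],
})

synthetic.head()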

### 1.3. Combining the datasets
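A minimal sketch of one way to combine the three datasets, assuming each DataFrame has `misspelt` and `corrected` columns as prepared above. The helper `combine_datasets` is hypothetical; note that the preprocessing step below passes only `synthetic` to `preprocess_data`.

```python
import pandas as pd

def combine_datasets(*frames: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: stack several (misspelt, corrected) DataFrames into one.

    Assumes every frame has 'misspelt' and 'corrected' columns; other columns are ignored.
    """
    # Same column order for every frame, so itertuples() later yields (misspelt, corrected)
    parts = [f[['misspelt', 'corrected']] for f in frames]
    combined = pd.concat(parts, ignore_index=True)
    # Drop rows with a missing side and exact duplicate pairs
    return combined.dropna().drop_duplicates(ignore_index=True)

# Toy usage: two tiny frames with different column orders and one duplicate pair
a = pd.DataFrame({'misspelt': ['آبات', 'آبد'], 'corrected': ['آیات', 'آید']})
b = pd.DataFrame({'corrected': ['آید', 'گوجه فرنگی'], 'misspelt': ['آبد', 'گوجه فرهنگی']})
print(combine_datasets(a, b))
```

In the notebook this could be called as `combine_datasets(data_faspell, data_context, synthetic)`.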

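## 2. Preprocessing the data

In this step we map every character to an integer index, pad or truncate each query to a fixed length `T`, and build one-hot versions of the inputs and targets.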

In [None]:
# Create the vocabulary dictionary:
# map each Persian character (plus space, <unk> and <pad>) to a unique integer

vocab = {' ':0 , 'آ':1 , 'ا':2 , 'ب':3 , 'پ':4 , 'ت':5 , 'ث':6 , 'ج':7 , 'چ':8 , 'ح':9 , 'خ':10 ,
 'د':11 , 'ذ':12 , 'ر':13 , 'ز':14 , 'ژ':15 , 'س':16 , 'ش':17 , 'ص':18 , 'ض':19 , 'ط':20 , 'ظ':21 , 
 'ع':22 , 'غ':23 , 'ف':24 , 'ق':25 , 'ک':26 , 'گ':27 , 'ل':28 , 'م':29 , 'ن':30 , 'و':31 , 'ه':32 , 
 'ی':33, '<unk>':34, '<pad>':35}

# Create the inverse vocabulary dictionary:
# map each integer index back to its character
inv_vocab = {index: char for char, index in vocab.items()}
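The vocabulary above uses the Persian forms of yeh (ی, U+06CC) and kaf (ک, U+06A9), but the FASpell and context-sensitive samples shown earlier use the Arabic forms (ي, U+064A and ك, U+0643), for example آگاهي and كاهش. Without normalization these characters would presumably be encoded as `<unk>` (this depends on how `sc_utils` handles unknown characters). A minimal normalization sketch, assuming these two letters are the main mismatch; the name `normalize_persian` is hypothetical.

```python
# Map Arabic code points to the Persian ones used in `vocab`
# (assumption: yeh and kaf are the main mismatch in the data)
ARABIC_TO_PERSIAN = str.maketrans({
    '\u064A': '\u06CC',  # Arabic yeh -> Persian yeh
    '\u0643': '\u06A9',  # Arabic kaf -> Persian kaf
})

def normalize_persian(text: str) -> str:
    """Hypothetical helper: replace Arabic yeh/kaf with their Persian forms."""
    return text.translate(ARABIC_TO_PERSIAN)

print([hex(ord(ch)) for ch in normalize_persian('آگاهي')])

# Possible use on a DataFrame column:
# data_faspell['misspelt'] = data_faspell['misspelt'].map(normalize_persian)
```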

* We set `T = 50`.
    * `T` is the assumed maximum length of a query, in characters.
    * Longer inputs have to be truncated to `T` characters; shorter ones are padded with `<pad>`.

In [None]:
T = 50
X, Y, Xoh, Yoh = preprocess_data(list(synthetic.itertuples(index=False, name=None)), vocab, T)
m = X.shape[0]
print("X.shape:", X.shape)
print("Y.shape:", Y.shape)
print("Xoh.shape:", Xoh.shape)
print("Yoh.shape:", Yoh.shape)


We now have:

* `X`: a processed version of the misspelt queries in the training set.
    - Each character in `X` is replaced by the integer index it is mapped to in `vocab`.
    - Each query is padded to length `T` with the special `<pad>` token.
    - `X.shape = (m, T)`, where `m` is the number of training examples.
    
* `Y`: a processed version of the corrected queries in the training set.
    - Each character is replaced by the index (integer) it is mapped to in `vocab`.
    - `Y.shape = (m, T)`.
* `Xoh`: one-hot version of `X`
    - Each index in `X` is converted to its one-hot representation (if the index is 2, the one-hot vector has position 2 set to 1 and all other positions set to 0).
    - `Xoh.shape = (m, T, len(vocab))`.
* `Yoh`: one-hot version of `Y`
    - Each index in `Y` is converted to the one-hot representation.
    - `Yoh.shape = (m, T, len(vocab))`.
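`preprocess_data` and `string_to_int` come from the project's `sc_utils` module, which is not shown here. As a rough illustration of the encoding described above, here is a NumPy sketch; the function `encode_query` and the toy vocabulary are hypothetical, and the choices that unknown characters map to `<unk>` and that padding goes at the end may not match `sc_utils` exactly.

```python
import numpy as np

def encode_query(query: str, vocab: dict, T: int):
    """Hypothetical sketch: string -> (T,) index array and (T, len(vocab)) one-hot matrix."""
    # Truncate queries longer than T characters
    query = query[:T]
    # Map each character to its index; unseen characters become <unk>
    indices = [vocab.get(ch, vocab['<unk>']) for ch in query]
    # Pad on the right with <pad> up to length T
    indices += [vocab['<pad>']] * (T - len(indices))
    indices = np.array(indices)
    # One-hot: row t has a single 1 at column indices[t]
    one_hot = np.eye(len(vocab))[indices]
    return indices, one_hot

toy_vocab = {' ': 0, 'ا': 1, 'ب': 2, '<unk>': 3, '<pad>': 4}
idx, oh = encode_query('اب x', toy_vocab, T=6)
print(idx)       # [1 2 0 3 4 4]
print(oh.shape)  # (6, 5)
```

Stacking the results for `m` queries with `np.stack` gives arrays of shape `(m, T)` and `(m, T, len(vocab))`, matching `X` and `Xoh`.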


* Let's also look at one preprocessed training example.

In [None]:
index = 0
print("Source query:", synthetic['misspelt'].iloc[index])
print("Target query:", synthetic['corrected'].iloc[index])
print()
print("Source after preprocessing (indices):", X[index])
print("Target after preprocessing (indices):", Y[index])
print()
print("Source after preprocessing (one-hot):", Xoh[index])
print("Target after preprocessing (one-hot):", Yoh[index])





## 3. Spelling correction using LSTMs with Attention

* If you had to translate a paragraph of a book from French to English, you would not read the whole paragraph, close the book, and then translate.
* Even during the translation process, you would read and re-read the parts of the French paragraph that correspond to the English you are writing down.
* The attention mechanism tells a neural machine translation model where to focus at each output step. Spelling correction can be framed the same way: the model "translates" a misspelt query into its corrected form, one character at a time.

### 3.1 - Attention Mechanism

In this part, we implement the attention mechanism illustrated in Figure 1.
* Here is a figure showing how the model works.
    * The diagram on the left shows the attention model.
    * The diagram on the right shows what one "attention" step does to calculate the attention variables $\alpha^{\langle t, t' \rangle}$.
    * The attention variables $\alpha^{\langle t, t' \rangle}$ are used to compute the context vector $context^{\langle t \rangle}$ for each timestep in the output ($t=1, \ldots, T_y$).

<table>
<td> 
<img src="images/attn_model.png" style="width:500px;height:500px;"> <br>
</td> 
<td> 
<img src="images/attn_mechanism.png" style="width:500px;height:500px;"> <br>
</td> 
</table>
<caption><center> **Figure 1**: Neural machine translation with attention</center></caption>

Here are some properties of the model that you may notice: 

#### Pre-attention and Post-attention LSTMs on both sides of the attention mechanism
- There are two separate LSTMs in this model (see diagram on the left): pre-attention and post-attention LSTMs.
- The *pre-attention* Bi-LSTM, at the bottom of the diagram, is a bidirectional LSTM that comes *before* the attention mechanism.
    - The attention mechanism is shown in the middle of the left-hand diagram.
    - The pre-attention Bi-LSTM goes through $T_x$ time steps.
- The *post-attention* LSTM, at the top of the diagram, comes *after* the attention mechanism.
    - The post-attention LSTM goes through $T_y$ time steps. In this project $T_x = T_y = T$.

- The post-attention LSTM passes the hidden state $s^{\langle t \rangle}$ and cell state $c^{\langle t \rangle}$ from one time step to the next. 

#### An LSTM has both a hidden state and cell state
* In the lecture videos, a basic RNN was used for the post-attention sequence model.
    * This means that the only state the RNN carried was the hidden state $s^{\langle t\rangle}$.
* In this project, we use an LSTM instead of a basic RNN.
    * So the LSTM has both a hidden state $s^{\langle t\rangle}$ and a cell state $c^{\langle t\rangle}$.

#### Each time step does not use predictions from the previous time step
* The post-attention LSTM at time $t$ does not take the previous time step's prediction $y^{\langle t-1 \rangle}$ as input.
* The post-attention LSTM at time $t$ only takes the context vector $context^{\langle t \rangle}$, the previous hidden state $s^{\langle t-1\rangle}$ and the previous cell state $c^{\langle t-1\rangle}$ as input.

#### Concatenation of hidden states from the forward and backward pre-attention LSTMs
- $\overrightarrow{a}^{\langle t \rangle}$: hidden state of the forward-direction, pre-attention LSTM.
- $\overleftarrow{a}^{\langle t \rangle}$: hidden state of the backward-direction, pre-attention LSTM.
- $a^{\langle t \rangle} = [\overrightarrow{a}^{\langle t \rangle}, \overleftarrow{a}^{\langle t \rangle}]$: the concatenation of the forward-direction activation $\overrightarrow{a}^{\langle t \rangle}$ and the backward-direction activation $\overleftarrow{a}^{\langle t \rangle}$ of the pre-attention Bi-LSTM.

#### Computing "energies" $e^{\langle t, t' \rangle}$ as a function of $s^{\langle t-1 \rangle}$ and $a^{\langle t' \rangle}$
- "e" is called the "energies" variable.
- $s^{\langle t-1 \rangle}$ is the hidden state of the post-attention LSTM.
- $a^{\langle t' \rangle}$ is the hidden state of the pre-attention LSTM.
- $s^{\langle t-1 \rangle}$ and $a^{\langle t' \rangle}$ are fed into a simple neural network, which learns the function that outputs $e^{\langle t, t' \rangle}$.
- $e^{\langle t, t' \rangle}$ is then used to compute the attention $\alpha^{\langle t, t' \rangle}$ that $y^{\langle t \rangle}$ should pay to $a^{\langle t' \rangle}$.

- The diagram on the right of figure 1 uses a `RepeatVector` node to copy $s^{\langle t-1 \rangle}$'s value $T_x$ times.
- Then it uses `Concatenate` to concatenate $s^{\langle t-1 \rangle}$ with $a^{\langle t' \rangle}$ for every $t'$.
- The concatenation is fed through small "Dense" layers (two in this implementation, `densor1` and `densor2`), which compute $e^{\langle t, t' \rangle}$.
- $e^{\langle t, t' \rangle}$ is then passed through a softmax over $t'$ to compute $\alpha^{\langle t, t' \rangle}$.
- Note that the diagram doesn't explicitly show the variable $e^{\langle t, t' \rangle}$; it sits above the Dense layer and below the Softmax layer in the right half of figure 1.
- The Keras `RepeatVector` and `Concatenate` layers are used in the code below.
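Written out for the layers defined in the code below (`densor1`: 10 units with tanh, `densor2`: 1 unit with ReLU, concatenation order $[a^{\langle t' \rangle}; s^{\langle t-1 \rangle}]$), the energies and attention weights are as follows, where $W_1, b_1$ and $W_2, b_2$ are notation introduced here for the weights and biases of the two Dense layers:

$$e^{\langle t, t' \rangle} = \mathrm{ReLU}\left(W_2 \tanh\left(W_1 \left[a^{\langle t' \rangle}; s^{\langle t-1 \rangle}\right] + b_1\right) + b_2\right)$$

$$\alpha^{\langle t, t' \rangle} = \frac{\exp\left(e^{\langle t, t' \rangle}\right)}{\sum_{t''=1}^{T_x} \exp\left(e^{\langle t, t'' \rangle}\right)}$$

The same weights are used for every $t'$ and every $t$, and the weights $\alpha^{\langle t, t' \rangle}$ sum to 1 over $t'$, so the context vector in equation (1) is a weighted average of the $a^{\langle t' \rangle}$.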

#### Implementation Details
   
Let's implement this neural translator. We start by implementing two functions: `one_step_attention()` and `modelf()`.

#### one_step_attention
* The inputs to the one_step_attention at time step $t$ are:
    - $[a^{<1>},a^{<2>}, ..., a^{<T_x>}]$: all hidden states of the pre-attention Bi-LSTM.
    - $s^{<t-1>}$: the previous hidden state of the post-attention LSTM 
* one_step_attention computes:
    - $[\alpha^{<t,1>},\alpha^{<t,2>}, ..., \alpha^{<t,T_x>}]$: the attention weights
    - $context^{ \langle t \rangle }$: the context vector:
    
$$context^{<t>} = \sum_{t' = 1}^{T_x} \alpha^{<t,t'>}a^{<t'>}\tag{1}$$ 

##### Clarifying 'context' and 'c'
- In this project, we denote the context vector $context^{\langle t \rangle}$ rather than $c^{\langle t \rangle}$.
    - This avoids confusion with the post-attention LSTM's internal memory cell variable, which is also denoted $c^{\langle t \rangle}$.

We now implement `one_step_attention()`.

* The function `modelf()` will call the layers in `one_step_attention()` $T_y$ times using a for-loop.
* It is important that all $T_y$ copies have the same weights. 
    * It should not reinitialize the weights every time. 
    * In other words, all $T_y$ steps should have shared weights. 

In [None]:
# Define the shared layers as global variables, so every attention step reuses the same weights
repeator = RepeatVector(T)
concatenator = Concatenate(axis=-1)
densor1 = Dense(10, activation = "tanh")
densor2 = Dense(1, activation = "relu")
activator = Activation(softmax, name='attention_weights') # custom softmax(axis=1) from sc_utils: normalizes over the Tx axis
dotor = Dot(axes = 1)

In [None]:
def one_step_attention(a, s_prev):
    
    """
    Performs one step of attention: outputs a context vector computed as a dot product of the attention weights
    "alphas" and the hidden states "a" of the Bi-LSTM.
    
    Arguments:
    a -- hidden state output of the Bi-LSTM, tensor of shape (m, Tx, 2*n_a)
    s_prev -- previous hidden state of the (post-attention) LSTM, tensor of shape (m, n_s)
    
    Returns:
    context -- context vector of shape (m, 1, 2*n_a), input of the next (post-attention) LSTM cell
    """
    
    # Repeat s_prev to shape (m, Tx, n_s) so that it can be
    # concatenated with all hidden states "a"
    s_prev = repeator(s_prev)
    # Concatenate a and s_prev on the last axis (a first, then s_prev)
    concat = concatenator([a,s_prev])
    # Propagate concat through a small fully-connected layer
    # to compute the "intermediate energies" e
    e = densor1(concat)
    # Propagate e through a second fully-connected layer
    # to compute the energies, shape (m, Tx, 1)
    energies = densor2(e)
    # Apply the softmax over the Tx axis to get the attention weights "alphas"
    alphas = activator(energies)
    # Weighted sum of "a" using "alphas" (in this order) gives the context vector
    # for the next (post-attention) LSTM cell
    context = dotor([alphas,a])
    
    return context
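To check the shapes, here is a NumPy re-implementation of what the Keras layers in `one_step_attention()` compute for a batch. It is a sketch only: the weights are random stand-ins for the trained `densor1`/`densor2` parameters, and `softmax_axis1` is a hypothetical helper mirroring the custom softmax over the $T_x$ axis. It also shows that `Dot(axes=1)` yields a context of shape `(m, 1, 2*n_a)`, which the post-attention LSTM reads as a length-1 sequence.

```python
import numpy as np

def softmax_axis1(x):
    # Numerically stable softmax over axis 1 (the Tx axis)
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def one_step_attention_np(a, s_prev, W1, b1, W2, b2):
    """a: (m, Tx, 2*n_a), s_prev: (m, n_s) -> context (m, 1, 2*n_a), alphas (m, Tx, 1)."""
    m, Tx, _ = a.shape
    # RepeatVector: copy s_prev Tx times -> (m, Tx, n_s)
    s_rep = np.repeat(s_prev[:, None, :], Tx, axis=1)
    # Concatenate on the last axis, a first -> (m, Tx, 2*n_a + n_s)
    concat = np.concatenate([a, s_rep], axis=-1)
    # densor1 (tanh) then densor2 (ReLU) -> energies (m, Tx, 1)
    e = np.tanh(concat @ W1 + b1)
    energies = np.maximum(0.0, e @ W2 + b2)
    # Attention weights sum to 1 over the Tx positions
    alphas = softmax_axis1(energies)
    # Dot(axes=1): contract over Tx -> (m, 1, 2*n_a)
    context = np.einsum('mtk,mtd->mkd', alphas, a)
    return context, alphas

rng = np.random.default_rng(0)
m, Tx, n_a, n_s = 2, 5, 4, 3
a = rng.normal(size=(m, Tx, 2 * n_a))
s_prev = rng.normal(size=(m, n_s))
W1, b1 = rng.normal(size=(2 * n_a + n_s, 10)), np.zeros(10)
W2, b2 = rng.normal(size=(10, 1)), np.zeros(1)
context, alphas = one_step_attention_np(a, s_prev, W1, b1, W2, b2)
print(context.shape, alphas.shape)  # (2, 1, 8) (2, 5, 1)
```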

#### modelf

We now implement `modelf()` as shown in figure 1:

* `modelf` first runs the input through a Bi-LSTM to get $[a^{<1>},a^{<2>}, ..., a^{<T_x>}]$. 
* Then, `modelf` calls `one_step_attention()` $T_y$ times using a `for` loop.  At each iteration of this loop:
    - It gives the computed context vector $context^{<t>}$ to the post-attention LSTM.
    - It runs the output of the post-attention LSTM through a dense layer with softmax activation.
    - The softmax generates a prediction $\hat{y}^{<t>}$.
    
Again, we have defined global layers that will share weights to be used in `modelf()`.

In [None]:
n_a = 64 # number of units for the pre-attention, bi-directional LSTM's hidden state 'a'
n_s = 128 # number of units for the post-attention LSTM's hidden state "s"

# Post-attention LSTM cell (despite the name, this is the post-attention LSTM);
# return_state=True makes it also return its hidden and cell states
post_activation_LSTM_cell = LSTM(n_s, return_state = True)
output_layer = Dense(len(vocab), activation=softmax)

These layers can now be used $T_y$ times in a `for` loop to generate the outputs, without their parameters being reinitialized. `modelf()` carries out the following steps:

1. Propagate the input `X` into a bi-directional LSTM.
    * [Bidirectional](https://keras.io/layers/wrappers/#bidirectional) 
    * [LSTM](https://keras.io/layers/recurrent/#lstm)
    * Remember that we want the LSTM to return a full sequence instead of just the last hidden state.  
    
Sample code:

```Python
sequence_of_hidden_states = Bidirectional(LSTM(units=..., return_sequences=...))(the_input_X)
```
    
2. Iterate for $t = 0, \cdots, T_y-1$: 
    1. Call `one_step_attention()`, passing in the sequence of hidden states $[a^{\langle 1 \rangle},a^{\langle 2 \rangle}, ..., a^{ \langle T_x \rangle}]$ from the pre-attention bi-directional LSTM, and the previous hidden state $s^{<t-1>}$ from the post-attention LSTM to calculate the context vector $context^{<t>}$.
    2. Give $context^{<t>}$ to the post-attention LSTM cell. 
        - Remember to pass in the previous hidden state $s^{\langle t-1\rangle}$ and cell state $c^{\langle t-1\rangle}$ of this LSTM.
        - This outputs the new hidden state $s^{<t>}$ and the new cell state $c^{<t>}$.

        Sample code:
        ```Python
        next_hidden_state, _, next_cell_state = post_activation_LSTM_cell(
            inputs=..., initial_state=[prev_hidden_state, prev_cell_state])
        ```
        Note that, despite its name, this layer is the post-attention LSTM cell.
    3. Apply a dense, softmax layer to $s^{<t>}$, get the output.  
        Sample code:
        ```Python
        output = output_layer(inputs=...)
        ```
    4. Save the output by adding it to the list of outputs.

3. Create your Keras model instance.
    * It should have three inputs:
        * `X`, the one-hot encoded inputs to the model, of shape $(T_x, \text{len(vocab)})$
        * $s^{\langle 0 \rangle}$, the initial hidden state of the post-attention LSTM
        * $c^{\langle 0 \rangle}$, the initial cell state of the post-attention LSTM
    * The output is the list of outputs.
    Sample code:
    ```Python
    model = Model(inputs=[...,...,...], outputs=...)
    ```

In [None]:

def modelf(Tx, Ty, n_a, n_s, human_vocab_size, machine_vocab_size):
    """
    Arguments:
    Tx -- length of the input sequence
    Ty -- length of the output sequence
    n_a -- hidden state size of the Bi-LSTM
    n_s -- hidden state size of the post-attention LSTM
    human_vocab_size -- size of the input vocabulary (here len(vocab))
    machine_vocab_size -- size of the output vocabulary (here len(vocab))

    Returns:
    model -- Keras model instance
    """
    
    # Define the input of the model with shape (Tx, human_vocab_size).
    # Define s0 (initial hidden state) and c0 (initial cell state)
    # for the decoder LSTM, each with shape (n_s,)
    X = Input(shape=(Tx, human_vocab_size))
    s0 = Input(shape=(n_s,), name='s0')
    c0 = Input(shape=(n_s,), name='c0')
    s = s0
    c = c0
    
    # Initialize empty list of outputs
    outputs = []
    
    # Define the pre-attention Bi-LSTM.
    a = Bidirectional(LSTM(n_a, return_sequences=True))(X)
    
    # Iterate for Ty steps
    for t in range(Ty):

        # Use the attention mechanism to get the context vector at step t
        context = one_step_attention(a, s)
        
        # Run one step of the post-attention LSTM, passing initial_state = [hidden state, cell state]
        s, _, c = post_activation_LSTM_cell(context, initial_state=[s, c])

        # Apply the softmax output layer to the post-attention LSTM's hidden state
        out = output_layer(s)
        
        outputs.append(out)

    # Create the model instance, taking three inputs and returning the list of outputs.
    model = Model(inputs=[X, s0, c0],outputs=outputs)
    
    return model

In [None]:

model = modelf(T, T, n_a, n_s, len(vocab), len(vocab))

Let's print a summary of the model to check the layer output shapes and parameter counts.

In [None]:
model.summary()

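The parameter counts reported by `model.summary()` can be checked by hand. A Keras `LSTM` with $n$ units reading $d$-dimensional inputs has a kernel of shape $(d, 4n)$, a recurrent kernel of shape $(n, 4n)$ and a bias of length $4n$, i.e. $4\,(n(d+n)+n)$ parameters; `Bidirectional` doubles this, and a `Dense` layer has $d \cdot n + n$. A small sketch for the configuration used here (`n_a = 64`, `n_s = 128`, 36 symbols); the helper names are hypothetical. `T` does not affect the counts, and shared layers are counted once however many times they are called.

```python
def lstm_params(units: int, input_dim: int) -> int:
    # kernel (input_dim x 4*units) + recurrent kernel (units x 4*units) + bias (4*units)
    return 4 * (units * (input_dim + units) + units)

def dense_params(units: int, input_dim: int) -> int:
    # weight matrix (input_dim x units) + bias (units)
    return input_dim * units + units

vocab_size, n_a, n_s = 36, 64, 128
counts = {
    'bidirectional': 2 * lstm_params(n_a, vocab_size),
    'densor1': dense_params(10, 2 * n_a + n_s),               # input: [a; s_prev]
    'densor2': dense_params(1, 10),
    'post_activation_LSTM_cell': lstm_params(n_s, 2 * n_a),   # input: context vector
    'output_layer': dense_params(vocab_size, n_s),
}
for name, n in counts.items():
    print(f'{name:28s} {n:>8,}')
print('total', sum(counts.values()))
```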

#### Compile the Model

* After creating your model in Keras, you need to compile it and define the loss function, optimizer and metrics you want to use. 
    * Loss function: 'categorical_crossentropy'.
    * Optimizer: [Adam](https://keras.io/optimizers/#adam) (see also [usage of optimizers](https://keras.io/optimizers/#usage-of-optimizers))
        - learning rate = 0.005 
        - $\beta_1 = 0.9$
        - $\beta_2 = 0.999$
        - decay = 0.01  
    * Metric: 'accuracy'

In [None]:
# `learning_rate` replaces the deprecated `lr` argument
opt = Adam(learning_rate=0.005, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

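With `decay=0.01`, Keras's legacy optimizers shrink the learning rate after every batch update as $\eta_k = \eta_0 / (1 + \text{decay} \cdot k)$, where $k$ counts updates, not epochs. At `batch_size=100` one epoch is $\lceil m/100 \rceil$ updates, so the rate can drop quickly. A small sketch of this schedule; the helper name and the value of `m` are hypothetical. In newer Keras versions, where optimizers no longer accept `decay`, `keras.optimizers.schedules.InverseTimeDecay(initial_learning_rate=0.005, decay_steps=1, decay_rate=0.01)` should give the same schedule.

```python
import math

def decayed_lr(lr0: float, decay: float, step: int) -> float:
    # Inverse-time decay applied per optimizer update (batch)
    return lr0 / (1.0 + decay * step)

lr0, decay, batch_size = 0.005, 0.01, 100
m = 10_000  # hypothetical number of training examples
steps_per_epoch = math.ceil(m / batch_size)
for epoch in (1, 10, 100):
    step = epoch * steps_per_epoch
    print(f'after epoch {epoch:3d}: lr = {decayed_lr(lr0, decay, step):.2e}')
```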


#### Define inputs and outputs, and fit the model
The last step is to define all our inputs and outputs to fit the model:
- We create `s0` and `c0` to initialize `post_activation_LSTM_cell` with zeros.
- `outputs` is a list of $T_y$ arrays, each of shape `(m, len(vocab))`, obtained by swapping the first two axes of `Yoh`.
    - `outputs[t][i]` is the one-hot true label of the $t^{th}$ character of the $i^{th}$ training example (`X[i]`).

In [None]:
s0 = np.zeros((m, n_s))
c0 = np.zeros((m, n_s))
outputs = list(Yoh.swapaxes(0,1))
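The model has $T_y$ output heads, one per character position, and Keras expects one target array per head. Swapping the first two axes of `Yoh` (shape `(m, T, len(vocab))`) and splitting along the new first axis gives exactly that. A toy check with random one-hot data; the sizes are arbitrary.

```python
import numpy as np

m, T, V = 4, 6, 36  # toy sizes: examples, sequence length, vocabulary size
rng = np.random.default_rng(0)
Y = rng.integers(0, V, size=(m, T))  # integer targets, like Y
Yoh = np.eye(V)[Y]                   # one-hot targets, shape (m, T, V)

outputs = list(Yoh.swapaxes(0, 1))   # T arrays of shape (m, V)
print(len(outputs), outputs[0].shape)  # 6 (4, 36)

# outputs[t][i] is the one-hot label of character t of example i
t, i = 2, 1
print(outputs[t][i].argmax() == Y[i, t])  # True
```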

Let's now train the model.

In [None]:
model.fit([Xoh, s0, c0], outputs, epochs=100, batch_size=100)


We can now see the results on new examples.

In [None]:
EXAMPLES = ['فرنگی', 'آذار', 'کتب دینی', 'نوسابه', 'آذادگی', 'آغار']
# Initial post-attention LSTM states for a single example
s00 = np.zeros((1, n_s))
c00 = np.zeros((1, n_s))
for example in EXAMPLES:
    # Encode the query as T integer indices, then one-hot encode: shape (T, len(vocab))
    source = string_to_int(example, T, vocab)
    source = to_categorical(source, num_classes=len(vocab))
    # Add a batch dimension: shape (1, T, len(vocab))
    source = np.expand_dims(source, axis=0)

    # The model returns a list of T arrays, each of shape (1, len(vocab))
    prediction = model.predict([source, s00, c00])
    prediction = np.argmax(prediction, axis=-1).ravel()
    output = [inv_vocab[int(i)] for i in prediction if inv_vocab[int(i)] != '<pad>']
    print("source:", example)
    print("output:", ''.join(output),"\n")

source: فرنگی
output: فرنگی 

source: آذار
output: آرار 

source: کتب دینی
output: گتیییی 

source: نوسابه
output: نوساهه 

source: آذادگی
output: ددادیی 

source: آغار
output: آرار 

