In [0]:
import tensorflow as tf
tf.compat.v1.enable_eager_execution()
from tensorflow.keras.layers import TimeDistributed
tf.keras.backend.clear_session()
from tensorflow.keras.layers import Input, Softmax, RNN, Dense, Embedding, LSTM
from tensorflow.keras.models import Model
import numpy as np



In [0]:
tf.executing_eagerly()

True

# 1. Writing a custom layer

Before we write custom layers in TensorFlow, let's look at the definition of the <b>Layer</b> class.

<a href='https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer'>tf.keras.layers.Layer</a>

From the TensorFlow documentation:
<pre>
This is the class from which all layers inherit.

A layer is a class implementing common neural networks operations, such as convolution, batch norm, etc. These operations require managing weights, losses, updates, and inter-layer connectivity.

Users will just instantiate a layer and then treat it as a callable.

We recommend that descendants of Layer implement the following methods:

+-------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   |
|<strong> <font color='green'>def __init__(self, trainable=True, name=None, dtype=None, dynamic=False, **kwargs):</font></strong>                               |
+-------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   |
|* the properties should be set by the user via keyword arguments.                                                  |
|                                                                                                                   |
|* note that 'dtype', 'input_shape' and 'batch_input_shape' are only applicable to input layers, do not pass these  |
|  keywords to non-input layers.                                                                                    |
+-------------------------------------------------------------------------------------------------------------------+
|* allowed_kwargs = {'input_shape', 'batch_input_shape', 'batch_size', 'weights', 'activity_regularizer','autocast'}|
+-------------------------------------------------------------------------------------------------------------------+


+-------------------------------------------------------------------------------------------------------------------+
|<strong> <font color='green'>def build(self, input_shape)</font></strong>:                                                                                     |
+-------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   |
| * Creates the variables of the layer (optional, for subclass implementers).                                       |
|                                                                                                                   |
| * Implementers of subclasses of `Layer` or `Model` can override this method if they need a state-creation step    |
|   in-between <em><font color='blue'>layer instantiation</font></em> and <em><font color='blue'>layer call</font></em>.                                                                  |
|                                                                                                                   |
| * This is typically used to create the weights of `Layer` subclasses.                                             |
+-------------------------------------------------------------------------------------------------------------------+
| Arguments:                                                                                                        |
|    input_shape:                                                                                                   |
|    Instance of `TensorShape`, or list of instances of `TensorShape` if the layer expects a list of inputs         |
+-------------------------------------------------------------------------------------------------------------------+

+-------------------------------------------------------------------------------------------------------------------+
| <strong> <font color='green'>def call(self, inputs, **kwargs)</font></strong>:                                                                                |
+-------------------------------------------------------------------------------------------------------------------+
| * This is where the layer's logic lives.                                                                          |
+-------------------------------------------------------------------------------------------------------------------+
|* Arguments:                                                                                                       |
|        inputs: Input tensor, or list/tuple of input tensors.                                                      |
|        **kwargs: Additional keyword arguments.                                                                    |
+-------------------------------------------------------------------------------------------------------------------+
|* Returns:                                                                                                         |
|        A tensor or list/tuple of tensors.                                                                         |
+-------------------------------------------------------------------------------------------------------------------+
    
<a href='https://github.com/tensorflow/tensorflow/blob/r2.1/tensorflow/python/keras/engine/base_layer.py#L310'>check for more arguments</a>                               
+-------------------------------------------------------------------------------------------------------------------+
|<strong> <font color='green'>def add_weight(self,name=None, shape=None, ..., **kwargs)</font></strong>:                                                        |
+-------------------------------------------------------------------------------------------------------------------+
|* Adds a new variable to the layer.                                                                                |
+-------------------------------------------------------------------------------------------------------------------+
|* Arguments:                                                                                                       |
|        name : Variable name.                                                                                      |
|        shape: Variable shape. Defaults to scalar if unspecified.                                                  |
|        dtype: The type of the variable. Defaults to `self.dtype` or `float32`.                                    |
|        ...                                                                                                        |
+-------------------------------------------------------------------------------------------------------------------+
|* Returns:                                                                                                         |
|        The created variable. Usually either a `Variable` or `ResourceVariable` instance.                          |
+-------------------------------------------------------------------------------------------------------------------+
...
Other methods are also available; see the source linked below for a better understanding of them:
<a href='https://github.com/tensorflow/tensorflow/blob/r2.1/tensorflow/python/keras/engine/base_layer.py'>base_layer.py</a>

</pre>

## 1.1 Example
super(): https://stackoverflow.com/a/27134600/4084039
<img src='https://i.imgur.com/1a8N7gH.png' width=600>
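
To make the `__init__` / `build` / `call` / `add_weight` methods described above concrete, here is a minimal sketch of a dense layer with a bias. The class name `Linear`, the `units` argument and the initializers are illustrative choices of ours, not something taken from the documentation quoted above, and the snippet assumes TensorFlow 2.x running eagerly.

```python
import tensorflow as tf


class Linear(tf.keras.layers.Layer):
    """Hypothetical example layer computing y = x @ w + b."""

    def __init__(self, units, **kwargs):
        # Pass name/dtype/etc. on to the base Layer class
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Called automatically on the first call, once the input shape is known
        in_dim = int(input_shape[-1])
        self.w = self.add_weight(name="w", shape=(in_dim, self.units),
                                 initializer="glorot_uniform", trainable=True)
        self.b = self.add_weight(name="b", shape=(self.units,),
                                 initializer="zeros", trainable=True)

    def call(self, inputs):
        # The layer's logic lives here
        return tf.matmul(inputs, self.w) + self.b


layer = Linear(4)
print("built before first call:", layer.built)   # weights do not exist yet
outputs = layer(tf.ones((2, 3)))                 # triggers build(), then call()
print("built after first call:", layer.built)
print("output shape:", outputs.shape)
print("number of trainable weights:", len(layer.trainable_weights))
```

Creating the weights in `build` rather than in `__init__` means the layer does not need to know its input size up front; it is inferred from the first input it sees.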

## 1.2 Resources
Do read this guide for more information: https://www.tensorflow.org/guide/keras/custom_layers_and_models
A few screenshots from the above guide:

1.
<img src='https://i.imgur.com/SDNQgos.png' width=600>
2.
<img src='https://i.imgur.com/syqjpux.png' width=600>
3. 
<img src='https://i.imgur.com/PfmYWno.png' width=600>

# 2. Writing a custom Model

There are three ways to implement a model architecture in TensorFlow (the Sequential API, the Functional API, and model subclassing):
<img src='https://i.imgur.com/n7DBcoo.png' width=400>
The third and final method to implement a model architecture using Keras and TensorFlow 2.0 is called model subclassing.

Inside of tf.keras the `Model` class is the root class used to define a model architecture. Since tf.keras utilizes object-oriented programming, we can actually `subclass` the Model class and then insert our architecture definition.

<pre>
    The `Model` class has the same API as `Layer`, with the following differences:
        It exposes built-in training, evaluation, and prediction loops (model.fit(), model.evaluate(), model.predict()).
        It exposes the list of its inner layers, via the `model.layers` property.
        It exposes saving and serialization APIs.
    
    <font color='blue'>Effectively, the "Layer" class corresponds to what we refer to in the literature as a "layer" (as in "convolution layer" or "recurrent layer") or as a "block" (as in "ResNet block" or "Inception block").

    Meanwhile, the "Model" class corresponds to what is referred to in the literature as a "model" (as in "deep learning model") or as a "network" (as in "deep neural network").
    </font>
</pre>
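
As a rough sketch of the three approaches mentioned above, the same small network can be written with the Sequential API, the Functional API, and by subclassing `Model`. The layer sizes, the variable names and the class name `SubclassedModel` are arbitrary choices for illustration; the snippet assumes TensorFlow 2.x.

```python
import tensorflow as tf

# 1) Sequential API: a plain stack of layers
sequential_model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# 2) Functional API: layers are called on symbolic tensors
inputs = tf.keras.Input(shape=(4,))
hidden = tf.keras.layers.Dense(8, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(3, activation="softmax")(hidden)
functional_model = tf.keras.Model(inputs=inputs, outputs=outputs)


# 3) Model subclassing: layers are created in __init__ and wired together in call()
class SubclassedModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense1 = tf.keras.layers.Dense(8, activation="relu")
        self.dense2 = tf.keras.layers.Dense(3, activation="softmax")

    def call(self, inputs):
        return self.dense2(self.dense1(inputs))


subclassed_model = SubclassedModel()

x = tf.zeros((2, 4))
shapes = [tuple(m(x).shape) for m in (sequential_model, functional_model, subclassed_model)]
print(shapes)
```

All three produce an output of the same shape; subclassing is the most flexible because `call` can contain arbitrary Python logic.
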
## 2.1 Example

In [0]:
class MyDenseLayer(tf.keras.layers.Layer):
    def __init__(self, num_outputs, **kwargs):
        super().__init__(**kwargs) #https://stackoverflow.com/a/27134600/4084039
        self.num_outputs = num_outputs
        
    def build(self, input_shape):
        self.kernel = self.add_weight(name="kernel", shape=[int(input_shape[-1]), self.num_outputs])
        
    def call(self, inputs):
        # show the shapes of the two matrices being multiplied
        print(inputs.shape, self.kernel.shape)
        return tf.matmul(inputs, self.kernel)


class MyModel(Model):
    def __init__(self, num_inputs, num_outputs, rnn_units):
        super().__init__() # https://stackoverflow.com/a/27134600/4084039
        self.dense = MyDenseLayer(num_outputs, name='myDenseLayer') 
        # Here the recurrent layer is built from an LSTMCell wrapped in the generic RNN layer,
        # which gives the same functionality as the LSTM layer
        self.lstmcell = tf.keras.layers.LSTMCell(rnn_units)
        self.rnn = RNN(self.lstmcell)
        self.softmax = Softmax()
        
    def call(self, inputs):
        output = self.rnn(inputs)
        output = self.dense(output)
        output = self.softmax(output)
        return output

# Dummy data: 10 sequences, each with 5 time steps of 5 features, and 10 two-class targets
data = np.zeros([10,5,5])
y = np.zeros([10,2])

model  = MyModel(num_inputs=5, num_outputs=2, rnn_units=32)

loss_object = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam()

model.compile(optimizer=optimizer,loss=loss_object)
model.fit(data,y, steps_per_epoch=1)

model.summary()

(?, 32) (32, 2)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
myDenseLayer (MyDenseLayer)  multiple                  64        
_________________________________________________________________
lstm_cell (LSTMCell)         multiple                  4864      
_________________________________________________________________
rnn (RNN)                    multiple                  4864      
_________________________________________________________________
softmax (Softmax)            multiple                  0         
Total params: 4,928
Trainable params: 4,928
Non-trainable params: 0
_________________________________________________________________


# 3. Encoder-Decoder Architecture

In [0]:
class Encoder(tf.keras.layers.Layer):
    def __init__(self, vocab_size, embedding_dim, input_length, enc_units):
        super().__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.input_length = input_length
        self.enc_units= enc_units
        self.lstm_output = 0
        self.state_h=0
        self.state_c=0
        
    def build(self, input_shape):
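        # NOTE: output_dim is hard-coded to 50, so the embedding_dim argument stored above is not used here
        self.embedding = Embedding(input_dim=self.vocab_size, output_dim=50, input_length=self.input_length,
                           mask_zero=True, name="embedding_layer_encoder")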
        self.lstm = LSTM(self.enc_units, return_state=True, return_sequences=True, name="Encoder_LSTM")
        
    def call(self, input_sentances, training=True):
        print("ENCODER ==> INPUT SQUENCES SHAPE :",input_sentances.shape)
        input_embedd                           = self.embedding(input_sentances)
        print("ENCODER ==> AFTER EMBEDDING THE INPUT SHAPE :",input_embedd.shape)
        # Store the final states on the same attributes that get_states() returns
        self.lstm_output, self.state_h, self.state_c = self.lstm(input_embedd)
        return self.lstm_output, self.state_h, self.state_c

    def get_states(self):
        return self.state_h,self.state_c
    
    
class Decoder(tf.keras.layers.Layer):
    def __init__(self, vocab_size, embedding_dim, input_length, dec_units):
        super().__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.dec_units = dec_units
        self.input_length = input_length

    def build(self, input_shape):
        # NOTE: output_dim is hard-coded to 50; the embedding_dim argument is not used
        self.embedding = Embedding(input_dim=self.vocab_size, output_dim=50, input_length=self.input_length,
                           mask_zero=True, name="embedding_layer_decoder")
        self.lstm = LSTM(self.dec_units, return_sequences=True, return_state=True, name="Decoder_LSTM")
        
    def call(self, target_sentances, state_h, state_c):
        print("DECODER ==> INPUT SQUENCES SHAPE :",target_sentances.shape)
        target_embedd           = self.embedding(target_sentances)
        print("WE ARE INITIALIZING DECODER WITH ENCODER STATES :",state_h.shape, state_c.shape)
        lstm_output, _,_        = self.lstm(target_embedd, initial_state=[state_h, state_c])
        return lstm_output
    

class MyModel(Model):
    def __init__(self, encoder_inputs_length,decoder_inputs_length, output_vocab_size):
        super().__init__() # https://stackoverflow.com/a/27134600/4084039
        self.encoder = Encoder(vocab_size=500, embedding_dim=300, input_length=encoder_inputs_length, enc_units=64)
        self.decoder = Decoder(vocab_size=500, embedding_dim=300, input_length=decoder_inputs_length, dec_units=64)
        self.dense   = Dense(output_vocab_size, activation='softmax')
        
        
    def call(self, data):
        input,output = data[0], data[1]
        print("="*20, "ENCODER", "="*20)
        encoder_output, encoder_h, encoder_c = self.encoder(input)
        print("-"*27)
        print("ENCODER ==> OUTPUT SHAPE",encoder_output.shape)
        print("ENCODER ==> HIDDEN STATE SHAPE",encoder_h.shape)
        print("ENCODER ==> CELL STATE SHAPE", encoder_c.shape)
        print("="*20, "DECODER", "="*20)
        decoder_output                       = self.decoder(output, encoder_h, encoder_c)
        output                               = self.dense(decoder_output)
        print("-"*27)
        print("FINAL OUTPUT SHAPE",output.shape)
        print("="*50)
        return output

In [0]:
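ENCODER_SEQ_LEN = 30
DECODER_SEQ_LEN = 20

# pass the actual sequence lengths so that they match the data created below
model  = MyModel(encoder_inputs_length=ENCODER_SEQ_LEN, decoder_inputs_length=DECODER_SEQ_LEN, output_vocab_size=500)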

input = np.random.randint(0, 499, size=(2000, ENCODER_SEQ_LEN))
output = np.random.randint(0, 499, size=(2000, DECODER_SEQ_LEN))
target = tf.keras.utils.to_categorical(output, 500)

# loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False, reduction='none')  # the final Dense layer already applies softmax
optimizer = tf.keras.optimizers.Adam()

model.compile(optimizer=optimizer,loss='sparse_categorical_crossentropy')

model.fit([input, output], output, steps_per_epoch=1)

"""
or you can try this

model.compile(optimizer=optimizer,loss='categorical_crossentropy')
model.fit([input, output], target, steps_per_epoch=1)

"""
model.summary()

ENCODER ==> INPUT SQUENCES SHAPE : (?, 30)
ENCODER ==> AFTER EMBEDDING THE INPUT SHAPE : (?, 30, 50)
---------------------------
ENCODER ==> OUTPUT SHAPE (?, 30, 64)
ENCODER ==> HIDDEN STATE SHAPE (?, 64)
ENCODER ==> CELL STATE SHAPE (?, 64)
DECODER ==> INPUT SQUENCES SHAPE : (?, 20)
WE ARE INITIALIZING DECODER WITH ENCODER STATES : (?, 64) (?, 64)
---------------------------
FINAL OUTPUT SHAPE (?, 20, 500)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
encoder (Encoder)            multiple                  54440     
_________________________________________________________________
decoder (Decoder)            multiple                  54440     
_________________________________________________________________
dense (Dense)                multiple                  32500     
Total params: 141,380
Trainable params: 141,380
Non-trainable params: 0
_________________________________________________________________


# 4. Sequence to Sequence Model

## 4.1 Training time

<h3 id="vanilla-seq2seq">Vanilla Seq2Seq</h3>

<p>The Seq2Seq framework relies on the <strong>encoder-decoder</strong> paradigm. The <strong>encoder</strong> <em>encodes</em> the input sequence, while the <strong>decoder</strong> <em>produces</em> the target sequence.</p>

<p><strong>Encoder</strong></p>

<p>Our input sequence is <code class="highlighter-rouge">how are you</code>. Each word from the input sequence is associated with a vector $ w \in \mathbb{R}^d $ (via a lookup table). In our case, we have 3 words, thus our input will be transformed into $ [w_0, w_1, w_2] \in \mathbb{R}^{d \times 3} $. Then, we simply run an LSTM over this sequence of vectors and store the last hidden state output by the LSTM: this will be our encoder representation $ e $. Let’s write the hidden states $ [e_0, e_1, e_2] $ (and thus $ e = e_2 $).</p>

<table class="center-image" style="max-width: 60%">
<tr>
<td><img src="https://i.imgur.com/nToLTs2.png" alt="Vanilla Encoder" /></td>
</tr>
<caption align="bottom"><div class="text-center">Vanilla Encoder</div></caption>
</table>
<p></p>

<p><strong>Decoder</strong></p>

<p>Now that we have a vector $ e $ that captures the meaning of the input sequence, we’ll use it to generate the target sequence word by word. We feed another LSTM cell with $ e $ as its hidden state and a special <em>start of sentence</em> vector $ w_{sos} $ as its input. The LSTM computes the next hidden state $ h_0 \in \mathbb{R}^h $. Then, we apply some function $ g : \mathbb{R}^h \to \mathbb{R}^V $ so that $ s_0 := g(h_0) \in \mathbb{R}^V $ is a vector of the same size as the vocabulary.</p>

$$
\begin{align*}
h_0 &= \operatorname{LSTM}\left(e, w_{sos} \right)\\
s_0 &= g(h_0)\\
p_0 &= \operatorname{softmax}(s_0)\\
i_0 &= \operatorname{argmax}(p_0)
\end{align*} $$

<p>Then, apply a softmax to $ s_0 $ to normalize it into a vector of probabilities $ p_0 \in \mathbb{R}^V $. Now, each entry of $ p_0 $ measures how likely each word in the vocabulary is. Let’s say that the word <em>“comment”</em> has the highest probability (and thus $ i_0 = \operatorname{argmax}(p_0) $ corresponds to the index of <em>“comment”</em>). Get the corresponding vector $ w_{i_0} = w_{comment} $ and repeat the procedure: the LSTM will take $ h_0 $ as hidden state and $ w_{comment} $ as input and will output a probability vector $ p_1 $ over the second word, etc.</p>

$$
\begin{align*}
h_1 &= \operatorname{LSTM}\left(h_0, w_{i_0} \right)\\
s_1 &= g(h_1)\\
p_1 &= \operatorname{softmax}(s_1)\\
i_1 &= \operatorname{argmax}(p_1)
\end{align*} $$

<p>The decoding stops when the predicted word is a special <em>end of sentence</em> token.</p>
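
The decoding recursion above can be sketched in plain NumPy. This is only an illustration under loud assumptions: the recurrent step is a simple tanh cell standing in for the LSTM, all weights are random, and the token ids `SOS`, `EOS` and the cap `MAX_LEN` are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, h = 10, 4, 6              # vocabulary size, embedding size, hidden size (toy values)
SOS, EOS, MAX_LEN = 1, 2, 8     # hypothetical special token ids and a length cap

W_emb = rng.normal(size=(V, d))   # lookup table: one vector w_i per word
W_x = rng.normal(size=(d, h))     # input-to-hidden weights of the toy cell
W_h = rng.normal(size=(h, h))     # hidden-to-hidden weights of the toy cell
W_g = rng.normal(size=(h, V))     # g: R^h -> R^V, here a linear map


def softmax(s):
    z = np.exp(s - s.max())       # subtract the max for numerical stability
    return z / z.sum()


def step(h_prev, w_in):
    """Stand-in for LSTM(h_prev, w_in): a plain tanh recurrent cell (assumption)."""
    return np.tanh(w_in @ W_x + h_prev @ W_h)


def greedy_decode(e):
    h_t, token, decoded = e, SOS, []
    for _ in range(MAX_LEN):
        h_t = step(h_t, W_emb[token])   # h_t = LSTM(h_{t-1}, w_{i_{t-1}})
        p_t = softmax(h_t @ W_g)        # p_t = softmax(g(h_t))
        token = int(np.argmax(p_t))     # i_t = argmax(p_t)
        if token == EOS:                # stop at the end-of-sentence token
            break
        decoded.append(token)
    return decoded


e = rng.normal(size=h)          # pretend this is the encoder representation
decoded = greedy_decode(e)
print(decoded)
```

Note that the argmax is taken over the vocabulary-sized vector $ p_t $, not over the hidden state itself.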

<table class="center-image" style="max-width: 60%">
<tr>
<td><img src="https://i.imgur.com/WEPJChD.png" alt="Vanilla Decoder" /></td>
</tr>
<caption align="bottom"><div class="text-center">Vanilla Decoder</div></caption>
</table>
<p></p>

<blockquote>
  <p>Intuitively, the hidden vector represents the “amount of meaning” that has not been decoded yet.</p>
</blockquote>

<p>The above method aims at modelling the distribution of the next word conditioned on the beginning of the sentence</p>

$$\mathbb{P}\left[ y_{t+1} | y_1, \dots, y_{t}, x_0, \dots, x_n \right]$$

<p>by writing</p>

$$\mathbb{P}\left[ y_{t+1} | y_t, h_{t}, e \right]$$

> In simple vanilla seq2seq models, we pass the hidden and cell states of the last time step to the decoder. Alternatively, we can apply average pooling or max pooling over all the hidden states of the encoder and pass the result as the input to the decoder.
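
A small NumPy sketch of the three options mentioned in the note (last state, average pooling, max pooling). The shapes, the random values and the padding mask are made up for illustration; in a real model `enc_outputs` would be the encoder LSTM output with `return_sequences=True`.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, units = 2, 5, 4
enc_outputs = rng.normal(size=(batch, steps, units))    # all encoder hidden states
mask = np.array([[1, 1, 1, 0, 0],                       # first sequence has 2 padded steps
                 [1, 1, 1, 1, 1]], dtype=bool)

# Option 1: hidden state of the last non-padded time step
last_idx = mask.sum(axis=1) - 1
last_state = enc_outputs[np.arange(batch), last_idx]

# Option 2: average pooling over the non-padded time steps
m = mask[:, :, None]
avg_state = (enc_outputs * m).sum(axis=1) / m.sum(axis=1)

# Option 3: max pooling over the non-padded time steps
max_state = np.where(m, enc_outputs, -np.inf).max(axis=1)

print(last_state.shape, avg_state.shape, max_state.shape)
```

Each result has shape (batch, units), so any of them can be used as the decoder's initial hidden state.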

## 4.2 Inference

In [0]:
print("=" * 30, "Inference", "=" * 30)
enc_output, enc_state_h, enc_state_c = model.layers[0](np.expand_dims(input[0], 0))
states_values = [enc_state_h, enc_state_c]
pred = []
cur_vec = np.zeros((1, 1))
print('-'*20,"started predition","-"*20)
print("at time step 0 the word is 0")
for i in range(DECODER_SEQ_LEN):
    cur_emb = model.layers[1].embedding(cur_vec)
    [infe_output, state_h, state_c] = model.layers[1].lstm(cur_emb, initial_state=states_values)
    states_values = [state_h, state_c]
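    # NOTE: infe_output is the raw decoder LSTM output (size dec_units = 64), not a distribution over the
    # vocabulary; to predict an actual word index, pass it through the final Dense layer (model.layers[2]) first.
    # np.argmax(infe_output) is a single value; to feed it to the embedding layer at the next
    # time step, we reshape it to shape (1, 1)
    cur_vec = np.reshape(np.argmax(infe_output), (1, 1))
    # NOTE: the step number in this message is hard-coded to 0; use i + 1 to print the actual time step
    print("at time step 0 the word is ", cur_vec)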
    pred.append(cur_vec)

ENCODER ==> INPUT SQUENCES SHAPE : (1, 30)
ENCODER ==> AFTER EMBEDDING THE INPUT SHAPE : (1, 30, 50)
-------------------- started predition --------------------
at time step 0 the word is 0
at time step 0 the word is  [[41]]
at time step 0 the word is  [[43]]
at time step 0 the word is  [[58]]
at time step 0 the word is  [[58]]
at time step 0 the word is  [[0]]
at time step 0 the word is  [[60]]
at time step 0 the word is  [[60]]
at time step 0 the word is  [[60]]
at time step 0 the word is  [[60]]
at time step 0 the word is  [[53]]
at time step 0 the word is  [[21]]
at time step 0 the word is  [[27]]
at time step 0 the word is  [[62]]
at time step 0 the word is  [[62]]
at time step 0 the word is  [[62]]
at time step 0 the word is  [[41]]
at time step 0 the word is  [[8]]
at time step 0 the word is  [[36]]
at time step 0 the word is  [[16]]
at time step 0 the word is  [[52]]


# 5. Attention Mechanism

In [0]:
from IPython.display import IFrame
IFrame("https://arxiv.org/pdf/1409.0473.pdf", width=800, height=500)

In [0]:
from IPython.display import IFrame
IFrame("https://arxiv.org/pdf/1508.04025.pdf", width=800, height=500)

## 5.1 Attention mechanism explained

<img src='https://i.imgur.com/OgxuAaM.jpg' width="100%">

<img src='https://i.imgur.com/7VhsNER.jpg' width="100%">

<img src='https://i.imgur.com/FIUe33r.jpg' width="100%">

<img src='https://i.imgur.com/aAGSfN1.jpg' width="100%">

<img src='https://i.imgur.com/oEyRXSB.jpg' width="100%">
credit: <a href ='https://guillaumegenthial.github.io/assets/img2latex/seq2seq_attention_mechanism_new.svg'>https://guillaumegenthial.github.io</a>

<ol>
<li>
<ul>
<li><img src='https://i.imgur.com/wX7RtF8.jpg' width="100%"></li>
<li><img src='https://i.imgur.com/Vh01Sf3.jpg' width="100%"></li>
<li><img src='https://i.imgur.com/JWOD8JM.jpg' width="100%"></li>
</ul>
</li>

<li>
<ul>
<li><img src='https://i.imgur.com/A17EPNJ.jpg' width="100%"></li>
</ul>
</li>

<li>
<ul>
<li><img src='https://i.imgur.com/S08e16r.jpg' width="100%"></li>
</ul>
</li>

<li>
<ul>
<li><img src='https://i.imgur.com/1eB1Mrl.jpg' width="100%"></li>
</ul>
</li>

<li>
<ul>
<li><img src='https://i.imgur.com/3f7graC.jpg' width="100%"></li>
</ul>
</li>

</ol>
<img src='https://i.imgur.com/WyKmkOF.jpg' width="100%" >


## 5.2 Inference and Plotting Attention weights

### 5.2.1 Inference

<ul>
<li>The decoder step described above did not return att_weights; we modified the onestep_decoder so that it also returns att_weights.</li>
<li>For every time step in the decoder we get attention weights of size (number of encoder time steps).</li>
<li>If the ith weight is the maximum in att_weights, then the ith time step (word) in the encoder contributes the most to translating the current decoder word.</li>
</ul>
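
As a minimal sketch of where such weights come from, the snippet below computes attention weights for one decoder step with a dot-product score, which is one of the scoring functions in the second paper linked above. The random values and the variable names are assumptions; the shapes (30 encoder time steps, 16 units) mirror the output shown below, and the attention layer actually used in the model may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
enc_steps, units = 30, 16
enc_outputs = rng.normal(size=(enc_steps, units))   # one hidden state per encoder time step
dec_state = rng.normal(size=(units,))               # current decoder hidden state

scores = enc_outputs @ dec_state                    # one score per encoder time step
att_weights = np.exp(scores - scores.max())         # softmax over the encoder time steps
att_weights /= att_weights.sum()
context = att_weights @ enc_outputs                 # weighted sum of the encoder states

print(att_weights.shape, context.shape)
print("most attended encoder time step:", int(np.argmax(att_weights)))
```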

<img src='https://i.imgur.com/9byegcX.png'>

<pre>
Encoder output shape : (1, 30, 16)
Encoder state shape : (1, 16)
-------------------- started predition --------------------
at time step 0 the word is  [[0.]]
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
at time step 0 the word is  [[60]] (30,)
</pre>
Note: the data we created is completely random, and these are the results after training the model for only one epoch.

### 5.2.2 plotting attention

1. As the weights above show, each decoder time step yields a 30-dimensional vector.
2. So you can create a matrix A with dimensions (decoder time steps × encoder time steps).
3. Since both encoder and decoder inputs are padded, you may need to remove the rows and columns that correspond to padding tokens. Your matrix A should therefore have dimensions (decoder tokens × encoder tokens), excluding padding tokens.
4. You can use seaborn to plot the matrix.
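
The steps above can be sketched as follows. The helper names `trim_attention` and `plot_attention`, the toy token ids and the random weights are all made up for illustration, and padding is assumed to use id 0 (as with `mask_zero=True` earlier). The plotting helper imports matplotlib and seaborn lazily, so the matrix part runs with NumPy alone.

```python
import numpy as np


def trim_attention(att_matrix, enc_tokens, dec_tokens, pad_id=0):
    """Drop the rows (decoder steps) and columns (encoder steps) that belong to padding."""
    enc_keep = np.asarray(enc_tokens) != pad_id
    dec_keep = np.asarray(dec_tokens) != pad_id
    return np.asarray(att_matrix)[dec_keep][:, enc_keep]


def plot_attention(att_matrix, enc_labels, dec_labels):
    """Draw the attention matrix as a heatmap (needs matplotlib and seaborn)."""
    import matplotlib.pyplot as plt
    import seaborn as sns
    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(att_matrix, xticklabels=enc_labels, yticklabels=dec_labels, cmap="viridis", ax=ax)
    ax.set_xlabel("encoder tokens")
    ax.set_ylabel("decoder tokens")
    return ax


rng = np.random.default_rng(0)
enc_tokens = [5, 8, 3, 0, 0]        # toy encoder input, 0 = padding
dec_tokens = [7, 2, 9, 4, 0, 0]     # toy decoder input, 0 = padding

# one attention vector per decoder time step, stacked into a matrix A
per_step_weights = [rng.random(len(enc_tokens)) for _ in dec_tokens]
A = np.stack(per_step_weights)
A_trimmed = trim_attention(A, enc_tokens, dec_tokens)
print(A.shape, "->", A_trimmed.shape)
```

Calling `plot_attention(A_trimmed, enc_tokens[:3], dec_tokens[:4])` would then draw the heatmap, with decoder tokens on the rows and encoder tokens on the columns.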

# 6. Metric

The BLEU score is a string-matching algorithm that provides basic quality metrics for MT researchers and developers.

To conduct a BLEU measurement the following is necessary:

1. One or more human reference translations. This should be data that has not been used in building the system (training data) and ideally should be unknown to the MT system developer.
2. It is generally recommended that 1,000 or more sentences be used to get a meaningful measurement. Too small a sample set can sway the score significantly with just a few sentences that match or do not match well.
3. Automated translation output of the exact same source data set.
4. A measurement utility that performs the comparison and score calculation, e.g. `import nltk.translate.bleu_score as bleu`.

The BLEU metric scores a translation on a scale of 0 to 1, in an attempt to measure the adequacy and fluency of the MT output. The closer to 1 the test sentences score, the more overlap there is with their human reference translations and thus, the better the system is deemed to be. BLEU scores are often stated on a scale of 0 to 100 to simplify communication, but this should not be confused with a percentage of accuracy.
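
To show what the score actually measures, here is a simplified pure-Python sketch of sentence-level BLEU: clipped n-gram precisions combined with a geometric mean and a brevity penalty. The function names are ours, there is no smoothing, and this is an illustration of the idea rather than a replacement for the nltk implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def simple_bleu(references, candidate, max_n=4):
    """Unsmoothed BLEU with uniform weights over 1..max_n grams (illustrative sketch)."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        # for each n-gram, the maximum count seen in any single reference
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in Counter(ngrams(ref, n)).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        # clip candidate counts by the reference counts
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0                      # any zero precision makes the geometric mean zero
        log_precision_sum += math.log(clipped / total) / max_n
    c = len(candidate)
    # reference length closest to the candidate length
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()
print(simple_bleu([reference], candidate, max_n=2))   # about 0.707
print(simple_bleu([reference], candidate, max_n=4))   # 0.0, no 4-gram matches
```

With `max_n=2` the unigram precision is 5/6 and the bigram precision is 3/5, so the score is sqrt(5/6 × 3/5) ≈ 0.707. With `max_n=4` there is no matching 4-gram, so the unsmoothed score drops to 0, which is why very short or poorly matching sentences often score 0.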


In [0]:
import warnings
warnings.filterwarnings('ignore')

import nltk.translate.bleu_score as bleu

reference = ['i am groot'.split(),] # the reference (original) translation

translation = 'it is ship'.split() # translated by the model
print('BLEU score: {}'.format(bleu.sentence_bleu(reference, translation)))

BLEU score: 0
