# Masnavi mANNavi

This tutorial demonstrates how to generate text using a character-based RNN. We will work with a dataset of Rumi's Masnavi Manavi. Given a sequence of characters from this data, we train a model to predict the next character in the sequence. Longer sequences of text can be generated by calling the model repeatedly.

This tutorial includes runnable code implemented using [tf.keras](https://www.tensorflow.org/programmers_guide/keras) and [eager execution](https://www.tensorflow.org/programmers_guide/eager). The following is sample output from the model in this tutorial after it was trained for 30 epochs and started with the word "به":

<pre>

به درد و سرد
هم مخنث را نهان پيدا شود	سنگ و كوه و ماهشان بد پيش تو
اين كرامت يافت گردش دم به دست	آتشى خواهين فقير شاه مرد
مست آن كه خوش شدند آن كردگار	بى‏اسب زين سايه در جست او قما
چشم داند جست و باطن زير چاد	كان به چاهى مى‏كنى هم از خدا
خويشى و ديوار تن از دست تست	پس ندانستيم اندر مشرق است‏
صبر كن با اين دو معنيهاى نور	كه رود جز بس عدو ضرير
و آن پيمبر گفت او را حصرت زدند	هيچ گشتيم از دروغ آن جا روى‏
روز روشن گردد آن در دود تو	سوى موسى زيركى رنجور رفت‏
گفت دارم در گذار اين اعتقان	آمديم اندر هلا در نوع دان‏
هم تو بر با هم از آن آموخته است	فايم آن فرحود خود را از كل‏
ايمن آب بود آخر زمان	كه دمى سازن مرا پيش كشيد
كيست ما زين افتضاط فقل را	جفت در خشم آمد و رفته توار
نه به دست آمد كه اى من سرخ گشت	گنج نوع و خواب را ياريك شو
كى كنند آن پير انسبان مان خوى رو سوى خانه‏ى زردها را بر تنت	با سرشتن را بود هم در پذير	كه نباشد طبع شير و جهل شير
پا خلوص است از ناودان فر من خوشند يك ذوت آب از جبيله‏ى قوتى است	عاقلى بر صد معرف اين ستوخ‏
اين ندارد جان مادر مى‏فتد	چون نهاد او شير مردانه بجس
</pre>

While some of the sentences are grammatical, most do not make sense. The model has not learned the meaning of words, but consider:

* The model is character-based. When training started, the model did not know how to spell a word, or that words were even a unit of text.

* As demonstrated below, the model is trained on small batches of text (100 characters each), and is still able to generate a longer sequence of text with coherent structure.

## Setup

### Import TensorFlow and other libraries

In [1]:
from __future__ import absolute_import, division, print_function, unicode_literals

!pip install tensorflow-gpu==2.0.0-alpha0
import tensorflow as tf

import numpy as np
import os
import time


### Load Masnavi dataset

The Masnavi is a series of six books of poetry that together amount to around 25,000 verses, or 50,000 lines. The books folder contains each book separately in doc format, organized and with extra credits. There is also a `Masnavi.txt` file that contains the plain poems of all six books, one after another.

In [0]:
path_to_file = '/Books/Masnavi.txt'

### Read the data

First, look at the text:

In [5]:
# Read, then decode for py2 compat.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print ('Length of text: {} characters'.format(len(text)))

Length of text: 1530858 characters


In [6]:
# Take a look at the first 225 characters in text
print(text[:225])

بشنو از نى چون حكايت مى‏كند	از جدايى‏ها شكايت مى‏كند
كز نيستان تا مرا ببريده‏اند	در نفيرم مرد و زن ناليده‏اند
سينه خواهم شرحه شرحه از فراق	تا بگويم شرح درد اشتياق‏
هر كسى كاو دور ماند از اصل خويش	باز جويد روزگار وصل خويش‏


In [7]:
# The unique characters in the file
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))

59 unique characters


## Process the text

### Vectorize the text

Before training, we need to map strings to a numerical representation. Create two lookup tables: one mapping characters to numbers, and another for numbers to characters.

In [0]:
# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

text_as_int = np.array([char2idx[c] for c in text])

Now we have an integer representation for each character. Notice that we mapped the characters to indices from 0 to `len(vocab) - 1`.

In [9]:
print('{')
for char,_ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print('  ...\n}')

{
  '\t':   0,
  '\n':   1,
  '\r':   2,
  ' ' :   3,
  '"' :   4,
  '(' :   5,
  ')' :   6,
  ':' :   7,
  '،' :   8,
  'ء' :   9,
  'آ' :  10,
  'أ' :  11,
  'ؤ' :  12,
  'إ' :  13,
  'ئ' :  14,
  'ا' :  15,
  'ب' :  16,
  'ة' :  17,
  'ت' :  18,
  'ث' :  19,
  ...
}


In [10]:
# Show how the first 13 characters from the text are mapped to integers
print ('{} ---- characters mapped to int ---- > {}'.format(repr(text[:13]), text_as_int[:13]))

'بشنو از نى چو' ---- characters mapped to int ---- > [16 28 40 42  3 15 26  3 40 43  3 54 42]
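The two lookup tables are inverses of each other, so encoding and then decoding should return the original text. The following is a minimal sketch on a toy string, using plain Python lists instead of NumPy; the `sample_*` names are introduced here only for illustration and are not part of the notebook.

```python
# Toy text (the opening words of the poem); the notebook itself uses the whole file.
sample = "بشنو از نى"

sample_vocab = sorted(set(sample))                              # unique characters, sorted
sample_char2idx = {ch: i for i, ch in enumerate(sample_vocab)}  # character -> integer id
sample_idx2char = sample_vocab                                  # integer id -> character

encoded = [sample_char2idx[ch] for ch in sample]
decoded = "".join(sample_idx2char[i] for i in encoded)

print(encoded)
print(decoded == sample)  # the round trip recovers the original string
```

The ids depend on the sorted order of the characters present, which is why the same character gets a different id here than in the full-text vocabulary.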


### The prediction task

Given a character, or a sequence of characters, what is the most probable next character? This is the task we're training the model to perform. The input to the model will be a sequence of characters, and we train the model to predict the output—the following character at each time step.

Since RNNs maintain an internal state that depends on the previously seen elements, given all the characters computed until this moment, what is the next character?


### Create training examples and targets

Next divide the text into example sequences. Each input sequence will contain `seq_length` characters from the text.

For each input sequence, the corresponding targets contain the same length of text, except shifted one character to the right.

To do this, first use the `tf.data.Dataset.from_tensor_slices` function to convert the text vector into a stream of character indices.

In [11]:
# The maximum length sentence we want for a single input in characters
seq_length = 100
examples_per_epoch = len(text)//(seq_length+1)  # each example spans seq_length+1 characters (input plus shifted target)

# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

for i in char_dataset.take(5):
  print(idx2char[i.numpy()])

ب
ش
ن
و
 


The `batch` method lets us easily convert these individual characters to sequences of the desired size.

In [12]:
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

for item in sequences.take(5):
  print(repr(''.join(idx2char[item.numpy()])))

'بشنو از نى چون حكايت مى\u200fكند\tاز جدايى\u200fها شكايت مى\u200fكند\r\nكز نيستان تا مرا ببريده\u200fاند\tدر نفيرم مرد و زن ن'
'اليده\u200fاند\r\nسينه خواهم شرحه شرحه از فراق\tتا بگويم شرح درد اشتياق\u200f\r\nهر كسى كاو دور ماند از اصل خويش\tباز'
' جويد روزگار وصل خويش\u200f\r\nمن به هر جمعيتى نالان شدم\tجفت بد حالان و خوش حالان شدم\u200f\r\nهر كسى از ظن خود شد '
'يار من\tاز درون من نجست اسرار من\u200f\r\nسر من از ناله\u200fى من دور نيست\tليك چشم و گوش را آن نور نيست\u200f\r\nتن ز جان'
' و جان ز تن مستور نيست\tليك كس را ديد جان دستور نيست\u200f\r\nآتش است اين بانگ ناى و نيست باد\tهر كه اين آتش ن'
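What `batch(seq_length+1, drop_remainder=True)` does to a flat stream can be sketched in plain Python. This is only an illustration of the chunking behaviour, not how `tf.data` is implemented; `chunk` and the `toy_*` names are hypothetical helpers.

```python
def chunk(ids, size):
    """Split ids into consecutive chunks of `size`, dropping a short final chunk."""
    n_full = len(ids) // size
    return [ids[i * size:(i + 1) * size] for i in range(n_full)]

toy_ids = list(range(11))      # stand-in for text_as_int
toy_seq_length = 3
chunks = chunk(toy_ids, toy_seq_length + 1)
print(chunks)                  # two chunks of 4 ids; the last 3 ids are dropped

# Same arithmetic for the full text reported above
# (1,530,858 characters, seq_length = 100, so chunks of 101 characters)
n_sequences = 1530858 // (100 + 1)
print(n_sequences)
```

Each chunk holds one extra character because the input and target sequences are both cut from it, offset by one position.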


For each sequence, duplicate and shift it to form the input and target text by using the `map` method to apply a simple function to each batch:

In [0]:
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)
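The effect of the split is easiest to see on a tiny chunk. The sketch below applies the same slicing to a plain Python string; `split_pair` is a hypothetical stand-in for `split_input_target`, and the five-character chunk corresponds to a `seq_length` of 4.

```python
def split_pair(chunk):
    # Same slicing as split_input_target, applied to a plain Python sequence
    return chunk[:-1], chunk[1:]

toy_chunk = "بشنو "   # seq_length + 1 = 5 characters (the last one is a space)
input_text, target_text = split_pair(toy_chunk)

# Each input character is paired with the character that follows it
for inp, tgt in zip(input_text, target_text):
    print(repr(inp), "->", repr(tgt))
```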

Each index of these vectors is processed as one time step. For the input at time step 0, the model receives the index for "ب" and tries to predict the index for "ش" as the next character. At the next time step, it does the same thing, but the RNN considers the previous step's context in addition to the current input character.

In [15]:
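# Take the first (input, target) pair from the dataset to inspect it step by step
input_example, target_example = next(iter(dataset))

for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):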
    print("Step {:4d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))

Step    0
  input: 16 ('ب')
  expected output: 28 ('ش')
Step    1
  input: 28 ('ش')
  expected output: 40 ('ن')
Step    2
  input: 40 ('ن')
  expected output: 42 ('و')
Step    3
  input: 42 ('و')
  expected output: 3 (' ')
Step    4
  input: 3 (' ')
  expected output: 15 ('ا')


### Create training batches

We used `tf.data` to split the text into manageable sequences. But before feeding this data into the model, we need to shuffle the data and pack it into batches.

In [16]:
# Batch size
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

dataset

<BatchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>
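The buffer-based shuffling described in the comment above can be modelled in a few lines of plain Python. This is a simplified sketch of the idea, not the actual `tf.data` implementation; `buffered_shuffle` is a hypothetical helper.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)      # fill the buffer first
            continue
        pick = rng.randrange(buffer_size)
        yield buffer[pick]           # emit a random buffered element...
        buffer[pick] = item          # ...and put the new item in its slot
    rng.shuffle(buffer)              # input exhausted: drain what is left
    yield from buffer

shuffled = list(buffered_shuffle(range(20), buffer_size=5))
print(shuffled)
```

An element can only be emitted once it has entered the buffer, so the shuffle is local rather than global. With 1,530,858 // 101 = 15,157 sequences and `BUFFER_SIZE = 10000`, the buffer covers about two thirds of the data, and batching with `drop_remainder=True` gives 15157 // 64 = 236 batches per epoch.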

## Build The Model

Use `tf.keras.Sequential` to define the model. For this simple example three layers are used to define our model:

* `tf.keras.layers.Embedding`: The input layer. A trainable lookup table that maps the number of each character to a vector with `embedding_dim` dimensions.
* `tf.keras.layers.LSTM`: A type of RNN with size `units=rnn_units`. (You can also use a GRU layer here.)
* `tf.keras.layers.Dense`: The output layer, with `vocab_size` outputs.

In [0]:
# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

In [0]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),
    tf.keras.layers.LSTM(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

In [19]:
model = build_model(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

W0420 07:02:01.526203 140266123802496 tf_logging.py:161] <tensorflow.python.keras.layers.recurrent.UnifiedLSTM object at 0x7f91d08955f8>: Note that this layer is not optimized for performance. Please use tf.keras.layers.CuDNNLSTM for better performance on GPU.


In [20]:
for input_example_batch, target_example_batch in dataset.take(1):
  example_batch_predictions = model(input_example_batch)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 100, 59) # (batch_size, sequence_length, vocab_size)


In the above example the sequence length of the input is `100` but the model can be run on inputs of any length:

In [21]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (64, None, 256)           15104     
_________________________________________________________________
unified_lstm (UnifiedLSTM)   (64, None, 1024)          5246976   
_________________________________________________________________
dense (Dense)                (64, None, 59)            60475     
Total params: 5,322,555
Trainable params: 5,322,555
Non-trainable params: 0
_________________________________________________________________
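The parameter counts in the summary can be checked by hand. The sketch below assumes the standard LSTM parameterisation (four gates, each with input weights, recurrent weights and one bias vector), which reproduces the numbers reported above.

```python
vocab_size, embedding_dim, rnn_units = 59, 256, 1024

# Embedding: one embedding_dim-sized vector per character
embedding_params = vocab_size * embedding_dim

# LSTM: 4 gates, each with input weights, recurrent weights and a bias
lstm_params = 4 * ((embedding_dim + rnn_units) * rnn_units + rnn_units)

# Dense: weight matrix plus one bias per output character
dense_params = rnn_units * vocab_size + vocab_size

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
```

Almost all of the model's capacity sits in the recurrent layer; the embedding and output layers together account for well under 2% of the parameters.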


To get actual predictions from the model we need to sample from the output distribution, to get actual character indices. This distribution is defined by the logits over the character vocabulary.

Note: It is important to _sample_ from this distribution as taking the _argmax_ of the distribution can easily get the model stuck in a loop.

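For the first example in the batch, that means drawing one character index per time step from its logits; `generate_text` below does this with `tf.random.categorical`.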
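The difference between sampling and taking the argmax can be illustrated without TensorFlow. The following sketch uses made-up logits for a hypothetical four-character vocabulary and Python's `random.choices` as a stand-in for a categorical sampler.

```python
import math
import random

def softmax(logits):
    """Convert logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.5, 0.1]   # made-up scores, not model output
probs = softmax(toy_logits)

# Argmax: always returns the same index for the same logits
greedy = max(range(len(probs)), key=probs.__getitem__)

# Sampling: indices are drawn in proportion to their probability
rng = random.Random(0)
sampled = rng.choices(range(len(probs)), weights=probs, k=200)

print("argmax:", greedy)
print("distinct sampled ids:", sorted(set(sampled)))
```

Because the argmax is deterministic, a model that feeds its own output back as input can fall into a repeating cycle; sampling keeps less likely characters in play.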

## Train the model

At this point the problem can be treated as a standard classification problem. Given the previous RNN state and the input at this time step, predict the class of the next character.

### Attach an optimizer, and a loss function

The standard `tf.keras.losses.sparse_categorical_crossentropy` loss function works in this case because it is applied across the last dimension of the predictions.

Because our model returns logits, we need to set the `from_logits` flag.


In [22]:
def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

example_batch_loss  = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss:      ", example_batch_loss.numpy().mean())

Prediction shape:  (64, 100, 59)  # (batch_size, sequence_length, vocab_size)
scalar_loss:       4.0772667
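The initial loss is a useful sanity check. An untrained model should spread its probability roughly evenly over the 59 characters, in which case the cross-entropy is close to ln(59) ≈ 4.08, which is what the output above shows. The sketch below computes the per-step loss from logits in plain Python; the function name and `n_classes` are introduced here for illustration only.

```python
import math

def sparse_crossentropy_from_logits(label, logits):
    """Cross-entropy for one time step: -log softmax(logits)[label]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]

n_classes = 59                        # vocabulary size reported earlier
uniform_logits = [0.0] * n_classes    # a model with no preference at all
loss_uniform = sparse_crossentropy_from_logits(label=16, logits=uniform_logits)

# A model that strongly favours the correct character gets a much lower loss
confident_logits = [0.0] * n_classes
confident_logits[16] = 10.0
loss_confident = sparse_crossentropy_from_logits(label=16, logits=confident_logits)

print(loss_uniform, math.log(n_classes))
print(loss_confident)
```

A starting loss far above ln(vocab_size) would suggest that the freshly initialised model is confidently wrong, which usually points to a setup problem.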


Configure the training procedure using the `tf.keras.Model.compile` method. We'll use `tf.keras.optimizers.Adam` with default arguments and the loss function.

In [0]:
model.compile(optimizer='adam', loss=loss)

### Configure checkpoints

In [0]:
# Directory where the checkpoints will be saved
checkpoint_dir = './checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)
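The `{epoch}` placeholder in the file name is a format template. The sketch below assumes the callback fills it with `str.format`-style substitution using a 1-based epoch number, which is consistent with the `./checkpoints/ckpt_30` path reported after training.

```python
import os

checkpoint_dir = './checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

# Substitute a few epoch numbers into the template to see the resulting names
names = [checkpoint_prefix.format(epoch=e) for e in (1, 2, 30)]
print(names)
```

Since every epoch gets its own file name, earlier checkpoints are kept rather than overwritten.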

### Execute the training

In Colab, set the runtime to GPU for faster training.

In [0]:
EPOCHS=30

In [0]:
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


## Generate text

### Restore the latest checkpoint

In [0]:
tf.train.latest_checkpoint(checkpoint_dir)

'./checkpoints/ckpt_30'

In [0]:
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

model.build(tf.TensorShape([1, None]))

W0416 06:34:17.037098 140272007907200 tf_logging.py:161] <tensorflow.python.keras.layers.recurrent.UnifiedLSTM object at 0x7f92ab8f65f8>: Note that this layer is not optimized for performance. Please use tf.keras.layers.CuDNNLSTM for better performance on GPU.


In [0]:
model.summary()

Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_6 (Embedding)      (1, None, 256)            15104     
_________________________________________________________________
unified_lstm_6 (UnifiedLSTM) (1, None, 1024)           5246976   
_________________________________________________________________
dense_6 (Dense)              (1, None, 59)             60475     
Total params: 5,322,555
Trainable params: 5,322,555
Non-trainable params: 0
_________________________________________________________________


In [0]:
def generate_text(model, start_string):
  # Evaluation step (generating text using the learned model)

  # Number of characters to generate
  num_generate = 1000

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty string to store our results
  text_generated = []

  # Low temperatures result in more predictable text.
  # Higher temperatures result in more surprising text.
  # Experiment to find the best setting.
  temperature = 1.0

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # using a categorical distribution to predict the character returned by the model
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted character as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))

In [0]:
print(generate_text(model, start_string=u"به"))

به درد و سرد
هم مخنث را نهان پيدا شود	سنگ و كوه و ماهشان بد پيش تو
اين كرامت يافت گردش دم به دست	آتشى خواهين فقير شاه مرد
مست آن كه خوش شدند آن كردگار	بى‏اسب زين سايه در جست او قما
چشم داند جست و باطن زير چاد	كان به چاهى مى‏كنى هم از خدا
خويشى و ديوار تن از دست تست	پس ندانستيم اندر مشرق است‏
صبر كن با اين دو معنيهاى نور	كه رود جز بس عدو ضرير
و آن پيمبر گفت او را حصرت زدند	هيچ گشتيم از دروغ آن جا روى‏
روز روشن گردد آن در دود تو	سوى موسى زيركى رنجور رفت‏
گفت دارم در گذار اين اعتقان	آمديم اندر هلا در نوع دان‏
هم تو بر با هم از آن آموخته است	فايم آن فرحود خود را از كل‏
ايمن آب بود آخر زمان	كه دمى سازن مرا پيش كشيد
كيست ما زين افتضاط فقل را	جفت در خشم آمد و رفته توار
نه به دست آمد كه اى من سرخ گشت	گنج نوع و خواب را ياريك شو
كى كنند آن پير انسبان مان خوى رو سوى خانه‏ى زردها را بر تنت	با سرشتن را بود هم در پذير	كه نباشد طبع شير و جهل شير
پا خلوص است از ناودان فر من خوشند يك ذوت آب از جبيله‏ى قوتى است	عاقلى بر صد معرف اين ستوخ‏
اين ندارد جان مادر مى‏فتد	چون نهاد او شير مردانه ب

The easiest thing you can do to improve the results is to train the model for longer (try `EPOCHS=50`).

You can also experiment with a different start string, or try adding another RNN layer to improve the model's accuracy, or adjusting the temperature parameter to generate more or less random predictions.
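To see what the temperature parameter does, it helps to look at how dividing the logits changes the resulting probabilities. This is a minimal sketch with made-up logits for a hypothetical four-character vocabulary; it mirrors the `predictions / temperature` step in `generate_text` but does not use the model.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalise to probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.5, 0.1]   # made-up scores, not model output
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Temperatures below 1 concentrate probability on the top-scoring character, while temperatures above 1 flatten the distribution and make rarer characters more likely to be drawn.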

## Conclusion

Apparently our model managed to generate meaningful words, but a meaningful poem? Definitely not. So, to wrap it up: don't do drugs!