# Part II: Sequence to Sequence Models in TensorFlow

 

## Task: Language Modeling
We use TensorFlow to build a simple seq2seq structure for language modeling.
### Definition
We feed a sequence of words or characters to an RNN so that it learns the probability distribution of the next word or character given the ones that came before. This then lets us generate text one unit at a time.
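
To make the idea of a next-character distribution concrete, here is a minimal sketch that estimates it from simple bigram counts instead of an RNN. It conditions only on the single previous character, whereas the RNN in this notebook conditions on the whole history. The function name `next_char_distribution` and the toy string are illustrative assumptions, not part of the notebook.

```python
from collections import Counter, defaultdict

def next_char_distribution(text):
    """Estimate P(next char | current char) from bigram counts."""
    counts = defaultdict(Counter)
    # Count how often each character follows each other character
    for current, nxt in zip(text, text[1:]):
        counts[current][nxt] += 1
    # Normalize the counts into probabilities
    return {
        current: {c: n / sum(followers.values()) for c, n in followers.items()}
        for current, followers in counts.items()
    }

dist = next_char_distribution("abababac")
print(dist["a"])  # distribution over the characters that follow "a"
```

Generating text then amounts to repeatedly picking a next character from such a distribution and appending it to the sequence.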

We will use 全唐诗 (*Quan Tangshi*, the Complete Tang Poems) as our training data, and later try to generate new poems.

## Check the content



The format of our input data is like this:

`(optional title + ":")poem`

We will use only the poem part and not the title.

However, some lines are special cases, for example:

* 河鱼未上冻，江蛰已闻雷。（见《纬略》）
* □□□□□

So we need some preprocessing:

* Remove the title
* Remove spaces
* Discard poems that contain placeholder symbols (□) or bracketed annotations
* Remove punctuation

Finally, we shuffle the poems so that the model does not learn patterns from the order in which they appear in the file.
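
The steps above can be sketched as a single function that cleans one line and returns `None` when the line should be discarded. This is a minimal sketch: the name `clean_line`, the sample title `示例`, and the removal of inner whitespace with `\s+` are assumptions for illustration. The length limits and symbol classes are the ones used in this notebook.

```python
import re

def clean_line(line):
    """Return the cleaned poem text, or None if the line should be discarded."""
    parts = line.strip().split(':')
    # Keep only lines of the form "title:poem"
    if len(parts) != 2:
        return None
    # Assumption: also drop whitespace inside the poem
    poem = re.sub(r'\s+', '', parts[1])
    # Discard poems with annotations or placeholder symbols
    if re.search(r'[(（《_□]', poem):
        return None
    # Discard poems that are too short or too long
    if len(poem) < 5 or len(poem) > 40:
        return None
    # Remove punctuation
    return re.sub(r'[，。]', '', poem)

print(clean_line("示例:床前明月光，疑是地上霜。"))
print(clean_line("示例:河鱼未上冻，江蛰已闻雷。（见《纬略》）"))
print(clean_line("示例:□□□□□"))
```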

In [1]:
import re
import numpy as np

data_filename = 'poetry.txt'
poems = []
with open(data_filename, "r") as in_file:
    for line in in_file:
        # split "title:poem"; lines without exactly one ':' are skipped
        parts = line.strip().split(':')
        if len(parts) != 2:
            continue
        poem = parts[1]
        # discard if the poem contains annotations or placeholder symbols
        if re.search(r'[(（《_□]', poem):
            continue
        # discard if too short or too long
        if len(poem) < 5 or len(poem) > 40:
            continue
        # remove punctuation: it is so frequent that the model would mostly learn to predict it
        poem = re.sub(r'[，。]', '', poem)
        poems.append(poem)

# shuffle the poems
poems = np.random.permutation(poems)

In [2]:
len(poems)

11103

We select 5 poems as our test set.

In [3]:
poems_train, poems_test = poems[:-5], poems[-5:]
len(poems_train), len(poems_test)

(11098, 5)

## Word to ID

Use the tokenizer in Keras to tokenize the words. We set
```python
Tokenizer(num_words=None, lower=False, char_level=True)
```
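so that the number of words in the dictionary is not limited and each character is used as one unit.

**char_level=True** makes the tokenizer treat every character as a token.

The tokenizer assigns IDs starting from 1, so we also add a special entry `<PAD>` with ID 0. This makes `len(word_index)` equal to the largest ID plus one, which is the vocabulary size that the embedding and output layers need later.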
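
As a rough illustration of what the dictionary looks like, the sketch below builds a character-to-ID mapping in plain Python: IDs start at 1 and follow descending frequency, and ID 0 is then reserved for `<PAD>`. This is an approximation for illustration, not the Keras implementation. The helper `build_char_index` and the toy input are assumptions.

```python
from collections import Counter

def build_char_index(texts):
    """Map each character to an ID, most frequent first, starting from 1."""
    counts = Counter(ch for text in texts for ch in text)
    # most_common() sorts by descending count; ties keep first-seen order
    word_index = {ch: i for i, (ch, _) in enumerate(counts.most_common(), start=1)}
    # Reserve ID 0 so that len(word_index) == largest ID + 1
    word_index["<PAD>"] = 0
    return word_index

toy_index = build_char_index(["春风春雨", "春"])
toy_reverse = {i: ch for ch, i in toy_index.items()}
print(toy_index)
```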

In [4]:
import time
from tensorflow.keras.preprocessing.text import Tokenizer

poem_tokenizer = Tokenizer(num_words=None, lower=False, char_level=True) 
# Create word to ID dictionary
poem_tokenizer.fit_on_texts(poems)
# Get dictionary
word_index = poem_tokenizer.word_index

# Note that IDs start from 1!
# Add a special entry for ID 0 so that len(word_index) == largest ID + 1
word_index["<PAD>"] = 0
# Create ID to word dictionary
reverse_word_index = {v: k for (k, v) in word_index.items()}
print("Number of unique chars: {}".format(len(word_index)))

Number of unique chars: 4762


Again, check whether there are any strange symbols in the dictionary. Here we print only the first and last entries.

In [5]:
# Sort the dictionary by ID. IDs are assigned by frequency, so this shows the most
# and least frequent characters, which are the ones most likely to be problematic.
for (w, i) in sorted(word_index.items(), key=lambda item: item[1]):
    # print only the first and last few entries
    if i > 10 and i < len(word_index) - 5:
        continue
    print("{} {}".format(w, i))

<PAD> 0
不 1
人 2
一 3
山 4
风 5
无 6
花 7
来 8
日 9
春 10
爚 4757
茏 4758
捻 4759
窖 4760
湍 4761


In [6]:
# Apply word to ID on training and test set
poems_train = poem_tokenizer.texts_to_sequences(poems_train)
poems_test = poem_tokenizer.texts_to_sequences(poems_test)
# Check and see if there is any error
print(poems_train[0])
print(''.join([reverse_word_index[w] for w in poems_train[0]]))

[2818, 13, 81, 78, 6, 714, 10, 1227, 44, 136, 761, 114, 115, 130, 45, 2, 76, 11, 31, 888, 110, 5, 162, 297, 7, 89, 599, 2]
汴水东流无限春隋家宫阙已成尘行人莫上长堤望风起杨花愁杀人


## Prepare the data for input

Flatten the input to a long list.

In [7]:
# flatten the training poems into one long list of character IDs
poems_train = [w for po in poems_train for w in po]

# flatten the test poems into one long list of character IDs
poems_test = [w for po in poems_test for w in po]

## Define an input object

We need to split the input into batches:
* Reshape the input data into a rectangular matrix and crop the remainder
* Calculate the shape of each batch
* Generate batches in which the target is the input shifted by one time step
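
The following plain-Python sketch mirrors the three steps on a toy list of IDs, without TensorFlow. The helper `make_batches` is hypothetical. It uses the same arithmetic as the `PoemInput` class in this notebook: `batch_len = len(data) // batch_size` and `epoch_size = (batch_len - 1) // num_steps`.

```python
def make_batches(data, batch_size, num_steps):
    """Return a list of (x, y) pairs where y is x shifted by one time step."""
    # Split the data into batch_size rows and crop what does not fit
    batch_len = len(data) // batch_size
    rows = [data[r * batch_len:(r + 1) * batch_len] for r in range(batch_size)]
    # The "- 1" leaves room for the shifted target of the last batch
    epoch_size = (batch_len - 1) // num_steps
    batches = []
    for i in range(epoch_size):
        x = [row[i * num_steps:(i + 1) * num_steps] for row in rows]
        y = [row[i * num_steps + 1:(i + 1) * num_steps + 1] for row in rows]
        batches.append((x, y))
    return batches

batches = make_batches(list(range(20)), batch_size=2, num_steps=3)
for x, y in batches:
    print(x, y)
```

Each row of a batch continues where the same row of the previous batch stopped, so consecutive batches form continuous text within a row.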


In [8]:
import tensorflow as tf

In [9]:
class PoemInput(object):
    def __init__(self, data, config, name=None):
        self.batch_size = batch_size = config.batch_size
        self.num_steps = num_steps = config.num_steps
        self.epoch_size = ((len(data) // batch_size) - 1) // num_steps # number of batches per epoch
        self.sources, self.targets = self.input_producer(
            data, batch_size, num_steps, name=name)

    def input_producer(self, raw_data, batch_size, num_steps, name=None):
        """Reshape the poem data to form input and output.
        This chunks the raw_data into batches of examples and returns Tensors that
        are drawn from these batches.
        INPUT:
          raw_data: a list of words      # poems_train
          batch_size: int, the batch size.
          num_steps: int, the sequence length.
          name: the name of this operation (optional).
        OUTPUT:
          A pair of Tensors, each shaped [batch_size, num_steps]. The second element
          of the tuple is the same data time-shifted to the right by one.
        """
        raw_data = tf.convert_to_tensor(raw_data, name="raw_data", dtype=tf.int32)
        # get size of the 1-d tensor
        data_len = tf.size(raw_data)
        # number of elements in each of the batch_size rows
        batch_len = data_len // batch_size
        # crop data that does not fit in a batch
        data = tf.reshape(raw_data[0:batch_size*batch_len],
                          [batch_size, batch_len])
        # calculate how many batches in an epoch
        epoch_size = (batch_len-1) // num_steps
        # make sure there is at least one batch
        assertion = tf.assert_positive(epoch_size,
            message="epoch_size == 0, decrease batch_size or num_steps")
        with tf.control_dependencies([assertion]):
            epoch_size = tf.cast(tf.identity(epoch_size, name="epoch_size"), tf.int64)

        # start generating slices
        # range_input_producer yields the batch indices 0 .. epoch_size-1 in order
        i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue()
        x = data[:, i*num_steps  :(i+1)*num_steps]
        y = data[:, i*num_steps+1:(i+1)*num_steps+1]
        print("input_ source type: ", type(x))   # tensor
        print("input_ source shape: ", x.shape) # [batch_size, num_steps]; the second dimension is unknown (?) at graph-build time
        return x, y

## Define hyperparameters

In [10]:
# Define hyperparameters
class Hparam(object):
    learning_rate = 1.0
    max_grad_norm = 5
    num_layers = 1
    num_steps = 35 # number of characters per training sequence; strongly affects training time
    vocab_size = len(word_index)
    embedding_size = 100
    hidden_size = 100 # LSTM hidden layer size
    warmup_epochs = 3 # the learning rate stays fixed for the first 3 epochs, then decays
    num_epochs_to_train = 5
    keep_prob = 0.6 # dropout keep probability: randomly drop 40% of the activations
    lr_decay = 0.9 # after warm-up, the learning rate is multiplied by 0.9 every epoch
    batch_size = 100 # depends on hardware (CPU or GPU)

config = Hparam()
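
To see how `warmup_epochs` and `lr_decay` interact, the sketch below computes the learning rate for each epoch with the same formula the training controller in this notebook uses, `lr_decay ** max(i + 1 - warmup_epochs, 0)`. The function name `learning_rate_for_epoch` is an assumption for illustration.

```python
def learning_rate_for_epoch(epoch_index, base_lr=1.0, lr_decay=0.9, warmup_epochs=3):
    """Learning rate for a zero-based epoch index."""
    # The exponent stays at 0 during warm-up, then grows by 1 per epoch
    exponent = max(epoch_index + 1 - warmup_epochs, 0)
    return base_lr * lr_decay ** exponent

schedule = [learning_rate_for_epoch(i) for i in range(5)]
for epoch, lr in enumerate(schedule, start=1):
    print("Epoch %d: learning rate %.3f" % (epoch, lr))
```

With the defaults above, the first three epochs keep the base learning rate, and each later epoch multiplies it by 0.9 once more.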

## Construct model
In this step, the entire model structure must be defined, including:
* Input
* Size of layers
* Connection between layers
* Variables in layers
* Output
* Loss
* Operations that apply the gradients (optimizer)
* Placeholder for feeding special values
* Properties that can be read from outside


In [11]:
from tensorflow.contrib.cudnn_rnn import CudnnLSTM

In [12]:
from tensorflow.contrib.rnn import BasicLSTMCell, MultiRNNCell
from tensorflow.nn import embedding_lookup, dropout

# Build our model
class MySeq2SeqModel(object):
    def __init__(self, is_training, config, input_):
        self._is_training = is_training
        self._input = input_
        self._cell = None
        self.batch_size = input_.batch_size
        self.num_steps = input_.num_steps
        rnn_size = config.hidden_size
        vocab_size = config.vocab_size
        embedding_size = config.embedding_size

        # Keep the embedding table on the CPU
        with tf.device("/cpu:0"):
            embedding_weights = tf.get_variable("embedding", \
                         [vocab_size, embedding_size]) # 4762 * 100
            # look up the embedding vector for every word ID in input_.sources
            embed_inputs = tf.nn.embedding_lookup(embedding_weights, input_.sources) 
            print('embed_inputs shape: ', embed_inputs.shape) # [batch_size, num_steps, embedding_size]

        if is_training and config.keep_prob < 1.:
            embed_inputs = tf.nn.dropout(embed_inputs, config.keep_prob)

        # build RNN using CudnnLSTM
        output, _ = self._build_rnn(embed_inputs, config, is_training)
        # build RNN using basic LSTM
#         output, _ = self._build_rnn_old_lstm(embed_inputs, config, is_training) # _ for state

        # Remember RNN output is [batch_size x time, rnnsize]
        # Dense layer for projecting onto vocabulary size
        softmax_w = tf.get_variable("softmax_w", [rnn_size, vocab_size])
        softmax_b = tf.get_variable("softmax_b", [vocab_size])
        logits = tf.nn.xw_plus_b(output, softmax_w, softmax_b)
        # Reshape logits to be a 3-D tensor for sequence loss
        logits = tf.reshape(logits, [self.batch_size, self.num_steps, vocab_size])
        self._logits = logits

        # Use the contrib sequence loss and average over the batches
        loss = tf.contrib.seq2seq.sequence_loss(
            logits,
            input_.targets,
            tf.ones([self.batch_size, self.num_steps]), # weight
            average_across_timesteps=False, 
            average_across_batch=True)
        
        # loss is a sequence by time step
        
        # Update the cost
        self._cost = tf.reduce_sum(loss)

        if not is_training:
            return

        # A variable to store the learning rate
        self._lr = tf.Variable(0.0, trainable=False) # not trainable, so it is excluded from the gradients

        # Calculate gradients
        # Get a list of trainable variables
        tvars = tf.trainable_variables()
        # Get gradient and clip by norm
        grads, _ = tf.clip_by_global_norm(\
                     tf.gradients(self._cost, tvars),
                     config.max_grad_norm) # rescale so the global norm is at most max_grad_norm
        # Define an optimizer
        # Note that the optimizer reads the value of learning rate from variable
        optimizer = tf.train.GradientDescentOptimizer(self._lr) # the learning rate is controlled through self._lr
        # Define an operation that actually applies the gradients
        self._train_op = optimizer.apply_gradients(
            zip(grads, tvars),
            global_step=tf.train.get_or_create_global_step())
        # A placeholder for feeding new learning rates
        self._new_lr = tf.placeholder(
             tf.float32, shape=[], name="new_learning_rate")
        self._lr_update_op = tf.assign(self._lr, self._new_lr)
  
    def _build_rnn(self, inputs, config, is_training):
        # CudnnLSTM expects time-major input
        inputs = tf.transpose(inputs, [1, 0, 2]) # [batch, time, depth] -> [time, batch, depth]
        self._cell = CudnnLSTM(
            num_layers=config.num_layers,
            num_units=config.hidden_size,
            )
        self._cell.build(inputs.get_shape()) 
        outputs, state = self._cell(inputs)
        # Transpose from time-major to batch-major
        outputs = tf.transpose(outputs, [1, 0, 2]) # [time, batch, depth] -> [batch, time, depth]
        # Reshape from [batch, time, rnnsize] to [batch x time, rnnsize]
        # For computing softmax later
        outputs = tf.reshape(outputs, [-1, config.hidden_size])
        return outputs, state

    def _build_rnn_old_lstm(self, inputs, config, is_training):
        def make_cell():
            cell = BasicLSTMCell(
            config.hidden_size, forget_bias=0.0, state_is_tuple=True,
            reuse=not is_training)
            if is_training and config.keep_prob < 1:
                cell = tf.contrib.rnn.DropoutWrapper(
                    cell, output_keep_prob=config.keep_prob)
            return cell

        cell = tf.contrib.rnn.MultiRNNCell(
            [make_cell() for _ in range(config.num_layers)], state_is_tuple=True)

        self._initial_state = cell.zero_state(config.batch_size, tf.float32)
        state = self._initial_state
        outputs = []
        inputs = tf.unstack(inputs, num=self.num_steps, axis=1)
        outputs, state = tf.nn.static_rnn(cell, inputs,
                                          initial_state=self._initial_state)
        output = tf.reshape(tf.concat(outputs, 1), [-1, config.hidden_size])
        return output, state

    def assign_lr(self, session, lr_value):
        session.run(self._lr_update_op, feed_dict={self._new_lr: lr_value})

    @property
    def input(self):
        return self._input

    @property
    def cost(self):
        return self._cost

    @property
    def lr(self):
        return self._lr

    @property
    def train_op(self):
        return self._train_op

    @property
    def logits(self):
        return self._logits
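
The training model clips gradients with `tf.clip_by_global_norm` before applying them. As a minimal plain-Python sketch of that rule, the helper below (a stand-in written for illustration, working on nested lists rather than tensors) computes one norm over all gradients together and rescales every gradient by `clip_norm / max(global_norm, clip_norm)`.

```python
import math

def clip_by_global_norm(grads, clip_norm):
    """Rescale a list of gradient vectors so their joint norm is at most clip_norm."""
    # One norm over all gradient values together
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # The scale is 1.0 when the norm is already within the limit
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads], global_norm

clipped, norm = clip_by_global_norm([[3.0], [4.0]], clip_norm=2.5)
unchanged, _ = clip_by_global_norm([[3.0], [4.0]], clip_norm=5.0)
print(norm, clipped, unchanged)
```

Because all gradients share one scale factor, the direction of the overall update is preserved and only its length is limited.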

## Define a training operation for an epoch
This procedure runs the model once for each batch in an epoch.
We pass `session.run()` a dictionary of fetches with these keys:

* "cost": reads the property `model.cost` that we defined above.
* "do_op": performs the operation `model.train_op`, which applies the gradients.

After calling `session.run()`, the returned dictionary contains the results under the same keys.

Any tensor or operation that the model exposes as a `@property` can be added to the dictionary in the same way!

In [13]:
def run_epoch(session, model, do_op=None, verbose=False):
    start_time = time.time()
    costs = 0.0
    iters = 0
    feed_to_model_dict = {
        "cost": model.cost,
    }
    # if an operation is provided, add it to the fetches
    if do_op is not None:
        feed_to_model_dict["do_op"] = do_op

    for step in range(model.input.epoch_size):
        # run one batch and fetch everything in the dictionary
        s_out = session.run(feed_to_model_dict)
        # The returned dictionary contains the fetched values under the same keys
        cost = s_out["cost"]
        # Accumulate cost
        costs += cost
        # Accumulate number of training steps
        iters += model.input.num_steps
        # Print loss periodically
        if verbose and (step+1) % max(1, model.input.epoch_size // 5) == 0:
            print("%.0f%% ppl: %.3f, speed: %.0f char/sec" %
                ((step+1) * 100.0 / model.input.epoch_size, \
                 np.exp(costs/iters), \
                 iters * model.input.batch_size/(time.time() - start_time)))

    return np.exp(costs / iters) # costs are natural-log losses, so exp() gives the perplexity
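
The value returned by `run_epoch` is the perplexity: the exponential of the average cross-entropy per time step. The sketch below reproduces that arithmetic on its own, assuming each per-batch cost is already the sum over `num_steps` of the batch-averaged cross-entropy, which is what the loss in this notebook computes. The helper name `perplexity` and the sanity check are illustrative.

```python
import math

def perplexity(batch_costs, num_steps):
    """exp(total cost / total number of time steps)."""
    total_cost = sum(batch_costs)
    iters = num_steps * len(batch_costs)
    return math.exp(total_cost / iters)

# Sanity check: a model that predicts a uniform distribution over the
# vocabulary has a cross-entropy of log(vocab_size) at every time step.
vocab_size = 4762
num_steps = 35
uniform_cost = num_steps * math.log(vocab_size)
ppl = perplexity([uniform_cost] * 10, num_steps)
print("%.1f" % ppl)
```

A perplexity equal to the vocabulary size therefore corresponds to random guessing, and useful training should bring it well below that.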


## Define a Generator
We will also create a Generator Model to generate new poems. It is much simpler than the training model.
However, it needs a loop that feeds each predicted character back in as the next input for a number of steps.

In [14]:
class MyGeneratorModel(object):
    def __init__(self, config):
        self._input = tf.placeholder(tf.int32, shape=[1], name="_input")
        self.batch_size = 1
        self.num_steps = config.num_steps
        rnn_size = config.hidden_size
        vocab_size = config.vocab_size
        embedding_size = config.embedding_size

        # Keep the embedding table on the CPU
        with tf.device("/cpu:0"):
            embedding_weights = tf.get_variable("embedding", \
                         [vocab_size, embedding_size])
            embed_inputs = tf.nn.embedding_lookup(embedding_weights, self._input)
            embed_inputs = tf.expand_dims(embed_inputs, 0)

        # build RNN using CudnnLSTM
        self._cell = CudnnLSTM(
            num_layers=config.num_layers,
            num_units=config.hidden_size,
            )

        # build final projection layer
        softmax_w = tf.get_variable("softmax_w", [rnn_size, vocab_size])
        softmax_b = tf.get_variable("softmax_b", [vocab_size])

        # Collect a sequence of output word IDs
        self._output_word_ids = []

        # Decode first word
        outputs, state = self._cell(embed_inputs)
        outputs = tf.reshape(outputs, [-1, config.hidden_size])
        logits = tf.nn.xw_plus_b(outputs, softmax_w, softmax_b)
        # Get input for next step
        next_input = tf.argmax(logits, axis=-1)
        next_input = tf.squeeze(next_input)
        self._output_word_ids.append(next_input)
        # Convert next input to word embeddings
        next_input = tf.nn.embedding_lookup(embedding_weights, next_input)
        next_input = tf.reshape(next_input, [1, 1, embedding_size])

        # Feed back to LSTM
        for _ in range(self.num_steps-1):
            outputs, state = self._cell(next_input, state)
            outputs = tf.reshape(outputs, [-1, config.hidden_size])
            logits = tf.nn.xw_plus_b(outputs, softmax_w, softmax_b)
            next_input = tf.argmax(logits, axis=-1)
            next_input = tf.squeeze(next_input)
            self._output_word_ids.append(next_input)

            next_input = tf.nn.embedding_lookup(embedding_weights, next_input)
            next_input = tf.reshape(next_input, [1, 1, embedding_size])

    @property
    def output_word_ids(self):
        return self._output_word_ids
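
The generator above performs greedy decoding: at every step it takes the most likely next ID (`argmax`) and feeds it back as the next input. Stripped of TensorFlow, the loop can be sketched as follows. `greedy_decode` and the toy `toy_step` function (which stands in for the LSTM plus projection layer) are assumptions made for illustration.

```python
def greedy_decode(step_fn, seed_id, num_steps, state=None):
    """Generate num_steps IDs, always picking the highest-scoring one."""
    output_ids = []
    current = seed_id
    for _ in range(num_steps):
        # step_fn returns scores over the vocabulary and the new state
        logits, state = step_fn(current, state)
        # argmax over the vocabulary
        current = max(range(len(logits)), key=logits.__getitem__)
        output_ids.append(current)
    return output_ids

def toy_step(token_id, state):
    """Toy model over 4 IDs that always prefers the next ID (cyclically)."""
    vocab_size = 4
    logits = [0.0] * vocab_size
    logits[(token_id + 1) % vocab_size] = 1.0
    return logits, state

generated = greedy_decode(toy_step, seed_id=0, num_steps=5)
print(generated)
```

Greedy decoding is deterministic, so the same seed always produces the same poem. Sampling from the distribution instead of taking the `argmax` would give more varied output.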


## Define a call to generator
Again, we need a decoder to translate word IDs back to words, and a procedure to communicate with the model. The `feed_dict` and `fetches` arguments of `session.run()` do that: `feed_dict` sends values into the model and `fetches` names what to read back.

In [15]:
def decode_text(text, max_len_newline=5):
    words = [reverse_word_index.get(i, "<UNK>") for i in text]
    fixed_width_string = []

    for w_pos in range(len(words)):
        fixed_width_string.append(words[w_pos])
        if (w_pos+1) % max_len_newline == 0:
            fixed_width_string.append('\n')
    return ''.join(fixed_width_string)

def run_generator(session, model, seed_word, config):
  
    feed_to_model_dict = {
        model._input: [seed_word],
    }
    fetch_model_dict = {
        "output_word_ids": model.output_word_ids
    }

    # An example of sending and receiving data from the model
    vals = session.run(fetches=fetch_model_dict, feed_dict=feed_to_model_dict)
    output_word_ids = vals['output_word_ids']

    # Decode to readable words
    print(decode_text([seed_word] + output_word_ids, (config.num_steps+1)//4))
    return
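
The line width passed to `decode_text` is `(config.num_steps+1)//4`. With the generator setting `num_steps = 19` used in this notebook, the seed plus 19 generated characters make 20 characters, which are printed as four lines of five. The sketch below checks this layout with a stand-alone copy of the line-breaking logic. `format_poem` and the placeholder character `字` are assumptions for illustration.

```python
def format_poem(chars, max_len_newline=5):
    """Insert a newline after every max_len_newline characters."""
    pieces = []
    for pos, ch in enumerate(chars):
        pieces.append(ch)
        if (pos + 1) % max_len_newline == 0:
            pieces.append('\n')
    return ''.join(pieces)

num_steps = 19
line_len = (num_steps + 1) // 4
# Seed character followed by num_steps placeholder characters
text = format_poem("天" + "字" * num_steps, line_len)
print(text)
```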

## Main training controller
Finally, we define a controller that:
* Creates the model for training
* Creates the model for testing, which reuses the variables of the training model
* Prepares the input data
* Defines what to log during training
* Creates a `session` that communicates with the computation graph
* Optionally changes the learning rate
* Gets the test set results


In [16]:
def main(_):

    with tf.Graph().as_default():
        initializer = tf.random_uniform_initializer(-0.1, 0.1)

        with tf.name_scope("Train"):
            # Create input producer
            train_input = PoemInput(poems_train, config, name="TrainInput")
            # Create the model instance
            with tf.variable_scope("Model", reuse=None, initializer=initializer):
                m = MySeq2SeqModel(is_training=True, config=config, input_=train_input)
            # Add information to logs
            tf.summary.scalar("Training_Loss", m.cost)
            tf.summary.scalar("Learning_Rate", m.lr)

        with tf.name_scope("Test"):
            eval_config = Hparam()
            eval_config.batch_size = 1
            eval_config.num_steps = 20

            # Create another input for test data
            # Note that eval_config was set locally
            test_input = PoemInput(poems_test, eval_config, name="TestInput")
            # Create another model but reuse the variables in the training model
            with tf.variable_scope("Model", reuse=True):
                mtest = MySeq2SeqModel(is_training=False, config=eval_config,
                             input_=test_input)

        with tf.name_scope("Gen"):
            generator_config = Hparam()
            generator_config.batch_size = 1
            generator_config.num_steps = 19
            # Create generator model
            with tf.variable_scope("Model", reuse=True):
                mgenerate = MyGeneratorModel(config=generator_config)

        # Hardware settings
        config_proto = tf.ConfigProto(allow_soft_placement=True)
        # Create a MonitoredTrainingSession that controls the training process
        # Also automatically logs and reports 
        # Note the `checkpoint_dir` setting
        with tf.train.MonitoredTrainingSession(checkpoint_dir="logs", \
                                               config=config_proto, \
                                               log_step_count_steps=-1) as session:
            for i in range(config.num_epochs_to_train):

                # Calculate learning rate decay
                lr_decay = config.lr_decay ** max(i + 1 - config.warmup_epochs, 0.0)
                # Set learning rate
                m.assign_lr(session, config.learning_rate * lr_decay)
                # Print new learning rate
                print("Epoch: %d Learning rate: %.3f" % (i + 1, session.run(m.lr)))
                # Train one epoch and report loss
                train_perplexity = run_epoch(session, m, do_op=m.train_op, verbose=True)
                print("Epoch: %d Train Perplexity: %.3f" % (i + 1, train_perplexity))

            # End of training
            # Evaluate test set performance
            test_perplexity = run_epoch(session, mtest)
            print("Test Perplexity: %.3f" % test_perplexity)

            # Set a seed word and generate new poem
            seed_word = '天'
            run_generator(session, mgenerate, seed_word=word_index[seed_word], config=generator_config)

## Start training
We can actually start training by calling the controller.

In [17]:
main(1)

Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.range(limit).shuffle(limit).repeat(num_epochs)`. If `shuffle=False`, omit the `.shuffle(...)`.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.from_tensor_slices(input_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs)`. If `shuffle=False`, omit the `.shuffle(...)`.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.from_tensors(tensor).repeat(num_epochs)`.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
input_ source type:  <class 'tensorflow.python.framework.ops.Tensor'>
input_ source shape:  (100, ?)
embed_inputs shape:  (100, ?, 100)
input_ source type:  <class 'tensorflow.python.framework.ops.Tensor'>
input_ sou

InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' with these attrs.  Registered devices: [CPU,XLA_CPU,XLA_GPU], Registered kernels:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]

	 [[node Train/Model/cudnn_lstm_1/CudnnRNNCanonicalToParams (defined at /opt/anaconda/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py:1251)  = CudnnRNNCanonicalToParams[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", num_params=8, rnn_mode="lstm", seed=0, seed2=0, _device="/device:GPU:0"](Train/Model/cudnn_lstm_1/CudnnRNNCanonicalToParams/num_layers, Train/Model/cudnn_lstm_1/CudnnRNNCanonicalToParams/num_units, Train/Model/cudnn_lstm_1/CudnnRNNCanonicalToParams/input_size, Train/Model/cudnn_lstm_1/random_uniform, Train/Model/cudnn_lstm_1/random_uniform_1, Train/Model/cudnn_lstm_1/random_uniform_2, Train/Model/cudnn_lstm_1/random_uniform_3, Train/Model/cudnn_lstm_1/random_uniform_4, Train/Model/cudnn_lstm_1/random_uniform_5, Train/Model/cudnn_lstm_1/random_uniform_6, Train/Model/cudnn_lstm_1/random_uniform_7, Train/Model/cudnn_lstm_1/Const, Train/Model/cudnn_lstm_1/Const_1, Train/Model/cudnn_lstm_1/Const_2, Train/Model/cudnn_lstm_1/Const_3, Train/Model/cudnn_lstm_1/Const_4, Train/Model/cudnn_lstm_1/Const_5, Train/Model/cudnn_lstm_1/Const_6, Train/Model/cudnn_lstm_1/Const_7)]]

Caused by op 'Train/Model/cudnn_lstm_1/CudnnRNNCanonicalToParams', defined at:
  File "/opt/anaconda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/anaconda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/anaconda/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/opt/anaconda/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/opt/anaconda/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 505, in start
    self.io_loop.start()
  File "/opt/anaconda/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 132, in start
    self.asyncio_loop.run_forever()
  File "/opt/anaconda/lib/python3.6/asyncio/base_events.py", line 438, in run_forever
    self._run_once()
  File "/opt/anaconda/lib/python3.6/asyncio/base_events.py", line 1451, in _run_once
    handle._run()
  File "/opt/anaconda/lib/python3.6/asyncio/events.py", line 145, in _run
    self._callback(*self._args)
  File "/opt/anaconda/lib/python3.6/site-packages/tornado/ioloop.py", line 758, in _run_callback
    ret = callback()
  File "/opt/anaconda/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/anaconda/lib/python3.6/site-packages/tornado/gen.py", line 1233, in inner
    self.run()
  File "/opt/anaconda/lib/python3.6/site-packages/tornado/gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "/opt/anaconda/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 370, in dispatch_queue
    yield self.process_one()
  File "/opt/anaconda/lib/python3.6/site-packages/tornado/gen.py", line 346, in wrapper
    runner = Runner(result, future, yielded)
  File "/opt/anaconda/lib/python3.6/site-packages/tornado/gen.py", line 1080, in __init__
    self.run()
  File "/opt/anaconda/lib/python3.6/site-packages/tornado/gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "/opt/anaconda/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 357, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "/opt/anaconda/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/opt/anaconda/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 267, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "/opt/anaconda/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/opt/anaconda/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 534, in execute_request
    user_expressions, allow_stdin,
  File "/opt/anaconda/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/opt/anaconda/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 294, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/opt/anaconda/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 536, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/opt/anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2843, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/opt/anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2869, in _run_cell
    return runner(coro)
  File "/opt/anaconda/lib/python3.6/site-packages/IPython/core/async_helpers.py", line 67, in _pseudo_sync_runner
    coro.send(None)
  File "/opt/anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3044, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "/opt/anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3215, in run_ast_nodes
    if (yield from self.run_code(code, result)):
  File "/opt/anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3291, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-17-0a902aad3aed>", line 1, in <module>
    main(1)
  File "<ipython-input-16-0d7053a87609>", line 11, in main
    m = MySeq2SeqModel(is_training=True, config=config, input_=train_input)
  File "<ipython-input-12-70e2c2bb3d52>", line 28, in __init__
    output, _ = self._build_rnn(embed_inputs, config, is_training)
  File "<ipython-input-12-70e2c2bb3d52>", line 86, in _build_rnn
    self._cell.build(inputs.get_shape())
  File "/opt/anaconda/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 352, in build
    opaque_params_t = self._canonical_to_opaque(weights, biases)
  File "/opt/anaconda/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 474, in _canonical_to_opaque
    direction=self._direction)
  File "/opt/anaconda/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 1251, in cudnn_rnn_canonical_to_opaque_params
    name=name)
  File "/opt/anaconda/lib/python3.6/site-packages/tensorflow/python/ops/gen_cudnn_rnn_ops.py", line 642, in cudnn_rnn_canonical_to_params
    name=name)
  File "/opt/anaconda/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/opt/anaconda/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/opt/anaconda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/opt/anaconda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()



We can see the training process shown here. Observe that training loss keeps decreasing, which means that the model is actually learning. 

Also, due to the speedup of CudnnLSTM, the speed can be very fast (> 100,000 w/s). Using basic LSTM can only achieve ~6,000 w/s.

If you are running this script locally, start `tensorboard` and point it to the `logs` directory to see the loss plotted over time. This is not as easy to show in the Colab environment.

You can also continue training by calling the controller again. Try this later and see whether the generated poems get better over time.

## Clear previous output

TensorFlow automatically restores the previous model if you specify a checkpoint path for the `session`. That becomes a problem if you change parts of the model, e.g. the embedding size, the LSTM size, or the number of layers.

You will see something like:
```
INFO:tensorflow:Restoring parameters from logs/model.ckpt-4465
...
InvalidArgumentError: Assign requires shapes of both tensors to match.
```
Always remember to clear the output directory when you experiment with different model structures!

In [None]:
!rm -R logs
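
If you run the notebook outside Colab, or on a system without `rm`, the same cleanup can be sketched in plain Python. This is only a minimal sketch: the helper name `clear_logs` is hypothetical, and `logs` is assumed to be the checkpoint directory used above.

```python
import os
import shutil

def clear_logs(log_dir="logs"):
    """Delete the checkpoint/summary directory if it exists.

    Returns True if a directory was removed, False if there was nothing to clear.
    """
    if os.path.isdir(log_dir):
        shutil.rmtree(log_dir)
        return True
    return False
```

Calling `clear_logs()` before rebuilding a model with a different structure should avoid the shape mismatch shown above, because no old checkpoint is left to restore.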

# Summary
What we learned today:
1. Preprocessing for language modeling data
    * Create a dictionary that maps words to unique IDs
    * Convert words to IDs
    * Reshape sequences to unified lengths
    * Create a helper to produce data
2. Building a model using tensorflow
    * Hyperparameters
    * Training operation
    * Testing operation
    * Control function
3. Training and evaluation
    * Observe loss
    * Evaluate on test set

You are now capable of building a deep learning model for a basic seq2seq task using **tensorflow**! 

However, tensorflow is extremely complicated (but powerful). There are numerous examples online for you to explore.

## Extension

Can you think of anything else that may also be learned using this model? 

# Appendix: connect your Google Drive to Colab for uploading your data

First, copy the file into Google Drive. Then run the following code to link your Drive to this notebook. Follow the link in the output.

In [None]:
from google.colab import drive
drive.mount('/gdrive')

Copy (`cp`) the file from `/gdrive` to this server.

In [None]:
!cp /gdrive/My\ Drive/Colab\ Notebooks/poetry.txt ./

In [None]:
!cp /gdrive/My\ Drive/Colab\ Notebooks/Book*.txt ./

# Part III: Attention


## Task: Translation
Hints on how to add attention to a seq2seq model in order to perform translation.
### CWMT corpus
This is a Chinese-English translation dataset.

Visit the source website to download it manually:
http://nlp.nju.edu.cn/cwmt-wmt/

Take a look at some examples:

In [None]:
import time
import numpy as np
import tensorflow as tf
from tensorflow.contrib.cudnn_rnn import CudnnLSTM
from tensorflow.contrib.rnn import BasicLSTMCell, MultiRNNCell
from tensorflow.nn import embedding_lookup, dropout

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

In [None]:
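# Read the Chinese side of the corpus, one sentence per line (assumes a UTF-8 file)
with open('Book14_cn.txt', encoding='utf-8') as f:
  c_sents = [ss.strip() for ss in f]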

In [None]:
c_sents[0]

In [None]:
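# Read the English side of the corpus, one sentence per line (assumes a UTF-8 file)
with open('Book14_en.txt', encoding='utf-8') as f:
  e_sents = [ss.strip() for ss in f]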


In [None]:
e_sents[0]

In [None]:
c_tokenizer = Tokenizer(num_words=None, lower=False, char_level=True)
# Create word to ID dictionary
c_tokenizer.fit_on_texts(c_sents)
# Get dictionary
c_word_index = c_tokenizer.word_index
# Fix word to ID
c_word_index = {c:i+1 for c, i in c_word_index.items()}
c_word_index["<PAD>"] = 0
c_word_index["<UNK>"] = 1
c_tokenizer.word_index = c_word_index
c_reverse_word_index = dict([(v, k) for (k, v) in c_word_index.items()])
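
The remapping in the cell above shifts every ID up by one so that 0 and 1 are free for `<PAD>` and `<UNK>`. The following pure-Python sketch shows the effect on a toy dictionary. The three entries are illustrative and only stand in for `Tokenizer.word_index`, which numbers tokens from 1.

```python
# Toy stand-in for Tokenizer.word_index: IDs start at 1.
# (Illustrative data, not taken from the corpus.)
word_index = {"的": 1, "不": 2, "人": 3}

# Shift every ID up by one so that 0 and 1 become free for special tokens.
shifted = {tok: idx + 1 for tok, idx in word_index.items()}
shifted["<PAD>"] = 0
shifted["<UNK>"] = 1

# Reverse mapping, used to turn predicted IDs back into text.
reverse = {idx: tok for tok, idx in shifted.items()}

print(sorted(shifted.items(), key=lambda kv: kv[1]))
# [('<PAD>', 0), ('<UNK>', 1), ('的', 2), ('不', 3), ('人', 4)]
```

Every ID stays unique after the shift, which is what makes the reverse dictionary safe to build.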

In [None]:
# Sort the word index by ID and print only the first and last few entries
# to check whether there are errors
for (w, i) in sorted(c_word_index.items(), key=lambda w: w[1]):
  if i > 10 and i < len(c_word_index)-5: continue
  print("{} {}".format(w,i))

In [None]:
e_vocab_size = 20000
e_tokenizer = Tokenizer(num_words=e_vocab_size, lower=True, oov_token="<UNK>")
# Create word to ID dictionary
e_tokenizer.fit_on_texts(e_sents)
# Get dictionary
e_word_index = e_tokenizer.word_index
# Fix word to ID
e_word_index = {e:i+1 for e, i in e_word_index.items() if i < e_vocab_size-1}
e_word_index["<PAD>"] = 0
e_word_index["<UNK>"] = 1
e_tokenizer.word_index = e_word_index
e_reverse_word_index = dict([(v, k) for (k, v) in e_word_index.items()])
# Sort the word index by ID and print only the first and last few entries
# to check whether there are errors
for (w, i) in sorted(e_word_index.items(), key=lambda w: w[1]):
  if i > 10 and i < len(e_word_index)-5: continue
  print("{} {}".format(w,i))

In [None]:
c_sents = c_tokenizer.texts_to_sequences(c_sents)
e_sents = e_tokenizer.texts_to_sequences(e_sents)
c_sents = pad_sequences(c_sents,value=c_word_index["<PAD>"], padding='post', truncating='post', maxlen=10)
e_sents = pad_sequences(e_sents,value=e_word_index["<PAD>"], padding='post', truncating='post', maxlen=10)
train_data = (c_sents[:-5], e_sents[:-5])
test_data = (c_sents[-5:], e_sents[-5:])
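
To make the effect of `padding='post'` and `truncating='post'` concrete, here is a pure-Python sketch that mimics what `pad_sequences` does with those two options. The helper `pad_post` is hypothetical and not part of Keras. It only illustrates the behaviour on small lists.

```python
def pad_post(seqs, maxlen, value=0):
    """Pad at the end with `value` and cut off anything beyond `maxlen`."""
    padded = []
    for seq in seqs:
        seq = list(seq)[:maxlen]                             # truncating='post'
        padded.append(seq + [value] * (maxlen - len(seq)))   # padding='post'
    return padded

print(pad_post([[5, 6, 7], [2, 3, 4, 5, 6, 7]], maxlen=5))
# [[5, 6, 7, 0, 0], [2, 3, 4, 5, 6]]
```

With `maxlen=10` as in the cell above, any sentence longer than 10 tokens loses its tail, so it may be worth checking the sentence length distribution before settling on that value.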

In [None]:
# First source (Chinese) sentence and its target (English) sentence
train_data[0][0], train_data[1][0]

### Change hyperparameters
* Add separate vocabulary sizes for English and Chinese

In [None]:
# Define hyperparameters
class Hparam(object):
  # ...
  source_vocab_size = len(c_word_index)
  target_vocab_size = len(e_word_index)
  # ...

### Prepare input for translation

* Modify `class PoemInput(object)` to create different source and target sentences. Most importantly, change the final part.
* Use `pad_sequences` to pad both Chinese and English sentences



In [None]:
class TranslationInput(object):
  def __init__(self, data, config, name=None):
    self.batch_size = batch_size = config.batch_size
    self.num_steps = config.num_steps
    self.sources, self.targets = self.input_producer(
        data, batch_size, name=name)

  def input_producer(self, raw_data, batch_size, name=None):
    source_data = tf.convert_to_tensor(raw_data[0], name="source_data", dtype=tf.int32)
    target_data = tf.convert_to_tensor(raw_data[1], name="target_data", dtype=tf.int32)

    num_batches = len(raw_data[0]) // self.batch_size
    i = tf.train.range_input_producer(num_batches, shuffle=False).dequeue()
    x = source_data[i*self.batch_size:(i+1)*self.batch_size, :]
    y = target_data[i*self.batch_size:(i+1)*self.batch_size, :]
    return x, y
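
The slicing logic in `input_producer` can be checked without a TensorFlow session. The sketch below reproduces it in plain Python: the same `num_batches = len(data) // batch_size` computation and the same row slices for batch `i`. The generator name `iter_batches` and the toy data are hypothetical.

```python
def iter_batches(sources, targets, batch_size):
    """Yield aligned (source, target) batches; a trailing partial batch is dropped."""
    num_batches = len(sources) // batch_size
    for i in range(num_batches):
        start, end = i * batch_size, (i + 1) * batch_size
        yield sources[start:end], targets[start:end]

sources = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
targets = [[11], [12], [13], [14], [15]]
batches = list(iter_batches(sources, targets, batch_size=2))
print(len(batches))   # 2
print(batches[1])     # ([[5, 6], [7, 8]], [[13], [14]])
```

Note that the fifth sentence pair never appears in a batch. The integer division drops examples that do not fill a complete batch, and the same holds for `TranslationInput`.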

## Build Translation Model
This is also a seq2seq model, with some major differences:
* Use one LSTM as the encoder
* Add another as the decoder

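```python
def _build_rnn_encoder(self, inputs, config, name="encoder"):
    # CudnnLSTM requires time-major input:
    # (batch_size, num_steps, embedding_size) -> (num_steps, batch_size, embedding_size)
    inputs = tf.transpose(inputs, [1, 0, 2])
    self._enccell = CudnnLSTM(
        num_layers=config.num_layers,
        num_units=config.hidden_size,
        name=name)
    # outputs are still time-major here; transpose them to batch-major
    # before using them as enc_output in the attention snippet below
    outputs, state = self._enccell(inputs)
    return outputs, state

def _build_rnn_decoder(self, inputs, config, name="decoder"):
    # inputs are expected to be time-major already
    self._deccell = CudnnLSTM(
        num_layers=config.num_layers,
        num_units=config.hidden_size,
        name=name)
    # Hint: to condition the decoder on the source sentence, the encoder
    # state can be passed here via the initial_state argument
    outputs, state = self._deccell(inputs)
    # Transpose from time-major to batch-major
    outputs = tf.transpose(outputs, [1, 0, 2])
    return outputs, state
```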

### Modify training controller

Use `TranslationInput` and `MyTranslationModel`.

```python
train_input = TranslationInput(train_data, config, name="TrainInput")

m = MyTranslationModel(is_training=True, config=config, input_=train_input)

```

### Snippet for adding attention mechanism in the model
* Calculate attention score
* Normalize score
* Calculate context vector = *attention weighted sum*
* Concatenate context vector with input
* Use decoder to decode next step

In [None]:
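# Require:
# hidden: decoder hidden (memory)
# enc_output: encoder output
# W1, W2, V: dense layers, not defined in this snippet;
#            V has a single output unit (see the shapes below)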
        
# hidden shape == (batch_size, hidden size)
# hidden_with_time_axis shape == (batch_size, 1, hidden size)
# we are doing this to perform addition to calculate the score
hidden_with_time_axis = tf.expand_dims(hidden, 1)

# enc_output shape == (batch_size, max_length, hidden_size)
# score shape == (batch_size, max_length, hidden_size)
score = tf.nn.tanh(W1(enc_output) + W2(hidden_with_time_axis))

# attention_weights shape == (batch_size, max_length, 1)
# we get 1 at the last axis because we are applying score to V
attention_weights = tf.nn.softmax(V(score), axis=1)

# context_vector shape after sum == (batch_size, hidden_size)
context_vector = attention_weights * enc_output
context_vector = tf.reduce_sum(context_vector, axis=1)

# x shape after passing through embedding == (batch_size, 1, embedding_dim)
x = self.embedding(input_)

# x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

# passing the concatenated vector to the decoder
output, state = self.decoder(x)

# output shape == (batch_size * 1, hidden_size)
output = tf.reshape(output, (-1, output.shape[2]))

# output shape == (batch_size * 1, vocab)
x = SoftmaxLayer(output)

# Output:
# x: decoder output
# state: decoder state
# attention_weights: weights over encoder output at one time
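
To check the shapes in the snippet above without building the whole model, here is a NumPy sketch of a single attention step. It is only an illustration under assumptions. `W1`, `W2` and `V` are random matrices standing in for the dense layers (biases omitted), the sizes are arbitrary, and `attn_units` is a hypothetical name for the output size of `W1` and `W2`.

```python
import numpy as np

rng = np.random.RandomState(0)
batch_size, max_length, hidden_size, attn_units = 2, 4, 3, 5

enc_output = rng.normal(size=(batch_size, max_length, hidden_size))
hidden = rng.normal(size=(batch_size, hidden_size))

# Random matrices standing in for the dense layers W1, W2 and V (no bias).
W1 = rng.normal(size=(hidden_size, attn_units))
W2 = rng.normal(size=(hidden_size, attn_units))
V = rng.normal(size=(attn_units, 1))

# (batch_size, 1, hidden_size): the decoder state broadcasts over all encoder steps
hidden_with_time_axis = hidden[:, np.newaxis, :]

# score shape == (batch_size, max_length, attn_units)
score = np.tanh(enc_output @ W1 + hidden_with_time_axis @ W2)

# logits shape == (batch_size, max_length, 1); softmax over the time axis
logits = score @ V
logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
exp_logits = np.exp(logits)
attention_weights = exp_logits / exp_logits.sum(axis=1, keepdims=True)

# context_vector shape == (batch_size, hidden_size)
context_vector = (attention_weights * enc_output).sum(axis=1)

print(attention_weights.shape, context_vector.shape)
# (2, 4, 1) (2, 3)
```

The weights sum to 1 over the `max_length` axis for every example in the batch, so the context vector is a weighted average of the encoder outputs.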