# JokeNet
This notebook outlines a Recurrent Neural Net using LSTM cells to make jokes.

For a comprehensive intro to Recurrent Neural Nets and a few fun examples, see this blog post by Andrej Karpathy, Tesla's AI lead: https://karpathy.github.io/2015/05/21/rnn-effectiveness/

The training data is an assortment from reddit, wocka.com and stupidstuff.org.

It's also worth noting that most of this code comes from the [TensorFlow tutorial on RNNs](https://www.tensorflow.org/tutorials/recurrent) and is based on the code in tf_models.

For further information about the algorithm and its implementation, see here:
> (Zaremba, et al.) Recurrent Neural Network Regularization
> http://arxiv.org/abs/1409.2329


Disclaimer: these jokes are scraped from the internet and so some may be offensive.

In [1]:
import time
import json

import numpy as np
import tensorflow as tf

import util

from tensorflow.python.client import device_lib



### (Re)Scrape the Dataset

Scraping the dataset occurs in the `jokes_dataset` folder. Re-run the scraping with the commands:
```
python reddit.py
python wocka.py
python stupidstuff.py
```

In [2]:
from lxml import html
import requests
import logging
import re


def extract_joke(id):
    """Download and parse a single joke."""
    
    re_category_rating = re.compile(r"\s*Category: (.*[A-Za-z])\s*Rating: (.*\d)\s*")

    url_base = "http://stupidstuff.org/jokes/joke.htm?jokeid={}"
    response = requests.get(url_base.format(id))

    tree = html.fromstring(response.content)
    content = tree.xpath('//table[@bgcolor="#ffffff" and @width="470"]//table[@class="scroll"]//td')[0]
    category_rating_cells = content.xpath('//table[@bgcolor="#ffffff"]//table[@class="bkline"]//td/b[text()="Category: "]/..')
    #print(category_rating_cell)
    
    # all html nodes in content, but not plaintext
    crap = content.xpath('./child::node()[not(self::text()) and not(self::br)]') 
    
    for node in crap:
        content.remove(node)

    body_text = content.text_content().strip()
    joke_body = body_text

    cell_text = category_rating_cells[0].text_content()

    match = re_category_rating.search(cell_text)
    category = match.group(1)
    rating = float(match.group(2))

    return joke_body, category, rating


def save_stupid_stuff_jokes():
    """Parse and save all jokes to stupidstuff.json file"""
    jokes = []

    save_frequency = 100 # save after every 100 IDs
    max_id = 3773
    for id in range(1, max_id+1): #19000
        try:
            body, category, rating = extract_joke(id)

            joke = {"id": id, "category": category, "body": body, "rating": rating}
            jokes.append(joke)
            print("ID {} success: [{}]".format(id, category))
        except Exception as ex:
            print("ID {} failed: ".format(id))
            logging.error(ex)
            raise ex

        if id % save_frequency == 0 or id == max_id:
            with open("stupidstuff.json", "w") as f:
                json.dump(jokes, f, indent=4, sort_keys=True)
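
As a quick check of the parsing step, here is a minimal sketch that runs the category/rating pattern on a made-up cell string. The sample text is an assumption for illustration only (it is not scraped output), and the letter class is written out as `[A-Za-z]`.

```python
import re

# Same shape as the pattern in extract_joke, with an explicit letter class.
re_category_rating = re.compile(r"\s*Category: (.*[A-Za-z])\s*Rating: (.*\d)\s*")

# Hypothetical cell text; the real page layout may differ.
cell_text = "  Category: Blonde Jokes   Rating: 3.5  "

match = re_category_rating.search(cell_text)
category = match.group(1)       # text between "Category: " and "Rating:"
rating = float(match.group(2))  # numeric rating as a float
print(category, rating)
```

If the page layout changes, `search` returns `None`, so checking `match` before calling `group` would be a reasonable safeguard.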

# Data Preprocessing functions

The data is stored as JSON objects whose fields differ from source to source:
    
reddit_jokes.json
```json
{
    "title": "My boss said to me, \"you're the worst train driver ever. How many have you derailed this year?\"",
    "body": "I said, \"I'm not sure; it's hard to keep track.\"",
    "id": "5tyytx",
    "score": 3
}
```

stupidstuff.json
```json
{
    "category": "Blonde Jokes",
    "body": "A blonde is walking down the street with her blouse open, exposing one of her breasts. A nearby policeman approaches her and remarks, \"Ma'am, are you aware that I could cite you for indecent exposure?\" \"Why, officer?\" asks the blonde. \"Because your blouse is open and your breast is exposed.\" \"Oh my goodness,\" exclaims the blonde, \"I must have left my baby on the bus!\"",
    "id": 14,
    "rating": 3.5
}
```

wocka.json
```json
{
    "title": "Infants vs Adults",
    "body": "Do infants enjoy infancy as much as adults enjoy adultery?",
    "category": "One Liners",
    "id": 17
}

```

The data must be shuffled and then split into training, testing and validation sets.
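
The shuffle-and-split step can be sketched on its own as follows. This is an illustration rather than the notebook's code: `split_jokes`, its fraction arguments and the `seed` parameter are hypothetical names, and the 70/15/15 proportions are taken from the cell below.

```python
import random


def split_jokes(jokes, train_frac=0.7, test_frac=0.15, seed=None):
    """Shuffle a copy of `jokes` and cut it into train/test/validate lists."""
    shuffled = list(jokes)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    train_end = int(train_frac * n)
    test_end = int((train_frac + test_frac) * n)
    # Contiguous slices: every joke lands in exactly one split.
    return shuffled[:train_end], shuffled[train_end:test_end], shuffled[test_end:]


all_jokes = ["joke %d" % i for i in range(20)]
train, test, validate = split_jokes(all_jokes, seed=0)
print(len(train), len(test), len(validate))
```

Using slice boundaries that meet exactly (`[:a]`, `[a:b]`, `[b:]`) avoids silently dropping an item at each boundary.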

In [3]:
from random import shuffle
import os


def read_reddit(filename):
    '''Parse reddit jokes which often include the title into the joke.'''
    jokes = []

    with open(filename,"r") as f:
        data = json.loads(f.read())

    for joke in data:
        jokes.append(joke["title"] + "\n" + joke["body"])
    
    return jokes

def read_other(filename):
    '''Parse other jokes from stupidstuff or wocka'''
    jokes = []

    with open(filename,"r") as f:
        data = json.loads(f.read())

    for joke in data:
        jokes.append(joke["body"])
    
    return jokes


def parse_raw_data(data_path=""):
    """Collate data objects and randomly shuffle into train, test, validate splits"""
    
    jokes = []

    jokes += read_reddit(os.path.join(data_path, "reddit_jokes.json"))
    jokes += read_other(os.path.join(data_path, "wocka.json"))
    jokes += read_other(os.path.join(data_path, "stupidstuff.json"))

    shuffle(jokes)
    
    jokes_length = len(jokes)
    train_end, test_end = int(0.7*jokes_length), int(0.85*jokes_length)
    # Contiguous slices, so no joke is dropped at the split boundaries.
    train, test, validate = jokes[:train_end], \
                            jokes[train_end:test_end], \
                            jokes[test_end:]

    with open("rnn_data/train.txt","w") as f:
        for line in train:
            f.write(line + "\n\n")
    
    with open("rnn_data/test.txt","w") as f:
        for line in test:
            f.write(line + "\n\n")
            
    with open("rnn_data/validate.txt","w") as f:
        for line in validate:
            f.write(line + "\n\n")
    
            
# parse_raw_data(data_path="rnn_data") # uncomment if you want to reshuffle the data

## Further processing

The data is then turned into lists of numbers representing the words. These numbers can then be trained on by the RNN.
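
To make the word-to-number step concrete, here is a small sketch on a toy sentence. It mirrors the frequency-sorted vocabulary idea used in the cell below, but `build_vocab` and the sample text are illustrative assumptions rather than part of the notebook.

```python
import collections


def build_vocab(words, max_size=None):
    """Map each word to an id, most frequent first (ties broken alphabetically)."""
    counter = collections.Counter(words)
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))
    return {word: i for i, (word, _) in enumerate(count_pairs[:max_size])}


text = "why did the chicken cross the road <eos> the end <eos>"
words = text.split()
word_to_id = build_vocab(words)

# Words outside the vocabulary are skipped, as in _file_to_word_ids.
ids = [word_to_id[w] for w in words if w in word_to_id]
print(ids)  # [7, 4, 0, 2, 3, 0, 6, 1, 0, 5, 1]
```

The most frequent token (`the`) gets id 0, so capping the vocabulary with `max_size` keeps the common words and drops the rare ones.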

In [4]:
import collections
import sys


def _read_words(filename):
    with tf.gfile.GFile(filename, "r") as f:
        return f.read().replace("\n\n", "<eos>").replace("\r", "").split()


def _build_vocab(filename, maxlen=None):  # None keeps every word; -1 would drop the last one
    data = _read_words(filename)

    counter = collections.Counter(data)
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))

    words, _ = list(zip(*count_pairs))
    word_to_id = dict(zip(words[0:maxlen], range(len(words[0:maxlen]))))

    return word_to_id


def get_vocab(filename):
    data = _read_words(filename)

    counter = collections.Counter(data)
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))

    words, _ = list(zip(*count_pairs))

    return words


def _file_to_word_ids(filename, word_to_id):
    data = _read_words(filename)
    return [word_to_id[word] for word in data if word in word_to_id]


def ptb_raw_data(data_path=None):
    """Load PTB raw data from data directory "data_path".

    Reads PTB text files, converts strings to integer ids,
    and performs mini-batching of the inputs.

    The PTB dataset comes from Tomas Mikolov's webpage:

    http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz

    Args:
    data_path: string path to the directory where simple-examples.tgz has
      been extracted.

    Returns:
    tuple (train_data, valid_data, test_data, vocabulary)
    where each of the data objects can be passed to PTBIterator.
    """

    train_path = os.path.join(data_path, "train.txt")
    valid_path = os.path.join(data_path, "validate.txt")
    test_path = os.path.join(data_path, "test.txt")

    word_to_id = _build_vocab(train_path, 9999)
    train_data = _file_to_word_ids(train_path, word_to_id)
    valid_data = _file_to_word_ids(valid_path, word_to_id)
    test_data = _file_to_word_ids(test_path, word_to_id)
    vocabulary = len(word_to_id)
    return train_data, valid_data, test_data, vocabulary



def ptb_producer(raw_data, batch_size, num_steps, name=None):
    """Iterate on the raw PTB data.

    This chunks up raw_data into batches of examples and returns Tensors that
    are drawn from these batches.

    Args:
    raw_data: one of the raw data outputs from ptb_raw_data.
    batch_size: int, the batch size.
    num_steps: int, the number of unrolls.
    name: the name of this operation (optional).

    Returns:
    A pair of Tensors, each shaped [batch_size, num_steps]. The second element
    of the tuple is the same data time-shifted to the right by one.

    Raises:
    tf.errors.InvalidArgumentError: if batch_size or num_steps are too high.
    """
    with tf.name_scope(name, "PTBProducer", [raw_data, batch_size, num_steps]):
        raw_data = tf.convert_to_tensor(raw_data, name="raw_data", dtype=tf.int32)

    data_len = tf.size(raw_data)
    batch_len = data_len // batch_size
    data = tf.reshape(raw_data[0 : batch_size * batch_len],
                      [batch_size, batch_len])

    epoch_size = (batch_len - 1) // num_steps
    assertion = tf.assert_positive(
        epoch_size,
        message="epoch_size == 0, decrease batch_size or num_steps")
    with tf.control_dependencies([assertion]):
        epoch_size = tf.identity(epoch_size, name="epoch_size")

    i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue()
    x = tf.strided_slice(data, [0, i * num_steps],
                         [batch_size, (i + 1) * num_steps])
    x.set_shape([batch_size, num_steps])
    y = tf.strided_slice(data, [0, i * num_steps + 1],
                         [batch_size, (i + 1) * num_steps + 1])
    y.set_shape([batch_size, num_steps])
    return x, y
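
The batching logic of `ptb_producer` is easier to follow without the TensorFlow queue machinery. The NumPy sketch below reproduces the same slicing arithmetic under that simplification; `batch_windows` is a hypothetical helper, not something the notebook defines.

```python
import numpy as np


def batch_windows(raw_data, batch_size, num_steps):
    """Yield (x, y) pairs shaped [batch_size, num_steps]; y is x shifted by one word."""
    raw_data = np.asarray(raw_data, dtype=np.int32)
    batch_len = len(raw_data) // batch_size
    # Lay the word stream out as batch_size parallel rows.
    data = raw_data[:batch_size * batch_len].reshape(batch_size, batch_len)
    epoch_size = (batch_len - 1) // num_steps
    if epoch_size <= 0:
        raise ValueError("epoch_size == 0, decrease batch_size or num_steps")
    for i in range(epoch_size):
        x = data[:, i * num_steps:(i + 1) * num_steps]
        y = data[:, i * num_steps + 1:(i + 1) * num_steps + 1]
        yield x, y


batches = list(batch_windows(range(20), batch_size=2, num_steps=3))
first_x, first_y = batches[0]
print(first_x)  # [[ 0  1  2] [10 11 12]]
print(first_y)  # [[ 1  2  3] [11 12 13]]
```

Each target row is the input row moved one position along the stream, which is what lets the model learn to predict the next word.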




# Creating the RNN Model

TensorFlow wraps up the math for us, but we have to wade through a multitude of `tf.this.that.theother` calls. Don't let this scare you, though: an entire neural network can be written in [11 lines of Python without TensorFlow](https://iamtrask.github.io/2015/07/12/basic-python-network/). TensorFlow lets us do fancier things faster, or at least that's the idea.

There are 3 supported model configurations:

| config | epochs | train | valid  | test   |
|--------|--------|-------|--------|--------|
| small  | 13     | 37.99 | 121.39 | 115.91 |
| medium | 39     | 48.45 |  86.16 |  82.07 |
| large  | 55     | 37.87 |  82.62 |  78.29 |

Each configuration fixes how we train the model: how long (epochs), how fast (learning rate) and the remaining hyperparameters. The train, valid and test columns are perplexities, so lower is better. In the table above, the larger configurations train for more epochs and reach lower validation and test perplexity.
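
Perplexity is the exponential of the average negative log-probability the model assigns to the correct next word, which is why `run_epoch` further down returns `np.exp(costs / iters)`. A minimal sketch, using made-up probabilities and a hypothetical `perplexity` helper:

```python
import numpy as np


def perplexity(word_probs):
    """Perplexity = exp(mean negative log-probability of the correct words)."""
    word_probs = np.asarray(word_probs, dtype=np.float64)
    return float(np.exp(-np.mean(np.log(word_probs))))


# A model that guesses uniformly over a 10000-word vocabulary.
uniform = perplexity([1.0 / 10000] * 5)

# A model that is fairly confident about each correct word.
confident = perplexity([0.5, 0.25, 0.5])

print(uniform, confident)
```

A uniform guess over the vocabulary gives a perplexity equal to the vocabulary size, so any value well below 10000 means the model has learned something.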

The hyperparameters used in this model:
- init_scale - the initial scale of the weights
- learning_rate - the initial value of the learning rate
- max_grad_norm - the maximum permissible norm of the gradient
- num_layers - the number of LSTM layers
- num_steps - the number of unrolled steps of LSTM
- hidden_size - the number of LSTM units
- max_epoch - the number of epochs trained with the initial learning rate
- max_max_epoch - the total number of epochs for training
- keep_prob - the probability of keeping each activation in the dropout layers
- lr_decay - the decay of the learning rate for each epoch after "max_epoch"
- batch_size - the batch size
- rnn_mode - the low level implementation of lstm cell: one of CUDNN, BASIC, or BLOCK, representing cudnn_lstm, basic_lstm, and lstm_block_cell classes.
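
To see how `learning_rate`, `lr_decay`, `max_epoch` and `max_max_epoch` interact, the sketch below evaluates the same decay expression that the training loop uses, with the values from the small config. `learning_rate_schedule` is a hypothetical helper written for this illustration.

```python
def learning_rate_schedule(learning_rate, lr_decay, max_epoch, max_max_epoch):
    """Learning rate per epoch: constant for max_epoch epochs, then geometric decay."""
    return [learning_rate * lr_decay ** max(i + 1 - max_epoch, 0)
            for i in range(max_max_epoch)]


# Values taken from SmallConfig.
schedule = learning_rate_schedule(learning_rate=1.0, lr_decay=0.5,
                                  max_epoch=4, max_max_epoch=13)
for epoch, lr in enumerate(schedule, start=1):
    print("Epoch %2d: learning rate %.6f" % (epoch, lr))
```

The rate stays at 1.0 for the first four epochs and is halved every epoch after that.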

In [5]:
flags = tf.flags
logging = tf.logging

flags.DEFINE_string('model', 'small',
                    'A type of model. Possible options are: small, medium, large.'
                    )
flags.DEFINE_string('data_path', None,
                    'Where the training/test data is stored.')
flags.DEFINE_string('save_path', None, 'Model output directory.')
flags.DEFINE_bool('use_fp16', False,
                  'Train using 16-bit floats instead of 32bit floats')
flags.DEFINE_integer('num_gpus', 1,
                     'If larger than 1, Grappler AutoParallel optimizer will create multiple training replicas with each GPU running one replica.'
                     )

flags.DEFINE_string('rnn_mode', None,
                    'The low level implementation of lstm cell: one of CUDNN, BASIC, and BLOCK, representing cudnn_lstm, basic_lstm, and lstm_block_cell classes.'
                    )

FLAGS = flags.FLAGS
BASIC = 'basic'
CUDNN = 'cudnn'
BLOCK = 'block'

In [6]:
def data_type():
    return tf.float32 #(tf.float16 if FLAGS.use_fp16 else tf.float32)


class PTBInput(object):

    """The input data."""

    def __init__(
        self,
        config,
        data,
        name=None,
        ):
        self.batch_size = batch_size = config.batch_size
        self.num_steps = num_steps = config.num_steps
        self.epoch_size = (len(data) // batch_size - 1) // num_steps
        (self.input_data, self.targets) = ptb_producer(data,
                batch_size, num_steps, name=name)


class PTBModel(object):

    """The PTB model."""

    def __init__(
        self,
        is_training,
        config,
        input_,
        ):
        self._is_training = is_training
        self._input = input_
        self._rnn_params = None
        self._cell = None
        self.batch_size = input_.batch_size
        self.num_steps = input_.num_steps
        size = config.hidden_size
        vocab_size = config.vocab_size

        with tf.device('/cpu:0'):
            embedding = tf.get_variable('embedding', [vocab_size,
                    size], dtype=tf.float32) #data_type()
            inputs = tf.nn.embedding_lookup(embedding,
                    input_.input_data)

        if is_training and config.keep_prob < 1:
            inputs = tf.nn.dropout(inputs, config.keep_prob)

        (output, state) = self._build_rnn_graph(inputs, config,
                is_training)

        softmax_w = tf.get_variable('softmax_w', [size, vocab_size],
                                    dtype=tf.float32) #data_type()
        softmax_b = tf.get_variable('softmax_b', [vocab_size],
                                    dtype=tf.float32) #data_type()
        logits = tf.nn.xw_plus_b(output, softmax_w, softmax_b)

     # Reshape logits to be a 3-D tensor for sequence loss

        logits = tf.reshape(logits, [self.batch_size, self.num_steps,
                            vocab_size])
        
        self._output_probs = tf.nn.softmax(logits)

    # Use the contrib sequence loss and average over the batches

        loss = tf.contrib.seq2seq.sequence_loss(logits, input_.targets,
                tf.ones([self.batch_size, self.num_steps],
                dtype=data_type()), average_across_timesteps=False,
                average_across_batch=True)

    # Update the cost

        self._cost = tf.reduce_sum(loss)
        self._final_state = state

        if not is_training:
            return

        self._lr = tf.Variable(0.0, trainable=False)
        tvars = tf.trainable_variables()
        (grads, _) = tf.clip_by_global_norm(tf.gradients(self._cost,
                tvars), config.max_grad_norm)
        optimizer = tf.train.GradientDescentOptimizer(self._lr)
        self._train_op = optimizer.apply_gradients(zip(grads, tvars),
                global_step=tf.train.get_or_create_global_step())

        self._new_lr = tf.placeholder(tf.float32, shape=[],
                name='new_learning_rate')
        self._lr_update = tf.assign(self._lr, self._new_lr)

    def _build_rnn_graph(
        self,
        inputs,
        config,
        is_training,
        ):
        if config.rnn_mode == CUDNN:
            return self._build_rnn_graph_cudnn(inputs, config,
                    is_training)
        else:
            return self._build_rnn_graph_lstm(inputs, config,
                    is_training)

    def _build_rnn_graph_cudnn(
        self,
        inputs,
        config,
        is_training,
        ):
        """Build the inference graph using CUDNN cell."""

        inputs = tf.transpose(inputs, [1, 0, 2])
        self._cell = \
            tf.contrib.cudnn_rnn.CudnnLSTM(num_layers=config.num_layers,
                num_units=config.hidden_size,
                input_size=config.hidden_size, dropout=(1
                - config.keep_prob if is_training else 0))
        params_size_t = self._cell.params_size()
        self._rnn_params = tf.get_variable('lstm_params',
                initializer=tf.random_uniform([params_size_t],
                -config.init_scale, config.init_scale),
                validate_shape=False)
        c = tf.zeros([config.num_layers, self.batch_size,
                     config.hidden_size], tf.float32)
        h = tf.zeros([config.num_layers, self.batch_size,
                     config.hidden_size], tf.float32)
        self._initial_state = (tf.contrib.rnn.LSTMStateTuple(h=h, c=c),
                               )
        (outputs, h, c) = self._cell(inputs, h, c, self._rnn_params,
                is_training)
        outputs = tf.transpose(outputs, [1, 0, 2])
        outputs = tf.reshape(outputs, [-1, config.hidden_size])
        return (outputs, (tf.contrib.rnn.LSTMStateTuple(h=h, c=c), ))

    def _get_lstm_cell(self, config, is_training):
        if config.rnn_mode == BASIC:
            return tf.contrib.rnn.BasicLSTMCell(config.hidden_size,
                    forget_bias=0.0, state_is_tuple=True,
                    reuse=not is_training)
        if config.rnn_mode == BLOCK:
            return tf.contrib.rnn.LSTMBlockCell(config.hidden_size,
                    forget_bias=0.0)
        raise ValueError('rnn_mode %s not supported' % config.rnn_mode)

    def _build_rnn_graph_lstm(
        self,
        inputs,
        config,
        is_training,
        ):
        """Build the inference graph using canonical LSTM cells."""

    # Slightly better results can be obtained with forget gate biases
    # initialized to 1 but the hyperparameters of the model would need to be
    # different than reported in the paper.

        def make_cell():
            cell = self._get_lstm_cell(config, is_training)
            if is_training and config.keep_prob < 1:
                cell = tf.contrib.rnn.DropoutWrapper(cell,
                        output_keep_prob=config.keep_prob)
            return cell

        cell = tf.contrib.rnn.MultiRNNCell([make_cell() for _ in
                range(config.num_layers)], state_is_tuple=True)

        self._initial_state = cell.zero_state(config.batch_size,
                data_type())
        state = self._initial_state

    # Simplified version of tf.nn.static_rnn().
    # This builds an unrolled LSTM for tutorial purposes only.
    # In general, use tf.nn.static_rnn() or tf.nn.static_state_saving_rnn().
    #
    # The alternative version of the code below is:
    #
    # inputs = tf.unstack(inputs, num=self.num_steps, axis=1)
    # outputs, state = tf.nn.static_rnn(cell, inputs,
    #                                   initial_state=self._initial_state)

        outputs = []
        with tf.variable_scope('RNN'):
            for time_step in range(self.num_steps):
                if time_step > 0:
                    tf.get_variable_scope().reuse_variables()
                (cell_output, state) = cell(inputs[:, time_step, :],
                        state)
                outputs.append(cell_output)
        output = tf.reshape(tf.concat(outputs, 1), [-1,
                            config.hidden_size])
        return (output, state)

    def assign_lr(self, session, lr_value):
        session.run(self._lr_update, feed_dict={self._new_lr: lr_value})

    def export_ops(self, name):
        """Exports ops to collections."""

        self._name = name
        ops = {util.with_prefix(self._name, 'cost'): self._cost}
        if self._is_training:
            ops.update(lr=self._lr, new_lr=self._new_lr,
                       lr_update=self._lr_update)
            if self._rnn_params is not None:
                ops.update(rnn_params=self._rnn_params)
        for (name, op) in ops.items():
            tf.add_to_collection(name, op)
        self._initial_state_name = util.with_prefix(self._name,
                'initial')
        self._final_state_name = util.with_prefix(self._name, 'final')
        util.export_state_tuples(self._initial_state,
                                 self._initial_state_name)
        util.export_state_tuples(self._final_state,
                                 self._final_state_name)

    def import_ops(self):
        """Imports ops from collections."""

        if self._is_training:
            self._train_op = tf.get_collection_ref('train_op')[0]
            self._lr = tf.get_collection_ref('lr')[0]
            self._new_lr = tf.get_collection_ref('new_lr')[0]
            self._lr_update = tf.get_collection_ref('lr_update')[0]
            rnn_params = tf.get_collection_ref('rnn_params')
            if self._cell and rnn_params:
                params_saveable = \
                    tf.contrib.cudnn_rnn.RNNParamsSaveable(self._cell,
                        self._cell.params_to_canonical,
                        self._cell.canonical_to_params, rnn_params,
                        base_variable_scope='Model/RNN')
                tf.add_to_collection(tf.GraphKeys.SAVEABLE_OBJECTS,
                        params_saveable)
        self._cost = tf.get_collection_ref(util.with_prefix(self._name,
                'cost'))[0]
        num_replicas = (FLAGS.num_gpus if self._name == 'Train' else 1)
        self._initial_state = \
            util.import_state_tuples(self._initial_state,
                self._initial_state_name, num_replicas)
        self._final_state = util.import_state_tuples(self._final_state,
                self._final_state_name, num_replicas)

    @property
    def input(self):
        return self._input

    @property
    def initial_state(self):
        return self._initial_state

    @property
    def cost(self):
        return self._cost

    @property
    def final_state(self):
        return self._final_state

    @property
    def lr(self):
        return self._lr

    @property
    def train_op(self):
        return self._train_op

    @property
    def initial_state_name(self):
        return self._initial_state_name

    @property
    def final_state_name(self):
        return self._final_state_name
    
    @property
    def output_probs(self):
        return self._output_probs
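
The training branch of `PTBModel.__init__` clips gradients with `tf.clip_by_global_norm` before applying them. As a rough illustration of what that clipping does, here is a NumPy sketch; `clip_by_global_norm` below is a stand-alone re-implementation written for this example, not the TensorFlow function itself.

```python
import numpy as np


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients by max_norm / max(global_norm, max_norm)."""
    global_norm = np.sqrt(sum(np.sum(np.square(g)) for g in grads))
    scale = max_norm / max(global_norm, max_norm)
    return [g * scale for g in grads], global_norm


grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm = clip_by_global_norm(grads, max_norm=5.0)

# Norm of the clipped gradients, computed the same way.
clipped_norm = np.sqrt(sum(np.sum(np.square(g)) for g in clipped))
print(norm, clipped_norm)
```

All gradients are scaled by the same factor, so their direction is preserved and only the overall step size is capped at `max_grad_norm`.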

In [7]:
class SmallConfig(object):
    """Small config."""
    init_scale = 0.1
    learning_rate = 1.0
    max_grad_norm = 5
    num_layers = 2
    num_steps = 20
    hidden_size = 200
    max_epoch = 4
    max_max_epoch = 13
    keep_prob = 1.0
    lr_decay = 0.5
    batch_size = 20
    vocab_size = 10000
    rnn_mode = BLOCK


class MediumConfig(object):
    """Medium config."""
    init_scale = 0.05
    learning_rate = 1.0
    max_grad_norm = 5
    num_layers = 2
    num_steps = 35
    hidden_size = 650
    max_epoch = 6
    max_max_epoch = 39
    keep_prob = 0.5
    lr_decay = 0.8
    batch_size = 20
    vocab_size = 10000
    rnn_mode = BLOCK


class LargeConfig(object):
    """Large config."""
    init_scale = 0.04
    learning_rate = 1.0
    max_grad_norm = 10
    num_layers = 2
    num_steps = 35
    hidden_size = 1500
    max_epoch = 14
    max_max_epoch = 55
    keep_prob = 0.35
    lr_decay = 1 / 1.15
    batch_size = 20
    vocab_size = 10000
    rnn_mode = BLOCK


class TestConfig(object):
    """Tiny config, for testing."""
    init_scale = 0.1
    learning_rate = 1.0
    max_grad_norm = 1
    num_layers = 1
    num_steps = 2
    hidden_size = 2
    max_epoch = 1
    max_max_epoch = 1
    keep_prob = 1.0
    lr_decay = 0.5
    batch_size = 20
    vocab_size = 10000
    rnn_mode = BLOCK

In [8]:
def run_epoch(
    session,
    model,
    eval_op=None,
    verbose=False,
    ):
    """Runs the model on the given data."""

    start_time = time.time()
    costs = 0.0
    iters = 0
    state = session.run(model.initial_state)

    fetches = {'cost': model.cost, 'final_state': model.final_state}
    if eval_op is not None:
        fetches['eval_op'] = eval_op

    for step in range(model.input.epoch_size):
        feed_dict = {}
        for (i, (c, h)) in enumerate(model.initial_state):
            feed_dict[c] = state[i].c
            feed_dict[h] = state[i].h

        vals = session.run(fetches, feed_dict)
        cost = vals['cost']
        state = vals['final_state']

        costs += cost
        iters += model.input.num_steps

        if verbose and step % (model.input.epoch_size // 10) == 10:
            print('%.3f perplexity: %.3f speed: %.0f wps' % (step * 1.0
                  / model.input.epoch_size, np.exp(costs / iters),
                  iters * model.input.batch_size * max(1,
                  FLAGS.num_gpus) / (time.time() - start_time)))

    return np.exp(costs / iters)


def get_config():
    """Get model config."""

    config = None
    if FLAGS.model == 'small':
        config = SmallConfig()
    elif FLAGS.model == 'medium':
        config = MediumConfig()
    elif FLAGS.model == 'large':
        config = LargeConfig()
    elif FLAGS.model == 'test':
        config = TestConfig()
    elif FLAGS.model == 'generate':
        config = SmallGenConfig()
    else:
        raise ValueError('Invalid model: %s' % FLAGS.model)
    if FLAGS.rnn_mode:
        config.rnn_mode = FLAGS.rnn_mode
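    # Compare version numbers numerically; plain string comparison would
    # rank '1.10.0' below '1.3.0'.
    tf_version = tuple(int(p) for p in tf.__version__.split('.')[:2])
    if FLAGS.num_gpus != 1 or tf_version < (1, 3):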
        config.rnn_mode = BASIC
    return config


def main(_):
    if not FLAGS.data_path:
        raise ValueError("Must set --data_path to PTB data directory")
    gpus = [x.name for x in device_lib.list_local_devices()
            if x.device_type == 'GPU']
    if FLAGS.num_gpus > len(gpus):
        raise ValueError('Your machine has only %d gpus which is less than the requested --num_gpus=%d.'
                          % (len(gpus), FLAGS.num_gpus))

    raw_data = ptb_raw_data(FLAGS.data_path)
    (train_data, valid_data, test_data, _) = raw_data

    config = get_config()
    eval_config = get_config()
    eval_config.batch_size = 1
    eval_config.num_steps = 1

    with tf.Graph().as_default():
        initializer = tf.random_uniform_initializer(-config.init_scale,
                config.init_scale)

        with tf.name_scope('Train'):
            train_input = PTBInput(config=config, data=train_data,
                                   name='TrainInput')
            with tf.variable_scope('Model', reuse=None,
                                   initializer=initializer):
                m = PTBModel(is_training=True, config=config,
                             input_=train_input)
            tf.summary.scalar('Training Loss', m.cost)
            tf.summary.scalar('Learning Rate', m.lr)

        with tf.name_scope('Valid'):
            valid_input = PTBInput(config=config, data=valid_data,
                                   name='ValidInput')
            with tf.variable_scope('Model', reuse=True,
                                   initializer=initializer):
                mvalid = PTBModel(is_training=False, config=config,
                                  input_=valid_input)
            tf.summary.scalar('Validation Loss', mvalid.cost)

        with tf.name_scope('Test'):
            test_input = PTBInput(config=eval_config, data=test_data,
                                  name='TestInput')
            with tf.variable_scope('Model', reuse=True,
                                   initializer=initializer):
                mtest = PTBModel(is_training=False, config=eval_config,
                                 input_=test_input)

        models = {'Train': m, 'Valid': mvalid, 'Test': mtest}
        for (name, model) in models.items():
            model.export_ops(name)
        metagraph = tf.train.export_meta_graph()
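        # Numeric comparison; string comparison misorders versions like '1.10.0'.
        tf_version = tuple(int(p) for p in tf.__version__.split('.')[:2])
        if tf_version < (1, 1) and FLAGS.num_gpus > 1: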
            raise ValueError('num_gpus > 1 is not supported for TensorFlow versions below 1.1.0'
                             )
        soft_placement = False
        if FLAGS.num_gpus > 1:
            soft_placement = True
            util.auto_parallel(metagraph, m)

    with tf.Graph().as_default():
        tf.train.import_meta_graph(metagraph)
        for model in models.values():
            model.import_ops()
        sv = tf.train.Supervisor(logdir=FLAGS.save_path)
        config_proto = \
            tf.ConfigProto(allow_soft_placement=soft_placement)
        with sv.managed_session(config=config_proto) as session:
            for i in range(config.max_max_epoch):
                lr_decay = config.lr_decay ** max(i + 1
                        - config.max_epoch, 0.0)
                m.assign_lr(session, config.learning_rate * lr_decay)

                print('Epoch: %d Learning rate: %.3f' % (i + 1,
                      session.run(m.lr)))
                train_perplexity = run_epoch(session, m,
                        eval_op=m.train_op, verbose=True)
                print('Epoch: %d Train Perplexity: %.3f' % (i + 1,
                      train_perplexity))
                valid_perplexity = run_epoch(session, mvalid)
                print('Epoch: %d Valid Perplexity: %.3f' % (i + 1,
                      valid_perplexity))
                
            test_perplexity = run_epoch(session, mtest)
            print('Test Perplexity: %.3f' % test_perplexity)

            if FLAGS.save_path:
                print('Saving model to %s.' % FLAGS.save_path)
                sv.saver.save(session, FLAGS.save_path,
                              global_step=sv.global_step)

                


In [9]:
def start_training():
    FLAGS.model = "test"
    FLAGS.data_path = "rnn_data"
    FLAGS.save_path = "save/checkpoints/"
    FLAGS.use_fp16 = False
    FLAGS.num_gpus = 0
    FLAGS.rnn_mode = BLOCK
    tf.app.run()
    
start_training()

INFO:tensorflow:Summary name Training Loss is illegal; using Training_Loss instead.
INFO:tensorflow:Summary name Learning Rate is illegal; using Learning_Rate instead.
INFO:tensorflow:Summary name Validation Loss is illegal; using Validation_Loss instead.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
INFO:tensorflow:Restoring parameters from save/checkpoints/model.ckpt-730762
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting standard services.
INFO:tensorflow:Saving checkpoint to path save/checkpoints/model.ckpt
INFO:tensorflow:Starting queue runners.
INFO:tensorflow:Model/global_step/sec: 0
INFO:tensorflow:Recording summary at step 730762.
Epoch: 1 Learning rate: 1.000
0.000 perplexity: 1081.185 speed: 375 wps
0.100 perplexity: 794.667 speed: 10265 wps
INFO:tensorflow:Recording summary at step 761221.
0.200 perplexity: 767.041 speed: 10258 wps
INFO:tensorflow:Model/global_step/sec: 256.27
0.300 p

SystemExit: 



# Generating Text Using Our RNN

First we create a config class that unrolls the model for a single timestep, which is what text generation needs.

Then we write a `generate_text` function that loads the saved model weights and feeds the seed word forward through the network to get a set of probabilities for what should come next.
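
The next word is drawn from those probabilities after a temperature adjustment. The sketch below shows the effect of temperature on a toy distribution; `apply_temperature` and `sample_index` are hypothetical helpers written for this illustration, and the three-word distribution is made up.

```python
import numpy as np


def apply_temperature(probs, temperature=1.0):
    """Rescale a probability vector: T < 1 sharpens it, T > 1 flattens it."""
    logits = np.log(np.asarray(probs, dtype=np.float64)) / temperature
    logits -= np.max(logits)  # subtract the max for numerical stability
    scaled = np.exp(logits)
    return scaled / np.sum(scaled)


def sample_index(probs, temperature=1.0, rng=None):
    """Draw one index according to the temperature-adjusted distribution."""
    if rng is None:
        rng = np.random.RandomState()
    return int(rng.choice(len(probs), p=apply_temperature(probs, temperature)))


probs = [0.7, 0.2, 0.1]
sharp = apply_temperature(probs, 0.5)  # favours the most likely word more
flat = apply_temperature(probs, 2.0)   # spreads probability more evenly
print(sharp, flat)
print(sample_index(probs, 0.9, rng=np.random.RandomState(0)))
```

A temperature slightly below 1, such as the 0.9 used below, makes the output a little more conservative without making it fully deterministic.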

In [20]:
class SmallGenConfig(object):
    """Small config. for generation"""
    init_scale = 0.1
    learning_rate = 1.0
    max_grad_norm = 5
    num_layers = 2
    num_steps = 1
    hidden_size = 200
    max_epoch = 4
    max_max_epoch = 13
    keep_prob = 1.0
    lr_decay = 0.5
    batch_size = 1
    vocab_size = 10000
    rnn_mode = BLOCK

In [27]:
import random  # used by sample() below


def setup_generating():
    FLAGS = tf.app.flags.FLAGS
    FLAGS.model = "generate"  # must match the name checked in get_config()
    FLAGS.data_path = "rnn_data"
    FLAGS.save_path = "save/checkpoints/"
    FLAGS.use_fp16 = False
    FLAGS.num_gpus = 0
    FLAGS.rnn_mode = BLOCK
    


def generate_text(train_path, model_path, num_sentences):
    setup_generating()
    gen_config = SmallGenConfig()
    words = get_vocab(train_path)

    with tf.Graph().as_default(), tf.Session() as session:
        initializer = tf.random_uniform_initializer(-gen_config.init_scale,
                                                    gen_config.init_scale)
        # The scope name must match the training graph ("Model") so that the
        # variable names in the checkpoint line up.
        with tf.variable_scope("Model", reuse=None, initializer=initializer):
            m = PTBModel(is_training=False, config=gen_config,
                         input_=PTBInput(config=gen_config, data=[[2]]))

        # Restore variables from disk.
        saver = tf.train.Saver()
        saver.restore(session, model_path)
        print("Model restored from file " + model_path)

        # Everything below needs the open session, so it stays inside the `with`.
        state = session.run(m.initial_state)
        x = 2  # the id for '<eos>' from the training set
        gen_input = np.array([[x]])  # shape [batch_size, num_steps] == [1, 1]

        text = ""
        count = 0
        while count < num_sentences:
            # Feed the previous word and LSTM state; input_data lives on m.input.
            output_probs, state = session.run(
                [m.output_probs, m.final_state],
                {m.input.input_data: gen_input, m.initial_state: state})
            # output_probs has shape [1, 1, vocab_size]; take the one distribution.
            x = sample(output_probs[0, 0], 0.9)
            if words[x] == "<eos>":
                text += ".\n\n"
                count += 1
            else:
                text += " " + words[x]

            # now feed this new word as input into the next iteration
            gen_input = np.array([[x]])

    print(text)
    return

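import random


def sample(a, temperature=1.0):
    """Sample an index from probability vector `a` after temperature scaling."""
    # temperature < 1 sharpens the distribution, temperature > 1 flattens it
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    r = random.random()  # range: [0, 1)
    total = 0.0

    for i in range(len(a)):
        total += a[i]
        if total > r:
            return i

    # rounding can leave the cumulative sum just below r; fall back to the last index
    return len(a) - 1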
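
To see what the `temperature` argument does to the distribution before sampling, here is a small pure-Python sketch. `apply_temperature` is a hypothetical helper written for this illustration, not part of the notebook. It performs the same rescaling as `sample` (divide the log-probabilities by the temperature, then renormalise), which is equivalent to raising each probability to the power `1 / temperature`.

```python
import math


def apply_temperature(probs, temperature=1.0):
    """Return probs rescaled as p_i ** (1 / temperature), renormalised to sum to 1."""
    # Work in log space; zero-probability entries stay at zero.
    scaled = [math.log(p) / temperature if p > 0 else float("-inf") for p in probs]
    peak = max(scaled)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


probs = [0.5, 0.3, 0.2]  # made-up next-word probabilities
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in apply_temperature(probs, t)])
```

A temperature below 1 (such as the 0.9 used in `generate_text`) shifts probability mass towards the most likely words, while a temperature above 1 makes the choice closer to uniform.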

In [28]:
generate_text("rnn_data/train.txt", "save/checkpoints/model.ckpt", 10)

INFO:tensorflow:Restoring parameters from save/checkpoints/model.ckpt


NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for save/checkpoints/model.ckpt
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "/home/tom/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/tom/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/tom/anaconda3/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/tom/anaconda3/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 478, in start
    self.io_loop.start()
  File "/home/tom/anaconda3/lib/python3.6/site-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/home/tom/anaconda3/lib/python3.6/site-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/home/tom/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2728, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2856, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/tom/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-28-7fba0dc79206>", line 1, in <module>
    generate_text("rnn_data/train.txt", "save/checkpoints/model.ckpt", 10)
  File "<ipython-input-27-b400b92a0a4d>", line 23, in generate_text
    saver = tf.train.Saver()
  File "/home/tom/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1338, in __init__
    self.build()
  File "/home/tom/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1347, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1384, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 835, in _build_internal
    restore_sequentially, reshape)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 472, in _AddRestoreOps
    restore_sequentially)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 886, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1463, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/tom/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for save/checkpoints/model.ckpt
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
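
The restore fails because the training run saved its checkpoints with a step suffix (the log above restores from `save/checkpoints/model.ckpt-730762`), so the bare prefix `save/checkpoints/model.ckpt` matches no files. In TensorFlow 1.x, `tf.train.latest_checkpoint("save/checkpoints/")` should return the most recent prefix. The sketch below does roughly the same with the standard library only. `find_latest_checkpoint` is a hypothetical helper, and it assumes the checkpoint files are named `<prefix>-<step>.index`.

```python
import os
import re


def find_latest_checkpoint(checkpoint_dir, prefix="model.ckpt"):
    """Return the checkpoint prefix with the highest step number, or None.

    Hypothetical helper: assumes index files named '<prefix>-<step>.index'.
    """
    pattern = re.compile(r"^" + re.escape(prefix) + r"-(\d+)\.index$")
    best_step, best_path = -1, None
    for name in os.listdir(checkpoint_dir):
        match = pattern.match(name)
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            # Drop the '.index' suffix: Saver.restore expects the prefix only.
            best_path = os.path.join(checkpoint_dir, name[:-len(".index")])
    return best_path
```

With this in place, the call would become `generate_text("rnn_data/train.txt", find_latest_checkpoint("save/checkpoints"), 10)`.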


In [10]:
%tb

SystemExit: 