In [None]:
alice_txt = '../data/alice.txt'

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

tf.logging.set_verbosity(tf.logging.WARN)

The imports below let us train a model for a limited amount of time. This is useful when training several models overnight, because none of them needs to be stopped manually.

In [None]:
from datetime import datetime, timedelta
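
As a minimal sketch of the idea, a time-limited loop can compare the elapsed time against a `timedelta` budget before each step. The helper name `run_for` and the stand-in step function are hypothetical and are not part of our source code.

```python
from datetime import datetime, timedelta


def run_for(step, time_limit):
    """Call step() repeatedly until time_limit (a timedelta) has elapsed.

    Returns the number of completed steps.
    """
    start_time = datetime.now()
    n_steps = 0
    # The budget is checked before each step, so the last step may overrun it.
    while datetime.now() - start_time < time_limit:
        step()
        n_steps += 1
    return n_steps


# Stand-in for a training step: it only records that it was called.
calls = []
n = run_for(lambda: calls.append(1), timedelta(milliseconds=50))
```

Because the check happens between steps, a single long step (for example, a full epoch) can still run past the limit.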

In [None]:
"""
Temp import the source. Will be removed in the final report
"""
import sys
sys.path.append('../')

## Reading the data
To prepare our data for the neural net, we first need to split it into sequences of a fixed length. To streamline the process, we use the `Dataset` class to store and manage our input data. This class is responsible for splitting the text into strings of the correct length and for turning them into one-hot encoded arrays that the neural net can work with. The prepared data is stored in a `Batch` object, which has `inputs` and `targets` attributes for our model to use in training.
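
The sketch below illustrates this kind of preparation with plain NumPy. It is not the `Dataset` implementation: the function name `one_hot_batch` is hypothetical, and it assumes that the targets are the inputs shifted by one character, which is a common setup for character-level models.

```python
import numpy as np


def one_hot_batch(text, seq_length):
    """Split text into one-hot encoded (inputs, targets) arrays.

    Assumption: targets are the inputs shifted one character ahead.
    """
    vocab = sorted(set(text))
    char_to_ix = {c: i for i, c in enumerate(vocab)}
    ids = np.array([char_to_ix[c] for c in text])

    # Keep only full sequences; one extra character is needed for the last target.
    n_seq = (len(ids) - 1) // seq_length
    inputs = ids[:n_seq * seq_length].reshape(n_seq, seq_length)
    targets = ids[1:n_seq * seq_length + 1].reshape(n_seq, seq_length)

    # Indexing the identity matrix with integer ids yields one-hot rows.
    eye = np.eye(len(vocab), dtype=np.float32)
    return eye[inputs], eye[targets], vocab


inputs, targets, vocab = one_hot_batch("hello world", 5)
print(inputs.shape, targets.shape)
```

Both arrays have the shape `(number of sequences, seq_length, vocabulary size)`.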

In [None]:
"""
The real code will be inserted here in the final report
"""
from src.dataset import Batch

In [None]:
"""
The real code will be inserted here in the final report
"""
from src.dataset import Dataset

## Batching the data

In [None]:
"""
The real code will be inserted here in the final report
"""
from test.dataset_test import test_batch

In [None]:
test_batch(alice_txt, 5, 100) # The test passes without any errors

## Build the RNN Text Generator

The text generator itself is stored in the `RNNTextGenerator` class. Among other things, keeping the generator in a class lets the object hold on to its session, which helps prevent accidental data loss.

The class also contains the methods needed to save the model to a file and restore it. This allows for long-term storage and quick retrieval of a model, and it makes it easier to reuse the weights in a model with a different input size.

The text generator does not take `Batch` objects when training, however, and needs to be fed the inputs and targets separately.

In [None]:
"""
The real code will be inserted here in the final report
"""
from src.text_generator import RNNTextGenerator

## Save and restore the model

In [None]:
"""
The real code will be inserted here in the final report
"""
from test.text_generator_test import test_save_restore

In [None]:
test_save_restore(4, 5, 10) # The test passes without any errors

## Collect TensorFlow logs

In [None]:
"""
The real code will be inserted here in the final report
"""
from test.text_generator_test import test_log

In [None]:
test_log(4, 10, '../tf_logs') # The test passes without any errors

### *A screenshot from TensorBoard will be inserted here*

## Training the RNN Text Generator
A short amount of training provides us with a model that is capable of forming multiple words and a few phrases, but not much more. 

In [None]:
"""
The real code will be inserted here in the final report
"""
from test.alice_test import test_alice

Let's generate some text! We start from the seed `'my favorite '`:

In [None]:
scores = test_alice(alice_txt, 'my favorite ')

In [None]:
fig, axes = plt.subplots(figsize=(15, 6), ncols=2)
scores['accuracy'].plot(ax=axes[0], title='Accuracy')
scores['loss'].plot(ax=axes[1], title='Loss')
for ax in axes:
    ax.set(xlabel='Steps')

## Build a Model Selector 

In [None]:
"""
The real code will be inserted here in the final report
"""
from src.model_selector import ModelSelector

In [None]:
"""
The real code will be inserted here in the final report
"""
from test.model_selector_test import test_model_selector

In [None]:
seq_length = 25
dataset = Dataset([alice_txt], seq_length)
params = {
    'rnn_cell': [
        tf.contrib.rnn.BasicRNNCell
    ],
    'n_neurons': np.arange(1, 1000),
    'optimizer': [
        tf.train.AdamOptimizer,
    ],
    'learning_rate': np.linspace(0, 1, 10000, endpoint=False),
    'epoch': np.arange(5, 100),
    'batch_size': np.arange(25, 100),
}

In [None]:
test_model_selector(dataset, params, 3)
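
One way to explore a search space like the one above is random search: draw one value per hyperparameter for each candidate model. The sketch below assumes that this is roughly what a selector does with `params` and the number of models. The helper `sample_params` is hypothetical, and the real `ModelSelector` may work differently.

```python
import numpy as np


def sample_params(param_space, n_models, seed=None):
    """Draw n_models random configurations from a dict of candidate values."""
    rng = np.random.RandomState(seed)
    configs = []
    for _ in range(n_models):
        # Pick one candidate value per hyperparameter, uniformly at random.
        config = {
            name: values[rng.randint(len(values))]
            for name, values in param_space.items()
        }
        configs.append(config)
    return configs


# A reduced version of the search space, without the TensorFlow entries.
space = {
    'n_neurons': np.arange(1, 1000),
    'learning_rate': np.linspace(0, 1, 10000, endpoint=False),
    'batch_size': np.arange(25, 100),
}
configs = sample_params(space, 3, seed=0)
for config in configs:
    print(config)
```

Each configuration would then be used to build, train, and score one model, and the best-scoring model would be kept.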

## Select the best model
We then continued to train the same model on our dataset to see how much it improved with further training.

Every few epochs, we paused training to test the model by computing its scores and generating a sample text. This information is stored for comparison purposes.

In [None]:
# Out of date
def train_test(
    dataset,
    learning_rate,
    start_seed,
    model_name="RNNTextGenerator",
    model_exists=True,
    train_seq_length=25,
    epoch=20,
    time_limit=10,
    batch_size=25,  # assumed default; needed by dataset.batch below
):
    """Alternate between training and sampling until time_limit minutes have passed."""
    runs_for = timedelta(minutes=time_limit)
    start_time = datetime.now()
    while datetime.now() - start_time < runs_for:
        # Build the model to train on.
        model = RNNTextGenerator(
            train_seq_length,
            dataset.vocab_size,
            learning_rate=learning_rate,
            name=model_name,
        )
        try:
            # If a model exists, we need to restore it before we begin training.
            model.restore()
        except Exception:
            # If no model exists yet, we can afford to ignore this error.
            pass
        # Train for the requested number of epochs.
        for _ in range(epoch):
            for batch in dataset.batch(batch_size):
                model.fit(batch.inputs, batch.targets)
        model.save()
        model_exists = True
        # Build a model to sample with; its input length matches the seed.
        model = RNNTextGenerator(
            len(start_seed),
            dataset.vocab_size,
            name=model_name,
        )
        model.restore()
        # Generate 50 characters starting from the seed.
        print('>>>>> {}'.format(start_seed), RNNTextGenerator.sample(
            model,
            dataset,
            start_seed,
            50
        ))
        print('<<<<<<')

In [None]:
learning_rate = 0.001  # assumed value; not defined earlier in this notebook
train_test(dataset=dataset, learning_rate=learning_rate, start_seed=".\n", model_name="boo")