[View in Colaboratory](https://colab.research.google.com/github/demmojo/colabrnn/blob/master/colabrnn.ipynb)

# Train your own text generator using a recurrent neural network!
by [Mohamed Abdulaziz](https://www.mohamedabdulaziz.com/)

Train either a bidirectional or a standard (unidirectional) LSTM recurrent neural network to generate text from any text dataset. **No need to write any code. Just upload your text file and click run!**

Once your model has finished training you can generate text with it. You can also continue training a pre-trained model if its output is not yet good enough.

**The best part is that all of the training is conducted on a free GPU courtesy of Colaboratory!**

For more information about the code used in this demo, see the [GitHub repository](https://github.com/demmojo/colabrnn).



## Before you begin

Ensure you are running this in Google Chrome. 

Next, copy the notebook to your Google Drive so that you keep your own copy and your changes are saved.

Also, make sure you are using the GPU runtime type by clicking **Runtime** in the toolbar above and then **Change runtime type**. Check that the hardware accelerator is set to GPU.
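
If you would like to confirm from code that a GPU is attached, here is a minimal sketch. It assumes that GPU runtimes expose the `nvidia-smi` command-line tool; the helper name `gpu_visible` is made up for this example and is not part of colabrnn.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if `nvidia-smi` is installed and exits successfully."""
    if shutil.which("nvidia-smi") is None:
        return False  # tool not found: assume this is a CPU-only runtime
    result = subprocess.run(
        ["nvidia-smi"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if gpu_visible():
    print("GPU runtime detected")
else:
    print("No GPU detected; training would run on the CPU")
```

If this reports no GPU, change the runtime type as described above before you start training.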

That's it! Run the next code cells!


In [0]:
!git clone https://github.com/demmojo/colabrnn

The above code clones the GitHub project onto the Colaboratory VM. Next we will install the dependencies and import the necessary packages.

Note: If you get a **Failed to assign a backend** message, no free GPUs are available. You can connect and train on a CPU instead, but that will be much slower. Otherwise, try again later to connect to a server with a GPU.

In [0]:
!pip install keras==2.2

from google.colab import files
import os

# What would you like to do?

If you are training a new model we first need to upload a text file. Then colabrnn will use that to train and generate original text! 

## Train a new model

Run the cell below, click ***Choose Files*** and select your text file from your local computer. Ideally, the text file should be fairly large (more than 1 MB).
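
Once the file is on the VM you can check its size against the 1 MB guideline. This is only a sketch: the helper name `is_large_enough` and the exact threshold of 1024 * 1024 bytes are assumptions made for illustration.

```python
import os

ONE_MB = 1024 * 1024  # assumed threshold, based on the ">1 MB" guideline


def is_large_enough(path, min_bytes=ONE_MB):
    """Return True if the file at `path` holds at least `min_bytes` bytes."""
    return os.path.getsize(path) >= min_bytes
```

For example, `is_large_enough('shakespeare.txt')` would tell you whether that file meets the guideline. A smaller file can still be used for training, but the generated text will probably be less convincing.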

Please note that the uploaded file is stored on the Colaboratory VM and **only you** have access to it.

After uploading the file, run the next cell to start the training process! Sample text is generated as training goes on, so you can see how your model is learning.

If you would like to change any parameters, please do so before running the cell.

## Continue training a pre-trained model

Run the cell below and upload the necessary model files (weight, vocabulary and config files) as well as the text file. You can then retrain your old model.

Change the ***train_new_model*** variable to **False**. Then check whether the file names are correct before running the cell below.

## Generate text with a pre-trained model

After uploading the necessary model files (weight, vocabulary and config files) as well as the text file, skip to the section **Generate text using your trained model**.

Check whether the file names are correct before running the cell.

In [3]:
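uploaded = files.upload()  # opens the file picker; maps each uploaded file name to its contents
# The most recently modified entry in the working directory is taken to be the upload
all_files = [(name, os.path.getmtime(name)) for name in os.listdir()]
latest_uploaded_file = sorted(all_files, key=lambda x: -x[1])[0][0]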
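
The cell above picks the newest entry in the working directory, which can also be a folder (for example the cloned `colabrnn` folder). If that ever causes trouble, here is a hedged alternative that only considers regular files. The helper name `newest_file` is made up for this sketch and is not part of colabrnn.

```python
import os


def newest_file(directory="."):
    """Return the name of the most recently modified regular file, or None."""
    candidates = [
        entry for entry in os.scandir(directory)
        if entry.is_file()  # skip folders such as the cloned repository
    ]
    if not candidates:
        return None
    newest = max(candidates, key=lambda entry: entry.stat().st_mtime)
    return newest.name
```

You could then set `latest_uploaded_file = newest_file()` before running the training cell.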

In [0]:
from colabrnn.rnn import CharGen
from colabrnn.rnn.train_model import train

train_new_model = True
model_name = 'colab'

if train_new_model:  # Create a new neural network model and train it
    char_gen = CharGen(name=model_name)
    train(text_filepath=latest_uploaded_file,
          chargen=char_gen,
          gen_text_length=500,  # Number of characters to be generated. Average number of characters in a word is approximately 5. (default 500)
          num_epochs=10,  # One epoch is when an entire dataset is passed forward and backward through the neural network only once (default 10)
          bidirectional=False,  # Boolean. Train using a bidirectional LSTM or unidirectional LSTM. See this coursera video for more information: https://www.coursera.org/lecture/nlp-sequence-models/bidirectional-rnn-fyXnn
          rnn_size=128,  # Number of neurons in each layer of your neural network (default 128)
          rnn_layers=3,  # Number of layers in your neural network (default 3)
          batch_size=512,  # Total number of training examples present in a single batch. More is faster but there are memory constraints. If you are experiencing insufficient memory issues reduce this number. (default 512)
          embedding_dims=75,  # Size of the embedding layer
          train_new_model=train_new_model)  

    char_gen.model.summary()  # summary() prints the architecture itself and returns None
else:  # Continue training an old model
    text_filename = 'shakespeare.txt'  # specify correct filename if you are retraining an old model
    char_gen = CharGen(name=model_name,
                      weights_filepath='colab_weights.hdf5',  # specify correct filename if you are retraining an old model
                      vocab_filepath='colab_vocabulary.json',  # specify correct filename if you are retraining an old model
                      config_filepath='colab_config.json')  # specify correct filename if you are retraining an old model
    
    train(text_filename, char_gen, train_new_model=train_new_model, num_epochs=10)  # change num_epochs to specify number of epochs to continue training
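
To get a feel for how `batch_size` and `num_epochs` interact, here is a small arithmetic sketch. How colabrnn turns your text into training examples is not covered in this notebook, so `num_examples` is a hypothetical figure and `batches_per_epoch` is a helper invented for this example.

```python
import math


def batches_per_epoch(num_examples, batch_size=512):
    """Number of batches needed to pass over `num_examples` exactly once."""
    return math.ceil(num_examples / batch_size)


# Hypothetical dataset of one million training examples
print(batches_per_epoch(1_000_000))       # 1954
print(batches_per_epoch(1_000_000, 256))  # 3907
```

Halving the batch size roughly doubles the number of batches per epoch, which is why smaller batches use less memory but take longer to train.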


## Save the model files

Run the cell below to download the model files to your computer. You can upload them again later to retrain or to generate text.

In [0]:
files.download('{}_weights.hdf5'.format(model_name))
files.download('{}_vocabulary.json'.format(model_name))
files.download('{}_config.json'.format(model_name))
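
If a download fails, or you want to check the file names before retraining, the sketch below lists which model files are missing. It builds the names the same way as the download cell; the helper names `model_file_names` and `missing_model_files` are made up for illustration.

```python
import os


def model_file_names(model_name):
    """File names built the same way as in the download cell above."""
    return [
        '{}_weights.hdf5'.format(model_name),
        '{}_vocabulary.json'.format(model_name),
        '{}_config.json'.format(model_name),
    ]


def missing_model_files(model_name, directory='.'):
    """Return the expected model files that are not present in `directory`."""
    return [
        name for name in model_file_names(model_name)
        if not os.path.isfile(os.path.join(directory, name))
    ]
```

For example, `missing_model_files('colab')` returns an empty list when all three files are in the current directory.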

## Generate text using your trained model!

Run the cell below to generate samples of your trained model. 

You can specify the starting text for the model by changing the ***prefix*** argument. The generated text will begin with it.

You can also define how long you want your generated text to be by changing the ***gen_text_length*** argument.

Have fun!

In [0]:
from colabrnn.rnn import CharGen

char_gen = CharGen(weights_filepath='colab_weights.hdf5',  # specify correct filename 
                   vocab_filepath='colab_vocabulary.json',  # specify correct filename
                   config_filepath='colab_config.json')  # specify correct filename 

char_gen.generate(gen_text_length=500, prefix='To be or not to be,')

If at any time you would like to list the contents of the current directory use the following command:

In [0]:
!ls

colab_config.json  colab_vocabulary.json  datalab
colabrnn	   colab_weights.hdf5	  shakespeare.txt


If you would like to restart or reset the Colaboratory VM you can run the following cell:

In [0]:
!kill -9 -1

### If you have any questions, suggestions or would like to share your project, please contact me [here](https://www.mohamedabdulaziz.com/#contact).

### You can also check out my other projects on [GitHub](https://github.com/demmojo).