# MapChat: Content Generation

[MapChat GitHub](https://github.com/csci-599-applied-ml-for-games/MapChat)

[textgenrnn GitHub](https://github.com/minimaxir/textgenrnn)

[Yelp Dataset](https://www.yelp.com/dataset)



## Clone the MapChat repo


```python
# ! git clone https://github.com/csci-599-applied-ml-for-games/MapChat.git

# ! pip list
# ! pwd
```

```
/mnt/c/Users/Thotiana Oblock/Desktop/USC/SPRING2020/CSCI599/MapChat/content/textgenrnn
```


## Install and import the required Python packages

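```python
!pip install -q textgenrnn   # IPython shell command; outside a notebook, run pip in a terminal
# from google.colab import files   # uncomment in Google Colab to enable downloads
from textgenrnn import textgenrnn
from datetime import datetime
import os
```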

## textgenrnn model configuration
See the textgenrnn [demo notebook](https://github.com/minimaxir/textgenrnn/blob/master/docs/textgenrnn-demo.ipynb) for details on each parameter.


```python
model_cfg = {
    'word_level': False,   # set to True to train a word-level model (needs more data and a smaller max_length)
    'rnn_size': 128,   # number of LSTM cells in each layer (128/256 recommended)
    'rnn_layers': 3,   # number of LSTM layers (>=2 recommended)
    'rnn_bidirectional': False,   # read text both forwards and backwards; can give a training boost
    'max_length': 30,   # tokens of context before predicting the next (20-40 for characters, 5-10 for words)
    'max_words': 10000,   # maximum number of words to model; the rest are ignored (word-level model only)
}

train_cfg = {
    'line_delimited': True,   # set to True if each text is on its own line in the source file
    'num_epochs': 20,   # set higher to train the model for longer
    'gen_epochs': 10,   # print sample text from the model every gen_epochs epochs
    'train_size': 0.8,   # proportion of input data to train on; < 1.0 keeps the model from learning the data perfectly
    'dropout': 0.0,   # randomly ignore this proportion of source tokens each epoch, helping the model generalize
    'validation': False,   # if train_size < 1.0, evaluate on the holdout set; makes training slower overall
    'is_csv': False   # set to True if the file is a CSV exported from Excel/BigQuery/pandas
}
```
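`max_length` controls how much context the model sees. The sketch below gives a rough idea of how a character-level model turns line-delimited texts into training examples: it slides a window of `max_length` characters over each text and pairs that window with the next character. This is a simplified sketch and does not reproduce textgenrnn's actual encoding. The function name `char_sequences` and the `PAD`/`END` markers are hypothetical.

```python
from typing import List, Tuple

# Hypothetical markers; textgenrnn uses its own padding/meta tokens.
PAD = "\0"   # left padding so the first characters also get a context
END = "\n"   # end-of-text target, so the model learns where a text stops


def char_sequences(texts: List[str], max_length: int) -> List[Tuple[str, str]]:
    """Build (context, next_char) pairs with a sliding window of max_length."""
    pairs = []
    for text in texts:
        padded = PAD * max_length + text + END
        # One pair per character of the text, plus one for the end marker.
        for i in range(max_length, len(padded)):
            pairs.append((padded[i - max_length:i], padded[i]))
    return pairs


pairs = char_sequences(["Great food"], max_length=4)
print(len(pairs), repr(pairs[-1]))
```

Under this scheme each text contributes roughly one example per character. That is why the training log below counts character sequences rather than texts.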

## Specify the text file to train on

```python
model_name = "grocery"   # base name of the training file and of the saved model/text files
file_name = f"{model_name}.txt"
```
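The notebook does not show how `grocery.txt` was built. Suppose it holds Yelp reviews of grocery businesses, one review per line, which is the format `line_delimited=True` expects. A hypothetical preprocessing step could then look like the sketch below. The function names, the `"Grocery"` category filter, and the Yelp field names are all assumptions to check against the dataset documentation. The assumed fields are `business_id` and `categories` in the JSON-lines business file, and `business_id` and `text` in the review file.

```python
import json
from typing import Iterable, List


def grocery_reviews(business_lines: Iterable[str], review_lines: Iterable[str],
                    category: str = "Grocery") -> List[str]:
    """Return review texts for businesses tagged with `category` (hypothetical)."""
    wanted = set()
    for line in business_lines:
        biz = json.loads(line)
        # Assumed: "categories" is a comma-separated string, possibly null.
        cats = [c.strip() for c in (biz.get("categories") or "").split(",")]
        if category in cats:
            wanted.add(biz["business_id"])

    texts = []
    for line in review_lines:
        review = json.loads(line)
        if review["business_id"] in wanted:
            # Collapse newlines/extra whitespace so each review fits on one line.
            texts.append(" ".join(review["text"].split()))
    return texts


def write_corpus(texts: List[str], path: str) -> None:
    """Write one text per line, as line_delimited=True expects."""
    with open(path, "w", encoding="utf8") as f:
        f.write("\n".join(texts) + "\n")
```

For example, call `write_corpus(grocery_reviews(biz_file, review_file), file_name)`. Here `biz_file` and `review_file` are placeholder names for the opened Yelp JSON files.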


## Train!

```python
textgen = textgenrnn(name=model_name)

# train_from_file treats each line as a separate text;
# train_from_largetext_file treats the whole file as one long text.
train_function = (textgen.train_from_file if train_cfg['line_delimited']
                  else textgen.train_from_largetext_file)

train_function(
    file_path=file_name,
    new_model=True,
    num_epochs=train_cfg['num_epochs'],
    gen_epochs=train_cfg['gen_epochs'],
    batch_size=1024,
    train_size=train_cfg['train_size'],
    dropout=train_cfg['dropout'],
    validation=train_cfg['validation'],
    is_csv=train_cfg['is_csv'],
    rnn_layers=model_cfg['rnn_layers'],
    rnn_size=model_cfg['rnn_size'],
    rnn_bidirectional=model_cfg['rnn_bidirectional'],
    max_length=model_cfg['max_length'],
    dim_embeddings=100,
    word_level=model_cfg['word_level'])
```

```
40,185 texts collected.
Training new model w/ 3-layer, 128-cell LSTMs
Training on 2,245,307 character sequences.
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
####################
Temperature: 0.2
####################
The food is always fresh and the service was fresh and delicious.

The food is always fresh and the service was delicious and the service is good.

They have a great selection of food and the staff was amazing.

####################
Temperature: 0.5
####################
The food is good and the staff is great.

The food is great and the service is good.

The food is so good and the service was good and the food was delicious and the service is very good.

####################
Temperature: 1.0
####################
The sandwich was very friendly and the staff was very friendly.

Great to find this food city we true market is serving a few funds to the atmosphere!

In the morning was worth browsing and easy to
```
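With `num_epochs=20` and `gen_epochs=10`, the log above prints samples after Epoch 10. The sketch below reproduces that schedule and also illustrates the effect of `train_size=0.8`. It rests on two assumptions: samples are printed whenever the epoch number is a multiple of `gen_epochs`, and the holdout is a random subset of the data. Both helpers, `sample_epochs` and `holdout_split`, are hypothetical illustrations and not textgenrnn functions.

```python
import random
from typing import List, Sequence, Tuple


def sample_epochs(num_epochs: int, gen_epochs: int) -> List[int]:
    """1-based epochs after which samples are printed (assumed: multiples of gen_epochs)."""
    if gen_epochs <= 0:
        return []
    return [e for e in range(1, num_epochs + 1) if e % gen_epochs == 0]


def holdout_split(items: Sequence, train_size: float, seed: int = 0) -> Tuple[list, list]:
    """Randomly keep about train_size of the items for training.

    The remainder is the holdout set, which is only evaluated when
    validation is enabled (an assumption about the library's behavior).
    """
    rng = random.Random(seed)
    train, holdout = [], []
    for item in items:
        (train if rng.random() < train_size else holdout).append(item)
    return train, holdout


print(sample_epochs(20, 10))
train, holdout = holdout_split(range(1000), 0.8)
print(len(train), len(holdout))
```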

## Generate and save content

```python
temperature = [1.0, 0.5, 0.2, 0.2]   # textgenrnn cycles through a list of temperatures while generating
prefix = None   # set to a seed string to start every generated text with it

if train_cfg['line_delimited']:
    n = 1000   # number of separate texts to generate
    max_gen_length = 60 if model_cfg['word_level'] else 300
else:
    n = 1
    max_gen_length = 2000 if model_cfg['word_level'] else 10000

timestring = datetime.now().strftime('%Y%m%d_%H%M%S')
gen_file = '{}_gentext_{}.txt'.format(model_name, timestring)

textgen.generate_to_file(gen_file,
                         temperature=temperature,
                         prefix=prefix,
                         n=n,
                         max_gen_length=max_gen_length)
# files.download(gen_file)
```

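The training samples show the usual temperature trade-off. At 0.2 the model repeats safe phrases ("the food is ... and the service is ..."), while at 1.0 its output is more varied but less coherent. The sketch below shows temperature-scaled sampling over a next-token distribution. It illustrates the general technique rather than textgenrnn's code. The helper `temperature_for_step` assumes that a list passed as `temperature` is cycled through one generated token at a time, and the probability table is made up.

```python
import math
import random
from typing import Dict, List


def rescale(probs: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Return p ** (1 / T), renormalized: T < 1 sharpens, T > 1 flattens."""
    # Work in log space and subtract the max for numerical stability;
    # zero-probability tokens are dropped because log(0) is undefined.
    logits = {t: math.log(p) / temperature for t, p in probs.items() if p > 0}
    top = max(logits.values())
    weights = {t: math.exp(l - top) for t, l in logits.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def sample_with_temperature(probs: Dict[str, float], temperature: float,
                            rng: random.Random) -> str:
    """Draw one token from the temperature-rescaled distribution."""
    scaled = rescale(probs, temperature)
    tokens = list(scaled)
    return rng.choices(tokens, weights=[scaled[t] for t in tokens], k=1)[0]


def temperature_for_step(temperatures: List[float], step: int) -> float:
    """Pick the temperature for a 0-based generation step by cycling the list."""
    return temperatures[step % len(temperatures)]


rng = random.Random(42)
schedule = [1.0, 0.5, 0.2, 0.2]             # same list as the generation cell
next_probs = {"f": 0.5, "s": 0.3, "g": 0.2}  # made-up next-character distribution
for step in range(4):
    t = temperature_for_step(schedule, step)
    print(step, t, sample_with_temperature(next_probs, t, rng))
```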

## Save the weights, vocabulary, and config files to recreate the model


```python
# Uncomment in Google Colab to download the files textgenrnn saved for this model
# files.download('{}_weights.hdf5'.format(model_name))
# files.download('{}_vocab.json'.format(model_name))
# files.download('{}_config.json'.format(model_name))
```
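textgenrnn saves three files per model, named after the `name` argument: the weights (`.hdf5`), the vocabulary, and the config (both JSON). The hypothetical helpers below collect those paths and report which ones are missing before you try to reload the model. The dictionary keys are chosen to match the `weights_path`, `vocab_path`, and `config_path` arguments of the `textgenrnn` constructor.

```python
import os
from typing import Dict, List


def model_artifacts(model_name: str, directory: str = ".") -> Dict[str, str]:
    """Paths of the three files downloaded above for `model_name`."""
    return {
        "weights_path": os.path.join(directory, f"{model_name}_weights.hdf5"),
        "vocab_path": os.path.join(directory, f"{model_name}_vocab.json"),
        "config_path": os.path.join(directory, f"{model_name}_config.json"),
    }


def missing_artifacts(model_name: str, directory: str = ".") -> List[str]:
    """Return the expected artifact paths that do not exist yet."""
    return [p for p in model_artifacts(model_name, directory).values()
            if not os.path.exists(p)]


print(missing_artifacts("grocery"))
```

In a new session, `textgenrnn(**model_artifacts(model_name))` should then rebuild the trained model without retraining.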