# GRU Word Model

The purpose of this notebook is to build on the ideas of the previous character model, but with a GRU word model instead. The key difference between a GRU and an LSTM is that a GRU has two gates (reset and update), whereas an LSTM has three (input, output, and forget). GRUs are related to LSTMs in that both try to mitigate the vanishing gradient problem. A GRU controls the flow of information without a separate memory cell; it keeps only a hidden state. This gives the GRU fewer parameters per unit, which makes it cheaper to train, and it often performs comparably to an LSTM. Scroll through to find the implementation, since all other code is the same as in the LSTM model.
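
For reference, one common way to write the GRU update is sketched below. The symbols are not taken from this notebook: $x_t$ is the input at step $t$, $h_{t-1}$ is the previous hidden state, $\sigma$ is the logistic sigmoid, $\odot$ is element-wise multiplication, and the $W$, $U$, and $b$ terms are learned weights and biases. Some texts swap the roles of $z_t$ and $1 - z_t$ in the last line, so treat this as one convention rather than the only one.

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) && \text{(update gate)} \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\right) && \text{(candidate state)} \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t && \text{(new hidden state)}
\end{aligned}
$$

The update gate $z_t$ decides how much of the old state is kept, and the reset gate $r_t$ decides how much of the old state feeds into the candidate $\tilde{h}_t$. Unlike an LSTM, there is no separate cell state alongside $h_t$.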

# Imports

In [1]:
import numpy as np
#import sys
import re
#import unicodedata
import pandas as pd
import keras.utils as ku
from nltk.tokenize import TweetTokenizer
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, Embedding, GRU
from keras.utils import np_utils
from keras.callbacks import ModelCheckpoint
from keras.preprocessing.text import Tokenizer, text_to_word_sequence
from keras.preprocessing.sequence import pad_sequences

# Functions

In [2]:
def get_sequence_of_tokens(corpus):
    """Fits a tokenizer on the corpus (here, the tweets) and builds the
    training sequences. Each tweet is converted to integer ids with Keras'
    texts_to_sequences, then expanded into every prefix of two or more tokens.
    Returns the input sequences and the total number of words."""

    # generate_text looks the tokenizer up by the name `t`, so expose it globally
    global t

    # corpus is a list of strings, which has no .lower() method;
    # Tokenizer lowercases by default, so no explicit call is needed
    t = Tokenizer()
    t.fit_on_texts(corpus)
    total_words = len(t.word_index) + 1  # +1 because index 0 is reserved for padding

    # converts the corpus into a flat dataset of n-gram prefix sequences
    input_sequences = []
    for line in corpus:
        token_list = t.texts_to_sequences([line])[0]
        for i in range(1, len(token_list)):
            n_gram_sequence = token_list[:i+1]
            input_sequences.append(n_gram_sequence)

    return input_sequences, total_words
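
To make the inner loop of `get_sequence_of_tokens` concrete, here is a minimal sketch of the prefix expansion on its own. The helper name `prefix_sequences` and the token ids are made up for illustration; they are not part of the notebook.

```python
def prefix_sequences(token_ids):
    """Return every prefix of two or more tokens from a list of token ids."""
    return [token_ids[:i + 1] for i in range(1, len(token_ids))]


# Hypothetical ids for a four-word tweet
example_ids = [4, 7, 2, 9]

for seq in prefix_sequences(example_ids):
    # the last id of each prefix later becomes the label,
    # and everything before it becomes the predictor
    print(seq[:-1], "->", seq[-1])
```

A tweet with n tokens therefore contributes n - 1 training examples, and a one-word tweet contributes none.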

In [18]:
def generate_padded_sequences(input_sequences):
    """Pads sequences to the same length. Transforms lists of integers into a
    2d Numpy array of shape (num_samples, maxlen). Creates predictors and labels
    for the sequences. Assigns the labels to categorical variables. Returns
    predictors, label, and max sequence length."""
    max_sequence_len = max([len(x) for x in input_sequences])
    input_sequences = np.array(pad_sequences(input_sequences, maxlen = max_sequence_len, padding = 'pre'))
    predictors, label = input_sequences[:,:-1],input_sequences[:,-1]
    label = ku.to_categorical(label, num_classes = total_words)
    
    return predictors, label, max_sequence_len
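
The padding and predictor/label split can be sketched in plain Python, without Keras, to show what the arrays look like. The helpers `pad_pre` and `one_hot` below are illustrative stand-ins for `pad_sequences(..., padding='pre')` and `to_categorical`, not the library implementations, and the ids and vocabulary size are invented.

```python
def pad_pre(sequences, pad_value=0):
    """Left-pad every sequence with pad_value up to the longest length."""
    max_len = max(len(s) for s in sequences)
    return [[pad_value] * (max_len - len(s)) + list(s) for s in sequences]


def one_hot(index, num_classes):
    """Return a list of zeros with a single 1 at position index."""
    return [1 if i == index else 0 for i in range(num_classes)]


sequences = [[4, 7], [4, 7, 2], [4, 7, 2, 9]]  # hypothetical prefix sequences
total_words = 10                               # hypothetical vocabulary size

padded = pad_pre(sequences)
predictors = [row[:-1] for row in padded]                   # all but the last id
labels = [one_hot(row[-1], total_words) for row in padded]  # last id, one-hot

for x, y in zip(predictors, labels):
    print(x, y)
```

Padding on the left keeps the most recent words next to the position being predicted, which is why the last column can be sliced off as the label.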

In [19]:
def generate_text(seed_text, next_words, model, max_seq_len):
    """Takes a seed text as input and predicts the next words. Tokenizes the seed
    text, pads the sequence, and passes it to the trained model for prediction.
    Expects a fitted tokenizer to be available under the global name `t`."""
    for _ in range(next_words):
        token_list = t.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_seq_len-1, padding='pre')

        # greedy choice: index of the most probable next word
        # (same result as predict_classes, which was removed from later Keras releases)
        predicted = np.argmax(model.predict(token_list, verbose=0), axis=-1)[0]

        output_word = ''

        # reverse lookup: find the word whose index matches the prediction
        for word,index in t.word_index.items():
            if index == predicted:
                output_word = word
                break

        seed_text = seed_text + " " + output_word

    return seed_text.title()
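
The loop over `t.word_index` scans the vocabulary once for every generated word. A minimal sketch of an alternative, assuming a plain dict shaped like `word_index`, builds the reverse mapping a single time. The toy vocabulary and the helper name `lookup_word` are made up for illustration.

```python
# Hypothetical vocabulary in the same word -> index shape as t.word_index
word_index = {"help": 1, "order": 2, "please": 3}

# Build the index -> word mapping once, outside the generation loop
index_word = {index: word for word, index in word_index.items()}


def lookup_word(predicted_index):
    """Return the word for an index, or '' for padding or unknown indices."""
    return index_word.get(predicted_index, "")


print(lookup_word(2))
print(repr(lookup_word(0)))  # index 0 is the padding value, so no word
```

The Keras tokenizer should also expose a ready-made `index_word` dict, though that is worth checking against the installed version.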

# Tokenizing and Cleaning Data

In [3]:
words = pd.read_csv('customer_service_data.csv')

As you can see above, this is not the most beautiful output, but it is manageable.

In [6]:
tweets = list(words.columns)

In [16]:
input_sequences, total_words = get_sequence_of_tokens(tweets)

In [20]:
#pads sequences and gets data ready for the model
predictors, label, max_sequence_len = generate_padded_sequences(input_sequences)

# The Model

In [21]:
model = Sequential()
model.add(Embedding(total_words, 10, input_length=max_sequence_len - 1))
model.add(GRU(256, return_sequences=True))
model.add(Dropout(0.2))
model.add(GRU(256, return_sequences=True))
model.add(Dropout(0.2))
model.add(GRU(128))
model.add(Dropout(0.2))
model.add(Dense(total_words, activation='softmax'))

This model is very similar to the previous notebook's model, with one exception: all of the LSTM layers have been replaced with GRU layers.
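
As a rough illustration of what the swap changes, the sketch below counts weights for a single recurrent layer using the textbook formulation, where an LSTM has four weight blocks (three gates plus the candidate) and a GRU has three (two gates plus the candidate). The helper `recurrent_params` is hypothetical, and the numbers Keras reports for a GRU may be slightly higher if it keeps a second bias vector per block, which `biases_per_block=2` is meant to approximate.

```python
def recurrent_params(input_dim, units, n_blocks, biases_per_block=1):
    """Count weights in one recurrent layer: each block has an input kernel,
    a recurrent kernel, and one or more bias vectors."""
    per_block = units * input_dim + units * units + biases_per_block * units
    return n_blocks * per_block


# Sizes of the first recurrent layer: embedding size 10, 256 units
embedding_dim, units = 10, 256

lstm = recurrent_params(embedding_dim, units, n_blocks=4)  # 3 gates + candidate
gru = recurrent_params(embedding_dim, units, n_blocks=3)   # 2 gates + candidate

print("LSTM:", lstm)
print("GRU: ", gru)
print("ratio:", gru / lstm)
```

Under these assumptions the GRU layer has three quarters of the LSTM layer's weights. Parameter count is only a proxy for training time, but it points in the same direction.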

In [22]:
model.compile(loss='categorical_crossentropy', optimizer='adam')

In [None]:
model.fit(predictors, label, epochs=25, batch_size=256, verbose=1)

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25

In [None]:
filename = "word_vec_model_weights_saved.hdf5"
model.save_weights(filename)
print("saved model weights")

In [21]:
print(generate_text("why can't i see this page", 120, model, max_sequence_len))

Why Can'T I See This Page We Can Help With Your Order Please Dm Us With Your Name And Address And 1 Conta… … 1 3 3 3 3 3 3 3 3 Confirming To Be Availab… I… I… I… Came To The Refer Confirming Confirming Have A Moment Confirming Be Able… At… Is The Top 2Nd Day… I… Is The Stream 1 2 20 Tue I… Is The Top Confirming Plea… The Store And Manned Ch… Tue Is A Branded Gt Months Is A P… 1 Affecting To The Store Management I'Ve… Is Been 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 A Scam Second The Enforcement Team… Is A Scam We Want


# Summary

Several things can be taken away from this model. For starters, it takes much less time to train, so I was able to run more epochs and potentially get a better result. More fine-tuning could be done, but that is always the case with NLP models: I could add more layers, more neurons, and so on. In this case, though, I believe that removing numbers from the tokens will make the model produce more actual text, and hopefully better text. There is some semblance of actual speech in this output. It isn't just the same sentence over and over again, which is what occurred in the word model. Keras also really simplified the process of building this model. I would also like to explore GRUs as well as attention in the following models.
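
One possible way to drop the numbers before tokenizing is sketched below. This was not run as part of the notebook; the function name `strip_numbers` and the sample string are made up, and the pattern only removes standalone runs of digits, so tokens such as "2nd" are kept.

```python
import re

# A run of digits with a word boundary on both sides, e.g. "3" or "20"
_NUMBER = re.compile(r"\b\d+\b")


def strip_numbers(text):
    """Remove standalone numbers and collapse the leftover whitespace."""
    without_numbers = _NUMBER.sub(" ", text)
    return " ".join(without_numbers.split())


sample = "order 3 3 arrives 2nd day 20"
print(strip_numbers(sample))
```

Applying something like this to each tweet before `get_sequence_of_tokens` would keep digit-only tokens out of the vocabulary. Decimals such as "1.5" would leave a stray period behind, so the pattern may need adjusting for the real data.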