# GRU Word Model

The purpose of this notebook is to build on the ideas of the previous character model, but with a GRU word model instead. The key difference between a GRU and an LSTM is that a GRU has two gates (reset and update), whereas an LSTM has three (input, output, and forget). GRUs are related to LSTMs in that both try to mitigate the vanishing gradient problem. A GRU controls the flow of information without a separate memory cell, which makes it cheaper to compute, and it often performs comparably to an LSTM. Scroll through to find the implementation, since all other code is the same as in the LSTM model.
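
To make the two gates concrete, here is a minimal NumPy sketch of a single GRU time step. It is an illustration only and not the Keras implementation: the helper names (`gru_step`, `init_params`), the weight layout, and the random initialisation are all assumptions made for this example, and libraries differ on whether the update gate `z` weights the old state or the candidate state.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One GRU time step. x has shape (input_dim,), h_prev has shape (units,)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)   # candidate state
    # The hidden state doubles as the memory: there is no separate cell state.
    return (1.0 - z) * h_prev + z * h_tilde

def init_params(input_dim, units, seed=0):
    """Hypothetical initialiser: small random weights, zero biases."""
    rng = np.random.default_rng(seed)
    params = []
    for _ in range(3):  # update gate, reset gate, candidate state
        params += [
            rng.normal(scale=0.1, size=(units, input_dim)),
            rng.normal(scale=0.1, size=(units, units)),
            np.zeros(units),
        ]
    return params

params = init_params(input_dim=10, units=4)
h = np.zeros(4)
# Run the cell over a made-up sequence of five 10-dimensional inputs.
for x in np.random.default_rng(1).normal(size=(5, 10)):
    h = gru_step(x, h, params)
print(h.shape)
```

An LSTM step would carry a second vector (the cell state) through the loop and compute one more gate, which is where the extra cost comes from.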

# Imports

In [1]:
import numpy as np
#import sys
import re
#import unicodedata
import pandas as pd
import keras.utils as ku
from nltk.tokenize import TweetTokenizer
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, Embedding, GRU
from keras.utils import np_utils
from keras.callbacks import ModelCheckpoint
from keras.preprocessing.text import Tokenizer, text_to_word_sequence
from keras.preprocessing.sequence import pad_sequences

Using TensorFlow backend.


# Functions

In [2]:
def get_sequence_of_tokens(corpus):
    """Takes in a corpus (in this case the tweets) and fits a tokenizer on it.
    The vocabulary size is computed as the number of distinct words plus one,
    since index 0 is reserved for padding. Each tweet is then converted to a
    list of word indices with keras' texts_to_sequences, and every prefix of
    length two or more is collected as a training sequence. Returns the input
    sequences and the total number of words."""
    
    t = Tokenizer()
    t.fit_on_texts(corpus)
    total_words = len(t.word_index) + 1
    
    #converts the corpus into a flat dataset of sentence sequences
    input_sequences = []
    for tweet in corpus:
        token_list = t.texts_to_sequences([tweet])[0]
        for i in range(1, len(token_list)):
            n_gram_sequence = token_list[:i+1]
            input_sequences.append(n_gram_sequence)
            
    return input_sequences, total_words

In [3]:
def generate_padded_sequences(input_sequences):
    """Pads sequences to the same length, turning the lists of integers into a
    2D NumPy array of shape (num_samples, maxlen). The last token of each row
    becomes the label and the rest become the predictors. The labels are
    one-hot encoded using the global total_words. Returns the predictors,
    labels, and max sequence length."""
    
    max_sequence_len = max([len(x) for x in input_sequences])
    input_sequences = np.array(pad_sequences(input_sequences, maxlen = max_sequence_len, padding = 'pre'))
    predictors, label = input_sequences[:,:-1],input_sequences[:,-1]
    label = ku.to_categorical(label, num_classes = total_words)
    
    return predictors, label, max_sequence_len

In [4]:
def generate_text(seed_text, next_words, model, max_seq_len):
    """Takes a seed text as input and predicts the next words. Tokenizes the
    seed text with the global tokenizer t, pads the sequence, and passes it to
    the trained model for prediction."""
   
    for _ in range(next_words):
        token_list = t.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_seq_len-1, padding='pre')
        
        predicted = model.predict_classes(token_list, verbose=0)
        
        output_word = ''
        
        for word,index in t.word_index.items():
            if index == predicted:
                output_word = word
                break
                
        seed_text = seed_text + " " + output_word
        
    return seed_text.title()

# Tokenizing and Cleaning Data

In [5]:
words = pd.read_csv('customer_service_data.csv')

As you can see above, this is not the most beautiful output, but it is manageable.

In [6]:
tweets = list(words.columns)

In [7]:
t = Tokenizer()
t.fit_on_texts(tweets)

In [8]:
# A dictionary of words and their counts.
print(t.word_counts)

# A dictionary of words and how many documents each appeared in.
print(t.word_docs)

# An integer count of the total number of documents that were used to fit the Tokenizer (i.e. total number of documents)
print(t.document_count)

# A dictionary of words and their uniquely assigned integers.
print(t.word_index)

7942


In [9]:
input_sequences, total_words = get_sequence_of_tokens(tweets)

In [10]:
input_sequences[:10]

[[16, 444],
 [16, 444, 31],
 [16, 444, 31, 261],
 [16, 444, 31, 261, 1395],
 [16, 444, 31, 261, 1395, 19],
 [16, 444, 31, 261, 1395, 19, 6],
 [16, 444, 31, 261, 1395, 19, 6, 20],
 [16, 444, 31, 261, 1395, 19, 6, 20, 462],
 [16, 444, 31, 261, 1395, 19, 6, 20, 462, 3696],
 [16, 444, 31, 261, 1395, 19, 6, 20, 462, 3696, 201]]

In [11]:
#pads sequences and gets data ready for the model
predictors, label, max_sequence_len = generate_padded_sequences(input_sequences)
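
The data preparation above can be followed by hand on a tiny example. The sketch below re-creates the prefix, pre-padding, and predictor/label steps in plain Python, without Keras. The helper names `prefix_sequences` and `pre_pad` are made up for this illustration, the token ids are the first four from the output above, and the one-hot encoding of the labels is left out.

```python
def prefix_sequences(token_ids):
    """All prefixes of length >= 2, as in get_sequence_of_tokens."""
    return [token_ids[:i + 1] for i in range(1, len(token_ids))]

def pre_pad(seq, length, pad_value=0):
    """Left-pad with zeros, mimicking pad_sequences(..., padding='pre')."""
    return [pad_value] * (length - len(seq)) + seq

tokens = [16, 444, 31, 261]
prefixes = prefix_sequences(tokens)
max_len = max(len(p) for p in prefixes)
padded = [pre_pad(p, max_len) for p in prefixes]

# The last column is the word to predict; everything before it is the input.
predictors = [row[:-1] for row in padded]
labels = [row[-1] for row in padded]

print(padded)      # [[0, 0, 16, 444], [0, 16, 444, 31], [16, 444, 31, 261]]
print(predictors)  # [[0, 0, 16], [0, 16, 444], [16, 444, 31]]
print(labels)      # [444, 31, 261]
```

Each tweet of n tokens therefore contributes n - 1 training rows, and the model input is one token shorter than `max_sequence_len`.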

# The Model

In [12]:
model = Sequential()
model.add(Embedding(total_words, 10, input_length=max_sequence_len - 1))
model.add(GRU(256, return_sequences=True))
model.add(Dropout(0.2))
model.add(GRU(256, return_sequences=True))
model.add(Dropout(0.2))
model.add(GRU(128))
model.add(Dropout(0.2))
model.add(Dense(total_words, activation='softmax'))

This model is very similar to the previous notebook's model, with one exception: all of the LSTM layers have been replaced with GRU layers.
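
As a rough back-of-the-envelope check on what this swap changes, the sketch below counts the weights in the first recurrent layer using the textbook formulas: an LSTM has four weight blocks (three gates plus the candidate) and a GRU has three. The function `recurrent_params` is a hypothetical helper written for this note. Some Keras versions add a second bias vector per GRU block, so `model.summary()` may report a slightly larger number than this estimate.

```python
def recurrent_params(input_dim, units, n_blocks):
    """Each block has an input kernel, a recurrent kernel, and a bias."""
    return n_blocks * (units * (input_dim + units) + units)

# First recurrent layer: 10-dimensional embeddings feeding 256 units.
lstm_params = recurrent_params(input_dim=10, units=256, n_blocks=4)
gru_params = recurrent_params(input_dim=10, units=256, n_blocks=3)

print(lstm_params)               # 273408
print(gru_params)                # 205056
print(gru_params / lstm_params)  # 0.75
```

Under these assumptions the GRU layer carries three quarters of the weights of the equivalent LSTM layer.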

In [13]:
model.compile(loss='categorical_crossentropy', optimizer='adam')

In [14]:
model.fit(predictors, label, epochs=25, batch_size=256, verbose=1)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.callbacks.callbacks.History at 0x291ceb70ba8>

In [15]:
filename = "gru_word_vec_model_weights_saved.hdf5"
model.save_weights(filename)
print("saved model weights")

saved model weights


In [16]:
print(generate_text("why can't i see this page", 120, model, max_sequence_len))

Why Can'T I See This Page To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To To


# Summary

Several things can be taken away from this model. For starters, it takes much less time to train, so I was able to run more epochs and potentially get a better result. However, this model got stuck in a loop and really, really liked the word "To". If this were a bot meant to output the word "To" in a really roundabout way, then I nailed it. More fine-tuning or some other change is needed; I will have to research this further.

# Future Work

This notebook needs some fine-tuning. Perhaps the layers need to be adjusted, or some other factor is affecting the GRU implementation. This is strange, since the documentation for the LSTM and GRU layers looks the same. I need a more detailed understanding of how the GRU and LSTM implementations differ before I can make any meaningful changes to this notebook.
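
One thing that might be worth trying, purely as an untested suggestion, concerns the decoding step, not the layers. `generate_text` always picks the single most likely word, so once the model favours one word it can repeat it forever. A common alternative is to sample the next word from the predicted distribution, optionally reshaped by a temperature. The sketch below shows only the sampling step on a made-up probability vector; `sample_with_temperature` is a hypothetical helper, and whether it helps this particular model has not been checked.

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Draw an index from probs after rescaling by a temperature.

    Temperatures below 1 sharpen the distribution (closer to argmax);
    temperatures above 1 flatten it (more variety).
    """
    rng = rng or np.random.default_rng()
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-12) / temperature
    logits -= logits.max()          # for numerical stability
    scaled = np.exp(logits)
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))

# Made-up softmax output over a four-word vocabulary.
probs = [0.05, 0.70, 0.15, 0.10]
rng = np.random.default_rng(0)
draws = [sample_with_temperature(probs, 1.0, rng) for _ in range(1000)]

print(int(np.argmax(probs)))  # greedy choice is always index 1
print(sorted(set(draws)))     # sampling visits other indices as well
```

In the notebook this would mean replacing the `predict_classes` call with `model.predict(token_list)[0]` and passing the resulting vector to the sampler.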