
Customer Service Text Generation using NLP Algorithms and Models

Summary

The purpose of this project is to show how to build a customer service text generator using Natural Language Processing algorithms with the Keras API and TensorFlow. The generator is intended to potentially replace the first interaction many consumers have with a company: it will either offer information to help the customer or, if the issue is escalated, hand the consumer off to a human representative. This would allow a company to run a smaller customer service team focused on more important issues. I built a dataset of customer service tweets pulled from Twitter using Twitter's API and the Tweepy library in Python; a Jupyter notebook showing how to do this is in the repo. The models learn from the tweets so that they can respond to a seed tweet asking for help or some other customer service request. The models build in complexity, and each notebook explains how the model is constructed, why its layers were chosen, and how particular model options affect the generated text.

My original idea for this project was to build all the way up to a Transformer model. If you keep up with the latest models and trends, you may recall that Transformers have had amazing success in text generation and other sequence tasks such as translation. I was originally interested in making something like GPT-2, but I knew it wouldn't be possible to build and train something like that on my own personal machine, and I did not want to spend a lot of money, so that was always a long shot. Instead, I started with simple text generation and built progressively more capable models, working toward something that resembles human speech while also building my knowledge of how Keras and TensorFlow work. I was working on an attention encoder/decoder model when I ran out of time for this project; that will be my future work.

Tweet Generator

This notebook is dedicated to working with the Twitter API through the Tweepy library, which lets you use Python to pull tweets, analyze them, send tweets, look up users, and more. For this project, the main focus of the notebook is to collect as many tweets as possible from ten well-reviewed customer service accounts on Twitter. These accounts are known for responding to customers' tweets and offering solutions or pointers to more information. While there are limits on the number of tweets you can pull from individual accounts, I was able to get a little under 8,000 tweets. More tweets would have been better, but I tried to make sure the tweets had meaningful content and weren't just incoherent words spewed by bots. The original dataset for this project consisted of tweets containing the words "big data" or "data science." While this sounds like an ideal way of training a data science bot to tweet about these topics, there were several issues. Many tweets had nothing to do with data science; some were probably themselves produced by bots, and then you have a loop of one bad bot training another bad one with no useful output. Another issue was that the models would predict the words "big data" and "data science" over and over again: they seemed to latch onto the search criteria for the tweets and get stuck in a loop. In the end I stuck with the customer service tweets and tried to make those work. The results can be found in the model notebooks.

Tweet Generator
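A minimal sketch of how tweets can be pulled with Tweepy, assuming Tweepy 3.x, placeholder API credentials, and hypothetical account handles; the notebook in the repo is the authoritative version.

```python
import tweepy

# Placeholder credentials from a Twitter developer account (assumption: supply your own).
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Hypothetical customer service handles; the project used ten well-reviewed accounts.
handles = ["AppleSupport", "AmazonHelp"]

tweets = []
for handle in handles:
    # user_timeline only reaches back roughly 3,200 tweets per account.
    for status in tweepy.Cursor(api.user_timeline,
                                screen_name=handle,
                                tweet_mode="extended",
                                count=200).items():
        tweets.append(status.full_text)

print(f"Collected {len(tweets)} tweets")
```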

Character Level Sequence Model using LSTM Layers

The first model is a simple character-tokenized sequence model with a few LSTM layers. There are many sources on text generation that use character prediction; while there are tradeoffs to this, it is the baseline model I attempted to generate text with. Using NLTK's TweetTokenizer, each tweet is tokenized and then a dictionary is built mapping each character to how many times it appeared in a token. The vocabulary was still rather large, around 6,407, much greater than I initially intended, but what really slowed down training was the number of patterns the model had to be trained on: around half a million. After reviewing the data, this was caused by several things. The first was the numbers in the token list; I should have removed them so the data contained only words, but I set out to get the model working with what I had. Next, each sequence of 140 characters is processed, with each character counted and its frequency kept, which allows us to predict the next character once we have a seed text. The model is a Keras Sequential model with two LSTM layers, each followed by a Dropout layer to prevent overfitting, and a final Dense layer with a softmax activation. The model is compiled and fit to the dataset, which takes several hours to run. Finally, another dictionary is created that maps the predicted integer values back to characters. The text generated by the model can be found in the notebook.

Character Level Sequence Model
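A rough sketch of the architecture described above, assuming a 140-character window; the unit counts, dropout rates, and placeholder data are illustrative rather than the exact values from the notebook.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.utils import to_categorical

seq_length = 140   # characters per input sequence
vocab_size = 100   # illustrative; set to the number of distinct characters in the data

# X: (num_patterns, seq_length, 1) integer-encoded characters scaled to [0, 1]
# y: one-hot encoded next character for each pattern (placeholder random data here)
X = np.random.rand(1000, seq_length, 1)
y = to_categorical(np.random.randint(vocab_size, size=1000), num_classes=vocab_size)

model = Sequential([
    LSTM(256, input_shape=(seq_length, 1), return_sequences=True),
    Dropout(0.2),
    LSTM(256),
    Dropout(0.2),
    Dense(vocab_size, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=20, batch_size=128)
```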

Word Level Sequence Model Using LSTM Layers

This model builds on what I learned from the character model. NLTK's TweetTokenizer was interesting to use but did not have the flexibility of other tokenizers, so for this model I used Keras' own Tokenizer and text_to_word_sequence. As you can probably guess from this section's title, instead of predicting the next character I am predicting the next word in a sequence. I also decided not to remove stop words: since I am not classifying sentiment or translating a language, I want the stop words to remain in the dataset that trains the model to generate text. At a later date I may come back and remove stop words to see whether it makes a difference. Once the tweets are tokenized, a sequence is made from each tweet and its tokens. These sequences need to be padded so that every input sequence is the same size, which Keras' pad_sequences function accomplishes. Finally, the predictors and labels are declared: the predictor is the seed and the label is the word predicted to come next in the sequence once the model is trained. The model must first be trained on the sequences, which gives it a way to know what it is looking for. The only layer that differs from the character-level model is an Embedding layer, which compresses the feature space into a smaller one; the idea was to get the model to train faster. These steps train much faster, taking only about 10 minutes apiece as opposed to the hours the character model took. Since I continued to see the loss go down, I decided to run 100 epochs. The results can be seen in the notebook. This model performed much better, and I wrote a function that lets me choose the seed instead of randomly generating it like the first model.

Word Level Sequence Model Using LSTM Layers
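A condensed sketch of the word-level pipeline, assuming a tiny placeholder corpus and hyperparameters chosen for readability rather than copied from the notebook.

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

# Placeholder corpus; in the project this is the cleaned customer service tweets.
tweets = ["thanks for reaching out we can help",
          "please send us a direct message with your order number"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(tweets)
vocab_size = len(tokenizer.word_index) + 1

# Build n-gram sequences: every prefix of each tweet becomes a training example.
sequences = []
for line in tweets:
    tokens = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(tokens)):
        sequences.append(tokens[: i + 1])

max_len = max(len(s) for s in sequences)
sequences = pad_sequences(sequences, maxlen=max_len, padding="pre")

# Predictors are everything but the last token; the label is the next word.
X, y = sequences[:, :-1], to_categorical(sequences[:, -1], num_classes=vocab_size)

model = Sequential([
    Embedding(vocab_size, 64, input_length=max_len - 1),
    LSTM(128, return_sequences=True),
    Dropout(0.2),
    LSTM(128),
    Dropout(0.2),
    Dense(vocab_size, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=100, verbose=0)
```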

Word Level Sequence Model Using GRU Layers

For this model the code is exactly the same as the previous model; only the LSTM layers have been swapped for GRU layers. Both GRUs and LSTMs have repeating modules like a plain RNN, but with different internal structures. A GRU can either retain the previous cell's state or update it, which lets it decide what previous information is important or unimportant. Unfortunately, after training this model a handful of times, I was only able to get it to generate the word "to" a lot, which is not exactly what you want from a text generator that is supposed to sound human.

Word Level Sequence Model Using GRU Layers
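The swap itself is a one-line change per layer; a sketch under the same illustrative assumptions as the word-level model above (vocab_size and max_len would normally come from that preprocessing step).

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dropout, Dense

vocab_size = 5000  # illustrative; taken from the fitted Tokenizer in practice
max_len = 20       # illustrative; taken from the padded sequences in practice

# Identical architecture to the word-level model, with GRU cells in place of LSTM cells.
gru_model = Sequential([
    Embedding(vocab_size, 64, input_length=max_len - 1),
    GRU(128, return_sequences=True),
    Dropout(0.2),
    GRU(128),
    Dropout(0.2),
    Dense(vocab_size, activation="softmax"),
])
gru_model.compile(loss="categorical_crossentropy", optimizer="adam")
```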

Attention Model and Transformer Model

If you search for attention models or Transformer models you will find several blogs and repos that show how to build attention encoder/decoders or Transformers, but they almost all target language translation. While there is not a huge difference between language translation and text generation, there is some. I was only able to spend a small amount of time in this area; I got the other models working and will work on this one in the future. I was trying to build off of the Keras tutorial and was not able to get it to run before the notebook would crash. The issue with building either of these models is that it isn't as simple as preparing the data and feeding it into a Keras model. There is no built-in layer for attention, so it all has to be hand coded, and the same goes for the entire Transformer model. While I enjoyed the challenge and the knowledge I gained working on this, it proved a little out of scope for this project. It did make me really appreciate how simple and elegant Keras is, though.
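For reference, hand coding the attention mechanism roughly means writing a custom layer like the sketch below, which follows the Bahdanau-style additive attention used in the TensorFlow translation tutorials; it is an illustration of what would need to be built, not code from this repo.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive attention: scores each encoder output against the decoder state."""

    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, query, values):
        # query: decoder hidden state (batch, hidden); values: encoder outputs (batch, time, hidden)
        query_with_time_axis = tf.expand_dims(query, 1)
        score = self.V(tf.nn.tanh(self.W1(query_with_time_axis) + self.W2(values)))
        attention_weights = tf.nn.softmax(score, axis=1)
        context_vector = tf.reduce_sum(attention_weights * values, axis=1)
        return context_vector, attention_weights
```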

Conclusions

Text generation is hard, or at least it is difficult to build a generator that is up to par. Combining several resources and building this project up as far as I could was a challenge with some reward. I did not get as far as I wanted, but the text from some models was pretty good. Still, the results speak for themselves: you would not want misspelled words or bad grammar from a customer service rep, so why would you want a bot that produces them? This project was more about learning how large companies come up with such realistic text generation than about trying to beat them. I would like to continue working on these models and see how far I can go, or at least get the attention model working and keep building up to the Transformer model. This is my first project in NLP and it certainly won't be my last.
