# Lab 4.3 - Chatbot to detect emotions

Copyright, Vrije Universiteit Amsterdam, Faculty of Humanities, CLTL

In this notebook, we will create a Telegram chatbot that detects the emotion in a message and responds according to a set of keywords found in the same message.

**Main goal of this notebook**: Build a Telegram chatbot that detects both the emotion and the keywords in the messages it receives.

**At the end of this notebook, you will**:

* **Integrate knowledge you have learned in the previous labs such as**:
  * **Load a pre-trained emotion classifier**
  * **Measure semantic similarity between a set of words**
  * **Use a predefined question-answering dataset**

## Creating an empathic semantic chatbot

Our plan is the following. You learned how to build an emotion classifier in Lab3. You also learned how to load a word embedding model and get the words that are most similar to a given word. With these skills, it should not be difficult to:

1) send each message to the emotion classifier to get its emotion
2) match each token, or the words most similar to that token, against a set of keywords

This is the basic design for a chatbot that chooses its response based on the emotion and the keywords detected in a message.

In [1]:
import nltk
import random
import pickle
from pprint import PrettyPrinter
from collections import defaultdict
from gensim.models import KeyedVectors

from utils import read_token, read_qa, BotHandler

### Loading pretrained models

First, we load the pre-trained models: the emotion classifier we built in Lab3 and a word embedding model. The embedding model can be the one used to build the classifier, or any other model we want to use for matching keywords. We assume the emotion classifiers are still stored in the models folder of Lab3. You may need to adapt the paths in the following code to match your local setup.

The next function loads the classifier together with the preprocessing tools it depends on, so that you do not need to worry about them.

In [2]:
def load_classifier():
    """ Function to load pre-trained machine learning models needed """
    filename_vectorizer = '../lab3.machine_learning/models/utterance_vec.sav'
    filename_transformer = '../lab3.machine_learning/models/utterance_transf.sav'
    filename_encoder = '../lab3.machine_learning/models/label_encoder.sav'
    filename_classifier = '../lab3.machine_learning/models/svm_linear_clf_bow.sav'

    # Load the classifier and the preprocessing tools from disk,
    # closing each file as soon as it has been read
    with open(filename_classifier, 'rb') as f:
        loaded_classifier = pickle.load(f)
    with open(filename_vectorizer, 'rb') as f:
        loaded_vectorizer = pickle.load(f)
    with open(filename_transformer, 'rb') as f:
        loaded_transformer = pickle.load(f)
    with open(filename_encoder, 'rb') as f:
        loaded_label_encoder = pickle.load(f)
    
    preprocessing_tools = {'vectorizer': loaded_vectorizer, 
                           'transformer': loaded_transformer,
                           'label_encoder': loaded_label_encoder}

    return loaded_classifier, preprocessing_tools


In [3]:
def load_semantic_model():
    """ Function to load word embedding models needed """
    ### Adapt the path according to your local settings to point to your word embedding model
    path_to_model = '/Users/selbaez/Documents/PhD/data/word_embeddings/GoogleNews-vectors-negative300.bin'
    embedding_model = KeyedVectors.load_word2vec_format(path_to_model, binary=True)

    return embedding_model

### Classifying emotions

We also have to define the functions to classify the emotion and to get similar words. To classify the emotion in a message, we need the message, the classifier, and the preprocessing tools that were loaded with it.

In [4]:
def classify_emotion(message, classifier, preprocessing_tools):
    """ Function to process a message and predict the emotion it reflects """
    # Remember our classifier expects a list of texts so we simply put the message in a list
    message = [message]

    # We use the transform function to represent the message as a vector according to the model
    # This works for the Bag-of-Words classifier that we created
    counts = preprocessing_tools['vectorizer'].transform(message) ### This is the vector according to the count model
    tfidf = preprocessing_tools['transformer'].transform(counts)  ### this is the vector according to the TFIDF model
    
    # Predict
    predictions = classifier.predict(tfidf)

    # Map the prediction to a label (there is exactly one, since we passed a single message)
    predicted_label = predictions[0]
    predicted_emotion = preprocessing_tools['label_encoder'].classes_[predicted_label]

    return predicted_emotion

The above function only works for the Bag-of-Words classifiers we created. Think about what function is needed to classify a message with a classifier trained on word embeddings. How would you represent the message as a vector that can be handled by a model based on averaged word embedding vectors?
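
As a starting point for that question, here is a minimal sketch of one possible representation: look up the vector of every token that the embedding model knows and average them per dimension. The function name `average_embedding`, the `dim` parameter, and the toy 3-dimensional "model" are assumptions made for illustration. A real gensim `KeyedVectors` model supports the same `in` and `[]` lookups, and with NumPy you could average the vectors in a single call instead.

```python
def average_embedding(tokens, embedding_model, dim):
    """Average the vectors of all tokens found in the embedding vocabulary."""
    known = [embedding_model[t] for t in tokens if t in embedding_model]
    if not known:
        # None of the tokens is in the vocabulary: fall back to a zero vector
        return [0.0] * dim
    # zip(*known) groups the values of each dimension across all known tokens
    return [sum(values) / len(known) for values in zip(*known)]


# Toy stand-in for an embedding model (hypothetical values)
toy_model = {
    "happy": [1.0, 0.0, 2.0],
    "song": [3.0, 2.0, 0.0],
}

# "Maluma" is not in the toy vocabulary, so it is skipped
vector = average_embedding(["happy", "song", "Maluma"], toy_model, dim=3)
print(vector)
```

The resulting vector could then be passed to a classifier trained on averaged embeddings, in place of the TF-IDF vector used above.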

### Detecting keywords

Next, we define the function that matches the topic keywords (e.g. music) to the tokens found in the message. First, we expand the message by finding words similar to the ones the user sent.

In [5]:
def get_similar_words(embedding_model, message, num_similar_words=10, verbose=False):
    """ Function to enrich the message with similar words for better keyword detection """
    tokens = nltk.tokenize.word_tokenize(message)

    similar_words = defaultdict(set)
    for token in set(tokens):
        # Add the token itself to the enriched message
        similar_words[token].add(token)
        
        # Try getting similar words if the vector for the given token is found
        try:
            word_neighborhood = embedding_model.most_similar(positive=[token], topn=num_similar_words)
            # Add neighbor words to enrich the message
            for item in word_neighborhood:
                word = item[0].lower()
                similar_words[word].add(token)

        except KeyError:
            print("token '%s' not in embedding vocabulary" % token)

    if verbose:
        PrettyPrinter(indent=2).pprint(similar_words)

    return similar_words
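
To make the returned data structure concrete, the sketch below reproduces the enrichment step without a real embedding model. The neighbour table `TOY_NEIGHBOURS` and the helper `enrich` are hypothetical stand-ins for `embedding_model.most_similar`. Note that the dictionary is keyed by the enriched word, and each value is the set of original message tokens that led to it.

```python
from collections import defaultdict

# Hypothetical neighbour table standing in for embedding_model.most_similar
TOY_NEIGHBOURS = {
    "singing": ["sing", "song", "sings"],
    "song": ["melody", "tune", "ballad"],
}


def enrich(tokens, neighbours):
    """Map every token and each of its neighbours back to the originating tokens."""
    enriched = defaultdict(set)
    for token in set(tokens):
        # The token always points to itself
        enriched[token].add(token)
        # Each neighbour points back to the token it was derived from
        for word in neighbours.get(token, []):
            enriched[word.lower()].add(token)
    return enriched


enriched = enrich(["singing", "song"], TOY_NEIGHBOURS)
print(dict(enriched))
```

Here "song" is reached twice, once as a message token and once as a neighbour of "singing", so both tokens end up in its set.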

Then we can find the intersection between the enriched message tokens and the predefined keywords in our QA dataset.

In [6]:
def get_keyword_intersection(enriched_message, keywords):
    """ Function to determine if the message matches certain keywords according to some semantic similarity or relatedness"""
    # Get enriched tokens
    message_words = enriched_message.keys()

    # Calculate intersection between the two sets of words
    word_intersection = list(set(keywords) & set(message_words))

    # Create a dictionary so we know what keywords matched to what original token
    matched_words = {w: enriched_message[w] for w in word_intersection}

    return matched_words
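
The matching step can be tried out on hand-written data, as in the sketch below. The enriched message and the keyword list are made up for illustration, and `keyword_intersection` is a compact equivalent of the function above so that the example runs on its own.

```python
def keyword_intersection(enriched_message, keywords):
    """Return the matched keywords with the message tokens that triggered them."""
    matched = set(keywords) & set(enriched_message)
    return {word: enriched_message[word] for word in matched}


# Hand-written enriched message: enriched word -> original message tokens
enriched_message = {
    "singing": {"singing"},
    "song": {"singing", "song"},
    "melody": {"song"},
    "artist": {"artist"},
}
music_keywords = ["music", "song", "melody"]  # hypothetical keyword list

matches = keyword_intersection(enriched_message, music_keywords)
print(matches)
```

Only "song" and "melody" appear on both sides, so "music" produces no match even though it is a keyword.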

### Create a response

The last thing we need to do is create a response, given an incoming message. Here we can call the functions we defined before to classify emotion, enrich the meaning of the message, and match keywords.

In [7]:
def create_response(message, qa_data, classifier, preprocessing_tools, embedding_model):
    # Determine default response
    reply = "I cannot respond to this"
    
    # Classify emotion in message
    emotion = classify_emotion(message, classifier, preprocessing_tools)
    
    # Enrich the message with similar words
    similar_words = get_similar_words(embedding_model, message)
    
    # Loop through the predefined intents, and generate a response if there is a match (emotion + keywords)
    for i in qa_data['intents']:
        
        # Only consider intents related to the emotion detected
        if emotion == i['category']:
            
            # Try to match the message to the set of predefined keywords
            word_intersection = get_keyword_intersection(similar_words, i['keywords'])

            # If there is a match, generate a response
            if word_intersection:
                print("\nEmotion detected: {emotion}".format(emotion=emotion))
                print("Keywords detected [(keyword): (message_token)]: \n\t{intersection}".format(intersection=word_intersection))

                reply = random.choice(i['responses'])
                break

    return reply
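
The function above reads three fields from every intent: category, keywords, and responses. The sketch below shows a small dataset in that shape together with a stripped-down version of the selection loop. The contents of `qa_data` here are a hypothetical excerpt, not the actual emotions.json file, and `pick_reply` is an illustrative helper that takes an already detected emotion and set of enriched words.

```python
import random

# Hypothetical excerpt in the shape that create_response reads
qa_data = {
    "intents": [
        {
            "category": "surprise",
            "keywords": ["music", "song", "melody"],
            "responses": ["I am just as shocked as you about this music"],
        },
        {
            "category": "joy",
            "keywords": ["music", "song"],
            "responses": ["Great that this music makes you happy!"],
        },
    ]
}


def pick_reply(emotion, enriched_words, qa_data, default="I cannot respond to this"):
    """Return a response from the first intent matching both emotion and keywords."""
    for intent in qa_data["intents"]:
        same_emotion = intent["category"] == emotion
        shared_keywords = set(intent["keywords"]) & set(enriched_words)
        if same_emotion and shared_keywords:
            return random.choice(intent["responses"])
    return default


print(pick_reply("surprise", {"song", "melody", "artist"}, qa_data))
print(pick_reply("anger", {"song"}, qa_data))
```

An intent is only used when both conditions hold, so a message about music with an emotion that has no matching intent falls back to the default reply.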


### Try it out!

As in previous notebooks, we create our BotHandler and respond to the last message sent to the Telegram chatbot by a specific user.

In [8]:
CLTL_TOKEN = read_token()
user_id = 408043639
bot = BotHandler(CLTL_TOKEN)

qa_data = read_qa(qa_path = './data/emotions.json')
classifier, preprocessing_tools = load_classifier()
embedding_model = load_semantic_model()

In [9]:
last_message = bot.get_last_message_by(user_id)
response = create_response(last_message, qa_data, classifier, 
                           preprocessing_tools, embedding_model)
bot.send_message_to(user_id, response)

print("Received: {message}".format(message=last_message))
print("Responded: {response}".format(response=response))



token 'Maluma' not in embedding vocabulary
token ',' not in embedding vocabulary
token 'of' not in embedding vocabulary

Emotion detected: surprise
Keywords detected [(keyword): (message_token)]: 
	{'song': {'singing', 'song'}, 'melody': {'song'}}
Received: I can't believe Adele is singing the next reggaeton song with Maluma, I thought she was not that type of artist
Responded: I am just as shocked as you about this music


## End of this notebook