# Lab 4.3 - Chatbot to detect emotions

Copyright, Vrije Universiteit Amsterdam, Faculty of Humanities, CLTL

In this notebook, we will create a Telegram chatbot that detects the emotion in a message and responds appropriately according to a set of keywords.

**Main goal of this notebook**: Build a Telegram chatbot that detects both the emotion and the keywords in the messages it receives.

**At the end of this notebook, you will**:
* **Integrate knowledge you have learned in the previous labs such as**:
  * **Load a pre-trained emotion classifier**
  * **Measure semantic similarity between a set of words**
  * **Use a predefined question-answering dataset**

## Creating an empathic semantic chatbot

In [1]:
import nltk
import random
import pickle
from pprint import PrettyPrinter
from collections import defaultdict
from gensim.models import KeyedVectors

from utils import read_token, read_qa, BotHandler

### Loading pretrained models

First, we will load the pre-trained models we have: the emotion classifier (together with its vectorizer, transformer and label encoder) and the word embedding model.

In [2]:
def load_models():
    filename_vectorizer = '../lab3.machine_learning/models/utterance_vec.sav'
    filename_transformer = '../lab3.machine_learning/models/utterance_transf.sav'
    filename_encoder = '../lab3.machine_learning/models/label_encoder.sav'
    filename_classifier = '../lab3.machine_learning/models/svm_linear_clf_bow.sav'

    # load the classifier, vectorizer, transformer and label encoder from disk;
    # the `with` blocks close each file once it has been read
    with open(filename_classifier, 'rb') as f:
        loaded_classifier = pickle.load(f)
    with open(filename_vectorizer, 'rb') as f:
        loaded_vectorizer = pickle.load(f)
    with open(filename_transformer, 'rb') as f:
        loaded_transformer = pickle.load(f)
    with open(filename_encoder, 'rb') as f:
        loaded_label_encoder = pickle.load(f)

    return loaded_vectorizer, loaded_transformer, loaded_classifier, loaded_label_encoder

In [3]:
def load_embeddings():
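    # NOTE: this is a machine-specific path; change it to where the embeddings are stored on your computer
    path_to_model = '/Users/selbaez/Documents/PhD/CLTL/data/word_embeddings/GoogleNews-vectors-negative300.bin'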
    embedding_model = KeyedVectors.load_word2vec_format(path_to_model, binary=True)

    return embedding_model

### Classifying emotions

We also have to define the functions to classify the emotion and to get similar words. To classify the emotion in a message, we need the message itself plus the vectorizer, the transformer, the classifier and the label encoder we loaded above.

In [4]:
def classify_emotion(message, vectorizer, transformer, classifier, label_encoder):
    # Remember our classifier expects a list of texts
    message = [message]

    counts = vectorizer.transform(message)
    tfidf = transformer.transform(counts)
    predictions = classifier.predict(tfidf)

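    # We passed a single message, so there is exactly one prediction;
    # map its numeric label back to the emotion name
    predicted_label = predictions[0]
    predicted_emotion = label_encoder.classes_[predicted_label]

    return predicted_emotion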

### Detecting keywords

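Now we have to define the function that matches the topic keywords (e.g. music) to the tokens found in the message. First, we expand the meaning of the message by looking up, for every token the user sent, its most similar words in the embedding model. The result is a dictionary that maps each similar word (lowercased) to the message tokens it was derived from.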

In [5]:
def get_similar_words(embedding_model, message, num_similar_words=10, verbose=False):
    # TODO filter by content words
    tokens = nltk.tokenize.word_tokenize(message)

    similar_words = defaultdict(list)
    for token in set(tokens):
        try:
            word_neighborhood = embedding_model.most_similar(positive=[token], topn=num_similar_words)
            for item in word_neighborhood:
                word = item[0].lower()
                similar_words[word].append(token)

        except KeyError:
            # most_similar raises KeyError for out-of-vocabulary tokens; skip them
            print("token '%s' not in embedding vocabulary" % token)

    if verbose:
        PrettyPrinter(indent=2).pprint(similar_words)

    return similar_words
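
To make the shape of the returned dictionary concrete, here is a minimal sketch that swaps the real embedding model for a hypothetical lookup table (`TOY_NEIGHBORS`, with invented similarity scores). It only illustrates how neighbours are lowercased and mapped back to the tokens that produced them; it is not how the word2vec model computes similarity.

```python
from collections import defaultdict

# Hypothetical stand-in for embedding_model.most_similar:
# each token maps to (neighbour, score) pairs; the scores are invented.
TOY_NEIGHBORS = {
    "lyrics": [("song", 0.71), ("Melody", 0.65)],
    "songs": [("song", 0.83), ("tunes", 0.70)],
}


def toy_similar_words(tokens):
    """Map each lowercased neighbour to the message tokens it came from."""
    similar_words = defaultdict(list)
    for token in tokens:
        # Tokens missing from the toy table simply have no neighbours
        for word, _score in TOY_NEIGHBORS.get(token, []):
            similar_words[word.lower()].append(token)
    return similar_words


result = toy_similar_words(["lyrics", "songs"])
print(dict(result))
# {'song': ['lyrics', 'songs'], 'melody': ['lyrics'], 'tunes': ['songs']}
```

Note that one similar word can point back to several message tokens, which is why the values are lists.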

Then we can find the intersection between our enriched message tokens and the pre-defined keywords in our QA dataset.

In [6]:
def get_keyword_intersection(questions, similar_words):
    message_words = similar_words.keys()

    word_intersection = list(set(questions) & set(message_words))

    matched_words = {w: similar_words[w] for w in word_intersection}

    return matched_words
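
As a small worked example of the same set intersection, assume an intent whose keyword list is `["song", "melody", "music"]` (a hypothetical list; the real keywords live in `./data/emotions.json`) and an enriched message shaped like the output of `get_similar_words`:

```python
# Hypothetical keyword list for one intent
questions = ["song", "melody", "music"]

# Enriched message words: similar word -> message tokens it came from
similar_words = {
    "song": ["lyrics", "songs"],
    "melody": ["lyrics"],
    "tunes": ["songs"],
}

# Keep only the similar words that are also keywords of the intent
matched = {w: similar_words[w] for w in set(questions) & set(similar_words)}

print(sorted(matched.items()))
# [('melody', ['lyrics']), ('song', ['lyrics', 'songs'])]
```

The keyword "music" is dropped because no message token led to it, and "tunes" is dropped because it is not a keyword of this intent.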

### Create a response

The last thing we need to do is create a response, given an incoming message. Here we can call the functions we defined before to classify emotion, enrich the meaning of the message, and match keywords.

In [7]:
def create_response(message, qa_data, vectorizer, transformer, classifier, label_encoder, embedding_model):
    response = "I cannot respond to this"
    emotion = classify_emotion(message, vectorizer, transformer, classifier, label_encoder)
    similar_words = get_similar_words(embedding_model, message)

    for i in qa_data['intents']:
        if emotion == i['category']:
            word_intersection = get_keyword_intersection(i['questions'], similar_words)

            if word_intersection:
                print("Emotion detected: {emotion}".format(emotion=emotion))
                print("Keywords detected [(keyword): (message_token)]: \n\t{intersection}".format(intersection=word_intersection))

                response = random.choice(i['responses'])
                break

    return response
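
The function above assumes that `qa_data` has an `intents` list whose entries carry `category`, `questions` and `responses` keys. The sketch below shows that layout with hypothetical contents (the `joy` category and its response text are invented for illustration) and replays the selection logic with a helper, `pick_response`, that is not part of the notebook.

```python
import random

# Hypothetical contents; only the key names are taken from create_response
qa_data = {
    "intents": [
        {
            "category": "neutral",
            "questions": ["song", "melody", "music"],
            "responses": ["That music is fine"],
        },
        {
            "category": "joy",
            "questions": ["song", "music"],
            "responses": ["Glad the music makes you happy!"],
        },
    ]
}


def pick_response(emotion, message_words, qa_data):
    """Return a response from the first intent matching emotion and keywords."""
    for intent in qa_data["intents"]:
        shared = set(intent["questions"]) & set(message_words)
        if intent["category"] == emotion and shared:
            return random.choice(intent["responses"])
    # Fallback when no intent matches, as in create_response
    return "I cannot respond to this"


print(pick_response("neutral", ["song", "melody"], qa_data))  # That music is fine
print(pick_response("anger", ["song"], qa_data))  # I cannot respond to this
```

An intent is only used when both conditions hold: the detected emotion equals its category and at least one of its keywords appears among the enriched message words.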

### Try it out!

As in previous notebooks, we create our BotHandler and respond to the last message sent to the Telegram chatbot by a specific user.

In [8]:
CLTL_TOKEN = read_token()
user_id = 408043639

qa_data = read_qa(qa_path='./data/emotions.json')
vectorizer, transformer, classifier, label_encoder = load_models()
embedding_model = load_embeddings()
bot = BotHandler(CLTL_TOKEN)

In [9]:
last_message = bot.get_last_message_by(user_id)
response = create_response(last_message, qa_data, vectorizer, transformer, classifier, label_encoder, embedding_model)
bot.send_message_to(user_id, response)


print("Received: {message}".format(message=last_message))
print("Responded: {response}".format(response=response))



Emotion detected: neutral
Keywords detected [(keyword): (message_token)]: 
	{'song': ['lyrics', 'songs'], 'melody': ['lyrics']}
Received: lyrics in their songs
Responded: That music is fine


## End of this notebook