# Final Assignment: adapt the chatbot to respond to emotions and topics

Copyright: Vrije Universiteit Amsterdam, Faculty of Humanities, CLTL

RMA/Text Mining MA, Introduction to HLT

This notebook describes the final assignment of the Human Language Technology course. 

**Learning goals**
* train a supervised classifier (SVM)
* evaluate a supervised classifier (SVM)
* working with pretrained emotion classifiers
* working with pretrained word embeddings
* compute the similarity and relatedness of sets of words

We assume you have worked through the following notebooks:

* **Lab2.1.NLTK_wordnet.ipynb**
* **Lab2.2.Wikipedia2vec.ipynb**
* **Lab3.2.ml.evaluation.ipynb**
* **Lab3.5.ml.emotion-detection.ipynb**
* **Lab3.6.ml.emotion-detection-embeddings.ipynb**
* **Lab3.7.ml.emotion-detection-on-tweets.ipynb**
* **Lab4.1.ml.introduction-to-telegram.ipynb**
* **Lab4.2.ml.question-anwering.ipynb**
* **Lab4.3.ml.empathic-chatbot.ipynb**

This notebook is very similar to Lab4.3. Look at the main logic of our solution in that lab and use it as an example for this assignment.

For this assignment you will need to edit the `assignment_data.json` file under the `data` folder of this lab. There you **should** change the list of responses per *intent*, as well as the 


The notebook provides placeholders for your own code. You will not need to change this notebook except for importing your solutions and applying them to the test sentences that are given to you.
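
Judging from how `create_response` further down reads the data, each entry under `intents` carries a `category` (the emotion), a list of `keywords` and a list of `responses`. The snippet below is only a sketch of that shape: the emotion labels, keywords and response texts are invented for illustration, and the real file may contain additional fields.

```python
import json

# Hypothetical example content; the real file ships with the lab and may differ.
example_qa = {
    "intents": [
        {
            "category": "joy",
            "keywords": ["music", "dance"],
            "responses": ["Glad to hear it! What do you like to dance to?"],
        },
        {
            "category": "anger",
            "keywords": ["music", "noise"],
            "responses": ["That sounds frustrating."],
        },
    ]
}

# Round-trip through JSON text to confirm the structure is valid JSON
loaded = json.loads(json.dumps(example_qa))
for intent in loaded["intents"]:
    print(intent["category"], intent["keywords"], len(intent["responses"]))
```

Keeping one intent per emotion and topic pair makes it easy to see which response list a given message can reach.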

## Add your code

In [1]:
# TODO: You can add any import statements you need here

import random

from utils import read_token, read_qa, BotHandler

You will have to implement three main functions: 

* First, you will need to load the pretrained emotion classifier of your choice. 
* Second, you will need a function to classify the emotion of a given message, using the loaded model. 
* Finally, the third function must match the message to a list of keywords by measuring its semantic similarity.

We have provided the function definitions with hints for each function's parameters and returned objects/variables. You can use this as guidance, and later on add more parameters or returned objects as needed. You may also add any helper functions you need for these three main functions.

In [2]:
def load_semantic_model():
    """ Function to load word embedding models needed """
    # TODO: Add code to load a word semantic model (i.e. word embedding)
    embedding_model = None
    
    return embedding_model
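
As a rough illustration of what loading an embedding model involves, the sketch below parses vectors stored in the word2vec text layout (a `count dimension` header followed by one `word v1 ... vN` line per word) into a plain dictionary. The helper name `read_text_embeddings` and the toy vectors are assumptions made for this example; in practice you will probably load the model with the library used in Lab 2.2 instead.

```python
import io


def read_text_embeddings(handle):
    """Parse word2vec-style text: a 'count dim' header, then 'word v1 ... vN' per line."""
    header = handle.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Toy in-memory "file" with three two-dimensional vectors (invented values)
toy_file = io.StringIO("3 2\nmusic 0.9 0.1\nsong 0.8 0.2\nangry 0.1 0.9\n")
embedding_model = read_text_embeddings(toy_file)
print(sorted(embedding_model))
```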

In [3]:
def load_classifier():
    """ Function to load pre-trained machine learning models needed """
    # TODO: Add code to load pre-trained models (e.g. the classifiers you trained in Lab3.3, 3.4 or 3.5)
    
    # TIP: There might be more than one model you need to load, for example any transformer objects 
    # or encoder objects you need to process data before applying a classifier
    classifier = None
    preprocessing_tools = None

    return classifier, preprocessing_tools
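
If you saved your trained model with Python's `pickle` module, loading it back could look roughly like the sketch below. The helper `load_pickled`, the file name and the dictionary used as a stand-in for a trained model are all hypothetical; the same call would typically be repeated for each preprocessing object you stored.

```python
import os
import pickle
import tempfile


def load_pickled(path):
    """Load one pickled object from disk. Only unpickle files you created yourself."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a trained model; a plain dict keeps the sketch dependency-free
stand_in = {"labels": ["joy", "anger", "sadness"]}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "classifier.pkl")  # hypothetical file name
    with open(path, "wb") as handle:
        pickle.dump(stand_in, handle)
    restored = load_pickled(path)

print(restored["labels"])
```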


In [4]:
def classify_emotion(message, classifier, preprocessing_tools):
    """ Function to process a message and predict the emotion it reflects """
    # TODO: Add code to classify emotion in a message
    
    # TIP: Remember you need to preprocess the incoming message in the same way as you 
    # processed the training data for the classifier
    predicted_emotion = None

    return predicted_emotion
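
The tip above can be sketched as follows, assuming the preprocessing object exposes a `transform` method and the classifier a `predict` method, as scikit-learn objects do. `ToyVectorizer` and `ToyClassifier` are invented stand-ins so the example runs without a trained model; they are not part of the lab code.

```python
class ToyVectorizer:
    """Hypothetical stand-in for a fitted vectorizer: counts a fixed vocabulary."""
    vocabulary = ["love", "hate"]

    def transform(self, texts):
        return [[text.lower().split().count(word) for word in self.vocabulary]
                for text in texts]


class ToyClassifier:
    """Hypothetical stand-in for a trained classifier with a predict method."""

    def predict(self, rows):
        return ["joy" if row[0] >= row[1] else "anger" for row in rows]


def classify_emotion_sketch(message, classifier, preprocessing_tools):
    # Apply the same preprocessing as at training time; the message is wrapped
    # in a list because transform and predict operate on batches
    features = preprocessing_tools.transform([message])
    return classifier.predict(features)[0]


print(classify_emotion_sketch("I love hiphop", ToyClassifier(), ToyVectorizer()))
```

Wrapping the single message in a list and taking the first prediction is the part that carries over to a real model.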

In [5]:
def get_similar_words(embedding_model, message):
    """ Function to enrich the message with similar words for better keyword detection """
    # TODO Add code to retrieve similar words
    # The return format is a dictionary {'similar_word': ['message_token']}
    similar_words = None

    return similar_words
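
One possible way to produce the `{'similar_word': ['message_token']}` format is sketched below. It assumes the embedding model behaves like a dictionary from words to vectors and ranks neighbours by cosine similarity; the `top_n` parameter, the whitespace tokenisation and the toy vectors are assumptions for this example. A real embedding library usually offers its own nearest-neighbour lookup.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def get_similar_words_sketch(embedding_model, message, top_n=2):
    similar_words = {}
    for token in message.lower().split():
        if token not in embedding_model:
            continue  # out-of-vocabulary tokens cannot be enriched
        # Rank every other word in the vocabulary by similarity to this token
        scored = sorted(
            ((cosine(embedding_model[token], vec), word)
             for word, vec in embedding_model.items() if word != token),
            reverse=True,
        )
        for _, word in scored[:top_n]:
            similar_words.setdefault(word, []).append(token)
    return similar_words


# Invented two-dimensional vectors, for illustration only
toy_model = {
    "hiphop": [0.9, 0.1],
    "music": [0.8, 0.2],
    "rap": [0.95, 0.05],
    "sad": [0.0, 1.0],
}
enriched = get_similar_words_sketch(toy_model, "I love hiphop", top_n=2)
print(enriched)
```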


In [6]:
def semantic_similarity(message, keywords):
    """ Function to determine if the message matches certain keywords according to some semantic similarity or relatedness"""
    # TODO: Add code to measure the semantic similarity between the message and some given keywords (e.g. methods you learned in Lab 2.1, 2.2 and 2.3)
    
    # TIP: You can process the message in any way you prefer (e.g. exploiting some of the linguistic 
    # features we explored in Lab 1.2 and 1.3)
    matched_words = None
    
    return matched_words
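
A minimal sketch of keyword matching by embedding similarity is shown below. It is not the reference solution: it assumes an extra `embedding_model` parameter (a dictionary from words to vectors), a hand-picked `threshold`, and a simple regular-expression tokeniser, all of which you may want to replace. It returns an empty dictionary when nothing matches, which works with the `if word_intersection:` check in `create_response`.

```python
import math
import re


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_similarity_sketch(message, keywords, embedding_model, threshold=0.95):
    tokens = re.findall(r"[a-z']+", message.lower())
    matched_words = {}
    for keyword in keywords:
        for token in tokens:
            if token == keyword:
                # Exact matches need no embedding lookup
                matched_words.setdefault(keyword, []).append(token)
            elif token in embedding_model and keyword in embedding_model:
                score = cosine(embedding_model[token], embedding_model[keyword])
                if score >= threshold:
                    matched_words.setdefault(keyword, []).append(token)
    return matched_words


# Invented two-dimensional vectors, for illustration only
toy_model = {
    "hiphop": [0.9, 0.1],
    "music": [0.8, 0.2],
    "club": [0.5, 0.5],
    "sad": [0.0, 1.0],
}
message = "I love hiphop, especially when dancing in the club"
matches = semantic_similarity_sketch(message, ["music", "sad"], toy_model)
print(matches)
```

The threshold controls how loosely a token may relate to a keyword; it is worth tuning on a few test sentences.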

## Create a response

Next, we create a response with a function very similar to the one in Lab 4.3. Given an incoming message, it uses the functions defined above to 1) classify the emotion, and 2) match keywords according to the meaning of the message.

The main logic of this function is given, but you might need to adapt it if you change the returned objects of a function or if you use any helper functions.

In [7]:
def create_response(message, qa, classifier, preprocessing_tools, embedding_model):
    # Determine default response
    reply = "Default response"
    
    # Classify emotion in message
    emotion = classify_emotion(message, classifier, preprocessing_tools)
    
    # Enrich message
    similar_words = get_similar_words(embedding_model, message)

    # Loop through the predefined intents, and generate a response if there is a match (emotion + keywords)
    word_intersection = {}
    for i in qa['intents']:
        
        # Only consider intents related to the emotion detected 
        if emotion == i['category']:
            
            # Try to match the message to the set of predefined keywords
            word_intersection = semantic_similarity(message, keywords=i['keywords'])

            # If there is a match, generate a response
            if word_intersection:
                reply = random.choice(i['responses'])
                break

    return reply, emotion, word_intersection


## Test your approach

We now set up our BotHandler as we have done throughout Lab 4. We also load our ML models and the Q&A data.

In [8]:
CLTL_TOKEN = read_token()
user_id = 408043639 # TODO: Remember to put here YOUR user id
bot = BotHandler(CLTL_TOKEN)

qa_data = read_qa(qa_path = './data/assignment_data.json')
classifier, preprocessing_tools = load_classifier()
embedding_model = load_semantic_model()

To test the chatbot on Telegram, you can use the same logic as in Lab 4.3.

In [9]:
last_message = bot.get_last_message_by(user_id)
response, emotion, word_intersection = create_response(last_message, 
                                                       qa_data, 
                                                       classifier,
                                                       preprocessing_tools, 
                                                       embedding_model)
bot.send_message_to(user_id, response)

print("Received: {message}".format(message=last_message))
print("Responded: {response}".format(response=response))
print("Emotion detected: {emotion}".format(emotion=emotion))
print("Keywords detected [(keyword): (message_token)]: \n\t{intersection}".format(intersection=word_intersection))


Received: I can't believe Adele is singing the next reggaeton song with Maluma, I thought she was not that type of artist
Responded: Default response
Emotion detected: None
Keywords detected [(keyword): (message_token)]: 
	{}


Alternatively, you can test your approach directly in the notebook as follows:

In [10]:
last_message = "I love hiphop, especially when dancing in the club"
response, emotion, word_intersection = create_response(last_message, 
                                                       qa_data, 
                                                       classifier,
                                                       preprocessing_tools, 
                                                       embedding_model)

print("Received: {message}".format(message=last_message))
print("Responded: {response}".format(response=response))
print("Emotion detected: {emotion}".format(emotion=emotion))
print("Keywords detected [(keyword): (message_token)]: \n\t{intersection}".format(intersection=word_intersection))


Received: I love hiphop, especially when dancing in the club
Responded: Default response
Emotion detected: None
Keywords detected [(keyword): (message_token)]: 
	{}


## End of the assignment