# PA2.1 - Building your first Chatbot: Aleeza

### Introduction

In this notebook, you will be implementing your own version of the first ever Chatbot, ELIZA.

### Instructions

- Follow along with the notebook, filling out the necessary code where instructed.

- <span style="color: red;">Read the Submission Instructions and Plagiarism Policy in the attached PDF.</span>

- <span style="color: red;">Make sure to run all cells for credit.</span>

- <span style="color: red;">Do not remove any pre-written code.</span> We will be using the `print` statements to grade your assignment.

- <span style="color: red;">You must attempt all parts.</span> Do not assume that because something is for 0 marks, you can leave it - it will definitely be used in later parts.

## Background: ELIZA
#### The first ever Chatbot.

<div style="display: flex; justify-content: center;">
<img src="https://upload.wikimedia.org/wikipedia/commons/7/79/ELIZA_conversation.png" width="400" height="250" alt="ELIZA Conversation">
</div>

This Chatbot was released in 1966 (before the term "Chatbot" even existed) by Joseph Weizenbaum of MIT. It was a revolutionary program that allowed humans to converse with a computer. As you may have predicted, it was a rule-based system built on pattern matching, very different from the "language models" of today. However, you may be surprised by how well it performed despite the simplicity of the system. Here is part of a conversation someone had with ELIZA, from the original paper (the capitalised sentences are ELIZA's responses):
\
\
Men are all alike.\
IN WHAT WAY\
They're always bugging us about something or other.\
CAN YOU THINK OF A SPECIFIC EXAMPLE\
Well, my boyfriend made me come here.\
YOUR BOYFRIEND MADE YOU COME HERE\
He says I'm depressed much of the time.\
I AM SORRY TO HEAR YOU ARE DEPRESSED\
It's true. I am unhappy.\
DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY\
......

The program used certain programmed rules to "transform" the input into the output. In order to do this, the program must first decompose the sentence based on certain criteria and then reassemble it based on the predefined assembly specifications. For example, if it is provided with the input sentence, "It seems that you hate me", it may be decomposed into:

1) It seems that
2) you
3) hate
4) me

Of these, (2) and (4) are recognised as key words. The program can then use the remaining sections of the sentence based on pre-defined rules to construct an output. For example, it may be programmed with the rule:

decomposition template:\
(0 YOU 0 ME)\
and the reassembly rule:\
(WHAT MAKES YOU THINK I 3 YOU).

Here, the "0" represents any number of words, whereas the "3" represents the 3rd part of the sentence from before. Hopefully, this makes the implementation a little clearer. If not, don't worry as you'll understand how it works once you start implementing your own version!
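
As a rough illustration of the rule above, the sketch below ports the decomposition template to a Python regular expression. The names `DECOMPOSITION` and `reassemble` are hypothetical, and translating each "0" into a `(.*)` group is an assumption about how the template could be expressed with `re`, not a description of Weizenbaum's original implementation.

```python
import re

# Hypothetical port of the template (0 YOU 0 ME):
# each "0" becomes a wildcard group, the key words stay literal.
DECOMPOSITION = re.compile(r"(.*)\byou\b(.*)\bme\b", re.IGNORECASE)

def reassemble(sentence):
    """Apply the rule (WHAT MAKES YOU THINK I 3 YOU) to one sentence."""
    match = DECOMPOSITION.match(sentence)
    if match is None:
        return None
    # Part 3 of the decomposition is the text between "you" and "me",
    # which is the second capture group of the regular expression.
    third_part = match.group(2).strip()
    return f"WHAT MAKES YOU THINK I {third_part.upper()} YOU"

print(reassemble("It seems that you hate me"))
# WHAT MAKES YOU THINK I HATE YOU
```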

For more details on the original ELIZA implementation, [Click Here](https://web.stanford.edu/class/cs124/p36-weizenabaum.pdf).


## Specifications

As described above, your task will be to first read in a user string, then modify it to provide an output (sometimes subtly, sometimes drastically, depending on the input string). This should be easy to do with the regex library, the specifics of which were discussed in class.

\
Your program should be able to handle all 1st and 2nd person pronouns, and all 1st and 2nd person subject-verb pairs with the verb "be", in all of its forms. If it is unclear what is meant by this, you might want to do some googling.

\
An example is as follows:

Regular Expression: I am (.*)\
Response: How long have you been %1?

Example Input that matches: I am sad.\
Example Response: How long have you been sad?
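
The example above can be reproduced in a few lines. This is only a sketch: `apply_rule`, `pattern` and `response_template` are hypothetical names, and stripping trailing punctuation before matching is an assumption made here so that the full stop does not end up in the response.

```python
import re

pattern = r"I am (.*)"
response_template = "How long have you been %1?"

def apply_rule(user_input):
    """Return the filled-in response, or None if the pattern does not match."""
    text = user_input.rstrip(".!")  # drop trailing punctuation
    match = re.match(pattern, text, re.IGNORECASE)
    if match is None:
        return None
    # %1 stands for the first captured group
    return response_template.replace("%1", match.group(1))

print(apply_rule("I am sad."))
# How long have you been sad?
```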

Please note that this is a simplified version of the chatbot, and the original bot had a much more complex algorithm behind it.

You will have two tables to store all the logic of your bot:
1. Reflection Table
2. Response Table

These will be described in detail in the cells below.

## Imports

These are the ONLY imports you can use for this part of the assignment.

In [1]:
import json
import re
import random

## Tables

These are your reflection and response tables.

#### Reflection Table

This table serves to convert your pronouns from first person to second person and vice versa. You should list all forms of the pronouns and their corresponding "reflection" (e.g. i : you).\
\
You should also do the same for all the forms of the verb "be" (e.g. am : are).\
\
Note: You do not need to add plural pronouns such as "we".\
\
This table will be represented as a dictionary. (The first entry is listed as an example below)
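
To make the idea of reflection concrete, here is a small sketch that applies a miniature table to a sentence. The table `mini_reflections` and the function `reflect_words` are hypothetical, and the regex-based word matching is one possible approach (chosen here so that punctuation next to a word does not block the lookup); splitting on whitespace is an equally valid alternative.

```python
import re

# Hypothetical miniature table, for illustration only.
mini_reflections = {"i": "you", "am": "are", "my": "your", "you": "I", "your": "my"}

def reflect_words(text, table):
    """Swap every word found in `table`, leaving other text untouched."""
    def swap(match):
        word = match.group(0)
        return table.get(word.lower(), word)
    # [\w']+ matches whole words, including contractions such as "i'm".
    # Each word is visited once, so a reflected word is never reflected back.
    return re.sub(r"[\w']+", swap, text)

print(reflect_words("i am your friend, really", mini_reflections))
# you are my friend, really
```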

In [2]:
reflectionTable = {
    "i": "you",
    "me": "you",
    "my": "your",
    "mine": "yours",
    "myself": "yourself",
    "am": "are",
    "was": "were",
    "were": "was",
    "i'm": "you are",
    "i'd": "you would",
    "i'll": "you will",
    "i've": "you have",
    "you": "I",
    "your": "my",
    "yours": "mine",
    "yourself": "myself",
    "are": "am",
    "you're": "I'm",
    "you'd": "I would",
    "you'll": "I'll",
    "you've": "I've"
}

#### Response Table

This table is in the form of a nested list. Each entry is a list, with the first term being your regular expression and the second term being a list of possible responses. "%n" represents the nth match. You will need to handle this in your code later when replacing the relevant parts of the text.
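
One way to handle the "%n" placeholders is sketched below. The function name `fill_placeholders` is hypothetical, and using `re.sub` with a callback is a design choice made for this sketch: it reads the whole number after "%", so "%10" is never mistaken for "%1" followed by "0".

```python
import re

def fill_placeholders(template, captures):
    """Replace each %n in `template` with the nth item of `captures` (1-based)."""
    def substitute(match):
        index = int(match.group(1)) - 1
        # Leave placeholders without a matching capture untouched.
        if 0 <= index < len(captures):
            return captures[index]
        return match.group(0)
    return re.sub(r"%(\d+)", substitute, template)

print(fill_placeholders("Why do %1 think %2?", ["you", "I am wrong"]))
# Why do you think I am wrong?
```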

Since this is a fairly large table, you will fill out the regular expressions and the responses in a json file: "responseTable.json"

\
In this table, you must include ALL subject-verb pairs for the verb "be". Do this for first, second and third person pronouns (e.g. I am ...). You must add at least 3 appropriate responses for each of these pairs. You need not account for the contracted versions of the pairs, but DO include the corresponding question statements for each of these pairs. You can assume there will be no past-tense or future-tense inputs.\
\
Furthermore, in the case that you encounter no matches, you must have fallbacks. Due to this, you must also account for the following cases:
1. (I feel ...), (I want ...), (I think ...)
2. Subject with an unknown verb
3. An unrecognised question
4. Any string

Include 4 or more responses for these cases as they will likely be encountered more often.\
\
Lastly, add at least 3 more subject-verb pairs, with at least 1 response each. These can be anything you like. Have fun with it (but keep it appropriate).\
\
For example:

Regex: I voted for (.*)

Response: How did voting for %1 make you feel?

Please ensure the correct order, as you will only be checking the first match later on.\
Once again, an example entry has been provided.
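
The sketch below shows why the order of entries matters. The JSON text is a hypothetical stand-in for the contents of a file like "responseTable.json" (the real file is yours to write), and `first_matching_pattern` is an illustrative helper, not part of the assignment.

```python
import json
import re

# Hypothetical table contents: specific patterns first, catch-all last.
raw = """
[
  ["I am (.*)", ["How long have you been %1?", "Why do you think you are %1?"]],
  ["I (.*)", ["Why do you %1?"]],
  ["(.*)", ["Please tell me more."]]
]
"""
table = json.loads(raw)

def first_matching_pattern(text, table):
    """Return the first pattern in the table that matches `text`."""
    for pattern, _responses in table:
        if re.match(pattern, text, re.IGNORECASE):
            return pattern
    return None

print(first_matching_pattern("I am sad", table))    # I am (.*)
print(first_matching_pattern("I like tea", table))  # I (.*)
print(first_matching_pattern("Hello", table))       # (.*)
```

If the catch-all entry `(.*)` were listed first, it would shadow every other pattern, since only the first match is used.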

In [3]:
# Add entries in the JSON file

# Use a context manager so the file handle is closed after loading
with open('responseTable.json') as f:
    responseTable = json.load(f)

## Helper Functions (Optional)

If you wish to modularise your code to make your life simpler in the upcoming cells, please define your helper functions here.

In [4]:
# Code here
# Tokenises a sentence into a list of strings, splitting on whitespace
def word_tokenizer(sentence):
  """Splits a string into a list based on whitespace as a delimiter.

  Args:
    sentence: The string to be split.

  Returns:
    A list of words from the original string.
  """
  return sentence.split()



## Aleeza Class

This is the class you will be implementing all of your bot's functionality in. As you will see, this is very straightforward and most of the actual work will be done while writing the response table. We will call our version Aleeza.

In [5]:
class Aleeza:
  def __init__(self, reflectionTable, responseTable):
    """
    Initialise your bot by storing both the tables as instance variables.
    You can store them any way you want. (Dictionary, List, etc.)
    """

    # Code here
    self.reflectionTable = reflectionTable
    self.responseTable = responseTable

  def reflect(self, text):
    """
    Take a string and "reflect" based on the reflectionTable.

    Return the modified string.
    """
    
    # Code here
    text = text.lower()
    if text.endswith("."):
      text = text[:-1]
    word_tokens = word_tokenizer(text)
    for idx, word in enumerate(word_tokens):
      if word in self.reflectionTable:
        word_tokens[idx] = self.reflectionTable[word]
    
    return " ".join(word_tokens)
      
      
    

  def respond(self, text):
    """
      Take a string, find a match, and return a randomly
      chosen response from the corresponding list.

      Do not forget to "reflect" appropriate parts of the string.

      If there is no match, return None.
    """

    # Code here
    text = text.lower()
    if text.endswith("."):
      text = text[:-1]
    for pattern, responses in self.responseTable:
      # Match case-insensitively instead of lowercasing the pattern itself,
      # which would corrupt escapes such as \S or \W.
      match = re.match(pattern, text, re.IGNORECASE)
      if match:
        chosen_response = random.choice(responses).lower()
        # match.groups() is empty for patterns without capture groups,
        # whereas match.lastindex would be None and break range().
        for i, capture_group in enumerate(match.groups(), start=1):
          # A group that did not take part in the match is None.
          reflected_capture = self.reflect(capture_group or "")
          chosen_response = chosen_response.replace("%" + str(i), reflected_capture)
        return chosen_response
    
    return None

    


## Test your Bot

You can use this interface to manually check your bot's responses.

In [6]:
def command_interface():
    print('Aleeza\n---------')
    print('Talk to the program by typing in plain English.')
    print('='*72)
    print('Hello.  How are you feeling today?')

    s = ''
    therapist = Aleeza(reflectionTable, responseTable)
    while s != 'quit':
        try:
            s = input('> ')
        except EOFError:
            s = 'quit'
        print(s)
        # rstrip avoids an IndexError on empty or punctuation-only input
        s = s.rstrip('!.')
        print(therapist.respond(s))

In [8]:
command_interface()

Aleeza
---------
Talk to the program by typing in plain English.
Hello.  How are you feeling today?
i am good
why do you think you are good?
i dont know
ok, lets change the topic a little. tell me about your family.
quit
hmm, whats your favourite dish.


## Test Sentences

After testing your bot, you have likely seen that it does not work very well yet. This goes to show the immense amount of work that was put into the original ELIZA program.\
In any case, having concocted all of your (hopefully) appropriate responses, you now need to demonstrate your bot handling all the cases listed above. To do this, you must provide an example sentence handling each of the regular expressions you have listed in your response table.
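
A quick way to check that every regular expression has at least one test sentence is sketched below. The helper `uncovered_patterns` and the table `demo_table` are hypothetical, and the normalisation (lowercasing and stripping trailing punctuation) is an assumption that should mirror whatever your own `respond` method does.

```python
import re

def uncovered_patterns(table, sentences):
    """Return the patterns that are never the first match for any sentence."""
    covered = set()
    for sentence in sentences:
        text = sentence.lower().rstrip(".!")
        for pattern, _responses in table:
            if re.match(pattern, text, re.IGNORECASE):
                covered.add(pattern)
                break  # only the first match counts
    return [pattern for pattern, _responses in table if pattern not in covered]

# Hypothetical three-entry table for demonstration.
demo_table = [
    ["I am (.*)", ["Why are you %1?"]],
    ["You are (.*)", ["Why am I %1?"]],
    ["(.*)", ["Tell me more."]],
]
print(uncovered_patterns(demo_table, ["I am happy.", "Nice weather."]))
# ['You are (.*)']
```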

In [9]:
test_sentences = [
    "I am happy.",
    "You are sad.",
    "He is tired.",
    "She is angry.",
    "It is raining.",
    "I feel excited.",
    "I want pizza.",
    "I think therefore I am.",
    "John is a doctor.",
    "I like ice cream.",
    "I hate Mondays.",
    "I love music.",
    "I have a cat.",
    "I eat breakfast.",
    "I sleep eight hours a night.",
    "I study computer science.",
    "This is a test sentence."
]


In [10]:
def get_responses(sentence_list, bot):
    """
    Get a response for each sentence from the list and return as a list.
    """

    # Code here
    responses = []
    for sentence in sentence_list:
        responses.append(bot.respond(sentence))
    return responses

In [11]:
therapist = Aleeza(reflectionTable, responseTable)

for pair in zip(test_sentences, get_responses(test_sentences, therapist)):
    print('='*72)
    print(pair[0])
    print(pair[1])

I am happy.
you are happy?
You are sad.
why do you think you are sad?
He is tired.
why do you think he is tired?
She is angry.
what makes you say she is angry?
It is raining.
how long has it been raining?
I feel excited.
why do you feel excited?
I want pizza.
what makes you want pizza?
I think therefore I am.
how does thinking therefore you are make you feel?
John is a doctor.
how does a doctor affect john?
I like ice cream.
why do you like ice cream?
I hate Mondays.
what makes you hate mondays?
I love music.
why do you love music?
I have a cat.
why do you have a cat?
I eat breakfast.
why do you eat breakfast?
I sleep eight hours a night.
how does sleeping eight hours a night make you feel?
I study computer science.
why do you study computer science?
This is a test sentence.
what makes you say this is a test sentence?


# Giving Aleeza Emotional Intelligence

In the next part of the assignment, you will be giving your chatbot some emotional intelligence. This will be done by training a simple emotion classification model. You will then use this model to classify the emotion of the user's input and respond accordingly.\
\
The logic works as follows:
1. If there is a match in the response table, we will use the response from the table.
2. If there is no match, we will classify the emotion of the input and respond accordingly.
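
The two steps above can be sketched independently of any trained model. Everything in this block is hypothetical: `StubClassifier` merely imitates an object with a `predict` method, and `tiny_respond` stands in for a rule-based responder that returns `None` when nothing matches.

```python
import random

class StubClassifier:
    """Hypothetical stand-in for a trained model with a predict() method."""
    def predict(self, texts):
        # Pretend every input is classified as label 1.
        return [1 for _ in texts]

def two_stage_reply(text, rule_based_respond, classifier, emotion_responses):
    """Try the rule-based responder first, then fall back to the emotion."""
    reply = rule_based_respond(text)
    if reply is not None:
        return reply
    label = classifier.predict([text])[0]
    return random.choice(emotion_responses[label])

def tiny_respond(text):
    """Hypothetical responder that knows a single pattern."""
    return "why are you sad?" if text.lower().startswith("i am sad") else None

emotion_responses = {1: ["That's wonderful!"]}
print(two_stage_reply("I am sad", tiny_respond, StubClassifier(), emotion_responses))
# why are you sad?
print(two_stage_reply("what a great day", tiny_respond, StubClassifier(), emotion_responses))
# That's wonderful!
```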

The model we will use is a simple Naive Bayes Classifier. This is a simple model that works well with text data. You will be using the `scikit-learn` library to train the model, and the huggingface `datasets` library to get the data.
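
A minimal sketch of such a classifier is shown below, assuming a bag-of-words representation fed into `MultinomialNB`. The four training sentences and the name `toy_model` are invented purely for illustration; the real model should be trained on the dataset described in the following sections.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data invented for this sketch; 0 = sadness, 1 = joy.
toy_texts = [
    "i feel sad and lonely",
    "i am so sad today",
    "i feel happy and joyful",
    "i am so happy today",
]
toy_labels = [0, 0, 1, 1]

# CountVectorizer turns text into word counts, MultinomialNB classifies them.
toy_model = make_pipeline(CountVectorizer(), MultinomialNB())
toy_model.fit(toy_texts, toy_labels)

print(toy_model.predict(["happy and joyful"])[0])
# 1
```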

## Imports

These are the ONLY imports you can use for this part of the assignment.

In [12]:
import datasets
import sklearn
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report



## Dataset

We will be using the `emotion` dataset from the `datasets` library. This dataset contains text data and the corresponding emotion. You will use this data to train your model. Load this dataset using the `load_dataset` function from the `datasets` library.

Next, split the dataset into training and testing sets.\
(HINT: This has already been done for you in the dataset you loaded)

In [13]:
"""
Load the emotion dataset from Hugging Face
"""
from datasets import load_dataset

dataset = load_dataset("emotion")




In [14]:
# Access the training split
train_dataset = dataset['train']

# Access the testing split
test_dataset = dataset['test']

# Access the validation split
validation_dataset = dataset['validation']

In [15]:
print(train_dataset)
print(validation_dataset)
print(test_dataset)

Dataset({
    features: ['text', 'label'],
    num_rows: 16000
})
Dataset({
    features: ['text', 'label'],
    num_rows: 2000
})
Dataset({
    features: ['text', 'label'],
    num_rows: 2000
})


In [16]:
print(train_dataset.shape)
print(validation_dataset.shape)
print(test_dataset.shape)

(16000, 2)
(2000, 2)
(2000, 2)


In [17]:
"""
Split the dataset into training and testing sets
"""

# Code below
train_data = train_dataset['text']
train_labels = train_dataset['label']
test_data = test_dataset['text']
test_labels = test_dataset['label']

## Training the Model

Just like in your previous assignment, you will now train the model and evaluate it.

In [24]:
"""
Vectorise the data and train the model
"""
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score

# Code here
count_vectorizer = CountVectorizer()

logistic_regression_model = LogisticRegression(max_iter=1000)

pipeline = Pipeline([
    ('count_vectorizer', count_vectorizer),
    ('logistic_regression', logistic_regression_model)
])

pipeline.fit(train_data, train_labels)


"""
Predict on the test set
"""

predicted_labels = pipeline.predict(test_data)

accuracy = accuracy_score(test_labels, predicted_labels)
print(f"accuracy: {accuracy}")

"""
Print classification report
"""
print(classification_report(test_labels, predicted_labels))

accuracy: 0.8895
              precision    recall  f1-score   support

           0       0.94      0.94      0.94       581
           1       0.90      0.93      0.92       695
           2       0.78      0.72      0.75       159
           3       0.88      0.89      0.89       275
           4       0.86      0.85      0.86       224
           5       0.70      0.59      0.64        66

    accuracy                           0.89      2000
   macro avg       0.84      0.82      0.83      2000
weighted avg       0.89      0.89      0.89      2000



In [39]:
pipeline.predict(["this is very good"])[0]

1

## Putting it all together

Now that we have our classification model, we can modify our chatbot to use it.

First, we will remove the fallback responses from our response table, i.e. the following cases:
1. (I feel ...), (I want ...), (I think ...)
2. Subject with an unknown verb
3. An unrecognised question
4. Any string

Remove these and save your response table as "responseTable2.json".

In [26]:
# Make a new file "responseTable2.json" and add your modified table to it

# Use a context manager so the file handle is closed after loading
with open('responseTable2.json') as f:
    responseTable = json.load(f)

#### Emotion Response Table

This table will be a dictionary with the emotions as keys and a list of possible responses as values. You should include at least 2 responses for each emotion.

In [27]:
emotionTable = {
    0: [ # sadness
        "I'm sorry to hear that.",
        "It's okay to feel sad sometimes."
    ],
    1: [ # joy
        "That's wonderful!",
        "I'm happy for you."
    ],
    2: [ # love
        "Love is a beautiful thing.",
        "Cherish those you love."
    ],
    3: [ # anger
        "Take a deep breath and try to stay calm.",
        "It's important to address your anger in a healthy way."
    ],
    4: [ # fear
        "You're not alone. We'll get through this together.",
        "Remember to take things one step at a time."
    ],
    5: [ # surprise
        "Wow, that's unexpected!",
        "I didn't see that coming!"
    ]
}


### Modifying your Chatbot

You will now modify your chatbot to use the emotion classifier. If there is a match in the response table, we will use the response from the table. If there is no match, we will classify the emotion of the input and respond accordingly.

In [40]:
class IntelligentAleeza(Aleeza):
    def __init__(self, reflectionTable, responseTable, emotionTable, classifier):
        """
        Initialise your bot by calling the parent class's __init__ method,
        and then storing the emotionTable as an instance variable.

        Next, store the classification model as an instance variable.
        """
        
        # Code here
        super().__init__(reflectionTable, responseTable)
        self.emotionTable = emotionTable
        self.classifier = classifier

    def smart_respond(self, text):
        """
        Take a string, call the parent class's respond method.
        If the response is None, then respond based on the emotion.
        """

        # Code here
        response = super().respond(text)
        if response is None:
            emotion = self.classifier.predict([text])[0]
            if emotion in self.emotionTable:
                return random.choice(self.emotionTable[emotion])
            else:
                return "I'm not sure how to respond to that."
        return response

## Test your New Bot

Randomly select 5 sentences from the test set and test your bot. You should see that it now responds with an appropriate message based on the emotion detected in the input (when there is no match).
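
If you want the same five sentences on every run, one option is to sample with a seeded generator. This is a sketch only: `sample_sentences` and the `seed` parameter are hypothetical, and the `list(...)` conversion is there because `random.sample` expects a sequence.

```python
import random

def sample_sentences(sentences, k=5, seed=0):
    """Pick k sentences reproducibly using a private, seeded generator."""
    rng = random.Random(seed)
    return rng.sample(list(sentences), k)

# Hypothetical pool standing in for the test split's text column.
pool = [f"sentence {i}" for i in range(20)]
picked = sample_sentences(pool)
print(len(picked))
# 5
```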

In [41]:
def get_responses(sentence_list, bot):
    """
    Get a response for each sentence from the list and return as a list.
    Use your new smart_respond method.
    """

    # Code here
    responses = []
    for sentence in sentence_list:
        responses.append(bot.smart_respond(sentence))
    return responses


In [42]:
"""
Create an instance of the IntelligentAleeza class
"""
intelligent_therapist = IntelligentAleeza(reflectionTable, responseTable, emotionTable, pipeline) # Code here

"""
Get 5 random test instances from the test data
"""

# Code here
# random.sample needs a sequence, so convert the column to a list first
test_instances = random.sample(list(test_dataset['text']), 5)


""" 
Get responses from the intelligent_therapist 
"""

responses = get_responses(test_instances, intelligent_therapist)


"""
Print the test instances and the responses
"""
for pair in zip(test_instances, responses):
    print('='*72)
    print(pair[0])
    print(pair[1])

i feel very honoured to have been asked
how does feeling very honoured to have been asked affect you?
i am feeling so reluctant and overwhelmed i try to think of the alternative abandoning that dream
you are feeling so reluctant and overwhelmed you try to think of the alternative abandoning that dream?
i havent been sick in the winter very often since i quit smoking years ago so seldom in fact that now when i do get sick i feel outraged hows that for rational thinking
It's important to address your anger in a healthy way.
i had a fab christmas and an amazing new year with my family and friends and against all odds i feel very optimistic about
That's wonderful!
i mean the idea is intoxicating of course and it feels amazing when its happening but what happens in the morning when you wake up and you have to go to work and so amp so is all up in your shit about something that is completely impractical
what makes you say you mean the idea is intoxicating of course and it feels amazing when 

# Fin.