## Eliza implementation

This notebook is taken from: https://github.com/itu-qsp/eliza

Eliza is a very basic chatbot created by Joseph Weizenbaum in 1964. It was intended to act as a psychotherapist by simply echoing questions based on cues in the human input. It uses very simple rules and patterns to generate its responses.

You can inspect **eliza_language.py** to see the patterns for the cues and the corresponding responses. You can easily modify the variable **PSYCHOBABBLE** to adapt the system to your own insights, creating your own Eliza. **eliza.py** is a simple script that reads each input from the human, matches it against the patterns and returns the response.
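The idea of matching input against patterns and filling a response template can be sketched in a few lines. This is only an illustration: the `RULES` list and the `respond` function are hypothetical and are not the contents of **PSYCHOBABBLE** or the code in **eliza.py**, which may work differently (for example by picking a random response).

```python
import re

# Hypothetical rules for illustration only; these are NOT the contents of
# PSYCHOBABBLE in eliza_language.py.
RULES = [
    (r"i need (.*)", "Why do you need {0}?"),
    (r"i am (.*)", "How long have you been {0}?"),
    (r"(.*)", "Please tell me more."),  # fallback that matches any input
]


def respond(text):
    """Return the filled-in template of the first rule whose pattern matches."""
    # Drop surrounding whitespace and trailing punctuation before matching.
    cleaned = text.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = re.fullmatch(pattern, cleaned, flags=re.IGNORECASE)
        if match:
            # The captured text is inserted into the {0} placeholder, if any.
            return template.format(*match.groups())
    return "Please tell me more."


print(respond("I need a holiday."))
```

The order of the rules matters in this sketch: the catch-all pattern comes last so that more specific cues are tried first.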

In this notebook, we show how you can import the Eliza code, have a conversation, and store the conversation in a so-called dataframe from the Pandas package. We then show how to iterate over the dataframe to assign an emotion label to each human input and add these labels to the dataframe.

Finally, we show how you can save the conversation with the annotations to disk for later usage.

## 1. Loading and running Eliza

We adapted the Eliza code in the *eliza.py* file from the original GitHub repository so that the speaker information and a turn identifier are saved in a JSON structure for each turn. We will use the stored conversation to create a Pandas dataframe and try to assign emotions to each utterance.

In [1]:
import eliza as el

The next cell starts the chat with Eliza. To stop, type "stop", "quit" or "bye".

In [2]:
el.talk_to_me()

My name is Eliza. What is your name?


>  Pi


Hello Pi. How are you feeling today?


>  Sad


Let's change focus a bit... Tell me about your family.


>  No


I see.  And what does that tell you?


>  Nothing


nothing.


>  bye


We can now inspect the stored conversation and load the JSON in a Pandas dataframe.

In [3]:
for turn in el.conversation:
    print(turn)

{'utterance': 'Hello Pi. How are you feeling today?', 'speaker': 'Eliza', 'turn_id': 0}
{'utterance': 'Sad', 'speaker': 'Pi', 'turn_id': 1}
{'utterance': "Let's change focus a bit... Tell me about your family.", 'speaker': 'Eliza', 'turn_id': 1}
{'utterance': 'No', 'speaker': 'Pi', 'turn_id': 2}
{'utterance': 'I see.  And what does that tell you?', 'speaker': 'Eliza', 'turn_id': 2}
{'utterance': 'Nothing', 'speaker': 'Pi', 'turn_id': 3}
{'utterance': 'nothing.', 'speaker': 'Eliza', 'turn_id': 3}
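Because each turn is a plain dictionary, the stored conversation can already be processed with standard Python before any dataframe is involved. The sketch below uses a hand-copied sample of the records printed above (the variable names are chosen for illustration) to select the human utterances and to round-trip the data through a JSON string.

```python
import json

# Sample records copied from the printed conversation above.
conversation = [
    {"utterance": "Hello Pi. How are you feeling today?", "speaker": "Eliza", "turn_id": 0},
    {"utterance": "Sad", "speaker": "Pi", "turn_id": 1},
    {"utterance": "Let's change focus a bit... Tell me about your family.", "speaker": "Eliza", "turn_id": 1},
]

# Keep only the utterances that were not produced by Eliza.
human_utterances = [turn["utterance"] for turn in conversation if turn["speaker"] != "Eliza"]
print(human_utterances)

# Serialise the turns to a JSON string and read them back.
as_json = json.dumps(conversation, indent=2)
restored = json.loads(as_json)
```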


### Using Pandas

Although there is nothing wrong with using just the JSON structure shown above, we load the JSON into a Pandas dataframe to view and save the data in a more human-readable table format. [Pandas](https://pandas.pydata.org) is a popular package for loading and saving data.

Pandas needs to be installed separately on your local machine first. If you have not done this, please follow the instructions below; otherwise you can skip the installation and proceed directly to importing it.

As with other packages, make sure you install it in the same environment in which you run Jupyter (for example your Anaconda environment). Install pandas from the command line, using either of the following two commands:


* >`conda install pandas`
* >`python -m pip install --upgrade pandas`


If you successfully installed Pandas, you should be able to import it in the next cell. Note that you may have to restart the notebook kernel or even restart Jupyter for the import to work.

In [4]:
import pandas as pd

If the import worked, we can now create a DataFrame by loading the JSON structure of the conversation, which is represented in memory as a list of dictionaries (one dictionary per turn).

In [5]:
df = pd.DataFrame.from_dict(el.conversation)

Pandas provides functions to easily inspect the loaded data.

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   utterance  7 non-null      object
 1   speaker    7 non-null      object
 2   turn_id    7 non-null      int64 
dtypes: int64(1), object(2)
memory usage: 296.0+ bytes


The **info()** function gives high-level information about the data: the number of rows, the columns, and the type of values in each column. The **head()** function shows the first rows of the dataframe (five by default).

In [7]:
df.head()

|   | utterance | speaker | turn_id |
|---|---|---|---|
| 0 | Hello Pi. How are you feeling today? | Eliza | 0 |
| 1 | Sad | Pi | 1 |
| 2 | Let's change focus a bit... Tell me about your... | Eliza | 1 |
| 3 | No | Pi | 2 |
| 4 | I see. And what does that tell you? | Eliza | 2 |
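Once the turns are in a dataframe, rows can also be selected by column value. The following sketch rebuilds a small dataframe from a hand-copied sample of the conversation (named `df_demo` here so it does not overwrite the notebook's `df`) and keeps only the human turns with a boolean mask.

```python
import pandas as pd

# Sample records copied from the conversation above.
records = [
    {"utterance": "Hello Pi. How are you feeling today?", "speaker": "Eliza", "turn_id": 0},
    {"utterance": "Sad", "speaker": "Pi", "turn_id": 1},
    {"utterance": "Let's change focus a bit... Tell me about your family.", "speaker": "Eliza", "turn_id": 1},
    {"utterance": "No", "speaker": "Pi", "turn_id": 2},
]

# A list of dictionaries becomes one row per dictionary, one column per key.
df_demo = pd.DataFrame(records)

# Boolean mask: True for every row whose speaker is not Eliza.
human_rows = df_demo[df_demo["speaker"] != "Eliza"]
print(human_rows["utterance"].tolist())
```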


## 2. Annotate the conversation with emotion labels

We now define a simple loop over the dataframe with the turns of the conversation to add an Ekman emotion label to each turn.

We do not want to annotate all utterances, only the human input. We assume that Eliza has no emotions and therefore always assign the label *neutral* to Eliza's utterances.

How to iterate over the data in a Pandas frame is explained in more detail in **Lab3.5**.

In [11]:
#### Here are the 6 basic emotions that Ekman defined for facial expressions. Neutral is the 7th value
ekman_labels = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]

gold_labels = []
gold = ""
## We use a for-loop over the enumeration of all rows in the speaker column
## to get the index of the row as well as the speaker information.
## We can use the index with the "iloc" indexer to get other column values from the same row.
## We get the utterance with df['utterance'].iloc[index]
for index, speaker in enumerate(df['speaker']):
    utterance = df['utterance'].iloc[index]
    print(speaker, utterance)
    if speaker == 'Eliza':
        gold = 'neutral'
    else:
        ### We keep asking for user input until it matches one of the Ekman labels
        while gold not in ekman_labels:
            gold = input("label> ")
    gold_labels.append(gold)
    ### Reset the label so that the next human turn prompts again
    gold = ""

print(gold_labels)

Eliza Hello Pi. How are you feeling today?
Pi Sad


label>  sadness


Eliza Let's change focus a bit... Tell me about your family.
Pi No


label>  anger


Eliza I see.  And what does that tell you?
Pi Nothing


label>  anger


Eliza nothing.
['neutral', 'sadness', 'neutral', 'anger', 'neutral', 'anger', 'neutral']
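The labelling loop above only accepts a label that exactly matches one of the Ekman values, so an answer such as "Sadness" would be asked again. As a possible variation, the prompt can be wrapped in a small helper that normalises the answer first. The function `ask_label` and its `read` parameter are hypothetical names introduced for this sketch; passing a different `read` function makes it possible to try the helper with scripted answers instead of keyboard input.

```python
EKMAN_LABELS = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]


def ask_label(prompt="label> ", read=input, labels=EKMAN_LABELS):
    """Keep asking until the stripped, lowercased answer is a known label."""
    while True:
        answer = read(prompt).strip().lower()
        if answer in labels:
            return answer
        # Remind the annotator of the allowed values before asking again.
        print("Please choose one of:", ", ".join(labels))


# Demonstration with scripted answers instead of keyboard input:
# the first answer is rejected, the second is normalised and accepted.
scripted = iter(["happy", " Sadness "])
label = ask_label(read=lambda _prompt: next(scripted))
print(label)
```

In the notebook, the helper would be called without the `read` argument so that it falls back to the built-in `input`.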


In [12]:
### We add a column of gold labels to the dataframe when we are done.
### The gold_labels list should have the same length as the number of rows.
df['Gold']=gold_labels
df.head()

|   | utterance | speaker | turn_id | Gold |
|---|---|---|---|---|
| 0 | Hello Pi. How are you feeling today? | Eliza | 0 | neutral |
| 1 | Sad | Pi | 1 | sadness |
| 2 | Let's change focus a bit... Tell me about your... | Eliza | 1 | neutral |
| 3 | No | Pi | 2 | anger |
| 4 | I see. And what does that tell you? | Eliza | 2 | neutral |


## 3. Save the conversation to disk

You need the annotated conversation for the final assignment. We therefore save it to disk so that we can load it later.

In [13]:
file = "my_emotional_conversation.csv"
df.to_csv(file)
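When the file is loaded again later, keep in mind that `to_csv` writes the row index as an unnamed first column by default. A minimal sketch of the round trip, using a small stand-in dataframe and an in-memory buffer instead of a file on disk (both are assumptions made to keep the example self-contained), could look like this:

```python
import io

import pandas as pd

# Small stand-in for the annotated conversation.
df_demo = pd.DataFrame(
    {
        "utterance": ["Sad", "Nothing"],
        "speaker": ["Pi", "Pi"],
        "Gold": ["sadness", "anger"],
    }
)

# Write to an in-memory buffer; a file name works the same way.
buffer = io.StringIO()
df_demo.to_csv(buffer)  # the index is written as an unnamed first column
buffer.seek(0)

# index_col=0 turns that first column back into the index on loading.
restored = pd.read_csv(buffer, index_col=0)
print(list(restored.columns))
```

Without `index_col=0`, the loaded dataframe gets an extra column named `Unnamed: 0` that holds the old index values.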

## End of Notebook