# Lab7 Final assignment: putting it all together

The final assignment is an individual assignment in which you put everything together. You earn 5 points for carrying out all the required tasks and up to 5 additional points for analysing and discussing the results properly. The maximum score is 10 points.

<ol>
    <li><b>Annotated conversations</b>
        <ul>
        <li> Load the annotated conversations in a Pandas data frame (code is given)
        <li> Provide a statistical analysis of the distribution of the emotion labels in the conversations
        </ul>
    <li><b>Baselines</b>
        <ul> 
            <li>Define a majority baseline given the label distribution in the test set
            <li>Create a lexical baseline using the emotion dictionary that you created in Lab2 from WordNet and the Word2Vec models. This baseline checks each utterance for the Ekman emotion words and chooses the emotion with the most hits. If there is no match, the utterance is labelled neutral.
            <li>Apply the two baselines to the conversations
            <li>Generate a classification report for both
        </ul>
    </li>
    <li><b>Create two different BoW-SVM emotion classifiers, one focusing on high recall and the other on high precision</b>
        <ul>
        <li> Load the train, test and development data combining MELD and WASSA data sets
        <li> Give an overview and discuss the statistics of the Ekman emotion labels
        <li> High recall:
            <ul>
                <li>Motivate and define the settings for getting high recall scores
                <li>Create a BoW vector representation and train an SVM classifier
                <li>Save the classifier to disk
            </ul>
        <li> High precision:
            <ul>
                <li>Motivate and define the settings for getting high precision scores
                <li>Create a BoW vector representation and train an SVM classifier
                <li>Save the classifier to disk
            </ul>
        <li> Load both classifiers in this notebook and apply them to the utterances in the conversations
        <li> Generate a classification report and confusion matrix for both.
        </ul>
    <li><b>Discuss the result</b>
        <ul>
        <li> Report on similarities and differences in performance and confusion matrices
        <li> What did you expect (recall and precision) and is this confirmed or falsified?
        <li> How do the classifiers compare to the two baselines?
        </ul>
    <li><b>Error analysis</b>
        <ul>
        <li> Manually select 20 utterances for an error analysis
        <li> Motivate your selection to get a deeper understanding of the cases where the classifiers underperform
        <li> Apply the BERT-GO model to the 20 utterances and map the emotion labels to the Ekman emotions
        <li> Prompt an LLM to classify the 20 utterances with the Ekman emotions
        <li> Compare the BERT-GO and Llama annotations with your SVM classifier results and the baselines.
        <li> Discuss how you may improve the SVM classifiers
        </ul>    
</ol>
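
The error analysis asks you to map the BERT-GO labels to the Ekman emotions. The course's `util.get_averaged_mapped_scores_by_threshold` is presumably meant for this; as a stand-alone illustration, the minimal sketch below uses the Ekman grouping of the 27 GoEmotions labels published with the GoEmotions dataset and sums the scores per Ekman class. The input format (a dict of label scores for one utterance) and the names `GO_TO_EKMAN` and `to_ekman` are assumptions, and summing rather than averaging with a threshold may give different results than the util function.

```python
# Ekman grouping of the 27 GoEmotions labels, as published with the GoEmotions
# dataset; "neutral" is kept as a class of its own.
GO_TO_EKMAN_GROUPS = {
    "anger": ["anger", "annoyance", "disapproval"],
    "disgust": ["disgust"],
    "fear": ["fear", "nervousness"],
    "joy": ["joy", "amusement", "approval", "excitement", "gratitude", "love",
            "optimism", "relief", "pride", "admiration", "desire", "caring"],
    "sadness": ["sadness", "disappointment", "embarrassment", "grief", "remorse"],
    "surprise": ["surprise", "realization", "confusion", "curiosity"],
    "neutral": ["neutral"],
}

# Invert the grouping into a fine-grained label -> Ekman label lookup
GO_TO_EKMAN = {go: ekman for ekman, gos in GO_TO_EKMAN_GROUPS.items() for go in gos}


def to_ekman(go_scores):
    """Sum the scores of the GoEmotions labels per Ekman class and return the best class.

    go_scores: dict mapping GoEmotions labels to scores for one utterance
    (an assumed format, not the course's util API).
    """
    totals = {}
    for label, score in go_scores.items():
        ekman = GO_TO_EKMAN.get(label)
        if ekman is not None:
            totals[ekman] = totals.get(ekman, 0.0) + score
    # Fall back to neutral when no known label was scored
    return max(totals, key=totals.get) if totals else "neutral"
```

Because the scores are summed, several weak labels from the same group can outweigh one strong label: `{"amusement": 0.3, "joy": 0.25, "anger": 0.4}` maps to joy, not anger.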

For task 3, you build the two BoW-SVM classifiers in a separate notebook (use **lab5.meld-tweet-bow-svm-emotion-classifier.ipynb**) by combining the MELD and WASSA tweet data into a single training set. Note that you can also include the MELD and WASSA test and development data for training, since the model is applied to the Llama conversations and not to the MELD and WASSA test sets.
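
One possible way to set up the two classifiers in the training notebook is sketched below with scikit-learn's `CountVectorizer`, `LinearSVC` and `Pipeline`. The settings (bigrams and `class_weight="balanced"` for the high-recall model; `min_df=2` and the natural class priors for the high-precision model) are illustrative assumptions that you still need to motivate from the data, not the prescribed solution. `make_bow_svm` and `save` are hypothetical helper names.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def make_bow_svm(high_recall):
    """Return a BoW + linear SVM pipeline; the settings are illustrative, not prescribed."""
    if high_recall:
        # Keep rare words and add bigrams, and weight classes inversely to their
        # frequency so that minority emotions (e.g. fear, disgust) are predicted more often.
        vectorizer = CountVectorizer(lowercase=True, ngram_range=(1, 2), min_df=1)
        svm = LinearSVC(class_weight="balanced")
    else:
        # Only use words seen in at least two training texts and keep the natural class
        # priors, so minority emotions are predicted only on strong lexical evidence.
        vectorizer = CountVectorizer(lowercase=True, ngram_range=(1, 1), min_df=2)
        svm = LinearSVC()
    # One pipeline holds both the vocabulary and the SVM, so a single file is enough later
    return Pipeline([("bow", vectorizer), ("svm", svm)])


def save(model, path):
    """Pickle a fitted model to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
```

Typical use would be `model = make_bow_svm(high_recall=True)`, then `model.fit(train_texts, train_labels)` and `save(model, "bow_svm_high_recall.pkl")`, where `train_texts`/`train_labels` stand for your combined MELD+WASSA data and the file name is hypothetical. Saving the vectorizer and the SVM together avoids a vocabulary mismatch when the classifier is later applied to the conversations.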

## Submission

The assignment should be made individually and submitted on CANVAS as a zip file that includes the following:

   1. The notebook to create two different BoW-SVM classifiers: **lab5.meld-tweet-bow-svm-emotion-classifier.ipynb**
   2. The current notebook **lab5.final_assignment.ipynb** with your code
   3. A PDF report of max 6/7 pages:
       1. Section 1 (1 page): what you have done and which choices you made. Be explicit about the settings and any data changes you made to train the two different classifiers on MELD+Tweets to get a high-recall and a high-precision version.
       2. Section 2 (2 pages): report on the type/token ratio and the label distribution of the conversations, and compare these to the MELD and WASSA type/token ratios and label distributions. Motivate your choice of settings for the SVM classifiers based on the type/token ratios and label distributions.
       3. Section 3 (2 pages): report on the Ekman classification results of the two SVM versions and the two baselines. Use a single table for recall, precision and F-score for all four classifiers and put the confusion matrices in the appendix.
       4. Section 4 (1 page): motivate the sample for the error analysis, and compare the baselines and the SVM classifiers with the results of the LLM (e.g. Llama or Qwen) and the BERT-GO classifier on the 20 cases.
       5. Section 5 (1 page): how to improve the SVM classifiers.
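
For Section 2, the type/token ratio can be computed along these lines. This is a minimal sketch: the lowercase regular-expression tokenizer is an assumption, and a different tokenizer (e.g. NLTK or spaCy) will change the counts somewhat. `type_token_ratio` is a hypothetical helper name.

```python
import re


def type_token_ratio(texts):
    """Return (number of types, number of tokens, type/token ratio) over all texts."""
    # Simplifying assumption: lowercase and keep runs of letters, digits and apostrophes
    tokens = [tok for text in texts for tok in re.findall(r"[a-z0-9']+", text.lower())]
    types = set(tokens)
    ratio = len(types) / len(tokens) if tokens else 0.0
    return len(types), len(tokens), ratio
```

For example, `type_token_ratio(["Don't wink wink at me!", "Hi there!"])` counts 7 tokens and 6 types. Because the ratio drops as a corpus grows, compare the conversations with MELD and WASSA on samples of roughly the same number of tokens.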
    
Use the given notebooks as a guide for the code and the output. You should NOT discuss the results in the notebooks but in the report. Use the notebooks to run the experiments and get the results. Include the tables and figures in the report.

Some utility functions presented during this course are needed for this assignment. They are all included in the Python file **lab_util.py**. The next import makes these functions available in this notebook, so there is no need to copy them into it explicitly. After the import, you can call the functions from **util**, e.g.:

```python
emotion_labels = util.sort_predictions(emotion_labels[0])
ekman_labels = util.get_averaged_mapped_scores_by_threshold(ekman_map, emotion_labels, threshold)
util.plot_labels_with_counts(labels, values)
```

```python
import lab_util as util
```

## 1. Annotated conversations

### 1.1 Loading the conversation

You will receive a file with all the annotated conversations in JSON format. Adapt the path below to load this file with Pandas and create a data frame.

```python
# THE CODE TO LOAD THE TEST DATA AND LABELS
# replace the file path below with the path to the JSON file given for the final assignment
import pandas as pd

annotation_file = "/Users/piek/Desktop/t-MA-HLT-introduction-2024/ma-hlt-labs/lab0.llama/other/adjudicated_annotations.json"
df = pd.read_json(annotation_file)
# drop all rows that contain a missing value
df = df.dropna()
```

```python
df.head()
```

|   | utterance | speaker | turn_id | Annotator | Gold | Votes | Annotators | Adjudication |
|---|---|---|---|---|---|---|---|---|
| 0 | Hi there! Going going well! | Raul | 4 | Pawel | neutral | [neutral, joy, joy] | [Pawel, Leo Mylonadis, Matt] | joy |
| 1 | From hike? Do yo want to convince me you're a ... | Raul | 6 | Pawel | neutral | [neutral, neutral, surprise] | [Pawel, Leo Mylonadis, Matt] | neutral |
| 2 | I hope that your dried llama food isn't anythi... | Raul | 8 | Pawel | disgust | [disgust, disgust, neutral] | [Pawel, Leo Mylonadis, Matt] | disgust |
| 3 | Don't wink wink at me! | Raul | 10 | Pawel | anger | [anger, anger, anger] | [Pawel, Leo Mylonadis, Matt] | anger |
| 4 | I think all the LLM hype is quite suspicious! | Raul | 12 | Pawel | anger | [anger, fear, surprise] | [Pawel, Leo Mylonadis, Matt] | anger |


```python
# Print the items where the adjudicated label differs from the original Gold label
mismatches = df[df['Gold'] != df['Adjudication']]
for gold, votes, adjudication in zip(mismatches['Gold'], mismatches['Votes'], mismatches['Adjudication']):
    print(gold, votes, adjudication)
```

```
neutral ['neutral', 'joy', 'joy'] joy
neutral ['neutral', 'sadness', 'sadness'] sadness
anger ['anger', 'sadness', 'sadness'] sadness
neutral ['neutral', 'sadness', 'sadness'] sadness
anger ['anger', 'sadness', 'sadness'] sadness
neutral ['neutral', 'anger', 'neutral', 'anger', 'disgust', 'anger'] anger
disgust ['disgust', 'anger', 'anger'] anger
neutral ['neutral', 'joy', 'joy'] joy
neutral ['neutral', 'joy', 'joy'] joy
neutral ['neutral', 'anger', 'anger'] anger
disgust ['disgust', 'anger', 'anger'] anger
sadness ['sadness', 'disgust', 'disgust'] disgust
surprise ['surprise', 'joy', 'joy'] joy
neutral ['neutral', 'joy', 'joy'] joy
anger ['anger', 'disgust', 'disgust'] disgust
neutral ['neutral', 'joy', 'joy'] joy
sadness ['sadness', 'fear', 'fear'] fear
joy ['joy', 'surprise', 'surprise'] surprise
fear ['fear', 'surprise', 'surprise'] surprise
anger ['anger', 'neutral', 'neutral'] neutral
joy ['joy', 'neutral', 'neutral'] neutral
sadness ['sadness', 'anger', 'anger'] anger
...
```

### 1.2 Provide a statistical analysis of the conversational data with the label distribution

```python
# [YOUR CODE FOR THE ANALYSIS GOES HERE]
```
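
A minimal sketch of the label distribution using only the standard library. It assumes the `Adjudication` column holds the final gold label (the loop in 1.1 shows that it can differ from `Gold`); `label_distribution` is a hypothetical helper name.

```python
from collections import Counter


def label_distribution(labels):
    """Return (label, count, percentage) tuples, most frequent label first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, n, 100.0 * n / total) for label, n in counts.most_common()]
```

For example, `for label, n, pct in label_distribution(df['Adjudication']): print(f"{label:<10}{n:>5}{pct:>6.1f}%")` prints one row per label; in pandas, `df['Adjudication'].value_counts(normalize=True)` gives the same proportions directly.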

## 2. Baselines

### 2.1 Majority baseline

```python
# [YOUR CODE FOR APPLYING THE MAJORITY BASELINE GOES HERE]
```
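
The majority baseline predicts the most frequent gold label for every utterance. A minimal sketch, assuming the gold labels are passed as a list or pandas Series (e.g. `df['Adjudication']`); `majority_baseline` is a hypothetical helper name.

```python
from collections import Counter


def majority_baseline(gold_labels):
    """Predict the most frequent gold label for every item."""
    gold_labels = list(gold_labels)
    majority_label, _ = Counter(gold_labels).most_common(1)[0]
    return [majority_label] * len(gold_labels)
```

Since the majority class is taken from the conversations themselves (the test set), this is an optimistic baseline.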

### 2.2 Lexical baseline

```python
# [YOUR CODE FOR APPLYING THE LEXICAL BASELINE GOES HERE]
```
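
The lexical baseline from task 2 can be sketched as follows. It assumes the Lab2 dictionary is available as `{emotion: [words]}`; if yours has a different shape, adapt `invert_lexicon`. Both helper names and the toy lexicon in the usage note are hypothetical. Only single-word entries are matched, and ties are broken in favour of the emotion whose word occurs first in the utterance.

```python
import re
from collections import Counter


def invert_lexicon(emotion_words):
    """Turn {emotion: [words]} into {word: [emotions]}; a word may signal several emotions."""
    word_to_emotions = {}
    for emotion, words in emotion_words.items():
        for word in words:
            word_to_emotions.setdefault(word.lower(), []).append(emotion)
    return word_to_emotions


def lexical_baseline(utterance, word_to_emotions):
    """Return the emotion with the most word hits, or 'neutral' if nothing matches."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    hits = Counter()
    for token in tokens:
        for emotion in word_to_emotions.get(token, ()):
            hits[emotion] += 1
    if not hits:
        return "neutral"
    # most_common(1) returns the first emotion reached among equal counts
    return hits.most_common(1)[0][0]
```

With the toy lexicon `{"joy": ["happy", "glad"], "anger": ["furious", "mad"], "sadness": ["sad"]}`, the utterance `"I'm sad but also mad"` ties between sadness and anger and is labelled sadness, while `"Hi there"` has no hits and is labelled neutral.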

### 2.3 Classification report

```python
#[HERE COMES THE CODE TO GENERATE THE CLASSIFICATION REPORT]
```
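
A sketch of the classification report for the baselines with scikit-learn's `classification_report`. Setting `zero_division=0` matters here: the majority baseline never predicts most labels, so their precision is undefined and would otherwise trigger warnings. `print_report` is a hypothetical helper name.

```python
from sklearn.metrics import classification_report


def print_report(name, gold, predicted):
    """Print a per-label report and return the same scores as a dict."""
    labels = sorted(set(gold) | set(predicted))
    print(f"=== {name} ===")
    print(classification_report(gold, predicted, labels=labels, zero_division=0))
    return classification_report(gold, predicted, labels=labels,
                                 zero_division=0, output_dict=True)
```

The returned dict (keys per label plus `"macro avg"` and `"weighted avg"`) makes it easy to collect the scores of all four classifiers in the single table asked for in the report.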

## 3. Apply the BoW classifiers for high recall and high precision

### 3.1 Load the BoW SVM classifiers built from MELD and WASSA tweets

```python
# HERE COMES THE CODE TO LOAD THE BOW SVM CLASSIFIER BUILT FROM MELD AND WASSA TWEETS
```
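
A minimal loading sketch, assuming the training notebook saved each classifier with `pickle` as a complete pipeline (vectorizer plus SVM). If you saved the models with `joblib`, use `joblib.load` instead. `load_classifiers` and the file names in the usage note are hypothetical.

```python
import pickle
from pathlib import Path


def load_classifiers(paths):
    """Load pickled classifiers, given as {name: path}, into {name: classifier}."""
    classifiers = {}
    for name, path in paths.items():
        with open(Path(path), "rb") as f:
            classifiers[name] = pickle.load(f)
    return classifiers
```

For example: `classifiers = load_classifiers({"svm_recall": "bow_svm_high_recall.pkl", "svm_precision": "bow_svm_high_precision.pkl"})`. Only unpickle files you created yourself, and load them with the same scikit-learn version that was used for training.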

### 3.2 Apply each classifier to the conversations

```python
# HERE COMES THE CODE TO APPLY THE CLASSIFIER TO THE UTTERANCES
```
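
Applying the loaded classifiers comes down to calling `predict` on the list of utterances. The sketch assumes each classifier is a pipeline that accepts raw strings; `predict_all` is a hypothetical helper name.

```python
def predict_all(utterances, classifiers):
    """Return {name: [predicted label per utterance]} for each loaded classifier."""
    utterances = list(utterances)
    return {name: list(clf.predict(utterances)) for name, clf in classifiers.items()}
```

To keep the predictions next to the gold labels, store them as new columns, e.g. `for name, labels in predict_all(df['utterance'], classifiers).items(): df[f'pred_{name}'] = labels`.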

### 3.3 Evaluation

```python
# HERE COMES THE CODE TO GENERATE THE CLASSIFICATION REPORT AND CONFUSION MATRIX
```
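
A sketch of the evaluation with scikit-learn: `classification_report` for precision, recall and F-score, and `confusion_matrix` with a fixed label order so that the matrices of all four classifiers are directly comparable. Rows are gold labels and columns are predictions, which is scikit-learn's convention. The label list assumes the six Ekman emotions plus neutral; `evaluate` is a hypothetical helper name.

```python
from sklearn.metrics import classification_report, confusion_matrix

LABELS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]


def evaluate(name, gold, predicted, labels=LABELS):
    """Print a report and a labelled confusion matrix (rows = gold, columns = predicted)."""
    print(f"=== {name} ===")
    print(classification_report(gold, predicted, labels=labels, zero_division=0))
    matrix = confusion_matrix(gold, predicted, labels=labels)
    width = max(len(label) for label in labels) + 1
    # Column headers are abbreviated to four characters to keep the matrix compact
    print(" " * width + "".join(f"{label[:4]:>6}" for label in labels))
    for label, row in zip(labels, matrix):
        print(f"{label:<{width}}" + "".join(f"{count:>6}" for count in row))
    return matrix
```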

## 4. Discussion of the results

The discussion of the results of the baselines and the SVM classifiers belongs in the report.

## 5. Error analysis

Motivate a sample of 20 critical cases. Compare the performance of the SVM classifiers, the baselines, GO-Emotions and the LLM annotation on these 20 critical cases. Describe the results in the report.
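
To build a candidate pool for the manual selection, you could start from utterances that all classifiers get wrong and spread them over the gold labels, as in the sketch below. The row layout (dicts with a `gold` key plus one key per prediction, e.g. from `df.to_dict('records')` after renaming the columns) and the name `sample_errors` are assumptions; other criteria, such as cases where the high-recall and high-precision classifiers disagree, are just as valid.

```python
import random
from collections import defaultdict


def sample_errors(rows, pred_keys, n=20, seed=0):
    """Pick up to n rows that every classifier in pred_keys gets wrong, spread over gold labels."""
    by_gold = defaultdict(list)
    for row in rows:
        if all(row[key] != row["gold"] for key in pred_keys):
            by_gold[row["gold"]].append(row)
    # Shuffle within each gold label with a fixed seed so the pool is reproducible
    rng = random.Random(seed)
    for items in by_gold.values():
        rng.shuffle(items)
    # Round-robin over gold labels so rare emotions are represented, not only the frequent ones
    sample = []
    while len(sample) < n and any(by_gold.values()):
        for label in sorted(by_gold):
            if by_gold[label] and len(sample) < n:
                sample.append(by_gold[label].pop())
    return sample
```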

## End of the assignment