# DEMO: Empathy classification using a pattern classifier

This notebook uses a previously trained contrast-pattern classifier (PBC4cip) to estimate the empathy level of a conversation between two people. 

A conversation prompt is drawn from the subset of the EmpatheticExchanges database that is reserved for testing classification algorithms. 

## Setup 

This subsection sets up the environment, functions, utilities, and models required for the demo. It is also where the configuration variables are declared manually. 

In [1]:
import pickle
import pandas as pd
import torch
import os
import sys
import random 
import re
#import classifier
from PBC4cip import PBC4cip
from PBC4cip.core.Evaluation import obtainAUCMulticlass
from PBC4cip.core.Helpers import get_col_dist, get_idx_val

#utilities for database management (pandas and os are already imported above)
import numpy as np
from tqdm import tqdm, trange
import argparse

import train_classifier as trainer
import test_classifier as tester
import database_processing_package as data_processer

#relevant classifiers for annotating exchange feature
from classifiers.empathetic_intent import intent_prediction as ip
from classifiers.sentiment import sentiment_prediction as sp
from classifiers.epitome_mechanisms import epitome_predictor as epitome
from classifiers.nrc_vad_lexicon import lexicon_analysis as lexicon
from classifiers.course_grained_emotion import pretrained_32emotions as em32
from classifiers.course_grained_emotion import emotion_reductor as em_red

from spellchecker import SpellChecker


### Selection of features

In this cell, we select the model used for this task by declaring its directory and its file name, "trained_pbc4cip.sav". 

Likewise, we declare a "feature vector" of binary flags that mark which features the model uses to predict empathy. The first flag is not a feature; it selects the database to classify. 

Finally, we declare the directory of the database from which the prompts will be extracted. 

In [2]:
#Relevant directories
current_dir = os.getcwd() #get directory of the repository
#Select an appropriate classification model in the Experiments folder
model_directory = current_dir + '/Experiments/outputs/Experiment '+ str(70) + '/' + 'trained_pbc4cip.sav'


feature2number = {'database_to_classify':0,'intent' : 1, 'sentiment' : 2, 'epitome':3, 'VAD_vectors':4, 'utterance_length':5,
                  '32_emotion_labels':6,'20_emotion_labels':7, 
                  '8_emotion_labels':8, 'emotion_mimicry':9, 'Reduce_empathy_labels':10, 
                  'exchange_number' : 11}

feature_vector = [1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
'''
Meaning of each position in feature_vector (values as set above):
                 [1,#database to pull from 0 = empatheticconversations (old), 1 empatheticexchanges (new)
                  1,#intent
                  1,#sentiment
                  1,#epitome
                  1,#vad lexicon
                  1,#length
                  0,#emotion 32
                  0,#emotion 20
                  0,#emotion 8
                  1,#emotion mimicry
                  1, #reduce empathy labels
                  1 #exchange number
                  ]
'''

if feature_vector[feature2number['database_to_classify']] == 1: 
    database_dir = '/processed_databases/EmpatheticExchanges/EmpatheticExchanges_test.csv'
else: 
    database_dir = '/processed_databases/EmpatheticConversationsExchangeFormat/EmpatheticConversations_ex.csv'
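
Because the flags are positional, it is easy to misread which features are switched on. The following is a minimal sketch for listing the enabled entries by name. `enabled_features` is a hypothetical helper that is not part of the repository; the mapping and the vector are copied from the cell above.

```python
# Mapping and flags copied from the configuration cell.
feature2number = {'database_to_classify': 0, 'intent': 1, 'sentiment': 2, 'epitome': 3,
                  'VAD_vectors': 4, 'utterance_length': 5, '32_emotion_labels': 6,
                  '20_emotion_labels': 7, '8_emotion_labels': 8, 'emotion_mimicry': 9,
                  'Reduce_empathy_labels': 10, 'exchange_number': 11}
feature_vector = [1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1]


def enabled_features(mapping, flags):
    """Hypothetical helper: return the names whose flag is 1, in index order."""
    if len(mapping) != len(flags):
        raise ValueError('feature_vector length does not match feature2number')
    ordered = sorted(mapping.items(), key=lambda item: item[1])
    return [name for name, idx in ordered if flags[idx] == 1]


print(enabled_features(feature2number, feature_vector))
```

Note that `database_to_classify` appears in the result even though it is a database selector rather than a feature.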

### Loading classification models

In this cell, we prepare the classification models for obtaining empathy-related features. These models must be pretrained before they are loaded by this demo. 

### WARNING: DO NOT RUN THIS TWICE. It will cause memory errors

In [3]:
#load intent model
if feature_vector[feature2number['intent']] == 1: 
    empIntSubDir = './classifiers/empathetic_intent/'
    model_intent,tokenizer_intent,device = ip.loadModelTokenizerAndDevice(empIntSubDir) #get model and parameters
#load sentiment model
if feature_vector[feature2number['sentiment']] == 1: 
    sent_model, sent_tokenzr = sp.loadSentimentModel() #get model and tokenizer
#epitome model is loaded during inference due to the code of its classifier
if feature_vector[feature2number['epitome']] == 1:
    epitome_empathy_classifier = epitome.load_epitome_classifier('classifiers/epitome_mechanisms/trained_models')
#load lexicon
if feature_vector[feature2number['VAD_vectors']] == 1:
    lexicon_df, wnl, stp_wrds = lexicon.setup_lexicon('classifiers/nrc_vad_lexicon/BipolarScale/NRC-VAD-Lexicon.txt')
#load emotion classifier with 32 labels for any of the emotion labels options
if (feature_vector[feature2number['32_emotion_labels']] == 1) or (feature_vector[feature2number['20_emotion_labels']] == 1) or (feature_vector[feature2number['8_emotion_labels']] == 1):
    emo32_model, emo32_tokenzr = em32.load32EmotionsModel() #get model and tokenizer
#the VAD lexicon is also needed for emotion mimicry; load it here only if the VAD branch above did not
if feature_vector[feature2number['emotion_mimicry']] == 1 and feature_vector[feature2number['VAD_vectors']] != 1:
    lexicon_df, wnl, stp_wrds = lexicon.setup_lexicon('classifiers/nrc_vad_lexicon/BipolarScale/NRC-VAD-Lexicon.txt')

### Definition of data processing function

This is a function used to transform a text exchange into the format necessary for classification. It adds the following features: 

* Sentiment
* EPITOME mechanisms (Sharma, 2019)
* Valence, Arousal, and Dominance emotion vectors
* Utterance lengths for both participants
* Emotion labels
* Empathetic Intent
* Whether there is emotion mimicry

Which of these features are computed depends on the feature vector defined at the start of this notebook. 


In [4]:
def process_answer(sample_df,control_vector):
    print('processing data....')
    if control_vector[feature2number['sentiment']] == 1: 
        sample_df['speaker_sentiment'] = sample_df.apply(data_processer.get_sentiment_probabilities,axis = 1, args = (sent_model,sent_tokenzr,'speaker_utterance')) 
        sample_df[['s_negative','s_neutral', 's_positive']] = pd.DataFrame(sample_df.speaker_sentiment.tolist(),index = sample_df.index)
        sample_df['listener_sentiment'] = sample_df.apply(data_processer.get_sentiment_probabilities,axis = 1, args = (sent_model,sent_tokenzr,'listener_utterance')) 
        sample_df[['l_negative','l_neutral', 'l_positive']] = pd.DataFrame(sample_df.listener_sentiment.tolist(),index = sample_df.index)
        sample_df = sample_df.drop(columns=['speaker_sentiment','listener_sentiment'])
    if control_vector[feature2number['epitome']] == 1:
        sample_df = epitome.classify_epitome_values(epitome_empathy_classifier, sample_df)
    if control_vector[feature2number['VAD_vectors']] == 1:
        sample_df['vad_speaker'] = sample_df['speaker_utterance'].apply(lexicon.get_avg_vad, args = (lexicon_df,wnl,stp_wrds)) 
        sample_df['vad_listener'] = sample_df['listener_utterance'].apply(lexicon.get_avg_vad, args = (lexicon_df,wnl,stp_wrds)) 
        sample_df[['valence_speaker','arousal_speaker','dominance_speaker']] = pd.DataFrame(sample_df.vad_speaker.tolist(),index = sample_df.index)
        sample_df[['valence_listener','arousal_listener','dominance_listener']] = pd.DataFrame(sample_df.vad_listener.tolist(),index = sample_df.index)
        sample_df = sample_df.drop(columns = ['vad_speaker','vad_listener'])
    if control_vector[feature2number['utterance_length']] == 1:
        sample_df['s_word_len'] = sample_df['speaker_utterance'].apply(data_processer.get_word_len) 
        sample_df['l_word_len'] = sample_df['listener_utterance'].apply(data_processer.get_word_len) 
    if (control_vector[feature2number['32_emotion_labels']] == 1) or (control_vector[feature2number['20_emotion_labels']] == 1) or (control_vector[feature2number['8_emotion_labels']] == 1):
        sample_df['speaker_emotion'] = sample_df.apply(data_processer.get_emotion_label,axis = 1, args = (emo32_model,emo32_tokenzr,'speaker_utterance')) 
        sample_df['listener_emotion'] = sample_df.apply(data_processer.get_emotion_label,axis = 1, args = (emo32_model,emo32_tokenzr,'listener_utterance')) 
        if (control_vector[feature2number['20_emotion_labels']] == 1): 
            sample_df = em_red.reduce_emotion_labels('speaker_emotion',sample_df)
            sample_df = em_red.reduce_emotion_labels('listener_emotion',sample_df)
        if (control_vector[feature2number['8_emotion_labels']] == 1): 
            sample_df = em_red.reduce_emotion_labels_to_8('speaker_emotion',sample_df)
            sample_df = em_red.reduce_emotion_labels_to_8('listener_emotion',sample_df)
    if control_vector[feature2number['intent']] == 1: 
        sample_df['utterance'] = sample_df['listener_utterance'] #use the response column rather than the global variable "answer"
        sample_df['is_response'] = 1
        sample_df['empathetic_intent'] = sample_df.apply(data_processer.get_emp_intent_probabilities, axis=1, args = (model_intent,tokenizer_intent,device,'listener_utterance'))
        sample_df[data_processer.intent_labels] = pd.DataFrame(sample_df.empathetic_intent.tolist(),index = sample_df.index)
        sample_df = sample_df.drop(columns=['empathetic_intent','utterance','is_response'])
    if control_vector[feature2number['emotion_mimicry']] == 1:
        if control_vector[feature2number['VAD_vectors']] == 1: #VAD columns were already computed above
            #get the emotional similarity, if it is more than 0.7 set mimicry to 1
            sample_df['emotional_similarity'] = sample_df.apply(data_processer.get_cosine_similarity,axis = 1) 
            sample_df['mimicry'] = sample_df.apply(lambda x: 1 if x['emotional_similarity'] > 0.7 else 0, axis = 1)
            sample_df = sample_df.drop(columns = ['emotional_similarity'])
        else: 
            #"spll" was never defined in this notebook; call get_avg_vad with the same arguments as the VAD branch above
            sample_df['vad_speaker'] = sample_df['speaker_utterance'].apply(lexicon.get_avg_vad, args = (lexicon_df,wnl,stp_wrds)) 
            sample_df['vad_listener'] = sample_df['listener_utterance'].apply(lexicon.get_avg_vad, args = (lexicon_df,wnl,stp_wrds)) 
            sample_df[['valence_speaker','arousal_speaker','dominance_speaker']] = pd.DataFrame(sample_df.vad_speaker.tolist(),index = sample_df.index)
            sample_df[['valence_listener','arousal_listener','dominance_listener']] = pd.DataFrame(sample_df.vad_listener.tolist(),index = sample_df.index)
            sample_df = sample_df.drop(columns = ['vad_speaker','vad_listener'])                
            sample_df['emotional_similarity'] = sample_df.apply(data_processer.get_cosine_similarity,axis = 1) 
            sample_df['mimicry'] = sample_df.apply(lambda x: 1 if x['emotional_similarity'] > 0.7 else 0, axis = 1)
            sample_df = sample_df.drop(columns =  ['valence_speaker','arousal_speaker','dominance_speaker','valence_listener','arousal_listener','dominance_listener','emotional_similarity'])
        sample_df['mimicry'] = sample_df['mimicry'].astype('category')
        sample_df['mimicry'] = sample_df['mimicry'].astype('string')
        #sample_df = sample_df.drop(columns =  ['predictions_EX'])
    print('done')
    return sample_df
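
The emotion-mimicry flag in the function above thresholds a similarity score at 0.7. The sketch below illustrates that idea under an assumption: that `data_processer.get_cosine_similarity` computes the cosine similarity between the speaker's and the listener's (valence, arousal, dominance) vectors. Its implementation is not shown in this notebook, so `cosine_similarity` and `mimicry_flag` are hypothetical stand-ins, and returning 0.0 for a zero vector is also an assumption.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (hypothetical stand-in)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        return 0.0  # assumption: an all-zero vector counts as "no similarity"
    return float(np.dot(u, v) / denom)


def mimicry_flag(vad_speaker, vad_listener, threshold=0.7):
    """Return 1 when the two VAD vectors are more similar than the threshold."""
    return 1 if cosine_similarity(vad_speaker, vad_listener) > threshold else 0


# Vectors pointing the same way are flagged; orthogonal or opposite ones are not.
print(mimicry_flag([0.5, 0.2, 0.1], [0.4, 0.25, 0.1]))
print(mimicry_flag([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```

Cosine similarity compares the direction of the two vectors and ignores their magnitude, so a strongly and a mildly positive utterance can still count as mimicry.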


### Database setup

We load the database. Next, we filter it to have only samples that start a conversation. This is done by selecting those that have an "exchange_number" variable of 1. 

In [5]:
database = pd.read_csv(current_dir + database_dir)

starting_exchange_db = database[database['exchange_number'] == 1]
starting_exchange_db = starting_exchange_db.reset_index(drop = True)
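
As an illustration of the filter above, the sketch below applies the same boolean mask to a small made-up frame. The rows are invented for the example and are not taken from EmpatheticExchanges; only the column names come from this notebook.

```python
import pandas as pd

# Toy data: two conversations, "a" with two exchanges and "b" with three.
toy = pd.DataFrame({
    'conv_id': ['a', 'a', 'b', 'b', 'b'],
    'exchange_number': [1, 2, 1, 2, 3],
    'speaker_utterance': ['hi', 'thanks', 'hello', 'I see', 'bye'],
})

# Keep only the rows that open a conversation, then renumber the index
# so that positions 0..n-1 can be used for random sampling later on.
starters = toy[toy['exchange_number'] == 1].reset_index(drop=True)
print(starters)
```

Resetting the index matters because later cells look rows up with `.loc` and an integer drawn from `0` to `len - 1`; without it, the surviving rows would keep their original labels.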

### Load our classification model

In this cell, we load the empathy classification model that we previously trained. The model is selected by specifying the directory in which it was saved. 

In [6]:
model_directory = current_dir + '/Experiments/outputs/Experiment '+ str(70) + '/' + 'trained_pbc4cip.sav'

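#load the trained classifier and close the file afterwards; only unpickle files from a trusted source
with open(model_directory, 'rb') as model_file:
    pbc = pickle.load(model_file)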

## Application

In this subsection, we present the working parts of the demo. 

### Conversation starter

We randomly sample the database for a conversation prompt. This is equivalent to an utterance of a first agent, to which we will provide a response. 

In [7]:
len_of_db = len(starting_exchange_db)
index_of_sample = random.randint(0, len_of_db - 1) #randint includes both endpoints, so the last valid index is len_of_db - 1
sample_text = starting_exchange_db.loc[index_of_sample,'speaker_utterance']
sample_text = re.sub("_comma_", ',', sample_text)
print(f'Prompt: "{sample_text}"') 

Prompt: "My girlfriend always tells me what to do."
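
When sampling a row position, keep in mind that `random.randint(a, b)` can return `b` itself, whereas `random.randrange(n)` returns values from `0` to `n - 1`. The sketch below shows the second form on a made-up list of prompts; `clean_prompt` is a hypothetical helper that mirrors the `_comma_` replacement used in the cell above.

```python
import random

# Made-up prompts standing in for the 'speaker_utterance' column.
prompts = ['I lost my keys_comma_ again.', 'I got a new job!', 'My dog is sick.']


def clean_prompt(text):
    """Hypothetical helper: restore commas encoded as the '_comma_' token."""
    return text.replace('_comma_', ',')


rng = random.Random(0)             # seeded only to make the sketch repeatable
index = rng.randrange(len(prompts))  # always a valid position: 0 <= index < len(prompts)
print(f'Prompt: "{clean_prompt(prompts[index])}"')
```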


### Response

Here we provide a response to the prompt. An empty response is rejected and the question is asked again. 

In [8]:
flag = True
while(flag):
    answer = input("Provide your response: ")
    if answer.strip() == '': #also reject answers that contain only whitespace
        print('No answer received, please provide a response')
    else:
        flag = False

Provide your response:  Oh, well, if you are both ok with that, there is no problem


### Inference

In this cell, the prompt-response pair is processed to have the format and features required for the classification algorithm. 

Subsequently, the data is passed to the classifier, and a prediction is made. 

In [9]:
df = starting_exchange_db.iloc[[index_of_sample]]
df = df.reset_index(drop=True)
columns_list = starting_exchange_db.columns.to_list()
df.loc[0, 'listener_utterance'] = str(answer)
C = list(set(columns_list) - set(['speaker_utterance','listener_utterance','empathy','exchange_number']))
df = df.drop(columns = C)
df = process_answer(df,feature_vector)
df = df.drop(columns = ['speaker_utterance', 'listener_utterance'])
x_test = df.drop(columns=['empathy'])
y_test = df.drop(columns=x_test.columns)
y_pred = pbc.predict(x_test)
print(f'Our classification algorithm predicts a level of {int(y_pred[0]) + 1} out of 3 for the perceived empathy of your response')

processing data....
done

Our classification algorithm predicts a level of 1 out of 3 for the perceived empathy of your response




### Multi-turn inference

In this cell, we run one inference per utterance exchange. Together, these exchanges form a conversation centered on an emotional topic. 

In [40]:
#1470 <---- index of a good conversation
# hit:7872_conv:15745

#load the full database once, outside the sampling loop
full_database = pd.read_csv(current_dir + '/processed_databases/EmpatheticExchanges/EmpatheticExchanges.csv')
len_of_db = len(starting_exchange_db)

#resample until the conversation has more than 2 exchanges and its first exchange has an empathy label above 2
while True:
    index_of_sample = random.randint(0, len_of_db-1)
    conv_id = starting_exchange_db.loc[index_of_sample,'conv_id']
    conv_df = full_database[full_database['conv_id'] == conv_id]
    conv_df = conv_df.reset_index(drop=True)
    if len(conv_df) > 2 and conv_df.loc[0,'empathy'] > 2:
        break
print('Example of an empathetic multi-turn conversation')
for i in range(len(conv_df)):
    #double-quoted f-strings so the single-quoted column names also parse on Python versions before 3.12
    print(f"Turn {conv_df.loc[i,'exchange_number']}")
    print(f"Agent A: {conv_df.loc[i,'speaker_utterance']}")
    print(f"Agent B: {conv_df.loc[i,'listener_utterance']}")

Example of an empathetic multi-turn conversation
Turn 1
Agent A: I have an old truck that I enjoy working on and driving around.
Agent B: What great fun! My husband had a 1960 Ford F150 he completely redid. It was quite an attraction around town!
Turn 2
Agent A: That's cool. Mine is a 68 chevy c10
Agent B: Nice! Where did you find it?
Turn 3
Agent A: It's a family heirloom.
Agent B: Even better. Are you doing all the work yourself or do you have a restoration person helping you?
Turn 4
Agent A: I've done it all myself. It takes a long time_comma_ but that's part of the fun.
Agent B: That's true. You must be a very good mechanic to do all that - I know the amount of work it takes!
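
The cell above finds a suitable conversation by resampling until one passes the check. As an alternative, the eligible conversations can be computed once and then sampled from directly. The following is a sketch on made-up rows; `eligible_conversations` is a hypothetical helper, and unlike the cell above it works on a single frame rather than drawing the `conv_id` from the test subset.

```python
import pandas as pd

# Toy data: "a" qualifies, "b" is too short, "c" starts with a low empathy label.
toy = pd.DataFrame({
    'conv_id': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c'],
    'exchange_number': [1, 2, 3, 1, 2, 1, 2, 3],
    'empathy': [3, 2, 3, 3, 3, 1, 2, 2],
})


def eligible_conversations(df):
    """Return conv_ids with more than 2 exchanges whose first exchange has empathy > 2."""
    ordered = df.sort_values(['conv_id', 'exchange_number'])
    stats = ordered.groupby('conv_id').agg(
        n_exchanges=('exchange_number', 'size'),
        first_empathy=('empathy', 'first'),
    )
    mask = (stats['n_exchanges'] > 2) & (stats['first_empathy'] > 2)
    return stats.index[mask].tolist()


print(eligible_conversations(toy))
```

A conversation can then be picked with `random.choice` on the returned list, which avoids an unbounded loop when few conversations qualify.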


In [37]:
mt_df = pd.DataFrame()
for i in range(len(conv_df)):
    query = input("Provide an utterance by Agent A: ")
    #stop only on an exact 'end' or 'quit'; a substring test would also stop on words such as "friend"
    if query.strip().lower() in ('end', 'quit'):
        break
    answer = input("Provide an utterance by Agent B: ")
    if answer.strip().lower() in ('end', 'quit'):
        break
    #the 'empathy' value is a placeholder label; it is dropped before prediction
    datarow = {'speaker_utterance': [str(query)], 'listener_utterance': [str(answer)],'exchange_number': [i+1], 'empathy': [3]}
    mt_df = pd.concat([mt_df,pd.DataFrame.from_dict(datarow)])
    ex_df = process_answer(mt_df.iloc[[i]].reset_index(drop=True), feature_vector)
    ex_df = ex_df.drop(columns = ['speaker_utterance', 'listener_utterance'])
    x_test = ex_df.drop(columns=['empathy'])
    y_test = ex_df.drop(columns=x_test.columns)
    y_pred = pbc.predict(x_test)
    print(f'Predicted empathy of the exchange: {int(y_pred[0]) + 1}/3')

Provide an utterance by Agent A:  end
