# **Representation preparation**

This document describes how the EmpatheticConversations dataset was prepared for the empathy classification task. Several features stored in the dataset had to be separated into individual columns so that the data could be processed by the Waikato Environment for Knowledge Analysis (WEKA).

In [None]:
#Pandas and numpy imports
import pandas as pd
import numpy as np

In [None]:
#Read the database
df = pd.read_excel('EmpatheticConversations.xlsx')
df

Unnamed: 0,conv_id,utterance_idx,context,prompt,speaker_idx,utterance,ut_len,Empathy,Talker,Sentiment,Emotion,Taxonomy,Intent
0,hit:11054_conv:22108,1,surprised,a job I applied and interview for a couple of ...,675,a job I applied and interview for a couple of ...,95,5,1,"{'sentiment': {'negative': 0.275, 'neutral': 0...","{'emotion': {'Happy': 0.155295802, 'Fear': 0.4...",{'taxonomy': [{'confidence_score': 0.410472810...,"{'intent': {'news': 0.087, 'query': 0.072, 'sp..."
1,hit:11054_conv:22108,2,surprised,a job I applied and interview for a couple of ...,746,Congrats! How exciting for you.,32,5,2,"{'sentiment': {'negative': 0.005, 'neutral': 0...","{'emotion': {'Angry': 0.0883550096, 'Sad': 0.0...",{'taxonomy': [{'confidence_score': 0.371980845...,"{'intent': {'news': 0.028, 'query': 0.112, 'sp..."
2,hit:11054_conv:22108,3,surprised,a job I applied and interview for a couple of ...,675,"thank you, when they called, I did not know wh...",119,5,1,"{'sentiment': {'negative': 0.031, 'neutral': 0...","{'emotion': {'Angry': 0.0568077244, 'Excited':...",{'taxonomy': [{'confidence_score': 0.356034636...,"{'intent': {'news': 0.005, 'query': 0.013, 'sp..."
3,hit:11054_conv:22108,4,surprised,a job I applied and interview for a couple of ...,746,That's a long time to hold out hope. I bet yo...,61,5,2,"{'sentiment': {'negative': 0.572, 'neutral': 0...","{'emotion': {'Angry': 0.1211754311, 'Excited':...",{'taxonomy': [{'confidence_score': 0.621948182...,"{'intent': {'news': 0.018, 'query': 0.145, 'sp..."
4,hit:6194_conv:12389,1,joyful,Got a Costco cake. For myself.,1,Got a Costco cake. For myself.,30,3,1,"{'sentiment': {'negative': 0.018, 'neutral': 0...","{'emotion': {'Happy': 0.4695486472, 'Fear': 0....",{'taxonomy': [{'confidence_score': 0.554982304...,"{'intent': {'news': 0.012, 'query': 0.037, 'sp..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1708,hit:481_conv:963,4,surprised,When I got an A on my math final I was super s...,55,Nice! Good to hear that things worked out for ...,61,4,2,"{'sentiment': {'negative': 0.019, 'neutral': 0...","{'emotion': {'Angry': 0.0803235033, 'Sad': 0.0...",{'taxonomy': [{'confidence_score': 0.561580479...,"{'intent': {'news': 0.049, 'query': 0.011, 'sp..."
1709,hit:5257_conv:10514,1,guilty,I was too tired to mow the lawn and do garden ...,511,"One day, I was to tired to mow the lawn.",46,3,1,"{'sentiment': {'negative': 0.648, 'neutral': 0...","{'emotion': {'Happy': 0.0155348269, 'Fear': 0....",{'taxonomy': [{'confidence_score': 0.507551670...,"{'intent': {'news': 0.037, 'query': 0.427, 'sp..."
1710,hit:5257_conv:10514,2,guilty,I was too tired to mow the lawn and do garden ...,445,"I feel that way every day almost, Did you ever...",69,3,2,"{'sentiment': {'negative': 0.358, 'neutral': 0...","{'emotion': {'Excited': 0.0685982253, 'Fear': ...",{'taxonomy': [{'confidence_score': 0.517133534...,"{'intent': {'news': 0.001, 'query': 0.584, 'sp..."
1711,hit:5257_conv:10514,3,guilty,I was too tired to mow the lawn and do garden ...,511,"No, my brother had to do it another even thoug...",80,3,1,"{'sentiment': {'negative': 0.807, 'neutral': 0...","{'emotion': {'Angry': 0.4105899265, 'Sad': 0.2...",{'taxonomy': [{'confidence_score': 0.858519196...,"{'intent': {'news': 0.007, 'query': 0.095, 'sp..."


Some attributes in EmpatheticConversations (Sentiment, Emotion, Taxonomy and Intent) are stored as a single string that holds a dictionary-like structure. Each of them therefore has to be split into separate columns.

In [None]:
df['Sentiment'][0]

"{'sentiment': {'negative': 0.275, 'neutral': 0.437, 'positive': 0.287}}"

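Because the cell shown above is the text of a Python dictionary, one alternative to manual string slicing is `ast.literal_eval` from the standard library. The following is only a minimal sketch and not part of the original notebook: `parse_scores` is a hypothetical helper, and it assumes every cell is a well-formed literal like the one printed above.

```python
import ast

def parse_scores(cell, key):
    """Parse a dict-like string and return the mapping stored under `key`."""
    parsed = ast.literal_eval(cell)  # evaluates Python literals only, never arbitrary code
    return parsed[key]

# The string below is the value of df['Sentiment'][0] shown in the notebook output
raw = "{'sentiment': {'negative': 0.275, 'neutral': 0.437, 'positive': 0.287}}"
scores = parse_scores(raw, "sentiment")
print(scores["negative"], scores["neutral"], scores["positive"])
```

Looking values up by key rather than by position keeps the result correct even if the order of the keys changes between rows.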
## **Sentiment analysis**

In [None]:
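#Sentiment analysis separation

def sentiment_separator(s,n):
  #Keep only the body of the inner dictionary: "'negative': ..., 'neutral': ..., 'positive': ..."
  inner = s[s.rfind('{')+1:s.find('}')]
  array = inner.split(", ")
  #Return the n-th score (0: negative, 1: neutral, 2: positive)
  return float(array[n][array[n].find(':')+1:])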


df['Negative_score'] = df['Sentiment'].apply(sentiment_separator,args = (0,))
df['Neutral_score'] = df['Sentiment'].apply(sentiment_separator,args = (1,))
df['Positivity_score'] = df['Sentiment'].apply(sentiment_separator,args = (2,))

df = df.drop(columns=['Sentiment'])
df

Unnamed: 0,conv_id,utterance_idx,context,prompt,speaker_idx,utterance,ut_len,Empathy,Talker,Emotion,Taxonomy,Intent,Negative_score,Neutral_score,Positivity_score
0,hit:11054_conv:22108,1,surprised,a job I applied and interview for a couple of ...,675,a job I applied and interview for a couple of ...,95,5,1,"{'emotion': {'Happy': 0.155295802, 'Fear': 0.4...",{'taxonomy': [{'confidence_score': 0.410472810...,"{'intent': {'news': 0.087, 'query': 0.072, 'sp...",0.275,0.437,0.287
1,hit:11054_conv:22108,2,surprised,a job I applied and interview for a couple of ...,746,Congrats! How exciting for you.,32,5,2,"{'emotion': {'Angry': 0.0883550096, 'Sad': 0.0...",{'taxonomy': [{'confidence_score': 0.371980845...,"{'intent': {'news': 0.028, 'query': 0.112, 'sp...",0.005,0.056,0.939
2,hit:11054_conv:22108,3,surprised,a job I applied and interview for a couple of ...,675,"thank you, when they called, I did not know wh...",119,5,1,"{'emotion': {'Angry': 0.0568077244, 'Excited':...",{'taxonomy': [{'confidence_score': 0.356034636...,"{'intent': {'news': 0.005, 'query': 0.013, 'sp...",0.031,0.122,0.847
3,hit:11054_conv:22108,4,surprised,a job I applied and interview for a couple of ...,746,That's a long time to hold out hope. I bet yo...,61,5,2,"{'emotion': {'Angry': 0.1211754311, 'Excited':...",{'taxonomy': [{'confidence_score': 0.621948182...,"{'intent': {'news': 0.018, 'query': 0.145, 'sp...",0.572,0.383,0.046
4,hit:6194_conv:12389,1,joyful,Got a Costco cake. For myself.,1,Got a Costco cake. For myself.,30,3,1,"{'emotion': {'Happy': 0.4695486472, 'Fear': 0....",{'taxonomy': [{'confidence_score': 0.554982304...,"{'intent': {'news': 0.012, 'query': 0.037, 'sp...",0.018,0.232,0.750
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1708,hit:481_conv:963,4,surprised,When I got an A on my math final I was super s...,55,Nice! Good to hear that things worked out for ...,61,4,2,"{'emotion': {'Angry': 0.0803235033, 'Sad': 0.0...",{'taxonomy': [{'confidence_score': 0.561580479...,"{'intent': {'news': 0.049, 'query': 0.011, 'sp...",0.019,0.226,0.756
1709,hit:5257_conv:10514,1,guilty,I was too tired to mow the lawn and do garden ...,511,"One day, I was to tired to mow the lawn.",46,3,1,"{'emotion': {'Happy': 0.0155348269, 'Fear': 0....",{'taxonomy': [{'confidence_score': 0.507551670...,"{'intent': {'news': 0.037, 'query': 0.427, 'sp...",0.648,0.291,0.061
1710,hit:5257_conv:10514,2,guilty,I was too tired to mow the lawn and do garden ...,445,"I feel that way every day almost, Did you ever...",69,3,2,"{'emotion': {'Excited': 0.0685982253, 'Fear': ...",{'taxonomy': [{'confidence_score': 0.517133534...,"{'intent': {'news': 0.001, 'query': 0.584, 'sp...",0.358,0.609,0.033
1711,hit:5257_conv:10514,3,guilty,I was too tired to mow the lawn and do garden ...,511,"No, my brother had to do it another even thoug...",80,3,1,"{'emotion': {'Angry': 0.4105899265, 'Sad': 0.2...",{'taxonomy': [{'confidence_score': 0.858519196...,"{'intent': {'news': 0.007, 'query': 0.095, 'sp...",0.807,0.171,0.022


## **Emotion analysis**

In [None]:
#Emotion analysis separation
#Order: 'Happy','Fear','Sad','Bored','Angry','Excited'

def emotion_separator(s):
  emotions = ['Happy','Fear','Sad','Bored','Angry','Excited']
  em_arr = [0,0,0,0,0,0]
  s = s[s.find(': {')+3:s.find('}}')]
  array = s.split(", ")

  for x in array:
    for y in range(len(emotions)):
      if emotions[y] in x:
        em_arr[y] = float(x[x.find(':')+1:])
  return em_arr

df['Emotions'] = df['Emotion'].apply(emotion_separator)

def emo_arr_sep(s,n):
  #Return the n-th score of the parsed emotion array
  return s[n]

emo_names = ['Happy','Fear','Sad','Bored','Angry','Excited']

#Create one column per emotion from the parsed array
for x in range(len(emo_names)):
  df[emo_names[x]] = df['Emotions'].apply(emo_arr_sep,args=(x,))

df = df.drop(columns=['Emotions','Emotion'])

df

Unnamed: 0,conv_id,utterance_idx,context,prompt,speaker_idx,utterance,ut_len,Empathy,Talker,Taxonomy,Intent,Negative_score,Neutral_score,Positivity_score,Happy,Fear,Sad,Bored,Angry,Excited
0,hit:11054_conv:22108,1,surprised,a job I applied and interview for a couple of ...,675,a job I applied and interview for a couple of ...,95,5,1,{'taxonomy': [{'confidence_score': 0.410472810...,"{'intent': {'news': 0.087, 'query': 0.072, 'sp...",0.275,0.437,0.287,0.155296,0.446474,0.147019,0.021908,0.044091,0.185211
1,hit:11054_conv:22108,2,surprised,a job I applied and interview for a couple of ...,746,Congrats! How exciting for you.,32,5,2,{'taxonomy': [{'confidence_score': 0.371980845...,"{'intent': {'news': 0.028, 'query': 0.112, 'sp...",0.005,0.056,0.939,0.487500,0.163485,0.034996,0.038620,0.088355,0.187045
2,hit:11054_conv:22108,3,surprised,a job I applied and interview for a couple of ...,675,"thank you, when they called, I did not know wh...",119,5,1,{'taxonomy': [{'confidence_score': 0.356034636...,"{'intent': {'news': 0.005, 'query': 0.013, 'sp...",0.031,0.122,0.847,0.339552,0.198578,0.039805,0.013174,0.056808,0.352083
3,hit:11054_conv:22108,4,surprised,a job I applied and interview for a couple of ...,746,That's a long time to hold out hope. I bet yo...,61,5,2,{'taxonomy': [{'confidence_score': 0.621948182...,"{'intent': {'news': 0.018, 'query': 0.145, 'sp...",0.572,0.383,0.046,0.231816,0.233946,0.176958,0.128674,0.121175,0.107431
4,hit:6194_conv:12389,1,joyful,Got a Costco cake. For myself.,1,Got a Costco cake. For myself.,30,3,1,{'taxonomy': [{'confidence_score': 0.554982304...,"{'intent': {'news': 0.012, 'query': 0.037, 'sp...",0.018,0.232,0.750,0.469549,0.173306,0.035059,0.015980,0.027758,0.278348
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1708,hit:481_conv:963,4,surprised,When I got an A on my math final I was super s...,55,Nice! Good to hear that things worked out for ...,61,4,2,{'taxonomy': [{'confidence_score': 0.561580479...,"{'intent': {'news': 0.049, 'query': 0.011, 'sp...",0.019,0.226,0.756,0.376266,0.054610,0.011847,0.006279,0.080324,0.470675
1709,hit:5257_conv:10514,1,guilty,I was too tired to mow the lawn and do garden ...,511,"One day, I was to tired to mow the lawn.",46,3,1,{'taxonomy': [{'confidence_score': 0.507551670...,"{'intent': {'news': 0.037, 'query': 0.427, 'sp...",0.648,0.291,0.061,0.015535,0.107610,0.189886,0.539146,0.128350,0.019473
1710,hit:5257_conv:10514,2,guilty,I was too tired to mow the lawn and do garden ...,445,"I feel that way every day almost, Did you ever...",69,3,2,{'taxonomy': [{'confidence_score': 0.517133534...,"{'intent': {'news': 0.001, 'query': 0.584, 'sp...",0.358,0.609,0.033,0.225942,0.205662,0.338206,0.069818,0.091774,0.068598
1711,hit:5257_conv:10514,3,guilty,I was too tired to mow the lawn and do garden ...,511,"No, my brother had to do it another even thoug...",80,3,1,{'taxonomy': [{'confidence_score': 0.858519196...,"{'intent': {'news': 0.007, 'query': 0.095, 'sp...",0.807,0.171,0.022,0.040715,0.116646,0.291302,0.071360,0.410590,0.069387
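The emotion keys do not appear in the same order in every row, which is why the separator above looks each score up by name. A compact way to get the same wide layout is sketched below on toy data. It is an illustration only: `expand_emotions` is a hypothetical helper, the toy scores are invented, and the cells are assumed to be well-formed dictionary literals with an `'emotion'` key.

```python
import ast
import pandas as pd

EMOTIONS = ['Happy', 'Fear', 'Sad', 'Bored', 'Angry', 'Excited']

def expand_emotions(column):
    """Turn a Series of dict-like strings into one numeric column per emotion."""
    rows = [ast.literal_eval(cell)['emotion'] for cell in column]
    wide = pd.DataFrame(rows, index=column.index)
    # Fix the column order and use 0.0 for emotions missing from a row
    return wide.reindex(columns=EMOTIONS).fillna(0.0)

# Invented toy rows with different key orders and missing emotions
toy = pd.Series([
    "{'emotion': {'Happy': 0.5, 'Fear': 0.25, 'Sad': 0.25}}",
    "{'emotion': {'Angry': 0.75, 'Excited': 0.25}}",
])
expanded = expand_emotions(toy)
print(expanded)
```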


## **Obtain different Taxonomical features**

In [None]:
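#Obtain Taxonomical features
def obtain_tax(s):
  #Drop everything up to the opening bracket of the taxonomy list
  s = s[s.find("[")+1:]
  s = s.split(',')
  #Keep only the fragments that hold a tag (the others hold confidence scores)
  s = [x for x in s if 'tag' in x]
  #Strip the key, the quotes and the closing brace from each fragment
  for i in range(len(s)):
    s[i] = s[i][s[i].find(': ')+3:s[i].find("}")-1]
  return s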

df['Tax_array'] = df['Taxonomy'].apply(obtain_tax)

#Obtain Taxonomical scores

def obtain_tax_sc(s):
  #Isolate the entries of the taxonomy list and split them one by one
  s = s[s.find("[")+1:]
  s = s[s.find("{")+1:s.find("}]}")]
  s = s.split('}, {')
  for i in range(len(s)):
    arr = s[i].split(',')
    #Each entry becomes [tag, confidence score]
    new_arr = [arr[1][arr[1].find(": '")+3:len(arr[1])-1],float(arr[0][arr[0].find(': ')+1:])]
    s[i] = new_arr
  return s

df['Tax_values'] = df['Taxonomy'].apply(obtain_tax_sc)
df

Unnamed: 0,conv_id,utterance_idx,context,prompt,speaker_idx,utterance,ut_len,Empathy,Talker,Taxonomy,...,Neutral_score,Positivity_score,Happy,Fear,Sad,Bored,Angry,Excited,Tax_array,Tax_values
0,hit:11054_conv:22108,1,surprised,a job I applied and interview for a couple of ...,675,a job I applied and interview for a couple of ...,95,5,1,{'taxonomy': [{'confidence_score': 0.410472810...,...,0.437,0.287,0.155296,0.446474,0.147019,0.021908,0.044091,0.185211,"[IMPACT, EDUCATION, POLITICS]","[[IMPACT, 0.4104728103], [EDUCATION, 0.1999589..."
1,hit:11054_conv:22108,2,surprised,a job I applied and interview for a couple of ...,746,Congrats! How exciting for you.,32,5,2,{'taxonomy': [{'confidence_score': 0.371980845...,...,0.056,0.939,0.487500,0.163485,0.034996,0.038620,0.088355,0.187045,"[ENTERTAINMENT, IMPACT, POLITICS]","[[ENTERTAINMENT, 0.3719808459], [IMPACT, 0.184..."
2,hit:11054_conv:22108,3,surprised,a job I applied and interview for a couple of ...,675,"thank you, when they called, I did not know wh...",119,5,1,{'taxonomy': [{'confidence_score': 0.356034636...,...,0.122,0.847,0.339552,0.198578,0.039805,0.013174,0.056808,0.352083,"[POLITICS, IMPACT, ENTERTAINMENT]","[[POLITICS, 0.3560346365], [IMPACT, 0.33480674..."
3,hit:11054_conv:22108,4,surprised,a job I applied and interview for a couple of ...,746,That's a long time to hold out hope. I bet yo...,61,5,2,{'taxonomy': [{'confidence_score': 0.621948182...,...,0.383,0.046,0.231816,0.233946,0.176958,0.128674,0.121175,0.107431,"[POLITICS, IMPACT, ENTERTAINMENT]","[[POLITICS, 0.6219481826], [IMPACT, 0.17078509..."
4,hit:6194_conv:12389,1,joyful,Got a Costco cake. For myself.,1,Got a Costco cake. For myself.,30,3,1,{'taxonomy': [{'confidence_score': 0.554982304...,...,0.232,0.750,0.469549,0.173306,0.035059,0.015980,0.027758,0.278348,"[TASTE, ENTERTAINMENT, IMPACT]","[[TASTE, 0.5549823046], [ENTERTAINMENT, 0.2484..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1708,hit:481_conv:963,4,surprised,When I got an A on my math final I was super s...,55,Nice! Good to hear that things worked out for ...,61,4,2,{'taxonomy': [{'confidence_score': 0.561580479...,...,0.226,0.756,0.376266,0.054610,0.011847,0.006279,0.080324,0.470675,"[ENTERTAINMENT, TASTE, HEALTHY LIVING]","[[ENTERTAINMENT, 0.5615804791], [TASTE, 0.3552..."
1709,hit:5257_conv:10514,1,guilty,I was too tired to mow the lawn and do garden ...,511,"One day, I was to tired to mow the lawn.",46,3,1,{'taxonomy': [{'confidence_score': 0.507551670...,...,0.291,0.061,0.015535,0.107610,0.189886,0.539146,0.128350,0.019473,"[ENTERTAINMENT, IMPACT, POLITICS]","[[ENTERTAINMENT, 0.5075516701], [IMPACT, 0.357..."
1710,hit:5257_conv:10514,2,guilty,I was too tired to mow the lawn and do garden ...,445,"I feel that way every day almost, Did you ever...",69,3,2,{'taxonomy': [{'confidence_score': 0.517133534...,...,0.609,0.033,0.225942,0.205662,0.338206,0.069818,0.091774,0.068598,"[IMPACT, ENTERTAINMENT, ARTS & CULTURE]","[[IMPACT, 0.517133534], [ENTERTAINMENT, 0.3619..."
1711,hit:5257_conv:10514,3,guilty,I was too tired to mow the lawn and do garden ...,511,"No, my brother had to do it another even thoug...",80,3,1,{'taxonomy': [{'confidence_score': 0.858519196...,...,0.171,0.022,0.040715,0.116646,0.291302,0.071360,0.410590,0.069387,"[IMPACT, SPORTS, HEALTHY LIVING]","[[IMPACT, 0.8585191965], [SPORTS, 0.047775194]..."
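The `[tag, score]` pairs in `Tax_values` can also be obtained by parsing the cell as a literal. The sketch below assumes the structure suggested by the truncated output and by the slicing code above, namely a `'taxonomy'` list whose entries carry `'confidence_score'` and `'tag'`. `taxonomy_pairs` is a hypothetical helper and the toy scores are invented.

```python
import ast

def taxonomy_pairs(cell):
    """Return [[tag, confidence_score], ...] from a dict-like taxonomy string."""
    entries = ast.literal_eval(cell)['taxonomy']
    return [[entry['tag'], float(entry['confidence_score'])] for entry in entries]

# Invented toy cell following the assumed structure
raw = ("{'taxonomy': [{'confidence_score': 0.6, 'tag': 'IMPACT'}, "
       "{'confidence_score': 0.3, 'tag': 'ARTS & CULTURE'}]}")
pairs = taxonomy_pairs(raw)
print(pairs)
```

Parsing the literal avoids splitting on commas, so a tag that itself contains a comma would not break the extraction.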


## **Obtain intent scores**

In [None]:
#Obtaining intent values

intents = ['news','query','spam','marketing','feedback','complaint','suggestion','appreciation']

def get_intent(s):
  #Drop the outer "{'intent': {" prefix and split the remaining entries
  s = s[s.find("':")+4:]
  arr = s.split(',')
  if len(arr) > 5:
    #Longer strings carry nested entries: flatten positions 4, 5 and 7
    s = ''+ arr[4][:arr[4].find('{')]+ arr[4][arr[4].find("re':")+4:]
    arr[4] = s

    k = arr[5]
    k = k[k.find(': {')+3:]

    arr[5] = k

    t = arr[7]
    t = t[:t.find("}}")]
    arr[7] = t
  else:
    #Shorter strings have no sub-intents: fill them with zeros
    arr[4] = arr[4][:arr[4].find("}}")]
    arr.append("'complaint': 0.0")
    arr.append("'suggestion': 0.0")
    arr.append("'appreciation': 0.0")

  for i in range(len(arr)):
    arr[i] = arr[i].replace("'", "")
    arr[i] = arr[i].replace(" ", "")
    val = arr[i].split(':')
    arr[i] = [val[0],float(val[1])]
  return arr

df['Intent_values'] = df['Intent'].apply(get_intent)
df = df.drop(columns=['Taxonomy','Intent'])


def intent_separation(s,i):
  return s[i][1]

for i in range(len(intents)):
  df[intents[i]] = df['Intent_values'].apply(intent_separation, args= (i,))

df = df.drop(columns=['Intent_values'])


df

Unnamed: 0,conv_id,utterance_idx,context,prompt,speaker_idx,utterance,ut_len,Empathy,Talker,Negative_score,...,Tax_array,Tax_values,news,query,spam,marketing,feedback,complaint,suggestion,appreciation
0,hit:11054_conv:22108,1,surprised,a job I applied and interview for a couple of ...,675,a job I applied and interview for a couple of ...,95,5,1,0.275,...,"[IMPACT, EDUCATION, POLITICS]","[[IMPACT, 0.4104728103], [EDUCATION, 0.1999589...",0.087,0.072,0.311,0.052,0.477,0.110,0.035,0.855
1,hit:11054_conv:22108,2,surprised,a job I applied and interview for a couple of ...,746,Congrats! How exciting for you.,32,5,2,0.005,...,"[ENTERTAINMENT, IMPACT, POLITICS]","[[ENTERTAINMENT, 0.3719808459], [IMPACT, 0.184...",0.028,0.112,0.368,0.065,0.426,0.001,0.003,0.996
2,hit:11054_conv:22108,3,surprised,a job I applied and interview for a couple of ...,675,"thank you, when they called, I did not know wh...",119,5,1,0.031,...,"[POLITICS, IMPACT, ENTERTAINMENT]","[[POLITICS, 0.3560346365], [IMPACT, 0.33480674...",0.005,0.013,0.354,0.011,0.616,0.003,0.002,0.995
3,hit:11054_conv:22108,4,surprised,a job I applied and interview for a couple of ...,746,That's a long time to hold out hope. I bet yo...,61,5,2,0.572,...,"[POLITICS, IMPACT, ENTERTAINMENT]","[[POLITICS, 0.6219481826], [IMPACT, 0.17078509...",0.018,0.145,0.313,0.010,0.514,0.409,0.313,0.278
4,hit:6194_conv:12389,1,joyful,Got a Costco cake. For myself.,1,Got a Costco cake. For myself.,30,3,1,0.018,...,"[TASTE, ENTERTAINMENT, IMPACT]","[[TASTE, 0.5549823046], [ENTERTAINMENT, 0.2484...",0.012,0.037,0.452,0.196,0.304,0.000,0.000,0.000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1708,hit:481_conv:963,4,surprised,When I got an A on my math final I was super s...,55,Nice! Good to hear that things worked out for ...,61,4,2,0.019,...,"[ENTERTAINMENT, TASTE, HEALTHY LIVING]","[[ENTERTAINMENT, 0.5615804791], [TASTE, 0.3552...",0.049,0.011,0.307,0.043,0.589,0.000,0.005,0.994
1709,hit:5257_conv:10514,1,guilty,I was too tired to mow the lawn and do garden ...,511,"One day, I was to tired to mow the lawn.",46,3,1,0.648,...,"[ENTERTAINMENT, IMPACT, POLITICS]","[[ENTERTAINMENT, 0.5075516701], [IMPACT, 0.357...",0.037,0.427,0.241,0.007,0.288,0.000,0.000,0.000
1710,hit:5257_conv:10514,2,guilty,I was too tired to mow the lawn and do garden ...,445,"I feel that way every day almost, Did you ever...",69,3,2,0.358,...,"[IMPACT, ENTERTAINMENT, ARTS & CULTURE]","[[IMPACT, 0.517133534], [ENTERTAINMENT, 0.3619...",0.001,0.584,0.186,0.001,0.228,0.000,0.000,0.000
1711,hit:5257_conv:10514,3,guilty,I was too tired to mow the lawn and do garden ...,511,"No, my brother had to do it another even thoug...",80,3,1,0.807,...,"[IMPACT, SPORTS, HEALTHY LIVING]","[[IMPACT, 0.8585191965], [SPORTS, 0.047775194]...",0.007,0.095,0.505,0.004,0.389,0.000,0.000,0.000
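The same flattening can be sketched without positional slicing. The nested shape used below is an assumption inferred from the slicing logic in `get_intent`, not something the truncated output confirms: `'feedback'` is taken to be either a plain number or a dictionary with a `'score'` and a `'tag'` mapping of sub-intents. `flatten_intent` is a hypothetical helper and the toy values are invented.

```python
import ast

INTENTS = ['news', 'query', 'spam', 'marketing', 'feedback',
           'complaint', 'suggestion', 'appreciation']

def flatten_intent(cell):
    """Return the eight intent scores in INTENTS order, using 0.0 when absent."""
    intent = ast.literal_eval(cell)['intent']
    flat = {name: 0.0 for name in INTENTS}
    for name, value in intent.items():
        if isinstance(value, dict):
            # ASSUMED nested shape: {'score': ..., 'tag': {sub_intent: score, ...}}
            flat[name] = float(value.get('score', 0.0))
            for sub_name, sub_value in value.get('tag', {}).items():
                flat[sub_name] = float(sub_value)
        else:
            flat[name] = float(value)
    return [flat[name] for name in INTENTS]

# Invented toy cells, one with sub-intents and one without
nested = ("{'intent': {'news': 0.1, 'query': 0.2, 'spam': 0.3, 'marketing': 0.1, "
          "'feedback': {'score': 0.3, 'tag': {'complaint': 0.1, "
          "'suggestion': 0.2, 'appreciation': 0.7}}}}")
plain = ("{'intent': {'news': 0.1, 'query': 0.2, 'spam': 0.3, "
         "'marketing': 0.1, 'feedback': 0.3}}")
print(flatten_intent(nested))
print(flatten_intent(plain))
```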


In [None]:
#Obtain taxonomical variables as values

tax_array = []


#Collect every tag that appears in the dataset
for i in range(len(df['Tax_array'])):
  for y in df['Tax_array'][i]:
    tax_array.append(y)


def unique(list1):
    # initialize an empty list
    unique_list = []
    # traverse all elements
    for x in list1:
        # append the element only if it is not in unique_list yet
        if x not in unique_list:
            unique_list.append(x)
    # return the unique values in order of first appearance
    return unique_list

unique_tax_array = unique(tax_array)

print(len(tax_array))
print(len(unique_tax_array))
#unique_tax_array


def find_tax(s,cat):
  for x in s:
    if x[0] == unique_tax_array[cat]:
      return x[1]
  return 0

for i in range(len(unique_tax_array)):
  df[unique_tax_array[i]] = df['Tax_values'].apply(find_tax,args=(i,))


df

5139
16


Unnamed: 0,conv_id,utterance_idx,context,prompt,speaker_idx,utterance,ut_len,Empathy,Talker,Negative_score,...,HEALTHY LIVING,GREEN,BUSINESS,WORLDPOST,TECH,SCIENCE,ARTS & CULTURE,CRIME,TRAVEL,RELIGION
0,hit:11054_conv:22108,1,surprised,a job I applied and interview for a couple of ...,675,a job I applied and interview for a couple of ...,95,5,1,0.275,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0
1,hit:11054_conv:22108,2,surprised,a job I applied and interview for a couple of ...,746,Congrats! How exciting for you.,32,5,2,0.005,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0
2,hit:11054_conv:22108,3,surprised,a job I applied and interview for a couple of ...,675,"thank you, when they called, I did not know wh...",119,5,1,0.031,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0
3,hit:11054_conv:22108,4,surprised,a job I applied and interview for a couple of ...,746,That's a long time to hold out hope. I bet yo...,61,5,2,0.572,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0
4,hit:6194_conv:12389,1,joyful,Got a Costco cake. For myself.,1,Got a Costco cake. For myself.,30,3,1,0.018,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1708,hit:481_conv:963,4,surprised,When I got an A on my math final I was super s...,55,Nice! Good to hear that things worked out for ...,61,4,2,0.019,...,0.023453,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0
1709,hit:5257_conv:10514,1,guilty,I was too tired to mow the lawn and do garden ...,511,"One day, I was to tired to mow the lawn.",46,3,1,0.648,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0
1710,hit:5257_conv:10514,2,guilty,I was too tired to mow the lawn and do garden ...,445,"I feel that way every day almost, Did you ever...",69,3,2,0.358,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.042951,0.0,0.0,0.0
1711,hit:5257_conv:10514,3,guilty,I was too tired to mow the lawn and do garden ...,511,"No, my brother had to do it another even thoug...",80,3,1,0.807,...,0.044799,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0
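The step above turns each row's list of `[tag, score]` pairs into one column per taxonomy tag, with 0 where a tag is absent. A minimal sketch of the same idea on toy data follows. `taxonomy_to_columns` is a hypothetical helper and the toy pairs are invented.

```python
import pandas as pd

def taxonomy_to_columns(pairs_column):
    """Expand a Series of [[tag, score], ...] lists into one column per tag."""
    rows = [dict(pairs) for pairs in pairs_column]  # each row becomes {tag: score}
    wide = pd.DataFrame(rows, index=pairs_column.index)
    return wide.fillna(0.0)  # tags missing from a row get a score of 0.0

# Invented toy rows in the same layout as the Tax_values column
toy = pd.Series([
    [['IMPACT', 0.6], ['SPORTS', 0.3]],
    [['TASTE', 0.5], ['IMPACT', 0.2]],
])
wide = taxonomy_to_columns(toy)
print(wide)
```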


In [None]:
#Creation of "Talker" feature: 1 Speaker, 2 Listener
df = df.drop(columns=['prompt','Tax_array','Tax_values'])

def q1_q2(s):
  if s%2 == 0:
    return 2
  else:
    return 1

df['Talker'] = df['utterance_idx'].apply(q1_q2)
#conv_id, utterance and speaker_idx are kept here; they are dropped after the context encoding
df

Unnamed: 0,conv_id,utterance_idx,context,speaker_idx,utterance,ut_len,Empathy,Talker,Negative_score,Neutral_score,...,HEALTHY LIVING,GREEN,BUSINESS,WORLDPOST,TECH,SCIENCE,ARTS & CULTURE,CRIME,TRAVEL,RELIGION
0,hit:11054_conv:22108,1,surprised,675,a job I applied and interview for a couple of ...,95,5,1,0.275,0.437,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0
1,hit:11054_conv:22108,2,surprised,746,Congrats! How exciting for you.,32,5,2,0.005,0.056,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0
2,hit:11054_conv:22108,3,surprised,675,"thank you, when they called, I did not know wh...",119,5,1,0.031,0.122,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0
3,hit:11054_conv:22108,4,surprised,746,That's a long time to hold out hope. I bet yo...,61,5,2,0.572,0.383,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0
4,hit:6194_conv:12389,1,joyful,1,Got a Costco cake. For myself.,30,3,1,0.018,0.232,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1708,hit:481_conv:963,4,surprised,55,Nice! Good to hear that things worked out for ...,61,4,2,0.019,0.226,...,0.023453,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0
1709,hit:5257_conv:10514,1,guilty,511,"One day, I was to tired to mow the lawn.",46,3,1,0.648,0.291,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0
1710,hit:5257_conv:10514,2,guilty,445,"I feel that way every day almost, Did you ever...",69,3,2,0.358,0.609,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.042951,0.0,0.0,0.0
1711,hit:5257_conv:10514,3,guilty,511,"No, my brother had to do it another even thoug...",80,3,1,0.807,0.171,...,0.044799,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0
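The Talker rule (odd utterance index: speaker, even utterance index: listener) can also be written as a single vectorized expression. The sketch below uses a toy frame and is only an equivalent formulation of `q1_q2`, not the notebook's code.

```python
import numpy as np
import pandas as pd

# Toy frame with the same column name as the notebook
toy = pd.DataFrame({'utterance_idx': [1, 2, 3, 4]})

# 2 (listener) for even utterance indices, 1 (speaker) for odd ones
toy['Talker'] = np.where(toy['utterance_idx'] % 2 == 0, 2, 1)
print(toy)
```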


In [None]:
#Encoding context values into a categorical variable

"""
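Equivalence (alphabetical order). Note that cat.codes is zero-based, so when all
32 labels occur in the data the stored code is the number below minus 1:

1 : afraid
2 : angry
3 : annoyed
4 : anticipating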
5 : anxious
6 : apprehensive
7 : ashamed
8 : caring
9 : confident
10 : content
11 : devastated
12 : disappointed
13 : disgusted
14 : embarrassed
15 : excited
16 : faithful
17 : furious
18 : grateful
19 : guilty
20 : hopeful
21 : impressed
22 : jealous
23 : joyful
24 : lonely
25 : nostalgic
26 : prepared
27 : proud
28 : sad
29 : sentimental
30 : surprised
31 : terrified
32 : trusting
"""


df["context"] = df["context"].astype('category')

df["context_encoded"] = df["context"].cat.codes

df["context_encoded"] = df["context_encoded"].astype('category')


df = df.drop(columns=['context'])
df = df.drop(columns=['conv_id','utterance','speaker_idx'])
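pandas assigns category codes in the sorted order of the labels, starting at 0. The sketch below shows on toy data how the code-to-label mapping can be recovered from the categorical column before it is dropped. The toy labels are taken from the notebook output, and the variable names are hypothetical.

```python
import pandas as pd

# Toy context column using labels that appear in the notebook output
toy = pd.Series(['surprised', 'joyful', 'guilty', 'surprised'], dtype='category')

# Categories are sorted alphabetically, so enumerate() yields the code of each label
code_to_label = dict(enumerate(toy.cat.categories))
codes = toy.cat.codes.tolist()

print(code_to_label)
print(codes)
```

Saving such a mapping alongside the exported file makes the encoded values interpretable later without relying on a hand-written table.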

## **Plutchik representation**

In our work, we explored the viability of transforming the context from a single encoded value into a vector based on Plutchik's emotion model. To do this, the encoded categorical context is replaced by eight binary variables, one per basic Plutchik emotion, plus an intensity value.
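As a small illustration of this representation, the sketch below maps a context label to its eight binary features and its intensity. The three entries are copied from the `plutchik_equivalencies` table in the cell that follows. The dictionary keyed by label and the `plutchik_features` helper are hypothetical simplifications, since the notebook indexes the table by encoded value instead.

```python
PLUTCHIK_AXES = ["joy", "trust", "fear", "surprise",
                 "sadness", "disgust", "anger", "anticipation"]

# Three entries copied from the notebook's table: (binary vector, intensity)
PLUTCHIK = {
    "afraid":  ([0, 0, 1, 0, 0, 0, 0, 0], 2),
    "caring":  ([1, 1, 0, 0, 0, 0, 0, 0], 2),
    "furious": ([0, 0, 0, 0, 0, 0, 1, 0], 1),
}

def plutchik_features(label):
    """Return the 'PL <emotion>' flags and 'PL intensity' for one context label."""
    vector, intensity = PLUTCHIK[label]
    features = {f"PL {axis}": flag for axis, flag in zip(PLUTCHIK_AXES, vector)}
    features["PL intensity"] = intensity
    return features

print(plutchik_features("caring"))
```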

In [None]:

"""

#Transforming context to Plutchik representation


plutchik_equivalencies = [[[0,0,1,0,0,0,0,0],2], #afraid
 [[0,0,0,0,0,0,1,0],2], #angry
 [[0,0,0,0,0,0,1,0],3], #annoyed
 [[0,0,0,0,0,0,0,1],2], #anticipating
 [[0,0,1,0,0,0,0,0],2], #anxious
 [[0,0,1,0,0,0,0,0],3], #apprehensive
 [[0,0,1,0,0,1,0,0],2], #ashamed
 [[1,1,0,0,0,0,0,0],2], #caring
 [[1,0,0,0,0,0,0,1],2], #confident
 [[1,0,0,0,0,0,0,0],3], #content
 [[0,0,0,1,1,1,0,0],1], #devastated
 [[0,0,0,1,1,0,0,0],2], #disappointed
 [[0,0,0,0,0,1,0,0],2], #disgusted
 [[0,0,1,0,0,1,0,0],3], #embarrassed
 [[1,0,0,0,0,0,0,1],3], #excited
 [[1,1,0,1,0,0,0,0],1], #faithful
 [[0,0,0,0,0,0,1,0],1], #furious
 [[1,1,0,1,0,0,0,0],2], #grateful
 [[1,0,1,0,0,0,0,0],2], #guilty
 [[0,1,0,0,0,0,0,1],2], #hopeful
 [[0,1,0,1,0,0,0,0],1], #impressed
 [[0,0,0,0,1,0,1,0],2], #jealous
 [[1,0,0,0,0,0,0,0],2], #joyful
 [[0,0,0,0,1,0,0,0],1], #lonely
 [[1,0,0,0,1,0,0,0],2], #nostalgic
 [[0,0,0,0,0,0,0,1],2], #prepared
 [[1,0,0,0,0,0,1,0],2], #proud
 [[0,0,0,0,1,0,0,0],2], #sad
 [[0,1,0,0,0,0,0,0],2], #sentimental
 [[0,0,0,1,0,0,0,0],2], #surprised
 [[0,0,1,0,0,0,0,0],1], #terrified
 [[0,1,0,0,0,0,0,0],2]] #trusting


def plut_arr_separator(s,n):
  return plutchik_equivalencies[s][0][n]

def plut_intensity_separator(s):
  return plutchik_equivalencies[s][1]

emotion_categories = ["joy", "trust", "fear", "surprise", "sadness", "disgust", "anger", "anticipation"]


for x in range(len(emotion_categories)):
  df['PL '+emotion_categories[x]] = df['context_encoded'].apply(plut_arr_separator,args=(x,))
  df['PL '+emotion_categories[x]] = df['PL '+emotion_categories[x]].astype('string')

df['PL intensity'] = df['context_encoded'].apply(plut_intensity_separator)
#df['PL intensity'] = df['PL intensity'].astype('string')

#'context' was already dropped in the encoding cell above
df = df.drop(columns=['context_encoded'])

#End of Plutchik
"""


## **Exporting the prepared CSV**

Finally, we export the processed CSV file that will be used to carry out the classification task.

In [None]:
df.to_csv('Empathyabase.csv', index = False)
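A quick round-trip check can confirm that the exported file contains exactly the expected columns. The sketch below writes a toy frame to an in-memory buffer instead of a file; the column names come from the notebook and the values from the first two rows of its output.

```python
import io
import pandas as pd

# Toy frame standing in for the prepared dataset
toy = pd.DataFrame({'ut_len': [95, 32], 'Empathy': [5, 5], 'Talker': [1, 2]})

buffer = io.StringIO()
toy.to_csv(buffer, index=False)  # index=False keeps the row index out of the file
buffer.seek(0)

restored = pd.read_csv(buffer)
print(list(restored.columns))
```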