# **Testing Trained PBC4cip**

This document tests individual responses (utterances) to dialogue prompts from the EmpatheticConversations database using the PBC4cip classification algorithm. To test an utterance, go to the interaction section further down, read the randomly selected prompt, and enter a response as you see fit. Keep the following files in the environment:

1.   EmpatheticConversations.xlsx - to obtain the prompts
2.   Empathyabase.csv - to ensure the format of the utterance is correct
3.   trained_pbc4cip.sav - the PBC4cip classifier trained on the Empathyabase.csv database.
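
Before running the notebook, it can help to confirm that the three files listed above are present. The following is a minimal sketch, assuming the files sit in the current working directory; the helper name `missing_files` is hypothetical and not part of the original notebook.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = (
    "EmpatheticConversations.xlsx",
    "Empathyabase.csv",
    "trained_pbc4cip.sav",
)

def missing_files(directory=".", required=REQUIRED_FILES):
    """Return the required file names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]

missing = missing_files()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files are present.")
```

An empty list means the environment is ready for the cells that follow.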





In [1]:
pip install paralleldots

Note: you may need to restart the kernel to use updated packages.




In [2]:
pip install PBC4cip

Note: you may need to restart the kernel to use updated packages.




In [3]:
# Standard library imports
import argparse
import json
import os
import pickle
import random
import re

# Third-party imports
import numpy as np
import pandas as pd
import requests
from pandas import read_csv
from tqdm import tqdm, trange

# PBC4cip imports
from PBC4cip import PBC4cip
from PBC4cip.core.Evaluation import obtainAUCMulticlass
from PBC4cip.core.Helpers import get_col_dist, get_idx_val

# paralleldots imports
import paralleldots
from paralleldots import taxonomy
from paralleldots import set_api_key, get_api_key

In [4]:
df = pd.read_excel('EmpatheticConversations.xlsx')
df_prepared = pd.read_csv('Empathyabase.csv')
prompt_df = df[df['utterance_idx'] == 1]
prompt_df = prompt_df.reset_index()
len(prompt_df)

400

## Set up paralleldots license

This license key is the one used by the creators of this document, so the availability of the API may be inconsistent. To make sure you have access, sign up with paralleldots and use your own license key.
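
If you use your own key, one option is to read it from an environment variable instead of typing it into the notebook. This is only a sketch under assumptions: the variable name `PARALLELDOTS_API_KEY` and the helper `load_api_key` are hypothetical and not defined by paralleldots.

```python
import os

# Hypothetical variable name; choose whatever fits your environment.
ENV_VAR = "PARALLELDOTS_API_KEY"

def load_api_key(environ=os.environ, var_name=ENV_VAR):
    """Return the API key stored in `var_name`, or raise if it is unset or blank."""
    key = environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your paralleldots API key."
        )
    return key

# In the notebook, the result could be passed to the call used in the next cell:
# paralleldots.set_api_key(load_api_key())
```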

In [5]:
paralleldots.set_api_key('9x4Ya0ooRZDwypZZzsIXOaNywIM6szzkk6yGZMX8e2U')
print( "API Key: %s" % paralleldots.get_api_key() )

API Key: 9x4Ya0ooRZDwypZZzsIXOaNywIM6szzkk6yGZMX8e2U


# **INTERACTION SECTION**

Please read the prompt given to you and give an appropriate input.

In [6]:
# randrange excludes the upper bound, so the index is always valid for iloc
number_of_prompt = random.randrange(len(prompt_df))
prompt = prompt_df.iloc[number_of_prompt]
print("Your conversation partner says: ")
print(prompt['utterance'])

Your conversation partner says: 
I was recently on a mule ride in the Grand Canyon of Arizona! I really had to put a lot of faith in the mule though, because there were a bunch of sheer drops on the trail!


In [7]:
response = input()

Mules are very trustworthy animals, I really love them


In [8]:
print("You say: ")
print(response)

You say: 
Mules are very trustworthy animals, I really love them


In [9]:
#Initializing values
# Start from the prompt's row, then overwrite the fields that describe the response
prompt_values = {c: prompt[c] for c in df.columns}

prompt_values['utterance_idx'] = 2  # the prompt has utterance_idx 1, the response follows it
prompt_values['speaker_idx'] = prompt_values['speaker_idx'] + 1
prompt_values['utterance'] = response
prompt_values['ut_len'] = len(response)  # length in characters
prompt_values['Talker'] = 2

# **Paralleldots**

This part of the document fetches the sentiment, emotion, taxonomy, and intent data from paralleldots.


# WARNING: ERRORS MIGHT APPEAR WHILE FETCHING DATA, CHECK THE OUTPUT.

## YOU MIGHT NEED TO RE-RUN THESE STEPS UNTIL NO ERRORS ARE REPORTED WHILE CONTACTING PARALLELDOTS

In [10]:
api_key  = get_api_key()
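
Because requests to paralleldots can fail intermittently, re-running a step by hand can be replaced with a small retry loop. The sketch below assumes, as the cells in this section do, that a failed response contains the text `error-details`; the helper name `fetch_with_retries` and its parameters are hypothetical.

```python
import time

def fetch_with_retries(fetch, attempts=3, delay_seconds=1.0, error_marker="error-details"):
    """Call `fetch()` until its text response no longer contains `error_marker`.

    Returns the first clean response, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        text = fetch()
        if error_marker not in text:
            return text
        print(f"Attempt {attempt} of {attempts} returned an error.")
        if attempt < attempts:
            time.sleep(delay_seconds)
    return None

# Example with a stand-in for the real request: the first call fails, the second succeeds.
fake_responses = iter(['{"error-details": "busy"}', '{"emotion": {}}'])
result = fetch_with_retries(lambda: next(fake_responses), delay_seconds=0)
print(result)
```

In the notebook, `fetch` would wrap one of the `requests.post(...).text` calls; a return value of `None` signals that the dummy values should be used and the classification result should not be trusted.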

### **Sentiment analysis**

In [11]:
prompt_values['Sentiment'] = str(paralleldots.sentiment(response))

### **Emotion analysis**

In [12]:
#prompt_values['Emotion'] = paralleldots.emotion(response,'en')
prompt_values['Emotion'] = requests.post( "https://apis.paralleldots.com/v4/emotion", data= { "api_key": api_key, "text": response, "lang_code": 'en' } ).text
re_emo = re.sub(r"\"", "'", prompt_values['Emotion'])
prompt_values['Emotion'] = re_emo

if "error-details" in str(prompt_values['Emotion']):
  print("Error contacting paralleldots!!!!")
  print("Setting up dummy values for emotion... ")
  prompt_values['Emotion'] = "{'emotion':{'Happy': 0.000000, 'Fear': 0.000000, 'Sad': 0.000000, 'Bored': 0.000000, 'Angry': 0.000000, 'Excited': 0.000000}}"
  print("DO NOT TRUST CLASSIFICATION RESULT")
  print("RECOMMENDATION: RESTART CODE BLOCK")
  print()
  print()

### **Taxonomic analysis**

In [15]:
#prompt_values['Taxonomy'] = paralleldots.taxonomy(response)

tax = requests.post( "https://apis.paralleldots.com/v4/taxonomy", data= { "api_key": api_key, "text": response } ).text
re_tax = re.sub(r"\"", "'", tax)
prompt_values['Taxonomy'] = re_tax

if "error-details" in str(prompt_values['Taxonomy']):
  print("Error contacting paralleldots!!!!")
  print("Setting up dummy values for Taxonomy... ")
  prompt_values['Taxonomy'] = "{'taxonomy':[{'confidence_score': 0.00000, 'tag': 'IMPACT'}, {'confidence_score': 0.00000, 'tag': 'EDUCATION'}, {'confidence_score': 0.00000, 'tag': 'POLITICS'}]}"
  print("DO NOT TRUST CLASSIFICATION RESULT")
  print("RECOMMENDATION: RESTART CODE BLOCK")
  print()
  print()

### **Intent analysis**

In [16]:
prompt_values['Intent'] = str(paralleldots.intent(str(response)))

# Utterance processing

In this section, we process the utterance so that it can be fed to the classifier. This mostly means separating the paralleldots responses into individual features and setting up the EC features so that the new row matches the database format used to train the classifier.
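
As an illustration of the separation step, a stored response string can be turned back into a dictionary with `ast.literal_eval` from the standard library. This is a sketch rather than the notebook's own method: it assumes the string is a valid Python literal, as in the sample below (which copies the sentiment values from this run), and the helper name `parse_scores` is hypothetical.

```python
import ast

def parse_scores(raw, key):
    """Parse a stringified response such as "{'sentiment': {...}}" and return the scores under `key`."""
    data = ast.literal_eval(raw)
    return {name: float(value) for name, value in data[key].items()}

# Sample string in the same shape as prompt_values['Sentiment'].
sentiment_raw = "{'sentiment': {'negative': 0.114, 'neutral': 0.293, 'positive': 0.593}}"
scores = parse_scores(sentiment_raw, "sentiment")

# Map the parsed scores onto the feature names used by the database.
features = {
    "Negative_score": scores["negative"],
    "Neutral_score": scores["neutral"],
    "Positivity_score": scores["positive"],
}
print(features)
```

Looking scores up by name avoids depending on the order in which the API returns them.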

In [17]:
#Sentiment separation

def sentiment_separator(s,n):
  s2 = s[s.find(":")+1+s.find('{'):s.find("}")]
  s3 = s2[s2.find('{')+1:]
  array = s3.split(", ")
  #print(array)
  return float(array[n][array[n].find(':')+1:])

prompt_values['Negative_score'] = sentiment_separator(prompt_values['Sentiment'],0)
prompt_values['Neutral_score'] = sentiment_separator(prompt_values['Sentiment'],1)
prompt_values['Positivity_score'] = sentiment_separator(prompt_values['Sentiment'],2)

#Context encoding

num_to_context = {0: 'afraid',
 1: 'angry',
 2: 'annoyed',
 3: 'anticipating',
 4: 'anxious',
 5: 'apprehensive',
 6: 'ashamed',
 7: 'caring',
 8: 'confident',
 9: 'content',
 10: 'devastated',
 11: 'disappointed',
 12: 'disgusted',
 13: 'embarrassed',
 14: 'excited',
 15: 'faithful',
 16: 'furious',
 17: 'grateful',
 18: 'guilty',
 19: 'hopeful',
 20: 'impressed',
 21: 'jealous',
 22: 'joyful',
 23: 'lonely',
 24: 'nostalgic',
 25: 'prepared',
 26: 'proud',
 27: 'sad',
 28: 'sentimental',
 29: 'surprised',
 30: 'terrified',
 31: 'trusting'}

context_dict = dict((v, k) for k, v in num_to_context.items())

prompt_values['context_encoded'] = context_dict[prompt_values['context']]

#Emotion separation

emo_str = str(prompt_values['Emotion'])
emo_str = emo_str[emo_str.find(':{')+3:emo_str.find('}}')]
emo_arr = emo_str.split(',')
emo_dic = {}
for x in emo_arr:
  val = x.split(":")
  val[0] = re.sub(r"\'", "", val[0])
  emo_dic.update({re.match(r'[A-Za-z]*',val[0])[0]:float(val[1])})

for emo in emo_dic:
  prompt_values[emo] = emo_dic[emo]

#Intent separation

intents = ['news','query','spam','marketing','feedback','complaint','suggestion','appreciation']

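def get_intent(s):
  # Drop the leading "{'intent': {" wrapper, then split into "'name': score" pieces
  s = s[s.find("':")+4:]
  arr = s.split(',')
  # More than five pieces means 'feedback' arrived with nested score and tag values
  if len(arr) > 5: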
    s = ''+ arr[4][:arr[4].find('{')]+ arr[4][arr[4].find("re':")+4:]
    arr[4] = s
    k = arr[5]
    k = k[k.find(': {')+3:]
    arr[5] = k
    t = arr[7]
    t = t[:t.find("}}")]
    arr[7] = t
  else:
    arr[4] = arr[4][:arr[4].find("}}")]
    arr.append("'complaint': 0.0")
    arr.append("'suggestion': 0.0")
    arr.append("'appreciation': 0.0")
  for i in range(len(arr)):
    arr[i] = arr[i].replace("'", "")
    arr[i] = arr[i].replace(" ", "")
    val = arr[i].split(':')
    arr[i] = [val[0],float(val[1])]
  return arr

intent_array = get_intent(prompt_values['Intent'])
for x in intent_array:
  prompt_values[str(x[0])] = x[1]

#Taxonomy separation

unique_tax_array = ['IMPACT', 'EDUCATION', 'POLITICS', 'ENTERTAINMENT', 'TASTE', 'SPORTS', 'HEALTHYLIVING', 'GREEN', 'BUSINESS', 'WORLDPOST', 'TECH', 'SCIENCE', 'ARTS&CULTURE', 'CRIME', 'TRAVEL', 'RELIGION']

def obtain_tax(s):
  s = s[s.find("[")+1:]
  s = s.split(',')
  #print(s)
  # Keep only the pieces that contain a tag
  s = [x for x in s if 'tag' in x]
  for i in range(len(s)):
    s[i] = s[i][s[i].find(':')+2:s[i].find("}")-1]
  return s

tax_dict = {}

def obtain_tax_sc(s):
#s = df['Taxonomy'][0]
  s = s[s.find("[")+1:]
  s = s[s.find("{")+1:s.find("}]}")]
  s = s.split('},{')
  for i in range(len(s)):
    arr = s[i].split(',')
    new_arr = [arr[1][arr[1].find(":'")+2:len(arr[1])-1],float(arr[0][arr[0].find(':')+1:])]
    #print(new_arr)
    s[i] = new_arr
  return s

tax_array = obtain_tax_sc(prompt_values['Taxonomy'].strip())

for tax in tax_array:
  tax_dict.update({tax[0]:tax[1]})

tax_dict

for tax in unique_tax_array:
  if tax in tax_dict:
    prompt_values[tax] = tax_dict[tax]
  else:
    prompt_values[tax] = float(0)

prompt_values


{'conv_id': 'hit:4980_conv:9960',
 'utterance_idx': 2,
 'context': 'trusting',
 'prompt': 'I was recently on a mule ride in the Grand Canyon of Arizona!  I really had to put a lot of faith in the mule though_comma_ because there were a bunch of sheer drops on the trail!',
 'speaker_idx': 278,
 'utterance': 'Mules are very trustworthy animals, I really love them',
 'ut_len': 54,
 'Empathy': 4,
 'Talker': 2,
 'Sentiment': "{'sentiment': {'negative': 0.114, 'neutral': 0.293, 'positive': 0.593}}",
 'Emotion': "{'emotion':{'Happy':0.391347547,'Fear':0.1449694537,'Angry':0.040888843,'Bored':0.0,'Excited':0.1329019168,'Sad':0.2898922395}}",
 'Taxonomy': "{'taxonomy':[{'confidence_score':0.8478633165,'tag':'ENTERTAINMENT'},{'confidence_score':0.0963525102,'tag':'TASTE'},{'confidence_score':0.0159519427,'tag':'POLITICS'}]}",
 'Intent': "{'intent': {'news': 0.016, 'query': 0.031, 'spam': 0.417, 'marketing': 0.007, 'feedback': {'score': 0.53, 'tag': {'complaint': 0.017, 'suggestion': 0.012, 'appr

In [18]:

#Getting rid of unnecessary features
prompt_values.pop('context')
prompt_values.pop('conv_id')
prompt_values.pop('utterance')
prompt_values.pop('prompt')
prompt_values.pop('Sentiment')
prompt_values.pop('Emotion')
prompt_values.pop('Intent')
prompt_values.pop('speaker_idx')
prompt_values.pop('Taxonomy')

prompt_values

{'utterance_idx': 2,
 'ut_len': 54,
 'Empathy': 4,
 'Talker': 2,
 'Negative_score': 0.114,
 'Neutral_score': 0.293,
 'Positivity_score': 0.593,
 'context_encoded': 31,
 'Happy': 0.391347547,
 'Fear': 0.1449694537,
 'Angry': 0.040888843,
 'Bored': 0.0,
 'Excited': 0.1329019168,
 'Sad': 0.2898922395,
 'news': 0.016,
 'query': 0.031,
 'spam': 0.417,
 'marketing': 0.007,
 'feedback': 0.53,
 'complaint': 0.017,
 'suggestion': 0.012,
 'appreciation': 0.971,
 'IMPACT': 0.0,
 'EDUCATION': 0.0,
 'POLITICS': 0.0159519427,
 'ENTERTAINMENT': 0.8478633165,
 'TASTE': 0.0963525102,
 'SPORTS': 0.0,
 'HEALTHYLIVING': 0.0,
 'GREEN': 0.0,
 'BUSINESS': 0.0,
 'WORLDPOST': 0.0,
 'TECH': 0.0,
 'SCIENCE': 0.0,
 'ARTS&CULTURE': 0.0,
 'CRIME': 0.0,
 'TRAVEL': 0.0,
 'RELIGION': 0.0}
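
The flat intent and taxonomy features shown above can also be derived from already-parsed dictionaries instead of string slicing. The following is a minimal sketch, assuming the responses have the nested shape seen in this run; `flatten_intent` and `flatten_taxonomy` are hypothetical helper names, and the sample values are copied from the output above.

```python
# Taxonomy tags in the order used by the notebook.
TAXONOMY_TAGS = ['IMPACT', 'EDUCATION', 'POLITICS', 'ENTERTAINMENT', 'TASTE', 'SPORTS',
                 'HEALTHYLIVING', 'GREEN', 'BUSINESS', 'WORLDPOST', 'TECH', 'SCIENCE',
                 'ARTS&CULTURE', 'CRIME', 'TRAVEL', 'RELIGION']

def flatten_intent(intent):
    """Flatten {'news': x, ..., 'feedback': {'score': s, 'tag': {...}}} into one flat dict."""
    flat = {}
    for name, value in intent.items():
        if isinstance(value, dict):
            # Nested entry: keep its score under its own name, then lift the tag scores.
            flat[name] = float(value["score"])
            for tag, score in value.get("tag", {}).items():
                flat[tag] = float(score)
        else:
            flat[name] = float(value)
    # When feedback carries no tags, the three tag features default to 0.0.
    for tag in ("complaint", "suggestion", "appreciation"):
        flat.setdefault(tag, 0.0)
    return flat

def flatten_taxonomy(entries, tags=TAXONOMY_TAGS):
    """Return one score per known tag, using 0.0 for tags missing from the response."""
    scores = {entry["tag"]: float(entry["confidence_score"]) for entry in entries}
    return {tag: scores.get(tag, 0.0) for tag in tags}

intent = {'news': 0.016, 'query': 0.031, 'spam': 0.417, 'marketing': 0.007,
          'feedback': {'score': 0.53,
                       'tag': {'complaint': 0.017, 'suggestion': 0.012, 'appreciation': 0.971}}}
taxonomy_entries = [{'confidence_score': 0.8478633165, 'tag': 'ENTERTAINMENT'},
                    {'confidence_score': 0.0963525102, 'tag': 'TASTE'},
                    {'confidence_score': 0.0159519427, 'tag': 'POLITICS'}]

intent_features = flatten_intent(intent)
taxonomy_features = flatten_taxonomy(taxonomy_entries)
print(intent_features)
print(taxonomy_features)
```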

## Preparing the database with the new row

In [29]:
# Work on a copy so the original database stays untouched
df_prepared2 = df_prepared.copy()

# Append the new utterance as the last row
df_prepared2.loc[len(df_prepared2)] = prompt_values

# Keep only the new row, then cast the label and the categorical columns
test = pd.DataFrame(df_prepared2, index=[len(df_prepared2)-1])
test['Empathy'] = test['Empathy'].astype('string')
test["utterance_idx"] = test["utterance_idx"].astype('category')
test["Talker"] = test["Talker"].astype('category')
test["context_encoded"] = test["context_encoded"].astype('category')
print(len(df_prepared))
print(len(df_prepared2))
test

112
113


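| Unnamed: 0 | utterance_idx | ut_len | Empathy | Talker | Negative_score | Neutral_score | Positivity_score | Happy | Fear | Sad | ... | GREEN | BUSINESS | WORLDPOST | TECH | SCIENCE | ARTS&CULTURE | CRIME | TRAVEL | RELIGION | context_encoded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 112 | 2 | 54 | 4 | 2 | 0.114 | 0.293 | 0.593 | 0.391348 | 0.144969 | 0.289892 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 31 |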
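
Since the new row has to match the database format, a quick comparison of the feature names against the database columns can catch mismatches before classification. This is a sketch with hypothetical helper names (`check_row`, `ordered_values`) and a shortened, made-up column list; in the notebook the columns would come from `df_prepared.columns`.

```python
def check_row(features, columns):
    """Return (missing, extra): columns absent from `features`, and keys not found in `columns`."""
    missing = [c for c in columns if c not in features]
    extra = [k for k in features if k not in columns]
    return missing, extra

def ordered_values(features, columns, fill=0.0):
    """Return the feature values in column order, using `fill` for absent columns."""
    return [features.get(c, fill) for c in columns]

# Shortened column list, for illustration only.
columns = ["utterance_idx", "ut_len", "Empathy", "Talker"]
row = {"utterance_idx": 2, "ut_len": 54, "Talker": 2, "utterance": "some text"}

missing, extra = check_row(row, columns)
print("Missing columns:", missing)
print("Unexpected keys:", extra)
print("Values in column order:", ordered_values(row, columns, fill=None))
```

Both lists being empty indicates that the dictionary and the database use the same feature names.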


# CLASSIFICATION

In this section, we carry out classification using the saved PBC4cip model.

In [43]:

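# Split the new row into the features (X) and the Empathy label (Y)
X1_test = test.drop(columns=['Empathy'])
Y1_test = test.drop(columns=X1_test.columns)

# File that holds the trained model
filename = 'trained_pbc4cip.sav'

# Load the model from disk, closing the file afterwards
with open(filename, 'rb') as model_file:
  loaded_model = pickle.load(model_file)

y_pred = loaded_model.predict(X1_test)
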

In [44]:
print("The classifier predicts that your response presents an empathy level of: "+ str(y_pred[-1]) +" on a scale from 1 to 5")

The classifier predicts that your response presents an empathy level of: 4 on a scale from 1 to 5
