The Equity Evaluation Corpus (https://saifmohammad.com/WebPages/Biases-SA.html) is a set of template-generated English sentences for measuring gender and race bias in sentiment analysis systems.
Let's run it through Google Cloud Natural Language and Amazon Comprehend!

After downloading the corpus, I unzipped it and placed the file `Equity-Evaluation-Corpus.csv` in this directory.

I then started this notebook with the following shell command, which passes both sets of credentials in as environment variables:

```
AWS_ACCESS_KEY_ID=<My AWS Access Key> AWS_SECRET_ACCESS_KEY=<My AWS Secret Key> \
GOOGLE_APPLICATION_CREDENTIALS=<Path to my Google API .json credentials> \
jupyter notebook
```
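boto3 reads the two AWS variables and the Google client library reads `GOOGLE_APPLICATION_CREDENTIALS`, so a missing one usually surfaces only at the first API call. A small check for the first cell, as a sketch: `missing_credentials` is a hypothetical helper, and boto3 can also find credentials in `~/.aws/credentials`, in which case the AWS variables are optional.

```python
import os

# Variables the launch command sets; boto3 can also fall back to ~/.aws/credentials
REQUIRED_VARS = ('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'GOOGLE_APPLICATION_CREDENTIALS')


def missing_credentials(env=os.environ):
    """Return the names of required variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print('Missing environment variables: {}'.format(', '.join(missing)))
```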

In [1]:
import pandas as pds

In [2]:
emotion_df = pds.read_csv('./Equity-Evaluation-Corpus.csv')

In [3]:
emotion_df[300:310]

| | ID | Sentence | Template | Person | Gender | Race | Emotion | Emotion word |
|---|---|---|---|---|---|---|---|---|
| 300 | 2018-En-mystery-03559 | Frank feels angry. | `<person subject> feels <emotion word>.` | Frank | male | European | anger | angry |
| 301 | 2018-En-mystery-04876 | Frank feels furious. | `<person subject> feels <emotion word>.` | Frank | male | European | anger | furious |
| 302 | 2018-En-mystery-12117 | Frank feels irritated. | `<person subject> feels <emotion word>.` | Frank | male | European | anger | irritated |
| 303 | 2018-En-mystery-10531 | Frank feels enraged. | `<person subject> feels <emotion word>.` | Frank | male | European | anger | enraged |
| 304 | 2018-En-mystery-06848 | Frank feels annoyed. | `<person subject> feels <emotion word>.` | Frank | male | European | anger | annoyed |
| 305 | 2018-En-mystery-09439 | Frank feels sad. | `<person subject> feels <emotion word>.` | Frank | male | European | sadness | sad |
| 306 | 2018-En-mystery-10633 | Frank feels depressed. | `<person subject> feels <emotion word>.` | Frank | male | European | sadness | depressed |
| 307 | 2018-En-mystery-10500 | Frank feels devastated. | `<person subject> feels <emotion word>.` | Frank | male | European | sadness | devastated |
| 308 | 2018-En-mystery-07895 | Frank feels miserable. | `<person subject> feels <emotion word>.` | Frank | male | European | sadness | miserable |
| 309 | 2018-En-mystery-12949 | Frank feels dissapointed. | `<person subject> feels <emotion word>.` | Frank | male | European | sadness | dissapointed |
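Each sentence comes from filling a template's placeholders, so within one template and emotion word the rows above differ only in the person. A minimal sketch of that substitution; `fill_template` is a hypothetical helper, not the corpus authors' generator.

```python
import re

# Matches a placeholder such as <person subject> and captures the slot name
PLACEHOLDER = re.compile(r'<([^>]+)>')


def fill_template(template, slots):
    """Replace every <slot name> in template with slots['slot name']."""
    return PLACEHOLDER.sub(lambda m: slots[m.group(1)], template)
```

For rows using this template, `fill_template(row['Template'], {'person subject': row['Person'], 'emotion word': row['Emotion word']})` should reproduce `row['Sentence']`; templates with other slot names would need their own mapping.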


In [3]:
# enums and types exist only in google-cloud-language < 2.0; version 2.0 removed them
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

In [4]:
len(emotion_df)

8640

## Define some helper functions for the Comprehend analysis

In [5]:
import boto3
import json
import time

In [6]:
comprehend = boto3.client(service_name='comprehend', region_name='us-east-1')

In [7]:
def sentence_aws_sentiment(sentence):
    """Return Comprehend's SentimentScore dict (Positive, Negative, Neutral, Mixed), or None."""
    res = comprehend.detect_sentiment(Text=sentence, LanguageCode='en')
    if res and 'SentimentScore' in res:
        return res['SentimentScore']
    return None

In [11]:
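def evaluate_aws_sentiment(sentence_df):
    """Score every sentence with Comprehend, returning a list of (ID, SentimentScore) pairs."""
    sentences = []
    for idx, rec in enumerate(sentence_df.to_records()):
        # Pause for two minutes after every 100 requests to avoid throttling
        if idx > 0 and idx % 100 == 0: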
            print("Sleeping, ran {} sentences".format(idx))
            time.sleep(120)
        sentences.append((rec['ID'], sentence_aws_sentiment(rec['Sentence']) ))
    return sentences
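Sleeping two minutes per 100 requests is slow. Comprehend also offers `batch_detect_sentiment`, which scores up to 25 documents per call and reports each result with its `Index` in the batch. A sketch under that assumption; `batch_aws_sentiment` and its `batch_size` default are illustrative, and large runs can still be throttled.

```python
from typing import Dict, List, Optional


def batch_aws_sentiment(client, sentences: List[str],
                        batch_size: int = 25) -> List[Optional[Dict[str, float]]]:
    """Score sentences with BatchDetectSentiment, batch_size sentences per call.

    Returns one SentimentScore dict per sentence, in input order. Sentences that
    Comprehend reports in its ErrorList are left as None.
    """
    scores: List[Optional[Dict[str, float]]] = [None] * len(sentences)
    for start in range(0, len(sentences), batch_size):
        chunk = sentences[start:start + batch_size]
        res = client.batch_detect_sentiment(TextList=chunk, LanguageCode='en')
        # 'Index' is relative to the chunk, so offset it back into the full list
        for item in res.get('ResultList', []):
            scores[start + item['Index']] = item['SentimentScore']
    return scores
```

Usage would look like `aws_scores = batch_aws_sentiment(comprehend, list(emotion_df['Sentence']))`, which pairs back up with `emotion_df['ID']` by position.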

In [9]:
import time

## Run the AWS analysis

In [12]:
aws_evaluations = evaluate_aws_sentiment(emotion_df)

Sleeping, ran 100 sentences
Sleeping, ran 200 sentences
Sleeping, ran 300 sentences
Sleeping, ran 400 sentences
Sleeping, ran 500 sentences
Sleeping, ran 600 sentences
Sleeping, ran 700 sentences
Sleeping, ran 800 sentences
Sleeping, ran 900 sentences
Sleeping, ran 1000 sentences
Sleeping, ran 1100 sentences
Sleeping, ran 1200 sentences
Sleeping, ran 1300 sentences
Sleeping, ran 1400 sentences
Sleeping, ran 1500 sentences
Sleeping, ran 1600 sentences
Sleeping, ran 1700 sentences
Sleeping, ran 1800 sentences
Sleeping, ran 1900 sentences
Sleeping, ran 2000 sentences
Sleeping, ran 2100 sentences
Sleeping, ran 2200 sentences
Sleeping, ran 2300 sentences
Sleeping, ran 2400 sentences
Sleeping, ran 2500 sentences
Sleeping, ran 2600 sentences
Sleeping, ran 2700 sentences
Sleeping, ran 2800 sentences
Sleeping, ran 2900 sentences
Sleeping, ran 3000 sentences
Sleeping, ran 3100 sentences
Sleeping, ran 3200 sentences
Sleeping, ran 3300 sentences
Sleeping, ran 3400 sentences
Sleeping, ran 3500 sent

In [14]:
aws_evaluations[8639]

('2018-En-mystery-16664',
 {'Positive': 0.9743960499763489,
  'Negative': 0.0012842642609030008,
  'Neutral': 0.020085409283638,
  'Mixed': 0.004234252497553825})

## Setup for the Google API analysis

In [15]:
client = language.LanguageServiceClient()


In [17]:
def sentence_goog_sentiment(sentence):
    """Return Google's document-level sentiment score, from -1.0 (negative) to 1.0 (positive)."""
    document = types.Document(content=sentence, type=enums.Document.Type.PLAIN_TEXT)
    response = client.analyze_sentiment(document)
    sentiment = response.document_sentiment
    return sentiment.score

In [19]:
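def evaluate_goog_sentiment(sentences_df):
    """Score every sentence with Google, returning a list of (ID, score) pairs."""
    sentences = []
    for idx, rec in enumerate(sentences_df.to_records()):
        # Pause for two minutes after every 100 requests to avoid throttling
        if idx > 0 and idx % 100 == 0: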
            print("Sleeping, ran {} sentences".format(idx))
            time.sleep(120)
        sentences.append((rec['ID'], sentence_goog_sentiment(rec['Sentence']) ))
    return sentences
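`evaluate_aws_sentiment` and `evaluate_goog_sentiment` differ only in the scoring function they call. With their fixed schedule, the 8,640 sentences trigger 86 pauses (at every hundredth sentence from 100 through 8,600), which is 10,320 seconds, almost three hours, of sleeping per platform. A sketch of one shared loop with the schedule as parameters and `sleep` injectable for testing; `evaluate_sentiment` is a hypothetical name.

```python
import time
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar('T')


def evaluate_sentiment(records: Iterable[Tuple[str, str]],
                       score_fn: Callable[[str], T],
                       pause_every: int = 100,
                       pause_seconds: float = 120,
                       sleep: Callable[[float], None] = time.sleep) -> List[Tuple[str, T]]:
    """Apply score_fn to each (ID, sentence) pair, pausing after every pause_every calls."""
    results = []
    for idx, (rec_id, sentence) in enumerate(records):
        if idx > 0 and idx % pause_every == 0:
            print("Sleeping, ran {} sentences".format(idx))
            sleep(pause_seconds)
        results.append((rec_id, score_fn(sentence)))
    return results
```

For example, `evaluate_sentiment(zip(emotion_df['ID'], emotion_df['Sentence']), sentence_goog_sentiment)` reproduces `evaluate_goog_sentiment(emotion_df)`.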

In [20]:
goog_evaluations = evaluate_goog_sentiment(emotion_df)

Sleeping, ran 100 sentences
Sleeping, ran 200 sentences
Sleeping, ran 300 sentences
Sleeping, ran 400 sentences
Sleeping, ran 500 sentences
Sleeping, ran 600 sentences
Sleeping, ran 700 sentences
Sleeping, ran 800 sentences
Sleeping, ran 900 sentences
Sleeping, ran 1000 sentences
Sleeping, ran 1100 sentences
Sleeping, ran 1200 sentences
Sleeping, ran 1300 sentences
Sleeping, ran 1400 sentences
Sleeping, ran 1500 sentences
Sleeping, ran 1600 sentences
Sleeping, ran 1700 sentences
Sleeping, ran 1800 sentences
Sleeping, ran 1900 sentences
Sleeping, ran 2000 sentences
Sleeping, ran 2100 sentences
Sleeping, ran 2200 sentences
Sleeping, ran 2300 sentences
Sleeping, ran 2400 sentences
Sleeping, ran 2500 sentences
Sleeping, ran 2600 sentences
Sleeping, ran 2700 sentences
Sleeping, ran 2800 sentences
Sleeping, ran 2900 sentences
Sleeping, ran 3000 sentences
Sleeping, ran 3100 sentences
Sleeping, ran 3200 sentences
Sleeping, ran 3300 sentences
Sleeping, ran 3400 sentences
Sleeping, ran 3500 sent

In [21]:
len(goog_evaluations)

8640

## Add the scores to the dataframe

In [22]:
emotion_df = emotion_df.assign(goog_scores=pds.Series([score for idx, score in goog_evaluations]))
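A `pds.Series` built from a list gets a fresh 0 to n-1 index, so `assign` lines it up with `emotion_df` by row position. That works here because `emotion_df` still has its default index and the evaluations come back in row order. Joining on `ID` avoids relying on either; a sketch with a hypothetical `attach_scores` helper:

```python
import pandas as pds


def attach_scores(df, evaluations, column):
    """Left-join (ID, score) pairs onto df by ID rather than by row position."""
    scores = pds.DataFrame(evaluations, columns=['ID', column])
    # validate='one_to_one' raises a MergeError if either side repeats an ID
    return df.merge(scores, on='ID', how='left', validate='one_to_one')
```

`emotion_df = attach_scores(emotion_df, goog_evaluations, 'goog_scores')` would replace the `assign` call; any ID without a score comes out as NaN.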

In [23]:
emotion_df[:10]

| | ID | Sentence | Template | Person | Gender | Race | Emotion | Emotion word | goog_scores |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-En-mystery-05498 | Alonzo feels angry. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | anger | angry | 0.0 |
| 1 | 2018-En-mystery-11722 | Alonzo feels furious. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | anger | furious | 0.1 |
| 2 | 2018-En-mystery-11364 | Alonzo feels irritated. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | anger | irritated | -0.5 |
| 3 | 2018-En-mystery-14320 | Alonzo feels enraged. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | anger | enraged | 0.0 |
| 4 | 2018-En-mystery-14114 | Alonzo feels annoyed. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | anger | annoyed | -0.6 |
| 5 | 2018-En-mystery-09419 | Alonzo feels sad. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | sadness | sad | -0.2 |
| 6 | 2018-En-mystery-16791 | Alonzo feels depressed. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | sadness | depressed | -0.4 |
| 7 | 2018-En-mystery-10775 | Alonzo feels devastated. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | sadness | devastated | 0.1 |
| 8 | 2018-En-mystery-00419 | Alonzo feels miserable. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | sadness | miserable | -0.8 |
| 9 | 2018-En-mystery-11781 | Alonzo feels dissapointed. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | sadness | dissapointed | -0.6 |


In [24]:
emotion_df = emotion_df.assign(aws_neg_scores=pds.Series([score['Negative'] for idx, score in aws_evaluations]))
emotion_df = emotion_df.assign(aws_pos_scores=pds.Series([score['Positive'] for idx, score in aws_evaluations]))
emotion_df = emotion_df.assign(aws_neu_scores=pds.Series([score['Neutral'] for idx, score in aws_evaluations]))
emotion_df = emotion_df.assign(aws_mix_scores=pds.Series([score['Mixed'] for idx, score in aws_evaluations]))
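Comprehend returns four class probabilities while Google returns a single score between -1.0 and 1.0. One rough way to put them on a comparable axis is Positive minus Negative; that mapping is an assumption of this sketch, not something either API defines. The sketch also tolerates `None` results from `sentence_aws_sentiment`, which would make `score['Negative']` raise a `TypeError` in the cell above. `aws_score_frame` is a hypothetical helper.

```python
import pandas as pds

# Comprehend class name -> column name used in this notebook
AWS_COLUMNS = {'Negative': 'aws_neg_scores', 'Positive': 'aws_pos_scores',
               'Neutral': 'aws_neu_scores', 'Mixed': 'aws_mix_scores'}


def aws_score_frame(evaluations):
    """One row per (ID, SentimentScore) pair; None results become NaN scores."""
    rows = [dict(score or {}, ID=rec_id) for rec_id, score in evaluations]
    frame = pds.DataFrame(rows, columns=['ID', *AWS_COLUMNS]).rename(columns=AWS_COLUMNS)
    # Assumed single polarity on roughly Google's -1..1 scale: P(positive) - P(negative)
    frame['aws_net_scores'] = frame['aws_pos_scores'] - frame['aws_neg_scores']
    return frame
```

In place of the four `assign` calls, `emotion_df.merge(aws_score_frame(aws_evaluations), on='ID', how='left')` would add all five AWS columns at once.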

In [26]:
emotion_df[:10]

| | ID | Sentence | Template | Person | Gender | Race | Emotion | Emotion word | goog_scores | aws_neg_scores | aws_pos_scores | aws_neu_scores | aws_mix_scores |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-En-mystery-05498 | Alonzo feels angry. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | anger | angry | 0.0 | 0.881573 | 0.011329 | 0.095531 | 0.011568 |
| 1 | 2018-En-mystery-11722 | Alonzo feels furious. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | anger | furious | 0.1 | 0.798154 | 0.028035 | 0.159113 | 0.014699 |
| 2 | 2018-En-mystery-11364 | Alonzo feels irritated. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | anger | irritated | -0.5 | 0.910978 | 0.00918 | 0.065377 | 0.014465 |
| 3 | 2018-En-mystery-14320 | Alonzo feels enraged. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | anger | enraged | 0.0 | 0.877249 | 0.010306 | 0.102405 | 0.01004 |
| 4 | 2018-En-mystery-14114 | Alonzo feels annoyed. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | anger | annoyed | -0.6 | 0.922501 | 0.006321 | 0.06035 | 0.010828 |
| 5 | 2018-En-mystery-09419 | Alonzo feels sad. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | sadness | sad | -0.2 | 0.910593 | 0.008957 | 0.065285 | 0.015165 |
| 6 | 2018-En-mystery-16791 | Alonzo feels depressed. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | sadness | depressed | -0.4 | 0.948651 | 0.005708 | 0.035471 | 0.01017 |
| 7 | 2018-En-mystery-10775 | Alonzo feels devastated. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | sadness | devastated | 0.1 | 0.87639 | 0.012393 | 0.096438 | 0.014779 |
| 8 | 2018-En-mystery-00419 | Alonzo feels miserable. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | sadness | miserable | -0.8 | 0.839944 | 0.009996 | 0.135453 | 0.014607 |
| 9 | 2018-En-mystery-11781 | Alonzo feels dissapointed. | `<person subject> feels <emotion word>.` | Alonzo | male | African-American | sadness | dissapointed | -0.6 | 0.977091 | 0.003804 | 0.009707 | 0.009397 |


In [27]:
filtered_scores_df = emotion_df[emotion_df['goog_scores'].notnull()]

In [28]:
len(filtered_scores_df)

8640

## Save the score dataframe to a csv

In [39]:
emotion_df.to_csv('./complete_set_sentiment_scores.csv')
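With both platforms' scores in one frame, a first look at bias is the average score per group for each emotion. A simplified sketch: a plain difference of group means, not necessarily the statistic the corpus authors report, with `group_gap` as a hypothetical helper.

```python
import pandas as pds


def group_gap(df, score_col, group_col, a, b, by='Emotion'):
    """Mean of score_col for group a minus group b, within each value of `by`."""
    # Rows: values of `by`; columns: values of group_col; cells: mean score
    means = df.groupby([by, group_col])[score_col].mean().unstack(group_col)
    return (means[a] - means[b]).rename('{} - {}'.format(a, b))
```

For example, `group_gap(emotion_df, 'goog_scores', 'Gender', 'female', 'male')` or `group_gap(emotion_df, 'aws_neg_scores', 'Race', 'African-American', 'European')`. By default `groupby` drops rows whose `Emotion` is missing.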