Steps in this notebook:
1. Install libraries and load the pretrained EmoRoBERTa model from Hugging Face
2. Define a function that turns the scored results into a DataFrame
3. Ingest the input file, score it with the Hugging Face model (one column per emotion), combine the scores with the input data, then export

# Install Libraries and Import the Deep Learning Model

In [None]:
!pip install torch torchvision

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.11.1 tokenizers-0.13.2 transformers-4.25.1


In [None]:
from transformers import RobertaTokenizerFast, TFRobertaForSequenceClassification, pipeline

In [None]:
tokenizer = RobertaTokenizerFast.from_pretrained("arpanghoshal/EmoRoBERTa")
model = TFRobertaForSequenceClassification.from_pretrained("arpanghoshal/EmoRoBERTa")

# Reuse the model and tokenizer loaded above so the checkpoint isn't loaded a second time
emotion = pipeline('text-classification', model=model, tokenizer=tokenizer)


All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.

All the layers of TFRobertaForSequenceClassification were initialized from the model checkpoint at arpanghoshal/EmoRoBERTa.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.


In [7]:
emotion_labels = emotion("Happy 2023! For all those who've been laid off, don't be afraid to embrace change and the opportunities it brings for growth.")
print(emotion_labels)

[{'label': 'caring', 'score': 0.5582920908927917}]


# Define the Emotion Labels and Scoring Function for Email Subject Lines

In [8]:
# 27 emotions + neutral, in the order used for the output columns
emotions_list = ['admiration', 'amusement', 'anger', 'annoyance', 'approval',
                 'caring', 'confusion', 'curiosity', 'desire', 'disappointment',
                 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear',
                 'gratitude', 'grief', 'joy', 'love', 'nervousness',
                 'optimism', 'pride', 'realization', 'relief', 'remorse',
                 'sadness', 'surprise', 'neutral']
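Rather than typing the 28 labels by hand, the list could be read from the model's own label mapping, which guards against typos and missing labels. A minimal sketch, assuming a Transformers config whose `id2label` maps integer class ids to label names (as `model.config.id2label` does); the helper `labels_from_id2label` is hypothetical, and the toy mapping stands in for the real config so the snippet runs without downloading the model.

```python
def labels_from_id2label(id2label):
    """Return label names ordered by class id from an id2label-style mapping."""
    # Sort by integer id so the list follows the model's output order
    return [id2label[i] for i in sorted(id2label)]


# Toy stand-in for model.config.id2label (the real mapping has 28 entries)
toy_id2label = {2: "anger", 0: "admiration", 1: "amusement"}
print(labels_from_id2label(toy_id2label))  # ['admiration', 'amusement', 'anger']
```

In the notebook this would be `emotions_list = labels_from_id2label(model.config.id2label)`, followed by a quick `len(emotions_list) == 28` check.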

In [33]:

import pandas as pd

def build_classification_results_table(list_of_text_inputs):
  '''
  Score every text input against all 28 emotions and return the results as a
  DataFrame with one row per input and one column per emotion.
  '''
  rows = []

  for text_input in list_of_text_inputs:
    # top_k=28 returns a score for every emotion label, not just the top one
    emotion_object = emotion(text_input, top_k=28)
    # Map each label to its score, e.g. {'neutral': 0.475, 'caring': 0.179, ...}
    rows.append({result['label']: result['score'] for result in emotion_object})

  # Build the table once at the end: DataFrame.append copied the whole frame on
  # every call and was removed in pandas 2.0. Columns follow emotions_list.
  scores_total = pd.DataFrame(rows, columns=emotions_list)

  return scores_total

# Load Data 

Upload the input file from your computer with Colab's `files.upload()` widget, then read it into a pandas DataFrame

In [28]:
from google.colab import files
 
 
uploaded = files.upload()

Saving input_for_step_2.csv to input_for_step_2 (1).csv


In [29]:
import pandas as pd
import io
 
df = pd.read_csv(io.BytesIO(uploaded['input_for_step_2.csv']))
df.head()

,send_dt,send_time,send_number,campaign,email_name,category,text,emails_sent,emails_delivered,undeliverable,...,Audience,send_group,Open_Rate_nw,Click_Rate_nw,Donation_Rate_nw,revenue_1k_new,month,polarity_score,subjectivity_score,processed_text
0,2022-03-10,8:52:36 PM,1.0,DAF,2022-03-10-DAF-Postcard,fundraising,[NONPROFIT]'s Ukraine response and how you can...,968,966,2.0,...,Other,2022-03-10-DAF-P,0.393595,0.004132,0.0,0.0,3,0.5,0.5,nonprofit ukraine response great impact
1,2021-06-29,2:03:54 PM,1.0,EOQ,2021-06-EOQ-deadline-E1-Partner-B,fundraising,You make our work possible,6829,6819,10.0,...,Partners,2021-06-EOQ-dead,0.234734,0.002636,0.001025,563.772148,6,0.0,1.0,work possible
2,2021-06-29,2:03:48 PM,1.0,EOQ,2021-06-EOQ-deadline-E1-FM-PNB-B,fundraising,You make our work possible,84,84,0.0,...,PNB,2021-06-EOQ-dead,0.25,0.0,0.0,0.0,6,0.0,1.0,work possible
3,2021-06-29,2:03:57 PM,1.0,EOQ,2021-06-EOQ-deadline-E1-Partner-A,fundraising,DEADLINE: We?re just short of our goal,6882,6874,8.0,...,Partners,2021-06-EOQ-dead,0.217088,0.003923,0.002034,893.635571,6,0.0,0.3,deadline we?re short goal
4,2021-06-29,2:03:49 PM,1.0,EOQ,2021-06-EOQ-deadline-E1-FM-PNB-A,fundraising,DEADLINE: We?re just short of our goal,80,80,0.0,...,PNB,2021-06-EOQ-dead,0.2,0.0125,0.0,0.0,6,0.0,0.3,deadline we?re short goal


In [30]:
df.at[0,'text']

"[NONPROFIT]'s Ukraine response and how you can make a greater impact"

In [31]:
emotion(df.at[0,'text'])

[{'label': 'neutral', 'score': 0.47515010833740234}]

# Apply Emotion Scoring Using the Hugging Face Model

Classifying each row against all 28 emotions (`top_k=28`) and assembling the results into a DataFrame takes a while to run, so start with a small sample of the first 10 rows.
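Timing the small sample first gives a rough estimate for the full run before committing to it. A minimal sketch using `time.perf_counter`; the helper `estimate_runtime` is hypothetical, and the projection assumes every row costs roughly the same (subject lines are short, so this seems plausible, and the first call may include one-off warm-up cost).

```python
import time


def estimate_runtime(score_fn, texts, sample_size=10):
    """Time score_fn on the first sample_size texts and extrapolate to all of them.

    Returns (seconds_per_row, projected_total_seconds).
    """
    texts = list(texts)
    sample = texts[:sample_size]
    if not sample:
        raise ValueError("need at least one text to time")

    start = time.perf_counter()
    score_fn(sample)  # e.g. build_classification_results_table
    elapsed = time.perf_counter() - start

    per_row = elapsed / len(sample)
    return per_row, per_row * len(texts)
```

For example, `estimate_runtime(build_classification_results_table, df.text)`. For reference, 20 minutes for ~3500 rows works out to roughly 1200 / 3500 ≈ 0.34 seconds per row.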

Note: a dataset of ~3500 rows took 20 minutes to run through the function. Open to suggestions on improving speed! :)
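One way to cut the runtime: many rows share the same subject line (A/B and audience splits, which is why consecutive rows in the sample output have identical scores), so each distinct text only needs to be scored once and the scores can be mapped back to every row. A minimal sketch, assuming `classifier` behaves like `emotion(text, top_k=28)`, i.e. returns a list of `{'label': ..., 'score': ...}` dicts for one string; the function `score_unique_texts` is hypothetical.

```python
import pandas as pd


def score_unique_texts(texts, classifier, labels):
    """Score each distinct text once, then expand the scores back to every row."""
    texts = pd.Series(list(texts))
    unique_texts = texts.drop_duplicates()

    # One classifier call per distinct subject line instead of one per row
    rows = []
    for text in unique_texts:
        rows.append({r["label"]: r["score"] for r in classifier(text)})
    scores = pd.DataFrame(rows, columns=labels, index=unique_texts.values)

    # Look up each row's text in the deduplicated table, then reset to a
    # 0..n-1 index so it lines up with the input DataFrame for pd.concat(axis=1)
    return scores.loc[texts.values].reset_index(drop=True)
```

In the notebook this could be called as `score_unique_texts(df.text, lambda t: emotion(t, top_k=28), emotions_list)`; the saving depends on how many subject lines repeat.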

In [34]:
sample_output = build_classification_results_table(list(df.text[0:10]))

sample_output.head(10)


,admiration,amusement,anger,annoyance,approval,caring,confusion,curiosity,desire,disappointment,...,love,nervousness,optimism,pride,realization,relief,remorse,sadness,surprise,neutral
0,0.00572,0.00064,0.00041,0.001399,0.123966,0.179079,0.004212,0.021661,0.000603,0.000341,...,0.000586,0.001054,0.107223,8.4e-05,0.045712,0.000675,0.004789,0.001135,0.003115,0.47515
1,0.000136,0.000234,0.000102,0.002072,0.000493,3.5e-05,0.000561,0.000131,0.001767,0.002076,...,2.2e-05,5.9e-05,0.016925,1.6e-05,0.003681,1e-05,5.7e-05,0.000554,0.000143,0.969983
2,0.000136,0.000234,0.000102,0.002072,0.000493,3.5e-05,0.000561,0.000131,0.001767,0.002076,...,2.2e-05,5.9e-05,0.016925,1.6e-05,0.003681,1e-05,5.7e-05,0.000554,0.000143,0.969983
3,7.3e-05,0.000539,0.001494,0.000966,0.001029,0.001623,0.010058,0.003588,0.000433,0.132312,...,0.011763,0.002442,0.000285,0.000132,0.0467,2.7e-05,0.220185,0.542841,0.000938,0.004663
4,7.3e-05,0.000539,0.001494,0.000966,0.001029,0.001623,0.010058,0.003588,0.000433,0.132312,...,0.011763,0.002442,0.000285,0.000132,0.0467,2.7e-05,0.220185,0.542841,0.000938,0.004663
5,4.2e-05,0.000168,3.9e-05,6.3e-05,0.00029,3.5e-05,7.9e-05,0.001443,0.000134,2.4e-05,...,6.4e-05,0.001147,2.7e-05,0.000105,0.000281,2.9e-05,5.7e-05,4.8e-05,0.001081,0.002006
6,4.2e-05,0.000168,3.9e-05,6.3e-05,0.00029,3.5e-05,7.9e-05,0.001443,0.000134,2.4e-05,...,6.4e-05,0.001147,2.7e-05,0.000105,0.000281,2.9e-05,5.7e-05,4.8e-05,0.001081,0.002006
7,4.2e-05,0.000168,3.9e-05,6.3e-05,0.00029,3.5e-05,7.9e-05,0.001443,0.000134,2.4e-05,...,6.4e-05,0.001147,2.7e-05,0.000105,0.000281,2.9e-05,5.7e-05,4.8e-05,0.001081,0.002006
8,0.000256,0.000105,0.002292,0.001027,0.001397,0.000814,3.2e-05,6.8e-05,0.000144,0.032269,...,0.000353,0.000483,0.000247,2.3e-05,0.000753,2.9e-05,0.000636,0.909134,8.5e-05,0.038517
9,0.000256,0.000105,0.002292,0.001027,0.001397,0.000814,3.2e-05,6.8e-05,0.000144,0.032269,...,0.000353,0.000483,0.000247,2.3e-05,0.000753,2.9e-05,0.000636,0.909134,8.5e-05,0.038517


In [35]:
emotion_columns = build_classification_results_table(list(df.text))
emotion_columns.shape


(3363, 28)

In [36]:
emotion_columns.head()


,admiration,amusement,anger,annoyance,approval,caring,confusion,curiosity,desire,disappointment,...,love,nervousness,optimism,pride,realization,relief,remorse,sadness,surprise,neutral
0,0.00572,0.00064,0.00041,0.001399,0.123966,0.179079,0.004212,0.021661,0.000603,0.000341,...,0.000586,0.001054,0.107223,8.4e-05,0.045712,0.000675,0.004789,0.001135,0.003115,0.47515
1,0.000136,0.000234,0.000102,0.002072,0.000493,3.5e-05,0.000561,0.000131,0.001767,0.002076,...,2.2e-05,5.9e-05,0.016925,1.6e-05,0.003681,1e-05,5.7e-05,0.000554,0.000143,0.969983
2,0.000136,0.000234,0.000102,0.002072,0.000493,3.5e-05,0.000561,0.000131,0.001767,0.002076,...,2.2e-05,5.9e-05,0.016925,1.6e-05,0.003681,1e-05,5.7e-05,0.000554,0.000143,0.969983
3,7.3e-05,0.000539,0.001494,0.000966,0.001029,0.001623,0.010058,0.003588,0.000433,0.132312,...,0.011763,0.002442,0.000285,0.000132,0.0467,2.7e-05,0.220185,0.542841,0.000938,0.004663
4,7.3e-05,0.000539,0.001494,0.000966,0.001029,0.001623,0.010058,0.003588,0.000433,0.132312,...,0.011763,0.002442,0.000285,0.000132,0.0467,2.7e-05,0.220185,0.542841,0.000938,0.004663


In [37]:
full_results = pd.concat([df, emotion_columns], axis=1)
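`pd.concat(..., axis=1)` aligns rows by index label, not by position, so it only pairs each subject line with its own scores when both frames share the same index (here both use the default 0..n-1 index). If they don't, concat quietly produces NaN-padded rows instead of failing. A minimal sketch of a guard to run before joining; the helper `concat_scores` is hypothetical.

```python
import pandas as pd


def concat_scores(inputs, scores):
    """Column-bind scores to inputs, failing loudly if the rows don't line up."""
    if len(inputs) != len(scores):
        raise ValueError(
            f"row count mismatch: {len(inputs)} inputs vs {len(scores)} scores"
        )
    if not inputs.index.equals(scores.index):
        raise ValueError("indexes differ; call reset_index(drop=True) on both first")
    return pd.concat([inputs, scores], axis=1)
```

Usage: `full_results = concat_scores(df, emotion_columns)`.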

In [38]:
full_results.to_csv('input_for_step_3a.csv', index=False)

Note: if you're running on Google Colab, the exported CSV is available from the Files pane in the left-hand sidebar.