# Homework 3
CS 510 Large Language Models - Winter 2024
Bradley Thompson

## 1 - Dataset Annotation

To create the dataset, the authors used the Twitter API to retrieve 198 million tweets posted between May 2018 and March 2020. The original dataset included tweets in over thirty languages. It was filtered to keep only tweets with at least three tokens and without URLs, in order to exclude bot tweets and spam advertising. To create a balanced dataset for sentiment analysis, the authors then built 8 distinct monolingual datasets. To ensure balance, a maximum number of tweets per language was set based on the size of the smallest dataset (3,033 tweets for Hindi), and the datasets for all other languages were pruned to match. Each resulting monolingual dataset has 1,839 training tweets and 870 testing tweets, and the total size is 24,262 tweets. The dataset was created with an equal distribution across the three labels, and that distribution is maintained throughout the train, test and validation sets.
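The pruning step described above can be sketched in a few lines. This is only an illustration of capping every label at the same count; the function name `balance_by_label`, the toy data, and the random selection are my own assumptions, not the authors' actual procedure.

```python
import random
from collections import Counter, defaultdict


def balance_by_label(samples, per_label, seed=0):
    """Keep at most `per_label` randomly chosen samples for each label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample["label"]].append(sample)

    balanced = []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        balanced.extend(group[:per_label])
    return balanced


# Toy data (hypothetical): label 2 is over-represented before pruning.
toy = [{"text": f"tweet {i}", "label": i % 3} for i in range(30)]
toy += [{"text": f"extra {i}", "label": 2} for i in range(10)]

subset = balance_by_label(toy, per_label=5)
counts = Counter(sample["label"] for sample in subset)
print(counts)
```

Applied per language with the same cap, a step like this would give every monolingual dataset the same size and the same label distribution.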

A few potential weaknesses of the annotation process are:
- Intermixing languages with diverse scripts and families (Arabic is the only one from a separate language family) could make it tough for the model to perform well on sentiment analysis tasks.
- The labeling process itself is subjective, because the emotional interpretation of a given tweet's sentiment comes down to the annotator, and may not accurately track the tweet's actual sentiment. This is exacerbated by cultural differences between the populations that speak each language.
- Aside from that, there isn't much information in the paper about the actual annotation process, specifically about the annotators themselves, so we can't know how qualified they were to analyze tweet sentiment.



## 2 - Language Diversity

| Language | Family | Resource Level (High / Low)|
|----------|--------|----------------------------|
|Arabic|Afro-Asiatic|Low|
|English|Indo-European|High|
|French|Indo-European|High|
|German|Indo-European|High|
|Hindi|Indo-European|Low|
|Italian|Indo-European|Low|
|Portuguese|Indo-European|Low|
|Spanish|Indo-European|High|

Given this information, I believe that the high resource level languages will perform best, and I believe Arabic will perform significantly worse. This is a result of the overwhelming representation of the "Indo-European" family in the data: because these languages share common traits, it's possible that multi-language models will benefit from an understanding built on shared language patterns. Conversely, the only language representing an entirely different language family (again, Arabic) will garner no such benefit.

Aside from these highlights, below I include the languages ranked from highest to lowest, as well as my thoughts on why I ranked each language as such inline:

| Rank | Language | Notes |
|------|----------|-------|
|1|English|Obvious choice; Highest representation in data online|
|2|Spanish|Close call with French/German, but wins because there are more Spanish-speaking countries, so probably more representation|
|3|French|Similar to German, but subjectively, seems like it could be more emotive, which will help with sentiment analysis|
|4|German|Last slot of the high resource languages|
|5|Hindi|Subjectively, I think this would be the highest representation low resource language|
|6|Portuguese|Close behind Hindi, also because of assumed representation online|
|7|Italian|I imagine that this is the lowest representation Indo-European language|
|8|Arabic|Reasons referenced in preceding paragraph|

On second read-through of the assignment, I realized that I only need to use languages from the Indo-European family. I'm leaving in the Arabic analysis because it still seems relevant, as the language is included as a subset of the available data regardless.

## 3 - Multilingual Sentiment Analysis

To establish a base case before investigating advanced prompting strategies, I started out with simple / regular prompting for the sentiment analysis task. The models we are studying are OpenAI's ChatGPT 3.5 and Google's Gemini.

The prompt that I used for the base case is provided below:

```
Given this json array of strings containing tweets, can you analyze the tweet's sentiment and label it as POSITIVE, or NEGATIVE?
[Array of texts here]
Please organize the texts alongside the result Sentiment label in a python dict so that I can easily turn the results into a pandas dataframe.
```
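Since I pasted the tweets into the web UI by hand, a small helper for filling in the `[Array of texts here]` placeholder would make the process repeatable. The sketch below is an assumption about how that could be done, not the code I actually ran; `BASE_PROMPT_TEMPLATE` and `build_base_prompt` are hypothetical names.

```python
import json

# The base prompt from above, with the array placeholder turned into a format field.
BASE_PROMPT_TEMPLATE = (
    "Given this json array of strings containing tweets, can you analyze the "
    "tweet's sentiment and label it as POSITIVE, or NEGATIVE?\n"
    "{tweets}\n"
    "Please organize the texts alongside the result Sentiment label in a python "
    "dict so that I can easily turn the results into a pandas dataframe."
)


def build_base_prompt(texts):
    """Serialize the tweets as a JSON array and drop them into the prompt."""
    # ensure_ascii=False keeps non-English tweets readable in the prompt.
    return BASE_PROMPT_TEMPLATE.format(tweets=json.dumps(texts, ensure_ascii=False))


example_prompt = build_base_prompt(["Better be with Kendrick Lamar", "Go Colts!!"])
print(example_prompt)
```

Using `json.dumps` rather than Python's own list formatting means quotes inside tweets are escaped consistently.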

And I used the following code cells to import the dataset and organize it by label. I then pulled out data for use in subsequent prompting on the web UI for both models.

In [1]:
from datasets import load_dataset

POSITIVE="positive"
NEUTRAL="neutral"
NEGATIVE="negative"

ID_TO_LABEL = {
    0: NEGATIVE,
    1: NEUTRAL,
    2: POSITIVE,
}

dataset = load_dataset("cardiffnlp/tweet_sentiment_multilingual", "english")
target_dataset = dataset["train"]

# Split the training texts into one list per gold label:

positive = [ sample["text"] for sample in target_dataset if ID_TO_LABEL[sample["label"]] == POSITIVE ]
neutral = [ sample["text"] for sample in target_dataset if ID_TO_LABEL[sample["label"]] == NEUTRAL ]
negative = [ sample["text"] for sample in target_dataset if ID_TO_LABEL[sample["label"]] == NEGATIVE ]

In [2]:
print("Positives:", positive[:25])
print("Negatives:", negative[:25])
positive[100]

Positives: ['"Frank Gaffrey\\u002c Cliff May\\u002c Steve Emerson: Brilliant. \\""""Looming Threats: Iran\\u002c Hezbollah Hamas\\"""" is the best #cufidc session I\\u2019ve had thus far." ', '"People always forget the fact that Shawn achieved so much in the age of 16 like his 1st single, EP and FIRST album ALL went #1 on charts" ', '"Winnipeg Sun: ""But make no mistake: Janet Jackson played to win. And did."" #UnbreakableWinnipeg #UnbreakableWorldTour ', 'Better be with Kendrick Lamar ', '#EDsummit15 is an opportunity for candidates like Carly Fiorina to flesh out education platforms for the 1st time: ', "Great article in Rolling Stone on Rod Picott. Check it out!  I'll be interviewing Rod on the August 9th episode... ", 'Fianlly gaming review of Moto G3 and it is a solid performer! 7 Graphics heavy games with Moto G 3rd generation! ', 'Ooshma Garg started her 2nd successful company\\u002c Gobble. Watch how she got to one of her biggest investors--Reid Hoffman ', "I'm getting strong m

'The Apple Watch may just change the way we travel. How do you think smart watches will aid our trips? '

In [3]:
# Base case result aggregation (not bothering with storing in an external file)
import pandas as pd

# CHATGPT

# Used the models themselves to arrange the results in this form for easy use in a dataframe.
pos_results = [
    {"Text": "Frank Gaffrey, Cliff May, Steve Emerson: Brilliant. \"Looming Threats: Iran, Hezbollah Hamas\" is the best #cufidc session I’ve had thus far.", "Sentiment": "POSITIVE"},
    {"Text": "People always forget the fact that Shawn achieved so much in the age of 16 like his 1st single, EP and FIRST album ALL went #1 on charts", "Sentiment": "POSITIVE"},
    {"Text": "Winnipeg Sun: \"But make no mistake: Janet Jackson played to win. And did.\" #UnbreakableWinnipeg #UnbreakableWorldTour", "Sentiment": "POSITIVE"},
    {"Text": "Better be with Kendrick Lamar", "Sentiment": "POSITIVE"},
    {"Text": "#EDsummit15 is an opportunity for candidates like Carly Fiorina to flesh out education platforms for the 1st time", "Sentiment": "POSITIVE"},
    {"Text": "Great article in Rolling Stone on Rod Picott. Check it out! I'll be interviewing Rod on the August 9th episode...", "Sentiment": "POSITIVE"},
    {"Text": "Fianlly gaming review of Moto G3 and it is a solid performer! 7 Graphics heavy games with Moto G 3rd generation!", "Sentiment": "POSITIVE"},
    {"Text": "Ooshma Garg started her 2nd successful company, Gobble. Watch how she got to one of her biggest investors--Reid Hoffman", "Sentiment": "POSITIVE"},
    {"Text": "I'm getting strong mail that Eden Hazard will become a Roo tomorrow. Pick 39 & Garlett to wherever the f*ck Hazard plays soccer. Good deal", "Sentiment": "NEGATIVE"},
    {"Text": "@user @user David Cameron is like god & guide to Syrian refugees. God may bless the people like David Cameron", "Sentiment": "POSITIVE"},
    {"Text": "Last tweet of the night, reading will not lose to spurs tomorrow", "Sentiment": "POSITIVE"},
    {"Text": "Not even 20 pages into Paper Towns and the book centers around the night of May 5th. I know its gonna be a good one now! #cincodemayo", "Sentiment": "POSITIVE"},
    {"Text": "Red Sox off to another great start. They lead the Phillies after two innings at Fenway 6-0, scoring 2 in the 1st and 4 more in the second", "Sentiment": "POSITIVE"},
    {"Text": "Watched a Pride and Prejudice play and then the season finale of the 2nd season of Downton Abbey. Tonight is so British.", "Sentiment": "POSITIVE"},
    {"Text": "Just made a plot for a fan fic about How To Rock. It's gonna be called before gravity 5. I'll post it by Friday can't wait for u to read it.", "Sentiment": "POSITIVE"},
    {"Text": "@user @user ha! Probably google. Get some pro ones done for the next time! Off to @user tomorrow! Can't wait", "Sentiment": "POSITIVE"},
    {"Text": "I went to the Polish Festival on Roncesvalles on Saturday awesome time!", "Sentiment": "POSITIVE"},
    {"Text": "@user I had Arian Foster so I'm on 26 already, but having Luck and Hilton puts you in a good place going into NFL Sunday.", "Sentiment": "POSITIVE"},
    {"Text": "Woot! So excited that I get to watch tonight's game. Go Colts!!", "Sentiment": "POSITIVE"},
    {"Text": "Very excited for @user #SummerSlam paperview this Sunday. Man I really hope the Undertaker tombstones Brock Lesnar back to the UFC", "Sentiment": "POSITIVE"},
    {"Text": "A BIG day at Cardiff Airport tomorrow for Iron Maiden fans! Watch this space to find out what's going...", "Sentiment": "POSITIVE"},
    {"Text": "Did you know that 'Janet Jackson' was Trending Topic on Thursday 3 for 8 hours in Calgary? #trndnl", "Sentiment": "POSITIVE"},
    {"Text": "@user Hi Martina - am at St. Patrick's tomorrow for 10:30 Mass. Hope to see you then.", "Sentiment": "POSITIVE"},
    {"Text": "My Sunday nights haven’t been the same since @user has been gone from Breakout Kings. I wonder what he’s doing next? Can’t wait 2 c!", "Sentiment": "POSITIVE"},
    {"Text": ".@LenKasper: \"Bryant has hit some big home runs...\" [Kris Bryant hits a game-tying two-run HR in the 8th]", "Sentiment": "POSITIVE"}
]
neg_results = [
    {"Text": "okay i\\u2019m sorry but TAYLOR SWIFT LOOKS NOTHING LIKE JACKIE O SO STOP COMPARING THE TWO. c\\u2019mon America aren\\u2019t you sick of her yet? (sorry)", "Sentiment": "NEGATIVE"},
    {"Text": "The tragedy of only thinking up hilarious tweets for the Summer Olympics now is that in four years there may be no place for them.", "Sentiment": "NEGATIVE"},
    {"Text": "it looks like a beautiful night to throw myself off the Brooklyn Bridge ---@Tim_Hecht", "Sentiment": "NEGATIVE"},
    {"Text": "I wanna go to the studio with Ulysses n them tomorrow\\u002cbut i cant. #BARS", "Sentiment": "NEGATIVE"},
    {"Text": "@user a bit frustrating. I don\\u2019t think I\\u2019ve added you on my new PSN account. I\\u2019ll do it tomorrow.", "Sentiment": "NEGATIVE"},
    {"Text": "\"I just sat through Kanye West's MTV speech, what the fuck was that...\"", "Sentiment": "NEGATIVE"},
    {"Text": "Hillary's campaign now reset for the 4th time. Adding humor and heart to a person that has #neither #sadtrombone", "Sentiment": "NEGATIVE"},
    {"Text": "Blow to the Lions...Joel Patfull's out for the season after breaking his hand in Sunday's loss to Adelaide. Surgery Wed morn.  #afl", "Sentiment": "NEGATIVE"},
    {"Text": "\"#BritishBuddhu if Rahul Gandhi is really a British Citizen, I'm very concerned about the comedians out there. They may be without jobs\"", "Sentiment": "NEGATIVE"},
    {"Text": "Hulk Hogan picked the wrong time to be an ass the 1st class of WWE wrestlers are dying off like flies around a zapper #RIPRoddyPiper", "Sentiment": "NEGATIVE"},
    {"Text": "\"This weekend on the Fair &amp; Balanced network, Fox News Sunday's guests are crazy conservative Rick Perry and nutty conservative John Kasich.\"", "Sentiment": "NEGATIVE"},
    {"Text": "@user I installed Madden 16 Deluxe last Monday night for PS4 and still haven't received my packs today nor the reward for opening 50", "Sentiment": "NEGATIVE"},
    {"Text": "\"\\""""men tomorrow you will have one of your hardest patrols...CIF turn in\\"""" lets hope i have everything\"", "Sentiment": "NEGATIVE"},
    {"Text": "\"investigative video reveals Planned Parenthood may be committing infanticide, babies born alive, murdered, and sold.\" ACLJ \"with child\" KJV", "Sentiment": "NEGATIVE"},
    {"Text": "\"Yup, guess what? Citizen weren't the happiest supporter last night, Liverpudlian were. The Fact is: finis in 8th, below Everton Asses #LOL\"", "Sentiment": "NEGATIVE"},
    {"Text": "@user you might not wanna come to anatomy tomorrow\\u002c we have a test lol", "Sentiment": "NEGATIVE"},
    {"Text": "Donald Trump: I will be in D.C. on Wednesday,1 PM, in front of the Capitol, to protest the horrible &amp; incompetent deal being made with Iran.", "Sentiment": "NEGATIVE"},
    {"Text": "@user all I can say is that it was very unrealistic. The 1st movie was better-storyline\\u002c dialogue! And of course \\""""The Grey\\"""" thumbs up!", "Sentiment": "NEGATIVE"},
    {"Text": "@user BY HAVING Seth Rollins as number 1? All credibility is lost. May be the worst WWE champion in history! WWE owns yall?", "Sentiment": "NEGATIVE"},
    {"Text": "Satan worshipers align with Planned Parenthood to defend the practice of chopping up babies for profit", "Sentiment": "NEGATIVE"},
    {"Text": "@user may i also remind you Milan  was one the original clubs punished in the scandle", "Sentiment": "NEGATIVE"},
    {"Text": "@user not one word deploring attacks on Charlie Hebdo nor barbaric nature of islam in the 21st C as long as it's sharia compliant", "Sentiment": "NEGATIVE"},
    {"Text": "Saw it late but Carlos Gomez may have passed Ryan Braun in most hated baseball players", "Sentiment": "NEGATIVE"},
    {"Text": "Christians snapchat story makes me want to kill myself..like I feel like a depressed 8th grader going through that emo phase", "Sentiment": "NEGATIVE"},
    {"Text": "Was just talking about Frank Gifford Sat &amp; sadly he dies on Sun. Maybe I'll be talking about @user today. #gopclowncar", "Sentiment": "NEGATIVE"}
]

pos_df = pd.DataFrame(pos_results)
neg_df = pd.DataFrame(neg_results)

In [4]:
# Base case result aggregation (not bothering with storing in an external file)
import pandas as pd

# Gemini

# Couldn't complete.

In [5]:
pos_df

| | Text | Sentiment |
|---|------|-----------|
|0|Frank Gaffrey, Cliff May, Steve Emerson: Brill...|POSITIVE|
|1|People always forget the fact that Shawn achie...|POSITIVE|
|2|Winnipeg Sun: "But make no mistake: Janet Jack...|POSITIVE|
|3|Better be with Kendrick Lamar|POSITIVE|
|4|#EDsummit15 is an opportunity for candidates l...|POSITIVE|
|5|Great article in Rolling Stone on Rod Picott. ...|POSITIVE|
|6|Fianlly gaming review of Moto G3 and it is a s...|POSITIVE|
|7|Ooshma Garg started her 2nd successful company...|POSITIVE|
|8|I'm getting strong mail that Eden Hazard will ...|NEGATIVE|
|9|@user @user David Cameron is like god & guide ...|POSITIVE|


Right off the bat, ChatGPT performed extremely well without any advanced prompting techniques. It only mislabeled a single sample, and it was not difficult to get it to understand the problem and, further, format the data so that it was easy for me to analyze.

Gemini, on the other hand, was not able to work with the same base prompt. When I tried, the result was "I couldn't complete your request" after a long pause. Further attempts to get Gemini to analyze the sentiment of the texts, without resorting to some of the advanced techniques I intend to evaluate in the next section, all failed. Instead, Gemini tried to give me some crude code that evaluated the sentiment of the texts based on the presence of a few words that it grouped into categories (e.g. if the text contains "bad" or "hate" then it is negative, if it contains "great" or "love" then it is positive).
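To make the weakness of that approach concrete, here is a minimal reconstruction of the kind of keyword rule described above. It is not Gemini's actual output: the word lists contain only the four example words mentioned, and `keyword_sentiment` is a hypothetical name.

```python
NEGATIVE_WORDS = {"bad", "hate"}
POSITIVE_WORDS = {"great", "love"}


def keyword_sentiment(text):
    """Label a tweet purely by the presence of a few hard-coded words."""
    # Lowercase each whitespace-separated token and strip common punctuation.
    words = {word.strip(".,!?\"'#@:").lower() for word in text.split()}
    if words & NEGATIVE_WORDS:
        return "NEGATIVE"
    if words & POSITIVE_WORDS:
        return "POSITIVE"
    return "UNKNOWN"


print(keyword_sentiment("Great article in Rolling Stone on Rod Picott. Check it out!"))
print(keyword_sentiment("Better be with Kendrick Lamar"))
print(keyword_sentiment("it looks like a beautiful night to throw myself off the Brooklyn Bridge"))
```

Only the first tweet contains one of the listed words. The other two fall through to `UNKNOWN`, including a clearly negative tweet, which is why I did not count this as completing the task.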

So, the base results to compare against are:
- ChatGPT: almost no room for improvement, already at 98% accuracy, 100% precision and 96% recall.
- Gemini: couldn't achieve the task in the base case.
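For reference, the ChatGPT figures follow from the counts in the base-case results: 24 of the 25 positive tweets were labeled POSITIVE, and all 25 negative tweets were labeled NEGATIVE. A short sketch of the arithmetic (the helper name `binary_metrics` is my own):

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Base-case ChatGPT counts: one positive tweet was mislabeled as NEGATIVE.
accuracy, precision, recall = binary_metrics(tp=24, fp=0, tn=25, fn=1)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# accuracy=0.98 precision=1.00 recall=0.96
```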

### A - Advanced Prompting Strategies

I decided to focus on two advanced prompting strategies: CoT (kinda auto-CoT) and emotion prompting. I decided to stick to two because there are a decent number of test cases for the assignment given the many different parts of problem number 3, and because ChatGPT already doesn't have much room for improvement.

#### Chain of Thought

My chain of thought prompt was:

```
You're a Twitter employee in charge of analyzing the sentiment of some tweets to determine if they are POSITIVE, or NEGATIVE.
In the following text, break down some key points that point to the tweet being either POSITIVE, or NEGATIVE, and then use those findings
to report the category label (POSITIVE, or NEGATIVE) for the tweet. After providing this information, could you please include a python code
snippet creating a dict containing the input "Text" as well as category label "Sentiment" for each tweet?

Here's the text for you to analyze, in a json array:
[JSON ARRAY HERE]
```

In [6]:
# Chain-of-Thought Result Aggregation
import pandas as pd

# ChatGPT

pos_results = [
    {"Text": "Frank Gaffrey, Cliff May, Steve Emerson: Brilliant. \"Looming Threats: Iran, Hezbollah Hamas\" is the best #cufidc session I’ve had thus far.", "Sentiment": "POSITIVE"},
    {"Text": "People always forget the fact that Shawn achieved so much in the age of 16 like his 1st single, EP and FIRST album ALL went #1 on charts", "Sentiment": "POSITIVE"},
    {"Text": "Winnipeg Sun: \"But make no mistake: Janet Jackson played to win. And did.\" #UnbreakableWinnipeg #UnbreakableWorldTour", "Sentiment": "POSITIVE"},
    {"Text": "Better be with Kendrick Lamar", "Sentiment": "NEUTRAL"},
    {"Text": "#EDsummit15 is an opportunity for candidates like Carly Fiorina to flesh out education platforms for the 1st time:", "Sentiment": "POSITIVE"},
    {"Text": "Great article in Rolling Stone on Rod Picott. Check it out!  I'll be interviewing Rod on the August 9th episode...", "Sentiment": "POSITIVE"},
    {"Text": "Fianlly gaming review of Moto G3 and it is a solid performer! 7 Graphics heavy games with Moto G 3rd generation!", "Sentiment": "POSITIVE"},
    {"Text": "Ooshma Garg started her 2nd successful company, Gobble. Watch how she got to one of her biggest investors--Reid Hoffman", "Sentiment": "POSITIVE"},
    {"Text": "I'm getting strong mail that Eden Hazard will become a Roo tomorrow. Pick 39 & Garlett to wherever the f*ck Hazard plays soccer. Good deal", "Sentiment": "NEGATIVE"},
    {"Text": '@user @user David Cameron is like god & guide to Syrian refugees.God may blees the people like David Cameron', "Sentiment": "POSITIVE"},
    {"Text": 'Last tweet of the night\\u002c reading will not lose to spurs tomorrow', "Sentiment": "NEUTRAL"},
    {"Text": 'Not even 20 pages into Paper Towns and the book centers around the night of May 5th. I know its gonna be a good one now! #cincodemayo', "Sentiment": "POSITIVE"},
    {"Text": '"Red Sox off to another great start. They lead the Phillies after two innings at Fenway 6-0, scoring 2 in the 1st and 4 more in the second"', "Sentiment": "POSITIVE"},
    {"Text": 'Watched a Pride and Prejudice play and then the season finale of the 2nd season of Downton Abbey. Tonight is so British.', "Sentiment": "POSITIVE"},
    {"Text": "Just made a plot for a fan fic about How To Rock. It's gonna be called before gravity 5. I'll post it by Friday can't wait for u to read it.", "Sentiment": "POSITIVE"},
    {"Text": "@user @user ha! Probably google. Get some pro ones done for the next time! Off to @user tomorrow! Can't wait", "Sentiment": "POSITIVE"},
    {"Text": 'I went to the Polish Festival on Roncesvalles on Saturday awesome time!', "Sentiment": "POSITIVE"},
    {"Text": '@user I had Arian Foster so I\'m on 26 already, but having Luck and Hilton puts you in a good place going into NFL Sunday.', "Sentiment": "POSITIVE"},
    {"Text": "Woot! So excited that I get to watch tonight's game. Go Colts!!", "Sentiment": "POSITIVE"},
    {"Text": 'Very excited for @user #SummerSlam paperview this Sunday. Man I really hope the Undertaker tombstones Brock Lesnar back to the UFC', "Sentiment": "POSITIVE"},
    {"Text": "A BIG day at Cardiff Airport tomorrow for Iron Maiden fans! Watch this space to find out what's going...", "Sentiment": "POSITIVE"},
    {"Text": 'Did you know that \'Janet Jackson\' was Trending Topic on Thursday 3 for 8 hours in Calgary? #trndnl', "Sentiment": "POSITIVE"},
    {"Text": "@user Hi Martina - am at St. Patrick's tomorrow for 10:30 Mass. Hope to see you then.", "Sentiment": "POSITIVE"},
    {"Text": 'My Sunday nights haven\\u2019t been the same since @user has been gone from Breakout Kings. I wonder what he\\u2019s doing next? Can\\u2019t wait 2 c!', "Sentiment": "POSITIVE"},
    {"Text": '".@LenKasper: ""Bryant has hit some big home runs..."" [Kris Bryant hits a game-tying two-run HR in the 8th]"', "Sentiment": "POSITIVE"}
]
neg_results = [
    {'Text': 'okay i\\u2019m sorry but TAYLOR SWIFT LOOKS NOTHING LIKE JACKIE O...', 'Sentiment': 'NEGATIVE'},
    {'Text': 'The tragedy of only thinking up hilarious tweets for the Summer Olympics now...', 'Sentiment': 'POSITIVE'},
    {'Text': 'it looks like a beautiful night to throw myself off the Brooklyn Bridge ---@Tim_Hecht', 'Sentiment': 'NEGATIVE'},
    {'Text': 'I wanna go to the studio with Ulysses n them tomorrow\\u002cbut i cant. #BARS', 'Sentiment': 'NEUTRAL'},
    {'Text': '@user a bit frustrating. I don\\u2019t think I\\u2019ve added you on my new PSN account. I\\u2019ll do it tomorrow.', 'Sentiment': 'NEUTRAL'},
    {'Text': '"I just sat through Kanye West\'s MTV speech, what the fuck was that..."', 'Sentiment': 'NEGATIVE'},
    {'Text': "Hillary's campaign now reset for the 4th time. Adding humor and heart to a person that has #neither #sadtrombone", 'Sentiment': 'NEGATIVE'},
    {'Text': "Blow to the Lions...Joel Patfull's out for the season after breaking his hand in Sunday's loss to Adelaide. Surgery Wed morn.  #afl", 'Sentiment': 'NEGATIVE'},
    {'Text': '"#BritishBuddhu if Rahul Gandhi is really a British Citizen, I\'m very concerned about the comedians out there. They may be without jobs"', 'Sentiment': 'NEGATIVE'},
    {'Text': 'Hulk Hogan picked the wrong time to be an ass the 1st class of WWE wrestlers are dying off like flies around a zapper #RIPRoddyPiper', 'Sentiment': 'NEGATIVE'},
    {'Text': '"This weekend on the Fair &amp; Balanced network, Fox News Sunday\'s guests are crazy conservative Rick Perry and nutty conservative John Kasich."', 'Sentiment': 'NEGATIVE'},
    {'Text': "@user I installed Madden 16 Deluxe last Monday night for PS4 and still haven't received my packs today nor the reward for opening 50", 'Sentiment': 'NEGATIVE'},
    {'Text': '"\\""""men tomorrow you will have one of your hardest patrols...CIF turn in\\"""" lets hope i have everything"', 'Sentiment': 'NEUTRAL'},
    {'Text': '"investigative video reveals Planned Parenthood may be committing infanticide, babies born alive, murdered, and sold." ACLJ "with child" KJV', 'Sentiment': 'NEGATIVE'},
    {'Text': '"Yup, guess what? Citizen weren\'t the happiest supporter last night, Liverpudlian were. The Fact is: finis in 8th, below Everton Asses #LOL"', 'Sentiment': 'NEGATIVE'},
    {'Text': '@user you might not wanna come to anatomy tomorrow\\u002c we have a test lol', 'Sentiment': 'NEUTRAL'},
    {'Text': 'Donald Trump: I will be in D.C. on Wednesday,1 PM, in front of the Capitol, to protest the horrible &amp; incompetent deal being made with Iran.', 'Sentiment': 'NEGATIVE'},
    {'Text': '@user all I can say is that it was very unrealistic. The 1st movie was better-storyline\\u002c dialogue! And of course \\""""The Grey\\"""" thumbs up!', 'Sentiment': 'NEGATIVE'},
    {'Text': '@user BY HAVING Seth Rollins as number 1? All credibility is lost. May be the worst WWE champion in history! WWE owns yall?', 'Sentiment': 'NEGATIVE'},
    {'Text': 'Satan worshipers align with Planned Parenthood to defend the practice of chopping up babies for profit', 'Sentiment': 'NEGATIVE'},
    {'Text': '@user may i also remind you Milan  was one the original clubs punished in the scandle', 'Sentiment': 'NEGATIVE'},
    {'Text': "@user not one word deploring attacks on Charlie Hebdo nor barbaric nature of islam in the 21st C as long as it's sharia compliant", 'Sentiment': 'NEGATIVE'},
    {'Text': 'Saw it late but Carlos Gomez may have passed Ryan Braun in most hated baseball players', 'Sentiment': 'NEGATIVE'},
    {'Text': 'Christians snapchat story makes me want to kill myself..like I feel like a depressed 8th grader going through that emo phase', 'Sentiment': 'NEGATIVE'},
    {"Text": "Was just talking about Frank Gifford Sat &amp; sadly he dies on Sun. Maybe I'll be talking about @user today. #gopclowncar", 'Sentiment': 'NEUTRAL'}
]

pos_df = pd.DataFrame(pos_results)
neg_df = pd.DataFrame(neg_results)

In [7]:
def count_label(df, label):
    return len([ val for val in df["Sentiment"] if val == label ])

print(f"True positive: {count_label(pos_df, 'POSITIVE')} | False positive: {count_label(neg_df, 'POSITIVE')}")
print(f"True negative: {count_label(neg_df, 'NEGATIVE')} | False negative: {count_label(pos_df, 'NEGATIVE')}")
print(f"Neutral: {count_label(pos_df, 'NEUTRAL') + count_label(neg_df, 'NEUTRAL')}")

True positive: 22 | False positive: 1
True negative: 19 | False negative: 1
Neutral: 7


Chain of thought prompting, interestingly, made ChatGPT perform worse. It seems that the additional information confused the model, as the key points it came up with introduced nuance to the sentiment analysis task. Subjectively, the samples for which it introduced a third, unrequested label (NEUTRAL) all read as fairly neutral for the set they were in.

For example, from the positive set, ChatGPT categorized "Better be with Kendrick Lamar" as neutral. The annotators classified it as positive. I think positive is more correct based on assumed cultural context; however, the language used here leaves it open to interpretation, and the tone could change wildly based on context. For example, I'd consider this a negative sentiment tweet if it was prefaced with something like "Miss me with this modern trash mumble rap -- Better be with Kendrick Lamar." Given the wild swing in sentiment possible for this tweet, which honestly seems like a snippet of a larger post anyway, I think the model was justified in labeling it as NEUTRAL. While the example I provided was not the actual tweet, it still goes to show that the model picks up on, and will potentially be more focused on, nuanced language patterns given the chain-of-thought that it builds.

Another example that furthers this point, and which also highlights the subjectivity of dataset annotation, is the chain-of-thought that ChatGPT provided for the following sample:

"The tragedy of only thinking up hilarious tweets for the Summer Olympics now is that in four years there may be no place for them"

ChatGPT's label: Positive

Reasoning: The tweet is humorous and acknowledges the limitation of tweeting about the Summer Olympics only every four years.

After reading ChatGPT's justification, I honestly agree with it more than with the annotator. I think this highlights the difficulty of sentiment analysis given the weaknesses discussed at the start of the assignment. It also indicates that even though my statistical measures (accuracy, precision, recall) might get worse as I try out these advanced prompting techniques, because they are computed against the dataset annotation, the true performance of the models may actually improve in a more holistic sense (not trusting the annotator's word on a tweet's sentiment, but arriving at a more accurate categorization).
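Because the chain-of-thought run produced labels outside the requested POSITIVE / NEGATIVE pair, the reported accuracy depends on how those answers are scored. The sketch below shows two options using the ChatGPT chain-of-thought counts reported above (22 true positives, 1 false positive, 19 true negatives, 1 false negative, 7 NEUTRAL). The function name `summarize` and the two accuracy variants are my own choices, not part of the assignment.

```python
from collections import Counter


def summarize(pos_labels, neg_labels):
    """Score predictions for the positive and negative sets, tracking off-schema labels."""
    pos_counts = Counter(pos_labels)
    neg_counts = Counter(neg_labels)
    tp, fn = pos_counts["POSITIVE"], pos_counts["NEGATIVE"]
    tn, fp = neg_counts["NEGATIVE"], neg_counts["POSITIVE"]
    total = len(pos_labels) + len(neg_labels)
    answered = tp + fn + tn + fp
    return {
        "off_schema": total - answered,             # e.g. NEUTRAL or MIXED
        "strict_accuracy": (tp + tn) / total,       # off-schema labels count as errors
        "answered_accuracy": (tp + tn) / answered,  # off-schema labels are ignored
    }


# Labels rebuilt from the chain-of-thought counts for ChatGPT.
pos_labels = ["POSITIVE"] * 22 + ["NEGATIVE"] * 1 + ["NEUTRAL"] * 2
neg_labels = ["NEGATIVE"] * 19 + ["POSITIVE"] * 1 + ["NEUTRAL"] * 5

summary = summarize(pos_labels, neg_labels)
print(summary)
```

Under the strict reading, accuracy falls to 41/50 = 82%. If the NEUTRAL answers are ignored, it is 41/43, which is a smaller drop from the base case.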


In [8]:
# Chain-of-Thought Result Aggregation
import pandas as pd

# Gemini

pos_results = [
    {"Text": "Frank Gaffrey, Cliff May, Steve Emerson: Brilliant. \"Looming Threats: Iran, Hezbollah Hamas\" is the best #cufidc session I’ve had thus far.", "Sentiment": "POSITIVE"},
    {"Text": "People always forget the fact that Shawn achieved so much in the age of 16 like his 1st single, EP and FIRST album ALL went #1 on charts", "Sentiment": "POSITIVE"},
    {"Text": "Winnipeg Sun: \"But make no mistake: Janet Jackson played to win. And did.\" #UnbreakableWinnipeg #UnbreakableWorldTour", "Sentiment": "POSITIVE"},
    {"Text": "Better be with Kendrick Lamar", "Sentiment": "POSITIVE"},
    {"Text": "#EDsummit15 is an opportunity for candidates like Carly Fiorina to flesh out education platforms for the 1st time", "Sentiment": "NEUTRAL"},
    {"Text": "Great article in Rolling Stone on Rod Picott. Check it out! I'll be interviewing Rod on the August 9th episode...", "Sentiment": "POSITIVE"},
    {"Text": "Fianlly gaming review of Moto G3 and it is a solid performer! 7 Graphics heavy games with Moto G 3rd generation!", "Sentiment": "POSITIVE"},
    {"Text": "Ooshma Garg started her 2nd successful company, Gobble. Watch how she got to one of her biggest investors--Reid Hoffman", "Sentiment": "POSITIVE"},
    {"Text": "I'm getting strong mail that Eden Hazard will become a Roo tomorrow. Pick 39 & Garlett to wherever the f*ck Hazard plays soccer. Good deal", "Sentiment": "MIXED"},
    {"Text": "@user @user David Cameron is like god & guide to Syrian refugees. God may bless the people like David Cameron", "Sentiment": "POSITIVE"},
    {"Text": "Last tweet of the night, reading will not lose to spurs tomorrow", "Sentiment": "POSITIVE"},
    {"Text": "Not even 20 pages into Paper Towns and the book centers around the night of May 5th. I know its gonna be a good one now! #cincodemayo", "Sentiment": "POSITIVE"},
    {"Text": "Red Sox off to another great start. They lead the Phillies after two innings at Fenway 6-0, scoring 2 in the 1st and 4 more in the second", "Sentiment": "POSITIVE"},
    {"Text": "Watched a Pride and Prejudice play and then the season finale of the 2nd season of Downton Abbey. Tonight is so British.", "Sentiment": "POSITIVE"},
    {"Text": "Just made a plot for a fan fic about How To Rock. It's gonna be called before gravity 5. I'll post it by Friday can't wait for u to read it.", "Sentiment": "POSITIVE"},
    {"Text": "@user @user ha! Probably google. Get some pro ones done for the next time! Off to @user tomorrow! Can't wait", "Sentiment": "MIXED"},
    {"Text": "I went to the Polish Festival on Roncesvalles on Saturday awesome time!", "Sentiment": "POSITIVE"},
    {"Text": "@user I had Arian Foster so I'm on 26 already, but having Luck and Hilton puts you in a good place going into NFL Sunday.", "Sentiment": "POSITIVE"},
    {"Text": "Woot! So excited that I get to watch tonight's game. Go Colts!!", "Sentiment": "POSITIVE"},
    {"Text": "Very excited for @user #SummerSlam paperview this Sunday. Man I really hope the Undertaker tombstones Brock Lesnar back to the UFC", "Sentiment": "POSITIVE"},
    {"Text": "A BIG day at Cardiff Airport tomorrow for Iron Maiden fans! Watch this space to find out what's going...", "Sentiment": "POSITIVE"},
    {"Text": "Did you know that 'Janet Jackson' was Trending Topic on Thursday 3 for 8 hours in Calgary? #trndnl", "Sentiment": "POSITIVE"},
    {"Text": "@user Hi Martina - am at St. Patrick's tomorrow for 10:30 Mass. Hope to see you then.", "Sentiment": "POSITIVE"},
    {"Text": "My Sunday nights haven’t been the same since @user has been gone from Breakout Kings. I wonder what he’s doing next? Can’t wait 2 c!", "Sentiment": "NEGATIVE"},
    {"Text": ".@LenKasper: \"Bryant has hit some big home runs...\" [Kris Bryant hits a game-tying two-run HR in the 8th]", "Sentiment": "POSITIVE"}
]
neg_results = [
    {"Text": "okay i\\u2019m sorry but TAYLOR SWIFT LOOKS NOTHING LIKE JACKIE O SO STOP COMPARING THE TWO. c\\u2019mon America aren\\u2019t you sick of her yet? (sorry)", "Sentiment": "NEGATIVE"},
    {"Text": "The tragedy of only thinking up hilarious tweets for the Summer Olympics now is that in four years there may be no place for them.", "Sentiment": "NEGATIVE"},
    {"Text": "it looks like a beautiful night to throw myself off the Brooklyn Bridge ---@Tim_Hecht", "Sentiment": "NEGATIVE"},
    {"Text": "I wanna go to the studio with Ulysses n them tomorrow\\u002cbut i cant. #BARS", "Sentiment": "NEGATIVE"},
    {"Text": "@user a bit frustrating. I don\\u2019t think I\\u2019ve added you on my new PSN account. I\\u2019ll do it tomorrow.", "Sentiment": "NEGATIVE"},
    {"Text": "\"I just sat through Kanye West's MTV speech, what the fuck was that...\"", "Sentiment": "NEGATIVE"},
    {"Text": "Hillary's campaign now reset for the 4th time. Adding humor and heart to a person that has #neither #sadtrombone", "Sentiment": "NEGATIVE"},
    {"Text": "Blow to the Lions...Joel Patfull's out for the season after breaking his hand in Sunday's loss to Adelaide. Surgery Wed morn.  #afl", "Sentiment": "NEGATIVE"},
    {"Text": "\"#BritishBuddhu if Rahul Gandhi is really a British Citizen, I'm very concerned about the comedians out there. They may be without jobs\"", "Sentiment": "NEGATIVE"},
    {"Text": "Hulk Hogan picked the wrong time to be an ass the 1st class of WWE wrestlers are dying off like flies around a zapper #RIPRoddyPiper", "Sentiment": "NEGATIVE"},
    {"Text": "\"This weekend on the Fair &amp; Balanced network, Fox News Sunday's guests are crazy conservative Rick Perry and nutty conservative John Kasich.\"", "Sentiment": "NEGATIVE"},
    {"Text": "@user I installed Madden 16 Deluxe last Monday night for PS4 and still haven't received my packs today nor the reward for opening 50", "Sentiment": "NEGATIVE"},
    {"Text": "\"\\""""men tomorrow you will have one of your hardest patrols...CIF turn in\\"""" lets hope i have everything\"", "Sentiment": "NEGATIVE"},
    {"Text": "\"investigative video reveals Planned Parenthood may be committing infanticide, babies born alive, murdered, and sold.\" ACLJ \"with child\" KJV", "Sentiment": "NEGATIVE"},
    {"Text": "\"Yup, guess what? Citizen weren't the happiest supporter last night, Liverpudlian were. The Fact is: finis in 8th, below Everton Asses #LOL\"", "Sentiment": "NEGATIVE"},
    {"Text": "@user you might not wanna come to anatomy tomorrow\\u002c we have a test lol", "Sentiment": "NEGATIVE"},
    {"Text": "Donald Trump: I will be in D.C. on Wednesday,1 PM, in front of the Capitol, to protest the horrible &amp; incompetent deal being made with Iran.", "Sentiment": "NEGATIVE"},
    {"Text": "@user all I can say is that it was very unrealistic. The 1st movie was better-storyline\\u002c dialogue! And of course \\""""The Grey\\"""" thumbs up!", "Sentiment": "NEGATIVE"},
    {"Text": "@user BY HAVING Seth Rollins as number 1? All credibility is lost. May be the worst WWE champion in history! WWE owns yall?", "Sentiment": "NEGATIVE"},
    {"Text": "Satan worshipers align with Planned Parenthood to defend the practice of chopping up babies for profit", "Sentiment": "NEGATIVE"},
    {"Text": "@user may i also remind you Milan  was one the original clubs punished in the scandle", "Sentiment": "NEGATIVE"},
    {"Text": "@user not one word deploring attacks on Charlie Hebdo nor barbaric nature of islam in the 21st C as long as it's sharia compliant", "Sentiment": "NEGATIVE"},
    {"Text": "Saw it late but Carlos Gomez may have passed Ryan Braun in most hated baseball players", "Sentiment": "NEGATIVE"},
    {"Text": "Christians snapchat story makes me want to kill myself..like I feel like a depressed 8th grader going through that emo phase", "Sentiment": "NEGATIVE"},
    {"Text": "Was just talking about Frank Gifford Sat &amp; sadly he dies on Sun. Maybe I'll be talking about @user today. #gopclowncar", "Sentiment": "NEGATIVE"}
]

pos_df = pd.DataFrame(pos_results)
neg_df = pd.DataFrame(neg_results)

In [9]:
def count_label(df, label):
    """Count the rows of df whose "Sentiment" value equals label."""
    return int((df["Sentiment"] == label).sum())

print(f"True positive: {count_label(pos_df, 'POSITIVE')} | False positive: {count_label(neg_df, 'POSITIVE')}")
print(f"True negative: {count_label(neg_df, 'NEGATIVE')} | False negative: {count_label(pos_df, 'NEGATIVE')}")
print(f"Neutral/Mixed: {count_label(pos_df, 'NEUTRAL') + count_label(neg_df, 'NEUTRAL') + count_label(pos_df, 'MIXED') + count_label(neg_df, 'MIXED')}")

True positive: 21 | False positive: 0
True negative: 25 | False negative: 1
Neutral/Mixed: 3


Chain-of-thought prompting markedly improved Gemini's performance, in that it was able to complete the task to some extent. Gemini still did not follow my instructions for speeding up data collection, and offered some poor sentiment analysis code snippets instead; however, it did at least provide a category for each text, which I then copied down manually. Interestingly, it added a fourth label, "Mixed", which also wasn't mentioned in the prompt. I grouped the mixed samples with neutral.
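The grouping of Mixed with Neutral could also be done in code instead of by hand. The following is only a minimal sketch, assuming the model output has been collected as a list of dicts with a "Sentiment" field as in the cells above; `LABEL_ALIASES`, `normalize_label`, and `count_labels` are hypothetical names that do not appear in the original notebook:

```python
from collections import Counter

# Hypothetical alias table: labels outside the expected set are folded into NEUTRAL.
LABEL_ALIASES = {"MIXED": "NEUTRAL"}

def normalize_label(label):
    """Upper-case a model-provided label and fold aliases such as MIXED into NEUTRAL."""
    cleaned = label.strip().upper()
    return LABEL_ALIASES.get(cleaned, cleaned)

def count_labels(results):
    """Tally normalized labels over a list of {"Text": ..., "Sentiment": ...} dicts."""
    return Counter(normalize_label(row["Sentiment"]) for row in results)

# Toy rows standing in for real model output.
toy_results = [
    {"Text": "example a", "Sentiment": "POSITIVE"},
    {"Text": "example b", "Sentiment": "Mixed"},
    {"Text": "example c", "Sentiment": "neutral"},
]
print(count_labels(toy_results))
```

Normalizing case at the same time guards against the model answering "Mixed" in one response and "MIXED" in another.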

Gemini was relatively accurate with this strategy (46 of its 50 labels matched the annotators') and, as with ChatGPT, I think that in the instances where it disagreed with the annotators, it was potentially the more accurate of the two. Here's an example:

"@user @user ha! Probably google. Get some pro ones done for the next time! Off to @user tomorrow! Can't wait"

Gemini's label: Mixed

Reasoning: Makes fun of someone's photos but expresses excitement to meet someone else.

I believe Gemini captured the nuance of this text well, including the way its intended tone could diverge depending on context. The annotator seems to have been hasty in considering this a solely positive tweet; perhaps Gemini is right and the tweet was actually making fun of the quality of someone's photos. We can't really tell, but that ambiguity is exactly the nuance that the annotator's label fails to capture.

#### Emotion Prompting

My prompt for emotion prompting was a slightly altered version of the base-case prompt:

```
I really need to complete this important assignment or else I will FAIL SCHOOL! I have this json array of strings containing tweets, can you analyze each tweet's sentiment and label it as POSITIVE, or NEGATIVE? I'm super tight on time, and I'd really appreciate it if you could provide a python dictionary containing the input "Text" in one field, and your categorized label (POSITIVE or NEGATIVE) in a "Sentiment" field. An example output dict would look like this:

{"Text": "a tweet", "Sentiment": "Positive"}

Please help me and paste the list of python dicts containing your analysis in response to this input json array:
[Array of texts here]
```
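Assembling this prompt and reading back the response can be scripted. The sketch below is only illustrative: `build_prompt` and `parse_response` are hypothetical helpers, and it assumes the model replies with nothing but a Python list literal in the format the prompt requests, which the models did not always do in practice.

```python
import ast
import json

def build_prompt(instructions, texts):
    """Append the tweets as a JSON array to the instruction text."""
    tweet_array = json.dumps(texts, ensure_ascii=False, indent=2)
    return f"{instructions}\n{tweet_array}"

def parse_response(response):
    """Parse a reply that consists solely of a Python list of dicts."""
    # literal_eval only accepts Python literals, so it will not execute code in the reply.
    rows = ast.literal_eval(response.strip())
    return [{"Text": r["Text"], "Sentiment": r["Sentiment"].upper()} for r in rows]

instructions = (
    "Please help me and paste the list of python dicts containing your "
    "analysis in response to this input json array:"
)
prompt = build_prompt(instructions, ["Go Colts!!", "a bit frustrating."])
print(prompt)

# Example reply in the format requested by the prompt.
rows = parse_response('[{"Text": "a tweet", "Sentiment": "Positive"}]')
print(rows)
```

Upper-casing the parsed label matters here because the example dict in the prompt uses "Positive" while the counting code compares against "POSITIVE".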


In [17]:
# Emotion Prompting Result Aggregation
import pandas as pd

# ChatGPT
pos_results = [
    {"Text": '"Frank Gaffrey\\u002c Cliff May\\u002c Steve Emerson: Brilliant. \\""""Looming Threats: Iran\\u002c Hezbollah Hamas\\"""" is the best #cufidc session I\\u2019ve had thus far."', "Sentiment": "POSITIVE"},
    {"Text": '"People always forget the fact that Shawn achieved so much in the age of 16 like his 1st single, EP and FIRST album ALL went #1 on charts"', "Sentiment": "POSITIVE"},
    {"Text": '"Winnipeg Sun: ""But make no mistake: Janet Jackson played to win. And did."" #UnbreakableWinnipeg #UnbreakableWorldTour', "Sentiment": "POSITIVE"},
    {"Text": 'Better be with Kendrick Lamar', "Sentiment": "POSITIVE"},
    {"Text": '#EDsummit15 is an opportunity for candidates like Carly Fiorina to flesh out education platforms for the 1st time:', "Sentiment": "POSITIVE"},
    {"Text": "Great article in Rolling Stone on Rod Picott. Check it out!  I'll be interviewing Rod on the August 9th episode...", "Sentiment": "POSITIVE"},
    {"Text": 'Fianlly gaming review of Moto G3 and it is a solid performer! 7 Graphics heavy games with Moto G 3rd generation!', "Sentiment": "POSITIVE"},
    {"Text": 'Ooshma Garg started her 2nd successful company\\u002c Gobble. Watch how she got to one of her biggest investors--Reid Hoffman', "Sentiment": "POSITIVE"},
    {"Text": "I'm getting strong mail that Eden Hazard will become a Roo tomorrow. Pick 39 & Garlett to wherever the f*ck Hazard plays soccer. Good deal ", "Sentiment": "POSITIVE"},
    {"Text": '@user @user David Cameron is like god &amp; guide to Syrian refugees.God may blees the people like David Cameron ', "Sentiment": "POSITIVE"},
    {"Text": 'Last tweet of the night\\u002c reading will not lose to spurs tomorrow', "Sentiment": "POSITIVE"},
    {"Text": 'Not even 20 pages into Paper Towns and the book centers around the night of May 5th. I know its gonna be a good one now! #cincodemayo ', "Sentiment": "POSITIVE"},
    {"Text": '"Red Sox off to another great start. They lead the Phillies after two innings at Fenway 6-0, scoring 2 in the 1st and 4 more in the second"', "Sentiment": "POSITIVE"},
    {"Text": 'Watched a Pride and Prejudice play and then the season finale of the 2nd season of Downton Abbey. Tonight is so British. ', "Sentiment": "POSITIVE"},
    {"Text": "Just made a plot for a fan fic about How To Rock. It's gonna be called before gravity 5. I'll post it by Friday can't wait for u to read it. ", "Sentiment": "POSITIVE"},
    {"Text": "@user @user ha! Probably google. Get some pro ones done for the next time! Off to @user tomorrow! Can't wait ", "Sentiment": "POSITIVE"},
    {"Text": 'I went to the Polish Festival on Roncesvalles on Saturday awesome time! ', "Sentiment": "POSITIVE"},
    {"Text": '@user I had Arian Foster so I\'m on 26 already, but having Luck and Hilton puts you in a good place going into NFL Sunday." ', "Sentiment": "POSITIVE"},
    {"Text": "Woot!  So excited that I get to watch tonight's game. Go Colts!! ", "Sentiment": "POSITIVE"},
    {"Text": 'Very excited for @user #SummerSlam paperview this Sunday. Man I really hope the Undertaker tombstones Brock Lesnar back to the UFC ', "Sentiment": "POSITIVE"},
    {"Text": "A BIG day at Cardiff Airport tomorrow for Iron Maiden fans! Watch this space to find out what's going... ", "Sentiment": "POSITIVE"},
    {"Text": "Did you know that 'Janet Jackson' was Trending Topic on Thursday 3 for 8 hours in Calgary? #trndnl ", "Sentiment": "POSITIVE"},
    {"Text": "@user Hi Martina - am at St. Patrick's tomorrow for 10:30 Mass.  Hope to see you then. ", "Sentiment": "POSITIVE"},
    {"Text": 'My Sunday nights haven\\u2019t been the same since @user has been gone from Breakout Kings. I wonder what he\\u2019s doing next? Can\\u2019t wait 2 c! ', "Sentiment": "POSITIVE"},
    {"Text": '".@LenKasper: ""Bryant has hit some big home runs...""  [Kris Bryant hits a game-tying two-run HR in the 8th]" ', "Sentiment": "POSITIVE"}
]
neg_results = [
    {"Text": 'okay i\\u2019m sorry but TAYLOR SWIFT LOOKS NOTHING LIKE JACKIE O SO STOP COMPARING THE TWO. c\\u2019mon America aren\\u2019t you sick of her yet? (sorry)', "Sentiment": "NEGATIVE"},
    {"Text": 'The tragedy of only thinking up hilarious tweets for the Summer Olympics now is that in four years there may be no place for them.', "Sentiment": "POSITIVE"},
    {"Text": 'it looks like a beautiful night to throw myself off the Brooklyn Bridge ---@Tim_Hecht', "Sentiment": "NEGATIVE"},
    {"Text": 'I wanna go to the studio with Ulysses n them tomorrow\\u002cbut i cant. #BARS', "Sentiment": "NEUTRAL"},
    {"Text": '@user a bit frustrating. I don\\u2019t think I\\u2019ve added you on my new PSN account. I\\u2019ll do it tomorrow.', "Sentiment": "NEUTRAL"},
    {"Text": '"I just sat through Kanye West\'s MTV speech, what the fuck was that..."', "Sentiment": "NEGATIVE"},
    {"Text": "Hillary's campaign now reset for the 4th time. Adding humor and heart to a person that has #neither #sadtrombone", "Sentiment": "NEGATIVE"},
    {"Text": "Blow to the Lions...Joel Patfull's out for the season after breaking his hand in Sunday's loss to Adelaide. Surgery Wed morn.  #afl", "Sentiment": "NEGATIVE"},
    {"Text": '"#BritishBuddhu if Rahul Gandhi is really a British Citizen, I\'m very concerned about the comedians out there. They may be without jobs"', "Sentiment": "NEGATIVE"},
    {"Text": 'Hulk Hogan picked the wrong time to be an ass the 1st class of WWE wrestlers are dying off like flies around a zapper #RIPRoddyPiper', "Sentiment": "NEGATIVE"},
    {"Text": '"This weekend on the Fair &amp; Balanced network, Fox News Sunday\'s guests are crazy conservative Rick Perry and nutty conservative John Kasich."', "Sentiment": "NEGATIVE"},
    {"Text": "@user I installed Madden 16 Deluxe last Monday night for PS4 and still haven't received my packs today nor the reward for opening 50 ", "Sentiment": "NEGATIVE"},
    {"Text": '"\\""""men tomorrow you will have one of your hardest patrols...CIF turn in\\"""" lets hope i have everything"', "Sentiment": "NEUTRAL"},
    {"Text": '"investigative video reveals Planned Parenthood may be committing infanticide, babies born alive, murdered, and sold." ACLJ "with child" KJV', "Sentiment": "NEGATIVE"},
    {"Text": '"Yup, guess what? Citizen weren\'t the happiest supporter last night, Liverpudlian were. The Fact is: finis in 8th, below Everton Asses #LOL"', "Sentiment": "NEGATIVE"},
    {"Text": '@user you might not wanna come to anatomy tomorrow\\u002c we have a test lol ', "Sentiment": "NEUTRAL"},
    {"Text": 'Donald Trump: I will be in D.C. on Wednesday,1 PM, in front of the Capitol, to protest the horrible &amp; incompetent deal being made with Iran.', "Sentiment": "NEGATIVE"},
    {"Text": '@user all I can say is that it was very unrealistic. The 1st movie was better-storyline\\u002c dialogue! And of course \\""""The Grey\\"""" thumbs up!" ', "Sentiment": "NEGATIVE"},
    {"Text": '@user BY HAVING Seth Rollins as number 1? All credibility is lost. May be the worst WWE champion in history! WWE owns yall? ', "Sentiment": "NEGATIVE"},
    {"Text": 'Satan worshipers align with Planned Parenthood to defend the practice of chopping up babies for profit ', "Sentiment": "NEGATIVE"},
    {"Text": '@user may i also remind you Milan  was one the original clubs punished in the scandle ', "Sentiment": "NEGATIVE"},
    {"Text": "@user not one word deploring attacks on Charlie Hebdo nor barbaric nature of islam in the 21st C as long as it's sharia compliant ", "Sentiment": "NEGATIVE"},
    {"Text": 'Saw it late but Carlos Gomez may have passed Ryan Braun in most hated baseball players ', "Sentiment": "NEGATIVE"},
    {"Text": 'Christians snapchat story makes me want to kill myself..like I feel like a depressed 8th grader going through that emo phase ', "Sentiment": "NEGATIVE"},
    {"Text": "Was just talking about Frank Gifford Sat &amp; sadly he dies on Sun. Maybe I'll be talking about @user today. #gopclowncar ", "Sentiment": "NEUTRAL"}
]

pos_df = pd.DataFrame(pos_results)
neg_df = pd.DataFrame(neg_results)
neg_df

| |Text|Sentiment|
|---|----|---------|
|0|okay i\u2019m sorry but TAYLOR SWIFT LOOKS NOT...|NEGATIVE|
|1|The tragedy of only thinking up hilarious twee...|POSITIVE|
|2|it looks like a beautiful night to throw mysel...|NEGATIVE|
|3|I wanna go to the studio with Ulysses n them t...|NEUTRAL|
|4|@user a bit frustrating. I don\u2019t think I\...|NEUTRAL|
|5|"I just sat through Kanye West's MTV speech, w...|NEGATIVE|
|6|Hillary's campaign now reset for the 4th time....|NEGATIVE|
|7|Blow to the Lions...Joel Patfull's out for the...|NEGATIVE|
|8|"#BritishBuddhu if Rahul Gandhi is really a Br...|NEGATIVE|
|9|Hulk Hogan picked the wrong time to be an ass ...|NEGATIVE|


In [15]:
def count_label(df, label):
    """Count the rows of df whose "Sentiment" value equals label."""
    return int((df["Sentiment"] == label).sum())

print(f"True positive: {count_label(pos_df, 'POSITIVE')} | False positive: {count_label(neg_df, 'POSITIVE')}")
print(f"True negative: {count_label(neg_df, 'NEGATIVE')} | False negative: {count_label(pos_df, 'NEGATIVE')}")
print(f"Neutral/Mixed: {count_label(pos_df, 'NEUTRAL') + count_label(neg_df, 'NEUTRAL') + count_label(pos_df, 'MIXED') + count_label(neg_df, 'MIXED')}")

True positive: 25 | False positive: 1
True negative: 19 | False negative: 0
Neutral/Mixed: 5


While ChatGPT's measured accuracy suffered (only 19 of the 25 negative samples were labeled NEGATIVE, with five labeled NEUTRAL and one POSITIVE), the emotional prompt does seem to have made it more sensitive to nuance in sentiment. It flagged many of the same samples in the negative subset that both models had pointed out as nuanced (neutral or mixed) under chain-of-thought prompting, where the models gave reasoning as to why a label of POSITIVE or NEGATIVE may not be entirely correct. This suggests that the models have some implicit understanding of the semantics behind the language they analyze, even when not asked to explicitly generate an explanation of those semantics.
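The counts printed above can be turned into an accuracy figure. How NEUTRAL answers should be scored is a choice the assignment does not settle, so this sketch shows two conventions of my own choosing: a strict one that counts every NEUTRAL/MIXED answer as an error, and a lenient one that drops those samples from the denominator. The `accuracy` helper is hypothetical.

```python
def accuracy(tp, tn, fp, fn, neutral, count_neutral_as_error=True):
    """Fraction of samples whose label matched the annotators' label."""
    decided = tp + tn + fp + fn
    total = decided + neutral if count_neutral_as_error else decided
    return (tp + tn) / total

# Counts reported for ChatGPT under the emotional prompt.
strict = accuracy(tp=25, tn=19, fp=1, fn=0, neutral=5)
lenient = accuracy(tp=25, tn=19, fp=1, fn=0, neutral=5, count_neutral_as_error=False)
print(f"strict: {strict:.3f} | lenient: {lenient:.3f}")
# prints "strict: 0.880 | lenient: 0.978"
```

The gap between the two figures shows how much of the apparent drop comes from NEUTRAL answers rather than from outright wrong labels.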

Interestingly, the added subjective importance of the emotional prompt may have caused the model to weigh the sentiment of each statement more carefully. It would be interesting to spend more dedicated time researching ways to identify which characteristics the model attends to under the different prompt paradigms. For example, when no emotional importance is expressed, does the model only analyze sentiment using simple heuristics (contains words x, y, z), and once the importance of the result is stressed, does it analyze more complex concepts like tone, speech target, and context?

In [12]:
# Emotion Prompting Result Aggregation
import pandas as pd

# Gemini

# Unable to complete

Again, Gemini was not able to perform the sentiment analysis task at all. It instead tried to generate a naive sentiment analysis script in Python that used word-presence checks to determine sentiment; the model itself did no sentiment analysis. Gemini seems to need a more thorough prompting strategy to understand the intent of the task.
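To illustrate why a word-presence script is a poor substitute for the model's own judgment, here is a rough sketch of that kind of approach. This is not the script Gemini produced; the word lists and the `keyword_sentiment` function are hypothetical and chosen only for illustration.

```python
# Hypothetical word lists; a real script of this kind would have longer ones.
POSITIVE_WORDS = {"beautiful", "great", "excited", "awesome"}
NEGATIVE_WORDS = {"worst", "hated", "frustrating", "horrible"}

def keyword_sentiment(text):
    """Label text by counting which word list it overlaps with more."""
    words = {w.strip(".,!?#@\"'").lower() for w in text.split()}
    pos_hits = len(words & POSITIVE_WORDS)
    neg_hits = len(words & NEGATIVE_WORDS)
    if pos_hits > neg_hits:
        return "POSITIVE"
    if neg_hits > pos_hits:
        return "NEGATIVE"
    return "NEUTRAL"

# A tweet from the negative subset of the dataset.
tweet = "it looks like a beautiful night to throw myself off the Brooklyn Bridge ---@Tim_Hecht"
print(keyword_sentiment(tweet))  # prints "POSITIVE"
```

The single word "beautiful" is enough to flip the label, whereas both models labeled this tweet NEGATIVE when they analyzed it directly.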