# **Author Sentiment Prediction With PerSenT Dataset** #

The goal of this project is to use large language models to predict the author's sentiment toward a target entity in news articles.

README:
- The Flan-T5 model was called through Hugging Face's Inference API: https://api-inference.huggingface.co/models/google/flan-t5-xxl
- To call the API, replace the authorization token in the `headers` cell below with your own.
- Zero-Shot Prompt: "classify the author sentiment on {row['TARGET_ENTITY']} as positive, neutral, or negative:{newline}{document}"
- Few-Shot Prompt: "classify the author sentiment on {row['TARGET_ENTITY']} as positive, neutral, or negative:{newline}{example1}{newline}{example2}{newline}{example3}{newline}{document}"
- Example 1 : "- Example: John Smith, a leading economist, has criticized President Biden's handling of the economy, saying that his policies have contributed to a major downturn in the job market. : This is a negative sentiment on President Biden."
- Example 2 : "- Example: President Smith has committed to taking strong action against climate change, saying that it is one of the greatest threats facing humanity. : This is a positive sentiment on President Smith."
- Example 3 : "- Example: A panel of experts evaluates Dr. Jane Doe's contributions to medical research. : This is a neutral sentiment on Dr. Jane Doe."
- To run the code below, point the `%cd` command at the Google Drive folder that contains the PerSenT dataset, available here: https://stonybrooknlp.github.io/PerSenT/
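
The two prompt templates above differ only in the example lines placed between the instruction and the document. A minimal sketch of that assembly follows; `build_prompt` and its parameter names are hypothetical helpers introduced for illustration and do not appear in the notebook.

```python
def build_prompt(target_entity, document, examples=()):
    """Assemble the zero-shot prompt, or the few-shot prompt when examples are given."""
    instruction = (
        f"classify the author sentiment on {target_entity} "
        "as positive, neutral, or negative:"
    )
    # Instruction, optional example lines, then the document, one per line.
    return "\n".join([instruction, *examples, document])
```

Passing the three example strings listed above as `examples` should reproduce the few-shot prompt; leaving `examples` empty gives the zero-shot prompt.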

## **Zero-Shot flan-t5-xxl** ##

In [None]:
import pandas as pd
import requests
import json
from google.colab import drive
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report

In [None]:
drive.mount('/content/drive',force_remount=True)
%cd "/content/drive/MyDrive/NLP Final Project"

Mounted at /content/drive
/content/drive/MyDrive/NLP Final Project


In [None]:
df = pd.read_csv('random_test.csv')

In [None]:
API_URL = "https://api-inference.huggingface.co/models/google/flan-t5-xxl"
headers = {"Authorization": "Bearer <YOUR_HF_TOKEN>"}  # replace with your own Hugging Face token

In [None]:
def query(payload):
    # Send the prompt to the hosted model and return the parsed JSON response.
    response = requests.post(API_URL, headers=headers, json=payload)
    return json.loads(response.content.decode("utf-8"))
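
The `query` helper makes a single request, so a rate-limited response interrupts the run. One possible mitigation is to retry with exponential backoff. The sketch below is not part of the original notebook: `query_with_retry`, `max_retries`, `base_delay`, and the injected `sleep` are hypothetical names, and it assumes that a successful call returns a list (as the `[0]['generated_text']` indexing in this notebook suggests) while a failed call returns something else, such as an error dictionary.

```python
import time


def query_with_retry(query_fn, payload, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call query_fn(payload) until it returns a list, backing off between attempts."""
    for attempt in range(max_retries):
        result = query_fn(payload)
        # Assumption: a successful call returns a list of generations.
        if isinstance(result, list):
            return result
        # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
        if attempt < max_retries - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"no successful response after {max_retries} attempts")
```

Taking `query_fn` and `sleep` as arguments keeps the helper independent of the network, so it can be tried out with a stub before pointing it at the real `query` function.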

In [None]:
# Partial prediction results are saved manually because of Hugging Face's rate limit.
predictions = {}

In [None]:
newline = '\n'
def run_flan_t5_xxl_prompting(df):
    for index, row in df.iterrows():
        # Resume after the last row completed before the Hugging Face rate limit was hit.
        if index <= 546:
            continue
        document = row['DOCUMENT']
        # Keep at most the first 3500 characters of the article.
        document = document[:3500]
        prompt = f"classify the author sentiment on {row['TARGET_ENTITY']} as positive, neutral, or negative:{newline}{document}"
        parameters = {'inputs': prompt}
        prediction = query(parameters)[0]['generated_text']
        predictions[index] = prediction
        print(index, len(document), document[0:10], prediction)
    return df
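
The hard-coded `index <= 546` check has to be edited by hand after every interruption. A possible alternative, sketched below under assumptions, is to write the predictions to disk after each row and skip rows that are already present. `load_checkpoint`, `run_with_checkpoint`, `predict_fn`, and the checkpoint path are hypothetical names and are not part of the notebook.

```python
import json
import os


def load_checkpoint(path):
    """Return previously saved predictions, or an empty dict if none exist."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        # JSON object keys are strings, so convert them back to int row indices.
        return {int(k): v for k, v in json.load(f).items()}


def run_with_checkpoint(rows, predict_fn, path):
    """rows: iterable of (index, prompt) pairs; predict_fn: prompt -> label."""
    predictions = load_checkpoint(path)
    for index, prompt in rows:
        if index in predictions:
            continue  # already completed in an earlier session
        predictions[index] = predict_fn(prompt)
        # Save after every row so an interruption loses at most one call.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(predictions, f)
    return predictions
```

If `predict_fn` raises partway through, rerunning the same call picks up at the first row missing from the checkpoint file.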

In [None]:
df = run_flan_t5_xxl_prompting(df)

In [None]:
predictions

In [None]:
dic = pd.DataFrame.from_dict(predictions, orient='index', columns=['prediction'])
dic

In [None]:
dic.to_csv('temp_result.csv')

In [None]:
values = list(predictions.values())

In [None]:
df['predictions'] = values
df.to_csv('random_test_output.csv', index=False)

## **Few-Shot flan-t5-xxl** ##

In [None]:
df = pd.read_csv('random_test.csv')

In [None]:
# Partial prediction results are saved manually because of Hugging Face's rate limit.
few_predictions = {}

In [None]:
newline = '\n'
def few_run_flan_t5_xxl_prompting(df):
    for index, row in df.iterrows():
        # Resume after the last row completed before the Hugging Face rate limit was hit.
        if index <= 507:
            continue
        document = row['DOCUMENT']
        # Keep at most the first 3200 characters, leaving room for the three examples.
        document = document[:3200]
        example1 = "- Example: John Smith, a leading economist, has criticized President Biden's handling of the economy, saying that his policies have contributed to a major downturn in the job market. : This is a negative sentiment on President Biden."
        example2 = "- Example: President Smith has committed to taking strong action against climate change, saying that it is one of the greatest threats facing humanity. : This is a positive sentiment on President Smith."
        example3 = "- Example: A panel of experts evaluates Dr. Jane Doe's contributions to medical research. : This is a neutral sentiment on Dr. Jane Doe."
        prompt = f"classify the author sentiment on {row['TARGET_ENTITY']} as positive, neutral, or negative:{newline}{example1}{newline}{example2}{newline}{example3}{newline}{document}"
        parameters = {'inputs': prompt}
        prediction = query(parameters)[0]['generated_text']
        few_predictions[index] = prediction
        print(index, len(document), document[0:10], prediction)

In [None]:
few_run_flan_t5_xxl_prompting(df)

In [None]:
few_predictions

In [None]:
values = list(few_predictions.values())

In [None]:
df['predictions'] = values
df.to_csv('few_random_test_output.csv', index=False)

## **Performance Analysis** ##

In [None]:
df = pd.read_csv('random_test_output.csv')
answers = df['TRUE_SENTIMENT'].tolist()
predictions = df['predictions'].tolist()
labels = ['Positive', 'Neutral', 'Negative']
# Capitalize each prediction so it matches the label spelling in TRUE_SENTIMENT.
predictions = [prediction.capitalize() for prediction in predictions]
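
Capitalizing maps `positive` to `Positive`, but any model output that is not exactly one of the three labels (extra whitespace, a trailing period, or a different word) would still fall outside `labels`. A minimal sketch of a stricter normalization follows; `normalize_label` and the `Other` fallback are hypothetical and not part of the notebook.

```python
LABELS = ("Positive", "Neutral", "Negative")


def normalize_label(raw, fallback="Other"):
    """Map a raw model output to one of LABELS, or to the fallback value."""
    # Drop surrounding whitespace and a trailing period, then fix the casing.
    cleaned = raw.strip().rstrip(".").capitalize()
    return cleaned if cleaned in LABELS else fallback
```

Counting how many predictions land on the fallback value is a quick way to check whether the model ever answers outside the requested label set.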

In [None]:
print(classification_report(answers, predictions, labels=labels))

              precision    recall  f1-score   support

    Positive       0.66      0.69      0.68       293
     Neutral       0.44      0.05      0.09       213
    Negative       0.25      0.84      0.38        73

   micro avg       0.47      0.47      0.47       579
   macro avg       0.45      0.53      0.38       579
weighted avg       0.53      0.47      0.42       579
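
The Negative row pairs high recall with low precision, which is the pattern expected when a label is predicted far more often than it occurs. To make the definitions concrete, the sketch below computes per-class precision, recall, and F1 from scratch. `per_class_scores` is a hypothetical helper, and the toy labels are made up for illustration; they are not drawn from the dataset.

```python
def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, f1) for a single label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # correctly predicted
    fp = sum(t != label and p == label for t, p in pairs)  # predicted but wrong
    fn = sum(t == label and p != label for t, p in pairs)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data: "Negative" is predicted three times but is correct only once.
y_true = ["Negative", "Negative", "Positive", "Neutral"]
y_pred = ["Negative", "Positive", "Negative", "Negative"]
precision, recall, f1 = per_class_scores(y_true, y_pred, "Negative")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.33 recall=0.50 f1=0.40
```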



In [None]:
df = pd.read_csv('few_random_test_output.csv')
few_answers = df['TRUE_SENTIMENT'].tolist()
few_predictions = df['predictions'].tolist()
labels = ['Positive', 'Neutral', 'Negative']
# Capitalize each prediction so it matches the label spelling in TRUE_SENTIMENT.
few_predictions = [prediction.capitalize() for prediction in few_predictions]

In [None]:
print(classification_report(few_answers, few_predictions, labels=labels))

              precision    recall  f1-score   support

    Positive       0.64      0.68      0.66       293
     Neutral       0.48      0.05      0.09       213
    Negative       0.24      0.79      0.37        73

   micro avg       0.46      0.46      0.46       579
   macro avg       0.45      0.51      0.37       579
weighted avg       0.53      0.46      0.41       579



## **Hypothesis Testing** ##

### **Long Inputs** ###

In [None]:
df = pd.read_csv('random_test_output.csv')
articles = df['DOCUMENT'].tolist()
true_sentiment = df['TRUE_SENTIMENT'].tolist()
predictions = df['predictions'].tolist()
for index, value in enumerate(articles):
    if len(articles[index]) > 3500:
        print(articles[index], true_sentiment[index], predictions[index])

In [None]:
labels = ['Positive', 'Neutral', 'Negative']
short_true_sentiment = []
short_predictions = []
for index, value in enumerate(articles):
    if len(articles[index]) <= 3500:
        short_true_sentiment.append(true_sentiment[index])
        short_predictions.append(predictions[index].capitalize())
print(classification_report(short_true_sentiment, short_predictions, labels=labels))

              precision    recall  f1-score   support

    Positive       0.65      0.69      0.67       254
     Neutral       0.46      0.06      0.11       185
    Negative       0.24      0.82      0.38        62

    accuracy                           0.47       501
   macro avg       0.45      0.52      0.38       501
weighted avg       0.53      0.47      0.42       501



### **Neutral Sentiment** ###

In [None]:
no_neutral_true_sentiment = []
no_neutral_predictions = []
for index, value in enumerate(articles):
    if true_sentiment[index] != 'Neutral':
        no_neutral_true_sentiment.append(true_sentiment[index])
        no_neutral_predictions.append(predictions[index].capitalize())
print(classification_report(no_neutral_true_sentiment, no_neutral_predictions, labels=labels))

              precision    recall  f1-score   support

    Positive       0.95      0.69      0.80       293
     Neutral       0.00      0.00      0.00         0
    Negative       0.44      0.84      0.58        73

   micro avg       0.72      0.72      0.72       366
   macro avg       0.46      0.51      0.46       366
weighted avg       0.85      0.72      0.75       366





### **Mixed Sentiment** ###

In [None]:
document_without_negative_sentiment = " Meanwhile  the creation of a joint cabinet and punishment of coup leaders remain dependent on Zelaya's return to the presidency  still far from certain four months into the standoff that emerged from the coup. Union leader Juan Barahona  one of Zelaya's top three negotiators  told a rally of hundreds of the president's followers that the joint cabinet  if indeed formed  would be made up of ministers from both governments. The formation of a national unity government and amnesty for crimes linked to the coup were two key points of the San Jose reconciliation agenda set out in August  whose central tenet calls for Zelaya's return to office. The resumption of talks on Tuesday will come just two days before the October 15 deadline given by the Zelaya camp for his unconditional return to power. Reinstating him any later  supporters say  risks causing a delay in presidential and legislative elections planned for November 29. \"I do not understand the three-day break \" Zelaya's wife Xiomara Castro told AFP from within the Brazilian embassy  where the deposed leader has been holed up since his surprise return to the capital on September 21. A diplomatic delegation from the Organization of American States left Honduras Thursday without resolving the political impasse between Micheletti and Zelaya  who was forced out of the country at gunpoint. A rancher known for his trademark white cowboy hat  Zelaya veered to the left after his election and alarmed conservatives by aligning himself with leftist Venezuelan President Hugo Chavez. They feared Zelaya was seeking to change the constitution to allow himself to seek reelection."
prompt = f"classify the author sentiment on Manuel Zelaya as positive, neutral, or negative:{newline}{document_without_negative_sentiment}"
parameters = {'inputs': prompt}
prediction = query(parameters)[0]['generated_text']
print(prediction)

negative
