## **Sentiment Analysis**

In this notebook, we focus on the sentiment analysis performance metric (more about this in the readme.md file).

### **Steps to run this notebook:**

- **Step 1:** Install the libraries and load the data
- **Step 2:** Prompt the text-generating LLM using the prompt below
- **Step 3:** Compute the scores and export the results
- **Step 4:** Combine everything into one function

### **Step 1:** Install the libraries and load the data

In [1]:
# Uncomment to install `datasets` (only needed to sample new reviews/labels later)
# !pip install datasets

In [2]:
# Import Libraries
import pandas as pd
from datasets import load_dataset
import random

In [10]:
# Load the IMDb dataset from Hugging Face datasets
# dataset = load_dataset("imdb")

In [11]:
# Select a random subset of 10 reviews and their labels
# random_indices = random.sample(range(len(dataset['train'])), 10)
# random_reviews = [dataset['train'][i]['text'] for i in random_indices]
# random_labels = [dataset['train'][i]['label'] for i in random_indices]

# Convert label indices to readable labels (the IMDb train split only uses 0 and 1)
# label_mapping = {
#     0: 'Negative',
#     1: 'Positive',
# }
# random_labels = [label_mapping[label] for label in random_labels]
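The commented cells above draw a different random sample on every run. A minimal sketch of a reproducible variant, assuming a fixed seed (the seed value and the helper name `sample_reviews` are hypothetical) and using small stand-in lists instead of the IMDb split:

```python
import random

def sample_reviews(texts, labels, k, seed=42):
    """Return k reviews and their readable labels, chosen reproducibly (hypothetical helper)."""
    rng = random.Random(seed)  # local RNG, so the global random state is left untouched
    indices = rng.sample(range(len(texts)), k)
    label_mapping = {0: "Negative", 1: "Positive"}  # IMDb train labels are binary
    return [texts[i] for i in indices], [label_mapping[labels[i]] for i in indices]

# Stand-in data: odd-numbered reviews are positive
texts = [f"review {i}" for i in range(20)]
labels = [i % 2 for i in range(20)]

reviews_a, labels_a = sample_reviews(texts, labels, k=5)
reviews_b, labels_b = sample_reviews(texts, labels, k=5)
print(reviews_a == reviews_b)  # same seed, same sample
```

Using a dedicated `random.Random` instance keeps the sample stable even if other cells also call `random`.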

In [12]:
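# Save the sample; index=False avoids an extra "Unnamed: 0" column when the CSV is read back
# df = pd.DataFrame({'Review': random_reviews, 'Ground_Truth_Label': random_labels})
# df.to_csv("dataset_sample_movie_reviews.csv", index=False)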
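Writing a CSV with the default `index=True` and reading it back is the usual source of `Unnamed: ...` columns like the ones in the outputs below. A small self-contained illustration using an in-memory buffer (`round_trip` and `demo` are illustrative names, not part of the notebook):

```python
import io
import pandas as pd

demo = pd.DataFrame({
    "Review": ["Great film", "Dull plot"],
    "Ground_Truth_Label": ["Positive", "Negative"],
})

def round_trip(frame, read_kwargs=None, **write_kwargs):
    """Write a DataFrame to an in-memory CSV and read it back."""
    buf = io.StringIO()
    frame.to_csv(buf, **write_kwargs)
    buf.seek(0)
    return pd.read_csv(buf, **(read_kwargs or {}))

# Default to_csv writes the index as an unnamed first column
with_index = round_trip(demo)
print(list(with_index.columns))  # ['Unnamed: 0', 'Review', 'Ground_Truth_Label']

# index=False writes only the data columns
without_index = round_trip(demo, index=False)
print(list(without_index.columns))  # ['Review', 'Ground_Truth_Label']

# For a file already written with its index, index_col=0 reads it back as the index
restored = round_trip(demo, read_kwargs={"index_col": 0})
print(list(restored.columns))  # ['Review', 'Ground_Truth_Label']
```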

In [9]:
df = pd.read_csv("./content/dataset_sample_movie_reviews_v2.csv")  # on Colab, use "/content/dataset_sample_movie_reviews_v2.csv"

In [13]:
testing_array = df['Review'].values
print(testing_array)
print(len(testing_array))

['Just got through watching this version of "Samhain", and even though I still like it, it\'s nothing like the "rough cut" version I have. If you check the message board, you\'ll see an apology from the director for this cut down version, 79 minutes., and he says he had nothing to do with this R-rated trimmed down edit with a completely new screwed up ending. Christian really doesn\'t need to distant himself that much, because the basic gore elements still stand up, even though highly trimmed down. This is a damn shame, because this had the potential of being one of the goriest and best gore films in years. It still has the porn stars, and the inbreds, and some of the extreme gore can at least be partially seen. I\'m just glad I have that "rough cut", because to me, it\'s a jewel for any gorehounds library. Christian Viel definitely has the skill and vision to deliver the goods, and hopefully his next project will be better produced. The idiots had a near classic in their hands, and sc

### **Step 2:** Prompt the text-generating LLM using the prompt below


**Query the text-generating LLM with the following prompt** (paste the reviews in place of `PASTE_SENTENCES_HERE`):

```
Please classify the following 10 sentences: positive, negative or neutral. Here are the sentences:
PASTE_SENTENCES_HERE. please return the answers as an array
```
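Instead of pasting the reviews by hand, the same prompt can be assembled in code. A minimal sketch, assuming the reviews are listed one per line with a number in front (the numbering format and the helper name `build_prompt` are illustrative choices, not part of the original workflow):

```python
PROMPT_TEMPLATE = (
    "Please classify the following {n} sentences: positive, negative or neutral. "
    "Here are the sentences:\n{sentences}\nplease return the answers as an array"
)

def build_prompt(reviews):
    """Fill the prompt template with numbered reviews (hypothetical helper)."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(reviews, start=1))
    return PROMPT_TEMPLATE.format(n=len(reviews), sentences=numbered)

prompt = build_prompt(["Loved every minute.", "A tedious mess."])
print(prompt)
```

In the notebook, `build_prompt(df['Review'].tolist())` would produce the prompt for all loaded reviews.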

### **Step 3:** Compute the scores and export the results


In [14]:
# Paste the labels returned by the LLM, in the same order as the reviews
predicted_labels = ['Positive', 'Negative', 'Neutral', 'Negative', 'Negative', 'Negative', 'Positive', 'Neutral', 'Negative', 'Positive']
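If the LLM replies with a Python-style list, the labels can be parsed rather than retyped. A sketch using `ast.literal_eval`, assuming the first `[...]` in the reply is the label array (`parse_labels` is a hypothetical helper; replies in other formats would need a different parser):

```python
import ast
import re

def parse_labels(reply):
    """Extract the first [...] list literal from an LLM reply and normalize its labels."""
    match = re.search(r"\[.*?\]", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("No array found in the reply")
    labels = ast.literal_eval(match.group(0))  # evaluates Python literals only, not arbitrary code
    return [str(label).strip().capitalize() for label in labels]

reply = "Here you go: predicted_labels = ['positive', 'Negative', 'neutral']"
print(parse_labels(reply))  # ['Positive', 'Negative', 'Neutral']
```

Capitalizing matches the `Positive`/`Negative` spelling of `Ground_Truth_Label`, which the exact comparison below depends on.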

In [15]:
# Store the predictions next to the ground truth and compute accuracy
df['Predicted_Labels'] = predicted_labels
correct_predictions = sum(df['Ground_Truth_Label'] == df['Predicted_Labels'])  # exact, case-sensitive match
total_reviews = len(df)
accuracy = correct_predictions / total_reviews
print("Total Score:", accuracy)
print("\nDataFrame with 10 random reviews:")

Total Score: 0.7

DataFrame with 10 random reviews:
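The "Total Score" above is plain accuracy: the share of reviews whose predicted label exactly matches the ground truth. With $N$ reviews, ground truth $y_i$ and prediction $\hat{y}_i$, and 7 of the 10 reviews matching here:

$$
\text{accuracy} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\left[\hat{y}_i = y_i\right] = \frac{7}{10} = 0.7
$$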


In [16]:
df

| Unnamed: 0.1 | Unnamed: 0 | Review | Ground_Truth_Label | Predicted_Labels |
|---|---|---|---|---|
| 0 | 0 | Just got through watching this version of "Sam... | Positive | Positive |
| 1 | 1 | In this forgettable trifle, the 40-ish Norma S... | Negative | Negative |
| 2 | 2 | Peter O'Toole is a treat to watch in roles whe... | Positive | Neutral |
| 3 | 3 | This is one of the greatest sports movies ever... | Positive | Negative |
| 4 | 4 | First of all this movie is not a comedy; unles... | Negative | Negative |
| 5 | 5 | Without a doubt this is one of the worst films... | Negative | Negative |
| 6 | 6 | 'Deliverance' is a brilliant condensed epic of... | Positive | Positive |
| 7 | 7 | The arrival of White Men in Arctic Canada chal... | Positive | Neutral |
| 8 | 8 | Curiously, Season 6 of the Columbo series cont... | Negative | Negative |
| 9 | 9 | The year 2005 saw no fewer than 3 filmed produ... | Positive | Positive |
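Accuracy alone hides where the errors fall. A cross-tabulation of ground truth against predictions with `pd.crosstab` shows that all three misses here are positive reviews: two labeled Neutral and one labeled Negative. The label lists below are copied from the table above:

```python
import pandas as pd

ground_truth = ["Positive", "Negative", "Positive", "Positive", "Negative",
                "Negative", "Positive", "Positive", "Negative", "Positive"]
predicted = ["Positive", "Negative", "Neutral", "Negative", "Negative",
             "Negative", "Positive", "Neutral", "Negative", "Positive"]

results = pd.DataFrame({"Ground_Truth_Label": ground_truth, "Predicted_Labels": predicted})

# Rows: ground truth, columns: prediction, cells: number of reviews
confusion = pd.crosstab(results["Ground_Truth_Label"], results["Predicted_Labels"])
print(confusion)

# Rows the model got wrong, for manual inspection
misses = results[results["Ground_Truth_Label"] != results["Predicted_Labels"]]
print(misses)
```

In the notebook itself, `pd.crosstab(df['Ground_Truth_Label'], df['Predicted_Labels'])` should give the same table.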


In [17]:
model_name = "Chat GPT"
output_filename = "chat_gpt_sentiment.csv"

In [18]:
new_data = {
    'model_name': model_name,
    'accuracy': [accuracy]
}
new_df = pd.DataFrame(new_data)
new_df.to_csv(output_filename, index=False)
print(new_df)

  model_name  accuracy
0   Chat GPT       0.7


### **Step 4:** Combine everything into one function

In [19]:
import os

import pandas as pd

def calculate_and_export_sent_analysis(model_name, opt_result,
                                       data_path="./content/dataset_sample_movie_reviews.csv"):
    """Score LLM sentiment labels against the ground truth and save the accuracy to ./results/<model_name>.csv."""
    df = pd.read_csv(data_path)  # Adjust path if necessary
    # Normalize predicted and ground-truth labels (lowercase, no surrounding spaces)
    predicted_labels = [str(label).strip().lower() for label in opt_result]
    if len(predicted_labels) != len(df):
        raise ValueError(f"Expected {len(df)} labels, got {len(predicted_labels)}")
    df['Predicted_Labels'] = predicted_labels
    df['Ground_Truth_Label'] = df['Ground_Truth_Label'].str.strip().str.lower()
    # Count correct predictions
    correct_predictions = sum(df['Ground_Truth_Label'] == df['Predicted_Labels'])
    total_reviews = len(df)
    accuracy = correct_predictions / total_reviews
    # Create a one-row DataFrame with the model name and its accuracy
    new_df = pd.DataFrame({'model_name': [model_name], 'accuracy': [accuracy]})
    # Export to CSV, creating the results folder if it does not exist yet
    os.makedirs("./results", exist_ok=True)
    output_filename = f"./results/{model_name}.csv"
    new_df.to_csv(output_filename, index=False)

    return new_df
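`calculate_and_export_sent_analysis` assumes the LLM returned exactly one label per review, spelled as expected. A hypothetical guard (`validate_labels` is not part of the original notebook) that could be run on the LLM output first, rejecting a wrong count or unexpected labels:

```python
def validate_labels(labels, expected_count, allowed=("positive", "negative", "neutral")):
    """Normalize labels and check their count and vocabulary (hypothetical helper)."""
    normalized = [str(label).strip().lower() for label in labels]
    if len(normalized) != expected_count:
        raise ValueError(f"Expected {expected_count} labels, got {len(normalized)}")
    unknown = sorted(set(normalized) - set(allowed))
    if unknown:
        raise ValueError(f"Unexpected labels: {unknown}")
    return normalized

print(validate_labels(["Positive", " negative "], expected_count=2))  # ['positive', 'negative']
```

Failing loudly here is preferable to a silently lower accuracy caused by a typo such as `"postive"` in the LLM reply.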

In [23]:
model_name = "chat_gpt"
# Display the reviews that go into the prompt below
df = pd.read_csv("./content/dataset_sample_movie_reviews.csv")  # Adjust path if necessary
df[["Review", 'Ground_Truth_Label']]
# Click the icon next to the output to convert this DataFrame to an interactive table, then select "Copy table" (on the right) and choose JSON.
# Remove the Ground_Truth_Label values before pasting; otherwise the prompt reveals the answers to the LLM.
# Paste the reviews in place of PASTE_SENTENCES_HERE in the prompt cell below, then send the whole prompt to the LLM.

| Unnamed: 0 | Review | Ground_Truth_Label |
|---|---|---|
| 0 | I wanted to like this movie. But it falls apar... | Negative |
| 1 | Even if it were remotely funny, this mouldy wa... | Negative |
| 2 | This is an excellent film and one should not b... | Positive |
| 3 | Despite having known people who are either gre... | Negative |
| 4 | One word: suPURRRRb! I don't think I have see ... | Positive |
| 5 | The early career of Abe Lincoln is beautifully... | Positive |
| 6 | I bought this at tower records after seeing th... | Negative |
| 7 | Steven Spielberg produced, wrote, came up with... | Positive |
| 8 | When I took my seat in the cinema I was in a c... | Positive |
| 9 | I wonder how the actors acted in this movie. A... | Negative |
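As an alternative to copying the interactive table, the reviews alone can be serialized with `DataFrame.to_json`, which keeps the ground-truth labels out of the prompt. A sketch on two stand-in reviews (the template string paraphrases the prompt below and is an assumption, not the exact wording used for the reported score):

```python
import pandas as pd

# Two stand-in reviews; in the notebook this would be the df loaded above
reviews = pd.DataFrame({
    "Review": ["I wanted to like this movie.", "This is an excellent film."],
    "Ground_Truth_Label": ["Negative", "Positive"],
})

# Serialize only the Review column; orient="records" gives a list of {"Review": ...} objects
reviews_json = reviews[["Review"]].to_json(orient="records")

prompt_template = (
    "Please classify the following sentences: positive or negative. "
    "Here are the sentences: PASTE_SENTENCES_HERE. please return the answers as an array"
)
prompt = prompt_template.replace("PASTE_SENTENCES_HERE", reviews_json)
print(prompt)
```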


In [27]:
# Prompt: Please classify the following 10 sentences: positive or negative. Please provide the answer as an array, e.g. predicted_labels = ['positive', 'negative', ...]. Here are the sentences: PASTE_SENTENCES_HERE. Please return the answers as an array.

In [26]:
# Labels returned by the LLM for the 10 reviews above
predicted_labels = ['negative', 'negative', 'positive', 'negative', 'positive', 'positive', 'negative', 'negative', 'positive', 'negative']

In [27]:
# Score the predictions and export the accuracy to ./results/<model_name>.csv
calculate_and_export_sent_analysis(model_name, predicted_labels)

| | model_name | accuracy |
|---|---|---|
| 0 | chat_gpt | 0.9 |


In [30]:
df = pd.read_csv(f"./results/{model_name}.csv")  # reload the exported file (on Colab, ./results resolves to /content/results)
df

| | model_name | accuracy |
|---|---|---|
| 0 | chat_gpt | 0.9 |
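Each model's score ends up in its own file under `./results/`. A sketch for collecting them into one comparison table, assuming every CSV in that folder has the `model_name` and `accuracy` columns written by `calculate_and_export_sent_analysis` (`collect_results` is a hypothetical helper):

```python
from pathlib import Path

import pandas as pd

def collect_results(results_dir="./results"):
    """Concatenate per-model result CSVs and sort them by accuracy (hypothetical helper)."""
    files = sorted(Path(results_dir).glob("*.csv"))
    if not files:
        return pd.DataFrame(columns=["model_name", "accuracy"])
    combined = pd.concat([pd.read_csv(path) for path in files], ignore_index=True)
    # Best model first; ignore_index renumbers the rows 0..n-1 after sorting
    return combined.sort_values("accuracy", ascending=False, ignore_index=True)
```

Calling `collect_results()` after scoring several models would give a single leaderboard-style DataFrame.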
