## **Sentiment Analysis**

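This notebook measures an LLM's sentiment analysis performance: it compares the labels the model assigns to a sample of movie reviews against the ground-truth labels and reports the accuracy (see `readme.md` for more on this metric).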

### **Steps to run this Notebook:**

- **Step 1:** Install the libraries and load the data
- **Step 2:** Prompt the text-generation LLM using the prompt given below
- **Step 3:** Compute the scores and save the results
- **Step 4:** Combine everything into one function

### **Step 1:** Install the libraries and load the data

In [12]:
# Keep to generate other reviews/labels in the future
# !pip install datasets

In [13]:
# Import Libraries
import pandas as pd
from datasets import load_dataset
import random

In [14]:
# Load the IMDb dataset from Hugging Face datasets
# dataset = load_dataset("imdb")

In [18]:
# Select a subset of reviews and their labels
# random_indices = random.sample(range(len(dataset['train'])), 10)
# random_reviews = [dataset['train'][i]['text'] for i in random_indices]
# random_labels = [dataset['train'][i]['label'] for i in random_indices]

# Convert label indices to label names
# (IMDb is a binary dataset: the ground truth only contains labels 0 and 1, never 'Neutral')
# label_mapping = {
#     0: 'Negative',
#     1: 'Positive',
# }
# random_labels = [label_mapping[label] for label in random_labels]

In [19]:
# df = pd.DataFrame({'Review': random_reviews, 'Ground_Truth_Label': random_labels})
# index=False avoids writing the row index as an extra 'Unnamed: 0' column
# df.to_csv("dataset_sample_movie_reviews.csv", index=False)
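
The sampling cell above draws different reviews on every run because `random.sample` is unseeded. Below is a minimal sketch of a reproducible variant. The helper `sample_reviews` and the seed value are assumptions for illustration and are not part of the original notebook. It works on plain lists, so it can be tried without downloading the dataset.

```python
import random

# IMDb ground-truth labels are binary: 0 = negative, 1 = positive.
LABEL_MAPPING = {0: "Negative", 1: "Positive"}

def sample_reviews(texts, labels, k=10, seed=42):
    """Return k (review, label name) pairs chosen reproducibly (hypothetical helper)."""
    rng = random.Random(seed)  # local generator; the global random state is untouched
    indices = rng.sample(range(len(texts)), k)
    return [(texts[i], LABEL_MAPPING[labels[i]]) for i in indices]

# Toy stand-ins for dataset['train']['text'] and dataset['train']['label']
toy_texts = [f"review {i}" for i in range(100)]
toy_labels = [i % 2 for i in range(100)]

sample = sample_reviews(toy_texts, toy_labels, k=10, seed=42)
```

Calling the helper again with the same seed returns the same ten pairs, which keeps the ground-truth file stable if it ever has to be regenerated.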

In [24]:
df = pd.read_csv("/content/dataset_sample_movie_reviews.csv") # Colab path; adjust if running elsewhere

In [25]:
testing_array = df['Review'].values
print(testing_array)
print(len(testing_array))

["I wanted to like this movie. But it falls apart in the middle. the whole premise is a good one and ties up nicely, but the middle runs off tangent. The people I watched with were getting annoyed while it ran off course, and hoping it would end sooner than it did. Another person actually fell asleep during the middle segment! I found myself day dreaming elsewhere during the Schtick parts that had nothing to do with the plot. I bought it for the eye candy and it delivered that well, but it lacks Pixar's writing and soul. I think kids 8 and under will enjoy the ride at face vaule, while missing the plot. People old enough to follow a plot will find it wonders too far to return quickly and easily. Edit out most of the middle section, make it 50 minutes and it would be a solid flick. I wish I had better things to say. But I don't"
 "Even if it were remotely funny, this mouldy waxwork of a film would still be soberingly disrespectful. Stopping just short of digging up the boys' corpses and

### **Step 2:** Prompt the text-generation LLM using the prompt given below


**Query the text-generation LLM with the following prompt** (replace `PASTE_SENTENCES_HERE` with the reviews):

```
Please classify the following 10 sentences: positive, negative or neutral. Here are the sentences:
PASTE_SENTENCES_HERE. Please return the answers as an array.
```
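
Instead of pasting the reviews by hand, the prompt can be assembled in code. The following is only a sketch: `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and numbering the reviews is an assumption made so that each answer can be matched back to its review.

```python
# Template mirroring the prompt above; {sentences} takes the place of PASTE_SENTENCES_HERE.
PROMPT_TEMPLATE = (
    "Please classify the following {n} sentences: positive, negative or neutral. "
    "Here are the sentences:\n"
    "{sentences}\n"
    "Please return the answers as an array."
)

def build_prompt(reviews):
    """Fill the template with a numbered list of reviews (hypothetical helper)."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(reviews, start=1))
    return PROMPT_TEMPLATE.format(n=len(reviews), sentences=numbered)

prompt = build_prompt(["Great film.", "Terrible pacing."])
print(prompt)
```

In the notebook, `build_prompt(testing_array)` would produce the full prompt for the ten sampled reviews.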

### **Step 3:** Compute the scores and save the results


In [32]:
# Paste the labels returned by the LLM here
predicted_labels = ['Positive', 'Negative', 'Neutral', 'Negative', 'Negative', 'Negative', 'Positive', 'Neutral', 'Negative', 'Positive']

In [27]:
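df['Predicted_Labels'] = predicted_labels
# Accuracy = share of reviews whose predicted label matches the ground truth exactly.
# The comparison is case-sensitive, so both columns must use the same capitalisation.
correct_predictions = sum(df['Ground_Truth_Label'] == df['Predicted_Labels'])
total_reviews = len(df)
accuracy = correct_predictions / total_reviews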
print("Total Score:", accuracy)
print("\nDataFrame with 10 random reviews:")

Total Score: 0.2

DataFrame with 10 random reviews:


In [28]:
df

Unnamed: 0.1,Unnamed: 0,Review,Ground_Truth_Label,Predicted_Labels
0,0,I wanted to like this movie. But it falls apar...,Negative,Positive
1,1,"Even if it were remotely funny, this mouldy wa...",Negative,Negative
2,2,This is an excellent film and one should not b...,Positive,Neutral
3,3,Despite having known people who are either gre...,Negative,Negative
4,4,One word: suPURRRRb! I don't think I have see ...,Positive,Negative
5,5,The early career of Abe Lincoln is beautifully...,Positive,Negative
6,6,I bought this at tower records after seeing th...,Negative,Positive
7,7,"Steven Spielberg produced, wrote, came up with...",Positive,Neutral
8,8,When I took my seat in the cinema I was in a c...,Positive,Negative
9,9,I wonder how the actors acted in this movie. A...,Negative,Positive
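
Accuracy alone hides where the errors come from. As a small sketch using only the standard library, with the labels copied from the table above, each (ground truth, prediction) pair can be tallied. The ground truth contains only Positive and Negative, so every Neutral prediction counts as an error.

```python
from collections import Counter

# Labels copied from the DataFrame shown above
ground_truth = ["Negative", "Negative", "Positive", "Negative", "Positive",
                "Positive", "Negative", "Positive", "Positive", "Negative"]
predicted = ["Positive", "Negative", "Neutral", "Negative", "Negative",
             "Negative", "Positive", "Neutral", "Negative", "Positive"]

# Fraction of exact matches, same definition as in the cell above
accuracy = sum(t == p for t, p in zip(ground_truth, predicted)) / len(ground_truth)

# Count how often each (truth, prediction) combination occurs
confusion = Counter(zip(ground_truth, predicted))
for (truth, pred), count in sorted(confusion.items()):
    print(f"truth={truth:<8} predicted={pred:<8} count={count}")
```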


In [29]:
model_name = "Chat GPT"
output_filename = "chat_gpt_sentiment.csv"

In [31]:
new_data = {
    'model_name': model_name,
    'accuracy': [accuracy]
}
new_df = pd.DataFrame(new_data)
new_df.to_csv(output_filename, index=False)
print(new_df)

  model_name  accuracy
0   Chat GPT       0.2


### **Step 4:** Combine everything into one function

In [25]:
import os
import pandas as pd

def calculate_and_export_sent_analysis(model_name, opt_result):
    """Score the predicted labels against the ground truth and export the accuracy to CSV."""
    df = pd.read_csv("./content/dataset_sample_movie_reviews.csv") # Adjust path if necessary
    # Fail early with a clear message if the LLM returned the wrong number of labels
    if len(opt_result) != len(df):
        raise ValueError(f"Expected {len(df)} labels, got {len(opt_result)}")
    # Lowercase both predicted and ground-truth labels so the comparison is case-insensitive
    predicted_labels = [label.lower() for label in opt_result]
    df['Predicted_Labels'] = predicted_labels
    df['Ground_Truth_Label'] = df['Ground_Truth_Label'].str.lower()
    # Count correct predictions
    correct_predictions = sum(df['Ground_Truth_Label'] == df['Predicted_Labels'])
    total_reviews = len(df)
    accuracy = correct_predictions / total_reviews
    # Create DataFrame with accuracy and model name
    new_data = {
        'model_name': [model_name],
        'accuracy': [accuracy]
    }
    new_df = pd.DataFrame(new_data)
    # Export to CSV, creating the results directory if it does not exist yet
    os.makedirs("./results", exist_ok=True)
    output_filename = f"./results/{model_name}.csv"
    new_df.to_csv(output_filename, index=False)

    return new_df
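
The function expects `opt_result` to already be a Python list. When the labels arrive as raw text copied from the LLM reply, a parsing step along these lines could sit in front of it. This is a sketch only: `parse_labels` and `ALLOWED_LABELS` are hypothetical names, and it assumes the reply contains a single Python-style list of quoted strings.

```python
import ast
import re

ALLOWED_LABELS = {"positive", "negative", "neutral"}

def parse_labels(raw_response, expected_count):
    """Extract and validate a list of sentiment labels from a raw reply (hypothetical helper)."""
    # Take the first [...] span; the reply may wrap the list in extra prose.
    match = re.search(r"\[.*?\]", raw_response, flags=re.DOTALL)
    if match is None:
        raise ValueError("No list found in the response")
    # literal_eval only accepts Python literals, so arbitrary code in the reply is not executed.
    labels = [str(label).strip().lower() for label in ast.literal_eval(match.group(0))]
    if len(labels) != expected_count:
        raise ValueError(f"Expected {expected_count} labels, got {len(labels)}")
    unknown = set(labels) - ALLOWED_LABELS
    if unknown:
        raise ValueError(f"Unexpected labels: {sorted(unknown)}")
    return labels

reply = "Sure! predicted_labels = ['Positive', 'negative', 'Neutral']"
labels = parse_labels(reply, expected_count=3)
```

The returned list is lowercase, which matches the normalisation applied inside `calculate_and_export_sent_analysis`.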

In [26]:
model_name = "chat_gpt"
# Display the reviews so they can be copied into the prompt
df = pd.read_csv("./content/dataset_sample_movie_reviews.csv") # Adjust path if necessary
df[["Review"]]
# In Colab, click the icon next to *Review* (convert this dataframe to an interactive table),
# then choose "copy table", select JSON and copy. Paste the result into the prompt in the cell below,
# replacing PASTE_SENTENCES_HERE, then copy the entire cell and send it to the LLM.

Unnamed: 0,Review
0,I wanted to like this movie. But it falls apar...
1,"Even if it were remotely funny, this mouldy wa..."
2,This is an excellent film and one should not b...
3,Despite having known people who are either gre...
4,One word: suPURRRRb! I don't think I have see ...
5,The early career of Abe Lincoln is beautifully...
6,I bought this at tower records after seeing th...
7,"Steven Spielberg produced, wrote, came up with..."
8,When I took my seat in the cinema I was in a c...
9,I wonder how the actors acted in this movie. A...


In [27]:
# Please classify the following 10 sentences: positive, negative or neutral. Please return the answers as an array, e.g. predicted_labels = ['positive', 'negative', ...]. Here are the sentences: PASTE_SENTENCES_HERE

In [28]:
# Paste the array returned by the LLM here
predicted_labels = ['negative', 'negative', 'positive', 'negative', 'positive', 'positive', 'negative', 'neutral', 'positive', 'negative']

In [29]:
# Example usage:
calculate_and_export_sent_analysis(model_name, predicted_labels)

Unnamed: 0,model_name,accuracy
0,chat_gpt,0.9


In [30]:
df = pd.read_csv(f"./results/{model_name}.csv")  # Adjust path if necessary
df

Unnamed: 0,model_name,accuracy
0,chat_gpt,0.9
