<a href="https://colab.research.google.com/github/OormiC/IMDB_review_sentiment/blob/main/Project1IMDB_LLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# IMDB review sentiment analysis using OpenAI
This project tests how different types of prompts affect the accuracy of an OpenAI model's sentiment predictions on IMDB reviews. Each prompt is scored by comparing the predicted sentiments with the actual labels in the dataset.

In [1]:
!pip install -q --upgrade datasets
!pip install openai==0.28

Collecting openai==0.28
  Downloading openai-0.28.0-py3-none-any.whl (76 kB)
Installing collected packages: openai
Successfully installed openai-0.28.0


In [2]:
# Import dependencies and imdb dataset
import datasets
from google.colab import userdata
import pandas as pd
import openai
openai.api_key = userdata.get('apiKey')
imdb = datasets.load_dataset("scikit-learn/imdb")


In [35]:
# Sample a small number of reviews to conserve tokens
train_df = imdb['train'].to_pandas().sample(n=20)
train_df

Unnamed: 0,review,sentiment
25018,I really don't understand who this movie is ai...,negative
32996,"This ""Debuted"" today on the SciFi channel and ...",negative
33636,"Spin-offs, for somebody who don't know, are no...",positive
1375,I was skimming over the list of films of Richa...,negative
43681,"""Capitães de Abril"" is a very good. The story ...",positive
47001,I watched like 8 or 9 Herzog movies and none o...,negative
6158,'The Big Snit' came into my life complete by a...,positive
578,This movie is so God-awful that it was literal...,negative
29629,My boyfriend and I rented this because we thou...,negative
23070,What a dog of a movie. Noni Hazelhurst's perfo...,negative
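
Because `sample(n=20)` draws a different set of reviews on every run, the accuracy figures in this notebook depend on which reviews happened to be picked. Below is a minimal sketch of a repeatable draw, assuming a fixed `random_state` is acceptable for the experiment. The small `demo_df` frame and the seed value 42 are illustrative stand-ins, not the notebook's data or settings.

```python
import pandas as pd

# Hypothetical stand-in for imdb['train'].to_pandas(); not the real dataset
demo_df = pd.DataFrame({
    "review": [f"review text {i}" for i in range(100)],
    "sentiment": ["positive" if i % 2 == 0 else "negative" for i in range(100)],
})

# Passing random_state makes the draw repeatable (42 is an arbitrary choice)
sample_a = demo_df.sample(n=20, random_state=42)
sample_b = demo_df.sample(n=20, random_state=42)

# Both draws select exactly the same rows
print(sample_a.index.equals(sample_b.index))
```

With a fixed seed, differences in accuracy between prompts can be attributed to the prompts rather than to a different sample.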


In [36]:
# Function to predict the sentiment of each sampled review
def classify_sentiment(input_prompt):
    # Initialize list to store predicted sentiments
    predicted_sentiments = []

    # Take the reviews and their actual labels from the sampled DataFrame
    reviews = train_df['review']
    actual_sentiments = train_df['sentiment']

    # Iterate over the sampled reviews
    for review in reviews:
        # Construct the full prompt
        full_prompt = f"{input_prompt}: {review}\nSentiment:"

        # Pass the full prompt to OpenAI API for sentiment analysis
        response = openai.Completion.create(
            model="gpt-3.5-turbo-instruct",
            prompt=full_prompt,
            max_tokens=1,
            temperature=0
        )
        # Extract the sentiment label from the output (removing any white space, and ensuring lower case)
        predicted_sentiment = response['choices'][0]['text'].strip().lower()

        # Add the predicted sentiment to the list
        predicted_sentiments.append(predicted_sentiment)

    # Create a DataFrame from the reviews and predicted sentiment labels
    classify_df = pd.DataFrame({
        'Review': reviews,
        'Actual_Sentiment': actual_sentiments,
        'Predicted_Sentiment': predicted_sentiments
    })

    return classify_df
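
The function above does two things besides calling the API: it builds the prompt from a fixed template, and it cleans up the returned text. Here is a small sketch of those two steps in isolation, with no API call made. The helper names `build_prompt` and `normalize_label`, the `"unknown"` fallback, and the example review are all hypothetical and not part of the notebook.

```python
def build_prompt(instruction, review):
    # Same template as classify_sentiment: instruction, review, then a cue for the label
    return f"{instruction}: {review}\nSentiment:"


def normalize_label(raw_text):
    # Mirror the notebook's clean-up: strip white space and lower-case the text
    label = raw_text.strip().lower()
    # Assumption: anything other than the two expected labels is flagged, not guessed
    return label if label in {"positive", "negative"} else "unknown"


prompt = build_prompt(
    "Classify the following movie review as either 'positive' or 'negative'",
    "A dull, overlong film.",  # made-up review for illustration
)
print(prompt)
print(normalize_label(" Positive"))
print(normalize_label("neutral"))
```

Flagging unexpected outputs separately would make it possible to tell a wrong label apart from a malformed one when scoring.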

In [37]:
# Function to compute the accuracy of the model's predictions, as a percentage
def accuracy_score(classified_df):
    accurate_predictions = 0
    for index, row in classified_df.iterrows():
        if row["Actual_Sentiment"] == row["Predicted_Sentiment"]:
            accurate_predictions += 1

    # Divide by the number of rows instead of a hard-coded sample size of 20
    prediction_score = (accurate_predictions / len(classified_df)) * 100
    return prediction_score
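
The score is the number of rows where the predicted label equals the actual label, divided by the number of rows and multiplied by 100. Below is a minimal vectorized sketch of that calculation on made-up labels. `accuracy_percent` and `toy_df` are hypothetical names used only for this illustration.

```python
import pandas as pd


def accuracy_percent(classified_df):
    # Element-wise comparison gives a boolean Series; its mean is the fraction of matches
    matches = classified_df["Actual_Sentiment"] == classified_df["Predicted_Sentiment"]
    return float(matches.mean() * 100)


# Made-up labels for illustration only: 3 of 4 predictions agree
toy_df = pd.DataFrame({
    "Actual_Sentiment": ["positive", "negative", "negative", "positive"],
    "Predicted_Sentiment": ["positive", "negative", "positive", "positive"],
})
print(f"{accuracy_percent(toy_df)} %")  # 75.0 %
```

Dividing by the length of the frame keeps the score correct if the sample size changes.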

In [38]:
# Put in a basic prompt and compare the actual and predicted sentiments
classifying_df = classify_sentiment("Sentiment analysis for the following text")
score = accuracy_score(classifying_df)
print(f"{score} %")

90.0 %


In [41]:
# Pretty good! Now put in a more specific prompt and see if there is any difference
classifying_df = classify_sentiment("Classify the following movie review as either 'positive' or 'negative")
score = accuracy_score(classifying_df)
print(f"{score} %")

95.0 %


In [48]:
# One off! Let's investigate the confusing review
classifying_df["Review"].to_list()[-2]

"One of the most interesting things is that this 1988 film is highly touted as an `in-name only' sequel. There's nothing wrong with that except this: The return of Chevy Chase as Ty Webb. This connects the viewer to this character (from the original Caddyshack in 1980,) and makes fans thinking or wanting Caddyshack II to be similar to the first one.<br /><br />There are rumors that Rodney Dangerfield was supposed to return. He carried a big part of the first film, so his return would have put Caddyshack 2 over the top. Jackie Mason is the `new' Rodney for this movie and does a decent job, even though their comic deliveries are way different. Dan Aykroyd was great but not in the film enough. He should have been involved to the tune of how much screen time Bill Murray got in the first one. Robert Stack (Airplane!) was good in the `new' Ted Knight/Villian role. (We miss you, Ted!) Danny Noonan should have been back. So many others could have returned to show us what happened to their char
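
The cell above picks out the misclassified review by its position in the list (`[-2]`), which only works once that position is known. A sketch of selecting the mismatches with a boolean mask instead, using a made-up results frame with the same three columns that `classify_sentiment` returns:

```python
import pandas as pd

# Made-up results for illustration; column names match classify_sentiment's output
results_df = pd.DataFrame({
    "Review": ["Loved it.", "Great cast, weak script, still fun.", "Terrible."],
    "Actual_Sentiment": ["positive", "positive", "negative"],
    "Predicted_Sentiment": ["positive", "negative", "negative"],
})

# Keep only the rows where the prediction disagrees with the label
mismatches = results_df[
    results_df["Actual_Sentiment"] != results_df["Predicted_Sentiment"]
]
print(mismatches)
```

The same mask applied to `classifying_df` would list every misclassified review regardless of where it falls in the sample.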

In [51]:
# Although the review is labelled 'positive', its tone is slightly confusing because it includes some negative comments; let's engineer a more elaborate prompt
classifying_df = classify_sentiment("Classify the following movie review as either 'positive' or 'negative' but keep in mind that some positive reviews may have some negative language. Make sure to judge how the review reads as a whole")
score = accuracy_score(classifying_df)
print(f"{score} %")
# No change in accuracy score

95.0 %


**Key takeaways**
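On this 20-review sample, a more specific prompt does better with this OpenAI model (`gpt-3.5-turbo-instruct`): stating that the classification is binary, either 'positive' or 'negative', raised accuracy from 90% to 95%, a difference of one review. Elaborating the prompt further, by warning that positive reviews may contain negative language, did not change the score. Make sure to restrict the output to one word (here with `max_tokens=1`) so it can be compared directly with the label. There is still some confusion when a review labelled 'positive' contains some negative language.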