### 🌌**Review Classification Example**

In this notebook, we will build a sentiment classification system for movie reviews using OpenAI's `gpt-4o-mini` model through LangChain.

**Objectives:**
- Load and prepare a dataset of movie reviews  
- Use a Language Model (LLM) to classify each review as `positive`, `negative`, or `neutral`  
- Store and organize the classified reviews in a structured `DataFrame`  
- Calculate and display statistics for each sentiment category  
- Visualize the sentiment distribution as a pie chart with soft, easy-to-read colors  

**Let's get started!**


In [None]:
%pip install --upgrade langchain langchain-openai pandas matplotlib

In [None]:
from langchain_openai.chat_models import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage
import pandas as pd
import matplotlib.pyplot as plt
import os

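# Keep a key that is already set in the environment; otherwise fall back to the
# placeholder, which must be replaced with a real key before calling the model.
os.environ.setdefault('OPENAI_API_KEY', "YOUR_OPENAI_API_KEY")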
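
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch of an alternative, the helper below reads the key from the environment and prompts for it only when it is missing. The name `ensure_api_key` and its `prompt_fn` parameter are hypothetical and not part of LangChain or the OpenAI SDK.

```python
import os
from getpass import getpass

def ensure_api_key(var_name="OPENAI_API_KEY", prompt_fn=getpass):
    """Return the key stored in the environment, prompting once if it is not set."""
    key = os.environ.get(var_name)
    if not key:
        # prompt_fn defaults to getpass, which hides the typed value
        key = prompt_fn(f"Enter {var_name}: ")
        os.environ[var_name] = key
    return key
```

Calling `ensure_api_key()` once before creating the model would replace the assignment above.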

#### **List of Movie Reviews**
This list contains a variety of positive, negative, and neutral movie reviews to be classified by the language model.

In [None]:
reviews = [
    {"review": "This is a great movie. I will watch it again."},
    {"review": "I love this movie!"},
    {"review": "I hate this movie."},
    {"review": "That was a waste of my time."},
    {"review": "I will never get that time back."},
    {"review": "This is a waste of money."},
    {"review": "I will never watch a movie by that director again."},
    {"review": "Absolutely fantastic! A must-watch."},
    {"review": "The storyline was captivating from start to finish."},
    {"review": "The acting was subpar and the plot was predictable."},
    {"review": "This movie stars an actor I know."},
    {"review": "I was on the edge of my seat the whole time."},
    {"review": "The cinematography was breathtaking."},
    {"review": "I wouldn't recommend this movie to anyone."},
    {"review": "A cinematic masterpiece!"},
    {"review": "The characters lacked depth and the dialogue was cheesy."},
    {"review": "A rollercoaster of emotions. Loved every minute of it."},
    {"review": "I fell asleep halfway through."},
    {"review": "The hype around this movie was undeserved."},
    {"review": "A refreshing take on a classic story."},
    {"review": "The pacing was slow and it dragged on."},
    {"review": "A visual treat with a compelling narrative."},
    {"review": "I regret buying a ticket for this."},
    {"review": "I watched the movie last night."},
    {"review": "The soundtrack was the only good thing about this movie."},
    {"review": "A forgettable experience."},
    {"review": "I saw the trailer but haven’t watched the movie yet."},
    {"review": "This movie left a lasting impression on me."},
    {"review": "The movie is two hours long."},
    {"review": "It was a movie."},
]

#### **Initializing the Model and System Prompt**
We start by creating a `ChatOpenAI` object configured with the `gpt-4o-mini` model. This model will classify the sentiment of each movie review.

Next, we define a `SystemMessage`, which acts as the instruction prompt for the model. This message tells the model exactly what to do: classify each review as **_positive_**, **_negative_**, or **_neutral_**, and return only one of those three words. Constraining the answer this way keeps the output consistent and easy to process in the next steps.

In [None]:
chat = ChatOpenAI(model="gpt-4o-mini")

system_prompt = SystemMessage(content="""
You are responsible for classification of movie reviews. 
Please classify the following review as one of the following sentiments:
    positive
    negative
    neutral

Only return one of the three words: positive, negative, or neutral.
""")

#### **Classifying Each Review**

In this step, we iterate through the list of movie reviews and classify each one using the model we initialized earlier.

For each review:

- We print the review text to the console (useful for debugging or tracking progress).
- We send both the system prompt and the actual review (as a `HumanMessage`) to the model using `chat.invoke(...)`.
- The model returns a response containing the predicted sentiment, which we clean and convert to lowercase.
- We check whether the response is one of the expected values: **`positive`**, **`negative`**, or **`neutral`**. If not, we raise an exception so that unexpected output is never stored silently.
- Finally, we store the original review and its classified sentiment in a list for later use.

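This process guarantees that every stored label is one of the three expected values and that the results are structured consistently. It does not by itself guarantee that each label is correct, since that depends on the model.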


In [None]:
classified_reviews = []

for review in reviews:
    print(f"Classifying: {review['review']}")
    response = chat.invoke([
        system_prompt,
        HumanMessage(content=review['review'])
    ])
    label = response.content.strip().lower()
    if label not in ['positive', 'negative', 'neutral']:
        raise ValueError(f"Unexpected classification: {label}")
    classified_reviews.append({
        'review': review['review'],
        'sentiment': label
    })
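
Models sometimes wrap the answer in quotes or add a trailing period, which would fail the membership check above even though the label is clear. The sketch below shows one way to make the cleaning step more tolerant. The helper `normalize_label` and the constant `VALID_LABELS` are hypothetical names introduced for illustration, and the function uses only the standard library.

```python
import string

VALID_LABELS = ("positive", "negative", "neutral")

def normalize_label(raw: str) -> str:
    """Lowercase the model output and strip surrounding whitespace, quotes, and punctuation."""
    cleaned = raw.strip().lower().strip(string.punctuation + string.whitespace)
    if cleaned not in VALID_LABELS:
        # Anything other than a single known label is still treated as an error
        raise ValueError(f"Unexpected classification: {raw!r}")
    return cleaned
```

In the loop, `normalize_label(response.content)` could stand in for the `strip().lower()` call and the `if` check.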

#### **Analyzing the Results**

After classifying all the reviews, we convert the list of results into a pandas `DataFrame`. This makes it easier to analyze and visualize the data.

- `pd.DataFrame(classified_reviews)` creates a table where each row contains a movie review and its corresponding sentiment.
- `value_counts()` is used to count how many times each sentiment (positive, negative, neutral) appears in the dataset.
- `value_counts(normalize=True) * 100` calculates the percentage distribution of each sentiment category, giving us insight into the overall sentiment trends in the reviews.

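This analysis step summarizes the output of the classification task. Because the reviews have no reference labels, the counts describe how the model labeled the data rather than how accurate those labels are.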

In [None]:
df = pd.DataFrame(classified_reviews)

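counts = df['sentiment'].value_counts()
percentages = df['sentiment'].value_counts(normalize=True) * 100

# Display the counts and the rounded percentages side by side
print(pd.DataFrame({'count': counts, 'percent': percentages.round(1)}))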
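
To make the arithmetic concrete, here is a small standard-library sketch of the same computation: count each label, then divide by the total and multiply by 100. The `sample_labels` list is hypothetical data chosen for illustration, not output from the model.

```python
from collections import Counter

# Hypothetical labels for illustration only
sample_labels = ["positive", "negative", "negative", "neutral"]

# most_common() sorts by descending count, as value_counts() does by default
sample_counts = dict(Counter(sample_labels).most_common())
total = sum(sample_counts.values())

# Share of each label as a percentage of all labels
sample_percentages = {label: 100 * n / total for label, n in sample_counts.items()}

print(sample_counts)
print(sample_percentages)
```

With four labels, two of them negative, the negative share is 2 / 4 * 100 = 50%, and the other two labels account for 25% each.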

#### **Visualizing the Sentiment Distribution**

To better understand the overall sentiment distribution, we create a pie chart using Matplotlib.

- We define a custom `color_map` to assign a distinct and soft color to each sentiment: green for positive, red for negative, and gray for neutral.
- The `colors` list maps the sentiment labels to their corresponding colors in the same order as the `counts` data.
- We use `plt.pie(...)` to generate the pie chart, showing the percentage of each sentiment category.
- The chart includes labels and percentage values (`autopct='%1.1f%%'`) for readability.
- Finally, we add a title and use `plt.tight_layout()` to ensure the layout fits well in the output window.

This visualization helps us quickly grasp the sentiment balance in the dataset.

In [None]:
color_map = {
    'positive': '#A3D9A5',  # soft green
    'negative': '#F5A9A9',  # soft red
    'neutral': '#D0D0D0'    # soft gray
}
colors = [color_map[sent] for sent in counts.index]

plt.style.use('default')
plt.figure(figsize=(6, 6))
plt.pie(counts, labels=counts.index, autopct='%1.1f%%', colors=colors)
plt.title('Sentiment Distribution')
plt.tight_layout()
plt.show()
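
Because `value_counts()` orders labels by frequency, the slice order in the pie chart can change from run to run. The sketch below shows one way to fix the order and skip labels that have no reviews. `LABEL_ORDER`, `ordered_slices`, and the example data are hypothetical names and values introduced for illustration.

```python
LABEL_ORDER = ["positive", "negative", "neutral"]  # assumed display order

def ordered_slices(counts_by_label, color_map):
    """Return (labels, sizes, colors) in LABEL_ORDER, skipping labels with no reviews."""
    present = [label for label in LABEL_ORDER if counts_by_label.get(label, 0) > 0]
    sizes = [counts_by_label[label] for label in present]
    colors = [color_map[label] for label in present]
    return present, sizes, colors

# Hypothetical counts for illustration only, not results from the notebook
example_counts = {"negative": 5, "positive": 3}
example_colors = {"positive": "#A3D9A5", "negative": "#F5A9A9", "neutral": "#D0D0D0"}

slice_labels, slice_sizes, slice_colors = ordered_slices(example_counts, example_colors)
print(slice_labels, slice_sizes, slice_colors)
```

In the notebook, `counts.to_dict()` could supply `counts_by_label`, and the three returned lists could be passed to `plt.pie` as the sizes, `labels`, and `colors` arguments.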