# Sentiment Analysis
in partnership with `Jupyter AI`

This notebook provides step-by-step instructions for replicating what Andrew did in the first demo. Please follow along, and feel free to play with different variations too! 


## Step 1: Understand Code for Data Analysis

- Run the cell below to load the customer reviews into a Pandas DataFrame

In [1]:
import pandas as pd

df = pd.read_csv("data/customer_reviews.csv")
df.head(5)

| Unnamed: 0 | product_id | customer_id | rating | review_text | review_date | product_category | verified_purchase |
|---|---|---|---|---|---|---|---|
| 0 | P001 | C1547 | 5 | Absolutely love this product! Exceeded all my ... | 2024-01-15 | Electronics | True |
| 1 | P001 | C2891 | 4 | Great product overall. Works as advertised. On... | 2024-01-18 | Electronics | True |
| 2 | P002 | C3456 | 1 | Very disappointed. Stopped working after just ... | 2024-01-20 | Electronics | True |
| 3 | P002 | C4123 | 2 | Not worth the price. Quality feels cheap and i... | 2024-01-22 | Electronics | False |
| 4 | P003 | C5678 | 5 | Perfect! Exactly what I needed. The design is ... | 2024-01-25 | Home & Kitchen | True |
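
When loading the file, it can help to parse `review_date` as a real datetime and confirm the column types before analyzing. The following is a minimal sketch, assuming the CSV has the columns shown in the output above. The inline sample rows are invented for illustration so the snippet runs without `data/customer_reviews.csv`, and `sample_df` is a hypothetical name.

```python
import io

import pandas as pd

# Illustrative sample with the same column names as the output above
# (review text is invented and shortened; this is not the real dataset).
sample_csv = io.StringIO(
    "product_id,customer_id,rating,review_text,review_date,product_category,verified_purchase\n"
    "P001,C1547,5,Love it,2024-01-15,Electronics,True\n"
    "P002,C3456,1,Stopped working,2024-01-20,Electronics,True\n"
    "P003,C5678,5,Perfect,2024-01-25,Home & Kitchen,True\n"
)

# parse_dates converts review_date from strings to datetime64 values.
sample_df = pd.read_csv(sample_csv, parse_dates=["review_date"])

# Inspect column types and count missing values before analysis.
print(sample_df.dtypes)
print(sample_df.isna().sum())
```

With the real file, the same `parse_dates` argument could be passed to the `pd.read_csv` call in the cell above.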


- Run the cell below to display the analysis
- Create a new chat by clicking `+Chat` (`Chat 3`)
- Drag and drop the code cell below to the chat interface, and use a prompt like this:
  > What does this code cell do? (starting with creating sentiment_category column)

In [2]:
# Create sentiment categories based on numeric rating
df['sentiment_category'] = pd.cut(
    df['rating'],
    bins=[0, 2, 3, 5],
    labels=['Negative', 'Neutral', 'Positive']
)

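# Group by category and sentiment, aggregating key metrics.
# observed=False keeps empty category/sentiment combinations in the result
# and avoids the pandas FutureWarning about the changing default.
analysis = (
    df.groupby(['product_category', 'sentiment_category'], observed=False)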
      .agg(
          rating_count=('rating', 'count'),
          product_id_nunique=('product_id', 'nunique')
      )
      .reset_index()
)

# Fill NaNs resulting from empty groups with zeros
analysis = analysis.fillna({
    'rating_count': 0,
    'product_id_nunique': 0
})

# Calculate percentage of negative reviews safely
category_counts = df.groupby('product_category').size()
negative_counts = (
    df[df['sentiment_category'] == 'Negative']
    .groupby('product_category')
    .size()
)
negative_pct = (negative_counts / category_counts * 100).fillna(0)

# Display results
print("Review analysis by category and sentiment:")
print(analysis)
print("\nPercentage of negative reviews by category:")
print(negative_pct.round(1))

Review analysis by category and sentiment:
   product_category sentiment_category  rating_count  product_id_nunique
0             Books           Negative             2                   1
1             Books            Neutral             0                   0
2             Books           Positive             2                   1
3          Clothing           Negative             1                   1
4          Clothing            Neutral             1                   1
5          Clothing           Positive             5                   3
6       Electronics           Negative             5                   3
7       Electronics            Neutral             0                   0
8       Electronics           Positive             6                   4
9   Health & Beauty           Negative             0                   0
10  Health & Beauty            Neutral             1                   1
11  Health & Beauty           Positive             5                   4
12   Hom

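
To see how the `pd.cut` call in the cell above maps ratings to labels, here is a small sketch on made-up ratings. With `bins=[0, 2, 3, 5]` and the default right-inclusive intervals, ratings 1 and 2 fall in `(0, 2]`, a rating of 3 falls in `(2, 3]`, and ratings 4 and 5 fall in `(3, 5]`. The `toy` DataFrame and its values are illustrative assumptions, not the notebook's data, and the percentage is computed with a boolean mean rather than the two `groupby(...).size()` calls used above.

```python
import pandas as pd

# Made-up ratings, chosen to hit every bin.
toy = pd.DataFrame({
    "product_category": ["Books", "Books", "Toys", "Toys", "Toys"],
    "rating": [1, 5, 2, 3, 4],
})

# Same bins and labels as the notebook cell.
toy["sentiment_category"] = pd.cut(
    toy["rating"],
    bins=[0, 2, 3, 5],
    labels=["Negative", "Neutral", "Positive"],
)
print(toy)

# Share of negative reviews per category, in percent.
# The comparison yields booleans; their mean is the negative fraction.
is_negative = toy["sentiment_category"] == "Negative"
negative_pct = is_negative.groupby(toy["product_category"]).mean() * 100
print(negative_pct.round(1))
```

Because the mean is taken over every review in a category, categories with no negative reviews come out as 0 directly, without a `fillna` step.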


## Step 2: Summarize Customer Reviews

- In the same chat you created in the previous step, drag and drop the Markdown cell below and use a prompt like this:
  > Follow the instructions in the Markdown cell to generate code (3 notebook cells) to analyze the data.
___

Create code that extracts reviews from a Pandas DataFrame and summarizes the issues reported by customers. Create a code cell for each step:

**Step 1: Extract the reviews in a list**

The reviews are saved in the column `review_text` of the DataFrame `df`. Extract all reviews in a list `combined_reviews`.

**Step 2: Examine the reviews** 

Use OpenAI’s gpt-4.1-mini model to summarize the content of the reviews. The goal is to output a summary of product and service issues. To call the model, assume `OPENAI_API_KEY` is defined in `.env`, and use `python-dotenv` to load it. Use this prompt:

""" 
You are analyzing customer reviews to identify product and service issues.

Below are customer reviews for a set of products. Please analyze them and provide:

1. **Common Product Issues**: List the main problems customers reported about the product itself (quality, functionality, features, etc.)
2. **Common Service Issues**: List the main problems customers reported about the service (shipping, customer support, packaging, etc.)
3. **Frequency**: For each issue, estimate how often it appears (e.g., "mentioned frequently", "occasional complaint", "rare issue")

Format your response clearly with headers and bullet points.

CUSTOMER REVIEWS:
{combined_reviews} 

"""

**Step 3: Print & Save the summary** 

Print the summary and save it as a Markdown file, `customer_reviews.md`.


___

### Step 2.1 - Extract the reviews in a list

In [3]:
# Step 1: Extract reviews from the DataFrame
combined_reviews = df["review_text"].tolist()

# Display the first few reviews to verify
combined_reviews[:5]

['Absolutely love this product! Exceeded all my expectations. The quality is outstanding and it arrived faster than promised. Will definitely buy again!',
 'Great product overall. Works as advertised. Only minor issue is the instructions could be clearer, but figured it out eventually.',
 'Very disappointed. Stopped working after just 2 weeks. Customer service was unhelpful. Would not recommend.',
 "Not worth the price. Quality feels cheap and it doesn't perform as well as described. Returning it.",
 "Perfect! Exactly what I needed. The design is beautiful and it's very user-friendly. Highly recommend!"]

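If a list like `combined_reviews` is placed directly in an f-string prompt, Python inserts its list representation (brackets, quotes, and escape characters). One optional alternative, sketched here with made-up reviews, is to join the reviews into a numbered, newline-separated string first. The names `sample_reviews`, `reviews_block`, and `prompt_tail` are hypothetical and not part of the notebook.

```python
# Made-up reviews standing in for combined_reviews.
sample_reviews = [
    "Great product overall.",
    "Stopped working after 2 weeks.",
]

# Number each review and put it on its own line.
reviews_block = "\n".join(
    f"{i}. {text}" for i, text in enumerate(sample_reviews, start=1)
)

# The joined string can then be interpolated where the list would go.
prompt_tail = f"CUSTOMER REVIEWS:\n{reviews_block}"
print(prompt_tail)
```

Either form works as model input; the joined version is mainly easier to read when inspecting the prompt.
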
### Step 2.2 - Use the gpt-4.1-mini model to summarize product and service issues

In [4]:
# Step 2: Summarize product and service issues using OpenAI's gpt-4.1-mini

from openai import OpenAI
from dotenv import load_dotenv
import os

# Load OPENAI_API_KEY from the .env file into the process environment
load_dotenv()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

prompt = f"""
You are analyzing customer reviews to identify product and service issues.

Below are customer reviews for a set of products. Please analyze them and provide:

1. **Common Product Issues**: List the main problems customers reported about the product itself (quality, functionality, features, etc.)
2. **Common Service Issues**: List the main problems customers reported about the service (shipping, customer support, packaging, etc.)
3. **Frequency**: For each issue, estimate how often it appears (e.g., "mentioned frequently", "occasional complaint", "rare issue")

Format your response clearly with headers and bullet points.

CUSTOMER REVIEWS:
{combined_reviews}
"""

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": prompt}]
)

summary = response.choices[0].message.content
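
The request above can fail in confusing ways if the key is missing, so it may be worth checking for the key first and wrapping the call in a small function. This is a sketch, not the notebook's code: `require_api_key` and `summarize` are hypothetical helper names, and `summarize` makes the same `client.chat.completions.create` call as the cell above on whatever client object is passed in.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, or raise a clear error."""
    key = os.getenv(name)
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it to .env and call load_dotenv() first."
        )
    return key


def summarize(client, prompt: str, model: str = "gpt-4.1-mini") -> str:
    """Send one user message and return the text of the model's reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

In the notebook this could be used as `client = OpenAI(api_key=require_api_key())` followed by `summary = summarize(client, prompt)`.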

### Step 2.3 - Print and Save the summary

In [None]:
# Step 3: Print and save the summary to a Markdown file

print(summary)

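# Make sure the output folder exists before writing (os is imported in the previous cell)
os.makedirs("outputs", exist_ok=True)

with open("outputs/customer_reviews.md", "w", encoding="utf-8") as f:
    f.write(summary)

print("\n✅ Summary saved to 'outputs/customer_reviews.md'.")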

**Analysis of Customer Reviews**

---

### 1. Common Product Issues

- **Poor Quality / Durability**
  - Many customers reported the product breaking or stopping working shortly after purchase ("stopped working after just 2 weeks", "broke on first use", "broke immediately", "poor quality materials").
  - Descriptions such as "cheaply made", "feels like it will break any moment", and "terrible quality" are frequent.
  - Appears **frequently** in negative reviews.

- **Misleading Product Description / Photos**
  - Several reviews mention that the product did not match the photos or descriptions online ("not what I ordered", "looks nothing like the photos", "misleading").
  - Complaints about false or exaggerated product claims.
  - Mentioned **occasionally to frequently**.

- **Limited Features / Underwhelming Performance**
  - Some customers feel the product only "does the job" but lacks exciting features or does not meet price expectations.
  - Comments like "nothing special," "expecte