# Review Summarization (API-based)
## Generating Recommendation Articles with ChatGPT / Claude
---
**Objective:** Use generative AI to produce per-cluster and per-product summaries from customer reviews.

**Outputs per cluster:**
1. Category overview article — top 3 products, key complaints, worst product
2. Individual product summaries — strengths, weaknesses, recommendation

**API support:** OpenAI (GPT-4o-mini) and Anthropic (Claude Sonnet) — switch with a flag.

## 1. Imports & Configuration

In [1]:
import os
import json
import time
import pandas as pd
import numpy as np
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())

print("Libraries loaded")

Libraries loaded


In [2]:
# ============================================================
# SWITCH PROVIDER HERE
# ============================================================
PROVIDER = "anthropic"  # "openai" or "anthropic"

# Model selection
MODELS = {
    "openai": "gpt-4o-mini",
    "anthropic": "claude-sonnet-4-20250514"
}

# API keys from .env
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")

# Validate
if PROVIDER == "openai" and not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY not found in .env")
if PROVIDER == "anthropic" and not ANTHROPIC_API_KEY:
    raise ValueError("ANTHROPIC_API_KEY not found in .env")

print(f"Provider: {PROVIDER}")
print(f"Model: {MODELS[PROVIDER]}")

Provider: anthropic
Model: claude-sonnet-4-20250514


In [3]:
# Initialize client based on provider
if PROVIDER == "openai":
    from openai import OpenAI
    client = OpenAI(api_key=OPENAI_API_KEY)
    
    def call_llm(system_prompt, user_prompt, max_tokens=2000):
        response = client.chat.completions.create(
            model=MODELS["openai"],
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            max_tokens=max_tokens,
            temperature=0.3
        )
        return response.choices[0].message.content

elif PROVIDER == "anthropic":
    import anthropic
    client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
    
    def call_llm(system_prompt, user_prompt, max_tokens=2000):
        response = client.messages.create(
            model=MODELS["anthropic"],
            system=system_prompt,
            messages=[
                {"role": "user", "content": user_prompt}
            ],
            max_tokens=max_tokens,
            temperature=0.3
        )
        return response.content[0].text

# Quick test
test = call_llm("You are a helpful assistant.", "Say 'API connection successful' and nothing else.")
print(test)

API connection successful


## 2. Load Data

In [None]:
# Load clustered data
df = pd.read_csv("../data/data_with_clusters.csv")

print(f"Dataset: {len(df):,} reviews")
print("\nCluster distribution:")
print(df["cluster_name"].value_counts())

Dataset: 28,332 reviews

Cluster distribution:
cluster_name
Fire Tablets             14396
Batteries & Household    12086
E-Readers                 1049
Smart Speakers             632
Accessories                138
Media & Home                31
Name: count, dtype: int64


In [None]:
# If predictions are available, load them too
try:
    pred_df = pd.read_csv("../data/data_with_predictions_v2.csv")
except FileNotFoundError:
    pred_df = None

# Use predictions only if the column exists and the row counts match;
# rows are assumed to be in the same order as data_with_clusters.csv
if pred_df is not None and "predicted_label" in pred_df.columns and len(pred_df) == len(df):
    df["predicted_label"] = pred_df["predicted_label"].values
    if "predicted_score" in pred_df.columns:
        df["predicted_score"] = pred_df["predicted_score"].values
    print("Loaded v2 predictions")
    print(df["predicted_label"].value_counts())
else:
    print("No usable predictions file found, using star-based sentiment")
    df["predicted_label"] = df["sentiment"].str.upper()

Loaded v2 predictions
predicted_label
POSITIVE    25476
NEGATIVE     1648
NEUTRAL      1208
Name: count, dtype: int64


## 3. Prepare Review Data for Prompts

We can't send all 28K reviews to the API, so each prompt carries a small, targeted selection. Strategy:
- Select reviews per product by predicted sentiment (up to 5 positive, 5 negative, 3 neutral)
- Include aggregate stats (avg rating, sentiment distribution)
- Prioritize longer, more informative reviews (at least 5 words, longest first, truncated to 500 characters)
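The selection strategy above can be illustrated on a handful of toy reviews. This is a minimal plain-Python sketch, not the notebook's `sample_reviews` function: the helper name `pick_longest_per_label`, the tuple layout, and the toy reviews are all hypothetical and exist only for illustration.

```python
def pick_longest_per_label(reviews, quotas, min_words=5):
    """Return up to quotas[label] reviews per label, longest first.

    `reviews` is a list of (label, text) tuples (hypothetical layout).
    """
    picked = []
    for label, n in quotas.items():
        # Keep only reviews with this label that meet the minimum length
        candidates = [
            text for lbl, text in reviews
            if lbl == label and len(text.split()) >= min_words
        ]
        # Longest reviews first, on the assumption they are more informative
        candidates.sort(key=lambda t: len(t.split()), reverse=True)
        picked.extend((label, text) for text in candidates[:n])
    return picked


toy_reviews = [
    ("POSITIVE", "Great battery life and it arrived quickly in good packaging"),
    ("POSITIVE", "Love it"),  # dropped: fewer than 5 words
    ("NEGATIVE", "Stopped working after two weeks and leaked inside my remote control"),
    ("NEUTRAL", "It is fine for the price I paid"),
    ("POSITIVE", "Works well in my wireless keyboard and mouse"),
]

selected = pick_longest_per_label(
    toy_reviews, {"POSITIVE": 1, "NEGATIVE": 1, "NEUTRAL": 1}
)
for label, text in selected:
    print(f"[{label}] {text}")
```

With a quota of one per label, the 10-word positive review wins over the 8-word one, and the two-word review never enters the pool.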

In [6]:
def get_product_stats(product_df):
    """Compute summary stats for a product."""
    total = len(product_df)
    avg_rating = product_df["reviews.rating"].mean()
    
    sentiment_counts = product_df["predicted_label"].value_counts()
    pct_positive = sentiment_counts.get("POSITIVE", 0) / total * 100
    pct_negative = sentiment_counts.get("NEGATIVE", 0) / total * 100
    pct_neutral = sentiment_counts.get("NEUTRAL", 0) / total * 100
    
    return {
        "total_reviews": total,
        "avg_rating": round(avg_rating, 2),
        "pct_positive": round(pct_positive, 1),
        "pct_negative": round(pct_negative, 1),
        "pct_neutral": round(pct_neutral, 1)
    }


def sample_reviews(product_df, n_positive=5, n_negative=5, n_neutral=3, min_words=5):
    """Sample diverse reviews for a product, prioritizing longer reviews."""
    product_df = product_df.copy()
    product_df["text"] = product_df["reviews.text"].fillna("").astype(str)
    product_df["word_count"] = product_df["text"].str.split().str.len()
    
    # Filter out very short reviews
    product_df = product_df[product_df["word_count"] >= min_words]
    
    sampled = []
    
    for label, n in [("POSITIVE", n_positive), ("NEGATIVE", n_negative), ("NEUTRAL", n_neutral)]:
        subset = product_df[product_df["predicted_label"] == label]
        if len(subset) == 0:
            continue
        # Prioritize longer reviews (more informative)
        subset = subset.sort_values("word_count", ascending=False)
        sample = subset.head(n)
        for _, row in sample.iterrows():
            # Truncate very long reviews
            text = row["text"][:500]
            sampled.append(f"[{label}] (Rating: {row['reviews.rating']}) {text}")
    
    return sampled


print("Helper functions defined")

Helper functions defined


In [7]:
# Build cluster data structure
clusters = {}

for cluster_name in df["cluster_name"].unique():
    cluster_df = df[df["cluster_name"] == cluster_name]
    
    products = []
    for product_name in cluster_df["name"].unique():
        product_df = cluster_df[cluster_df["name"] == product_name]
        stats = get_product_stats(product_df)
        reviews = sample_reviews(product_df)
        
        products.append({
            "name": product_name,
            "stats": stats,
            "sample_reviews": reviews
        })
    
    # Sort by review count descending
    products.sort(key=lambda x: x["stats"]["total_reviews"], reverse=True)
    
    clusters[cluster_name] = {
        "products": products,
        "total_reviews": len(cluster_df),
        "total_products": len(products),
        "avg_rating": round(cluster_df["reviews.rating"].mean(), 2)
    }

# Summary
for name, data in clusters.items():
    print(f"{name}: {data['total_products']} products, {data['total_reviews']:,} reviews, "
          f"avg rating {data['avg_rating']}")

Batteries & Household: 7 products, 12,086 reviews, avg rating 4.45
Media & Home: 8 products, 31 reviews, avg rating 4.39
Smart Speakers: 9 products, 632 reviews, avg rating 4.54
Accessories: 11 products, 138 reviews, avg rating 4.31
E-Readers: 10 products, 1,049 reviews, avg rating 4.66
Fire Tablets: 20 products, 14,396 reviews, avg rating 4.56


## 4. Prompt Templates

In [8]:
SYSTEM_PROMPT = """You are a consumer product analyst writing recommendation articles based on real 
customer reviews. Write in a clear, professional tone. Base all claims on the review data provided.
Do not invent facts or reviews. If data is limited, say so."""


CLUSTER_SUMMARY_PROMPT = """Write a recommendation article for the "{cluster_name}" product category.

Category overview:
- {total_products} products
- {total_reviews:,} total customer reviews
- Average rating: {avg_rating}/5

Products in this category (sorted by review count):
{product_list}

Sample reviews from top products:
{sample_reviews}

Write the article with these sections:
1. **Category Overview** — Brief intro to this product category and overall customer satisfaction
2. **Top 3 Recommended Products** — Based on ratings, review count, and review sentiment. For each: name, why it's recommended, key strengths from reviews
3. **Common Complaints** — Top 3-5 recurring issues across the category based on negative reviews
4. **Worst Rated Product** — Which product has the most complaints and why
5. **Buying Recommendation** — Brief conclusion with advice for shoppers

Keep it under 600 words."""


PRODUCT_SUMMARY_PROMPT = """Write a concise product review summary for:

Product: {product_name}

Stats:
- Total reviews: {total_reviews}
- Average rating: {avg_rating}/5
- Positive: {pct_positive}%
- Negative: {pct_negative}%
- Neutral: {pct_neutral}%

Sample reviews:
{sample_reviews}

Write a summary with:
1. **Overall Verdict** — One sentence recommendation (Buy / Consider / Avoid)
2. **Strengths** — Top 3 positives from reviews
3. **Weaknesses** — Top 3 negatives from reviews
4. **Best For** — What type of buyer this product suits

Keep it under 200 words."""


print("Prompt templates defined")

Prompt templates defined
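The templates are filled with `str.format`, so a field such as `{total_reviews:,}` applies a thousands separator at fill time. The sketch below uses a shortened, hypothetical template (`MINI_TEMPLATE`) rather than the full prompts, with the Fire Tablets figures from the cluster table as sample values.

```python
# Shortened stand-in for CLUSTER_SUMMARY_PROMPT (hypothetical, for illustration)
MINI_TEMPLATE = """Category: "{cluster_name}"
- {total_reviews:,} total customer reviews
- Average rating: {avg_rating}/5"""

filled = MINI_TEMPLATE.format(
    cluster_name="Fire Tablets",
    total_reviews=14396,
    avg_rating=4.56,
)
print(filled)

# Braces inside a substituted value are not re-parsed, so product names
# containing "{" or "}" are safe to pass as arguments.
odd = MINI_TEMPLATE.format(cluster_name="{odd}", total_reviews=1, avg_rating=5)
print(odd.splitlines()[0])
```

Literal braces in the template text itself would need to be doubled (`{{` and `}}`), which matters if a prompt ever asks the model for JSON output.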


## 5. Generate Cluster Summaries

In [9]:
def build_cluster_prompt(cluster_name, cluster_data):
    """Build the full prompt for a cluster summary."""
    # Product list with stats
    product_lines = []
    for p in cluster_data["products"]:
        s = p["stats"]
        product_lines.append(
            f"- {p['name'][:80]}: {s['total_reviews']} reviews, "
            f"avg {s['avg_rating']}/5, {s['pct_positive']}% positive, {s['pct_negative']}% negative"
        )
    product_list = "\n".join(product_lines)
    
    # Sample reviews from top 3 products (by review count)
    all_reviews = []
    for p in cluster_data["products"][:3]:
        all_reviews.append(f"\n--- {p['name'][:60]} ---")
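        # Limit per product. sample_reviews orders entries positive, negative, neutral,
        # so the first 8 are typically 5 positive + 3 negative and neutral reviews are dropped.
        all_reviews.extend(p["sample_reviews"][:8])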
    sample_reviews = "\n".join(all_reviews)
    
    return CLUSTER_SUMMARY_PROMPT.format(
        cluster_name=cluster_name,
        total_products=cluster_data["total_products"],
        total_reviews=cluster_data["total_reviews"],
        avg_rating=cluster_data["avg_rating"],
        product_list=product_list,
        sample_reviews=sample_reviews
    )


print("Prompt builder defined")

Prompt builder defined


In [10]:
# Generate cluster summaries
cluster_summaries = {}

for cluster_name, cluster_data in clusters.items():
    print(f"\nGenerating summary for: {cluster_name}...")
    
    prompt = build_cluster_prompt(cluster_name, cluster_data)
    
    try:
        summary = call_llm(SYSTEM_PROMPT, prompt, max_tokens=1500)
        cluster_summaries[cluster_name] = summary
        print(f"  Done ({len(summary)} chars)")
    except Exception as e:
        print(f"  ERROR: {e}")
        cluster_summaries[cluster_name] = f"Error generating summary: {e}"
    
    time.sleep(1)  # rate limit courtesy

print(f"\nGenerated {len(cluster_summaries)} cluster summaries")


Generating summary for: Batteries & Household...
  Done (3816 chars)

Generating summary for: Media & Home...
  Done (3763 chars)

Generating summary for: Smart Speakers...
  Done (3928 chars)

Generating summary for: Accessories...
  Done (3422 chars)

Generating summary for: E-Readers...
  Done (3714 chars)

Generating summary for: Fire Tablets...
  Done (3794 chars)

Generated 6 cluster summaries
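The loop above records an error string when a call fails and moves on. If transient failures such as rate limits turn out to be common, one option is to retry with exponential backoff before giving up. The helper below is a hypothetical sketch (`call_with_retries` is not part of the notebook or of either SDK), and the demo uses a stand-in function instead of a real API call.

```python
import time


def call_with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on an exception wait base_delay * 2**attempt, then retry.

    Re-raises the last exception if every attempt fails.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demo: a stand-in for call_llm that fails twice, then succeeds
attempts = {"n": 0}


def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("simulated rate limit")
    return "summary text"


delays = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky_call, sleep=delays.append)
print(result, delays)
```

In the notebook this would wrap the existing call, for example `call_with_retries(lambda: call_llm(SYSTEM_PROMPT, prompt, max_tokens=1500))`. Catching the SDK's specific rate-limit exception instead of a bare `Exception` would be stricter.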


In [11]:
# Display cluster summaries
for cluster_name, summary in cluster_summaries.items():
    print("=" * 70)
    print(f"CLUSTER: {cluster_name}")
    print("=" * 70)
    print(summary)
    print()

CLUSTER: Batteries & Household
# Batteries & Household Products: Customer Review Analysis

## Category Overview

The Batteries & Household category features 7 diverse products with over 12,000 customer reviews and a solid 4.45/5 average rating. While the category name suggests batteries and household items, the actual products range from alkaline batteries to pet supplies and office organizers. Customer satisfaction is generally high, with most products receiving positive feedback for value and performance.

## Top 3 Recommended Products

**1. AmazonBasics AAA Performance Alkaline Batteries (36 Count)**
With 8,343 reviews and 4.45/5 stars, these batteries earn top marks for capacity and value. Customers consistently praise their performance compared to name brands like Duracell, with one reviewer noting they're "tied for the top spot" in capacity testing. The convenient packaging—batteries grouped in sets of four—prevents the common problem of loose batteries cluttering drawers.

**2. 

## 6. Generate Product Summaries

Generate individual summaries only for products with enough reviews to be meaningful (>= 20 reviews).

In [12]:
MIN_REVIEWS_FOR_PRODUCT_SUMMARY = 20

# Count eligible products
eligible = []
for cluster_name, cluster_data in clusters.items():
    for p in cluster_data["products"]:
        if p["stats"]["total_reviews"] >= MIN_REVIEWS_FOR_PRODUCT_SUMMARY:
            eligible.append((cluster_name, p))

print(f"Products eligible for individual summary (>= {MIN_REVIEWS_FOR_PRODUCT_SUMMARY} reviews): {len(eligible)}")
for cluster, p in eligible:
    print(f"  [{cluster}] {p['name'][:60]}... ({p['stats']['total_reviews']} reviews)")

Products eligible for individual summary (>= 20 reviews): 33
  [Batteries & Household] AmazonBasics AAA Performance Alkaline Batteries (36 Count)... (8343 reviews)
  [Batteries & Household] AmazonBasics AA Performance Alkaline Batteries (48 Count) - ... (3728 reviews)
  [Smart Speakers] Amazon Tap Smart Assistant Alexaenabled (black) Brand New... (601 reviews)
  [Accessories] Amazon 9W PowerFast Official OEM USB Charger and Power Adapt... (39 reviews)
  [Accessories] AmazonBasics Backpack for Laptops up to 17-inches... (25 reviews)
  [Accessories] AmazonBasics 15.6-Inch Laptop and Tablet Bag... (21 reviews)
  [E-Readers] Kindle Voyage E-reader, 6 High-Resolution Display (300 ppi) ... (505 reviews)
  [E-Readers] Kindle E-reader - White, 6 Glare-Free Touchscreen Display, W... (287 reviews)
  [E-Readers] Kindle Oasis E-reader with Leather Charging Cover - Walnut, ... (62 reviews)
  [E-Readers] Kindle Oasis E-reader with Leather Charging Cover - Black, 6... (55 reviews)
  [E-Readers] Kindl

In [13]:
# Generate product summaries
product_summaries = {}

for cluster_name, product in eligible:
    product_name = product["name"]
    stats = product["stats"]
    
    print(f"Generating: {product_name[:50]}...")
    
    prompt = PRODUCT_SUMMARY_PROMPT.format(
        product_name=product_name,
        total_reviews=stats["total_reviews"],
        avg_rating=stats["avg_rating"],
        pct_positive=stats["pct_positive"],
        pct_negative=stats["pct_negative"],
        pct_neutral=stats["pct_neutral"],
        sample_reviews="\n".join(product["sample_reviews"])
    )
    
    try:
        summary = call_llm(SYSTEM_PROMPT, prompt, max_tokens=600)
        product_summaries[product_name] = {
            "cluster": cluster_name,
            "stats": stats,
            "summary": summary
        }
        print(f"  Done ({len(summary)} chars)")
    except Exception as e:
        print(f"  ERROR: {e}")
        product_summaries[product_name] = {
            "cluster": cluster_name,
            "stats": stats,
            "summary": f"Error: {e}"
        }
    
    time.sleep(0.5)  # rate limit

print(f"\nGenerated {len(product_summaries)} product summaries")

Generating: AmazonBasics AAA Performance Alkaline Batteries (3...
  Done (1341 chars)
Generating: AmazonBasics AA Performance Alkaline Batteries (48...
  Done (1363 chars)
Generating: Amazon Tap Smart Assistant Alexaenabled (black) Br...
  Done (1370 chars)
Generating: Amazon 9W PowerFast Official OEM USB Charger and P...
  Done (1324 chars)
Generating: AmazonBasics Backpack for Laptops up to 17-inches...
  Done (1260 chars)
Generating: AmazonBasics 15.6-Inch Laptop and Tablet Bag...
  Done (1358 chars)
Generating: Kindle Voyage E-reader, 6 High-Resolution Display ...
  Done (1307 chars)
Generating: Kindle E-reader - White, 6 Glare-Free Touchscreen ...
  Done (1370 chars)
Generating: Kindle Oasis E-reader with Leather Charging Cover ...
  Done (1360 chars)
Generating: Kindle Oasis E-reader with Leather Charging Cover ...
  Done (1327 chars)
Generating: Kindle Oasis E-reader with Leather Charging Cover ...
  Done (1337 chars)
Generating: Kindle Voyage E-reader, 6 High-Resolution Display

In [14]:
# Display product summaries
for product_name, data in product_summaries.items():
    print("=" * 70)
    print(f"PRODUCT: {product_name[:70]}")
    print(f"Cluster: {data['cluster']} | Reviews: {data['stats']['total_reviews']} | "
          f"Rating: {data['stats']['avg_rating']}/5")
    print("-" * 70)
    print(data["summary"])
    print()

PRODUCT: AmazonBasics AAA Performance Alkaline Batteries (36 Count)
Cluster: Batteries & Household | Reviews: 8343 | Rating: 4.45/5
----------------------------------------------------------------------
## AmazonBasics AAA Performance Alkaline Batteries Review Summary

**Overall Verdict:** Consider these batteries for basic household needs, but be aware of quality control issues.

**Strengths:**
1. **Excellent value** - Competitive capacity per dollar compared to name brands, with convenient bulk packaging
2. **Smart packaging design** - Batteries come in easy-access cardboard box that prevents spills, unlike traditional blister packs
3. **Good performance in low-drain devices** - Reliable for keyboards, mice, and remote controls with steady power draw

**Weaknesses:**
1. **Quality control problems** - Multiple reports of dead or undervoltage batteries straight from the package
2. **Battery leakage issues** - Several customers experienced acid leakage that damaged devices
3. **Poor per

## 7. Save All Summaries

In [None]:
# Save as JSON for the web app
output = {
    "provider": PROVIDER,
    "model": MODELS[PROVIDER],
    "cluster_summaries": cluster_summaries,
    "product_summaries": product_summaries
}

with open("../data/summaries_api.json", "w", encoding="utf-8") as f:
    json.dump(output, f, indent=2, ensure_ascii=False)

print("Saved to summaries_api.json")
print(f"  Cluster summaries: {len(cluster_summaries)}")
print(f"  Product summaries: {len(product_summaries)}")

Saved to summaries_api.json
  Cluster summaries: 6
  Product summaries: 33
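A consumer such as the web app can read this file back with `json.load`. The sketch below assumes the same top-level keys written above and filters out entries whose summary is the `Error: ...` placeholder stored when a call failed. The helper name `products_in_cluster` and the sample entries are hypothetical, and an in-memory string stands in for the file.

```python
import json

# Tiny stand-in for summaries_api.json with the same top-level keys
sample_json = json.dumps({
    "provider": "anthropic",
    "model": "example-model",
    "cluster_summaries": {"E-Readers": "Category article text"},
    "product_summaries": {
        "Product A": {"cluster": "E-Readers", "stats": {"total_reviews": 505}, "summary": "Buy."},
        "Product B": {"cluster": "E-Readers", "stats": {"total_reviews": 62}, "summary": "Error: timeout"},
        "Product C": {"cluster": "Accessories", "stats": {"total_reviews": 39}, "summary": "Consider."},
    },
})


def products_in_cluster(data, cluster_name):
    """Return {product_name: summary} for one cluster, skipping failed generations."""
    return {
        name: entry["summary"]
        for name, entry in data["product_summaries"].items()
        if entry["cluster"] == cluster_name
        and not entry["summary"].startswith("Error")
    }


data = json.loads(sample_json)
print(products_in_cluster(data, "E-Readers"))
```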


In [None]:
# Also save as a readable markdown report
md_lines = []
md_lines.append("# Amazon Product Review Summaries")
md_lines.append(f"\n*Generated with {MODELS[PROVIDER]} ({PROVIDER})*\n")
md_lines.append("---\n")

# Cluster summaries
for cluster_name, summary in cluster_summaries.items():
    md_lines.append(f"## {cluster_name}\n")
    md_lines.append(summary)
    md_lines.append("\n---\n")

# Product summaries
md_lines.append("# Individual Product Summaries\n")

current_cluster = None
for product_name, data in product_summaries.items():
    if data["cluster"] != current_cluster:
        current_cluster = data["cluster"]
        md_lines.append(f"\n## {current_cluster}\n")
    
    md_lines.append(f"### {product_name[:70]}")
    md_lines.append(f"*{data['stats']['total_reviews']} reviews | "
                    f"Avg {data['stats']['avg_rating']}/5 | "
                    f"{data['stats']['pct_positive']}% positive*\n")
    md_lines.append(data["summary"])
    md_lines.append("\n")

report_md = "\n".join(md_lines)

with open("../docs/summaries_report.md", "w", encoding="utf-8") as f:
    f.write(report_md)

print(f"Saved to summaries_report.md ({len(report_md):,} chars)")

Saved to summaries_report.md (70,879 chars)


## 8. Cost Estimate

In [17]:
# Cost summary
# The cost printed below is the actual figure from the Anthropic console, not a token-based estimate.
# Input tokens far outnumber output tokens because every prompt includes review samples.

total_calls = len(cluster_summaries) + len(product_summaries)

total_output_chars = 0
for summary in cluster_summaries.values():
    total_output_chars += len(summary)
for data in product_summaries.values():
    total_output_chars += len(data["summary"])

print(f"Total API calls: {total_calls}")
print(f"Total output chars: {total_output_chars:,}")
print("\nActual cost (from Anthropic console): ~$0.43 USD")
print("  This covers both input tokens (review samples in prompts)")
print(f"  and output tokens ({total_calls} summaries generated)")
print(f"\nFor reference - approximate per-call cost: ~${0.43/total_calls:.3f}")

Total API calls: 39
Total output chars: 66,552

Actual cost (from Anthropic console): ~$0.43 USD
  This covers both input tokens (review samples in prompts)
  and output tokens (39 summaries generated)

For reference - approximate per-call cost: ~$0.011
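Character counts only cover the output side, so a token-based estimate needs the input side too. Both SDKs report token usage on the response object (`usage.input_tokens` and `usage.output_tokens` for Anthropic, `usage.prompt_tokens` and `usage.completion_tokens` for OpenAI; worth verifying against the installed SDK version). The sketch below shows one way to accumulate those counts. `UsageTracker` is a hypothetical helper, and the token counts and per-million-token prices are placeholders, not real usage or real rates.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates token counts and converts them to a cost estimate."""
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, input_tokens, output_tokens):
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    def cost_usd(self, input_price_per_mtok, output_price_per_mtok):
        # Prices are expressed in USD per million tokens
        return (
            self.input_tokens * input_price_per_mtok
            + self.output_tokens * output_price_per_mtok
        ) / 1_000_000


tracker = UsageTracker()
tracker.add(input_tokens=1500, output_tokens=400)   # placeholder counts
tracker.add(input_tokens=2500, output_tokens=600)

# Placeholder prices; look up the current rates for the model in use
estimate = tracker.cost_usd(input_price_per_mtok=2.0, output_price_per_mtok=10.0)
print(f"{tracker.input_tokens} in, {tracker.output_tokens} out, ~${estimate:.3f}")
```

Inside `call_llm`, the tracker would be updated from `response.usage` right after each request, which keeps the estimate in step with what the provider bills.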


## 9. Summary

### Method
- Selected reviews per product by predicted sentiment (up to 5 positive, 5 negative, 3 neutral)
- Prioritized longer, more informative reviews (min 5 words, longest first, truncated to 500 characters)
- Sent product stats + sample reviews to LLM with structured prompts
- Generated both category-level overview articles and individual product summaries

### Results
- **6 cluster summaries**: one recommendation article per category (prompted to stay under 600 words; outputs ran 3,422 to 3,928 characters)
- **33 product summaries** — individual verdict/strengths/weaknesses for products with 20+ reviews
- **39 total API calls** at a cost of **$0.43 USD** (Anthropic Claude Sonnet)
- ~$0.011 per summary — very cost-effective

### Provider
- Used: **Anthropic Claude Sonnet** (`claude-sonnet-4-20250514`)
- Switchable to OpenAI GPT-4o-mini via the `PROVIDER` flag (estimated ~$0.01-0.02 total; not measured in this run)
- Temperature: 0.3 to reduce run-to-run variation in the output

### Files Produced
- `summaries_api.json` — structured data for the web app (cluster + product summaries)
- `summaries_report.md` — readable markdown report with all summaries (70,879 chars)

### Notes
- Products with < 20 reviews were excluded from individual summaries (insufficient data)
- The cost estimate in v1 of this notebook was incorrect (only counted output tokens). Actual cost confirmed via Anthropic console: $0.43 for all 39 calls