# Goal of Phase 3 - BART Implementation (Simplified)

## Input
A CSV file (`final.csv`) containing product information and reviews, with columns such as:

- `ProductID`, `Product Name`, `Category`, `Brand`, `Ratings`
- `sentiment` (positive/negative or numerical score)
- `reviews.text`

## Processing
For each product category:

1. Determine the **Top 3 products** in the category (highest average sentiment).  
2. Identify the **worst product** in the category (lowest average sentiment).

## AI Prompt Construction
Create a structured, multi-part prompt instructing the AI to:

- Recommend the Top 3 products with key differences.  
- Explain why the worst product should be avoided.  
- Base all insights **solely on the review data**.

## Output
- Use **BART** (`facebook/bart-large-cnn`) to generate a **human-readable blog-style article** per category.  
- Save all articles in a CSV (`generated_articles_bart.csv`) with co


### 1. Load & Preprocess Data

In [6]:
import pandas as pd
from transformers import BartTokenizer, BartForConditionalGeneration

# Load CSV
df = pd.read_csv("../outputs/final.csv")

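# Map sentiment labels to numeric scores if the column holds strings
sentiment_map = {"positive": 1, "neutral": 0, "negative": -1}
if df['sentiment'].dtype == object:
    # Normalize case and whitespace so labels such as "Positive " still map
    labels = df['sentiment'].str.strip().str.lower()
    df['sentiment_score'] = labels.map(sentiment_map)
else:
    df['sentiment_score'] = df['sentiment']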

# Optional: remove duplicates
df = df.drop_duplicates(subset=["ProductID", "reviews.text"])
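
Because `Series.map` returns NaN for any label that is not a key of the dictionary, it can help to count such labels before averaging. The following is a minimal sketch, not part of the original notebook; the helper name `unmapped_labels` and the sample labels are hypothetical.

```python
from collections import Counter

# Same mapping as in the preprocessing cell
SENTIMENT_MAP = {"positive": 1, "neutral": 0, "negative": -1}

def unmapped_labels(labels, mapping=SENTIMENT_MAP):
    """Count labels that are not keys of `mapping` and would therefore map to NaN."""
    return Counter(label for label in labels if label not in mapping)

# Illustrative labels only, not taken from final.csv
sample = ["positive", "negative", "neutral", "mixed", "pos", "mixed"]
print(dict(unmapped_labels(sample)))  # {'mixed': 2, 'pos': 1}
```

A pandas Series is iterable, so the same helper could be called as `unmapped_labels(df['sentiment'])`; an empty result would suggest every label was mapped.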

### 2. Determine Top & Worst Products per Category

In [7]:
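# Aggregate by product ('Cluster' holds the category label of each product)
product_avg_sentiment = (
    df.groupby(['Cluster', 'ProductID', 'Product Name'])
      .agg(avg_sentiment=('sentiment_score', 'mean'),
           # Skip missing reviews so the join does not fail on NaN
           reviews=('reviews.text', lambda x: ' '.join(x.dropna().astype(str))))
      .reset_index()
)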

# Get top 3 and worst per cluster
top_worst_per_cluster = {}
for cluster, group in product_avg_sentiment.groupby('Cluster'):
    top3 = group.sort_values('avg_sentiment', ascending=False).head(3)
    worst = group.sort_values('avg_sentiment', ascending=True).head(1)
    top_worst_per_cluster[cluster] = {"top3": top3, "worst": worst}
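
Note that when a cluster contains three or fewer products, the worst product selected above is also one of the Top 3. The sketch below shows one possible way to avoid that overlap and to ignore products with very few reviews. It is an assumption-laden illustration: the function `select_top_and_worst`, the `n_reviews` field, and the `min_reviews` threshold do not appear in the original notebook.

```python
def select_top_and_worst(products, k=3, min_reviews=1):
    """
    products: list of dicts with keys 'name', 'avg_sentiment', 'n_reviews'
              (hypothetical layout, e.g. built from the aggregated DataFrame).
    Returns (top_k, worst); worst is None when nothing is left outside top_k.
    """
    # Drop products with too few reviews to trust their average
    eligible = [p for p in products if p["n_reviews"] >= min_reviews]
    # Rank by sentiment; ties go to the product with more reviews
    ranked = sorted(
        eligible,
        key=lambda p: (p["avg_sentiment"], p["n_reviews"]),
        reverse=True,
    )
    top_k = ranked[:k]
    rest = ranked[k:]
    # The worst product is the lowest-ranked one that is not already in top_k
    worst = rest[-1] if rest else None
    return top_k, worst

# Illustrative data only
products = [
    {"name": "A", "avg_sentiment": 1.0, "n_reviews": 40},
    {"name": "B", "avg_sentiment": 1.0, "n_reviews": 2},
    {"name": "C", "avg_sentiment": 0.6, "n_reviews": 15},
    {"name": "D", "avg_sentiment": 0.8, "n_reviews": 10},
    {"name": "E", "avg_sentiment": -0.2, "n_reviews": 8},
]
top3, worst = select_top_and_worst(products, k=3, min_reviews=5)
print([p["name"] for p in top3], worst["name"])  # ['A', 'D', 'C'] E
```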

### 3. Construct AI Prompts for BART

We want a structured prompt that:

- Introduces the category

- Lists Top 3 products & their key differences

- Explains the worst product

- Only uses review data

In [8]:
def construct_bart_input(top3_df, worst_df):
    """
    Concatenate the review texts of the Top 3 and the worst product into one BART input.
    """
    reviews_text = ""

    # Top 3 products, numbered by rank (the DataFrame index is not 0..2 after sorting)
    reviews_text += "Category Top 3 products reviews and key points:\n"
    for rank, (_, row) in enumerate(top3_df.iterrows(), start=1):
        reviews_text += f"{rank}. {row['Product Name']} (avg_sentiment={row['avg_sentiment']:.2f}): {row['reviews']}\n\n"

    # Worst product
    worst_row = worst_df.iloc[0]
    reviews_text += f"Worst product review:\n{worst_row['Product Name']} (avg_sentiment={worst_row['avg_sentiment']:.2f}): {worst_row['reviews']}\n\n"

    return reviews_text
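
Since every review of each product is concatenated, the input can easily exceed what the tokenizer keeps. One option is to cap the text per product before building the input. The sketch below is a hypothetical helper (`sample_reviews` and its `max_words` budget are assumptions); the word count is only a rough proxy for BART tokens and would need checking against the real tokenizer.

```python
def sample_reviews(reviews, max_words=150):
    """
    Keep whole reviews in their original order until the word budget is used up.
    Exact duplicates (ignoring case and extra whitespace) and empty reviews are skipped.
    """
    seen = set()
    kept, used = [], 0
    for review in reviews:
        text = " ".join(str(review).split())  # collapse runs of whitespace
        key = text.lower()
        if not text or key in seen:
            continue
        n_words = len(text.split())
        if used + n_words > max_words:
            break  # stop at the first review that would exceed the budget
        seen.add(key)
        kept.append(text)
        used += n_words
    return " ".join(kept)

# Illustrative reviews only
reviews = [
    "Great battery life.",
    "great  battery life.",
    "Stopped working after a week.",
    "   ",
    "Too expensive for what it does, would not buy again.",
]
print(sample_reviews(reviews, max_words=8))
# Great battery life. Stopped working after a week.
```

In the aggregation step, the function that joins the reviews of a product could call `sample_reviews(x.dropna())` instead of joining every review.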


### 4. Generate Summary Using BART

In [9]:
# Load BART
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

def generate_article(text):
    """
    Generate a human-readable summary for the concatenated reviews.
    """
    # Inputs longer than 1024 tokens are truncated, so later reviews may be dropped
    inputs = tokenizer(text, max_length=1024, return_tensors="pt", truncation=True)
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        num_beams=4,
        max_length=512,
        early_stopping=True
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
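
Because the tokenizer truncates at 1024 tokens, text beyond that limit never reaches the model. A possible workaround is to summarize the input in chunks and join the partial summaries. The sketch below is an illustration under assumptions: `split_into_chunks` and `summarize_long_text` are hypothetical helpers, chunk size is measured in words rather than tokens, and the summarizer is passed in as a plain callable so the logic can be tried without loading BART.

```python
def split_into_chunks(text, max_words=600):
    """Split text into consecutive chunks of at most `max_words` words.
    Whitespace (including line breaks) is collapsed to single spaces."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

def summarize_long_text(text, summarize, max_words=600):
    """Apply `summarize` (a callable str -> str) to each chunk and join the results."""
    chunks = split_into_chunks(text, max_words)
    return " ".join(summarize(chunk) for chunk in chunks)

# Stand-in summarizer for demonstration: returns the first word of each chunk
first_word = lambda chunk: chunk.split()[0]
print(summarize_long_text("a b c d e", first_word, max_words=2))  # a c e
```

With the notebook's function, the call would look like `summarize_long_text(bart_input_text, generate_article)`. Chunks cut on word counts can split a product's reviews in the middle, so splitting per product may be a better fit.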


### 5. Loop over clusters and save CSV

In [10]:
articles = []

for cluster, data in top_worst_per_cluster.items():
    # Concatenate all reviews for Top 3 + Worst
    bart_input_text = construct_bart_input(data['top3'], data['worst'])
    
    # Generate blog-style article using BART
    article_text = generate_article(bart_input_text)
    
    # Store result
    articles.append({"Cluster": cluster, "Article": article_text})

# Save to CSV
articles_df = pd.DataFrame(articles)
articles_df.to_csv("../deliverables/generated_articles_bart.csv", index=False)

print("Articles generated and saved to ../deliverables/generated_articles_bart.csv")

Articles generated and saved to ../deliverables/generated_articles_bart.csv


## Product Clusters and Reviews

### Table 1 is the result of a previous experiment; table 2 is the final one.

1. 

| Cluster | Article |
| ------- | ------- |
| **Batteries** | **Top 3 Recommended Products:**<br>1. AmazonBasics AA Performance Alkaline Batteries (48 Count) - Packaging May Vary: Bulk is always the less expensive way to go for products like these. Many users found them reliable and long-lasting.<br><br>**Worst Product:**<br>- Some batteries were defective. Missing backup spring required improvisation (e.g., adding aluminum). |
| **Classic Fire Tablets** | **Top 3 Recommended Products:**<br>1. Fire Kids Edition Tablet, 7 Display, Wi-Fi, 16 GB, Pink Kid-Proof Case: Great tablet for kids when traveling.<br>2. Echo (White): I love this product. Perfect size for travel. Great for watching videos.<br><br>**Worst Product:**<br>- [Not specified] |
| **Fire HD Tablets** | **Top 3 Recommended Products:**<br>1. Fire HD 8 Kids Edition Tablet, 8 HD Display, 32 GB, Pink Kid-Proof Case: Replaces older models with improved technology. Compact and user-friendly.<br><br>**Worst Product:**<br>- [Not specified] |
| **Kindle E-readers** | **Top 3 Recommended Products:**<br>1. All-New Kindle Oasis E-reader - 7 High-Resolution Display (300 ppi), Waterproof, Built-In Audible, 32 GB, Wi-Fi + Free Cellular Connectivity: Highly praised for readability and versatility. Users love the high-resolution display.<br><br>**Worst Product:**<br>- [Not specified] |
| **Smart Home / Audio** | **Top 3 Recommended Products:**<br>1. Amazon Fire TV with 4K Ultra HD and Alexa Voice Remote (Pendant Design) \| Streaming Media Player: A lazy man's dream when combined with Alexa.<br>2. AmazonBasics Nylon CD/DVD Binder (400 Capacity): Great case to keep everything organized. Holds many CDs, very sturdy.<br><br>**Worst Product:**<br>- [Not specified] |


2.

| Cluster              | Article                                                                                                                                                                                                                                                                                                  |
|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Batteries           | **AmazonBasics AA Performance Alkaline Batteries (48 Count) - Packaging May Vary** (avg_sentiment=1.00): Bulk is always the less expensive way to go for products like these. **AmazonBasics AAA Performance Battery (36 Count)** is missing backup spring so I have to put a pcs of aluminum to make it work. |
| Classic Fire Tablets | **Echo (White)** (avg_sentiment=1.00): I love this product. Perfect size for travel. Great for watching videos as well. **Fire Tablet, 7" Display, Wi-Fi, 8 GB - Includes Special Offers, Magenta**. Good basic tablet for checking email, web browsing, and reading ebooks.                                   |
| Fire HD Tablets      | I love buying from Best Buy and I love the warranties they offer. **All-New Fire HD 8 Kids Edition Tablet, 8" HD Display, 32 GB, Pink Kid-Proof Case** (avg_sentiment=1.00). Stay away from the certified refurbished Amazon Fire TV. The screen looked half corrupted, see the pictures.                        |
| Kindle E-readers     | **All-New Kindle Oasis E-reader - 7" High-Resolution Display (300 ppi), Waterproof, Built-In Audible, 32 GB, Wi-Fi + Free Cellular Connectivity** (avg_sentiment=1.00): The best Kindle ever, for me, is still the huge DX that Amazon used to make.                                                         |
| Smart Home / Audio   | **Amazon Fire TV with 4K Ultra HD and Alexa Voice Remote (Pendant Design) \| Streaming Media Player** (avg_sentiment=1.00): A lazy man's dream when it is combined with Alexa. If you get the Harmony Hub you can really impress with home automation. |


-------

# Review of BART Implementation for Product Review Summaries

## 1. Overview

A workflow was implemented to generate blog-style articles summarizing product reviews per category using **BART (`facebook/bart-large-cnn`)**. The process included:

- Preprocessing data and mapping sentiment labels to numeric scores.  
- Aggregating products by average sentiment and combining review texts.  
- Selecting the top 3 products and the worst product per category.  
- Constructing prompts for BART and generating summaries.  
- Saving the results to a CSV.  

## 2. Results

- Many articles **repeated reviews** instead of providing a summary.  
- Some outputs included the **prompt itself**, showing the model did not generate new content.  
- Product names were sometimes **duplicated or malformed**.  
- Average sentiment scores were all 1.0, so the model had **no contrast** between top and worst products.  
- The worst product was **not clearly highlighted**, and comparisons between top products were minimal.  
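
The observation that articles repeat reviews can be quantified. A minimal sketch, assuming a simple word n-gram comparison (the `copy_rate` function and the choice of 4-grams are illustrative, not part of the original workflow): it measures the share of the generated text that appears verbatim in the input.

```python
def ngrams(text, n):
    """Return the list of word n-grams of `text`, lowercased."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def copy_rate(source, summary, n=4):
    """Fraction of the summary's word n-grams that also occur verbatim in the source."""
    summary_ngrams = ngrams(summary, n)
    if not summary_ngrams:
        return 0.0
    source_ngrams = set(ngrams(source, n))
    copied = sum(1 for gram in summary_ngrams if gram in source_ngrams)
    return copied / len(summary_ngrams)

source = ("I love this product. Perfect size for travel. "
          "Great for watching videos as well.")
extractive = "I love this product. Perfect size for travel."
abstractive = "Reviewers praise the compact size and video playback quality."

print(copy_rate(source, extractive))   # 1.0
print(copy_rate(source, abstractive))  # 0.0
```

A value close to 1.0 would indicate that the output is mostly copied sentences rather than a rewritten summary.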

## 3. Analysis

- **BART limitations:** Trained for news-style text; struggles with multiple products and concatenated reviews.  
- **Input issues:** Long, repetitive texts caused truncation and confusion.  
- **Prompt structure:** Unstructured prompts led BART to echo instructions.  
- **Lack of sentiment contrast:** Uniform scores prevent meaningful differentiation.  
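
The lack of contrast can be detected before any text is generated. The sketch below assumes a hypothetical helper `has_contrast` and an arbitrary threshold `min_gap`; it flags clusters where the worst product scores almost as high as the weakest of the top products.

```python
def sentiment_gap(top_scores, worst_score):
    """Difference between the lowest top-product score and the worst-product score."""
    return min(top_scores) - worst_score

def has_contrast(top_scores, worst_score, min_gap=0.25):
    """True when the gap is at least `min_gap` (threshold chosen for illustration)."""
    return sentiment_gap(top_scores, worst_score) >= min_gap

# Uniform scores, as reported in the results: no contrast
print(has_contrast([1.0, 1.0, 1.0], 1.0))   # False
# Illustrative scores with a clear gap
print(has_contrast([1.0, 0.8, 0.6], -0.2))  # True
```

Clusters that fail this check could be skipped or reported separately instead of being sent to the model.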

## 4. Key Takeaways

- BART alone is **not ideal for multi-product comparison**.

## 5. Conclusion

The workflow successfully prepares data, but BART's outputs are limited due to **model constraints and input structure**. With structured prompts and selective review sampling, it is possible to generate **readable blog-style articles** highlighting top products, their differences, and the worst product clearly.
