# Goal of Phase 3 - BART Implementation (Simplified)

## Input
A CSV file (`final.csv`) containing product information and reviews, with columns such as:

- `ProductID`, `Product Name`, `Category`, `Brand`, `Ratings`
- `sentiment` (positive/neutral/negative label or numerical score)
- `reviews.text`

## Processing
For each product category:

1. Determine the **Top 3 products overall** based on average sentiment.  
2. Identify the **worst product** in the category (lowest average sentiment).
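
The selection rule above can be sketched without pandas on a toy list of (product, sentiment score) pairs. The product names and scores below are invented for illustration, and `rank_products` is a hypothetical helper; the notebook's own implementation uses a `groupby` in a later cell.

```python
from collections import defaultdict
from statistics import mean

# Toy (product name, sentiment score) pairs; all values are made up.
reviews = [
    ("Tablet A", 1), ("Tablet A", 1),
    ("Tablet B", 1), ("Tablet B", 0),
    ("Tablet C", 0), ("Tablet C", 0),
    ("Tablet D", -1), ("Tablet D", 0),
    ("Tablet E", -1), ("Tablet E", -1),
]

def rank_products(pairs):
    """Return (top3, worst) product names ranked by average sentiment."""
    scores = defaultdict(list)
    for name, score in pairs:
        scores[name].append(score)
    averages = {name: mean(vals) for name, vals in scores.items()}
    # Sort by average descending; ties are broken alphabetically.
    ranked = sorted(averages, key=lambda n: (-averages[n], n))
    return ranked[:3], ranked[-1]

top3, worst = rank_products(reviews)
print(top3, worst)
```

Note that when a category has three or fewer products, the worst product also appears in the top 3, so such categories may need special handling.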

## AI Prompt Construction
Create a structured, multi-part prompt instructing the AI to:

- Recommend the Top 3 products with key differences.  
- Explain why the worst product should be avoided.  
- Base all insights **solely on the review data**.
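
A minimal sketch of what such a multi-part prompt could look like as a plain string. `build_prompt`, the category name, and the review snippets are all hypothetical; the notebook's own `construct_bart_input` only concatenates review text. Since `facebook/bart-large-cnn` is a summarization checkpoint, the instruction line may be summarized along with the reviews rather than followed, so treat this as an illustration of the structure only.

```python
def build_prompt(category, top3, worst):
    """Assemble a multi-part prompt from product names and review text.

    `top3` is a list of (name, reviews_text) pairs and `worst` is a single
    (name, reviews_text) pair. All names here are hypothetical.
    """
    lines = [
        f"Category: {category}",
        "Task: Using only the reviews below, recommend the top 3 products, "
        "describe their key differences, and explain why the worst product "
        "should be avoided.",
        "",
        "Top 3 products:",
    ]
    for rank, (name, text) in enumerate(top3, start=1):
        lines.append(f"{rank}. {name}: {text}")
    worst_name, worst_text = worst
    lines += ["", "Worst product:", f"{worst_name}: {worst_text}"]
    return "\n".join(lines)

prompt = build_prompt(
    "Tablets",
    [
        ("Tablet A", "Great screen."),
        ("Tablet B", "Good value."),
        ("Tablet C", "Solid battery."),
    ],
    ("Tablet E", "Stopped charging after a week."),
)
print(prompt)
```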

## Output
- Use **BART** (`facebook/bart-large-cnn`) to generate a **human-readable blog-style article** per category.  
- Save all articles in a CSV (`generated_articles_bart.csv`) with co


### 1. Load & Preprocess Data

In [43]:
import pandas as pd
from transformers import BartTokenizer, BartForConditionalGeneration

# Load CSV
df = pd.read_csv("../outputs/final.csv")

# Map text labels to numeric scores; numeric sentiment is used as-is
sentiment_map = {"positive": 1, "neutral": 0, "negative": -1}
if pd.api.types.is_numeric_dtype(df['sentiment']):
    df['sentiment_score'] = df['sentiment']
else:
    df['sentiment_score'] = df['sentiment'].map(sentiment_map)

# Optional: remove duplicates
df = df.drop_duplicates(subset=["ProductID", "reviews.text"])
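
One thing to watch in the cell above: mapping a column through a dictionary yields a missing value for any label that is not a key, so variants such as `"Positive"` or `" negative"` would silently drop out of the averages. A minimal sketch of a more tolerant lookup follows; `to_score` is a hypothetical helper and the sample labels are invented.

```python
SENTIMENT_MAP = {"positive": 1, "neutral": 0, "negative": -1}

def to_score(label):
    """Map a sentiment label to a number, tolerating case and whitespace.

    Returns None for anything unrecognised so it can be inspected later.
    """
    if isinstance(label, (int, float)):
        return label  # already numeric
    if not isinstance(label, str):
        return None
    return SENTIMENT_MAP.get(label.strip().lower())

labels = ["positive", " Negative", "NEUTRAL", "mixed", 1]
scores = [to_score(x) for x in labels]
# Collect labels that could not be mapped, for manual review.
unknown = sorted({x for x, s in zip(labels, scores) if s is None})
print(scores)
print("Unmapped labels:", unknown)
```

A function like this could be passed to `df['sentiment'].map(...)` in place of the dictionary, with unrecognised labels still coming back as missing values.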

### 2. Determine Top & Worst Products per Category

In [44]:
# Aggregate by product
product_avg_sentiment = (
    df.groupby(['Cluster', 'ProductID', 'Product Name'])
      .agg(avg_sentiment=('sentiment_score', 'mean'),
           # Skip missing reviews and coerce to str so the join cannot fail
           reviews=('reviews.text', lambda x: ' '.join(x.dropna().astype(str))))
      .reset_index()
)

# Get top 3 and worst per cluster
top_worst_per_cluster = {}
for cluster, group in product_avg_sentiment.groupby('Cluster'):
    top3 = group.sort_values('avg_sentiment', ascending=False).head(3)
    worst = group.sort_values('avg_sentiment', ascending=True).head(1)
    top_worst_per_cluster[cluster] = {"top3": top3, "worst": worst}

### 3. Construct AI Prompts for BART

We want a structured prompt that:

- Introduces the category
- Lists the Top 3 products and their key differences
- Explains why the worst product ranks last
- Uses only the review data

In [45]:
def construct_bart_input(top3_df, worst_df):
    """
    Concatenate all review texts for Top 3 and Worst product to feed BART.
    """
    reviews_text = ""
    
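    # Top 3 products, numbered by rank (after sorting, the DataFrame index is not the rank)
    reviews_text += "Top 3 products reviews:\n"
    for rank, (_, row) in enumerate(top3_df.iterrows(), start=1):
        reviews_text += f"{rank}. {row['Product Name']}: {row['reviews']}\n\n"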
    
    # Worst product
    worst_row = worst_df.iloc[0]
    reviews_text += f"Worst product review:\n{worst_row['Product Name']}: {worst_row['reviews']}\n\n"
    
    return reviews_text

### 4. Generate Summary Using BART

In [46]:
from transformers import BartTokenizer, BartForConditionalGeneration
import torch

# Load BART
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

def generate_article(text):
    """
    Generate a human-readable summary for the concatenated reviews.
    """
    # Input beyond 1024 tokens is truncated, so very long review text is cut off
    inputs = tokenizer(text, max_length=1024, return_tensors="pt", truncation=True)
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        num_beams=4,
        max_length=512,
        early_stopping=True
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
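
Because the tokenizer call truncates at 1024 tokens, a long run of reviews for the first product can push the remaining products, including the worst one, out of the model's input. One possible mitigation, sketched below, is to cap each product's text before concatenating. `cap_words`, `build_balanced_input`, and the `total_budget` value are hypothetical, and whitespace-separated words are only a rough proxy for BART tokens, so the budget would need tuning against the real tokenizer.

```python
def cap_words(text, max_words):
    """Keep at most `max_words` whitespace-separated words of `text`."""
    words = text.split()
    return " ".join(words[:max_words])

def build_balanced_input(products, total_budget=600):
    """Give each product an equal share of a rough word budget.

    `products` is a list of (name, reviews_text) pairs. `total_budget` is an
    assumed word count meant to stay under the 1024-token limit.
    """
    if not products:
        return ""
    share = total_budget // len(products)
    parts = [f"{name}: {cap_words(text, share)}" for name, text in products]
    return "\n\n".join(parts)

# Toy data: two products with very long review text, one with short text.
products = [
    ("Tablet A", "great " * 1000),
    ("Tablet B", "fine " * 5),
    ("Tablet E", "broken " * 1000),
]
balanced = build_balanced_input(products, total_budget=30)
print(balanced)
```

With an equal split, short review sets leave part of their share unused; redistributing the leftover budget is a possible refinement.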

### 5. Loop Over Clusters and Save CSV

In [48]:
articles = []

for cluster, data in top_worst_per_cluster.items():
    # Concatenate all reviews for Top 3 + Worst
    bart_input_text = construct_bart_input(data['top3'], data['worst'])
    
    # Generate blog-style article using BART
    article_text = generate_article(bart_input_text)
    
    # Store result
    articles.append({"Cluster": cluster, "Article": article_text})

# Save to CSV
articles_df = pd.DataFrame(articles)
articles_df.to_csv("../deliverables/generated_articles_bart.csv", index=False)

print("Articles generated and saved to ../deliverables/generated_articles_bart.csv")

Articles generated and saved to ../deliverables/generated_articles_bart.csv
