# Deployment of NLP Sentiment Analysis & Summarization System

##  Project Overview
This document describes the deployment stage of a complete NLP pipeline that performs:
- **Review Summarization**
- **Sentiment Classification**
- **Complaint Highlighting**
- **Worst Product Identification**

The deployment was implemented using **Gradio** to create a user-friendly web interface for analyzing customer reviews from CSV files.


---

## Deployment Tools & Technologies
| Tool         | Purpose                                                                 |
|--------------|-------------------------------------------------------------------------|
| Gradio       | Building the interactive web UI                                         |
| Transformers | Loading the BART summarization model and the fine-tuned BERT classifier |
| Pandas       | Data loading and preprocessing                                          |
| Python       | Backend logic and file handling                                         |

---

##  Deployment Workflow

### 1. Interface Design with Gradio
- File upload for **CSV review data**
- Dropdown menu to select **product category**
- Display area for formatted analysis text

### 2. Model Integration
- **Summarization** model: `facebook/bart-large-cnn`
- **Sentiment classifier**: custom fine-tuned BERT model with two classes (`POSITIVE`, `NEGATIVE`)
- Optional: a `NEUTRAL` class is reserved as a placeholder for future extension
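
A fine-tuned checkpoint does not always emit the label names listed above. When `id2label` is not set during fine-tuning, Hugging Face pipelines report generic names such as `LABEL_0` and `LABEL_1`. The sketch below normalizes raw labels before they are counted. The mapping of `LABEL_0` to `NEGATIVE` and `LABEL_1` to `POSITIVE` is an assumption (check the model's `config.json`), and `normalize_label` is a hypothetical helper, not part of the original project.

```python
# Assumed mapping: verify it against the id2label entry in the model's config.json.
LABEL_ALIASES = {
    "LABEL_0": "NEGATIVE",
    "LABEL_1": "POSITIVE",
}
KNOWN_LABELS = {"POSITIVE", "NEGATIVE", "NEUTRAL"}


def normalize_label(raw_label):
    """Map a raw pipeline label to POSITIVE/NEGATIVE/NEUTRAL, or None if unknown."""
    label = raw_label.strip().upper()
    label = LABEL_ALIASES.get(label, label)
    return label if label in KNOWN_LABELS else None


if __name__ == "__main__":
    for raw in ["positive", "LABEL_0", "label_1", "5 stars"]:
        print(raw, "->", normalize_label(raw))
```

Returning `None` for unknown labels makes it easy to log or count unexpected model outputs instead of silently dropping them.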

### 3. Review Processing Pipeline
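For each product:
- Aggregate the product's reviews (the first 10 are used for summarization)
- Summarize the reviews into a readable narrative
- Extract reviews containing negative keywords (`not`, `bad`, `problem`) as complaints
- Analyze the sentiment distribution on a sample of the first 5 reviews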
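
The complaint-extraction step can be sketched as a keyword filter. The version below matches whole words case-insensitively, so that "nothing" does not count as "not". The keyword list and the helper name `extract_complaints` are illustrative assumptions, not a tuned lexicon.

```python
import re

# Illustrative keyword list (an assumption); extend it for your domain.
COMPLAINT_KEYWORDS = ("not", "bad", "problem", "problems")

# \b restricts matches to whole words; re.IGNORECASE covers "Bad", "NOT", etc.
_COMPLAINT_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, COMPLAINT_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def extract_complaints(reviews):
    """Return the reviews that contain at least one complaint keyword."""
    return [r for r in reviews if isinstance(r, str) and _COMPLAINT_RE.search(r)]


if __name__ == "__main__":
    sample = ["Nothing to dislike, great tablet", "Battery is BAD", "Screen did not turn on"]
    print(extract_complaints(sample))
```

A keyword filter is cheap but coarse: it cannot tell "not bad at all" from a real complaint, so the classifier's `NEGATIVE` predictions may be a better filter when accuracy matters more than speed.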

### 4. Output Generation
- Display **Top 3 Products** in storytelling format
- Present **Sentiment Scores** per product
- Highlight **Worst Product**
- Include **Average Rating** of selected category
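
The ranking behind this output can be sketched without any model calls: average the ratings per product, sort, and take both ends of the list. The function name `rank_products` and the `(name, rating)` input shape are assumptions made for this example; the code later in this document performs this step with a pandas `groupby`.

```python
from collections import defaultdict
from statistics import mean


def rank_products(rows, top_n=3):
    """Return (top products, worst product, category average) from (name, rating) pairs."""
    ratings = defaultdict(list)
    for name, rating in rows:
        ratings[name].append(float(rating))
    if not ratings:
        return [], None, None

    # Sort products by mean rating, best first
    ranked = sorted(
        ((name, mean(values)) for name, values in ratings.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    # Average over all reviews in the category, not over per-product means
    category_avg = mean(r for values in ratings.values() for r in values)
    return ranked[:top_n], ranked[-1], category_avg


if __name__ == "__main__":
    rows = [("A", 5), ("A", 4), ("B", 3), ("C", 1), ("D", 4)]
    top, worst, avg = rank_products(rows)
    print(top, worst, avg)
```

Note that with three or fewer products in a category, the worst product also appears among the top products; a caller may want to handle that case explicitly.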

---

##  Gradio Interface Example

```python
import gradio as gr

# `analyze_uploaded_file` and `categories` are defined in the sections that follow
gr.Interface(
    fn=analyze_uploaded_file,
    inputs=[
        gr.File(label="Upload CSV File"),
        gr.Dropdown(choices=categories, label="Select a Category")
    ],
    outputs="text",
    title="Product Review Analyzer",
    description="Upload a CSV file containing product reviews and choose a category to analyze."
).launch()
```

Install the dependencies first. When installing from inside a notebook, you may need to restart the kernel to use the updated packages.

```bash
pip install gradio transformers torch
```


## Importing Libraries

```python
import gradio as gr
import pandas as pd
from transformers import pipeline
```


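## Loading Models

```python
# BART model fine-tuned on CNN/DailyMail, used for summarization
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Fine-tuned BERT sentiment classifier; "saved_model_pytorch" holds the saved model and tokenizer
sentiment_classifier = pipeline("sentiment-analysis", model="saved_model_pytorch", tokenizer="saved_model_pytorch")
```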

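## Summarization Function

```python
def summarize_reviews(reviews):
    """Summarize the first 10 reviews of a product."""
    # Join the first 10 reviews and keep at most 1024 characters
    combined = " ".join(str(r) for r in reviews[:10])[:1024].strip()
    if not combined:
        # Nothing to summarize, e.g. no review matched the complaint keywords
        return "No reviews to summarize."
    summary = summarizer(combined, max_length=150, min_length=30, do_sample=False)  # Deterministic summary
    return summary[0]['summary_text']  # Return the summarized text
```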
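
When the joined reviews are short, the summarization pipeline warns that `max_length` exceeds the input length. One way to avoid this is to derive the length bounds from the input. The sketch below uses the whitespace word count as a rough stand-in for the token count (an assumption; the real tokenizer counts differently), and `summary_length_bounds` is a hypothetical helper rather than part of the original code.

```python
def summary_length_bounds(text, max_cap=150, min_floor=30):
    """Pick (min_length, max_length) so the summary is shorter than the input."""
    n_words = len(text.split())
    # Aim for roughly half the input length, within [2, max_cap]
    max_length = max(2, min(max_cap, n_words // 2))
    # min_length must stay below max_length
    min_length = max(1, min(min_floor, max_length - 1))
    return min_length, max_length


if __name__ == "__main__":
    print(summary_length_bounds("word " * 400))
    print(summary_length_bounds("word " * 40))
```

The returned pair can be passed on as `summarizer(combined, min_length=min_length, max_length=max_length, do_sample=False)`.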

## Sentiment Classification Function

```python
def classify_reviews(reviews):
    """Return the sentiment distribution of a product's first 5 reviews as text."""
    sample_reviews = [str(r) for r in reviews[:5]]  # Small sample for faster evaluation
    if not sample_reviews:
        return "No sentiments detected."

    # truncation=True keeps long reviews within the model's maximum input length
    sentiments = sentiment_classifier(sample_reviews, truncation=True)

    # NEUTRAL is a placeholder; the current model only predicts POSITIVE/NEGATIVE
    sentiment_counts = {"POSITIVE": 0, "NEGATIVE": 0, "NEUTRAL": 0}
    for sentiment in sentiments:
        label = sentiment['label'].upper()
        if label in sentiment_counts:  # Labels outside the three known classes are ignored
            sentiment_counts[label] += 1

    total = sum(sentiment_counts.values())
    if total == 0:
        return "No sentiments detected."

    # Percentage of each class among the counted reviews
    pos_ratio = sentiment_counts["POSITIVE"] / total * 100
    neg_ratio = sentiment_counts["NEGATIVE"] / total * 100
    neu_ratio = sentiment_counts["NEUTRAL"] / total * 100

    return f"Positive: {pos_ratio:.1f}%, Neutral: {neu_ratio:.1f}%, Negative: {neg_ratio:.1f}%"
```


## Clean DataFrame Function

```python
def clean_dataframe(df):
    # Keep only the relevant columns and drop rows with missing values
    df = df[['Cluster_Category', 'name', 'reviews.rating', 'reviews.text']].dropna().copy()

    # Convert ratings to float; non-numeric values become NaN and are dropped.
    # (str.isnumeric() would also reject valid ratings such as "4.5" or "5.0".)
    df['reviews.rating'] = pd.to_numeric(df['reviews.rating'], errors='coerce')
    df = df.dropna(subset=['reviews.rating'])

    # Standardize some category labels
    df.loc[df['Cluster_Category'].str.contains('supplies', case=False, na=False), 'Cluster_Category'] = 'Supplies'
    df.loc[df['Cluster_Category'].str.contains('beauty', case=False, na=False), 'Cluster_Category'] = 'Health'
    return df
```


## Top Products Analysis

```python
def get_top_products_by_category(df, category):
    """Return (top-3 summaries, worst product name, worst product summary) for a category."""
    cat_df = df[df['Cluster_Category'] == category]  # Filter rows by category
    if cat_df.empty:
        return None, None, "No reviews available for this category."

    # One row per product: average rating and the list of its review texts
    grouped = cat_df.groupby('name').agg({
        'reviews.rating': 'mean',
        'reviews.text': lambda x: list(x)
    }).reset_index()

    top3 = grouped.sort_values(by='reviews.rating', ascending=False).head(3)  # Top 3 products
    worst = grouped.sort_values(by='reviews.rating', ascending=True).head(1)  # Worst product

    complaint_keywords = ("not", "bad", "problem")
    top_summaries = []
    for _, row in top3.iterrows():
        name = row['name']
        reviews = row['reviews.text']
        summary = summarize_reviews(reviews)  # Summary of the product's reviews
        # Simple case-insensitive keyword match to pick out complaints
        complaint_reviews = [r for r in reviews if any(k in str(r).lower() for k in complaint_keywords)]
        complaints = summarize_reviews(complaint_reviews)  # Complaint summary
        sentiment_summary = classify_reviews(reviews)  # Sentiment distribution
        top_summaries.append((name, summary, complaints, sentiment_summary))

    worst_name = worst.iloc[0]['name']
    worst_reviews = worst.iloc[0]['reviews.text']
    worst_summary = summarize_reviews(worst_reviews)

    return top_summaries, worst_name, worst_summary
```


## Gradio App Logic

```python
# Gradio interface handler
def analyze_uploaded_file(file, category):
    if file is None or not category:
        return "Please upload a CSV file and select a category."

    try:
        # Depending on the Gradio version, `file` is a path string or an object with a .name attribute
        path = file if isinstance(file, str) else file.name
        df = pd.read_csv(path)
        df = clean_dataframe(df)
    except Exception as e:
        return f"Error reading file: {e}"

    top_summaries, worst_name, worst_summary = get_top_products_by_category(df, category)
    if top_summaries is None:
        return worst_summary  # Holds the "no reviews" message in this case

    result = f"\nBlog-Style Article for Category: {category}\n"
    result += "\nHere's a breakdown of the top 3 products in a storytelling format:\n"

    order_words = ["First", "Second", "Third"]
    for i, (name, summary, complaints, sentiment) in enumerate(top_summaries):
        result += f'\n{order_words[i]}, we examined the product titled: "{name}".\n'
        result += f"Here's what customers appreciated about it:\n{summary}\n"
        result += f"On the other hand, some users highlighted concerns such as:\n{complaints}\n"
        result += f"Based on the sentiment analysis, we found the following:\n{sentiment}\n"
        result += "-" * 60 + "\n"

    result += f"\nWorst Product Based on Reviews: {worst_name}\n"
    result += f"Why should it be avoided?\n{worst_summary}\n"

    # Average rating over all reviews in the selected category
    avg_rating = df[df['Cluster_Category'] == category]['reviews.rating'].mean()
    result += f"\nAverage Rating for {category}: {avg_rating:.2f}"

    return result
```

## Gradio Interface Launch

```python
# Categories offered in the dropdown
categories = ["Electronics", "Supplies", "H Electronics", "Tablets", "Batteries", "Computer Accessories"]

# Gradio UI
interface = gr.Interface(
    fn=analyze_uploaded_file,
    inputs=[
        gr.File(label="Upload CSV File"),  # Upload CSV with reviews
        gr.Dropdown(choices=categories, label="Select a Category")  # Category selection dropdown
    ],
    outputs="text",  # Plain text output
    title="Product Review Analyzer with CSV Upload",
    description="Upload a CSV file containing product reviews and choose a category to analyze."
)

interface.launch(share=False)  # Start the interface locally
```

Running this cell starts the app on a local URL (`http://127.0.0.1:7861` in this run). To create a public link, set `share=True` in `launch()`.

During analysis, the summarization pipeline logs a warning whenever `max_length` (150) exceeds the input length, for example:

```text
Your max_length is set to 150, but your input_length is only 25. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=12)
```

In this run the warning appeared for input lengths of 25, 3, 137, and 76 tokens. An input length of 3 suggests a nearly empty input, for instance when no review matched the complaint keywords.