# Crisis_recovery_Sentiment_Analysis

## Purpose

This notebook performs **AI-driven sentiment and topic enrichment**
on customer review text generated during crisis conditions.

It transforms **unstructured customer feedback**
into **structured, machine-readable intelligence**
that can be used by:
- Operations teams
- Hygiene and safety monitoring
- Crisis recovery analytics
- Machine learning feature pipelines

This notebook is the **bridge between raw text and analytical signals**.

---

## Business Context

During a crisis, customer reviews often contain:
- Early warnings of hygiene issues
- Signals of service degradation
- Emotional responses to delays and failures

Raw text cannot be directly consumed by:
- Dashboards
- Aggregations
- ML models

This notebook converts customer voice into:
- Standardized topics
- Normalized sentiment labels
- Quantitative sentiment scores

This enables **early detection of risk before metrics degrade**.

---

## Inputs and Outputs

### Inputs (Silver Layer)

| Source Table | Purpose |
|-------------|--------|
| `silver_orders_enriched` | Customer reviews linked to orders |

---

### Outputs

| Table | Business Purpose |
|------|------------------|
| `ai_review_sentiment` | AI-enriched review intelligence |

Each record contains:
- Review topic
- Sentiment classification
- Sentiment intensity score

---

## Design Principles

- No raw text modification
- Deterministic, reproducible enrichment
- Lightweight AI suitable for batch processing
- Interpretable outputs (not black-box embeddings)
- Safe for downstream aggregation and ML use

---

In [0]:
%pip install textblob

In [0]:
dbutils.library.restartPython() 

## 1: Load Customer Review Text (Silver Layer)

### Business Problem

Customer sentiment during a crisis is primarily expressed
through **free-text reviews**, not structured metrics.

However, raw text is:
- Unstructured
- Difficult to aggregate
- Not directly usable for analytics or ML

To make sentiment actionable, reviews must first be
**isolated and contextualized at the order level**.

---

### Approach

We load customer review text from the Silver layer,
selecting only:
- `order_id` → preserves transactional context
- `review_text` → raw customer voice

This ensures sentiment signals remain:
- Traceable to real orders
- Safe for downstream enrichment
- Aligned with lakehouse layering principles


In [0]:
from pyspark.sql.functions import col

# Load review text from the Silver layer, keeping only order_id and review_text
df_reviews = (
    spark.table("food_delivery.silver_orders_enriched")
    .select("order_id", "review_text")
)

display(df_reviews.select("review_text").distinct())


## 2: AI-Based Topic & Sentiment Extraction

### Business Problem

Raw review text cannot be directly consumed by:
- Dashboards
- Aggregations
- Machine learning models

Manual interpretation does not scale during crisis periods
when review volume and urgency increase.

---

### Approach

We define a **vectorized Pandas UDF** that approximates
LLM-style intent detection and sentiment analysis
with keyword rules and TextBlob polarity scoring (no LLM endpoint is called):

1. **Topic Classification**
   - Hygiene-related risk
   - Service quality issues
   - Explicit positive feedback
   - Other / uncategorized feedback

2. **Sentiment Scoring**
   - Uses polarity scoring to quantify emotion
   - Normalizes output into Positive / Neutral / Negative
   - Produces a continuous sentiment intensity score

The UDF outputs a **structured schema** suitable for Spark:
- `ai_topic`
- `ai_sentiment`
- `sentiment_score`
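
As a rough illustration of the normalization step in item 2, the sketch below maps a polarity score to a label. The ±0.2 cut-offs match the ones used in this notebook's UDF; the function name `label_from_polarity` and its `threshold` parameter are hypothetical and not part of the notebook. The comparisons are strict, so a score of exactly 0.2 or -0.2 is labeled Neutral.

```python
def label_from_polarity(polarity: float, threshold: float = 0.2) -> str:
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    Hypothetical helper: the 0.2 default mirrors the notebook's UDF,
    but the function itself is only an illustration.
    """
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"


# Show how scores on and around the cut-offs are labeled
for score in (0.75, 0.2, 0.0, -0.2, -0.6):
    print(score, label_from_polarity(score))
```

Keeping the threshold as a parameter would make it easier to tune the Neutral band if validation shows too many reviews falling into it.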


In [0]:
from pyspark.sql.functions import pandas_udf
import pandas as pd
from textblob import TextBlob
from typing import Iterator

@pandas_udf(
    "struct<ai_topic:string, ai_sentiment:string, sentiment_score:double>"
)
def analyze_review_llm(texts: Iterator[pd.Series]) -> Iterator[pd.DataFrame]:
    # Process each batch of review texts
    for text_series in texts:
        results = []
        for text in text_series:
            if text is None:
                results.append(("Other", "Neutral", 0.0))
                continue
            txt = str(text).lower()
            # Topic detection
            if any(word in txt for word in ["sick", "undercooked", "smell", "poison"]):
                topic = "Hygiene"
            elif any(word in txt for word in ["late", "delay", "cold", "wait"]):
                topic = "Service"
            elif any(word in txt for word in ["great", "delicious", "loved"]):
                topic = "Positive"
            else:
                topic = "Other"
            # Sentiment analysis
            polarity = TextBlob(txt).sentiment.polarity
            if polarity > 0.2:
                sentiment = "Positive"
            elif polarity < -0.2:
                sentiment = "Negative"
            else:
                sentiment = "Neutral"
            results.append((topic, sentiment, float(polarity)))
        # Return results as DataFrame
        yield pd.DataFrame(
            results, 
            columns=["ai_topic", "ai_sentiment", "sentiment_score"]
        )
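
The keyword checks in the UDF use substring matching, so `"late"` also matches "plate" or "chocolate", and `"wait"` also matches "waiter". One possible refinement, sketched below in plain Python, is to match on word boundaries. The keywords are taken from the UDF; the names `TOPIC_PATTERNS` and `classify_topic` and the suffix handling (`smell\w*`, `delay\w*`, `wait(?:ed|ing)?`) are illustrative assumptions, not part of the notebook.

```python
import re

# Hypothetical pattern table. Keywords come from the notebook's UDF;
# the suffix handling is an assumption made for this sketch.
# Order matters: Hygiene is checked first, as in the UDF.
TOPIC_PATTERNS = [
    ("Hygiene", re.compile(r"\b(?:sick|undercooked|smell\w*|poison\w*)\b")),
    ("Service", re.compile(r"\b(?:late|delay\w*|cold|wait(?:ed|ing)?)\b")),
    ("Positive", re.compile(r"\b(?:great|delicious|loved)\b")),
]


def classify_topic(text):
    """Return the first topic whose keywords appear as whole words."""
    if text is None:
        return "Other"
    txt = str(text).lower()
    for topic, pattern in TOPIC_PATTERNS:
        if pattern.search(txt):
            return topic
    return "Other"


for review in [
    "Arrived late and cold",
    "The chocolate cake was delicious",
    "The waiter was friendly",
    "Food smelled off",
]:
    print(f"{review!r} -> {classify_topic(review)}")
```

The same function could replace the `if`/`elif` chain inside the UDF loop. Whether the stricter matching is preferable depends on how the reviews are actually worded.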

## 3: Apply AI Enrichment to Reviews

### Business Problem

AI outputs must be:
- Columnar
- Explicit
- Queryable

Nested or opaque AI responses
prevent downstream analytics and ML usage.

---

### Approach

We apply the AI enrichment UDF to each review and:
- Attach results as a structured column
- Flatten AI outputs into individual fields

This converts unstructured text into
**analytics-ready sentiment signals** at the order level.

Each review is now represented by:
- Topic classification
- Sentiment label
- Sentiment intensity score


In [0]:
# Apply AI enrichment UDF to review text and flatten results
df_ai_enriched = (
    df_reviews
    .withColumn("ai_result", analyze_review_llm(col("review_text")))
    .select(
        "order_id",
        col("ai_result.ai_topic").alias("ai_topic"),
        col("ai_result.ai_sentiment").alias("ai_sentiment"),
        col("ai_result.sentiment_score").alias("sentiment_score")
    )
)

display(df_ai_enriched)

## 4: Sentiment Distribution Validation

### Business Problem

Before persisting AI-enriched data,
we must validate that sentiment outputs are:
- Reasonable
- Balanced
- Aligned with crisis expectations

Skewed or implausible distributions
can indicate enrichment logic issues.

---

### Approach

We segment enriched reviews into:
- Positive
- Negative
- Neutral

Basic count checks provide:
- Sanity validation
- Early detection of misclassification
- Confidence in downstream usage

This step acts as a **lightweight quality gate**
for AI enrichment.


In [0]:
# 1. Filter for Positive Sentiment
df_positive = df_ai_enriched.filter(col("ai_sentiment") == "Positive")

# 2. Filter for Negative Sentiment
df_negative = df_ai_enriched.filter(col("ai_sentiment") == "Negative")

# 3. Filter for Neutral Sentiment
df_neutral = df_ai_enriched.filter(col("ai_sentiment") == "Neutral")

print(f"Positive count: {df_positive.count()}")
print(f"Negative count: {df_negative.count()}")
print(f"Neutral count:  {df_neutral.count()}")
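
The printed counts are checked by eye. As a minimal sketch of how the quality gate could be automated, the function below takes a label-to-count mapping and flags labels that are missing or that dominate the distribution. The function name `check_sentiment_distribution` and the 95% `max_share` default are hypothetical; a suitable limit would depend on the data.

```python
def check_sentiment_distribution(counts, max_share=0.95):
    """Return a list of warning strings for a label -> count mapping.

    Hypothetical gate: flags labels with zero reviews and labels whose
    share of all reviews exceeds max_share (an assumed default).
    """
    expected = ("Positive", "Negative", "Neutral")
    total = sum(counts.get(label, 0) for label in expected)
    if total == 0:
        return ["no reviews were classified"]

    warnings = []
    for label in expected:
        count = counts.get(label, 0)
        share = count / total
        if count == 0:
            warnings.append(f"{label}: no reviews received this label")
        elif share > max_share:
            warnings.append(
                f"{label}: {share:.0%} of reviews exceeds the {max_share:.0%} limit"
            )
    return warnings


# Example with made-up counts; in the notebook these would come from
# the three count() calls above.
example_counts = {"Positive": 40, "Negative": 35, "Neutral": 25}
print(check_sentiment_distribution(example_counts))
```

An empty list means the gate passed. A non-empty list could be logged or used to stop the run before the table is written.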

## 5: Persist AI-Enriched Sentiment Table

### Business Problem

Even lightweight AI enrichment adds compute cost at crisis-scale review volumes
and should not be recomputed repeatedly.

Re-running sentiment analysis:
- Increases cost
- Slows pipelines
- Risks inconsistent outputs

---

### Approach

We persist the enriched sentiment data
as a Delta table:

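- Created once if it does not exist
- Skipped if already present, so reruns are idempotent
- Not refreshed on later runs: reviews that arrive after the first write are not added until the table is dropped or rebuilt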

This table becomes the **single source of truth**
for AI-derived sentiment signals across the platform.


In [0]:
# Persist AI-enriched sentiment table if not already present
if not spark.catalog.tableExists("food_delivery.ai_review_sentiment"):
    df_ai_enriched.write \
        .format("delta") \
        .mode("overwrite") \
        .saveAsTable("food_delivery.ai_review_sentiment")
else:
    # Skip creation if table exists
    print("ai review sentiment table already exists → skipping creation")

## 6: Hygiene Risk Signal Inspection

### Business Problem

Hygiene-related feedback represents
**high-severity operational and reputational risk**
during crisis periods.

These signals must be:
- Easily discoverable
- Quickly reviewable
- Actionable by safety teams

---

### Approach

We filter the AI-enriched sentiment table
to surface reviews classified under the **Hygiene** topic.

This enables:
- Rapid investigation
- Escalation workflows
- Early intervention before incidents escalate

This view functions as an **early warning system**
for food safety and hygiene risk.


In [0]:
# Display order-level records whose review was flagged for hygiene risk
display(
    spark.table("food_delivery.ai_review_sentiment")
    .filter(col("ai_topic") == "Hygiene")
)

## Summary

This notebook establishes an **AI-powered sentiment intelligence layer**
for crisis recovery by:

- Translating unstructured customer reviews into structured signals
- Classifying operational risk topics (hygiene, service, etc.)
- Quantifying customer emotion using sentiment scores
- Persisting AI outputs for reuse across analytics and ML

It ensures customer voice is:
- Measurable
- Interpretable
- Actionable

Customer voice is not just heard but **operationalized**.


## Downstream Dependencies

The Sentiment Analysis layer feeds:

### Gold Analytics Tables
- Crisis KPI aggregation
- Hygiene incident tracking
- Service quality dashboards

### ML Feature Engineering
- Sentiment velocity
- Crisis exposure indicators
- Churn risk feature construction

### Crisis Operations & Safety Systems
- Hygiene escalation workflows
- Store-level risk investigation
- Quality assurance monitoring


Any failure in this layer directly impacts:
- Crisis detection speed
- Model accuracy
- Operational response quality

This is why sentiment enrichment must remain
**precise, interpretable, and reproducible**.