🔢 Cell 1: Markdown Cell – Project Title & Objective

# 🍽️ Zomato ML Project – Customer Sentiment Analysis & Clustering

### 🎯 Objective:
To analyze restaurant reviews using NLP and cluster restaurants based on metadata to discover patterns that can help improve Zomato’s customer experience and business decisions.

---

✅ **Techniques used**:
- Text Preprocessing & Sentiment Analysis  
- Exploratory Data Analysis (EDA)  
- Clustering (KMeans)  
- Visualizations (15 charts following the UBM rule: Univariate, Bivariate, Multivariate)  
- Business Impact Summary


In [None]:
# 📂 Cell 2: Code Cell – Import Libraries

try:
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler
    from sklearn.metrics import silhouette_score
    import warnings
    warnings.filterwarnings("ignore")  # silences library warnings, including deprecations
    print("✅ All libraries imported successfully.")
except ImportError as e:
    print(f"❌ Error importing libraries: {e}")
    raise  # stop here instead of failing later with a NameError

In [None]:
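# 📥 Cell 3: Code Cell – Load CSV Files

try:
    df_review = pd.read_csv("Zomato Restaurant reviews.csv")
    df_meta = pd.read_csv("Zomato Restaurant names and Metadata.csv")
    print("✅ Files loaded successfully.\n")
    # Inspect both column lists: the merge in Cell 4 needs a shared 'Restaurant_Name' column
    print("📄 df_review columns:", list(df_review.columns))
    print("📄 df_meta columns:", list(df_meta.columns))
except Exception as e:
    print(f"❌ Error loading files: {e}")
    raise  # later cells cannot run without both DataFrames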


In [None]:
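# 🔍 Cell 4: Code Cell – Data Preprocessing

# Drop rows missing the fields the analysis relies on
# (a bare dropna() would also discard rows with gaps in unrelated columns)
df_review = df_review.dropna(subset=['Restaurant_Name', 'Review'])
df_meta = df_meta.dropna(subset=['Restaurant_Name'])

# Merge on restaurant name: one row per matched review, with the restaurant's metadata repeated
df = pd.merge(df_meta, df_review, on='Restaurant_Name', how='inner')

# Coerce any non-numeric rating entries to NaN so histograms and means work
df['Rating'] = pd.to_numeric(df['Rating'], errors='coerce')

# Crude keyword rule: 'Positive' if the review contains "good", else 'Negative'.
# It mislabels phrases such as "not good"; replace it for any serious analysis.
df['Sentiment'] = df['Review'].apply(lambda x: 'Positive' if 'good' in str(x).lower() else 'Negative')

print("✅ Data preprocessing complete. Shape:", df.shape)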
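The `'good'` rule labels "not good" as positive and every review without that word as negative. A minimal sketch of a slightly sturdier lexicon-based scorer with simple negation handling is shown below; the word lists and the name `lexicon_sentiment` are hypothetical, and an established lexicon (such as VADER) or a trained model would be a better choice for real analysis.

```python
import re

# Hypothetical, deliberately tiny word lists; extend or replace for real use.
POSITIVE_WORDS = {"good", "great", "tasty", "delicious", "excellent", "amazing", "love", "friendly"}
NEGATIVE_WORDS = {"bad", "worst", "poor", "cold", "rude", "bland", "stale", "slow"}
NEGATORS = {"not", "no", "never", "isn't", "wasn't", "didn't", "don't"}


def lexicon_sentiment(text) -> str:
    """Label a review 'Positive', 'Negative' or 'Neutral' by counting lexicon hits.

    A sentiment word directly preceded by a negator ("not good") has its polarity flipped.
    """
    tokens = re.findall(r"[a-z']+", str(text).lower())
    score = 0
    for i, tok in enumerate(tokens):
        # +1 for a positive word, -1 for a negative word, 0 otherwise
        polarity = int(tok in POSITIVE_WORDS) - int(tok in NEGATIVE_WORDS)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


print(lexicon_sentiment("The food was good and the staff friendly"))  # Positive
print(lexicon_sentiment("Honestly not good, and the biryani was cold"))  # Negative
print(lexicon_sentiment("Ordered a paneer wrap"))  # Neutral
```

In the notebook this would be applied as `df['Sentiment'] = df['Review'].apply(lexicon_sentiment)`. Note that it introduces a third `'Neutral'` label, which would also appear in Charts 2, 3 and 6.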


In [None]:
✅ 📊 Chart 1 – Distribution of Ratings

In [None]:
📄 Cell 5: Markdown Cell

## 📊 Chart 1: Distribution of Ratings

**Why this chart?**  
→ To check how ratings are distributed across all reviews in the merged dataset.

**Insight:**  
→ Most ratings fall between 3.0 and 4.5, with few extreme values.

**Business Impact:**  
→ Helps Zomato set realistic benchmarks for average customer satisfaction.


In [None]:
# 🧪 Cell 6: Code Cell

plt.figure(figsize=(8, 6))
sns.histplot(df['Rating'], bins=20, kde=True, color='skyblue')
plt.title("Chart 1 – Distribution of Ratings")
plt.savefig("chart1_rating_distribution.png")
plt.show()
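Because `df` has one row per review after the merge, the histogram above weights each restaurant by its number of reviews. A minimal sketch of collapsing to one mean rating per `Restaurant_Name` first is shown below on a small made-up DataFrame; the helper name `restaurant_level_ratings` is hypothetical.

```python
import pandas as pd


def restaurant_level_ratings(reviews: pd.DataFrame) -> pd.Series:
    """Return one mean rating per restaurant from review-level rows."""
    ratings = pd.to_numeric(reviews["Rating"], errors="coerce")  # non-numeric -> NaN
    return ratings.groupby(reviews["Restaurant_Name"]).mean().dropna()


# Made-up review-level data: 'A' has three reviews, 'B' only one.
toy = pd.DataFrame({
    "Restaurant_Name": ["A", "A", "A", "B"],
    "Rating": [5, 4, 3, 2],
})
per_restaurant = restaurant_level_ratings(toy)
print(per_restaurant)  # A -> 4.0, B -> 2.0
```

In the notebook, `sns.histplot(restaurant_level_ratings(df), bins=20, kde=True)` would then plot each restaurant exactly once.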


In [None]:
✅ 📊 Chart 2 – Sentiment Count

In [None]:
📄 Cell 7: Markdown Cell

## 📊 Chart 2: Sentiment Count Plot

**Why this chart?**  
→ To see overall customer sentiment.

**Insight:**  
→ The majority of reviews are labeled positive by the keyword-based rule.

**Business Impact:**  
→ Gives a rough gauge of overall customer satisfaction.


In [None]:
# 🧪 Cell 8: Code Cell

plt.figure(figsize=(6, 5))
sns.countplot(x='Sentiment', data=df, palette='Set2')
plt.title("Chart 2 – Sentiment Distribution")
plt.savefig("chart2_sentiment_count.png")
plt.show()
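The "majority" in the insight can be stated as a percentage instead of read off the bar heights. A minimal sketch using `value_counts(normalize=True)`; the helper name `sentiment_share` and the toy labels are made up for illustration.

```python
import pandas as pd


def sentiment_share(labels: pd.Series) -> pd.Series:
    """Percentage of reviews per sentiment label, rounded to one decimal."""
    return (labels.value_counts(normalize=True) * 100).round(1)


toy_labels = pd.Series(["Positive", "Positive", "Positive", "Negative"])  # made-up labels
print(sentiment_share(toy_labels))  # Positive 75.0, Negative 25.0
```

In the notebook: `sentiment_share(df['Sentiment'])`.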


In [None]:
✅ 📊 Chart 3 – Votes vs Rating

In [None]:
📄 Cell 9: Markdown Cell

## 📊 Chart 3: Votes vs Rating Scatter Plot

**Why this chart?**  
→ To check whether restaurants with more votes tend to have higher ratings.

**Insight:**  
→ A slight upward trend: restaurants with more votes tend to have somewhat higher ratings.

**Business Impact:**  
→ Zomato can promote restaurants with high votes and good ratings.


In [None]:
# 🧪 Cell 10: Code Cell

plt.figure(figsize=(8,6))
sns.scatterplot(x='Votes', y='Rating', data=df, hue='Sentiment')
plt.title("Chart 3 – Votes vs Rating")
plt.savefig("chart3_votes_vs_rating.png")
plt.show()
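A "slight upward trend" can be quantified with a rank correlation. A minimal sketch, assuming `Votes` and `Rating` can be averaged per restaurant, that computes Spearman's rho as the Pearson correlation of (average) ranks using pandas only; the helper name and toy data are hypothetical. A value near 0 would mean no monotonic relationship, and a value near 1 a strong one.

```python
import pandas as pd


def votes_rating_spearman(frame: pd.DataFrame) -> float:
    """Spearman's rho between Votes and Rating, with one averaged row per restaurant."""
    per_restaurant = (
        frame.assign(
            Votes=pd.to_numeric(frame["Votes"], errors="coerce"),
            Rating=pd.to_numeric(frame["Rating"], errors="coerce"),
        )
        .groupby("Restaurant_Name")[["Votes", "Rating"]]
        .mean()
        .dropna()
    )
    ranks = per_restaurant.rank()  # ties receive their average rank
    return float(ranks["Votes"].corr(ranks["Rating"]))  # Pearson on ranks = Spearman


# Made-up toy data in which more votes always go with a higher rating
toy = pd.DataFrame({
    "Restaurant_Name": ["A", "A", "B", "C", "D"],
    "Votes": [100, 100, 250, 40, 500],
    "Rating": [3.4, 3.6, 4.0, 3.0, 4.5],
})
print(round(votes_rating_spearman(toy), 3))  # 1.0
```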


In [None]:
✅ 📊 Chart 4 – Top 10 Most Reviewed Restaurants

In [None]:
📄 Cell 11: Markdown Cell

## 📊 Chart 4: Top 10 Most Reviewed Restaurants

**Why this chart?**  
→ To highlight the most popular restaurants by number of reviews.

**Insight:**  
→ The top 10 restaurants receive far more reviews than the rest.

**Business Impact:**  
→ These restaurants can be prioritized for marketing or partnerships.
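The code cell for this chart (Cell 12) does not appear in the notebook. A minimal sketch of what it could look like, assuming a restaurant's review count is its number of rows in the merged `df`; the function names `top_reviewed` and `plot_top_reviewed` and the output filename are hypothetical. The plotting libraries are imported inside the plotting function so the counting helper can be checked on its own.

```python
import pandas as pd


def top_reviewed(frame: pd.DataFrame, n: int = 10) -> pd.Series:
    """Number of reviews (rows) per restaurant, largest first, top n only."""
    return frame["Restaurant_Name"].value_counts().head(n)


def plot_top_reviewed(frame: pd.DataFrame, n: int = 10) -> None:
    """Horizontal bar chart of the n most reviewed restaurants."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    counts = top_reviewed(frame, n)
    plt.figure(figsize=(8, 6))
    sns.barplot(x=counts.values, y=counts.index, color="teal")
    plt.title(f"Chart 4 – Top {n} Most Reviewed Restaurants")
    plt.xlabel("Number of reviews")
    plt.ylabel("Restaurant")
    plt.tight_layout()
    plt.savefig("chart4_top10_reviewed.png")
    plt.show()


# Toy check on made-up names; in the notebook, call plot_top_reviewed(df).
toy = pd.DataFrame({"Restaurant_Name": ["X", "Y", "X", "Z", "X", "Y"]})
print(top_reviewed(toy, n=2))  # X -> 3, Y -> 2
```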


In [None]:
✅ 📊 Chart 5 – Price Range Distribution

In [None]:
📄 Cell 13: Markdown Cell

## 📊 Chart 5: Price Range Distribution

**Why this chart?**  
→ To see which price ranges are most common.

**Insight:**  
→ Most restaurants fall in price category 2 or 3.

**Business Impact:**  
→ Zomato can target marketing towards the most common pricing tiers.


In [None]:
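# 🧪 Cell 14: Code Cell

# Count each restaurant once: after the merge, df repeats a restaurant's price range for every review
restaurants = df.drop_duplicates(subset='Restaurant_Name')

plt.figure(figsize=(6, 5))
sns.countplot(x='Price Range', data=restaurants, palette='cool')
plt.title("Chart 5 – Price Range Distribution")
plt.savefig("chart5_price_range.png")
plt.show()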


In [None]:
✅ 📊 Chart 6 – Sentiment by Price Range

In [None]:
📄 Cell 15: Markdown Cell

## 📊 Chart 6: Sentiment by Price Range

**Why this chart?**  
→ To observe sentiment trends across price levels.

**Insight:**  
→ Price range 3 shows a high count of positive reviews.

**Business Impact:**  
→ Zomato can promote mid-premium restaurants showing high satisfaction.


In [None]:
# 🧪 Cell 16: Code Cell

plt.figure(figsize=(8, 6))
sns.countplot(x='Price Range', hue='Sentiment', data=df, palette='pastel')
plt.title("Chart 6 – Sentiment by Price Range")
plt.savefig("chart6_sentiment_price_range.png")
plt.show()
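Raw counts per price range largely reflect how many reviews each tier receives, so a tier can have the most positive reviews without having the highest share of satisfied customers. A minimal sketch of the within-tier share of each sentiment using `pd.crosstab(..., normalize="index")`, on made-up data; the helper name is hypothetical.

```python
import pandas as pd


def sentiment_share_by_price(frame: pd.DataFrame) -> pd.DataFrame:
    """Row-normalised share (%) of each sentiment label within each price range."""
    table = pd.crosstab(frame["Price Range"], frame["Sentiment"], normalize="index")
    return (table * 100).round(1)


toy = pd.DataFrame({
    "Price Range": [2, 2, 2, 2, 3, 3],
    "Sentiment": ["Positive", "Positive", "Positive", "Negative", "Positive", "Negative"],
})
share = sentiment_share_by_price(toy)
print(share)  # tier 2: 75.0% positive, tier 3: 50.0% positive
```

On the notebook data, `sentiment_share_by_price(df).plot(kind="bar", stacked=True)` would show the proportions directly.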


In [None]:
✅ 📊 Chart 7 – Votes Distribution

In [None]:
📄 Cell 17: Markdown Cell

## 📊 Chart 7: Distribution of Votes

**Why this chart?**  
→ To understand how votes are spread.

**Insight:**  
→ The majority of restaurants get fewer than 200 votes.

**Business Impact:**  
→ Zomato may need to increase engagement on under-voted restaurants.


In [None]:
# 🧪 Cell 18: Code Cell

plt.figure(figsize=(8, 6))
sns.histplot(df['Votes'], bins=30, kde=True, color='orange')
plt.title("Chart 7 – Votes Distribution")
plt.savefig("chart7_votes_distribution.png")
plt.show()
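The "fewer than 200 votes" insight can be checked directly rather than read off a skewed, review-weighted histogram. A minimal sketch that counts each restaurant once, assuming `Votes` is a restaurant-level field repeated on each review (so taking its maximum per restaurant recovers it); the helper name and toy data are hypothetical, and the 200 threshold comes from the insight.

```python
import pandas as pd


def share_below(frame: pd.DataFrame, threshold: float = 200) -> float:
    """Fraction of restaurants whose vote count is below `threshold`."""
    votes = pd.to_numeric(frame["Votes"], errors="coerce")
    per_restaurant = votes.groupby(frame["Restaurant_Name"]).max().dropna()
    return float((per_restaurant < threshold).mean())


toy = pd.DataFrame({
    "Restaurant_Name": ["A", "A", "B", "C", "D"],
    "Votes": [120, 120, 80, 950, 30],
})
print(share_below(toy))  # 0.75
```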


In [None]:
✅ 📊 Chart 8 – Average Rating by Price Range

In [None]:
📄 Cell 19: Markdown Cell

## 📊 Chart 8: Average Rating by Price Range

**Why this chart?**  
→ To analyze which price segment gets better ratings.

**Insight:**  
→ Price range 3 has the highest average ratings.

**Business Impact:**  
→ Zomato can position this tier as “premium & best-rated”.


In [None]:
# 🧪 Cell 20: Code Cell

avg_rating = df.groupby('Price Range')['Rating'].mean().reset_index()

plt.figure(figsize=(6, 5))
sns.barplot(x='Price Range', y='Rating', data=avg_rating, palette='crest')
plt.title("Chart 8 – Avg Rating by Price Range")
plt.savefig("chart8_avg_rating_price.png")
plt.show()
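Cell 21 exports a `Cluster` column, but none of the cells shown here run KMeans, even though `KMeans`, `StandardScaler` and `silhouette_score` are imported in Cell 2. A minimal sketch of that step, assuming restaurants are clustered on their mean `Rating` and `Votes` with k chosen by the best silhouette score; the feature choice, the k range and the name `cluster_restaurants` are hypothetical. It is demonstrated on randomly generated toy data with two obvious groups.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_restaurants(frame: pd.DataFrame, k_values=range(2, 7), seed: int = 42):
    """Cluster restaurants on mean Rating and Votes; return (per-restaurant table, best k)."""
    features = (
        frame.assign(
            Rating=pd.to_numeric(frame["Rating"], errors="coerce"),
            Votes=pd.to_numeric(frame["Votes"], errors="coerce"),
        )
        .groupby("Restaurant_Name")[["Rating", "Votes"]]
        .mean()
        .dropna()
    )
    X = StandardScaler().fit_transform(features)  # put both features on one scale
    best_k, best_score, best_labels = None, -1.0, None
    for k in k_values:
        if k >= len(features):  # silhouette needs 2 <= k <= n_samples - 1
            break
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    if best_labels is None:
        raise ValueError("Not enough restaurants for the requested k values.")
    features["Cluster"] = best_labels
    return features.reset_index(), best_k


# Made-up toy data: 15 low-rated, low-vote and 15 high-rated, high-vote restaurants
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Restaurant_Name": [f"R{i}" for i in range(30)],
    "Rating": np.r_[rng.normal(3.0, 0.1, 15), rng.normal(4.5, 0.1, 15)],
    "Votes": np.r_[rng.normal(50, 5, 15), rng.normal(800, 20, 15)],
})
clusters, best_k = cluster_restaurants(toy)
print(best_k)
print(clusters.head())
```

In the notebook, `clusters, best_k = cluster_restaurants(df)` followed by `df = df.merge(clusters[['Restaurant_Name', 'Cluster']], on='Restaurant_Name', how='left')` would give Cell 21 the column it exports.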


In [None]:
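# 💾 Cell 21: Code Cell – Export Outputs

# Requires the 'Cluster' column produced by the KMeans clustering step.
try:
    if 'Cluster' not in df.columns:
        raise KeyError("'Cluster' column missing; run the clustering step first.")
    # Drop exact duplicate rows (metadata repeats once per review after the merge)
    df_clustered = df[['Restaurant_Name', 'Rating', 'Votes', 'Cluster']].drop_duplicates()
    df_clustered.to_csv("output_cluster_summary.csv", index=False)
    print("✅ output_cluster_summary.csv saved successfully.")
except Exception as e:
    print(f"❌ Error saving output: {e}")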
