# JioMart Tier 2/3 Cities Expansion Analysis
## End-to-End Data Science Portfolio Project

---

### Business Context
**JioMart**, the digital commerce arm of Reliance Retail Ventures Ltd., is aggressively scaling operations into non-metro Tier 2 and Tier 3 cities across India, expanding to 5,000+ pin codes and 3,000+ stores.

### Problem Statement
How can JioMart optimize its expansion strategy into Tier 2/3 cities to improve profitability, reduce logistics and inventory costs, and enhance customer retention?

### Hypothesis
Margin erosion and lower repeat purchase rates in Tier 2/3 cities are driven by:
1. Higher logistics and last-mile costs
2. Product assortment mismatch
3. Weaker customer loyalty
4. Insufficient infrastructure leading to spoilage

---

## 1. Import Libraries

In [None]:
# Data Processing
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine Learning
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestClassifier, GradientBoostingRegressor
from sklearn.cluster import KMeans
from sklearn.metrics import classification_report, silhouette_score, mean_absolute_error, r2_score

# Configuration
np.random.seed(42)
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
%matplotlib inline

print("✅ Libraries imported successfully")
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")

## 2. Load Data

Load the five pre-generated CSV datasets (transactions, customers, stores, products, inventory) from the `data/` directory.

In [None]:
# Load datasets
transactions_df = pd.read_csv('data/transactions.csv')
customers_df = pd.read_csv('data/customers.csv')
stores_df = pd.read_csv('data/stores.csv')
products_df = pd.read_csv('data/products.csv')
inventory_df = pd.read_csv('data/inventory.csv')

print("✅ Data loaded successfully\n")
print(f"Transactions: {len(transactions_df):,} records")
print(f"Customers: {len(customers_df):,} records")
print(f"Stores: {len(stores_df):,} records")
print(f"Products: {len(products_df):,} records")
print(f"Inventory: {len(inventory_df):,} records")
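
Before moving on, it can help to confirm that each loaded frame carries the columns the later cells rely on. The following is a minimal sketch, not part of the original pipeline: `missing_columns` is a hypothetical helper, and the toy frame stands in for `transactions_df` with made-up values.

```python
import pandas as pd


def missing_columns(df: pd.DataFrame, required: list[str]) -> list[str]:
    """Return the required column names that are absent from df, in order."""
    return [col for col in required if col not in df.columns]


# Toy stand-in for transactions_df (illustrative values only)
toy_txn = pd.DataFrame({
    "transaction_id": [1, 2],
    "region_tier": ["Metro", "Tier 3"],
    "revenue": [500.0, 320.0],
})

# Columns assumed to be needed downstream; 'margin' is deliberately left out of the toy frame
required_txn_cols = ["transaction_id", "region_tier", "revenue", "margin"]
absent = missing_columns(toy_txn, required_txn_cols)
print(f"Missing columns: {absent}")
```

Running a check like this right after loading surfaces a schema problem immediately, rather than as a `KeyError` deep inside a later `groupby`.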

## 3. Exploratory Data Analysis

### 3.1 Data Overview

In [None]:
# Transaction data overview
print("=" * 80)
print("TRANSACTION DATA OVERVIEW")
print("=" * 80)
transactions_df.head(10)

In [None]:
# Data types and missing values
print("\nData Info:")
print(transactions_df.info())

print("\nMissing Values:")
print(transactions_df.isnull().sum())

print("\nBasic Statistics:")
transactions_df.describe()

### 3.2 Regional Performance Analysis

In [None]:
# Regional performance summary
regional_perf = transactions_df.groupby('region_tier').agg({
    'transaction_id': 'count',
    'revenue': 'sum',
    'margin': 'sum',
    'customer_id': 'nunique',
    'delivery_time_hours': 'mean',
    'delivery_distance_km': 'mean',
    'logistics_cost': 'mean',
    'spoilage_cost': 'mean'
}).round(2)

regional_perf.columns = ['Transactions', 'Total Revenue (₹)', 'Total Margin (₹)', 'Unique Customers', 
                          'Avg Delivery Time (hrs)', 'Avg Delivery Distance (km)', 
                          'Avg Logistics Cost (₹)', 'Avg Spoilage Cost (₹)']
regional_perf['Margin %'] = ((regional_perf['Total Margin (₹)'] / regional_perf['Total Revenue (₹)']) * 100).round(2)
regional_perf['Revenue per Customer (₹)'] = (regional_perf['Total Revenue (₹)'] / regional_perf['Unique Customers']).round(2)

print("\n" + "=" * 80)
print("REGIONAL PERFORMANCE SUMMARY")
print("=" * 80)
regional_perf
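
The `Margin %` column above is a ratio of sums (total margin over total revenue), which is not the same quantity as the average of per-transaction margin percentages. A small sketch on made-up numbers shows how far the two can diverge when order sizes differ; the toy frame and its values are assumptions for illustration only.

```python
import pandas as pd

# Two hypothetical Tier 3 orders: one large with a healthy margin, one small with a thin margin
toy = pd.DataFrame({
    "region_tier": ["Tier 3", "Tier 3"],
    "revenue": [1000.0, 100.0],
    "margin": [200.0, 5.0],
})
toy["margin_pct"] = toy["margin"] / toy["revenue"] * 100  # 20% and 5%

grouped = toy.groupby("region_tier")

# Revenue-weighted view: total margin divided by total revenue
ratio_of_sums = grouped["margin"].sum() / grouped["revenue"].sum() * 100

# Unweighted view: every transaction counts equally, regardless of size
mean_of_ratios = grouped["margin_pct"].mean()

print(f"Ratio of sums:  {ratio_of_sums['Tier 3']:.2f}%")
print(f"Mean of ratios: {mean_of_ratios['Tier 3']:.2f}%")
```

The ratio of sums answers "what share of revenue is kept as margin", while the mean of ratios answers "what margin does a typical order earn". It is worth stating which one a given chart or threshold uses.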

### 3.3 Visualization: Regional Performance Dashboard

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(16, 10))

# Revenue by region
revenue_data = regional_perf['Total Revenue (₹)'] / 1_000_000
axes[0, 0].bar(revenue_data.index, revenue_data.values, color=['#1f77b4', '#ff7f0e', '#2ca02c'])
axes[0, 0].set_title('Total Revenue by Region (₹ Millions)', fontsize=14, fontweight='bold')
axes[0, 0].set_ylabel('Revenue (₹M)')
for i, v in enumerate(revenue_data.values):
    axes[0, 0].text(i, v + 1, f'₹{v:.1f}M', ha='center', fontweight='bold')

# Margin %
margin_data = regional_perf['Margin %']
axes[0, 1].bar(margin_data.index, margin_data.values, color=['#d62728', '#9467bd', '#8c564b'])
axes[0, 1].set_title('Profit Margin % by Region', fontsize=14, fontweight='bold')
axes[0, 1].set_ylabel('Margin %')
axes[0, 1].axhline(y=margin_data.mean(), color='red', linestyle='--', label='Average')
axes[0, 1].legend()
for i, v in enumerate(margin_data.values):
    axes[0, 1].text(i, v + 0.5, f'{v:.1f}%', ha='center', fontweight='bold')

# Delivery Time
delivery_data = regional_perf['Avg Delivery Time (hrs)']
axes[1, 0].bar(delivery_data.index, delivery_data.values, color=['#e377c2', '#7f7f7f', '#bcbd22'])
axes[1, 0].set_title('Avg Delivery Time by Region', fontsize=14, fontweight='bold')
axes[1, 0].set_ylabel('Hours')
for i, v in enumerate(delivery_data.values):
    axes[1, 0].text(i, v + 0.3, f'{v:.1f}h', ha='center', fontweight='bold')

# Logistics Cost
logistics_data = regional_perf['Avg Logistics Cost (₹)']
axes[1, 1].bar(logistics_data.index, logistics_data.values, color=['#17becf', '#1f77b4', '#ff7f0e'])
axes[1, 1].set_title('Avg Logistics Cost by Region', fontsize=14, fontweight='bold')
axes[1, 1].set_ylabel('Cost (₹)')
for i, v in enumerate(logistics_data.values):
    axes[1, 1].text(i, v + 3, f'₹{v:.0f}', ha='center', fontweight='bold')

plt.tight_layout()
plt.show()

### 3.4 Customer Behavior Analysis

In [None]:
# Customer purchase patterns
customer_behavior = transactions_df.groupby(['customer_id', 'region_tier']).agg({
    'transaction_id': 'count',
    'revenue': 'sum',
    'margin': 'sum'
}).reset_index()
customer_behavior.columns = ['customer_id', 'region_tier', 'purchase_count', 'total_revenue', 'total_margin']

behavior_summary = customer_behavior.groupby('region_tier').agg({
    'purchase_count': 'mean',
    'total_revenue': 'mean',
    'customer_id': 'count'
}).round(2)
behavior_summary.columns = ['Avg Purchases per Customer', 'Avg Revenue per Customer (₹)', 'Total Customers']

# Repeat purchase rate (3+ purchases)
repeat_customers = customer_behavior[customer_behavior['purchase_count'] >= 3].groupby('region_tier').size()
total_customers = customer_behavior.groupby('region_tier').size()
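# A tier with no 3+ purchase customers is missing from repeat_customers; treat it as 0% rather than NaN
behavior_summary['Repeat Rate (%)'] = ((repeat_customers / total_customers) * 100).fillna(0).round(2)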

print("\n" + "=" * 80)
print("CUSTOMER BEHAVIOR SUMMARY")
print("=" * 80)
behavior_summary
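
The repeat-rate calculation divides two group-size Series that are aligned on `region_tier`. A tier in which no customer reaches the 3+ threshold does not appear in the filtered counts at all, so it needs explicit handling to come out as 0% instead of missing. The sketch below illustrates this on a made-up five-customer frame; the data is hypothetical.

```python
import pandas as pd

# Hypothetical per-customer purchase counts; no Tier 3 customer reaches 3 purchases
toy_behavior = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "region_tier": ["Metro", "Metro", "Tier 2", "Tier 3", "Tier 3"],
    "purchase_count": [5, 1, 3, 1, 2],
})

total = toy_behavior.groupby("region_tier").size()

# Count customers with 3+ purchases, then restore any tier that dropped out of the filter
repeat = (
    toy_behavior[toy_behavior["purchase_count"] >= 3]
    .groupby("region_tier")
    .size()
    .reindex(total.index, fill_value=0)
)

repeat_rate = (repeat / total * 100).round(2)
print(repeat_rate)
```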

In [None]:
# Visualize customer behavior
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Purchase frequency distribution
for tier in ['Metro', 'Tier 2', 'Tier 3']:
    tier_cust = customer_behavior[customer_behavior['region_tier'] == tier]['purchase_count']
    axes[0].hist(tier_cust, bins=20, alpha=0.6, label=tier)
axes[0].set_title('Purchase Frequency Distribution', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Number of Purchases')
axes[0].set_ylabel('Number of Customers')
axes[0].legend()
axes[0].set_xlim(0, 20)

# Repeat rate by region
repeat_rate_data = behavior_summary['Repeat Rate (%)']
axes[1].bar(repeat_rate_data.index, repeat_rate_data.values, color=['#2ca02c', '#ff7f0e', '#d62728'])
axes[1].set_title('Customer Repeat Purchase Rate (3+ orders)', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Repeat Rate (%)')
for i, v in enumerate(repeat_rate_data.values):
    axes[1].text(i, v + 1, f'{v:.1f}%', ha='center', fontweight='bold')

plt.tight_layout()
plt.show()

### 3.5 Category Performance Analysis

In [None]:
# Merge with product data
txn_with_product = transactions_df.merge(products_df[['product_id', 'category', 'product_name']], on='product_id')

# Category performance by region
category_perf = txn_with_product.groupby(['region_tier', 'category']).agg({
    'revenue': 'sum',
    'margin': 'sum',
    'transaction_id': 'count'
}).reset_index()
category_perf['margin_pct'] = ((category_perf['margin'] / category_perf['revenue']) * 100).round(2)

print("\n" + "=" * 80)
print("TOP CATEGORIES BY REGION")
print("=" * 80)
category_perf.sort_values(['region_tier', 'revenue'], ascending=[True, False]).head(15)

In [None]:
# Visualize category performance
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Category revenue by tier
category_revenue = txn_with_product.pivot_table(values='revenue', index='category', columns='region_tier', aggfunc='sum') / 1000
category_revenue.plot(kind='barh', ax=axes[0], color=['#1f77b4', '#ff7f0e', '#2ca02c'])
axes[0].set_title('Revenue by Category & Region (₹ Thousands)', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Revenue (₹K)')
axes[0].legend(title='Region')

# Category margin %
category_margins = category_perf.pivot_table(values='margin_pct', index='category', columns='region_tier')
category_margins.plot(kind='bar', ax=axes[1], color=['#d62728', '#9467bd', '#8c564b'])
axes[1].set_title('Margin % by Category & Region', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Margin %')
axes[1].set_xticklabels(axes[1].get_xticklabels(), rotation=45, ha='right')
axes[1].legend(title='Region')

plt.tight_layout()
plt.show()

### 3.6 Logistics Analysis

In [None]:
# Cost breakdown by region
logistics_breakdown = transactions_df.groupby('region_tier').agg({
    'product_cost': 'mean',
    'logistics_cost': 'mean',
    'spoilage_cost': 'mean',
    'total_cost': 'mean'
}).round(2)

print("\n" + "=" * 80)
print("COST BREAKDOWN BY REGION")
print("=" * 80)
logistics_breakdown

In [None]:
# Visualize logistics analysis
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Cost breakdown
logistics_breakdown[['product_cost', 'logistics_cost', 'spoilage_cost']].plot(
    kind='bar', stacked=True, ax=axes[0], color=['#8c564b', '#e377c2', '#d62728']
)
axes[0].set_title('Cost Breakdown by Region', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Avg Cost (₹)')
axes[0].set_xticklabels(axes[0].get_xticklabels(), rotation=0)
axes[0].legend(['Product Cost', 'Logistics Cost', 'Spoilage Cost'])

# Delivery time vs distance scatter
for tier, color in [('Metro', 'blue'), ('Tier 2', 'orange'), ('Tier 3', 'green')]:
    tier_df = transactions_df[transactions_df['region_tier'] == tier]
    # Cap at 1,000 points per tier; a fixed seed keeps the plot reproducible across runs
    sample_df = tier_df.sample(min(1000, len(tier_df)), random_state=42)
    axes[1].scatter(sample_df['delivery_distance_km'], sample_df['delivery_time_hours'], alpha=0.3, s=10, label=tier, color=color)
axes[1].set_title('Delivery Time vs Distance', fontsize=12, fontweight='bold')
axes[1].set_xlabel('Distance (km)')
axes[1].set_ylabel('Time (hours)')
axes[1].legend()

plt.tight_layout()
plt.show()

## 4. Machine Learning Models

### 4.1 Model 1: Margin Risk Classification

In [None]:
print("=" * 80)
print("MODEL 1: MARGIN RISK CLASSIFICATION (Random Forest)")
print("=" * 80)

# Aggregate store performance
store_perf = transactions_df.groupby('store_id').agg({
    'revenue': 'sum',
    'margin': 'sum',
    'margin_pct': 'mean',
    'logistics_cost': 'mean',
    'spoilage_cost': 'mean',
    'delivery_time_hours': 'mean',
    'is_perishable': 'mean',
    'transaction_id': 'count'
}).reset_index()

store_perf = store_perf.merge(stores_df[['store_id', 'region_tier', 'infrastructure_score', 'warehouse_distance_km']], on='store_id')

# Define high risk: margin % < 10%
store_perf['high_risk'] = (store_perf['margin_pct'] < 10).astype(int)

print(f"\nHigh Risk Stores: {store_perf['high_risk'].sum()} / {len(store_perf)}")
print(f"Low Risk Stores: {(1 - store_perf['high_risk']).sum()} / {len(store_perf)}")

# Prepare features
le = LabelEncoder()
store_perf['region_code'] = le.fit_transform(store_perf['region_tier'])

X = store_perf[['revenue', 'logistics_cost', 'spoilage_cost', 'delivery_time_hours', 
                'is_perishable', 'infrastructure_score', 'warehouse_distance_km', 
                'transaction_id', 'region_code']]
y = store_perf['high_risk']

# Train-test split (a stratified split needs at least two stores in every class)
if y.nunique() > 1 and y.value_counts().min() >= 2:
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)
    
    # Scale features
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # Train Random Forest
    rf_model = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=42)
    rf_model.fit(X_train_scaled, y_train)
    
    y_pred = rf_model.predict(X_test_scaled)
    
    print("\nClassification Report:")
    print(classification_report(y_test, y_pred, target_names=['Low Risk', 'High Risk']))
    
    # Feature importance
    feature_importance = pd.DataFrame({
        'feature': X.columns,
        'importance': rf_model.feature_importances_
    }).sort_values('importance', ascending=False)
    
    print("\nTop 5 Feature Importances:")
    print(feature_importance.head())
    
    # Visualize
    plt.figure(figsize=(10, 6))
    plt.barh(feature_importance['feature'][:8], feature_importance['importance'][:8])
    plt.xlabel('Importance')
    plt.title('Top Features for Margin Risk Prediction', fontsize=14, fontweight='bold')
    plt.gca().invert_yaxis()
    plt.tight_layout()
    plt.show()
else:
    print("\n⚠ Insufficient class diversity for training")
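
With store-level data the sample is often small, so a single 75/25 split can give a noisy picture of classifier quality. One common complement is stratified k-fold cross-validation. The sketch below is an assumption-laden illustration rather than the notebook's method: it uses a synthetic dataset from `make_classification` in place of `store_perf`, and the fold count and scoring metric are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the store-level feature matrix (not the project's data)
X_toy, y_toy = make_classification(
    n_samples=120, n_features=6, n_informative=3, weights=[0.7, 0.3], random_state=42
)

model = RandomForestClassifier(n_estimators=50, max_depth=8, random_state=42)

# Stratified folds keep the high-risk / low-risk ratio similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One F1 score per fold; the spread hints at how stable the estimate is
fold_f1 = cross_val_score(model, X_toy, y_toy, cv=cv, scoring="f1")
print(f"F1 per fold: {fold_f1.round(2)}")
print(f"Mean F1: {fold_f1.mean():.2f} (std {fold_f1.std():.2f})")
```

Tree ensembles are insensitive to feature scaling, so the sketch skips `StandardScaler`; keeping it, as the cell above does, is harmless.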

### 4.2 Model 2: Customer Lifetime Value Prediction

In [None]:
print("=" * 80)
print("MODEL 2: CUSTOMER LIFETIME VALUE PREDICTION (Gradient Boosting)")
print("=" * 80)

# Prepare customer features
cust_features = customer_behavior.merge(customers_df[['customer_id', 'age', 'income_bracket', 
                                                       'digital_literacy_score', 'registration_date']], 
                                        on='customer_id')
cust_features['registration_date'] = pd.to_datetime(cust_features['registration_date'])
cust_features['days_since_registration'] = (datetime(2024, 9, 30) - cust_features['registration_date']).dt.days

# Encode income bracket
income_mapping = {'10-15K': 1, '15-25K': 2, '25-50K': 3, '50-75K': 4, '75K+': 5}
cust_features['income_code'] = cust_features['income_bracket'].map(income_mapping)
cust_features['income_code'] = cust_features['income_code'].fillna(2)  # Default to mid-range

# Features and target
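# .copy() gives an independent frame, so adding a column below does not trigger SettingWithCopyWarning
X_clv = cust_features[['purchase_count', 'age', 'income_code', 'digital_literacy_score', 'days_since_registration']].copy()
# Reuses the LabelEncoder fitted on region_tier in Model 1, so that cell must run first
X_clv['region_code'] = le.transform(cust_features['region_tier'])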
y_clv = cust_features['total_revenue']

# Remove any NaN values
mask = ~(X_clv.isna().any(axis=1) | y_clv.isna())
X_clv = X_clv[mask]
y_clv = y_clv[mask]

# Train-test split
X_train_clv, X_test_clv, y_train_clv, y_test_clv = train_test_split(X_clv, y_clv, test_size=0.25, random_state=42)

# Scale
scaler_clv = StandardScaler()
X_train_clv_scaled = scaler_clv.fit_transform(X_train_clv)
X_test_clv_scaled = scaler_clv.transform(X_test_clv)

# Train Gradient Boosting
gb_model = GradientBoostingRegressor(n_estimators=100, max_depth=5, learning_rate=0.1, random_state=42)
gb_model.fit(X_train_clv_scaled, y_train_clv)

y_pred_clv = gb_model.predict(X_test_clv_scaled)

mae = mean_absolute_error(y_test_clv, y_pred_clv)
r2 = r2_score(y_test_clv, y_pred_clv)

print(f"\nModel Performance:")
print(f"  MAE: ₹{mae:.2f}")
print(f"  R² Score: {r2:.4f}")

# Visualize
plt.figure(figsize=(10, 6))
plt.scatter(y_test_clv, y_pred_clv, alpha=0.5, s=20)
plt.plot([y_test_clv.min(), y_test_clv.max()], [y_test_clv.min(), y_test_clv.max()], 'r--', lw=2)
plt.xlabel('Actual Revenue (₹)')
plt.ylabel('Predicted Revenue (₹)')
plt.title('Customer Lifetime Value: Predicted vs Actual', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
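
An MAE or R² value is easier to interpret next to a naive baseline, especially here, where `purchase_count` is likely to be strongly tied to total revenue. The sketch below compares a gradient boosting model with a predict-the-mean baseline on synthetic data. The data-generating rule (roughly ₹300 per purchase plus noise) is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 400

# Synthetic customers: revenue is driven by purchase count; age is pure noise
purchase_count = rng.integers(1, 20, size=n)
age = rng.integers(18, 65, size=n)
total_revenue = 300 * purchase_count + rng.normal(0, 50, size=n)

X_toy = np.column_stack([purchase_count, age])
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, total_revenue, test_size=0.25, random_state=42)

# Baseline always predicts the training-set mean revenue
baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
gbr = GradientBoostingRegressor(n_estimators=100, max_depth=3, random_state=42).fit(X_tr, y_tr)

baseline_mae = mean_absolute_error(y_te, baseline.predict(X_te))
model_mae = mean_absolute_error(y_te, gbr.predict(X_te))
print(f"Baseline MAE: ₹{baseline_mae:.2f}")
print(f"Model MAE:    ₹{model_mae:.2f}")
```

Reporting the baseline alongside the model makes it clear how much of the error reduction comes from the features rather than from the scale of the target.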

### 4.3 Model 3: Customer Segmentation (K-Means Clustering)

In [None]:
print("=" * 80)
print("MODEL 3: CUSTOMER SEGMENTATION (K-Means Clustering)")
print("=" * 80)

# Prepare clustering features
cluster_data = cust_features[['purchase_count', 'total_revenue', 'total_margin', 
                               'age', 'income_code', 'digital_literacy_score']].copy()

# Remove NaN
cluster_data = cluster_data.dropna()

# Scale for clustering
scaler_cluster = StandardScaler()
cluster_scaled = scaler_cluster.fit_transform(cluster_data)

# Find optimal k using silhouette score
silhouette_scores = []
K_range = range(2, 8)

for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    labels = kmeans.fit_predict(cluster_scaled)
    silhouette_scores.append(silhouette_score(cluster_scaled, labels))

optimal_k = K_range[np.argmax(silhouette_scores)]
print(f"\nOptimal number of clusters: {optimal_k}")
print(f"Best silhouette score: {max(silhouette_scores):.4f}")

# Train final model
kmeans_final = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)
cluster_labels = kmeans_final.fit_predict(cluster_scaled)

# Add cluster labels back to original data
cluster_data_with_labels = cluster_data.copy()
cluster_data_with_labels['cluster'] = cluster_labels

# Cluster profiles
cluster_profiles = cluster_data_with_labels.groupby('cluster').agg({
    'purchase_count': ['count', 'mean'],
    'total_revenue': 'mean',
    'total_margin': 'mean',
    'age': 'mean',
    'income_code': 'mean',
    'digital_literacy_score': 'mean'
}).round(2)

print("\nCluster Profiles:")
print(cluster_profiles)

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Cluster distribution
cluster_counts = pd.Series(cluster_labels).value_counts().sort_index()
axes[0].pie(cluster_counts.values, labels=[f'Cluster {i}' for i in cluster_counts.index], 
            autopct='%1.1f%%', startangle=90)
axes[0].set_title('Customer Segment Distribution', fontsize=12, fontweight='bold')

# Revenue by cluster
cluster_revenue = cluster_data_with_labels.groupby('cluster')['total_revenue'].sum() / 1000
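# optimal_k can range from 2 to 7, so take one tab10 color per cluster instead of a fixed list of four
axes[1].bar(cluster_revenue.index, cluster_revenue.values, color=plt.cm.tab10.colors[:len(cluster_revenue)])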
axes[1].set_title('Total Revenue by Cluster (₹ Thousands)', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Revenue (₹K)')
axes[1].set_xlabel('Cluster')

plt.tight_layout()
plt.show()
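
A quick way to build trust in the silhouette-based choice of k is to run the same selection loop on data whose cluster structure is known in advance. The sketch below does this with three well-separated synthetic blobs; the blob centers and spread are assumptions chosen so that the right answer is unambiguous.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three tight, well-separated blobs (synthetic data, not customer data)
X_toy, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [10, 0], [5, 9]], cluster_std=0.5, random_state=42
)

# Same selection loop as above: fit K-Means for each k and record the silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X_toy)
    scores[k] = silhouette_score(X_toy, labels)

best_k = max(scores, key=scores.get)
print(f"Best k: {best_k}")
for k, s in scores.items():
    print(f"  k={k}: silhouette={s:.3f}")
```

On real customer data the peak is usually much flatter than in this toy case, so it is worth inspecting the full score curve and not only the argmax.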

## 5. Key Insights & Recommendations

### 5.1 Critical Findings

In [None]:
print("=" * 80)
print("KEY BUSINESS INSIGHTS")
print("=" * 80)

# Calculate key insights
metro_margin = regional_perf.loc['Metro', 'Margin %']
tier3_margin = regional_perf.loc['Tier 3', 'Margin %']
margin_gap = metro_margin - tier3_margin

metro_logistics = regional_perf.loc['Metro', 'Avg Logistics Cost (₹)']
tier3_logistics = regional_perf.loc['Tier 3', 'Avg Logistics Cost (₹)']
logistics_increase = ((tier3_logistics - metro_logistics) / metro_logistics) * 100

metro_delivery = regional_perf.loc['Metro', 'Avg Delivery Time (hrs)']
tier3_delivery = regional_perf.loc['Tier 3', 'Avg Delivery Time (hrs)']
time_increase = tier3_delivery - metro_delivery

tier2_customers = regional_perf.loc['Tier 2', 'Unique Customers']
tier3_customers = regional_perf.loc['Tier 3', 'Unique Customers']
total_untapped = tier2_customers + tier3_customers

print(f"\n1. 📊 MARGIN GAP:")
print(f"   Tier 3 margins are {margin_gap:.1f} percentage points lower than Metro")
print(f"   Metro: {metro_margin:.2f}% vs Tier 3: {tier3_margin:.2f}%")

print(f"\n2. 🚚 LOGISTICS CHALLENGE:")
print(f"   Tier 3 logistics costs are {logistics_increase:.0f}% higher than Metro")
print(f"   Metro: ₹{metro_logistics:.2f} vs Tier 3: ₹{tier3_logistics:.2f}")

print(f"\n3. ⏱️ DELIVERY DELAY:")
print(f"   Tier 3 deliveries take {time_increase:.1f} hours longer than Metro")
print(f"   Metro: {metro_delivery:.1f}h vs Tier 3: {tier3_delivery:.1f}h")

print(f"\n4. 💰 GROWTH POTENTIAL:")
print(f"   {total_untapped:,.0f} existing Tier 2/3 customers form the base for further revenue growth")
print(f"   Tier 2: {tier2_customers:,.0f} | Tier 3: {tier3_customers:,.0f}")
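
The margin gap above is a difference between two percentages, so it is measured in percentage points, whereas the logistics figure is a relative change. The two are easy to confuse when reporting. A short worked example follows, using hypothetical margins of 18% and 12% (not results from this analysis); `margin_gap_metrics` is an illustrative helper name.

```python
def margin_gap_metrics(metro_margin_pct: float, tier3_margin_pct: float) -> tuple[float, float]:
    """Return (gap in percentage points, relative shortfall in percent of the Metro margin)."""
    gap_pp = metro_margin_pct - tier3_margin_pct
    relative_pct = gap_pp / metro_margin_pct * 100
    return gap_pp, relative_pct


# Hypothetical inputs for illustration only
gap_pp, relative_pct = margin_gap_metrics(18.0, 12.0)
print(f"Gap: {gap_pp:.1f} percentage points")
print(f"Relative shortfall: {relative_pct:.1f}% of the Metro margin")
```

With these inputs the gap is 6.0 percentage points, which corresponds to a relative shortfall of about 33.3%. Quoting either number without its unit would be misleading.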

### 5.2 Strategic Recommendations

In [None]:
print("\n" + "=" * 80)
print("STRATEGIC RECOMMENDATIONS")
print("=" * 80)

recommendations = {
    "1. Optimize Last-Mile Logistics": [
        "• Establish micro-fulfillment centers in Tier 2/3 city clusters",
        "• Partner with local logistics providers familiar with regional terrain",
        "• Implement hub-and-spoke distribution model"
    ],
    "2. Improve Inventory Management": [
        "• Deploy predictive analytics for demand forecasting",
        "• Reduce perishable inventory in Tier 3 stores initially",
        "• Implement FIFO strictly for fresh produce"
    ],
    "3. Tailor Product Assortment": [
        "• Focus on non-perishables for Tier 3 (Groceries, Home Care)",
        "• Gradually introduce premium products based on digital literacy",
        "• Create region-specific bundles"
    ],
    "4. Enhance Customer Retention": [
        "• Launch loyalty programs with tier-based rewards",
        "• Offer free delivery for repeat customers in Tier 2/3",
        "• Use targeted WhatsApp marketing"
    ],
    "5. Technology Deployment": [
        "• Deploy route optimization software",
        "• Implement IoT sensors for cold chain monitoring",
        "• Use AI chatbots for customer support"
    ]
}

for key, actions in recommendations.items():
    print(f"\n{key}:")
    for action in actions:
        print(f"  {action}")

## 6. Summary & Conclusion

In [None]:
print("\n" + "=" * 80)
print("PROJECT SUMMARY")
print("=" * 80)

print(f"\n📊 Key Metrics:")
print(f"  • Total Transactions: {len(transactions_df):,}")
print(f"  • Total Revenue: ₹{transactions_df['revenue'].sum()/1_000_000:.2f}M")
print(f"  • Total Margin: ₹{transactions_df['margin'].sum()/1_000_000:.2f}M")
print(f"  • Unique Customers: {transactions_df['customer_id'].nunique():,}")
print(f"  • Active Stores: {len(stores_df)}")
print(f"  • Product SKUs: {len(products_df)}")

print(f"\n🎯 Models Trained:")
print(f"  1. Random Forest - Margin Risk Classification")
print(f"  2. Gradient Boosting - Customer Lifetime Value Prediction")
print(f"  3. K-Means Clustering - Customer Segmentation")

print(f"\n✅ Analysis Complete!")
print("=" * 80)

---

## End of Analysis

**Project:** JioMart Tier 2/3 Cities Expansion Analysis  
**Author:** Data Science Portfolio  
**Date:** October 2024

This notebook demonstrates end-to-end data science capabilities including:
- Data analysis and exploration
- Statistical insights and business intelligence
- Machine learning model development
- Visualization and storytelling
- Strategic recommendation generation

---