# 🛒 The Battle of Online Grocery Giants: Blinkit vs. JioMart vs. Swiggy Instamart  
*A Data Story on 100,000 Orders and What They Reveal About India's Online Grocery Market*

---

## 🧭 Introduction

The online grocery race in India is heating up. Platforms like **Blinkit**, **JioMart** and **Swiggy Instamart** are competing for the top spot in convenience, delivery speed and customer satisfaction.

To understand their performance, I analyzed **100,000 real online grocery orders** across these three platforms.

### 🔍 Objectives
1. Discover **order trends** and popular product categories.  
2. Compare **average order values** across platforms.  
3. Examine **delivery performance** and **refund behavior**.  
4. Explore **customer satisfaction** through service ratings.  
5. Provide **actionable recommendations** for better customer experience.

---

## 📦 Dataset Overview

The dataset contains **11 columns** covering order details, delivery times, product categories, refunds, and ratings.


In [7]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset
df = pd.read_csv('/Users/dishajaiswal/Desktop/data.csv')

# Quick look at the data
df.head()

Unnamed: 0,Order ID,Customer ID,Platform,Order Date & Time,Delivery Time (Minutes),Product Category,Order Value (INR),Customer Feedback,Service Rating,Delivery Delay,Refund Requested
0,ORD000001,CUST2824,JioMart,19:29.5,30,Fruits & Vegetables,382,"Fast delivery, great service!",5,No,No
1,ORD000002,CUST1409,Blinkit,54:29.5,16,Dairy,279,Quick and reliable!,5,No,No
2,ORD000003,CUST5506,JioMart,21:29.5,25,Beverages,599,Items missing from order.,2,No,Yes
3,ORD000004,CUST5012,JioMart,19:29.5,42,Beverages,946,Items missing from order.,2,Yes,Yes
4,ORD000005,CUST4657,Blinkit,49:29.5,30,Beverages,334,"Fast delivery, great service!",5,No,No


## 📊 Dataset Insights

- **Total Rows:** 100,000  
- **Total Columns:** 11  
- **Data Quality:** No missing values, so no imputation is needed before analysis.  

### Numeric Columns
- Delivery Time (Minutes)  
- Order Value (INR)  
- Service Rating  

### Categorical Columns
- Platform  
- Product Category  
- Delivery Delay  
- Refund Requested  
- Customer Feedback (text)  
- Order Date & Time (timestamp)  
- Order ID / Customer ID (identifiers)
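
The row count, column count, and missing-value figures above can be checked directly. A minimal sketch, assuming the column names shown in the preview; the five-row `sample` frame is typed in from that preview so the snippet runs on its own, and on the full data `df` would take its place:

```python
import pandas as pd

# Five preview rows, typed in by hand so the check runs without the CSV file.
sample = pd.DataFrame({
    'Order ID': ['ORD000001', 'ORD000002', 'ORD000003', 'ORD000004', 'ORD000005'],
    'Customer ID': ['CUST2824', 'CUST1409', 'CUST5506', 'CUST5012', 'CUST4657'],
    'Platform': ['JioMart', 'Blinkit', 'JioMart', 'JioMart', 'Blinkit'],
    'Order Date & Time': ['19:29.5', '54:29.5', '21:29.5', '19:29.5', '49:29.5'],
    'Delivery Time (Minutes)': [30, 16, 25, 42, 30],
    'Product Category': ['Fruits & Vegetables', 'Dairy', 'Beverages', 'Beverages', 'Beverages'],
    'Order Value (INR)': [382, 279, 599, 946, 334],
    'Customer Feedback': [
        'Fast delivery, great service!', 'Quick and reliable!',
        'Items missing from order.', 'Items missing from order.',
        'Fast delivery, great service!',
    ],
    'Service Rating': [5, 5, 2, 2, 5],
    'Delivery Delay': ['No', 'No', 'No', 'Yes', 'No'],
    'Refund Requested': ['No', 'No', 'Yes', 'Yes', 'No'],
})

n_rows, n_cols = sample.shape
missing_per_column = sample.isna().sum()          # count of missing cells per column
numeric_columns = sample.select_dtypes(include='number').columns.tolist()

print(f'{n_rows} rows x {n_cols} columns')
print(missing_per_column[missing_per_column > 0])  # an empty Series means nothing is missing
print(numeric_columns)
```

Running the same three lines on `df` should report 100,000 rows and 11 columns if the summary above holds.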

> **✨ The stage is set and the players are ready. Now let's see what 100,000 orders reveal.**



### ⚠️ Why I Am Dropping "Order Date & Time"

At first glance, the **"Order Date & Time"** column seems useful: it could help analyze **when orders are placed** or identify **peak ordering hours**.  

However, this column is **unreliable**:

- Values such as `54:29.5` and `49:29.5` have a first field **well above 23**, so they cannot be read as hours of the day.  
- The last field carries a **decimal fraction**, and the previewed rows contain no date at all, so the values look more like truncated `minutes:seconds` fragments than full timestamps.  
- Many entries appear **corrupted or artificially generated**, so they cannot be trusted.  

💡 Because this column does not contain **accurate or meaningful information**, I am dropping it so that all further analysis rests on **clean and trustworthy data**.  
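
One way to back up this decision is to measure how many values have a short `mm:ss.s` shape instead of a full date and time. A minimal sketch using the five timestamps visible in the preview; the regular expression is my assumption about the format, not something documented with the dataset:

```python
import pandas as pd

# Timestamps copied from the five preview rows.
raw = pd.Series(['19:29.5', '54:29.5', '21:29.5', '19:29.5', '49:29.5'])

# Values shaped like "digits:digits.fraction" carry no date and no hour.
looks_truncated = raw.str.fullmatch(r'\d{1,2}:\d{2}\.\d+')
share_truncated = looks_truncated.mean()

# If the first field were an hour, anything above 23 would be invalid.
first_field = raw.str.split(':').str[0].astype(int)
invalid_as_hour = int((first_field > 23).sum())

print(f'Share with truncated shape: {share_truncated:.0%}')
print(f'Values invalid as an hour:  {invalid_as_hour}')
```

On the full column, a share close to 100% would confirm that no usable hour-of-day information survives.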


In [9]:
# Dropping the unreliable "Order Date & Time" column
df = df.drop(columns=['Order Date & Time'])

## 📊 Exploring the Landscape  
## Orders by Platform  

Let's see which platform has the highest order volume.


In [None]:
platform_counts = df['Platform'].value_counts()

# Define brand colors (darker/mid shades)
platform_colors = {
    'Swiggy Instamart': '#FF8C00',  # Dark Orange
    'JioMart': '#FF0000',           # Red
    'Blinkit': '#FFD700'            # Mid Yellow
}

plt.figure(figsize=(7,5))
bars = plt.bar(platform_counts.index, platform_counts.values, color=[platform_colors[x] for x in platform_counts.index])

# Add numbers on top of each bar
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, height + 100, str(int(height)), ha='center', fontweight='bold')

plt.title('Orders by Platform')
plt.xlabel('Platform')
plt.ylabel('Number of Orders')
plt.show()



### 🔹 Observation

All three platforms show similar order volumes, with **Swiggy Instamart** slightly ahead. Because the gap is small, it is at most a weak sign of stronger user engagement or logistics reach.


## 🛒 Orders by Product Category  

Which product types dominate the online grocery cart?


In [None]:
# Count orders by product category
category_counts = df['Product Category'].value_counts()

# Define colors (add more if you have more categories)
category_colors = ['skyblue', 'orange', 'red', 'green', 'purple', 'pink']

plt.figure(figsize=(10,6))
bars = plt.bar(category_counts.index,
               category_counts.values,
               color=category_colors[:len(category_counts)])  # ensure color list matches number of categories

plt.title('Orders by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Number of Orders')
plt.xticks(rotation=45, ha='right')

# Add value labels on top of bars
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, height + 200,
             f'{int(height)}',
             ha='center', va='bottom')

plt.tight_layout()
plt.show()


### Insights

- **Dairy** and **Grocery** are the top categories by order volume.  
- **Personal Care** has fewer orders but tends to have a higher spend per order (to be confirmed).  

💡 **Key takeaway:** Essential goods drive volume, while premium or specialized categories drive value.
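
The "to be confirmed" note on Personal Care can be settled by putting order count and average order value side by side for each category. A minimal sketch; the `orders` frame is made-up toy data standing in for `df`, so the numbers it prints say nothing about the real dataset:

```python
import pandas as pd

# Toy stand-in for df; values are invented purely to make the snippet runnable.
orders = pd.DataFrame({
    'Product Category': ['Dairy', 'Dairy', 'Dairy', 'Grocery', 'Grocery', 'Personal Care'],
    'Order Value (INR)': [200, 300, 250, 400, 600, 900],
})

# One row per category: how many orders, and how much is spent per order on average.
category_summary = (
    orders.groupby('Product Category')['Order Value (INR)']
    .agg(orders='count', avg_order_value='mean')
    .sort_values('orders', ascending=False)
)

print(category_summary)
```

If the claim holds on the real data, Personal Care should sit low in the `orders` column and high in `avg_order_value`.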


### Compare Average Order Values Across Platforms

Now that we understand **order trends**, it's important to examine **how much customers are spending on average** on each platform.  
This gives us insights into **customer behavior** and **platform profitability**.

We will calculate the **average order value (AOV)** for each platform and visualize it.


In [None]:
# Calculate average order value per platform
platform_order_values = df.groupby('Platform')['Order Value (INR)'].mean().sort_values(ascending=False)

# Display the result
platform_order_values


In [None]:
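# Reuse the averages computed above instead of hard-coding them
platforms = platform_order_values.index.tolist()
avg_order_values = platform_order_values.values
color_map = {'Swiggy Instamart': 'orange', 'JioMart': 'red', 'Blinkit': 'yellow'}
colors = [color_map[p] for p in platforms]  # keep each bar's color tied to its platform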

# Create bar chart
fig, ax = plt.subplots(figsize=(8,5))
bars = ax.bar(platforms, avg_order_values, color=colors)

# Add average values on top of bars
for bar in bars:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2, height + 2, f'{height:.2f}', 
            ha='center', va='bottom', fontsize=10)

# Labels and title
ax.set_ylabel('Average Order Value (INR)')
ax.set_title('Average Order Value by Platform')

plt.show()


### 💡 Insight: Average Order Value Across Platforms

- 🟧 **Swiggy Instamart** – ₹592.90 (highest average order value)  
- 🟥 **JioMart** – ₹590.53  
- 🟨 **Blinkit** – ₹589.55  

**Key takeaways:**  
- The differences are minimal (~₹3), indicating that **customer spending per order is fairly consistent** across these platforms.  
- **Business implication:** Swiggy may have a slight edge in encouraging larger orders, potentially through promotions, product variety, or upselling. Small improvements in **delivery experience or loyalty incentives** could further influence customer spending.
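
To put the gap in perspective, the spread between the highest and lowest averages can be expressed as a share of the order value. A small sketch using the three rounded averages reported above:

```python
# Average order values (INR), rounded as reported in this section.
aov = {
    'Swiggy Instamart': 592.90,
    'JioMart': 590.53,
    'Blinkit': 589.55,
}

gap = max(aov.values()) - min(aov.values())   # absolute spread in INR
gap_pct = gap / min(aov.values()) * 100       # spread relative to the lowest average

print(f'Gap: INR {gap:.2f} ({gap_pct:.2f}% of the lowest average)')
```

A spread well under 1% is small enough that it would be worth testing for statistical significance before reading it as a platform advantage.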


## ⏱️ The Race Against Time: Delivery Performance

Delivery speed can make or break the customer experience. Monitoring delivery times and identifying delays is crucial for improving satisfaction and operational efficiency.


In [None]:
plt.figure(figsize=(8,5))
plt.hist(df['Delivery Time (Minutes)'], bins=30, color='skyblue', edgecolor='black')
plt.title('Distribution of Delivery Time')
plt.xlabel('Delivery Time (Minutes)')
plt.ylabel('Number of Orders')
plt.show()


### Observation

The majority of deliveries are completed quickly, although a few outliers take longer, potentially influenced by factors such as delivery location or product type.
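
Quantiles and an outlier threshold make "quickly" and "outliers" concrete. A minimal sketch using only the five delivery times from the preview rows; the 1.5 × IQR fence is a common convention that I am assuming here, not a rule taken from the dataset:

```python
import pandas as pd

# Delivery times (minutes) from the five preview rows.
delivery_minutes = pd.Series([30, 16, 25, 42, 30], name='Delivery Time (Minutes)')

median_minutes = delivery_minutes.median()

# Interquartile range and the usual upper fence for flagging slow deliveries.
q1, q3 = delivery_minutes.quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
n_outliers = int((delivery_minutes > upper_fence).sum())

print(f'Median: {median_minutes} min, upper fence: {upper_fence} min, outliers: {n_outliers}')
```

Applied to `df['Delivery Time (Minutes)']`, the same lines give the typical delivery time and the number of unusually slow orders.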


## 🚚 Delivery Delays

Even a small delay can impact customer satisfaction. Understanding the reasons behind delays helps optimize delivery routes and improve overall efficiency.


In [None]:
sns.countplot(data=df, x='Delivery Delay', hue='Delivery Delay', palette={'No':'green', 'Yes':'red'}, dodge=False, legend=False)
plt.title('Delivery Delay Distribution')
plt.xlabel('Delivery Delay')
plt.ylabel('Number of Orders')
plt.show()


### Insight

Approximately 13–14% of orders face delivery delays. While this represents a minority, these delayed orders may significantly affect customer satisfaction and brand perception.
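
The delay share can be computed by treating the `Yes`/`No` flag as a boolean and averaging it, both overall and per platform. A minimal sketch on made-up toy rows (not the real data):

```python
import pandas as pd

# Toy stand-in for df; the rows are invented for illustration only.
orders = pd.DataFrame({
    'Platform': ['Blinkit', 'Blinkit', 'JioMart', 'JioMart',
                 'Swiggy Instamart', 'Swiggy Instamart', 'Swiggy Instamart', 'Blinkit'],
    'Delivery Delay': ['No', 'Yes', 'No', 'No', 'No', 'No', 'Yes', 'No'],
})

is_delayed = orders['Delivery Delay'].eq('Yes')   # True where the order was late

overall_delay_pct = is_delayed.mean() * 100
delay_pct_by_platform = is_delayed.groupby(orders['Platform']).mean() * 100

print(f'Overall delay rate: {overall_delay_pct:.1f}%')
print(delay_pct_by_platform.round(1))
```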


## 💸 Refund Requests: When Things Go Wrong

Let's see how often customers request refunds, and whether delivery delays play a role.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

sns.countplot(
    data=df,
    x='Refund Requested',
    hue='Refund Requested',
    palette={'No':'blue', 'Yes':'orange'},
    dodge=False,
    legend=False
)

plt.title('Refund Requests Distribution')
plt.xlabel('Refund Requested')
plt.ylabel('Number of Orders')
plt.show()


**Observation:**  
Nearly half of all orders (about 46%) involve a refund request, which is surprisingly high.


## 📦 Delivery Delay vs Refund Requested

If late deliveries were the main driver of refunds, delayed orders would show a clearly higher refund rate. Cross-tabulating the two columns tests that assumption.


In [None]:
delay_refund_ct = pd.crosstab(df['Delivery Delay'], df['Refund Requested'])
sns.heatmap(delay_refund_ct, annot=True, fmt='d', cmap='YlGnBu')
plt.title('Delivery Delay vs Refund Requested')
plt.xlabel('Refund Requested')
plt.ylabel('Delivery Delay')
plt.show()


### Key Finding

Refunds are almost equally likely with or without a delivery delay. This suggests that delays are not the sole cause; other factors such as product issues or customer expectations may also play a significant role.
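
Raw counts in the heatmap are hard to compare because far fewer orders are delayed than on time. Row-normalising the cross-tabulation gives the refund rate within each delay group. A minimal sketch on made-up toy rows (not the real data):

```python
import pandas as pd

# Toy stand-in for df; the rows are invented for illustration only.
orders = pd.DataFrame({
    'Delivery Delay':   ['No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'Yes'],
    'Refund Requested': ['Yes', 'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No'],
})

# normalize='index' makes each row sum to 1, so cells read as within-group rates.
refund_rate_by_delay = pd.crosstab(
    orders['Delivery Delay'],
    orders['Refund Requested'],
    normalize='index',
)

print(refund_rate_by_delay)
```

If the `Yes` column is nearly identical in both rows on the real data, that supports the finding that delays are not what drives refunds.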


## ⭐ Service Ratings: The Voice of the Customer

Customer feedback provides invaluable insights into service quality and areas for improvement.


In [None]:
sns.countplot(
    data=df, 
    x='Service Rating', 
    hue='Service Rating',    # required for palette
    palette='Oranges', 
    dodge=False, 
    legend=False
)
plt.title('Distribution of Service Ratings')
plt.xlabel('Service Rating')
plt.ylabel('Number of Orders')
plt.show()


### Insights

- The majority of customers give ratings of **4 or 5**, reflecting high overall satisfaction.  
- A small proportion of low ratings (**1–2**) persist, typically linked to refunds, delivery issues, or other negative experiences.


## 📊 Service Rating vs Delivery Delay and Refund
Analyzing service ratings alongside delivery delays and refund requests reveals how operational issues impact customer satisfaction.


In [None]:
plt.figure(figsize=(8,5))
sns.boxplot(
    data=df,
    x='Delivery Delay',
    y='Service Rating',
    hue='Delivery Delay',              # assign hue
    palette=['#66c2a5','#fc8d62'],    # custom colors
    dodge=False,                       # keep one box per category
    legend=False                       # remove extra legend
)
plt.title('Service Rating vs Delivery Delay')
plt.show()

plt.figure(figsize=(8,5))
sns.boxplot(
    data=df,
    x='Refund Requested',
    y='Service Rating',
    hue='Refund Requested',           # assign hue
    palette=['#8da0cb','#e78ac3'],   # custom colors
    dodge=False,
    legend=False
)
plt.title('Service Rating vs Refund Requested')
plt.show()


**Takeaways:**  
- Orders with delivery delays or refund requests tend to have lower ratings.  
- Timely delivery and seamless refund experiences strongly influence customer satisfaction.
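
Box plots of a five-point rating can hide the size of a difference, so group means are a useful companion. A minimal sketch on made-up toy rows (not the real data):

```python
import pandas as pd

# Toy stand-in for df; the rows are invented for illustration only.
orders = pd.DataFrame({
    'Delivery Delay':   ['No', 'No', 'Yes', 'Yes', 'No', 'No'],
    'Refund Requested': ['No', 'Yes', 'Yes', 'No', 'No', 'Yes'],
    'Service Rating':   [5, 2, 1, 3, 4, 2],
})

# Average rating within each delay group and each refund group.
rating_by_delay = orders.groupby('Delivery Delay')['Service Rating'].mean()
rating_by_refund = orders.groupby('Refund Requested')['Service Rating'].mean()

print(rating_by_delay)
print(rating_by_refund)
```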


## 📈 Platform Comparison

Comparing service ratings across platforms helps identify which channels deliver the best customer experience.


In [None]:
plt.figure(figsize=(8,5))
sns.boxplot(
    data=df,
    x='Platform',
    y='Service Rating',
    hue='Platform',  
    palette={'Swiggy Instamart':'orange', 'Blinkit':'yellow', 'JioMart':'red'},
    dodge=False,     
    legend=False    
)
plt.title('Service Ratings Across Platforms')
plt.xlabel('Platform')
plt.ylabel('Service Rating')
plt.show()


### Observations

- Ratings for **JioMart** are relatively consistent, reflecting stable customer satisfaction.  
- **Blinkit** and **Swiggy Instamart** display higher variability in ratings, indicating inconsistent customer experiences.
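
Consistency can be quantified as the standard deviation of ratings per platform, shown next to the mean and median. A minimal sketch on made-up toy rows, so the output does not describe the real platforms:

```python
import pandas as pd

# Toy stand-in for df; the rows are invented for illustration only.
orders = pd.DataFrame({
    'Platform': ['JioMart'] * 4 + ['Blinkit'] * 4,
    'Service Rating': [4, 4, 4, 4, 1, 5, 2, 4],
})

# A larger 'std' means ratings are more spread out, i.e. less consistent.
rating_spread = orders.groupby('Platform')['Service Rating'].agg(['mean', 'median', 'std'])

print(rating_spread)
```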


## 💰 Refund Requests by Platform

Analyzing refund requests across platforms highlights which services face more post-order issues and customer dissatisfaction.


In [None]:
refund_platform = df.groupby(['Platform', 'Refund Requested']).size().unstack()

ax = refund_platform.plot(kind='bar', stacked=True, figsize=(8,5), color=['skyblue', 'orange'])

plt.title("Refund Requests by Platform")
plt.ylabel("Number of Orders")
plt.xlabel("Platform")
plt.xticks(rotation=0)
plt.legend(title="Refund Requested")

# Add counts on top of each bar segment
for i, platform in enumerate(refund_platform.index):
    cumulative_height = 0
    for j, refund_status in enumerate(refund_platform.columns):
        count = refund_platform.loc[platform, refund_status]
        plt.text(i, cumulative_height + count/2, str(count), ha='center', va='center', color='black', fontsize=10)
        cumulative_height += count

plt.tight_layout()
plt.show()


### Insight

Refund patterns vary across platforms, which may indicate differences in operational efficiency, service quality, or delivery reliability.
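
Because the platforms handle slightly different order volumes, refund shares are easier to compare than stacked counts. A minimal sketch that converts the count table into row percentages, using made-up toy rows (not the real data):

```python
import pandas as pd

# Toy stand-in for df; the rows are invented for illustration only.
orders = pd.DataFrame({
    'Platform': ['Blinkit', 'Blinkit', 'JioMart', 'JioMart', 'JioMart', 'JioMart'],
    'Refund Requested': ['Yes', 'No', 'No', 'No', 'No', 'Yes'],
})

# Counts per platform and refund status, then each row divided by its own total.
refund_counts = orders.groupby(['Platform', 'Refund Requested']).size().unstack(fill_value=0)
refund_share = refund_counts.div(refund_counts.sum(axis=1), axis=0) * 100

print(refund_share.round(1))
```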


## 📉 Correlation Analysis

Let's explore whether there are any linear relationships between key numeric variables in the dataset.


In [None]:
corr = df[['Delivery Time (Minutes)', 'Order Value (INR)', 'Service Rating']].corr()

plt.figure(figsize=(6,5))
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation between Numerical Features')
plt.show()


### Result

Correlation analysis shows no strong linear relationships among delivery time, order value, and service ratings.  

💡 **Insight:** Customer satisfaction is influenced by multiple factors beyond speed and cost, with overall experience and reliability being key determinants.
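
The heatmap above covers only the three numeric columns. Encoding the `Yes`/`No` flags as 1/0 lets the same Pearson correlation include delays and refunds (for a binary variable this is equivalent to a point-biserial correlation). A minimal sketch on made-up toy rows; the `delayed` and `refunded` column names are my own:

```python
import pandas as pd

# Toy stand-in for df; the rows are invented for illustration only.
orders = pd.DataFrame({
    'Service Rating':   [5, 5, 2, 2, 5, 1],
    'Delivery Delay':   ['No', 'No', 'No', 'Yes', 'No', 'Yes'],
    'Refund Requested': ['No', 'No', 'Yes', 'Yes', 'No', 'Yes'],
})

# Turn each flag into 1 (Yes) or 0 (No) so it can enter the correlation matrix.
encoded = orders.assign(
    delayed=orders['Delivery Delay'].eq('Yes').astype(int),
    refunded=orders['Refund Requested'].eq('Yes').astype(int),
)

flag_corr = encoded[['Service Rating', 'delayed', 'refunded']].corr()

print(flag_corr.round(2))
```

A clearly negative value between `Service Rating` and either flag would match the box plots in the earlier section.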


## 🧩 Key Findings and Business Insights

| Objective            | Key Finding                                                                 | Business Implication                      |
|----------------------|-----------------------------------------------------------------------------|-------------------------------------------|
| Platform Performance | All three are close competitors                                             | Focus on differentiation via service      |
| Product Trends       | Essentials lead; Personal Care appears to have higher spend (to be confirmed) | Market premium bundles                    |
| Delivery             | 13.7% delayed orders                                                        | Optimize last-mile logistics              |
| Refunds              | 46% refund rate (even without delay)                                        | Improve product quality and communication |
| Ratings              | High but variable                                                           | Build consistency and trust               |


## 🏁 Conclusion: What the Data Tells Us

The analysis of 100,000 online grocery orders reveals clear patterns in customer behavior, delivery performance, and platform efficiency. Essential categories like Grocery and Dairy dominate order volume, while Personal Care appears to drive higher spending per order. Delivery delays affect around 13–14% of orders, and nearly half of all orders involve refund requests, a surprisingly high figure that signals issues beyond just late deliveries. 

Customer ratings remain high overall, but variability across platforms suggests room for more consistent service quality. While speed is important, the data shows that reliability and smooth issue resolution play an equally crucial role in customer satisfaction.

To convert these insights into meaningful business value, it is important to look beyond what the data shows and understand what platforms can *do* because of it.  

In the next section, we translate analytics into **platform-specific, actionable strategies** for Blinkit, JioMart, and Swiggy Instamart.


## 🧠 Actionable Business Recommendations for Blinkit, JioMart and Swiggy Instamart

Data is most powerful when it drives decisions. Based on the patterns observed in delivery times, service ratings, refund behavior, and product preferences, here are targeted strategies each platform can implement to improve customer satisfaction, operational efficiency, and profitability.

---

### 🔄 Cross-Platform Recommendations (Useful for All)

#### **1. Reduce Refund Rates (currently ~46%)**
- Improve packaging quality, especially for fragile items.
- Introduce quality scoring for SKUs and auto-block high-return products.
- Provide real-time item replacement suggestions to reduce refund requests.

#### **2. Enhance Reliability and Experience Consistency**
- Standardize delivery partner processes.
- Improve real-time order tracking accuracy.
- Notify customers proactively about delays to avoid dissatisfaction.

#### **3. Target High-Value Customers More Effectively**
- Customers who buy Dairy and Personal Care are natural candidates for:
  - subscription models  
  - bundle offers  
  - personalized recommendations  

---

## Platform-Specific Recommendations

### **Blinkit**
Blinkit shows the highest variability in service ratings, indicating inconsistent experiences.

**Recommendations:**
- Improve operational consistency across cities and dark stores.
- Introduce upselling nudges to increase Average Order Value.
- Reduce refund rate by offering instant replacements instead of refunds.

---

### **JioMart**
JioMart delivers consistent ratings but does little to boost higher-margin categories.

**Recommendations:**
- Promote high-margin categories like snacks and personal care.
- Implement personalized "Smart Basket" auto-fill suggestions for routine buyers.
- Improve product quality control to reduce non-delay-related refunds.

---

### **Swiggy Instamart**
Swiggy Instamart leads in both order volume and AOV, though only narrowly, which hints at a slightly more premium user base. Refund issues and rating variability persist.

**Recommendations:**
- Strengthen premium positioning with curated categories (e.g., gourmet, healthy snacks).
- Improve substitution logic and packaging quality to reduce refunds.
- Optimize high-volume routes to reduce delays further.

---

### ⚙️ Operational Improvements

#### **Delivery Delays → Lower Ratings**
- Implement time-slot based delivery in congested regions.
- Improve micro-warehouse placement using demand heatmaps.
- Adopt predictive routing models.

#### **Refund Requests Not Strongly Linked to Delays**
This suggests product quality and availability issues.

**Fix:**
- Track SKU dissatisfaction rate.
- Improve inventory forecasting.
- Provide dynamic substitutions based on customer history.

---

### 🧭 Strategic Takeaways for Business Leaders

- **Blinkit** → Focus on experience consistency  
- **JioMart** → Expand and promote high-margin categories  
- **Swiggy Instamart** → Reduce refunds, amplify premium positioning  
- **All Platforms** → Optimize delivery routing, real-time transparency, proactive communication

These recommendations turn the analysis into practical business guidance for all three platforms.


## 🎯 Final Summary: From Insights to Action

This project goes beyond exploring patterns in online grocery data: it bridges the gap between analytics and real business decision-making. By identifying trends in delivery performance, refund behavior, product preferences, and customer satisfaction, we generate actionable insights that Blinkit, JioMart, and Swiggy Instamart can use to enhance their operations.

The findings highlight the importance of reliability, consistency, transparency, and proactive service. Whether it's optimizing delivery routes, improving SKU quality, enhancing substitution logic, or targeting high-value customer segments, the recommended strategies provide each platform with a roadmap to strengthen performance and customer trust.

As India's online grocery market continues to expand, data-driven decision-making will be the key differentiator. The insights and recommendations shared here aim to support these platforms in delivering faster, smarter, and more delightful experiences to millions of users.
