
# **Project Name**    -



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name** - Bidyashree Nayak

# **Project Summary -**

Flipkart, one of India's leading e-commerce platforms, has revolutionized the online shopping experience by offering a wide range of products, seamless delivery services, and innovative payment solutions. This project explores the journey of Flipkart, its challenges, and the strategic solutions it has implemented to sustain growth in a competitive market.

The e-commerce landscape in India has evolved significantly, with increasing internet penetration, digital payment adoption, and changing consumer behaviors. However, Flipkart faces intense competition from domestic and international players, including Amazon, Reliance’s JioMart, and local marketplaces.

This project aims to analyze the problems Flipkart encounters, its strategic objectives, and the solutions that have helped it maintain its leadership position. The study also highlights how Flipkart continues to innovate, expand its reach, and enhance customer experience while addressing logistical and operational challenges.



# **GitHub Link -**

https://github.com/bidyashreenayak0211/Labmentix-Internship/blob/main/Flipkart_Project_EDA.ipynb

# **Problem Statement**


As India's e-commerce market expands, Flipkart faces several key challenges that impact its growth and sustainability:

**Intense Competition –** Competing with Amazon, JioMart, and other local players requires constant innovation and pricing strategies.

**Logistics and Supply Chain Issues –** Delivering products across a vast and diverse geography presents challenges related to infrastructure, last-mile delivery, and cost management.

**Customer Retention and Satisfaction –** Maintaining customer loyalty in a highly competitive market requires consistent service excellence, personalized experiences, and hassle-free return policies.

**Regulatory and Policy Challenges –** Changes in government regulations regarding foreign investment, e-commerce policies, and taxation impact business operations.

**Profitability Concerns –** Discount-driven customer acquisition strategies often lead to thin profit margins, making financial sustainability a challenge.

#### **Define Your Business Objective?**

To address these challenges and sustain its market leadership, Flipkart focuses on the following business objectives:

**Enhancing Customer Experience –** Improve user interface, faster delivery, personalized recommendations, and customer support.

**Strengthening Logistics and Supply Chain –** Invest in warehouse automation, expand regional distribution centers, and optimize last-mile delivery solutions.

**Expanding Market Reach –** Penetrate Tier 2 and Tier 3 cities with localized marketing, vernacular language support, and affordable pricing strategies.

**Leveraging Technology and AI –** Use data analytics, AI-driven recommendations, and machine learning to improve business decisions and customer engagement.

**Diversifying Product Offerings –** Expand categories such as groceries, fashion, and electronics to cater to a broader audience.

**Ensuring Compliance and Sustainability –** Adapt to regulatory changes while promoting sustainability in packaging, logistics, and overall operations.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following questions are answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

```python
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

### Dataset Loading

```python
# Load Dataset
file_path = '/content/Customer_support_data.csv'
df = pd.read_csv(file_path)
```

### Dataset First View

```python
# Dataset First Look
df.head()
```

### Dataset Rows & Columns count

```python
# Dataset Rows & Columns count
rows, columns = df.shape
print(f"Rows: {rows}, Columns: {columns}")
```

### Dataset Information

```python
# Dataset Info
df.info()
```

#### Duplicate Values

```python
# Dataset Duplicate Value Count
duplicate_count = df.duplicated().sum()
print(f"Number of duplicate rows: {duplicate_count}")
```

#### Missing Values/Null Values

```python
# Missing Values/Null Values Count
missing_values = df.isnull().sum()
print(missing_values)
```

```python
# Visualizing the missing values

# Plot a heatmap for missing values
plt.figure(figsize=(12, 6))
sns.heatmap(df.isnull(), cbar=False, cmap='RdYlGn', yticklabels=False)
plt.title("Heatmap of Missing Values", fontsize=16)
plt.show()
```

### What do you know about your dataset?

The dataset consists of **18,115 rows and 20 columns**, capturing customer interactions, issue resolution, agent performance, and customer satisfaction (CSAT) data. While the dataset is free from duplicate records, it contains **significant missing values** in key columns such as **Customer Remarks (12,181 missing), Order Details (Order ID, Order Date, Product Category, and Item Price), and Customer City (15,189 missing)**. The **Connected Handling Time is almost entirely missing (18,072 missing),** making it unreliable for analysis. On the positive side, **CSAT scores, agent details, and issue timestamps are mostly complete,** making customer satisfaction and agent performance viable areas for analysis. However, the high volume of missing order-related and customer location data may limit insights into product trends and regional service performance. While the dataset is valuable for evaluating response times and customer satisfaction trends, **further data cleaning and imputation may be required** to ensure accurate insights, particularly in areas related to order tracking and customer demographics.
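The missing-value counts above are easier to compare when expressed as a percentage of all rows. The sketch below assumes `df` is loaded as in the Dataset Loading cell. The helper name `missing_summary` and the toy frame are hypothetical; they exist only so the example runs on its own.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "missing_pct": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns that actually have gaps, most incomplete first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical toy frame standing in for the support dataset
sample = pd.DataFrame({
    "Customer_City": [None, "Delhi", None, "Pune"],
    "CSAT Score": [5, 4, 5, 1],
    "connected_handling_time": [None, None, None, 120.0],
})
print(missing_summary(sample))
```

In the notebook, `missing_summary(df)` should rank `connected_handling_time` first, given the counts reported above.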

## ***2. Understanding Your Variables***

```python
# Dataset Columns
columns = df.columns
print(columns)
```

```python
# Dataset Describe
df.describe()
```

### Variables Description


1. **Unique id** *(object)* – A unique identifier for each customer interaction.  

2. **channel_name** *(object)* – The communication channel used by the customer (e.g., phone, email, chat).  

3. **category** *(object)* – The broad category of the customer issue (e.g., billing, technical support).  

4. **Sub-category** *(object)* – A more specific classification under the main category.  

5. **Customer Remarks** *(object, many missing values)* – Any additional comments provided by the customer regarding their issue.  

6. **Order_id** *(object, many missing values)* – The order number related to the customer interaction, if applicable.  

7. **order_date_time** *(object, many missing values)* – The timestamp when the order was placed.  

8. **Issue_reported at** *(object in the raw file, converted to datetime during wrangling; mostly complete)* – The timestamp when the customer reported their issue.  
   - **Range:** January 8, 2023 – September 8, 2023.  
   - **Mean reported time:** April 17, 2023.  

9. **issue_responded** *(object, mostly complete)* – The timestamp when an agent responded to the issue.  

10. **Survey_response_Date** *(object, mostly complete)* – The timestamp when the customer provided a survey response.  

11. **Customer_City** *(object, many missing values)* – The city where the customer is located.  

12. **Product_category** *(object, many missing values)* – The category of the product related to the issue.  

13. **Item_price** *(float, highly variable, many missing values)* – The price of the product associated with the issue (the currency is not stated in the dataset).  
   - **Min:** 9  
   - **Max:** 134,999  
   - **Mean:** 6,678.86 (highly skewed due to expensive items).  

14. **connected_handling_time** *(float, nearly empty)* – The time taken to handle the issue while actively connected.  
   - **Few records (43 non-null values), making it unreliable for analysis**.  
   - **Min:** 0 minutes, **Max:** 1,115 minutes, **Mean:** 495.81 minutes.  

15. **Agent_name** *(object, mostly complete)* – The name of the customer service agent handling the interaction.  

16. **Supervisor** *(object, mostly complete)* – The name of the supervisor overseeing the agent.  

17. **Manager** *(object, mostly complete)* – The name of the manager responsible for the agent's performance.  

18. **Tenure Bucket** *(object, mostly complete)* – The experience level of the agent (e.g., 0-6 months, 1-3 years).  

19. **Agent Shift** *(object, mostly complete)* – The shift timing of the agent (e.g., morning, evening, night).  

20. **CSAT Score** *(float, mostly complete)* – The customer satisfaction rating given by the customer.  
   - **Scale:** 1 to 5  
   - **Mean CSAT:** 4.20  
   - **Most common ratings:** 4 and 5, indicating generally positive feedback.  

21. **Response Time (mins)** *(float, derived during Data Wrangling rather than present in the raw file; mostly complete)* – The time between `Issue_reported at` and `issue_responded`, in minutes.  
   - **Min:** 0 minutes, **Max:** 176,769 minutes (outliers likely present).  
   - **Median Response Time:** 6 minutes.  
   - **Highly skewed distribution due to some extreme values.**  

### **Key Observations:**  
- Several fields have **high missing values** (Customer Remarks, Order Details, Item Price, Customer City, and Connected Handling Time).  
- **Response Time and Item Price are highly skewed** due to extreme values.  
- **CSAT scores are skewed toward the high end**, with a median rating of 5.  
- **Issue Reported Dates span January to September 2023,** with a mean reported date of mid-April 2023.  
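The observations above flag Response Time and Item Price as heavily skewed. It therefore helps to compare the mean with the median and to flag extreme values explicitly before plotting. The sketch below uses Tukey's 1.5 x IQR fences. The helper `iqr_outlier_mask` and the sample numbers are hypothetical; the 176,769 value only echoes the maximum reported above.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; NaN values are never flagged."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Hypothetical response times in minutes, with one extreme value
response = pd.Series([2, 4, 5, 6, 6, 7, 9, 176769], dtype=float)
print("mean:", response.mean(), "median:", response.median())
print("outliers:", response[iqr_outlier_mask(response)].tolist())
```

A single extreme value pulls the mean far above the median, which is why the median (6 minutes) is the more representative summary for this column.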

### Check Unique Values for each variable.

```python
# Check Unique Values for each variable.
unique_values = df.nunique()
print(unique_values)
```

## 3. ***Data Wrangling***

### Data Wrangling Code

```python
# Write your code to make your dataset analysis ready.

# Convert date columns to datetime format (unparseable values become NaT)
date_cols = ['order_date_time', 'Issue_reported at', 'issue_responded', 'Survey_response_Date']
for col in date_cols:
    df[col] = pd.to_datetime(df[col], errors='coerce')

# Handling missing values with placeholders for key categorical columns
df.fillna({'Customer Remarks': 'No Remarks', 'Customer_City': 'Unknown', 'Product_category': 'Unknown'}, inplace=True)

# Feature Engineering: Response Time in minutes
df['Response Time (mins)'] = (df['issue_responded'] - df['Issue_reported at']).dt.total_seconds() / 60

# Dropping irrelevant columns (errors='ignore' lets the cell be re-run without a KeyError)
df.drop(columns=['Unique id', 'Order_id'], inplace=True, errors='ignore')
```

```python
# Print first few rows
df.head()
```

### What manipulations have you done, and what insights did you find?

1. **Convert Date Columns to Datetime Format**  
   - **Columns Converted:** `'order_date_time', 'Issue_reported at', 'issue_responded', 'Survey_response_Date'`.  
   - **Purpose:** Ensures these columns are in a proper datetime format, allowing for accurate time-based calculations such as response time and trend analysis.  

2. **Handling Missing Values**  
   - **Customer Remarks:** Replaced `NaN` with `"No Remarks"` to ensure missing values do not disrupt text-based analysis.  
   - **Customer_City:** Replaced `NaN` with `"Unknown"` to avoid dropping records with missing city data.  
   - **Product_category:** Replaced `NaN` with `"Unknown"` to prevent loss of data due to missing product information.  

3. **Feature Engineering – Response Time Calculation**  
   - **New Column Created:** `'Response Time (mins)'`  
   - **Formula:** `df['issue_responded'] - df['Issue_reported at']`, converted to minutes.  
   - **Purpose:** Helps in measuring agent efficiency and identifying trends in response delays.  

4. **Dropping Irrelevant Columns**  
   - **Columns Dropped:** `'Unique id'` and `'Order_id'`.  
   - **Reason:** `'Unique id'` is only an identifier, not useful for analysis. `'Order_id'` has many missing values and is not essential for response time or CSAT analysis.  
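The date conversion in step 1 relies on pandas guessing the string format. Suppose the raw timestamps are written day-first, for example `01/08/2023` meaning 1 August. The default month-first reading would then silently swap day and month, which would also distort the response times computed in step 3. Whether this file is day-first is an assumption to check against `df.head()`.

The sketch below parses with `dayfirst=True` and turns negative report-to-response gaps into `NaN`. The helper `response_minutes` and the timestamps are hypothetical.

```python
import pandas as pd


def response_minutes(reported: pd.Series, responded: pd.Series, dayfirst: bool = True) -> pd.Series:
    """Minutes from report to response; unparseable or negative gaps become NaN."""
    start = pd.to_datetime(reported, dayfirst=dayfirst, errors="coerce")
    end = pd.to_datetime(responded, dayfirst=dayfirst, errors="coerce")
    minutes = (end - start).dt.total_seconds() / 60
    # A response stamped before the report is a data error, not a fast reply
    return minutes.where(minutes >= 0)


# Hypothetical day-first strings: "01/08/2023" is meant as 1 August 2023
reported = pd.Series(["01/08/2023 11:13", "02/08/2023 09:00", "03/08/2023 10:00"])
responded = pd.Series(["01/08/2023 11:47", "02/08/2023 08:30", "not a date"])
print(response_minutes(reported, responded))
```

Confirm the raw format before passing `dayfirst=True` in the wrangling cell. Genuinely month-first strings would otherwise be misread in the opposite direction.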

---

### **Insights Found After Data Preparation**  

1. **Improved Data Quality for Time-Based Analysis**  
   - The dataset can now support **trend analysis on response time, issue resolution speed, and customer feedback timing.**  

2. **Customer Data Can Now Be Used for Aggregation**  
   - With `"Unknown"` as a placeholder for missing values, we can still analyze **customer trends without dropping records.**  

3. **Agent Performance Analysis is Possible**  
   - The new `'Response Time (mins)'` column enables analysis of **average response times, slowest/fastest agents, and potential process inefficiencies.**  

4. **Data Integrity is Maintained**  
   - Instead of dropping rows, we **filled the key categorical gaps (remarks, city, product category) with placeholders**, retaining **maximum data for meaningful insights.**

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

```python
# Chart - 1 visualization code

# Count of Customer Interactions by Channel
plt.figure(figsize=(12,6))
sns.countplot(data=df, x='channel_name', order=df['channel_name'].value_counts().index, palette='coolwarm')
plt.title('Count of Customer Interactions by Channel')
plt.xticks(rotation=45)
plt.show()
```


##### 1. Why did you pick the specific chart?

A countplot was chosen because it effectively visualizes categorical data and shows which channels receive the most customer interactions.

##### 2. What is/are the insight(s) found from the chart?

* Some customer service channels (e.g., phone or chat) are used significantly more than others.

* If some channels have low engagement, they might need better promotion or improvement.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Yes, businesses can optimize resources by prioritizing the most popular channels.
* If a low-engagement channel is costly, it might be discontinued or improved.

**Any insights that lead to negative growth?**

* If an important channel (e.g., email) has very few interactions, it might indicate accessibility issues.
* Customers preferring inefficient channels (e.g., long phone calls instead of self-service options) can increase operational costs.
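The chart's counts can be reported as shares, which replaces "some channels are used more than others" with concrete figures. In the sketch below, `share_table` and the channel names in the sample are hypothetical. In the notebook, the call would be `share_table(df, 'channel_name')`.

```python
import pandas as pd


def share_table(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Count and percentage share of each value in `column`, largest first."""
    counts = frame[column].value_counts()
    return pd.DataFrame({
        "count": counts,
        "share_pct": (counts / counts.sum() * 100).round(1),
    })


# Hypothetical sample of six interactions
sample = pd.DataFrame({"channel_name": ["Phone", "Phone", "Chat", "Phone", "Email", "Chat"]})
print(share_table(sample, "channel_name"))
```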

#### Chart - 2

```python
# Chart - 2 visualization code

# Distribution of Issue Categories
plt.figure(figsize=(12,6))
sns.countplot(data=df, x='category', order=df['category'].value_counts().index, palette='viridis')
plt.title('Distribution of Issue Categories')
plt.xticks(rotation=90)
plt.show()
```


##### 1. Why did you pick the specific chart?

A countplot clearly shows which issue categories are most frequent, helping to identify major customer pain points.

##### 2. What is/are the insight(s) found from the chart?

Certain issues occur more frequently than others, possibly indicating a recurring product or service problem.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, addressing the most common issues can significantly improve customer satisfaction and reduce support costs.

**Any insights that lead to negative growth?**

If a particular issue is consistently high, it could signal a deeper product or service flaw, leading to higher churn rates if not resolved.
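The category counts can be turned into a Pareto view to show how concentrated the issue mix is: the smallest set of categories that together cover a chosen share of tickets. The 80% cut-off and the `pareto_categories` helper below are assumptions for illustration, and the sample categories are made up.

```python
import pandas as pd


def pareto_categories(frame: pd.DataFrame, column: str, threshold: float = 80.0) -> pd.Series:
    """Top categories whose cumulative share of rows first reaches `threshold` percent."""
    shares = frame[column].value_counts(normalize=True) * 100
    cumulative = shares.cumsum()
    # Categories still below the threshold, plus the one that crosses it
    n_needed = int((cumulative < threshold).sum()) + 1
    return cumulative.iloc[:n_needed]


# Hypothetical sample: 20 tickets across four issue categories
sample = pd.DataFrame({
    "category": ["Returns"] * 10 + ["Order Related"] * 5 + ["Refund Related"] * 3 + ["Feedback"] * 2
})
print(pareto_categories(sample, "category"))
```

If only a few categories cover most tickets, fixing those first gives the largest reduction in support volume.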

#### Chart - 3

```python
# Chart - 3 visualization code

# Re-parse the report timestamp so this cell also runs on its own (already converted during wrangling)
df['Issue_reported at'] = pd.to_datetime(df['Issue_reported at'], errors='coerce')

# Daily Trend of Reported Issues
plt.figure(figsize=(12,6))
df.groupby(df['Issue_reported at'].dt.date).size().plot(color='red')
plt.title('Daily Trend of Reported Issues')
plt.xlabel('Date')
plt.ylabel('Number of Issues')
plt.xticks(rotation=45)
plt.show()
```

##### 1. Why did you pick the specific chart?

A line plot is the best way to visualize trends over time and identify patterns in issue reporting.

##### 2. What is/are the insight(s) found from the chart?

Peaks in issue reporting may correlate with product launches, marketing campaigns, or seasonal demand fluctuations.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Helps predict when to allocate more customer service resources.
* Identifies specific dates where issues surged, helping diagnose root causes.

**Any insights that lead to negative growth?**

A consistent upward trend in reported issues could indicate deteriorating service or product quality.
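In the line plot, a single day's spike is hard to tell apart from a sustained rise, and a trailing rolling mean separates the two. The sketch below uses a hypothetical `daily_counts_with_trend` helper. Its 7-day default window is an assumption chosen to even out weekday effects. Days with no reports are filled with zero so that the window spans calendar days.

```python
import pandas as pd


def daily_counts_with_trend(timestamps: pd.Series, window: int = 7) -> pd.DataFrame:
    """Daily issue counts plus a trailing rolling mean that smooths day-to-day noise."""
    ts = pd.to_datetime(timestamps, errors="coerce").dropna()
    daily = ts.dt.floor("D").value_counts().sort_index()
    # Insert days with no reports as 0 so the window covers calendar days, not just active days
    daily = daily.asfreq("D", fill_value=0)
    return pd.DataFrame({
        "issues": daily,
        "rolling_mean": daily.rolling(window, min_periods=1).mean(),
    })


# Hypothetical report timestamps (no reports on 2 August)
sample = pd.Series(["2023-08-01 10:00", "2023-08-01 12:00", "2023-08-03 09:00"])
print(daily_counts_with_trend(sample, window=2))
```

In the notebook, `daily_counts_with_trend(df['Issue_reported at']).plot()` would draw both series on one chart.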

#### Chart - 4

```python
# Chart - 4 visualization code

# Response Time Distribution by Category (log scale because response times are heavily skewed)
plt.figure(figsize=(12,6))
sns.boxplot(data=df, x='category', y='Response Time (mins)', palette='coolwarm')
plt.title('Response Time Distribution by Category')
plt.xticks(rotation=90)
plt.yscale('log')
plt.show()
```


##### 1. Why did you pick the specific chart?

A boxplot is ideal for analyzing the spread of response times across different categories, including outliers.

##### 2. What is/are the insight(s) found from the chart?

Some categories have a much wider range of response times, indicating inconsistency.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Helps identify which categories need faster response times.
* Supports better staffing and training decisions.

**Any insights that lead to negative growth?**

If some categories have extremely high response times, customers may become frustrated and leave.
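The log-scaled boxplot shows the shape of each distribution but makes exact values hard to read. A companion table with the median, interquartile range and sample size per category makes the comparison concrete. It also shows which categories have too few tickets to judge. In the sketch below, `spread_by_group` and the sample data are hypothetical.

```python
import pandas as pd


def spread_by_group(frame: pd.DataFrame, group_col: str, value_col: str) -> pd.DataFrame:
    """Count, median and interquartile range of `value_col` per group, slowest median first."""
    grouped = frame.groupby(group_col)[value_col]
    summary = pd.DataFrame({
        "n": grouped.count(),  # non-missing values only
        "median": grouped.median(),
        "iqr": grouped.quantile(0.75) - grouped.quantile(0.25),
    })
    return summary.sort_values("median", ascending=False)


# Hypothetical response times (minutes) for two issue categories
sample = pd.DataFrame({
    "category": ["Returns"] * 5 + ["Refund Related"] * 4,
    "Response Time (mins)": [1, 2, 3, 4, 100, 5, 5, 6, 6],
})
print(spread_by_group(sample, "category", "Response Time (mins)"))
```

In the notebook, call `spread_by_group(df, 'category', 'Response Time (mins)')`. Passing `'Agent Shift'` as the group column gives the numbers behind Chart 8.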

#### Chart - 5

```python
# Chart - 5 visualization code

# Distribution of CSAT Scores
# CSAT is a discrete 1-5 rating, so draw one bar per score instead of 10 arbitrary bins
plt.figure(figsize=(12,6))
sns.histplot(df['CSAT Score'].dropna(), discrete=True)
plt.title('Distribution of CSAT Scores')
plt.xlabel('CSAT Score')
plt.ylabel('Count')
plt.show()
```

##### 1. Why did you pick the specific chart?

A histogram is best for showing the overall distribution of customer satisfaction scores.

##### 2. What is/are the insight(s) found from the chart?

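* Scores cluster at 4 and 5 (mean CSAT of 4.20, per the variable summary), which suggests strong overall service performance.
* The much smaller group of 1 and 2 ratings marks the dissatisfied customers; those interactions are the ones worth examining for root causes.

##### 3. Will the gained insights help create a positive business impact?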
Are there any insights that lead to negative growth? Justify with specific reason.

* Helps businesses understand overall customer sentiment.
* Can be used to improve policies, training, or product quality.

**Any insights that lead to negative growth?**

If the CSAT score distribution skews toward low values, it suggests systemic service issues that need urgent fixing.
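A common way to summarize a 1-5 CSAT distribution is the top-2-box share (ratings of 4 or 5) alongside the bottom-2-box share (1 or 2). Treating these cut-offs as "satisfied" and "dissatisfied" is an assumption. The helper `csat_summary` and the sample ratings are hypothetical.

```python
import pandas as pd


def csat_summary(scores: pd.Series) -> dict:
    """Mean plus shares of satisfied (4-5) and dissatisfied (1-2) ratings on a 1-5 scale."""
    valid = scores.dropna()
    return {
        "n": int(valid.size),
        "mean": round(float(valid.mean()), 2),
        "top2_pct": round(float(valid.isin([4, 5]).mean() * 100), 1),
        "bottom2_pct": round(float(valid.isin([1, 2]).mean() * 100), 1),
    }


# Hypothetical ratings, including one missing response
print(csat_summary(pd.Series([5, 5, 4, 1, 3, 5, None, 2])))
```

Tracking these shares over time is more sensitive to a growing dissatisfied group than the mean alone.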

#### Chart - 6

```python
# Chart - 6 visualization code

# Share of interactions handled in each agent shift (counts rows, not distinct agents)
plt.figure(figsize=(12,6))
df['Agent Shift'].value_counts().plot.pie(autopct='%1.1f%%', startangle=90)
plt.title('Share of Interactions by Agent Shift')
plt.ylabel('')
plt.show()
```

##### 1. Why did you pick the specific chart?

A pie chart is useful for visualizing proportional distribution.

##### 2. What is/are the insight(s) found from the chart?

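The pie shows each shift's share of customer interactions. Because it counts interactions rather than agents, it reveals where demand is concentrated. Judging whether a shift is overstaffed or understaffed also requires the number of agents working it.

##### 3. Will the gained insights help create a positive business impact?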
Are there any insights that lead to negative growth? Justify with specific reason.

Helps with workforce planning and shift optimization.

**Any insights that lead to negative growth?**

If a shift has very few agents but high demand, response times will suffer.
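The pie counts interactions, so pairing those counts with the number of distinct agents on each shift gives a rough workload measure (interactions per agent). The sketch below assumes the `Agent Shift` and `Agent_name` columns described earlier. The helper `shift_load` and the toy rows are hypothetical.

```python
import pandas as pd


def shift_load(frame: pd.DataFrame, shift_col: str = "Agent Shift", agent_col: str = "Agent_name") -> pd.DataFrame:
    """Interactions, distinct agents and interactions per agent for each shift, busiest first."""
    grouped = frame.groupby(shift_col)
    load = pd.DataFrame({
        "interactions": grouped.size(),
        "agents": grouped[agent_col].nunique(),
    })
    load["per_agent"] = (load["interactions"] / load["agents"]).round(1)
    return load.sort_values("per_agent", ascending=False)


# Hypothetical interactions with anonymised agent names
sample = pd.DataFrame({
    "Agent Shift": ["Morning"] * 4 + ["Night"] * 3 + ["Evening"] * 2,
    "Agent_name": ["A", "A", "B", "B", "C", "C", "C", "D", "E"],
})
print(shift_load(sample))
```

A shift with a high `per_agent` value is a better candidate for additional staffing than one that simply has a large slice of the pie.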

#### Chart - 7

```python
# Chart - 7 visualization code

# Interactions by agent tenure bucket (each row is one interaction, so this counts interactions, not agents)
plt.figure(figsize=(12,6))
sns.countplot(data=df, x='Tenure Bucket', order=df['Tenure Bucket'].value_counts().index, palette='Set2')
plt.title('Interactions by Agent Tenure Bucket')
plt.show()
```

##### 1. Why did you pick the specific chart?

A countplot shows how many interactions were handled by agents in each tenure bucket. Each row is one interaction, so individual agents are not counted.

##### 2. What is/are the insight(s) found from the chart?

A large share of interactions handled by the newest tenure buckets may indicate high agent turnover or heavy reliance on recent hires.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Helps HR teams design retention strategies.

**Any insights that lead to negative growth?**

If most interactions are handled by low-tenure agents, service quality may suffer from inexperience.
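Counting distinct `Agent_name` values per bucket answers the staffing question that the countplot cannot. Adding mean CSAT tests whether low-tenure buckets actually score worse. The helper `tenure_profile` and the toy data are hypothetical. An agent who moved between buckets during the period would be counted in each bucket.

```python
import pandas as pd


def tenure_profile(frame: pd.DataFrame, bucket_col: str = "Tenure Bucket",
                   agent_col: str = "Agent_name", score_col: str = "CSAT Score") -> pd.DataFrame:
    """Distinct agents, interactions handled, and mean CSAT for each tenure bucket."""
    grouped = frame.groupby(bucket_col)
    return pd.DataFrame({
        "agents": grouped[agent_col].nunique(),
        "interactions": grouped.size(),
        "mean_csat": grouped[score_col].mean().round(2),
    })


# Hypothetical interactions; bucket labels follow the examples in the variable description
sample = pd.DataFrame({
    "Tenure Bucket": ["0-6 months"] * 4 + ["1-3 years"] * 3,
    "Agent_name": ["A", "A", "B", "C", "D", "D", "D"],
    "CSAT Score": [5, 3, 4, 2, 5, 5, 4],
})
print(tenure_profile(sample))
```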

#### Chart - 8

```python
# Chart - 8 visualization code

# Response Time by Agent Shift (log scale because response times are heavily skewed)
plt.figure(figsize=(12,6))
sns.boxplot(data=df, x='Agent Shift', y='Response Time (mins)', palette='Paired')
plt.title('Response Time by Agent Shift')
plt.yscale('log')
plt.show()
```

##### 1. Why did you pick the specific chart?

A boxplot highlights variation in response time across shifts.

##### 2. What is/are the insight(s) found from the chart?

Certain shifts may have significantly slower response times.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Helps in optimizing staffing decisions.

**Any insights that lead to negative growth?**

If response times are consistently high for a shift, customer satisfaction will decline.

#### Chart - 9

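```python
# Chart - 9 visualization code

# Count of Issues by Product Category
# Exclude the 'Unknown' placeholder filled in during wrangling so it does not dominate the chart
known_products = df[df['Product_category'] != 'Unknown']
plt.figure(figsize=(12,6))
sns.countplot(data=known_products, x='Product_category', order=known_products['Product_category'].value_counts().index, palette='Set1')
plt.title('Count of Issues by Product Category')
plt.xticks(rotation=90)
plt.show()
```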

##### 1. Why did you pick the specific chart?

A countplot highlights the most problematic product categories.

##### 2. What is/are the insight(s) found from the chart?

Some product categories have significantly more reported issues than others.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Helps in identifying and fixing problematic products.

**Any insights that lead to negative growth?**

If one category dominates, it might hurt brand perception.

#### Chart - 10

```python
# Chart - 10 visualization code

# Item Price vs. Response Time (log y-axis because response times are heavily skewed)
plt.figure(figsize=(12,6))
sns.scatterplot(data=df, x='Item_price', y='Response Time (mins)', alpha=0.5)
plt.title('Item Price vs. Response Time')
plt.yscale('log')
plt.show()
```

##### 1. Why did you pick the specific chart?

A scatter plot helps in analyzing the correlation between price and response time.

##### 2. What is/are the insight(s) found from the chart?

High-priced items may be receiving faster/slower support than others.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Ensures pricing and support alignment.

**Any insights that lead to negative growth?**

If expensive items have slow responses, it could lead to customer churn.
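A single number can back up what the scatter plot suggests. Both price and response time are heavily skewed, so a rank-based (Spearman) correlation is safer than Pearson's. Below it is computed as the Pearson correlation of average ranks, which is the definition of Spearman's rho. The helper name and sample rows are hypothetical; rows missing either value are dropped.

```python
import pandas as pd


def price_response_correlation(frame: pd.DataFrame, price_col: str = "Item_price",
                               time_col: str = "Response Time (mins)"):
    """Spearman rank correlation (Pearson on average ranks) and the number of rows used."""
    pair = frame[[price_col, time_col]].dropna()
    rho = pair[price_col].rank().corr(pair[time_col].rank())
    return rho, len(pair)


# Hypothetical sample where pricier items get faster responses; the last row lacks a price
sample = pd.DataFrame({
    "Item_price": [100, 200, 300, 400, None],
    "Response Time (mins)": [10, 8, 6, 4, 5],
})
rho, n = price_response_correlation(sample)
print(f"Spearman rho = {rho:.2f} over {n} rows")
```

Values near 0 mean price has little monotonic relationship with response time. Because `Item_price` is largely missing, the row count `n` matters as much as rho itself.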

#### Chart - 11

```python
# Chart - 11 visualization code

# CSAT Score Distribution by Category
plt.figure(figsize=(12,6))
sns.violinplot(data=df, x='category', y='CSAT Score')
plt.title('CSAT Score Distribution by Category')
plt.xticks(rotation=90)
plt.show()
```

##### 1. Why did you pick the specific chart?

A violin plot is useful for comparing distributions across multiple categories.

##### 2. What is/are the insight(s) found from the chart?

Some categories have a much wider variation in CSAT scores.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Helps businesses focus on improving service in low-rated categories.

**Any insights that lead to negative growth?**

If certain categories consistently have low CSAT, brand reputation may suffer.
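A violin plot shows spread but not how many ratings sit behind each shape. The sketch below reports, per category, the share of low ratings (1 or 2, an assumed cut-off). It also drops categories with fewer than `min_n` ratings, so that thin samples are not over-read. `low_csat_rate` and the sample are hypothetical.

```python
import pandas as pd


def low_csat_rate(frame: pd.DataFrame, group_col: str = "category", score_col: str = "CSAT Score",
                  low_max: int = 2, min_n: int = 30) -> pd.DataFrame:
    """Percentage of ratings <= `low_max` per group, for groups with at least `min_n` ratings."""
    scored = frame.dropna(subset=[score_col])
    flagged = scored.assign(is_low=scored[score_col] <= low_max)
    grouped = flagged.groupby(group_col)["is_low"]
    table = pd.DataFrame({
        "n": grouped.size(),
        "low_pct": (grouped.mean() * 100).round(1),
    })
    return table[table["n"] >= min_n].sort_values("low_pct", ascending=False)


# Hypothetical ratings; "Feedback" has too few ratings to be ranked
sample = pd.DataFrame({
    "category": ["Returns"] * 4 + ["Refund Related"] * 3 + ["Feedback"],
    "CSAT Score": [5, 1, 1, 4, 5, 5, 4, 1],
})
print(low_csat_rate(sample, min_n=3))
```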

#### Chart - 12

```python
# Chart - 12 visualization code

# Top 10 Slowest Responding Agents (by mean response time)
plt.figure(figsize=(12,6))

# Matplotlib colormap used for the bars (e.g. 'coolwarm', 'plasma', 'magma')
palette = 'viridis'

df.groupby('Agent_name')['Response Time (mins)'].mean().nlargest(10).plot(kind='bar', colormap=palette)

plt.title('Top 10 Slowest Responding Agents')
plt.ylabel('Avg. Response Time (mins)')
plt.xlabel('Agent Name')
plt.xticks(rotation=45)  # Rotate x-axis labels for readability
plt.show()
```


##### 1. Why did you pick the specific chart?

* A bar chart was chosen because it is an effective way to rank agents based on their average response time.
* It highlights the agents with the slowest response times, helping identify performance issues.

##### 2. What is/are the insight(s) found from the chart?

* The slowest agents show much higher average response times than the rest. Because the mean is pulled up by a few extreme tickets, part of this gap may reflect outliers rather than consistently slow handling.
* Some agents may require additional training or workload adjustments.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Yes, by identifying underperforming agents, managers can provide targeted training.
* Workload balancing can be adjusted if response time issues stem from excessive queries.
* Addressing slow responses can increase customer satisfaction and retention.

**Any insights that lead to negative growth?**
* If these slowest agents are not coached or reassigned, customer frustration and churn may increase.
* Long response times can lead to poor CSAT scores and negative brand perception.
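Ranking agents by mean response time lets a handful of extreme tickets dominate the chart, as can an agent with only one or two tickets. A more robust variant ranks by median and requires a minimum number of responses. `slowest_agents`, the `min_n=20` default and the sample are assumptions for illustration.

```python
import pandas as pd


def slowest_agents(frame: pd.DataFrame, agent_col: str = "Agent_name",
                   time_col: str = "Response Time (mins)", min_n: int = 20, top: int = 10) -> pd.DataFrame:
    """Rank agents by median response time, skipping agents with fewer than `min_n` responses."""
    grouped = frame.dropna(subset=[time_col]).groupby(agent_col)[time_col]
    stats = pd.DataFrame({
        "n": grouped.size(),
        "median_mins": grouped.median(),
        "mean_mins": grouped.mean(),
    })
    eligible = stats[stats["n"] >= min_n]
    return eligible.sort_values("median_mins", ascending=False).head(top)


# Hypothetical response times: B has one outlier, C has a single very slow ticket
sample = pd.DataFrame({
    "Agent_name": ["A", "A", "A", "B", "B", "B", "C"],
    "Response Time (mins)": [5, 6, 7, 1, 2, 900, 3000],
})
print(slowest_agents(sample, min_n=3))
```

In the toy data, agent C would top a mean-based ranking on a single 3,000-minute ticket, but is excluded here. B's mean of 301 minutes comes from one outlier, while its median is 2.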

#### Chart - 13

```python
# Chart - 13 visualization code

# Average CSAT Score by Manager
plt.figure(figsize=(12,6))
df.groupby('Manager')['CSAT Score'].mean().plot(kind='bar', color='pink')
plt.title('Average CSAT Score by Manager')
plt.ylabel('Avg. CSAT Score')
plt.show()
```


##### 1. Why did you pick the specific chart?

A bar chart was used to compare CSAT scores across different managers, helping assess leadership effectiveness.

##### 2. What is/are the insight(s) found from the chart?

* Managers with higher CSAT scores likely lead better-performing teams.
* Lower-scoring managers may need leadership training or better team support.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Helps reward and promote high-performing managers.
* Identifies managers who need additional support, ultimately leading to improved service quality.

**Any insights that lead to negative growth?**

If low CSAT managers are not held accountable, it can result in poor service quality, high attrition, and dissatisfied customers.
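Differences between managers' average CSAT are only meaningful if they exceed sampling noise. This rough sketch attaches a normal-approximation 95% confidence interval (mean ± 1.96 · s/√n) to each manager's mean. The helper name and sample are hypothetical. For small groups or a bounded 1-5 scale, a bootstrap interval would be more defensible.

```python
import numpy as np
import pandas as pd


def csat_ci_by_group(frame: pd.DataFrame, group_col: str = "Manager",
                     score_col: str = "CSAT Score", z: float = 1.96) -> pd.DataFrame:
    """Mean CSAT per group with a normal-approximation confidence interval."""
    grouped = frame.dropna(subset=[score_col]).groupby(group_col)[score_col]
    table = pd.DataFrame({
        "n": grouped.size(),
        "mean": grouped.mean(),
        "std": grouped.std(ddof=1),  # NaN for single-rating groups, so their CI is NaN too
    })
    half_width = z * table["std"] / np.sqrt(table["n"])
    table["ci_low"] = table["mean"] - half_width
    table["ci_high"] = table["mean"] + half_width
    return table.sort_values("mean", ascending=False)


# Hypothetical ratings for two managers
sample = pd.DataFrame({
    "Manager": ["M1", "M1", "M1", "M1", "M2", "M2"],
    "CSAT Score": [5, 5, 4, 4, 3, 5],
})
print(csat_ci_by_group(sample))
```

The intervals overlap here, as they do for M1 and M2 in the toy data. When that happens, the ordering in Chart 13 should not be read as a real performance gap.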

#### Chart - 14

```python
# Chart 14 - Visualization code

# Top 10 Fastest Responding Supervisors
plt.figure(figsize=(12,6))
colors = sns.color_palette('husl', 10)  # Generates 10 different colors
df.groupby('Supervisor')['Response Time (mins)'].mean().nsmallest(10).plot(kind='bar', color=colors)
plt.title('Top 10 Fastest Responding Supervisors')
plt.ylabel('Avg. Response Time (mins)')
plt.show()
```

##### 1. Why did you pick the specific chart?

* A bar chart effectively ranks supervisors based on their average response time.
* Helps recognize supervisors who efficiently manage escalations and team performance.

##### 2. What is/are the insight(s) found from the chart?

* Interactions under some supervisors receive much faster average responses, which may reflect efficient escalation handling and team management.
* Supervisors outside this top 10 (not shown in the chart) may be overburdened or need better time management.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Helps reward high-performing supervisors and use their strategies as best practices.
* Can reallocate workload to balance response times across supervisors.

**Any insights that lead to negative growth?**
* If slow supervisors are not supported, it can increase issue resolution times, leading to customer dissatisfaction.
* Unequal workload distribution may burn out top-performing supervisors.

#### Chart - 15

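```python
# Chart 15 - Visualization code

# Top 10 Customer Cities by interaction count
# Exclude the 'Unknown' placeholder filled in during wrangling so it does not dominate the chart
plt.figure(figsize=(12,6))
colors = sns.color_palette('pastel', 10)  # Using a pastel color palette
known_cities = df.loc[df['Customer_City'] != 'Unknown', 'Customer_City']
known_cities.value_counts().nlargest(10).plot(kind='bar', color=colors)
plt.title('Top 10 Customer Cities')
plt.ylabel('Count')
plt.show()
```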

##### 1. Why did you pick the specific chart?

* A bar chart helps visualize which cities generate the highest number of support interactions.
* Useful for regional business expansion and service allocation.

##### 2. What is/are the insight(s) found from the chart?

* The highest volume cities may need more customer service agents.
* Some high-issue cities may indicate product/service problems in those regions.


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Businesses can allocate more resources to high-traffic cities, reducing response times and improving satisfaction.
* Helps in planning location-based promotions or service expansions.

**Any insights that lead to negative growth?**
* If certain cities have high complaints and low CSAT, it indicates poor service in that region.
* Failing to address service issues in key cities can lead to regional brand damage and customer loss.

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

To meet its business objectives, Flipkart has implemented strategic solutions, including:

**1. Supply Chain and Logistics Optimization**

* Development of "Flipkart Logistics" to streamline warehouse management and last-mile delivery.
* Introduction of Flipkart Assured for faster and quality-checked deliveries.
* Investment in AI-powered inventory management to optimize stock levels.

**2. Customer-Centric Innovations**

* AI-driven product recommendations and chatbots for customer support.
* Personalized shopping experiences through data analytics.
* Easy return policies and no-cost EMI options to attract and retain customers.

**3. Competitive Pricing and Affordability Initiatives**

* "Big Billion Days" and other exclusive sales events to drive traffic and boost revenue.
* Partnerships with banks for cashback and EMI offers to increase affordability.
* Introduction of Flipkart Pay Later for easy financing options.

**4. Technological Advancements**

* Use of machine learning to detect fraudulent transactions and enhance security.
* Integration of voice search and vernacular language support to cater to diverse demographics.
* Implementation of AR/VR features for a better online shopping experience.

**5. Strategic Partnerships and Market Expansion**

* Walmart’s acquisition of Flipkart strengthened financial backing and expertise.
* Expansion into new segments such as B2B wholesale supply of grocery and fashion to retailers (Flipkart Wholesale) and online pharmacy (Flipkart Health+).
* Focus on rural e-commerce by introducing hyperlocal delivery and seller support programs.

# **Conclusion**

Flipkart’s journey highlights the evolution of e-commerce in India and the strategic initiatives required to remain competitive in a dynamic market. By continuously investing in technology, logistics, and customer experience, Flipkart has positioned itself as a market leader. However, with rising competition and changing regulatory landscapes, the company must continue to innovate and adapt.

The solutions implemented by Flipkart showcase how data-driven strategies, customer-centric policies, and efficient supply chain management can drive success in the digital economy. As it expands further into new segments, sustainability, profitability, and compliance will be key to its long-term growth.
