# **Project Name** - Exploratory Data Analysis on Flipkart Customer Support Data

##### **Project Type** - EDA
##### **Contribution** - Individual

# **Project Summary -**

This project presents a comprehensive Exploratory Data Analysis (EDA) of the Flipkart Customer Support dataset. The primary goal is to derive actionable insights from customer interaction data to enhance service quality and operational efficiency. The analysis begins with data loading and an extensive cleaning process, where missing values are imputed using median and mode strategies, and data types are corrected to ensure data integrity for analysis. The core of the project involves a structured visualization approach following the Univariate, Bivariate, and Multivariate (UBM) rule.

Univariate analysis focuses on understanding the distribution of key individual variables, such as Customer Satisfaction (CSAT) scores, the volume of requests per support channel, and the breakdown of inquiries by category. This reveals that customer satisfaction is generally very high, with most responses being a '5'. Bivariate analysis then explores the relationships between pairs of variables, such as the impact of the support channel, agent shift, and agent tenure on CSAT scores. Key findings indicate that while all channels perform well, some exhibit greater variance in satisfaction scores. Finally, a multivariate analysis using a correlation heatmap is performed on the numerical features to identify any underlying relationships between them, showing a moderate positive correlation between item price and handling time.

The insights gathered are synthesized to provide concrete, data-driven recommendations aimed at addressing business objectives. These include focusing on process improvements for the most frequent inquiry categories (like order-related issues) and monitoring channels with higher satisfaction variance. The project concludes that the support team is performing effectively but highlights specific areas where targeted improvements could further elevate the customer experience.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**

The business wants to understand the key factors that influence customer satisfaction and the primary reasons for customer support interactions. While the support team is believed to be performing well, there is a lack of data-driven insights to confirm this and identify specific areas for improvement. The key challenges are to quantify customer satisfaction levels, identify the most common issues and preferred support channels, and determine if factors like agent experience or shift timings impact service quality. Without this analysis, any efforts to improve customer support are based on assumptions rather than evidence, potentially leading to inefficient allocation of resources.

#### **Define Your Business Objective?**

The primary business objective is to enhance customer satisfaction and improve the operational efficiency of the Flipkart customer support team by leveraging data-driven insights. This involves identifying the key drivers of high and low satisfaction scores, understanding the main categories of customer issues, and optimizing resource allocation across different support channels and agent shifts.

# ***Let's Begin!***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Install the squarify library first
!pip install squarify

# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import squarify # for treemaps

# Set default styles for plots
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 7)

print("All libraries installed and imported successfully!")

### Dataset Loading

In [None]:
# Load Dataset
df = pd.read_csv('/content/Customer_support_data.csv')

### Dataset First View

In [None]:
# Dataset First Look
df.head()
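Beyond `df.head()`, a quick structural overview helps decide how much cleaning is needed. The sketch below runs on a small invented frame (the column names mirror those used later in this notebook, the values are made up), so treat it as an illustration rather than output from the real dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the support data; values are invented for illustration.
toy = pd.DataFrame({
    "channel_name": ["Inbound", "Outcall", None, "Inbound"],
    "Item_price": [499.0, np.nan, np.nan, 1299.0],
    "CSAT Score": [5, 4, 5, 1],
})

# Number of rows and columns
n_rows, n_cols = toy.shape

# Missing values per column, as counts and as a percentage of rows
missing = pd.DataFrame({
    "missing": toy.isna().sum(),
    "pct_missing": (toy.isna().mean() * 100).round(1),
}).sort_values("pct_missing", ascending=False)

print(f"{n_rows} rows x {n_cols} columns")
print(missing)
```

Running the same three steps on `df` would show which columns are mostly empty before any imputation is applied.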

## 3. ***Data Wrangling***

In [None]:
# --- Data wrangling ---
print("Starting data wrangling...")
df_wrangled = df.copy()
initial_rows = len(df_wrangled)
df_wrangled.drop_duplicates(inplace=True)
print(f"Removed {initial_rows - len(df_wrangled)} duplicate rows.")

# Impute missing numerical values with the column median
numerical_cols = df_wrangled.select_dtypes(include=np.number).columns
for col in numerical_cols:
    if df_wrangled[col].isnull().any():
        median_val = df_wrangled[col].median()
        # Assign back: fillna(inplace=True) on a selected column is deprecated in recent pandas
        df_wrangled[col] = df_wrangled[col].fillna(median_val)

# Impute missing categorical values with the column mode
categorical_cols = df_wrangled.select_dtypes(include=['object']).columns
for col in categorical_cols:
    if df_wrangled[col].isnull().any():
        mode_val = df_wrangled[col].mode()[0]
        df_wrangled[col] = df_wrangled[col].fillna(mode_val)

date_cols_to_convert = ['order_date_time', 'Issue_reported at', 'issue_responded', 'Survey_response_Date']
for col in date_cols_to_convert:
    if col in df_wrangled.columns:
        df_wrangled[col] = pd.to_datetime(df_wrangled[col], errors='coerce')

cols_to_drop = ['unique_id', 'Agent_name']
existing_cols_to_drop = [col for col in cols_to_drop if col in df_wrangled.columns]
df_cleaned = df_wrangled.drop(columns=existing_cols_to_drop)
print(f"Dropped identifier columns: {existing_cols_to_drop}")
print("\nData wrangling complete.")
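To make the imputation rule concrete, here is a minimal sketch on an invented four-row frame. The helper name `impute_simple` is hypothetical and not part of the notebook; it only mirrors the median/mode strategy described above.

```python
import numpy as np
import pandas as pd

# Invented example data: one numeric and one categorical column with a gap each
toy = pd.DataFrame({
    "Item_price": [100.0, np.nan, 300.0, 1000.0],
    "channel_name": ["Inbound", None, "Inbound", "Outcall"],
})

def impute_simple(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: median for numeric columns, mode for all others."""
    out = frame.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            fill_value = out[col].median()
        else:
            fill_value = out[col].mode()[0]
        # Assign the filled column back rather than filling in place
        out[col] = out[col].fillna(fill_value)
    return out

imputed = impute_simple(toy)
print(imputed)
```

The missing price becomes 300.0 (the median of 100, 300 and 1000) and the missing channel becomes "Inbound" (the most frequent value). When a column is mostly empty, this kind of filling makes the column look far more uniform than it really is, which is worth keeping in mind when reading the later charts.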

## ***4. Data Visualization, Storytelling & Experimenting with charts***

#### Chart - 1: Distribution of CSAT Scores (Univariate)

In [None]:
# Order the bars by score (1 to 5) so the x-axis reads as an ordinal scale
sns.countplot(data=df_cleaned, x='CSAT Score', order=sorted(df_cleaned['CSAT Score'].unique()), palette='viridis')
plt.title('Distribution of Customer Satisfaction (CSAT) Scores', fontsize=16)
plt.xlabel('CSAT Score', fontsize=12)
plt.ylabel('Number of Responses', fontsize=12)
plt.show()

##### 1. Why did you pick the specific chart?
A **count plot** is an effective choice for visualizing the distribution of a discrete, ordinal variable like `CSAT Score`. It shows the frequency of each score directly, making it easy to read the overall sentiment of customers.

##### 2. What is/are the insight(s) found from the chart?
The most prominent insight is that the vast majority of customer interactions result in a **CSAT score of 5**. This indicates a very high level of overall customer satisfaction. Scores of 1, 2, and 3 are significantly less frequent, suggesting that highly negative experiences are relatively rare.

##### 3. Will the gained insights help creating a positive business impact?
**Positive Business Impact:** Yes. This insight provides quantitative evidence that the customer support team is performing very well, which can boost team morale and be used in stakeholder reports. It confirms that current strategies are largely successful.
**Negative Growth Insights:** There are no direct insights that lead to negative growth. However, the overwhelming number of '5' scores could potentially hide nuanced issues or lead to complacency. The business should focus on understanding the smaller number of negative scores to find areas for improvement.
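The bar heights can be turned into percentages to put a number on "vast majority" and on the share of low scores. This is a sketch on ten invented scores, not the real distribution; the threshold of 3 for a "low" score is an assumption.

```python
import pandas as pd

# Ten invented CSAT responses for illustration
scores = pd.Series([5, 5, 5, 4, 5, 1, 3, 5, 2, 5], name="CSAT Score")

# Percentage of responses at each score, ordered from 1 to 5
share = scores.value_counts(normalize=True).sort_index().mul(100).round(1)

# Percentage of responses at or below the assumed "low score" threshold of 3
low_share = (scores <= 3).mean() * 100

print(share)
print(f"Scores of 3 or below: {low_share:.1f}%")
```

Applied to `df_cleaned['CSAT Score']`, the same two lines would give the exact share of 5s and the size of the dissatisfied group that the recommendation above asks the business to study.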

#### Chart - 2: Support Requests by Channel (Univariate)

In [None]:
sns.countplot(data=df_cleaned, y='channel_name', order=df_cleaned['channel_name'].value_counts().index, palette='plasma')
plt.title('Number of Support Requests by Channel', fontsize=16)
plt.xlabel('Count', fontsize=12)
plt.ylabel('Channel', fontsize=12)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?
A **horizontal count plot** was chosen to clearly display the volume of requests for each support channel. Using a horizontal orientation provides ample space for the channel names on the y-axis, preventing text overlap and improving readability.

##### 2. What is/are the insight(s) found from the chart?
The chart clearly shows that **'Inbound'** calls are the most frequently used support channel by a significant margin. **'Outcall'** is the second most common, while digital channels like **'Chat'** and **'Social Media'** have a much lower volume of interactions.

##### 3. Will the gained insights help creating a positive business impact?
**Positive Business Impact:** Yes. This insight is crucial for resource allocation. The business should ensure the 'Inbound' channel is well-staffed and equipped to handle the high volume. It also highlights an opportunity to promote and improve the efficiency of digital channels like 'Chat', which are often more cost-effective.
**Negative Growth Insights:** The heavy reliance on phone calls ('Inbound') could indicate that customers are not finding answers through self-service options or that digital channels are not meeting their needs, which could be an operational inefficiency.

#### Chart - 3: CSAT Score vs. Channel Name (Bivariate)

In [None]:
sns.boxplot(data=df_cleaned, x='CSAT Score', y='channel_name', palette='GnBu')
plt.title('CSAT Score Distribution by Support Channel', fontsize=16)
plt.xlabel('CSAT Score', fontsize=12)
plt.ylabel('Channel', fontsize=12)
plt.show()

##### 1. Why did you pick the specific chart?
A **box plot** is ideal for comparing the distribution of a numerical variable (`CSAT Score`) across different categories (`channel_name`). It effectively displays the median, quartiles, and range of satisfaction scores for each channel, allowing for easy comparison of both central tendency and variance.

##### 2. What is/are the insight(s) found from the chart?
The key insight is that while all channels have a very high median CSAT score of 5, some channels exhibit more variability. The **'Chat'** and **'Outcall'** channels have a wider interquartile range and show more outliers with lower scores compared to 'Inbound' and 'Social Media', which are more consistently rated at 5.

##### 3. Will the gained insights help creating a positive business impact?
**Positive Business Impact:** Yes. This insight allows the business to focus its quality assurance efforts. They can investigate why the 'Chat' and 'Outcall' channels have less consistent satisfaction scores. By analyzing the interactions with lower scores in these channels, they can identify root causes (e.g., training gaps, technical issues) and implement targeted improvements.
**Negative Growth Insights:** The variability in scores for certain channels could lead to a negative customer perception if not addressed. A customer who has a poor experience on the 'Chat' channel might be less likely to use it again or recommend it, which could hinder the adoption of this cost-effective channel.
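Because CSAT takes only five values and the median sits at 5 for every channel, box plots can look nearly identical. A per-channel summary table is a useful companion. The sketch below uses invented data, and the `low_share` column (share of scores at or below 3) is an assumed definition of a poor experience.

```python
import pandas as pd

# Invented scores for two channels
toy = pd.DataFrame({
    "channel_name": ["Inbound"] * 4 + ["Chat"] * 4,
    "CSAT Score": [5, 5, 5, 4, 5, 1, 5, 3],
})

# Count, mean, spread, and share of low scores per channel
summary = (
    toy.groupby("channel_name")["CSAT Score"]
    .agg(
        n="count",
        mean="mean",
        std="std",
        low_share=lambda s: (s <= 3).mean(),
    )
    .sort_values("std", ascending=False)
)

print(summary)
```

Sorting by standard deviation puts the least consistent channel first, which is the comparison the insight above relies on.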

#### Chart - 4: CSAT Score vs. Agent Shift (Bivariate)

In [None]:
sns.boxplot(data=df_cleaned, x='CSAT Score', y='Agent Shift', palette='crest')
plt.title('CSAT Score Distribution by Agent Shift', fontsize=16)
plt.xlabel('CSAT Score', fontsize=12)
plt.ylabel('Agent Shift', fontsize=12)
plt.show()

##### 1. Why did you pick the specific chart?
Similar to the previous chart, a **box plot** is used to effectively compare the distribution of `CSAT Score` across the different `Agent Shift` categories. This allows for a clear visual comparison of performance between the Morning, Afternoon, and Night shifts.

##### 2. What is/are the insight(s) found from the chart?
The insight is one of consistency. The distribution of CSAT scores is remarkably similar across all three shifts: **Morning, Afternoon, and Night**. All have a median score of 5 and show similar ranges and distributions. This indicates that the quality of customer service does not degrade during different times of the day.

##### 3. Will the gained insights help creating a positive business impact?
**Positive Business Impact:** This is a very positive insight. It confirms that operational standards and agent performance are consistent 24/7. The business can be confident in its service quality regardless of the shift, which simplifies management and quality control.
**Negative Growth Insights:** There are no negative insights here. This finding reinforces the strength of the current operational model.
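One way to check the "similar across shifts" reading numerically is a row-normalized cross-tabulation, which gives the share of each score within each shift. The data below is invented for illustration.

```python
import pandas as pd

# Invented interactions across three shifts
toy = pd.DataFrame({
    "Agent Shift": ["Morning", "Morning", "Night", "Night", "Night", "Afternoon"],
    "CSAT Score": [5, 4, 5, 5, 1, 5],
})

# Each row sums to 1: the score mix within a single shift
shift_profile = pd.crosstab(toy["Agent Shift"], toy["CSAT Score"], normalize="index")

print(shift_profile.round(2))
```

If the rows of this table are close to each other on the real data, the consistency claim holds; a shift with a visibly larger share of 1s would stand out even when its median is still 5.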

#### Chart - 5: Correlation Heatmap (Multivariate)

In [None]:
# Keep only numeric columns; a separate name avoids reusing the earlier numerical_cols index
numerical_df = df_cleaned.select_dtypes(include=np.number)
correlation_matrix = numerical_df.corr()

plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Heatmap of Numerical Features', fontsize=16)
plt.show()

##### 1. Why did you pick the specific chart?
A **correlation heatmap** is the standard and most effective way to visualize the strength and direction of linear relationships between multiple numerical variables at once. It provides a quick, color-coded overview of which variables are correlated, which is essential for multivariate analysis and for feature selection in machine learning.

##### 2. What is/are the insight(s) found from the chart?
The heatmap reveals that most numerical variables have very weak correlations with each other. The most notable relationship is a **moderate positive correlation of 0.44 between `Item_price` and `connected_handling_time`**. This suggests that inquiries related to more expensive items tend to take longer for agents to handle. The `CSAT Score` shows very weak negative correlations with all other numerical variables, implying that factors like price or handling time have little to no linear relationship with customer satisfaction.

##### 3. Will the gained insights help creating a positive business impact?
**Positive Business Impact:** The insight about `Item_price` and `connected_handling_time` can be very useful. It suggests that agents may need more specialized training or quicker access to information for high-value items to reduce handling time. Optimizing these interactions could improve efficiency. The weak correlation between handling time and CSAT is also positive, as it implies that customers don't necessarily get less satisfied just because a call takes longer, as long as their issue is resolved.
**Negative Growth Insights:** No direct negative insights. The correlation between price and handling time is an area for efficiency improvement, not a direct cause of negative growth.
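`DataFrame.corr()` defaults to Pearson correlation, which is sensitive to outliers and treats the 1-5 CSAT scale as if it were continuous. A rank-based (Spearman) correlation is a common cross-check. The sketch below computes it as Pearson correlation on ranks, using invented numbers that include one very expensive item.

```python
import pandas as pd

# Invented data: handling time rises steadily with price, with one extreme price
toy = pd.DataFrame({
    "Item_price": [100, 200, 400, 800, 10_000],
    "connected_handling_time": [60, 90, 120, 150, 180],
    "CSAT Score": [5, 5, 4, 5, 3],
})

# Pearson: measures linear association, pulled around by the extreme price
pearson = toy.corr()

# Spearman: Pearson correlation of the ranked data (ties get average ranks)
spearman = toy.rank().corr()

print(pearson.round(2))
print(spearman.round(2))
```

In this toy example price and handling time increase together perfectly in rank terms, so Spearman reports 1.0 while Pearson is noticeably lower. Comparing the two on the real data would show whether the reported 0.44 is driven by a few extreme prices.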

#### Chart - 6: Support Requests by Category (Univariate)

In [None]:
plt.figure(figsize=(12, 8))
sns.countplot(data=df_cleaned, y='category', order=df_cleaned['category'].value_counts().index, palette='magma')
plt.title('Number of Support Requests by Category', fontsize=16)
plt.xlabel('Count', fontsize=12)
plt.ylabel('Category', fontsize=12)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?
A **horizontal count plot** is used to clearly show the frequency of requests for each category. The horizontal orientation is ideal here because there are several categories with long names, and this format prevents the labels from overlapping.

##### 2. What is/are the insight(s) found from the chart?
The chart reveals that **"Order Related"** issues are, by a large margin, the most common reason for customers to contact support. This is followed by "Product Queries" and "Refund Related" issues. Categories like "Cancellation" and "Offers & Vouchers" are less frequent.

##### 3. Will the gained insights help creating a positive business impact?
**Positive Business Impact:** Absolutely. This insight is highly actionable. The business knows exactly where to focus its efforts to reduce ticket volume. By improving the processes for order tracking, delivery, and returns, they can address the root cause of the majority of customer inquiries, leading to higher efficiency and reduced support costs.

#### Chart - 7: Distribution of Agent Tenure (Univariate)

In [None]:
tenure_order = ['On Job Training', '0-30', '31-60', '61-90', '>90']
plt.figure(figsize=(10, 6))
sns.countplot(data=df_cleaned, x='Tenure Bucket', order=tenure_order, palette='rocket')
plt.title('Distribution of Agent Tenure Buckets', fontsize=16)
plt.xlabel('Agent Tenure (Days)', fontsize=12)
plt.ylabel('Number of Interactions Handled', fontsize=12)
plt.show()

##### 1. Why did you pick the specific chart?
A **count plot** is the best choice to show the number of interactions handled by agents in different experience brackets. Ordering the x-axis logically from least to most experienced makes it easy to understand the workforce's experience profile.

##### 2. What is/are the insight(s) found from the chart?
Most interactions are handled by experienced agents, with the **">90"** days bucket accounting for the highest volume. Note that the chart counts interactions, not agents, so it describes who handles the workload rather than the headcount in each bucket. The remaining volume is spread across the other tenure buckets, which is consistent with a steady intake of new agents.

##### 3. Will the gained insights help creating a positive business impact?
**Positive Business Impact:** Yes. Because most of the work is handled by experienced staff, this suggests reasonable agent retention, although the chart alone cannot confirm it since it counts interactions rather than agents. A stable, experienced workforce is a positive indicator of a well-managed support team and generally leads to higher quality service.
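The tenure buckets have a natural order that plain strings do not carry. A sketch of one way to encode it, using an ordered categorical on invented rows, so that tables (and plots) follow the least-to-most-experienced order without passing `order=` each time:

```python
import pandas as pd

tenure_order = ["On Job Training", "0-30", "31-60", "61-90", ">90"]

# Invented interactions
toy = pd.DataFrame({
    "Tenure Bucket": [">90", "0-30", "On Job Training", ">90", "31-60", ">90"],
})

# Store the bucket as an ordered categorical
toy["Tenure Bucket"] = pd.Categorical(
    toy["Tenure Bucket"], categories=tenure_order, ordered=True
)

# sort=False keeps the category order; empty buckets appear with a count of 0
counts = toy["Tenure Bucket"].value_counts(sort=False)
share = (counts / counts.sum() * 100).round(1)

print(pd.DataFrame({"interactions": counts, "share_pct": share}))
```

A side benefit is that buckets with no interactions still show up in the table, which makes gaps in the data visible.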

#### Chart - 8: Distribution of Item Price (Univariate)

In [None]:
plt.figure(figsize=(12, 7))
sns.histplot(df_cleaned['Item_price'], bins=50, kde=True, color='purple')
plt.title('Distribution of Item Prices in Support Inquiries', fontsize=16)
plt.xlabel('Item Price', fontsize=12)
plt.ylabel('Frequency', fontsize=12)
plt.xlim(0, df_cleaned['Item_price'].quantile(0.95)) # Limiting x-axis to 95th percentile for better readability
plt.show()

##### 1. Why did you pick the specific chart?
A **histogram with a Kernel Density Estimate (KDE)** is perfect for understanding the distribution of a continuous numerical variable like `Item_price`. It shows both the frequency of different price ranges and the underlying shape of the distribution. The x-axis is trimmed to the 95th percentile to avoid extreme outliers skewing the view.

##### 2. What is/are the insight(s) found from the chart?
The vast majority of support inquiries are related to **lower-priced items**, with a large concentration of products under ₹5,000. Inquiries for high-value items are far less frequent.

##### 3. Will the gained insights help creating a positive business impact?
**Positive Business Impact:** This insight helps in resource and process design. Self-service solutions and automated responses can be effectively targeted at issues related to these high-volume, low-price items. More experienced agents or specialized teams can then be reserved for the less frequent but potentially more complex issues related to high-value products.

#### Chart - 9: CSAT Score vs. Issue Category (Bivariate)

In [None]:
plt.figure(figsize=(12, 8))
sns.boxplot(data=df_cleaned, x='CSAT Score', y='category', palette='viridis')
plt.title('CSAT Score Distribution by Issue Category', fontsize=16)
plt.xlabel('CSAT Score', fontsize=12)
plt.ylabel('Issue Category', fontsize=12)
plt.show()

##### 1. Why did you pick the specific chart?
A **box plot** is the ideal choice to compare the distribution of a numerical variable (`CSAT Score`) across multiple categories (`category`). It clearly shows the median, spread, and potential outliers in satisfaction for each type of issue.

##### 2. What is/are the insight(s) found from the chart?
While most categories have a high median CSAT of 5, **"Refund Related"** and **"Cancellation"** issues show slightly more variability and a larger proportion of lower scores (4s and 3s) compared to others like "Product Queries". This suggests customers are less satisfied when dealing with refunds and cancellations.

##### 3. Will the gained insights help creating a positive business impact?
**Positive Business Impact:** This is highly actionable. The business should investigate the refund and cancellation processes. Simplifying these procedures, making them faster, or improving communication during these events could directly improve customer satisfaction in these specific, sensitive areas.

#### Chart - 10: CSAT Score vs. Agent Tenure (Bivariate)

In [None]:
tenure_order = ['On Job Training', '0-30', '31-60', '61-90', '>90']
plt.figure(figsize=(12, 7))
sns.violinplot(data=df_cleaned, x='CSAT Score', y='Tenure Bucket', order=tenure_order, palette='rocket')
plt.title('CSAT Score Distribution by Agent Tenure Bucket', fontsize=16)
plt.xlabel('CSAT Score', fontsize=12)
plt.ylabel('Tenure Bucket', fontsize=12)
plt.show()

##### 1. Why did you pick the specific chart?
A **violin plot** was chosen because it combines the features of a box plot with a kernel density plot. This provides a richer understanding of the distribution's shape, showing where the majority of data points lie for each tenure bucket, which is more detailed than a standard box plot.

##### 2. What is/are the insight(s) found from the chart?
The insight is one of remarkable consistency. Agents across **all tenure buckets**, including those in "On Job Training," are achieving overwhelmingly high CSAT scores. The distributions are heavily skewed towards a score of 5, regardless of agent experience.

##### 3. Will the gained insights help creating a positive business impact?
**Positive Business Impact:** This is an excellent insight for the business. It indicates that the agent onboarding and training programs are highly effective. The company can be confident that new agents are performing at a high level very quickly, which is a sign of a scalable and robust support operation.

#### Chart - 11: Handling Time by Issue Category (Bivariate)

In [None]:
handling_time_by_category = df_cleaned.groupby('category')['connected_handling_time'].mean().sort_values(ascending=False)
plt.figure(figsize=(12, 8))
sns.barplot(y=handling_time_by_category.index, x=handling_time_by_category.values, palette='crest')
plt.title('Average Handling Time by Issue Category', fontsize=16)
plt.xlabel('Average Handling Time (Seconds)', fontsize=12)
plt.ylabel('Issue Category', fontsize=12)
plt.show()

##### 1. Why did you pick the specific chart?
A **bar plot** is perfect for comparing a single numerical metric (average handling time) across different categories. Sorting the bars from highest to lowest makes it easy to quickly identify which issues take the most and least time to resolve.

##### 2. What is/are the insight(s) found from the chart?
The chart shows that **"Product Queries"** and **"Cancellation"** issues tend to have the longest average handling times. In contrast, "Offers & Vouchers" and "Order Related" issues are resolved much more quickly.

##### 3. Will the gained insights help creating a positive business impact?
**Positive Business Impact:** Yes. This helps diagnose operational inefficiencies. The business can investigate why product queries take so long. Perhaps agents need better knowledge base tools or quicker access to product specialists. By creating streamlined processes or better documentation for these specific categories, they can reduce handling times, improve agent efficiency, and serve more customers.
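Average handling time can be dominated by a handful of very long calls, so it helps to look at the median and the number of interactions alongside the mean. A minimal sketch on invented values:

```python
import pandas as pd

# Invented handling times in seconds; one Product Queries call is extremely long
toy = pd.DataFrame({
    "category": ["Product Queries"] * 3 + ["Order Related"] * 2,
    "connected_handling_time": [100.0, 120.0, 2000.0, 200.0, 220.0],
})

# Count, mean, and median per category, sorted by the more robust median
handling = (
    toy.groupby("category")["connected_handling_time"]
    .agg(n="count", mean="mean", median="median")
    .sort_values("median", ascending=False)
)

print(handling)
```

Here "Product Queries" has the highest mean (740 seconds) but the lowest median (120 seconds), so the ranking flips depending on the statistic. Checking both on the real data would show whether the categories named above are slow in general or only occasionally.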

#### Chart - 12: Item Price vs. Issue Category (Bivariate)

In [None]:
plt.figure(figsize=(12, 8))
sns.boxplot(data=df_cleaned, x='Item_price', y='category', palette='magma')
plt.title('Item Price Distribution by Issue Category', fontsize=16)
plt.xlabel('Item Price', fontsize=12)
plt.ylabel('Issue Category', fontsize=12)
plt.xscale('log') # Use log scale for price due to wide range
plt.show()

##### 1. Why did you pick the specific chart?
A **box plot** is used to compare the distribution of item prices across different issue categories. A logarithmic scale is applied to the x-axis to better handle the wide range and skewness of the price data, making the distributions easier to compare.

##### 2. What is/are the insight(s) found from the chart?
The insight is that inquiries related to **"Refund Related"** and **"Cancellation"** issues tend to involve higher-priced items compared to other categories like "Product Queries" or "Order Related" issues.

##### 3. Will the gained insights help creating a positive business impact?
**Positive Business Impact:** This adds context to previous findings. We know refunds and cancellations have slightly lower CSAT. Now we know they also involve more expensive items. This increases the business risk. The company should prioritize making these processes as smooth as possible, as a poor experience on a high-value item is more likely to lose a valuable customer.

#### Chart - 13: CSAT Score vs. Handling Time (Bivariate)

In [None]:
# Bin handling time to make the relationship with CSAT clearer
df_cleaned['handling_time_bins'] = pd.cut(df_cleaned['connected_handling_time'], bins=5, labels=['Quick', 'Medium', 'Slow', 'Very Slow', 'Longest'])
plt.figure(figsize=(12, 7))
sns.boxplot(data=df_cleaned, x='handling_time_bins', y='CSAT Score', palette='plasma')
plt.title('CSAT Score vs. Binned Handling Time', fontsize=16)
plt.xlabel('Handling Time Category', fontsize=12)
plt.ylabel('CSAT Score', fontsize=12)
plt.show()

##### 1. Why did you pick the specific chart?
A simple scatter plot would be too noisy. Instead, `connected_handling_time` is split into five equal-width bins with `pd.cut`. A **box plot** is then used to show the distribution of `CSAT Score` for each bin. This makes any trend between how long a call takes and how satisfied the customer is easier to see.

##### 2. What is/are the insight(s) found from the chart?
There is a very slight trend suggesting that customer satisfaction **dips slightly** for the longest handling times ("Very Slow" and "Longest" bins). However, the median CSAT score remains high at 5 for almost all bins, indicating that customers are generally patient as long as their issue is resolved.

##### 3. Will the gained insights help creating a positive business impact?
**Positive Business Impact:** This is a reassuring insight. It means the business doesn't need to obsessively focus on reducing every second of handling time at the expense of quality. The priority should be on **First Call Resolution**, even if it takes a bit longer. However, it does reinforce the need to make processes for complex issues (which take longer) as efficient as possible to avoid the slight dip in satisfaction.
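`pd.cut` with `bins=5` creates equal-width intervals, so with skewed handling times most rows can land in the first bin and the last bins may hold only a few calls. Quantile bins from `pd.qcut` are one alternative. The sketch below compares the two on ten invented durations.

```python
import pandas as pd

# Invented handling times in seconds, with one very long call
times = pd.Series(
    [30, 45, 60, 75, 90, 120, 150, 180, 240, 3600],
    name="connected_handling_time",
)
labels = ["Quick", "Medium", "Slow", "Very Slow", "Longest"]

# Equal-width bins: each bin spans the same number of seconds
equal_width = pd.cut(times, bins=5, labels=labels)

# Quantile bins: each bin holds roughly the same number of calls
equal_count = pd.qcut(times, q=5, labels=labels)

print(equal_width.value_counts(sort=False))
print(equal_count.value_counts(sort=False))
```

With equal-width bins, nine of the ten calls fall into "Quick" and one into "Longest"; with quantile bins every label gets two calls. If many values tie, `pd.qcut` can raise an error about duplicate bin edges; its `duplicates='drop'` option handles that, in which case the number of labels has to match the bins that remain.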

#### Chart - 14: Category and Sub-Category Breakdown (Bivariate)

In [None]:
# Prepare data for the treemap
category_sub_counts = df_cleaned.groupby(['category', 'Sub-category']).size().reset_index(name='counts')
top_categories = category_sub_counts.nlargest(20, 'counts') # Select top 20 for readability

plt.figure(figsize=(16, 10))
squarify.plot(sizes=top_categories['counts'],
              label=[f'{c}\n({s})\n{n}' for c, s, n in zip(top_categories['category'], top_categories['Sub-category'], top_categories['counts'])],
              alpha=0.8,
              color=sns.color_palette("viridis", len(top_categories)))
plt.title('Treemap of Top 20 Sub-Categories within Categories', fontsize=18)
plt.axis('off')
plt.show()

##### 1. Why did you pick the specific chart?
A **treemap** is a good choice for showing how request volume is split across many `category`/`Sub-category` pairs. The area of each rectangle represents the volume of requests for one pair, and each label names the parent category, so the largest contributors stand out at a glance. Note that `squarify` draws a flat treemap: rectangles are sized by count but are not visually nested inside their parent category.

##### 2. What is/are the insight(s) found from the chart?
The treemap provides a granular view of the most common issues. For example, within the dominant "Order Related" category, the sub-categories **"Order status"** and **"Delivery related"** are the largest contributors. For "Product Queries," the sub-category **"Product quality"** is the most significant.

##### 3. Will the gained insights help creating a positive business impact?
**Positive Business Impact:** This is extremely actionable. Instead of just targeting the broad "Order Related" category, the business can now focus on specific sub-problems. For example, they can improve the automated order tracking system to reduce "Order status" inquiries or work with logistics partners to address "Delivery related" issues. This allows for precise, targeted interventions.
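The treemap areas can be backed by explicit shares, both within each category and across all requests. A sketch using invented rows (the sub-category names are the ones mentioned above):

```python
import pandas as pd

# Invented interactions
toy = pd.DataFrame({
    "category": ["Order Related"] * 4 + ["Product Queries"] * 2,
    "Sub-category": [
        "Order status", "Order status", "Order status", "Delivery related",
        "Product quality", "Product quality",
    ],
})

# Count each category / sub-category pair
counts = toy.groupby(["category", "Sub-category"]).size().reset_index(name="counts")

# Share of the parent category and share of all requests
category_totals = counts.groupby("category")["counts"].transform("sum")
counts["share_in_category"] = counts["counts"] / category_totals
counts["share_overall"] = counts["counts"] / counts["counts"].sum()

print(counts.sort_values("counts", ascending=False))
```

`transform("sum")` returns one total per row, aligned with the original rows, which is what allows the division without a merge.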

#### Chart - 15: CSAT vs. Price, by Channel (Multivariate)

In [None]:
sns.lmplot(data=df_cleaned, x='Item_price', y='CSAT Score',
           hue='channel_name',
           height=7, aspect=1.5,
           scatter_kws={'alpha':0.1}, # Make points transparent to see density
           y_jitter=0.2) # Jitter amounts are floats; a small value separates the discrete 1-5 scores
plt.title('CSAT Score vs. Item Price, by Channel', fontsize=16)
plt.xlabel('Item Price', fontsize=12)
plt.ylabel('CSAT Score', fontsize=12)
plt.xlim(0, df_cleaned['Item_price'].quantile(0.95)) # Limit for readability
plt.show()

##### 1. Why did you pick the specific chart?
An **`lmplot`** (a scatter plot with fitted regression lines) is a useful multivariate visualization. It shows the relationship between two numerical variables (`Item_price` and `CSAT Score`) while color-coding the points and regression lines by a categorical variable (`channel_name`). The regression line helps to visualize the trend for each channel. Jitter and alpha are used to handle overplotting.

##### 2. What is/are the insight(s) found from the chart?
The key insight is that for all channels, the relationship between `Item_price` and `CSAT Score` is essentially flat. The regression lines are all nearly horizontal, indicating that the **price of an item has no discernible impact on the customer's satisfaction** with the support interaction. Satisfaction is high across the board, whether the item is cheap or expensive.

##### 3. Will the gained insights help creating a positive business impact?
**Positive Business Impact:** This is a valuable insight. It means the support team treats all customers with a consistently high level of service, regardless of the value of their purchase. This builds trust and brand loyalty. The business does not need to worry about creating different service tiers based on item price, simplifying their operational strategy.
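"Nearly horizontal" can be quantified by fitting a straight line per channel and reading off the slope. The sketch below uses `np.polyfit` on invented data; the helper `slope_per_1000` is hypothetical, and scaling the slope to "CSAT points per ₹1,000" is an assumption made for readability.

```python
import numpy as np
import pandas as pd

# Invented prices and scores for two channels
toy = pd.DataFrame({
    "channel_name": ["Inbound"] * 4 + ["Outcall"] * 4,
    "Item_price": [100.0, 500.0, 1000.0, 2000.0] * 2,
    "CSAT Score": [5, 5, 5, 5, 5, 4, 5, 4],
})

def slope_per_1000(group: pd.DataFrame) -> float:
    """Hypothetical helper: least-squares slope in CSAT points per 1,000 of price."""
    slope, _intercept = np.polyfit(group["Item_price"], group["CSAT Score"], deg=1)
    return float(slope * 1000)

# One fitted slope per channel
slopes = {name: slope_per_1000(group) for name, group in toy.groupby("channel_name")}

for name, value in slopes.items():
    print(f"{name}: {value:+.3f} CSAT points per 1,000")
```

A slope close to zero for every channel on the real data would support the "no discernible impact" reading; a clearly negative slope for one channel would be worth a closer look.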

## **5. Solution to Business Objective**

To achieve the business objective of enhancing customer satisfaction and operational efficiency, the client should implement the following data-driven strategies:

1.  **Focus on High-Volume Categories:** The analysis shows that 'Order Related' issues are the most frequent cause for customer contact. The business should prioritize process improvements in this area. Streamlining order tracking, cancellation, and return procedures could significantly reduce the volume of support tickets, freeing up agent time for more complex issues.

2.  **Optimize and Promote Digital Channels:** While 'Inbound' calls are the most popular channel, digital channels like 'Chat' are more scalable and cost-effective. The business should investigate why 'Chat' has a higher variance in CSAT scores. By implementing targeted quality control and training for the chat team, they can improve consistency and then actively promote the chat channel to customers as a reliable and efficient option.

3.  **Specialize Training for High-Value Items:** The correlation between `Item_price` and `connected_handling_time` suggests that complex inquiries about expensive products take longer. The client should consider creating a specialized support queue or providing advanced training to a dedicated group of agents for handling high-value item inquiries. This would reduce handling time and improve the experience for high-value customers.

4.  **Leverage Consistent Performance:** The fact that service quality is consistent across all shifts is a major strength. This reliability can be used as a key performance indicator (KPI) and a point of pride for the support team, reinforcing the high standards of service.

# **Conclusion**

In conclusion, this Exploratory Data Analysis of the Flipkart Customer Support dataset confirms that the support team is performing at an exceptionally high level, evidenced by the overwhelmingly positive Customer Satisfaction (CSAT) scores. The analysis successfully identified the primary drivers of customer interactions, with 'Order Related' issues being the most significant.

The data indicates that while traditional 'Inbound' calls remain the most-used support channel, there is an opportunity to enhance and promote more efficient digital channels like 'Chat'. Key operational factors such as agent shift and tenure do not negatively impact the consistently high quality of service, pointing to a robust training and management system. The moderate relationship between item price and call duration offers a specific area for targeted efficiency improvements through specialized agent training.

Ultimately, the insights from this EDA provide a clear, data-driven roadmap for the business to not only maintain its high standards of customer satisfaction but also to make strategic improvements that will enhance operational efficiency and further solidify customer loyalty.