# **Project Name**    - FLIPKART EDA PROJECT



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -** ABINETHRI T

# **Project Summary -**


This project focuses on performing Exploratory Data Analysis (EDA) on a Flipkart customer support dataset. The primary goal is to gain insights into customer behavior, support channels, issue resolution times, and customer satisfaction. These insights will be used to identify key areas for improvement in Flipkart's customer support operations.

The dataset provides a comprehensive view of customer interactions with Flipkart's support system. It includes information such as customer demographics, order details, issue categories, resolution times, agent performance, and customer satisfaction scores.

**Methodology**

The EDA process involves the following key steps:

*   Data Loading and Cleaning: Load the dataset and clean it by handling missing values, correcting data types, and removing duplicates.
*   Descriptive Analysis: Calculate descriptive statistics (mean, median, standard deviation, and frequency distributions) to understand the distribution and central tendency of key variables.
*   Univariate Analysis: Explore individual variables to identify patterns and distributions, using histograms, box plots, and bar charts.
*   Bivariate Analysis: Examine relationships between pairs of variables, using scatter plots, correlation matrices, and cross-tabulations to identify potential associations and dependencies.
*   Multivariate Analysis: Explore relationships between multiple variables simultaneously. Techniques such as principal component analysis (PCA) and clustering can be used to identify patterns and group similar data points.
*   Data Visualization: Throughout the EDA process, charts and graphs are used to present the findings clearly and to highlight key trends, patterns, and outliers.

**Insights and Recommendations**

Based on the EDA findings, the project will provide insights and recommendations to improve Flipkart's customer support operations.

This may include:

*   Identifying key drivers of customer satisfaction and dissatisfaction.
*   Optimizing support channel utilization and response times.
*   Improving agent performance and training.
*   Personalizing customer support interactions.
*   Developing proactive strategies to prevent customer issues.

# **GitHub Link -**

https://github.com/ABI-THAKSHANA/FLIPCART-PROJECT

# **Problem Statement**


Flipkart, as a leading e-commerce platform, strives to provide exceptional customer support. However, they face challenges in understanding the key factors that influence customer satisfaction and identifying areas for improvement in their support operations. This project aims to address this problem by analyzing customer support data to uncover insights into:

*   Customer Behavior: Understanding customer interaction patterns with different support channels (e.g., calls, emails, chats), the types of issues they face, and their preferred resolution methods.
*   Support Channel Performance: Assessing the efficiency and effectiveness of various support channels in terms of response times, resolution rates, and customer satisfaction scores.
*   Agent Performance: Identifying top-performing agents, areas where agent training can be improved, and strategies to optimize agent utilization.
*   Customer Satisfaction Drivers: Pinpointing the factors that significantly impact customer satisfaction, including issue resolution time, agent responsiveness, and the overall support experience.
*   Areas for Improvement: Identifying specific areas within Flipkart's customer support operations that require attention and improvement to enhance customer satisfaction and loyalty.

#### **Define Your Business Objective?**

The primary business objective of this project is to enhance customer satisfaction and loyalty by improving the overall customer support experience. By identifying areas for improvement and implementing targeted solutions, Flipkart can:

*   Reduce customer churn and increase retention: By addressing customer pain points and improving the support experience, Flipkart can reduce the likelihood of customers switching to competitors.
*   Enhance brand reputation and customer trust: Providing exceptional customer support builds a positive brand image and fosters trust among customers.
*   Drive revenue growth and customer lifetime value: Satisfied customers are more likely to make repeat purchases and recommend Flipkart to others, leading to increased revenue and customer lifetime value.
*   Optimize support costs: By identifying and eliminating inefficiencies in support operations, Flipkart can potentially reduce costs associated with customer support.
*   Improve agent performance and morale: By providing insights into agent performance and identifying training needs, Flipkart can empower its support team to deliver better service, leading to increased job satisfaction and reduced agent turnover.

In summary, the project aims to leverage data analysis to optimize Flipkart's customer support operations and achieve its business goals of increased customer satisfaction, improved operational efficiency, and ultimately, a positive impact on business growth and profitability.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart follows the format below.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno

### Dataset Loading

In [None]:
# Load Dataset
data = pd.read_csv('/content/Copy of Customer_support_data.csv')
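
Since the guidelines above ask for exception handling, the load step could be wrapped as in the following minimal sketch. The helper name `load_dataset`, the temporary demo file, and the channel labels in it are assumptions made for illustration; they are not part of the original notebook or dataset.

```python
import os
import tempfile

import pandas as pd


def load_dataset(path):
    """Read a CSV into a DataFrame, failing with a clear message if it is missing or has no rows."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Dataset not found at: {path}")
    df = pd.read_csv(path)
    if df.empty:
        raise ValueError(f"Dataset at {path} contains no rows")
    return df


# Demo on a small throwaway CSV (illustrative values only, not the real dataset)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.csv")
    pd.DataFrame(
        {"channel_name": ["Inbound", "Email"], "CSAT Score": [5, 3]}
    ).to_csv(demo_path, index=False)
    demo = load_dataset(demo_path)

print(demo.shape)
```

In the notebook itself, the same helper could be called with the Colab path used above, so that a wrong path fails early with a readable message.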

### Dataset First View

In [None]:
# Dataset First Look
data.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
data.shape


### Dataset Information

In [None]:
# Dataset Info
data.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
data.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
data.isnull().sum()

In [None]:
# Visualizing the missing values
msno.bar(data)
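
The raw null counts are easier to compare as percentages of the total row count. The sketch below shows one way to build such a table; `missing_summary` is a hypothetical helper and the small frame is made-up data, used only so the example runs on its own.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Return count and percentage of missing values per column, worst first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(df) * 100).round(2),
    })
    return summary.sort_values("missing_pct", ascending=False)


# Illustrative frame (made-up values, not the real dataset)
demo = pd.DataFrame({
    "Item_price": [100.0, np.nan, 250.0, np.nan],
    "Customer_City": ["Pune", None, "Delhi", "Agra"],
    "CSAT Score": [5, 4, 1, 5],
})
print(missing_summary(demo))
```

Calling `missing_summary(data)` on the loaded dataset would rank the columns by how much imputation they need, which helps decide whether filling or dropping is more appropriate.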

### What did you know about your dataset?

The dataset appears to be a collection of customer support interactions from the Flipkart e-commerce platform, stored in CSV format. It likely contains a mix of numerical and categorical data, organized into rows representing individual support interactions and columns representing various attributes of those interactions. These attributes probably include details like customer demographics, order information, support channel used, issue category and sub-category, resolution times, agent involvement, and customer satisfaction scores (CSAT). The initial exploration of the data using pandas and missingno has revealed potential missing values in some columns and the possibility of duplicate entries, both of which will need to be addressed during data wrangling. Further analysis and visualization will be performed to uncover deeper insights into customer behavior, support channel effectiveness, and areas for improvement in Flipkart's customer support operations.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
data.columns

In [None]:
# Dataset Describe
data.describe()

### Variables Description

The Flipkart customer support dataset likely captures a wide range of information about customer interactions. It includes unique identifiers for each interaction, details about the support channel used, and the nature of the customer's issue, categorized broadly and with sub-categories. Customer remarks provide textual descriptions of the problem. Order-related information such as ID, date, and product details is also included. Timestamps record when the issue was reported, when it was responded to, and when the customer provided feedback. Customer location, product price, and agent handling time are captured. The dataset also identifies the support agent, their supervisor and manager, their tenure, and their work shift. Finally, it includes a CSAT score, reflecting the customer's satisfaction with the support received. This rich set of variables allows for a comprehensive analysis of various aspects of Flipkart's customer support operations.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
data.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
import pandas as pd

# Work on a copy so the raw `data` DataFrame stays available for the charts below
data_wrangled = data.copy()

# Handling Missing Values
# Results are assigned back to each column (column-level inplace fills are unreliable under pandas Copy-on-Write)

# Categorical and datetime-like columns: fill with the mode (most frequent value)
mode_cols = ['Unique id', 'channel_name', 'category', 'Sub-category', 'Order_id',
             'order_date_time', 'Issue_reported at', 'issue_responded',
             'Survey_response_Date', 'Product_category', 'Agent Shift']
for col in mode_cols:
    modes = data_wrangled[col].mode()
    if not modes.empty:  # mode() is empty when the whole column is missing
        data_wrangled[col] = data_wrangled[col].fillna(modes.iloc[0])

# Numerical columns: fill with the mean
for col in ['Item_price', 'connected_handling_time', 'CSAT Score']:
    data_wrangled[col] = data_wrangled[col].fillna(data_wrangled[col].mean())

# Text columns: fill with a placeholder
placeholders = {'Customer Remarks': 'No remarks', 'Customer_City': 'Unknown',
                'Agent_name': 'Unknown', 'Supervisor': 'Unknown',
                'Manager': 'Unknown', 'Tenure Bucket': 'Not Available'}
data_wrangled = data_wrangled.fillna(value=placeholders)

# Data Type Conversion
data_wrangled['Order_id'] = data_wrangled['Order_id'].astype(str)  # To string (Order_id)
data_wrangled['order_date_time'] = pd.to_datetime(data_wrangled['order_date_time'], format='%d/%m/%Y %H:%M', errors='coerce')  # To datetime, specifying format and handling errors
data_wrangled['Issue_reported at'] = pd.to_datetime(data_wrangled['Issue_reported at'], errors='coerce')  # To datetime, handling errors
data_wrangled['issue_responded'] = pd.to_datetime(data_wrangled['issue_responded'], errors='coerce')  # To datetime, handling errors
data_wrangled['Survey_response_Date'] = pd.to_datetime(data_wrangled['Survey_response_Date'], errors='coerce')  # To datetime, handling errors
data_wrangled['Item_price'] = data_wrangled['Item_price'].astype(float)  # To float
data_wrangled['connected_handling_time'] = data_wrangled['connected_handling_time'].astype(float)  # To float
data_wrangled['CSAT Score'] = data_wrangled['CSAT Score'].astype(float)  # To float

# Feature Engineering
data_wrangled['day_of_week'] = data_wrangled['order_date_time'].dt.dayofweek  # Extract day of week
data_wrangled['order_month'] = data_wrangled['order_date_time'].dt.month  # Extract month from order date
data_wrangled['order_year'] = data_wrangled['order_date_time'].dt.year  # Extract year from order date

# Data Cleaning
data_wrangled['Customer Remarks'] = data_wrangled['Customer Remarks'].str.strip()  # Remove spaces in remarks
data_wrangled['Customer_City'] = data_wrangled['Customer_City'].str.strip()  # Remove spaces in city names
data_wrangled['Agent_name'] = data_wrangled['Agent_name'].str.strip()  # Remove spaces in agent names
data_wrangled['Supervisor'] = data_wrangled['Supervisor'].str.strip()  # Remove spaces in supervisor names
data_wrangled['Manager'] = data_wrangled['Manager'].str.strip()  # Remove spaces in manager names
data_wrangled['Product_category'] = data_wrangled['Product_category'].str.strip()  # Remove spaces in product category names

# Replace values
data_wrangled['Product_category'] = data_wrangled['Product_category'].replace({'OldCategory': 'NewCategory'})

# Remove duplicates
data_wrangled.drop_duplicates(inplace=True)

# Remove any rows where essential columns are completely missing
data_wrangled.dropna(subset=['Order_id', 'order_date_time', 'Issue_reported at'], inplace=True)

# Final DataFrame for analysis
final_data = data_wrangled.copy()

# Save the final DataFrame as a CSV file
final_data.to_csv('final_data.csv', index=False)



### What all manipulations have you done and insights you found?

Several data manipulation tasks are performed on the DataFrame to prepare it for analysis. First, missing values are handled in various ways depending on the column type. Categorical columns like Unique id, channel_name, category, and others are filled with their mode (the most frequent value), while numerical columns like Item_price and connected_handling_time are filled with the mean value. Text fields like Customer Remarks and Customer_City are filled with placeholders such as 'No remarks' or 'Unknown'. Additionally, the Tenure Bucket column is filled with 'Not Available' for missing values. Data type conversion is also applied, where Order_id is converted to a string, and date-related columns are converted to datetime format. Numerical columns such as Item_price, connected_handling_time, and CSAT Score are cast to float type.

Feature engineering is performed by extracting new columns from the order_date_time field, such as the day of the week, month, and year. For data cleaning, whitespace is removed from text columns, and specific values, such as replacing 'OldCategory' with 'NewCategory' in the Product_category column, are updated. Duplicate rows are removed, and rows containing essential missing data (like Order_id, order_date_time, and Issue_reported at) are dropped. Finally, the cleaned and transformed DataFrame is saved as a CSV file for further analysis. These steps ensure the data is structured, consistent, and ready for use in modeling or reporting.
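
The project summary mentions issue resolution times, but the wrangling code above does not derive one. A possible additional feature is the gap between `Issue_reported at` and `issue_responded`, sketched below. The function name `add_response_minutes`, the new column `response_minutes`, and the day-first timestamp format (borrowed from the format string used for `order_date_time`) are assumptions; the rows are made-up examples.

```python
import pandas as pd


def add_response_minutes(df, fmt="%d/%m/%Y %H:%M"):
    """Return a copy of df with the minutes elapsed between issue report and response."""
    out = df.copy()
    reported = pd.to_datetime(out["Issue_reported at"], format=fmt, errors="coerce")
    responded = pd.to_datetime(out["issue_responded"], format=fmt, errors="coerce")
    out["response_minutes"] = (responded - reported).dt.total_seconds() / 60
    # A negative gap would mean a response logged before the report; treat it as missing
    out.loc[out["response_minutes"] < 0, "response_minutes"] = float("nan")
    return out


# Made-up rows: one valid pair, one negative gap, one unparseable timestamp
demo = pd.DataFrame({
    "Issue_reported at": ["01/08/2023 11:13", "01/08/2023 12:52", "bad value"],
    "issue_responded": ["01/08/2023 11:47", "01/08/2023 12:50", "01/08/2023 13:00"],
})
result = add_response_minutes(demo)
print(result["response_minutes"].tolist())
```

This would need to run on the raw timestamp strings, before missing timestamps are filled with the mode, because an imputed timestamp would produce a meaningless gap.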

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
# @title Item_price

from matplotlib import pyplot as plt
data['Item_price'].plot(kind='hist', bins=20, title='Item_price')
plt.gca().spines[['top', 'right',]].set_visible(False)


##### 1. Why did you pick the specific chart?

A histogram is a suitable choice for visualizing the distribution of a single numerical variable, in this case, 'Item_price'. It helps us understand the frequency of different price ranges within the dataset. By dividing the price range into bins and displaying the count of items falling within each bin, the histogram provides a clear picture of the price distribution. This allows for easy identification of common price points, outliers, and the overall shape of the distribution (e.g., skewed, normal).
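
The binning and the shape of the distribution can also be checked numerically. The sketch below uses `numpy.histogram` for the bin counts and `Series.skew()` for the sample skewness (positive values indicate a longer right tail). `describe_shape` is a hypothetical helper and the prices are made-up values.

```python
import numpy as np
import pandas as pd


def describe_shape(values, bins=20):
    """Return bin counts, bin edges, and sample skewness for a numeric series."""
    clean = pd.Series(values).dropna()
    counts, edges = np.histogram(clean, bins=bins)
    return counts, edges, clean.skew()


# Made-up prices: many low values and a few very high ones
prices = pd.Series([300, 450, 500, 700, 900, 1200, 1500, 2000, 25000, 60000])
counts, edges, skew = describe_shape(prices, bins=5)
print(counts, round(skew, 2))
```

Applied to `data['Item_price']`, the same call would give the numbers behind the histogram, so the skew can be quoted rather than read off the plot.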

##### 2. What is/are the insight(s) found from the chart?

*   Most items are priced below 20,000: The histogram shows a right-skewed distribution, with the majority of items falling within a lower price range (likely below 20,000). This suggests that the orders behind these support interactions are mostly for products in a relatively affordable price range.
*   A small number of high-priced items: A smaller number of items are priced much higher, extending the distribution's tail to the right. This indicates the presence of some premium or luxury products on the platform.
*   Potential for exploring price segments: The distribution highlights potential price segments within the products sold on Flipkart. Further analysis could be conducted to understand the characteristics and sales patterns associated with these different price ranges.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

*   Targeted Marketing: By understanding the price distribution, Flipkart can tailor its marketing and promotional activities to specific price segments. This allows for more effective targeting of customer groups and potential sales optimization.
*   Inventory Management: Insights into the most common price ranges can help with inventory management decisions. This could involve stocking up on products within the popular price ranges and managing inventory levels of higher-priced items based on demand.
*   Pricing Strategies: The price distribution can aid in developing competitive pricing strategies and promotions that align with customer expectations and preferences.

Negative Growth:

*   Over-reliance on low-priced items: While popular, a strong focus on lower-priced items could lead to lower profit margins. It's essential to balance the sales volume of low-priced products with higher-priced ones for profitability.
*   Limited appeal to premium segments: Having a smaller proportion of higher-priced items might limit Flipkart's appeal to customers looking for premium products. Depending on the target market, this could hinder growth in specific customer segments.
*   Demand fluctuations: The skewed distribution suggests that demand for products can vary significantly across price points. Flipkart needs to manage this demand variability to avoid overstocking or stockouts, particularly for high-demand lower-priced items.

Overall, understanding the price distribution is crucial for Flipkart to make informed decisions in various areas, from marketing to inventory management and pricing. By addressing potential risks, such as over-reliance on low-priced items, Flipkart can leverage the gained insights to create a positive business impact and sustain growth.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
# @title connected_handling_time

from matplotlib import pyplot as plt
data['connected_handling_time'].plot(kind='hist', bins=20, title='connected_handling_time')
plt.gca().spines[['top', 'right',]].set_visible(False)

##### 1. Why did you pick the specific chart?

Similar to the 'Item_price' analysis, a histogram is chosen to visualize the distribution of 'connected_handling_time' because it's a numerical variable. This chart effectively displays the frequency of different handling time ranges, allowing us to understand the typical duration of customer support interactions and identify any outliers or patterns.

##### 2. What is/are the insight(s) found from the chart?

*   Most calls are handled within a short time: The histogram likely shows a right-skewed distribution, indicating that most customer support interactions have relatively short handling times. This suggests efficiency in resolving common issues.
*   Some calls require longer handling times: There's a tail extending to the right, representing interactions with longer handling times. These could be complex issues requiring more attention or cases with multiple touchpoints.
*   Potential for improvement in handling time: Understanding the distribution of handling times can help identify areas for improvement in agent training, process optimization, or resource allocation to reduce the occurrence of lengthy interactions.
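
To make "longer handling times" concrete, the unusually long interactions can be flagged with the common 1.5 × IQR rule. This is one conventional choice, not necessarily the threshold the analysis should use; `iqr_outliers` is a hypothetical helper and the handling times below are made-up values with assumed units.

```python
import pandas as pd


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    clean = pd.Series(values).dropna()
    q1, q3 = clean.quantile(0.25), clean.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return clean[(clean < lower) | (clean > upper)]


# Made-up handling times (units assumed)
handling = pd.Series([120, 150, 180, 200, 210, 240, 260, 300, 1900])
print(iqr_outliers(handling).tolist())
```

Running this on the non-missing values of `data['connected_handling_time']` would list the interactions worth reviewing individually.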

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

*   Reduced Customer Wait Times: By optimizing processes and agent training to address issues that contribute to longer handling times, Flipkart can reduce customer wait times and improve overall satisfaction.
*   Improved Agent Efficiency: Identifying and resolving bottlenecks in the support process can lead to increased agent efficiency, allowing them to handle more interactions in less time.
*   Cost Optimization: More efficient handling of interactions can translate to lower operational costs associated with customer support.

Negative Growth:

*   Focus on speed over quality: If there's excessive pressure to reduce handling times, it could negatively impact the quality of support provided, leading to customer dissatisfaction.
*   Ignoring complex issues: Oversimplifying support processes to reduce handling times might lead to neglecting complex customer issues, potentially increasing resolution times and customer frustration in the long run.
*   Inadequate staffing: If handling times are consistently long, it could indicate a need for more support staff to meet customer demand. Ignoring this could lead to increased wait times and lower customer satisfaction.

Overall, analyzing 'connected_handling_time' is crucial for optimizing support operations. While reducing handling times is desirable, it's important to balance efficiency with the quality of support provided. By carefully addressing potential negative impacts, Flipkart can leverage these insights to improve customer satisfaction and operational effectiveness, contributing to positive business outcomes.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# @title CSAT Score
from matplotlib import pyplot as plt
data['CSAT Score'].plot(kind='hist', bins=20, title='CSAT Score')
plt.gca().spines[['top', 'right',]].set_visible(False)


##### 1. Why did you pick the specific chart?

A histogram is an appropriate choice for visualizing the distribution of the 'CSAT Score' because it's a numerical variable representing customer satisfaction. The histogram displays the frequency of different CSAT score ranges, allowing us to understand the overall customer satisfaction levels and identify areas for potential improvement.

##### 2. What is/are the insight(s) found from the chart?

*   Most customers are satisfied: The histogram will likely show that a majority of customers provide high CSAT scores (e.g., 4 or 5 on a 5-point scale). This is a positive sign, indicating that Flipkart's customer support is generally meeting customer expectations.
*   Room for improvement: There will likely be some customers who provide lower CSAT scores. This indicates that there are areas where Flipkart can improve its customer support processes to enhance overall satisfaction.
*   Distribution shape: The shape of the distribution (e.g., normal, skewed) can reveal further insights. For example, a left-skewed distribution, with most scores high and a long tail towards lower scores, would indicate a need to focus on addressing the issues faced by the less-satisfied customers.
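
Because a CSAT score takes only a handful of discrete values, a table of shares per score can complement the 20-bin histogram. The sketch below also reports the mean, the share of low scores, and the skewness, so a high average can be read alongside the size of the dissatisfied group. `csat_profile`, the 1-5 scale, and the threshold of 2 for "low" are assumptions; the scores are made up.

```python
import pandas as pd


def csat_profile(scores, low_threshold=2):
    """Summarise CSAT scores: share per score, mean, share of low scores, and skewness."""
    clean = pd.Series(scores).dropna()
    share = clean.value_counts(normalize=True).sort_index()
    low_share = (clean <= low_threshold).mean()
    return share, clean.mean(), low_share, clean.skew()


# Made-up scores on an assumed 1-5 scale
scores = pd.Series([5, 5, 5, 5, 5, 5, 4, 4, 1, 1])
share, mean_score, low_share, skew = csat_profile(scores)
print(share.to_dict(), mean_score, low_share, round(skew, 2))
```

In this toy example the mean is high even though one in five customers gave the lowest score, which is the masking effect a single average can hide.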

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

*   Identify areas for improvement: By analyzing the distribution of CSAT scores, Flipkart can identify specific areas where customer satisfaction is lower and prioritize improvements in those areas.
*   Enhance customer experience: Addressing the issues faced by dissatisfied customers can lead to a better overall customer experience and increased loyalty.
*   Track the impact of changes: Monitoring CSAT scores over time can help Flipkart track the effectiveness of any changes or improvements implemented in the customer support process.

Negative Growth:

*   Ignoring low CSAT scores: If Flipkart ignores the customers who provide low CSAT scores, it could lead to increased churn and a negative impact on the company's reputation.
*   Misinterpreting the distribution: It's crucial to analyze the distribution shape and not just focus on the average CSAT score. A high average score could mask underlying issues if there's a significant number of dissatisfied customers.
*   Overlooking external factors: External factors, such as seasonal trends or economic conditions, can influence customer satisfaction. It's important to consider these factors when interpreting the data.

Overall, analyzing the 'CSAT Score' histogram is essential for understanding customer satisfaction levels and identifying opportunities for improvement. By addressing the concerns of dissatisfied customers and continuously monitoring CSAT scores, Flipkart can leverage these insights to enhance its customer support operations and drive positive business outcomes.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# @title channel_name

from matplotlib import pyplot as plt
import seaborn as sns
data.groupby('channel_name').size().plot(kind='barh', color=sns.palettes.mpl_palette('Dark2'))
plt.gca().spines[['top', 'right',]].set_visible(False)

##### 1. Why did you pick the specific chart?

A horizontal bar chart is a suitable choice for visualizing the distribution of a categorical variable like 'channel_name'. It allows for easy comparison of the frequency or count of each category (in this case, the different support channels used by customers). The horizontal orientation is often preferred when category labels are long, making them easier to read.

##### 2. What is/are the insight(s) found from the chart?

*   Identify popular support channels: The chart will clearly show which support channels are most frequently used by customers. For example, it might reveal that calls or chats are more popular than emails.
*   Compare channel usage: We can easily compare the usage of different support channels and understand their relative popularity among customers. This can help in identifying the channels that require more attention and resources.
*   Potential for channel optimization: The insights gained from the chart can be used to optimize channel utilization. For instance, if a particular channel is underutilized, Flipkart might consider promoting it more or redirecting customers to it for specific types of issues.
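
Channel volume alone does not say how well each channel performs, so the counts can be paired with each channel's share of traffic and its mean CSAT. The following is a minimal sketch under stated assumptions: `channel_summary` is a hypothetical helper, and the channel labels and scores are made up.

```python
import pandas as pd


def channel_summary(df):
    """Per-channel interaction count, share of all interactions, and mean CSAT."""
    counts = df["channel_name"].value_counts()
    mean_csat = df.groupby("channel_name")["CSAT Score"].mean()
    summary = pd.DataFrame({"interactions": counts, "mean_csat": mean_csat})
    summary["share_pct"] = (summary["interactions"] / len(df) * 100).round(1)
    return summary.sort_values("interactions", ascending=False)


# Made-up interactions with hypothetical channel labels
demo = pd.DataFrame({
    "channel_name": ["Inbound", "Inbound", "Inbound", "Email"],
    "CSAT Score": [5, 4, 3, 2],
})
summary = channel_summary(demo)
print(summary)
```

A channel with high volume and a low mean score would be the first candidate for a closer look.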

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

*   Resource Allocation: By understanding channel preferences, Flipkart can allocate resources effectively, ensuring that popular channels are adequately staffed and equipped to handle customer inquiries.
*   Channel Optimization: Insights into channel usage can help in optimizing the support process. Flipkart might consider promoting less-used channels for specific types of issues or streamlining the process for popular channels to improve efficiency.
*   Customer Experience: By focusing on the channels preferred by customers, Flipkart can enhance their support experience and make it more convenient for them to seek assistance.

Negative Growth:

*   Neglecting less popular channels: While focusing on popular channels is important, neglecting less popular ones could lead to dissatisfaction among customers who prefer those channels. It's essential to maintain a balance and ensure that all channels are functional and provide adequate support.
*   Over-reliance on a single channel: If Flipkart becomes overly reliant on a single channel, it could become vulnerable to disruptions or limitations of that channel. Diversifying support channels can mitigate this risk.
*   Misinterpreting channel usage: It's important to consider the context and reasons behind channel usage. For example, a high volume of calls might indicate a need for better self-service options on the website to reduce the need for phone support.

Overall, analyzing the 'channel_name' bar chart is essential for understanding customer preferences and optimizing support channel utilization. By addressing potential risks and focusing on customer needs, Flipkart can leverage these insights to enhance its customer support operations and drive positive business outcomes.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# @title Tenure Bucket

from matplotlib import pyplot as plt
import seaborn as sns
data.groupby('Tenure Bucket').size().plot(kind='barh', color=sns.palettes.mpl_palette('Dark2'))
plt.gca().spines[['top', 'right',]].set_visible(False)

##### 1. Why did you pick the specific chart?

A horizontal bar chart is an effective way to visualize the distribution of a categorical variable like 'Tenure Bucket'. It allows for a clear comparison of the frequencies or counts of each tenure category (e.g., 0-6 months, 6-12 months, etc.). The horizontal orientation makes it easier to read the category labels, especially when they are lengthy or descriptive.

##### 2. What is/are the insight(s) found from the chart?

*   Distribution of agent tenure: The chart shows how support interactions are distributed across agent tenure buckets. Because the code counts rows, each bar is the number of interactions handled by agents in that bucket, not the number of agents. This still helps in understanding the experience levels behind the support workload. For example, it might show that a large portion of interactions is handled by agents within a specific tenure range (e.g., 1-2 years), indicating a relatively experienced workforce.
*   Identifying potential training needs: If a significant share of interactions falls within lower tenure buckets, it might suggest a need for more training and development programs to equip newer agents with the necessary skills and knowledge.
*   Workforce planning: The distribution of tenure can assist in workforce planning and recruitment strategies. For example, if there's a shortage of agents in higher tenure buckets, Flipkart might need to focus on retaining experienced agents or recruiting individuals with relevant prior experience.
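
Tenure buckets have a natural order, but a plain `groupby` sorts their labels alphabetically, which can scramble the bars. One way to keep the logical order is an ordered categorical, as sketched below. `ordered_bucket_counts` is a hypothetical helper, and the bucket labels are illustrative (taken from the examples in the text above), not the labels in the actual dataset.

```python
import pandas as pd


def ordered_bucket_counts(series, order):
    """Count rows per bucket and return the counts in the given logical order."""
    cat = pd.Categorical(series, categories=order, ordered=True)
    return pd.Series(cat).value_counts().sort_index()


# Hypothetical bucket labels; alphabetical sorting would put "12-24 months" before "6-12 months"
order = ["0-6 months", "6-12 months", "12-24 months"]
tenure = pd.Series(["6-12 months", "12-24 months", "0-6 months", "12-24 months"])
bucket_counts = ordered_bucket_counts(tenure, order)
print(bucket_counts)
```

Passing the real labels from `data['Tenure Bucket'].unique()` as `order` and plotting the result with `kind='barh'` would give bars in tenure order. Labels missing from `order` are dropped, so the list should cover every bucket.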

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

Targeted training and development: By understanding the distribution of agent tenure, Flipkart can tailor its training and development programs to address the specific needs of agents at different experience levels. This can lead to improved agent performance and overall support quality.
Workforce optimization: Insights into tenure distribution can help in optimizing workforce allocation and scheduling. For instance, Flipkart might consider assigning more complex cases to experienced agents while newer agents handle simpler inquiries.
Improved employee retention: By understanding the tenure patterns, Flipkart can implement strategies to retain experienced agents and reduce turnover. This could involve offering career development opportunities, competitive compensation packages, and a supportive work environment.

Negative Growth:

Lack of experience: If a large proportion of agents are in lower tenure buckets, it could indicate a lack of experience within the support team, potentially impacting the quality and efficiency of support provided.
High turnover: A significant drop in agent numbers in higher tenure buckets could suggest high turnover among experienced agents, which can lead to loss of valuable knowledge and expertise.
Inadequate training for new hires: If training programs are not tailored to the specific needs of new agents, it could lead to slower onboarding and potentially lower performance, affecting customer satisfaction.
Overall, analyzing the 'Tenure Bucket' bar chart is crucial for understanding the composition and experience levels within the support team. By leveraging these insights, Flipkart can implement targeted strategies for training, workforce planning, and employee retention, ultimately enhancing the customer support experience and driving positive business outcomes.
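
One caveat when reading this chart: `data.groupby('Tenure Bucket').size()` counts rows, and in a support dataset each row is typically one interaction rather than one agent. The sketch below shows one way to count distinct agents per bucket instead. It uses a small made-up frame; the `Agent_name` column and every value in it are assumptions for illustration, not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample: one row per support interaction (all values are made up)
sample = pd.DataFrame({
    "Agent_name": ["A", "A", "A", "B", "C", "C"],
    "Tenure Bucket": ["0-30", "0-30", "0-30", ">90", ">90", ">90"],
})

# Rows per bucket: this is what the bar chart plots (interaction counts)
interactions_per_bucket = sample.groupby("Tenure Bucket").size()

# Distinct agents per bucket: closer to "how many agents have this tenure"
agents_per_bucket = sample.groupby("Tenure Bucket")["Agent_name"].nunique()

print(interactions_per_bucket)
print(agents_per_bucket)
```

In this toy data both buckets have three interactions, but one bucket is a single busy agent and the other is two agents, which is why the two counts can tell different stories.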

#### Chart - 6

In [None]:
# Chart - 6 visualization code
# @title Agent Shift

from matplotlib import pyplot as plt
import seaborn as sns
data.groupby('Agent Shift').size().plot(kind='barh', color=sns.palettes.mpl_palette('Dark2'))
plt.gca().spines[['top', 'right',]].set_visible(False)

##### 1. Why did you pick the specific chart?

A horizontal bar chart is an effective way to visualize the distribution of a categorical variable like 'Agent Shift'. It allows for a clear comparison of the number of agents working in each shift (e.g., morning, afternoon, night). The horizontal orientation makes it easier to read the category labels, which can be especially helpful if the shift names are lengthy. This chart provides a quick overview of the distribution of agents across different shifts.

##### 2. What is/are the insight(s) found from the chart?

Distribution of agents across shifts: The chart reveals the distribution of agents across different work shifts. This helps understand the allocation of support staff throughout the day. For example, it might show that a larger proportion of agents work during the day shift compared to the night shift.
Identifying potential staffing imbalances: If there's a significant difference in the number of agents across shifts, it could indicate potential staffing imbalances. This information can be used to optimize staffing levels to ensure adequate coverage during peak hours and efficient resource allocation.
Understanding customer support availability: By understanding the distribution of agents across shifts, you can gain insights into the availability of customer support during different times of the day. This can help Flipkart ensure that customers can reach support when they need it most.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

Optimized staffing levels: By understanding the distribution of agents across shifts, Flipkart can optimize staffing levels to ensure adequate coverage during peak hours and minimize wait times for customers. This can lead to increased customer satisfaction and improved operational efficiency.
Improved agent scheduling: Insights into agent shift distribution can help in creating more effective agent schedules, taking into account factors such as customer demand and agent preferences. This can lead to better work-life balance for agents and potentially reduce employee turnover.
Enhanced customer support availability: By ensuring sufficient staffing across all shifts, Flipkart can enhance the availability of customer support, making it more convenient for customers to reach support at their preferred times. This can lead to increased customer satisfaction and loyalty.

Negative Growth:

Inadequate staffing during peak hours: If the chart reveals insufficient staffing during peak customer support hours, it could lead to longer wait times and decreased customer satisfaction. This could negatively impact Flipkart's reputation and potentially lead to customer churn.
Uneven workload distribution: If there's a significant imbalance in the number of agents across shifts, it could result in an uneven workload distribution, potentially leading to burnout among agents in busier shifts and underutilization of agents in slower shifts.
Reduced support availability: Insufficient staffing during certain shifts could limit the availability of customer support, making it difficult for customers to reach assistance when they need it. This could negatively impact customer satisfaction and potentially lead to lost sales.
Overall, the 'Agent Shift' distribution chart provides valuable insights for optimizing staffing levels, improving agent scheduling, and enhancing customer support availability. By carefully considering potential negative impacts and proactively addressing staffing imbalances, Flipkart can leverage these insights to create a positive business impact and ensure a seamless customer support experience.
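
To make the idea of a staffing imbalance concrete, the following sketch compares interaction volume with the number of distinct agents in each shift. It is only an illustration on made-up data: the `Agent_name` column, the shift labels, and the derived `interactions_per_agent` ratio are assumptions, not results from the dataset.

```python
import pandas as pd

# Hypothetical sample: one row per interaction (agent names and shifts are made up)
sample = pd.DataFrame({
    "Agent_name": ["A", "A", "B", "C", "C", "C", "C", "D"],
    "Agent Shift": ["Morning", "Morning", "Morning",
                    "Night", "Night", "Night", "Night", "Evening"],
})

# Interactions and distinct agents per shift
workload = sample.groupby("Agent Shift").agg(
    interactions=("Agent_name", "size"),
    agents=("Agent_name", "nunique"),
)

# A rough workload indicator: interactions handled per agent in each shift
workload["interactions_per_agent"] = workload["interactions"] / workload["agents"]

print(workload.sort_values("interactions_per_agent", ascending=False))
```

A shift with a high ratio relative to the others would be a candidate for the "uneven workload distribution" risk described above.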

#### Chart - 7

In [None]:
# Chart - 7 visualization code
# @title Item_price vs connected_handling_time

from matplotlib import pyplot as plt
data.plot(kind='scatter', x='Item_price', y='connected_handling_time', s=32, alpha=.8)
plt.gca().spines[['top', 'right',]].set_visible(False)

##### 1. Why did you pick the specific chart?

A scatter plot is chosen to visualize the relationship between 'Item_price' and 'connected_handling_time' because it's suitable for exploring the relationship between two numerical variables. It allows us to see if there's any correlation or pattern between the price of an item and the time it takes to handle a customer support issue related to that item. By plotting each data point as a dot on the graph, we can visually assess if there's a trend, such as higher-priced items taking longer to handle.

##### 2. What is/are the insight(s) found from the chart?

Correlation between price and handling time: The scatter plot might reveal a positive correlation between item price and connected handling time. This means that as the price of an item increases, the time it takes to handle a support issue related to that item also tends to increase.
Outliers: We might observe some outliers in the scatter plot, representing cases where the handling time is unusually high or low for a given item price. These outliers could indicate specific product categories or issues that require more attention.
No clear relationship: If the scatter plot shows a random distribution of points with no discernible pattern, it suggests that there might not be a strong relationship between item price and connected handling time.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

Resource allocation: If a positive correlation is found, Flipkart can allocate more experienced or specialized agents to handle support issues related to higher-priced items, potentially reducing handling times and improving customer satisfaction.
Process optimization: Identifying outliers and understanding the reasons behind them can help Flipkart optimize its support processes for specific product categories or issue types. This could involve creating dedicated support teams or developing more efficient workflows.
Pricing strategies: If there's a strong correlation between price and handling time, Flipkart might consider adjusting its pricing strategies to account for the increased support costs associated with higher-priced items.

Negative Growth:

Increased support costs: A positive correlation between price and handling time could lead to increased support costs for higher-priced items. This might affect Flipkart's profitability if not managed effectively.
Customer dissatisfaction: If customers experience longer wait times for support issues related to higher-priced items, it could lead to dissatisfaction and potential churn.
Inefficient resource allocation: If resources are not allocated effectively based on the relationship between price and handling time, it could result in longer wait times for certain customers and potential underutilization of support staff in other areas.
Overall, the scatter plot of 'Item_price' vs 'connected_handling_time' provides valuable insights into the relationship between these two variables. By carefully analyzing the correlation and any outliers, Flipkart can leverage these insights to optimize resource allocation, improve support processes, and potentially adjust pricing strategies to enhance customer satisfaction and drive positive business outcomes.
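
A scatter plot only suggests a correlation, so it can help to put a number on it. The sketch below computes a Pearson coefficient and a rank-based (Spearman) coefficient for the two columns. The sample values are made up for illustration; on the real data the same calls would be applied to `data` after dropping rows with missing values.

```python
import pandas as pd

# Hypothetical sample values; None marks a missing price
sample = pd.DataFrame({
    "Item_price": [100.0, 250.0, 400.0, 1200.0, 5000.0, None],
    "connected_handling_time": [3.0, 4.0, 4.0, 6.0, 9.0, 5.0],
})

# Keep only rows where both values are present
pairs = sample[["Item_price", "connected_handling_time"]].dropna()

# Pearson: strength of a linear relationship
pearson_r = pairs["Item_price"].corr(pairs["connected_handling_time"])

# Spearman: Pearson correlation of the ranks, less sensitive to extreme prices
ranks = pairs.rank()
spearman_r = ranks["Item_price"].corr(ranks["connected_handling_time"])

print(f"Pearson r = {pearson_r:.2f}, Spearman rho = {spearman_r:.2f}")
```

Values near 0 would support the "no clear relationship" reading, while values closer to 1 would support the positive-correlation reading.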

#### Chart - 8

In [None]:
# Chart - 8 visualization code
# @title connected_handling_time vs CSAT Score

from matplotlib import pyplot as plt
data.plot(kind='scatter', x='connected_handling_time', y='CSAT Score', s=32, alpha=.8)
plt.gca().spines[['top', 'right',]].set_visible(False)

##### 1. Why did you pick the specific chart?

A scatter plot is an appropriate choice for visualizing the relationship between 'connected_handling_time' and 'CSAT Score' because it allows you to explore the correlation between two numerical variables. By plotting each data point as a dot on the graph, you can visually assess if there's a trend or pattern between the handling time of a customer support interaction and the customer's satisfaction level. This helps in understanding if longer handling times are associated with lower CSAT scores or vice versa.

##### 2. What is/are the insight(s) found from the chart?



Negative Correlation: You might observe a negative correlation between connected handling time and CSAT score. This means that as the handling time increases, the customer satisfaction score tends to decrease. This is intuitive, as customers generally prefer quicker resolutions to their issues.
Outliers: There might be some outliers in the scatter plot, representing cases where customers provided high CSAT scores despite longer handling times or vice versa. These outliers could indicate specific situations where other factors, such as the agent's communication skills or the complexity of the issue, influenced customer satisfaction.
Clusters: You might also observe clusters of data points, suggesting different groups of interactions with distinct handling time and CSAT score patterns. This could indicate different types of issues or customer segments with varying expectations.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

Reduced handling times: If a negative correlation is found, Flipkart can focus on reducing handling times to potentially improve customer satisfaction. This could involve optimizing support processes, providing better agent training, or implementing self-service options for common issues.
Improved agent performance: By identifying outliers where high CSAT scores were achieved despite longer handling times, Flipkart can learn from these interactions and train agents to replicate those positive experiences.
Targeted interventions: Understanding clusters of interactions with distinct patterns can help Flipkart develop targeted interventions to address specific customer needs or issue types. This could involve creating specialized support teams or tailoring communication strategies.

Negative Growth:

Ignoring customer expectations: If Flipkart fails to address the negative correlation between handling time and CSAT score, it could lead to decreased customer satisfaction and potential churn. Customers might switch to competitors who offer faster and more efficient support.
Overemphasis on speed: While reducing handling time is important, overemphasizing speed at the expense of quality could negatively impact customer satisfaction. Customers might perceive rushed interactions as less helpful or caring.
Misinterpreting outliers: It's crucial to carefully analyze outliers and not simply dismiss them. These cases might reveal valuable insights into specific situations where Flipkart can improve its support processes.

Overall, the scatter plot of 'connected_handling_time' vs 'CSAT Score' provides valuable insights into the relationship between these two key metrics. By carefully analyzing the correlation, outliers, and clusters, Flipkart can leverage these insights to optimize support operations, improve agent performance, and enhance customer satisfaction, ultimately driving positive business outcomes.
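
Because 'CSAT Score' takes only a few discrete values, points in the scatter plot stack into horizontal bands and trends can be hard to judge by eye. A per-score summary is one possible complement. The sketch below uses made-up values purely to show the shape of the computation.

```python
import pandas as pd

# Hypothetical sample: handling time and the score given for that interaction
sample = pd.DataFrame({
    "connected_handling_time": [2.0, 3.0, 4.0, 5.0, 8.0, 9.0, 12.0, 15.0],
    "CSAT Score": [5, 5, 4, 5, 3, 2, 1, 1],
})

# Number of interactions and median handling time for each CSAT score
by_score = (
    sample.groupby("CSAT Score")["connected_handling_time"]
    .agg(["count", "median"])
    .sort_index()
)

print(by_score)
```

If the median handling time falls steadily as the score rises, that would be consistent with the negative correlation discussed above; the `count` column shows how much data backs each median.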



#### Chart - 9

In [None]:
# Chart - 9 visualization code
# @title channel_name vs Item_price

from matplotlib import pyplot as plt
import seaborn as sns
figsize = (12, 1.2 * len(data['channel_name'].unique()))
plt.figure(figsize=figsize)
sns.violinplot(data=data, x='Item_price', y='channel_name', inner='box', palette='Dark2')  # pass the frame by keyword so older seaborn versions do not read it as x
sns.despine(top=True, right=True, bottom=True, left=True)

##### 1. Why did you pick the specific chart?

A violin plot is chosen to visualize the relationship between 'channel_name' (categorical) and 'Item_price' (numerical) because it effectively displays the distribution of item prices for each support channel. It combines the features of a box plot and a kernel density plot, showing the range, quartiles, median, and density of the data. This allows for a more detailed understanding of how item prices vary across different support channels compared to a simple box plot.

##### 2. What is/are the insight(s) found from the chart?

Price distribution across channels: The violin plot shows the distribution of item prices for each support channel. You can observe the median, quartiles, and range of prices for each channel, providing insights into the typical price range of items associated with different support channels.
Channel preference for price ranges: You might find that certain support channels are more commonly used for specific price ranges. For example, customers might prefer phone support for higher-priced items, while chat or email might be more common for lower-priced items.
Outliers: The violin plot does not draw individual outlier points, but a long, thin tail on a violin indicates item prices that fall far outside the typical range for that channel. These extreme values might represent unusual cases or potential issues that require further investigation.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

Targeted support strategies: By understanding the price distribution and channel preferences for different price ranges, Flipkart can develop targeted support strategies for each channel. This could involve training agents on specific product categories or providing tailored support materials.
Channel optimization: Insights into channel usage for different price ranges can help Flipkart optimize its support channel allocation. For example, if phone support is preferred for higher-priced items, Flipkart can ensure that this channel is adequately staffed and equipped to handle these inquiries efficiently.
Improved customer experience: By aligning support channels with customer preferences for different price ranges, Flipkart can enhance the overall customer experience and make it easier for customers to seek assistance for their specific needs.

Negative Growth:

Increased support costs: If certain support channels are associated with higher-priced items, it could lead to increased support costs for those channels. Flipkart needs to carefully manage resource allocation to ensure cost-effectiveness.
Customer dissatisfaction: If customers experience difficulty reaching the preferred support channel for their price range, it could lead to dissatisfaction and potential churn. Flipkart should ensure that all channels are easily accessible and provide adequate support.
Misinterpreting data: It's essential to avoid oversimplifying the relationship between channel and price. There might be other factors influencing channel preference, such as the complexity of the issue or customer demographics.

Overall, the violin plot of 'channel_name' vs 'Item_price' offers valuable insights into how item prices are distributed across different support channels. By carefully analyzing the distributions and potential trends, Flipkart can leverage these insights to optimize support strategies, improve resource allocation, and enhance the customer experience, ultimately driving positive business outcomes.
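
The median and quartiles that the inner box of each violin shows can also be read off as a table. The sketch below is one way to do that with pandas; the channel names and prices are made up for illustration, and the `q1`, `median`, `q3`, and `iqr` column names are labels chosen here.

```python
import pandas as pd

# Hypothetical sample: item prices for a few interactions per channel
sample = pd.DataFrame({
    "channel_name": ["Inbound", "Inbound", "Inbound",
                     "Outcall", "Outcall", "Outcall",
                     "Email", "Email"],
    "Item_price": [200.0, 400.0, 900.0, 1000.0, 3000.0, 5000.0, 150.0, 350.0],
})

# Quartiles of item price within each channel, one row per channel
price_summary = (
    sample.groupby("channel_name")["Item_price"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
price_summary.columns = ["q1", "median", "q3"]

# Interquartile range as a simple measure of spread
price_summary["iqr"] = price_summary["q3"] - price_summary["q1"]

print(price_summary)
```

Comparing medians across rows gives a quick check on whether any channel really is associated with higher-priced items.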



#### Chart - 10

In [None]:
# Chart - 10 visualization code
# @title Tenure Bucket vs Agent Shift

from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
plt.subplots(figsize=(8, 8))
df_2dhist = pd.DataFrame({
    x_label: grp['Agent Shift'].value_counts()
    for x_label, grp in data.groupby('Tenure Bucket')
})
sns.heatmap(df_2dhist, cmap='viridis')
plt.xlabel('Tenure Bucket')
_ = plt.ylabel('Agent Shift')

##### 1. Why did you pick the specific chart?

A heatmap is chosen to visualize the relationship between 'Tenure Bucket' and 'Agent Shift' because it effectively displays the distribution of agents across different tenure buckets and shifts using color intensity. This allows for easy identification of patterns and concentrations of agents within specific tenure and shift combinations. It helps to understand the workforce distribution and potential relationships between these two categorical variables.

##### 2. What is/are the insight(s) found from the chart?

Agent distribution: The heatmap shows the distribution of agents across different tenure buckets and shifts. With the 'viridis' colormap used here, brighter (yellow) cells indicate a higher concentration of agents in a particular combination of tenure and shift, while darker (purple) cells indicate a lower one. This provides insights into the workforce composition and how agents are allocated across shifts based on their experience levels.
Shift preferences: You might observe that certain tenure buckets are more prevalent in specific shifts. For example, newer agents (lower tenure buckets) might be more concentrated in day shifts, while experienced agents (higher tenure buckets) might be more common in evening or night shifts.
Staffing patterns: The heatmap can reveal staffing patterns, such as whether there's a balance of experienced and newer agents across different shifts or if there are any shifts with a disproportionate number of agents in a particular tenure bucket.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

Optimized staffing: By understanding the distribution of agents across tenure buckets and shifts, Flipkart can optimize staffing levels to ensure a balance of experience and expertise across different shifts. This can improve the quality and efficiency of support provided to customers.
Targeted training: The heatmap can help identify areas where targeted training might be needed. For example, if a particular shift has a higher concentration of newer agents, Flipkart can provide them with additional training to enhance their skills and knowledge.
Improved shift scheduling: Insights into shift preferences for different tenure buckets can help Flipkart create more effective shift schedules that align with agent experience and preferences. This can lead to better work-life balance for agents and potentially reduce employee turnover.

Negative Growth:

Imbalance in experience levels: If the heatmap shows a significant imbalance in experience levels across shifts, it could negatively impact customer support quality. For example, a shift with predominantly newer agents might struggle to handle complex issues effectively.
Reduced flexibility: If staffing patterns are too rigid, it might limit Flipkart's ability to adapt to changing customer demands or unexpected events. Some flexibility in shift assignments might be necessary to ensure adequate coverage during peak hours or to address specific customer needs.
Misinterpreting data: It's important to consider other factors that might influence agent distribution, such as agent preferences or operational constraints. The heatmap provides a valuable overview but shouldn't be the sole basis for staffing decisions.

Overall, the heatmap of 'Tenure Bucket' vs 'Agent Shift' provides a valuable visualization of agent distribution and staffing patterns. By carefully analyzing the patterns and potential imbalances, Flipkart can leverage these insights to optimize staffing levels, improve training programs, and enhance shift scheduling, ultimately contributing to a more efficient and effective customer support operation.
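
Raw counts in a heatmap are dominated by the largest tenure bucket, so it can be useful to look at shares within each bucket as well. The sketch below uses `pd.crosstab` for both views. The bucket and shift labels are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: one row per interaction
sample = pd.DataFrame({
    "Tenure Bucket": ["0-30", "0-30", "0-30", ">90", ">90", ">90", ">90"],
    "Agent Shift": ["Morning", "Morning", "Night",
                    "Morning", "Evening", "Evening", "Night"],
})

# Raw counts; combinations that never occur become 0 instead of NaN
counts = pd.crosstab(sample["Agent Shift"], sample["Tenure Bucket"])

# Share of each tenure bucket's rows that fall in each shift (each column sums to 1)
shares = pd.crosstab(
    sample["Agent Shift"], sample["Tenure Bucket"], normalize="columns"
)

print(counts)
print(shares.round(2))
```

Either table can be passed to `sns.heatmap`; the `shares` version makes it easier to compare how buckets of different sizes are spread across shifts.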



#### Chart - 11

In [None]:
# Chart - 11 visualization code
import plotly.express as px

# Create the volcano plot
fig = px.scatter(data,
                 x="Sub-category",  # X-axis: Sub-category
                 y="CSAT Score",     # Y-axis: CSAT Score
                 color="category",   # Color points by category
                 hover_data=["category", "Sub-category", "CSAT Score"], # Show this data on hover
                 title="Volcano Plot of CSAT Score by Category and Sub-category")

fig.update_layout(xaxis_title="Sub-category", yaxis_title="CSAT Score")
fig.show()

##### 1. Why did you pick the specific chart?

This chart is a scatter plot with a categorical x-axis. The code titles it a "volcano plot", although that term usually refers to a plot of statistical significance against effect size. It visualizes the relationship between two variables (in this case, 'Sub-category' and 'CSAT Score') while also highlighting a third variable ('category') through color. It's chosen here because it allows you to:

Compare CSAT scores across sub-categories: The x-axis displays the different sub-categories, and the y-axis shows the corresponding CSAT scores. This allows for a direct comparison of customer satisfaction levels across various sub-categories.
Identify trends within categories: The color of the points represents the category, enabling you to observe if certain categories tend to have higher or lower CSAT scores for specific sub-categories. This helps identify trends and patterns within different categories.
Highlight outliers: The scatter plot nature of the volcano plot makes it easy to spot outliers, which are sub-categories with unusually high or low CSAT scores. These outliers can indicate areas that require further investigation or attention.


##### 2. What is/are the insight(s) found from the chart?

Sub-category performance: The volcano plot will show you which sub-categories have the highest and lowest CSAT scores. This can help Flipkart identify areas where customer satisfaction is particularly strong or needs improvement.
Category influence: By observing the color of the points, you can see if certain categories tend to have higher or lower CSAT scores for specific sub-categories. This can reveal insights into how different categories influence customer satisfaction within their respective sub-categories.
Areas for improvement: Sub-categories with low CSAT scores should be investigated further to understand the underlying reasons for customer dissatisfaction. This can help Flipkart prioritize areas for improvement in its customer support processes.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

Targeted improvements: By identifying sub-categories with low CSAT scores, Flipkart can focus its efforts on improving customer support in those areas. This could involve providing additional training to agents, revising support processes, or addressing specific product or service issues.
Category-specific strategies: Understanding how categories influence CSAT scores within their respective sub-categories can help Flipkart develop category-specific support strategies. This could involve tailoring communication approaches, creating specialized support teams, or offering customized solutions.
Enhanced customer experience: By addressing areas of customer dissatisfaction and improving support processes, Flipkart can enhance the overall customer experience, leading to increased customer loyalty and positive word-of-mouth.

Negative Growth:

Ignoring low-performing sub-categories: If Flipkart fails to address the sub-categories with low CSAT scores, it could lead to decreased customer satisfaction and potential churn. Customers might switch to competitors who offer better support in those areas.
Misinterpreting category influence: It's important to avoid oversimplifying the relationship between category and CSAT score. Other factors, such as the complexity of the issue or customer demographics, might also play a role.
Overlooking outliers: Outliers can provide valuable insights into specific situations where Flipkart can improve its support processes. Ignoring them could miss opportunities for enhancing customer satisfaction.
Overall, the volcano plot of 'Sub-category', 'CSAT Score', and 'category' provides a comprehensive view of customer satisfaction across different sub-categories and categories. By carefully analyzing the trends, outliers, and areas for improvement, Flipkart can leverage these insights to optimize support strategies, enhance the customer experience, and drive positive business outcomes.
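
Since every sub-category can receive any score from the scale, the points in this chart overlap heavily, and ranking sub-categories by eye is difficult. One possible complement is a table of the mean score and the number of responses per sub-category, sketched below. The category and sub-category names and scores are made up, and `mean_csat` and `n` are column labels chosen here.

```python
import pandas as pd

# Hypothetical sample: one row per rated interaction
sample = pd.DataFrame({
    "category": ["Returns", "Returns", "Returns",
                 "Order Related", "Order Related", "Order Related"],
    "Sub-category": ["Reverse Pickup", "Reverse Pickup", "Missing",
                     "Delayed", "Delayed", "Delayed"],
    "CSAT Score": [5, 4, 3, 1, 2, 3],
})

# Mean score and response count per sub-category, lowest mean first
subcat_summary = (
    sample.groupby(["category", "Sub-category"])["CSAT Score"]
    .agg(mean_csat="mean", n="count")
    .sort_values("mean_csat")
    .reset_index()
)

print(subcat_summary)
```

The `n` column matters when prioritizing: a low mean based on a handful of responses is weaker evidence than the same mean based on many.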



#### Chart - 12

In [None]:
!pip install vegafusion
!pip install "vl-convert-python>=1.6.0"

import altair as alt

# Enable VegaFusion data transformer
alt.data_transformers.enable("vegafusion")

# Create the histogram
hist = alt.Chart(data).mark_bar().encode(
    alt.X("CSAT Score:Q", bin=alt.Bin(maxbins=5)),  # Use quantitative scale for CSAT Score
    alt.Y('count()', stack=None),
    tooltip=['CSAT Score', 'count()']
).properties(title="Distribution of CSAT Scores")


# Create the line plot of the number of issues per CSAT Score
line = alt.Chart(data).mark_line().encode(
    x='CSAT Score',
    y='count()',  # Number of rows (issues) at each CSAT Score
).properties(title="CSAT Score vs. Number of Issues")

# A notebook cell renders only its last expression, so show both charts side by side
hist | line

##### 1. Why did you pick the specific chart?

This combination of charts is chosen because it provides a comprehensive view of CSAT Scores and their relationship with the number of issues. Here's why each chart is selected:

Histogram for CSAT Score Distribution: A histogram is an effective way to visualize the distribution of a single numerical variable, in this case, 'CSAT Score'. It helps us understand the frequency of different CSAT score ranges within the dataset, allowing for easy identification of common scores, outliers, and the overall shape of the distribution (e.g., skewed, normal).
Line Plot for CSAT Score vs. Issues: A line plot is suitable for visualizing the relationship between two numerical variables, in this case, 'CSAT Score' and the number of issues. It helps us understand if there is a trend or correlation between these variables. For example, it might reveal whether higher CSAT scores are associated with fewer issues or vice versa.


##### 2. What is/are the insight(s) found from the chart?

Histogram:

CSAT Score Distribution: The histogram will show the frequency of different CSAT score ranges. It might reveal a left-skewed (negatively skewed) distribution, in which most customers provide higher CSAT scores (e.g., 4 or 5 on a 5-point scale) and a smaller tail of low scores extends to the left, suggesting overall satisfaction.
Potential for Improvement: The histogram can also highlight areas where customer satisfaction is lower. For example, if there's a significant number of customers providing lower scores (e.g., 1 or 2), it indicates room for improvement.

Line Plot:

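CSAT Score vs. Issues: Because `count()` tallies the rows at each score, the line shows how many recorded issues received each CSAT score, which is the same information as the histogram drawn as a line. A line that falls as the score rises would mean that fewer issues were rated highly; a line that rises would mean that most issues ended with a satisfied customer. The count alone does not show whether satisfied customers encounter fewer problems.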
Trend Identification: The line plot can help identify any trends or patterns in the relationship between CSAT score and issues. For example, it might show that certain CSAT score ranges are associated with a higher or lower number of issues.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

Targeted Improvements: By identifying CSAT score ranges associated with a higher number of issues, Flipkart can focus on improving support in those areas. This could involve addressing specific product or service problems, providing additional training to agents, or streamlining support processes.
Enhanced Customer Experience: By understanding the overall CSAT score distribution and identifying areas for improvement, Flipkart can enhance the overall customer experience, leading to increased customer loyalty and positive word-of-mouth.
Proactive Issue Prevention: The line plot can help Flipkart identify potential issues before they escalate by observing trends in CSAT score changes. For example, if CSAT scores for a particular product start to decline, Flipkart can proactively investigate and address potential problems.

Negative Growth:

Ignoring Lower CSAT Scores: If Flipkart fails to address areas where customer satisfaction is lower, it could lead to decreased customer loyalty and potential churn. Customers might switch to competitors who offer better support experiences.
Misinterpreting Correlation: It's crucial to carefully analyze the relationship between CSAT score and issues. While a negative correlation is generally expected, other factors might influence the number of issues, such as product complexity or seasonal trends.
Overlooking Outliers: Outliers in the histogram or line plot might represent specific situations where Flipkart can improve its support processes. Ignoring them could miss opportunities for enhancing customer satisfaction.
Overall, the combination of a histogram and a line plot provides a valuable view of CSAT score distribution and its relationship with the number of issues. By carefully analyzing these charts, Flipkart can leverage the insights to prioritize improvements, enhance the customer experience, and drive positive business outcomes.
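
The shape of the distribution can also be summarized numerically. The sketch below computes the share of responses at each score and the sample skewness with pandas. The scores are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample of CSAT responses on a 1-5 scale
csat = pd.Series([5, 5, 5, 5, 5, 4, 4, 3, 1, 1], name="CSAT Score")

# Share of responses at each score, ordered from lowest to highest score
share = csat.value_counts(normalize=True).sort_index()

# Negative skewness means the long tail is on the low-score side
skewness = csat.skew()

print(share)
print(f"skewness = {skewness:.2f}")
```

A negative skewness together with a large share at the top scores matches the "most customers are satisfied, with a tail of low scores" pattern.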



#### Chart - 13

In [None]:
# Plot categorical distributions
for col in categorical_cols:
    plt.figure(figsize=(20, 10))
    sns.countplot(x=col, data=data, hue='CSAT Score', palette='pastel')
    plt.title(f'Distribution of {col} by CSAT Score')
    plt.xticks(rotation=45, ha='right')
    plt.show()



##### 1. Why did you pick the specific chart?

Count plots (or bar plots) are chosen to visualize the categorical distributions because they effectively display the frequency of each category within a categorical variable, broken down by CSAT Score. This allows for a clear comparison of how different categories within a variable are associated with customer satisfaction levels. The use of color (hue) further enhances the visualization by differentiating CSAT scores within each category. This helps to quickly identify patterns and trends in customer satisfaction across different categories of various variables.



##### 2. What is/are the insight(s) found from the chart?

By examining the count plots for each categorical column, the following insights can be drawn:
Category-wise CSAT Distribution: The plots show how CSAT scores are distributed across different categories within each variable. For example, you might observe that certain categories have a higher proportion of satisfied customers (higher CSAT scores) compared to others.
Identifying Problematic Areas: Categories with a higher concentration of lower CSAT scores might indicate areas where customer satisfaction is a concern. These could be specific product categories, support channels, or issue types that need further investigation.
Understanding Customer Preferences: The plots can reveal customer preferences or patterns in their interactions. For example, you might find that certain support channels are more commonly used by satisfied customers, while others are associated with lower satisfaction levels.
Uncovering Trends: By comparing the distributions across different categorical columns, you might uncover underlying trends or relationships. For example, you might find that specific product categories tend to have lower CSAT scores regardless of the support channel used, suggesting a potential product-related issue.
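
Because count plots show raw counts, categories with many tickets have taller bars at every CSAT level, which makes proportions hard to compare by eye. The sketch below shows one way to compute the share of each CSAT score within each category using a row-normalized cross-tabulation. The helper name `csat_share_by_category` and the sample rows are illustrative assumptions, not part of the original notebook; in the notebook, `data` would be passed instead of `sample`.

```python
import pandas as pd

def csat_share_by_category(df, col, target="CSAT Score"):
    """Return the share of each CSAT score within each category of `col`."""
    # normalize="index" makes every row (category) sum to 1
    return pd.crosstab(df[col], df[target], normalize="index")

# Tiny made-up sample, only to show the shape of the result
sample = pd.DataFrame({
    "channel_name": ["Inbound", "Inbound", "Inbound", "Inbound", "Outcall", "Outcall"],
    "CSAT Score":   [5, 5, 4, 1, 1, 5],
})

shares = csat_share_by_category(sample, "channel_name")
print(shares.round(2))
```

Each row of the result describes one category, so a category with a large share in the low-score columns stands out even when its ticket volume is small.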


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

Targeted Improvements: By identifying categories with lower CSAT scores, Flipkart can focus on improving support in those specific areas. This could involve addressing product issues, enhancing agent training for specific categories, or optimizing support processes for certain channels.
Personalized Support: Understanding customer preferences and patterns revealed in the plots can help Flipkart personalize its support approach. For example, if certain customer segments tend to prefer a particular support channel, Flipkart can tailor its communication and support offerings accordingly.
Proactive Issue Prevention: The insights gained from the plots can help Flipkart identify potential issues before they escalate. By monitoring CSAT score distributions across categories, Flipkart can proactively address areas of concern and prevent negative customer experiences.

Negative Growth:

Ignoring Low-Performing Categories: Failing to address categories with lower CSAT scores could lead to decreased customer satisfaction and potential churn. Customers might switch to competitors who offer better support in those areas.
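Misinterpreting Data: Count plots show raw counts, so high-volume categories have taller bars at every CSAT level. Drawing conclusions without comparing proportions, or without considering other influences on CSAT scores such as seasonal trends or external factors, could misdirect improvement efforts.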
Overlooking Context: It's important to interpret the plots in the context of the specific categorical variable being analyzed. What might be a concern for one variable might be less significant for another.

Overall, the use of multiple count plots to visualize categorical distributions by CSAT Score provides valuable insights into customer satisfaction across various aspects of Flipkart's support operations. By carefully analyzing the plots, identifying trends, and taking action on areas for improvement, Flipkart can leverage these insights to enhance the customer experience and drive positive business outcomes.
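
To turn "categories with lower CSAT scores" into a concrete shortlist, one option is to rank categories by mean CSAT and by the share of low scores, while skipping categories with too few responses to judge. This is a minimal sketch under stated assumptions: the function name, the `min_count` cutoff, and the definition of a low score (2 or below) are choices made for illustration, and the sample rows are made up.

```python
import pandas as pd

def rank_categories_by_csat(df, col, target="CSAT Score", min_count=2, low_threshold=2):
    """Summarize CSAT per category, dropping categories with fewer than `min_count` rows."""
    # Flag low scores so their share can be averaged per category
    flagged = df.assign(is_low=df[target] <= low_threshold)
    summary = flagged.groupby(col).agg(
        n=(target, "size"),
        mean_csat=(target, "mean"),
        low_share=("is_low", "mean"),
    )
    # Lowest mean CSAT first, so the weakest categories appear at the top
    return summary[summary["n"] >= min_count].sort_values("mean_csat")

# Made-up sample for illustration only
sample = pd.DataFrame({
    "category":   ["Returns", "Returns", "Returns", "Order Related", "Order Related", "Feedback"],
    "CSAT Score": [1, 2, 5, 5, 4, 1],
})

ranked = rank_categories_by_csat(sample, "category")
print(ranked)
```

The minimum-count filter addresses the context concern above: a single unhappy response in a rarely used category should not outrank a consistently weak high-volume category.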



#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
# Select numerical columns for correlation analysis
numerical_cols = data.select_dtypes(include=['number'])

# Calculate the correlation matrix
correlation_matrix = numerical_cols.corr()

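# Create a heatmap
# Fix the color scale to [-1, 1] so that the neutral color corresponds to zero correlation
plt.figure(figsize=(12, 10))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", vmin=-1, vmax=1)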
plt.title('Correlation Heatmap')
plt.show()

##### 1. Why did you pick the specific chart?


A correlation heatmap is chosen to visualize the relationships between numerical columns because it effectively displays the correlation coefficients between all pairs of numerical variables in a dataset using color intensity. This allows for quick identification of:

Strong positive correlations: Indicated by dark red or warm colors, suggesting that as one variable increases, the other tends to increase as well.
Strong negative correlations: Indicated by dark blue or cool colors, suggesting that as one variable increases, the other tends to decrease.
Weak or no correlations: Indicated by lighter colors or white, suggesting little or no relationship between the variables.

The heatmap provides a comprehensive overview of the relationships between multiple numerical variables, making it easier to identify patterns and potential dependencies.



##### 2. What is/are the insight(s) found from the chart?

Relationship strengths: The color intensity and numerical values in the heatmap reveal the strength and direction of correlations between variables. For example, you might observe a strong positive correlation between 'Item_price' and 'connected_handling_time', suggesting that higher-priced items tend to require longer handling times.
Potential dependencies: Strong correlations might indicate dependencies between variables, although correlation alone does not establish that a change in one variable causes a change in the other. Such relationships are still useful starting points for understanding the factors that impact key metrics like CSAT Score.
Redundant variables: If two variables have a very strong positive or negative correlation, it might indicate redundancy, meaning they provide similar information. This can be helpful for feature selection in modeling or to identify potential areas for data simplification.
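
Reading strong or redundant pairs off a large heatmap can be error-prone, so the same correlation matrix can also be listed as ranked pairs. The sketch below is one possible approach: `top_correlated_pairs` and the 0.8 default threshold are illustrative assumptions, and the sample values, including the hypothetical `price_with_tax` column, are made up.

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(df, threshold=0.8):
    """List variable pairs whose absolute Pearson correlation is at least `threshold`."""
    corr = df.select_dtypes(include="number").corr()
    # Keep only the upper triangle so each pair appears once and the diagonal is dropped
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    # The comparison also discards any missing entries left over from masking
    pairs = pairs[pairs.abs() >= threshold]
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)

# Made-up numbers; 'price_with_tax' is a hypothetical column that duplicates 'Item_price'
sample = pd.DataFrame({
    "Item_price":              [1, 2, 3, 4, 5],
    "price_with_tax":          [2, 4, 6, 8, 10],
    "connected_handling_time": [5, 4, 3, 1, 2],
})

strong = top_correlated_pairs(sample, threshold=0.95)
print(strong)
```

A pair that appears with a correlation close to 1 or -1 is a candidate for the redundancy check described above.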


#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
numerical_cols = ['Item_price', 'connected_handling_time', 'CSAT Score']
# Defined for reference; not used by the pair plot below
categorical_cols = ['channel_name', 'category', 'Product_category', 'Agent Shift', 'Tenure Bucket']

# Create a pair plot colored by CSAT Score, with KDE plots on the diagonal
g = sns.pairplot(data, vars=numerical_cols, hue='CSAT Score', palette='viridis', diag_kind='kde')

# Add a title above the grid and rotate the tick labels for readability
plt.suptitle("Pair Plot of Numerical Variables with CSAT Score", y=1.02)
for ax in g.axes.flat:
    ax.tick_params(axis='both', labelrotation=45)

plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A pair plot is chosen because it gives a compact overview of how the numerical variables relate to one another and to CSAT Score, which is treated here as a categorical variable through the hue. It creates a matrix of scatter plots, where each scatter plot shows the relationship between two numerical variables, and the diagonal of the matrix shows the distribution of each individual variable. This makes it useful for:

Exploring multiple relationships: It allows you to quickly explore the relationships between all pairs of numerical variables, providing insights into potential correlations, patterns, and clusters.
Visualizing CSAT Score influence: By using CSAT Score as the hue (color), you can see how customer satisfaction levels are distributed across different values of the numerical variables. This helps identify trends and patterns related to CSAT Score.
Identifying outliers: The scatter plots in the pair plot can reveal outliers, which are data points that fall significantly outside the typical range of values for a pair of variables. These outliers can indicate potential issues or unusual cases that require further investigation.
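
Outliers spotted visually in the scatter plots can be confirmed with a simple numeric rule. The sketch below uses the common 1.5 × IQR rule as one example; the function name `iqr_outliers` and the sample handling times are assumptions made for illustration, and other outlier definitions may suit this dataset better.

```python
import pandas as pd

def iqr_outliers(series, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Made-up handling times with one obviously unusual value
handling_time = pd.Series([4, 5, 5, 6, 6, 7, 60], name="connected_handling_time")

mask = iqr_outliers(handling_time)
print(handling_time[mask])
```

In the notebook, the same mask could be applied to a column such as `data['connected_handling_time']` to pull out the rows that deserve a closer look.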


##### 2. What is/are the insight(s) found from the chart?

Correlations: The scatter plots show the relationships between pairs of numerical variables. You can observe positive correlations (variables tend to increase or decrease together), negative correlations (one variable tends to increase as the other decreases), or no correlation (no clear relationship). For example, you might find a positive correlation between 'Item_price' and 'connected_handling_time', suggesting that higher-priced items tend to require longer handling times.
CSAT Score patterns: By observing the color distribution (hue) based on CSAT Score, you can see if certain ranges of numerical variables are associated with higher or lower customer satisfaction levels. For example, you might find that lower 'connected_handling_time' is associated with higher CSAT Scores, suggesting that faster resolution goes hand in hand with greater customer satisfaction, although the plot alone cannot show that one causes the other.
Clusters or groupings: The scatter plots might reveal clusters or groupings of data points, suggesting potential segments or patterns within the data. For example, you might observe a cluster of customers with high 'Item_price' and high 'CSAT Score', indicating that customers who purchase expensive items tend to be more satisfied.
Distribution of individual variables: The diagonal plots show the distribution of each individual numerical variable, providing insights into their range, central tendency, and potential outliers.

Overall, the pair plot provides a comprehensive visual overview of the relationships between numerical variables and CSAT Score. By carefully analyzing the scatter plots, color distributions, and distributions of individual variables, you can gain valuable insights into factors that influence customer satisfaction, potential areas for improvement, and potential customer segments with varying needs and preferences.
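
A pattern such as "lower handling time goes with higher CSAT" is easier to verify with a grouped summary than by judging colors in a scatter plot. The following is a minimal sketch, assuming the column names used in the pair plot; the helper `handling_time_by_csat` and the sample values are hypothetical.

```python
import pandas as pd

def handling_time_by_csat(df, time_col="connected_handling_time", target="CSAT Score"):
    """Non-missing observation count and median handling time for each CSAT score."""
    return df.groupby(target)[time_col].agg(["count", "median"])

# Made-up sample, including one missing handling time
sample = pd.DataFrame({
    "CSAT Score":              [1, 1, 3, 5, 5, 5],
    "connected_handling_time": [30, 50, 20, 10, None, 14],
})

summary = handling_time_by_csat(sample)
print(summary)
```

The `count` column is worth checking alongside the median: if a column has many missing values, a median based on only a handful of rows should be interpreted with caution.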

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?
Explain briefly.




To enhance customer satisfaction and loyalty, Flipkart should focus on optimizing support channel utilization by prioritizing popular channels, reducing handling times, and offering self-service options. Simultaneously, agent performance can be improved through targeted training programs, performance monitoring, and empowering agents with the necessary resources and authority. Personalizing support interactions is crucial and can be achieved through customer segmentation, proactive support, and continuous feedback collection. Furthermore, Flipkart should develop proactive strategies to prevent customer issues by conducting root cause analysis, improving products based on feedback, and enhancing the knowledge base. Finally, continuous monitoring of key metrics, data-driven decision making, and an iterative approach to improvement are essential for ongoing adaptation and optimization of the customer support experience. By implementing these strategies, Flipkart can cultivate a customer-centric support system that fosters loyalty, enhances brand reputation, and drives business growth.



# **Conclusion**

In conclusion, this analysis of Flipkart's customer support data has revealed valuable insights into customer behavior, channel performance, agent effectiveness, and satisfaction drivers. By leveraging these insights through targeted improvements in channel optimization, agent training, personalized support, and proactive issue prevention, Flipkart can significantly enhance customer satisfaction and loyalty. Continuous monitoring of key metrics and an iterative approach to improvement are essential to ensure ongoing adaptation to evolving customer needs and business goals. By implementing the recommendations outlined in this analysis, Flipkart can cultivate a customer-centric support system that strengthens its brand reputation, fosters customer retention, and ultimately drives sustainable business growth and profitability. Investing in customer support is not merely an operational expense but a strategic investment in Flipkart's long-term success.


