<a href="https://colab.research.google.com/github/Abhishekkaithwas/Global-Terrorism-dataset-EDA-Project/blob/main/Myntra_Online_Retail_Customer_Segmentation_EDA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -  Myntra Online Retail Customer Segmentation



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**

Myntra, a leading Indian online fashion retailer, employs customer segmentation to enhance its marketing strategies, improve customer experience, and drive sales. Customer segmentation involves dividing Myntra's diverse customer base into distinct groups based on shared characteristics such as demographics, purchasing behavior, preferences, and engagement patterns. This approach allows Myntra to tailor its offerings, promotions, and communication to meet the specific needs of each segment.

Key segmentation criteria used by Myntra include:

**Demographic Segmentation**: Grouping customers by age, gender, income, and location. For example, Myntra targets young adults with trendy fashion and working professionals with premium brands.

**Behavioral Segmentation**: Analyzing purchasing habits, frequency of visits, average order value, and product preferences. Myntra identifies high-value customers, frequent shoppers, and those who respond to discounts or seasonal sales.

**Psychographic Segmentation**: Understanding lifestyle, interests, and fashion preferences. Myntra caters to fitness enthusiasts with activewear, eco-conscious buyers with sustainable fashion, and luxury seekers with high-end brands.

**Technographic Segmentation**: Segmenting based on the devices

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


Myntra, a leading online fashion retailer in India, faces the challenge of effectively understanding and catering to its diverse and rapidly growing customer base. With millions of users exhibiting varying preferences, purchasing behaviors, and engagement patterns, a one-size-fits-all approach to marketing, product recommendations, and customer engagement is no longer sufficient. The lack of a robust customer segmentation strategy leads to suboptimal targeting, missed sales opportunities, and reduced customer satisfaction.

To address this, Myntra needs to implement a data-driven customer segmentation framework that categorizes its users into distinct groups based on demographics, behavioral patterns, psychographics, and technographics. This will enable the company to deliver personalized experiences, optimize marketing campaigns, and enhance customer retention. The primary challenge lies in accurately analyzing vast amounts of customer data, identifying meaningful segments, and translating these insights into actionable strategies that drive business growth and improve customer loyalty.

The problem, therefore, is to develop and implement an effective customer segmentation model that empowers Myntra to better understand its customers, tailor its offerings, and maintain its competitive edge in the highly dynamic e-commerce landscape.

#### **Define Your Business Objective?**

The primary purpose of analyzing this dataset is to extract valuable insights to enhance Myntra Gifts Ltd.'s business strategies. Specific goals include:

1. Identifying Purchasing Trends:

Understanding patterns in customer purchases over time, including seasonal trends and product preferences, to better align inventory and marketing strategies.

2. Evaluating Product Performance:

Assessing which products are most and least popular to optimize product offerings and make informed decisions about stock management and new product introductions.

3. Understanding Customer Behavior:

Analyzing customer buying habits, frequency of purchases, and geographic distribution to tailor marketing efforts and improve customer segmentation.

4. Optimizing Pricing Strategies:

Evaluating the relationship between unit prices and sales volume to refine pricing models and maximize revenue while ensuring competitive pricing.

5. Streamlining Inventory Management:

Using sales and demand data to enhance inventory planning, reduce instances of overstock and stockouts, and improve overall inventory efficiency.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart answers the following, in this format:
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

### Dataset Loading

In [None]:
# Load Dataset
file_path = '/content/Myntra Copy of Online Retail.xlsx'
df = pd.read_excel(file_path)
print(df)

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
num_rows, num_columns = df.shape
print(f"Number of rows:{num_rows}")
print(f"Number of columns:{num_columns}")

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_count = df.duplicated().sum()
print(f"Number of duplicate rows: {duplicate_count}")


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_values = df.isnull().sum()
print("Missing/null values per column:")
print(missing_values)

In [None]:
# Visualizing the missing values
plt.figure(figsize=(10, 6))
sns.heatmap(df.isnull(), cmap='viridis', cbar=False)
plt.title('Missing Values Heatmap')
plt.show()

### What did you know about your dataset?

The CustomerID column has the most missing values, followed by the Description column.
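To make this comparison concrete, the missing counts can be expressed as a percentage of all rows. The following is a minimal sketch: the `sample` frame and the `missing_summary` helper are hypothetical stand-ins, and in the notebook the helper would be called on `df` instead.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the retail data; values are made up for illustration.
sample = pd.DataFrame({
    "InvoiceNo": ["536365", "536366", "536367", "536368"],
    "Description": ["MUG", None, "LAMP", "CLOCK"],
    "CustomerID": [17850.0, np.nan, np.nan, 13047.0],
})


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages per column, largest first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(frame) * 100).round(2),
    })
    return summary.sort_values("missing_count", ascending=False)


summary = missing_summary(sample)
print(summary)
```

Sorting by count puts the column with the most gaps on top, which makes it easier to decide which columns need imputation and which rows may need to be dropped.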

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
columns = df.columns
print(f"Dataset columns:{columns}")

In [None]:
# Dataset Describe
df.describe()

### Variables Description

Variables and Their Statistics:


***Quantity***:

count: 541,909.000000

Total number of non-null entries for Quantity.

mean: 9.552250

The average quantity of items purchased per transaction is 9.55.

min: -80,995.000000

The minimum quantity is -80,995, which is unusual (likely indicates returns or cancellations).

25%: 1.000000

25% of transactions have a quantity of 1 or less.

50% (median): 3.000000

The median quantity is 3, meaning half of the transactions have a quantity of 3 or less.

75%: 10.000000

75% of transactions have a quantity of 10 or less.

max: 80,995.000000

The maximum quantity is 80,995, which is unusually high (likely bulk purchases or errors).

std: 218.081158

The standard deviation is 218.08, indicating high variability in the quantity of items purchased.



***InvoiceDate***:

count: 541,909

Total number of non-null entries for InvoiceDate.

mean: 2011-07-04 13:34:57.156386048

The average date and time of transactions is July 4, 2011, at 13:34:57.

min: 2010-12-01 08:26:00

The earliest transaction occurred on December 1, 2010, at 08:26:00.

25%: 2011-03-28 11:34:00

25% of transactions occurred on or before March 28, 2011, at 11:34:00.

50% (median): 2011-07-19 17:17:00

The median transaction date is July 19, 2011, at 17:17:00.

75%: 2011-10-19 11:27:00

75% of transactions occurred on or before October 19, 2011, at 11:27:00.

max: 2011-12-09 12:50:00

The latest transaction occurred on December 9, 2011, at 12:50:00.

std: NaN

Standard deviation is not applicable for datetime data.



***UnitPrice***:

count: 541,909.000000

Total number of non-null entries for UnitPrice.

mean: 4.611114

The average unit price of items is $4.61.

min: -11,062.060000

The minimum unit price is -$11,062.06, which is unusual (likely indicates refunds or errors).

25%: 1.250000

25% of items have a unit price of $1.25 or less.

50% (median): 2.080000

The median unit price is $2.08, meaning half of the items cost that much or less.

75%: 4.130000

75% of items have a unit price of $4.13 or less.

max: 38,970.000000

The maximum unit price is $38,970.00, which is unusually high (likely premium or special items).

std: 96.759853

The standard deviation is 96.76, indicating high variability in unit prices.



***CustomerID***:

count: 406,829.000000

Total number of non-null entries for CustomerID (missing values exist).

mean: 15,287.690570

The average customer ID is 15,287.69 (not meaningful, as IDs are categorical).

min: 12,346.000000

The smallest customer ID is 12,346.

25%: 13,953.000000

25% of customer IDs are 13,953 or less.

50% (median): 15,152.000000

The median customer ID is 15,152.

75%: 16,791.000000

75% of customer IDs are 16,791 or less.

max: 18,287.000000

The largest customer ID is 18,287.

std: 1,713.600303

The standard deviation is 1,713.60, indicating variability in customer IDs (not meaningful for categorical data).
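The negative minimums and very large maximums above suggest counting how many rows fall outside a plausible range before deciding how to treat them. The sketch below is one way to do that. The `sample` frame and the `count_suspect_rows` helper are hypothetical, and the cap of 10,000 is simply the threshold used later in the wrangling step, not a value derived from the data.

```python
import pandas as pd

# Hypothetical rows that mimic the extremes seen in df.describe().
sample = pd.DataFrame({
    "Quantity": [6, -2, 80995, 3, 1],
    "UnitPrice": [2.55, 4.95, 2.08, -11062.06, 0.0],
})


def count_suspect_rows(frame: pd.DataFrame, cap: float = 10_000) -> pd.Series:
    """Count rows with negative or unusually large Quantity / UnitPrice."""
    checks = {
        "negative_quantity": frame["Quantity"] < 0,
        "negative_price": frame["UnitPrice"] < 0,
        "quantity_above_cap": frame["Quantity"] > cap,
        "price_above_cap": frame["UnitPrice"] > cap,
    }
    # Each mask is a boolean Series, so its sum is the number of flagged rows.
    return pd.Series({name: int(mask.sum()) for name, mask in checks.items()})


suspect_counts = count_suspect_rows(sample)
print(suspect_counts)
```

Knowing how many rows each rule removes helps judge whether a filter discards a handful of outliers or a meaningful share of the transactions.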

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_values = df.nunique()
print(f"Unique values for each variable:{unique_values}")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# Keep a separate copy without the rows that lack a CustomerID;
# the main `df` is not changed by this step
df_cleaned = df.dropna(subset=['CustomerID']).copy()

In [None]:
#Fill missing CustomerID values in df with the default value 0 (a placeholder, not a real customer)
df['CustomerID']=df['CustomerID'].fillna(0)

In [None]:
#Fill missing values with a placeholder in column description
df['Description']=df['Description'].fillna('Unknown')

In [None]:
#Check and remove duplicate rows
df=df.drop_duplicates()

In [None]:
#Keep only rows with non-negative values in columns Quantity and UnitPrice
df=df[(df['Quantity']>=0) & (df['UnitPrice']>=0)]

In [None]:
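#Keep only rows where Quantity and UnitPrice are at most 10,000
#.copy() avoids SettingWithCopyWarning when new columns are added later
df=df[(df['Quantity']<=10000) & (df['UnitPrice']<=10000)].copy()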

In [None]:
#Converting CustomerID to integer
df['CustomerID']=df['CustomerID'].astype(int)

In [None]:
#Extract date and time components for invoice date
df['InvoiceYear']=df['InvoiceDate'].dt.year
df['InvoiceMonth']=df['InvoiceDate'].dt.month
df['InvoiceDay']=df['InvoiceDate'].dt.day
df['InvoiceHour']=df['InvoiceDate'].dt.hour

In [None]:
#create a total price column
df['TotalPrice']=df['Quantity']*df['UnitPrice']

In [None]:
print(df)

In [None]:
#Grouping and aggregating data
#Total sales by country
sales_by_country=df.groupby('Country')['TotalPrice'].sum().reset_index()
print(sales_by_country)

In [None]:
#avg quantity purchased per customer
avg_quantity_per_customer=df.groupby('CustomerID')['Quantity'].mean().reset_index()
print(avg_quantity_per_customer)

In [None]:
#Number of transactions per customer
transactions_per_customer=df.groupby('CustomerID')['InvoiceNo'].nunique().reset_index()
print(transactions_per_customer)

In [None]:
#Split data by year
df_2010=df[df['InvoiceYear'] == 2010]
df_2011=df[df['InvoiceYear'] == 2011]

In [None]:
#Handling text data
#Remove special characters
df['Description']=df['Description'].str.replace(r'[^\w\s]', '', regex=True)


### What all manipulations have you done and insights you found?

The following manipulations were applied to the data:

1. Saved a copy of the data without the rows that have a missing CustomerID (`df_cleaned`).

2. Filled missing CustomerID values in `df` with the default value 0.

3. Filled missing values in the Description column with the placeholder 'Unknown'.

4. Checked for and removed duplicate rows.

5. Filtered out negative values in the Quantity and UnitPrice columns.

6. Filtered out Quantity and UnitPrice values above 10,000.

7. Converted CustomerID to integer.

8. Extracted year, month, day, and hour components from the invoice date.

9. Created a total price column (Quantity × UnitPrice).

**Grouping and aggregating data**

10. Total sales by country.

11. Average quantity purchased per customer.

12. Number of transactions per customer.

13. Split data by year.

**Handling text data**

14. Removed special characters from Description.
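The cleaning steps listed above can also be collected into a single function so the whole pipeline can be rerun on a fresh copy of the raw data. This is a sketch under stated assumptions, not the notebook's actual code: `clean_retail` and the `sample` frame are hypothetical, and the column names are taken from the cells above.

```python
import numpy as np
import pandas as pd


def clean_retail(raw: pd.DataFrame, cap: float = 10_000) -> pd.DataFrame:
    """Apply the wrangling steps in order and return a new DataFrame."""
    out = raw.copy()
    # Placeholder values for missing IDs and descriptions.
    out["CustomerID"] = out["CustomerID"].fillna(0).astype(int)
    out["Description"] = out["Description"].fillna("Unknown")
    out = out.drop_duplicates()
    # Keep rows whose Quantity and UnitPrice lie in [0, cap] (inclusive).
    in_range = out["Quantity"].between(0, cap) & out["UnitPrice"].between(0, cap)
    out = out[in_range].copy()
    # Date parts and line total.
    out["InvoiceYear"] = out["InvoiceDate"].dt.year
    out["InvoiceMonth"] = out["InvoiceDate"].dt.month
    out["TotalPrice"] = out["Quantity"] * out["UnitPrice"]
    return out


# Hypothetical rows: one exact duplicate, one negative quantity, one missing ID.
sample = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536379", "536380"],
    "Description": ["WHITE MUG", "WHITE MUG", "DISCOUNT", None],
    "Quantity": [6, 6, -1, 2],
    "InvoiceDate": pd.to_datetime([
        "2010-12-01 08:26", "2010-12-01 08:26",
        "2010-12-01 09:41", "2011-01-04 10:00",
    ]),
    "UnitPrice": [2.55, 2.55, 27.50, 1.25],
    "CustomerID": [17850.0, 17850.0, 14527.0, np.nan],
})

cleaned = clean_retail(sample)
print(cleaned)
```

Working on a copy and returning a new frame keeps the raw data intact, so the notebook can be rerun top to bottom without reloading the Excel file.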

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
# Line chart of monthly sales totals
df.set_index('InvoiceDate')['TotalPrice'].resample('M').sum().plot(kind='line', title='Monthly Sales Trend')

##### 1. Why did you pick the specific chart?

Line chart: To analyze trends and patterns over time.

##### 2. What is/are the insight(s) found from the chart?

November has the highest sales of any month.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Sales start rising in August, peak in November, and fall in December. Note that the data ends on December 9, 2011, so the December 2011 total covers only part of the month.
Customers buy more during these four months than in the rest of the year, so the company should identify which products drive sales in this period and consider offering similar products throughout the year.
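The peak month can be read off programmatically rather than from the chart alone. The sketch below uses a small hypothetical `sample` frame with made-up values. It groups by year and month together and flags whether the last month in the data is incomplete, which matters when comparing months.

```python
import pandas as pd

# Hypothetical daily totals; the real notebook would use df with TotalPrice.
sample = pd.DataFrame({
    "InvoiceDate": pd.to_datetime([
        "2011-10-05", "2011-10-20", "2011-11-03",
        "2011-11-25", "2011-12-02", "2011-12-09",
    ]),
    "TotalPrice": [100.0, 150.0, 300.0, 200.0, 80.0, 40.0],
})

# Group by a monthly period so that year and month are kept together.
monthly = sample.groupby(sample["InvoiceDate"].dt.to_period("M"))["TotalPrice"].sum()

# Month with the highest total sales.
peak_month = monthly.idxmax()

# The final month is partial if the last invoice falls before the month's end.
last_day = sample["InvoiceDate"].max()
last_month_is_partial = last_day.day < last_day.days_in_month

print(monthly)
print("Peak month:", peak_month)
print("Last month partial:", last_month_is_partial)
```

If the last month is partial, its total is better excluded or scaled before it is compared against full months.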

#### Chart - 2

In [None]:
# Chart - 2 visualization code
# Area chart of cumulative sales; rows are sorted by invoice date before summing
df.set_index('InvoiceDate')['TotalPrice'].sort_index().cumsum().plot(kind='area', title = 'Cumulative sales over time')

##### 1. Why did you pick the specific chart?

Area chart: To visualize cumulative sales over time.

##### 2. What is/are the insight(s) found from the chart?

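Cumulative sales rise steadily month after month. Because negative quantities and prices were filtered out earlier, the curve can only rise, so the useful signal is the slope: a steeper slope means sales are growing faster.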

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive business impact.

1. Identifies growth trends

2. Seasonal patterns.

3. Impact of business decisions.

4. Forecasting future sales.



Negative Growth Insights.

1. Declining Sales Trend

2. Sudden Drops in Sales

3. Ineffective Campaigns or Strategies

4. Stagnation



#### Chart - 3

In [None]:
# Chart - 3 visualization code
df.groupby('Country')['TotalPrice'].sum().plot(kind='bar', title='Total sales by country')

##### 1. Why did you pick the specific chart?

Bar chart: To compare total sales by country.

##### 2. What is/are the insight(s) found from the chart?

Top-Performing Countries:

The chart will highlight which countries contribute the most to total sales.

Insight: Identifying top-performing countries helps the business:

Focus marketing efforts and resources on high-revenue regions.

Understand regional preferences and tailor products/services accordingly.

Underperforming Countries:

The chart will also show countries with low sales.

Insight: Underperforming countries may indicate:

Lack of market penetration.

Cultural or regulatory barriers.

Ineffective marketing or distribution strategies.

Action: The business can:

Investigate the reasons for low sales.

Develop targeted campaigns to boost sales in these regions.

Market Potential:

Comparing sales across countries can reveal untapped markets.

Insight: Countries with moderate sales but high growth potential can be prioritized for expansion.

Regional Preferences:

Differences in sales across countries may reflect regional preferences.

Insight: The business can:

Customize product offerings to match local tastes.

Adjust pricing strategies based on purchasing power.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from a Bar Chart comparing total sales by country or product category can significantly contribute to creating a positive business impact. Here's how:

1. Focus on High-Performing Markets or Products:
Insight: Identifying top-performing countries or product categories.

Positive Impact:

The business can allocate more resources (e.g., marketing, inventory, staffing) to these areas to maximize revenue.

For example, if the UK generates 80% of sales, the business can focus on expanding its presence there or launching new products tailored to UK customers.

2. Improve Underperforming Markets or Products:
Insight: Identifying countries or product categories with low sales.

Positive Impact:

The business can investigate the reasons for underperformance (e.g., lack of awareness, poor product-market fit) and take corrective actions.

For example, if "Men's Shoes" contribute only 5% of sales, the business can revamp the product line, improve marketing, or offer discounts to boost sales.

3. Optimize Product Mix:
Insight: Understanding the contribution of each product category to total sales.

Positive Impact:

The business can diversify its product portfolio to reduce reliance on a single category.

For example, if "Women's Dresses" account for 40% of sales, the business can introduce complementary products like accessories or footwear to increase revenue.

4. Regional Customization:
Insight: Recognizing regional preferences and purchasing behavior.

Positive Impact:

The business can tailor its offerings to match local tastes and preferences.

For example, if customers in Germany prefer eco-friendly products, the business can introduce sustainable product lines in that region.

5. Strategic Decision-Making:
Insight: Identifying trends and patterns in sales data.

Positive Impact:

The business can make informed decisions about inventory management, pricing strategies, and marketing campaigns.

For example, if sales peak during the holiday season, the business can plan promotions and stock up on inventory in advance.

Are There Any Insights That Lead to Negative Growth?
Yes, some insights from the bar chart could indicate potential negative growth or risks to the business. Here are a few examples, along with justifications:

1. Over-Reliance on a Single Market or Product:
Insight: A significant portion of sales comes from one country or product category (e.g., 80% of sales from the UK or 40% from "Women's Dresses").

Negative Impact:

This creates a high dependency risk. If the UK market faces an economic downturn or if "Women's Dresses" fall out of fashion, the business could experience a significant revenue drop.

Justification:

Lack of diversification makes the business vulnerable to external shocks.

2. Underperforming Markets or Products:
Insight: Certain countries or product categories contribute very little to total sales (e.g., Germany contributes only 10% of sales, or "Men's Shoes" contribute only 5%).

Negative Impact:

This indicates missed opportunities or ineffective strategies in these areas.

Justification:

If the business fails to address underperformance, it may lose market share to competitors or waste resources on unprofitable products.

3. Declining Sales in Key Markets or Categories:
Insight: Sales in a previously high-performing country or product category are declining over time.

Negative Impact:

This could signal changing customer preferences, increased competition, or market saturation.

Justification:

If the business does not adapt to these changes, it may experience sustained negative growth.

4. Lack of Market Penetration:
Insight: Sales in certain countries are negligible or nonexistent.

Negative Impact:

This indicates untapped potential or barriers to entry (e.g., regulatory challenges, lack of brand awareness).

Justification:

If the business does not address these barriers, it may miss out on growth opportunities in emerging markets.

5. Inefficient Resource Allocation:
Insight: Resources are disproportionately allocated to low-performing markets or products.

Negative Impact:

This leads to wasted resources and reduced profitability.

Justification:

For example, if the business spends heavily on marketing "Men's Shoes" (which contribute only 5% of sales), it may not achieve a positive return on investment.
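Statements such as "the UK generates 80% of sales" can be checked by computing each country's share of total sales. The following is a minimal sketch on a hypothetical `sample` frame with made-up figures. In the notebook the same two lines would run on `df`.

```python
import pandas as pd

# Hypothetical line totals per country; values are for illustration only.
sample = pd.DataFrame({
    "Country": ["United Kingdom", "United Kingdom", "Germany", "France", "United Kingdom"],
    "TotalPrice": [500.0, 300.0, 100.0, 50.0, 50.0],
})

# Total sales per country, largest first.
country_sales = (
    sample.groupby("Country")["TotalPrice"].sum().sort_values(ascending=False)
)

# Share of overall sales, in percent.
country_share = (country_sales / country_sales.sum() * 100).round(1)

print(country_share)
```

A table of shares also makes the dependency risk discussed above measurable, since the top country's percentage can be tracked over time.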

#### Chart - 4

In [None]:
# Chart - 4 visualization code
df.groupby(['InvoiceMonth', 'Country'])['TotalPrice'].sum().unstack().plot(kind='bar', stacked=True, title='Monthly sales by country')

##### 1. Why did you pick the specific chart?

Stacked bar chart: Shows sales breakdown by country over time.

##### 2. What is/are the insight(s) found from the chart?

Regional Sales Trends:

The chart shows how sales in different countries change over time.

Insight: Identifying which regions are growing, declining, or remaining stable.

For example, if sales in the UK are growing while sales in Germany are declining, the business can focus on strengthening its presence in the UK and addressing issues in Germany.

Market Potential:

The chart can reveal untapped potential in specific regions.

Insight: If a country shows consistent growth, it may indicate a lucrative market for expansion.

For example, if sales in France are steadily increasing, the business can invest more in marketing and distribution there.

Regional Preferences:

The chart can highlight differences in regional preferences.

Insight: Certain products may perform better in specific countries.

For example, if "Winter Coats" sell well in colder regions, the business can tailor its offerings to match local preferences.

Impact of Economic or Cultural Factors:

The chart can reveal the impact of external factors (e.g., economic downturns, cultural events) on regional sales.

Insight: If sales in a country drop suddenly, it may be due to external factors like a recession or regulatory changes.

This helps the business adapt its strategies to mitigate risks.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from a Stacked Bar Chart showing sales breakdown by product category or country over time can significantly contribute to creating a positive business impact. Here's how:

1. Optimize Inventory Management:
Insight: Identifying seasonal trends in product sales (e.g., "Winter Coats" sell more in winter).

Positive Impact:

The business can stock inventory according to seasonal demand, reducing overstocking or stockouts.

For example, increasing stock of "Women's Dresses" in summer and "Winter Coats" in winter.

2. Targeted Marketing Campaigns:
Insight: Understanding which products or regions drive sales growth.

Positive Impact:

The business can allocate marketing budgets to high-performing products or regions.

For example, if "Women's Dresses" are a top-selling category, the business can run targeted ads to promote them.

3. Regional Expansion Strategies:
Insight: Identifying countries with consistent sales growth (e.g., the UK shows steady growth).

Positive Impact:

The business can focus on expanding its presence in high-growth regions.

For example, opening new stores or increasing marketing efforts in the UK.

4. Product Portfolio Optimization:
Insight: Recognizing which product categories contribute the most to sales.

Positive Impact:

The business can focus on high-performing categories and discontinue or revamp underperforming ones.

For example, if "Men's Shoes" contribute only 5% of sales, the business can introduce new designs or offer discounts to boost sales.

5. Strategic Decision-Making:
Insight: Understanding the impact of external factors (e.g., economic downturns, cultural events) on sales.

Positive Impact:

The business can adapt its strategies to mitigate risks and capitalize on opportunities.

For example, if sales in Germany decline due to a recession, the business can offer discounts or focus on cost-effective products.

Are There Any Insights That Lead to Negative Growth?
Yes, some insights from the stacked bar chart could indicate potential negative growth or risks to the business. Here are a few examples, along with justifications:

1. Declining Sales in Key Categories or Regions:
Insight: Sales of a previously high-performing product category or region are declining over time (e.g., "Men's Shoes" or Germany).

Negative Impact:

This could signal changing customer preferences, increased competition, or market saturation.

Justification:

If the business does not address the decline, it may lose market share and revenue.

2. Over-Reliance on a Single Category or Region:
Insight: A significant portion of sales comes from one product category or region (e.g., 50% of sales from "Women's Dresses" or 80% from the UK).

Negative Impact:

This creates a high dependency risk. If the category or region faces challenges (e.g., changing trends, economic downturns), the business could experience a significant revenue drop.

Justification:

Lack of diversification makes the business vulnerable to external shocks.

3. Seasonal Dependence:
Insight: Sales are heavily dependent on specific seasons (e.g., "Winter Coats" sell only in winter).

Negative Impact:

The business may struggle to maintain consistent revenue throughout the year.

Justification:

Over-reliance on seasonal products can lead to cash flow issues during off-seasons.

4. Underperforming Categories or Regions:
Insight: Certain product categories or regions contribute very little to total sales (e.g., "Men's Shoes" contribute only 5% of sales, or France contributes only 2%).

Negative Impact:

This indicates missed opportunities or ineffective strategies in these areas.

Justification:

If the business fails to address underperformance, it may waste resources and miss out on potential growth.

5. External Factors Impacting Sales:
Insight: Sales in a specific region decline due to external factors (e.g., economic recession, regulatory changes).

Negative Impact:

The business may experience sustained negative growth in that region.

Justification:

If the business does not adapt to external challenges, it may lose market share and revenue.
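Two practical issues can make a stacked chart of monthly sales by country hard to read: grouping by month number alone puts December 2010 and December 2011 in the same bar, and dozens of countries produce an unreadable legend. The sketch below shows one possible way around both, grouping by year-month and folding all but the top countries into "Other". The `monthly_sales_by_country` helper, the `top_n` parameter, and the `sample` data are hypothetical.

```python
import pandas as pd

# Hypothetical transactions spanning two Decembers.
sample = pd.DataFrame({
    "InvoiceDate": pd.to_datetime([
        "2010-12-05", "2010-12-07", "2011-12-01", "2011-12-03", "2011-12-05",
    ]),
    "Country": ["United Kingdom", "Germany", "United Kingdom", "France", "Spain"],
    "TotalPrice": [100.0, 20.0, 150.0, 30.0, 5.0],
})


def monthly_sales_by_country(frame: pd.DataFrame, top_n: int = 2) -> pd.DataFrame:
    """Pivot sales into year-month rows and country columns (top_n plus 'Other')."""
    totals = frame.groupby("Country")["TotalPrice"].sum()
    top = totals.nlargest(top_n).index
    # Countries outside the top_n are relabelled as "Other".
    country = frame["Country"].where(frame["Country"].isin(top), "Other")
    year_month = frame["InvoiceDate"].dt.to_period("M")
    return (
        frame.groupby([year_month, country])["TotalPrice"]
        .sum()
        .unstack(fill_value=0.0)
    )


pivot = monthly_sales_by_country(sample)
print(pivot)
```

The resulting frame can be passed to `pivot.plot(kind='bar', stacked=True)` to draw the same kind of chart with one bar per calendar month.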

#### Chart - 5

In [None]:
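# Chart - 5 visualization code
# Count distinct invoices per customer, skipping the placeholder CustomerID 0 used for missing IDs
df[df['CustomerID'] != 0].groupby('CustomerID')['InvoiceNo'].nunique().plot(kind='hist', bins=50, title='Customer purchase frequency distribution')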

##### 1. Why did you pick the specific chart?

Histogram: To analyze the distribution of customer purchase frequencies.

##### 2. What is/are the insight(s) found from the chart?

Customer Segmentation:

The histogram reveals distinct groups of customers based on their purchase frequencies.

Insight: Customers can be segmented into:

Frequent Buyers: Customers who make purchases very often (e.g., weekly or monthly).

Occasional Buyers: Customers who make purchases occasionally (e.g., quarterly).

One-Time Buyers: Customers who make only one purchase.

Implication: The business can tailor marketing strategies for each segment (e.g., loyalty programs for frequent buyers, re-engagement campaigns for one-time buyers).

Customer Loyalty:

The histogram shows how many customers are loyal (frequent buyers) versus those who are not.

Insight: A large number of frequent buyers indicates strong customer loyalty.

Implication: The business can focus on retaining these loyal customers through rewards, discounts, or personalized offers.

Revenue Contribution:

The histogram helps identify which customer segments contribute the most to revenue.

Insight: Frequent buyers may account for a significant portion of total revenue, even if they are a small percentage of the customer base.

Implication: The business can prioritize retaining and upselling to these high-value customers.

Churn Risk:

The histogram can reveal the proportion of one-time or infrequent buyers.

Insight: A large number of one-time buyers indicates a high churn rate.

Implication: The business should investigate why these customers are not returning and implement strategies to re-engage them (e.g., follow-up emails, special offers).

Purchase Behavior Trends:

A single histogram does not show change over time, but histograms built for different periods can be compared to see how purchase frequencies shift.

Insight: If the number of frequent buyers is increasing, it indicates growing customer loyalty.

Implication: The business can analyze what is driving this trend (e.g., successful marketing campaigns, improved customer experience) and replicate it.

Targeted Marketing Opportunities:

The histogram helps identify customer segments that are underperforming.

Insight: If a large group of customers makes only one purchase, they represent an opportunity for re-engagement.

Implication: The business can create targeted campaigns to convert one-time buyers into repeat customers.
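The frequent, occasional, and one-time segments described above can be assigned with a simple binning rule on the number of distinct invoices per customer. This is a sketch under stated assumptions: the thresholds (1 invoice, 2 to 5 invoices, 6 or more) are illustrative choices rather than values taken from the data, and the `sample` rows are hypothetical. CustomerID 0 is excluded because it is the placeholder used for missing IDs.

```python
import numpy as np
import pandas as pd

# Hypothetical (CustomerID, InvoiceNo) line items; repeated invoices are normal.
rows = [
    (0, "A1"),
    (12346, "B1"),
    (12347, "C1"), (12347, "C1"), (12347, "C2"), (12347, "C3"),
    (12348, "D1"), (12348, "D2"), (12348, "D3"),
    (12348, "D4"), (12348, "D5"), (12348, "D6"),
]
sample = pd.DataFrame(rows, columns=["CustomerID", "InvoiceNo"])

# Distinct invoices per real customer (placeholder ID 0 is dropped).
known = sample[sample["CustomerID"] != 0]
invoice_counts = known.groupby("CustomerID")["InvoiceNo"].nunique()

# Bin edges are right-inclusive: (0, 1], (1, 5], (5, inf).
segments = pd.cut(
    invoice_counts,
    bins=[0, 1, 5, np.inf],
    labels=["One-time", "Occasional", "Frequent"],
)

print(segments.value_counts())
```

Counting distinct invoices rather than rows matters here, because one invoice usually contains several line items and row counts would overstate how often a customer buys.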

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from a Histogram analyzing the distribution of customer purchase frequencies can significantly contribute to creating a positive business impact. Here's how:

1. Improved Customer Retention:
Insight: Identifying frequent buyers and their purchasing patterns.

Positive Impact:

The business can implement loyalty programs or personalized offers to retain these high-value customers.

For example, offering exclusive discounts or early access to new products can increase customer loyalty.

2. Targeted Marketing Campaigns:
Insight: Segmenting customers into frequent, occasional, and one-time buyers.

Positive Impact:

The business can create targeted campaigns for each segment:

Frequent Buyers: Reward them with loyalty points or VIP perks.

Occasional Buyers: Encourage repeat purchases with special offers.

One-Time Buyers: Re-engage them with follow-up emails or discounts.

This improves marketing efficiency and maximizes ROI.

3. Increased Customer Lifetime Value (CLV):
Insight: Understanding which customer segments contribute the most to revenue.

Positive Impact:

The business can focus on upselling and cross-selling to high-value customers, increasing their CLV.

For example, recommending complementary products to frequent buyers can boost sales.

4. Reduced Churn Rate:
Insight: Identifying one-time buyers who are at risk of churn.

Positive Impact:

The business can implement re-engagement strategies to convert one-time buyers into repeat customers.

For example, sending personalized emails or offering discounts on their next purchase can reduce churn.

5. Optimized Resource Allocation:
Insight: Recognizing which customer segments are most profitable.

Positive Impact:

The business can allocate resources (e.g., marketing budget, inventory) to high-performing segments.

This ensures efficient use of resources and maximizes profitability.

Are There Any Insights That Lead to Negative Growth?
Yes, some insights from the histogram could indicate potential negative growth or risks to the business. Here are a few examples, along with justifications:

1. High Proportion of One-Time Buyers:
Insight: A large peak at 1 purchase in the histogram.

Negative Impact:

This indicates a high churn rate, meaning many customers do not return after their first purchase.

Justification:

If the business fails to re-engage these customers, it may struggle to grow its customer base and revenue.

Action:

Investigate why customers are not returning (e.g., poor product quality, lack of follow-up).

Implement re-engagement campaigns (e.g., discounts, personalized emails).
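
The height of the peak at 1 purchase can be turned into a single number to track. A minimal sketch, assuming a Series of purchase counts per customer (the values here are made up):

```python
import pandas as pd

# Purchases per customer (toy values); in the notebook this would come from
# counting distinct invoices per customer.
purchase_counts = pd.Series([1, 1, 1, 2, 3, 1, 12, 1], name="purchases")

# Share of customers with exactly one purchase, and its complement.
one_time_share = (purchase_counts == 1).mean()
repeat_rate = 1 - one_time_share

print(f"One-time buyers: {one_time_share:.1%}, repeat rate: {repeat_rate:.1%}")
```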

2. Low Number of Frequent Buyers:
Insight: A small or nonexistent peak at higher purchase frequencies (e.g., 10+ purchases).

Negative Impact:

This indicates low customer loyalty, meaning the business relies heavily on new customers rather than repeat purchases.

Justification:

Acquiring new customers is generally more expensive than retaining existing ones, so a lack of loyal customers can lead to higher marketing costs and lower profitability.

Action:

Focus on building customer loyalty through rewards, personalized experiences, and excellent customer service.

3. Skewed Distribution:
Insight: The histogram is heavily skewed toward low purchase frequencies (e.g., most customers make only 1-2 purchases).

Negative Impact:

This indicates that the business is over-reliant on infrequent buyers, which is unsustainable for long-term growth.

Justification:

Infrequent buyers contribute less to revenue and are more likely to churn.

Action:

Develop strategies to increase purchase frequency (e.g., subscription models, bundling products).
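
Skew can be quantified instead of judged by eye. The sketch below, on made-up purchase counts, computes the sample skewness and compares the mean with the median; a mean well above the median is a typical sign of a right-skewed distribution.

```python
import pandas as pd

# Toy purchase counts per customer (made-up values).
purchase_counts = pd.Series([1, 1, 1, 1, 2, 2, 3, 15], name="purchases")

skewness = purchase_counts.skew()          # sample skewness; > 0 means right-skewed
mean_purchases = purchase_counts.mean()
median_purchases = purchase_counts.median()

print(f"skewness={skewness:.2f}, mean={mean_purchases}, median={median_purchases}")
```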

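4. Declining Purchase Frequencies Over Time:
Insight: A shift toward lower purchase frequencies when histograms for successive time periods are compared. A single histogram cannot show a trend over time, so this requires splitting the data by period.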

Negative Impact:

This indicates that customers are becoming less engaged or less satisfied with the business.

Justification:

Declining engagement can lead to reduced revenue and market share.

Action:

Investigate the root cause (e.g., poor customer experience, increased competition).

Implement improvements to re-engage customers (e.g., better products, enhanced customer service).
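
One way to check for a decline is to compute orders per active customer for each month. This is a sketch on toy data; `CustomerID`, `InvoiceNo`, and `InvoiceDate` are assumed column names, and the dates are made up.

```python
import pandas as pd

# Toy transactions for illustration only.
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 1, 2, 3, 1, 3],
    "InvoiceNo": ["A1", "A2", "B1", "A3", "B2", "C1", "A4", "C2"],
    "InvoiceDate": pd.to_datetime([
        "2011-01-05", "2011-01-20", "2011-01-22",
        "2011-02-03", "2011-02-15", "2011-02-18",
        "2011-03-09", "2011-03-30",
    ]),
})

# Group by calendar month, then count distinct orders and distinct customers.
monthly = toy.groupby(toy["InvoiceDate"].dt.to_period("M")).agg(
    orders=("InvoiceNo", "nunique"),
    active_customers=("CustomerID", "nunique"),
)
monthly["orders_per_customer"] = monthly["orders"] / monthly["active_customers"]

print(monthly)
```

A falling `orders_per_customer` across months would support the insight above; a flat or rising value would not.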

#### Chart - 14 - Correlation Heatmap

In [None]:
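# Correlation heatmap of the numerical columns
# vmin/vmax pin the colour scale to the full correlation range [-1, 1]
sns.heatmap(df[['Quantity', 'UnitPrice', 'TotalPrice']].corr(), annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Heatmap')
plt.show()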

##### 1. Why did you pick the specific chart?

A correlation heatmap shows the pairwise correlations between the numerical columns (Quantity, UnitPrice, TotalPrice) at a glance.

##### 2. What is/are the insight(s) found from the chart?

**Strong positive correlation between Quantity and TotalPrice**

The heatmap shows a strong positive correlation (0.92) between 'Quantity' and 'TotalPrice'. This is expected: as the quantity of items purchased increases, the total price of the order increases as well.

**Weak or no correlation between UnitPrice and the other columns**

The correlation coefficients between 'UnitPrice' and both 'Quantity' and 'TotalPrice' are close to 0. This suggests that the unit price of an item has little linear relationship with the quantity purchased or the total price of the order. Customers buy items across a wide range of unit prices, and the total price depends more on the quantity purchased.

**Implications for business**

**Focus on Quantity**: Since there is a strong positive correlation between Quantity and TotalPrice, the business should focus on strategies that encourage customers to purchase larger quantities. This could involve:

- Offering volume discounts or promotions (e.g., "Buy 2, Get 1 Free").
- Bundling products together.
- Recommending complementary items to increase the average order size.

**Pricing Strategies**: The lack of correlation between UnitPrice and TotalPrice suggests that the business might have some flexibility in pricing. A correlation across transactions does not measure how demand responds to a price change, however, so pricing changes need careful consideration and should align with overall business goals and customer expectations.

**Product Mix and Recommendations**: The weak correlation between UnitPrice and Quantity could indicate that customers purchase a mix of low-priced and high-priced items. The business should consider:

- Offering a variety of products at different price points to cater to different customer segments.
- Using data-driven recommendations to suggest related products that customers are likely to purchase together, regardless of their unit prices.
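
Pearson correlation, which `DataFrame.corr()` computes by default, can be dominated by a few extreme orders. A rank-based (Spearman) correlation is a useful cross-check. The sketch below uses made-up data and assumes that 'TotalPrice' is derived as Quantity × UnitPrice, which this section does not state explicitly.

```python
import pandas as pd

# Toy line items with one very large order (made-up values).
toy = pd.DataFrame({
    "Quantity": [1, 2, 3, 4, 5, 1000],
    "UnitPrice": [5.0, 3.0, 8.0, 2.0, 6.0, 1.0],
})
# Assumption: TotalPrice is Quantity multiplied by UnitPrice.
toy["TotalPrice"] = toy["Quantity"] * toy["UnitPrice"]

pearson = toy.corr(method="pearson")    # linear, sensitive to outliers
spearman = toy.corr(method="spearman")  # rank-based, robust to outliers

print(pearson.round(3))
print(spearman.round(3))
```

If the two matrices differ substantially on the real data, the Pearson value is probably driven by a handful of bulk orders rather than by the typical transaction.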

#### Chart - 15 - Pair Plot

In [None]:
# Select numerical columns for the pair plot
numerical_columns = ['Quantity', 'UnitPrice', 'TotalPrice']

# Create the pair plot
sns.pairplot(df[numerical_columns])

# Add a title
plt.suptitle('Pair Plot of Numerical Variables', y=1.02)

# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

To show relationships between multiple numerical variables.

##### 2. What is/are the insight(s) found from the chart?

**Distribution of individual variables**

- **Quantity**: The histogram for 'Quantity' is likely to be right-skewed: most orders have relatively small quantities, while a few have very large quantities (potential bulk purchases or outliers).
- **UnitPrice**: The histogram for 'UnitPrice' might also be right-skewed, with most items at lower unit prices and some higher-priced items.
- **TotalPrice**: The histogram for 'TotalPrice' would probably resemble that of 'Quantity': right-skewed, with most orders having lower total prices and a few having much higher ones.

**Relationships between variables**

- **Quantity vs. TotalPrice**: The scatter plot should show a clear positive relationship. As the quantity of items in an order increases, the total price also increases, which aligns with the strong positive correlation observed in the heatmap.
- **UnitPrice vs. Quantity/TotalPrice**: These scatter plots are likely to show a more scattered pattern, indicating a weak or no correlation, as also seen in the heatmap. This suggests that the unit price of an item does not strongly influence the quantity purchased or the total price of the order.
- **Potential Outliers**: Some outliers may be visible in the scatter plots, particularly for 'Quantity' and 'TotalPrice'. They could represent bulk purchases, unusual transactions, or data errors, and should be investigated further.

**Business implications and actions**

- **Focus on Quantity-Based Promotions**: Since 'Quantity' has a strong influence on 'TotalPrice', the business could encourage larger purchases through volume discounts, product bundles, or loyalty rewards for higher-quantity orders.
- **Pricing Flexibility**: The weak relationship between 'UnitPrice' and 'TotalPrice' suggests that the business might have some flexibility in adjusting prices. Pricing strategies should still be carefully considered and aligned with overall business goals and customer expectations.
- **Product Mix and Recommendations**: The pair plot might reveal patterns between 'UnitPrice' and 'Quantity' that could inform product mix and recommendation strategies. For example, if customers tend to purchase a mix of low-priced and high-priced items, the business could offer products at different price points and recommend complementary products regardless of their unit prices.
- **Investigate Outliers**: Outliers should be checked to determine whether they are valid transactions or data errors. Errors should be corrected or removed from the dataset. Valid but unusual transactions might provide insights into specific customer segments or product categories.
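
A common first pass at finding such outliers is the 1.5 × IQR rule. The sketch below is one possible approach, not the method used in this notebook; the helper `flag_iqr_outliers` is hypothetical and the quantities are made up.

```python
import pandas as pd

def flag_iqr_outliers(values, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Toy order quantities with one bulk order (made-up values).
quantity = pd.Series([1, 2, 2, 3, 4, 6, 12, 480], name="Quantity")

mask = flag_iqr_outliers(quantity)
print(quantity[mask])  # rows to inspect manually before correcting or removing
```

Flagged rows are candidates for review only; a bulk purchase can be a perfectly valid transaction.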

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

1. Enhance Purchasing Trend Analysis and Inventory Management:

- **Seasonal Trends**: The line chart and stacked bar chart revealed seasonal sales trends. Analyze sales patterns for different product categories and countries over time to anticipate future demand. Optimize inventory levels to ensure sufficient stock of popular items during peak seasons while minimizing overstocking during slower periods.
- **Product Performance**: Evaluate product sales and popularity using visualizations like bar charts. Identify best-selling and underperforming products. Focus on promoting high-performing products and consider discontinuing or revamping underperforming ones.

2. Improve Customer Behavior Understanding and Segmentation:

- **Customer Purchase Frequency**: The histogram of customer purchase frequency revealed different customer segments (frequent, occasional, one-time buyers). Tailor marketing strategies for each segment. Implement loyalty programs for frequent buyers, offer targeted promotions to occasional buyers, and re-engage one-time buyers with personalized campaigns.
- **Regional Preferences**: The stacked bar chart highlighted regional variations in sales. Customize product offerings, promotions, and marketing messages based on customer preferences and cultural factors in different regions.

3. Optimize Pricing Strategies:

- **Quantity-Based Pricing**: Since 'Quantity' has a strong influence on 'TotalPrice', consider implementing volume discounts or bundling strategies to encourage larger purchases.
- **Price Elasticity**: The weak relationship between 'UnitPrice' and 'TotalPrice' suggests some pricing flexibility. Conduct experiments with price adjustments on specific products to understand their impact on sales volume and revenue.

4. Streamline Inventory Management:

- **Demand Forecasting**: Use historical sales data and trend analysis to predict future demand for products. Optimize stock levels to minimize overstocking and stockouts.
- **Inventory Turnover**: Analyze the speed at which inventory is sold. Identify slow-moving products and consider clearance sales or promotions to reduce inventory holding costs.

5. Enhance Customer Relationship Management (CRM):

- **Personalized Recommendations**: Use customer purchase data to provide personalized product recommendations. Encourage cross-selling and upselling to increase the average order value.
- **Customer Feedback**: Implement systems to collect customer feedback and address concerns promptly. Improve customer satisfaction and retention through better customer service.
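
As a starting point for the seasonal analysis and demand forecasting suggested above, revenue can be aggregated by month and the peak month identified. This is a minimal sketch on made-up data; `InvoiceDate` is an assumed column name.

```python
import pandas as pd

# Toy transactions for illustration only (made-up values).
toy = pd.DataFrame({
    "InvoiceDate": pd.to_datetime([
        "2011-09-12", "2011-10-03", "2011-10-21",
        "2011-11-07", "2011-11-15", "2011-11-28", "2011-12-05",
    ]),
    "TotalPrice": [120.0, 90.0, 150.0, 200.0, 180.0, 220.0, 130.0],
})

# Sum revenue per calendar month.
monthly_revenue = toy.groupby(toy["InvoiceDate"].dt.to_period("M"))["TotalPrice"].sum()

peak_month = monthly_revenue.idxmax()
revenue_share = monthly_revenue / monthly_revenue.sum()

print(monthly_revenue)
print(f"Peak month: {peak_month} ({revenue_share.loc[peak_month]:.1%} of revenue)")
```

On the real dataset, the same monthly series could serve as the input to a simple forecast or be compared year over year before inventory decisions are made.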

# **Conclusion**

This analysis of Myntra Gifts Ltd.'s online retail data has revealed valuable insights into customer behavior, purchasing trends, and product performance. By leveraging these insights, the company can achieve its business objectives of enhancing marketing strategies, optimizing pricing and inventory management, and improving customer satisfaction. Key findings include:

- Seasonal sales trends with peaks in November, and opportunities to optimize inventory accordingly.
- Strong correlation between quantity and total price, suggesting potential for volume-based promotions.
- Weak correlation between unit price and total price, suggesting possible flexibility in pricing strategies.
- Diverse customer purchase frequencies, enabling targeted marketing for different segments.
- Regional variations in sales, highlighting the need for customized product offerings and promotions.

By implementing data-driven recommendations aligned with these insights, Myntra Gifts Ltd. can improve its operational efficiency, customer retention, and overall business growth in the dynamic e-commerce landscape. Continuous monitoring of data and agile adaptation to market dynamics will be crucial for sustained success.
