## E-COMMERCE SALES ANALYSIS

### PROBLEM STATEMENT

The objective of this project is to analyze e-commerce sales data in order to optimize business strategy and drive revenue growth. We examine the data to identify patterns, trends, and factors that influence sales performance, customer behavior, and product popularity. The findings will support data-driven decisions and actionable recommendations for improving the platform's sales performance and customer satisfaction.

### QUESTIONS

1. What are the best-performing segments and products (by total amount sold)?
2. What is the sales growth over time?
3. How much does each product contribute to profitability (if related datasets are available)?
4. What is the most popular product category in each state?
5. How many orders were cancelled or returned?
6. Which cities have the highest number of orders?
7. Which states have the highest number of orders?
8. Are there any seasonal or temporal trends that significantly affect sales patterns?
9. Which product categories or specific products are top performers in terms of sales volume and revenue?
10. Can we identify customer segments based on purchasing behavior and preferences?
11. Which regions have the highest sales volume? Are there any specific geographic areas with potential for growth?
12. Are there any notable differences in sales performance between B2B and B2C customers?
13. What is the distribution of order statuses? Are there any bottlenecks or areas for improvement in the order fulfillment process?
14. Are there any notable differences in sales performance based on the courier status or the method of shipping?
15. How does the fulfillment method (fulfilled-by) impact customer satisfaction and repeat purchases?
16. How does the order quantity (Qty) affect the average order value and revenue?
17. Is there a correlation between promotional activities (promotion-ids) and sales performance? Which promotions have the highest impact on sales?

### IMPORTING LIBRARIES

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

In [8]:
df = pd.read_csv('Amazon Sale Report.csv')
df.head()

| | index | Order ID | Date | Status | Fulfilment | Sales Channel | ship-service-level | Style | SKU | Category | ... | currency | Amount | ship-city | ship-state | ship-postal-code | ship-country | promotion-ids | B2B | fulfilled-by | Unnamed: 22 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 405-8078784-5731545 | 04-30-22 | Cancelled | Merchant | Amazon.in | Standard | SET389 | SET389-KR-NP-S | Set | ... | INR | 647.62 | MUMBAI | MAHARASHTRA | 400081.0 | IN | NaN | False | Easy Ship | NaN |
| 1 | 1 | 171-9198151-1101146 | 04-30-22 | Shipped - Delivered to Buyer | Merchant | Amazon.in | Standard | JNE3781 | JNE3781-KR-XXXL | kurta | ... | INR | 406.0 | BENGALURU | KARNATAKA | 560085.0 | IN | Amazon PLCC Free-Financing Universal Merchant ... | False | Easy Ship | NaN |
| 2 | 2 | 404-0687676-7273146 | 04-30-22 | Shipped | Amazon | Amazon.in | Expedited | JNE3371 | JNE3371-KR-XL | kurta | ... | INR | 329.0 | NAVI MUMBAI | MAHARASHTRA | 410210.0 | IN | IN Core Free Shipping 2015/04/08 23-48-5-108 | True | NaN | NaN |
| 3 | 3 | 403-9615377-8133951 | 04-30-22 | Cancelled | Merchant | Amazon.in | Standard | J0341 | J0341-DR-L | Western Dress | ... | INR | 753.33 | PUDUCHERRY | PUDUCHERRY | 605008.0 | IN | NaN | False | Easy Ship | NaN |
| 4 | 4 | 407-1069790-7240320 | 04-30-22 | Shipped | Amazon | Amazon.in | Expedited | JNE3671 | JNE3671-TU-XXXL | Top | ... | INR | 574.0 | CHENNAI | TAMIL NADU | 600073.0 | IN | NaN | False | NaN | NaN |


In [9]:
df.shape

(128975, 24)

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 128975 entries, 0 to 128974
Data columns (total 24 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   index               128975 non-null  int64  
 1   Order ID            128975 non-null  object 
 2   Date                128975 non-null  object 
 3   Status              128975 non-null  object 
 4   Fulfilment          128975 non-null  object 
 5   Sales Channel       128975 non-null  object 
 6   ship-service-level  128975 non-null  object 
 7   Style               128975 non-null  object 
 8   SKU                 128975 non-null  object 
 9   Category            128975 non-null  object 
 10  Size                128975 non-null  object 
 11  ASIN                128975 non-null  object 
 12  Courier Status      122103 non-null  object 
 13  Qty                 128975 non-null  int64  
 14  currency            121180 non-null  object 
 15  Amount              121180 non-nul
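
The `df.info()` output lists `Date` as `object`, so it is still a string column. Time-based questions (growth over time, seasonality) usually need a real datetime type. The sketch below assumes the dates follow the `MM-DD-YY` pattern seen in `df.head()` (for example `04-30-22`). The `sample` frame and the `Month` column are illustrative stand-ins, not part of the original notebook.

```python
import pandas as pd

# Small stand-in for the real data; values mimic the MM-DD-YY strings in df.head().
sample = pd.DataFrame({"Date": ["04-30-22", "05-01-22", "06-29-22"]})

# An explicit format avoids ambiguous day/month guessing (assumed format: MM-DD-YY).
sample["Date"] = pd.to_datetime(sample["Date"], format="%m-%d-%y")

# A monthly period column (hypothetical name) is convenient for grouping sales over time.
sample["Month"] = sample["Date"].dt.to_period("M")

print(sample.dtypes)
```

On the full dataset the same call would be applied to `df['Date']`, assuming every row follows this format.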

In [13]:
df.columns

Index(['index', 'Order ID', 'Date', 'Status', 'Fulfilment', 'Sales Channel ',
       'ship-service-level', 'Style', 'SKU', 'Category', 'Size', 'ASIN',
       'Courier Status', 'Qty', 'currency', 'Amount', 'ship-city',
       'ship-state', 'ship-postal-code', 'ship-country', 'promotion-ids',
       'B2B', 'fulfilled-by', 'Unnamed: 22'],
      dtype='object')
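
The column list shows `'Sales Channel '` with a trailing space, which makes `df['Sales Channel']` raise a `KeyError`. A minimal sketch of one way to tidy the names follows. Dropping `index` and `Unnamed: 22` is an assumption (they look like export leftovers), not a step taken in the original notebook, and `sample` is an illustrative stand-in for `df`.

```python
import pandas as pd

# Empty stand-in frame that reuses a few of the real column names.
sample = pd.DataFrame(columns=["index", "Order ID", "Sales Channel ", "Unnamed: 22"])

# Remove leading/trailing whitespace from every column name.
sample.columns = sample.columns.str.strip()

# Assumption: these two columns carry no analytical value and can be dropped.
sample = sample.drop(columns=["index", "Unnamed: 22"], errors="ignore")

print(list(sample.columns))
```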

### DATA CLEANING

In [None]:
((df.isna().sum() / len(df)) * 100).sort_values(ascending=False)

fulfilled-by          69.546811
promotion-ids         38.110487
Unnamed: 22           38.030626
currency               6.043807
Amount                 6.043807
Courier Status         5.328164
ship-country           0.025586
ship-postal-code       0.025586
ship-state             0.025586
ship-city              0.025586
ship-service-level     0.000000
Style                  0.000000
Date                   0.000000
B2B                    0.000000
Status                 0.000000
Fulfilment             0.000000
Sales Channel          0.000000
Qty                    0.000000
Order ID               0.000000
ASIN                   0.000000
Size                   0.000000
Category               0.000000
SKU                    0.000000
index                  0.000000
dtype: float64
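
Counts and percentages of missing values can also be viewed side by side in a single table. This is a small sketch on made-up data; the column names `missing_count` and `missing_pct` are hypothetical and `sample` stands in for `df`.

```python
import numpy as np
import pandas as pd

# Toy data with a few gaps, standing in for the real DataFrame.
sample = pd.DataFrame({
    "Amount": [647.62, np.nan, 329.0, np.nan],
    "Qty": [0, 1, 1, 0],
    "fulfilled-by": ["Easy Ship", np.nan, np.nan, np.nan],
})

# isna().mean() is the fraction of missing rows per column.
missing = pd.DataFrame({
    "missing_count": sample.isna().sum(),
    "missing_pct": sample.isna().mean() * 100,
}).sort_values("missing_pct", ascending=False)

print(missing)
```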

In [49]:
df.isna().sum()

index                     0
Order ID                  0
Date                      0
Status                    0
Fulfilment                0
Sales Channel             0
ship-service-level        0
Style                     0
SKU                       0
Category                  0
Size                      0
ASIN                      0
Courier Status         6872
Qty                       0
currency               7795
Amount                    0
ship-city                33
ship-state               33
ship-postal-code         33
ship-country             33
promotion-ids             0
B2B                       0
fulfilled-by              0
Unnamed: 22           49050
dtype: int64

In [46]:
df['fulfilled-by'] = df['fulfilled-by'].fillna('Unknown')
df['promotion-ids'] = df['promotion-ids'].fillna('Unknown')
df['Amount'] = df['Amount'].fillna(0)

In [47]:
# Set Amount to zero where Qty is 0
df.loc[df['Qty']==0, 'Amount'] = 0
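
Filling missing `Amount` values with 0 makes them indistinguishable from genuine zero-value rows, which can pull averages down. One possible safeguard, sketched here on toy data, is to record which rows were originally missing before filling. The flag column `amount_was_missing` is a hypothetical name and is not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data.
sample = pd.DataFrame({
    "Qty": [0, 1, 2, 1],
    "Amount": [647.62, 406.0, np.nan, 329.0],
})

# Remember which amounts were missing before they are overwritten.
sample["amount_was_missing"] = sample["Amount"].isna()

# Same cleaning steps as in the cells above.
sample["Amount"] = sample["Amount"].fillna(0)
sample.loc[sample["Qty"] == 0, "Amount"] = 0

# Average amount over rows that had a real amount and a non-zero quantity.
billable = sample[(sample["Qty"] > 0) & ~sample["amount_was_missing"]]
avg_amount = billable["Amount"].mean()
print(avg_amount)
```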

In [48]:
df[df['Qty'] == 0].iloc[:, 10:17]

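| | Size | ASIN | Courier Status | Qty | currency | Amount | ship-city |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | S | B09KXVBD7Z | NaN | 0 | INR | 0.0 | MUMBAI |
| 3 | L | B099NRCT7B | NaN | 0 | INR | 0.0 | PUDUCHERRY |
| 8 | 3XL | B08L91ZZXN | Cancelled | 0 | NaN | 0.0 | HYDERABAD |
| 23 | M | B099NK55YG | NaN | 0 | INR | 0.0 | pune |
| 29 | 3XL | B07JG3CND8 | NaN | 0 | NaN | 0.0 | GUWAHATI |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 128903 | M | B09SDZ4FH9 | Cancelled | 0 | NaN | 0.0 | ANANTAPUR |
| 128907 | 3XL | B0928ZT74Y | Cancelled | 0 | NaN | 0.0 | GREATER NOIDA |
| 128908 | 3XL | B0928YCMQP | Cancelled | 0 | NaN | 0.0 | GREATER NOIDA |
| 128958 | L | B07R487XRD | Cancelled | 0 | NaN | 0.0 | Bengaluru |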


In [50]:
df.nunique()

index                 128975
Order ID              120378
Date                      91
Status                    13
Fulfilment                 2
Sales Channel              2
ship-service-level         2
Style                   1377
SKU                     7195
Category                   9
Size                      11
ASIN                    7190
Courier Status             3
Qty                       10
currency                   1
Amount                   867
ship-city               8955
ship-state                69
ship-postal-code        9459
ship-country               1
promotion-ids           5788
B2B                        2
fulfilled-by               2
Unnamed: 22                1
dtype: int64
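
Two things stand out in the `nunique()` output. Columns such as `currency` and `ship-country` hold a single value, so they cannot explain any variation in sales. The `ship-city` count is also likely inflated by inconsistent casing: `pune`, `Bengaluru`, and `BENGALURU` all appear in the outputs above. The sketch below shows one way to handle both on toy data. Whether to drop the constant columns is a judgment call, and `sample` is an illustrative stand-in for `df`.

```python
import pandas as pd

# Toy stand-in that reproduces the casing issue seen in ship-city.
sample = pd.DataFrame({
    "ship-city": ["BENGALURU", "Bengaluru", " pune", "PUNE", "MUMBAI"],
    "currency": ["INR"] * 5,
    "ship-country": ["IN"] * 5,
})

# Normalize whitespace and case so the same city is counted once.
sample["ship-city"] = sample["ship-city"].str.strip().str.upper()

# Columns with at most one distinct non-null value are candidates for removal.
constant_cols = [c for c in sample.columns if sample[c].nunique(dropna=True) <= 1]

print(sample["ship-city"].nunique(), constant_cols)
```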