## Data importing

In [22]:
import pandas as pd

raw_sales = pd.read_csv('./data/sales_data.csv')

raw_sales.head()

| Unnamed: 0 | order_date | time | aging | customer_id | gender | device_type | customer_login_type | product_category | product | sales | quantity | discount | profit | shipping_cost | order_priority | payment_method |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-01-01 | 10:11:40 | 5.0 | 29317 | Male | Web | Member | Auto & Accessories | Car Media Players | 140.0 | 4.0 | 0.3 | 43.2 | 4.3 | Medium | e_wallet |
| 1 | 2018-01-01 | 22:30:44 | 7.0 | 42270 | Male | Web | Member | Auto & Accessories | Car Pillow & Neck Rest | 231.0 | 5.0 | 0.1 | 139.5 | 13.9 | High | money_order |
| 2 | 2018-01-01 | 21:55:31 | 10.0 | 14563 | Male | Web | Member | Auto & Accessories | Car Speakers | 211.0 | 5.0 | 0.1 | 120.5 | 12.0 | High | credit_card |
| 3 | 2018-01-01 | 13:57:15 | 9.0 | 58601 | Male | Web | Member | Auto & Accessories | Tyre | 250.0 | 4.0 | 0.2 | 150.0 | 15.0 | Critical | credit_card |
| 4 | 2018-01-01 | 15:17:41 | 2.0 | 48342 | Male | Web | Member | Auto & Accessories | Tyre | 250.0 | 1.0 | 0.1 | 165.0 | 16.5 | High | credit_card |
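
The `Unnamed: 0` column is how pandas labels a column whose header cell is empty, which usually means the CSV was saved together with its row index. Assuming that is the case for `sales_data.csv` (the file itself is not shown here), passing `index_col=0` to `read_csv` reuses that column as the index instead of carrying it as data. A minimal sketch on a made-up two-row CSV:

```python
import io

import pandas as pd

# A tiny stand-in for sales_data.csv (made-up rows): the first header cell is
# empty, which is what DataFrame.to_csv() writes for an unnamed index.
csv_text = """,order_date,customer_id
0,2018-01-01,29317
1,2018-01-01,42270
"""

# Default read: the empty header becomes a regular column named "Unnamed: 0".
with_extra_col = pd.read_csv(io.StringIO(csv_text))
print(with_extra_col.columns.tolist())  # ['Unnamed: 0', 'order_date', 'customer_id']

# index_col=0 turns that first column into the row index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(clean.columns.tolist())  # ['order_date', 'customer_id']
```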


## Data transforming

In [29]:
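sales = raw_sales.copy()  # work on a copy so raw_sales stays untouched
# Parse order dates into datetime64 values
sales['order_date'] = pd.to_datetime(sales['order_date'])
# The time of day is not used in this analysis
sales = sales.drop('time', axis=1)
# Order value as sales x quantity; the result is NaN if either input is missing
sales['sales_amount'] = sales['sales'] * sales['quantity']
# Abbreviate gender; note that every value other than 'Male' becomes 'F'
sales['gender'] = sales['gender'].apply(lambda g: 'M' if g == 'Male' else 'F')
# Normalize column names to lower case
sales.columns = sales.columns.str.lower()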

sales.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51290 entries, 0 to 51289
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   order_date           51290 non-null  datetime64[ns]
 1   aging                51289 non-null  float64       
 2   customer_id          51290 non-null  int64         
 3   gender               51290 non-null  object        
 4   device_type          51290 non-null  object        
 5   customer_login_type  51290 non-null  object        
 6   product_category     51290 non-null  object        
 7   product              51290 non-null  object        
 8   sales                51289 non-null  float64       
 9   quantity             51288 non-null  float64       
 10  discount             51289 non-null  float64       
 11  profit               51290 non-null  float64       
 12  shipping_cost        51289 non-null  float64       
 13  order_priority       51288 non-
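
`info()` shows that several columns (for example `aging`, `sales`, `quantity`, `discount` and `shipping_cost`) have one or two fewer non-null values than the 51,290 rows. A quick way to list exactly which columns and rows are affected is `isna()`; a sketch on a toy frame with made-up values:

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values) with a few gaps, mimicking the non-null counts above.
df = pd.DataFrame({
    "sales": [140.0, np.nan, 211.0, 250.0],
    "quantity": [4.0, 5.0, np.nan, 1.0],
    "payment_method": ["e_wallet", "money_order", "credit_card", "credit_card"],
})

# Count missing values per column; columns without gaps report 0.
missing_per_column = df.isna().sum()
print(missing_per_column[missing_per_column > 0])

# Rows that have at least one missing value, for manual inspection.
rows_with_gaps = df[df.isna().any(axis=1)]
print(rows_with_gaps)
```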

## Descriptive analysis

### Summary Statistics for Sales Amount

In [31]:
sales['sales_amount'].describe()

count    51287.000000
mean       382.864274
std        303.918182
min         33.000000
25%        149.000000
50%        248.000000
75%        545.000000
max       1250.000000
Name: sales_amount, dtype: float64
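
The count of 51,287 is below the 51,290 rows because `sales_amount` is computed as `sales * quantity`: a missing value in either column yields `NaN`, and `describe()` ignores `NaN`. With 1 missing `sales` value and 2 missing `quantity` values in `info()`, a count of 51,287 is consistent with those gaps falling in three different rows. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up values: one missing sales value and one missing quantity, in different rows.
df = pd.DataFrame({
    "sales": [140.0, np.nan, 211.0, 250.0],
    "quantity": [4.0, 5.0, np.nan, 1.0],
})
df["sales_amount"] = df["sales"] * df["quantity"]

# NaN in either factor makes the product NaN, and describe() skips NaN,
# so its count is the number of rows where both inputs are present.
stats = df["sales_amount"].describe()
print(stats["count"])  # 2.0
```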

### Number of Product Categories and Their Popularity

In [32]:
sales['product_category'].value_counts()

product_category
Fashion               25646
Home & Furniture      15438
Auto & Accessories     7505
Electronic             2701
Name: count, dtype: int64

Fashion is the most popular category (25,646 orders) and Electronic is the least popular (2,701 orders).
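
Raw counts are easier to compare as shares. `value_counts(normalize=True)` returns proportions instead of counts; the sketch below rebuilds a column from the counts above (it is not the original data) and shows that Fashion accounts for about half of all orders and Electronic for roughly 5%:

```python
import pandas as pd

# Rebuild a column from the category counts reported above (one row per order).
category_counts = {"Fashion": 25646, "Home & Furniture": 15438,
                   "Auto & Accessories": 7505, "Electronic": 2701}
product_category = pd.Series(
    [cat for cat, n in category_counts.items() for _ in range(n)],
    name="product_category",
)

# normalize=True returns proportions instead of raw counts; scale to percent.
share_pct = (product_category.value_counts(normalize=True) * 100).round(1)
print(share_pct)
```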

### Top 5 Most Popular Products

In [None]:
sales['product'].value_counts(sort=True, ascending=False).head(5)

product
Suits           2332
Jeans           2332
T - Shirts      2332
Fossil Watch    2332
Shirts          2332
Name: count, dtype: int64

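Suits, Jeans, T - Shirts, Fossil Watch and Shirts are tied as the most frequently ordered products, with 2,332 orders each. Because `value_counts` counts rows, this is the number of orders, not the number of units sold or the revenue.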
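
When the top products share the same count, which ones `head(5)` shows depends on how ties are ordered, and any further product with the same count would be cut off. `Series.nlargest` with `keep="all"` keeps every value tied with the last one; a sketch with made-up counts:

```python
import pandas as pd

# Made-up order counts with a tie at the cut-off (not the real data).
product_counts = pd.Series(
    {"Suits": 5, "Jeans": 5, "Shirts": 5, "Tyre": 5, "Car Speakers": 3},
    name="count",
)

# head(3) shows only three of the four tied products...
print(product_counts.sort_values(ascending=False).head(3))

# ...while nlargest(keep="all") keeps every product tied with the last one.
top = product_counts.nlargest(3, keep="all")
print(top)
```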

### Purchase Frequency by Gender

In [46]:
sales_by_gender = sales['gender'].value_counts()

total_sales = sales.shape[0]

percentage_male_sales = sales_by_gender['M'] / total_sales * 100
percentage_female_sales = sales_by_gender['F'] / total_sales * 100

print(f"Percentage of male sales: {percentage_male_sales:.2f}%")
print(f"Percentage of female sales: {percentage_female_sales:.2f}%")

Percentage of male sales: 54.86%
Percentage of female sales: 45.14%


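Male customers account for a larger share of orders (54.86% vs. 45.14%, a gap of about 9.7 percentage points), so the difference is moderate. These are shares of all orders, not orders per customer, so on their own they do not show that an individual male customer buys more often.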
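
Assuming `customer_id` identifies a unique customer, purchase frequency per customer can be estimated by dividing each gender's order count by its number of distinct customers. A sketch on a made-up order log in which men place more orders overall but fewer per customer:

```python
import pandas as pd

# Toy order log (made-up): customers 1 and 2 are male, customer 3 is female.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3, 3],
    "gender": ["M", "M", "M", "M", "F", "F", "F"],
})

# Share of orders by gender, in percent, as in the cell above.
order_share = orders["gender"].value_counts(normalize=True) * 100

# Orders per customer: number of orders divided by number of distinct customers.
per_gender = orders.groupby("gender").agg(
    orders=("customer_id", "count"),
    customers=("customer_id", "nunique"),
)
per_gender["orders_per_customer"] = per_gender["orders"] / per_gender["customers"]
print(per_gender)
```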

### Most Popular Payment Method

In [48]:
sales['payment_method'].value_counts(sort=True, ascending=False)

payment_method
credit_card    38137
money_order     9629
e_wallet        2789
debit_card       734
not_defined        1
Name: count, dtype: int64

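The most popular payment method is credit card (38,137 orders), followed by money order (9,629) and e-wallet (2,789). Debit card is rarely used (734 orders), and one order has no defined payment method.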
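
Relative to the 51,289 orders with a known payment method, credit cards account for roughly three quarters. A sketch that drops the single `not_defined` placeholder and converts the counts above into percentages:

```python
import pandas as pd

# Payment-method counts from the output above.
payment_counts = pd.Series(
    {"credit_card": 38137, "money_order": 9629, "e_wallet": 2789,
     "debit_card": 734, "not_defined": 1},
    name="count",
)

# Drop the placeholder category before computing shares.
known = payment_counts.drop("not_defined")
share_pct = (known / known.sum() * 100).round(1)
print(share_pct)
```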