Python Project (Diwali Sales Analysis)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

In [None]:
df = pd.read_csv(r'D:\Projects\Diwali_Sales_Data.csv', encoding = 'unicode_escape')

In [None]:
df.shape

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
#--getting info of the dataset to view Null-counts and Data type
df.info()

In [None]:
#--dropping the columns that contain only null values
df.drop(['Status', 'unnamed1'], axis=1, inplace = True)

In [None]:
df.info()

In [None]:
#--viewing total null values in each column
pd.isnull(df).sum()

In [None]:
#--Removing all the null values
df.dropna(inplace = True)

In [None]:
df.shape

In [None]:
#--Changing the datatype of "Amount" column from 'float' to 'int'
df['Amount'] = df['Amount'].astype('int')

In [None]:
df['Amount'].dtypes

In [None]:
#--getting a statistical description of the key numeric columns
df[['Age', 'Orders', 'Amount']].describe()

Exploratory Data Analysis (EDA)

In [None]:
df.columns

____Gender Based Insights____

In [None]:
#--Total Male and Female count

ax = sns.countplot(x= 'Gender', hue = 'Gender', data = df, palette= 'Set2', legend = False)

for bars in ax.containers:
    ax.bar_label(bars)

plt.show()

In [None]:
#--Total Amount spent by Males and Females
sns.set(rc = {'figure.figsize': (7, 6)})
sales_gen = df.groupby(['Gender'], as_index = False)['Amount'].sum().sort_values(by='Amount', ascending = False)
sns.barplot(x = 'Gender', y = 'Amount', hue = 'Gender', data = sales_gen, palette = 'Set1' )
plt.show()

The dataset indicates that female buyers make up the larger share of the customer base.
Their total expenditure is also higher than that of male buyers, which suggests that women account for most of the spending.
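
A bar chart shows totals, but a higher total can come from more purchases rather than larger ones. As a rough check, the count, total, and average amount per gender could be tabulated side by side. The sketch below uses a small invented frame (`toy`) rather than the real data, and the output column names `records`, `total`, and `average` are illustrative choices.

```python
import pandas as pd

# Toy stand-in for the sales data; the values are invented for illustration.
toy = pd.DataFrame({
    'Gender': ['F', 'F', 'F', 'M', 'M'],
    'Amount': [100, 200, 300, 400, 100],
})

# Named aggregation: one row per gender with row count, total and mean amount.
summary = toy.groupby('Gender')['Amount'].agg(
    records='count',
    total='sum',
    average='mean',
)
print(summary)
```

In this toy frame the female group has the larger total while the male group has the larger average, which is why it helps to look at both columns before drawing conclusions about spending per purchase.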


____Age Based Insights____

In [None]:
#--Total count by age group

ax = sns.countplot(x = 'Age Group', data = df, hue = 'Gender')

for bars in ax.containers:
    ax.bar_label(bars)

plt.show()

In [None]:
#--Expenditure by age group

sales_age = df.groupby(['Age Group'], as_index = False)['Amount'].sum().sort_values(by = 'Amount', ascending = False)
sns.barplot(x = 'Age Group', y = 'Amount', data = sales_age, hue = 'Age Group')

plt.show()

The graphs above indicate that most buyers are women in the 26-35 age group.
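
The largest age and gender segment can also be read off programmatically instead of from the bar labels. A minimal sketch, assuming the same `Age Group` and `Gender` column names as the dataset, on an invented frame:

```python
import pandas as pd

# Toy stand-in for the sales data; the values are invented for illustration.
toy = pd.DataFrame({
    'Age Group': ['26-35', '26-35', '26-35', '18-25', '36-45'],
    'Gender':    ['F',     'F',     'M',     'F',     'M'],
})

# Number of rows in each (age group, gender) combination.
counts = toy.groupby(['Age Group', 'Gender']).size()

# idxmax on a MultiIndex Series returns the index tuple of the largest value.
top_segment = counts.idxmax()
print(top_segment, counts.max())
```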

____State Based Insights____

In [None]:
#--total number of orders from top 10 states

sales_state = df.groupby(['State'], as_index = False)['Orders'].sum().sort_values(by = 'Orders', ascending = False).head(10)
sns.set(rc ={'figure.figsize' : (15, 5)})
sns.barplot(data = sales_state, x = 'State', y = 'Orders', hue = 'State')
plt.show()

In [None]:
#--total amount spent by top 10 states

sales_state = df.groupby(['State'], as_index = False)['Amount'].sum().sort_values(by = 'Amount', ascending = False).head(10)
sns.set(rc ={'figure.figsize' : (15, 5)})
sns.barplot(data = sales_state, x = 'State', y = 'Amount', hue = 'State')
plt.show()

The majority of orders were received from Uttar Pradesh (UP), followed by Maharashtra and Karnataka.
Kerala ranks 8th in terms of order volume, but does not appear in the top 10 states by expenditure.
Interestingly, Haryana and Bihar, despite having fewer orders, show higher expenditure per order compared to Kerala.
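
The expenditure-per-order comparison can be made explicit by dividing total amount by total orders for each state. The following is a sketch on invented numbers, not the dataset's actual figures; `Amount_per_Order` is a hypothetical column name introduced here.

```python
import pandas as pd

# Toy stand-in for the sales data; the values are invented for illustration.
toy = pd.DataFrame({
    'State':  ['Kerala', 'Kerala', 'Haryana', 'Bihar'],
    'Orders': [4, 4, 2, 2],
    'Amount': [800, 800, 1000, 600],
})

# Total orders and total amount per state.
per_state = toy.groupby('State', as_index=False)[['Orders', 'Amount']].sum()

# Average amount spent per order in each state.
per_state['Amount_per_Order'] = per_state['Amount'] / per_state['Orders']

ranked = per_state.sort_values(by='Amount_per_Order', ascending=False)
print(ranked)
```

Here the toy Kerala rows have the most orders but the lowest amount per order, mirroring the pattern described above.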


____Marital status based Insights____

In [None]:
#--Total married(0) and unmarried(1) count

#--set the figure size before plotting so it applies to this figure
sns.set(rc= {'figure.figsize':(5,8)})
ax = sns.countplot(data = df, x = 'Marital_Status')

for bars in ax.containers:
    ax.bar_label(bars)
plt.show()

In [None]:
#--Amount spent by Gender and Marital Status

sales_marital = df.groupby(['Marital_Status', 'Gender'], as_index = False)['Amount'].sum().sort_values(by = 'Amount', ascending = False)
sns.set(rc = {'figure.figsize': (3,5)})
sns.barplot(data = sales_marital, x = 'Marital_Status', y = 'Amount', hue = 'Gender')
plt.show()

The data shows that married women form the largest buyer segment.
This group also accounts for the highest total spend, making it a key demographic for targeted marketing.
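
To quantify how much each segment contributes, the grouped totals could be converted to a percentage of overall spend. A minimal sketch on invented numbers, assuming (as in the cell comment above) that `Marital_Status` 0 means married:

```python
import pandas as pd

# Toy stand-in for the sales data; the values are invented for illustration.
toy = pd.DataFrame({
    'Marital_Status': [0, 0, 1, 1],   # 0 = married, per the notebook's comment
    'Gender':         ['F', 'M', 'F', 'M'],
    'Amount':         [500, 200, 200, 100],
})

# Total amount per (marital status, gender) segment.
segment_total = toy.groupby(['Marital_Status', 'Gender'])['Amount'].sum()

# Each segment's share of the overall amount, in percent.
segment_share = (segment_total / segment_total.sum() * 100).round(1)
print(segment_share)
```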


____Occupation based Insights____

In [None]:
#--Count by Occupation

sns.set(rc = {'figure.figsize': (20,7)})
ax = sns.countplot(data = df, x = 'Occupation', hue = 'Occupation')
for bars in ax.containers:
    ax.bar_label(bars)

plt.show()

In [None]:
#--Amount spent by Occupation

sales_occupation = df.groupby(['Occupation'], as_index = False)['Amount'].sum().sort_values(by = 'Amount', ascending = False)

sns.set(rc = {'figure.figsize': (20,5)})
sns.barplot(data = sales_occupation, x = 'Occupation', y = 'Amount', hue = 'Occupation')
plt.show()

Most buyers are employed in the IT sector, followed by Healthcare and Aviation.
These industries represent the highest concentration of active buyers in the dataset.
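
The concentration of buyers per occupation can be expressed as a percentage with `value_counts(normalize=True)`. A short sketch on an invented column of occupations (the labels are placeholders, not necessarily the dataset's exact spellings):

```python
import pandas as pd

# Toy stand-in for the Occupation column; the values are invented.
occupation = pd.Series(
    ['IT Sector', 'IT Sector', 'Healthcare', 'Aviation'],
    name='Occupation',
)

# Fraction of rows per occupation, scaled to percent and sorted descending.
occupation_share = occupation.value_counts(normalize=True).mul(100)
print(occupation_share)
```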


In [None]:
df.columns

____Insights based on Product Category____

In [None]:
#--Products sold by Category

sns.set(rc= {'figure.figsize': (24,5)})
ax = sns.countplot(data = df, x = 'Product_Category', hue = 'Product_Category')

for bars in ax.containers:
    ax.bar_label(bars)

plt.show()

In [None]:
#--Revenue generated by Product Category

sales_category = df.groupby(['Product_Category'], as_index = False)['Amount'].sum().sort_values(by = 'Amount', ascending = False).head(10)
sns.set(rc= {'figure.figsize': (30,10)})
sns.barplot(data = sales_category, x = 'Product_Category', y = 'Amount', hue = 'Product_Category', palette = 'Set2', legend = False)
plt.show()

Top selling product categories are Food, Clothing and Electronics.

In [None]:
#--Top 10 most selling products by Product_ID

sales_product = df.groupby(['Product_ID'], as_index=False)['Orders'].sum().sort_values(by = 'Orders', ascending = False).head(10)
sns.set(rc={'figure.figsize': (15,5)})
sns.barplot(data = sales_product, x = 'Product_ID', y= 'Orders')
plt.show()

Conclusion

Based on the analysis, the most likely buyers can be profiled as:

Married women in the 26-35 age group
Residing in Uttar Pradesh, Maharashtra, and Karnataka
Working in IT, Healthcare, or Aviation sectors
Primarily purchasing products from Food, Clothing, and Electronics categories
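
The profile above combines findings from separate charts, so it may be worth checking how much of the total amount the combined segment actually covers. The sketch below builds a boolean mask for part of the profile on an invented frame; `top_states` and `profile` are hypothetical names, and `Marital_Status` 0 is assumed to mean married, following the earlier cell comment.

```python
import pandas as pd

# Toy stand-in for the sales data; the values are invented for illustration.
toy = pd.DataFrame({
    'Gender':         ['F', 'F', 'M', 'F'],
    'Marital_Status': [0, 0, 0, 1],   # 0 = married, per the notebook's comment
    'Age Group':      ['26-35', '26-35', '26-35', '18-25'],
    'State':          ['Uttar Pradesh', 'Kerala', 'Maharashtra', 'Karnataka'],
    'Amount':         [400, 100, 300, 200],
})

top_states = ['Uttar Pradesh', 'Maharashtra', 'Karnataka']

# Rows matching every condition of the buyer profile at once.
profile = (
    (toy['Gender'] == 'F')
    & (toy['Marital_Status'] == 0)
    & (toy['Age Group'] == '26-35')
    & toy['State'].isin(top_states)
)

# Fraction of the total amount that comes from the profiled segment.
profile_share = toy.loc[profile, 'Amount'].sum() / toy['Amount'].sum()
print(profile.sum(), profile_share)
```

Occupation and product category filters could be added to the mask in the same way with further `isin` conditions.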
