# <center>Diwali Sales Analysis</center>

## Overview
In this project, we analyzed Diwali sales data using Python and the data analysis libraries **NumPy, Pandas, Matplotlib and Seaborn**. The primary aim was to gain insights into customer behavior, purchasing trends, and the key factors contributing to sales during the festive season. Using exploratory data analysis and data visualization, we examined several aspects of the data to uncover meaningful patterns.

## Project Description
The project involved the following steps:

**Data Import and Preliminary Analysis:**
We started by importing the Diwali sales data from a CSV file with Pandas. We then examined the dataset's shape, first rows, and column types to gain an initial understanding of the available columns.

**Data Cleaning and Preprocessing:**
Before proceeding with the analysis, we cleaned and preprocessed the data. This included dropping unrelated or blank columns, dropping rows with null values, converting the `Amount` column to an integer type, and renaming a column.

**Exploratory Data Analysis:**
The heart of our analysis was the exploratory data analysis phase. We explored gender distribution, age groups of buyers, sales across different states, marital status of customers, occupations, and popular product categories. For each of these, we used count plots and bar charts to show the number of buyers and the total sales amount.

In [1]:
# import python libraries

import numpy as np 
import pandas as pd 
from matplotlib import pyplot as plt
import seaborn as sns

In [2]:
# import csv file
df = pd.read_csv('Diwali_Sales_Data.csv')

In [3]:
df.shape

In [4]:
df.head()

In [5]:
df.info()

In [6]:
#drop unrelated/blank columns
df.drop(['Status', 'unnamed1'], axis=1, inplace=True)

In [7]:
# check for null values (count per column)
df.isnull().sum()

In [8]:
# drop null values
df.dropna(inplace=True)
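
Before dropping rows, it can help to record how many rows the step removes. The following is a minimal sketch on a small made-up frame; the values and the helper name `drop_nulls_with_report` are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd


def drop_nulls_with_report(frame):
    """Return a copy without rows containing nulls, plus the number of rows removed."""
    cleaned = frame.dropna()
    removed = len(frame) - len(cleaned)
    return cleaned, removed


# Made-up illustrative rows; the real data comes from the CSV file.
toy = pd.DataFrame({
    "Gender": ["F", "M", "F", "M"],
    "Amount": [2500.0, np.nan, 1200.0, 800.0],
})

cleaned, removed = drop_nulls_with_report(toy)
print(f"rows before: {len(toy)}, rows removed: {removed}")
```

If the number of removed rows is large relative to the dataset, filling or investigating the missing values may be preferable to dropping them.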

In [9]:
#check data type of Amount column
df['Amount'].dtypes

In [10]:
# change data type
df['Amount'] = df['Amount'].astype('int')

In [11]:
df['Amount'].dtypes
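
Note that `astype('int')` raises an error if the column still contains missing values or non-numeric text, so the conversion above relies on the earlier `dropna` step. A more defensive variant, sketched here on made-up values, converts with `pd.to_numeric` and drops whatever could not be parsed. The sample strings are assumptions for illustration only.

```python
import pandas as pd

# Made-up values: one entry is missing and one is not a number.
raw = pd.DataFrame({"Amount": ["2500.50", None, "n/a", "800"]})

# errors="coerce" turns unparseable entries into NaN instead of raising.
raw["Amount"] = pd.to_numeric(raw["Amount"], errors="coerce")

# Drop the rows that could not be parsed, then truncate to whole numbers.
clean = raw.dropna(subset=["Amount"]).copy()
clean["Amount"] = clean["Amount"].astype("int")

print(clean["Amount"].tolist())
```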

In [12]:
df.columns

In [13]:
#rename column
df.rename(columns= {'Shadi_Status':'Marital_Status'}, inplace=True)

In [14]:
df.columns

In [15]:
# describe() method returns description of the data in the DataFrame (i.e. count, mean, std, etc)
df.describe()

In [16]:
# use describe() for specific columns
df[['Age', 'Orders', 'Amount']].describe()

## Exploratory Data Analysis

### Gender

In [17]:
# plotting a bar chart for Gender and its count

plt.figure(figsize=(5,6))
ax = sns.countplot(x = 'Gender',data = df)
plt.title('Number of Male and Female buyers')

for bars in ax.containers:
    ax.bar_label(bars)

plt.show()

In [18]:
# plotting a bar chart for gender vs total amount

sales_gen = df.groupby(['Gender'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False)

plt.figure(figsize=(5,6))
sns.barplot(x = 'Gender',y= 'Amount' ,data = sales_gen)
plt.title('Gender vs Total Amount')

plt.show()

- *From the above graphs we can see that most of the buyers are female, and that the purchasing power of women (total amount spent) is also greater than that of men.*
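
A total can be higher simply because a group has more buyers, so it can be worth comparing the total with the average amount per row. The sketch below uses a few made-up rows (not the Diwali dataset) with the same `Gender` and `Amount` column names as above.

```python
import pandas as pd

# Made-up illustrative rows, not taken from the real dataset.
toy = pd.DataFrame({
    "Gender": ["F", "F", "F", "M", "M"],
    "Amount": [1000, 2000, 3000, 2500, 2500],
})

# One row per gender: number of rows, total amount, and average amount per row.
summary = (
    toy.groupby("Gender")["Amount"]
    .agg(rows="count", total="sum", average="mean")
    .reset_index()
)

print(summary)
```

In this toy frame the group with the larger total has the smaller average, which is the situation this comparison is meant to catch.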

### Age

In [19]:
plt.figure(figsize=(12,6))
ax = sns.countplot(data = df, x = 'Age Group', hue = 'Gender')
plt.xlabel('Age Group')
plt.ylabel('Number of buyers')
plt.title('Number of buyers from different age groups')

for bars in ax.containers:
    ax.bar_label(bars)
    
plt.show()

In [20]:
# Total Amount vs Age Group

sales_age = df.groupby(['Age Group'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False)
plt.figure(figsize=(12,6))
sns.barplot(x = 'Age Group',y= 'Amount' ,data = sales_age)
plt.xlabel('Age Group')
plt.ylabel('Amount')
plt.title('Total Amount vs Age Group')

plt.show()

- *From the above graphs we can see that most of the buyers are women in the 26-35 age group.*
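
The count plot above can also be read as a table. Here is a minimal sketch with `pd.crosstab`, using made-up rows with the same `Age Group` and `Gender` column names. The age labels other than `26-35` are assumptions for illustration.

```python
import pandas as pd

# Made-up illustrative rows, not taken from the real dataset.
toy = pd.DataFrame({
    "Age Group": ["26-35", "26-35", "26-35", "18-25", "36-45", "36-45"],
    "Gender": ["F", "F", "M", "F", "M", "F"],
})

# Rows are age groups, columns are genders, cells are buyer counts.
counts = pd.crosstab(toy["Age Group"], toy["Gender"])

# Row label of the largest value in the "F" column.
top_female_group = counts["F"].idxmax()

print(counts)
print(top_female_group)
```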

### State

In [21]:
# total number of orders from top 10 states

sales_state = df.groupby(['State'], as_index=False)['Orders'].sum().sort_values(by='Orders', ascending=False).head(10)

plt.figure(figsize=(16,5))
sns.barplot(data = sales_state, x = 'State',y= 'Orders')
plt.xlabel('States')
plt.ylabel('Orders')
plt.title('Total number of orders from top 10 states')

plt.show()

In [22]:
# total amount/sales from top 10 states

sales_state = df.groupby(['State'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False).head(10)

plt.figure(figsize=(12,5))
sns.barplot(data = sales_state, x = 'State',y= 'Amount')
plt.xticks(rotation='vertical')
plt.xlabel('States')
plt.ylabel('Amount')
plt.title('Total Sales from Top 10 states')

plt.show()

- *From the above graphs we can see that most of the orders and total sales come from Uttar Pradesh, Maharashtra and Karnataka.*
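
To put a number on how much the top states contribute, the totals can be expressed as a share of overall sales. This is a sketch on made-up amounts: the first three state names come from the observation above, while `Delhi` and all figures are hypothetical.

```python
import pandas as pd

# Made-up illustrative rows, not taken from the real dataset.
toy = pd.DataFrame({
    "State": ["Uttar Pradesh", "Maharashtra", "Karnataka", "Uttar Pradesh", "Delhi"],
    "Amount": [400, 300, 150, 100, 50],
})

# Total amount per state, largest first.
state_totals = toy.groupby("State")["Amount"].sum().sort_values(ascending=False)

# Each state's fraction of the overall total, and the combined share of the top three.
state_share = state_totals / state_totals.sum()
top3_share = state_share.head(3).sum()

print(state_share.round(2))
print(round(top3_share, 2))
```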


### Marital Status

In [23]:
plt.figure(figsize=(5,6))
ax = sns.countplot(data = df, x = 'Marital_Status')
plt.xlabel('Marital Status')
plt.ylabel('Number of Buyers')
plt.title('Number of Married and Unmarried Customers')

for bars in ax.containers:
    ax.bar_label(bars)
    
plt.show()

In [24]:
# total amount by marital status and gender
sales_marital = df.groupby(['Marital_Status', 'Gender'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False)

plt.figure(figsize= (6,5))
sns.barplot(data = sales_marital, x = 'Marital_Status',y= 'Amount', hue='Gender')
plt.xlabel('Marital Status')
plt.ylabel('Amount')
plt.title('Total amount of sales vs Marital status of both genders')

plt.show()

- *From the above graphs we can see that most of the buyers are married women, and that they have high purchasing power.*

### Occupation

In [25]:
plt.figure(figsize= (12,5))

ax = sns.countplot(data = df, x = 'Occupation')
plt.xticks(rotation='vertical')
plt.xlabel('Occupation')
plt.ylabel('Number of Buyers')
plt.title('Buyer Distribution by Occupation')

for bars in ax.containers:
    ax.bar_label(bars)
    
plt.show()

In [26]:
# total amount by occupation
sales_occupation = df.groupby(['Occupation'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False)

plt.figure(figsize= (12,5))
sns.barplot(data = sales_occupation, x = 'Occupation',y= 'Amount')
plt.xticks(rotation='vertical')
plt.xlabel('Occupation')
plt.ylabel('Amount')
plt.title('Sales by Occupation')

plt.show()

- *From the above graphs we can see that most of the buyers work in the IT, Healthcare and Aviation sectors.*

### Product Category

In [27]:
plt.figure(figsize= (12,5))
ax = sns.countplot(data = df, x = 'Product_Category')
plt.xticks(rotation='vertical')
plt.xlabel('Product Category')
plt.ylabel('Number of buyers')
plt.title('Number of buyers by product category')

for bars in ax.containers:
    ax.bar_label(bars)

plt.show()

In [28]:
# total amount for the top 10 product categories
sales_category = df.groupby(['Product_Category'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False).head(10)

plt.figure(figsize= (12,5))
sns.barplot(data = sales_category, x = 'Product_Category',y= 'Amount')
plt.xticks(rotation='vertical')
plt.xlabel('Product Category')
plt.ylabel('Amount')
plt.title('Sales by product category')


plt.show()


- *From the above graphs we can see that most of the products sold are from the Food, Clothing and Electronics categories.*

In [29]:
# top 10 most sold products by number of orders (same approach as above, grouped by Product_ID)
top_products = df.groupby(['Product_ID'], as_index=False)['Orders'].sum().sort_values(by='Orders', ascending=False).head(10)

plt.figure(figsize= (12,5))
sns.barplot(data = top_products, x = 'Product_ID',y= 'Orders')
plt.xlabel('Product ID')
plt.ylabel('Number of Orders')
plt.title('Number of Orders by product ID')

plt.show()

- *From this graph we can see which ten product IDs received the most orders.*
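
Most cells above repeat the same pattern: group by one column, sum another, sort in descending order, and keep the first rows. A small helper can capture that pattern. The function name `top_n_by` and the sample rows below are hypothetical, shown only as a sketch.

```python
import pandas as pd


def top_n_by(frame, key, value, n=10):
    """Sum `value` per `key` and return the n largest groups as a DataFrame."""
    return (
        frame.groupby(key, as_index=False)[value]
        .sum()
        .sort_values(by=value, ascending=False)
        .head(n)
    )


# Made-up rows with the same column names as the notebook.
toy = pd.DataFrame({
    "Product_ID": ["P1", "P2", "P1", "P3", "P2", "P1"],
    "Orders": [2, 1, 3, 4, 1, 1],
})

top_products = top_n_by(toy, "Product_ID", "Orders", n=2)
print(top_products)
```

The returned frame can be passed to `sns.barplot` as `data`, in the same way as the grouped frames above.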

## Key Observations:

- Most of the buyers were females, and their purchasing power surpassed that of men.
- Buyers in the age group of 26-35 years, especially females, constituted a significant portion of the customer base.
- Uttar Pradesh, Maharashtra, and Karnataka were the top states contributing to the majority of orders and total sales.
- Married women demonstrated a higher purchasing power compared to other demographics.
- Occupations in IT, Healthcare, and Aviation sectors were prominent among buyers.
- The most sold product categories included Food, Clothing, and Electronics.
- Top sold products:
  We identified the top 10 most sold products based on the number of orders, which indicates customer preferences and popular product choices.

## Conclusions:
The Diwali Sales Analysis project provided valuable insights into customer behavior and sales trends during the festive season. The data revealed patterns that can be leveraged to optimize marketing strategies, target specific customer demographics, and tailor product offerings. Understanding the preferences of different age groups, genders, and regions can assist businesses in making informed decisions to enhance their sales and profitability.

The project showcased the power of data analysis and visualization in extracting meaningful information from raw data. By applying data science techniques, we transformed raw sales data into actionable insights, contributing to effective decision-making and strategic planning.