# Exploratory Data Analysis (EDA) on Retail Dataset

## Objectives:
- Understand and summarize the structure of raw retail sales data.
- Identify sales trends, patterns, and seasonality using graphs.
- Present key KPIs such as Total Sales, Revenue per Category, and Average Order Value.

## Tools:
- Python (Pandas, Matplotlib, Seaborn)

## Deliverable:
- Jupyter Notebook with Visualizations and Explanations

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Load dataset
df = pd.read_csv('Retail Sales Analysis_utf.csv')

## Data Cleaning
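Fill missing numeric values with each column's median, parse `sale_date` as a datetime, and derive `month` and `day_of_week` columns for the time-based charts.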

In [None]:
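# Impute missing numeric values with each column's median.
# Assigning the result back (instead of inplace=True on a column selection)
# avoids pandas chained-assignment warnings and works under copy-on-write.
# 'quantiy' is kept as written; it must match the CSV header exactly.
numeric_cols = ['age', 'quantiy', 'price_per_unit', 'cogs', 'total_sale']
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Parse dates and derive the grouping columns used in later cells
df['sale_date'] = pd.to_datetime(df['sale_date'])
df['month'] = df['sale_date'].dt.to_period('M')    # year-month period, e.g. 2023-01
df['day_of_week'] = df['sale_date'].dt.day_name()  # e.g. 'Monday'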
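Before imputing, it can help to see how many values are actually missing in each column. The following is a minimal sketch on toy data, not the retail dataset itself; the helper name `fill_numeric_with_median` and the toy values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def fill_numeric_with_median(frame, columns):
    """Return a copy of `frame` with NaNs in `columns` replaced by each column's median."""
    out = frame.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].median())
    return out


# Toy data standing in for the retail dataset (values are illustrative only)
toy = pd.DataFrame({
    'age': [25.0, np.nan, 40.0, 31.0],
    'total_sale': [100.0, 250.0, np.nan, 50.0],
})

print(toy.isna().sum())       # missing values per column before imputation
cleaned = fill_numeric_with_median(toy, ['age', 'total_sale'])
print(cleaned.isna().sum())   # missing values per column after imputation
```

Median imputation is robust to outliers, but it also shrinks the variance of the column, so the count of imputed values is worth noting alongside any KPI.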

## Dataset Overview

In [None]:
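from IPython.display import display

df.info()               # column dtypes and non-null counts (prints directly)
display(df.describe())  # summary statistics; only a cell's last expression renders on its own
df.head()               # first five rows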

## Total Sales Over Time

In [None]:
monthly_sales = df.groupby('month')['total_sale'].sum()
monthly_sales.plot(kind='line', figsize=(10,6), title='Total Sales by Month')
plt.ylabel('Total Sales')
plt.xlabel('Month')
plt.show()
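The monthly line chart shows the overall trend. To look at seasonality specifically, one option is to average the monthly totals by calendar month across years. This is a sketch on toy data, assuming the dataset spans more than one year; the variable names `per_month` and `seasonal_profile` are illustrative.

```python
import pandas as pd

# Toy data standing in for the retail dataset (values are illustrative only)
toy = pd.DataFrame({
    'sale_date': pd.to_datetime([
        '2022-01-15', '2022-12-20', '2023-01-10', '2023-12-05', '2023-12-25',
    ]),
    'total_sale': [100.0, 300.0, 200.0, 400.0, 100.0],
})

# Total sales per year-month, as in the chart above
per_month = toy.groupby(toy['sale_date'].dt.to_period('M'))['total_sale'].sum()

# Average those totals for each calendar month (1 = January ... 12 = December)
seasonal_profile = per_month.groupby(per_month.index.month).mean()
print(seasonal_profile)
```

A calendar month that stays high in `seasonal_profile` across several years suggests a seasonal effect rather than a one-off spike.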

## Revenue per Category

In [None]:
category_sales = df.groupby('category')['total_sale'].sum().sort_values(ascending=False)
sns.barplot(x=category_sales.index, y=category_sales.values)
plt.title('Revenue per Category')
plt.ylabel('Total Sales')
plt.xlabel('Category')
plt.show()

## Average Order Value

In [None]:
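# Treats each row as one order; if an order can span several rows,
# aggregate per order before averaging.
aov = df['total_sale'].mean()
print(f'Average Order Value: ₹{aov:.2f}')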
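The mean of `total_sale` equals the Average Order Value only when every row is a single order. If orders can span several rows, the totals need to be summed per order first. The sketch below uses toy data and a hypothetical `order_id` column, which is an assumption and may not exist under that name in the retail dataset.

```python
import pandas as pd

# Toy data: order 1 has two line items ('order_id' is a hypothetical column)
toy = pd.DataFrame({
    'order_id': [1, 1, 2, 3],
    'total_sale': [50.0, 150.0, 100.0, 300.0],
})

# Sum line items into one total per order, then average the order totals
order_totals = toy.groupby('order_id')['total_sale'].sum()
aov_per_order = order_totals.mean()

# Row-level mean, for comparison
row_mean = toy['total_sale'].mean()

print(f'AOV per order: {aov_per_order:.2f}')
print(f'Row-level mean: {row_mean:.2f}')
```

When the two numbers differ, the row-level mean understates the value of a typical order.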

## Sales by Gender

In [None]:
gender_sales = df.groupby('gender')['total_sale'].sum()
sns.barplot(x=gender_sales.index, y=gender_sales.values)
plt.title('Sales by Gender')
plt.ylabel('Total Sales')
plt.xlabel('Gender')
plt.show()

## Sales by Day of Week

In [None]:
dow_sales = df.groupby('day_of_week')['total_sale'].sum()
dow_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
dow_sales = dow_sales.reindex(dow_order)
sns.barplot(x=dow_sales.index, y=dow_sales.values)
plt.title('Sales by Day of Week')
plt.ylabel('Total Sales')
plt.xlabel('Day of Week')
plt.xticks(rotation=45)
plt.show()
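An alternative to reindexing after the groupby is to make `day_of_week` an ordered categorical, so that every grouping and plot follows weekday order automatically. This is a sketch on toy data; the values are illustrative only.

```python
import pandas as pd

dow_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

# Toy data standing in for the retail dataset
toy = pd.DataFrame({
    'day_of_week': ['Sunday', 'Monday', 'Monday', 'Friday'],
    'total_sale': [10.0, 20.0, 30.0, 40.0],
})

# An ordered categorical sorts by weekday position instead of alphabetically
toy['day_of_week'] = pd.Categorical(
    toy['day_of_week'], categories=dow_order, ordered=True
)

# observed=False keeps days with no sales in the result (their sum is 0)
dow_totals = toy.groupby('day_of_week', observed=False)['total_sale'].sum()
print(dow_totals)
```

Unlike `reindex`, which leaves `NaN` for days with no rows, this approach reports a total of 0 for them, which keeps every weekday on the bar chart.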