# 🛍️ Sales Data Analysis Dashboard
### Python Exploratory Data Analysis (EDA)
This notebook cleans and explores the sales dataset and prepares insights for a Power BI dashboard.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Style name used by matplotlib >= 3.6 (older versions call it 'seaborn-whitegrid')
plt.style.use('seaborn-v0_8-whitegrid')

In [None]:
# Load dataset
df = pd.read_csv('../data/sales_data.csv')
df.head()
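The later cells assume particular column names (`Date`, `Region`, `Product`, `Product Category`, `Sales`, `Profit`). Below is a minimal sketch of a load step that fails early with a clear message when one is missing. The helper `load_sales`, the `REQUIRED_COLUMNS` list and the two-row in-memory CSV are hypothetical stand-ins for `../data/sales_data.csv`, not part of the original notebook.

```python
import io

import pandas as pd

# Hypothetical list: the columns the later cells of this notebook rely on
REQUIRED_COLUMNS = ["Date", "Region", "Product", "Product Category", "Sales", "Profit"]


def load_sales(source) -> pd.DataFrame:
    """Hypothetical helper: read the CSV and fail early if an expected column is absent."""
    frame = pd.read_csv(source)
    missing = [col for col in REQUIRED_COLUMNS if col not in frame.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    # Convert once here so downstream cells can use the .dt accessor directly
    frame["Date"] = pd.to_datetime(frame["Date"])
    return frame


# In-memory stand-in for '../data/sales_data.csv' (illustrative rows only)
csv_text = """Date,Region,Product,Product Category,Sales,Profit
2023-01-05,South,Laptop,Electronics,1200,180
2023-02-11,North,Desk,Furniture,300,45
"""
df = load_sales(io.StringIO(csv_text))
print(df.dtypes)
```

With a real file, `load_sales('../data/sales_data.csv')` would replace the plain `read_csv` call, since `pd.read_csv` accepts either a path or a file-like object.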

## 🧹 Data Cleaning

In [None]:
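# Inspect column types and missing values, then remove duplicate rows
df.info()  # info() prints its own report and returns None, so it is not wrapped in print()
print('\nMissing values per column:')
print(df.isna().sum())
print(f'\nDuplicate rows: {df.duplicated().sum()}')
df.drop_duplicates(inplace=True)
df.shape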
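The same checks can be condensed into a single report that gives missing values as a percentage of rows and counts the duplicates that `drop_duplicates` removes. This is a minimal sketch on a small hypothetical frame; the helper name `cleaning_report` is illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd


def cleaning_report(frame: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series, int]:
    """Hypothetical helper: return the de-duplicated frame, % missing per column,
    and the number of duplicate rows removed."""
    missing_pct = frame.isna().mean() * 100   # share of NaN/None per column, as a percentage
    n_dupes = int(frame.duplicated().sum())   # rows identical to an earlier row
    return frame.drop_duplicates(), missing_pct, n_dupes


toy = pd.DataFrame({
    "Region": ["South", "South", "North", None],
    "Sales": [100.0, 100.0, 250.0, np.nan],
})
clean, missing_pct, n_dupes = cleaning_report(toy)
print(missing_pct)   # Region and Sales are each 25.0 % missing
print(n_dupes)       # the second "South, 100.0" row is the only duplicate
print(clean.shape)
```

Percentages are usually easier to act on than raw counts: a column that is 25 % empty may need imputation or exclusion before it feeds a dashboard.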

## 📊 Basic Data Overview

In [None]:
# Summary statistics
df.describe()

In [None]:
# Convert Date column to datetime and extract month/year
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.strftime('%b')  # 'Jan', 'Feb', ...; the abbreviation follows the system locale
df['Year'] = df['Date'].dt.year
df.head()
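Because `strftime('%b')` produces locale-dependent labels, a later reindex on English month names could silently return NaN on a machine with a different locale. Below is a minimal sketch of a locale-independent alternative, assuming the same `Date` column: aggregate on the numeric month and keep the text label only for display. The `MonthNum` column name and the sample rows are hypothetical.

```python
import pandas as pd

sample = pd.DataFrame({
    "Date": pd.to_datetime(["2023-11-05", "2023-01-20", "2024-02-14"]),
    "Sales": [500.0, 120.0, 80.0],
})
sample["MonthNum"] = sample["Date"].dt.month          # 1-12, independent of locale
sample["Month"] = sample["Date"].dt.strftime("%b")    # display label only

# Aggregate on the numeric month (which sorts in calendar order), then attach the label
by_month = (
    sample.groupby("MonthNum")
    .agg(Month=("Month", "first"), Sales=("Sales", "sum"))
    .sort_index()
)
print(by_month)
```

Sorting by `MonthNum` puts the months in calendar order without a hard-coded list of names.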

## 📈 Exploratory Data Analysis

In [None]:
# Total Sales and Profit by Region
region_summary = df.groupby('Region')[['Sales', 'Profit']].sum().sort_values('Sales', ascending=False)
print(region_summary)
region_summary.plot(kind='bar', figsize=(8,5), title='Total Sales & Profit by Region')
plt.ylabel('Amount')
plt.show()
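Raw regional totals are easier to compare when they are also shown as proportions. Below is a sketch that adds each region's share of total sales and its overall margin (summed profit ÷ summed sales) to a `region_summary`-style table. The figures are invented for illustration and do not come from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for region_summary (Sales and Profit summed per region)
region_summary = pd.DataFrame(
    {"Sales": [400.0, 300.0, 200.0, 100.0], "Profit": [40.0, 45.0, 10.0, 5.0]},
    index=pd.Index(["South", "East", "West", "North"], name="Region"),
)
# Share of all sales contributed by each region (column sums to 100)
region_summary["Sales Share %"] = region_summary["Sales"] / region_summary["Sales"].sum() * 100
# Margin computed from the regional totals, not averaged over individual rows
region_summary["Margin %"] = region_summary["Profit"] / region_summary["Sales"] * 100
print(region_summary.round(1))
```

In this toy table the region with the highest sales (South) does not have the highest margin (East). A share column makes that kind of contrast visible next to the bar chart.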

In [None]:
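# Monthly Sales Trend (each calendar month is summed across all years in the data)
monthly_sales = df.groupby('Month')['Sales'].sum()
# groupby sorts month names alphabetically, so reindex them into calendar order
monthly_sales = monthly_sales.reindex(['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'])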
monthly_sales.plot(kind='line', marker='o', figsize=(8,5), title='Monthly Sales Trend')
plt.ylabel('Sales Amount')
plt.show()
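Grouping on the month name alone adds every January to every other January in the file. For a multi-year dataset, the chart therefore shows a seasonal profile rather than a timeline. If a chronological trend is wanted, the sketch below groups on year-month periods instead. It uses toy data and assumes `Date` has already been converted to datetime.

```python
import pandas as pd

sales = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-10", "2023-01-25", "2023-12-02", "2024-01-15"]),
    "Sales": [100.0, 50.0, 300.0, 80.0],
})

# to_period("M") keeps the year, so January 2023 and January 2024 stay separate
trend = sales.groupby(sales["Date"].dt.to_period("M"))["Sales"].sum()
print(trend)

# Grouping on the month alone (as the cell above does with month names) merges them
by_month_only = sales.groupby(sales["Date"].dt.month)["Sales"].sum()
print(by_month_only)
```

Months with no sales do not appear in `trend`. If the line chart should show explicit gaps or zeros, those periods would need to be added before plotting.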

In [None]:
# Top 10 Products by Total Sales
top_products = df.groupby('Product')['Sales'].sum().sort_values(ascending=False).head(10)
sns.barplot(x=top_products.values, y=top_products.index)
plt.title('Top 10 Products by Sales')
plt.xlabel('Total Sales')
plt.show()

In [None]:
# Profit Margin by Product Category (rows with zero sales get NaN instead of an infinite margin)
df['Profit Margin %'] = df['Profit'] / df['Sales'].replace(0, np.nan) * 100
category_margin = df.groupby('Product Category')['Profit Margin %'].mean().sort_values(ascending=False)
sns.barplot(x=category_margin.values, y=category_margin.index)
plt.title('Average Profit Margin by Product Category')
plt.xlabel('Profit Margin (%)')
plt.show()
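The cell above averages row-level margins, so a small order with an extreme margin counts as much as a large one. Below is a sketch contrasting that unweighted mean with a revenue-weighted margin (total profit ÷ total sales per category). The data is hypothetical, and zero-sales rows are set to NaN as in the cell above.

```python
import numpy as np
import pandas as pd

orders = pd.DataFrame({
    "Product Category": ["Electronics", "Electronics", "Furniture", "Furniture"],
    "Sales": [1000.0, 10.0, 500.0, 0.0],
    "Profit": [100.0, 5.0, 100.0, -20.0],
})
# Treat zero sales as missing so the row margin is NaN rather than inf
safe_sales = orders["Sales"].replace(0, np.nan)
orders["Profit Margin %"] = orders["Profit"] / safe_sales * 100

# Unweighted: mean of per-row margins (what the chart above plots); NaN rows are skipped
unweighted = orders.groupby("Product Category")["Profit Margin %"].mean()

# Weighted: aggregate profit and sales per category first, then divide
totals = orders.groupby("Product Category")[["Profit", "Sales"]].sum()
weighted = totals["Profit"] / totals["Sales"] * 100

print(pd.DataFrame({"unweighted": unweighted, "weighted": weighted}).round(2))
```

Here the single 10-unit Electronics order lifts the unweighted mean to 30 %, while the weighted margin stays near 10 %. Which figure to report is a judgement call. If the dashboard shows category margins next to category totals, the weighted version is the one consistent with those totals.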

## 💡 Insights
- South region generates the highest total sales.
- Electronics has the highest average profit margin of any product category.
- Sales peak during November–December.
- The top 5 products account for roughly 40% of total revenue.
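The top-products cell ranks products but does not calculate the share behind the top-5 revenue figure. Below is a sketch of that calculation, assuming the same `Product` and `Sales` columns. The helper `top_n_share` and the toy rows are hypothetical, so the percentage it yields here is illustrative only.

```python
import pandas as pd


def top_n_share(frame: pd.DataFrame, n: int = 5) -> float:
    """Hypothetical helper: percentage of total Sales from the n highest-selling products."""
    per_product = frame.groupby("Product")["Sales"].sum().sort_values(ascending=False)
    return float(per_product.head(n).sum() / per_product.sum() * 100)


toy = pd.DataFrame({
    "Product": list("ABCDEFGH") + ["A", "B"],
    "Sales": [50.0, 40.0, 30.0, 20.0, 10.0, 20.0, 15.0, 15.0, 50.0, 0.0],
})
print(f"Top 5 products: {top_n_share(toy):.1f}% of total sales")
```

Run on the real `df`, `top_n_share(df)` gives the figure the bullet summarizes. Products are summed before ranking, so repeat orders of the same product are counted together.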

These insights will be visualized further in the Power BI dashboard.