# 📊 Sales Data Analysis
**Objective:** Analyze sales data to extract insights on products, regions, salespersons, and trends.  

**Tools:** Python, Pandas, Matplotlib, Seaborn  

**Dataset:** 500 rows of simulated sales data with columns: Date, Product, Region, Salesperson, Quantity, UnitPrice, Sales
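The notebook does not show how `sales_data.csv` was produced. The sketch below simulates a dataset with the same seven columns so the analysis can be tried without the original file. Everything in it is an assumption, not the data used here: the `simulate_sales` helper, the product names and unit prices, the regions, the salesperson names, the date range, and the quantity range.

```python
import numpy as np
import pandas as pd

# Hypothetical categories and unit prices; the real sales_data.csv may differ
PRODUCTS = {"Laptop": 900.0, "Phone": 600.0, "Tablet": 400.0, "Monitor": 250.0}
REGIONS = ["North", "South", "East", "West"]
SALESPERSONS = ["Alice", "Bob", "Carol", "Dave"]


def simulate_sales(n_rows: int = 500, seed: int = 42) -> pd.DataFrame:
    """Build a toy sales table with the same columns as the notebook's dataset."""
    rng = np.random.default_rng(seed)
    products = rng.choice(list(PRODUCTS), size=n_rows)
    df = pd.DataFrame({
        # Random dates within roughly the first quarter of a hypothetical year
        "Date": pd.to_datetime("2024-01-01")
        + pd.to_timedelta(rng.integers(0, 90, size=n_rows), unit="D"),
        "Product": products,
        "Region": rng.choice(REGIONS, size=n_rows),
        "Salesperson": rng.choice(SALESPERSONS, size=n_rows),
        "Quantity": rng.integers(1, 10, size=n_rows),  # 1 to 9 units per sale
        "UnitPrice": [PRODUCTS[p] for p in products],
    })
    # Sales is derived, matching the notebook's Quantity × UnitPrice definition
    df["Sales"] = df["Quantity"] * df["UnitPrice"]
    return df.sort_values("Date").reset_index(drop=True)


df_sim = simulate_sales()
print(df_sim.head())
```

To feed this into the cells below, one could write it out with `df_sim.to_csv("sales_data.csv", index=False)`; note that this overwrites any existing file with that name.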

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="whitegrid", palette="Set2")
df = pd.read_csv("sales_data.csv")
df["Date"] = pd.to_datetime(df["Date"])
df["Month"] = df["Date"].dt.month_name()

## 🔹 Step 1: Explore Dataset

- Preview first 10 rows to understand data structure.
- Check dataset info: columns, data types, and non-null values.
- Summary statistics for numeric columns: Quantity, UnitPrice, Sales.
- Check for missing values.

**Observation:**
- No missing values present.
- Sales values are calculated as Quantity × UnitPrice.


In [None]:
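# Cell 2: Explore Dataset
# Jupyter only auto-displays a cell's last expression, so display() is used for each table
display(df.head(10))
df.info()                   # column names, dtypes, non-null counts
display(df.describe())      # summary statistics for numeric columns
display(df.isnull().sum())  # missing values per column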
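Both Step 1 observations (no missing values, and Sales = Quantity × UnitPrice) can be checked in code instead of by eye. This is a minimal sketch: `check_sales_integrity` is a hypothetical helper, and the three-row `toy` frame is made-up data standing in for the notebook's `df`. `np.isclose` is used so that small floating-point rounding in stored Sales values does not count as a mismatch.

```python
import numpy as np
import pandas as pd


def check_sales_integrity(df: pd.DataFrame, rtol: float = 1e-9) -> dict:
    """Return simple data-quality checks backing the Step 1 observations."""
    expected = df["Quantity"] * df["UnitPrice"]
    return {
        # Total count of missing cells across all columns
        "missing_values": int(df.isnull().sum().sum()),
        # True only if every row's Sales equals Quantity × UnitPrice (within tolerance)
        "sales_matches_qty_x_price": bool(np.isclose(df["Sales"], expected, rtol=rtol).all()),
    }


# Hypothetical rows for illustration; in the notebook, pass the real df instead
toy = pd.DataFrame({
    "Quantity": [2, 5, 1],
    "UnitPrice": [19.99, 3.50, 120.00],
    "Sales": [39.98, 17.50, 120.00],
})
print(check_sales_integrity(toy))
# → {'missing_values': 0, 'sales_matches_qty_x_price': True}
```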


## 🔹 Step 2: Basic Aggregation

- **Total Sales:** Sum of all sales.
- **Sales by Product:** Identify best-selling products.
- **Sales by Region:** Determine regions generating highest revenue.
- **Sales by Month:** Observe monthly trends.

**Observations:**
- Highest-selling product: Laptop or Phone (from the product bar chart).
- Top-performing region: North or South (from the region bar chart).
- Peak sales months: February or March (from the monthly bar chart).
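Grouping on a month-name column merges the same month from different years into one total, and `groupby` sorts those names alphabetically by default. If the data spans more than one year, a calendar-aware alternative is to resample by month start. The sketch below uses a hypothetical five-row `toy` frame covering two years; with month names, February 2023 and February 2024 would collapse into a single value of 250.

```python
import pandas as pd

# Hypothetical rows spanning two years; February and March occur in both
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2023-02-10", "2023-03-05", "2024-02-14", "2024-03-01", "2024-03-20"]),
    "Sales": [100.0, 200.0, 150.0, 50.0, 75.0],
})

# One bin per calendar month ("MS" = month start); empty months in between get a total of 0
monthly = toy.resample("MS", on="Date")["Sales"].sum()

# Readable year-month labels for plotting, e.g. "2024-03"
monthly.index = monthly.index.strftime("%Y-%m")
print(monthly)
```

Whether the zero-filled months are useful depends on the chart; they make gaps in activity visible on a line plot but add empty bars to a bar chart.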


In [None]:
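# Cell 3: Basic Insights
import calendar

total_sales = df["Sales"].sum()
sales_by_product = df.groupby("Product")["Sales"].sum().sort_values(ascending=False)
sales_by_region = df.groupby("Region")["Sales"].sum().sort_values(ascending=False)
# Group by month number to keep calendar order (month names would sort alphabetically)
sales_by_month = df.groupby(df["Date"].dt.month)["Sales"].sum()
sales_by_month.index = [calendar.month_name[m] for m in sales_by_month.index]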
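The best-selling product and top region can be read directly from the grouped totals rather than from the bar charts. A minimal sketch, assuming a hypothetical helper `top_category` and a made-up `toy` frame: `Series.idxmax` returns the label with the largest total, and dividing by the overall sum gives that category's share of revenue.

```python
import pandas as pd

# Hypothetical toy data; in the notebook, pass the real df instead
toy = pd.DataFrame({
    "Product": ["Laptop", "Phone", "Laptop", "Tablet", "Phone"],
    "Region": ["North", "South", "South", "North", "North"],
    "Sales": [1800.0, 600.0, 900.0, 400.0, 1200.0],
})


def top_category(df: pd.DataFrame, column: str) -> tuple[str, float]:
    """Return the category with the highest total Sales and its share of all sales."""
    totals = df.groupby(column)["Sales"].sum()
    best = totals.idxmax()
    return str(best), float(totals[best] / totals.sum())


best_product, product_share = top_category(toy, "Product")  # Laptop, about 0.551 of the toy total
best_region, region_share = top_category(toy, "Region")      # North, about 0.694 of the toy total
print(best_product, round(product_share, 3))
print(best_region, round(region_share, 3))
```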


## 🔹 Step 3: Visualizations

- Bar charts for Product, Region, and Month sales.
- Line chart for daily sales trend.
- Charts help identify top performers and trends over time.

**Observation:**
- Peaks in daily sales indicate high-demand days.
- Product/region performance is visually clear from charts.

In [None]:
# Cell 4: Sales by Product
plt.figure(figsize=(8,5))
sns.barplot(x=sales_by_product.index, y=sales_by_product.values)
plt.title("Sales by Product")
plt.ylabel("Total Sales")
plt.xticks(rotation=45)
plt.show()

# Sales by Region
plt.figure(figsize=(6,5))
sns.barplot(x=sales_by_region.index, y=sales_by_region.values)
plt.title("Sales by Region")
plt.ylabel("Total Sales")
plt.show()

# Daily Sales Trend
plt.figure(figsize=(12,5))
df.groupby("Date")["Sales"].sum().plot(kind="line", marker="o")
plt.title("Daily Sales Trend")
plt.ylabel("Sales")
plt.show()

# Sales by Month
plt.figure(figsize=(6,5))
sns.barplot(x=sales_by_month.index, y=sales_by_month.values)
plt.title("Sales by Month")
plt.ylabel("Total Sales")
plt.show()
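The daily line chart can be noisy, which makes both trends and genuine peaks harder to spot. One common approach, shown here as a sketch on a hypothetical eight-day `daily` series, is to overlay a short rolling mean and list the highest days explicitly. In the notebook, `daily` would correspond to `df.groupby("Date")["Sales"].sum()`; the 3-day window and the choice of the top 2 days are arbitrary assumptions.

```python
import pandas as pd

# Hypothetical daily totals for illustration
daily = pd.Series(
    [100.0, 120.0, 90.0, 300.0, 110.0, 105.0, 95.0, 280.0],
    index=pd.date_range("2024-01-01", periods=8, freq="D"),
    name="Sales",
)

# 3-day centered rolling mean; min_periods=1 keeps values at the series edges
smoothed = daily.rolling(window=3, center=True, min_periods=1).mean()

# The highest-sales days, i.e. the spikes visible in the daily line chart
peak_days = daily.nlargest(2)
print(peak_days)
```

Plotting `smoothed` on the same axes as the raw line (for example, `smoothed.plot()` right after the daily plot) shows the underlying trend without hiding individual spikes.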


## 🔹 Step 4: Advanced Insights

1. **Top 3 Salespersons**
   - Alice and Bob are among the three salespersons with the highest total revenue.

2. **Best Region per Product**
   - Pivot table shows which region sells most of each product.

3. **Correlation Heatmap**
   - Quantity and UnitPrice each correlate positively with Sales, which is expected because Sales is computed as Quantity × UnitPrice.


In [None]:
# Cell 5: Advanced Insights
top_salespersons = df.groupby("Salesperson")["Sales"].sum().nlargest(3)  # top 3 by total revenue
best_region_per_product = df.pivot_table(index="Product", columns="Region", values="Sales", aggfunc="sum")
correlation = df[["Quantity", "UnitPrice", "Sales"]].corr()
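The pivot table holds total sales for every product and region pair, but it does not name the best region for each product. A short sketch on a hypothetical `toy` frame: `DataFrame.idxmax(axis=1)` returns, for each product row, the region column with the largest total. `fill_value=0` is an added assumption so that product and region pairs with no sales count as 0 instead of NaN.

```python
import pandas as pd

# Hypothetical toy data; in the notebook, use the real df
toy = pd.DataFrame({
    "Product": ["Laptop", "Laptop", "Phone", "Phone", "Tablet"],
    "Region": ["North", "South", "North", "South", "South"],
    "Sales": [2000.0, 1500.0, 300.0, 900.0, 400.0],
})

pivot = toy.pivot_table(index="Product", columns="Region", values="Sales",
                        aggfunc="sum", fill_value=0)

# For each product (row), the region with the highest total and that total
summary = pd.DataFrame({
    "BestRegion": pivot.idxmax(axis=1),
    "Sales": pivot.max(axis=1),
})
print(summary)
```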

## ✅ Step 5: Conclusion

- **Best-Selling Product:** Laptop (highest total sales)
- **Top Region:** North (most revenue contribution)
- **Top Salesperson:** Alice (highest sales among team)
- **Trends:** Sales rise toward the end of each month, suggesting a recurring, possibly seasonal, demand pattern
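The end-of-month trend is not computed by any cell in this notebook. One way to check it, sketched here on hypothetical transactions, is to total sales per day and compare the average daily total across parts of the month. The early/mid/late split at days 10 and 20 is an arbitrary assumption.

```python
import pandas as pd

# Hypothetical transactions; replace with the notebook's df
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-03", "2024-01-28", "2024-02-05", "2024-02-26", "2024-02-27"]),
    "Sales": [100.0, 250.0, 120.0, 300.0, 180.0],
})

# Total sales per calendar day, so each day counts once regardless of its number of transactions
daily = toy.groupby("Date")["Sales"].sum()

# Assumed split of each month: days 1-10 "early", 11-20 "mid", 21-31 "late"
days = pd.Series(daily.index.day, index=daily.index)
part = pd.cut(days, bins=[0, 10, 20, 31], labels=["early", "mid", "late"])

# Average daily sales per part of the month; observed=False keeps empty parts (as NaN)
avg_daily_by_part = daily.groupby(part, observed=False).mean()
print(avg_daily_by_part)
```

Days with no sales at all are absent from `daily`, so this averages only over active days; reindexing `daily` to a full date range with zeros would change that if needed.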

**Insights for Action:**
- Focus marketing efforts on top products and regions
- Reward high-performing salespersons
- Monitor inventory for peak sales periods


In [None]:
# Top 3 Salespersons
plt.figure(figsize=(6,4))
sns.barplot(x=top_salespersons.index, y=top_salespersons.values)
plt.title("Top Salespersons by Sales")
plt.ylabel("Total Sales")
plt.show()

# Heatmap: Best Region per Product
plt.figure(figsize=(10,6))
sns.heatmap(best_region_per_product, annot=True, fmt=".0f", cmap="YlGnBu")
plt.title("Sales by Product and Region")
plt.show()

# Correlation Heatmap (color scale fixed to [-1, 1] so that 0 maps to the neutral color)
plt.figure(figsize=(6,5))
sns.heatmap(correlation, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation Between Quantity, UnitPrice, and Sales")
plt.show()
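The heatmap shows pairwise correlations among the three columns; it does not include a Quantity × UnitPrice column. The sketch below illustrates the relationship on simulated values (the value ranges and sample size are assumptions): each factor correlates positively with Sales, while Sales correlated with the product itself is exactly 1, since Sales is defined as that product.

```python
import numpy as np
import pandas as pd

# Simulated values for illustration only; ranges are hypothetical
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Quantity": rng.integers(1, 10, size=200),
    "UnitPrice": rng.uniform(10, 1000, size=200),
})
toy["Sales"] = toy["Quantity"] * toy["UnitPrice"]

# Pairwise Pearson correlations, as in the notebook's heatmap
corr = toy[["Quantity", "UnitPrice", "Sales"]].corr()

# Correlating Sales with the Quantity × UnitPrice product itself
product_corr = toy["Sales"].corr(toy["Quantity"] * toy["UnitPrice"])
print(corr.round(2))
print(round(product_corr, 6))
```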
