
# Sales Data - Exploratory Data Analysis (EDA)

## Dataset Information
This dataset contains sales records for several products across multiple regions over time.  
Columns in the dataset:  
- **Date**: The date of the sales record  
- **Product**: The type of product sold  
- **Region**: The region where the product was sold  
- **Price**: The price of a single unit of the product  
- **Units_Sold**: The number of units sold  
- **Total_Sales**: The total revenue generated (Price * Units_Sold)  

We will perform EDA to analyze sales trends, distributions, and correlations.
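
Since **Total_Sales** is defined as Price * Units_Sold, a quick consistency check can flag rows where the stored total does not match. The following is a minimal sketch, not part of the original analysis: the helper name `find_inconsistent_totals`, the tolerance, and the sample rows are all hypothetical.

```python
import numpy as np
import pandas as pd


def find_inconsistent_totals(df, rtol=1e-6):
    """Return rows where Total_Sales differs from Price * Units_Sold."""
    expected = df["Price"] * df["Units_Sold"]
    # np.isclose tolerates small floating-point rounding differences
    matches = np.isclose(df["Total_Sales"], expected, rtol=rtol)
    return df[~matches]


# Hypothetical sample rows, not taken from sales_data.csv
sample = pd.DataFrame({
    "Price": [10.0, 5.5, 20.0],
    "Units_Sold": [3, 4, 2],
    "Total_Sales": [30.0, 22.0, 45.0],  # last row is deliberately wrong
})

bad_rows = find_inconsistent_totals(sample)
print(bad_rows)
```

On the real data, `find_inconsistent_totals(df)` would return an empty DataFrame if every row is consistent. Rows with missing values are also reported, because NaN never compares as close.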


In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset
file_path = "sales_data.csv"
df = pd.read_csv(file_path)

# Convert Date column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Display first few rows
df.head()


In [None]:

# Display dataset columns
print("Columns in the dataset:")
print(df.columns.tolist())


## Data Summary and Missing Values

In [None]:

# Check dataset info
df.info()

# Check for missing values
df.isnull().sum()
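
If the check above reports missing values, it can help to see them as a share of all rows before deciding whether to drop or fill them. This is a sketch under assumptions: `summarize_missing` is a hypothetical helper, and the sample frame is made up for illustration.

```python
import pandas as pd


def summarize_missing(df):
    """Count and percentage of missing values, for columns that have any."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only affected columns, worst first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical sample with gaps, not taken from sales_data.csv
sample = pd.DataFrame({
    "Price": [10.0, None, 20.0, 15.0],
    "Units_Sold": [3, 4, None, None],
    "Region": ["North", "South", "East", "West"],
})

missing_summary = summarize_missing(sample)
print(missing_summary)
```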


## Sales Trend Over Time

In [None]:

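# Plot sales over time.
# lineplot averages Total_Sales over rows that share a date.
# errorbar=None replaces the deprecated ci=None (seaborn >= 0.12).
plt.figure(figsize=(12,6))
sns.lineplot(x=df['Date'], y=df['Total_Sales'], errorbar=None)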
plt.xlabel("Date")
plt.ylabel("Total Sales")
plt.title("Sales Trend Over Time")
plt.xticks(rotation=45)
plt.show()
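
Daily values can be noisy, so aggregating to monthly totals may make the trend easier to read. The sketch below assumes the `Date` column has already been converted to datetime. `monthly_sales` is a hypothetical helper and the sample rows are invented.

```python
import pandas as pd


def monthly_sales(df):
    """Sum Total_Sales per calendar month."""
    months = df["Date"].dt.to_period("M")
    return df.groupby(months)["Total_Sales"].sum()


# Hypothetical sample rows, not taken from sales_data.csv
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03"]),
    "Total_Sales": [100.0, 50.0, 75.0],
})

monthly = monthly_sales(sample)
print(monthly)
```

The resulting Series can be plotted directly with `monthly.plot()` once it is computed on the full dataset.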


## Sales by Product Category

In [None]:

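# Total sales per product (sum of Total_Sales within each product).
# errorbar=None replaces the deprecated ci=None (seaborn >= 0.12).
plt.figure(figsize=(10,5))
sns.barplot(x=df['Product'], y=df['Total_Sales'], estimator='sum', errorbar=None)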
plt.xlabel("Product")
plt.ylabel("Total Sales")
plt.title("Total Sales by Product")
plt.xticks(rotation=45)
plt.show()
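
The same totals shown in the bar chart can be computed as a table, which makes exact figures and each product's share of revenue easy to read. This is an illustrative sketch: `sales_by_product`, the product names, and the numbers are hypothetical.

```python
import pandas as pd


def sales_by_product(df):
    """Total sales and percentage share per product, largest first."""
    totals = df.groupby("Product")["Total_Sales"].sum().sort_values(ascending=False)
    return pd.DataFrame({
        "total_sales": totals,
        "share_pct": (totals / totals.sum() * 100).round(1),
    })


# Hypothetical sample rows, not taken from sales_data.csv
sample = pd.DataFrame({
    "Product": ["Laptop", "Phone", "Laptop"],
    "Total_Sales": [300.0, 100.0, 100.0],
})

product_table = sales_by_product(sample)
print(product_table)
```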


## Sales Distribution by Region

In [None]:

# Sales by Region
plt.figure(figsize=(8,5))
sns.boxplot(x=df['Region'], y=df['Total_Sales'])
plt.xlabel("Region")
plt.ylabel("Total Sales")
plt.title("Sales Distribution by Region")
plt.show()
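
A box plot shows medians and spread visually. A grouped summary gives the same information as numbers. The sketch below uses a hypothetical helper `region_summary` and invented sample rows.

```python
import pandas as pd


def region_summary(df):
    """Row count, median, mean, and maximum of Total_Sales per region."""
    return df.groupby("Region")["Total_Sales"].agg(["count", "median", "mean", "max"])


# Hypothetical sample rows, not taken from sales_data.csv
sample = pd.DataFrame({
    "Region": ["North", "North", "North", "South"],
    "Total_Sales": [10.0, 20.0, 30.0, 100.0],
})

summary = region_summary(sample)
print(summary)
```

A large gap between a region's mean and median suggests a few unusually large sales, which would appear as outlier points in the box plot.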


## Correlation Analysis

In [None]:

# Correlation heatmap
plt.figure(figsize=(6,4))
sns.heatmap(df[['Price', 'Units_Sold', 'Total_Sales']].corr(), annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Feature Correlation")
plt.show()
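
When reading the heatmap, keep in mind that Total_Sales is computed from Price and Units_Sold, so some correlation with both is expected by construction. The sketch below illustrates this with synthetic data in which Price and Units_Sold are generated independently. The value ranges and the seed are arbitrary assumptions, not properties of the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data: Price and Units_Sold are drawn independently of each other
rng = np.random.default_rng(0)
synthetic = pd.DataFrame({
    "Price": rng.uniform(5, 50, size=1000),
    "Units_Sold": rng.integers(1, 100, size=1000),
})
# Total_Sales is derived exactly as in the dataset description
synthetic["Total_Sales"] = synthetic["Price"] * synthetic["Units_Sold"]

corr = synthetic.corr()
print(corr.round(2))
```

In this synthetic case, Total_Sales correlates positively with both inputs even though the inputs are unrelated to each other, so the Price vs. Units_Sold cell is the most informative one in the real heatmap.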
