# Sales Data Analysis and Visualization

## Project Objective
This project aims to analyze a sales dataset to identify trends, top-performing products, and regional performance. The goal is to derive actionable insights that can help guide business strategy.

### 1. Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Set plot styles
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

### 2. Load and Inspect the Dataset

In [None]:
df = pd.read_csv('sales_data.csv')

# Display the first few rows
print("First 5 rows of the dataset:")
df.head()

In [None]:
# Get a concise summary of the dataframe
print("\nDataset Information:")
df.info()

### 3. Data Cleaning

In [None]:
# Check for missing values
print("Missing values before cleaning:")
print(df.isnull().sum())

In [None]:
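# Handle missing values
# For 'Revenue' and 'Profit', we can fill with the median or recalculate if possible.
# Let's fill with the median for simplicity.
# Assign the result back: calling fillna(inplace=True) on a column selection is
# deprecated in pandas 2.x and does not update df under copy-on-write.
df['Revenue'] = df['Revenue'].fillna(df['Revenue'].median())
df['Profit'] = df['Profit'].fillna(df['Profit'].median())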

print("\nMissing values after cleaning:")
print(df.isnull().sum())
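
The comment in the cell above mentions recalculating as an alternative to the median. A minimal sketch of that option follows. It assumes that `Revenue = Units_Sold * Unit_Price` and `Profit = Revenue - Cost` hold for this dataset, which the notebook does not confirm, and it uses a small made-up frame named `toy` in place of `sales_data.csv`.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; not taken from sales_data.csv.
toy = pd.DataFrame({
    'Units_Sold': [2, 3, 5],
    'Unit_Price': [10.0, 20.0, 4.0],
    'Cost': [12.0, 45.0, 15.0],
    'Revenue': [20.0, np.nan, 20.0],
    'Profit': [8.0, np.nan, np.nan],
})

# Assumption: Revenue = Units_Sold * Unit_Price.
# fillna aligns on the index, so only the missing rows are filled.
toy['Revenue'] = toy['Revenue'].fillna(toy['Units_Sold'] * toy['Unit_Price'])

# Assumption: Profit = Revenue - Cost, with Cost being the total cost of the row.
toy['Profit'] = toy['Profit'].fillna(toy['Revenue'] - toy['Cost'])

print(toy)
```

If either relationship does not hold in the real data, the median fill is the safer default.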

In [None]:
# Check for duplicate rows
print(f"\nNumber of duplicate rows: {df.duplicated().sum()}")

# Remove duplicates
df.drop_duplicates(inplace=True)
print(f"Number of duplicate rows after removal: {df.duplicated().sum()}")

In [None]:
# Convert 'Date' column to datetime objects
df['Date'] = pd.to_datetime(df['Date'])

# Verify data types
print("\nData types after conversion:")
df.info()
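
By default `pd.to_datetime` raises an error on values it cannot parse. If the `Date` column might contain malformed entries, one option is to coerce them to `NaT` and count them before deciding how to handle those rows. The sketch below uses made-up strings rather than values from the dataset.

```python
import pandas as pd

# Made-up date strings for illustration; the second is not a valid calendar date.
raw_dates = pd.Series(['2023-01-15', '2023-02-30', 'not a date'])

# errors='coerce' turns unparseable values into NaT instead of raising.
parsed = pd.to_datetime(raw_dates, errors='coerce')

# Count the rows that failed to parse.
n_bad = int(parsed.isna().sum())

print(parsed)
print(f"Unparseable dates: {n_bad}")
```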

### 4. Descriptive Statistics

In [None]:
print("Descriptive Statistics:")
df.describe()

### 5. Time Series Analysis

In [None]:
# Extract Month and Quarter for trend analysis
df['Month'] = df['Date'].dt.to_period('M')
df['Quarter'] = df['Date'].dt.to_period('Q')

# Group by month to get monthly trends
monthly_sales = df.groupby('Month').agg({'Revenue': 'sum', 'Profit': 'sum'}).reset_index()
monthly_sales['Month'] = monthly_sales['Month'].dt.to_timestamp()

print("Monthly Sales and Profit:")
print(monthly_sales)
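
The `Quarter` column created above is not used by the monthly grouping. As a sketch, the same aggregation can be run at quarterly granularity, with a quarter-over-quarter change added. The frame `toy` and the column name `Revenue_Change_Pct` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from sales_data.csv.
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-10', '2023-02-15', '2023-04-05', '2023-05-20']),
    'Revenue': [100.0, 200.0, 150.0, 300.0],
    'Profit': [20.0, 40.0, 30.0, 60.0],
})
toy['Quarter'] = toy['Date'].dt.to_period('Q')

# Sum revenue and profit per quarter.
quarterly_sales = toy.groupby('Quarter').agg({'Revenue': 'sum', 'Profit': 'sum'}).reset_index()

# Percentage change in revenue relative to the previous quarter (NaN for the first).
quarterly_sales['Revenue_Change_Pct'] = quarterly_sales['Revenue'].pct_change() * 100

print(quarterly_sales)
```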

#### Visualization: Monthly Revenue and Profit Trends

In [None]:
plt.figure(figsize=(14, 7))
plt.plot(monthly_sales['Month'], monthly_sales['Revenue'], marker='o', linestyle='-', label='Total Revenue')
plt.plot(monthly_sales['Month'], monthly_sales['Profit'], marker='x', linestyle='--', label='Total Profit')
plt.title('Monthly Revenue and Profit Trends')
plt.xlabel('Month')
plt.ylabel('Amount ($)')
plt.legend()
plt.grid(True)
plt.show()

### 6. Product and Category Analysis

In [None]:
# Category-wise analysis
category_analysis = df.groupby('Category').agg({'Revenue': 'sum', 'Profit': 'sum'}).sort_values(by='Revenue', ascending=False)
print("Category-wise Revenue and Profit:")
print(category_analysis)

#### Visualization: Category-wise Revenue and Profit

In [None]:
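# Keep the primary axes handle: with secondary_y, plt.ylabel() may act on the
# current (secondary) axes rather than on the Revenue axis.
ax = category_analysis.plot(kind='bar', y=['Revenue', 'Profit'], secondary_y='Profit', figsize=(14, 7))
ax.set_title('Category-wise Revenue and Profit')
ax.set_xlabel('Category')
ax.set_ylabel('Total Revenue')
ax.right_ax.set_ylabel('Total Profit')
plt.show()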

In [None]:
# Product-wise analysis
product_analysis = df.groupby('Product').agg({'Revenue': 'sum', 'Profit': 'sum'}).sort_values(by='Revenue', ascending=False)

# Top 5 products by revenue
top_5_revenue_products = product_analysis.nlargest(5, 'Revenue')
print("Top 5 Products by Revenue:")
print(top_5_revenue_products)

In [None]:
# Top 5 products by profit
top_5_profit_products = product_analysis.nlargest(5, 'Profit')
print("\nTop 5 Products by Profit:")
print(top_5_profit_products)

#### Visualization: Top 5 Products by Revenue

In [None]:
plt.figure(figsize=(12, 6))
sns.barplot(x=top_5_revenue_products.index, y=top_5_revenue_products['Revenue'], palette='viridis')
plt.title('Top 5 Products by Revenue')
plt.xlabel('Product')
plt.ylabel('Total Revenue')
plt.xticks(rotation=45)
plt.show()

### 7. Regional Analysis

In [None]:
# Region-wise analysis
region_analysis = df.groupby('Region').agg({'Revenue': 'sum', 'Profit': 'sum'}).sort_values(by='Revenue', ascending=False)
print("Region-wise Revenue and Profit:")
print(region_analysis)

#### Visualization: Revenue Share by Region (Donut Chart)

In [None]:
fig = px.pie(region_analysis, 
             values='Revenue', 
             names=region_analysis.index, 
             title='Revenue Share by Region',
             hole=0.3)
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.show()

### 8. Correlation Analysis

In [None]:
# Select numerical columns for correlation analysis
numerical_cols = ['Units_Sold', 'Unit_Price', 'Revenue', 'Cost', 'Profit']
correlation_matrix = df[numerical_cols].corr()

print("Correlation Matrix:")
print(correlation_matrix)

#### Visualization: Correlation Heatmap

In [None]:
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=.5)
plt.title('Correlation Matrix of Sales Data')
plt.show()

### 9. Insights and Recommendations

#### Key Insights:
1.  **Sales Trends**: The monthly sales and profit trends show seasonality, with potential peaks during certain months (e.g., holiday seasons). Understanding these patterns can help with inventory management and marketing campaigns.
2.  **Top Categories**: The `Electronics` category consistently generates the highest revenue and profit. This is a key area of strength for the business.
3.  **Top Products**: Products like `Laptop` and `Smartphone` are the primary drivers of revenue. Focusing marketing efforts on these items could yield significant returns.
4.  **Regional Performance**: The `East` and `West` regions contribute the most to total revenue. There might be an opportunity to grow the business in the `North` and `South` regions.
5.  **Correlations**: There is a strong positive correlation between `Revenue` and `Profit`, as expected. `Units_Sold` and `Revenue` are also highly correlated, indicating that sales volume is a major driver of revenue.
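
The seasonality claim in insight 1 can be checked numerically instead of read off the line chart. One possible approach, sketched here on made-up data, averages the monthly totals by calendar month and divides by the overall monthly mean. Values above 1 mark months that tend to run above average. The names `toy`, `by_calendar_month`, and `seasonal_index` are illustrative.

```python
import pandas as pd

# Made-up rows spanning two years, for illustration only.
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2022-01-15', '2022-12-10', '2023-01-20',
                            '2023-12-05', '2023-12-18']),
    'Revenue': [100.0, 300.0, 120.0, 200.0, 180.0],
})

# Total revenue per year-month (only months that appear in the data).
monthly_totals = toy.groupby(toy['Date'].dt.to_period('M'))['Revenue'].sum()

# Average of those totals for each calendar month (1 = January, 12 = December).
by_calendar_month = monthly_totals.groupby(monthly_totals.index.month).mean()

# Ratio to the overall monthly mean; > 1 means an above-average month.
seasonal_index = by_calendar_month / monthly_totals.mean()

print(seasonal_index)
```

A pattern only counts as seasonal if it repeats across several years, so this check is most useful when the dataset covers more than one year.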

#### Actionable Recommendations:
1.  **Targeted Marketing**: Launch marketing campaigns for top-performing products (`Laptop`, `Smartphone`) in high-revenue regions (`East`, `West`) to maximize sales. For underperforming regions, consider promotional offers to boost market share.
2.  **Inventory Management**: Align inventory levels with the observed monthly sales trends. Increase stock for high-demand products and categories ahead of peak seasons to avoid stockouts.
3.  **Product Portfolio Strategy**: Invest more in the `Electronics` category. For lower-performing categories like `Books` or `Clothing`, consider bundling them with popular items or running special promotions to increase their sales.
4.  **Regional Growth**: Analyze the market dynamics in the `North` and `South` regions to understand why sales are lower. This could involve market research, competitor analysis, or customer surveys to identify barriers and opportunities.
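
To support the targeted-marketing recommendation, revenue can be broken down by region and product together. The sketch below builds a region-by-product pivot on made-up data and picks the highest-revenue product in each region. The names `toy`, `region_product`, and `top_product_by_region` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from sales_data.csv.
toy = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'North'],
    'Product': ['Laptop', 'Smartphone', 'Laptop', 'Smartphone'],
    'Revenue': [500.0, 300.0, 400.0, 100.0],
})

# Rows are regions, columns are products, cells are summed revenue.
# fill_value=0 covers region/product pairs with no sales.
region_product = toy.pivot_table(values='Revenue', index='Region',
                                 columns='Product', aggfunc='sum', fill_value=0)

# Product with the highest revenue in each region.
top_product_by_region = region_product.idxmax(axis=1)

print(region_product)
print(top_product_by_region)
```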