# Monthly Sales Analysis Project

**Objective:** Generate, analyze, and visualize monthly sales data for four products over one year.

**Tools:** NumPy, Pandas, Matplotlib, Seaborn

## 1. Setup & Imports

Import all necessary libraries for data generation, manipulation, and visualization.

In [2]:
# Data manipulation and analysis
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Custom utility functions
from utils import generate_random_sales, generate_monthly_dates

# Set visualization style for better-looking plots
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ All libraries imported successfully!")

ModuleNotFoundError: No module named 'seaborn'
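
The traceback above indicates that seaborn is not installed in the kernel's environment, which is also why the remaining cells show no execution count. One way to fail more gracefully is to check for the package before importing it. The snippet below is a minimal sketch, not part of the original notebook; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("seaborn"):
    import seaborn as sns

    sns.set_style("whitegrid")
else:
    # Assumption: the notebook runs in an environment where pip is available.
    print("seaborn is missing; run `%pip install seaborn` in a cell, then restart the kernel.")
```

Installing the package and re-running the imports cell should clear the error, after which the later cells can be executed in order.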

## 2. Data Generation

Generate random monthly sales data for 4 products with different sales ranges.

In [None]:
# Generate 12 monthly dates (Jan 2025 - Dec 2025)
monthly_dates = generate_monthly_dates()

print("📅 Generated Monthly Dates:")
for i, date in enumerate(monthly_dates, 1):
    print(f"  Month {i:2d}: {date.strftime('%Y-%m-%d')}")
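
The implementation of `generate_monthly_dates` lives in `utils` and is not shown in this notebook. As a rough sketch of how such a helper might work, assuming it returns the first day of each month from January to December 2025, `pd.date_range` with the month-start frequency would be enough. The function name and parameters below are hypothetical stand-ins.

```python
import pandas as pd


def generate_monthly_dates_sketch(start: str = "2025-01-01", periods: int = 12) -> pd.DatetimeIndex:
    """Hypothetical stand-in for utils.generate_monthly_dates.

    Assumes one date per month, anchored on the first day of the month.
    """
    # freq="MS" is the pandas alias for "month start"
    return pd.date_range(start=start, periods=periods, freq="MS")


monthly_dates_sketch = generate_monthly_dates_sketch()
print(monthly_dates_sketch[0].strftime("%Y-%m-%d"))   # 2025-01-01
print(monthly_dates_sketch[-1].strftime("%Y-%m-%d"))  # 2025-12-01
```

Each element is a `pd.Timestamp`, so the `strftime` call in the cell above would work unchanged on this output.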

In [None]:
# Generate random sales for each product (12 months)
# Each product draws from its own range, standing in for different market performance

sales_product_a = generate_random_sales(min_val=50, max_val=100, size=12)
sales_product_b = generate_random_sales(min_val=30, max_val=80, size=12)
sales_product_c = generate_random_sales(min_val=20, max_val=60, size=12)
sales_product_d = generate_random_sales(min_val=10, max_val=50, size=12)

print("📊 Generated Sales Data:")
print(f"Product A (50-100 units): {sales_product_a}")
print(f"Product B (30-80 units):  {sales_product_b}")
print(f"Product C (20-60 units):  {sales_product_c}")
print(f"Product D (10-50 units):  {sales_product_d}")
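
`generate_random_sales` is also imported from `utils`, so its body is not visible here. A minimal sketch of one possible implementation follows, assuming integer unit counts and inclusive bounds (the labels above read "50-100 units"). The `seed` parameter is an addition for illustration and is not part of the call signature used in this notebook.

```python
import numpy as np


def generate_random_sales_sketch(min_val: int, max_val: int, size: int, seed=None) -> np.ndarray:
    """Hypothetical stand-in for utils.generate_random_sales.

    Assumes integer sales with both min_val and max_val included.
    """
    rng = np.random.default_rng(seed)
    # endpoint=True makes max_val itself a possible draw
    return rng.integers(min_val, max_val, size=size, endpoint=True)


sample_sales = generate_random_sales_sketch(min_val=50, max_val=100, size=12, seed=42)
print(sample_sales.min() >= 50, sample_sales.max() <= 100)
```

Passing a fixed seed makes repeated runs return the same array, which is useful when the generated numbers feed later analysis steps.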

## 3. Create Initial DataFrame

Combine dates and sales data into a structured Pandas DataFrame.

In [None]:
# Create DataFrame with all the data
df_initial = pd.DataFrame({
    'Date': monthly_dates,
    'Product_A': sales_product_a,
    'Product_B': sales_product_b,
    'Product_C': sales_product_c,
    'Product_D': sales_product_d
})

print("✅ Initial DataFrame created!")
print(f"\nShape: {df_initial.shape} (rows, columns)")
print("\nFirst few rows:")
df_initial.head()

In [None]:
# Display full DataFrame
print("📋 Complete Initial Dataset:")
df_initial

## 4. Save Initial Dataset

Save the raw generated data to `data/initial.csv` so that later phases can reload exactly the same values.

In [None]:
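# Save to CSV file, creating the output folder first if it does not exist
from pathlib import Path

Path('data').mkdir(parents=True, exist_ok=True)
df_initial.to_csv('data/initial.csv', index=False)

print("✅ Data saved to 'data/initial.csv'")
print(f"   File contains {len(df_initial)} rows of monthly sales data")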
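
When the file is read back in a later phase, the `Date` column comes in as plain strings unless it is parsed explicitly. The sketch below illustrates the round trip using an in-memory buffer instead of `data/initial.csv` so that it runs on its own; the three rows are placeholder values, not output from this notebook.

```python
import io

import pandas as pd

# Placeholder frame with the same column layout as df_initial (values are illustrative only)
df_example = pd.DataFrame({
    "Date": pd.date_range("2025-01-01", periods=3, freq="MS"),
    "Product_A": [72, 55, 98],
})

# Write to an in-memory buffer, mirroring to_csv(..., index=False)
buffer = io.StringIO()
df_example.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime dtype on reload
df_reloaded = pd.read_csv(buffer, parse_dates=["Date"])
print(df_reloaded.dtypes)
```

With the real file, the equivalent call would be `pd.read_csv('data/initial.csv', parse_dates=['Date'])`.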

## 5. Quick Data Validation

Verify that each product's generated sales fall within its expected range.

In [None]:
# Check basic statistics for each product
print("📊 Sales Statistics Summary:\n")
print(df_initial[['Product_A', 'Product_B', 'Product_C', 'Product_D']].describe())

print("\n✅ Validation:")
print(f"   Product A range: {df_initial['Product_A'].min()}-{df_initial['Product_A'].max()} (expected: 50-100)")
print(f"   Product B range: {df_initial['Product_B'].min()}-{df_initial['Product_B'].max()} (expected: 30-80)")
print(f"   Product C range: {df_initial['Product_C'].min()}-{df_initial['Product_C'].max()} (expected: 20-60)")
print(f"   Product D range: {df_initial['Product_D'].min()}-{df_initial['Product_D'].max()} (expected: 10-50)")
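
The cell above prints the observed ranges and leaves the comparison to the reader. A programmatic check could flag violations automatically. The following is a minimal sketch; `find_out_of_range` and `EXPECTED_RANGES` are hypothetical names, and the sample frame contains made-up values with one deliberate violation.

```python
import pandas as pd

# Expected inclusive ranges, taken from the generation step
EXPECTED_RANGES = {
    "Product_A": (50, 100),
    "Product_B": (30, 80),
    "Product_C": (20, 60),
    "Product_D": (10, 50),
}


def find_out_of_range(df: pd.DataFrame, expected: dict) -> dict:
    """Return {column: number of out-of-range values} for columns with violations."""
    violations = {}
    for column, (low, high) in expected.items():
        # Series.between is inclusive on both ends by default
        outside = ~df[column].between(low, high)
        if outside.any():
            violations[column] = int(outside.sum())
    return violations


# Illustrative data: Product_B contains 81, which exceeds its upper bound of 80
sample = pd.DataFrame({
    "Product_A": [50, 75, 100],
    "Product_B": [30, 81, 60],
    "Product_C": [20, 40, 60],
    "Product_D": [10, 25, 50],
})
print(find_out_of_range(sample, EXPECTED_RANGES))  # {'Product_B': 1}
```

Applied to the notebook's data, `find_out_of_range(df_initial, EXPECTED_RANGES)` returning an empty dict would indicate that every value is in range.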

---

## ✅ Phase 1 Complete!

**What we've accomplished:**
- ✅ Generated 12 monthly dates
- ✅ Created random sales data for 4 products
- ✅ Built initial DataFrame
- ✅ Saved to `initial.csv`
- ✅ Validated data ranges

**Next Steps:** 
- Enhance DataFrame with calculated metrics
- Add quarterly information
- Identify max/min products per month
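
As a preview of how these steps might look, the sketch below adds a row total, a quarter label, and the best and worst selling product per month. The new column names (`Total_Sales`, `Quarter`, `Max_Product`, `Min_Product`) and the four sample rows are assumptions for illustration; the actual next phase may structure this differently.

```python
import pandas as pd

product_cols = ["Product_A", "Product_B", "Product_C", "Product_D"]

# Illustrative rows only; the real input would be df_initial
df_example = pd.DataFrame({
    "Date": pd.date_range("2025-01-01", periods=4, freq="MS"),
    "Product_A": [60, 90, 70, 55],
    "Product_B": [40, 35, 75, 50],
    "Product_C": [25, 55, 30, 58],
    "Product_D": [15, 45, 20, 12],
})

enhanced = df_example.copy()
# Calculated metric: total units sold across all products each month
enhanced["Total_Sales"] = enhanced[product_cols].sum(axis=1)
# Quarterly information derived from the date (1 through 4)
enhanced["Quarter"] = enhanced["Date"].dt.quarter
# Column name of the highest and lowest selling product in each row
enhanced["Max_Product"] = enhanced[product_cols].idxmax(axis=1)
enhanced["Min_Product"] = enhanced[product_cols].idxmin(axis=1)

print(enhanced)
```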