# E-commerce Sales Optimization Analysis

## Project Overview
This notebook analyzes the Sample Superstore Sales Dataset to identify:
- Top-performing products, categories, and regions
- Profitability vs revenue patterns
- Time trends and seasonality
- Actionable recommendations for optimizing sales

**Dataset Source**: [Kaggle - Sales Forecasting](https://www.kaggle.com/datasets/rohitsahoo/sales-forecasting)
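The first two goals above come down to grouping and aggregating. The sketch below illustrates this on a few hypothetical order lines and assumes columns named `Category`, `Sales`, and `Profit`. It ranks categories by revenue and places total profit and margin alongside, so groups with high revenue but low profit stand out.

```python
import pandas as pd

# Toy order lines (hypothetical values, not taken from the Kaggle dataset)
orders = pd.DataFrame({
    "Category": ["Furniture", "Technology", "Office Supplies", "Technology", "Furniture"],
    "Sales": [500.0, 1200.0, 150.0, 800.0, 300.0],
    "Profit": [-20.0, 240.0, 30.0, 160.0, 15.0],
})

# Aggregate revenue and profit per category
summary = orders.groupby("Category")[["Sales", "Profit"]].sum()

# Margin computed on the aggregated totals (profit as a percentage of revenue)
summary["Margin_%"] = summary["Profit"] / summary["Sales"] * 100

# Rank by revenue so the top sellers come first
summary = summary.sort_values("Sales", ascending=False)
print(summary)
```

In this toy data, Furniture ranks second by revenue yet has a slightly negative total profit. That is the kind of gap between revenue and profitability the analysis looks for.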

## 1. Setup and Data Loading

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Set plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("Libraries imported successfully!")

In [None]:
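# Load the dataset (latin-1 decodes any byte sequence, avoiding UnicodeDecodeError if the file is not UTF-8)
try:
    df = pd.read_csv('../data/superstore_sales.csv', encoding='latin-1')
    print(f"Dataset loaded successfully! Shape: {df.shape}")
except FileNotFoundError:
    print("Dataset not found. Please download it from Kaggle and place it in the data/ folder.")
    print("See data_download_instructions.md for details.")
    raise  # stop here so later cells don't fail with a confusing NameError on `df`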
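If the loading step is reused across notebooks, a small helper can check the path before reading and fail fast with a clear message. The sketch below uses a hypothetical function name, `load_sales_data`. The default `latin-1` encoding mirrors the cell above, and the demo writes a throwaway two-row CSV instead of the real dataset.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_sales_data(path: str, encoding: str = "latin-1") -> pd.DataFrame:
    """Hypothetical helper: load the sales CSV or raise a descriptive error."""
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"{csv_path} not found. Download the dataset from Kaggle and "
            "place it in the data/ folder (see data_download_instructions.md)."
        )
    return pd.read_csv(csv_path, encoding=encoding)


# Demo on a temporary file rather than the real dataset
with tempfile.TemporaryDirectory() as tmp:
    sample_csv = Path(tmp) / "sample.csv"
    sample_csv.write_text("Order ID,Sales\nA-1,10.5\nA-2,20.0\n", encoding="latin-1")
    demo = load_sales_data(str(sample_csv))

print(demo.shape)  # (2, 2)
```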

## 2. Data Exploration and Cleaning

In [None]:
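# Explore dataset structure
from IPython.display import display  # available by default in Jupyter; imported explicitly for clarity

print("Dataset Info:")
df.info()  # info() prints its own summary and returns None, so don't wrap it in print()
print("\nFirst 5 rows:")
display(df.head())
print("\nMissing values:")
print(df.isnull().sum())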
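Raw counts from `isnull().sum()` are easier to act on when shown alongside the share of rows affected. The sketch below builds that summary on a small hypothetical frame. The column names and values are invented for illustration; on the real data, `sample` would be replaced by `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few gaps, standing in for the loaded dataset
sample = pd.DataFrame({
    "Order ID": ["A-1", "A-2", "A-3", "A-4"],
    "Postal Code": [10024, np.nan, 77095, np.nan],
    "Sales": [261.96, 731.94, np.nan, 22.37],
})

# Count and percentage of missing values per column, worst columns first
missing = pd.DataFrame({
    "n_missing": sample.isna().sum(),
    "pct_missing": sample.isna().mean() * 100,
}).sort_values("n_missing", ascending=False)

# Keep only columns that actually have gaps
missing = missing[missing["n_missing"] > 0]
print(missing)
```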

In [None]:
# Data cleaning and preparation
df_clean = df.copy()

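# Convert every column whose name contains "date" to datetime
for col in df_clean.columns:
    if 'date' in col.lower():
        try:
            df_clean[col] = pd.to_datetime(df_clean[col])
            print(f"Converted {col} to datetime")
        except (ValueError, TypeError) as e:  # avoid a bare except that would also swallow KeyboardInterrupt
            print(f"Could not convert {col}: {e}")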

# Find key columns
sales_col = next((col for col in df_clean.columns if 'sales' in col.lower()), None)
profit_col = next((col for col in df_clean.columns if 'profit' in col.lower()), None)

print(f"Sales column: {sales_col}")
print(f"Profit column: {profit_col}")

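# Create profit margin (profit as a percentage of sales)
if sales_col and profit_col:
    df_clean['Profit_Margin'] = (df_clean[profit_col] / df_clean[sales_col]) * 100
    # Rows with zero sales produce +/-inf; treat their margin as undefined instead
    df_clean['Profit_Margin'] = df_clean['Profit_Margin'].replace([np.inf, -np.inf], np.nan)
    print("Created Profit_Margin column")
else:
    print("Sales or profit column not found; skipping Profit_Margin")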
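The conversion loop lets pandas infer each date format. That creates two risks:

- Day-first strings such as `08/11/2017` may be parsed with month and day swapped.
- A single malformed value makes the whole column fail.

A more defensive approach passes an explicit format and coerces bad values to `NaT`. In the sketch below, the `%d/%m/%Y` format and the sample strings are assumptions, so check them against the raw column values before relying on this.

```python
import pandas as pd

# Hypothetical raw strings in day/month/year order (an assumption; inspect df.head() first)
raw = pd.Series(["08/11/2017", "12/06/2017", "31/13/2017"])

# An explicit format removes the month/day ambiguity; errors="coerce" turns
# unparseable values (month 13 here) into NaT instead of aborting the column
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")
print(parsed)
print("Unparsed values:", parsed.isna().sum())
```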
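Substring matching returns the first column whose name contains the keyword. That column can be the wrong one if, for example, a hypothetical `Sales Channel` column comes before `Sales`. The sketch below defines a hypothetical `find_column` helper that prefers an exact, case-insensitive match and falls back to substring matching. The column names are invented for illustration.

```python
import pandas as pd


def find_column(columns, keyword):
    """Hypothetical helper: return the column named exactly `keyword`
    (case-insensitive) if present, else the first column containing it, else None."""
    lowered = keyword.lower()
    exact = [c for c in columns if c.lower() == lowered]
    if exact:
        return exact[0]
    return next((c for c in columns if lowered in c.lower()), None)


cols = pd.Index(["Row ID", "Sales Channel", "Sales", "Gross Profit"])
print(find_column(cols, "sales"))     # Sales
print(find_column(cols, "profit"))    # Gross Profit
print(find_column(cols, "discount"))  # None
```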
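Profit margin here is profit expressed as a percentage of sales: Profit_Margin = Profit / Sales × 100. Rows with zero sales make the ratio undefined. Pandas returns `inf` or `-inf` for a nonzero profit and `NaN` for 0/0, and infinities distort means and plots. The sketch below uses hypothetical rows and maps infinities to `NaN`.

```python
import numpy as np
import pandas as pd

# Toy rows (hypothetical); the last row has zero sales
rows = pd.DataFrame({"Sales": [100.0, 200.0, 0.0], "Profit": [25.0, -10.0, 5.0]})

# Profit margin in percent: profit / sales * 100
margin = rows["Profit"] / rows["Sales"] * 100

# Division by zero yields inf; mark those rows as undefined instead
rows["Profit_Margin"] = margin.replace([np.inf, -np.inf], np.nan)
print(rows)
```

When summarizing by category or region, these two approaches give different answers:

- Computing the margin from summed profit and summed sales weights each order by its revenue.
- Averaging row-level `Profit_Margin` values gives every order equal weight.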