# E-Commerce Business Performance Analytics

This notebook analyzes an e-commerce company's performance using synthetic but realistic transactional data.

We will cover:
- Data understanding and cleaning
- Exploratory data analysis (EDA)
- Revenue and order trends
- Customer behavior and segmentation
- Product and category performance
- Payment method preferences

Each section includes **business-focused explanations** and **visualizations** suitable for stakeholders.


In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use("seaborn-v0_8")
sns.set_palette("Set2")

# Machine-specific project root; change this to the project location on your machine
base_dir = r"C:\Users\Archana\Desktop\data analysis\E-Commerce-Analytics"
raw_dir = os.path.join(base_dir, "data", "raw")
processed_dir = os.path.join(base_dir, "data", "processed")
os.makedirs(processed_dir, exist_ok=True)

customers_path = os.path.join(raw_dir, "customers.csv")
orders_path = os.path.join(raw_dir, "orders.csv")
order_items_path = os.path.join(raw_dir, "order_items.csv")
products_path = os.path.join(raw_dir, "products.csv")
payments_path = os.path.join(raw_dir, "payments.csv")

customers = pd.read_csv(customers_path, parse_dates=["signup_date"])
orders = pd.read_csv(orders_path, parse_dates=["order_date"])
order_items = pd.read_csv(order_items_path)
products = pd.read_csv(products_path)
payments = pd.read_csv(payments_path)

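# Preview each table on its own; a bare tuple of DataFrames renders as one plain-text blob
tables = {
    "customers": customers,
    "orders": orders,
    "order_items": order_items,
    "products": products,
    "payments": payments,
}
for name, df in tables.items():
    print(f"\n{name}: {df.shape[0]} rows x {df.shape[1]} columns")
    display(df.head())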


## Data quality checks and cleaning

In this section we:
- Inspect schema and key distributions
- Handle data types and duplicates
- Create a unified order-level fact table for analysis

We focus on **delivered and shipped orders** to represent realized revenue.
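
As a rough sketch of the third step, the order-level fact table could be built by summing line items per order and joining the totals onto the orders that have a realized status. The toy frames below stand in for the real tables. The `quantity` and `unit_price` columns and the lowercase status labels `delivered` and `shipped` are assumptions, since the raw schema has not been inspected at this point.

```python
import pandas as pd

# Toy stand-ins for the raw tables. Column names other than the ID, date and
# status fields are assumptions made for illustration only.
orders_demo = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [10, 11, 10],
    "order_date": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-02-01"]),
    "order_status": ["delivered", "cancelled", "shipped"],
})
items_demo = pd.DataFrame({
    "order_item_id": [100, 101, 102, 103],
    "order_id": [1, 1, 2, 3],
    "quantity": [2, 1, 1, 3],
    "unit_price": [10.0, 5.0, 20.0, 4.0],
})

# Assumed labels for orders that count as realized revenue.
realized_statuses = ["delivered", "shipped"]

# Collapse line items to one row per order.
items_demo["line_revenue"] = items_demo["quantity"] * items_demo["unit_price"]
order_totals = (
    items_demo.groupby("order_id", as_index=False)
    .agg(order_revenue=("line_revenue", "sum"), n_items=("quantity", "sum"))
)

# Keep realized orders and attach their totals. validate= raises if the join
# would fan out rows because order_id is not unique on either side.
order_facts_demo = (
    orders_demo[orders_demo["order_status"].isin(realized_statuses)]
    .merge(order_totals, on="order_id", how="left", validate="one_to_one")
    .reset_index(drop=True)
)
print(order_facts_demo)
```

Aggregating items before the merge keeps the result at one row per order, so order counts and revenue sums stay consistent in later sections.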


In [None]:
# Basic info. DataFrame.info() prints its report directly and returns None,
# so wrapping it in print() would add a stray "None" after each table.
print("Customers:")
customers.info()
print("\nOrders:")
orders.info()
print("\nOrder items:")
order_items.info()
print("\nProducts:")
products.info()
print("\nPayments:")
payments.info()

# Count repeated primary keys per table (cast to int for a clean printout)
print("\nDuplicate counts:")
print({
    "customers": int(customers.duplicated(subset=["customer_id"]).sum()),
    "orders": int(orders.duplicated(subset=["order_id"]).sum()),
    "order_items": int(order_items.duplicated(subset=["order_item_id"]).sum()),
    "products": int(products.duplicated(subset=["product_id"]).sum()),
})

# Drop exact duplicate rows (identical in every column)
customers = customers.drop_duplicates()
orders = orders.drop_duplicates()
order_items = order_items.drop_duplicates()
products = products.drop_duplicates()
payments = payments.drop_duplicates()

# Ensure types
orders["order_status"] = orders["order_status"].astype("category")
products["product_category"] = products["product_category"].astype("category")
payments["payment_type"] = payments["payment_type"].astype("category")

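# parse_dates already attempted this at load time; repeating the conversion
# raises an error if a date column contains values that could not be parsed
orders["order_date"] = pd.to_datetime(orders["order_date"])
customers["signup_date"] = pd.to_datetime(customers["signup_date"])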

orders.head()
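
Calling `drop_duplicates()` without `subset` only removes rows that are identical in every column, so a repeated `customer_id` with differing values would survive the cleaning above. The sketch below uses toy data to show how such key-level duplicates could be surfaced and resolved. The `city` column is hypothetical, and keeping the most recent `signup_date` per customer is an assumed rule rather than one stated in this notebook.

```python
import pandas as pd

# Toy customer table: customer 1 has an exact duplicate row,
# customer 2 has two conflicting records.
customers_demo = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "signup_date": pd.to_datetime(
        ["2024-01-01", "2024-01-01", "2024-02-01", "2024-03-01", "2024-04-01"]
    ),
    "city": ["Pune", "Pune", "Delhi", "Mumbai", "Chennai"],  # hypothetical column
})

# Step 1: drop rows identical in every column.
deduped_rows = customers_demo.drop_duplicates()

# Step 2: list the rows whose key still repeats, for manual review.
conflicts = deduped_rows[
    deduped_rows.duplicated(subset=["customer_id"], keep=False)
]

# Step 3 (assumed rule): keep the latest signup record per customer.
deduped_keys = (
    deduped_rows.sort_values("signup_date")
    .drop_duplicates(subset=["customer_id"], keep="last")
    .sort_values("customer_id")
    .reset_index(drop=True)
)

print(conflicts)
print(deduped_keys)
```

Reviewing `conflicts` before applying any tie-break rule makes it easier to tell data-entry errors apart from legitimate record updates.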
