📌 Problem Statement

An e-commerce company wants to better understand its sales performance, customer behavior, and operational patterns. The business generates thousands of orders each year across multiple regions and product categories. However, raw transactional data alone does not provide the insights needed for decision-making.

To support strategic planning, the company needs an analytical workflow capable of answering key business questions such as:

1. How do different product categories contribute to overall revenue?

The business wants to identify which categories generate the most sales, attract the most customer orders, and deliver the highest average order value.

2. How does revenue vary across regions and months?

Understanding seasonal trends and regional performance helps guide inventory decisions, marketing campaigns, and staffing.

3. Which customers are most valuable, and how frequently do they return?

The company wants to measure repeat-purchase behavior by analyzing first vs. last order dates, spending patterns, and engagement over time.

4. How many orders are abandoned (unpaid), and does this indicate a checkout-funnel issue?

A portion of customers fail to complete payment. Identifying the rate of abandoned orders helps diagnose revenue leaks and customer-experience problems.

🎯 Project Goal

The goal of this project is to load and analyze an e-commerce dataset to produce actionable insights regarding:

Product category performance

Monthly and regional revenue patterns

Customer lifetime behavior

Abandoned order analysis

The project uses Python and Pandas to perform data loading, transformation, aggregation, and summary reporting. The final deliverables help business teams understand trends, evaluate customer value, and detect potential issues in the purchasing process.
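The schema of the dataset is not shown in this write-up. As a minimal sketch, the sample below assumes the seven columns that the analysis script references (order_id, customer_id, order_date, product_category, region, revenue, paid). The rows, the `SAMPLE_CSV` and `sample_orders` names, and the `REQUIRED_COLUMNS` check are hypothetical and exist only to illustrate the expected shape of the input.

```python
import io

import pandas as pd

# Hypothetical sample rows: column names are inferred from the analysis
# script, and every value here is invented purely for illustration.
SAMPLE_CSV = """order_id,customer_id,order_date,product_category,region,revenue,paid
1001,C001,2024-01-05,Electronics,North,250.00,True
1002,C002,2024-01-17,Clothing,South,40.00,True
1003,C001,2024-02-02,Electronics,North,120.00,False
1004,C003,2024-02-20,Home,West,75.50,True
1005,C002,2024-03-11,Clothing,South,60.00,True
"""

# Columns the analysis relies on (an assumption based on the script).
REQUIRED_COLUMNS = {
    "order_id", "customer_id", "order_date",
    "product_category", "region", "revenue", "paid",
}

# Parse the in-memory text the same way the script parses the real file.
sample_orders = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["order_date"])

# Fail early with a clear message if an expected column is absent.
missing = REQUIRED_COLUMNS - set(sample_orders.columns)
if missing:
    raise ValueError(f"Missing expected columns: {sorted(missing)}")

print(sample_orders.dtypes)
```

Saving this text to a file named orders.csv would give the script a small input to read while the real data is unavailable.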

# ecommerce_analysis_with_csv.py
import pandas as pd

# --- Load dataset ---
orders = pd.read_csv("orders.csv", parse_dates=["order_date"])

# --- Abandoned cart analysis ---
abandoned = orders[orders["paid"] == False]
num_abandoned = len(abandoned)
total_orders = len(orders)
abandoned_rate = num_abandoned / total_orders if total_orders else 0
print("Total orders:", total_orders)

# --- Category sales aggregation ---
cat_sales = orders.groupby("product_category").agg(
    total_revenue=("revenue", "sum"),
    orders=("order_id", "count"),
    avg_order_value=("revenue", "mean")
).sort_values("total_revenue", ascending=False)

# --- Monthly revenue pivot ---
orders["month"] = orders["order_date"].dt.to_period("M")
monthly = orders.pivot_table(
    index="month",
    columns="region",
    values="revenue",
    aggfunc="sum"
).fillna(0)

# --- Repeat customers ---
cust = orders.groupby("customer_id").agg(
    first_order=("order_date", "min"),
    last_order=("order_date", "max"),
    num_orders=("order_id", "count"),
    total_spend=("revenue", "sum")
)
cust["days_between"] = (cust["last_order"] - cust["first_order"]).dt.days

# --- Outputs ---
print("Category sales summary:\n", cat_sales)
print("\nMonthly revenue (first 6 months):\n", monthly.head(6))
print("\nTop customers:\n", cust.sort_values("total_spend", ascending=False).head(5))
print(f"\nAbandoned carts: {num_abandoned}")
print(f"Abandoned cart rate: {abandoned_rate:.2%}\n")


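The script reports per-customer order counts and one overall abandoned rate. Two follow-up metrics may help with questions 3 and 4: the share of customers who ordered more than once, and the abandoned rate broken down by a segment such as region. The sketch below is one possible way to compute them. The function names `repeat_customer_rate` and `abandoned_rate_by` and the `toy` data are hypothetical, and the code assumes the `paid` column has no missing values.

```python
import pandas as pd


def repeat_customer_rate(orders: pd.DataFrame) -> float:
    """Share of customers with more than one order (0.0 if there are none)."""
    per_customer = orders.groupby("customer_id")["order_id"].count()
    if per_customer.empty:
        return 0.0
    return float((per_customer > 1).mean())


def abandoned_rate_by(orders: pd.DataFrame, key: str) -> pd.Series:
    """Fraction of unpaid orders within each group of `key`, highest first."""
    # Assumes `paid` has no missing values; NaN would be cast to True here.
    unpaid = ~orders["paid"].astype(bool)
    return unpaid.groupby(orders[key]).mean().sort_values(ascending=False)


# Hypothetical toy data, invented for illustration only.
toy = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5, 6],
    "customer_id": ["A", "A", "B", "C", "C", "D"],
    "region": ["North", "North", "South", "South", "West", "West"],
    "paid": [True, False, True, True, False, False],
})

repeat_rate = repeat_customer_rate(toy)
region_rates = abandoned_rate_by(toy, "region")

print(f"Repeat customer rate: {repeat_rate:.2%}")
print("Abandoned rate by region:\n", region_rates)
```

A segment whose abandoned rate sits well above the others is a candidate for closer inspection, although the rate alone does not show where in the checkout flow customers drop off.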