# **Time-Series Revenue Analysis**

### **Objective:**
- Analyze revenue trends over time
- Aggregate metrics at monthly granularity
- Prepare clean time-series data for cohort analysis

### **1) Install Necessary Libraries**

In [1]:
%pip install pandas sqlalchemy psycopg2-binary

Note: you may need to restart the kernel to use updated packages.


In [2]:
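import os

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.engine import URL

# Read the password from the PGPASSWORD environment variable instead of hard-coding it;
# building the URL from parts also keeps the database name with spaces intact.
url = URL.create("postgresql+psycopg2", username="postgres",
                 password=os.environ["PGPASSWORD"], host="localhost",
                 port=5432, database="customer revenue analytics")
engine = create_engine(url)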

In [3]:
with engine.connect() as conn:  # test connection is returned to the pool afterwards
    conn.exec_driver_sql("SELECT 1")

### **2) Monthly Revenue Aggregation**

In [6]:
query_monthly_revenue = """
SELECT
    DATE_TRUNC('month', d.full_date)::date AS revenue_month,
    SUM(f.revenue) AS total_revenue,
    COUNT(DISTINCT f.invoice_no) AS total_orders,
    COUNT(DISTINCT f.customer_key) AS active_customers
FROM fact_sales f
JOIN dim_date d
    ON f.date_key = d.date_key
GROUP BY DATE_TRUNC('month', d.full_date)
ORDER BY revenue_month;
"""
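To make explicit what the query computes, here is a minimal pandas sketch of the same monthly rollup on a few hypothetical rows. The `sales` frame is invented for illustration, and it assumes (as `COUNT(DISTINCT invoice_no)` suggests) that `fact_sales` holds one row per invoice line, so invoice `A3` appears twice. `dt.to_period("M").dt.to_timestamp()` stands in for `DATE_TRUNC('month', ...)` and `nunique` for `COUNT(DISTINCT ...)`.

```python
import pandas as pd

# Hypothetical rows standing in for fact_sales joined to dim_date
sales = pd.DataFrame({
    "full_date": pd.to_datetime(["2009-12-01", "2009-12-15", "2010-01-03", "2010-01-20"]),
    "invoice_no": ["A1", "A2", "A3", "A3"],   # A3 has two line items
    "customer_key": [1, 2, 1, 1],
    "revenue": [100.0, 50.0, 30.0, 20.0],
})

# Equivalent of DATE_TRUNC('month', full_date): first day of each month
sales["revenue_month"] = sales["full_date"].dt.to_period("M").dt.to_timestamp()

monthly = (
    sales.groupby("revenue_month")
    .agg(
        total_revenue=("revenue", "sum"),
        total_orders=("invoice_no", "nunique"),        # COUNT(DISTINCT invoice_no)
        active_customers=("customer_key", "nunique"),  # COUNT(DISTINCT customer_key)
    )
    .reset_index()
    .sort_values("revenue_month")
)
print(monthly)
```

January counts one order, not two, because both `A3` rows belong to the same invoice.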

In [7]:
df_monthly_revenue = pd.read_sql(query_monthly_revenue, engine, parse_dates=["revenue_month"])
df_monthly_revenue.head()

|   | revenue_month | total_revenue | total_orders | active_customers |
|---|---|---|---|---|
| 0 | 2009-12-01 | 663272.05 | 1900 | 1045 |
| 1 | 2010-01-01 | 531952.9 | 1296 | 786 |
| 2 | 2010-02-01 | 489399.58 | 1335 | 807 |
| 3 | 2010-03-01 | 635996.48 | 1907 | 1111 |
| 4 | 2010-04-01 | 560635.02 | 1615 | 998 |


### **3) Derived Metrics**

In [8]:
# Average order value: monthly revenue per distinct invoice
df_monthly_revenue["avg_order_value"] = (
    df_monthly_revenue["total_revenue"] /
    df_monthly_revenue["total_orders"]
)

# Monthly revenue per distinct active customer
df_monthly_revenue["revenue_per_customer"] = (
    df_monthly_revenue["total_revenue"] /
    df_monthly_revenue["active_customers"]
)
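One way to sanity-check the two ratios is to recompute them for the first rows of the aggregated output and confirm the identity `revenue_per_customer = avg_order_value * (total_orders / active_customers)`. The `orders_per_customer` column is a hypothetical addition used only for this check; the input values are copied from the aggregation output.

```python
import pandas as pd

# First two rows of the monthly aggregation output
m = pd.DataFrame({
    "total_revenue": [663272.05, 531952.9],
    "total_orders": [1900, 1296],
    "active_customers": [1045, 786],
})
m["avg_order_value"] = m["total_revenue"] / m["total_orders"]
m["revenue_per_customer"] = m["total_revenue"] / m["active_customers"]

# Hypothetical helper metric: average orders per active customer
m["orders_per_customer"] = m["total_orders"] / m["active_customers"]

# Identity check: revenue per customer equals AOV times orders per customer
gap = (m["revenue_per_customer"] - m["avg_order_value"] * m["orders_per_customer"]).abs()
assert gap.max() < 1e-6
print(m.round(6))
```

For December 2009 this gives 663272.05 / 1900 ≈ 349.09 and 663272.05 / 1045 ≈ 634.71, in line with the notebook's final preview.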

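### **4) Export to CSV**

In [9]:
from pathlib import Path

# Create the output folder if needed; to_csv does not create missing directories.
output_path = Path("../data/processed/monthly_revenue_metrics.csv")
output_path.parent.mkdir(parents=True, exist_ok=True)
df_monthly_revenue.to_csv(output_path, index=False)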
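CSV stores `revenue_month` as plain text, so whatever reads this file for cohort analysis has to parse the dates again. A minimal round-trip sketch, assuming the consumer uses `pd.read_csv(..., parse_dates=[...])`; it writes to a temporary directory rather than the real `../data/processed/` path, and the two rows are copied from the output above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two rows copied from the monthly output
df = pd.DataFrame({
    "revenue_month": pd.to_datetime(["2009-12-01", "2010-01-01"]),
    "total_revenue": [663272.05, 531952.9],
    "total_orders": [1900, 1296],
    "active_customers": [1045, 786],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "monthly_revenue_metrics.csv"
    df.to_csv(path, index=False)
    plain = pd.read_csv(path)                                # month comes back as text
    back = pd.read_csv(path, parse_dates=["revenue_month"])  # month parsed as datetime

print(plain["revenue_month"].dtype, back["revenue_month"].dtype)
```

Without `parse_dates`, date arithmetic and resampling on `revenue_month` would not work downstream.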

### **5) Sanity Checks**

In [10]:
# Only a cell's last expression is rendered, so show the summary explicitly.
display(df_monthly_revenue.describe())
df_monthly_revenue.isna().sum()

revenue_month           0
total_revenue           0
total_orders            0
active_customers        0
avg_order_value         0
revenue_per_customer    0
dtype: int64
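`isna().sum()` confirms there are no missing values, but for a monthly series it cannot detect a skipped or duplicated month. A sketch of extra checks follows; `check_monthly_series` is a hypothetical helper, and it assumes `revenue_month` holds month-start dates as produced by `DATE_TRUNC('month', ...)`. The sample frame deliberately skips February 2010.

```python
import pandas as pd

def check_monthly_series(df: pd.DataFrame, month_col: str = "revenue_month") -> list[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    months = pd.DatetimeIndex(pd.to_datetime(df[month_col]))
    if months.duplicated().any():
        problems.append("duplicate months")
    # Every month start between the first and last month should be present
    expected = pd.date_range(months.min(), months.max(), freq="MS")
    missing = expected.difference(months)
    if len(missing) > 0:
        problems.append(f"missing months: {[m.strftime('%Y-%m') for m in missing]}")
    if (df["total_revenue"] < 0).any():
        problems.append("negative monthly revenue")
    return problems

# Hypothetical sample with a gap in February 2010
sample = pd.DataFrame({
    "revenue_month": ["2009-12-01", "2010-01-01", "2010-03-01"],
    "total_revenue": [1.0, 2.0, 3.0],
})
print(check_monthly_series(sample))
```

On the notebook's data it would be called as `check_monthly_series(df_monthly_revenue)`.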

In [11]:
df_monthly_revenue.head()

|   | revenue_month | total_revenue | total_orders | active_customers | avg_order_value | revenue_per_customer |
|---|---|---|---|---|---|---|
| 0 | 2009-12-01 | 663272.05 | 1900 | 1045 | 349.090553 | 634.710096 |
| 1 | 2010-01-01 | 531952.9 | 1296 | 786 | 410.457485 | 676.78486 |
| 2 | 2010-02-01 | 489399.58 | 1335 | 807 | 366.591446 | 606.443098 |
| 3 | 2010-03-01 | 635996.48 | 1907 | 1111 | 333.506282 | 572.454077 |
| 4 | 2010-04-01 | 560635.02 | 1615 | 998 | 347.142427 | 561.758537 |
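Since the stated objective includes analyzing revenue trends, month-over-month growth is one natural next step. A sketch using `Series.pct_change` on the five `total_revenue` values shown above; the `mom_growth` column name is an assumption, not part of the original notebook.

```python
import pandas as pd

# total_revenue for the first five months, copied from the preview
monthly = pd.DataFrame({
    "revenue_month": pd.to_datetime(
        ["2009-12-01", "2010-01-01", "2010-02-01", "2010-03-01", "2010-04-01"]
    ),
    "total_revenue": [663272.05, 531952.9, 489399.58, 635996.48, 560635.02],
}).set_index("revenue_month")

# Month-over-month growth: current / previous - 1; the first month has no predecessor (NaN)
monthly["mom_growth"] = monthly["total_revenue"].pct_change()
print(monthly.round(4))
```

In this slice revenue falls from December 2009 through February 2010, rises in March, and dips again in April.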
