# BAN CARBON Revenue Forecast

This notebook implements the revenue forecasting methodology described in `/home/fdvom/ban-carbon-hq/research/revenue-forecast/context/forecasting-methodology.md`.

## Overview

The forecast estimates grassroots donation revenue over an M-month horizon from four inputs:
- **Channels**: Ways to reach potential donors
- **Segments**: Donor groups with shared characteristics
- **Funnel parameters**: Reachable audience, lead rates, conversion rates
- **Donor behavior**: Gift amounts and attrition rates

## Revenue Model

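```
revenue = Σ_c Σ_s [ n_cs * lead_c * conv * Σ_{t=m_c}^M gift_s * (1 - attr)^{t - m_c} ]
```

where, for channel `c` and segment `s`:
- `n_cs` is the reachable audience
- `lead_c` is the channel's lead rate
- `conv` is the conversion rate from lead to donor
- `m_c` is the month in which the channel launches
- `gift_s` is the segment's gift amount per month
- `attr` is the per-month attrition rate
- `M` is the forecast horizon in months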
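
The inner sum over `t` is a finite geometric series, so it can be evaluated in closed form. A minimal sketch that checks the term-by-term sum against the closed form; the function names and the input values are illustrative assumptions, not taken from the forecast data:

```python
def ltv_direct(gift_s, attr, m_c, M):
    """Sum gift_s * (1 - attr)**(t - m_c) term by term for t = m_c..M."""
    return sum(gift_s * (1 - attr) ** (t - m_c) for t in range(m_c, M + 1))


def ltv_closed_form(gift_s, attr, m_c, M):
    """Same sum via the geometric-series identity (1 - r**n) / (1 - r)."""
    n_periods = M - m_c + 1
    if n_periods <= 0:
        return 0.0  # channel launches after the horizon ends
    r = 1 - attr
    if r == 1:
        return gift_s * n_periods  # no attrition: every term equals gift_s
    return gift_s * (1 - r ** n_periods) / (1 - r)


# Hypothetical inputs, chosen only to exercise the formula.
example = dict(gift_s=20.0, attr=0.10, m_c=3, M=12)
direct = ltv_direct(**example)
closed = ltv_closed_form(**example)
assert abs(direct - closed) < 1e-9
```

With these inputs there are `12 - 3 + 1 = 10` terms in the sum, the first of which is the undiscounted gift in the launch month.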

In [25]:
import pandas as pd
import numpy as np
from pathlib import Path

## 1. Load Tables

In [27]:
# Define paths
data_dir = Path("../../../data/raw/revenue-forecasts")

# Load core tables
channels = pd.read_csv(data_dir / "channels.csv")
segments = pd.read_csv(data_dir / "segments.csv")

# Load lead rates and gift amounts
leads_raw = pd.read_csv(data_dir / "lead-rates/combined.csv")
gifts_raw = pd.read_csv(data_dir / "gift-amounts/combined.csv")

# Average lead rates across LLMs
leads = leads_raw[['id_cha']].copy()
leads['lead'] = leads_raw[['lead_chatgpt', 'lead_claude', 'lead_gemini']].mean(axis=1)

# Average gift amounts across LLMs
gifts = gifts_raw[['id_seg']].copy()
gifts['gift'] = gifts_raw[['gift_chatgpt', 'gift_claude', 'gift_gemini']].mean(axis=1)

# Load audience data
# NOTE: Only the ChatGPT audience estimates are used; the Claude and Gemini
# estimates seemed overoptimistic and are ignored.
audience = pd.read_csv(data_dir / "audience-size/chatgpt.csv")

# Load channel sequencing from all three sources
seq_chatgpt = pd.read_csv(data_dir / "channel-sequence/chatgpt.csv")
seq_claude = pd.read_csv(data_dir / "channel-sequence/claude.csv")
seq_gemini = pd.read_csv(data_dir / "channel-sequence/gemini.csv")

# Quick check: verify same id_cha keys
assert set(seq_chatgpt['id_cha']) == set(seq_claude['id_cha']) == set(seq_gemini['id_cha']), "Channel IDs don't match!"

# Rename month columns and merge
seq_chatgpt = seq_chatgpt.rename(columns={'month': 'month_chatgpt'})
seq_claude = seq_claude.rename(columns={'month': 'month_claude'})
seq_gemini = seq_gemini.rename(columns={'month': 'month_gemini'})

channel_seq = seq_chatgpt[['id_cha', 'month_chatgpt']].merge(
    seq_claude[['id_cha', 'month_claude']], on='id_cha', how='outer'
).merge(
    seq_gemini[['id_cha', 'month_gemini']], on='id_cha', how='outer'
)

# Average and round to nearest integer
channel_seq['month'] = channel_seq[['month_chatgpt', 'month_claude', 'month_gemini']].mean(axis=1).round().astype(int)

# Drop LLM-specific columns
channel_seq = channel_seq[['id_cha', 'month']]
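
Lead rates, gift amounts, and launch months are each averaged across the three LLM estimates. One thing to keep in mind is that `DataFrame.mean(axis=1)` skips missing values by default, so a row with a missing estimate is averaged over whatever remains. A hedged sketch of a helper that makes this explicit; `average_estimates`, its `min_sources` guard, and the toy values are hypothetical and not part of the notebook:

```python
import pandas as pd


def average_estimates(df, cols, min_sources=None):
    """Row-wise mean of the columns in `cols`.

    DataFrame.mean(axis=1) skips NaN by default, so a row with one missing
    estimate is averaged over the remaining ones. `min_sources` is a
    hypothetical guard: if set, raise when any row has fewer non-missing
    estimates than required.
    """
    if min_sources is not None:
        n_available = df[cols].notna().sum(axis=1)
        too_few = n_available < min_sources
        if too_few.any():
            raise ValueError(f"Rows with too few estimates: {df.index[too_few].tolist()}")
    return df[cols].mean(axis=1)


# Made-up values for illustration only.
toy = pd.DataFrame({
    "id_cha": [1, 2],
    "lead_chatgpt": [0.02, 0.04],
    "lead_claude": [0.04, None],   # one estimate missing for channel 2
    "lead_gemini": [0.06, 0.08],
})
cols = ["lead_chatgpt", "lead_claude", "lead_gemini"]
toy["lead"] = average_estimates(toy, cols)
```

Here channel 2 is averaged over two estimates instead of three; passing `min_sources=3` would raise instead of averaging silently.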

## 2. Define Scalar Parameters

In [28]:
# Load structural parameters
struct_params = pd.read_csv(data_dir / "structural-parameters.csv")
conv = struct_params['conv'].iloc[0]
attr = struct_params['attr'].iloc[0]

# Forecast horizon in months
M = 12
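
`conv` and `attr` enter the formula as rates, so a quick range check can catch a percentage entered as, say, 5 instead of 0.05. A minimal sketch, assuming both parameters are meant to be fractions in [0, 1] (the source does not state their units); `check_fraction` and the example values are hypothetical:

```python
def check_fraction(name, value):
    """Return `value` as a float, raising if it is not within [0, 1].

    NaN also fails the comparison and therefore raises.
    """
    as_float = float(value)
    if not (0.0 <= as_float <= 1.0):
        raise ValueError(f"{name}={value!r} is outside [0, 1]")
    return as_float


# Illustrative values only; the real ones come from structural-parameters.csv.
example_conv = check_fraction("conv", 0.25)
example_attr = check_fraction("attr", 0.05)
```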

## 3. Calculate Revenue

Apply the revenue formula to each channel-segment pair, then sum across pairs.

In [29]:
# Step 1: Create base table with all channel-segment pairs
base = audience.merge(leads[['id_cha', 'lead']], on='id_cha', how='left')
base = base.merge(gifts[['id_seg', 'gift']], on='id_seg', how='left')
base = base.merge(channel_seq[['id_cha', 'month']], on='id_cha', how='left')

# Rename for clarity
base = base.rename(columns={'month': 'm_c', 'lead': 'lead_c', 'gift': 'gift_s', 'n': 'n_cs'})

# Step 2: Calculate lifetime value for each cohort
# LTV = Σ_{t=m_c}^M gift_s × (1 - attr)^{t - m_c}
# This is a geometric series: gift_s × [1 + (1-attr) + (1-attr)^2 + ... + (1-attr)^{M-m_c}]

def calculate_ltv(m_c, gift_s, attr, M):
    """Calculate lifetime value for a donor acquired in month m_c"""
    if pd.isna(m_c) or m_c > M:
        return 0
    
    n_periods = M - m_c + 1
    
    # Geometric series sum: (1 - r^n) / (1 - r) where r = (1 - attr)
    if attr == 0:
        return gift_s * n_periods
    else:
        r = 1 - attr
        return gift_s * (1 - r**n_periods) / (1 - r)

base['ltv'] = base.apply(lambda row: calculate_ltv(row['m_c'], row['gift_s'], attr, M), axis=1)

# Step 3: Calculate expected donors and revenue per channel-segment
base['expected_donors'] = base['n_cs'] * base['lead_c'] * conv
base['revenue'] = base['expected_donors'] * base['ltv']
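
The left merges above keep every audience row, so a channel or segment with no matching lead rate, gift amount, or launch month yields NaN inputs. `calculate_ltv` returns 0 for a missing `m_c`, and `Series.sum()` skips NaN revenue by default, so such rows drop out of the total without an error. A sketch of one way to surface them, using the `validate` and `indicator` options of `DataFrame.merge` on made-up tables (`audience_toy` and `leads_toy` are stand-ins, not the real data):

```python
import pandas as pd

# Toy stand-ins for `audience` and `leads`; all values are made up.
audience_toy = pd.DataFrame({
    "id_cha": [1, 1, 2, 3],
    "id_seg": [10, 11, 10, 10],
    "n": [1000, 500, 800, 300],
})
leads_toy = pd.DataFrame({"id_cha": [1, 2], "lead": [0.02, 0.03]})

# validate="many_to_one" raises MergeError if id_cha is duplicated in leads_toy;
# indicator=True adds a `_merge` column marking rows that found no match.
merged = audience_toy.merge(
    leads_toy, on="id_cha", how="left", validate="many_to_one", indicator=True
)

# Channel-segment pairs whose channel has no lead rate.
unmatched = merged.loc[merged["_merge"] == "left_only", ["id_cha", "id_seg"]]
```

In this toy case channel 3 has no lead rate, so `unmatched` holds its single audience row.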

## 4. Results

Display the total forecast revenue and its breakdown by channel and by segment.

In [30]:
# Step 4: Aggregate and display results

# Total revenue
total_revenue = base['revenue'].sum()
print(f"Total Forecasted Revenue (M={M} months): ${total_revenue:,.2f}")
print()

# Revenue by channel
revenue_by_channel = base.groupby('id_cha')['revenue'].sum().reset_index()
revenue_by_channel = revenue_by_channel.merge(channels[['id_cha', 'name_cha']], on='id_cha')
revenue_by_channel = revenue_by_channel.sort_values('revenue', ascending=False)
revenue_by_channel = revenue_by_channel.set_index('name_cha')['revenue']
print("Revenue by Channel:")
print(revenue_by_channel.to_string())
print()

# Revenue by segment
revenue_by_segment = base.groupby('id_seg')['revenue'].sum().sort_values(ascending=False)
print("Revenue by Segment:")
print(revenue_by_segment.to_string())
print()

# Summary statistics
print(f"Number of channel-segment pairs: {len(base)}")
print(f"Total expected donors: {base['expected_donors'].sum():,.0f}")
print(f"Average revenue per donor: ${total_revenue / base['expected_donors'].sum():,.2f}")

Total Forecasted Revenue (M=12 months): $105,179.97

Revenue by Channel:
name_cha
Catholic Climate Covenant                                                29448.070529
National Catholic Reporter / EarthBeat                                   21414.502152
Laudato Si' Action Platform (LSAP)                                       15245.929118
Diocesan and parish media (bulletins and e-newsletters)                  14035.523945
Laudato Si' Movement (global and U.S. chapters)                           6169.648429
America Magazine (Jesuit Review)                                          4337.261390
GreenFaith                                                                3802.629717
Commonweal Magazine                                                       2849.181162
Interfaith Power & Light (IPL) national network                           2476.752225
FADICA (Foundations and Donors Interested in Catholic Activities)         1863.829698
Catholic podcasts and webinar series                      