# LTV Foundations – Probabilistic Building Blocks

This notebook prepares the **behavioural probability components** required for
benchmark 3-year LTV modelling.

Scope:
- Ecosystem expansion probability (WHEN)
- Ecosystem transition conditional on expansion (WHAT)

Out of scope:
- Revenue modelling
- ML / regression
- Optimisation

All outputs from this notebook are **inputs** to the final LTV model.


In [2]:
import pandas as pd
from pathlib import Path

DATA_DIR = Path("../data/interim")

df = pd.read_parquet(DATA_DIR / "ecosystem_add_events.parquet")

print("Shape:", df.shape)
print("\nColumns:")
print(df.columns.tolist())

# quick peek
df.head(10)


Shape: (77894, 8)

Columns:
['anon', 'purchase_k', 'date', 'order_ecos', 'added_ecosystem', 'has_expansion', 'prev_ecos_str', 'curr_ecos_str']


Unnamed: 0,anon,purchase_k,date,order_ecos,added_ecosystem,has_expansion,prev_ecos_str,curr_ecos_str
0,ANON_0000009,2,2025-11-19,[bottle],bottle,True,pitcher,"bottle, pitcher"
1,ANON_0000011,2,2024-02-19,"[other, pitcher]",other,True,pitcher,"other, pitcher"
2,ANON_0000012,2,2024-09-26,"[pitcher, bottle]",bottle,True,pitcher,"bottle, pitcher"
3,ANON_0000019,2,2023-06-20,[bottle],,False,bottle,bottle
4,ANON_0000027,2,2023-04-06,[pitcher],pitcher,True,bottle,"bottle, pitcher"
5,ANON_0000029,2,2025-02-08,"[pitcher, bottle]",,False,"bottle, pitcher","bottle, pitcher"
6,ANON_0000033,2,2025-02-09,[bottle],,False,bottle,bottle
7,ANON_0000037,2,2023-04-19,[bottle],,False,bottle,bottle
8,ANON_0000040,2,2023-06-25,[bottle],,False,bottle,bottle
9,ANON_0000043,2,2023-07-03,[pitcher],,False,"bottle, pitcher","bottle, pitcher"


In [8]:
import pandas as pd

def count_ecos(ecos_str):
    if pd.isna(ecos_str):
        return 0
    s = str(ecos_str).strip()
    if s == "":
        return 0

    # split by comma and trim whitespace around each ecosystem
    parts = [p.strip() for p in s.split(",")]

    # drop empty tokens (just in case)
    parts = [p for p in parts if p != ""]

    return len(parts)

df["prev_ecos_count"] = df["prev_ecos_str"].apply(count_ecos)


df[["prev_ecos_str", "prev_ecos_count"]].drop_duplicates().sort_values(
    "prev_ecos_count"
).head(15)


Unnamed: 0,prev_ecos_str,prev_ecos_count
0,pitcher,1
5661,keton,1
24169,PushAir,1
25113,other,1
141,flow_comfort,1
28398,CO2,1
31,container,1
3,bottle,1
16,sink,1
40550,Proskin,1


In [11]:
check = df[df["has_expansion"] == 1].copy()

# derive current ecosystem count the same way
check["curr_ecos_count"] = check["curr_ecos_str"].apply(count_ecos)

(check["curr_ecos_count"] > check["prev_ecos_count"]).value_counts()


True    18789
Name: count, dtype: int64

### WHEN component: Expansion probability

This table estimates the probability that a customer expands into a new ecosystem
at purchase `k`, conditional on how many ecosystems they already owned before
that purchase.

P(expansion | purchase_k, prev_ecos_count)

This component captures *timing* only.
Destination choice is handled separately in the WHAT matrices.


In [12]:
expansion_when = (
    df
    .groupby(["purchase_k", "prev_ecos_count"], as_index=False)
    .agg(
        n_events=("anon", "count"),
        n_expansions=("has_expansion", "sum")
    )
)

expansion_when.head(10)


Unnamed: 0,purchase_k,prev_ecos_count,n_events,n_expansions
0,2,1,44493,11070
1,2,2,7849,1482
2,2,3,768,133
3,2,4,33,3
4,2,5,4,0
5,3,1,12048,3003
6,3,2,2449,524
7,3,3,229,36
8,3,4,8,1
9,3,5,1,0


In [13]:
expansion_when["p_expansion"] = (
    expansion_when["n_expansions"] / expansion_when["n_events"]
)

expansion_when.sort_values(
    ["purchase_k", "prev_ecos_count"]
).head(10)


Unnamed: 0,purchase_k,prev_ecos_count,n_events,n_expansions,p_expansion
0,2,1,44493,11070,0.248803
1,2,2,7849,1482,0.188814
2,2,3,768,133,0.173177
3,2,4,33,3,0.090909
4,2,5,4,0,0.0
5,3,1,12048,3003,0.249253
6,3,2,2449,524,0.213965
7,3,3,229,36,0.157205
8,3,4,8,1,0.125
9,3,5,1,0,0.0


## Expansion probability (WHEN component)

`expansion_when` estimates the probability that a customer expands into a new
ecosystem at purchase `k`, conditional on how many ecosystems they already owned.

This component captures *timing* only:
- WHEN an expansion happens
- NOT which ecosystem is chosen

It will later be combined multiplicatively with ecosystem transition matrices
(the WHAT component) when building LTV simulations.


### Support threshold for WHEN component

Cells with very low event counts produce unstable expansion probabilities.
We flag cells with insufficient support instead of dropping them,
so downstream simulations can choose how to handle them (pooling, truncation, etc.).
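
To make "unstable" concrete, the sketch below compares a Wilson score interval for a well-supported cell and a thin cell, using counts from the WHEN table above. The interval is an illustration only; this notebook flags cells with a plain count threshold, and `wilson_interval` is a hypothetical helper, not part of the pipeline.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

# Counts copied from the table above:
# (purchase_k=2, prev_ecos_count=1) has 11070 expansions in 44493 events
# (purchase_k=2, prev_ecos_count=4) has 3 expansions in 33 events
lo_big, hi_big = wilson_interval(11070, 44493)
lo_small, hi_small = wilson_interval(3, 33)

width_big = hi_big - lo_big
width_small = hi_small - lo_small
print(f"supported cell: [{lo_big:.4f}, {hi_big:.4f}]  width={width_big:.4f}")
print(f"thin cell:      [{lo_small:.4f}, {hi_small:.4f}]  width={width_small:.4f}")
```

The thin cell's interval is many times wider than the supported cell's, which is why a point estimate from a few dozen events should not be used as-is.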


In [16]:
MIN_EVENTS = 50

expansion_when["is_supported"] = (
    expansion_when["n_events"] >= MIN_EVENTS
)

expansion_when["is_supported"].value_counts()


is_supported
False    42
True     20
Name: count, dtype: int64

In [17]:
expansion_when[
    ~expansion_when["is_supported"]
].sort_values(["purchase_k", "prev_ecos_count"]).head(10)


Unnamed: 0,purchase_k,prev_ecos_count,n_events,n_expansions,p_expansion,is_supported
3,2,4,33,3,0.090909,False
4,2,5,4,0,0.0,False
8,3,4,8,1,0.125,False
9,3,5,1,0,0.0,False
13,4,4,6,1,0.166667,False
16,5,3,48,6,0.125,False
17,5,4,3,0,0.0,False
18,5,5,2,0,0.0,False
21,6,3,23,3,0.130435,False
22,6,4,1,0,0.0,False


## Saved artifact: Expansion probability (WHEN)

This parquet file contains the benchmark expansion probability surface:

P(expansion | purchase_k, prev_ecos_count)

Columns:
- purchase_k
- prev_ecos_count
- n_events
- n_expansions
- p_expansion
- is_supported

This artifact is model-free and can be safely reused across
different LTV simulations and horizon assumptions.


In [18]:
from pathlib import Path

OUT_DIR = Path("../data/interim/ecosystem_ltv_components")
OUT_DIR.mkdir(parents=True, exist_ok=True)

expansion_when.to_parquet(
    OUT_DIR / "expansion_probability_when.parquet",
    index=False
)


## WHAT component: ecosystem transition matrices

These matrices encode *conditional destination choice*:
P(added ecosystem | previous ecosystems, expansion)

They are conditional co-engagement matrices:
- rows do NOT sum to 1
- expansion probability is handled separately (WHEN component)
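
A minimal sketch of what "rows do NOT sum to 1" means in practice, using the `bottle` row of the 1→2 matrix as displayed in the cell output below. The renormalisation step (drop already-owned ecosystems, rescale the rest) is an assumption made for illustration; the notebook does not prescribe how a row becomes a destination distribution, and `destination_distribution` is a hypothetical helper.

```python
# Row for previous ecosystem "bottle", copied from the displayed 1->2 matrix.
bottle_row = {
    "CO2": 0.010069, "Proskin": 0.000571, "PushAir": 0.030447,
    "bottle": 0.871359, "container": 0.015419, "flow_comfort": 0.007875,
    "keton": 0.001743, "other": 0.048481, "pitcher": 0.205675, "sink": 0.006733,
}

# Entries are co-engagement shares and can overlap, so the row total is not 1.
row_sum = sum(bottle_row.values())

def destination_distribution(row, owned):
    """ASSUMPTION (not from the notebook): drop owned ecosystems, renormalise the rest."""
    weights = {eco: w for eco, w in row.items() if eco not in owned and w > 0}
    total = sum(weights.values())
    if total == 0:
        return {}
    return {eco: w / total for eco, w in weights.items()}

dest = destination_distribution(bottle_row, owned={"bottle"})
top = max(dest, key=dest.get)
print(f"row sum = {row_sum:.6f}; most likely new ecosystem = {top}")
```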


In [19]:
from pathlib import Path
import pandas as pd

TM_DIR = Path("../data/interim/ecosystem_transitions")

# load one example matrix (k -> k+1)
tm = pd.read_parquet(
    TM_DIR / "ecosystem_transition_1_to_2.parquet"
)

tm.head()


Unnamed: 0,CO2,Proskin,PushAir,bottle,container,flow_comfort,keton,other,pitcher,sink
CO2,0.700944,0.002099,0.30745,0.141658,0.026233,0.024134,0.004197,0.001049,0.110178,0.009444
Proskin,0.0,0.846154,0.0,0.153846,0.015385,0.030769,0.015385,0.0,0.153846,0.015385
PushAir,0.297343,0.0,0.37818,0.297908,0.037309,0.066139,0.005653,0.010175,0.245336,0.012436
bottle,0.010069,0.000571,0.030447,0.871359,0.015419,0.007875,0.001743,0.048481,0.205675,0.006733
container,0.025357,0.000792,0.068146,0.267829,0.515055,0.034073,0.004754,0.068146,0.350238,0.017433


## Joint ecosystem state definition

At each purchase k, a customer is described by:
1. purchase_k
2. prev_ecos_count (number of ecosystems already owned)
3. prev_ecos_str (which ecosystems they already own)

Two stochastic mechanisms act on this state:
- WHEN: whether an expansion happens
- WHAT: which ecosystem is added, conditional on expansion

These components are kept separate and only combined during simulation.


In [25]:
state_support = (
    expansion_when[["purchase_k", "prev_ecos_count", "n_events", "is_supported"]]
    .sort_values(["purchase_k", "prev_ecos_count"])
)

state_support.head(10)


Unnamed: 0,purchase_k,prev_ecos_count,n_events,is_supported
0,2,1,44493,True
1,2,2,7849,True
2,2,3,768,True
3,2,4,33,False
4,2,5,4,False
5,3,1,12048,True
6,3,2,2449,True
7,3,3,229,True
8,3,4,8,False
9,3,5,1,False


In [26]:
state_support.shape


(62, 4)

### State compatibility

The WHEN component conditions on ecosystem *count* only.
The WHAT component conditions on the *identity* of ecosystems owned.

During simulation:
- WHEN decides if expansion occurs
- WHAT selects the destination ecosystem using prev_ecos_str
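
The two-stage draw described above could be sketched as follows. This is not the simulator (that is left to a later notebook); `simulate_step` is a hypothetical name, and the way WHAT rows are combined for customers owning several ecosystems (summing the rows, excluding owned destinations, sampling proportionally) is an assumption. The partial rows are copied from the 1→2 matrix shown earlier.

```python
import random

def simulate_step(owned, p_expansion, what_rows, rng):
    """One WHEN -> WHAT draw. Returns the added ecosystem, or None if no expansion."""
    # WHEN: Bernoulli draw on the expansion probability of the (purchase_k, count) cell.
    if rng.random() >= p_expansion:
        return None
    # WHAT (ASSUMPTION): sum the rows of the owned ecosystems and skip destinations
    # that are already owned; sampling is proportional to the summed weights.
    scores = {}
    for eco in owned:
        for dest, w in what_rows.get(eco, {}).items():
            if dest not in owned:
                scores[dest] = scores.get(dest, 0.0) + w
    if not scores or sum(scores.values()) == 0:
        return None
    dests = sorted(scores)
    return rng.choices(dests, weights=[scores[d] for d in dests], k=1)[0]

# Partial rows (a few columns only) from the displayed 1->2 matrix.
what_rows = {
    "bottle": {"bottle": 0.871359, "pitcher": 0.205675, "other": 0.048481, "PushAir": 0.030447},
    "container": {"container": 0.515055, "pitcher": 0.350238, "bottle": 0.267829},
}

rng = random.Random(0)
# p_expansion for (purchase_k=2, prev_ecos_count=1) from the WHEN table.
added = simulate_step({"bottle"}, 0.248803, what_rows, rng)
print(added)
```

Keeping the Bernoulli draw and the destination draw in separate steps mirrors the separation of the two saved components.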


In [27]:
# states with sufficient support (n_events >= MIN_EVENTS); no other filter is applied
expandable_states = state_support[
    state_support["is_supported"]
]

expandable_states.head(10)


Unnamed: 0,purchase_k,prev_ecos_count,n_events,is_supported
0,2,1,44493,True
1,2,2,7849,True
2,2,3,768,True
5,3,1,12048,True
6,3,2,2449,True
7,3,3,229,True
10,4,1,4266,True
11,4,2,977,True
12,4,3,106,True
14,5,1,1804,True


In [28]:
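# df is the ecosystem-add events table (rows start at purchase_k = 2);
# no row filter is applied here
df_purch = df.copy()

df_purch["date"] = pd.to_datetime(df_purch["date"])

# earliest event date per customer
# NOTE: this is not the true first purchase date, which is derived from
# cust_day_group in the cells below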
first_purchase = (
    df_purch
    .groupby("anon", as_index=False)["date"]
    .min()
    .rename(columns={"date": "first_purchase_date"})
)

first_purchase.head()


Unnamed: 0,anon,first_purchase_date
0,ANON_0000009,2025-11-19
1,ANON_0000011,2024-02-19
2,ANON_0000012,2024-09-26
3,ANON_0000019,2023-06-20
4,ANON_0000027,2023-04-06


In [55]:
cust_day_group = pd.read_parquet("../data/interim/cust_day_group.parquet")

In [57]:
df_p = cust_day_group[cust_day_group["is_purchase"]].copy()

df_p["date"] = pd.to_datetime(
    df_p["date"],
    errors="coerce",
    dayfirst=True
).dt.normalize()

orders = (
    df_p.dropna(subset=["date"])
       .drop_duplicates(subset=["anon", "date"])
       .sort_values(["anon", "date"])
       [["anon", "date"]]
       .copy()
)

orders.head(10)



Unnamed: 0,anon,date
0,ANON_0000001,2022-09-11
3,ANON_0000002,2022-09-11
4,ANON_0000003,2022-09-11
5,ANON_0000004,2022-09-11
6,ANON_0000005,2022-09-11
7,ANON_0000006,2022-09-11
8,ANON_0000007,2022-09-11
11,ANON_0000008,2022-09-11
12,ANON_0000009,2022-10-11
13,ANON_0000009,2025-11-19


In [58]:
orders["purchase_k"] = orders.groupby("anon").cumcount() + 1
orders.head(10)


Unnamed: 0,anon,date,purchase_k
0,ANON_0000001,2022-09-11,1
3,ANON_0000002,2022-09-11,1
4,ANON_0000003,2022-09-11,1
5,ANON_0000004,2022-09-11,1
6,ANON_0000005,2022-09-11,1
7,ANON_0000006,2022-09-11,1
8,ANON_0000007,2022-09-11,1
11,ANON_0000008,2022-09-11,1
12,ANON_0000009,2022-10-11,1
13,ANON_0000009,2025-11-19,2


In [59]:
# first order date per customer (explicit column name)
first_order = (
    orders.groupby("anon")["date"].min()
    .rename("date_first")
)

orders = orders.join(first_order, on="anon")

orders["days_since_first"] = (orders["date"] - orders["date_first"]).dt.days

orders[["anon", "date", "date_first", "days_since_first", "purchase_k"]].head(10)


Unnamed: 0,anon,date,date_first,days_since_first,purchase_k
0,ANON_0000001,2022-09-11,2022-09-11,0,1
3,ANON_0000002,2022-09-11,2022-09-11,0,1
4,ANON_0000003,2022-09-11,2022-09-11,0,1
5,ANON_0000004,2022-09-11,2022-09-11,0,1
6,ANON_0000005,2022-09-11,2022-09-11,0,1
7,ANON_0000006,2022-09-11,2022-09-11,0,1
8,ANON_0000007,2022-09-11,2022-09-11,0,1
11,ANON_0000008,2022-09-11,2022-09-11,0,1
12,ANON_0000009,2022-10-11,2022-10-11,0,1
13,ANON_0000009,2025-11-19,2022-10-11,1135,2


In [61]:
# NOTE: 1095 days is the 3-year horizon; the `_2y` suffix in these names is a misnomer
orders_2y = orders[orders["days_since_first"].between(0, 1095)]

max_k_2y = orders_2y.groupby("anon")["purchase_k"].max()

max_k_2y.value_counts().sort_index()


purchase_k
1     281765
2      37719
3       9209
4       2978
5       1214
6        494
7        245
8        124
9         58
10        33
11        20
12        13
13         7
14         7
15         5
16         3
17         2
18         2
20         1
22         1
26         1
Name: count, dtype: int64

In [64]:
vc = max_k_2y.value_counts().sort_index()

cum = vc.cumsum() / vc.sum()

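summary_k = (
    pd.DataFrame({"n_customers": vc, "cum_share": cum})
    .reset_index()
    # NOTE: in pandas 2.x value_counts() keeps the index name, so the column is
    # already called "purchase_k" and this rename is a no-op (see the output header)
    .rename(columns={"index": "max_k_within_3y"})
)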

summary_k.head(15), summary_k.tail(10)


(    purchase_k  n_customers  cum_share
 0            1       281765   0.843858
 1            2        37719   0.956823
 2            3         9209   0.984403
 3            4         2978   0.993321
 4            5         1214   0.996957
 5            6          494   0.998437
 6            7          245   0.999170
 7            8          124   0.999542
 8            9           58   0.999715
 9           10           33   0.999814
 10          11           20   0.999874
 11          12           13   0.999913
 12          13            7   0.999934
 13          14            7   0.999955
 14          15            5   0.999970,
     purchase_k  n_customers  cum_share
 11          12           13   0.999913
 12          13            7   0.999934
 13          14            7   0.999955
 14          15            5   0.999970
 15          16            3   0.999979
 16          17            2   0.999985
 17          18            2   0.999991
 18          20            1   0.999994

## Horizon + truncation rule (benchmark)

We use a 3-year horizon (1095 days from first order).

For modelling, we truncate purchase index at K_max = 5:
- purchase_k_bucket = 1..5
- any purchase_k >= 6 is pooled into the tail bucket "6+"


In [66]:
K_MAX = 5
HORIZON_DAYS = 1095  # 3 years

orders_h = orders[orders["days_since_first"].between(0, HORIZON_DAYS)].copy()

orders_h["k_bucket"] = orders_h["purchase_k"].clip(upper=K_MAX + 1)
orders_h["k_bucket"] = orders_h["k_bucket"].astype(int)

# optional: label tail nicely
orders_h["k_bucket_label"] = orders_h["k_bucket"].astype(str)
orders_h.loc[orders_h["k_bucket"] == K_MAX + 1, "k_bucket_label"] = f"{K_MAX+1}+"

orders_h[["anon", "date", "purchase_k", "k_bucket_label"]].head(10)


Unnamed: 0,anon,date,purchase_k,k_bucket_label
0,ANON_0000001,2022-09-11,1,1
3,ANON_0000002,2022-09-11,1,1
4,ANON_0000003,2022-09-11,1,1
5,ANON_0000004,2022-09-11,1,1
6,ANON_0000005,2022-09-11,1,1
7,ANON_0000006,2022-09-11,1,1
8,ANON_0000007,2022-09-11,1,1
11,ANON_0000008,2022-09-11,1,1
12,ANON_0000009,2022-10-11,1,1
15,ANON_0000010,2022-10-11,1,1


In [67]:
orders_h["k_bucket_label"].value_counts().sort_index()


k_bucket_label
1     333901
2      52136
3      14417
4       5208
5       2230
6+      2269
Name: count, dtype: int64

In [68]:
import json
from pathlib import Path

OUT_DIR = Path("../data/interim/ecosystem_ltv_components")
OUT_DIR.mkdir(parents=True, exist_ok=True)

rule = {
    "horizon_days": HORIZON_DAYS,
    "horizon_years": 3,
    "k_max": K_MAX,
    "tail_bucket_label": f"{K_MAX+1}+",
    "definition": "purchase_k is order index per customer; k_bucket pools purchase_k >= k_max+1"
}

with open(OUT_DIR / "horizon_3y_k_rule.json", "w") as f:
    json.dump(rule, f, indent=2)


In [69]:
import pandas as pd
from pathlib import Path

K_MAX = 5

# load the saved WHEN table (if you already have it in memory, you can skip this load)
WHEN_PATH = Path("../data/interim/ecosystem_ltv_components/expansion_probability_when.parquet")
expansion_when = pd.read_parquet(WHEN_PATH)

# create k_bucket: 1..5, and 6+ pooled
expansion_when["k_bucket"] = expansion_when["purchase_k"].clip(upper=K_MAX + 1).astype(int)

expansion_when["k_bucket_label"] = expansion_when["k_bucket"].astype(str)
expansion_when.loc[expansion_when["k_bucket"] == K_MAX + 1, "k_bucket_label"] = f"{K_MAX+1}+"

expansion_when[["purchase_k", "k_bucket_label", "prev_ecos_count", "n_events", "n_expansions", "p_expansion"]].head(10)


Unnamed: 0,purchase_k,k_bucket_label,prev_ecos_count,n_events,n_expansions,p_expansion
0,2,2,1,44493,11070,0.248803
1,2,2,2,7849,1482,0.188814
2,2,2,3,768,133,0.173177
3,2,2,4,33,3,0.090909
4,2,2,5,4,0,0.0
5,3,3,1,12048,3003,0.249253
6,3,3,2,2449,524,0.213965
7,3,3,3,229,36,0.157205
8,3,3,4,8,1,0.125
9,3,3,5,1,0,0.0


In [70]:
expansion_when_bucketed = (
    expansion_when
    .groupby(["k_bucket_label", "prev_ecos_count"], as_index=False)
    .agg(
        n_events=("n_events", "sum"),
        n_expansions=("n_expansions", "sum")
    )
)

expansion_when_bucketed["p_expansion"] = (
    expansion_when_bucketed["n_expansions"] / expansion_when_bucketed["n_events"]
)

expansion_when_bucketed.sort_values(["k_bucket_label", "prev_ecos_count"]).head(20)


Unnamed: 0,k_bucket_label,prev_ecos_count,n_events,n_expansions,p_expansion
0,2,1,44493,11070,0.248803
1,2,2,7849,1482,0.188814
2,2,3,768,133,0.173177
3,2,4,33,3,0.090909
4,2,5,4,0,0.0
5,3,1,12048,3003,0.249253
6,3,2,2449,524,0.213965
7,3,3,229,36,0.157205
8,3,4,8,1,0.125
9,3,5,1,0,0.0


In [71]:
expansion_when_bucketed["k_bucket_label"].value_counts().sort_index()


k_bucket_label
2     5
3     5
4     4
5     5
6+    5
Name: count, dtype: int64

In [72]:
OUT_DIR = Path("../data/interim/ecosystem_ltv_components")
OUT_DIR.mkdir(parents=True, exist_ok=True)

expansion_when_bucketed.to_parquet(
    OUT_DIR / "expansion_probability_when_kbucket_3y_k5.parquet",
    index=False
)


In [73]:
import re
from pathlib import Path

import pandas as pd

TM_DIR = Path("../data/interim/ecosystem_transitions")

# transition matrix files are named ecosystem_transition_<k_from>_to_<k_to>.parquet
files = sorted(TM_DIR.glob("ecosystem_transition_*_to_*.parquet"))

pat = re.compile(r"ecosystem_transition_(\d+)_to_(\d+)\.parquet$")

# parse (k_from, k_to) out of each file name
pairs = []
for fp in files:
    m = pat.search(fp.name)
    if m:
        pairs.append((int(m.group(1)), int(m.group(2)), fp.name))

pairs_df = pd.DataFrame(pairs, columns=["k_from", "k_to", "file"]).sort_values(["k_from", "k_to"])
pairs_df


Unnamed: 0,k_from,k_to,file
0,1,2,ecosystem_transition_1_to_2.parquet
1,2,3,ecosystem_transition_2_to_3.parquet
2,3,4,ecosystem_transition_3_to_4.parquet
3,4,5,ecosystem_transition_4_to_5.parquet


## WHAT component: k-bucket alignment

Observed ecosystem transition matrices exist for:
1→2, 2→3, 3→4, 4→5

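We align these to the 3-year, K_max=5 framework as follows:
- For k = 1..4: use the corresponding observed transition matrix
- For k >= 5 (including the tail bucket "6+"): no 5→6 matrix is observed, so the tail policy saved in the metadata below is "stop" and no WHAT transition is applied

Reusing the 4→5 matrix for the tail would be an alternative, but it assumes that
ecosystem destination choice stabilises after the 5th purchase.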
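
Two tail treatments appear in this notebook for purchase indices beyond the last observed matrix: reusing the 4→5 matrix, and the "stop" policy recorded in the saved metadata. A small lookup that makes the choice explicit might look like the sketch below; `what_key_for` and the policy name `"reuse_last"` are hypothetical, and only `"stop"` matches what the metadata stores.

```python
def what_key_for(k, available_ks, policy="stop"):
    """Return which observed k -> k+1 matrix to use at purchase index k, or None."""
    if k in available_ks:
        return k
    last = max(available_ks)
    if k > last and policy == "reuse_last":
        # ASSUMPTION: destination choice no longer changes beyond the last observed k.
        return last
    # "stop" (or an index below the observed range): no WHAT transition is applied.
    return None

available = [1, 2, 3, 4]  # observed matrices: 1->2, 2->3, 3->4, 4->5

for k in (3, 5, 7):
    print(k, what_key_for(k, available, "stop"), what_key_for(k, available, "reuse_last"))
```

Under `"stop"` a simulated customer cannot add ecosystems from the 5th purchase onward, so any WHEN probability for those cells goes unused.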


In [74]:
import pandas as pd
from pathlib import Path

TM_DIR = Path("../data/interim/ecosystem_transitions")

WHAT = {
    1: pd.read_parquet(TM_DIR / "ecosystem_transition_1_to_2.parquet"),
    2: pd.read_parquet(TM_DIR / "ecosystem_transition_2_to_3.parquet"),
    3: pd.read_parquet(TM_DIR / "ecosystem_transition_3_to_4.parquet"),
    4: pd.read_parquet(TM_DIR / "ecosystem_transition_4_to_5.parquet"),
}


In [78]:
canon_index = WHAT[1].index
canon_cols  = WHAT[1].columns

WHAT_aligned = {
    k: WHAT[k].reindex(index=canon_index, columns=canon_cols)
    for k in WHAT
}

all(WHAT_aligned[k].index.equals(canon_index) and WHAT_aligned[k].columns.equals(canon_cols) for k in WHAT_aligned)


True
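
Note that the equality check above runs after `reindex`, which forces every matrix onto the canonical axes, so it passes by construction; labels missing from a matrix would show up as NaN rows or columns instead. A hedged sketch of a check that compares label sets before reindexing follows. `alignment_report` is a hypothetical helper, and the toy matrices use made-up values purely to exercise it.

```python
import pandas as pd

def alignment_report(matrices, canon_index, canon_cols):
    """Per matrix: labels missing from, or extra to, the canonical axes."""
    report = {}
    for k, mat in matrices.items():
        issues = {
            "missing_rows": sorted(set(canon_index) - set(mat.index)),
            "extra_rows": sorted(set(mat.index) - set(canon_index)),
            "missing_cols": sorted(set(canon_cols) - set(mat.columns)),
            "extra_cols": sorted(set(mat.columns) - set(canon_cols)),
        }
        if any(issues.values()):
            report[k] = issues
    return report

# Toy matrices (hypothetical values): `b` lacks "bottle" and has an extra "sink".
a = pd.DataFrame([[0.9, 0.2], [0.3, 0.8]],
                 index=["bottle", "pitcher"], columns=["bottle", "pitcher"])
b = pd.DataFrame([[0.7, 0.1], [0.2, 0.6]],
                 index=["pitcher", "sink"], columns=["pitcher", "sink"])

report = alignment_report({1: a, 2: b}, a.index, a.columns)

# The post-reindex equality check still passes, but NaNs reveal the mismatch.
aligned_b = b.reindex(index=a.index, columns=a.columns)
n_nan = int(aligned_b.isna().sum().sum())
print(report)
print("axes equal after reindex:", aligned_b.index.equals(a.index), "| NaN cells:", n_nan)
```

Applied to the real matrices, an empty report together with an unchanged NaN count after reindexing would support the claim of a shared label set.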

In [79]:
from pathlib import Path

OUT_DIR = Path("../data/interim/ecosystem_ltv_components/what_matrices_aligned_k5")
OUT_DIR.mkdir(parents=True, exist_ok=True)

for k, mat in WHAT_aligned.items():
    mat.to_parquet(OUT_DIR / f"what_transition_{k}_to_{k+1}.parquet")


In [80]:
import json
from pathlib import Path

OUT_DIR = Path("../data/interim/ecosystem_ltv_components/what_matrices_aligned_k5")
OUT_DIR.mkdir(parents=True, exist_ok=True)

meta = {
    "horizon_years": 3,
    "horizon_days": 1095,
    "k_max": 5,
    "what_matrices_available_for_k": [1, 2, 3, 4],
    "what_definition": "conditional co-engagement matrices (not stochastic kernels); rows need not sum to 1",
    "tail_handling": {
        "k_bucket": "6+",
        "policy": "stop",
        "details": "No WHAT transitions applied for k >= 5 due to missing 5→6 matrix; ecosystem expansion effectively ends after purchase 5 in WHAT component."
    },
    "ecosystem_labels": list(canon_cols)
}

with open(OUT_DIR / "what_metadata_3y_k5.json", "w") as f:
    json.dump(meta, f, indent=2)


In [81]:
meta


{'horizon_years': 3,
 'horizon_days': 1095,
 'k_max': 5,
 'what_matrices_available_for_k': [1, 2, 3, 4],
 'what_definition': 'conditional co-engagement matrices (not stochastic kernels); rows need not sum to 1',
 'tail_handling': {'k_bucket': '6+',
  'policy': 'stop',
  'details': 'No WHAT transitions applied for k >= 5 due to missing 5→6 matrix; ecosystem expansion effectively ends after purchase 5 in WHAT component.'},
 'ecosystem_labels': ['CO2',
  'Proskin',
  'PushAir',
  'bottle',
  'container',
  'flow_comfort',
  'keton',
  'other',
  'pitcher',
  'sink']}

## Checkpoint: Ecosystem LTV foundations (complete)

Completed in this notebook:
1) Horizon + truncation rule
- Horizon: 3 years (1095 days from first order)
- Order-level purchase index (purchase_k) built from unique (anon, date)
- Truncation: K_max = 5 with tail bucket "6+"
- Saved rule: data/interim/ecosystem_ltv_components/horizon_3y_k_rule.json

2) WHEN component (timing of expansion)
- Built P(expansion | purchase_k, prev_ecos_count)
- Added support flag (MIN_EVENTS = 50)
- Bucketed by k (2..5, 6+ pooled; the events table starts at purchase_k = 2), pooled counts then recomputed p
- Saved: data/interim/ecosystem_ltv_components/expansion_probability_when_kbucket_3y_k5.parquet

3) WHAT component (destination choice given expansion)
- Loaded WHAT matrices for k=1..4 (1→2, 2→3, 3→4, 4→5)
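- Reindexed all matrices to the 10×10 label set of the 1→2 matrix (the equality check runs after reindexing, so it does not by itself detect missing labels)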
- Enforced common ordering + saved aligned matrices:
  data/interim/ecosystem_ltv_components/what_matrices_aligned_k5/what_transition_{k}_to_{k+1}.parquet
- Tail policy: stop WHAT after k=5 (no 5→6 matrix)
- Saved metadata:
  data/interim/ecosystem_ltv_components/what_matrices_aligned_k5/what_metadata_3y_k5.json

Not done yet (next notebooks):
- Product-group foundations (entry products, retention, replenishment)
- Joint simulator wiring WHEN × WHAT (no revenue yet)
- Any revenue / margin / pricing overlay
