In [2]:
import pandas as pd

# Product-group foundations
entry_pg = pd.read_parquet("../data/interim/entry_counts.parquet")

T_pg = {
    1: pd.read_parquet("../data/interim/transition_1_to_2.parquet"),
    2: pd.read_parquet("../data/interim/transition_2_to_3.parquet"),
    3: pd.read_parquet("../data/interim/transition_3_to_4.parquet"),
    4: pd.read_parquet("../data/interim/transition_4_to_5.parquet"),
}

# Ecosystem foundations (already locked earlier)
T_ec = {
    1: pd.read_parquet("../data/interim/ecosystem_transitions/ecosystem_transition_1_to_2.parquet"),
    2: pd.read_parquet("../data/interim/ecosystem_transitions/ecosystem_transition_2_to_3.parquet"),
    3: pd.read_parquet("../data/interim/ecosystem_transitions/ecosystem_transition_3_to_4.parquet"),
    4: pd.read_parquet("../data/interim/ecosystem_transitions/ecosystem_transition_4_to_5.parquet"),
}


In [7]:
order_sets = pd.read_parquet(
    "../data/interim/order_pg_ecosystem_sets_RAW.parquet"
)

timeline = pd.read_parquet("../data/interim/order_timeline_3y_k5.parquet")

# Attach per-customer purchase counts; the inner join drops orders missing from the timeline
order_sets = (
    order_sets
    .merge(
        timeline[["anon", "purchase_k", "n_purchases_in_horizon"]],
        on=["anon", "purchase_k"],
        how="inner"
    )
)

# Keep repeat customers (>= 2 purchases in the horizon) and only purchase steps 1-5
order_sets = order_sets[
    (order_sets["n_purchases_in_horizon"] >= 2) &
    (order_sets["purchase_k"] <= 5)
]

order_sets.to_parquet(
    "../data/interim/order_pg_ecosystem_sets_3y_k5.parquet",
    index=False
)



In [8]:
import pandas as pd

order_sets = pd.read_parquet(
    "../data/interim/order_pg_ecosystem_sets_3y_k5.parquet"
)

long = (
    order_sets
    .explode("product_groups")
    .explode("ecosystems")  # cross product: rows repeat only if an order has >1 ecosystem
    .rename(columns={
        "product_groups": "product_group",
        "ecosystems": "ecosystem"
    })
)
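Because the two `explode` calls run one after the other, an order that lists several ecosystems contributes each of its product groups once per ecosystem (a cross product). When `ecosystems` has length 1, which the comment says is the usual case, nothing is duplicated. The toy sketch below illustrates this. The customer IDs and the `Bottle`/`Pitcher`/`PG*` labels are hypothetical values, not real data.

```python
import pandas as pd

# Hypothetical toy orders: order "a" has one ecosystem, order "b" has two.
toy = pd.DataFrame({
    "anon": ["a", "b"],
    "purchase_k": [1, 1],
    "product_groups": [["PG1", "PG2"], ["PG1", "PG3"]],
    "ecosystems": [["Bottle"], ["Bottle", "Pitcher"]],
})

toy_long = (
    toy
    .explode("product_groups")
    .explode("ecosystems")
    .rename(columns={"product_groups": "product_group", "ecosystems": "ecosystem"})
)

# Order "a": 2 product groups x 1 ecosystem = 2 rows.
# Order "b": 2 product groups x 2 ecosystems = 4 rows.
print(toy_long.groupby("anon").size().to_dict())
```

If multi-ecosystem orders are common in the real data, they carry more weight in the counts below. Whether that is acceptable depends on how mixed baskets should be attributed.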


In [9]:
counts = (
    long
    .groupby(["ecosystem", "purchase_k", "product_group"])
    .size()
    .rename("n")
    .reset_index()
)

# Share of product-group rows within each (ecosystem, purchase_k) cell; sums to 1 per cell
counts["p"] = (
    counts["n"]
    / counts.groupby(["ecosystem", "purchase_k"])["n"].transform("sum")
)
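Each count is divided by the total for its `(ecosystem, purchase_k)` cell, so `p` should sum to 1 within every cell. The sketch below runs a minimal check on hypothetical toy counts. The same `groupby(...)["p"].sum()` check can be applied directly to `counts`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy counts in the same long format as `counts`.
toy_counts = pd.DataFrame({
    "ecosystem": ["CO2", "CO2", "CO2", "Bottle", "Bottle"],
    "purchase_k": [1, 1, 1, 1, 1],
    "product_group": ["PG_A", "PG_B", "PG_C", "PG_A", "PG_D"],
    "n": [6, 3, 1, 2, 2],
})

# Same normalisation as the cell above: divide by the (ecosystem, k) total.
toy_counts["p"] = (
    toy_counts["n"]
    / toy_counts.groupby(["ecosystem", "purchase_k"])["n"].transform("sum")
)

# Every (ecosystem, purchase_k) cell should sum to 1.
cell_totals = toy_counts.groupby(["ecosystem", "purchase_k"])["p"].sum()
assert np.allclose(cell_totals.to_numpy(), 1.0)
print(toy_counts["p"].round(2).tolist())  # [0.6, 0.3, 0.1, 0.5, 0.5]
```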


In [10]:
P_pg_given_ec_k = counts.sort_values(
    ["ecosystem", "purchase_k", "p"],
    ascending=[True, True, False]
)

P_pg_given_ec_k.head()


|    | ecosystem | purchase_k | product_group | n | p |
|---:|---|---:|---|---:|---:|
| 20 | CO2 | 1 | 32_nabój CO2 - wymiana | 720 | 0.540947 |
| 19 | CO2 | 1 | 31_nabój CO2 - zakup | 204 | 0.153268 |
| 22 | CO2 | 1 | 34_suplementy PushAir | 81 | 0.060856 |
| 18 | CO2 | 1 | 30_saturator PushAir | 79 | 0.059354 |
| 21 | CO2 | 1 | 33_butelki do saturatora PushAir | 78 | 0.058603 |


In [11]:
P_pg_given_ec_k.to_parquet(
    "../data/interim/P_product_group_given_ecosystem_k.parquet",
    index=False
)


## Relationship between Ecosystems and Product Groups

In this project, **ecosystems and product groups are modeled as separate but connected layers**.

### 1. Ecosystem dynamics (state evolution)
Ecosystems evolve **over purchase steps** according to a Markov process:

$$P(E_{k+1} \mid E_k)$$

These transitions describe **how the customer’s ecosystem changes from one purchase to the next**  
(e.g. Bottle → Pitcher → Mixed).

This is the **horizontal dynamic** of the LTV engine.
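The loaded `T_ec` dictionary holds one transition object per step (1→2 through 4→5). Suppose each object is stored as a row-stochastic matrix, with rows for the current ecosystem $E_k$ and columns for the next ecosystem $E_{k+1}$. A distribution $\pi_k$ over ecosystems then moves forward one step by a vector-matrix product, $\pi_{k+1} = \pi_k T_k$. The sketch below assumes that wide layout and uses made-up probabilities. The actual layout of the parquet files is not shown in this notebook and may differ. The `propagate` helper is hypothetical.

```python
import numpy as np
import pandas as pd

states = ["Bottle", "Pitcher", "Mixed"]

# Hypothetical step-specific transition matrices P(E_{k+1} | E_k).
# Rows: current ecosystem; columns: next ecosystem; each row sums to 1.
T = {
    1: pd.DataFrame([[0.7, 0.2, 0.1],
                     [0.1, 0.8, 0.1],
                     [0.2, 0.2, 0.6]], index=states, columns=states),
    2: pd.DataFrame([[0.8, 0.1, 0.1],
                     [0.1, 0.7, 0.2],
                     [0.1, 0.1, 0.8]], index=states, columns=states),
}

def propagate(pi_k: pd.Series, T_k: pd.DataFrame) -> pd.Series:
    """One step forward: pi_{k+1}(j) = sum_i pi_k(i) * P(E_{k+1}=j | E_k=i)."""
    T_k = T_k.loc[pi_k.index]  # align matrix rows with the distribution's order
    return pd.Series(pi_k.to_numpy() @ T_k.to_numpy(), index=T_k.columns)

pi = pd.Series([1.0, 0.0, 0.0], index=states)  # everyone starts in Bottle at k=1
for k in sorted(T):
    pi = propagate(pi, T[k])
    print(f"k={k + 1}", pi.round(3).to_dict())
```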

---

### 2. Product-group dynamics (within-product persistence)
Separately, product groups have their own step-based transitions:

$$P(G_{k+1} \mid G_k)$$

These capture **repeat / substitution behaviour of product groups across purchases**,  
independent of ecosystem context.

This is another **horizontal dynamic**, but in product-group space.
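This notebook does not show how the `T_pg` files were built. A basket can hold several product groups, so one common estimator is as follows. Pair every group in purchase $k$ with every group in purchase $k+1$ of the same customer, count the pairs, and normalise each row. The sketch below follows that assumption on hypothetical toy data and ignores the ecosystem column, as this layer does. The real pipeline may pair or weight groups differently.

```python
import pandas as pd

# Hypothetical long table: one row per (customer, purchase step, product group).
toy = pd.DataFrame({
    "anon":          ["a", "a", "a", "b", "b", "b"],
    "purchase_k":    [1, 1, 2, 1, 2, 2],
    "product_group": ["PG1", "PG2", "PG1", "PG1", "PG1", "PG3"],
})

k = 1
cur = toy[toy["purchase_k"] == k]
nxt = toy[toy["purchase_k"] == k + 1]

# Pair each group at step k with each group at step k+1 for the same customer.
pairs = cur.merge(nxt, on="anon", suffixes=("_k", "_next"))

# Row-normalised crosstab: P(G_{k+1} = column | G_k = row).
T_pg_k = pd.crosstab(
    pairs["product_group_k"], pairs["product_group_next"], normalize="index"
)
print(T_pg_k)
```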

---

### 3. Product-group composition within an ecosystem (static, per step)
At each purchase step k, conditional on the current ecosystem, we observe:

$$P(G_k \mid E_k)$$

This is **not a transition**.

It describes:
- which product groups typically appear
- **given** the ecosystem
- **at the same purchase step**

This is a **vertical relationship** between ecosystem and basket composition.
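The cells above estimate this vertical link by relative frequency. Let $n(e, k, g)$ be the number of exploded rows (the `n` column) with ecosystem $e$, step $k$ and product group $g$. Then:

$$
\hat{P}(G_k = g \mid E_k = e) = \frac{n(e, k, g)}{\sum_{g'} n(e, k, g')}
$$

This sums to 1 over $g$ within each $(e, k)$ cell. Because a basket can hold several product groups, the estimate is a share of product-group occurrences. It is not the probability that an order at step $k$ contains $g$. For example, the five CO2 rows at $k = 1$ in the `head()` output imply a cell total of about 1,331 rows, so $p = 720 / 1331 \approx 0.541$ for `32_nabój CO2 - wymiana`.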

---

### 4. Full conceptual structure

```
E_1  →  E_2  →  E_3  →  ...
 |       |       |
G_1     G_2     G_3
```

- Horizontal arrows: ecosystem transitions $P(E_{k+1} \mid E_k)$
- Vertical links: product-group composition $P(G_k \mid E_k)$
- Product-group transitions $P(G_{k+1} \mid G_k)$ are not drawn here; they remain a separate object and are not altered by the vertical links
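Reading the diagram from top to bottom, the two layers can be combined. First push the ecosystem distribution forward with the horizontal transitions. Then apply the vertical link to get the expected product-group mix at that step, $P(G_k = g) = \sum_e P(G_k = g \mid E_k = e)\,\pi_k(e)$. This is one possible use of the bridge, sketched under assumptions. The toy probabilities and the `pg_mix_at_step` helper are hypothetical, and the sketch makes no claim about how the LTV engine itself consumes these objects.

```python
import pandas as pd

# Hypothetical slice of a P(G_k | E_k) table in the same long format as
# P_pg_given_ec_k: columns ecosystem, purchase_k, product_group, p.
P_g_given_e = pd.DataFrame({
    "ecosystem":     ["Bottle", "Bottle", "Pitcher", "Pitcher"],
    "purchase_k":    [2, 2, 2, 2],
    "product_group": ["PG_A", "PG_B", "PG_A", "PG_C"],
    "p":             [0.75, 0.25, 0.40, 0.60],
})

# Hypothetical ecosystem distribution at k=2 (e.g. from propagating P(E_{k+1} | E_k)).
pi_k = pd.Series({"Bottle": 0.6, "Pitcher": 0.4}, name="pi")

def pg_mix_at_step(P: pd.DataFrame, pi: pd.Series, k: int) -> pd.Series:
    """P(G_k = g) = sum over e of P(G_k = g | E_k = e) * pi_k(e)."""
    step = P[P["purchase_k"] == k].merge(
        pi.rename("pi"), left_on="ecosystem", right_index=True, how="inner"
    )
    # Weight each conditional share by its ecosystem's probability, then sum per group.
    return step.assign(w=step["p"] * step["pi"]).groupby("product_group")["w"].sum()

mix = pg_mix_at_step(P_g_given_e, pi_k, k=2)
print(mix.round(3).to_dict())
```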

---

### 5. Key clarification
- Ecosystem transitions and product-group transitions are **not merged**
- $P(G_k \mid E_k)$, saved as `P_product_group_given_ecosystem_k.parquet`, is a **bridge** between the two layers, not a replacement for either
- All three probability objects coexist and serve different roles in the LTV engine
