
# Repurchase & Next-Purchase Intelligence

This notebook builds on your purchase data to deliver:
1. **Repurchase metrics** (repeat buyers, repurchase rate)
2. **Inter-purchase intervals** (overall and last-interval per user)
3. **Churn flag** per user (no purchase within *X* days of the dataset's latest timestamp)
4. **Next-purchase predictions** at two levels:
   - **ServiceType** (fine-grained)
   - **Category** (Walk, Sitting, or Other), derived from `serviceType`
5. **Time-aware transitions** (recency-weighted, plus a month-of-year seasonal table for diagnostics)
6. **Personalized ranking** that blends a user's own history with global patterns via shrinkage

> **Parameters you can tweak** live in the next cell (e.g., churn threshold, half-life for recency weighting, top-K predictions).


In [10]:
# --- Configuration ---
from pathlib import Path

# Where your CSVs live
DATA_DIR = Path('/Users/tree/Projects/tubitak-ai-agent/tubitakaiagentprojeleriiinverisetleri')  # change if needed, e.g., Path('/mnt/data')

# File & field assumptions
PURCHASE_FILE = 'Purchase.csv'           # has columns: ordercreatedtime, ownerid/user_id/id, serviceType
TIME_COL = 'ordercreatedtime'            # purchase timestamp
USER_ID_CANDIDATES = ['ownerid','user_id','id']
SERVICETYPE_CANDS = ['serviceType','servicetype','service_type']

# Category taxonomy
WALK_TYPES    = {'AdHoc','Planned','Package','Customize','Walking'}
SITTING_TYPES = {'Boarding','Sitting','CatBoarding','CatSitting'}

# Modeling knobs
CHURN_THRESHOLD_DAYS = 90         # flag churn risk if no repeat within this many days
HALFLIFE_DAYS = 90                # recency half-life for weighting transitions (larger -> slower decay)
PERSONALIZATION_M = 5             # shrinkage strength toward global distribution (higher = more global)
TOPK = 3                          # top-K predicted serviceTypes to output

# Seasonal model toggle (month-of-year conditioning). Keep True for diagnostics; predictions use recency-weighted + personalization.
USE_SEASONAL = True


In [None]:
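%%bash
# Run from the repository root so the relative paths below resolve
cd "$(git rev-parse --show-toplevel)"

# Add the .gitignore first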
git add .gitignore

# Add the notebooks
git add notebooks/funnel_analysis.ipynb
git add notebooks/repurchase_next_purchase.ipynb

# Commit the changes
git commit -m "add funnel analysis and repurchase notebooks"

# Push to remote
git push origin main

In [11]:
# --- Helpers ---
import pandas as pd
import numpy as np

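def coalesce_id(df, candidates):
    # Return the first candidate column present in df; otherwise an all-NaN Series
    for c in candidates:
        if c in df.columns:
            return df[c]
    # Reuse df's index so the result aligns when assigned back to df
    return pd.Series(np.nan, index=df.index)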

def parse_time_utc(s):
    # Returns tz-aware UTC timestamps; unparseable values become NaT.
    # Strings carrying an offset are converted to UTC; naive strings (as in ordercreatedtime)
    # are read AS UTC. If the export stores Europe/Istanbul wall-clock time instead, use:
    #   pd.to_datetime(s, errors='coerce').dt.tz_localize('Europe/Istanbul').dt.tz_convert('UTC')
    return pd.to_datetime(s, errors='coerce', utc=True)

def normalize_service(series: pd.Series) -> pd.Series:
    out = series.astype(str).str.strip()
    # treat placeholder strings as missing
    out = out.replace({'nan': np.nan, 'None': np.nan, 'NaT': np.nan, '': np.nan})
    return out

def service_category(st: str) -> str:
    if pd.isna(st):
        return np.nan
    s = str(st)
    if s in WALK_TYPES:
        return 'Walk'
    if s in SITTING_TYPES:
        return 'Sitting'
    return 'Other'

def season_month(ts):
    return ts.dt.tz_convert('Europe/Istanbul').dt.month

def days_between(t1, t0):
    return (t1 - t0).dt.total_seconds() / (24*3600)

def recency_weight(age_days, halflife_days):
    # weight = 0.5^(age/halflife)
    return np.power(0.5, np.clip(age_days, 0, None) / max(halflife_days, 1e-9))

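def excel_safe(df, to_tz='Europe/Istanbul'):
    # Excel cannot store tz-aware datetimes: convert each tz-aware column to naive
    # wall-clock time in `to_tz` (by default, *_utc columns then hold Istanbul time)
    out = df.copy()
    for col in out.columns:
        if isinstance(out[col].dtype, pd.DatetimeTZDtype):
            out[col] = out[col].dt.tz_convert(to_tz).dt.tz_localize(None)
    return out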

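As a sanity check on `recency_weight`: with the default `HALFLIFE_DAYS = 90`, a transition that is 90 days old counts half as much as one from the latest day, and one that is 180 days old counts a quarter. The snippet re-declares the helper so it runs on its own; the sample ages are illustrative, not taken from the data.

```python
import numpy as np

def recency_weight(age_days, halflife_days):
    # weight = 0.5 ** (age / halflife); negative ages are clipped to 0, giving weight 1.0
    return np.power(0.5, np.clip(age_days, 0, None) / max(halflife_days, 1e-9))

sample_ages = np.array([-3.0, 0.0, 45.0, 90.0, 180.0])   # illustrative ages in days
sample_weights = recency_weight(sample_ages, halflife_days=90)
for a, w in zip(sample_ages, sample_weights):
    print(f"age {a:6.1f} d -> weight {w:.4f}")
```

Doubling the half-life to 180 days would lift the 180-day weight from 0.25 to 0.5, which is what "slower decay" in the config comment means.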

In [12]:
# --- Load purchases ---
from pathlib import Path
DATA_DIR = Path(DATA_DIR)
p_path = DATA_DIR / PURCHASE_FILE
p = pd.read_csv(p_path)

# Coalesce ids and timestamps
p['user_id'] = coalesce_id(p, USER_ID_CANDIDATES).astype(str).replace({'nan': np.nan})
if TIME_COL not in p.columns:
    raise KeyError(f"Time column '{TIME_COL}' not found in {PURCHASE_FILE}")
p['purchase_time'] = parse_time_utc(p[TIME_COL])

# serviceType
st_col = None
for c in SERVICETYPE_CANDS:
    if c in p.columns:
        st_col = c
        break
p['serviceType_raw'] = p[st_col] if st_col else np.nan
p['serviceType'] = normalize_service(p['serviceType_raw'])
p['category'] = p['serviceType'].apply(service_category)

# Drop invalid
p = p.dropna(subset=['user_id','purchase_time']).copy()
p['user_id'] = p['user_id'].astype(str)

# Sort chronologically per user
p = p.sort_values(['user_id','purchase_time']).reset_index(drop=True)

max_time = p['purchase_time'].max()
min_time = p['purchase_time'].min()
print('Loaded purchases:', len(p), '| Users:', p['user_id'].nunique())
print('Time range UTC:', min_time, '→', max_time)
p.head()

Loaded purchases: 28701 | Users: 1614
Time range UTC: 2024-12-24 21:41:02.022000+00:00 → 2025-08-04 09:34:52.801000+00:00


Unnamed: 0,serviceid,ownerid,ordercreatedtime,servicetype,user_id,purchase_time,serviceType_raw,serviceType,category
0,d2ca43d6-737f-4bd6-9482-3149127453da,005fc863-7c9c-42ce-af56-38fed3c545f3,2025-05-14 18:20:44.706000,AdHoc,005fc863-7c9c-42ce-af56-38fed3c545f3,2025-05-14 18:20:44.706000+00:00,AdHoc,AdHoc,Walk
1,3a2b4eeb-7366-4ab8-b64f-da683dc22a6e,005fc863-7c9c-42ce-af56-38fed3c545f3,2025-05-15 18:58:14.754000,AdHoc,005fc863-7c9c-42ce-af56-38fed3c545f3,2025-05-15 18:58:14.754000+00:00,AdHoc,AdHoc,Walk
2,2a88a44d-7e9b-4c58-9eec-44e7c50d3d0b,00bc3aed-8e44-4a3f-8a87-c24e9c72106f,2025-05-23 18:28:44.253000,AdHoc,00bc3aed-8e44-4a3f-8a87-c24e9c72106f,2025-05-23 18:28:44.253000+00:00,AdHoc,AdHoc,Walk
3,77dfb189-b068-4034-a94d-9f62db799930,00bc3aed-8e44-4a3f-8a87-c24e9c72106f,2025-06-11 17:34:34.581000,AdHoc,00bc3aed-8e44-4a3f-8a87-c24e9c72106f,2025-06-11 17:34:34.581000+00:00,AdHoc,AdHoc,Walk
4,0e3f614f-dde4-49e9-b297-f295033dc2c4,00bc3aed-8e44-4a3f-8a87-c24e9c72106f,2025-06-24 17:06:10.411000,AdHoc,00bc3aed-8e44-4a3f-8a87-c24e9c72106f,2025-06-24 17:06:10.411000+00:00,AdHoc,AdHoc,Walk

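`parse_time_utc` reads naive `ordercreatedtime` strings as UTC, which is why every `purchase_time` above ends in `+00:00`. If the source system actually exports Europe/Istanbul wall-clock time, every timestamp (and every month boundary used by the seasonal table) is off by three hours. The notebook does not say which convention the export uses, so here is a minimal comparison of the two readings on one sample value copied from the output above:

```python
import pandas as pd

raw = pd.Series(['2025-05-14 18:20:44.706000'])   # naive string, no offset

# Reading 1: naive strings are UTC (what utc=True does)
as_utc = pd.to_datetime(raw, errors='coerce', utc=True)

# Reading 2: naive strings are Istanbul local time (UTC+3), converted to UTC afterwards
as_local = (pd.to_datetime(raw, errors='coerce')
              .dt.tz_localize('Europe/Istanbul', ambiguous='NaT', nonexistent='NaT')
              .dt.tz_convert('UTC'))

print(as_utc.iloc[0])    # 2025-05-14 18:20:44.706000+00:00
print(as_local.iloc[0])  # 2025-05-14 15:20:44.706000+00:00
shift_hours = (as_utc.iloc[0] - as_local.iloc[0]).total_seconds() / 3600
print(shift_hours)       # 3.0
```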

In [13]:
# --- Repurchase metrics & interpurchase intervals ---
user_counts = p.groupby('user_id').size().rename('purchase_count')
buyers = int((user_counts >= 1).sum())
repeat_buyers = int((user_counts >= 2).sum())
repurchase_rate = repeat_buyers / buyers if buyers else np.nan

repurchase_summary = pd.DataFrame([{
    'buyers': buyers,
    'repeat_buyers': repeat_buyers,
    'repurchase_rate': repurchase_rate
}])

# Interpurchase intervals
p['prev_time'] = p.groupby('user_id')['purchase_time'].shift(1)
p['delta_days'] = days_between(p['purchase_time'], p['prev_time'])
intervals = p.dropna(subset=['delta_days']).copy()

def statblock(s):
    s = s.dropna()
    return pd.Series({
        'count': int(s.size),
        'mean_days': float(s.mean()) if s.size else np.nan,
        'median_days': float(s.median()) if s.size else np.nan,
        'p25_days': float(s.quantile(0.25)) if s.size else np.nan,
        'p75_days': float(s.quantile(0.75)) if s.size else np.nan,
        'min_days': float(s.min()) if s.size else np.nan,
        'max_days': float(s.max()) if s.size else np.nan,
    })

overall_interval_stats = statblock(intervals['delta_days']).to_frame().T
overall_interval_stats.insert(0, 'scope', 'all_intervals')

# Last interval per user = delta_days on each user's final purchase
# (single-purchase users have NaN there and drop out, leaving one row per repeat buyer)
last_interval = p.groupby('user_id').tail(1).dropna(subset=['delta_days'])
last_interval = last_interval.rename(columns={'delta_days': 'last_delta_days'})
last_interval_stats = statblock(last_interval['last_delta_days']).to_frame().T
last_interval_stats.insert(0, 'scope', 'last_interval_per_user')

interval_stats = pd.concat([overall_interval_stats, last_interval_stats], ignore_index=True)

repurchase_summary, interval_stats.head()

(   buyers  repeat_buyers  repurchase_rate
 0    1614           1279         0.792441,
                     scope    count  mean_days  median_days  p25_days  \
 0           all_intervals  27087.0   2.022260     0.000000  0.000000   
 1  last_interval_per_user   1279.0   6.682605     1.160648  0.000841   
 
    p75_days  min_days    max_days  
 0  1.023671       0.0  137.020692  
 1  7.860352       0.0   84.976695  )
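Both metrics reduce to a per-user count and a per-user `shift`. A toy frame with two hypothetical users makes the definitions concrete: `a` buys on 1, 3 and 6 January, `b` buys once. In the real data the overall median interval is 0.0 days and the 25th percentile of the last interval is about 0.0008 days (roughly a minute), which suggests many users place several orders on the same day.

```python
import pandas as pd

# Hypothetical purchases: user 'a' buys three times, user 'b' once
toy = pd.DataFrame({
    'user_id': ['a', 'a', 'a', 'b'],
    'purchase_time': pd.to_datetime(
        ['2025-01-01', '2025-01-03', '2025-01-06', '2025-01-02'], utc=True),
}).sort_values(['user_id', 'purchase_time']).reset_index(drop=True)

# Repurchase rate = users with 2+ purchases / users with 1+ purchases
toy_counts = toy.groupby('user_id').size()
n_buyers = int((toy_counts >= 1).sum())
n_repeat = int((toy_counts >= 2).sum())
rate = n_repeat / n_buyers

# Gap to the same user's previous purchase (NaN on each user's first purchase)
prev = toy.groupby('user_id')['purchase_time'].shift(1)
toy['delta_days'] = (toy['purchase_time'] - prev).dt.total_seconds() / 86400

toy_intervals = toy['delta_days'].dropna().tolist()
# Last interval per user = delta on each user's final purchase
toy_last_interval = toy.groupby('user_id').tail(1)['delta_days'].dropna().tolist()
print(n_buyers, n_repeat, rate, toy_intervals, toy_last_interval)  # 2 1 0.5 [2.0, 3.0] [3.0]
```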

In [14]:
# --- Churn flag: no purchase in the last X days ---
# For each user, days since last purchase are measured against max_time (the dataset's latest timestamp), not today
last_by_user = p.groupby('user_id').tail(1).copy()
last_by_user['days_since_last'] = days_between(max_time, last_by_user['purchase_time'])
last_by_user['churn_risk'] = last_by_user['days_since_last'] > CHURN_THRESHOLD_DAYS

churn_summary = last_by_user['churn_risk'].value_counts(dropna=False).rename_axis('churn_risk').to_frame('users').reset_index()
churn_summary

| | churn_risk | users |
|---|---|---|
| 0 | False | 1590 |
| 1 | True | 24 |

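The flag is a pure recency rule: days from each user's last purchase to the dataset's latest timestamp, compared with `CHURN_THRESHOLD_DAYS`. A minimal sketch with three hypothetical users and the default 90-day threshold:

```python
import pandas as pd

threshold_days = 90   # mirrors CHURN_THRESHOLD_DAYS
toy_recent = pd.DataFrame({
    'user_id': ['a', 'b', 'c'],
    'purchase_time': pd.to_datetime(['2025-08-01', '2025-05-01', '2025-03-01'], utc=True),
})
toy_max_time = pd.Timestamp('2025-08-04', tz='UTC')   # latest timestamp in the toy data

toy_recent['days_since_last'] = (toy_max_time - toy_recent['purchase_time']).dt.total_seconds() / 86400
toy_recent['churn_risk'] = toy_recent['days_since_last'] > threshold_days
print(toy_recent[['user_id', 'days_since_last', 'churn_risk']])   # 3, 95, 156 days -> False, True, True
```

Because the reference point is the dataset's last timestamp rather than today's date, running on an old export flags fewer users than a today-based rule would.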

In [15]:
# --- Transition models ---
# Build consecutive purchase transitions per user (serviceType), also season/age
p['next_service'] = p.groupby('user_id')['serviceType'].shift(-1)
p['this_time'] = p['purchase_time']
p['next_time'] = p.groupby('user_id')['purchase_time'].shift(-1)
trans = p.dropna(subset=['serviceType','next_service','this_time','next_time']).copy()

# Global counts/probs
glob_counts = (trans.groupby(['serviceType','next_service'])
               .size().rename('count').reset_index())
glob_totals = glob_counts.groupby('serviceType')['count'].transform('sum')
glob_counts['prob'] = glob_counts['count'] / glob_totals

# Category transitions (category of current purchase -> category of next purchase)
trans['next_category'] = trans['next_service'].apply(service_category)
cat_counts = (trans.groupby(['category','next_category']).size().rename('count').reset_index())
cat_totals = cat_counts.groupby('category')['count'].transform('sum')
cat_counts['prob'] = cat_counts['count'] / cat_totals

# Recency-weighted transitions (time-aware)
# Weight each transition by age relative to dataset max_time
trans['age_days'] = days_between(max_time, trans['this_time'])
trans['w'] = recency_weight(trans['age_days'], HALFLIFE_DAYS)
rw_counts = (trans.groupby(['serviceType','next_service'])['w'].sum()
             .rename('w_count').reset_index())
rw_totals = rw_counts.groupby('serviceType')['w_count'].transform('sum')
rw_counts['prob'] = np.where(rw_totals > 0, rw_counts['w_count'] / rw_totals, np.nan)

# Seasonal (by month-of-year)
if USE_SEASONAL:
    trans['month'] = season_month(trans['this_time'])
    seas_counts = (trans.groupby(['serviceType','month','next_service']).size()
                   .rename('count').reset_index())
    # Convert to conditional probabilities by (serviceType, month)
    seas_counts['total'] = seas_counts.groupby(['serviceType','month'])['count'].transform('sum')
    seas_counts['prob'] = seas_counts['count'] / seas_counts['total']
else:
    seas_counts = pd.DataFrame(columns=['serviceType','month','next_service','count','total','prob'])

rw_counts.sort_values(['serviceType','prob'], ascending=[True, False]).head(10)

| | serviceType | next_service | w_count | prob |
|---|---|---|---|---|
| 0 | AdHoc | AdHoc | 3849.470909 | 0.832128 |
| 4 | AdHoc | Planned | 698.303155 | 0.15095 |
| 2 | AdHoc | Customize | 52.005153 | 0.011242 |
| 1 | AdHoc | Boarding | 12.891097 | 0.002787 |
| 5 | AdHoc | Sitting | 7.536987 | 0.001629 |
| 6 | AdHoc | WalkAndCare | 3.701048 | 0.0008 |
| 3 | AdHoc | Grooming | 2.149005 | 0.000465 |
| 13 | Boarding | WalkAndCare | 63.564595 | 0.589692 |
| 8 | Boarding | Boarding | 22.843754 | 0.211923 |
| 7 | Boarding | AdHoc | 7.673942 | 0.071192 |

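Each table pairs a purchase with the same user's next purchase (`shift(-1)`) and normalizes within the current service. In the recency-weighted table a pair contributes `0.5 ** (age_days / HALFLIFE_DAYS)` instead of 1. The toy data below is hypothetical and chosen so the transition ages are exactly 180 and 90 days:

```python
import numpy as np
import pandas as pd

halflife_days = 90   # mirrors HALFLIFE_DAYS
# Hypothetical purchases; the latest timestamp is 2025-08-04
toy_p = pd.DataFrame({
    'user_id':     ['u1', 'u1', 'u1', 'u2', 'u2'],
    'serviceType': ['AdHoc', 'AdHoc', 'Planned', 'AdHoc', 'Boarding'],
    'purchase_time': pd.to_datetime(
        ['2025-02-05', '2025-05-06', '2025-08-04', '2025-05-06', '2025-08-04'], utc=True),
}).sort_values(['user_id', 'purchase_time'])

# Pair each purchase with the same user's next one; each user's last purchase has no pair
toy_p['next_service'] = toy_p.groupby('user_id')['serviceType'].shift(-1)
toy_trans = toy_p.dropna(subset=['next_service']).copy()

# Age of each transition's starting purchase relative to the latest timestamp
toy_max_time = toy_p['purchase_time'].max()
age_days = (toy_max_time - toy_trans['purchase_time']).dt.total_seconds() / 86400
toy_trans['w'] = np.power(0.5, age_days / halflife_days)    # 180 d -> 0.25, 90 d -> 0.5

toy_rw = (toy_trans.groupby(['serviceType', 'next_service'])['w'].sum()
          .rename('w_count').reset_index())
toy_rw['prob'] = toy_rw['w_count'] / toy_rw.groupby('serviceType')['w_count'].transform('sum')
print(toy_rw)   # AdHoc -> AdHoc 0.2, AdHoc -> Boarding 0.4, AdHoc -> Planned 0.4
```

Unweighted, each of the three next services would get 1/3; the weighting moves probability away from the 180-day-old `AdHoc -> AdHoc` pair.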

In [16]:
# --- Personalized ranking (shrinkage to global) + predictions ---
# We use the recency-weighted global table (rw_counts) as the prior. For each user, we collect their own
# recency-weighted transitions out of the service they bought last.
# Shrinkage: posterior p(next|last_svc, user) = (c_user(next) + m * p_global(next|last_svc)) / (sum_user + m)

m = PERSONALIZATION_M

# Build a global prior dict: for each current service, a mapping to next prob
from collections import defaultdict

glob_prior = defaultdict(dict)
for _, r in rw_counts.iterrows():
    glob_prior[r['serviceType']][r['next_service']] = r['prob']

# Fallback: if the recency-weighted table is empty, use the unweighted global probabilities
if not glob_prior:
    for _, r in glob_counts.iterrows():
        glob_prior[r['serviceType']][r['next_service']] = r['prob']

# User-specific transitions out of the SAME service as the user's last purchase,
# recency-weighted relative to the user's own most recent transition (not the dataset's max_time)
def user_transition_counts(df_user, last_service):
    # consider transitions where current service == last_service
    d = df_user[(df_user['serviceType'] == last_service) & (~df_user['next_service'].isna())].copy()
    if d.empty:
        return {}
    d['age_days'] = days_between(df_user['purchase_time'].max(), d['this_time'])
    d['w'] = recency_weight(d['age_days'], HALFLIFE_DAYS)
    by_next = d.groupby('next_service')['w'].sum()
    return by_next.to_dict()

# Predict top-K serviceTypes per user
last_txn = p.groupby('user_id').tail(1)[['user_id','serviceType','purchase_time','category']].rename(columns={'serviceType':'last_service'})
pred_rows = []

for _, row in last_txn.iterrows():
    uid = row['user_id']
    last_svc = row['last_service']
    last_cat = row['category']
    last_time = row['purchase_time']
    dfu = trans[trans['user_id'] == uid] if 'user_id' in trans.columns else pd.DataFrame()
    # Separate name so the per-user purchase counts from the metrics cell are not overwritten
    u_counts = user_transition_counts(dfu, last_svc) if not dfu.empty else {}
    sum_user = sum(u_counts.values()) if u_counts else 0.0

    prior = glob_prior.get(last_svc, {})
    # Combine with shrinkage
    next_candidates = set(prior.keys()) | set(u_counts.keys())
    scored = []
    for nxt in next_candidates:
        pu = u_counts.get(nxt, 0.0)
        pg = prior.get(nxt, np.nan)
        # if prior missing, use uniform over observed next candidates
        if np.isnan(pg):
            pg = 1.0 / max(len(next_candidates), 1)
        post = (pu + m * pg) / (sum_user + m if (sum_user + m) > 0 else 1.0)
        scored.append((nxt, post))
    if not scored:
        # fallback to most common overall serviceType
        most_common = p['serviceType'].value_counts(dropna=True)
        if len(most_common):
            nxt = most_common.index[0]
            scored = [(nxt, 1.0)]
        else:
            scored = []
    ranked = sorted(scored, key=lambda x: x[1], reverse=True)[:TOPK]

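    # Expected next date: last purchase + the user's median interpurchase interval (global median if none).
    # Same-day repeat orders pull medians toward 0, so this date often equals the last purchase date.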
    user_ip = intervals[intervals['user_id']==uid]['delta_days']
    if user_ip.notna().any():
        median_days = float(user_ip.median())
    else:
        median_days = float(intervals['delta_days'].median()) if len(intervals) else np.nan
    expected_next_date = pd.NaT
    if not np.isnan(median_days):
        expected_next_date = last_time + pd.Timedelta(days=median_days)

    # Churn risk: if days since last > threshold
    days_since = (max_time - last_time).total_seconds()/(24*3600)
    churn_risk = days_since > CHURN_THRESHOLD_DAYS

    pred_rows.append({
        'user_id': uid,
        'last_purchase_time_utc': last_time,
        'last_serviceType': last_svc,
        'last_category': last_cat,
        'predicted_top1_serviceType': ranked[0][0] if ranked else np.nan,
        'predicted_top1_conf': ranked[0][1] if ranked else np.nan,
        'predicted_topk': [s for s,_ in ranked],
        'predicted_topk_conf': [float(c) for _,c in ranked],
        'predicted_top1_category': service_category(ranked[0][0]) if ranked else np.nan,
        'expected_next_date_utc': expected_next_date,
        'days_since_last': days_since,
        'churn_risk': churn_risk
    })

pred_df = pd.DataFrame(pred_rows)
pred_df.head()

Unnamed: 0,user_id,last_purchase_time_utc,last_serviceType,last_category,predicted_top1_serviceType,predicted_top1_conf,predicted_topk,predicted_topk_conf,predicted_top1_category,expected_next_date_utc,days_since_last,churn_risk
0,005fc863-7c9c-42ce-af56-38fed3c545f3,2025-05-15 18:58:14.754000+00:00,AdHoc,Walk,AdHoc,0.860106,"[AdHoc, Planned, Customize]","[0.8601064962269586, 0.12579163022558515, 0.00...",Walk,2025-05-16 19:35:44.801999999+00:00,80.608774,False
1,00bc3aed-8e44-4a3f-8a87-c24e9c72106f,2025-06-24 17:06:10.411000+00:00,AdHoc,Walk,AdHoc,0.877718,"[AdHoc, Planned, Customize]","[0.877717640908321, 0.10995576551523148, 0.008...",Walk,2025-07-10 16:24:53.490000+00:00,40.686602,False
2,00fc54bf-46d8-4328-8960-a40415005299,2025-06-17 16:29:28.280000+00:00,AdHoc,Walk,AdHoc,0.879096,"[AdHoc, Planned, Customize]","[0.8790963610663846, 0.10871602634486194, 0.00...",Walk,2025-07-05 15:35:18.397500+00:00,47.712089,False
3,015cdb4a-9f28-4d45-bc79-a374589bd648,2025-08-01 10:56:49.938000+00:00,AdHoc,Walk,AdHoc,0.687963,"[AdHoc, Planned, Customize]","[0.6879626440127831, 0.2508750931787598, 0.059...",Walk,2025-08-01 10:56:49.938000+00:00,2.943089,False
4,01a7af8b-0dea-45df-8eea-6c35682c359d,2025-07-19 11:38:31.598000+00:00,Customize,Walk,Customize,0.997701,"[Customize, Planned, AdHoc]","[0.997700959130749, 0.0015069800184234956, 0.0...",Walk,2025-07-19 11:38:31.598000+00:00,15.914134,False

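The ranking above is additive smoothing toward the global prior: the user's recency-weighted counts plus `m` pseudo-observations spread according to the global distribution, `post = (c_user + m * p_global) / (sum_user + m)`. With `m = PERSONALIZATION_M = 5`, a user's own history carries as much weight as the prior once its recency-weighted total reaches 5. A minimal sketch with a hypothetical helper `shrink` and made-up counts:

```python
def shrink(user_counts, prior, m):
    """Blend a user's weighted next-service counts with a global prior (hypothetical helper).

    user_counts: {next_service: recency-weighted count} for this user
    prior:       {next_service: global probability} for the user's last service
    m:           pseudo-observation strength (PERSONALIZATION_M in the config)
    """
    total = sum(user_counts.values()) + m
    candidates = sorted(set(user_counts) | set(prior))
    return {s: (user_counts.get(s, 0.0) + m * prior.get(s, 0.0)) / total for s in candidates}

toy_prior = {'AdHoc': 0.8, 'Planned': 0.2}
no_history = shrink({}, toy_prior, m=5)
some_history = shrink({'Planned': 5.0}, toy_prior, m=5)
heavy_history = shrink({'Planned': 45.0}, toy_prior, m=5)
print(no_history)     # {'AdHoc': 0.8, 'Planned': 0.2}  (equals the prior)
print(some_history)   # {'AdHoc': 0.4, 'Planned': 0.6}
print(heavy_history)  # {'AdHoc': 0.08, 'Planned': 0.92}
```

This sketch gives prior 0 to services that never follow the current one globally, so each posterior sums to 1. The notebook cell substitutes a uniform value for those instead, so its scores can sum to slightly more than 1 and are best read as rankings.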

In [17]:
# --- Seasonal diagnostic (optional) ---
from IPython.display import display

if USE_SEASONAL and not seas_counts.empty:
    # Show the month-by-month next-service distribution for the most common current service
    top_current = p['serviceType'].value_counts(dropna=True).index[0]
    seas_demo = (seas_counts[seas_counts['serviceType'] == top_current]
                 .sort_values(['month', 'prob'], ascending=[True, False]))
    display(seas_demo.head(12))
else:
    print('Seasonal model disabled or insufficient data.')

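Month-of-year conditioning splits every current service's transitions into up to 12 cells, and the printed time range (24 December 2024 to 4 August 2025) means each month appears in one year at most, December covers only its last week, and September to November are missing entirely. Checking the support behind each `(serviceType, month)` cell before trusting it is cheap. The frame below mimics the columns of `seas_counts` with made-up counts, and `MIN_SUPPORT` is an arbitrary, hypothetical cutoff:

```python
import pandas as pd

MIN_SUPPORT = 30   # hypothetical cutoff: minimum transitions behind a (serviceType, month) cell

# Made-up counts with the same columns as seas_counts (before 'total'/'prob' are added)
toy_seas = pd.DataFrame({
    'serviceType':  ['AdHoc', 'AdHoc', 'AdHoc', 'Boarding', 'Boarding'],
    'month':        [7, 7, 8, 7, 12],
    'next_service': ['AdHoc', 'Planned', 'AdHoc', 'WalkAndCare', 'Boarding'],
    'count':        [120, 25, 40, 6, 2],
})

support = (toy_seas.groupby(['serviceType', 'month'])['count'].sum()
           .rename('total').reset_index())
support['sparse'] = support['total'] < MIN_SUPPORT
print(support)
```

On the real table, the `total` column of `seas_counts` already holds these per-cell sums.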
In [18]:
# --- Save outputs ---
OUT_DIR = DATA_DIR / 'repurchase_outputs_enhanced'
OUT_DIR.mkdir(exist_ok=True)

excel_engine = None
try:
    import xlsxwriter  # noqa: F401
    excel_engine = 'xlsxwriter'
except ImportError:
    try:
        import openpyxl  # noqa: F401
        excel_engine = 'openpyxl'
    except ImportError:
        excel_engine = None

# CSVs
repurchase_summary.to_csv(OUT_DIR / 'repurchase_summary.csv', index=False)
intervals[['user_id','prev_time','purchase_time','delta_days','serviceType','category']].to_csv(OUT_DIR / 'interpurchase_intervals.csv', index=False)
excel_safe(interval_stats).to_csv(OUT_DIR / 'interpurchase_interval_stats.csv', index=False)
glob_counts.to_csv(OUT_DIR / 'service_transition_global.csv', index=False)
rw_counts.to_csv(OUT_DIR / 'service_transition_recency_weighted.csv', index=False)
cat_counts.to_csv(OUT_DIR / 'category_transition_global.csv', index=False)
if USE_SEASONAL and not seas_counts.empty:
    seas_counts.to_csv(OUT_DIR / 'service_transition_seasonal_monthly.csv', index=False)
excel_safe(pred_df).to_csv(OUT_DIR / 'user_next_service_predictions.csv', index=False)
excel_safe(last_by_user[['user_id','purchase_time','days_since_last','churn_risk']]).to_csv(OUT_DIR / 'user_churn_flags.csv', index=False)

# Excel pack (if an engine exists)
if excel_engine:
    excel_path = OUT_DIR / 'repurchase_and_prediction_report.xlsx'
    with pd.ExcelWriter(excel_path, engine=excel_engine) as writer:
        excel_safe(repurchase_summary).to_excel(writer, sheet_name='01_repurchase', index=False)
        excel_safe(interval_stats).to_excel(writer, sheet_name='02_interval_stats', index=False)
        excel_safe(glob_counts).to_excel(writer, sheet_name='03_transitions_global', index=False)
        excel_safe(rw_counts).to_excel(writer, sheet_name='04_transitions_recency', index=False)
        excel_safe(cat_counts).to_excel(writer, sheet_name='05_category_transitions', index=False)
        if USE_SEASONAL and not seas_counts.empty:
            excel_safe(seas_counts).to_excel(writer, sheet_name='06_transitions_seasonal', index=False)
        excel_safe(pred_df).to_excel(writer, sheet_name='07_user_predictions', index=False)
        excel_safe(last_by_user[['user_id','purchase_time','days_since_last','churn_risk']]).to_excel(writer, sheet_name='08_churn_flags', index=False)
    print('Excel report written to', excel_path)
else:
    print("No Excel engine available; wrote CSVs only. Install XlsxWriter or openpyxl to get the Excel pack.")


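Excel cannot store tz-aware datetimes, so `excel_safe` converts each tz-aware column to Europe/Istanbul wall-clock time and drops the zone. One side effect: columns such as `last_purchase_time_utc` in the exported files hold Istanbul time (UTC+3), not UTC. The sketch re-declares the helper with an `isinstance` check on `pd.DatetimeTZDtype` (which avoids the deprecated `pd.api.types.is_datetime64tz_dtype`) and shows the shift on one value from the predictions table:

```python
import pandas as pd

def excel_safe(df, to_tz='Europe/Istanbul'):
    # Convert every tz-aware datetime column to naive wall-clock time in `to_tz`
    out = df.copy()
    for col in out.columns:
        if isinstance(out[col].dtype, pd.DatetimeTZDtype):
            out[col] = out[col].dt.tz_convert(to_tz).dt.tz_localize(None)
    return out

toy_pred = pd.DataFrame({
    'last_purchase_time_utc': pd.to_datetime(['2025-05-15 18:58:14.754'], utc=True),
    'predicted_top1_serviceType': ['AdHoc'],
})
local = excel_safe(toy_pred)                      # Istanbul wall clock
as_utc_naive = excel_safe(toy_pred, to_tz='UTC')  # keeps the UTC value, zone dropped
print(local.iloc[0, 0])         # 2025-05-15 21:58:14.754000
print(as_utc_naive.iloc[0, 0])  # 2025-05-15 18:58:14.754000
```

Passing `to_tz='UTC'` keeps the `_utc` column names accurate at the cost of less familiar local times.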

### Notes
- **Churn flag** compares each user's days since last purchase with `CHURN_THRESHOLD_DAYS`, measured against the dataset's max timestamp rather than today's date. Adjust the threshold to fit your business cadence (e.g., 60/90/120 days).
- **Recency weighting** uses a **half-life** (`HALFLIFE_DAYS`). Larger values make older transitions count more; smaller values focus the model on recent behavior.
- **Personalization** uses shrinkage with `PERSONALIZATION_M`: higher values lean more on global patterns; lower values trust user-specific history more (when available).
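- **Seasonality** is included for diagnostics only. The data covers 24 December 2024 to 4 August 2025, so each month-of-year appears in one year at most (September to November not at all) and month effects cannot be separated from overall trends, which is one reason to keep predictions on recency weighting plus personalization. Conditioning on the current month becomes more defensible once the data spans several years.
- **Category mapping** is derived from `serviceType` using the Walk/Sitting sets in the config; anything else maps to `Other`. Types that appear in the transition table, such as `WalkAndCare` and `Grooming`, currently fall into `Other`, so extend `WALK_TYPES`/`SITTING_TYPES` if they belong elsewhere.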

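One simple way to condition predictions on the current month (an assumption, not something this notebook implements) is to use a month's seasonal distribution only when enough transitions back it, and otherwise fall back to the recency-weighted prior. A sketch over plain dictionaries; `seasonal_or_prior`, `min_total` and all numbers are hypothetical:

```python
def seasonal_or_prior(last_svc, month, seasonal, seasonal_totals, prior, min_total=30):
    """Hypothetical helper: month-conditioned next-service distribution with a fallback.

    seasonal:        {(serviceType, month): {next_service: prob}}
    seasonal_totals: {(serviceType, month): transitions observed in that cell}
    prior:           {serviceType: {next_service: prob}}, e.g. built from rw_counts
    """
    key = (last_svc, month)
    if seasonal_totals.get(key, 0) >= min_total:
        return seasonal[key]
    return prior.get(last_svc, {})

toy_prior = {'AdHoc': {'AdHoc': 0.83, 'Planned': 0.17}}
toy_seasonal = {('AdHoc', 7): {'AdHoc': 0.7, 'Planned': 0.3},
                ('AdHoc', 12): {'AdHoc': 1.0}}
toy_totals = {('AdHoc', 7): 145, ('AdHoc', 12): 4}

july = seasonal_or_prior('AdHoc', 7, toy_seasonal, toy_totals, toy_prior)       # well supported -> seasonal
december = seasonal_or_prior('AdHoc', 12, toy_seasonal, toy_totals, toy_prior)  # only 4 transitions -> prior
print(july, december)
```

A softer alternative is to shrink each seasonal cell toward the prior with the same `(count + m * prior) / (total + m)` form the personalization step uses.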