# ORDER OF MODELING STEPS


This notebook follows the modeling framework outlined below.
1. Conceptualize Model (What do I want to model and why?)
2. Build model (figure out equations, write code)
3. Fit model to surrogate data (parameter and model recovery)

**CHECKPOINT: Only continue if the model and experiment can answer the question in theory, and if parameters and model are recoverable**

4. Fit model to participant data ((a) parameter and model fit, (b) model validation)

**CHECKPOINT: Only continue if the model can account for the data.**

5. Latent-variable analysis and reporting of results

Based on Prof. Musslick’s lecture slides from his cognitive modeling class (winter semester 2024).




# MODEL-COMPARISON LOCK (Triangle task; P01–P03; mixed trial availability)

## DATA AVAILABILITY NOTE (CURRENT BRANCH)
- Merged dataset is built from `data/elias.csv`, `data/evan.csv` (short), and `data/maik.csv`.
- Pre-exclusion row counts are fixed to current files:
  - `P01`: `160` rows (`4 x 40`)
  - `P02`: `147` rows (`block 1 = 27`; blocks 2-4 = 40 each)
  - `P03`: `160` rows (`4 x 40`)
- Total pre-exclusion rows: `467`.
- No participant-trace duplication is present in the current merged data.

## Step 1) Conceptualize model and claim boundaries
I will conclude only: "Among these candidate models, under the pre-specified scoring + held-out evaluation + checks, model X provides the best predictive account of choice+RT in this dataset."
I will NOT conclude: "X is the true brain model."

## Hazard input choice and interpretation (locked)
- Locked hazard input is `subjective_h_snapshot` (participant-specific estimate).
- What this tests: whether model performance improves when conditioning on the participant's inferred volatility belief.
- Pro: can capture imperfect hazard learning/adaptation and support a psychological interpretation.
- Con: if `subjective_h_snapshot` is derived from the same fitted data without temporal constraints, it can leak flexibility and inflate apparent performance.
- Constraint: treat `subjective_h_snapshot` as one-step-ahead / past-only / externally fixed during fitting and evaluation; do not let current-trial outcomes leak into the predictor used at that trial.
- Interpretation if it wins: decision dynamics are best explained when the model uses participants' subjective volatility belief.
- Required reporting sentence: "Claims are conditional on the provenance and temporal validity of `subjective_h_snapshot`."

## Step 2) Build model with fixed data/scoring specification

### Data + preprocessing (fixed)
- Observables (targets): `choice_t` (binary) and `rt_t` (milliseconds).
- Inputs (given to models): `LLR_t`, hazard/block info via `subjective_h_snapshot`, and any DNM-derived `psi_t` if applicable.
- Exclusion rules (apply identically to all models):
  - drop trials with missing choice or RT
  - drop RT < 150 ms or RT > 5000 ms
- Units: keep RT in milliseconds everywhere.
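
The exclusion rules above can be sketched as a single filter. This is a minimal illustration, not the code in `common_helpers.preprocessing`; the column names `choice` and `rt` and the toy rows are assumptions made for the example.

```python
import numpy as np
import pandas as pd

MIN_RT_MS = 150   # locked lower RT bound (ms)
MAX_RT_MS = 5000  # locked upper RT bound (ms)


def apply_exclusions(df: pd.DataFrame) -> pd.DataFrame:
    """Keep trials with a recorded choice and an RT inside [150, 5000] ms."""
    # Hypothetical column names: `choice` and `rt` (milliseconds).
    has_values = df["choice"].notna() & df["rt"].notna()
    # `between` is inclusive, so only RT < 150 or RT > 5000 is dropped.
    in_window = df["rt"].between(MIN_RT_MS, MAX_RT_MS)
    return df[has_values & in_window].reset_index(drop=True)


# Toy rows (not real data): one valid trial, one too fast, one missing choice,
# one too slow, one missing RT.
toy = pd.DataFrame(
    {
        "choice": [1, 0, np.nan, 1, 0],
        "rt": [420.0, 120.0, 800.0, 5200.0, np.nan],
    }
)
kept = apply_exclusions(toy)
print(f"kept {len(kept)} of {len(toy)} toy rows")
```

Because the same function is applied before any model sees the data, every candidate model is scored on an identical trial set.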

### Data availability (current file)
- Current merged file has `3` participants and `467` rows before exclusions.
- After locked exclusions, current retained rows are `456` (`11` removed: `10` with RT > 5000 ms and `1` with RT < 150 ms).
- Train/test split is defined by `trial_index` cutoff (TRAIN `<= 30`, TEST `>= 31`), not by fixed retained trial counts.
- For `P02` block 1 (short block), the split is still valid and yields `17` TRAIN and `10` TEST trials.
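
A minimal sketch of the index-based split, assuming a `trial_index` column. The toy short block uses indices 14 to 40, which is one hypothetical pattern that reproduces the `17` / `10` counts; the actual index pattern of `P02` block 1 is not stated here.

```python
import numpy as np
import pandas as pd

TRAIN_TRIAL_MAX_INDEX = 30  # TRAIN: trial_index <= 30, TEST: trial_index >= 31


def add_split_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Label each row TRAIN or TEST from its trial index, not from row position."""
    out = df.copy()
    out["split"] = np.where(
        out["trial_index"] <= TRAIN_TRIAL_MAX_INDEX, "TRAIN", "TEST"
    )
    return out


# Hypothetical short block with 27 trials (indices 14..40).
short_block = pd.DataFrame({"trial_index": np.arange(14, 41)})
counts = add_split_labels(short_block)["split"].value_counts()
print(counts.to_dict())
```

Labelling by index rather than by retained row count keeps the split stable when exclusions remove trials from a block.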

### Candidate models (fixed)
Model A: **DNM + CNM (blockwise threshold)**
- DNM provides trial-wise belief/prior quantities (e.g., `psi_t`) from hazard + evidence.
- CNM uses a **block-specific threshold** parameter to generate choice+RT distribution.

Model B: **DNM + CNM (blockwise asymptote)**
- Same DNM inputs.
- CNM uses a **block-specific asymptote (non-absorbing stabilization)** parameter to generate choice+RT distribution.

Model C: **DNM + DDM (standard bounded diffusion)**
- Map DNM outputs to DDM per trial:
  - start point: `x0_t = k_z * psi_t`
  - drift: `v_t = k_v * LLR_t`
  - bounds: +/-a, nondecision time: t0, diffusion scale fixed (see below).

(If any mapping differs, that becomes a separate model and must be reported as such.)
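
The Model C mapping can be illustrated with a small Euler-Maruyama simulation. This is a sketch under stated assumptions, not the implementation in `elias_ddm`: the function name, the choice coding (1 = upper bound, 0 = lower bound, -1 = no decision before `t_max`), and the example parameter values are all hypothetical, and time is measured in milliseconds so that `s = 1.0` is the noise scale per square-root millisecond.

```python
import numpy as np


def simulate_ddm_trial(psi_t, llr_t, a, t0, k_v, k_z, *, s=1.0, dt=1.0,
                       t_max=5000.0, n_sims=2000, seed=0):
    """Simulate n_sims bounded-diffusion paths for one trial.

    Returns (choice, rt): choice is 1 (upper), 0 (lower) or -1 (undecided),
    rt is in ms (NaN for undecided paths).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_sims, float(k_z * psi_t))   # start point x0_t = k_z * psi_t
    v = k_v * llr_t                           # drift v_t = k_v * LLR_t
    choice = np.full(n_sims, -1)
    rt = np.full(n_sims, np.nan)
    active = np.ones(n_sims, dtype=bool)
    n_steps = int((t_max - t0) / dt)          # decision time left after t0

    for step in range(1, n_steps + 1):
        n_active = int(active.sum())
        if n_active == 0:
            break
        # Euler-Maruyama update for the paths that have not yet hit a bound.
        x[active] += v * dt + s * np.sqrt(dt) * rng.standard_normal(n_active)
        upper = active & (x >= a)
        lower = active & (x <= -a)
        choice[upper] = 1
        choice[lower] = 0
        rt[upper | lower] = t0 + step * dt    # RT = nondecision + decision time
        active &= ~(upper | lower)
    return choice, rt


# Illustrative parameter values only (not fitted estimates).
choice, rt = simulate_ddm_trial(
    psi_t=0.5, llr_t=2.0, a=30.0, t0=300.0, k_v=0.05, k_z=5.0, n_sims=500
)
print(f"P(upper) ~ {np.mean(choice == 1):.3f}, median RT ~ {np.nanmedian(rt):.0f} ms")
```

Paths that never reach a bound stay coded as undecided, so the scoring step has to decide how to treat them; that choice is left open here.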

### Fit vs fixed parameters (locked)
Numerical hyperparameters (fixed for all models):
- `dt = 1 ms`, `t_max = 5000 ms`, `n_sims_per_trial = 2000`
- RNG: seeded and logged (default seed = 0); same seed policy for all models.
- RT density for likelihood: histogram density with bin width `20 ms` + epsilon smoothing `1e-12`.
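
A minimal sketch of the histogram density with `20 ms` bins. Adding `eps` to the bin density is one way to read "epsilon smoothing" and is an assumption here, as are the function name and the toy RTs.

```python
import numpy as np


def rt_density(sim_rts, rt_obs, *, bin_width=20.0, t_max=5000.0, eps=1e-12):
    """Histogram estimate of the RT density (per ms) at rt_obs, plus eps."""
    edges = np.arange(0.0, t_max + bin_width, bin_width)  # 0, 20, ..., 5000
    counts, _ = np.histogram(sim_rts, bins=edges)
    n = counts.sum()
    # Density = relative frequency divided by bin width, so it integrates to 1.
    density = counts / (n * bin_width) if n > 0 else np.zeros(len(counts))
    # Index of the bin containing rt_obs, clipped to the valid range.
    idx = np.searchsorted(edges, rt_obs, side="right") - 1
    idx = int(np.clip(idx, 0, len(counts) - 1))
    return float(density[idx] + eps)


# Toy simulated RTs: three fall in [400, 420), one in [420, 440).
toy_sim_rts = np.array([405.0, 410.0, 415.0, 430.0])
print(rt_density(toy_sim_rts, 412.0))   # bin with 3 of 4 samples
print(rt_density(toy_sim_rts, 1000.0))  # empty bin, only eps remains
```

With `eps = 1e-12`, an observed RT in an empty bin contributes about `27.6` to the negative log score, so the floor acts as a fixed penalty rather than a real smoothing of the density.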

Model parameters to fit **per participant** (fit on TRAIN only):
- Model A: 4 block params (threshold per block) + `t0` + one evidence/noise gain (single global).
- Model B: 4 block params (asymptote per block) + `t0` + one evidence/noise gain (single global).
- Model C (DDM): `a`, `t0`, `k_v`, `k_z`; diffusion scale fixed `s=1.0` (identifiability).

## Step 3) Surrogate-data recovery checkpoint (SOFT GATE)
Recovery protocol:
1. Simulate surrogate datasets from each fitted candidate model.
2. Refit all candidate models to each surrogate dataset.
3. Summarize model-recovery matrix and parameter-recovery behavior.

Soft-gate checkpoint rule:
- If recovery is weak (models not distinguishable/recoverable above chance), continue to participant fitting and evaluation, but downgrade all winner claims to weak/inconclusive evidence.
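
The model-recovery matrix from the protocol above can be sketched as follows. The labels are made-up toy values, not results, and comparing the diagonal with `1 / n_models` is only one simple reading of "above chance".

```python
import numpy as np
import pandas as pd


def recovery_matrix(generating, recovered, models=("A", "B", "C")):
    """Rows: generating model. Columns: best-fitting model. Cells: proportions."""
    index = {m: i for i, m in enumerate(models)}
    counts = np.zeros((len(models), len(models)))
    for gen, fit in zip(generating, recovered):
        counts[index[gen], index[fit]] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows without surrogate datasets stay at zero instead of dividing by zero.
    props = np.divide(counts, row_totals, out=np.zeros_like(counts),
                      where=row_totals > 0)
    return pd.DataFrame(
        props,
        index=[f"gen_{m}" for m in models],
        columns=[f"fit_{m}" for m in models],
    )


# Toy labels: 4 surrogate datasets per generating model.
generating = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
recovered = ["A", "A", "A", "B", "B", "B", "A", "A", "C", "C", "C", "C"]
matrix = recovery_matrix(generating, recovered)
above_chance = np.diag(matrix.to_numpy()) > 1.0 / matrix.shape[0]
print(matrix)
print("diagonal above chance:", above_chance.tolist())
```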

## Step 4) Fit participant data and compare models
### Primary evaluation protocol (locked)
Train/test split (forward-chaining; preserves sequential dependence):
- For each participant-block, TRAIN = trials 1-30 by `trial_index`, TEST = trials 31-40 by `trial_index` when those indices are present.
- Fit parameters on TRAIN only; evaluate scores on TEST only.
- Aggregate TEST scores across the 4 blocks per participant.

History usage:
- PRIMARY scoring uses one-step-ahead prediction: condition on observed history up to t-1 (if a model uses it).
- SECONDARY validation uses free-running simulations (model feeds itself its own simulated history).

### Primary scoring rule (locked)
For each trial t in TEST:
- Joint negative log score:
  - `L_t = -log p(choice_t) - log p(rt_t | choice_t)`
- `p(choice_t)` and `p(rt_t | choice_t)` are estimated from model simulations (same n_sims, dt, seed rules).

Total score per participant = sum of `L_t` over TEST trials across all 4 blocks.

Also report (but do NOT use for winner selection):
- choice-only score = sum `-log p(choice_t)`
- RT-only conditional score = sum `-log p(rt_t | choice_t)`

Winner selection uses ONLY the joint score.
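
A minimal sketch of the per-trial scores, estimated from simulated choices and RTs. The function name, the clipping of `p(choice_t)` to `[EPS, 1]`, and the toy simulation outputs are assumptions for illustration.

```python
import numpy as np

EPS = 1e-12
RT_BIN_WIDTH = 20.0  # ms
T_MAX = 5000.0       # ms


def trial_scores(sim_choice, sim_rt, choice_obs, rt_obs):
    """Return (joint, choice_only, rt_only) negative log scores for one trial."""
    sim_choice = np.asarray(sim_choice)
    sim_rt = np.asarray(sim_rt, dtype=float)

    # p(choice_t): share of simulations that made the observed choice.
    same = sim_choice == choice_obs
    p_choice = float(np.clip(same.mean(), EPS, 1.0))

    # p(rt_t | choice_t): histogram density over simulations with that choice.
    rts = sim_rt[same & np.isfinite(sim_rt)]
    edges = np.arange(0.0, T_MAX + RT_BIN_WIDTH, RT_BIN_WIDTH)
    counts, _ = np.histogram(rts, bins=edges)
    k = min(int(rt_obs // RT_BIN_WIDTH), len(counts) - 1)
    density = counts[k] / (rts.size * RT_BIN_WIDTH) if rts.size > 0 else 0.0
    p_rt = density + EPS

    nll_choice = -np.log(p_choice)
    nll_rt = -np.log(p_rt)
    return nll_choice + nll_rt, nll_choice, nll_rt


# Toy simulation output: 3 of 4 paths choose 1; two of those land in [400, 420).
joint, choice_only, rt_only = trial_scores(
    sim_choice=[1, 1, 1, 0],
    sim_rt=[405.0, 410.0, 430.0, 600.0],
    choice_obs=1,
    rt_obs=412.0,
)
print(f"joint={joint:.4f}, choice-only={choice_only:.4f}, RT-only={rt_only:.4f}")
```

Summing the first return value over TEST trials gives the primary score; the other two values give the secondary choice-only and RT-only scores.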

### "Winner" vs "inconclusive" rules (locked)
Per participant:
- A model is a clear winner if:
  1) it has the best TEST joint score overall, AND
  2) it is best in >= 3 of 4 blocks (blockwise consistency), AND
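  3) block-bootstrap over the 4 blocks gives a 95% CI for DeltaScore that lies strictly above 0, where DeltaScore = runner-up joint score - best joint score (scores are negative log scores, so lower is better and a positive DeltaScore favors the best model).

Otherwise: "no clear winner" for that participant.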

Group-level (P01-P03):
- Only claim a group preference if the same model is a clear winner in >= 2 of 3 participants.
Otherwise: report heterogeneity / inconclusive.
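
The participant-level and group-level rules can be sketched as below. This assumes that lower joint negative log score is better and that the bootstrap difference is oriented as runner-up minus best, so positive values favor the best model. The function names, the percentile CI, `n_boot`, and the toy block scores are illustrative assumptions, not results.

```python
from collections import Counter

import numpy as np


def is_clear_winner(block_scores, *, n_boot=10_000, seed=0):
    """block_scores: dict model -> per-block TEST joint scores (lower = better).

    Returns (best_model, clear, (ci_low, ci_high)).
    """
    models = list(block_scores)
    scores = np.array([block_scores[m] for m in models], dtype=float)
    order = np.argsort(scores.sum(axis=1))
    best, runner_up = order[0], order[1]

    # Rule 2: best model has the lowest score in at least 3 of 4 blocks.
    blocks_won = int((scores.argmin(axis=0) == best).sum())

    # Rule 3: resample blocks with replacement and sum the per-block gaps.
    delta = scores[runner_up] - scores[best]
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, delta.size, size=(n_boot, delta.size))
    boot = delta[idx].sum(axis=1)
    ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

    clear = bool(blocks_won >= 3 and ci_low > 0)
    return models[best], clear, (float(ci_low), float(ci_high))


def group_preference(results, min_participants=2):
    """results: list of (winner, clear). Returns a model name or None."""
    counts = Counter(winner for winner, clear in results if clear)
    if not counts:
        return None
    model, n = counts.most_common(1)[0]
    return model if n >= min_participants else None


# Toy scores: A is lower in every block.
consistent = {"A": [50, 52, 49, 51], "B": [55, 56, 54, 57], "C": [60, 58, 61, 59]}
# Toy scores: A has the lowest total but wins only 2 of 4 blocks.
mixed = {"A": [50, 60, 50, 60], "B": [60, 50, 60, 50.5], "C": [70, 70, 70, 70]}

winner_1, clear_1, ci_1 = is_clear_winner(consistent)
winner_2, clear_2, ci_2 = is_clear_winner(mixed)
print(winner_1, clear_1, ci_1)
print(winner_2, clear_2, ci_2)
```

With only 4 blocks the bootstrap distribution is coarse, so the CI criterion is close to requiring that the best model beats the runner-up in nearly every block.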

## Step 5) Latent-variable analysis and reporting
Mandatory checks and reporting (run regardless of winner):
1) Posterior predictive checks (per participant; per block):
   - RT distribution overlay (data vs simulated)
   - RT quantiles (10/30/50/70/90%)
   - accuracy by block
2) Change-point / hazard signatures:
   - accuracy and RT near change-point vs later steady-state
   - dependence of RT/choice on prior strength `|psi|` (if DNM present)
3) Latent-variable analysis:
   - report interpretable latent trajectories/quantities used by the winning or tied models
4) Recovery-aware conclusion:
   - if surrogate recovery is weak, downgrade any winner statement to weak/inconclusive evidence.
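
The RT quantile check listed under item 1 can be sketched as a small comparison table. The function name and column labels are assumptions, and the RTs are random toy values drawn only to make the example run.

```python
import numpy as np
import pandas as pd

QUANTILES = [0.1, 0.3, 0.5, 0.7, 0.9]


def rt_quantile_table(observed_rt, simulated_rt):
    """Compare observed and simulated RT quantiles (ms) for one participant-block."""
    obs_q = np.quantile(observed_rt, QUANTILES)
    sim_q = np.quantile(simulated_rt, QUANTILES)
    return pd.DataFrame(
        {
            "quantile": QUANTILES,
            "observed_ms": obs_q,
            "simulated_ms": sim_q,
            "diff_ms": sim_q - obs_q,  # positive: model is slower than data
        }
    )


# Toy RTs (not real data): right-skewed samples shifted by 300 ms.
rng = np.random.default_rng(0)
toy_observed = 300.0 + rng.lognormal(mean=5.5, sigma=0.4, size=40)
toy_simulated = 300.0 + rng.lognormal(mean=5.5, sigma=0.4, size=2000)
table = rt_quantile_table(toy_observed, toy_simulated)
print(table.round(1))
```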

Interpretation mapping:
- If Model A wins: bounded/thresholded continuous accumulation with blockwise caution provides the best predictive account.
- If Model B wins: non-absorbing stabilization/asymptote mechanism better captures behavior than strict bound crossing.
- If Model C wins: standard DDM driven by trial-wise prior (start) + evidence (drift) is sufficient; extra CNM nonlinearity is not supported here.
- If inconclusive: dataset does not disambiguate these mechanisms under the locked protocol; report equivalence and required future data.


# Implementation (partly still a plan): Implementation Plan (Detailed; Must Follow MODEL-COMPARISON LOCK)

This section is the executable plan and must follow the `MODEL-COMPARISON LOCK` order exactly.

## DATA AVAILABILITY NOTE (CURRENT BRANCH)
- Current source CSVs are `data/elias.csv`, `data/evan.csv` (short), and `data/maik.csv`.
- Current merged dataset contains `467` rows before exclusions and `456` rows after exclusions.
- `P02` has a short block 1 (`27` observed trials by index), while all other participant-blocks contain `40` rows before exclusions.

## 0) [x] Scope and constraints
- [x] Scope is non-hierarchical, participant-wise fitting (`P01`, `P02`, `P03`).
- [x] Compare exactly 3 candidate models:
  1. Model A: DNM + CNM (blockwise threshold)
  2. Model B: DNM + CNM (blockwise asymptote)
  3. Model C: DNM + DDM (start from `psi`, drift from `LLR`)
- [x] Primary target is joint prediction of `choice` and `rt`.
- [x] Winner selection uses TEST joint score only (as locked).

## 1) Step 2 build phase: notebook setup and reproducibility [x]
- [x] Add one setup code cell that:
  1. Loads `src/elias/elias_ddm.py`.
  2. Imports `numpy`, `pandas`, `matplotlib`, `scipy`.
  3. Sets global constants from lock:
     - `DT = 1` (ms)
     - `T_MAX = 5000` (ms)
     - `N_SIMS_PER_TRIAL = 2000`
     - `RT_BIN_WIDTH = 20` (ms)
     - `EPS = 1e-12`
     - `SEED = 0`
  4. Initializes deterministic RNG policy (fixed seeds logged per participant/model).
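
One way to get fixed, logged seeds per participant and model is to derive each generator from the base seed plus two indices. This is a sketch of one possible policy, not the one in `elias_ddm`; the helper name `make_rng` and the entropy layout are assumptions.

```python
import numpy as np

SEED = 0
PARTICIPANTS = ["P01", "P02", "P03"]
MODELS = ["A", "B", "C"]


def make_rng(participant: str, model: str):
    """Return (generator, entropy) derived deterministically from SEED."""
    entropy = [SEED, PARTICIPANTS.index(participant), MODELS.index(model)]
    rng = np.random.default_rng(np.random.SeedSequence(entropy))
    return rng, entropy


# Log the entropy for every participant/model pair so runs can be reproduced.
seed_log = {}
for participant in PARTICIPANTS:
    for model in MODELS:
        _, seed_log[(participant, model)] = make_rng(participant, model)

print(seed_log[("P02", "C")])
```

If the comparison should instead reuse identical random draws across models, dropping the model index from the entropy list would give every model the same stream per participant.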

## 2) Step 2 build phase: data prep, scoring interface, and parameterization
- [x] Data load, preprocessing overview, exclusions, and split.
- [ ] Unified simulation-to-likelihood interface:
  1. `p(choice_t)`
  2. `p(rt_t | choice_t)` via histogram density (`20 ms` bins + `EPS` smoothing)
  3. joint NLL `L_t = -log p(choice_t) - log p(rt_t | choice_t)`
  4. aggregate outputs: joint (primary), choice-only, RT-only
- [ ] Model parameterization and bounds (participant-wise):
  1. Model A: `theta_A = [thr_b1, thr_b2, thr_b3, thr_b4, t0, g]`
  2. Model B: `theta_B = [asy_b1, asy_b2, asy_b3, asy_b4, t0, g]`
  3. Model C: `theta_C = [a, t0, k_v, k_z]`, `s = 1.0`

## 3) Step 3 surrogate-data recovery checkpoint (SOFT GATE)
- [ ] Simulate surrogate datasets from each fitted model.
- [ ] Refit all candidate models to each surrogate dataset.
- [ ] Build model-recovery matrix and parameter-recovery summary.
- [ ] Apply soft-gate rule:
  - if recovery is weak, continue to participant TEST evaluation,
  - but force interpretation to weak/inconclusive evidence.

## 4) Step 4 participant fitting and held-out model comparison
- [ ] Use one optimizer protocol for all models:
  1. same number of multi-starts
  2. same convergence criteria and max iterations
  3. same seed policy for simulation-based likelihood evaluation
- [ ] Fit independently for each participant on TRAIN only.
- [ ] Save full fit artifacts:
  - best params
  - best TRAIN joint score
  - optimizer status and iterations
  - per-start results
- [ ] Evaluate fitted params on TEST only.
- [ ] Compute per participant:
  1. TEST joint score by block
  2. TEST joint score total
  3. TEST choice-only and RT-only totals (secondary)
- [ ] Apply locked participant-level winner rule.
- [ ] Apply locked group rule.

## 5) Step 5 latent-variable analysis and reporting
- [ ] Posterior predictive checks per participant and block:
  1. RT distribution overlay
  2. RT quantile comparison (10/30/50/70/90)
  3. accuracy by block
- [ ] Change-point/hazard signatures:
  1. RT and accuracy near change-point vs late block
  2. dependence on prior strength `|psi|`
- [ ] Latent-variable analysis for interpretability.
- [ ] Discussion text for hazard-input interpretation (`subjective_h_snapshot`):
  - what it tests
  - benefits
  - leakage risk and temporal-validity caveat
  - interpretation if it wins
  - required sentence: "Claims are conditional on the provenance and temporal validity of `subjective_h_snapshot`."

## 6) Deliverables and export
- [ ] Export participant-level result table:
  - fitted params per model
  - TRAIN and TEST scores
  - winner/inconclusive status
- [ ] Export block-level table for bootstrap and consistency checks.
- [ ] Save all diagnostic plots used in the report.
- [ ] Write concise conclusion text following lock interpretation mapping.

## 7) Definition of done
- [ ] Lock constraints are satisfied in the declared order (Steps 1->5).
- [ ] Surrogate soft-gate decision is documented and propagated to conclusions.
- [ ] Winner decision is reproducible from saved tables and seed logs.
- [ ] Mandatory checks are present for every participant.
- [ ] Hazard-input caveats are explicitly reported in the discussion section.


In [None]:
from pathlib import Path
import importlib
import sys
from typing import Final

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

# I resolve the repository root so the notebook works from different launch folders.
REPO_ROOT = Path.cwd().resolve().parent.parent
if not (REPO_ROOT / "src").exists() and (REPO_ROOT.parent / "src").exists():
    REPO_ROOT = REPO_ROOT.parent

# I add both `src/` and `src/elias/` to the import path for shared + model modules.
SRC_ROOT = REPO_ROOT / "src"
ELIAS_SRC = SRC_ROOT / "elias"
for import_path in (SRC_ROOT, ELIAS_SRC):
    if str(import_path) not in sys.path:
        sys.path.insert(0, str(import_path))

# I reload local modules so notebook edits are picked up without stale imports.
import common_helpers.preprocessing as _preprocessing
import elias_ddm as _elias_ddm
importlib.reload(_preprocessing)
importlib.reload(_elias_ddm)

# I import shared preprocessing helpers from common_helpers.
from common_helpers.preprocessing import load_participant_data, preprocess_loaded_participant_data

# I import the model runners from elias_ddm.
from elias_ddm import (
    run_all_models_for_participant,
    run_model_a_threshold,
    run_model_b_asymptote,
    run_model_c_ddm,
)
# ------------------------------------------------------------------------------
# Locked hard-coded constants (must match MODEL-COMPARISON LOCK).
# I keep these fixed across models/participants for a fair comparison.
# `elias_ddm` APIs take milliseconds directly.


# Simulation integration step (DT = delta time) in milliseconds.
DT: Final[float] = 1

# Maximum allowed RT window per trial in milliseconds.
T_MAX: Final[float] = 5000

# Monte Carlo samples per trial for probability estimates.
N_SIMS_PER_TRIAL: Final[int] = 2000

# RT histogram bin width in ms for density scoring.
RT_BIN_WIDTH: Final[float] = 20

# Small smoothing constant (EPS=epsilon) to avoid log(0) in likelihood terms.
EPS: Final[float] = 1e-12

# Global base RNG seed for reproducible simulation/scoring.
SEED: Final[int] = 0
# ------------------------------------------------------------------------------

print("Setup complete.")
print(f"REPO_ROOT = {REPO_ROOT}")


## 2) Data load, preprocessing overview, exclusions, and split [x]

- [x] Add one data-prep code cell that:
  - [x] Loads all participants with `load_participant_data(...)`.
  - [x] Applies locked exclusions:
     - drop missing `choice` or RT
     - drop RT < 150 ms
     - drop RT > 5000 ms
  - [x] Verifies expected structure per participant in the current dataset:
     - 3 participants and 4 blocks each
     - `P01`: 40/40/40/40 rows before exclusions
     - `P02`: 27/40/40/40 rows before exclusions (short block 1)
     - `P03`: 40/40/40/40 rows before exclusions
  - [x] Creates split labels per block using trial index:
     - TRAIN: `trial_index <= 30`
     - TEST: `trial_index >= 31`
  - [x] Saves preprocessing overview table (`participant`, `block`, `n_train`, `n_test`, `n_dropped`).


In [None]:
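# Data loading, exclusions, preprocessing overview table, and split labels

# I import `display` explicitly so this cell does not rely on the notebook injecting it.
from IPython.display import display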

# I use the existing merged participant CSV that is already stored in `data/`.
participants_csv_path = REPO_ROOT / "data" / "participants.csv"

# I load all participant rows and derive model-ready state columns.
df_loaded = load_participant_data(
    csv_path=participants_csv_path,
    participant_ids=None,
    hazard_col="subjective_h_snapshot",
    reset_on=("participant", "block"),
)

# I apply one shared preprocessing helper to keep this notebook cell simple.
prep_outputs = preprocess_loaded_participant_data(
    df_loaded,
    min_rt_ms=150,
    max_rt_ms=5000,
    train_trial_max_index=30,
    expected_blocks_per_participant=4,
    nominal_trials_per_block_before=40,
)

# I unpack outputs used in later notebook sections.
df_all = prep_outputs["df_all"]
removed_rows_df = prep_outputs["removed_rows_df"]
preprocessing_overview_table = prep_outputs["preprocessing_overview_table"]
participant_structure_table = prep_outputs["participant_structure_table"]

# I report whether this safety step changed anything.
print(f"Participants CSV path:    {participants_csv_path}")
print(f"Rows before safety check: {prep_outputs['before_n']}")
print(f"Rows after safety check:  {prep_outputs['after_n']}")
print(f"Rows removed:             {prep_outputs['removed_n']}")
print(f"Safety check changed data: {prep_outputs['safety_check_changed_data']}")

# I show participants still present after exclusions.
print("\nParticipants:\n", df_all["participant_id"].unique())

# I show participant-level structure checks requested in section 2.
print("\nParticipant structure checks:")
display(participant_structure_table)



# I show the preprocessing overview table requested in section 2.
print("\nPreprocessing overview table:")
display(preprocessing_overview_table)

# I show all rows removed by validity/RT filters for auditability.
print("\nRemoved rows (all rows excluded by validity/RT checks):")
display(removed_rows_df)

# I show the prepared modeling DataFrame used by downstream cells.
print("\nPrepared modeling DataFrame (head):")
display(df_all.head())


### 2) RESULTS:

- Using the current merged file `data/participants.csv`, the dataset contains `3` participants with `467` rows before exclusions:
  - `P01`: `160` rows
  - `P02`: `147` rows (short block 1)
  - `P03`: `160` rows
- Block structure before exclusions:
  - `P01`: `40/40/40/40`
  - `P02`: `27/40/40/40`
  - `P03`: `40/40/40/40`
- After locked exclusions (`drop missing required values`, `RT < 150 ms`, `RT > 5000 ms`), `456` rows remain and `11` rows are removed.
- Removal reasons in current data:
  - `10` rows with `RT > 5000 ms`
  - `1` row with `RT < 150 ms`
  - `0` rows dropped due to missing/non-finite required values
- Participant-level counts after exclusions:
  - `P01`: `149/160` kept (`11` dropped)
  - `P02`: `147/147` kept (`0` dropped)
  - `P03`: `160/160` kept (`0` dropped)
- Split consequences with trial-index rule:
  - `P02` block 1 yields `17` TRAIN and `10` TEST trials (valid, but fewer TRAIN trials than full blocks).


## 2) Build step: unified simulation-to-likelihood interface

- [ ] Add helper code cells implementing one common scoring interface for all models.
- [ ] For each trial, estimate from simulations:
  1. `p(choice_t)`
  2. `p(rt_t | choice_t)` via histogram density (`20 ms` bins + `EPS` smoothing)
- [ ] Trial joint negative log score:
  - `L_t = -log p(choice_t) - log p(rt_t | choice_t)`
- [ ] Return all three aggregates:
  1. joint score (primary)
  2. choice-only score
  3. RT-only conditional score


### 3) RESULTS (Surrogate recovery and checkpoint):


## 2) Build step: model parameterization and bounds (participant-wise)

- [ ] Define explicit parameter vectors and transforms.

### Model A parameters (fit on TRAIN)
- [ ] `theta_A = [thr_b1, thr_b2, thr_b3, thr_b4, t0, g]`
- [ ] Constrain thresholds positive and `t0` in valid range.
- [ ] `g` is a global evidence/noise gain for participant.

### Model B parameters (fit on TRAIN)
- [ ] `theta_B = [asy_b1, asy_b2, asy_b3, asy_b4, t0, g]`
- [ ] Constrain asymptotes positive and `t0` in valid range.

### Model C parameters (fit on TRAIN)
- [ ] `theta_C = [a, t0, k_v, k_z]`
- [ ] Fix diffusion scale `s = 1.0`.
- [ ] Use mapping from lock:
  - `x0_t = k_z * psi_t`
  - `v_t = k_v * LLR_t`
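
The constraints above can be enforced by letting the optimizer work on an unconstrained vector and transforming it. This is a sketch: the upper bound `T0_MAX_MS`, the positivity of `g`, leaving `k_v` and `k_z` unconstrained, and the function names are all assumptions that the lock does not specify.

```python
import numpy as np

T0_MAX_MS = 1000.0  # hypothetical upper bound for nondecision time (ms)


def _positive(u):
    """Map any real number to a strictly positive value."""
    return np.exp(u)


def _bounded_t0(u):
    """Map any real number into the open interval (0, T0_MAX_MS)."""
    return T0_MAX_MS / (1.0 + np.exp(-u))


def unpack_theta_a(u):
    """Unconstrained vector of length 6 -> Model A parameters."""
    u = np.asarray(u, dtype=float)
    return {
        "thr": _positive(u[:4]),   # thr_b1..thr_b4, one threshold per block
        "t0": _bounded_t0(u[4]),
        "g": _positive(u[5]),      # assumed positive gain
    }


def unpack_theta_c(u):
    """Unconstrained vector of length 4 -> Model C parameters (s fixed at 1.0)."""
    u = np.asarray(u, dtype=float)
    return {
        "a": _positive(u[0]),
        "t0": _bounded_t0(u[1]),
        "k_v": u[2],               # left unconstrained in this sketch
        "k_z": u[3],               # left unconstrained in this sketch
        "s": 1.0,
    }


theta_a = unpack_theta_a(np.zeros(6))
theta_c = unpack_theta_c([0.0, 0.0, 0.05, 5.0])
print(theta_a)
print(theta_c)
```

Model B can reuse the Model A transform with asymptotes in place of thresholds, since both vectors have the same layout.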

## 3) Surrogate-data recovery checkpoint (SOFT GATE)

- [ ] Simulate surrogate datasets from each fitted model.
- [ ] Refit all candidate models to each surrogate dataset.
- [ ] Summarize recovery matrix and distinguishability.
- [ ] Apply soft-gate interpretation rule:
  - if recovery is weak, continue analysis but downgrade winner claims to weak/inconclusive evidence.

## 4) Fit participant data and compare models on TEST

### 4a) Fitting procedure (TRAIN only)
- [ ] Use one optimizer protocol for all models:
  1. Multi-start optimization (same number of starts per model).
  2. Same convergence criteria and max iterations.
  3. Same seed policy for simulation-based likelihood evaluation.
- [ ] Fit independently for each participant.
- [ ] Save full fit artifacts:
  - best params
  - best TRAIN joint score
  - optimizer status and iterations
  - per-start results
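
A minimal multi-start sketch using `scipy.optimize.minimize`. Nelder-Mead is chosen here only because it is derivative-free; the lock does not name an optimizer. The number of starts, the tolerances, and the quadratic `toy_objective` standing in for the TRAIN joint score are assumptions.

```python
import numpy as np
from scipy.optimize import minimize


def multi_start_fit(objective, n_params, *, n_starts=8, seed=0, max_iter=500):
    """Run the same optimizer from several random starts and keep every result."""
    rng = np.random.default_rng(seed)
    starts = rng.normal(0.0, 1.0, size=(n_starts, n_params))
    per_start = []
    for x0 in starts:
        res = minimize(
            objective,
            x0,
            method="Nelder-Mead",
            options={"maxiter": max_iter, "xatol": 1e-4, "fatol": 1e-4},
        )
        per_start.append(
            {
                "x": res.x,
                "score": float(res.fun),
                "success": bool(res.success),
                "n_iter": int(res.nit),
                "message": str(res.message),
            }
        )
    # Best start = lowest objective value (lower negative log score is better).
    best = min(per_start, key=lambda r: r["score"])
    return best, per_start


def toy_objective(u):
    """Stand-in for the TRAIN joint score: a bowl with its minimum at (1, -2)."""
    return float(np.sum((np.asarray(u) - np.array([1.0, -2.0])) ** 2))


best, per_start = multi_start_fit(toy_objective, n_params=2)
print(best["x"].round(3), round(best["score"], 6), best["success"])
```

Keeping `per_start` alongside `best` covers the "per-start results" artifact and makes it easy to see whether different starts agree on the same optimum.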

### 4b) TEST evaluation and winner decision
- [ ] Evaluate fitted params on TEST only.
- [ ] Compute per participant:
  1. TEST joint score by block
  2. TEST joint score total (sum over 4 blocks)
  3. TEST choice-only and RT-only totals (secondary)
- [ ] Apply locked participant-level winner rule:
  1. best TEST joint total
  2. best in >= 3/4 blocks
  3. block-bootstrap 95% CI for DeltaScore (runner-up joint score - best joint score; lower score is better) strictly > 0
- [ ] Apply locked group rule:
  - same model is clear winner in >= 2/3 participants
  - otherwise report heterogeneity/inconclusive

## 5) Latent-variable analysis, validity checks, and reporting

- [ ] Posterior predictive checks per participant and block:
  1. RT distribution overlay (data vs simulated)
  2. RT quantile comparison (10/30/50/70/90)
  3. accuracy by block
- [ ] Change-point/hazard signatures:
  1. RT and accuracy near change-point vs late block
  2. dependence on prior strength `|psi|`
- [ ] Latent-variable analysis and interpretation.
- [ ] Discussion caveat for hazard-input choice (`subjective_h_snapshot`):
  - include leakage/circularity risk and temporal-validity constraint
  - include required sentence template from lock

## 6) Deliverables and export

- [ ] Export participant-level result table:
  - fitted params per model
  - TRAIN and TEST scores
  - winner/inconclusive status
- [ ] Export block-level table for bootstrap and consistency checks.
- [ ] Save all diagnostic plots used in report.
- [ ] Write concise conclusion text strictly following lock interpretation mapping.

## 7) Definition of done

- [ ] All lock constraints are satisfied in order (Step 1 -> Step 5).
- [ ] Surrogate soft-gate outcome is documented and propagated into conclusions.
- [ ] Winner decision is reproducible from saved tables and seed logs.
- [ ] Mandatory validity checks are present for every participant.
- [ ] If model recovery is weak, final claim is explicitly downgraded to weak/inconclusive evidence.
