# ORDER OF MODELING STEPS


This notebook follows the modeling framework outlined below.
1. Conceptualize Model (What do I want to model and why?)
2. Build model (figure out equations, write code)
3. Fit model to surrogate data (parameter and model recovery)

**CHECKPOINT: Only continue if the model and experiment can answer the question in theory, and if parameters and model are recoverable**

4. Fit model to participant data (1. parameter and model fit; 2. model validation)

**CHECKPOINT: Only continue if the model can account for the data.**

5. Latent variable analysis and reporting of results

Based on Prof. Musslick’s lecture slides from his cognitive modeling class (winter semester 2024).




# Implementation and Remaining Implementation Plan (Detailed; Must Follow MODEL-COMPARISON LOCK)

This section is the executable plan for the analysis workflow. The `MODEL-COMPARISON LOCK` section is the source of truth for assumptions, scoring, and inference rules.

## 0) Scope and constraints [x]

- [x] Scope is non-hierarchical, participant-wise fitting (`P01`, `P02`, `P03`).
- [x] Compare exactly 3 candidate models:
  1. Model A: DNM + CNM (blockwise threshold)
  2. Model B: DNM + CNM (blockwise asymptote)
  3. Model C: DNM + DDM (start from `psi`, drift from `LLR`)
- [x] Primary target is joint prediction of `choice` and `rt`.
- [x] Winner selection uses TEST joint score only (as locked).

## 1) Notebook setup and reproducibility [x]

- [x] Add one setup code cell that:
  1. Loads `src/elias/elias_ddm.py`.
  2. Imports `numpy`, `pandas`, `matplotlib`, `scipy`.
  3. Sets global constants from lock:
     - `DT = 1` (ms)
     - `T_MAX = 5000` (ms)
     - `N_SIMS_PER_TRIAL = 2000`
     - `RT_BIN_WIDTH = 20` (ms)
     - `EPS = 1e-12`
     - `SEED = 0`
  4. Initializes deterministic RNG policy (fixed seeds logged per participant/model).
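
One possible way to realize the "fixed seeds logged per participant/model" policy is sketched below. It is only an illustration: the helper name `derive_seed_log` and the idea of mixing the base seed with participant and model indices through `numpy.random.SeedSequence` are assumptions, not something the lock prescribes.

```python
import numpy as np

BASE_SEED = 0  # mirrors SEED from the lock
PARTICIPANTS = ("P01", "P02", "P03")
MODELS = ("A", "B", "C")


def derive_seed_log(base_seed, participants, models):
    """Hypothetical helper: one reproducible integer seed per (participant, model)."""
    seed_log = {}
    for p_idx, participant in enumerate(participants):
        for m_idx, model in enumerate(models):
            # SeedSequence mixes the base seed with the two indices deterministically.
            seq = np.random.SeedSequence([base_seed, p_idx, m_idx])
            seed_log[(participant, model)] = int(seq.generate_state(1)[0])
    return seed_log


seed_log = derive_seed_log(BASE_SEED, PARTICIPANTS, MODELS)

# Each fit can then create its own generator from the logged seed.
rng_p01_a = np.random.default_rng(seed_log[("P01", "A")])
```

Storing `seed_log` next to the fit results would let the winner decision be reproduced later from the saved tables.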


In [None]:
from pathlib import Path
import importlib
import sys
from typing import Final

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

# I resolve the repository root so the notebook works from different launch folders.
REPO_ROOT = Path.cwd().resolve().parent.parent
if not (REPO_ROOT / "src").exists() and (REPO_ROOT.parent / "src").exists():
    REPO_ROOT = REPO_ROOT.parent

# I add both `src/` and `src/elias/` to the import path for shared + model modules.
SRC_ROOT = REPO_ROOT / "src"
ELIAS_SRC = SRC_ROOT / "elias"
for import_path in (SRC_ROOT, ELIAS_SRC):
    if str(import_path) not in sys.path:
        sys.path.insert(0, str(import_path))

# I reload local modules so notebook edits are picked up without stale imports.
import common_helpers.preprocessing as _preprocessing
import elias_ddm as _elias_ddm
importlib.reload(_preprocessing)
importlib.reload(_elias_ddm)

# I import shared preprocessing helpers from common_helpers.
from common_helpers.preprocessing import load_participant_data, preprocess_loaded_participant_data

# I import the model runners from elias_ddm.
from elias_ddm import (
    run_all_models_for_participant,
    run_model_a_threshold,
    run_model_b_asymptote,
    run_model_c_ddm,
)

# ------------------------------------------------------------------------------
# Locked hard-coded constants (must match MODEL-COMPARISON LOCK).
# Keep these fixed across models/participants for a fair comparison.
# `elias_ddm` APIs now use milliseconds directly.


# Simulation integration step (DT = delta time) in milliseconds.
DT: Final[float] = 1

# Maximum allowed RT window per trial in milliseconds.
T_MAX: Final[float] = 5000

# Monte Carlo samples per trial for probability estimates.
N_SIMS_PER_TRIAL: Final[int] = 2000

# RT histogram bin width in ms for density scoring.
RT_BIN_WIDTH: Final[float] = 20

# Small smoothing constant (EPS=epsilon) to avoid log(0) in likelihood terms.
EPS: Final[float] = 1e-12

# Global base RNG seed for reproducible simulation/scoring.
SEED: Final[int] = 0
# ------------------------------------------------------------------------------

print("Setup complete.")
print(f"REPO_ROOT = {REPO_ROOT}")


## 2) Data load, preprocessing overview, exclusions, and split [x]

- [x] Add one data-prep code cell that:
  - [x] Loads all participants with `load_participant_data(...)`.
  - [x] Applies locked exclusions:
     - drop missing `choice` or RT
     - drop RT < 150 ms
     - drop RT > 5000 ms
  - [x] Verifies expected structure per participant:
     - 4 blocks
     - nominally 40 trials per block before exclusions
  - [x] Creates split labels per block:
     - TRAIN: trials 1-30
     - TEST: trials 31-40
  - [x] Saves preprocessing overview table (`participant`, `block`, `n_train`, `n_test`, `n_dropped`).


In [None]:
# Data loading, exclusions, preprocessing overview table, and split labels

# I use the existing merged participant CSV that is already stored in `data/`.
participants_csv_path = REPO_ROOT / "data" / "participants.csv"

# I load all participant rows and derive model-ready state columns.
df_loaded = load_participant_data(
    csv_path=participants_csv_path,
    participant_ids=None,
    hazard_col="subjective_h_snapshot",
    reset_on=("participant", "block"),
)

# I apply one shared preprocessing helper to keep this notebook cell simple.
prep_outputs = preprocess_loaded_participant_data(
    df_loaded,
    min_rt_ms=150,
    max_rt_ms=5000,
    train_trial_max_index=30,
    expected_blocks_per_participant=4,
    nominal_trials_per_block_before=40,
)

# I unpack outputs used in later notebook sections.
df_all = prep_outputs["df_all"]
removed_rows_df = prep_outputs["removed_rows_df"]
preprocessing_overview_table = prep_outputs["preprocessing_overview_table"]
participant_structure_table = prep_outputs["participant_structure_table"]

# I report whether this safety step changed anything.
print(f"Participants CSV path:    {participants_csv_path}")
print(f"Rows before safety check: {prep_outputs['before_n']}")
print(f"Rows after safety check:  {prep_outputs['after_n']}")
print(f"Rows removed:             {prep_outputs['removed_n']}")
print(f"Safety check changed data: {prep_outputs['safety_check_changed_data']}")

# I show participants still present after exclusions.
print("\nParticipants:\n", df_all["participant_id"].unique())

# I show participant-level structure checks requested in section 2.
print("\nParticipant structure checks:")
display(participant_structure_table)
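# I build one merged participant CSV from the three stored source CSV files.
# NOTE: `write_dataset_csv` is not imported in the setup cell; import it from the
# module that defines it before running this step. This step also runs after
# `participants.csv` was already loaded above, so any rebuilt file is not used by
# the load at the top of this cell.
combined_csv_path = write_dataset_csv(
    output_filename="participants.csv",
    source_csv_paths=(
        REPO_ROOT / "data" / "elias-standard.csv",
        REPO_ROOT / "data" / "evan-standard.csv",
        REPO_ROOT / "data" / "maik-standard.csv",
    ),
    participant_ids=("P01", "P02", "P03"),
)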
# I show the preprocessing overview table requested in section 2.
print("\nPreprocessing overview table:")
display(preprocessing_overview_table)

# I show all rows removed by validity/RT filters for auditability.
print("\nRemoved rows (all rows excluded by validity/RT checks):")
display(removed_rows_df)

# I show the prepared modeling DataFrame used by downstream cells.
print("\nPrepared modeling DataFrame (head):")
display(df_all.head())


### 2) RESULTS:

- P01 has only 27 instead of 40 trials. Some trials may have gone missing, likely due to an error partly caused by the unusual CSV-string saving method.
- 11 trials of participant 3 were dropped because the responses were either too quick or too slow. (I think that participant was me; I am aware that knowing this is not very scientific.)

## 3) Unified simulation-to-likelihood interface

- [ ] Add helper code cells implementing one common scoring interface for all models.
- [ ] For each trial, estimate from simulations:
  1. `p(choice_t)`
  2. `p(rt_t | choice_t)` via histogram density (`20 ms` bins + `EPS` smoothing)
- [ ] Trial joint negative log score:
  - `L_t = -log p(choice_t) - log p(rt_t | choice_t)`
- [ ] Return all three aggregates:
  1. joint score (primary)
  2. choice-only score
  3. RT-only conditional score
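
The scoring interface described above could look roughly like the sketch below. It assumes that each model returns, per trial, one array of simulated choices and one array of simulated RTs in milliseconds; the function names `trial_scores` and `aggregate_scores`, and the choice to add `EPS` to both probabilities, are illustrative assumptions.

```python
import numpy as np

RT_BIN_WIDTH = 20.0  # ms, from the lock
T_MAX = 5000.0       # ms, from the lock
EPS = 1e-12          # smoothing constant, from the lock


def trial_scores(sim_choices, sim_rts, obs_choice, obs_rt,
                 bin_width=RT_BIN_WIDTH, t_max=T_MAX, eps=EPS):
    """Return (joint, choice_only, rt_only) negative log scores for one trial."""
    sim_choices = np.asarray(sim_choices)
    sim_rts = np.asarray(sim_rts, dtype=float)

    # p(choice_t): fraction of simulations that produced the observed choice.
    same_choice = sim_choices == obs_choice
    p_choice = same_choice.mean() + eps

    # p(rt_t | choice_t): histogram density over simulations with that choice.
    edges = np.arange(0.0, t_max + bin_width, bin_width)
    rts = sim_rts[same_choice]
    if rts.size > 0:
        counts, _ = np.histogram(rts, bins=edges)
        density = counts / (rts.size * bin_width)
        bin_idx = min(int(obs_rt // bin_width), len(counts) - 1)
        p_rt = density[bin_idx] + eps
    else:
        p_rt = eps  # no simulation produced the observed choice

    choice_nll = -np.log(p_choice)
    rt_nll = -np.log(p_rt)
    return choice_nll + rt_nll, choice_nll, rt_nll


def aggregate_scores(per_trial):
    """Sum per-trial (joint, choice_only, rt_only) tuples into the three aggregates."""
    joint, choice_only, rt_only = np.asarray(per_trial, dtype=float).sum(axis=0)
    return {"joint": joint, "choice_only": choice_only, "rt_only": rt_only}


# Tiny worked example with four simulated outcomes for one trial.
demo_choices = np.array([1, 1, 1, 0])
demo_rts = np.array([410.0, 415.0, 650.0, 500.0])
joint, choice_only, rt_only = trial_scores(demo_choices, demo_rts, obs_choice=1, obs_rt=412.0)
totals = aggregate_scores([(joint, choice_only, rt_only)] * 2)
```

In the example, three of four simulations match the observed choice, and two of those three fall into the 400 to 420 ms bin, so the conditional RT density is 2 / (3 × 20) per ms before smoothing.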

### 3) RESULTS:



## 4) Model parameterization and bounds (participant-wise)

- [ ] Define explicit parameter vectors and transforms.

### Model A parameters (fit on TRAIN)
- [ ] `theta_A = [thr_b1, thr_b2, thr_b3, thr_b4, t0, g]`
- [ ] Constrain thresholds positive and `t0` in valid range.
- [ ] `g` is a global evidence/noise gain for participant.

### Model B parameters (fit on TRAIN)
- [ ] `theta_B = [asy_b1, asy_b2, asy_b3, asy_b4, t0, g]`
- [ ] Constrain asymptotes positive and `t0` in valid range.

### Model C parameters (fit on TRAIN)
- [ ] `theta_C = [a, t0, k_v, k_z]`
- [ ] Fix diffusion scale `s = 1.0`.
- [ ] Use mapping from lock:
  - `x0_t = k_z * psi_t`
  - `v_t = k_v * LLR_t`
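
The "explicit parameter vectors and transforms" could be handled by optimizing in an unconstrained space and mapping back, as in the sketch below for `theta_A`. Everything here is an assumption for illustration: the function names, the use of `exp` for positivity, the logistic map for `t0`, the `t0` bounds of 0 to 150 ms, and treating `g` as positive.

```python
import numpy as np

# Hypothetical t0 bounds in ms; the lock only says "t0 in valid range".
T0_MIN, T0_MAX = 0.0, 150.0


def to_constrained_a(raw):
    """Map an unconstrained 6-vector to [thr_b1, thr_b2, thr_b3, thr_b4, t0, g]."""
    raw = np.asarray(raw, dtype=float)
    thresholds = np.exp(raw[:4])                                # strictly positive
    t0 = T0_MIN + (T0_MAX - T0_MIN) / (1.0 + np.exp(-raw[4]))   # inside (T0_MIN, T0_MAX)
    g = np.exp(raw[5])                                          # assumed positive gain
    return np.concatenate([thresholds, [t0, g]])


def to_unconstrained_a(theta):
    """Inverse of to_constrained_a, useful for building optimizer start points."""
    theta = np.asarray(theta, dtype=float)
    frac = (theta[4] - T0_MIN) / (T0_MAX - T0_MIN)
    logit_t0 = np.log(frac / (1.0 - frac))
    return np.concatenate([np.log(theta[:4]), [logit_t0, np.log(theta[5])]])


theta_a = np.array([1.0, 2.0, 0.5, 3.0, 80.0, 1.2])
round_trip = to_constrained_a(to_unconstrained_a(theta_a))
```

The same pattern would carry over to `theta_B` (asymptotes instead of thresholds) and, with a positive `a`, to `theta_C`.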

## 5) Fitting procedure (TRAIN only)

- [ ] Use one optimizer protocol for all models:
  1. Multi-start optimization (same number of starts per model).
  2. Same convergence criteria and max iterations.
  3. Same seed policy for simulation-based likelihood evaluation.
- [ ] Fit independently for each participant.
- [ ] Save full fit artifacts:
  - best params
  - best TRAIN joint score
  - optimizer status and iterations
  - per-start results
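
A minimal multi-start loop that satisfies "same number of starts, same convergence criteria" might look like the sketch below. The optimizer choice (Nelder-Mead via `scipy.optimize.minimize`), the tolerances, the start-point distribution, and the name `multi_start_fit` are assumptions; the toy quadratic stands in for a simulation-based TRAIN joint score evaluated with a fixed seed.

```python
import numpy as np
from scipy.optimize import minimize


def multi_start_fit(objective, n_params, n_starts=10, seed=0, max_iter=500):
    """Run the same derivative-free optimizer from several random start points."""
    rng = np.random.default_rng(seed)
    per_start = []
    for start_idx in range(n_starts):
        x0 = rng.normal(0.0, 1.0, size=n_params)  # start in unconstrained space
        res = minimize(
            objective,
            x0,
            method="Nelder-Mead",
            options={"maxiter": max_iter, "xatol": 1e-3, "fatol": 1e-3},
        )
        per_start.append({
            "start": start_idx,
            "x": res.x,
            "score": float(res.fun),
            "success": bool(res.success),
            "n_iter": int(res.nit),
        })
    # Lower score is better because the objective is a negative log score.
    best = min(per_start, key=lambda r: r["score"])
    return best, per_start


def toy_objective(x):
    """Placeholder for the TRAIN joint score of one participant and model."""
    return float(np.sum((x - 1.5) ** 2))


best, per_start = multi_start_fit(toy_objective, n_params=2, n_starts=5)
```

The returned `best` and `per_start` records cover the listed fit artifacts (best params, best score, optimizer status, iterations, per-start results).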

## 6) TEST evaluation and winner decision

- [ ] Evaluate fitted params on TEST only.
- [ ] Compute per participant:
  1. TEST joint score by block
  2. TEST joint score total (sum over 4 blocks)
  3. TEST choice-only and RT-only totals (secondary)
- [ ] Apply locked participant-level winner rule:
  1. best TEST joint total
  2. best in >= 3/4 blocks
  3. block-bootstrap CI for DeltaScore(best - runner-up) strictly > 0
- [ ] Apply locked group rule:
  - same model is clear winner in >= 2/3 participants
  - otherwise report heterogeneity/inconclusive
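
The participant-level rule above can be sketched as follows. Two assumptions are made explicit: scores are negative log scores (lower is better), and the score difference is signed as runner-up minus best so that positive values favor the best model. The function name `participant_winner`, the percentile bootstrap, and the number of resamples are illustrative choices.

```python
import numpy as np


def participant_winner(block_scores, n_boot=10000, seed=0):
    """block_scores: {model: per-block TEST joint scores (negative log, lower is better)}."""
    models = list(block_scores)
    scores = np.array([block_scores[m] for m in models], dtype=float)  # (n_models, n_blocks)

    # Rule 1: best TEST joint total.
    order = np.argsort(scores.sum(axis=1))
    best, runner_up = order[0], order[1]

    # Rule 2: best in at least 3 of 4 blocks.
    blocks_won = int((scores.argmin(axis=0) == best).sum())

    # Rule 3: block bootstrap of the summed advantage (runner-up minus best).
    delta = scores[runner_up] - scores[best]
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, delta.size, size=(n_boot, delta.size))
    boot_totals = delta[idx].sum(axis=1)
    ci_low, ci_high = np.percentile(boot_totals, [2.5, 97.5])

    return {
        "best_model": models[best],
        "runner_up": models[runner_up],
        "blocks_won": blocks_won,
        "ci": (float(ci_low), float(ci_high)),
        "clear_winner": bool(blocks_won >= 3 and ci_low > 0),
    }


clear_case = participant_winner({
    "A": [10.0, 11.0, 9.0, 10.0],
    "B": [12.0, 13.0, 11.0, 12.5],
    "C": [15.0, 14.0, 16.0, 15.0],
})
mixed_case = participant_winner({
    "A": [10.0, 14.0, 9.0, 12.5],
    "B": [12.0, 11.0, 11.0, 12.0],
    "C": [20.0, 20.0, 20.0, 20.0],
})
```

With only four blocks, the bootstrap distribution is coarse, so the interval should be read as a rough consistency check rather than a precise estimate.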

## 7) Mandatory validity checks (after winner logic)

- [ ] Posterior predictive checks per participant and block:
  1. RT distribution overlay (data vs simulated)
  2. RT quantile comparison (10/30/50/70/90)
  3. accuracy by block
- [ ] Change-point/hazard signatures:
  1. RT and accuracy near change-point vs late block
  2. dependence on prior strength `|psi|`
- [ ] Model recovery:
  1. simulate surrogate data from each fitted model
  2. refit all models
  3. summarize recovery matrix and distinguishability
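
For the RT quantile comparison (10/30/50/70/90), a small table of observed versus simulated quantiles is often enough. The sketch below assumes RTs in milliseconds and that simulated trials without a response are stored as `NaN`; the function name `rt_quantile_table` and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

QUANTILES = (0.1, 0.3, 0.5, 0.7, 0.9)


def rt_quantile_table(obs_rts, sim_rts, quantiles=QUANTILES):
    """Compare observed and simulated RT quantiles for one participant and block."""
    obs = np.asarray(obs_rts, dtype=float)
    sim = np.asarray(sim_rts, dtype=float)
    obs_q = np.nanquantile(obs, quantiles)  # NaNs (no response) are ignored
    sim_q = np.nanquantile(sim, quantiles)
    return pd.DataFrame({
        "quantile": list(quantiles),
        "observed_ms": obs_q,
        "simulated_ms": sim_q,
        "diff_ms": sim_q - obs_q,
    })


observed = np.arange(100.0, 1100.0, 100.0)
simulated = np.append(observed + 50.0, np.nan)  # one simulated trial without a response
quantile_table = rt_quantile_table(observed, simulated)
```

Calling this once per participant and block, and concatenating the results, would give a compact numeric companion to the RT distribution overlays.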

## 8) Deliverables and export

- [ ] Export participant-level result table:
  - fitted params per model
  - TRAIN and TEST scores
  - winner/inconclusive status
- [ ] Export block-level table for bootstrap and consistency checks.
- [ ] Save all diagnostic plots used in report.
- [ ] Write concise conclusion text strictly following lock interpretation mapping.

## 9) Definition of done

- [ ] All lock constraints are satisfied exactly.
- [ ] Winner decision is reproducible from saved tables and seed logs.
- [ ] Mandatory validity checks are present for every participant.
- [ ] If model recovery is weak, final claim is explicitly downgraded to weak/inconclusive evidence.


# MODEL-COMPARISON LOCK (Triangle task; P01–P03; 4 blocks × 40 trials)

## Goal (what I may conclude)
I will conclude only: “Among these candidate models, under the pre-specified scoring + held-out evaluation + checks, model X provides the best predictive account of choice+RT in this dataset.”
I will NOT conclude: “X is the true brain model.”

---

## Data + preprocessing (fixed)
- Observables (targets): 
  1) `choice_t` (binary) 
  2) `rt_t` (continuous; milliseconds)
- Inputs (given to models): `LLR_t`, hazard/block info, and any DNM-derived `psi_t` if applicable.
- Exclusion rules (apply identically to all models):
  - drop trials with missing choice or RT
  - drop RT < 150 ms or RT > 5000 ms
- Units: keep RT in milliseconds everywhere.
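
The exclusion rules above amount to three boolean masks. The sketch below is not the implementation inside `preprocess_loaded_participant_data`; it only illustrates the rule on a toy table, assuming columns named `choice` and `rt` (in ms) and a hypothetical helper name `apply_locked_exclusions`.

```python
import numpy as np
import pandas as pd


def apply_locked_exclusions(df, choice_col="choice", rt_col="rt",
                            min_rt_ms=150, max_rt_ms=5000):
    """Return (kept, removed) DataFrames under the locked exclusion rules."""
    missing = df[choice_col].isna() | df[rt_col].isna()
    too_fast = df[rt_col] < min_rt_ms
    too_slow = df[rt_col] > max_rt_ms
    drop = missing | too_fast | too_slow
    return df.loc[~drop].copy(), df.loc[drop].copy()


toy = pd.DataFrame({
    "choice": [1.0, np.nan, 0.0, 1.0, 0.0],
    "rt": [300.0, 400.0, 100.0, 6000.0, 5000.0],
})
kept, removed = apply_locked_exclusions(toy)
```

Note that an RT of exactly 150 ms or 5000 ms is kept, because the rules use strict inequalities.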

---

## Candidate models (fixed)
Model A: **DNM + CNM (blockwise threshold)**
- DNM provides trial-wise belief/prior quantities (e.g., `psi_t`) from hazard + evidence.
- CNM uses a **block-specific threshold** parameter to generate choice+RT distribution.

Model B: **DNM + CNM (blockwise asymptote)**
- Same DNM inputs.
- CNM uses a **block-specific asymptote (non-absorbing stabilization)** parameter to generate choice+RT distribution.

Model C: **DNM + DDM (standard bounded diffusion)**
- Map DNM outputs to DDM per trial:
  - start point: `x0_t = k_z * psi_t`
  - drift: `v_t = k_v * LLR_t`
  - bounds: ±a, nondecision time: t0, diffusion scale fixed (see below).
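
As an illustration of the mapping above, a per-trial DDM simulation with an Euler-Maruyama step could be written as below. This is a sketch, not the code in `elias_ddm`: the function name, the clipping of the start point inside the bounds, the coding of the upper bound as choice 1, and the convention that `s = 1.0` is expressed per square-root millisecond are all assumptions.

```python
import numpy as np


def simulate_ddm_trial(psi_t, llr_t, a, t0, k_v, k_z, n_sims=2000,
                       dt=1.0, t_max=5000.0, s=1.0, rng=None):
    """Simulate n_sims diffusion paths for one trial; returns (choices, rts in ms)."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Start point from the prior; clipped so paths start strictly inside the bounds (assumption).
    x0 = float(np.clip(k_z * psi_t, -0.99 * a, 0.99 * a))
    x = np.full(n_sims, x0)
    v = k_v * llr_t                      # drift from the evidence
    choices = np.full(n_sims, -1)        # -1 marks "no bound reached before t_max"
    rts = np.full(n_sims, np.nan)
    active = np.ones(n_sims, dtype=bool)

    n_steps = int(round((t_max - t0) / dt))
    for step in range(1, n_steps + 1):
        n_active = int(active.sum())
        if n_active == 0:
            break
        # Euler-Maruyama update for the paths that have not yet hit a bound.
        x[active] += v * dt + s * np.sqrt(dt) * rng.standard_normal(n_active)
        hit_upper = active & (x >= a)
        hit_lower = active & (x <= -a)
        choices[hit_upper] = 1
        choices[hit_lower] = 0
        hit = hit_upper | hit_lower
        rts[hit] = t0 + step * dt        # decision time plus nondecision time
        active &= ~hit
    return choices, rts


demo_choices, demo_rts = simulate_ddm_trial(
    psi_t=0.0, llr_t=2.0, a=10.0, t0=200.0, k_v=1.0, k_z=1.0,
    n_sims=500, rng=np.random.default_rng(0),
)
```

The returned arrays have the shape that a simulation-based scoring function would need: one simulated choice and one simulated RT per path.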

(If any mapping differs, that becomes a separate model and must be reported as such.)

---

## Fit vs fixed parameters (locked)
Numerical hyperparameters (fixed for all models):
- `dt = 1 ms`, `t_max = 5000 ms`, `n_sims_per_trial = 2000`
- RNG: seeded and logged (default seed=0); same seed policy for all models.
- RT density for likelihood: histogram density with bin width `20 ms` + epsilon smoothing `1e-12`.

Model parameters to fit **per participant** (fit on TRAIN only):
- Model A: 4 block params (threshold per block) + `t0` + one evidence/noise gain (single global).
- Model B: 4 block params (asymptote per block) + `t0` + one evidence/noise gain (single global).
- Model C (DDM): `a`, `t0`, `k_v`, `k_z`; diffusion scale fixed `s=1.0` (identifiability).

---

## Primary evaluation protocol (locked)
Train/test split (forward-chaining; preserves sequential dependence):
- For each block (40 trials):
  - TRAIN = trials 1–30
  - TEST  = trials 31–40
- Fit parameters on TRAIN only; evaluate scores on TEST only.
- Aggregate TEST scores across the 4 blocks per participant.
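
The forward-chaining split can be expressed as a single label column. The sketch below assumes a 1-based within-block trial index in a column called `trial_in_block` (a hypothetical name) and labels trials by their original position, so exclusions shrink TRAIN or TEST rather than shifting the boundary.

```python
import numpy as np
import pandas as pd


def add_split_labels(df, trial_col="trial_in_block", train_trial_max_index=30):
    """Label trials 1..30 of each block as TRAIN and the rest as TEST."""
    out = df.copy()
    out["split"] = np.where(out[trial_col] <= train_trial_max_index, "TRAIN", "TEST")
    return out


toy = pd.DataFrame({
    "participant": "P01",
    "block": np.repeat([1, 2], 40),
    "trial_in_block": np.tile(np.arange(1, 41), 2),
})
labeled = add_split_labels(toy)

# Per-block counts, similar in spirit to the preprocessing overview table.
split_counts = (
    labeled.groupby(["participant", "block", "split"]).size().unstack("split", fill_value=0)
)
```

Because the label depends only on the trial index, the same split applies identically to all three models.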

History usage:
- PRIMARY scoring uses one-step-ahead prediction: condition on observed history up to t−1 (if a model uses it).
- SECONDARY validation uses free-running simulations (model feeds itself its own simulated history).

---

## Primary scoring rule (locked)
For each trial t in TEST:
- Joint negative log score:
  - `L_t = -log p(choice_t) - log p(rt_t | choice_t)`
- `p(choice_t)` and `p(rt_t | choice_t)` are estimated from model simulations (same n_sims, dt, seed rules).
Total score per participant = sum over TEST trials across all 4 blocks.
Report also (but do NOT use for winner selection):
- choice-only score = sum `-log p(choice_t)`
- RT-only conditional score = sum `-log p(rt_t | choice_t)`

Winner selection uses ONLY the joint score.

---

## “Winner” vs “inconclusive” rules (locked)
Per participant:
- A model is a clear winner if:
  1) it has the best TEST joint score overall, AND
  2) it is best in ≥ 3 of 4 blocks (blockwise consistency), AND
  3) block-bootstrap over the 4 blocks gives a 95% CI for ΔScore(best − runner-up) that lies strictly above 0 (ΔScore signed so that positive values favor the best model).
Otherwise: “no clear winner” for that participant.

Group-level (P01–P03):
- Only claim a group preference if the same model is a clear winner in ≥ 2 of 3 participants.
Otherwise: report heterogeneity / inconclusive.
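
The group-level rule reduces to counting clear winners. A minimal sketch, assuming each participant's result is stored as a model name or `None` for "no clear winner" (the function name `group_decision` is hypothetical):

```python
from collections import Counter


def group_decision(participant_winners, min_participants=2):
    """Return the group-preferred model, or 'inconclusive' if the rule is not met."""
    counts = Counter(w for w in participant_winners.values() if w is not None)
    if counts:
        model, n_clear = counts.most_common(1)[0]
        if n_clear >= min_participants:
            return model
    return "inconclusive"


agree = group_decision({"P01": "A", "P02": "A", "P03": None})
disagree = group_decision({"P01": "A", "P02": "B", "P03": None})
```

With three participants, at most one model can reach two clear wins, so no tie-breaking is needed.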

---

## Mandatory validity checks (must be shown regardless of winner)
1) Posterior predictive checks (per participant; per block):
   - RT distribution overlay (data vs simulated)
   - RT quantiles (10/30/50/70/90%)
   - accuracy by block
2) Change-point / hazard signatures (as plots):
   - accuracy & RT near change-point vs later steady-state
   - dependence of RT/choice on prior strength |psi| (if DNM present)
3) Model recovery (surrogate data):
   - simulate datasets from each fitted model; refit all models; check if generating model is recovered above chance.
   - If recovery is poor, interpret any real-data “winner” as weak evidence (models not distinguishable here).
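
The model recovery summary can be stored as a confusion matrix of generating versus recovered model. The sketch below assumes two equally long lists of model labels, one entry per surrogate dataset; the function name `recovery_matrix` and the comparison against a chance level of 1/3 are illustrative.

```python
import numpy as np
import pandas as pd

MODELS = ("A", "B", "C")


def recovery_matrix(generating, recovered, models=MODELS):
    """Rows: generating model; columns: recovered model; entries: recovery rates."""
    index = {m: i for i, m in enumerate(models)}
    counts = np.zeros((len(models), len(models)))
    for gen, rec in zip(generating, recovered):
        counts[index[gen], index[rec]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows without surrogate datasets stay at zero instead of dividing by zero.
    rates = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return pd.DataFrame(rates, index=list(models), columns=list(models))


generating = ["A", "A", "B", "B", "C", "C"]
recovered = ["A", "A", "B", "A", "C", "C"]
rates = recovery_matrix(generating, recovered)

# Diagonal entries are the recovery rates; chance level is 1 / number of models.
above_chance = np.diag(rates.to_numpy()) > 1.0 / len(MODELS)
```

A diagonal close to 1 would indicate that the models are distinguishable with this design; off-diagonal mass shows which models get confused with each other.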

---

## Interpretation mapping (locked)
If Model A wins: bounded/thresholded continuous accumulation with blockwise caution provides best predictive account.
If Model B wins: non-absorbing stabilization/asymptote mechanism better captures behavior than strict bound crossing.
If Model C wins: standard DDM driven by trial-wise prior (start) + evidence (drift) is sufficient; extra CNM nonlinearity not supported by prediction here.
If inconclusive: dataset (3 participants; 4 blocks) does not disambiguate these mechanisms under the locked protocol; report equivalence and what data would be needed (more participants/blocks or stronger manipulations).
