# Using functions from `src/evan/glaze.py` in `src/elias`

You can reuse the model code from `src/evan` without changing anything in `src/evan`.

## 1) Import functions into a notebook/script in `src/elias`
```python
from pathlib import Path
import sys

repo_root = Path.cwd().resolve()
if not (repo_root / "src").exists() and (repo_root.parent / "src").exists():
    repo_root = repo_root.parent

sys.path.insert(0, str(repo_root / "src"))

from evan.glaze import (
    psi_function,
    simulate_trial,
    run_simulation_and_plot,
    plot_model_comparison,
)
```

## 2) Main functions
- `psi_function(L_prev, H)`: hazard-based prior update (Psi).
- `simulate_trial(...)`: simulates one trial and returns `reaction_time_ms`, `final_belief`, `decision`, `trajectory`, and `time_points_ms`.
- `run_simulation_and_plot(csv_path, block_id=None)`: runs simulations for one/all blocks and generates plots.
- `plot_model_comparison(df, params=None)`: compares actual vs predicted decisions/RT/beliefs (for prepared DataFrames).

## 3) Minimal examples
```python
psi = psi_function(L_prev=0.8, H=0.12)

trial = simulate_trial(
    prev_belief_L=psi,
    current_LLR=0.5,
    H=0.12,
    belief_threshold=1.0,
    max_duration_ms=1500,
    noise_std=0.7,
    decision_time_ms=50.0,
    noise_gain=3.5,
    stop_on_sat=True,
)

# print() so the values also show up when run as a script, not only in a notebook
print(trial["decision"], trial["reaction_time_ms"])
```

```python
csv_path = repo_root / "data" / "participants.csv"
run_simulation_and_plot(str(csv_path), block_id=None)  # or block_id=3
```


# Wilson & Collins (2019) Guide Applied to Your Project

Reference: Wilson RC, Collins AGE (2019), *Ten simple rules for the computational modeling of behavioral data*, eLife 8:e49547, https://doi.org/10.7554/eLife.49547

## Critical question first: what changes when moving from one participant to many?

The modeling logic is the same, but the **data structure and inference level** must change.

1. Do not treat all rows as one long sequence.
Your models are sequential (belief depends on previous trials), so each participant must have an independent sequence.

2. Add explicit `participant_id`.
`data/participants.csv` currently has no `participant_id` column (it has `block_id`, `trial_index`, etc.), so rows cannot be assigned to participants and true multi-participant fitting is not yet possible from this file alone.

3. Reset latent state at boundaries.
At minimum reset at participant boundaries. Usually also reset at block boundaries unless task design says otherwise.

4. Decide analysis level.
- Recommended first: fit each participant separately, then summarize group distributions.
- Later: hierarchical model (partial pooling) for more stable individual estimates.

5. Evaluation becomes two-level.
- Participant-level: fit quality and best model per person.
- Group-level: parameter distributions, model prevalence, and uncertainty.

This is fully consistent with Wilson & Collins: sequential tasks require careful handling of dependencies, and conclusions should be validated per participant and at group level.
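As a minimal sketch of points 1 to 3, the helper below splits rows into independent sequences and resets the latent belief at each boundary. It assumes rows are plain dicts with `participant_id`, `block_id`, `trial_index`, and an `LLR` field (the `LLR` column name and the `update` callback are illustrative, not taken from `src/evan`).

```python
from itertools import groupby


def iter_sequences(rows, reset_on_block=True):
    """Yield (key, trials) for each independent, trial-ordered sequence."""
    fields = ("participant_id", "block_id") if reset_on_block else ("participant_id",)
    ordered = sorted(
        rows, key=lambda r: (r["participant_id"], r["block_id"], r["trial_index"])
    )
    for key, group in groupby(ordered, key=lambda r: tuple(r[f] for f in fields)):
        yield key, list(group)


def run_with_resets(rows, update, reset_on_block=True, initial_belief=0.0):
    """Carry a belief across trials, resetting it at every sequence boundary."""
    out = []
    for _, trials in iter_sequences(rows, reset_on_block):
        belief = initial_belief  # latent state never leaks across boundaries
        for trial in trials:
            belief = update(belief, trial["LLR"])  # hypothetical update rule
            out.append({**trial, "belief": belief})
    return out
```

Passing `reset_on_block=False` keeps the belief running across blocks within a participant, which is the alternative to choose only if the task design calls for it.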

---

## Your 3-model comparison (clear definitions)

1. **Model A: Continuous/DNM + fixed threshold**
- Current implementation path: `simulate_trial(..., stop_on_sat=False)`.
- Decision when `|L| >= B` (after non-decision delay).

2. **Model B: Continuous/DNM + saturation/asymptote rule**
- Current implementation path: `simulate_trial(..., stop_on_sat=True)`.
- Effective stopping level tied to asymptote magnitude.

3. **Model C: DDM/DNM hybrid**
- DNM handles across-trial prior belief update.
- DDM handles within-trial RT/choice mechanism.

Practical hybrid mapping:

- `Psi_t = psi_function(L_{t-1}, H_t)`
- `v_t = beta_llr * LLR_t + beta_prior * Psi_t`
- `z_t = sigmoid(gamma * Psi_t)`
- DDM parameters: boundary `a`, non-decision `t0` (plus optional noise scaling)

This gives a principled combination: DNM for volatility-sensitive belief carry-over, DDM for RT dynamics.
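The mapping above can be sketched in a few lines. The `psi` function here uses the hazard-rate prior from Glaze et al. (2015) and is only assumed to match `evan.glaze.psi_function`; check it against that implementation before relying on it. The parameter values in the usage note are illustrative.

```python
import math


def psi(L_prev, H):
    """Hazard-based prior (Glaze et al., 2015 form; assumed, not read from src/evan)."""
    odds = (1.0 - H) / H
    return L_prev + math.log(odds + math.exp(-L_prev)) - math.log(odds + math.exp(L_prev))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hybrid_ddm_inputs(L_prev, H, llr, beta_llr, beta_prior, gamma):
    """Map the DNM prior onto DDM drift v_t and relative start point z_t."""
    prior = psi(L_prev, H)
    v = beta_llr * llr + beta_prior * prior  # drift: evidence plus prior
    z = sigmoid(gamma * prior)               # start point in (0, 1); 0.5 is unbiased
    return prior, v, z
```

For example, `hybrid_ddm_inputs(L_prev=0.8, H=0.12, llr=0.5, beta_llr=2.0, beta_prior=1.0, gamma=1.0)` returns the prior, drift, and start point for one trial. Under this form of `psi`, the prior is bounded in magnitude by `log((1 - H) / H)`, so a higher hazard rate shrinks how much belief carries over.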

---

## Step-by-step pipeline (Wilson & Collins aligned)

## Step 1) Design/check experiment-question alignment

Wilson & Collins start here for a reason: model quality cannot rescue weak task design.

For your case:

1. Confirm your scientific question:
- Do participants use hazard-sensitive belief updating?
- Is RT better explained by continuous attractor dynamics or DDM-like boundary crossing?

2. Confirm your data can answer it:
- Enough trials per participant/condition?
- Enough variation in evidence (`LLR`) and hazard context?
- RT quality good enough for DDM comparison?

3. Define model-independent signatures first:
- Choice vs LLR curves.
- RT vs |LLR| curves.
- Block/hazard adaptation patterns.

If these signatures are absent, model-based inference is usually weak.
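The first two signatures are binned averages, so one helper covers both. This sketch assumes choices are coded 0/1 and that `llr`, `choice`, and `rt` are parallel lists; those names and the bin edges are illustrative.

```python
def binned_means(x, y, edges):
    """Mean of y in each [edges[i], edges[i + 1]) bin of x; None for empty bins."""
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for xi, yi in zip(x, y):
        for i in range(n_bins):
            if edges[i] <= xi < edges[i + 1]:
                sums[i] += yi
                counts[i] += 1
                break  # each observation falls in at most one bin
    return [sums[i] / counts[i] if counts[i] else None for i in range(n_bins)]


def psychometric(llr, choice, edges):
    """Proportion of choice == 1 per signed-LLR bin."""
    return binned_means(llr, choice, edges)


def chronometric(llr, rt, edges):
    """Mean RT per |LLR| bin."""
    return binned_means([abs(v) for v in llr], rt, edges)
```

Computing the same curves separately per hazard condition or block gives the third signature.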

---

## Step 2) Design models to reflect competing hypotheses

Wilson & Collins emphasize: include serious competitors, not strawmen.

For you:

1. Keep Model A, B, C all plausible and interpretable.
2. Keep parameterizations comparable in complexity where possible.
3. Include nuisance terms if needed (e.g., side bias, lapse), because omitting them can distort key parameters.

---

## Step 3) Simulate, simulate, simulate (before fitting real data)

Directly from Wilson & Collins: simulate each model across parameter ranges and inspect qualitative behavior.

For you:

1. Simulate each model on realistic trial structures.
- Best: reuse each participant's observed `LLR`/hazard sequence.

2. Vary parameters over broad but realistic ranges.
3. Plot model-independent summaries from simulations.
4. Check whether models are behaviorally distinguishable in those summaries.

If models are not distinguishable in simulation, model comparison on real data will be ambiguous.
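Models A and B can be simulated with `simulate_trial`; Model C needs its own within-trial simulator. The following is a minimal Euler-Maruyama sketch of a single DDM trial, with a hypothetical function name and time in seconds rather than the milliseconds used by `simulate_trial`.

```python
import math
import random


def simulate_ddm_trial(v, a, z, t0, sigma=1.0, dt=0.001, max_t=5.0, rng=random):
    """Simulate one DDM trial.

    v: drift, a: boundary separation, z: relative start point in (0, 1),
    t0: non-decision time (s). Returns (choice, rt): choice is 1 for the upper
    bound, 0 for the lower bound, and (None, None) if no bound is hit by max_t.
    """
    x = z * a
    t = 0.0
    sqrt_dt = math.sqrt(dt)
    while t < max_t:
        x += v * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
        if x >= a:
            return 1, t + t0
        if x <= 0.0:
            return 0, t + t0
    return None, None
```

Running this over each participant's observed trial sequence, with `v` and `z` derived from the prior on every trial, gives synthetic choices and RTs that can be summarized with the same model-independent plots as the real data.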

---

## Step 4) Fit parameters robustly

Wilson & Collins fitting cautions to implement explicitly:

1. Ensure finite log-likelihood/objective everywhere in bounds.
2. Avoid numerical issues (zeros, infinities, unstable exponentials).
3. Use sensible parameter constraints/transforms.
4. Use multi-start optimization to avoid local minima.
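Points 1 to 3 can be illustrated with a choice likelihood. This sketch assumes choices coded 0/1 and a model that outputs the probability of choice 1 on each trial; the clipping constant and the helper names are illustrative.

```python
import math

EPS = 1e-12  # keeps log() away from 0 so the objective stays finite


def to_unit(x):
    """Map an unconstrained value to (0, 1) without overflowing exp()."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def to_positive(x):
    """Map an unconstrained value to (0, inf), e.g. for a threshold or noise SD."""
    return math.exp(x)


def bernoulli_loglik(choices, p_choice1, eps=EPS):
    """Sum of log p(choice) with probabilities clipped to [eps, 1 - eps]."""
    total = 0.0
    for c, p in zip(choices, p_choice1):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if c == 1 else math.log1p(-p)
    return total
```

Optimizing over the unconstrained values and transforming inside the objective (for example `H = to_unit(x[0])`) keeps every candidate inside the valid range without hard bounds.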

Recommended pragmatic setup:

1. Global stage (e.g., differential evolution / random search).
2. Local refinement stage (e.g., L-BFGS-B / trust-constr).
3. Keep best solution across many starts.
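The structure of that setup is sketched below using only the standard library: random starts inside the bounds, a simple derivative-free coordinate search standing in for the local optimizer, and the best result kept across starts. In practice the local step would be replaced by one of the optimizers named above; the function names here are hypothetical.

```python
import random


def local_refine(f, x0, bounds, step=0.25, tol=1e-6, max_iter=10_000):
    """Coordinate search: try +/- step on each parameter, halve step when stuck."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for delta in (step, -step):
                cand = list(x)
                cand[i] = min(max(x[i] + delta, lo), hi)  # stay inside bounds
                fc = f(cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
        if not improved:
            step *= 0.5
    return x, fx


def multi_start(f, bounds, n_starts=20, seed=0):
    """Minimize f from several random starts and keep the best solution."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        x0 = [rng.uniform(lo, hi) for lo, hi in bounds]
        x, fx = local_refine(f, x0, bounds)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f
```

Here `f` would be the negative log-likelihood of one participant's data as a function of the parameter vector. Recording the objective value reached from every start, not only the best, also shows how often the optimizer lands in a local minimum.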

---

## Step 5) Parameter recovery (core step for your next phase)

Wilson & Collins: parameter recovery is mandatory before interpreting fit parameters.

Recipe:

1. Choose true parameters from realistic ranges.
- Use ranges from pilot fits or prior literature.

2. Generate synthetic data from one model.
3. Fit the same model back to synthetic data.
4. Compare true vs recovered parameters.

Report at least:

1. Correlation true vs recovered.
2. Calibration slope/intercept (`hat ~ true`).
3. Bias and RMSE.
4. Pairwise correlations among recovered parameters (to detect trade-offs).

Important Wilson & Collins nuance:
Recovery can work only in some parameter regimes. Match simulated parameter ranges to ranges that matter for your empirical fits.
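The reported metrics are simple to compute once the true and recovered values for one parameter are collected across synthetic datasets. A minimal sketch, with hypothetical function names:

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def recovery_metrics(true, recovered):
    """Correlation, calibration line (recovered ~ true), bias, and RMSE."""
    n = len(true)
    mt, mr = sum(true) / n, sum(recovered) / n
    sxy = sum((t - mt) * (r - mr) for t, r in zip(true, recovered))
    sxx = sum((t - mt) ** 2 for t in true)
    slope = sxy / sxx
    errors = [r - t for t, r in zip(true, recovered)]
    return {
        "r": pearson(true, recovered),
        "slope": slope,
        "intercept": mr - slope * mt,
        "bias": sum(errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }
```

Calling `pearson` on the recovered values of two different parameters gives the pairwise trade-off diagnostic from item 4.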

---

## Step 6) Model recovery (for 3-model identifiability)

Wilson & Collins: model comparison must be validated by model recovery.

Procedure:

1. Simulate data from each of A/B/C across realistic parameter ranges.
2. Fit all three models to each synthetic dataset.
3. Select best model by one common criterion.
4. Build 3x3 confusion matrix:
- Rows: generating model.
- Columns: selected best-fitting model.

Also compute inversion matrix when needed:
- `p(simulated model | fit model)`

Goal: strong diagonal in confusion matrix and interpretable inversion probabilities.
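Both matrices follow from a list of (generating model, selected model) pairs. The sketch below computes the confusion matrix as `p(fit model | simulated model)` and derives the inversion matrix with Bayes' rule, assuming a uniform prior over generating models unless one is supplied. Function names are hypothetical.

```python
def confusion_matrix(pairs, models):
    """p(fit | sim): rows are generating models, columns are selected models."""
    counts = {g: {f: 0 for f in models} for g in models}
    for gen, fit in pairs:
        counts[gen][fit] += 1
    conf = {}
    for g in models:
        total = sum(counts[g].values())
        conf[g] = {f: (counts[g][f] / total if total else float("nan")) for f in models}
    return conf


def inversion_matrix(confusion, models, prior=None):
    """p(sim | fit) via Bayes' rule; indexed as inv[fit][sim]."""
    if prior is None:
        prior = {g: 1.0 / len(models) for g in models}  # uniform prior (assumption)
    inv = {}
    for f in models:
        norm = sum(confusion[g][f] * prior[g] for g in models)
        inv[f] = {
            g: (confusion[g][f] * prior[g] / norm if norm > 0 else float("nan"))
            for g in models
        }
    return inv
```

The confusion matrix answers "if A generated the data, how often is A selected?", while the inversion matrix answers the question that matters for real data: "given that A was selected, how likely is it that A generated the data?"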

If diagonal is weak:

1. Improve task/model-independent diagnostics.
2. Revisit parameter ranges.
3. Revisit model definitions (too similar).

---

## Step 7) Fit real multi-participant data

Wilson & Collins recommend model-independent checks first, then fitting.

Recommended order for your dataset:

1. Add/validate `participant_id`.
2. Run participant-wise model-independent analyses.
3. Fit A/B/C per participant.
4. Compare A/B/C per participant and summarize group-level evidence.
5. Verify fitted parameter ranges are inside ranges where recovery was good.

If not, rerun recovery in the empirical range before claiming interpretation.
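Steps 4 and 5 of this list can be sketched as follows, assuming BIC as the common criterion (AIC or a cross-validated likelihood would slot in the same way) and a hypothetical `fits` layout of `{participant: {model: (log_lik, n_params, n_trials)}}`.

```python
import math
from collections import Counter


def bic(log_lik, n_params, n_trials):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_trials) - 2.0 * log_lik


def best_model_per_participant(fits):
    """Return ({pid: winning model}, Counter of wins per model)."""
    winners = {
        pid: min(by_model, key=lambda m: bic(*by_model[m]))
        for pid, by_model in fits.items()
    }
    return winners, Counter(winners.values())


def outside_recovered_range(theta_hat, recovered_ranges):
    """List parameters whose fitted value lies outside the validated range."""
    return [
        name
        for name, value in theta_hat.items()
        if not (recovered_ranges[name][0] <= value <= recovered_ranges[name][1])
    ]
```

Any parameter flagged by `outside_recovered_range` marks a participant whose fit falls in a regime where recovery has not yet been checked.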

---

## Step 8) Validate the winning model (posterior predictive checks)

Wilson & Collins: never skip this.

For each participant and winning model:

1. Simulate behavior using fitted parameters.
2. Compare simulated vs real on the same summary plots:
- Choice psychometric function.
- RT chronometric function.
- RT distribution (quantiles/skew).
- Block/hazard dynamics.

If model wins numerically but fails these checks, treat inference as unreliable.
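For the RT distribution check, comparing a handful of quantiles is often enough to expose a misfit that a mean RT hides. A minimal sketch using the standard library, with hypothetical function names:

```python
from statistics import quantiles


def rt_quantiles(rts):
    """10th, 30th, 50th, 70th, and 90th percentiles of a list of RTs."""
    cuts = quantiles(rts, n=10)  # nine cut points: 10%, 20%, ..., 90%
    return [cuts[i] for i in (0, 2, 4, 6, 8)]


def rt_quantile_gaps(observed_rt, simulated_rt):
    """Simulated minus observed RT at each of the five quantiles."""
    return [
        sim - obs
        for obs, sim in zip(rt_quantiles(observed_rt), rt_quantiles(simulated_rt))
    ]
```

Gaps that grow toward the upper quantiles suggest the model fails to reproduce the slow tail of the RT distribution, even if the median matches.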

---

## Step 9) Analyze latent variables only after validation

Then it is appropriate to analyze latent variables (`L_t`, `Psi_t`, drift components, etc.) and link them to behavior/conditions.

Do this only for validated winning model(s), to reduce p-hacking risk and over-interpretation.

---

## Step 10) Report transparently (Wilson & Collins reporting guidance)

At minimum report:

1. Model recovery results (confusion matrix; optionally inversion matrix).
2. Number/proportion of participants best fit by each model.
3. Parameter distributions (not only means).
4. Pairwise parameter correlations (trade-off diagnostics).
5. Parameter recovery plots.
6. Validation plots from posterior predictive checks.

---

## Concrete implementation plan in `src/elias` (without touching `src/evan`)

1. `model_continuous.py`
- Wrap `evan.glaze.simulate_trial` with a mode switch:
  - threshold (`stop_on_sat=False`)
  - asymptote (`stop_on_sat=True`)

2. `model_ddm_hybrid.py`
- Implement DDM simulator and DNM-to-DDM mapping (`Psi -> v,z`).

3. `fit_models.py`
- Shared objective/likelihood and optimization utilities.
- Multi-start fitting.

4. `generate_synth.py`
- Synthetic data generation for A/B/C.

5. `recoverability.py`
- Parameter recovery metrics/plots.
- Model recovery confusion + inversion matrices.

6. `analyze_real_data.py`
- Participant-wise fitting, model comparison, and validation plots.
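For item 1, the mode switch can be a thin wrapper that fixes `stop_on_sat` and leaves every other argument untouched. This sketch takes the simulator as an argument so that nothing in `src/evan` is modified; `make_continuous_model` is a hypothetical name.

```python
from functools import partial

# Model A uses the fixed threshold; Model B uses the saturation/asymptote rule.
MODES = {"threshold": False, "asymptote": True}


def make_continuous_model(simulate_fn, mode):
    """Return simulate_fn with stop_on_sat fixed according to mode."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}, got {mode!r}")
    return partial(simulate_fn, stop_on_sat=MODES[mode])


# Intended use (requires src/ on sys.path, as in the import section):
#   from evan.glaze import simulate_trial
#   model_a = make_continuous_model(simulate_trial, "threshold")
#   model_b = make_continuous_model(simulate_trial, "asymptote")
```

Both wrapped models then share one call signature, which keeps the fitting and synthetic-data code identical for A and B.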

---

## Minimal pseudo-code: participant-wise fitting

```python
for pid, df_pid in data.groupby("participant_id"):
    df_pid = df_pid.sort_values(["block_id", "trial_index"])

    for model in ["A_cont_threshold", "B_cont_asymptote", "C_ddm_dnm"]:
        theta_hat = fit_model(df_pid, model)
        score = model_score(df_pid, model, theta_hat)
        store(pid, model, theta_hat, score)
```

## Minimal pseudo-code: model recovery matrix

```python
# Pseudo-code: every helper called here is a placeholder to implement in src/elias.
saved_criteria = {}
for gen_name in models:
    synth_data = generate_synthetic(gen_name, n_participants=100)

    for fit_name in models:  # named fit_name to avoid shadowing fit_model()
        scores = fit_all_participants(synth_data, fit_name)
        saved_criteria[(gen_name, fit_name)] = scores

confusion = build_confusion_matrix(saved_criteria)
inversion = compute_inversion_matrix(confusion)
```

---

## Common pitfalls this pipeline prevents

1. Interpreting parameters without parameter recovery.
2. Claiming one model is best without model recovery.
3. Using only fit metrics without posterior predictive validation.
4. Ignoring participant-level sequence boundaries.
5. Comparing models that are not behaviorally distinguishable in simulated data.

This is the Wilson & Collins logic, adapted directly to your continuous-vs-DDM comparison.
