Fix QRF stochastic prediction: persistent RNG, unbiased median, monotone quantiles #177

Merged
MaxGhenis merged 1 commit into main from fix/qrf-randomness-and-quantiles
Apr 17, 2026
@MaxGhenis
Contributor

Summary

Fixes three correctness bugs in _QRFModel.predict that interact in the stochastic-imputation code path (findings #1, #2, #3 from the bug hunt):

  1. Finding #1: RNG reset on every predict(). np.random.default_rng(self.seed) was called inside predict(), so repeated calls on the same X returned identical draws and multiple-imputation variance collapsed to zero. The RNG is now created once in __init__ and consumed progressively across calls.
  2. Finding #2: Quantile-grid bias. The grid [0.091..0.909] plus .astype(int) (which floors, not rounds) biased stochastic "median" predictions low and truncated the tails. Stochastic draws are now rounded onto a fine symmetric grid so the empirical mean of selected quantiles matches the intended mean_quantile.
  3. Finding #3: Per-row random quantile for explicit multi-quantile predict. When users passed quantiles=[0.1, 0.5, 0.9], each quantile request drew its own random per-row index, producing crossed quantiles. QRFResults._predict now routes explicit quantiles through a deterministic exact_quantile path that guarantees row-level monotonicity.
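
For readers unfamiliar with the first failure mode, here is a minimal sketch of the difference between recreating the generator inside each call and keeping one generator on the instance. ResettingSampler and PersistentSampler are hypothetical stand-ins, not the actual _QRFModel code, and they draw plain uniforms rather than going through the forest.

```python
import numpy as np


class ResettingSampler:
    """Hypothetical stand-in for the old behaviour: new generator per call."""

    def __init__(self, seed):
        self.seed = seed

    def draw(self, n):
        # Re-seeding here restarts the stream, so every call repeats itself.
        rng = np.random.default_rng(self.seed)
        return rng.uniform(size=n)


class PersistentSampler:
    """Hypothetical stand-in for the fix: one generator, advanced across calls."""

    def __init__(self, seed):
        self._rng = np.random.default_rng(seed)

    def draw(self, n):
        # Each call consumes the next part of the same stream.
        return self._rng.uniform(size=n)


resetting = ResettingSampler(seed=42)
persistent = PersistentSampler(seed=42)

reset_a, reset_b = resetting.draw(5), resetting.draw(5)
persist_a, persist_b = persistent.draw(5), persistent.draw(5)

print("resetting calls identical: ", np.array_equal(reset_a, reset_b))
print("persistent calls identical:", np.array_equal(persist_a, persist_b))
```

Runs remain reproducible with the persistent generator: two instances built with the same seed still produce the same sequence of calls, only successive calls on one instance differ.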

Test plan

  • test_qrf_repeated_predict_calls_produce_different_draws — two sequential predict() calls return different draws
  • test_qrf_stochastic_median_is_unbiased — mean of many stochastic median calls ≈ deterministic median
  • test_qrf_multi_quantile_per_row_monotonicity — per-row q=0.1 <= q=0.5 <= q=0.9 with explicit quantiles
  • Existing test_qrf_beta_distribution_sampling still passes
  • Full tests/test_models/test_qrf.py suite (30/30) passes
  • tests/test_smoke_qrf.py passes

…one quantiles

Three correctness bugs in _QRFModel.predict are fixed together because they
interact in the stochastic-imputation code path:

1. Seed reset on every predict() (#1). np.random.default_rng(self.seed) was
   called inside predict(), so repeated predict() calls on the same X
   returned identical draws and collapsed multiple-imputation variance to
   zero. The RNG is now created once in __init__ and consumed
   progressively across calls.

2. Quantile-grid bias (#2). The grid [0.091..0.909] combined with
   .astype(int) (which floors) biased stochastic "median" predictions low
   and truncated the tails. Stochastic draws are now rounded (not floored)
   onto a fine symmetric grid so the empirical mean of selected quantiles
   matches the intended mean_quantile.

3. Per-row random quantile for explicit multi-quantile predict (#3). When
   users passed quantiles=[0.1, 0.5, 0.9] for prediction intervals, each
   quantile request sampled a separate random per-row index, producing
   crossed quantiles. QRFResults._predict now routes explicit quantiles
   through a deterministic exact_quantile path that guarantees
   row-level monotonicity.
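
The flooring bias in item 2 can be reproduced in isolation. The sketch below is illustrative only: it assumes the coarse grid was the ten points 1/11 to 10/11 and that the index was computed as u * (n - 1), neither of which is confirmed by this commit message. The fine grid size (101) and eps = 1/(size + 1) follow the values quoted in the review.

```python
import numpy as np

rng = np.random.default_rng(0)
# Beta(1, 1) is uniform on [0, 1), i.e. the mean_quantile = 0.5 case.
u = rng.uniform(size=200_000)

# Assumed old behaviour: coarse grid, index obtained by flooring.
coarse = np.arange(1, 11) / 11  # approx. 0.091 ... 0.909
floored_idx = (u * (len(coarse) - 1)).astype(int)  # astype(int) floors
floored_mean = coarse[floored_idx].mean()

# Sketch of the fix: fine symmetric grid, index obtained by rounding.
size = 101
eps = 1 / (size + 1)
fine = np.linspace(eps, 1 - eps, size)
rounded_idx = np.rint(u * (size - 1)).astype(int)
rounded_mean = fine[rounded_idx].mean()

print(f"floored, coarse grid: mean quantile = {floored_mean:.3f}")
print(f"rounded, fine grid:   mean quantile = {rounded_mean:.3f}")
print("top coarse grid point ever selected:", floored_idx.max() == len(coarse) - 1)
```

Under these assumptions flooring never selects the top grid point and pulls the mean selected quantile below 0.5, while rounding onto the symmetric grid keeps it centred.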

Adds regression tests:
- test_qrf_repeated_predict_calls_produce_different_draws
- test_qrf_stochastic_median_is_unbiased
- test_qrf_multi_quantile_per_row_monotonicity

Contributor Author

@MaxGhenis MaxGhenis left a comment

QRF stochastic fixes verified end-to-end:

  • Persistent RNG: self._rng initialised in _QRFModel.__init__ (qrf.py:150), advanced via self._rng.beta(...) in predict. test_qrf_repeated_predict_calls_produce_different_draws asserts two predict calls on same X produce different draws.
  • Unbiased median: np.rint onto fine symmetric grid (size = max(count_samples, 101)), eps = 1/(grid+1). With mean_quantile=0.5 → Beta(1,1) uniform → rounded index centred on 0.5. test_qrf_stochastic_median_is_unbiased averages 60 stochastic draws against the exact-quantile median and asserts the gap < 0.25·std(y).
  • Monotone per-row quantiles: QRFResults._predict routes explicit quantiles=[...] through exact_quantile=q which calls self.qrf.predict(X, quantiles=[q]) directly (no beta sampling). test_qrf_multi_quantile_per_row_monotonicity asserts row-level q_low <= q_mid <= q_high.
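
A rough NumPy-only illustration of why the exact path matters for the third point. Nothing here calls the real QRFResults or the forest: samples is a hypothetical matrix of per-row conditional draws, stochastic_quantile and exact_quantile are made-up helpers, and the Beta concentration of 2.0 is an arbitrary assumption chosen to make crossings easy to see.

```python
import numpy as np

rng = np.random.default_rng(7)
n_rows, n_samples = 1_000, 200
quantiles = [0.1, 0.5, 0.9]

# Hypothetical per-row conditional samples, sorted within each row.
samples = np.sort(rng.normal(size=(n_rows, n_samples)), axis=1)
rows = np.arange(n_rows)


def stochastic_quantile(q, concentration=2.0):
    """Draw a separate random quantile level per row, centred on q."""
    levels = rng.beta(q * concentration, (1 - q) * concentration, size=n_rows)
    idx = np.rint(levels * (n_samples - 1)).astype(int)
    return samples[rows, idx]


def exact_quantile(q):
    """Use the same fixed level q for every row, with no random draw."""
    return np.quantile(samples, q, axis=1)


stochastic = np.stack([stochastic_quantile(q) for q in quantiles])
exact = np.stack([exact_quantile(q) for q in quantiles])

# A row is "crossed" if any higher quantile falls below a lower one.
crossed_stochastic = int((np.diff(stochastic, axis=0) < 0).any(axis=0).sum())
crossed_exact = int((np.diff(exact, axis=0) < 0).any(axis=0).sum())

print("rows with crossed quantiles (independent random levels):", crossed_stochastic)
print("rows with crossed quantiles (fixed levels):", crossed_exact)
```

Because each requested quantile draws its level independently, the level drawn for q=0.1 can exceed the one drawn for q=0.5 in the same row. With fixed levels the ordering follows directly from the quantile function being non-decreasing.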

CI all green (lint + 3.12/3.14 tests + changelog). Mergeable. LGTM.

@MaxGhenis MaxGhenis merged commit 541e9d3 into main Apr 17, 2026
7 checks passed
@MaxGhenis MaxGhenis deleted the fix/qrf-randomness-and-quantiles branch April 17, 2026 16:11