Skip to content

Add glum and scikit-survival backends and same reporting output as SEQTaRget#50

Merged
remlapmot merged 11 commits into
mainfrom
devel-2026-05-19-glum
Jun 1, 2026
Merged

Add glum and scikit-survival backends and same reporting output as SEQTaRget#50
remlapmot merged 11 commits into
mainfrom
devel-2026-05-19-glum

Conversation

@remlapmot
Copy link
Copy Markdown
Contributor

(Sorry this got biggish)

Currently pySEQTarget runs substantially slower on our short course practical than SEQTaRget. This adds glum and scikit-survival backends to speed up the GLM and Cox fits - I've left the default fitting packages as statsmodels and lifelines.

I've also added as similar output tables to SEQTaRget as I could manage.

remlapmot and others added 10 commits May 28, 2026 13:40
When a bootstrap replicate fails to fit (e.g. a singular matrix raising LinAlgError, which the glum solver hits more readily than statsmodels' IRLS), fit() skips it, leaving outcome_model shorter than _boot_samples. hazard() previously assumed a 1:1 ordering between the two: it looped over range(len(_boot_samples)) and indexed outcome_model[boot_idx + 1], which raised IndexError once a skip had occurred and, before that, silently paired every replicate after the skipped one with the wrong resample's model — producing incorrect hazard CIs.

bootstrap_loop now records _boot_sample_idx, the original _boot_samples index for each successfully fitted replicate (appended in lockstep with the models, so it is correct for both the serial and parallel paths). hazard() iterates that map, pairing outcome_model[model_pos + 1] with _boot_samples[sample_idx], which fixes both the crash and the misalignment. survival() was unaffected since it iterates outcome_model directly and never indexes _boot_samples.

Adds a regression test that injects a LinAlgError on one replicate and asserts hazard() returns a finite HR with CI instead of crashing.
Add a cox_package option (default "lifelines") to fit the univariate Cox model in the hazard step with either lifelines or scikit-survival. The Cox fit is extracted into a _cox_log_hr dispatcher: the lifelines path is unchanged, while the scikit-survival path builds the structured survival array and fits CoxPHSurvivalAnalysis with ties="efron" to match lifelines - this matters because integer follow-up produces many tied event times and the default Breslow handling would diverge. With matching tie handling and a fixed seed the two backends agree to ~1e-9 on the point hazard ratio and the bootstrap CIs match.

Re-adds scikit-survival as a dependency and adds tests covering validation, ITT and bootstrap equivalence with lifelines, and the subgroup path.
@remlapmot remlapmot requested a review from ryan-odea May 29, 2026 10:50
Copy link
Copy Markdown
Collaborator

@ryan-odea ryan-odea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've minimized some of the simple properties, but otherwise looks great!

@remlapmot remlapmot merged commit 69161f0 into main Jun 1, 2026
6 checks passed
@remlapmot remlapmot deleted the devel-2026-05-19-glum branch June 1, 2026 08:25
@remlapmot
Copy link
Copy Markdown
Contributor Author

Nice, thanks Ryan!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants