Stan model changes #37

Merged: cm401 merged 18 commits into main from stan_model_changes on May 25, 2026


cm401 (Owner) commented on May 25, 2026

No description provided.

cm401 and others added 16 commits May 7, 2026 11:04
Fits three model variants per eligible pathogen (individual-level only,
summary-statistics only, federated) and compares posterior predictive
summaries, interval ratios, W2/JS/OVL distributional metrics, tail
probabilities, and study-level PSIS-LOO ELPD.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three-figure suite: (1) three-arm forest plot of pred_median with 95% CrI,
(2) federated-gain metrics strip (interval ratio, JS divergence, OVL),
(3) posterior predictive density overlays for top pathogens by JS divergence.
Supplementary version of figure 2 faceted across all distributions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
position_dodge() takes width= not height=; geom_vline() does not
accept inherit.aes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three-column patchwork layout: (A) dumbbell forest plot with grey segment
connecting arm A to arm B point estimates and prominent arm C overlay,
(B) interval ratio strip, (C) JS divergence strip. Also fixes stray
inherit.aes warning in supplementary figure geom_vline call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sition_dodge to Panel A

Panel A of the combined figure (fig_ablation_combined) now selects the
distribution with highest LOO-ELPD for arm C per pathogen rather than
always using the primary (log-normal) distribution. Tie-breaking follows
lognormal > gamma > Weibull > Burr XII > gen. gamma; pathogens with no
valid ELPD fall back to PRIMARY_DIST. Y-axis labels show the selected
distribution in parentheses. Arms A/B/C are now separated with
position_dodge(width = 0.5) on both the errorbar and point layers so
estimates do not overlap. Panels B and C use the same per-pathogen
best-distribution data (gain_best).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
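
The selection rule in the commit above (highest arm C ELPD, a fixed tie-breaking order, and a fallback to the primary distribution) could be sketched in base R roughly as follows. This is an illustration only: `elpd_tbl`, `DIST_PRIORITY`, `pick_best_dist()` and all numbers are hypothetical, and the value assigned to `PRIMARY_DIST` is an assumption; the actual script may structure this differently.

```r
# Hypothetical ELPD table for arm C; values are made up for illustration.
elpd_tbl <- data.frame(
  pathogen = c("P1", "P1", "P1", "P2", "P2", "P3"),
  dist     = c("lognormal", "gamma", "weibull", "gamma", "weibull", "lognormal"),
  elpd     = c(-10.2, -10.2, -11.0, -8.5, -7.9, NA),
  stringsAsFactors = FALSE
)

# Tie-breaking order from the commit message, written with the internal names.
DIST_PRIORITY <- c("lognormal", "gamma", "weibull", "burr", "gengamma")
PRIMARY_DIST  <- "lognormal"  # assumed value of the fallback constant

pick_best_dist <- function(tbl) {
  tbl <- tbl[!is.na(tbl$elpd), , drop = FALSE]
  if (nrow(tbl) == 0) return(PRIMARY_DIST)   # no valid ELPD: fall back
  # Sort by ELPD (descending), then by position in the priority list.
  ord <- order(-tbl$elpd, match(tbl$dist, DIST_PRIORITY))
  tbl$dist[ord[1]]
}

best <- vapply(split(elpd_tbl, elpd_tbl$pathogen), pick_best_dist, character(1))
print(best)
```

In this toy input, P1 has a tie that the priority order resolves, P2 has a clear winner, and P3 has no valid ELPD and so falls back.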
…ed figure

Section 8a now loads results/model_weights.rds (pre-computed by
compute_pathogen_model_bayes_factors() from main_results.rds) and picks the
highest pseudo-Bayes-factor distribution per pathogen, matching the criterion
used in plot_main_figure() and make_supplementary_figures.R. Previously,
best-distribution was determined from the ablation arm C ELPD, which was
inconsistent with the rest of the paper.

Internal Stan names (lognormal, gamma, weibull, burr, gengamma) are mapped to
display labels via MAIN_DIST_TO_DISPLAY before joining to comparison_tbl.
Falls back to PRIMARY_DIST if a pathogen is absent from model_weights or its
best distribution was not fitted in the ablation. Also fixes pB and pC panels
which were referencing gain_ln instead of gain_best.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
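
A minimal sketch of the lookup described in the commit above: pick the highest-weight distribution per pathogen, map the internal Stan name to a display label, and fall back when the pathogen is missing or the distribution was not fitted in the ablation. The layout of `model_weights`, the labels inside `MAIN_DIST_TO_DISPLAY`, the `fitted` list and `best_display()` are all assumptions made for illustration; only the constant names come from the commit message.

```r
# Assumed label map; the real MAIN_DIST_TO_DISPLAY may use different labels.
MAIN_DIST_TO_DISPLAY <- c(
  lognormal = "Log-normal", gamma = "Gamma", weibull = "Weibull",
  burr = "Burr XII", gengamma = "Gen. gamma"
)
PRIMARY_DIST <- "Log-normal"  # assumed display label of the primary distribution

# Assumed layout: one row per pathogen x distribution with a weight column.
model_weights <- data.frame(
  pathogen = c("P1", "P1", "P2", "P2"),
  dist     = c("lognormal", "gamma", "weibull", "burr"),
  weight   = c(0.3, 0.7, 0.2, 0.8),
  stringsAsFactors = FALSE
)

# Hypothetical record of which display labels were fitted in the ablation.
fitted <- list(
  P1 = c("Log-normal", "Gamma"),
  P2 = c("Log-normal", "Weibull"),
  P3 = c("Log-normal")
)

best_display <- function(pathogen) {
  w <- model_weights[model_weights$pathogen == pathogen, , drop = FALSE]
  if (nrow(w) == 0) return(PRIMARY_DIST)              # absent from model_weights
  label <- unname(MAIN_DIST_TO_DISPLAY[w$dist[which.max(w$weight)]])
  if (is.na(label) || !(label %in% fitted[[pathogen]])) {
    return(PRIMARY_DIST)                              # best dist not fitted
  }
  label
}

best <- vapply(names(fitted), best_display, character(1))
print(best)
```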
Combined figure (fig_ablation_combined) changes:
- Panel A is now a faceted dumbbell forest with two side-by-side columns:
  'Median (P50)' and '95th percentile (P95)', sharing the y-axis with
  independent x-scales (log10). Both facets use the same position_dodge,
  colour/shape encoding and dumbbell backbone.
- patchwork widths changed from c(3,1,1) to c(2,1,1).
- ARM_LABELS now reads 'Individual-level only', 'Summary-statistics only',
  'Federated' (letter suffixes removed). COMP_COLOURS/COMP_SHAPES in section 8
  use the same strings so the collected legend is unified: blue/orange/green
  mean the same entity in all three panels with no duplicate entries.
- P95 CrI bounds parsed from A_pred_q95/B_pred_q95/C_pred_q95 columns in the
  same rowwise() block as the median.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The combined figure (fig_ablation_combined) now has four panels (2:1:1:1):

Panel A — faceted dumbbell forest (Median P50 | P95): unchanged.

Panel B — mu0 CrI width ratio (arm / C).  mu0 is the population-level
location parameter extracted via rstan::extract() from the stored Stan fits
in ablation_fits.  Its 95% CrI width measures pure estimation uncertainty,
independent of tau.  Values > 1 mean arm C has tighter knowledge of the
population mean -- the information-gain force in isolation.

Panel C — tau per arm with 95% CrI (three dodged points per pathogen).
tau is parsed from the A_tau/B_tau/C_tau strings already in comparison_tbl.
Shows whether arm C detects more between-study heterogeneity (the opposing
force) or estimates it more precisely (narrower CrI despite same median).

Panel D — predictive CrI ratio (arm / C): the combined confounded signal
shown previously as Panel B.  Now clearly labelled as the sum of both forces.

All four panels share a single collected legend via unified ARM_COLOURS_FULL /
ARM_SHAPES_FULL scales keyed by full label text so patchwork merges them
correctly.  Figure width scaled up by 1.35x to accommodate the extra panel.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
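
To make the Panel B quantity concrete, the ratio "arm / C" of 95% CrI widths for mu0 might be computed along these lines. The data frame and its numbers are invented for illustration; the arm keys A/B/C follow the commit message, and the real script derives the bounds from the stored Stan fits.

```r
# Hypothetical 95% CrI bounds for mu0, one pathogen, three arms (made-up values).
mu0_cri <- data.frame(
  arm = c("A", "B", "C"),
  lo  = c(0.8, 0.5, 0.9),
  hi  = c(1.6, 1.7, 1.5),
  stringsAsFactors = FALSE
)

mu0_cri$width <- mu0_cri$hi - mu0_cri$lo

# Width of the federated arm (C) is the denominator for both comparisons.
width_c <- mu0_cri$width[mu0_cri$arm == "C"]

ratio <- mu0_cri[mu0_cri$arm != "C", c("arm", "width")]
ratio$mu0_ratio <- ratio$width / width_c

print(ratio)
```

With these toy numbers both ratios exceed 1, which under the commit's reading means arm C has the narrower interval for the population-level location.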
rstan::extract() returns draws whose quantile() output carries names
like "2.5%" and "97.5%". Passing those directly as named-vector arguments
to data.frame() caused the quantile names to become row names, which then
propagated as corrupted column names when rows were stacked with
do.call(rbind,...). Result: all mu0_width values were silently NA after
filter(!is.na(mu0_width)).

Fix: .extract_mu0_cri() now returns a one-row data.frame built from
unname(quantile(...)) values. The per-arm rows are assembled with cbind()
rather than named-vector data.frame() arguments. Confirmed: all 30
(pathogen x arm) combinations now have valid mu0_width values.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
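
The naming issue behind this fix can be reproduced without Stan: `quantile()` returns a named vector, and `data.frame()` takes row names from a named vector argument. The sketch below uses simulated draws in place of `rstan::extract()` output, and `extract_cri()` is a hypothetical stand-in for `.extract_mu0_cri()`, not its actual code.

```r
# Simulated draws standing in for posterior draws of mu0.
set.seed(1)
draws <- rnorm(4000, mean = 1.2, sd = 0.3)

q_named <- quantile(draws, probs = c(0.025, 0.975))
names(q_named)                      # "2.5%" "97.5%"

# Passing named elements straight in: the quantile name becomes a row name.
bad <- data.frame(lo = q_named[1], hi = q_named[2])
rownames(bad)

# Hypothetical helper: strip names first and return a clean one-row data.frame.
extract_cri <- function(draws, probs = c(0.025, 0.975)) {
  q <- unname(quantile(draws, probs = probs))
  data.frame(lo = q[1], hi = q[2], width = q[2] - q[1])
}

# Assemble per-arm rows with cbind(), then stack them.
rows <- lapply(c("I", "S", "F"), function(arm) cbind(arm = arm, extract_cri(draws)))
mu0_tbl <- do.call(rbind, rows)
print(mu0_tbl)
```

`quantile(..., names = FALSE)` would be an equivalent way to drop the names at the source.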
Replace "Individual-level only", "Summary-statistics only", "Federated"
with "(I)", "(S)", "(F)" suffixes everywhere in plot_data_format_ablation.R
to eliminate the clash with patchwork's A/B/C/D panel labels.

Changes:
- ARM_LABELS updated with (I)/(S)/(F) suffixes
- COMP_COLOURS / COMP_SHAPES now derived dynamically via ARM_LABELS[c("A","B")]
  so they stay in sync with ARM_LABELS automatically
- mu0_ratio and pred_ratio comparison factor labels now use ARM_LABELS[c("A","B")]
  instead of hardcoded strings
- Figure 2 and supplementary gain figures updated:
  "vs A (individual-level)" -> "vs (I) individual-level"
  "vs B (summary-stats)"    -> "vs (S) summary-stats"
- Caption updated: "federated (C)" -> "federated (F)"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. Extract tau from Stan fits directly (Panel C)
   Replace the parsed-string tau_long (which silently dropped arms with
   n_datasets < 5) with Stan-extracted tau using .extract_tau_cri(), the same
   pattern used for mu0.  Tau is always a parameter in the hierarchical model;
   arms with few datasets show wide prior-dominated CrIs, which is honest and
   ensures all three arms appear for every pathogen.

2. Make Panel A explicitly about posterior predictive CrI
   Change x-axis label to state "Posterior predictive estimate" and clarify
   that points are median/P95 and error bars are 95% credible intervals.

3. Annotate Panel C subtitle
   Add "(wide CrI: few datasets in arm)" so readers understand that width
   encodes estimation reliability, not just biological heterogeneity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…hogens

Reverts the previous Stan-based tau extraction (which included prior-dominated
estimates for n < 5 datasets) and restores the correct parsed-string approach
that only shows tau where n_datasets >= 5 in each arm.

Adds a new filter step (8e-filter) that restricts all four Figure 4 panels to
the subset of pathogens that have tau estimated in every arm (I, S, and F).
This ensures Panel C is complete and coherent, and all panels share the same
y-axis rows.  Figure height is updated to reflect the filtered pathogen count.

Also reverts the Panel A x-label and Panel C subtitle changes from the
previous commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
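
The 8e-filter step described above (keep only pathogens with tau available in all three arms) could look something like this in base R. The table layout, column names and values are assumptions for illustration; the real script works from the parsed tau strings in comparison_tbl.

```r
# Hypothetical long table: a row exists only where tau was reported for that arm.
tau_long <- data.frame(
  pathogen   = c("P1", "P1", "P1", "P2", "P2", "P3", "P3", "P3"),
  arm        = c("I", "S", "F", "I", "F", "I", "S", "F"),
  tau_median = c(0.40, 0.50, 0.45, 0.30, 0.35, 0.60, 0.70, 0.65),
  stringsAsFactors = FALSE
)

arms_required <- c("I", "S", "F")

# Count how many of the required arms are present for each pathogen.
n_arms <- tapply(tau_long$arm, tau_long$pathogen,
                 function(a) sum(arms_required %in% a))

# Keep pathogens that have tau in every arm; the same set would then be
# used to subset the data behind the other panels.
keep <- names(n_arms)[n_arms == length(arms_required)]
tau_complete <- tau_long[tau_long$pathogen %in% keep, ]

print(keep)
```

Here P2 lacks the S arm and is dropped, so all panels would share the rows P1 and P3.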

Copilot AI left a comment

Pull request overview

This PR adds new reporting utilities for simulation-study and data-format ablation outputs, including LaTeX longtables and publication/supplementary figures, and updates analysis scripts to generate/save these artifacts.

Changes:

  • Added LaTeX longtable generators for simulation-study performance tables.
  • Added new simulation-study plotting functions (main-text Figure 2 and an SI figure).
  • Added/updated analysis scripts for data-format ablation visualization, simulation table generation, supplementary-figure splitting, and meta-analysis forest plot saving.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| R/table_utils.R | Adds shared LaTeX helpers and two exported simulation-study longtable generators. |
| R/ploting_utils.R | Adds two new exported simulation-study figure builders using ggplot2 + patchwork. |
| analysis/run_meta_analysis.R | Simplifies forest-plot saving by delegating PDF creation to meta::forest(file=...). |
| analysis/plot_data_format_ablation.R | New script to visualize data-format ablation results with multiple figures. |
| analysis/make_supplementary_figures.R | Splits each pathogen’s supplementary output into two PDFs (CDF vs data panels) and scales label size. |
| analysis/make_simulation_tables.R | New script to write the two LaTeX simulation tables to results/. |
| analysis/data_format_ablation.R | New script to run the three-arm (I/S/F) data-format ablation experiment and export comparison tables. |

Comment threads: R/table_utils.R (3), R/ploting_utils.R (3), analysis/plot_data_format_ablation.R (4, one outdated).

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 12 changed files in this pull request and generated 3 comments.

Files not reviewed (4)
  • man/generate_simulation_table1.Rd: Language not supported
  • man/generate_simulation_table2.Rd: Language not supported
  • man/plot_simulation_study_figure2.Rd: Language not supported
  • man/plot_simulation_study_si_figure.Rd: Language not supported

Comment threads: analysis/data_format_ablation.R (1), analysis/run_meta_analysis.R (1), R/table_utils.R (1).

cm401 merged commit b560df1 into main on May 25, 2026
3 checks passed