Stan model changes by cm401 · Pull Request #37 · cm401/ddsynth

cm401 · 2026-05-25T15:57:13Z

No description provided.

Fits three model variants per eligible pathogen (individual-level only, summary-statistics only, federated) and compares posterior predictive summaries, interval ratios, W2/JS/OVL distributional metrics, tail probabilities, and study-level PSIS-LOO ELPD.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
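As an illustration of one of the distributional metrics listed in this commit, here is a minimal sketch of an empirical W2 distance, assuming W2 denotes the 2-Wasserstein distance and that both arms supply the same number of one-dimensional posterior predictive draws. Under those assumptions the distance reduces to the root-mean-square difference of the sorted samples. The helper name `w2_distance` is hypothetical and this is not necessarily how the PR computes the metric.

```r
# Hypothetical helper: empirical 2-Wasserstein distance between two
# equal-sized sets of one-dimensional draws.
w2_distance <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) > 0)
  # In one dimension the optimal coupling pairs the sorted samples.
  sqrt(mean((sort(x) - sort(y))^2))
}

# A pure location shift of 3 yields a distance of 3.
draws_a <- c(4, 1, 3, 2, 5)
draws_b <- draws_a + 3
w2_distance(draws_a, draws_b)
```

With unequal sample sizes the two sets would first need to be put on a common quantile grid, which this sketch does not attempt.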

Three-figure suite: (1) three-arm forest plot of pred_median with 95% CrI, (2) federated-gain metrics strip (interval ratio, JS divergence, OVL), (3) posterior predictive density overlays for top pathogens by JS divergence. Supplementary version of figure 2 faceted across all distributions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

position_dodge() takes width= not height=; geom_vline() does not accept inherit.aes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Three-column patchwork layout: (A) dumbbell forest plot with grey segment connecting arm A to arm B point estimates and prominent arm C overlay, (B) interval ratio strip, (C) JS divergence strip. Also fixes stray inherit.aes warning in supplementary figure geom_vline call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…sition_dodge to Panel A

Panel A of the combined figure (fig_ablation_combined) now selects the distribution with the highest LOO-ELPD for arm C per pathogen rather than always using the primary (log-normal) distribution. Tie-breaking follows lognormal > gamma > Weibull > Burr XII > gen. gamma; pathogens with no valid ELPD fall back to PRIMARY_DIST. Y-axis labels show the selected distribution in parentheses.

Arms A/B/C are now separated with position_dodge(width = 0.5) on both the errorbar and point layers so estimates do not overlap. Panels B and C use the same per-pathogen best-distribution data (gain_best).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
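The selection rule described in this commit (highest arm C ELPD, ties broken by a fixed priority, fallback when nothing is valid) can be sketched roughly as follows. This is an illustration only: `pick_best_dist`, `DIST_PRIORITY`, the distribution keys and the value assigned to `PRIMARY_DIST` are all assumptions, not code from the PR.

```r
# Assumed priority order, taken from the tie-breaking rule in the
# commit message; the keys themselves are hypothetical.
DIST_PRIORITY <- c("lognormal", "gamma", "weibull", "burr", "gengamma")
PRIMARY_DIST  <- "lognormal"  # assumed value of the fallback

# elpd: named numeric vector of arm C LOO-ELPD values for one pathogen,
# with names drawn from DIST_PRIORITY; NA marks an invalid fit.
pick_best_dist <- function(elpd, priority = DIST_PRIORITY,
                           fallback = PRIMARY_DIST) {
  elpd <- elpd[priority[priority %in% names(elpd)]]  # reorder by priority
  elpd <- elpd[!is.na(elpd)]
  if (length(elpd) == 0) return(fallback)
  # which.max() returns the first maximum, so ties resolve by priority.
  names(elpd)[which.max(elpd)]
}

# A tie between gamma and lognormal resolves to lognormal.
pick_best_dist(c(weibull = -120.4, gamma = -118.2, lognormal = -118.2))
# No valid ELPD: fall back to the primary distribution.
pick_best_dist(c(gamma = NA_real_))
```

Reordering by priority before calling `which.max()` keeps the tie-break implicit, so no separate comparison step is needed.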

…ed figure

Section 8a now loads results/model_weights.rds (pre-computed by compute_pathogen_model_bayes_factors() from main_results.rds) and picks the highest pseudo-Bayes-factor distribution per pathogen, matching the criterion used in plot_main_figure() and make_supplementary_figures.R. Previously, best-distribution was determined from the ablation arm C ELPD, which was inconsistent with the rest of the paper.

Internal Stan names (lognormal, gamma, weibull, burr, gengamma) are mapped to display labels via MAIN_DIST_TO_DISPLAY before joining to comparison_tbl. Falls back to PRIMARY_DIST if a pathogen is absent from model_weights or its best distribution was not fitted in the ablation.

Also fixes pB and pC panels which were referencing gain_ln instead of gain_best.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
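A rough sketch of the name mapping and fallback described in this commit is given below. The display labels, the value of `PRIMARY_DIST`, the helper `resolve_best_dist` and the shape of its inputs are all assumptions made for illustration; the real `MAIN_DIST_TO_DISPLAY` and `model_weights` objects may look different.

```r
# Assumed display labels; the real MAIN_DIST_TO_DISPLAY may differ.
MAIN_DIST_TO_DISPLAY <- c(
  lognormal = "Log-normal", gamma = "Gamma", weibull = "Weibull",
  burr = "Burr XII", gengamma = "Gen. gamma"
)
PRIMARY_DIST <- "Log-normal"  # assumed fallback label

# best_stan: named character vector, pathogen -> best internal Stan name
#            (a stand-in for what model_weights.rds provides).
# fitted:    named list, pathogen -> display labels fitted in the ablation.
resolve_best_dist <- function(pathogens, best_stan, fitted) {
  vapply(pathogens, function(p) {
    label <- unname(MAIN_DIST_TO_DISPLAY[best_stan[p]])
    # Fall back when the pathogen is missing or the label was not fitted.
    if (is.na(label) || !(label %in% fitted[[p]])) PRIMARY_DIST else label
  }, character(1))
}

best_stan <- c(pathogen_1 = "gamma", pathogen_2 = "burr")
fitted <- list(
  pathogen_1 = c("Log-normal", "Gamma"),
  pathogen_2 = c("Log-normal"),            # Burr XII was not fitted
  pathogen_3 = c("Log-normal", "Weibull")  # absent from best_stan
)
resolve_best_dist(names(fitted), best_stan, fitted)
```

Here only pathogen_1 keeps its preferred distribution; the other two take the fallback for the two reasons named in the commit message.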

Combined figure (fig_ablation_combined) changes:

- Panel A is now a faceted dumbbell forest with two side-by-side columns, 'Median (P50)' and '95th percentile (P95)', sharing the y-axis with independent x-scales (log10). Both facets use the same position_dodge, colour/shape encoding and dumbbell backbone.
- patchwork widths changed from c(3,1,1) to c(2,1,1).
- ARM_LABELS now reads 'Individual-level only', 'Summary-statistics only', 'Federated' (letter suffixes removed). COMP_COLOURS/COMP_SHAPES in section 8 use the same strings so the collected legend is unified: blue/orange/green mean the same entity in all three panels with no duplicate entries.
- P95 CrI bounds parsed from A_pred_q95/B_pred_q95/C_pred_q95 columns in the same rowwise() block as the median.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The combined figure (fig_ablation_combined) now has four panels (2:1:1:1):

- Panel A: faceted dumbbell forest (Median P50 | P95), unchanged.
- Panel B: mu0 CrI width ratio (arm / C). mu0 is the population-level location parameter extracted via rstan::extract() from the stored Stan fits in ablation_fits. Its 95% CrI width measures pure estimation uncertainty, independent of tau. Values > 1 mean arm C has tighter knowledge of the population mean, which is the information-gain force in isolation.
- Panel C: tau per arm with 95% CrI (three dodged points per pathogen). tau is parsed from the A_tau/B_tau/C_tau strings already in comparison_tbl. Shows whether arm C detects more between-study heterogeneity (the opposing force) or estimates it more precisely (narrower CrI despite same median).
- Panel D: predictive CrI ratio (arm / C), the combined confounded signal shown previously as Panel B. Now clearly labelled as the sum of both forces.

All four panels share a single collected legend via unified ARM_COLOURS_FULL / ARM_SHAPES_FULL scales keyed by full label text so patchwork merges them correctly. Figure width scaled up by 1.35x to accommodate the extra panel.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
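To make the Panel B quantity concrete, the following sketch computes a 95% CrI width ratio (arm / C) from two vectors of draws. The helper names `cri_width` and `width_ratio` are hypothetical, and the draws are deterministic stand-ins; in the PR they would presumably come from something like `rstan::extract(fit, pars = "mu0")$mu0`, which is an assumption about the call site.

```r
# Width of the central 95% interval of a vector of posterior draws.
cri_width <- function(draws) {
  unname(diff(quantile(draws, c(0.025, 0.975))))
}

# Ratio of one arm's interval width to the federated arm's width.
width_ratio <- function(draws_arm, draws_fed) {
  cri_width(draws_arm) / cri_width(draws_fed)
}

# Stand-in draws: arm A is twice as diffuse as the federated arm.
mu0_a   <- seq(-2, 2, length.out = 1001)
mu0_fed <- seq(-1, 1, length.out = 1001)
width_ratio(mu0_a, mu0_fed)  # > 1: the federated arm is tighter
```

The same two helpers would serve for the Panel D predictive ratio if given predictive draws instead of mu0 draws.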

rstan::extract() returns draws whose quantile() output carries names like "2.5%" and "97.5%". Passing those directly as named-vector arguments to data.frame() caused the quantile names to become row names, which then propagated as corrupted column names when rows were stacked with do.call(rbind,...). Result: all mu0_width values were silently NA after filter(!is.na(mu0_width)).

Fix: .extract_mu0_cri() now returns a one-row data.frame built from unname(quantile(...)) values. The per-arm rows are assembled with cbind() rather than named-vector data.frame() arguments.

Confirmed: all 30 (pathogen x arm) combinations now have valid mu0_width values.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
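The pattern behind this fix can be reproduced in base R. The sketch below is a stand-in, not the PR's `.extract_mu0_cri()`: the helper name `extract_cri`, its column names and the toy draws are assumptions. It shows the quantile name leaking into the row names, and the `unname()` plus `cbind()` assembly that avoids it.

```r
# Hypothetical stand-in for .extract_mu0_cri(): summarise draws as a
# one-row data.frame with plain column names.
extract_cri <- function(draws) {
  q <- unname(quantile(draws, c(0.025, 0.975)))  # drop "2.5%"/"97.5%"
  data.frame(lo = q[1], hi = q[2], width = q[2] - q[1])
}

draws <- 0:100  # stand-in for posterior draws

# Without unname(), the quantile name becomes the row name.
q_named <- quantile(draws, 0.025)
rownames(data.frame(lo = q_named))  # "2.5%"

# With unname(), rows stack cleanly and arm labels attach via cbind().
arms <- c("A", "B", "C")
rows <- lapply(arms, function(a) cbind(arm = a, extract_cri(draws)))
cri_tbl <- do.call(rbind, rows)
cri_tbl
```

Stripping the names at the point of extraction keeps every downstream `rbind()` free of name-matching surprises.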

Replace "Individual-level only", "Summary-statistics only", "Federated" with "(I)", "(S)", "(F)" suffixes everywhere in plot_data_format_ablation.R to eliminate the clash with patchwork's A/B/C/D panel labels.

Changes:

- ARM_LABELS updated with (I)/(S)/(F) suffixes
- COMP_COLOURS / COMP_SHAPES now derived dynamically via ARM_LABELS[c("A","B")] so they stay in sync with ARM_LABELS automatically
- mu0_ratio and pred_ratio comparison factor labels now use ARM_LABELS[c("A","B")] instead of hardcoded strings
- Figure 2 and supplementary gain figures updated:
  - "vs A (individual-level)" -> "vs (I) individual-level"
  - "vs B (summary-stats)" -> "vs (S) summary-stats"
- Caption updated: "federated (C)" -> "federated (F)"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
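The "derived dynamically" idea in this commit might look something like the sketch below. The label strings and the `ARM_COLOURS` vector are hypothetical (only the blue/orange/green assignment is mentioned elsewhere in this PR); the point is the derivation pattern, where the comparison palette is keyed by whatever `ARM_LABELS` currently contains.

```r
# Hypothetical label strings and colours; only the derivation pattern
# is the point here.
ARM_LABELS <- c(A = "Individual-level (I)",
                B = "Summary-statistics (S)",
                C = "Federated (F)")
ARM_COLOURS <- c(A = "blue", B = "orange", C = "green")

# Key the comparison palette by label text instead of retyping strings,
# so a later change to ARM_LABELS carries through automatically.
COMP_COLOURS <- setNames(unname(ARM_COLOURS[c("A", "B")]),
                         unname(ARM_LABELS[c("A", "B")]))
COMP_COLOURS
```

A named vector of this shape is what ggplot2's `scale_colour_manual(values = ...)` expects when the plotted variable holds the label text.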

1. Extract tau from Stan fits directly (Panel C)
   Replace the parsed-string tau_long (which silently dropped arms with n_datasets < 5) with Stan-extracted tau using .extract_tau_cri(), the same pattern used for mu0. Tau is always a parameter in the hierarchical model; arms with few datasets show wide prior-dominated CrIs, which is honest and ensures all three arms appear for every pathogen.
2. Make Panel A explicitly about posterior predictive CrI
   Change x-axis label to state "Posterior predictive estimate" and clarify that points are median/P95 and error bars are 95% credible intervals.
3. Annotate Panel C subtitle
   Add "(wide CrI: few datasets in arm)" so readers understand that width encodes estimation reliability, not just biological heterogeneity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…hogens

Reverts the previous Stan-based tau extraction (which included prior-dominated estimates for n < 5 datasets) and restores the correct parsed-string approach that only shows tau where n_datasets >= 5 in each arm.

Adds a new filter step (8e-filter) that restricts all four Figure 4 panels to the subset of pathogens that have tau estimated in every arm (I, S, and F). This ensures Panel C is complete and coherent, and all panels share the same y-axis rows. Figure height is updated to reflect the filtered pathogen count.

Also reverts the Panel A x-label and Panel C subtitle changes from the previous commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
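A filter of the kind described in this commit (keep only pathogens with tau estimated in every arm) could be sketched in base R as follows. The table layout, column names and the helper `complete_pathogens` are assumptions for illustration; the PR's 8e-filter step may be implemented differently.

```r
# Stand-in for the long table of tau estimates: one row per
# (pathogen, arm), with NA where tau was not estimated.
tau_long <- data.frame(
  pathogen = rep(c("pathogen_1", "pathogen_2", "pathogen_3"), each = 3),
  arm      = rep(c("I", "S", "F"), times = 3),
  tau      = c(0.4, 0.5, 0.3,   0.6, NA, 0.4,   0.2, 0.3, 0.25)
)

# Keep only pathogens with a non-missing tau in all requested arms.
complete_pathogens <- function(tbl, arms = c("I", "S", "F")) {
  ok <- tbl[!is.na(tbl$tau) & tbl$arm %in% arms, ]
  n_arms <- tapply(ok$arm, ok$pathogen, function(a) length(unique(a)))
  names(n_arms)[n_arms == length(arms)]
}

keep <- complete_pathogens(tau_long)
keep  # pathogen_2 is dropped because arm S has no tau
```

Applying the resulting `keep` vector to every panel's data is what lets all panels share the same y-axis rows.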

Copilot

Pull request overview

This PR adds new reporting utilities for simulation-study and data-format ablation outputs, including LaTeX longtables and publication/supplementary figures, and updates analysis scripts to generate/save these artifacts.

Changes:

- Added LaTeX longtable generators for simulation-study performance tables.
- Added new simulation-study plotting functions (main-text Figure 2 and an SI figure).
- Added/updated analysis scripts for data-format ablation visualization, simulation table generation, supplementary-figure splitting, and meta-analysis forest plot saving.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| R/table_utils.R | Adds shared LaTeX helpers and two exported simulation-study longtable generators. |
| R/ploting_utils.R | Adds two new exported simulation-study figure builders using ggplot2 + patchwork. |
| analysis/run_meta_analysis.R | Simplifies forest-plot saving by delegating PDF creation to `meta::forest(file=...)`. |
| analysis/plot_data_format_ablation.R | New script to visualize data-format ablation results with multiple figures. |
| analysis/make_supplementary_figures.R | Splits each pathogen's supplementary output into two PDFs (CDF vs data panels) and scales label size. |
| analysis/make_simulation_tables.R | New script to write the two LaTeX simulation tables to `results/`. |
| analysis/data_format_ablation.R | New script to run the three-arm (I/S/F) data-format ablation experiment and export comparison tables. |

Copilot

Pull request overview

Copilot reviewed 8 out of 12 changed files in this pull request and generated 3 comments.

Files not reviewed (4):

- man/generate_simulation_table1.Rd: Language not supported
- man/generate_simulation_table2.Rd: Language not supported
- man/plot_simulation_study_figure2.Rd: Language not supported
- man/plot_simulation_study_si_figure.Rd: Language not supported

cm401 and others added 16 commits May 7, 2026 11:04

Fix position_dodge and geom_vline arguments in ablation plot

84bd0b8

add simulation study figure

a0a7529

Update labels and layout for combined figure

00c030a

small updates to code

4221d08

initial revision to create SI tables

c97a897

cm401 requested a review from Copilot May 25, 2026 15:57

Copilot started reviewing on behalf of cm401 May 25, 2026 15:57

Copilot AI reviewed May 25, 2026

cm401 added 2 commits May 25, 2026 17:16

minor fixes

fe6c0eb

add documentation

92ff13a

cm401 requested a review from Copilot May 25, 2026 16:17

Copilot started reviewing on behalf of cm401 May 25, 2026 16:17

Copilot AI reviewed May 25, 2026

Comment thread analysis/data_format_ablation.R

Comment thread analysis/run_meta_analysis.R

Comment thread R/table_utils.R

cm401 merged commit b560df1 into main May 25, 2026
3 checks passed

cm401 merged 18 commits into main from stan_model_changes

2 participants