Stan model changes #37
Merged
Fits three model variants per eligible pathogen (individual-level only, summary-statistics only, federated) and compares posterior predictive summaries, interval ratios, W2/JS/OVL distributional metrics, tail probabilities, and study-level PSIS-LOO ELPD. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three-figure suite: (1) three-arm forest plot of pred_median with 95% CrI, (2) federated-gain metrics strip (interval ratio, JS divergence, OVL), (3) posterior predictive density overlays for top pathogens by JS divergence. Supplementary version of figure 2 faceted across all distributions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
position_dodge() takes width= not height=; geom_vline() does not accept inherit.aes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three-column patchwork layout: (A) dumbbell forest plot with grey segment connecting arm A to arm B point estimates and prominent arm C overlay, (B) interval ratio strip, (C) JS divergence strip. Also fixes stray inherit.aes warning in supplementary figure geom_vline call. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sition_dodge to Panel A Panel A of the combined figure (fig_ablation_combined) now selects the distribution with highest LOO-ELPD for arm C per pathogen rather than always using the primary (log-normal) distribution. Tie-breaking follows lognormal > gamma > Weibull > Burr XII > gen. gamma; pathogens with no valid ELPD fall back to PRIMARY_DIST. Y-axis labels show the selected distribution in parentheses. Arms A/B/C are now separated with position_dodge(width = 0.5) on both the errorbar and point layers so estimates do not overlap. Panels B and C use the same per-pathogen best-distribution data (gain_best). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
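The selection rule described in this commit can be sketched in base R as follows. This is a minimal illustration, not the script's code: `pick_best_dist()` and `DIST_PRIORITY` are hypothetical names, and the short distribution keys are assumed stand-ins for the display labels.

```r
# Tie-breaking order from the commit message (short keys are an assumption).
DIST_PRIORITY <- c("lognormal", "gamma", "weibull", "burr", "gengamma")
PRIMARY_DIST  <- "lognormal"

# Hypothetical helper: `elpd` is a named numeric vector of arm C LOO-ELPD
# values for one pathogen, with distribution keys as names.
pick_best_dist <- function(elpd) {
  elpd <- elpd[!is.na(elpd)]
  if (length(elpd) == 0) return(PRIMARY_DIST)  # no valid ELPD: fall back
  tied <- names(elpd)[elpd == max(elpd)]       # all distributions at the maximum
  DIST_PRIORITY[DIST_PRIORITY %in% tied][1]    # first in priority order wins
}

pick_best_dist(c(gamma = -10, lognormal = -10, weibull = -12))
```

Resolving ties through a fixed priority vector keeps the choice deterministic regardless of the order in which fits were stored.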
…ed figure Section 8a now loads results/model_weights.rds (pre-computed by compute_pathogen_model_bayes_factors() from main_results.rds) and picks the highest pseudo-Bayes-factor distribution per pathogen, matching the criterion used in plot_main_figure() and make_supplementary_figures.R. Previously, best-distribution was determined from the ablation arm C ELPD, which was inconsistent with the rest of the paper. Internal Stan names (lognormal, gamma, weibull, burr, gengamma) are mapped to display labels via MAIN_DIST_TO_DISPLAY before joining to comparison_tbl. Falls back to PRIMARY_DIST if a pathogen is absent from model_weights or its best distribution was not fitted in the ablation. Also fixes pB and pC panels which were referencing gain_ln instead of gain_best. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Combined figure (fig_ablation_combined) changes:
- Panel A is now a faceted dumbbell forest with two side-by-side columns, 'Median (P50)' and '95th percentile (P95)', sharing the y-axis with independent x-scales (log10). Both facets use the same position_dodge, colour/shape encoding and dumbbell backbone.
- patchwork widths changed from c(3,1,1) to c(2,1,1).
- ARM_LABELS now reads 'Individual-level only', 'Summary-statistics only', 'Federated' (letter suffixes removed). COMP_COLOURS/COMP_SHAPES in section 8 use the same strings so the collected legend is unified: blue/orange/green mean the same entity in all three panels with no duplicate entries.
- P95 CrI bounds parsed from A_pred_q95/B_pred_q95/C_pred_q95 columns in the same rowwise() block as the median.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The combined figure (fig_ablation_combined) now has four panels (2:1:1:1):
- Panel A: faceted dumbbell forest (Median P50 | P95), unchanged.
- Panel B: mu0 CrI width ratio (arm / C). mu0 is the population-level location parameter extracted via rstan::extract() from the stored Stan fits in ablation_fits. Its 95% CrI width measures pure estimation uncertainty, independent of tau. Values > 1 mean arm C has tighter knowledge of the population mean -- the information-gain force in isolation.
- Panel C: tau per arm with 95% CrI (three dodged points per pathogen). tau is parsed from the A_tau/B_tau/C_tau strings already in comparison_tbl. Shows whether arm C detects more between-study heterogeneity (the opposing force) or estimates it more precisely (narrower CrI despite same median).
- Panel D: predictive CrI ratio (arm / C), the combined confounded signal shown previously as Panel B. Now clearly labelled as the sum of both forces.
All four panels share a single collected legend via unified ARM_COLOURS_FULL / ARM_SHAPES_FULL scales keyed by full label text so patchwork merges them correctly. Figure width scaled up by 1.35x to accommodate the extra panel.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
rstan::extract() returns draws whose quantile() output carries names like "2.5%" and "97.5%". Passing those directly as named-vector arguments to data.frame() caused the quantile names to become row names, which then propagated as corrupted column names when rows were stacked with do.call(rbind,...). Result: all mu0_width values were silently NA after filter(!is.na(mu0_width)). Fix: .extract_mu0_cri() now returns a one-row data.frame built from unname(quantile(...)) values. The per-arm rows are assembled with cbind() rather than named-vector data.frame() arguments. Confirmed: all 30 (pathogen x arm) combinations now have valid mu0_width values. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
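The fix can be illustrated with a small base-R sketch. It assumes simulated draws in place of rstan::extract() output, and `cri_row()` is a hypothetical stand-in for the script's .extract_mu0_cri(), not a copy of it.

```r
# Hypothetical helper: summarise a vector of posterior draws as one row.
cri_row <- function(draws, probs = c(0.025, 0.975)) {
  # unname() drops the "2.5%" / "97.5%" names so they cannot leak into
  # row names when the values are placed in a data.frame.
  q <- unname(quantile(draws, probs = probs))
  data.frame(lo = q[1], hi = q[2], width = q[2] - q[1])
}

set.seed(1)
# Simulated stand-ins for the mu0 draws of arms A, B and C.
arms <- list(A = rnorm(1000), B = rnorm(1000, sd = 2), C = rnorm(1000, sd = 0.5))

# One row per arm, assembled with cbind() and stacked with rbind().
rows <- lapply(names(arms), function(a) cbind(arm = a, cri_row(arms[[a]])))
mu0_tbl <- do.call(rbind, rows)
mu0_tbl
```

Calling quantile() with names = FALSE should have the same effect as unname() here; either way the stacked table keeps clean column names and non-missing widths.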
Replace "Individual-level only", "Summary-statistics only", "Federated"
with "(I)", "(S)", "(F)" suffixes everywhere in plot_data_format_ablation.R
to eliminate the clash with patchwork's A/B/C/D panel labels.
Changes:
- ARM_LABELS updated with (I)/(S)/(F) suffixes
- COMP_COLOURS / COMP_SHAPES now derived dynamically via ARM_LABELS[c("A","B")]
so they stay in sync with ARM_LABELS automatically
- mu0_ratio and pred_ratio comparison factor labels now use ARM_LABELS[c("A","B")]
instead of hardcoded strings
- Figure 2 and supplementary gain figures updated:
"vs A (individual-level)" -> "vs (I) individual-level"
"vs B (summary-stats)" -> "vs (S) summary-stats"
- Caption updated: "federated (C)" -> "federated (F)"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
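A minimal sketch of deriving the comparison scales from ARM_LABELS, as this commit describes. The exact label text, hex colours, and shape codes below are assumptions for illustration; only the lookup pattern ARM_LABELS[c("A","B")] comes from the commit message.

```r
# Assumed label text; the commit only states that (I)/(S)/(F) suffixes are used.
ARM_LABELS <- c(A = "Individual-level only (I)",
                B = "Summary-statistics only (S)",
                C = "Federated (F)")

# Comparison arms are looked up by key, so renaming a label updates the scales too.
comp_arms    <- unname(ARM_LABELS[c("A", "B")])
COMP_COLOURS <- setNames(c("#1f77b4", "#ff7f0e"), comp_arms)  # hypothetical hex values
COMP_SHAPES  <- setNames(c(16, 17), comp_arms)                # hypothetical shape codes

# Factor labels for a comparison column come from the same lookup.
comparison <- factor(c("A", "B", "A"), levels = c("A", "B"), labels = comp_arms)
levels(comparison)
```

Because the scale names and the factor levels share one source, a legend keyed by label text cannot drift out of sync with the data.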
1. Extract tau from Stan fits directly (Panel C). Replace the parsed-string tau_long (which silently dropped arms with n_datasets < 5) with Stan-extracted tau using .extract_tau_cri(), the same pattern used for mu0. Tau is always a parameter in the hierarchical model; arms with few datasets show wide prior-dominated CrIs, which is honest and ensures all three arms appear for every pathogen.
2. Make Panel A explicitly about posterior predictive CrI. Change x-axis label to state "Posterior predictive estimate" and clarify that points are median/P95 and error bars are 95% credible intervals.
3. Annotate Panel C subtitle. Add "(wide CrI: few datasets in arm)" so readers understand that width encodes estimation reliability, not just biological heterogeneity.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…hogens Reverts the previous Stan-based tau extraction (which included prior-dominated estimates for n < 5 datasets) and restores the correct parsed-string approach that only shows tau where n_datasets >= 5 in each arm. Adds a new filter step (8e-filter) that restricts all four Figure 4 panels to the subset of pathogens that have tau estimated in every arm (I, S, and F). This ensures Panel C is complete and coherent, and all panels share the same y-axis rows. Figure height is updated to reflect the filtered pathogen count. Also reverts the Panel A x-label and Panel C subtitle changes from the previous commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
This PR adds new reporting utilities for simulation-study and data-format ablation outputs, including LaTeX longtables and publication/supplementary figures, and updates analysis scripts to generate/save these artifacts.
Changes:
- Added LaTeX longtable generators for simulation-study performance tables.
- Added new simulation-study plotting functions (main-text Figure 2 and an SI figure).
- Added/updated analysis scripts for data-format ablation visualization, simulation table generation, supplementary-figure splitting, and meta-analysis forest plot saving.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| R/table_utils.R | Adds shared LaTeX helpers and two exported simulation-study longtable generators. |
| R/ploting_utils.R | Adds two new exported simulation-study figure builders using ggplot2 + patchwork. |
| analysis/run_meta_analysis.R | Simplifies forest-plot saving by delegating PDF creation to meta::forest(file=...). |
| analysis/plot_data_format_ablation.R | New script to visualize data-format ablation results with multiple figures. |
| analysis/make_supplementary_figures.R | Splits each pathogen’s supplementary output into two PDFs (CDF vs data panels) and scales label size. |
| analysis/make_simulation_tables.R | New script to write the two LaTeX simulation tables to results/. |
| analysis/data_format_ablation.R | New script to run the three-arm (I/S/F) data-format ablation experiment and export comparison tables. |
Pull request overview
Copilot reviewed 8 out of 12 changed files in this pull request and generated 3 comments.
Files not reviewed (4)
- man/generate_simulation_table1.Rd: Language not supported
- man/generate_simulation_table2.Rd: Language not supported
- man/plot_simulation_study_figure2.Rd: Language not supported
- man/plot_simulation_study_si_figure.Rd: Language not supported
No description provided.