recover pr 494#496
Merged
Merged
Conversation
Final Report 1. Interface Separation Rationale: Direct ColocBoost users and xQTL protocol users need different entrypoints. Mixing both workflows in one function made the API confusing and encouraged duplicated QC logic. Key changes: - Added/standardized colocboost_analysis() as the direct user-facing wrapper. - Accepts the same core inputs as colocboost() through .... - Adds optional QC/imputation parameters. - If no QC is requested, it behaves as a minimal call-through to colocboost(). - If QC is requested, it prepares cleaned inputs first, then calls colocboost(). - Added/standardized colocboost_pipeline() as the protocol-oriented wrapper. - Starts from region_data produced by load_multitask_regional_data(). - Designed for xQTL-protocol workflows. - Runs QC once, then reuses cleaned inputs across xQTL-only, joint-GWAS, and separate-GWAS analyses. 2. RegionalData Conversion Layer Rationale: Loaded regional data should be converted into reusable analysis inputs without reloading files or duplicating conversion logic. Key changes: - Added reusable adapters: - region_data_to_rss_input() for summary-stat/RSS workflows. - region_data_to_ind_input() for individual-level workflows. - region_data_to_colocboost_input() for ColocBoost-specific assembly. - Preserved shared heavy objects such as X, LD, and X_ref. - Used dictionaries such as dict_YX and dict_sumstatLD to avoid duplicated matrix storage. 3. Shared QC Refactor Rationale: QC logic should be reusable across ColocBoost, SuSiE RSS, and individual-level SuSiE workflows instead of being embedded separately in each pipeline. Key changes: - Added qc_individual_data() for individual-level QC. - Expanded summary_stats_qc() to support combined RSS/ColocBoost summary-stat QC while preserving its historical LD-mismatch QC behavior. - Clarified qc_method behavior: - NULL, "none", and "rss_qc" mean basic summary-stat harmonization only. - "slalom" and "dentist" add optional LD-mismatch outlier QC. - Added QC progress messages for stage tracking, retained counts, skips, and warnings. 4. X_ref-Based Imputation Rationale: When genotype reference X_ref is available, imputation should use the genotype reference directly rather than deriving LD blocks, because LD block partitioning can alter imputation behavior. Key changes: - Final behavior uses whole-region scaled X_ref with RAISS genotype/SVD imputation. - The temporary scaled matrix is used only inside imputation. - The original X_ref remains the reference passed into the final colocboost() call. - The implementation avoids full LD construction for X_ref imputation. - Added tests showing: - raw X_ref SVD can differ from LD-based RAISS; - scaled X_ref SVD matches LD scale in a controlled single-matrix case; - colocboost_analysis(..., X_ref=..., impute=TRUE) uses scaled genotype/SVD and does not call compute_LD(). Important report observation: - Old LD-block RAISS retained fewer variants for ENSG00000155304: 166 -> 7 and 167 -> 7. - Whole-region scaled X_ref SVD retained more: 166 -> 16 and 167 -> 13. - This difference is expected and should be highlighted as evidence that LD block partitioning changes imputation behavior. 5. Protocol Compatibility and Minimal Loader Fixes Rationale: The implementation should support the uploaded chr21 test data and xQTL-protocol use case while keeping loader changes minimal. Key changes: - Kept all changes within pecotmr; no changes were made to the core colocboost() function. - Made only minimal loader-related fixes required by smoke testing: - multi-genotype-source loading via match_geno_pheno; - header handling for tabix-backed summary-stat files; - whitespace-padded covariate numeric parsing. - Updated protocol-facing naming to use colocboost_pipeline(). 6. Documentation and Tests Rationale: The new interfaces, QC behavior, and imputation decision need clear user-facing documentation and regression coverage so future changes do not reintroduce ambiguity. Key changes: - Updated documentation/man pages and vignette references for the new ColocBoost interfaces. - Added chr21 smoke-test deliverable notes emphasizing the X_ref imputation decision. - Added focused regression tests for: - colocboost_analysis() direct-input behavior; - colocboost_pipeline() protocol-level behavior; - reusable summary-stat QC behavior; - RAISS raw/scaled genotype-reference behavior; - X_ref-backed imputation avoiding compute_LD(). - Focused tests passed: - test_raiss.R - test_sumstats_qc.R - test_colocboost_pipeline.R - Chr21 smoke tests completed on two genes across six scenarios, covering no-AD xQTL-only, basic QC, imputation, PIP screening, SLALOM QC, and combined modes.
Remove the legacy rss_qc alias so summary_stats_qc only accepts none, slalom, and dentist.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #496 +/- ##
==========================================
- Coverage 82.98% 82.95% -0.03%
==========================================
Files 36 36
Lines 8146 9130 +984
==========================================
+ Hits 6760 7574 +814
- Misses 1386 1556 +170 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Final Report
Rationale:
Direct ColocBoost users and xQTL protocol users need different entrypoints. Mixing both workflows in one function made the API confusing and encouraged duplicated QC logic.
Key changes:
Rationale:
Loaded regional data should be converted into reusable analysis inputs without reloading files or duplicating conversion logic.
Key changes:
Rationale:
QC logic should be reusable across ColocBoost, SuSiE RSS, and individual-level SuSiE workflows instead of being embedded separately in each pipeline.
Key changes:
Rationale:
When genotype reference X_ref is available, imputation should use the genotype reference directly rather than deriving LD blocks, because LD block partitioning can alter imputation behavior.
Key changes:
Important report observation:
Rationale:
The implementation should support the uploaded chr21 test data and xQTL-protocol use case while keeping loader changes minimal.
Key changes:
Rationale:
The new interfaces, QC behavior, and imputation decision need clear user-facing documentation and regression coverage so future changes do not reintroduce ambiguity.
Key changes: