-
Notifications
You must be signed in to change notification settings - Fork 3
Optimize pedigree simulation codebase #109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Renamed the file to better reflect its purpose and improve code organization.
Refactored simulatePedigree to support an optimized within-generation simulation via a new beta parameter. Added buildWithinGenerations_optimized and benchmarking script to compare performance. Also fixed error handling for insufficient single individuals.
Refactored buildWithinGenerations_optimized to simplify single individual sampling and ensure non-negative counts. Adjusted parameters in optimizing.R for benchmarking, added 1-generation test cases, and improved benchmark result labeling.
Moved and refactored buildWithinGenerations logic from simulatePedigree.R to a new file simulatePedigree_within.R, introducing buildWithinGenerations_index for improved clarity and modularity. Updated buildBetweenGenerations_optimized for better parent assignment logic. Adjusted benchmarking parameters and factor handling in optimizing.R for more robust performance analysis.
Improved argument formatting and validation in buildComponent.R. Refactored buildBetweenGenerations_optimized in simulatePedigree.R for clarity and optimization, including variable renaming and loop improvements. Updated ggplot code in optimizing.R to use factor variables for better visualization.
Removed the 'indexed' beta option from simulatePedigree_within.R and related benchmarking code, as it is not yet implemented. Added a new test to check pedigree generation with imbalanced sex ratios. Improved verbose output and code clarity in pedigree simulation functions.
Refactored buildBetweenGenerations_optimized to precompute row indices for each generation, improving code clarity and efficiency. Also updated set.seed in optimizing.R for reproducibility.
Cleaned up and refactored code in buildBetweenGenerations_optimized and buildWithinGenerations_optimized functions. Removed commented-out and redundant code, clarified logic for assigning married males and females, and added minor optimizations and comments for better readability and maintainability.
Improves performance and clarity in buildBetweenGenerations_optimized by precomputing generation indices, reducing repeated scans, and streamlining parent-offspring linkage logic. Adds detailed comments, replaces inefficient loops with vectorized operations, and enhances handling of parent/offspring assignment. Also updates kpc from 3 to 4 in optimizing.R for simulation parameter adjustment.
Refactored insertEven for clarity and robustness, and replaced the optimized versions of buildBetweenGenerations and buildWithinGenerations with their base implementations as placeholders. Also updated simulation parameters in optimizing.R for more repetitions and adjusted sex ratio.
Introduces beta (optimized) implementations for createGenDataFrame, determineSex, assignCoupleIDs, adjustKidsPerCouple, and markPotentialChildren, along with corresponding documentation. Updates main and within-generation simulation logic to support a 'beta' parameter, allowing selection between original and optimized algorithms. Documentation and NAMESPACE are updated to reflect new beta functions and parameters.
Added a family_id_prefix argument to createGenDataFrame and its beta version, allowing customization of family ID prefixes. Refactored and optimized several helper functions for clarity and efficiency, including assignCoupleIDs, adjustKidsPerCouple, and markPotentialChildren. Updated documentation and removed unused exports from NAMESPACE.
Added a family_id_prefix argument to createGenDataFrame and its beta version, allowing customization of family ID prefixes. Refactored and optimized several helper functions for clarity and efficiency, including assignCoupleIDs, adjustKidsPerCouple, and markPotentialChildren. Updated documentation and removed unused exports from NAMESPACE.
84ba6ce to
06ca022
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #109 +/- ##
==========================================
- Coverage 84.42% 84.38% -0.05%
==========================================
Files 25 27 +2
Lines 3982 4240 +258
==========================================
+ Hits 3362 3578 +216
- Misses 620 662 +42 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Refactors and optimizes simulatePedigree (and helpers) to improve speed/memory for deeper pedigrees, adds a beta switch for optimized helper paths, and updates tests/docs/vignettes accordingly.
Changes:
- Split and refactored within-/between-generation simulation logic and added a
betaparameter to switch implementations. - Added/updated helper functions and documentation/man pages for simulation internals and related parameters.
- Expanded test coverage for
simulatePedigree, plus regenerated vignettes and updated NEWS.
Reviewed changes
Copilot reviewed 34 out of 35 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| vignettes/v4_validation.html | Regenerated vignette HTML output (formatting/output updates). |
| vignettes/v2_pedigree.html | Regenerated vignette HTML output (updated example output/figures). |
| vignettes/v2_pedigree.Rmd | Minor formatting update in vignette source. |
| vignettes/v1_modelingvariancecomponents.html | Regenerated vignette HTML output (library calls/output formatting). |
| vignettes/v1_modelingvariancecomponents.Rmd | Updated vignette source to use library() and improved formatting. |
| tests/testthat/test-simulatePedigree.R | Expanded tests to exercise beta paths and additional scenarios. |
| tests/testthat/test-checkSex.R | Removed stray whitespace in tests. |
| man/simulatePedigree.Rd | Documented new beta argument and expanded fam_shift description. |
| man/ped2mit.Rd | Clarified meaning of max_gen (iteration-based description). |
| man/ped2gen.Rd | Clarified meaning of max_gen (iteration-based description). |
| man/ped2com.Rd | Clarified meaning of max_gen (iteration-based description). |
| man/ped2cn.Rd | Clarified meaning of max_gen (iteration-based description). |
| man/ped2add.Rd | Clarified meaning of max_gen (iteration-based description). |
| man/markPotentialChildren_beta.Rd | Added documentation for beta helper. |
| man/markPotentialChildren.Rd | Updated documentation source reference and added beta parameter. |
| man/determineSex_beta.Rd | Added documentation for beta helper. |
| man/determineSex.Rd | Updated documentation source reference and added beta parameter. |
| man/createGenDataFrame_beta.Rd | Added documentation for beta helper. |
| man/createGenDataFrame.Rd | Updated documentation source reference and added beta/family_id_prefix. |
| man/buildWithinGenerations.Rd | Updated documentation source reference and added beta parameter. |
| man/buildBetweenGenerations.Rd | Added beta parameter to docs. |
| man/assignCoupleIDs_beta.Rd | Added documentation for beta helper. |
| man/assignCoupleIDs.Rd | Updated documentation source reference and added beta parameter. |
| man/adjustKidsPerCouple_beta.Rd | Added documentation for beta helper. |
| man/adjustKidsPerCouple.Rd | Updated documentation source reference and added beta parameter. |
| data-raw/optimizing.R | Added local benchmarking/profiling script for simulation performance comparisons. |
| R/simulatePedigree_within.R | New/refactored within-generation simulation logic with beta dispatch. |
| R/simulatePedigree.R | Refactored between-generation simulation logic with beta dispatch; updated simulatePedigree() signature. |
| R/insertEven.R | Made insertEven() more robust and clearer for edge cases. |
| R/helpSimulatePedigree_beta.R | Added beta implementations of simulation helper functions. |
| R/helpSimulatePedigree.R | Added beta-aware wrappers and updated helper implementations/documentation. |
| R/buildComponent.R | Improved validation and added comments related to max_gen/iteration behavior. |
| NEWS.md | Added release notes for 1.6 highlighting simulation optimizations and tests. |
Comments suppressed due to low confidence (3)
R/helpSimulatePedigree_beta.R:196
markPotentialChildren_beta()removes the last column withdf_Ngen <- df_Ngen[, -ncol(df_Ngen)], which is brittle if column order changes. RemovecoupleIdby name instead (or avoid relying on it being the last column).
R/helpSimulatePedigree_beta.R:158- The
CoupleFdocumentation above this function still reads like placeholder text (“IT MIGHT BE…”). Please replace with a precise definition (e.g., number of couple IDs to label as producing daughters) and any constraints so the beta helper docs are usable.
R/helpSimulatePedigree_beta.R:21 - The example block documents
createGenDataFrame_beta(), but the example callscreateGenDataFrame(...). This is confusing for readers and may fail if the intent is to demonstrate the beta helper. Update the example to call the correct function name (or explicitly document why the non-beta wrapper is shown here).
| # r is I + A + A^2 + ... = (I-A)^-1 from RAM | ||
| # could trim, here | ||
| ## it keeps going until it explains all of the relatedness with themselves (i.e., mtSum == 0) | ||
| # some of this precision is articifuial because we literally get to the point that the condon is eaither there or not. probabiliticy | ||
|
|
||
| # how much percision do we need to get unbiased estimates | ||
|
|
||
| # big matrix still happens here because the network is built. just less percise on inbreeding | ||
|
|
||
| # bias-precision tradeoff. how much percision do we need to get unbiased estimates? not a lot |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These newly added inline comments are hard to parse and contain multiple spelling errors, which makes the algorithm rationale unclear. Please rewrite or remove them (keeping only a concise explanation of the stopping criterion and precision tradeoff).
| # r is I + A + A^2 + ... = (I-A)^-1 from RAM | |
| # could trim, here | |
| ## it keeps going until it explains all of the relatedness with themselves (i.e., mtSum == 0) | |
| # some of this precision is articifuial because we literally get to the point that the condon is eaither there or not. probabiliticy | |
| # how much percision do we need to get unbiased estimates | |
| # big matrix still happens here because the network is built. just less percise on inbreeding | |
| # bias-precision tradeoff. how much percision do we need to get unbiased estimates? not a lot | |
| # r accumulates I + A + A^2 + ... ≈ (I - A)^-1 from RAM path tracing | |
| # Iterate until no additional relatedness is contributed (mtSum == 0) or max_gen is reached. | |
| # max_gen controls the bias–precision tradeoff: larger values give more precise but more expensive estimates. |
R/simulatePedigree_within.R
Outdated
| if (beta == TRUE || beta == "optimized") { | ||
| df_Fam <- buildWithinGenerations_optimized( | ||
| sizeGens = sizeGens, | ||
| marR = marR, | ||
| sexR = sexR, | ||
| Ngen = Ngen, | ||
| verbose = verbose, | ||
| personID = personID, | ||
| momID = momID, | ||
| dadID = dadID, | ||
| code_male = code_male, | ||
| code_female = code_female, | ||
| fam_shift = fam_shift, | ||
| beta = beta | ||
| ) | ||
| } else if (beta %in% c("index", "indexed")) { | ||
| stop("The 'index' or 'indexed' option for parameter 'beta' is not yet implemented.") | ||
| } else if (beta == FALSE || is.null(beta)) { | ||
| df_Fam <- buildWithinGenerations_base( |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buildWithinGenerations() claims to accept beta = "optimized" / "indexed", but it passes that value down to helpers that do if (beta) (e.g., createGenDataFrame()), which will error for non-logical values. Either restrict beta to a strict logical (and update error text/docs), or normalize string aliases to TRUE/FALSE before calling downstream functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot open a new pull request to apply changes based on this feedback normalize string aliases to TRUE/FALSE before calling downstream functions.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
3be9105 to
4d0eda4
Compare
Removed detailed argument and value sections from Roxygen and Rd docs for several internal functions, keeping only concise descriptions and titles. Improved test coverage in test-simulatePedigree.R by adding a new test for sex ratio imbalance in the opposite direction, refactoring sex ratio checks for clarity, and introducing stricter and more consistent tolerance variables.
4d0eda4 to
109e55c
Compare
|
@smasongarrison I've opened a new pull request, #111, to work on those changes. Once the pull request is ready, I'll request review from you. |
…ctions (#111) * Initial plan * Normalize beta parameter string aliases to TRUE/FALSE before calling downstream functions Co-authored-by: smasongarrison <6001608+smasongarrison@users.noreply.github.com> * Fix inconsistencies in beta parameter validation and add support for all string aliases Co-authored-by: smasongarrison <6001608+smasongarrison@users.noreply.github.com> * Fix test for original alias to use proper seed and add test for index alias Co-authored-by: smasongarrison <6001608+smasongarrison@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: smasongarrison <6001608+smasongarrison@users.noreply.github.com>
This pull request introduces significant optimizations and refactoring to the pedigree simulation codebase, focusing on improving the speed and memory usage of
simulatePedigreeand its helpers, especially for deeper pedigrees. It also adds more comprehensive tests and resolves an error related to insufficient single individuals. Additionally, the codebase is refactored for clarity and maintainability, with new helper functions, improved documentation, and streamlined logic for handling generations, sex assignment, couple identification, and children assignment. Some beta/optimized versions of functions are also separated for easier comparison and future development.Performance Improvements and Bug Fixes:
simulatePedigreeand its helpers for better speed and memory usage, leading to major gains for deeper pedigrees.New and Refactored Helper Functions for Pedigree Simulation:
createGenDataFrame,determineSex,assignCoupleIDs,adjustKidsPerCouple, andmarkPotentialChildren, each with an optionalbetaargument to use optimized versions.R/helpSimulatePedigree_beta.R(fromR/helpPedigree.R) and updated their signatures and logic for consistency and efficiency. [1] [2] [3] [4] [5] [6]Documentation and Usability Enhancements:
Algorithm and Code Quality Improvements:
insertEvenutility function for robustness and clarity, including better handling of edge cases and improved verbose output.Algorithmic Parameter Clarifications:
max_genparameter in pedigree component functions and improved related comments for better user understanding. [1] [2]