Skip to content

Conversation

@smasongarrison
Copy link
Member

This pull request introduces significant optimizations and refactoring to the pedigree simulation codebase, focusing on improving the speed and memory usage of simulatePedigree and its helpers, especially for deeper pedigrees. It also adds more comprehensive tests and resolves an error related to insufficient single individuals. Additionally, the codebase is refactored for clarity and maintainability, with new helper functions, improved documentation, and streamlined logic for handling generations, sex assignment, couple identification, and children assignment. Some beta/optimized versions of functions are also separated for easier comparison and future development.

Performance Improvements and Bug Fixes:

  • Optimized simulatePedigree and its helpers for better speed and memory usage, leading to major gains for deeper pedigrees.
  • Fixed an error that occurred when there were not enough single people available during simulation.

New and Refactored Helper Functions for Pedigree Simulation:

  • Documented optimized helper functions: createGenDataFrame, determineSex, assignCoupleIDs, adjustKidsPerCouple, and markPotentialChildren, each with an optional beta argument to use optimized versions.
  • Moved and renamed beta/optimized helper functions to R/helpSimulatePedigree_beta.R (from R/helpPedigree.R) and updated their signatures and logic for consistency and efficiency. [1] [2] [3] [4] [5] [6]

Documentation and Usability Enhancements:

  • Improved function documentation throughout, clarifying parameters, return values, and usage examples for new and existing functions. [1] [2]
  • Updated NEWS file to reflect the new version (1.6) and summarize key changes.

Algorithm and Code Quality Improvements:

  • Simplified and clarified logic for couple ID assignment, sex determination, and children assignment, both in standard and beta versions. [1] [2] [3] [4]
  • Improved parameter validation and error handling in pedigree component building functions. [1] [2]
  • Enhanced the insertEven utility function for robustness and clarity, including better handling of edge cases and improved verbose output.

Algorithmic Parameter Clarifications:

  • Clarified the meaning of the max_gen parameter in pedigree component functions and improved related comments for better user understanding. [1] [2]

smasongarrison and others added 24 commits January 8, 2026 11:32
Renamed the file to better reflect its purpose and improve code organization.
Refactored simulatePedigree to support an optimized within-generation simulation via a new beta parameter. Added buildWithinGenerations_optimized and benchmarking script to compare performance. Also fixed error handling for insufficient single individuals.
Refactored buildWithinGenerations_optimized to simplify single individual sampling and ensure non-negative counts. Adjusted parameters in optimizing.R for benchmarking, added 1-generation test cases, and improved benchmark result labeling.
Moved and refactored buildWithinGenerations logic from simulatePedigree.R to a new file simulatePedigree_within.R, introducing buildWithinGenerations_index for improved clarity and modularity. Updated buildBetweenGenerations_optimized for better parent assignment logic. Adjusted benchmarking parameters and factor handling in optimizing.R for more robust performance analysis.
Improved argument formatting and validation in buildComponent.R. Refactored buildBetweenGenerations_optimized in simulatePedigree.R for clarity and optimization, including variable renaming and loop improvements. Updated ggplot code in optimizing.R to use factor variables for better visualization.
Removed the 'indexed' beta option from simulatePedigree_within.R and related benchmarking code, as it is not yet implemented. Added a new test to check pedigree generation with imbalanced sex ratios. Improved verbose output and code clarity in pedigree simulation functions.
Refactored buildBetweenGenerations_optimized to precompute row indices for each generation, improving code clarity and efficiency. Also updated set.seed in optimizing.R for reproducibility.
Cleaned up and refactored code in buildBetweenGenerations_optimized and buildWithinGenerations_optimized functions. Removed commented-out and redundant code, clarified logic for assigning married males and females, and added minor optimizations and comments for better readability and maintainability.
Improves performance and clarity in buildBetweenGenerations_optimized by precomputing generation indices, reducing repeated scans, and streamlining parent-offspring linkage logic. Adds detailed comments, replaces inefficient loops with vectorized operations, and enhances handling of parent/offspring assignment. Also updates kpc from 3 to 4 in optimizing.R for simulation parameter adjustment.
Refactored insertEven for clarity and robustness, and replaced the optimized versions of buildBetweenGenerations and buildWithinGenerations with their base implementations as placeholders. Also updated simulation parameters in optimizing.R for more repetitions and adjusted sex ratio.
Introduces beta (optimized) implementations for createGenDataFrame, determineSex, assignCoupleIDs, adjustKidsPerCouple, and markPotentialChildren, along with corresponding documentation. Updates main and within-generation simulation logic to support a 'beta' parameter, allowing selection between original and optimized algorithms. Documentation and NAMESPACE are updated to reflect new beta functions and parameters.
Added a family_id_prefix argument to createGenDataFrame and its beta version, allowing customization of family ID prefixes. Refactored and optimized several helper functions for clarity and efficiency, including assignCoupleIDs, adjustKidsPerCouple, and markPotentialChildren. Updated documentation and removed unused exports from NAMESPACE.
Added a family_id_prefix argument to createGenDataFrame and its beta version, allowing customization of family ID prefixes. Refactored and optimized several helper functions for clarity and efficiency, including assignCoupleIDs, adjustKidsPerCouple, and markPotentialChildren. Updated documentation and removed unused exports from NAMESPACE.
@codecov
Copy link

codecov bot commented Jan 27, 2026

Codecov Report

❌ Patch coverage is 85.93407% with 64 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.38%. Comparing base (eeadb99) to head (d999d18).

Files with missing lines Patch % Lines
R/helpSimulatePedigree.R 83.17% 18 Missing ⚠️
R/helpSimulatePedigree_beta.R 75.34% 18 Missing ⚠️
R/simulatePedigree.R 87.02% 17 Missing ⚠️
R/buildComponent.R 33.33% 4 Missing ⚠️
R/simulatePedigree_within.R 96.58% 4 Missing ⚠️
R/insertEven.R 85.71% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #109      +/-   ##
==========================================
- Coverage   84.42%   84.38%   -0.05%     
==========================================
  Files          25       27       +2     
  Lines        3982     4240     +258     
==========================================
+ Hits         3362     3578     +216     
- Misses        620      662      +42     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors and optimizes simulatePedigree (and helpers) to improve speed/memory for deeper pedigrees, adds a beta switch for optimized helper paths, and updates tests/docs/vignettes accordingly.

Changes:

  • Split and refactored within-/between-generation simulation logic and added a beta parameter to switch implementations.
  • Added/updated helper functions and documentation/man pages for simulation internals and related parameters.
  • Expanded test coverage for simulatePedigree, plus regenerated vignettes and updated NEWS.

Reviewed changes

Copilot reviewed 34 out of 35 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
vignettes/v4_validation.html Regenerated vignette HTML output (formatting/output updates).
vignettes/v2_pedigree.html Regenerated vignette HTML output (updated example output/figures).
vignettes/v2_pedigree.Rmd Minor formatting update in vignette source.
vignettes/v1_modelingvariancecomponents.html Regenerated vignette HTML output (library calls/output formatting).
vignettes/v1_modelingvariancecomponents.Rmd Updated vignette source to use library() and improved formatting.
tests/testthat/test-simulatePedigree.R Expanded tests to exercise beta paths and additional scenarios.
tests/testthat/test-checkSex.R Removed stray whitespace in tests.
man/simulatePedigree.Rd Documented new beta argument and expanded fam_shift description.
man/ped2mit.Rd Clarified meaning of max_gen (iteration-based description).
man/ped2gen.Rd Clarified meaning of max_gen (iteration-based description).
man/ped2com.Rd Clarified meaning of max_gen (iteration-based description).
man/ped2cn.Rd Clarified meaning of max_gen (iteration-based description).
man/ped2add.Rd Clarified meaning of max_gen (iteration-based description).
man/markPotentialChildren_beta.Rd Added documentation for beta helper.
man/markPotentialChildren.Rd Updated documentation source reference and added beta parameter.
man/determineSex_beta.Rd Added documentation for beta helper.
man/determineSex.Rd Updated documentation source reference and added beta parameter.
man/createGenDataFrame_beta.Rd Added documentation for beta helper.
man/createGenDataFrame.Rd Updated documentation source reference and added beta/family_id_prefix.
man/buildWithinGenerations.Rd Updated documentation source reference and added beta parameter.
man/buildBetweenGenerations.Rd Added beta parameter to docs.
man/assignCoupleIDs_beta.Rd Added documentation for beta helper.
man/assignCoupleIDs.Rd Updated documentation source reference and added beta parameter.
man/adjustKidsPerCouple_beta.Rd Added documentation for beta helper.
man/adjustKidsPerCouple.Rd Updated documentation source reference and added beta parameter.
data-raw/optimizing.R Added local benchmarking/profiling script for simulation performance comparisons.
R/simulatePedigree_within.R New/refactored within-generation simulation logic with beta dispatch.
R/simulatePedigree.R Refactored between-generation simulation logic with beta dispatch; updated simulatePedigree() signature.
R/insertEven.R Made insertEven() more robust and clearer for edge cases.
R/helpSimulatePedigree_beta.R Added beta implementations of simulation helper functions.
R/helpSimulatePedigree.R Added beta-aware wrappers and updated helper implementations/documentation.
R/buildComponent.R Improved validation and added comments related to max_gen/iteration behavior.
NEWS.md Added release notes for 1.6 highlighting simulation optimizations and tests.
Comments suppressed due to low confidence (3)

R/helpSimulatePedigree_beta.R:196

  • markPotentialChildren_beta() removes the last column with df_Ngen <- df_Ngen[, -ncol(df_Ngen)], which is brittle if column order changes. Remove coupleId by name instead (or avoid relying on it being the last column).
    R/helpSimulatePedigree_beta.R:158
  • The CoupleF documentation above this function still reads like placeholder text (“IT MIGHT BE…”). Please replace with a precise definition (e.g., number of couple IDs to label as producing daughters) and any constraints so the beta helper docs are usable.
    R/helpSimulatePedigree_beta.R:21
  • The example block documents createGenDataFrame_beta(), but the example calls createGenDataFrame(...). This is confusing for readers and may fail if the intent is to demonstrate the beta helper. Update the example to call the correct function name (or explicitly document why the non-beta wrapper is shown here).

Comment on lines 222 to +235
# r is I + A + A^2 + ... = (I-A)^-1 from RAM
# could trim, here
## it keeps going until it explains all of the relatedness with themselves (i.e., mtSum == 0)
# some of this precision is articifuial because we literally get to the point that the condon is eaither there or not. probabiliticy

# how much percision do we need to get unbiased estimates

# big matrix still happens here because the network is built. just less percise on inbreeding

# bias-precision tradeoff. how much percision do we need to get unbiased estimates? not a lot
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These newly added inline comments are hard to parse and contain multiple spelling errors, which makes the algorithm rationale unclear. Please rewrite or remove them (keeping only a concise explanation of the stopping criterion and precision tradeoff).

Suggested change
# r is I + A + A^2 + ... = (I-A)^-1 from RAM
# could trim, here
## it keeps going until it explains all of the relatedness with themselves (i.e., mtSum == 0)
# some of this precision is articifuial because we literally get to the point that the condon is eaither there or not. probabiliticy
# how much percision do we need to get unbiased estimates
# big matrix still happens here because the network is built. just less percise on inbreeding
# bias-precision tradeoff. how much percision do we need to get unbiased estimates? not a lot
# r accumulates I + A + A^2 + ... ≈ (I - A)^-1 from RAM path tracing
# Iterate until no additional relatedness is contributed (mtSum == 0) or max_gen is reached.
# max_gen controls the bias–precision tradeoff: larger values give more precise but more expensive estimates.

Copilot uses AI. Check for mistakes.
Comment on lines 20 to 38
if (beta == TRUE || beta == "optimized") {
df_Fam <- buildWithinGenerations_optimized(
sizeGens = sizeGens,
marR = marR,
sexR = sexR,
Ngen = Ngen,
verbose = verbose,
personID = personID,
momID = momID,
dadID = dadID,
code_male = code_male,
code_female = code_female,
fam_shift = fam_shift,
beta = beta
)
} else if (beta %in% c("index", "indexed")) {
stop("The 'index' or 'indexed' option for parameter 'beta' is not yet implemented.")
} else if (beta == FALSE || is.null(beta)) {
df_Fam <- buildWithinGenerations_base(
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildWithinGenerations() claims to accept beta = "optimized" / "indexed", but it passes that value down to helpers that do if (beta) (e.g., createGenDataFrame()), which will error for non-logical values. Either restrict beta to a strict logical (and update error text/docs), or normalize string aliases to TRUE/FALSE before calling downstream functions.

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback normalize string aliases to TRUE/FALSE before calling downstream functions.

smasongarrison and others added 2 commits January 27, 2026 15:55
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@smasongarrison smasongarrison force-pushed the dev_main branch 2 times, most recently from 3be9105 to 4d0eda4 Compare January 28, 2026 02:46
Removed detailed argument and value sections from Roxygen and Rd docs for several internal functions, keeping only concise descriptions and titles. Improved test coverage in test-simulatePedigree.R by adding a new test for sex ratio imbalance in the opposite direction, refactoring sex ratio checks for clarity, and introducing stricter and more consistent tolerance variables.
Copy link
Contributor

Copilot AI commented Jan 28, 2026

@smasongarrison I've opened a new pull request, #111, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits January 27, 2026 22:27
…ctions (#111)

* Initial plan

* Normalize beta parameter string aliases to TRUE/FALSE before calling downstream functions

Co-authored-by: smasongarrison <6001608+smasongarrison@users.noreply.github.com>

* Fix inconsistencies in beta parameter validation and add support for all string aliases

Co-authored-by: smasongarrison <6001608+smasongarrison@users.noreply.github.com>

* Fix test for original alias to use proper seed and add test for index alias

Co-authored-by: smasongarrison <6001608+smasongarrison@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: smasongarrison <6001608+smasongarrison@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants