Skip to content

two twin methods#118

Merged
smasongarrison merged 49 commits intodev_mainfrom
mztwins
Feb 16, 2026
Merged

two twin methods#118
smasongarrison merged 49 commits intodev_mainfrom
mztwins

Conversation

@smasongarrison
Copy link
Member

Pull Request Template

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (reference issue number here, if any)

Type of change

Please delete options that are not relevant (including this line).

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Test A
  • Test B

Test Configuration:

  • Operating System (Mac OS, Windows 10, etc.):
  • Text Editor (Atom, Eclipse, Vim, etc.):
  • Network (local, VPN, etc.):

Checklist:

Requirements Met:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the project documentation
  • My changes generate no new warnings or errors
  • I have checked my code and corrected any misspellings

Recommendations Met:

While these are not absolutely required for a contribution via pull request, we
do encourage these items are completed.

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes

smasongarrison and others added 7 commits February 11, 2026 13:20
Introduce twinID parameter across pedigree segmentation functions (ped2fam/.ped2id/ped2graph/ped2maternal/ped2paternal) and thread it through calls so twin IDs are considered when building family/graph structures. Add mz_twins logical option to ped2com (default FALSE) and, when TRUE and twinID is present, call addMZtwins(ped) to treat MZ twins as an additional parent–child relationship for relatedness computations. Also fix a minor typo in a comment.
* Fix MZ twins coded as 0.5 instead of 1.0 in relatedness matrix

Implement addMZtwins() which redirects one MZ co-twin's parent links
to point to the other twin before adjacency matrix construction. This
produces isPar[twin2, twin1] = 1.0 in the sparse matrix (two 0.5
entries summed), so path tracing yields relatedness 1 between MZ pairs.

Users provide a twinID column (and optionally zygosity) and pass
mz_twins=TRUE to ped2add()/ped2com(). DZ twins are left unchanged
when zygosity column is present.

https://claude.ai/code/session_01P3RQTYpWtAtheSqi4aPjR5

* Replace pedigree redirection with r2 column merge for MZ twins

The previous approach redirected twin2's parents to twin1 in the pedigree,
which inflated twin2's diagonal (1.5 instead of 1.0) and twin2-to-child
relatedness (0.75 instead of 0.5).

New approach: after path tracing but before tcrossprod, merge twin2's
column into twin1's in the r2 matrix. MZ twins share the same genetic
source, so this correctly produces:
- MZ twin relatedness = 1.0
- Self-relatedness = 1.0 (no inflation)
- Parent-child and all downstream values correct
- No post-hoc diagonal patching needed

https://claude.ai/code/session_01P3RQTYpWtAtheSqi4aPjR5

* Use symmetric column merge for MZ twins instead of zeroing

Both twin columns now get the same normalized values
(r2_merged = (col1 + col2) / sqrt(2)) so both twins remain
present and contribute equally. Produces the same final
relatedness matrix as the zero approach but without erasing
one twin from the genetic source matrix.

https://claude.ai/code/session_01P3RQTYpWtAtheSqi4aPjR5

* Merge-then-restore approach for MZ twins

Temporarily absorb twin2's column into twin1's before tcrossprod,
then copy twin1's row/col back to twin2 afterward. This keeps the
computation correct while ensuring neither twin is erased from the
final relatedness matrix.

https://claude.ai/code/session_01P3RQTYpWtAtheSqi4aPjR5

* Accept lowercase 'mz' and add MZ twin tests

Treat both "mz" and "MZ" as monozygotic in findMZtwins (zygosity check now uses %in% c("mz","MZ")). Minor formatting tweak to the verbose message. Added unit tests (tests/testthat/test-buildComponent.R) verifying that MZ twins are coded with relatedness 1 when mz_twins=TRUE, that siblings remain 0.5 when mz_twins=FALSE, self-relatedness stays 1, and parent-child relatedness is unchanged.

* fix tests

---------

Co-authored-by: Claude <noreply@anthropic.com>
Introduce mz_method (default "merge_before_tcrossprod") to ped2com and gate MZ twin column-merge/restore logic behind this option. Restrict the merge/restore behavior to the additive component, adjust verbose messages, and add a TODO outlining an alternative MZ handling approach. Update tests to cover MZ twin child relatedness cases and add clarifying comments for ped2fam string/numeric ID behavior.
Expose a new mz_method argument in ped2add (default 'addtwins') and forward it to ped2com. Update tests to pass mz_method = 'merging' when mz_twins is used so alternative MZ-twin handling is exercised. This lets callers choose the method for handling MZ twins without changing the default behavior.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for handling monozygotic (MZ) twins in relatedness-matrix generation (and associated documentation/tests), along with minor test/doc updates related to simulation “beta/optimized” modes.

Changes:

  • Introduce mz_twins / mz_method plumbing in ped2com() / ped2add() and add an internal findMZtwins() helper.
  • Extend pedigree segmentation wrappers to accept a twinID argument (documentation updated accordingly).
  • Expand/adjust test coverage for MZ twin behavior and update simulation beta-mode expectations/docs.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
R/buildComponent.R Adds MZ twin handling options to relatedness computation (mz_twins, mz_method).
R/constructAdjacency.R Adds internal findMZtwins() helper used to locate MZ co-twin pairs.
R/segmentPedigree.R Adds twinID parameter plumbed through family/line segmentation wrappers and ped2graph().
tests/testthat/test-buildComponent.R Adds tests asserting MZ co-twin relatedness = 1 (and downstream effects) for two methods.
tests/testthat/test-simulatePedigree.R Adjusts simulation expectations for beta/optimized behavior and formatting.
tests/testthat/test-segmentPedigree.R Adds clarifying comments for numeric vs string ID behavior in ped2fam().
man/ped2com.Rd Documents new mz_twins argument (but not mz_method).
man/ped2add.Rd Documents new mz_twins argument (but not mz_method).
man/findMZtwins.Rd Adds internal documentation page for findMZtwins().
man/ped2fam.Rd Adds twinID parameter documentation.
man/ped2graph.Rd Adds twinID parameter documentation.
man/ped2maternal.Rd Adds twinID parameter documentation.
man/ped2paternal.Rd Adds twinID parameter documentation.
man/simulatePedigree.Rd Expands beta documentation to include character aliases and reproducibility notes.
man/buildWithinGenerations.Rd Expands beta documentation similarly for within-generation building.
man/buildBetweenGenerations.Rd Expands beta documentation similarly for between-generation building.
man/adjustKidsPerCouple.Rd Expands beta documentation similarly for kids-per-couple adjustment.

Add cross-method validations for MZ twin handling. In data-raw/optimizing.R create r_mz1 and r_mz2 using ped2add with mz_method 'merging' and 'addtwins' and assert their sparse matrix internals (@i and @x) match. Update tests/testthat/test-buildComponent.R to loop over mz_method options when verifying parent/child relatedness for MZ twins and add an explicit equality assertion between the two method results. These changes ensure different mz_method implementations produce equivalent relatedness outputs.
smasongarrison and others added 13 commits February 12, 2026 21:49
Allow mz_method to be either "merging" or "addtwins" in key pedigree-processing branches and only perform parent redirection when mz_method == "merging". Move verbose message into the merging branch and simplify restoration checks for MZ pairs. Add a new test that verifies equivalence of additive matrices for merging vs addtwins (including sparse-matrix slot checks). Also fix an expectation bound in simulatePedigree tests. Files changed: R/buildComponent.R, tests/testthat/test-buildComponent.R, tests/testthat/test-simulatePedigree.R.
Create mz_id_pairs earlier when MZ twin pairs exist and streamline the merging branch to only run for mz_method == "merging". Remove redundant mz_id_pairs construction, keep logic that makes the second twin a founder and redirects its children to the first twin, and preserve the verbose merge message. Add object.size() calls in the test to inspect memory usage of intermediate matrices.
)

Two issues caused the sparse matrices from the two mz_method paths to
differ structurally:

1. The merging method iterated over mz_pairs (row indices) but treated
   the values as IDs. Changed to iterate over mz_id_pairs which holds
   the actual pedigree IDs. Currently masked because test data has
   sequential IDs matching row positions, but would break with
   non-sequential IDs.

2. The different computational paths (column-merge-before-tcrossprod vs
   pedigree-modify-then-post-copy) leave different explicit zeros in the
   sparse matrix. Added Matrix::drop0() before returning to normalize
   the sparse representation so both methods produce structurally
   identical results.

https://claude.ai/code/session_01PXbgV7bdA9sg7St6xBbg3T

Co-authored-by: Claude <noreply@anthropic.com>
Only restore MZ twin rows/cols in buildComponent when mz_method is merging/addtwins AND the component is 'additive' (buildComponent.R). In constructAdjacency.R adjust the .adjIndexed parameter layout and add a new isTwin helper to detect twins from the pedigree. Update tests to call ped2add without explicit mz_method, comment out object.size checks, and compare sparse adjacency matrices by converting to dense (as.matrix) and asserting their difference sums to zero to avoid incorrect sparse-matrix summation.
* Fix twin matrix size mismatch between addtwins and merging methods

Two issues caused the sparse matrices from the two mz_method paths to
differ structurally:

1. The merging method iterated over mz_pairs (row indices) but treated
   the values as IDs. Changed to iterate over mz_id_pairs which holds
   the actual pedigree IDs. Currently masked because test data has
   sequential IDs matching row positions, but would break with
   non-sequential IDs.

2. The different computational paths (column-merge-before-tcrossprod vs
   pedigree-modify-then-post-copy) leave different explicit zeros in the
   sparse matrix. Added Matrix::drop0() before returning to normalize
   the sparse representation so both methods produce structurally
   identical results.

https://claude.ai/code/session_01PXbgV7bdA9sg7St6xBbg3T

* Fix sparse matrix class mismatch between MZ twin methods

The merging method's post-copy (r[idx2,] <- r[idx1,]) causes Matrix
to coerce from dsCMatrix (symmetric, one triangle) to dgCMatrix
(general, both triangles), doubling @i/@x slot lengths. Add
forceSymmetric() after the post-copy to restore symmetric storage.

Also guard the merging pedigree modification with a component check
(additive only) to match the restoration block, and move the verbose
message inside the merging if-block so it only prints when merging
actually occurs.

https://claude.ai/code/session_01PXbgV7bdA9sg7St6xBbg3T

* Fix test assertion: use sum(A - B) instead of sum(A, B)

sum(A, B) sums all elements of both matrices, which can never be 0
for non-negative relatedness values. Use sum(A - B) to check that the
element-wise differences are zero.

https://claude.ai/code/session_01PXbgV7bdA9sg7St6xBbg3T

---------

Co-authored-by: Claude <noreply@anthropic.com>
Refactor and expand the optimize/benchmark script to better compare mz handling and beta variants: changed seed, increased kpc, marR and reps; pre-generate pedigrees for 1/low/mid/high generation sizes; add cases for mz_method ("merging" vs "addtwins") and beta_null; adjust gen factor mapping and use beta_factor in regressions and plots. Also added assertions in simulatePedigree tests to verify that parents exist for all generations >1 (and none in generation 1), fixed a spacing/logic expression in an existing length check, and kept existing sex-ratio and generation checks. These changes improve performance analysis coverage and strengthen structural tests.
Fix call in buildComponent.R to pass mz_row_pairs through to fuseTwins. Enhance fuseTwins by adding a test_df_twins parameter to return the intermediate df_twins for testing, refactoring the fusion-condition logic, validating inputs, and emitting clearer messages when no MZ twin pairs are found. Add unit tests (tests/testthat/test-helpTwins.R) to exercise findMZtwins/fuseTwins with different return modes and ensure consistent outputs.
Introduce a beta benchmarking flag and streamline MZ twin handling across the package: remove per-pair loops in ped2com in favor of batch operations, expose beta in documentation (R and Rd updates), and ensure rows/cols are copied in batch. Add a new fuseTwins helper (with man page) to support fusing MZ pairs for path tracing, and expand findMZtwins/help docs to describe return formats and the beta option. Update data-raw/optimizing.R to incorporate the beta factor into models/plots, comment out some high-gen benchmark cases, switch to boxplots, and adjust plot scales. Remove an unused vignette figure PNG.
smasongarrison and others added 3 commits February 14, 2026 23:09
Apply code-style and whitespace cleanup across twin-related code and tests for readability and consistency. Changes include reflowing long function calls onto multiple lines, normalizing comma/paren placement, adding braces around single-line if/next for clarity, and removing extraneous blank lines. Updated files: R/buildComponent.R, R/helpTwins.R, and tests/testthat/test-buildComponent.R and test-helpTwins.R. These edits are non-functional and intended only to improve maintainability and style.
Allow specifying only one twin or mate ID and auto-find a suitable partner. makeTwins now builds an ID-to-row map for O(1) lookups and supports cases where only ID_twin1 or only ID_twin2 is provided (selects sibling by zygosity/sex). makeInbreeding gains symmetric logic to auto-select ID_mate2/ID_mate1 when only one mate is given. dropLink standardizes pedigree column names (with an optional verbose flag) before processing. Updated tests to cover these single-ID behaviors and adjusted an existing MZ twin test; NEWS updated to mention the tweakPedigree change.
Allow specifying only one twin or mate ID and auto-find a suitable partner. makeTwins now builds an ID-to-row map for O(1) lookups and supports cases where only ID_twin1 or only ID_twin2 is provided (selects sibling by zygosity/sex). makeInbreeding gains symmetric logic to auto-select ID_mate2/ID_mate1 when only one mate is given. dropLink standardizes pedigree column names (with an optional verbose flag) before processing. Updated tests to cover these single-ID behaviors and adjusted an existing MZ twin test; NEWS updated to mention the tweakPedigree change.
@smasongarrison smasongarrison merged commit f8e409c into dev_main Feb 16, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant