New ts_ibd function #123

Merged
merged 20 commits into from
Mar 3, 2023

@bodkan bodkan commented Mar 2, 2023

This adds a new function -- slendr's interface to TreeSequence.ibd_segments().

As explained in the ?ts_ibd manpage, this is not a true wrapper. R handles heavy iteration extremely poorly, so the iteration-based use cases documented for tskit wouldn't really work here, certainly not for large tree sequences.

Instead, ts_ibd() collects all requested IBD data and returns the results as a plain data frame: either all individual IBD segments when coordinates = TRUE, or the count and total amount of IBD shared by each pair when coordinates = FALSE (the default). (EDIT: for spatial tree sequences, the returned IBD table is now fully spatially annotated and is of the sf data type.)
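The difference between the two output modes can be illustrated with a small, library-free sketch. It is written in Python purely to show the idea and is not slendr's implementation: the `Segment` tuple, the `summarize_ibd` helper, and the output column names are all assumptions made up for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    """One IBD segment shared by a pair of nodes (hypothetical layout)."""
    node1: int
    node2: int
    left: float
    right: float


def summarize_ibd(segments):
    """Collapse per-segment rows into one row per node pair.

    Mirrors the idea behind coordinates = FALSE: report only the number
    of segments and their total length for each pair.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for seg in segments:
        # Treat (a, b) and (b, a) as the same pair.
        pair = (min(seg.node1, seg.node2), max(seg.node1, seg.node2))
        counts[pair] += 1
        totals[pair] += seg.right - seg.left
    return [
        {"node1": a, "node2": b, "count": counts[(a, b)], "total": totals[(a, b)]}
        for (a, b) in sorted(counts)
    ]


# Per-segment table (the coordinates = TRUE view in this sketch).
segments = [
    Segment(0, 1, 0.0, 100.0),
    Segment(0, 1, 250.0, 400.0),
    Segment(2, 0, 50.0, 80.0),
]

# Per-pair summary (the coordinates = FALSE view in this sketch).
summary = summarize_ibd(segments)
```

The per-pair summary is much smaller than the per-segment table, which is presumably why it makes a sensible default.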

To keep things manageable, it is still possible to prune the IBD segments to be returned, either by setting the minimum length of an IBD segment to be considered or by setting the maximum age of an ancestor of an IBD pair. In fact, given how easy it is to choke on too much IBD, ts_ibd() emits a warning if the user requests all possible IBD segments (most likely an oversight during normal data analysis).
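A minimal sketch of this pruning-plus-warning behavior, again in plain Python and not taken from slendr's code. The `filter_ibd` function, its parameter names, the dictionary keys, and the inclusive treatment of the thresholds are assumptions for illustration only.

```python
import warnings


def filter_ibd(segments, minimum_length=None, maximum_time=None):
    """Keep segments passing the requested filters; warn if none are set.

    Each segment is a dict with hypothetical keys "left", "right" and
    "tmrca" (age of the ancestor from which the pair inherited the segment).
    """
    if minimum_length is None and maximum_time is None:
        warnings.warn(
            "No filter given: all IBD segments will be returned, "
            "which can be very slow for large tree sequences."
        )
    kept = []
    for seg in segments:
        length = seg["right"] - seg["left"]
        if minimum_length is not None and length < minimum_length:
            continue  # segment too short
        if maximum_time is not None and seg["tmrca"] > maximum_time:
            continue  # ancestor too old
        kept.append(seg)
    return kept


segments = [
    {"left": 0, "right": 500, "tmrca": 20},
    {"left": 500, "right": 520, "tmrca": 15},
    {"left": 600, "right": 1600, "tmrca": 900},
]
long_and_recent = filter_ibd(segments, minimum_length=100, maximum_time=100)
```

Calling `filter_ibd(segments)` with neither filter returns everything but raises a warning, mirroring the safeguard described above.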

Similarly, the within = and between = arguments are also supported. In line with the rest of slendr's ts_*() functions, these arguments accept symbolic names of individuals, not just integer node IDs.
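One way to picture the name handling is as a translation step that runs before the node IDs reach tskit, where `within` is a flat list of node IDs and `between` is a list of such lists. The sketch below is a hypothetical illustration in Python, not slendr's code: the `resolve_nodes` helper, the individual names, and the name-to-node table are invented for the example.

```python
def resolve_nodes(samples, name_to_nodes):
    """Translate individual names (or raw node IDs) into node IDs."""
    nodes = []
    for s in samples:
        if isinstance(s, str):
            # A diploid individual contributes both of its nodes.
            nodes.extend(name_to_nodes[s])
        else:
            nodes.append(int(s))
    return nodes


# Hypothetical lookup table: individual name -> its two node IDs.
name_to_nodes = {"pop_1": [0, 1], "pop_2": [2, 3], "other_1": [4, 5]}

# within: a single flat list of node IDs.
within = resolve_nodes(["pop_1", "pop_2"], name_to_nodes)

# between: one list of node IDs per group; names and IDs can be mixed.
between = [
    resolve_nodes(group, name_to_nodes)
    for group in (["pop_1"], ["other_1", 3])
]
```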

codecov-commenter commented Mar 2, 2023

Codecov Report

Merging #123 (aa097e7) into main (e73fd18) will increase coverage by 0.31%.
The diff coverage is 98.43%.

@@            Coverage Diff             @@
##             main     #123      +/-   ##
==========================================
+ Coverage   83.37%   83.69%   +0.31%     
==========================================
  Files           6        6              
  Lines        2996     3060      +64     
==========================================
+ Hits         2498     2561      +63     
- Misses        498      499       +1     
Impacted Files        Coverage Δ
R/tree-sequences.R    87.83% <98.43%> (+0.64%) ⬆️

bodkan commented Mar 2, 2023

IBD tracts collected from spatial tree sequences are now annotated with spatial coordinates of nodes and returned as spatial sf objects by default.


As an aside, note that ts_ibd() returns IBD data in a tabular format, as mentioned in the first post, and doesn't support iteration (and never will). Users who need to iterate over massive amounts of IBD can always do so from R via reticulate, just as shown in the tskit docs for Python. (Honestly though, at that point it's probably better to use Python.)

@bodkan bodkan merged commit 63f1423 into main Mar 3, 2023
@bodkan bodkan deleted the ts_ibd branch March 9, 2023 17:59
@bodkan bodkan restored the ts_ibd branch April 21, 2023 13:10
@bodkan bodkan deleted the ts_ibd branch April 21, 2023 13:13