Low level pair coalescence counts #2932

nspope · 2024-04-18T00:44:36Z

Low level extension of #2915

codecov · 2024-04-18T01:21:24Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.62%. Comparing base (e6483fc) to head (62069ec).


@@           Coverage Diff           @@
##             main    #2932   +/-   ##
=======================================
  Coverage   89.62%   89.62%           
=======================================
  Files          29       29           
  Lines       30176    30176           
  Branches     5874     5874           
=======================================
  Hits        27044    27044           
  Misses       1793     1793           
  Partials     1339     1339

Flag	Coverage Δ
c-tests	`86.21% <ø> (ø)`
lwt-tests	`80.78% <ø> (ø)`
python-c-tests	`88.72% <ø> (ø)`
python-tests	`98.97% <ø> (ø)`


nspope · 2024-04-18T03:24:37Z

I'd like to generalize this algorithm slightly:

Currently, the output is either a num_windows by num_nodes array (which is very large), or a num_windows by num_time_windows array where the counts are summed within time windows.
Conceptually, the "nodes" output gives the empirical distribution of pair coalescence times in windows across the genome. That is, for each window we have a vector of RVs (node times) and a vector of weights (pair coalescence counts).
From this distributional viewpoint, there are lots of useful things that may be calculated: the empirical CDF, quantiles, moments, coalescence rates, etc. The "sum in time windows" option of the current implementation is one special case.
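To make the distributional view concrete, here is a minimal NumPy sketch for a single window. It is not the code in this PR: the function name `weighted_summaries` and its arguments are hypothetical, and it assumes the node times and pair coalescence counts for the window are already available as two equal-length arrays.

```python
import numpy as np


def weighted_summaries(node_times, pair_counts, probs=(0.25, 0.5, 0.75)):
    """Summarise one window's pair coalescence time distribution.

    Hypothetical helper: node_times are the values of the distribution and
    pair_counts are the (unnormalised) weights attached to them.
    """
    node_times = np.asarray(node_times, dtype=float)
    pair_counts = np.asarray(pair_counts, dtype=float)

    # Sort by time so that the cumulative sum of weights is the empirical CDF.
    order = np.argsort(node_times)
    times = node_times[order]
    weights = pair_counts[order]
    total = weights.sum()

    ecdf = np.cumsum(weights) / total
    mean = np.sum(weights * times) / total

    # Quantile for p: the smallest time whose ECDF value is at least p.
    idx = np.searchsorted(ecdf, probs, side="left")
    quantiles = times[np.minimum(idx, times.size - 1)]
    return times, ecdf, mean, quantiles


times, ecdf, mean, quantiles = weighted_summaries(
    [0.0, 1.0, 2.0, 4.0], [0, 1, 2, 1], probs=(0.5,)
)
```

In this toy example the weights sum to 4, so the ECDF is (0, 0.25, 0.75, 1), the mean coalescence time is 2.25, and the median is 2.0.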

So, I think it'd be useful to take the current algorithm, and have it apply a summary function at the end of each window. This would let us calculate any useful summary statistic without having to create a potentially humongous array (windows by nodes by indexes) as an intermediate.
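A rough sketch of what "apply a summary function at the end of each window" could look like, written as a toy rather than the real algorithm. Everything here is an assumption for illustration: `summarise_by_window`, `sum_in_time_windows`, and the `(window, node, count)` event stream are hypothetical stand-ins for whatever the tree-traversal code produces. The point is only that a single per-node buffer is reused across windows, so the windows by nodes array is never materialised.

```python
import numpy as np


def summarise_by_window(events, node_times, num_windows, summary_func, summary_dim):
    """Reduce per-node pair counts to a summary, one window at a time.

    events: iterable of (window, node, count) tuples, sorted by window
    (a hypothetical stand-in for the output of the real traversal).
    """
    node_times = np.asarray(node_times, dtype=float)
    output = np.zeros((num_windows, summary_dim))
    buffer = np.zeros(node_times.size)  # counts for the current window only
    current = 0

    def flush():
        # Summarise the finished window, then reuse the buffer.
        output[current] = summary_func(node_times, buffer)
        buffer[:] = 0

    for window, node, count in events:
        while current < window:
            flush()
            current += 1
        buffer[node] += count
    while current < num_windows:
        flush()
        current += 1
    return output


def sum_in_time_windows(breaks):
    """Summary function that sums counts within time windows [b_i, b_{i+1})."""
    breaks = np.asarray(breaks, dtype=float)
    num_bins = breaks.size - 1

    def func(node_times, counts):
        idx = np.searchsorted(breaks, node_times, side="right") - 1
        valid = (idx >= 0) & (idx < num_bins)
        return np.bincount(idx[valid], weights=counts[valid], minlength=num_bins)

    return func


events = [(0, 2, 1.0), (0, 3, 2.0), (1, 3, 1.0)]
node_times = [0.0, 0.0, 1.0, 2.5]
result = summarise_by_window(
    events, node_times, num_windows=2,
    summary_func=sum_in_time_windows([0.0, 2.0, np.inf]), summary_dim=2,
)
```

Here `result` is a 2 by 2 array (windows by time windows), and swapping in a different `summary_func` (quantiles, moments, and so on) changes only the last dimension of the output.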

The API would stay the same. Later on, we could add named methods for various summary statistics, and eventually expose a "general summary stat" interface, like is done for the other statistics.

jeromekelleher

Looks good to me - do you want to add the summary func stuff now before we start porting to C? Probably a good idea, if you want to do this in the short term.

python/tests/test_coalrate.py

Add proto_pair_coalescence_counts to tests

62069ec

jeromekelleher reviewed Apr 22, 2024

python/tests/test_coalrate.py (review thread resolved)
