GENOMEWIDE.READ.COUNT does not equal the sum of REGION.READ.COUNT #38

boxiangliu · 2015-12-30T00:59:21Z

Hello Bryce and Graham,

I notice that the sum of REGION.READ.COUNT is less than GENOMEWIDE.READ.COUNT. Shouldn't they be equal? If not, why? Sorry if I am missing something obvious!

From reading extract_haplotype_read_counts.py, GENOMEWIDE.READ.COUNT is obtained directly from the h5 file, so naively I would assume that some of the reads do not overlap linked regions, and therefore GENOMEWIDE.READ.COUNT is greater than the sum of REGION.READ.COUNT.

More importantly, if these two quantities are not equal, how should I calculate GENOMEWIDE.READ.COUNT after permuting REGION.READ.COUNT (for test calibration)? Thanks!

gmcvicker · 2015-12-30T21:13:26Z

Hi Bosh,

You are correct that GENOMEWIDE.READ.COUNT is the total number of genome-wide filtered mapped reads, not simply the number of reads in the linked regions being tested. This is intentional, and the value is used as a scaling coefficient for each individual's read counts.

The way we do the shuffling is to shuffle the genotypes across individuals while leaving each individual's read counts unchanged. This way the region read counts are kept appropriately paired with that individual's total read counts, so GENOMEWIDE.READ.COUNT does not need to be recalculated. The combined_test.py script has a --shuffle option that will do this for you.

Graham
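
For readers who want to see the idea concretely, the shuffling described above can be sketched roughly as follows. This is only an illustration of permuting genotypes while keeping each individual's counts together; it is not the code used by combined_test.py, and the record layout, the field names (`genotype`, `region_read_count`, `genomewide_read_count`), the function name `shuffle_genotypes`, and the toy numbers are all assumptions made up for the example.

```python
import random


def shuffle_genotypes(records, seed=None):
    """Return copies of the records with genotypes permuted across individuals.

    Hypothetical helper: each record is a dict describing one individual.
    Only the 'genotype' field moves; the region and genome-wide read counts
    stay attached to the individual they were measured in.
    """
    rng = random.Random(seed)
    genotypes = [rec["genotype"] for rec in records]
    rng.shuffle(genotypes)  # permute genotype labels only
    return [dict(rec, genotype=g) for rec, g in zip(records, genotypes)]


# Toy data (invented for illustration, not taken from any real run).
individuals = [
    {"id": "ind1", "genotype": 0, "region_read_count": 12, "genomewide_read_count": 1000},
    {"id": "ind2", "genotype": 1, "region_read_count": 30, "genomewide_read_count": 2500},
    {"id": "ind3", "genotype": 2, "region_read_count": 45, "genomewide_read_count": 1800},
    {"id": "ind4", "genotype": 1, "region_read_count": 20, "genomewide_read_count": 1500},
]

shuffled = shuffle_genotypes(individuals, seed=1)

for rec in shuffled:
    print(rec["id"], rec["genotype"], rec["region_read_count"], rec["genomewide_read_count"])
```

Because the counts are never permuted, each individual's region count is still scaled by that same individual's genome-wide count, so nothing has to be recomputed after the shuffle.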

gmcvicker closed this as completed Dec 30, 2015
