
GENOMEWIDE.READ.COUNT does not equal the sum of REGION.READ.COUNT #38

Closed
boxiangliu opened this issue Dec 30, 2015 · 1 comment

@boxiangliu

Hello Bryce and Graham,

I notice that the sum of REGION.READ.COUNT is less than GENOMEWIDE.READ.COUNT. Shouldn't they be equal? If not, why? Sorry if I am missing something obvious!

From reading extract_haplotype_read_counts.py, GENOMEWIDE.READ.COUNT is obtained directly from the h5 file, so naively I would assume that some of the reads do not overlap linked regions, and therefore GENOMEWIDE.READ.COUNT is greater than the sum of REGION.READ.COUNT.

More importantly, if these two quantities are not equal, how should I calculate GENOMEWIDE.READ.COUNT after permuting REGION.READ.COUNT (for test calibration)? Thanks!

@gmcvicker
Collaborator

Hi Bosh,

You are correct that GENOMEWIDE.READ.COUNT is the total number of genome-wide filtered mapped reads, not simply the number of reads in the linked regions being tested. This is intentional: the value is used as a scaling coefficient for each individual's read counts.
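To illustrate why a genome-wide total works as a per-individual scaling coefficient, here is a minimal sketch that expresses each individual's region count as reads per million genome-wide mapped reads. This is only an illustration of depth normalization under that assumption; the function name `normalized_region_counts` is hypothetical, and combined_test.py may use the total differently inside its likelihood model.

```python
from typing import List, Sequence


def normalized_region_counts(
    region_counts: Sequence[int], genomewide_counts: Sequence[int]
) -> List[float]:
    """Hypothetical helper: scale each individual's region read count by that
    same individual's genome-wide mapped read total (reads per million).

    Individuals sequenced more deeply have larger genome-wide totals, so
    dividing by the total keeps them from dominating a region-level comparison.
    """
    if len(region_counts) != len(genomewide_counts):
        raise ValueError("need one genome-wide total per individual")
    if any(g <= 0 for g in genomewide_counts):
        raise ValueError("genome-wide totals must be positive")
    # Multiply before dividing so round-number inputs stay exact in floating point.
    return [r * 1e6 / g for r, g in zip(region_counts, genomewide_counts)]


if __name__ == "__main__":
    # Individual B has twice the reads in the region but also twice the depth,
    # so after scaling the two individuals look the same.
    print(normalized_region_counts([10, 20], [1_000_000, 2_000_000]))
```

This is also why the region totals should not be re-derived from the tested regions: the coefficient is meant to reflect each library's overall sequencing depth, which includes reads that fall outside any linked region.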

The way we do the shuffling is to shuffle the genotypes across individuals while preserving the counts. This way the region read counts are kept appropriately paired with the total read counts. The combined_test.py script has a --shuffle option that will do this for you.
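A minimal sketch of the permutation idea described above: the genotype labels are permuted across individuals, while each individual's REGION.READ.COUNT and GENOMEWIDE.READ.COUNT stay together on the same record. The `Individual` record, its fields, and `shuffle_genotypes` are hypothetical names for illustration; this is not the code behind combined_test.py's --shuffle option, which also handles haplotype-level data not modeled here.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Individual:
    # Hypothetical per-individual record for one test region.
    genotype: int                # assumed encoding: alt-allele dosage 0, 1 or 2
    region_read_count: int       # REGION.READ.COUNT for this individual
    genomewide_read_count: int   # GENOMEWIDE.READ.COUNT for this individual


def shuffle_genotypes(
    individuals: Sequence[Individual], rng: random.Random
) -> List[Individual]:
    """Permute genotypes across individuals, leaving read counts untouched.

    Because only the genotype column moves, every region count remains paired
    with the genome-wide total from the same library, so no total needs to be
    recomputed after shuffling.
    """
    genotypes = [ind.genotype for ind in individuals]
    rng.shuffle(genotypes)  # in-place permutation of the genotype labels only
    return [
        Individual(g, ind.region_read_count, ind.genomewide_read_count)
        for g, ind in zip(genotypes, individuals)
    ]


if __name__ == "__main__":
    data = [Individual(0, 5, 100), Individual(1, 7, 200), Individual(2, 9, 300)]
    for ind in shuffle_genotypes(data, random.Random(0)):
        print(ind)
```

Shuffling the genotypes rather than the counts is what answers the calibration question: the null distribution breaks any genotype-to-expression association while keeping each library's depth information intact.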

Graham
