
Length calculation in summary is incorrect if intervals in the input file overlap #1073

Open
tdanhorn opened this issue Jan 3, 2024 · 0 comments

Comments


tdanhorn commented Jan 3, 2024

The value of `total_ivl_bp` (and of every statistic derived from it) in the output of `bedtools summary` is incorrect when intervals in the input BED/GTF/VCF file overlap. It is calculated as the plain sum of the interval lengths (end - start), so any region covered by more than one interval is counted more than once. This is easy to see with a BED file that contains overlapping intervals: compare the value for all chromosomes in the `total_ivl_bp` column with the output of `bedtools jaccard -a x.bed -b x.bed`, where the intersection and union are identical and equal the correct number of covered bases. (The same correct value is obtained by subtracting the summed interval lengths of the BED file's complement from the genome length.)
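A minimal Python sketch of the discrepancy, assuming 0-based, half-open BED coordinates and a small hypothetical set of intervals. The function names (`summed_bp`, `merged_bp`, `complement_bp`) are illustrative only and are not part of bedtools. `summed_bp` follows the sum-of-lengths calculation as described in this issue, `merged_bp` counts each covered base once (the value the self-comparison with `jaccard` reports), and `complement_bp` reproduces the genome-length-minus-complement check.

```python
from collections import defaultdict


def summed_bp(intervals):
    """Naive total: sum of (end - start); overlapping bases are counted more than once."""
    return sum(end - start for _, start, end in intervals)


def merged_bp(intervals):
    """Covered bases: merge overlapping intervals per chromosome, then sum the merged lengths."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlapping or book-ended: extend the current run
                cur_end = max(cur_end, end)
            else:  # gap: close the current run and start a new one
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def complement_bp(intervals, genome_sizes):
    """Bases not covered by any interval, given a chrom -> length mapping.

    Assumes every interval lies within its chromosome's bounds.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    gaps = 0
    for chrom, size in genome_sizes.items():
        pos = 0  # first position not yet known to be covered
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > pos:
                gaps += start - pos
            pos = max(pos, end)
        if size > pos:
            gaps += size - pos
    return gaps


if __name__ == "__main__":
    # Hypothetical input: chr1 intervals 0-100 and 50-150 overlap by 50 bp.
    intervals = [("chr1", 0, 100), ("chr1", 50, 150), ("chr1", 200, 250), ("chr2", 10, 20)]
    genome = {"chr1": 1000, "chr2": 500}
    print(summed_bp(intervals))  # 260 (the 50 bp overlap is counted twice)
    print(merged_bp(intervals))  # 210
    print(sum(genome.values()) - complement_bp(intervals, genome))  # 210
```

The merged total and the genome-minus-complement total agree, while the naive sum exceeds both by exactly the overlapping bases.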

@tdanhorn tdanhorn changed the title Length calculation in summary is incorrect if intervals in the inout file overlap Length calculation in summary is incorrect if intervals in the input file overlap Jan 3, 2024
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant