New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tabix indexed BedTool evaluates to None #132
Comments
Hi Rory - import pybedtools
foo = pybedtools.example_bedtool('b.bed')
if foo:
print "Before tabix indexing; foo is None?:", foo is None
foo = foo.tabix()
if foo:
print "After tabix indexing; foo is None?:", foo is None prints:
So in both cases |
Hi Ryan, Thanks for the reply. Weird, pybedtools is version 0.6.9 and tabix is version 1.2.1. If I run the code you posted I just get the first line and not the second. This is on OSX 10.10.2. The development version of pybedtools has the same behavior as well. I was one tic behind with cython and tried installing it again with the most recent version but it has the same behavior. Any other ideas? |
Hmm. My tests are on Ubuntu 14.04, but I have an OSX 10.9 machine that I'll try tomorrow. If I can't reproduce with that, I'll track down one with 10.10.2. The only thing I can think of off the top of my head is that the bgzipped file is empty for some reason. Here's a a paranoid test case that ensures everything is being cleaned up and using import pybedtools
from pybedtools.bedtool import isBGZIP
import gzip
import os
template = """
fn : {0}
isBGZIP : {1}
._tabixed() : {2}
length : {3}
is None : {4}
contents :
{5}
"""
ex_fn = 'example.bed'
os.system('cp %s %s' % (pybedtools.example_filename('b.bed'), ex_fn))
def clean():
"""
delete the .gz and .gz.tbi files for the example filename
"""
for f in [ex_fn + '.gz', ex_fn + '.gz.tbi']:
if os.path.exists(f):
os.unlink(f)
def check(i):
if i.fn.endswith('.gz'):
contents = gzip.open(i.fn).read()
else:
contents = open(i.fn).read()
print template.format(i.fn, isBGZIP(i.fn), i._tabixed(), len(i), i is None, contents)
# base case: not BGZIPed, not tabixed()
a = pybedtools.BedTool(ex_fn)
clean()
check(a)
clean()
# convert `a` to gzip. Should not be BGZIPed, not tabixed()
with gzip.open(a.fn + '_gzipped_not_bgzipped.gz', 'w') as fout:
fout.write(open(a.fn).read())
b = pybedtools.BedTool(a.fn + '_gzipped_not_bgzipped.gz')
check(b)
clean()
# BGZIPed and tabixed()
c = a.tabix(force=True, in_place=False)
check(c)
clean()
# BGZIPed and tabixed()
d = a.tabix(force=True, in_place=True)
check(d)
clean() Output for me on Ubuntu is:
|
|
Hmm, so the tabixed versions are evaluating to None-like because, despite actually having contents in the file, pybedtools sees it as empty. Would you send me the last example.bed.gz file that's created so I can try debugging it? |
Hi Ryan, Thanks for debugging this. It doesn't actually create the last two files. If I bgzip and tabix index the example.bed file, I get the same behavior. I stuck the bgzipped file and the tabix index up here: https://dl.dropboxusercontent.com/u/2822886/example.tar |
Thanks for the file. But . . . argh, it's working fine for me on OSX 10.9.5. My guess is that it's on the Cython side. So 1) I'll test this on an OSX 10.10 machine and 2) I'll get together some more tests for the Cython side that you can try. |
Just reporting back that I get the same results as you reported here on OSX 10.10 after installing Xcode 6.3, tabix 1.2, and this gcc:
|
Hi Ryan, I'm rocking a slightly older version of clang, I haven't updated Xcode in a bit:
|
OK, thanks. I'm trying to reproduce this on a machine I can debug on, so I just upgraded my development Mac from 10.9 to 10.10 . . . and . . . the above test passes. The 10.10 machine I was able to reproduce on before was borrowed, and never had 10.9 on it. Next up is installing a fresh 10.10 onto a new partition to see how that goes. Back down the rabbit hole! |
So confusing; this machine was upgraded from 10.9 -> 10.10. |
No kidding confusing . . fresh install of 10.10 still works for me. I'm still stumped on the platform/version specificity, but I have a fix that hopefully avoids the issue. The problem is actually an inconsistency in samtools. Short answertry the latest commit (933417d) to see if that helps. Long answer(for my own reference in the future) Fundamentally, the problem was that the bgzipped file was being seen as length 0 by pybedtools, but there didn't seem to be anything funky with the file itself or with tabix. I tracked it down to improper creation of a BAM IntervalIterator when it should be just a regular IntervalIterator. That is, In And there's the problem. # Mac OSX 10.10
samtools view -H a_non_BAM_filename # exit = 0 (sometimes)
# Other platforms / OSX versions
samtools view -H a_non_BAM_filename # exit = 1 In a platform- and OS version-specific manner (still can't reliably reproduce but only has been seen on Mac OSX 10.10 Yosemite), that command incorrectly reports a exit code of 0. The same command on other platforms and other Mac OSX versions correctly return non-zero exit codes and print error messages to stderr. So the fix in 933417d is to avoid looking at the return code and instead look at the contents of the reported header. Now, the presence of an actual header means it's a BAM and the exit code is ignored. |
Hi Ryan, You are awesome, this fixes the issue for me. I Pinboarded this issue as my canonical example for open source software being amazing, I would have never figured this out. Thank you so much, you're a beast. |
Phew, glad it worked! That was a sneaky little bug. Nice easy fix in the end though. |
Hi Ryan,
Before tabix indexing, if a BedTool is non-empty, it will be treated as not None, but after tabix indexing it will not.
I'm cool with 'don't do that' as a valid answer if it wasn't intended to be used that way. :)
Thanks!
The text was updated successfully, but these errors were encountered: