-
-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bgzf/index: fix chunk reader iteration #14
Conversation
I don't grok why this works and why it didn't before. |
The file offset part of a virtual offset is the unique identifier for a BGZF block. I was using the end block of the last read (which may actually be the next block - so the condition failed, and we did not truncate the []byte to the correct length - thus read potentially up to the end of the BGZF block). The logic is still weird for me and I'm looking into it further. |
I've improved the clarity, properly terminate in the case of non-scanner reading (it was covering for an error in my logic which is now fixed). But please run the new code on your complete set to see if you find any other cases. PTAL |
9e97d38
to
f452016
Compare
// length of the remaining chunk data. | ||
var cursor uint16 | ||
if last.Begin.File == r.chunks[0].Begin.File { | ||
cursor = last.End.Block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I change this to:
cursor = r.chunks[0].Begin.Block
Then my tests (doing tons of intersections and comparing to tabix CLI tool) are getting a lot farther, but still failing eventually. I think it's in the block below for switching chunks to early??
PTAL I resorted to paper and pencil after constructing tests. |
ccf4e26
to
6b0377a
Compare
LGTM |
Nope, not yet. |
There is a remaining issue, though it doesn't break things, it just results in unused reads. The fix requires new API in bgzf.Reader. |
No tests. The test needs to come, but it requires engineering a set of blocks and index chunks to mimic the output shown in #10.
@brentp Please take a look.