-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: solidify start #63
base: master
Are you sure you want to change the base?
Conversation
I think the code part of this is ready to merge (just had to find where bed intervals were converted to regions). |
Hi @mrvollger , You could test this, but we really need Hao (@38) to have a look as I don't fully understand the datastructures used here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @brentp, could you please see my comment. I don't quite understand why the change is related to ssio/reader.rs.
Also, please don't mind the long delay. I finally got my internet service set up at my new place.
d4/src/ssio/reader.rs
Outdated
@@ -147,6 +149,8 @@ impl<R: Read + Seek> D4TrackReader<R> { | |||
)); | |||
} | |||
} | |||
} else { | |||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this introduce new bug. Basically this is the part that creates a view to secondary table partitions that contains the data points that is related to the query.
If break at this point, I believe this will introduce bug in some query for large interval which is larger than a secondary table partition.
So I am a little bit confused if the issue is just change the interval convention, what is the problem that needs to be fixed at this point.
To help me understand the change better, could you please give me some data that triggers a buggy output.
Thanks,
Hao
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @brentp, just let you know I've fixed issue #59 by this commit 000f9e6. And the expected output was all zero values. I explained roughly what is going on under issue #59. And I believe for this PR, you don't need to change the ssio/reader.rs implementation anymore. Just simply minus one should do the job. Please let me know if you have any questions. Thanks, |
ddda3c6
to
561ecd0
Compare
d4/src/ssio/reader.rs
Outdated
if overlap_begin < overlap_end { | ||
if overlap_begin == table_ref.begin || self.sfi.is_none() { | ||
if overlap_begin + 1 == table_ref.begin || self.sfi.is_none() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@38 do you think this +1 belongs here? I've forgotten all my previous testing, but I htink this might be required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need +1 at this point. Basically what is happening at this point is, we check if we need to read a secondary partition from the middle of the stream. When overlap_begin equals stream's begin, it means we can read the partition from the begin, therefore no need to visit the index.
I suspect what you previously seen is unrelated to this part, but might be another bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reverted all changes in ssio/reader.rs
d4 uses region syntax like: chr1:1-100, which is generally 1-based such that this region would translate to
chr1\t0\t100
in BED format.The internals of d4 are unclear, but I think this change is a start.
@38 could you have a look at the changes in ssio/reader.rs, the break might not work if things are not sorted (though they appear to be). The change of the if statement at line 136/137 is related to the +/- 1 used internally within d4 (which I don't yet understand).
This change passes all tests and fixes the problems seen in #59