Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: solidify start #63

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

WIP: solidify start #63

wants to merge 4 commits into from

Conversation

brentp
Copy link
Collaborator

@brentp brentp commented Sep 12, 2022

d4 uses region syntax like: chr1:1-100, which is generally 1-based such that this region would translate to chr1\t0\t100 in BED format.
The internals of d4 are unclear, but I think this change is a start.

@38 could you have a look at the changes in ssio/reader.rs, the break might not work if things are not sorted (though they appear to be). The change of the if statement at line 136/137 is related to the +/- 1 used internally within d4 (which I don't yet understand).

This change passes all tests and fixes the problems seen in #59

@brentp
Copy link
Collaborator Author

brentp commented Sep 23, 2022

I think the code part of this is ready to merge (just had to find where bed intervals were converted to regions).

@mrvollger
Copy link
Contributor

@arq5x @38 can this be merged and a new release cut to fix #59, or are there more steps that need to be taken first? Maybe I can help?

@brentp
Copy link
Collaborator Author

brentp commented Nov 16, 2022

Hi @mrvollger , You could test this, but we really need Hao (@38) to have a look as I don't fully understand the datastructures used here.

@mrvollger
Copy link
Contributor

Hi @brentp, I understand, and thanks for making a first pass! I tested your changes this morning and it fixes the issue I have been having with my files. For now I will use your source, and wait for final word from @38.

Thanks!

Copy link
Owner

@38 38 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @brentp, could you please see my comment. I don't quite understand why the change is related to ssio/reader.rs.

Also, please don't mind the long delay. I finally got my internet service set up at my new place.

@@ -147,6 +149,8 @@ impl<R: Read + Seek> D4TrackReader<R> {
));
}
}
} else {
break;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this introduce new bug. Basically this is the part that creates a view to secondary table partitions that contains the data points that is related to the query.

If break at this point, I believe this will introduce bug in some query for large interval which is larger than a secondary table partition.

So I am a little bit confused if the issue is just change the interval convention, what is the problem that needs to be fixed at this point.

To help me understand the change better, could you please give me some data that triggers a buggy output.

Thanks,
Hao

Copy link
Contributor

@mrvollger mrvollger Nov 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @38, in case it helps I link to data that appears to be bugged in issue #59 and walk through the example.

Edit: never mind I see this is unrelated to 59 which you have fixed. Thanks!

@38
Copy link
Owner

38 commented Nov 18, 2022

Hi @brentp, just let you know I've fixed issue #59 by this commit 000f9e6.

And the expected output was all zero values. I explained roughly what is going on under issue #59.

And I believe for this PR, you don't need to change the ssio/reader.rs implementation anymore. Just simply minus one should do the job.

Please let me know if you have any questions.

Thanks,
Hao

@brentp brentp force-pushed the issue59 branch 2 times, most recently from ddda3c6 to 561ecd0 Compare November 18, 2022 16:51
if overlap_begin < overlap_end {
if overlap_begin == table_ref.begin || self.sfi.is_none() {
if overlap_begin + 1 == table_ref.begin || self.sfi.is_none() {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@38 do you think this +1 belongs here? I've forgotten all my previous testing, but I htink this might be required.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need +1 at this point. Basically what is happening at this point is, we check if we need to read a secondary partition from the middle of the stream. When overlap_begin equals stream's begin, it means we can read the partition from the begin, therefore no need to visit the index.

I suspect what you previously seen is unrelated to this part, but might be another bug.

Copy link
Collaborator Author

@brentp brentp Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reverted all changes in ssio/reader.rs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants