Change to default minoverlap in subsetByOverlaps() feels unintuitive #1
Comments
Hi Pete,
Thanks, Hervé! I appreciate you're trying to balance two different edge cases. I guess I'm unclear why the default
That's a good point. It would indeed be better to use the same default everywhere. Right now
I agree that the impact of changing My preference would be retaining the old default of
I agree about not changing the behavior unless absolutely necessary. No one would ever expect
Default behavior on zero-width ranges has to change because it's broken. After giving this more thought I'm leaning towards having
Is it really broken or do we just need a convenient way to find overlaps for single positions? One option is to simply
Broken. IMO [7,6] and [1,10] should be considered to overlap (by default) even though the intersection is empty (BTW a zero-width range is not a single position). The criterion for overlap should not be that there is a non-empty intersection between the 2 ranges. I think it's more useful to look at the position of one range w.r.t. the other. In other words, a more sensible (default) criterion is With this criterion:
The last 2 situations are extreme edge cases where the relative position of the 2 ranges is ambiguous: you can either consider that the 1st range is inside the other or that it's adjacent to it. There is no unique answer and different people will have different opinions on that. Anyway the Using
Ok, but wow, those default parameters are not intuitive at all, unfortunately. I have no intuition what a maxgap of -1 would mean, and it's strange to see the minoverlap defaulting to 0.
Maybe that's because you've been with a mindset where overlap means non-empty intersection for too long? The fact that 10 years ago you chose these defaults for
I guess
A gap > 0 means the 2 ranges are disjoint and separated by 1 or more nucleotides. A gap of 0 means they are adjacent. By extrapolation a gap of -1 just means the 2 ranges are "even closer than adjacent" i.e. one range has its start or end strictly inside the other. So An alternative would be to add an extra argument to
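The gap convention described above can be sketched in a few lines of plain Python. This is only an illustrative model of the convention as stated in this thread, not the IRanges implementation: the function names `gap`, `intersection_width`, and `is_hit` are hypothetical, `minoverlap` is deliberately left out, and ranges are assumed to be closed integer intervals with a zero-width range written as `[start, start - 1]`.

```python
def gap(start1, end1, start2, end2):
    """Gap between two closed integer ranges, per the convention in this thread.

    > 0 : disjoint, separated by that many positions
      0 : adjacent
     -1 : non-disjoint (neither range lies entirely to one side of the other)
    A zero-width range is written with end == start - 1, e.g. [7, 6].
    """
    if end1 < start2:            # range 1 lies entirely to the left of range 2
        return start2 - end1 - 1
    if end2 < start1:            # range 1 lies entirely to the right of range 2
        return start1 - end2 - 1
    return -1                    # by convention, "even closer than adjacent"


def intersection_width(start1, end1, start2, end2):
    """Number of positions shared by the two ranges (0 if none)."""
    return max(0, min(end1, end2) - max(start1, start2) + 1)


def is_hit(start1, end1, start2, end2, maxgap=-1):
    """Hypothetical hit test: a pair is a hit when its gap is <= maxgap."""
    return gap(start1, end1, start2, end2) <= maxgap


if __name__ == "__main__":
    print(gap(1, 5, 8, 10))                 # disjoint, two positions apart
    print(gap(1, 5, 6, 10))                 # adjacent
    print(gap(1, 5, 4, 10))                 # non-disjoint
    # Zero-width range [7, 6] against [1, 10]: empty intersection, yet a hit.
    print(intersection_width(7, 6, 1, 10), is_hit(7, 6, 1, 10))
```

Under this model, the zero-width range [7,6] has an empty intersection with [1,10] but still counts as a hit with `maxgap=-1`, while adjacent ranges only become hits once `maxgap` is raised to 0. For a zero-width range sitting exactly at the boundary of the other range, the sketch picks the "adjacent" reading, which is an assumption; the thread notes that this case is ambiguous.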
I made the change in IRanges 2.11.16, GenomicRanges 1.29.14, GenomicAlignments 1.13.6, and SummarizedExperiment 1.7.8. Hard to anticipate the impact of this change. Hopefully it won't be too bad... @PeteHaitch Modifying the definitions of the
@hpages I've updated GenomicTuples and it builds locally without error. Will push to BioC ASAP
Sounds good!
The change of the default value of minoverlap to be 1 for subsetByOverlaps() feels really unintuitive to me (78ed68a). E.g., these used to have the same default behaviour (which felt intuitive) but no longer do:

Might there be a way to get the 'correct' behaviour for zero-width ranges in subsetByOverlaps() without changing the default? I'd expect more people to be confused by the new behaviour than by dropped zero-width ranges.

Cheers,
Pete