Allow joins on regions that are within a threshold (instead of requiring overlap) #1473

Closed
devin-petersohn opened this Issue Apr 4, 2017 · 2 comments

3 participants

devin-petersohn commented Apr 4, 2017

To comply with BedTools requirements, we need an option to join regions that fall within some distance threshold of each other, instead of requiring them to overlap.

@devin-petersohn devin-petersohn self-assigned this Apr 4, 2017

fnothaft commented Apr 4, 2017

+1! We used to have something like this, actually. There are myriad ways to do this, but IMO the simplest way is to increase the width of the reference region keys on one side of the join. Not sure if you had an alternative approach in mind; this is but one approach out of many...
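
For readers following along, the key-widening idea can be sketched roughly as below. This is only an illustration under stated assumptions: `Region`, `pad`, and `joinWithin` are hypothetical names, not ADAM's API, coordinates are assumed to be 0-based and half-open, and the nested-loop join stands in for whatever distributed join the project actually uses.

```scala
object PaddedJoinSketch {
  // Hypothetical stand-in for a genomic interval (NOT ADAM's ReferenceRegion).
  // Assumption: coordinates are 0-based and half-open, i.e. [start, end).
  final case class Region(referenceName: String, start: Long, end: Long) {
    require(start >= 0 && end > start, "invalid interval")

    // Widen the interval by `by` bases on each side, clamping the start at 0.
    def pad(by: Long): Region =
      copy(start = math.max(0L, start - by), end = end + by)

    // True when both regions sit on the same reference and share at least one base.
    def overlaps(other: Region): Boolean =
      referenceName == other.referenceName &&
        start < other.end && other.start < end
  }

  // Naive nested-loop join, for illustration only: widen the keys on the
  // left side by `threshold`, then reuse the ordinary overlap predicate.
  // The original (unpadded) left region is what gets emitted.
  def joinWithin(left: Seq[Region], right: Seq[Region], threshold: Long): Seq[(Region, Region)] =
    for {
      l <- left
      r <- right
      if l.pad(threshold).overlaps(r)
    } yield (l, r)
}
```

With this convention, a pair matches when fewer than `threshold` bases separate the two regions, and a threshold of 0 falls back to a plain overlap join. Only one side is padded; padding both sides would double the effective threshold.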

devin-petersohn commented Apr 6, 2017

I've given this some thought, and I believe using ReferenceRegion.distance() is the best way to go about it. In my cleanup of ReferenceRegion.scala I am adding an isNearby() method that takes a threshold. This can completely replace the overlaps calls in all cases.
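
A rough sketch of how a distance-based `isNearby` check might look is given below. It is not the actual ReferenceRegion code: the `Region` class is a hypothetical stand-in, the half-open coordinate convention is assumed, and the distance convention (0 for overlapping regions, 1 for directly adjacent ones, `None` across different references) is an assumption made for this example.

```scala
object NearbySketch {
  // Hypothetical stand-in for a genomic interval (NOT ADAM's ReferenceRegion).
  // Assumption: coordinates are 0-based and half-open, i.e. [start, end).
  final case class Region(referenceName: String, start: Long, end: Long) {
    require(start >= 0 && end > start, "invalid interval")

    // True when both regions sit on the same reference and share at least one base.
    def overlaps(other: Region): Boolean =
      referenceName == other.referenceName &&
        start < other.end && other.start < end

    // Assumed convention for this sketch: None when the references differ,
    // Some(0) when the regions overlap, Some(1) when they are directly
    // adjacent, and one more for every base that separates them.
    def distance(other: Region): Option[Long] =
      if (referenceName != other.referenceName) None
      else if (overlaps(other)) Some(0L)
      else if (other.start >= end) Some(other.start - end + 1)
      else Some(start - other.end + 1)

    // Regions on different references are never nearby, because
    // Option.exists returns false for None.
    def isNearby(other: Region, threshold: Long): Boolean =
      distance(other).exists(_ <= threshold)
  }
}
```

Under these assumptions, `isNearby(other, 0)` gives the same answer as `overlaps(other)`, which is what would let a single thresholded predicate stand in for the overlap check.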
