Clean up ReferenceRegion.scala and add thresholded overlap and covers #1484
Conversation
Test PASSed.
Once this is merged in, I will update #1324 to reflect the ability to use thresholds to join.
Test PASSed.
result = 41 * result + (if (strand != null) strand.ordinal() else 0)
result
// add 37 here because this is the start of the hashCode
val nameHashCode = 37 + {
java.util.Objects.hash (http://docs.oracle.com/javase/8/docs/api/java/util/Objects.html#hash-java.lang.Object...-) handles all of this reasonably in a one-liner.
If performance is a concern, and if ReferenceRegion is immutable, the hash code could be computed in the constructor and assigned to a final Int field.
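As a sketch of that precomputed-hash idea (a simplified stand-in class, not the actual ADAM implementation; field names are illustrative):

```scala
// Hedged sketch: an immutable region that computes its hash once at
// construction, as suggested above. `Region` stands in for ReferenceRegion.
case class Region(referenceName: String, start: Long, end: Long) {
  // Objects.hash handles null fields and combining in one line; safe to
  // compute once because all fields are immutable.
  private val cachedHash: Int =
    java.util.Objects.hash(referenceName, Long.box(start), Long.box(end))

  override def hashCode: Int = cachedHash
}
```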
I don't know why we are implementing our own hashCode here; I'm just getting rid of the var. @fnothaft any ideas why we have a custom hashCode here?
Case classes have an auto-generated hashCode. I don't think we should be implementing one here. That said, what does the git blame for the hashCode function say? This may be perf sensitive.
The issue was #817
@fnothaft and I discussed today that a change to this should probably be in a different commit so it is easier to roll back in the case that things are broken.
Hi @devin-petersohn! Is this ready for another round of review?
I have actually encountered another couple of distance-based operations that belong here. Everything is ready, but I want to make sure there aren't any more that I need before I ask for another review round. I will let you know as soon as this is done.
Test PASSed.
ReferenceRegion(referenceName, max(start, region.start), min(end, region.end))
def intersection(other: ReferenceRegion): ReferenceRegion = {
require(overlaps(other), "Cannot calculate the intersection of non-overlapping regions.")
ReferenceRegion(referenceName, max(start, other.start), min(end, other.end))
We should set the strand here, no?
Also, thoughts on this method just calling def intersection(other: ReferenceRegion, minOverlap: Long) with minOverlap = 0?
+1 on the default, and yes good catch. We weren't setting the strand before, but we definitely should be.
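The delegation being discussed can be sketched like this (a simplified stand-in type; the real method would also need to set the strand, per the comment above):

```scala
// Simplified stand-in for ReferenceRegion: the no-threshold intersection
// delegates to the thresholded one with minOverlap = 0.
case class Interval(referenceName: String, start: Long, end: Long) {
  // Length of the overlap if on the same reference (negative when disjoint).
  def overlapsBy(other: Interval): Option[Long] =
    if (referenceName == other.referenceName)
      Some(math.min(end, other.end) - math.max(start, other.start))
    else
      None

  def intersection(other: Interval, minOverlap: Long): Interval = {
    require(overlapsBy(other).exists(_ >= minOverlap),
      "Other region does not meet minimum overlap provided.")
    Interval(referenceName, math.max(start, other.start), math.min(end, other.end))
  }

  // the reviewer's suggestion: default to a zero threshold
  def intersection(other: Interval): Interval = intersection(other, 0L)
}
```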
*/
def intersection(other: ReferenceRegion, minOverlap: Long): ReferenceRegion = {
require(overlapsBy(other).exists(_ >= minOverlap), "Other region does not meet minimum overlap provided")
ReferenceRegion(referenceName, max(start, other.start), min(end, other.end))
Ditto here WRT setting strand.
def hull(other: ReferenceRegion): ReferenceRegion = {
require(sameStrand(other), "Cannot compute convex hull of differently oriented regions.")
require(sameReferenceName(other), "Cannot compute convex hull of regions on different references.")
ReferenceRegion(referenceName, min(start, other.start), max(end, other.end))
Ditto here WRT setting strand.
/**
 * Returns whether two regions are nearby.
 *
 * Nearby regions may overlap, and have a thresholded distance between the
Wording is a bit funny, suggest:
Two regions are near each other if the distance between the two is less than the user provided distanceThreshold.
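Under that wording, the predicate can be sketched as follows (illustrative names, not the PR's exact code):

```scala
// Sketch: two regions on the same reference are "nearby" when the gap
// between them is at most distanceThreshold; overlapping regions have a
// gap of zero, so isNearby subsumes overlaps on the same reference.
case class Span(referenceName: String, start: Long, end: Long) {
  // Minimum distance between any point of this region and any point of the
  // other region (None when on different references).
  def distance(other: Span): Option[Long] =
    if (referenceName == other.referenceName)
      Some(math.max(0L, math.max(start, other.start) - math.min(end, other.end)))
    else
      None

  def isNearby(other: Span, distanceThreshold: Long): Boolean =
    distance(other).exists(_ <= distanceThreshold)
}
```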
 * region in the reference space.
 *
 * @note Distance here is defined as the minimum distance between any point
 * within this region, and any point within the other region we are measuring
Indent.
 * @return True if any section of the two regions overlap.
 */
def covers(other: ReferenceRegion, threshold: Long): Boolean = {
covers(other) || disorient.isNearby(other.disorient, threshold)
If you're going to disorient the regions, then reuse those and replace the covers(other) call, e.g.:
val thisDis = disorient
val otherDis = other.disorient
thisDis.overlaps(otherDis) || thisDis.isNearby(otherDis, threshold)
Also, doesn't isNearby return true if overlaps would return true?
 * @param other The other region.
 * @return True if the two are on the same strand, false otherwise
 */
def sameStrand(other: ReferenceRegion): Boolean = {
I don't think this really warrants being pulled out as a method. First, we're just replacing a field equality comparison. Second, we call this method many times when invariant checking. It seems silly to care about, but function dispatch overhead is real.
It was changed to this for readability. Won't the compiler just replace the method call with the equality statement inline anyway?
JIT could, but you would have to make the method final (or private). Even then, I am not sure that JIT could inline the function since other is a ReferenceRegion, which is not a final class. I'm not sure about that one though; the JVM doesn't have a clear contract around inlining.
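To illustrate the point (JIT behavior is not guaranteed; marking the method final only removes the override barrier):

```scala
// Marking the helper final (or making it private) means it cannot be
// overridden, which gives the JIT a monomorphic call site it has a better
// chance of inlining down to a plain field comparison.
class Oriented(val strand: String) {
  final def sameStrand(other: Oriented): Boolean = strand == other.strand
}
```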
 * @param other The other region.
 * @return True if the two are on the same reference name, false otherwise.
 */
def sameReferenceName(other: ReferenceRegion): Boolean = {
Ditto here.
result
// add 37 here because this is the start of the hashCode
val nameHashCode = 37 + {
if (referenceName != null) {
For both referenceName and strand, we should require them to be non-null on initialization and omit the checks here.
+1
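The non-null invariant could be enforced at construction roughly like this (a simplified stand-in class; names are illustrative):

```scala
// Sketch: validating referenceName when the region is built, so hashCode
// and equality never need null checks afterwards.
case class NamedRegion(referenceName: String, start: Long, end: Long) {
  require(referenceName != null, "referenceName must not be null.")
  require(start <= end, "start must not be after end.")
}
```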
What about unmapped reads? Should we default the referenceName?
We throw an IllegalArgumentException on unmapped reads.
Should we change this test code then? I am getting failures associated with creation where referenceName = null, but the test is not expecting failure.
Can you paste the stack trace you're getting? The duplicate marker shouldn't be setting referenceName = null; we do something else to deskew unmapped reads.
- unmapped reads *** FAILED ***
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 1 times, most recent failure: Lost task 1.0 in stage 1.0 (TID 5, localhost): java.lang.IllegalArgumentException: requirement failed: Failed when trying to create region null 0 1 on INDEPENDENT strand.
at scala.Predef$.require(Predef.scala:233)
at org.bdgenomics.adam.models.ReferenceRegion.&lt;init&gt;(ReferenceRegion.scala:324)
at org.bdgenomics.adam.models.ReferencePosition.&lt;init&gt;(ReferencePosition.scala:133)
at org.bdgenomics.adam.models.ReferencePosition$.apply(ReferencePosition.scala:111)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1.org$bdgenomics$adam$rdd$read$ReferencePositionPair$$anonfun$$getPos$1(ReferencePositionPair.scala:53)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1$$anonfun$apply$2.apply(ReferencePositionPair.scala:59)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1$$anonfun$apply$2.apply(ReferencePositionPair.scala:59)
at scala.Option.map(Option.scala:145)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1.apply(ReferencePositionPair.scala:59)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1.apply(ReferencePositionPair.scala:43)
at scala.Option.fold(Option.scala:157)
at org.apache.spark.rdd.Timer.time(Timer.scala:48)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$.apply(ReferencePositionPair.scala:43)
at org.bdgenomics.adam.rdd.read.MarkDuplicates$$anonfun$markBuckets$1.apply(MarkDuplicates.scala:114)
at org.bdgenomics.adam.rdd.read.MarkDuplicates$$anonfun$markBuckets$1.apply(MarkDuplicates.scala:114)
at org.apache.spark.rdd.RDD$$anonfun$keyBy$1$$anonfun$apply$56.apply(RDD.scala:1492)
at org.apache.spark.rdd.RDD$$anonfun$keyBy$1$$anonfun$apply$56.apply(RDD.scala:1492)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
at org.bdgenomics.adam.rdd.read.MarkDuplicatesSuite.org$bdgenomics$adam$rdd$read$MarkDuplicatesSuite$$markDuplicates(MarkDuplicatesSuite.scala:102)
at org.bdgenomics.adam.rdd.read.MarkDuplicatesSuite$$anonfun$6.apply$mcV$sp(MarkDuplicatesSuite.scala:158)
at org.bdgenomics.utils.misc.SparkFunSuite$$anonfun$sparkTest$1.apply$mcV$sp(SparkFunSuite.scala:102)
at org.bdgenomics.utils.misc.SparkFunSuite$$anonfun$sparkTest$1.apply(SparkFunSuite.scala:98)
at org.bdgenomics.utils.misc.SparkFunSuite$$anonfun$sparkTest$1.apply(SparkFunSuite.scala:98)
at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
at org.bdgenomics.adam.util.ADAMFunSuite.org$scalatest$BeforeAndAfter$$super$runTest(ADAMFunSuite.scala:24)
at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
at org.bdgenomics.adam.util.ADAMFunSuite.runTest(ADAMFunSuite.scala:24)
at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
at org.scalatest.Suite$class.run(Suite.scala:1424)
at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
at org.bdgenomics.adam.util.ADAMFunSuite.org$scalatest$BeforeAndAfter$$super$run(ADAMFunSuite.scala:24)
at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
at org.bdgenomics.adam.util.ADAMFunSuite.run(ADAMFunSuite.scala:24)
at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492)
at org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1528)
at org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1526)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.scalatest.Suite$class.runNestedSuites(Suite.scala:1526)
at org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:29)
at org.scalatest.Suite$class.run(Suite.scala:1421)
at org.scalatest.tools.DiscoverySuite.run(DiscoverySuite.scala:29)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
at org.scalatest.tools.Runner$.main(Runner.scala:860)
at org.scalatest.tools.Runner.main(Runner.scala)
Cause: java.lang.IllegalArgumentException: requirement failed: Failed when trying to create region null 0 1 on INDEPENDENT strand.
at scala.Predef$.require(Predef.scala:233)
at org.bdgenomics.adam.models.ReferenceRegion.&lt;init&gt;(ReferenceRegion.scala:324)
at org.bdgenomics.adam.models.ReferencePosition.&lt;init&gt;(ReferencePosition.scala:133)
at org.bdgenomics.adam.models.ReferencePosition$.apply(ReferencePosition.scala:111)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1.org$bdgenomics$adam$rdd$read$ReferencePositionPair$$anonfun$$getPos$1(ReferencePositionPair.scala:53)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1$$anonfun$apply$2.apply(ReferencePositionPair.scala:59)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1$$anonfun$apply$2.apply(ReferencePositionPair.scala:59)
at scala.Option.map(Option.scala:145)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1.apply(ReferencePositionPair.scala:59)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1.apply(ReferencePositionPair.scala:43)
at scala.Option.fold(Option.scala:157)
at org.apache.spark.rdd.Timer.time(Timer.scala:48)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$.apply(ReferencePositionPair.scala:43)
at org.bdgenomics.adam.rdd.read.MarkDuplicates$$anonfun$markBuckets$1.apply(MarkDuplicates.scala:114)
at org.bdgenomics.adam.rdd.read.MarkDuplicates$$anonfun$markBuckets$1.apply(MarkDuplicates.scala:114)
at org.apache.spark.rdd.RDD$$anonfun$keyBy$1$$anonfun$apply$56.apply(RDD.scala:1492)
at org.apache.spark.rdd.RDD$$anonfun$keyBy$1$$anonfun$apply$56.apply(RDD.scala:1492)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Ah, I see what's going on. The createUnmappedRead function in the test code is creating reads without sequences. Can you update that function to set the sequence on the read (doesn't matter what the sequence is, just has to be present)? We should probably add a require(read.getSequence != null) right before here.
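The suggested guard might look roughly like this (SimpleRead is a hypothetical stand-in for the Avro read record, not the actual ADAM type):

```scala
// Hypothetical sketch: fail fast when a read is missing its sequence,
// instead of failing later inside ReferenceRegion construction.
case class SimpleRead(sequence: String, start: Long)

def positionOf(read: SimpleRead): Long = {
  require(read.sequence != null, "Read must have a sequence set.")
  read.start
}
```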
0
}

val listOfHashCodes = List(nameHashCode, start.hashCode, end.hashCode, strandHashCode)
hashCode is in general a performance-critical operation. As such, allocating and then folding over a list is to be avoided. Why not just write out the multiply/sum chain?
I guess my functional programming side was working on this.
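Written out, the chain the reviewer describes looks like the following (constants follow the snippet quoted earlier in the thread; the class name is simplified, and this assumes non-null fields):

```scala
// The multiply/sum chain written out without allocating a List. Uses vals
// rather than a var, keeping the earlier goal of removing mutable state.
case class HashedRegion(referenceName: String, start: Long, end: Long) {
  override def hashCode: Int = {
    val h1 = 37 + referenceName.hashCode  // assumes referenceName is non-null
    val h2 = 41 * h1 + start.hashCode
    41 * h2 + end.hashCode
  }
}
```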
Test PASSed.
Test PASSed.
LGTM, suggested a few minor doc fixes
 *
 * @param region Region to intersect with.
 * @return A smaller reference region.
 * @throws IllegalArgumentException Thrown if regions are no within the distance threshold.
no → not
 * overlap.
 *
 * @param other Region to intersect with.
 * @param minOverlap The mimimum amount of positions that the two regions
The minimum amount of positions... → Minimum overlap between the two reference regions.
mimimum -> minimum
assert(referenceName == region.referenceName, "Cannot compute convex hull of regions on different references.")
ReferenceRegion(referenceName, min(start, region.start), max(end, region.end))
def hull(other: ReferenceRegion): ReferenceRegion = {
require(sameStrand(other), "Cannot compute convex hull of differently oriented regions.")
For require messages, use phrases or sentences or exclamations, but not all three :)
def subtract(other: ReferenceRegion, requireStranded: Boolean = false): List[ReferenceRegion] = {
val newRegionStrand =
if (requireStranded) {
require(overlaps(other), "Region $other is not overlapped by $this!")
... I only mention this because of my six-year-old's homework last night: Choose [., ?, !] for each of these sentences.
I would generally go with .
Few small docs suggestions! Otherwise LGTM!
 *
 * @param other Region to compare against.
 * @return Returns an option containing the number of positions of coverage
 * between two points. If the two regions do not cover, we return
cover, -> cover each other,
@@ -503,19 +600,80 @@ case class ReferenceRegion(
}

/**
 * Determines if two regions are colocated on the same strand.
I would drop the colocated -> Determines if two regions are on the same strand.
}

/**
 * Determines if two regions are colocated on the same reference name.
colocated on the same reference name. -> on the same contig.
 * @param requireStranded Whether or not to require other be on same strand.
 * @return A list containing the regions resulting from the subtraction.
 */
def subtract(other: ReferenceRegion, requireStranded: Boolean = false): List[ReferenceRegion] = {
Prefer Seq (or, even better, Iterable) to List in public APIs.
Strand.INDEPENDENT
}
val first = if (other.start > start) {
List(ReferenceRegion(referenceName,
Yeah, actually, just List -> Iterable these few ones. And probably List() -> Iterable.empty.
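Applied to subtract, that suggestion looks roughly like the following (a simplified stand-in; the real method also handles strand and reference name):

```scala
// Sketch: return Iterable from the public API and use Iterable.empty
// instead of List(), per the review comments. Assumes half-open [start, end).
case class Seg(start: Long, end: Long) {
  def subtract(other: Seg): Iterable[Seg] = {
    // piece of this segment to the left of the other, if any
    val left =
      if (other.start > start) Iterable(Seg(start, math.min(end, other.start)))
      else Iterable.empty
    // piece of this segment to the right of the other, if any
    val right =
      if (other.end < end) Iterable(Seg(math.max(start, other.end), end))
      else Iterable.empty
    left ++ right
  }
}
```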
More cleanup and adding isNearby
Adding threshold logic to overlaps and covers
Replacing var with fold call
Adding overlapsBy and coversBy and related tests
Adding subtract, addressing reviewer comments
Addressing reviewer comments
Addressing reviewer comments
8a2e011 to ebfc510
Thanks @devin-petersohn! I will merge this once tests pass.
Test PASSed.
Merged! Thanks @devin-petersohn!
Partially resolves #1473 and fully resolves #1474.