
Clean up ReferenceRegion.scala and add thresholded overlap and covers #1484

Merged

Conversation

@devin-petersohn (Member) commented Apr 10, 2017

Partially resolves #1473 and fully resolves #1474.

@coveralls commented Apr 10, 2017


Coverage increased (+0.1%) to 81.779% when pulling 6855c05 on devin-petersohn:issue#1473ThresholdOverlaps into 93b32c6 on bigdatagenomics:master.

@AmplabJenkins commented Apr 10, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1939/

@devin-petersohn (Member, Author) commented Apr 10, 2017

Once this is merged in, I will update #1324 to reflect the ability to use thresholds to join.

@coveralls commented Apr 10, 2017


Coverage increased (+0.2%) to 81.8% when pulling af70a8d on devin-petersohn:issue#1473ThresholdOverlaps into 93b32c6 on bigdatagenomics:master.

@AmplabJenkins commented Apr 10, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1940/

@devin-petersohn changed the title from "Issue#1473 threshold overlaps" to "Clean up ReferenceRegion.scala and add thresholded overlap and covers" on Apr 10, 2017
result = 41 * result + (if (strand != null) strand.ordinal() else 0)
result
// add 37 here because this is the start of the hashCode
val nameHashCode = 37 + {

@heuermh (Member) commented Apr 10, 2017

http://docs.oracle.com/javase/8/docs/api/java/util/Objects.html#hash-java.lang.Object...- handles all of this reasonably in a one-liner.

If performance is a concern, and if ReferenceRegion is immutable, the hash code could be computed in the ctr and assigned to a final Int field.
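heuermh's two suggestions can be sketched together (a hypothetical stand-in class, not ADAM's actual code; `ReferenceRegion`'s real fields include a `Strand` enum, simplified here to a `String`):

```scala
import java.util.Objects

// Hypothetical stand-in for ReferenceRegion. Because the class is immutable,
// the hash is computed once at construction time and stored in a val, so
// repeated hashCode calls (e.g. during shuffles) pay no recomputation cost.
case class Region(referenceName: String, start: Long, end: Long, strand: String) {
  private val precomputedHash: Int =
    Objects.hash(referenceName, Long.box(start), Long.box(end), strand)

  override def hashCode: Int = precomputedHash
}
```

`Objects.hash` boxes its varargs arguments, which is part of why a hand-rolled chain can still be preferable on a truly hot path.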

@devin-petersohn (Member, Author) commented Apr 11, 2017

I don't know why we are implementing our own hashCode here; I'm just getting rid of the var. @fnothaft, any ideas why we have a custom hashCode here?

@fnothaft (Member) commented Apr 11, 2017

Case classes have an auto-genned hashCode. I don't think we should be implementing one here. That said, what's the git blame for the hashCode function say? This may be perf sensitive.

@devin-petersohn (Member, Author) commented Apr 11, 2017

c4b23e0

It might have been a performance issue. @heuermh asked the same question in 2015. Interesting.

@heuermh (Member) commented Apr 11, 2017

The issue was #817

@fnothaft (Member) commented Apr 11, 2017

Reading the commit message, that method was necessary to fix issue #817, which was a regression caused by 9a318ba from PR #624. I would say it definitely was not a performance issue... If we do delete the hash code method, we need to validate that it doesn't cause #817 to reappear.

@devin-petersohn (Member, Author) commented Apr 17, 2017

@fnothaft and I discussed today that a change to this should probably be in a different commit so it is easier to roll back in the case that things are broken.

@fnothaft (Member) commented Apr 15, 2017

Hi @devin-petersohn! Is this ready for another round of review?

@devin-petersohn (Member, Author) commented Apr 16, 2017

I have actually encountered another couple of distance-based operations that belong here. Everything is ready, but I want to make sure there aren't any more that I need before I ask for another round of review. I will let you know as soon as this is done.

@coveralls commented Apr 17, 2017


Coverage decreased (-0.03%) to 81.603% when pulling cdcb926 on devin-petersohn:issue#1473ThresholdOverlaps into 93b32c6 on bigdatagenomics:master.

@AmplabJenkins commented Apr 17, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1953/

@devin-petersohn devin-petersohn requested review from heuermh and fnothaft Apr 18, 2017
ReferenceRegion(referenceName, max(start, region.start), min(end, region.end))
def intersection(other: ReferenceRegion): ReferenceRegion = {
require(overlaps(other), "Cannot calculate the intersection of non-overlapping regions.")
ReferenceRegion(referenceName, max(start, other.start), min(end, other.end))

@fnothaft (Member) commented Apr 19, 2017

We should set the strand here, no?

@fnothaft (Member) commented Apr 19, 2017

Also, thoughts on this method just calling def intersection(other: ReferenceRegion, minOverlap: Long) with minOverlap = 0?

@devin-petersohn (Member, Author) commented Apr 19, 2017

+1 on the default, and yes good catch. We weren't setting the strand before, but we definitely should be.
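The agreed-on shape can be sketched with a simplified stand-in class (not ADAM's actual code; `overlapsBy` here is a plausible stand-in definition): the no-threshold `intersection` delegates with `minOverlap = 0`, and the result now carries the strand.

```scala
// Simplified stand-in for ReferenceRegion (Strand reduced to a String).
case class Region(referenceName: String, start: Long, end: Long, strand: String) {
  // width of the overlap, if the regions overlap at all
  def overlapsBy(other: Region): Option[Long] =
    if (referenceName == other.referenceName)
      Some(math.min(end, other.end) - math.max(start, other.start)).filter(_ > 0)
    else None

  // no-threshold variant delegates to the thresholded one
  def intersection(other: Region): Region =
    intersection(other, 0L)

  def intersection(other: Region, minOverlap: Long): Region = {
    require(overlapsBy(other).exists(_ >= minOverlap),
      "Other region does not meet minimum overlap provided.")
    // set the strand on the result, per the review comment above
    Region(referenceName, math.max(start, other.start), math.min(end, other.end), strand)
  }
}
```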

*/
def intersection(other: ReferenceRegion, minOverlap: Long): ReferenceRegion = {
require(overlapsBy(other).exists(_ >= minOverlap), "Other region does not meet minimum overlap provided")
ReferenceRegion(referenceName, max(start, other.start), min(end, other.end))

@fnothaft (Member) commented Apr 19, 2017

Ditto here WRT setting strand.

def hull(other: ReferenceRegion): ReferenceRegion = {
require(sameStrand(other), "Cannot compute convex hull of differently oriented regions.")
require(sameReferenceName(other), "Cannot compute convex hull of regions on different references.")
ReferenceRegion(referenceName, min(start, other.start), max(end, other.end))

@fnothaft (Member) commented Apr 19, 2017

Ditto here WRT setting strand.

/**
* Returns whether two regions are nearby.
*
* Nearby regions may overlap, and have a thresholded distance between the

@fnothaft (Member) commented Apr 19, 2017

Wording is a bit funny, suggest:

Two regions are near each other if the distance between the two is less than the user provided distanceThreshold.

* region in the reference space.
*
* @note Distance here is defined as the minimum distance between any point
* within this region, and any point within the other region we are measuring

@fnothaft (Member) commented Apr 19, 2017

Indent.

* @return True if any section of the two regions overlap.
*/
def covers(other: ReferenceRegion, threshold: Long): Boolean = {
covers(other) || disorient.isNearby(other.disorient, threshold)

@fnothaft (Member) commented Apr 19, 2017

If you're going to disorient the regions, then reuse those and replace the covers(other) call, e.g.:

val thisDis = disorient
val otherDis = other.disorient

thisDis.overlaps(otherDis) || thisDis.isNearby(otherDis, threshold)

Also, doesn't isNearby return true if overlaps would return true?
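fnothaft's second question hinges on how distance is defined: if the distance between overlapping regions on the same reference is 0, then `isNearby` with any non-negative threshold subsumes `overlaps`. A minimal sketch under that assumption (hypothetical stand-in definitions, not ADAM's code):

```scala
// Half-open intervals [start, end) on a named reference.
case class Region(referenceName: String, start: Long, end: Long) {
  def overlaps(other: Region): Boolean =
    referenceName == other.referenceName &&
      start < other.end && end > other.start

  // minimum distance between any point of this region and any point of other;
  // 0 when the regions overlap, None when on different references
  def distance(other: Region): Option[Long] =
    if (referenceName == other.referenceName)
      Some(math.max(0L, math.max(start - other.end, other.start - end)))
    else None

  def isNearby(other: Region, threshold: Long): Boolean =
    distance(other).exists(_ <= threshold)
}
```

Under this definition, `overlaps` implies `isNearby(_, 0)`, which supports the suggestion that the explicit `overlaps` disjunction may be redundant.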

* @param other The other region.
* @return True if the two are on the same strand, false otherwise
*/
def sameStrand(other: ReferenceRegion): Boolean = {

@fnothaft (Member) commented Apr 19, 2017

I don't think this really warrants being pulled out as a method. First, we're just replacing a field equality comparison. Second, we call this method many times when invariant checking. It seems silly to care about, but function dispatch overhead is real.

@devin-petersohn (Member, Author) commented Apr 19, 2017

It was changed to this for readability. Won't the compiler just replace the method call with the equality statement inline anyway?

@fnothaft (Member) commented Apr 19, 2017

JIT could, but you would have to make the method final (or private). Even then, I am not sure that JIT could inline the function since other is a ReferenceRegion which is not a final class. I'm not sure about that one though; the JVM doesn't have a clear contract around inlining.
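One compromise consistent with this thread would be to keep the readable helpers but mark them `final` (optionally with the `@inline` hint), giving the optimizer at least permission to inline the field comparisons. A sketch with simplified stand-in types:

```scala
// Hypothetical stand-in class; strand reduced to a String for brevity.
// @inline is only a hint to scalac's optimizer, and as noted above the JVM
// makes no firm contract about inlining either.
case class Region(referenceName: String, start: Long, end: Long, strand: String) {
  @inline final def sameStrand(other: Region): Boolean =
    strand == other.strand

  @inline final def sameReferenceName(other: Region): Boolean =
    referenceName == other.referenceName
}
```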

* @param other The other region.
* @return True if the two are on the same reference name, false otherwise.
*/
def sameReferenceName(other: ReferenceRegion): Boolean = {

@fnothaft (Member) commented Apr 19, 2017

Ditto here.

result
// add 37 here because this is the start of the hashCode
val nameHashCode = 37 + {
if (referenceName != null) {

@fnothaft (Member) commented Apr 19, 2017

For both referenceName and strand, we should require them to be non-null on initialization and omit the checks here.

@devin-petersohn (Member, Author) commented Apr 24, 2017

What about unmapped reads? Should we default the referenceName?

@fnothaft (Member) commented Apr 24, 2017

We throw an IllegalArgumentException on unmapped reads.

@devin-petersohn (Member, Author) commented Apr 24, 2017

Should we change this test code then? I am getting failures associated with creation where referenceName = null, but the test is not expecting failure.

@fnothaft (Member) commented Apr 24, 2017

Can you paste the stack trace you're getting? The duplicate marker shouldn't be setting referenceName = null; we do something else to deskew unmapped reads.

@devin-petersohn (Member, Author) commented Apr 24, 2017

- unmapped reads *** FAILED ***
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 1 times, most recent failure: Lost task 1.0 in stage 1.0 (TID 5, localhost): java.lang.IllegalArgumentException: requirement failed: Failed when trying to create region null 0 1 on INDEPENDENT strand.
at scala.Predef$.require(Predef.scala:233)
at org.bdgenomics.adam.models.ReferenceRegion.&lt;init&gt;(ReferenceRegion.scala:324)
at org.bdgenomics.adam.models.ReferencePosition.&lt;init&gt;(ReferencePosition.scala:133)
at org.bdgenomics.adam.models.ReferencePosition$.apply(ReferencePosition.scala:111)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1.org$bdgenomics$adam$rdd$read$ReferencePositionPair$$anonfun$$getPos$1(ReferencePositionPair.scala:53)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1$$anonfun$apply$2.apply(ReferencePositionPair.scala:59)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1$$anonfun$apply$2.apply(ReferencePositionPair.scala:59)
at scala.Option.map(Option.scala:145)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1.apply(ReferencePositionPair.scala:59)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1.apply(ReferencePositionPair.scala:43)
at scala.Option.fold(Option.scala:157)
at org.apache.spark.rdd.Timer.time(Timer.scala:48)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$.apply(ReferencePositionPair.scala:43)
at org.bdgenomics.adam.rdd.read.MarkDuplicates$$anonfun$markBuckets$1.apply(MarkDuplicates.scala:114)
at org.bdgenomics.adam.rdd.read.MarkDuplicates$$anonfun$markBuckets$1.apply(MarkDuplicates.scala:114)
at org.apache.spark.rdd.RDD$$anonfun$keyBy$1$$anonfun$apply$56.apply(RDD.scala:1492)
at org.apache.spark.rdd.RDD$$anonfun$keyBy$1$$anonfun$apply$56.apply(RDD.scala:1492)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
at org.bdgenomics.adam.rdd.read.MarkDuplicatesSuite.org$bdgenomics$adam$rdd$read$MarkDuplicatesSuite$$markDuplicates(MarkDuplicatesSuite.scala:102)
at org.bdgenomics.adam.rdd.read.MarkDuplicatesSuite$$anonfun$6.apply$mcV$sp(MarkDuplicatesSuite.scala:158)
at org.bdgenomics.utils.misc.SparkFunSuite$$anonfun$sparkTest$1.apply$mcV$sp(SparkFunSuite.scala:102)
at org.bdgenomics.utils.misc.SparkFunSuite$$anonfun$sparkTest$1.apply(SparkFunSuite.scala:98)
at org.bdgenomics.utils.misc.SparkFunSuite$$anonfun$sparkTest$1.apply(SparkFunSuite.scala:98)
at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
at org.bdgenomics.adam.util.ADAMFunSuite.org$scalatest$BeforeAndAfter$$super$runTest(ADAMFunSuite.scala:24)
at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
at org.bdgenomics.adam.util.ADAMFunSuite.runTest(ADAMFunSuite.scala:24)
at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
at org.scalatest.Suite$class.run(Suite.scala:1424)
at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
at org.bdgenomics.adam.util.ADAMFunSuite.org$scalatest$BeforeAndAfter$$super$run(ADAMFunSuite.scala:24)
at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
at org.bdgenomics.adam.util.ADAMFunSuite.run(ADAMFunSuite.scala:24)
at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492)
at org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1528)
at org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1526)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.scalatest.Suite$class.runNestedSuites(Suite.scala:1526)
at org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:29)
at org.scalatest.Suite$class.run(Suite.scala:1421)
at org.scalatest.tools.DiscoverySuite.run(DiscoverySuite.scala:29)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
at org.scalatest.tools.Runner$.main(Runner.scala:860)
at org.scalatest.tools.Runner.main(Runner.scala)
Cause: java.lang.IllegalArgumentException: requirement failed: Failed when trying to create region null 0 1 on INDEPENDENT strand.
at scala.Predef$.require(Predef.scala:233)
at org.bdgenomics.adam.models.ReferenceRegion.&lt;init&gt;(ReferenceRegion.scala:324)
at org.bdgenomics.adam.models.ReferencePosition.&lt;init&gt;(ReferencePosition.scala:133)
at org.bdgenomics.adam.models.ReferencePosition$.apply(ReferencePosition.scala:111)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1.org$bdgenomics$adam$rdd$read$ReferencePositionPair$$anonfun$$getPos$1(ReferencePositionPair.scala:53)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1$$anonfun$apply$2.apply(ReferencePositionPair.scala:59)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1$$anonfun$apply$2.apply(ReferencePositionPair.scala:59)
at scala.Option.map(Option.scala:145)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1.apply(ReferencePositionPair.scala:59)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$$anonfun$apply$1.apply(ReferencePositionPair.scala:43)
at scala.Option.fold(Option.scala:157)
at org.apache.spark.rdd.Timer.time(Timer.scala:48)
at org.bdgenomics.adam.rdd.read.ReferencePositionPair$.apply(ReferencePositionPair.scala:43)
at org.bdgenomics.adam.rdd.read.MarkDuplicates$$anonfun$markBuckets$1.apply(MarkDuplicates.scala:114)
at org.bdgenomics.adam.rdd.read.MarkDuplicates$$anonfun$markBuckets$1.apply(MarkDuplicates.scala:114)
at org.apache.spark.rdd.RDD$$anonfun$keyBy$1$$anonfun$apply$56.apply(RDD.scala:1492)
at org.apache.spark.rdd.RDD$$anonfun$keyBy$1$$anonfun$apply$56.apply(RDD.scala:1492)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

@fnothaft (Member) commented Apr 24, 2017

Ah, I see what's going on. The createUnmappedRead function in the test code is creating reads without sequences. Can you update that function to set the sequence on the read (doesn't matter what the sequence is, just has to be present)? We should probably add a require(read.getSequence != null) right before here.

0
}

val listOfHashCodes = List(nameHashCode, start.hashCode, end.hashCode, strandHashCode)

@fnothaft (Member) commented Apr 19, 2017

hashCode is in general a performance critical operation. As such, allocating and then folding over a list is to be avoided. Why not just write out the multiply/sum chain?

@devin-petersohn (Member, Author) commented Apr 19, 2017

I guess my functional programming side was working on this.
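The written-out multiply/sum chain fnothaft suggests would avoid the List allocation entirely. An illustrative helper (the 41 multiplier and precomputed nameHashCode seed follow the surrounding diff context; this is a sketch, not ADAM's code):

```scala
// Equivalent to folding 41 * acc + h over the hash codes, but with no
// intermediate collection allocated on this performance-critical path.
def hashChain(nameHashCode: Int, start: Long, end: Long, strandHashCode: Int): Int = {
  var result = nameHashCode
  result = 41 * result + start.hashCode
  result = 41 * result + end.hashCode
  result = 41 * result + strandHashCode
  result
}
```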

@coveralls commented Apr 24, 2017


Coverage increased (+0.3%) to 81.928% when pulling 5a0e19e on devin-petersohn:issue#1473ThresholdOverlaps into 93b32c6 on bigdatagenomics:master.

@AmplabJenkins commented Apr 24, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1958/

@coveralls commented Apr 24, 2017


Coverage increased (+0.3%) to 81.959% when pulling 8a2e011 on devin-petersohn:issue#1473ThresholdOverlaps into 93b32c6 on bigdatagenomics:master.

@AmplabJenkins commented Apr 24, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1959/

@heuermh (Member) left a review

LGTM, suggested a few minor doc fixes

*
* @param region Region to intersect with.
* @return A smaller reference region.
* @throws IllegalArgumentException Thrown if regions are no within the distance threshold.

@heuermh (Member) commented Apr 24, 2017

no → not

* overlap.
*
* @param other Region to intersect with.
* @param minOverlap The mimimum amount of positions that the two regions

@heuermh (Member) commented Apr 24, 2017

The minimum amount of positions... → Minimum overlap between the two reference regions.

@fnothaft (Member) commented Apr 24, 2017

mimimum -> minimum

assert(referenceName == region.referenceName, "Cannot compute convex hull of regions on different references.")
ReferenceRegion(referenceName, min(start, region.start), max(end, region.end))
def hull(other: ReferenceRegion): ReferenceRegion = {
require(sameStrand(other), "Cannot compute convex hull of differently oriented regions.")

@heuermh (Member) commented Apr 24, 2017

For require messages, use phrases or sentences or exclamations, but not all three :)

def subtract(other: ReferenceRegion, requireStranded: Boolean = false): List[ReferenceRegion] = {
val newRegionStrand =
if (requireStranded) {
require(overlaps(other), "Region $other is not overlapped by $this!")

@heuermh (Member) commented Apr 24, 2017

... I only mention this because of my six yr old's homework last night: Choose [., ?, !] for each of these sentences.

@fnothaft (Member) commented Apr 24, 2017

I would generally go with .

@fnothaft (Member) left a review

Few small docs suggestions! Otherwise LGTM!

* overlap.
*
* @param other Region to intersect with.
* @param minOverlap The mimimum amount of positions that the two regions

@fnothaft (Member) commented Apr 24, 2017

mimimum -> minimum

*
* @param other Region to compare against.
* @return Returns an option containing the number of positions of coverage
* between two points. If the two regions do not cover, we return

@fnothaft (Member) commented Apr 24, 2017

cover, -> cover each other,

@@ -503,19 +600,80 @@ case class ReferenceRegion(
}

/**
* Determines if two regions are colocated on the same strand.

@fnothaft (Member) commented Apr 24, 2017

I would drop the colocated -> Determines if two regions are on the same strand.

}

/**
* Determines if two regions are colocated on the same reference name.

@fnothaft (Member) commented Apr 24, 2017

colocated on the same reference name. -> on the same contig.

def subtract(other: ReferenceRegion, requireStranded: Boolean = false): List[ReferenceRegion] = {
val newRegionStrand =
if (requireStranded) {
require(overlaps(other), "Region $other is not overlapped by $this!")

@fnothaft (Member) commented Apr 24, 2017

I would generally go with .

* @param requireStranded Whether or not to require other be on same strand.
* @return A list containing the regions resulting from the subtraction.
*/
def subtract(other: ReferenceRegion, requireStranded: Boolean = false): List[ReferenceRegion] = {

@fnothaft (Member) commented Apr 24, 2017

Prefer Seq (or, even better, Iterable) to List in public APIs.

Strand.INDEPENDENT
}
val first = if (other.start > start) {
List(ReferenceRegion(referenceName,

@fnothaft (Member) commented Apr 24, 2017

Yeah, actually, just List -> Iterable these few ones. And probably List() -> Iterable.empty.
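The signature change being suggested can be sketched as follows (a hypothetical helper with regions reduced to (start, end) pairs, not ADAM's actual subtract): expose `Iterable` so the concrete collection type stays an implementation detail.

```scala
// Subtract interval [bStart, bEnd) from [aStart, aEnd), assuming they
// overlap; returns zero, one, or two remaining pieces as an Iterable.
def subtractSketch(aStart: Long, aEnd: Long,
                   bStart: Long, bEnd: Long): Iterable[(Long, Long)] = {
  // piece of a to the left of b, if any
  val first =
    if (bStart > aStart) Iterable((aStart, math.min(aEnd, bStart)))
    else Iterable.empty
  // piece of a to the right of b, if any
  val second =
    if (bEnd < aEnd) Iterable((math.max(aStart, bEnd), aEnd))
    else Iterable.empty
  first ++ second
}
```

Callers that need a specific collection can still convert (`.toList`, `.toSeq`) at the call site.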

Commits in the force push:
- More cleanup and adding isNearby
- Adding threshold logic to overlaps and covers
- Replacing var with fold call
- Adding overlapsBy and coversBy and related tests
- Adding subtract, addressing reviewer comments
- Addressing reviewer comments
- Addressing reviewer comments
@devin-petersohn devin-petersohn force-pushed the devin-petersohn:issue#1473ThresholdOverlaps branch from 8a2e011 to ebfc510 Apr 24, 2017
@devin-petersohn (Member, Author) commented Apr 24, 2017

Squashed and rebased. Thanks for the feedback @fnothaft and @heuermh!

@fnothaft (Member) commented Apr 24, 2017

Thanks @devin-petersohn! I will merge this once tests pass.

@coveralls commented Apr 24, 2017


Coverage decreased (-0.06%) to 81.726% when pulling ebfc510 on devin-petersohn:issue#1473ThresholdOverlaps into 1fb57a3 on bigdatagenomics:master.

@AmplabJenkins commented Apr 24, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1960/

@fnothaft fnothaft merged commit 5b6a109 into bigdatagenomics:master Apr 24, 2017
2 of 3 checks passed:
- coverage/coveralls: Coverage decreased (-0.06%) to 81.726%
- codacy/pr: Good work! A positive pull request.
- default: Merged build finished.
@fnothaft (Member) commented Apr 24, 2017

Merged! Thanks @devin-petersohn!

@heuermh heuermh added this to the 0.23.0 milestone Dec 7, 2017
@heuermh heuermh added this to Completed in Release 0.23.0 Jan 4, 2018