Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAM-1513] Strandedness for FeatureRDDs #1555

Open
wants to merge 10 commits into
base: master
from

Conversation

Projects
None yet
5 participants
@devin-petersohn
Copy link
Member

commented Jun 1, 2017

This was the lightest change that added the most flexibility for the future.

@coveralls

This comment has been minimized.

Copy link

commented Jun 1, 2017

Coverage Status

Coverage increased (+0.06%) to 82.854% when pulling 2377e9d on devin-petersohn:issue#1513strandedness into b7762c2 on bigdatagenomics:master.

@AmplabJenkins

This comment has been minimized.

Copy link

commented Jun 1, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2079/

Build result: FAILURE

[...truncated 15 lines...] > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1555/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 0eb3bcb4d7c660f505f886d16e07dcd7451842d8 # timeout=10Checking out Revision 0eb3bcb4d7c660f505f886d16e07dcd7451842d8 (origin/pr/1555/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 0eb3bcb4d7c660f505f886d16e07dcd7451842d8First time build. Skipping changelog.Triggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,2.0.0,centosADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@coveralls

This comment has been minimized.

Copy link

commented Jun 1, 2017

Coverage Status

Coverage increased (+0.06%) to 82.854% when pulling 094f1e7 on devin-petersohn:issue#1513strandedness into b7762c2 on bigdatagenomics:master.

@AmplabJenkins

This comment has been minimized.

Copy link

commented Jun 1, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2080/
Test PASSed.

@fnothaft
Copy link
Member

left a comment

Also, throughout, we need scaladoc updates.

Seq(ReferenceRegion.unstranded(elem))
protected def getReferenceRegions(elem: Feature,
stranded: Boolean = false): Seq[ReferenceRegion] = {
stranded match {

This comment has been minimized.

Copy link
@fnothaft

fnothaft Jun 1, 2017

Member

Don't match on a boolean.

try {
ReferenceRegion.stranded(elem)
} catch {
case e: IllegalArgumentException => ReferenceRegion.unstranded(elem)

This comment has been minimized.

Copy link
@fnothaft

fnothaft Jun 1, 2017

Member

How do we get an IAE here?

This comment has been minimized.

Copy link
@devin-petersohn

devin-petersohn Jun 1, 2017

Author Member

ReferenceRegion.stranded(feature: Feature) has a require that the Feature have stranded data.

This comment has been minimized.

Copy link
@fnothaft

fnothaft Jun 1, 2017

Member

That seems like an erroneous requirement to me. I would suggest that we change that. Alternatively, I'd suggest that we check whether the feature has INDEPENDENT strand prior to calling, instead of catching an exception.

This comment has been minimized.

Copy link
@devin-petersohn

devin-petersohn Jun 1, 2017

Author Member

This requirement was recently added, and I'm not sure that we should remove it. I don't mind getting rid of the try..catch block, but I think that a null field should at least be logged. Maybe ValidationStringency here?

This comment has been minimized.

Copy link
@fnothaft

fnothaft Jun 1, 2017

Member

Ah, I see! I misunderstood that the require would fire if strand was Independent. I'd actually suggest that we not catch here.

@@ -190,7 +190,8 @@ case class FragmentRDD(rdd: RDD[Fragment],
* @param elem The Fragment to get the region from.
* @return Returns all regions covered by this fragment.
*/
protected def getReferenceRegions(elem: Fragment): Seq[ReferenceRegion] = {
protected def getReferenceRegions(elem: Fragment,
stranded: Boolean = false): Seq[ReferenceRegion] = {

This comment has been minimized.

Copy link
@fnothaft

fnothaft Jun 1, 2017

Member

This method should allow stranded behavior.

@@ -232,7 +232,8 @@ case class AlignmentRecordRDD(
* @param elem Read to produce regions for.
* @return The seq of reference regions this read covers.
*/
protected def getReferenceRegions(elem: AlignmentRecord): Seq[ReferenceRegion] = {
protected def getReferenceRegions(elem: AlignmentRecord,
stranded: Boolean = false): Seq[ReferenceRegion] = {

This comment has been minimized.

Copy link
@fnothaft

fnothaft Jun 1, 2017

Member

Ditto here.

@coveralls

This comment has been minimized.

Copy link

commented Jun 1, 2017

Coverage Status

Coverage decreased (-0.08%) to 82.713% when pulling 1f0870c on devin-petersohn:issue#1513strandedness into b7762c2 on bigdatagenomics:master.

@AmplabJenkins

This comment has been minimized.

Copy link

commented Jun 1, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2081/
Test PASSed.

* @param elem Fragment to extract a region from.
* @param stranded Whether or not to report stranded data for each

This comment has been minimized.

Copy link
@heuermh

heuermh Jun 5, 2017

Member

True to report stranded data for each NucleotideContigFragment.

@@ -236,9 +236,12 @@ case class CoverageRDD(rdd: RDD[Coverage],
* return a Seq of exactly one ReferenceRegion.
*
* @param elem The Coverage to get an underlying region for.
* @param stranded Whether or not to report stranded data for each Coverage

This comment has been minimized.

Copy link
@heuermh

heuermh Jun 5, 2017

Member

...same as above

@@ -323,12 +323,21 @@ case class FeatureRDD(rdd: RDD[Feature],
}

/**
* Returns all reference regions that overlap this feature.

This comment has been minimized.

Copy link
@heuermh

heuermh Jun 5, 2017

Member

In a different PR, I suggest adding doc to the definition in GenomicRDD, removing doc in all extensions of that method, and using the override keyword. Does that work for Scala like it does in Java (i.e. the @Override annotation)?

@coveralls

This comment has been minimized.

Copy link

commented Jun 13, 2017

Coverage Status

Coverage decreased (-0.4%) to 82.713% when pulling f42238b on devin-petersohn:issue#1513strandedness into ad5ae6d on bigdatagenomics:master.

@AmplabJenkins

This comment has been minimized.

Copy link

commented Jun 13, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2098/
Test PASSed.

@coveralls

This comment has been minimized.

Copy link

commented Jun 14, 2017

Coverage Status

Coverage decreased (-0.4%) to 82.713% when pulling 71c2247 on devin-petersohn:issue#1513strandedness into ad5ae6d on bigdatagenomics:master.

@AmplabJenkins

This comment has been minimized.

Copy link

commented Jun 14, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2099/
Test PASSed.

@fnothaft

This comment has been minimized.

Copy link
Member

commented Jun 14, 2017

@devin-petersohn can you rebase this and clear the conflicts?

@fnothaft

This comment has been minimized.

Copy link
Member

commented Jun 14, 2017

Also, shouldn't the stranded parameter propagate through to the flattenByReferenceRegion method and all the joins?

@devin-petersohn

This comment has been minimized.

Copy link
Member Author

commented Jun 14, 2017

Yes, it should propagate. I also need to add unit tests for stranded = true.

@devin-petersohn devin-petersohn force-pushed the devin-petersohn:issue#1513strandedness branch from 71c2247 to b0bd896 Jun 15, 2017

@coveralls

This comment has been minimized.

Copy link

commented Jun 15, 2017

Coverage Status

Coverage decreased (-0.6%) to 82.713% when pulling b0bd896 on devin-petersohn:issue#1513strandedness into 152a8ad on bigdatagenomics:master.

@AmplabJenkins

This comment has been minimized.

Copy link

commented Jun 15, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2103/
Test PASSed.

@fnothaft
Copy link
Member

left a comment

Few small nits, otherwise LGTM! Thanks @devin-petersohn!

@@ -1185,7 +1190,8 @@ private case class GenericGenomicRDD[T](
IntervalArray(rdd)
}

protected def getReferenceRegions(elem: T): Seq[ReferenceRegion] = {
protected def getReferenceRegions(elem: T,
stranded: Boolean = false): Seq[ReferenceRegion] = {

This comment has been minimized.

Copy link
@fnothaft

fnothaft Jun 24, 2017

Member

In this function, regionFn should be updated to take stranded as a parameter, no?


protected def flattenRddByRegions(): RDD[(ReferenceRegion, T)] = {
protected def flattenRddByRegions(stranded: Boolean = false): RDD[(ReferenceRegion, T)] = {

This comment has been minimized.

Copy link
@fnothaft

fnothaft Jun 24, 2017

Member

A small request here. Can we add an abstract field canBeStranded, and then log a warning message if flattenRddByRegions is called with (!canBeStranded && stranded)?

* @return Sequence of ReferenceRegions extracted from Coverage.
*/
protected def getReferenceRegions(elem: Coverage): Seq[ReferenceRegion] = {
protected def getReferenceRegions(elem: Coverage,
stranded: Boolean = false): Seq[ReferenceRegion] = {

This comment has been minimized.

Copy link
@fnothaft

fnothaft Jun 24, 2017

Member

We do support generating coverage data for independent/positive/negative strands. I don't know if we expose that anywhere. If we don't, can you open a ticket for us to follow up on this? CC @akmorrow13

if (stranded) {
elem.getAlignments
.map(r => {
if (r.getReadMapped && r.getReadNegativeStrand != null) {

This comment has been minimized.

Copy link
@fnothaft

fnothaft Jun 24, 2017

Member

IMO, if you call stranded on reads with readNegativeStrand == null, you should get the IllegalArgumentException, not an unstranded region.

ReferenceRegion.opt(elem).toSeq
protected def getReferenceRegions(elem: AlignmentRecord,
stranded: Boolean = false): Seq[ReferenceRegion] = {
if (stranded && elem.getReadNegativeStrand != null) {

This comment has been minimized.

Copy link
@fnothaft

fnothaft Jun 24, 2017

Member

Same here, WRT readNegativeStrand == null.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.