[ADAM-1501] Compute coverage using Dataset API. #1528

Merged

Member

@fnothaft fnothaft commented May 14, 2017

Resolves #1501. Depends on #1391. Perf numbers forthcoming.

@fnothaft fnothaft added this to the 0.23.0 milestone May 14, 2017
@fnothaft fnothaft requested review from heuermh and akmorrow13 May 14, 2017

@coveralls coveralls commented May 14, 2017

Coverage Status

Coverage decreased (-6.4%) to 75.579% when pulling e1d4159 on fnothaft:issues/1501-coverage-dataset into 18191f9 on bigdatagenomics:master.


@AmplabJenkins AmplabJenkins commented May 14, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2007/
Test PASSed.


@coveralls coveralls commented May 15, 2017

Coverage Status

Coverage decreased (-6.4%) to 75.579% when pulling 39ce835 on fnothaft:issues/1501-coverage-dataset into 18191f9 on bigdatagenomics:master.


@AmplabJenkins AmplabJenkins commented May 15, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2008/
Test PASSed.

Member Author

@fnothaft fnothaft commented May 15, 2017

This provides a 2x speedup on a high coverage WGS dataset.

@fnothaft fnothaft force-pushed the fnothaft:issues/1501-coverage-dataset branch from 39ce835 to c65f144 May 24, 2017

@coveralls coveralls commented May 24, 2017

Coverage Status

Coverage decreased (-7.1%) to 74.939% when pulling c65f144 on fnothaft:issues/1501-coverage-dataset into 2820e94 on bigdatagenomics:master.


@coveralls coveralls commented May 24, 2017

Coverage Status

Coverage decreased (-6.5%) to 75.518% when pulling c65f144 on fnothaft:issues/1501-coverage-dataset into 2820e94 on bigdatagenomics:master.


@AmplabJenkins AmplabJenkins commented May 24, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2049/
Test PASSed.

@fnothaft fnothaft force-pushed the fnothaft:issues/1501-coverage-dataset branch from c65f144 to ce0dcbf Jun 22, 2017

@coveralls coveralls commented Jun 22, 2017

Coverage Status

Changes Unknown when pulling ce0dcbf on fnothaft:issues/1501-coverage-dataset into bigdatagenomics:master.


@AmplabJenkins AmplabJenkins commented Jun 22, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2121/
Test PASSed.

@fnothaft fnothaft force-pushed the fnothaft:issues/1501-coverage-dataset branch from ce0dcbf to 8ab6e0a Jun 24, 2017

@AmplabJenkins AmplabJenkins commented Jun 24, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2146/

Build result: FAILURE

[...truncated 15 lines...]
> /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
> /home/jenkins/git2/bin/git rev-parse origin/pr/1528/merge^{commit} # timeout=10
> /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 2279526 # timeout=10
Checking out Revision 2279526 (origin/pr/1528/merge)
> /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
> /home/jenkins/git2/bin/git checkout -f 2279526
> /home/jenkins/git2/bin/git rev-list 9b78f51ed5925f3542ac6eb5cfe67458e13348c4 # timeout=10
Triggering ADAM-prb » 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.10,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.10,2.0.0,centos
ADAM-prb » 2.3.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,2.0.0,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@fnothaft fnothaft force-pushed the fnothaft:issues/1501-coverage-dataset branch from 8ab6e0a to 903facc Jul 11, 2017
Member Author

@fnothaft fnothaft commented Jul 11, 2017

Rebased.

@fnothaft fnothaft mentioned this pull request Jul 11, 2017
4 of 4 tasks complete

@AmplabJenkins AmplabJenkins commented Jul 11, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2195/

Build result: FAILURE

[...truncated 15 lines...]
> /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
> /home/jenkins/git2/bin/git rev-parse origin/pr/1528/merge^{commit} # timeout=10
> /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains b1b7e9a # timeout=10
Checking out Revision b1b7e9a (origin/pr/1528/merge)
> /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
> /home/jenkins/git2/bin/git checkout -f b1b7e9a
> /home/jenkins/git2/bin/git rev-list 2279526 # timeout=10
Triggering ADAM-prb » 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.10,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.10,2.0.0,centos
ADAM-prb » 2.3.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,2.0.0,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@fnothaft fnothaft force-pushed the fnothaft:issues/1501-coverage-dataset branch from 903facc to b885248 Jul 11, 2017

@AmplabJenkins AmplabJenkins commented Jul 11, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2197/

Build result: FAILURE

[...truncated 15 lines...]
> /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
> /home/jenkins/git2/bin/git rev-parse origin/pr/1528/merge^{commit} # timeout=10
> /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 358225c # timeout=10
Checking out Revision 358225c (origin/pr/1528/merge)
> /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
> /home/jenkins/git2/bin/git checkout -f 358225c
> /home/jenkins/git2/bin/git rev-list b1b7e9a # timeout=10
Triggering ADAM-prb » 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.10,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.10,2.0.0,centos
ADAM-prb » 2.3.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,2.0.0,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Member Author

@fnothaft fnothaft commented Jul 11, 2017

Jenkins, test this please.


@coveralls coveralls commented Jul 11, 2017

Coverage Status

Coverage decreased (-0.02%) to 83.942% when pulling b885248 on fnothaft:issues/1501-coverage-dataset into 467db1f on bigdatagenomics:master.


@AmplabJenkins AmplabJenkins commented Jul 11, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2199/
Test PASSed.

Member

@devin-petersohn devin-petersohn left a comment

A few nits, mostly conciseness. Let me know if there's a specific reason for the things I pointed out.

}
}

private case class AlignmentWindow(contigName: String, start: Long, end: Long) {

devin-petersohn Jul 11, 2017
Member

What is our policy on brackets for case classes with no body?

fnothaft Jul 11, 2017
Author Member

My general policy is to always put brackets on them, because you know that having brackets and an empty body will always be OK, but you could conceive that empty-body/no brackets could get deprecated someday...

heuermh Jul 11, 2017
Member

+1

I like to add an // empty comment between the squigglies so that it is obvious they are intentionally empty, but that convention isn't used in this code base.
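
The three styles discussed in this thread can be put side by side. This is only an illustrative sketch: the class names below are hypothetical stand-ins, and only the field list is taken from the `AlignmentWindow` signature in the diff.

```scala
object EmptyBodyStyle {
  // Hypothetical names for illustration; the fields mirror AlignmentWindow in the diff.

  // No braces: legal Scala for a case class that has no body.
  case class WindowNoBraces(contigName: String, start: Long, end: Long)

  // Braces with an empty body: the style used in this pull request.
  case class WindowWithBraces(contigName: String, start: Long, end: Long) {
  }

  // Braces with a marker comment, signalling that the body is empty on purpose.
  case class WindowCommented(contigName: String, start: Long, end: Long) {
    // empty
  }
}
```

All three forms compile to case classes with the same fields and the same generated methods, so the choice is purely a matter of code style.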


readMapped
}).flatMap(r => {
val t: List[Long] = List.range(r.getStart, r.getEnd)

devin-petersohn Jul 11, 2017
Member

t -> positions (or something like that)

readMapped
}).flatMap(r => {
val t: List[Long] = List.range(r.getStart, r.getEnd)
t.map(n => (ReferenceRegion(r.getContigName, n, n + 1), 1))

devin-petersohn Jul 11, 2017
Member

Is ReferencePosition more appropriate here? It is slightly more concise at least.

fnothaft Jul 11, 2017
Author Member

+1
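
For context, the snippet under review expands each read into one record per covered base. A minimal sketch of that idea on plain Scala collections follows, keyed by a single position rather than a one-base-wide region. `Read` and `Position` are hypothetical stand-ins defined here, not ADAM's `AlignmentRecord` or `ReferencePosition`, and the `groupBy` step is an assumption about how the per-base counts might then be summed.

```scala
object PerBaseCoverage {
  // Hypothetical stand-ins for illustration only; these are not ADAM classes.
  final case class Read(contigName: String, start: Long, end: Long)
  final case class Position(contigName: String, pos: Long)

  // Emit one (position, 1) pair per base in [start, end), then sum per position.
  def coverage(reads: Seq[Read]): Map[Position, Int] = {
    reads
      .flatMap(r => {
        val positions: List[Long] = List.range(r.start, r.end)
        positions.map(n => (Position(r.contigName, n), 1))
      })
      .groupBy(_._1)
      .map { case (pos, hits) => (pos, hits.map(_._2).sum) }
  }
}
```

Keying on a position drops the redundant `n + 1` end coordinate that a one-base region carries, which is the conciseness the review comment points at.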

.flatMap(w => {
val width = (w.end - w.start).toInt
val buffer = new Array[Coverage](width)
var idx = 0

devin-petersohn Jul 11, 2017
Member

It would be more concise to do the following:
val positions = (w.start until w.end).toArray
positions.map(f => Coverage(w.contigName, f, f + 1L, 1.0))

fnothaft Jul 11, 2017
Author Member

This is for perf reasons. Makes a small (5%?) perf improvement to write it this way.
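
To make the trade-off concrete, here is a hedged sketch of both shapes. The `AlignmentWindow` fields come from the diff; the `Coverage` case class is a stand-in whose four-argument shape is assumed from the reviewer's suggestion, and the loop body is a guess at how the preallocated buffer is filled, since the diff fragment stops at `var idx = 0`.

```scala
object WindowExpansion {
  // Stand-ins for illustration; Coverage's shape is assumed, not taken from ADAM.
  final case class AlignmentWindow(contigName: String, start: Long, end: Long)
  final case class Coverage(contigName: String, start: Long, end: Long, count: Double)

  // Concise form: build the positions, then map each one to a Coverage record.
  def expandConcise(w: AlignmentWindow): Array[Coverage] = {
    (w.start until w.end).toArray.map(p => Coverage(w.contigName, p, p + 1L, 1.0))
  }

  // Buffered form: preallocate the output and fill it in a while loop,
  // which avoids the intermediate array of positions and the closure call per element.
  def expandBuffered(w: AlignmentWindow): Array[Coverage] = {
    val width = (w.end - w.start).toInt
    val buffer = new Array[Coverage](width)
    var idx = 0
    var pos = w.start
    while (idx < width) {
      buffer(idx) = Coverage(w.contigName, pos, pos + 1L, 1.0)
      idx += 1
      pos += 1L
    }
    buffer
  }
}
```

Both functions return the same records in the same order; any speed difference depends on the JVM and the workload, so the roughly 5% figure quoted above should be read as the author's measurement, not something this sketch demonstrates.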

@fnothaft fnothaft force-pushed the fnothaft:issues/1501-coverage-dataset branch from b885248 to 56dd89b Jul 11, 2017
Member Author

@fnothaft fnothaft commented Jul 11, 2017

Re-rebased and addressed review comments.


@coveralls coveralls commented Jul 11, 2017

Coverage Status

Coverage decreased (-0.4%) to 83.664% when pulling 56dd89b on fnothaft:issues/1501-coverage-dataset into 324ae74 on bigdatagenomics:master.


@AmplabJenkins AmplabJenkins commented Jul 11, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2204/

Build result: FAILURE

[...truncated 15 lines...]
> /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
> /home/jenkins/git2/bin/git rev-parse origin/pr/1528/merge^{commit} # timeout=10
> /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains d2726cbf553d735190f232d2b754b7247d4d26cf # timeout=10
Checking out Revision d2726cbf553d735190f232d2b754b7247d4d26cf (origin/pr/1528/merge)
> /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
> /home/jenkins/git2/bin/git checkout -f d2726cbf553d735190f232d2b754b7247d4d26cf
> /home/jenkins/git2/bin/git rev-list 358225c # timeout=10
Triggering ADAM-prb » 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.10,2.1.0,centos
Triggering ADAM-prb » 2.6.0,2.10,2.1.0,centos
Triggering ADAM-prb » 2.6.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,2.1.0,centos
Triggering ADAM-prb » 2.3.0,2.11,2.1.0,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.6.1,centos
ADAM-prb » 2.3.0,2.10,1.6.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,2.1.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,2.1.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.11,2.1.0,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,2.1.0,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.10,1.6.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.6.1,centos completed with result SUCCESS
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Member

@devin-petersohn devin-petersohn commented Jul 11, 2017

I am not getting these build issues locally. I'm not sure why it's failing with Jenkins, but not locally.

Member Author

@fnothaft fnothaft commented Jul 11, 2017

@devin-petersohn the jenkins-test script needs to move to Spark 2.1.0 for #1397, so until #1397 merges, I have to manually toggle Jenkins between 2.0.0 and 2.1.0, hence this failure.

Member Author

@fnothaft fnothaft commented Jul 11, 2017

Jenkins, test this please.


@coveralls coveralls commented Jul 11, 2017

Coverage Status

Coverage decreased (-0.08%) to 84.015% when pulling 56dd89b on fnothaft:issues/1501-coverage-dataset into 324ae74 on bigdatagenomics:master.


@AmplabJenkins AmplabJenkins commented Jul 11, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2205/
Test PASSed.

Member Author

@fnothaft fnothaft commented Jul 11, 2017

Ping for merge?

@devin-petersohn devin-petersohn merged commit 238e044 into bigdatagenomics:master Jul 11, 2017
2 of 3 checks passed
codacy/pr: Not so good... This pull request quality could be better.
coverage/coveralls: Coverage decreased (-0.08%) to 84.015%
default: Merged build finished.
Member

@devin-petersohn devin-petersohn commented Jul 11, 2017

Thanks @fnothaft

@heuermh heuermh added this to Completed in Release 0.23.0 Jan 4, 2018