Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added additional arguments to GenomicRDD.pipe() #1758

Open
wants to merge 7 commits into
base: master
from

Conversation

Projects
None yet
5 participants
@gunjanbaid
Copy link
Contributor

gunjanbaid commented Oct 11, 2017

Use case: when input RDD partitioning needs to be preserved, in order for the piped command to work correctly.

This is needed for piping out to MACS2, which needs to process all reads from a particular chromosome together. In addition, when there is not enough data for a particular chromosome, MACS2 needs to process multiple chromosomes together.

@fnothaft
Copy link
Member

fnothaft left a comment

Few small nits, otherwise LGTM!

@@ -484,7 +486,10 @@ trait GenomicRDD[T, U <: GenomicRDD[T, U]] extends Logging {
val bins = GenomeBins(totalLength / rdd.partitions.size, seqLengths)

// if the input rdd is mapped, then we need to repartition
val partitionedRdd = if (sequences.records.size > 0) {
val partitionedRdd = if (sequences.records.size == 0 ||

This comment has been minimized.

Copy link
@fnothaft

fnothaft Oct 11, 2017

Member

Prefer sequences.records.isEmpty

@@ -484,7 +486,10 @@ trait GenomicRDD[T, U <: GenomicRDD[T, U]] extends Logging {
val bins = GenomeBins(totalLength / rdd.partitions.size, seqLengths)

// if the input rdd is mapped, then we need to repartition
val partitionedRdd = if (sequences.records.size > 0) {
val partitionedRdd = if (sequences.records.size == 0 ||
repartitionInput == false) {

This comment has been minimized.

Copy link
@fnothaft

fnothaft Oct 11, 2017

Member

Prefer !repartitionInput

@@ -561,7 +564,8 @@ trait GenomicRDD[T, U <: GenomicRDD[T, U]] extends Logging {

// if the original rdd was aligned and the final rdd is aligned, then we must filter
if (newRdd.sequences.isEmpty ||
sequences.isEmpty) {
sequences.isEmpty ||
filterOutput == false) {

This comment has been minimized.

Copy link
@fnothaft

fnothaft Oct 11, 2017

Member

Prefer !filterOutput.

@AmplabJenkins

This comment has been minimized.

Copy link

AmplabJenkins commented Oct 11, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2422/

Build result: FAILURE

[...truncated 15 lines...] > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1758/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 9091f1917f15cf09849f2507cb19c49f5b65446f # timeout=10Checking out Revision 9091f1917f15cf09849f2507cb19c49f5b65446f (origin/pr/1758/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 9091f1917f15cf09849f2507cb19c49f5b65446fFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.2,2.11,1.6.3,centosTriggering ADAM-prb ? 2.7.3,2.11,1.6.3,centosTriggering ADAM-prb ? 2.7.3,2.11,2.2.0,centosTriggering ADAM-prb ? 2.7.3,2.10,1.6.3,centosTriggering ADAM-prb ? 2.6.2,2.10,1.6.3,centosTriggering ADAM-prb ? 2.6.2,2.10,2.2.0,centosTriggering ADAM-prb ? 2.7.3,2.10,2.2.0,centosTriggering ADAM-prb ? 2.6.2,2.11,2.2.0,centosADAM-prb ? 2.6.2,2.11,1.6.3,centos completed with result FAILUREADAM-prb ? 2.7.3,2.11,1.6.3,centos completed with result FAILUREADAM-prb ? 2.7.3,2.11,2.2.0,centos completed with result FAILUREADAM-prb ? 2.7.3,2.10,1.6.3,centos completed with result FAILUREADAM-prb ? 2.6.2,2.10,1.6.3,centos completed with result FAILUREADAM-prb ? 2.6.2,2.10,2.2.0,centos completed with result FAILUREADAM-prb ? 2.7.3,2.10,2.2.0,centos completed with result FAILUREADAM-prb ? 2.6.2,2.11,2.2.0,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@AmplabJenkins

This comment has been minimized.

Copy link

AmplabJenkins commented Oct 11, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2423/

Build result: FAILURE

[...truncated 15 lines...] > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1758/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains dcf0219a07b7d3445bb18fd6d53efdd5dfb8ff2b # timeout=10Checking out Revision dcf0219a07b7d3445bb18fd6d53efdd5dfb8ff2b (origin/pr/1758/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f dcf0219a07b7d3445bb18fd6d53efdd5dfb8ff2bFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.2,2.11,1.6.3,centosTriggering ADAM-prb ? 2.7.3,2.11,1.6.3,centosTriggering ADAM-prb ? 2.7.3,2.11,2.2.0,centosTriggering ADAM-prb ? 2.7.3,2.10,1.6.3,centosTriggering ADAM-prb ? 2.6.2,2.10,1.6.3,centosTriggering ADAM-prb ? 2.6.2,2.10,2.2.0,centosTriggering ADAM-prb ? 2.7.3,2.10,2.2.0,centosTriggering ADAM-prb ? 2.6.2,2.11,2.2.0,centosADAM-prb ? 2.6.2,2.11,1.6.3,centos completed with result FAILUREADAM-prb ? 2.7.3,2.11,1.6.3,centos completed with result FAILUREADAM-prb ? 2.7.3,2.11,2.2.0,centos completed with result FAILUREADAM-prb ? 2.7.3,2.10,1.6.3,centos completed with result FAILUREADAM-prb ? 2.6.2,2.10,1.6.3,centos completed with result FAILUREADAM-prb ? 2.6.2,2.10,2.2.0,centos completed with result FAILUREADAM-prb ? 2.7.3,2.10,2.2.0,centos completed with result FAILUREADAM-prb ? 2.6.2,2.11,2.2.0,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@gunjanbaid

This comment has been minimized.

Copy link
Contributor Author

gunjanbaid commented Oct 11, 2017

@fnothaft Did I miss something? This page shows 0 failures, not sure why the above comment reports a failure.

@fnothaft

This comment has been minimized.

Copy link
Member

fnothaft commented Oct 11, 2017

@gunjanbaid it looks like you need to run ./scripts/format-sources

@AmplabJenkins

This comment has been minimized.

Copy link

AmplabJenkins commented Oct 11, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2424/

Build result: FAILURE

[...truncated 15 lines...] > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1758/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains af7aae1 # timeout=10Checking out Revision af7aae1 (origin/pr/1758/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f af7aae1 > /home/jenkins/git2/bin/git rev-list dcf0219a07b7d3445bb18fd6d53efdd5dfb8ff2b # timeout=10Triggering ADAM-prb ? 2.6.2,2.11,1.6.3,centosTriggering ADAM-prb ? 2.7.3,2.11,1.6.3,centosTriggering ADAM-prb ? 2.7.3,2.11,2.2.0,centosTriggering ADAM-prb ? 2.7.3,2.10,1.6.3,centosTriggering ADAM-prb ? 2.6.2,2.10,1.6.3,centosTriggering ADAM-prb ? 2.6.2,2.10,2.2.0,centosTriggering ADAM-prb ? 2.7.3,2.10,2.2.0,centosTriggering ADAM-prb ? 2.6.2,2.11,2.2.0,centosADAM-prb ? 2.6.2,2.11,1.6.3,centos completed with result FAILUREADAM-prb ? 2.7.3,2.11,1.6.3,centos completed with result FAILUREADAM-prb ? 2.7.3,2.11,2.2.0,centos completed with result FAILUREADAM-prb ? 2.7.3,2.10,1.6.3,centos completed with result FAILUREADAM-prb ? 2.6.2,2.10,1.6.3,centos completed with result FAILUREADAM-prb ? 2.6.2,2.10,2.2.0,centos completed with result FAILUREADAM-prb ? 2.7.3,2.10,2.2.0,centos completed with result FAILUREADAM-prb ? 2.6.2,2.11,2.2.0,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@gunjanbaid

This comment has been minimized.

Copy link
Contributor Author

gunjanbaid commented Oct 11, 2017

@fnothaft I ran ./scripts/format-sources, pushed updated formatting, and ran through the steps listed in CONTRIBUTING.md. Not sure why Jenkins is still failing. If there's a particular page in the test results I am missing and should be looking at, please let me know.

@heuermh

This comment has been minimized.

Copy link
Member

heuermh commented Oct 11, 2017

The Jenkins error message might be misleading, it is complaining about pom files but I believe the issue is still formatting with the file GenomicRDD.scala

++ git status --porcelain
+ test -n ' M adam-core/src/main/scala/org/bdgenomics/adam/rdd/GenomicRDD.scala'

Personally I don't run ./scripts/format-sources separately; if you run the maven build from the command line it formats the source code for you.

@heuermh
Copy link
Member

heuermh left a comment

Should this pull request also update the python and R APIs?

@gunjanbaid

This comment has been minimized.

Copy link
Contributor Author

gunjanbaid commented Oct 12, 2017

@heuermh I ran the maven build again, there were no changes to the formatting in GenomicRDD.scala. However, there were changes to four different POM files, though I'm not totally sure why. I committed those changes to see if that fixes the build error.

I can add the updates to the Python/R API as well. I'll put this PR on hold until those changes have been added in.

@AmplabJenkins

This comment has been minimized.

Copy link

AmplabJenkins commented Oct 12, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2426/
Test PASSed.

@@ -33,7 +33,7 @@
As explained here: http://stackoverflow.com/questions/1660441/java-flag-to-enable-extended-serialization-debugging-info
The second option allows us better debugging for serialization-based errors.
-->
<argLine>-Xmx1024m -Dsun.io.serialization.extendedDebugInfo=true</argLine>
<argLine>-Xmx1024m -Dsun.io.serialization.extendedDebugInfo=true -Djava.io.tmpdir=/tmp/adamTestMvnNTasIKA</argLine>

This comment has been minimized.

Copy link
@akmorrow13

akmorrow13 Oct 12, 2017

Contributor

remove these

@heuermh heuermh referenced this pull request Feb 14, 2018

Open

Allow users to configure partitioning for pipe APIs #1863

0 of 3 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.