[ADAM-1141] Add support for saving/loading AlignmentRecords to/from CRAM. #1145

Merged
merged 1 commit into from
Sep 13, 2016

@fnothaft fnothaft commented Sep 1, 2016

Resolves #1141. Changes the signature of `AlignmentRecordRDD.saveAsSam` to take an `asType: Option[SAMFormat]` parameter in place of the `asSam` Boolean, since the output format is no longer a binary choice between SAM and BAM.

-1 for now, as I need to make a pass back and write some more tests. Depends on #1104, #1117.
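
To make the signature change concrete, here is a minimal sketch of how an optional format parameter might be resolved. It is not the code from this PR: the `SAMFormat` type below is a hypothetical stand-in defined locally so the example is self-contained, and the fallback of inferring the format from the file extension when `asType` is `None` is an assumption.

```scala
object FormatSketch {
  // Hypothetical stand-in for the SAMFormat type named in the PR description.
  sealed trait SAMFormat
  object SAMFormat {
    case object SAM extends SAMFormat
    case object BAM extends SAMFormat
    case object CRAM extends SAMFormat
  }

  // Assumed fallback: guess the format from the file extension.
  def inferFromPath(filePath: String): Option[SAMFormat] = {
    val lower = filePath.toLowerCase
    if (lower.endsWith(".sam")) Some(SAMFormat.SAM)
    else if (lower.endsWith(".bam")) Some(SAMFormat.BAM)
    else if (lower.endsWith(".cram")) Some(SAMFormat.CRAM)
    else None
  }

  // An explicit asType wins; otherwise fall back to the extension.
  def resolveFormat(filePath: String, asType: Option[SAMFormat] = None): SAMFormat =
    asType
      .orElse(inferFromPath(filePath))
      .getOrElse(throw new IllegalArgumentException(
        s"Cannot determine SAM/BAM/CRAM format for $filePath"))
}
```

With three possible outputs, a Boolean flag can no longer name every case, whereas an `Option` lets callers either pick a format explicitly or leave the choice to a default.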

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1451/

Build result: FAILURE

GitHub pull request #1145 of commit e18db5a automatically merged.
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
 > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1145/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains f5465f8 # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1145/merge^{commit} # timeout=10
Checking out Revision f5465f8 (origin/pr/1145/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f f5465f81f3d35a033c626b652ccd157936e8f3d0
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

     * @param asSingleFile If true, saves output as a single file.
     * @param isSorted If the output is sorted, this will modify the header.
     */
    def saveAsSam(
      filePath: String,
-     asSam: Boolean = true,
+     asType: Option[SAMFormat] = None,
Member

Is there a generic term for these formats? Otherwise I think the optional asType is reasonable.


@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1478/
Test PASSed.

@heuermh
Member

heuermh commented Sep 12, 2016

@fnothaft did you want to take off the -1 here?

What is here LGTM, though I would like to see some CRAM-specific unit tests and a small CRAM test file to read.

@fnothaft
Member Author

Nah, this is still -1 pending the CRAM-specific tests.

@fnothaft
Member Author

fnothaft commented Sep 13, 2016

OK! Removing my -1 here. I've added a commit (b2d40c7) with CRAM specific tests. Can I get a review pass of said commit? Once it looks good to everyone, I'll squash down and we can merge this.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1488/
Test PASSed.

val readsA = rddA.rdd.collect()
val readsB = rddB.rdd.collect()

readsA.indices.foreach {
Member

Hmm... this may be a more robust way to validate than the zip I've been using (with various problems) in FeatureRDDSuite

Member Author

The two are equivalent, no?

Member

Yeah, the zip fails inconsistently for me with `SparkException: Can only zip RDDs with same number of elements in each partition`. Sometimes less clever is better.

Member Author

Ah, I typically do a collect before the zip, which eliminates said issue (and we need to collect to use asserts anyways).
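
For readers following this thread, the two validation styles can be sketched on already-collected data. Spark's `RDD.zip` requires both RDDs to have the same number of partitions and the same number of elements in each partition, which is where the exception above comes from; collecting first moves the comparison onto local arrays, where neither constraint applies. The `Read` case class and the helper names below are hypothetical stand-ins, not code from the PR or from `FeatureRDDSuite`.

```scala
object CompareSketch {
  // Hypothetical record type standing in for an alignment record.
  final case class Read(name: String, start: Long)

  // Index-based comparison over two already-collected arrays.
  def sameByIndex(a: Array[Read], b: Array[Read]): Boolean =
    a.length == b.length && a.indices.forall(i => a(i) == b(i))

  // Zip-based comparison over the same arrays. The explicit length check
  // matters because zip on local collections truncates to the shorter input.
  def sameByZip(a: Array[Read], b: Array[Read]): Boolean =
    a.length == b.length && a.zip(b).forall { case (x, y) => x == y }
}
```

Once both sides are local arrays of equal length, the two checks give the same answer, so the choice between them is mostly a matter of readability.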


@heuermh
Member

heuermh commented Sep 13, 2016

LGTM

…RAM.

Resolves bigdatagenomics#1141. Changes the signature of `AlignmentRecordRDD.saveAsSAM` to take
an `Option[SAMFormat]` parameter, since `asSam` is now no longer a binary
choice.
@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1489/
Test PASSed.

@heuermh heuermh merged commit 0b7e03e into bigdatagenomics:master Sep 13, 2016
@heuermh
Member

heuermh commented Sep 13, 2016

Merged as commit 0b7e03e.

Thank you, @fnothaft!

This pull request was closed.