New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADAM-599, 905: Move to bdg-formats:0.7.0 and migrate metadata #906

Merged
merged 2 commits into from Jan 12, 2016

Conversation

Projects
None yet
4 participants
@fnothaft
Member

fnothaft commented Dec 27, 2015

Resolves #599 and #905:

  • Moves to bdg-formats:0.7.0, where the recordGroup metadata fields have been eliminated from the AlignmentRecord schema.
  • Adds code so that loadAlignments always returns Sequence and RecordGroup dictionaries.
  • Supports the loading/storage of Sequence/RecordGroup dictionaries by writing them to disk as Avro files using the Contig and RecordGroupMetadata records from bdg-formats.
  • Scala/javadoc cleanup related to the above changes.
@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Dec 27, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1034/
Test PASSed.

AmplabJenkins commented Dec 27, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1034/
Test PASSed.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Dec 29, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1035/
Test PASSed.

AmplabJenkins commented Dec 29, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1035/
Test PASSed.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Dec 29, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1038/
Test PASSed.

AmplabJenkins commented Dec 29, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1038/
Test PASSed.

@laserson

This comment has been minimized.

Show comment
Hide comment
@laserson

laserson Jan 7, 2016

Contributor

I may be a little out of context, since I've been offline for a month. I tend to be on the side of doing less magic for the user. Will returning such a tuple turn out to be a hassle if a user does some more exotic things other than just vanilla genome sequencing? Will there be situations where the additional non-read objects will be dummies? I would also support avoiding a tuple and replacing with a named type in case we go down that path.

Contributor

laserson commented Jan 7, 2016

I may be a little out of context, since I've been offline for a month. I tend to be on the side of doing less magic for the user. Will returning such a tuple turn out to be a hassle if a user does some more exotic things other than just vanilla genome sequencing? Will there be situations where the additional non-read objects will be dummies? I would also support avoiding a tuple and replacing with a named type in case we go down that path.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Jan 11, 2016

Member

OK, rebased and added code to address the tuple method signature in a new commit (999f48f).

Member

fnothaft commented Jan 11, 2016

OK, rebased and added code to address the tuple method signature in a new commit (999f48f).

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Jan 11, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1045/

Build result: FAILURE

GitHub pull request #906 of commit 999f48f automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/906/merge^{commit} # timeout=10 > git branch -a --contains 5561a72212c3b57bf6e1ac28ce69c5aa4c16f1c7 # timeout=10 > git rev-parse remotes/origin/pr/906/merge^{commit} # timeout=10Checking out Revision 5561a72212c3b57bf6e1ac28ce69c5aa4c16f1c7 (origin/pr/906/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f 5561a72212c3b57bf6e1ac28ce69c5aa4c16f1c7First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.4.1,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

AmplabJenkins commented Jan 11, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1045/

Build result: FAILURE

GitHub pull request #906 of commit 999f48f automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/906/merge^{commit} # timeout=10 > git branch -a --contains 5561a72212c3b57bf6e1ac28ce69c5aa4c16f1c7 # timeout=10 > git rev-parse remotes/origin/pr/906/merge^{commit} # timeout=10Checking out Revision 5561a72212c3b57bf6e1ac28ce69c5aa4c16f1c7 (origin/pr/906/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f 5561a72212c3b57bf6e1ac28ce69c5aa4c16f1c7First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.4.1,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Jan 11, 2016

Member

@fnothaft thank you for the update! I don't see the source for GenomicRDD in the diff, did I miss that?

Member

heuermh commented Jan 11, 2016

@fnothaft thank you for the update! I don't see the source for GenomicRDD in the diff, did I miss that?

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Jan 12, 2016

Member

Added the missing file and squashed down to two commits (one per issue).

Member

fnothaft commented Jan 12, 2016

Added the missing file and squashed down to two commits (one per issue).

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Jan 12, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1046/
Test PASSed.

AmplabJenkins commented Jan 12, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1046/
Test PASSed.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Jan 12, 2016

Member

Even though this makes a lot of changes, some of which might be binary-incompatible (when it comes to the implicit stuff with scala I'm not too sure), my downstream projects still work.

+1 from me after minor doc fixes mentioned above.

Member

heuermh commented Jan 12, 2016

Even though this makes a lot of changes, some of which might be binary-incompatible (when it comes to the implicit stuff with scala I'm not too sure), my downstream projects still work.

+1 from me after minor doc fixes mentioned above.

[ADAM-599] Eliminate shuffle issues by writing metadata to avro files.
Resolves #599. Since we have added the RecordGroupMetadata fields in
bdg-formats:0.7.0, we can read/write our metadata as separate Avro files. We
process these files when loading/writing the Parquet files where the alignment
data is stored. This allows us to both eliminate the bulky metadata that we are
currently storing in the AlignmentRecord, while maintaining the Sequence and
RecordGroup dictionaries that we need to keep around.
@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Jan 12, 2016

Member

@heuermh fixed the doc issues.

Member

fnothaft commented Jan 12, 2016

@heuermh fixed the doc issues.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Jan 12, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1049/
Test PASSed.

AmplabJenkins commented Jan 12, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1049/
Test PASSed.

heuermh added a commit that referenced this pull request Jan 12, 2016

Merge pull request #906 from fnothaft/eliminate-metadata
ADAM-599, 905: Move to bdg-formats:0.7.0 and migrate metadata

@heuermh heuermh merged commit 4415b04 into bigdatagenomics:master Jan 12, 2016

1 check passed

default Merged build finished.
Details
@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Jan 12, 2016

Member

Thank you, @fnothaft!

Member

heuermh commented Jan 12, 2016

Thank you, @fnothaft!

fnothaft added a commit to fnothaft/adam that referenced this pull request Feb 9, 2016

[ADAM-934] Properly handle unset library name during duplicate marking
Resolves #934. If the library name for a read group is not set, we will use a
null string during the groupBy. This is equivalent to our pre-#906
implementation. Additionally, this commit adds logging that prints a warning
message for the user if there are read groups whose library ID is not set.

heuermh added a commit that referenced this pull request Feb 13, 2016

[ADAM-934] Properly handle unset library name during duplicate marking
Resolves #934. If the library name for a read group is not set, we will use a
null string during the groupBy. This is equivalent to our pre-#906
implementation. Additionally, this commit adds logging that prints a warning
message for the user if there are read groups whose library ID is not set.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment