[ADAM-1630] Overhauled docs introduction and added architecture section. #1653

Merged
merged 3 commits into from Dec 5, 2017

4 participants
Member

fnothaft commented Aug 2, 2017

WIP towards resolving #1630, #1632, #1633, #1662. Rewrote the introduction to focus on what ADAM provides and the ADAM ecosystem. Adds an architecture section that covers ADAM's stack model and schemas, and introduces the ADAMContext and GenomicRDDs as implementations of the evidence access layer of the stack.

TODO:

  • Explanation of metadata in GenomicRDD
  • Diagram depicting flow of data from disk into GenomicRDD types back out to disk: I think this is unnecessary.
  • A discussion of "why Parquet" and "why not BAM/VCF/etc" --> resolved by #1772

@fnothaft fnothaft added this to the 0.23.0 milestone Aug 2, 2017

@fnothaft fnothaft requested review from devin-petersohn and heuermh Aug 2, 2017

AmplabJenkins commented Aug 2, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2306/

Build result: FAILURE

[...truncated 15 lines...]
> /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
> /home/jenkins/git2/bin/git rev-parse origin/pr/1653/merge^{commit} # timeout=10
> /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 3ab700e # timeout=10
Checking out Revision 3ab700e (origin/pr/1653/merge)
> /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
> /home/jenkins/git2/bin/git checkout -f 3ab700ecb330c84ffb85a6895b2438a351d6008b
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb ? 2.3.0,2.10,2.1.0,centos
Triggering ADAM-prb ? 2.6.0,2.10,2.1.0,centos
Triggering ADAM-prb ? 2.6.0,2.11,1.6.1,centos
Triggering ADAM-prb ? 2.6.0,2.11,2.1.0,centos
Triggering ADAM-prb ? 2.3.0,2.11,2.1.0,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.6.1,centos
Triggering ADAM-prb ? 2.3.0,2.11,1.6.1,centos
ADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb ? 2.3.0,2.10,2.1.0,centos completed with result FAILURE
ADAM-prb ? 2.6.0,2.10,2.1.0,centos completed with result FAILURE
ADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb ? 2.6.0,2.11,2.1.0,centos completed with result FAILURE
ADAM-prb ? 2.3.0,2.11,2.1.0,centos completed with result FAILURE
ADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@heuermh

Very good summary sections.

docs/source/02_architecture.md (outdated)
with a parallel collection of genomic data. In ADAM, we implement this layer
through the [GenomicRDD](#genomic-rdd) classes. This layer presents users
with a view of the metadata associated with a collection of genomic data,
and APIs for [transforming](#transforming) and [joining](#join) genomic data.

@heuermh

heuermh Aug 2, 2017

Member

👍

docs/source/02_architecture.md (outdated)
the bdg-formats schemas are nullable, and the schemas themselves do not contain
invariants around valid values for a field. Instead, we validate data on ingress
and egress to/from a conventional genomic file format. This allows users to take
advantage of features such as field projection, which can improve the

@heuermh

heuermh Aug 2, 2017

Member

could field projection be a link here?

@fnothaft

fnothaft Dec 4, 2017

Member

What did you want to link to?

@heuermh

heuermh Dec 4, 2017

Member

There was a bit about projections in the old README, it is gone now, no worry

GenomicRDD is enriched with genomics-specific metadata such as computational
lineage and sample metadata, and optimized genomics-specific query patterns
such as [region joins](#join) and the [auto-parallelizing pipe API](#pipes)
for running legacy tools using Apache Spark.
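
To make the region join idea concrete, here is a minimal single-machine sketch in plain Python. It is not ADAM's implementation or API: the `Region` type and the `region_join` function are hypothetical names introduced for illustration, and a real GenomicRDD join would be distributed with Apache Spark. The sketch assumes half-open intervals and pairs records that overlap on the same contig:

```python
from collections import defaultdict
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical genomic interval: half-open [start, end) on a contig."""
    contig: str
    start: int
    end: int

    def overlaps(self, other: "Region") -> bool:
        # Two half-open intervals overlap when each starts before the other ends.
        return (self.contig == other.contig
                and self.start < other.end
                and other.start < self.end)


def region_join(left, right):
    """Inner region join: yield (lhs, rhs) for every overlapping pair."""
    # Bucket the right side by contig so records on different contigs
    # are never compared.
    by_contig = defaultdict(list)
    for rhs in right:
        by_contig[rhs.contig].append(rhs)
    for lhs in left:
        for rhs in by_contig.get(lhs.contig, []):
            if lhs.overlaps(rhs):
                yield (lhs, rhs)


reads = [Region("chr1", 100, 150), Region("chr1", 400, 450), Region("chr2", 10, 60)]
features = [Region("chr1", 120, 200), Region("chr2", 60, 90)]
pairs = list(region_join(reads, features))
```

An outer join variant would additionally emit the unmatched records from one side, paired with `None`.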

@heuermh

heuermh Aug 2, 2017

Member

👍

@devin-petersohn

A couple of clarifying questions. Overall, looks good!

@@ -373,6 +371,10 @@ all are called in a similar way:
* Inner join and group by right
* Right outer join and group by right
A subset of these joins are depicted in Figure 2 below.
![Joins Available](source/img/join_examples.png)

@devin-petersohn

devin-petersohn Aug 3, 2017

Member

Is this link accurate? I found that I had to use the relative path from the doc.

@fnothaft

fnothaft Aug 3, 2017

Member

This worked OK for me when running ./build.sh

Member

heuermh commented Sep 12, 2017

See pull request fnothaft#19 for updated README.md.

AmplabJenkins commented Oct 17, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2431/

Build result: FAILURE

[...truncated 15 lines...]
> /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
> /home/jenkins/git2/bin/git rev-parse 188836a^{commit} # timeout=10
> /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 188836a # timeout=10
Checking out Revision 188836a (origin/pr/1653/head)
> /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
> /home/jenkins/git2/bin/git checkout -f 188836a9bbfe82f757a2bfdcebe104cb2a20f782
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.2,2.11,1.6.3,centos
Triggering ADAM-prb ? 2.7.3,2.11,1.6.3,centos
Triggering ADAM-prb ? 2.7.3,2.11,2.2.0,centos
Triggering ADAM-prb ? 2.7.3,2.10,1.6.3,centos
Triggering ADAM-prb ? 2.6.2,2.10,1.6.3,centos
Triggering ADAM-prb ? 2.6.2,2.10,2.2.0,centos
Triggering ADAM-prb ? 2.7.3,2.10,2.2.0,centos
Triggering ADAM-prb ? 2.6.2,2.11,2.2.0,centos
ADAM-prb ? 2.6.2,2.11,1.6.3,centos completed with result SUCCESS
ADAM-prb ? 2.7.3,2.11,1.6.3,centos completed with result SUCCESS
ADAM-prb ? 2.7.3,2.11,2.2.0,centos completed with result SUCCESS
ADAM-prb ? 2.7.3,2.10,1.6.3,centos completed with result FAILURE
ADAM-prb ? 2.6.2,2.10,1.6.3,centos completed with result SUCCESS
ADAM-prb ? 2.6.2,2.10,2.2.0,centos completed with result FAILURE
ADAM-prb ? 2.7.3,2.10,2.2.0,centos completed with result FAILURE
ADAM-prb ? 2.6.2,2.11,2.2.0,centos completed with result SUCCESS
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

[ADAM-1630] Overhauled docs introduction and added architecture section.
Resolves #1630, #1632, #1633. Rewrote the introduction to focus on what ADAM
provides and the ADAM ecosystem. Adds an architecture section that covers
ADAM's stack model and schemas, and introduces the ADAMContext and
GenomicRDDs as implementations of the evidence access layer of the stack.
Member

fnothaft commented Dec 4, 2017

This is good to go from my side.

README.md (outdated)
[Avro]: http://avro.apache.org
[Spark]: https://spark.apache.org/
[Parquet]: https://parquet.apache.org/
[releases]: https://github.com/bigdatagenomics/adam/releases
# Citing ADAM

@heuermh

heuermh Dec 4, 2017

Member

The Homebrew section in README.md (which doesn't show up here in the diff) should probably be removed; I'll replace it later with new Homebrew and conda sections. Homebrew might take some work going forward, since it now defaults to JDK 9, which will break apache-spark and other upstream deps.

@fnothaft

fnothaft Dec 4, 2017

Member

JDK9. Nice. Aggressive.

Member

fnothaft commented Dec 4, 2017

@heuermh I've cleaned the README nits.

AmplabJenkins commented Dec 4, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2498/

Build result: FAILURE

[...truncated 15 lines...]
> /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
> /home/jenkins/git2/bin/git rev-parse origin/pr/1653/merge^{commit} # timeout=10
> /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 6249b76 # timeout=10
Checking out Revision 6249b76 (origin/pr/1653/merge)
> /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
> /home/jenkins/git2/bin/git checkout -f 6249b76ee94039e8f92333a1538786f16dc9cd69
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.2,2.11,1.6.3,centos
Triggering ADAM-prb ? 2.7.3,2.10,1.6.3,centos
Triggering ADAM-prb ? 2.7.3,2.10,2.2.0,centos
Triggering ADAM-prb ? 2.7.3,2.11,1.6.3,centos
Triggering ADAM-prb ? 2.6.2,2.10,2.2.0,centos
Triggering ADAM-prb ? 2.6.2,2.10,1.6.3,centos
Triggering ADAM-prb ? 2.6.2,2.11,2.2.0,centos
Triggering ADAM-prb ? 2.7.3,2.11,2.2.0,centos
ADAM-prb ? 2.6.2,2.11,1.6.3,centos completed with result SUCCESS
ADAM-prb ? 2.7.3,2.10,1.6.3,centos completed with result FAILURE
ADAM-prb ? 2.7.3,2.10,2.2.0,centos completed with result FAILURE
ADAM-prb ? 2.7.3,2.11,1.6.3,centos completed with result SUCCESS
ADAM-prb ? 2.6.2,2.10,2.2.0,centos completed with result FAILURE
ADAM-prb ? 2.6.2,2.10,1.6.3,centos completed with result SUCCESS
ADAM-prb ? 2.6.2,2.11,2.2.0,centos completed with result SUCCESS
ADAM-prb ? 2.7.3,2.11,2.2.0,centos completed with result SUCCESS
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

AmplabJenkins commented Dec 4, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2499/
Test PASSed.

@heuermh

heuermh approved these changes Dec 5, 2017

@heuermh heuermh merged commit 34b6bec into bigdatagenomics:master Dec 5, 2017

2 checks passed

Codacy/PR Quality Review Good work! A positive pull request.
default Merged build finished.
Member

heuermh commented Dec 5, 2017

Thank you, @fnothaft!
