HBase as a separate repo #1293

Closed
3 participants
@jpdna
Member

jpdna commented Nov 27, 2016

Requires: https://github.com/jpdna/bdghbase commit 809c554

This PR is a stub for any changes to ADAM to make use of HBase code as a separate repo, and discussion of that approach.

I decided to try splitting out the HBase code into a different repo, bdghbase above.
The motivations I see for this are:

  1. We may end up with a number of different data backends besides HBase, Kudu for one. It may be better not to put all of these into the main ADAM repo.

  2. I haven't yet found a way to get a "provided" dependency for the HBase library code to work, so for now it will bloat the compiled HBase module. Better to keep that out of main ADAM, even if it's working fine.

  3. Testing HBase with CI is going to be a pain. For the moment the tests require that an HBase instance is accessible, as I can't seem to get the mock "mini" HBase cluster to work. This complicates testing and CI, so it is better to keep that complexity in a separate repo.

  4. It is nice to have fast compilation times for the HBase code while it is under development.
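
For readers unfamiliar with the "provided" dependency mentioned in item 2, the fragment below is a minimal sketch of how such a dependency is normally declared in a Maven POM. `org.apache.hbase:hbase-client` is the standard HBase client artifact, but the version shown is an assumption, since the thread does not say which HBase release was used. With `provided` scope, the library is on the compile and test classpaths but is left out of the packaged artifact, on the expectation that the runtime environment supplies it. This only illustrates the mechanism; item 2 reports that it did not work in this setup.

```xml
<!-- Sketch only: the version is an assumption, not taken from this PR. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>1.2.4</version>
  <!-- Available at compile/test time, excluded from the packaged jar. -->
  <scope>provided</scope>
</dependency>
```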

The choice of a separate repo or not seems irrelevant to how adam-shell, or downstream applications using ADAM as a library, would be built. With this PR I can interact with ADAM and HBase from adam-shell just as before.

  • I plan to add an "hbase" profile that will turn on the HBase dependencies in ADAM. Sound good?

  • I will clearly have to publish bdghbase to Maven, as we currently do for bdg-utils and the other separate bdg repos.

  • I'm not sure how to handle the CLI for HBase in this case. We could include the vcf2Hbase CLI code as it currently is in this PR, and perhaps it would only work if the -hbase profile was turned on. We'd want the command and CLI help to also differ based on the profile.
    This doesn't seem ideal to me though. If bdghbase (or bdgkudu) is to be a separate repo, I feel the CLI code should be in that repo as well, but I am not sure how best to integrate it with the current ADAM CLI. Thoughts?
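
As a rough illustration of the "hbase" profile proposed in the first bullet, an opt-in Maven profile could look something like the fragment below. The `groupId`, `artifactId`, and `version` for bdghbase are hypothetical placeholders: the thread notes that bdghbase has not been published to Maven yet, so no real coordinates exist.

```xml
<!-- Sketch only: the bdghbase coordinates below are hypothetical. -->
<profiles>
  <profile>
    <id>hbase</id>
    <dependencies>
      <dependency>
        <groupId>org.bdgenomics.bdghbase</groupId>
        <artifactId>bdghbase</artifactId>
        <version>0.0.1-SNAPSHOT</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

A profile declared this way is inactive by default and is enabled on the command line with `mvn package -P hbase`, so a default build would not pull in the HBase dependencies.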

For now I am looking for general comments on this approach. I'll ask later for review of the hbase code, as I am nearly done addressing the comments in: #1246

@AmplabJenkins

AmplabJenkins Nov 27, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1640/

Build result: FAILURE

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository
Cloning repository https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1293/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains 30ccf09 # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1293/merge^{commit} # timeout=10
Checking out Revision 30ccf09 (origin/pr/1293/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 30ccf09bac3a1b616c0b7eee3f64f657489848e4
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@heuermh

heuermh Nov 28, 2016

Member

Extending the ADAM command line interface from an external repository is demonstrated at https://github.com/heuermh/adam-commands. I'm not necessarily saying that is the right way to go here; we might need to discuss the overall approach in person.

@jpdna jpdna closed this Jan 2, 2017
