New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HBase genotypes backend -revised #1335

Closed
wants to merge 5 commits into
from

Conversation

Projects
None yet
5 participants
@jpdna
Member

jpdna commented Jan 3, 2017

supersedes #1246
and addresses comments from there.

Please make a review pass on this.

I'll add instructions soon as to how to run on cluster for reviewers to try live.

jpdna added some commits Jul 23, 2016

Update HBase dependencies in POM
Added HBaseFunctions

Added VCF genotype save and load Hbase functions

Added saveHBaseGenotypesSingleSample packing multiple genotypes at same position and samle allele into same hbase value

Added multisample capable loadHBaseGenotypes()

Removed commented-out earlier versions of save and load genotypes

removed more dead code

Clean up formatting - limit line length

Added saveHbaseSampleMetadata function

Added save and load SequenceDictionary functions to HBaseFunctions

Added createHbaseGenotypeTable

Adding loadGenotypeRddFromHBase

in progress updates to multi-sample hbase save

multi sample VCF save and load now working

Added repartitioning parameter tp hbase genotype load

Added comments identifying the public api vs helper functions

COB Aug 25

Added genomic start and stop parameters to loadGenotypesFromHBaseToGenotypeRDD

Added boolean saveSequenceDictionary toggle parameter to saveVariantContextRddToHBase

fixed start, stop null ptr exeception

first steps in adding hbaseToVariantcontextRDD

Changed region query to use ADAM ReferenceRegion

Added custom HBaseEncoder1 save function

Added custom Encoder1 Hbase loader

Added Encoder1 hbase variant context loader

Working - before rowkey int

Changed end in key to be size, added data block encoding in create table

Added create table splits

Removed dead code of encoder2

Added option to repartion vcrdd before saving to HBase

Added bulk load save to Hbase option

changed to cdh hbase api depedencies in POM

allow sample name file list as input to load functions

made sample_ids lis parameter  in load optional

Added deleteSamplesFromHBase function

Fixed bulk delete and made loadVariantContext work even when requested samplids are missing

Removed code from  failed version of sample delete function

Moved delete function up with test of genotype code

Fixed errors after rebase against master

small formatting cleanup

first pass hbase cli demo

second pass hbase cli demo

remove saveSeqDict check

add seq dict id to cli

import clean up

removed undeeded demo and temp.js code

Ran ./scripts/format-source due to build failure on Jenkins

changed Hbase dependencies to provided

addressed some review issues

Changed hbases dependencies back to NOT being provided

fixed None.get

first step key Strategy pattern

second step key Strategy pattern

more cleanup from review

made private functions package private for hbase

Factored out hbase connection

Factored out hbase connection - fix1

Factored out hbase connection - cleanup

Removed the non-bulk load hbase function

step one adding tests

first step implement DAO

step2 DAO

step3 DAO

Added first loadHBase Test

Added some javadoc

use dao everywhere

added more javadoc

step 1 in adding save hbase test

step 2 in adding save hbase test

clean up hbase doc

add more assertions to hbase load test

improved POM and imports in HBaseFunctions

fixed more PR suggestions

@jpdna jpdna referenced this pull request Jan 3, 2017

Closed

HBase backend for Genotypes #1246

0 of 3 tasks complete
@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Jan 3, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1714/

Build result: FAILURE

[...truncated 3 lines...]Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prbWiping out workspace first.Cloning the remote Git repositoryCloning repository https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1335/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains 5903417 # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1335/merge^{commit} # timeout=10Checking out Revision 5903417 (origin/pr/1335/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 5903417ce134e3726b6d0df89b1a4eeab3394a3dFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1714/

Build result: FAILURE

[...truncated 3 lines...]Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prbWiping out workspace first.Cloning the remote Git repositoryCloning repository https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1335/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains 5903417 # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1335/merge^{commit} # timeout=10Checking out Revision 5903417 (origin/pr/1335/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 5903417ce134e3726b6d0df89b1a4eeab3394a3dFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@fnothaft

Looks really good, definitely heading in the right direction!

Show outdated Hide outdated adam-cli/src/main/scala/org/bdgenomics/adam/cli/Vcf2HBase.scala
package org.bdgenomics.adam.cli
import org.apache.spark.SparkContext

This comment has been minimized.

@fnothaft

fnothaft Jan 5, 2017

Member

Nit: remove whitespace.

@fnothaft

fnothaft Jan 5, 2017

Member

Nit: remove whitespace.

Show outdated Hide outdated adam-cli/src/main/scala/org/bdgenomics/adam/cli/Vcf2HBase.scala
@Argument(required = true, metaVar = "VCF", usage = "The VCF file to convert", index = 0)
var vcfPath: String = _
@Argument(required = true, metaVar = "-hbase_table", usage = "HBase table name in which to load VCF file")

This comment has been minimized.

@fnothaft

fnothaft Jan 5, 2017

Member

For arguments, should be allcaps, no -, i.e., HBASE_TABLE

@fnothaft

fnothaft Jan 5, 2017

Member

For arguments, should be allcaps, no -, i.e., HBASE_TABLE

Show outdated Hide outdated adam-cli/src/main/scala/org/bdgenomics/adam/cli/Vcf2HBase.scala
@Argument(required = true, metaVar = "-hbase_table", usage = "HBase table name in which to load VCF file")
var hbaseTable: String = null
@Args4jOption(required = true, name = "-seq_dict_id", usage = "User defined name to apply to the sequence dictionary create from this VCF, or name of existing sequence dictionary to be used")

This comment has been minimized.

@fnothaft

fnothaft Jan 5, 2017

Member

Ditto: SEQ_DICT_ID

@fnothaft

fnothaft Jan 5, 2017

Member

Ditto: SEQ_DICT_ID

Show outdated Hide outdated adam-cli/src/main/scala/org/bdgenomics/adam/cli/Vcf2HBase.scala
@Argument(required = true, metaVar = "-staging_folder", usage = "Location for temporary files during bulk load")
var stagingFolder: String = null

This comment has been minimized.

@fnothaft

fnothaft Jan 5, 2017

Member

Nit: remove whitespace.

@fnothaft

fnothaft Jan 5, 2017

Member

Nit: remove whitespace.

Show outdated Hide outdated adam-cli/src/main/scala/org/bdgenomics/adam/cli/Vcf2HBase.scala
}
class Vcf2HBase(protected val args: Vcf2HBaseArgs) extends BDGSparkCommand[Vcf2HBaseArgs] with Logging {
val companion = Transform

This comment has been minimized.

@fnothaft

fnothaft Jan 5, 2017

Member

wrong companion object

@fnothaft

fnothaft Jan 5, 2017

Member

wrong companion object

Show outdated Hide outdated adam-core/src/test/scala/org/bdgenomics/adam/hbase/HBaseSuite.scala
@@ -0,0 +1,111 @@
package org.bdgenomics.adam.hbase
import org.bdgenomics.adam.util.ADAMFunSuite

This comment has been minimized.

@fnothaft

fnothaft Jan 5, 2017

Member

Nit: sort imports.

@fnothaft

fnothaft Jan 5, 2017

Member

Nit: sort imports.

Show outdated Hide outdated adam-core/src/test/scala/org/bdgenomics/adam/hbase/HBaseSuite.scala
assert(genoData.getAlleles.get(1) === GenotypeAllele.ALT)
}
sparkAfter("Done with HBase Tests") {

This comment has been minimized.

@fnothaft

fnothaft Jan 5, 2017

Member

Do we need this?

@fnothaft

fnothaft Jan 5, 2017

Member

Do we need this?

Show outdated Hide outdated pom.xml
@@ -1,5 +1,15 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

This comment has been minimized.

@fnothaft

fnothaft Jan 5, 2017

Member

Nit: extra whitespace before/after.

@fnothaft

fnothaft Jan 5, 2017

Member

Nit: extra whitespace before/after.

Show outdated Hide outdated pom.xml
<version>${spark.version}</version>
<scope>provided</scope>
<exclusions>
<exclusion>

This comment has been minimized.

@fnothaft

fnothaft Jan 5, 2017

Member

We shouldn't need these exclusions, IIRC.

@fnothaft

fnothaft Jan 5, 2017

Member

We shouldn't need these exclusions, IIRC.

Show outdated Hide outdated pom.xml
</dependencies>
</dependencyManagement>

This comment has been minimized.

@fnothaft

fnothaft Jan 5, 2017

Member

Nit: keep whitespace.

@fnothaft

fnothaft Jan 5, 2017

Member

Nit: keep whitespace.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Jan 11, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1734/

Build result: FAILURE

[...truncated 3 lines...]Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prbWiping out workspace first.Cloning the remote Git repositoryCloning repository https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1335/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains b96c96f # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1335/merge^{commit} # timeout=10Checking out Revision b96c96f (origin/pr/1335/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f b96c96f78dd117968ba25dfe2044613e7dd10ce4First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1734/

Build result: FAILURE

[...truncated 3 lines...]Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prbWiping out workspace first.Cloning the remote Git repositoryCloning repository https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1335/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains b96c96f # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1335/merge^{commit} # timeout=10Checking out Revision b96c96f (origin/pr/1335/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f b96c96f78dd117968ba25dfe2044613e7dd10ce4First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Jan 12, 2017

Member

This branch at 0ab45cd builds and passes tests locally for me after coning from my repo.
Any ideas on why it fails when running on Jenkins? I did run the format script.

Member

jpdna commented Jan 12, 2017

This branch at 0ab45cd builds and passes tests locally for me after coning from my repo.
Any ideas on why it fails when running on Jenkins? I did run the format script.

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Jan 12, 2017

Member

@heuermh suggests this could be a JDK version issue given the byte level comparisons I am making.

I am running java 1.8.0_111 loaclly
and Jenkins is running: JDK_7u60

Do we intended / need to not use java 8 for Jenkins?

Member

jpdna commented Jan 12, 2017

@heuermh suggests this could be a JDK version issue given the byte level comparisons I am making.

I am running java 1.8.0_111 loaclly
and Jenkins is running: JDK_7u60

Do we intended / need to not use java 8 for Jenkins?

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Jan 12, 2017

Member

Jenkins must be using JDK8, since our build requires it. I don't know the exact version though.

Member

heuermh commented Jan 12, 2017

Jenkins must be using JDK8, since our build requires it. I don't know the exact version though.

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Jan 12, 2017

Member

what's with then the
JAVA_HOME /home/jenkins/tools/hudson.model.JDK/JDK_7u60
I see here:
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/1734/injectedEnvVars/

Member

jpdna commented Jan 12, 2017

what's with then the
JAVA_HOME /home/jenkins/tools/hudson.model.JDK/JDK_7u60
I see here:
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/1734/injectedEnvVars/

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Jan 12, 2017

Member

There's a long boring story to why the JAVA_HOME values are configured the way they are, which mostly has to do with Jenkins specific oddities as @shaneknapp can attest to. If you grep later in the build logs for export JAVA_HOME, you'll see that we do indeed set the Java version to jdk1.8.0_60. This happens later in the build than the environment variable injection stage.

Member

fnothaft commented Jan 12, 2017

There's a long boring story to why the JAVA_HOME values are configured the way they are, which mostly has to do with Jenkins specific oddities as @shaneknapp can attest to. If you grep later in the build logs for export JAVA_HOME, you'll see that we do indeed set the Java version to jdk1.8.0_60. This happens later in the build than the environment variable injection stage.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Jan 12, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1735/

Build result: FAILURE

[...truncated 38 lines...]Triggering ADAM-prb ? 2.6.0,2.11,1.3.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosADAM-prb ? 2.3.0,2.11,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result SUCCESSNotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1735/

Build result: FAILURE

[...truncated 38 lines...]Triggering ADAM-prb ? 2.6.0,2.11,1.3.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosADAM-prb ? 2.3.0,2.11,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result SUCCESSNotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Jan 12, 2017

Member

arg - now the remaining problem is that the HBase spark library version I am using seems to not support Spark 2.0 and thus we are getting complaints about lack of that Spark Logging class that went away in Spark 2.0. I can check into if the newest version of HBase/Spark connection supports Spark 2.0 as it seems to here: https://github.com/apache/hbase/blob/master/hbase-spark/pom.xml
however, I am not sure if that will be compatible with the CDH version we'd want for the cluster.

Ideas about how to best manage these issues of different versions of libraries and Jenkins?
Maven Profiles?

Member

jpdna commented Jan 12, 2017

arg - now the remaining problem is that the HBase spark library version I am using seems to not support Spark 2.0 and thus we are getting complaints about lack of that Spark Logging class that went away in Spark 2.0. I can check into if the newest version of HBase/Spark connection supports Spark 2.0 as it seems to here: https://github.com/apache/hbase/blob/master/hbase-spark/pom.xml
however, I am not sure if that will be compatible with the CDH version we'd want for the cluster.

Ideas about how to best manage these issues of different versions of libraries and Jenkins?
Maven Profiles?

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Jan 12, 2017

Member

As you can see here:
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1735/

My HBase PR is failing only the 2.0.0 column of the matrix
due to:
"[ERROR] error: bad symbolic reference. A signature in HBaseContext.class refers to type Logging"

I take this to be the Spark 2.0 Logging issue.

The first target I see for HBase work is making it work as we have thus far with CDH, and CDH spark/hbase support clearly doesn't support yet Spark 2.- given its continued use of
org.apache.spark.logging

https://github.com/cloudera/hbase/blob/cdh5-1.2.0_5.9.0/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala#L41

In general, it looks like the Spark 2.0 HBase connector may not exist at all:
http://stackoverflow.com/questions/40908891/which-hbase-connector-for-spark-2-0-should-i-use

So - it may be for now impossible for this PR to pass against Spark 2.0 (without re-writing connector).
( the needed changes might now be so major - but not sure that would be a good use of my time. Also, we might want to contact the person at Cloudera who would work on this - does anyone you know https://github.com/tedyu ?
What would be the forum to inquire with Cloudera about running Spark2.0 with HBase on CDH)

Assuming no immediate solution, how should we manage this wrt Jenkins? Thoughts @heuermh and @fnothaft ?

Member

jpdna commented Jan 12, 2017

As you can see here:
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1735/

My HBase PR is failing only the 2.0.0 column of the matrix
due to:
"[ERROR] error: bad symbolic reference. A signature in HBaseContext.class refers to type Logging"

I take this to be the Spark 2.0 Logging issue.

The first target I see for HBase work is making it work as we have thus far with CDH, and CDH spark/hbase support clearly doesn't support yet Spark 2.- given its continued use of
org.apache.spark.logging

https://github.com/cloudera/hbase/blob/cdh5-1.2.0_5.9.0/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala#L41

In general, it looks like the Spark 2.0 HBase connector may not exist at all:
http://stackoverflow.com/questions/40908891/which-hbase-connector-for-spark-2-0-should-i-use

So - it may be for now impossible for this PR to pass against Spark 2.0 (without re-writing connector).
( the needed changes might now be so major - but not sure that would be a good use of my time. Also, we might want to contact the person at Cloudera who would work on this - does anyone you know https://github.com/tedyu ?
What would be the forum to inquire with Cloudera about running Spark2.0 with HBase on CDH)

Assuming no immediate solution, how should we manage this wrt Jenkins? Thoughts @heuermh and @fnothaft ?

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Jan 16, 2017

Member

Ping for suggestions about how I should deal with this Spark 2.0 compatibility and Jenkins.

Member

jpdna commented Jan 16, 2017

Ping for suggestions about how I should deal with this Spark 2.0 compatibility and Jenkins.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Jan 16, 2017

Member

We could move it into a module that is only accessible via a profile and then disable that profile when building for Spark 2. That would be simple.

Member

fnothaft commented Jan 16, 2017

We could move it into a module that is only accessible via a profile and then disable that profile when building for Spark 2. That would be simple.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Jan 16, 2017

Member

We could move it into a module that is only accessible via a profile and then disable that profile when building for Spark 2.

Yep, that would work. And this would be a good opportunity to bother upstream about it.

Member

heuermh commented Jan 16, 2017

We could move it into a module that is only accessible via a profile and then disable that profile when building for Spark 2.

Yep, that would work. And this would be a good opportunity to bother upstream about it.

@shaneknapp

This comment has been minimized.

Show comment
Hide comment
@shaneknapp

shaneknapp Jan 16, 2017

Contributor
Contributor

shaneknapp commented Jan 16, 2017

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Jan 17, 2017

Member

Alright, so do you suggest I make an HBase module at the same level as the adam-core and adam-cli modules?

It seems like the approach need in the POM would then be something like I see here:
http://maven.40175.n5.nabble.com/Excluding-certain-modules-in-a-profile-td81566.html#a81577

<project>
  ... 
  <modules>
    <module>module3</module>
    <module>module4</module>
  </modules>

  <profiles>
    <profile>
      <activation>
        <property>
          <name>!exclude-cpp-qa</name>
        </property>
      </activation>
      <modules>
        <module>module1</module>
        <module>module2</module>
      </modules>
    </profile>
  </profiles>
  ... 
</project>

Can you comment on what would be needed in the main project POM or elsewhere to selectively disable HBase module for the Spark 2 build?

Member

jpdna commented Jan 17, 2017

Alright, so do you suggest I make an HBase module at the same level as the adam-core and adam-cli modules?

It seems like the approach need in the POM would then be something like I see here:
http://maven.40175.n5.nabble.com/Excluding-certain-modules-in-a-profile-td81566.html#a81577

<project>
  ... 
  <modules>
    <module>module3</module>
    <module>module4</module>
  </modules>

  <profiles>
    <profile>
      <activation>
        <property>
          <name>!exclude-cpp-qa</name>
        </property>
      </activation>
      <modules>
        <module>module1</module>
        <module>module2</module>
      </modules>
    </profile>
  </profiles>
  ... 
</project>

Can you comment on what would be needed in the main project POM or elsewhere to selectively disable HBase module for the Spark 2 build?

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Jan 17, 2017

Member

There are a few different ways to activate a profile, see
http://books.sonatype.com/mvnref-book/reference/profiles-sect-activation.html

Member

heuermh commented Jan 17, 2017

There are a few different ways to activate a profile, see
http://books.sonatype.com/mvnref-book/reference/profiles-sect-activation.html

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Jan 17, 2017

Member

http://books.sonatype.com/mvnref-book/reference/profiles-sect-activation.html

Thanks, but I still feel like I'll be futzing around in the dark here - so to pick your brain more:

So it looks like in all of these I'd use the <activation> section to activate or with ! deactivate the module mentioned in the profile.

Some questions:

  1. Assuming we are trying to this to work with Jenkins based on what version of Spark is active which I assume Jenkins configuration is doing ( not by specifying the name or a Profile or setting a special env variable directly) what is the parameter/property I will be able to test inside the <activate> to determine which version of Spark as active? I'm not clear if this property is itself allowed to be a POM defined Property.

  2. Example 5.3.1 in the doc you pointed to seems like it as an AND combination of values when we'd possibly need an OR like "activate for Spark 1.6.2, 1.6.3, 1.6.4" I'm not sure how I'll include or exclude these multiple possible versions. If Jenkins could set an OS env variable like Spark2=True that would be set for any Spark versions greater than Spark 1.x then <activation> could test just for that.

Member

jpdna commented Jan 17, 2017

http://books.sonatype.com/mvnref-book/reference/profiles-sect-activation.html

Thanks, but I still feel like I'll be futzing around in the dark here - so to pick your brain more:

So it looks like in all of these I'd use the <activation> section to activate or with ! deactivate the module mentioned in the profile.

Some questions:

  1. Assuming we are trying to this to work with Jenkins based on what version of Spark is active which I assume Jenkins configuration is doing ( not by specifying the name or a Profile or setting a special env variable directly) what is the parameter/property I will be able to test inside the <activate> to determine which version of Spark as active? I'm not clear if this property is itself allowed to be a POM defined Property.

  2. Example 5.3.1 in the doc you pointed to seems like it as an AND combination of values when we'd possibly need an OR like "activate for Spark 1.6.2, 1.6.3, 1.6.4" I'm not sure how I'll include or exclude these multiple possible versions. If Jenkins could set an OS env variable like Spark2=True that would be set for any Spark versions greater than Spark 1.x then <activation> could test just for that.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Jan 18, 2017

Member

I would just activate it through a profile as we do with the distribution module.

Member

fnothaft commented Jan 18, 2017

I would just activate it through a profile as we do with the distribution module.

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Feb 6, 2017

Member

I could use another set of eyes if someone could look at below:

I'm attempting to move Hbase into a separate module in this new branch dev/b6 at my repo here:
https://github.com/jpdna/adam/tree/dev/b6

Trying to build after moving the module (skipping test for the moment) I get this error:

[ERROR] /home/paschallj/2_4_2017/adam/adam-hbase/src/main/scala/org/bdgenomics/adam/hbase/HBaseFunctions.scala:52: error: object rdd is not a member of package org.bdgenomics.adam
[ERROR] import org.bdgenomics.adam.rdd.variant.VariantContextRDD
[ERROR] ^
[ERROR] /home/paschallj/2_4_2017/adam/adam-hbase/src/main/scala/org/bdgenomics/adam/hbase/HBaseFunctions.scala:53: error: object rich is not a member of package org.bdgenomics.adam
[ERROR] import org.bdgenomics.adam.rich.RichVariant
[ERROR] ^
[ERROR] /home/paschallj/2_4_2017/adam/adam-hbase/src/main/scala/org/bdgenomics/adam/hbase/HBaseFunctions.scala:55: error: object models is not a member of package org.bdgenomics.adam
[ERROR] import org.bdgenomics.adam.models.{ ReferencePosition, ReferenceRegion, SequenceDictionary, VariantContext }

coming from:
https://github.com/jpdna/adam/blob/dev/b6/adam-hbase/src/main/scala/org/bdgenomics/adam/hbase/HBaseFunctions.scala#L52

Be sure to skip tests for the moment if you try to replicate from:
https://github.com/jpdna/adam/tree/dev/b6

I've got more POM cleanup to do, but first need to figure out why can't currently see:
org.bdgenomics.adam.rdd.variant.VariantContextRDD
from above. Thanks!

Member

jpdna commented Feb 6, 2017

I could use another set of eyes if someone could look at below:

I'm attempting to move Hbase into a separate module in this new branch dev/b6 at my repo here:
https://github.com/jpdna/adam/tree/dev/b6

Trying to build after moving the module (skipping test for the moment) I get this error:

[ERROR] /home/paschallj/2_4_2017/adam/adam-hbase/src/main/scala/org/bdgenomics/adam/hbase/HBaseFunctions.scala:52: error: object rdd is not a member of package org.bdgenomics.adam
[ERROR] import org.bdgenomics.adam.rdd.variant.VariantContextRDD
[ERROR] ^
[ERROR] /home/paschallj/2_4_2017/adam/adam-hbase/src/main/scala/org/bdgenomics/adam/hbase/HBaseFunctions.scala:53: error: object rich is not a member of package org.bdgenomics.adam
[ERROR] import org.bdgenomics.adam.rich.RichVariant
[ERROR] ^
[ERROR] /home/paschallj/2_4_2017/adam/adam-hbase/src/main/scala/org/bdgenomics/adam/hbase/HBaseFunctions.scala:55: error: object models is not a member of package org.bdgenomics.adam
[ERROR] import org.bdgenomics.adam.models.{ ReferencePosition, ReferenceRegion, SequenceDictionary, VariantContext }

coming from:
https://github.com/jpdna/adam/blob/dev/b6/adam-hbase/src/main/scala/org/bdgenomics/adam/hbase/HBaseFunctions.scala#L52

Be sure to skip tests for the moment if you try to replicate from:
https://github.com/jpdna/adam/tree/dev/b6

I've got more POM cleanup to do, but first need to figure out why can't currently see:
org.bdgenomics.adam.rdd.variant.VariantContextRDD
from above. Thanks!

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Mar 3, 2017

Member

Continued in #1388. Closing.

Member

fnothaft commented Mar 3, 2017

Continued in #1388. Closing.

@fnothaft fnothaft closed this Mar 3, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment