Use Hive-style partitioning #370

jpdna · 2018-02-25T22:37:05Z

Replaces #361

Works with ADAM PR bigdatagenomics/adam#1922

Reading partitioned files works, for example with this command:

mango-submit --master yarn --num-executors 10 --executor-cores 4 --executor-memory 20g --driver-memory 20g  -- /home/eecs/akmorrow/builds/hg19.2bit -genes http://www.biodalliance.org/datasets/ensGene.bb -reads hdfs://amp-bdg-master.amplab.net:8020/user/jpaschall/feb16_work/NA12889_S1.bam.partitioned.v4_withpartnum.adam -show_genotypes -parquetIsBinned

Note: this PR currently fails tests, but so does Mango master for me at 328b519.
I get this test failure:

VizReadsSuite:
2018-02-25 16:29:33 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: org/apache/http/ssl/SSLContexts
  at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:966)
  at org.scalatra.test.HttpComponentsClient$class.createClient(HttpComponentsClient.scala:99)
  at org.bdgenomics.mango.cli.VizReadsSuite.createClient(VizReadsSuite.scala:27)
  at org.scalatra.test.HttpComponentsClient$class.submit(HttpComponentsClient.scala:62)
  at org.bdgenomics.mango.cli.VizReadsSuite.submit(VizReadsSuite.scala:27)

AmplabJenkins · 2018-02-25T22:41:04Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/mango-prb/607/
Test FAILed.

jpdna · 2018-03-10T03:54:10Z

This worked fine with ADAM including the PR bigdatagenomics/adam#1948, at f745bca.

Note that you need to use partitioned Parquet files generated from that version of the ADAM PR.
This works:

../mango/bin/mango-submit --master yarn --num-executors 10 --executor-cores 4 --executor-memory 20g --driver-memory 20g  -- ./hg19.2bit -genes http://www.biodalliance.org/datasets/ensGene.bb -reads hdfs://amp-bdg-master.amplab.net:8020/user/jpaschall/march9/NA12877_S1.partitioned.v2.adam  -show_genotypes
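
For readers unfamiliar with the term: Hive-style partitioning encodes each partition column's value in a directory name of the form `key=value`, so a reader can skip whole directories that cannot match a query. The sketch below is illustrative only and is not taken from this PR or from ADAM. The object name `HivePartitionPath` and the column names `chrom` and `bin` in the example path are hypothetical, and the real partitioned files may use different column names.

```scala
// Illustrative sketch: recover Hive-style `key=value` partition values from a path.
// Names here are hypothetical and are not part of Mango or ADAM.
object HivePartitionPath {

  /** Returns the (column, value) pairs encoded in `path`, in the order they appear. */
  def partitionValues(path: String): Seq[(String, String)] =
    path.split('/').toSeq.flatMap { segment =>
      // Split on the first '=' only, so values may themselves contain '='.
      segment.split("=", 2) match {
        case Array(key, value) if key.nonEmpty => Some(key -> value)
        case _                                 => None // not a partition directory
      }
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical layout: <dataset>/<column>=<value>/.../<part file>
    val path = "sample.adam/chrom=1/bin=12/part-00000.parquet"
    partitionValues(path).foreach { case (k, v) => println(s"$k -> $v") }
  }
}
```

Segments without an `=` (the dataset root and the part file) are ignored, so the same function works on paths that are only partly partitioned.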

akmorrow13 · 2018-03-18T21:32:50Z

Jenkins, retest this please.

AmplabJenkins · 2018-03-18T21:42:50Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/mango-prb/623/
Test PASSed.

akmorrow13 · 2018-03-18T21:49:43Z

@jpdna can you add unit tests?

AmplabJenkins · 2018-03-20T16:07:28Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/mango-prb/624/
Test PASSed.

akmorrow13 · 2018-03-20T17:27:23Z

mango-core/src/test/scala/org/bdgenomics/mango/models/VariantContextMaterializationSuite.scala

@@ -67,6 +68,20 @@ class VariantContextMaterializationSuite extends MangoFunSuite {

  }

+  sparkTest("Can read Partitioned Parquet Genotypes") {
+


remove empty line

akmorrow13 · 2018-03-20T17:27:31Z

mango-core/src/test/scala/org/bdgenomics/mango/models/AlignmentRecordMaterializationSuite.scala

+  }
+
+  sparkTest("Read Partitioned Data") {
+


remove empty line

akmorrow13 · 2018-03-20T17:28:37Z

This looks great, @jpdna! Just some minor spacing comments; otherwise it looks good to go on my side.

akmorrow13 · 2018-03-21T16:21:10Z

Replaced with #379

Use Hive-style partitioning

e74e304

jpdna mentioned this pull request Feb 26, 2018

Mango using Partitioned parquet ADAM #361

Closed

Added Hive-partitioned unit tests

b15aa70

akmorrow13 approved these changes Mar 20, 2018

akmorrow13 closed this Mar 21, 2018
