Use Hive-style partitioning #370

Closed
jpdna wants to merge 2 commits

@jpdna
Member

jpdna commented Feb 25, 2018

Replaces #361

Works with ADAM PR bigdatagenomics/adam#1922

Reading partitioned files works, for example with the command:

mango-submit --master yarn --num-executors 10 --executor-cores 4 --executor-memory 20g --driver-memory 20g  -- /home/eecs/akmorrow/builds/hg19.2bit -genes http://www.biodalliance.org/datasets/ensGene.bb -reads hdfs://amp-bdg-master.amplab.net:8020/user/jpaschall/feb16_work/NA12889_S1.bam.partitioned.v4_withpartnum.adam -show_genotypes -parquetIsBinned

Note: this PR currently fails tests, but so does Mango master for me at 328b519.
I get this test failure:

VizReadsSuite:
2018-02-25 16:29:33 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: org/apache/http/ssl/SSLContexts
  at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:966)
  at org.scalatra.test.HttpComponentsClient$class.createClient(HttpComponentsClient.scala:99)
  at org.bdgenomics.mango.cli.VizReadsSuite.createClient(VizReadsSuite.scala:27)
  at org.scalatra.test.HttpComponentsClient$class.submit(HttpComponentsClient.scala:62)
  at org.bdgenomics.mango.cli.VizReadsSuite.submit(VizReadsSuite.scala:27)

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/mango-prb/607/

@jpdna
Member Author

jpdna commented Mar 10, 2018

This worked fine with ADAM using PR bigdatagenomics/adam#1948 at f745bca.

Note: you need to use partitioned Parquet files generated with that version of the ADAM PR.
This works:

../mango/bin/mango-submit --master yarn --num-executors 10 --executor-cores 4 --executor-memory 20g --driver-memory 20g  -- ./hg19.2bit -genes http://www.biodalliance.org/datasets/ensGene.bb -reads hdfs://amp-bdg-master.amplab.net:8020/user/jpaschall/march9/NA12877_S1.partitioned.v2.adam  -show_genotypes 

@akmorrow13
Contributor

Jenkins, retest this please.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/mango-prb/623/

@akmorrow13
Contributor

@jpdna can you add unit tests?

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/mango-prb/624/

@@ -67,6 +68,20 @@ class VariantContextMaterializationSuite extends MangoFunSuite {

}

sparkTest("Can read Partitioned Parquet Genotypes") {

Contributor

remove empty line

}

sparkTest("Read Partitioned Data") {

Contributor

remove empty line

@akmorrow13
Contributor

This looks great, @jpdna! Just some minor spacing comments; otherwise it looks good to go on my side.

@akmorrow13
Contributor

Replaced with #379

@akmorrow13 akmorrow13 closed this Mar 21, 2018
3 participants