Use .adam/_{seq,rg}dict.avro paths for Avro-formatted dictionaries #978

Merged
fnothaft merged 1 commit into bigdatagenomics:master from heuermh:dict-in-adam-dir on Mar 29, 2016

3 participants
@heuermh
Member

heuermh commented Mar 24, 2016

Fixes #945

This results in a Parquet folder structure of

$ ls -ls /var/folders/3y/61r1w_cs4hbdr_34nrdrbhww0000gn/T/3911097172870281306/reads12.adam
total 176
 8 -rw-r--r--   1 user  staff      8 Mar 24 17:38 ._SUCCESS.crc
 8 -rw-r--r--   1 user  staff     92 Mar 24 17:38 ._common_metadata.crc
 8 -rw-r--r--   1 user  staff    120 Mar 24 17:38 ._metadata.crc
 8 -rw-r--r--   1 user  staff     20 Mar 24 17:38 ._rgdict.avro.crc
 8 -rw-r--r--   1 user  staff     20 Mar 24 17:38 ._seqdict.avro.crc
 8 -rw-r--r--   1 user  staff    204 Mar 24 17:38 .part-r-00000.gz.parquet.crc
 0 -rw-r--r--   1 user  staff      0 Mar 24 17:38 _SUCCESS
24 -rw-r--r--   1 user  staff  10494 Mar 24 17:38 _common_metadata
32 -rw-r--r--   1 user  staff  14304 Mar 24 17:38 _metadata
 8 -rw-r--r--   1 user  staff   1247 Mar 24 17:38 _rgdict.avro
 8 -rw-r--r--   1 user  staff   1450 Mar 24 17:38 _seqdict.avro
56 -rw-r--r--   1 user  staff  24716 Mar 24 17:38 part-r-00000.gz.parquet

Another option would be .adam/.{seq,rg}dict.avro.

Parquet mistakes files with other names for Parquet-formatted files and throws a RuntimeException, e.g.

RuntimeException: file:/var/folders/3y/61r1w_cs4hbdr_34nrdrbhww0000gn/T/
7766788398861766842/reads12.adam/seqdict.avro is not a Parquet file. expected
magic number at tail [80, 65, 82, 49] but found [-104, -80, 71, -108]
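
The check behind this error can be illustrated with a small sketch. The expected tail bytes [80, 65, 82, 49] are the ASCII codes for "PAR1". The sketch assumes, as the chosen file names suggest, that files whose names start with `_` or `.` are treated as hidden metadata and skipped. The function names, the `demo` helper, and the file contents are hypothetical, and this is not the Parquet implementation itself.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"  # bytes [80, 65, 82, 49], as in the error message


def has_parquet_magic(path):
    """Return True if the last four bytes of the file are the Parquet magic number."""
    if os.path.getsize(path) < len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as f:
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        return f.read() == PARQUET_MAGIC


def is_hidden(name):
    """Assumed convention: names starting with '_' or '.' are skipped as metadata."""
    return name.startswith("_") or name.startswith(".")


def offending_files(directory):
    """List visible files in a directory that would fail the tail magic-number check."""
    return sorted(
        name
        for name in os.listdir(directory)
        if not is_hidden(name)
        and not has_parquet_magic(os.path.join(directory, name))
    )


def demo():
    """Build a toy .adam-style directory and report which files would be rejected."""
    with tempfile.TemporaryDirectory() as d:
        files = {
            # Stand-in payloads, not real Parquet or Avro data.
            "part-r-00000.gz.parquet": b"PAR1" + b"columns" + b"PAR1",
            "_seqdict.avro": b"Obj\x01" + b"avro-bytes",
            "seqdict.avro": b"Obj\x01" + b"avro-bytes",
        }
        for name, payload in files.items():
            with open(os.path.join(d, name), "wb") as f:
                f.write(payload)
        return offending_files(d)
```

Under these assumptions only the unprefixed `seqdict.avro` is reported, which matches the behaviour described above: the same Avro payload is ignored once its name starts with an underscore.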
@AmplabJenkins

AmplabJenkins Mar 24, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1117/

Build result: FAILURE

[...truncated 24 lines...]
Triggering ADAM-prb » 2.6.0,2.10,1.3.1,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.0,centos
Triggering ADAM-prb » 2.6.0,2.10,1.4.1,centos
Triggering ADAM-prb » 2.6.0,2.11,1.3.1,centos
Triggering ADAM-prb » 2.3.0,2.10,1.4.1,centos
Triggering ADAM-prb » 2.3.0,2.10,1.3.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.4.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.3.1,centos
Triggering ADAM-prb » 2.3.0,2.10,1.5.2,centos
ADAM-prb » 2.6.0,2.11,1.6.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.5.2,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.4.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.6.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.6.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.3.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.6.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.4.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.3.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.4.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.3.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.4.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.3.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.5.2,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@AmplabJenkins

AmplabJenkins Mar 24, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1118/

Build result: FAILURE

[...truncated 24 lines...]
Triggering ADAM-prb » 2.6.0,2.10,1.3.1,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.0,centos
Triggering ADAM-prb » 2.6.0,2.10,1.4.1,centos
Triggering ADAM-prb » 2.6.0,2.11,1.3.1,centos
Triggering ADAM-prb » 2.3.0,2.10,1.4.1,centos
Triggering ADAM-prb » 2.3.0,2.10,1.3.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.4.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.3.1,centos
Triggering ADAM-prb » 2.3.0,2.10,1.5.2,centos
ADAM-prb » 2.6.0,2.11,1.6.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.5.2,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.4.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.6.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.6.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.3.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.6.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.4.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.3.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.4.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.3.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.4.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.3.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.5.2,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@heuermh

Member

heuermh commented Mar 24, 2016

Jenkins, retest this please

@AmplabJenkins

AmplabJenkins Mar 24, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1119/
Test PASSed.

@fnothaft

Member

fnothaft commented Mar 25, 2016

Oh, savvy! Looks great @heuermh!

OOC, why's there a big deletion in FieldEnumerationSuite?

@heuermh

Member

heuermh commented Mar 25, 2016

> why's there a big deletion in FieldEnumerationSuite?

I started digging in there because of a failing unit test; it turns out that saveAsParquet is order dependent. As it was, the saveAvro calls would create the parent directory, which would then cause adamParquetSave to blow up.

That test was goofy though, in that it was creating the Parquet folder in adam-core/target/scala-2.10.4/test-classes instead of somewhere reasonable, so I kept the changes in.
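
The order dependence described here can be sketched roughly as follows, assuming the main save refuses to write into an output directory that already exists (as Hadoop-style output formats typically do) while the dictionary save creates its parent directory on demand. The function names and file contents are hypothetical stand-ins, not the ADAM methods.

```python
import os
import tempfile


def save_main_output(path):
    """Stand-in for the Parquet save: refuses to write into an existing directory."""
    if os.path.exists(path):
        raise FileExistsError(f"output directory {path} already exists")
    os.makedirs(path)
    with open(os.path.join(path, "part-r-00000"), "wb") as f:
        f.write(b"records")


def save_sidecar(path, name, payload):
    """Stand-in for the Avro dictionary save: creates the parent directory if needed."""
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, name), "wb") as f:
        f.write(payload)


def save_all(path, sidecars_first):
    """Write the main output and both dictionaries in the requested order."""
    def save_sidecars():
        save_sidecar(path, "_seqdict.avro", b"seq")
        save_sidecar(path, "_rgdict.avro", b"rg")

    if sidecars_first:
        save_sidecars()         # creates the directory ...
        save_main_output(path)  # ... so this call raises
    else:
        save_main_output(path)
        save_sidecars()
    return sorted(os.listdir(path))


def demo():
    """Return the listing from the working order and whether the other order failed."""
    with tempfile.TemporaryDirectory() as tmp:
        good = save_all(os.path.join(tmp, "good.adam"), sidecars_first=False)
        try:
            save_all(os.path.join(tmp, "bad.adam"), sidecars_first=True)
            failed = False
        except FileExistsError:
            failed = True
        return good, failed
```

In this sketch, writing the main output first and the dictionaries afterwards succeeds, while the reverse order fails because the directory already exists by the time the main save runs.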

@fnothaft

Member

fnothaft commented Mar 25, 2016

> > why's there a big deletion in FieldEnumerationSuite?
>
> I started digging in there because of a failing unit test; it turns out that saveAsParquet is order dependent. As it was, the saveAvro calls would create the parent directory, which would then cause adamParquetSave to blow up.
>
> That test was goofy though, in that it was creating the Parquet folder in adam-core/target/scala-2.10.4/test-classes instead of somewhere reasonable, so I kept the changes in.

Ah, that makes sense. Thanks for the follow up.

This looks good to merge for me, but I will keep this open until tomorrow in case anyone else wants to chime in.

@fnothaft fnothaft merged commit 65f893f into bigdatagenomics:master Mar 29, 2016

1 check passed

default Merged build finished.
@fnothaft

Member

fnothaft commented Mar 29, 2016

Thanks @heuermh! Merged.

@heuermh heuermh deleted the heuermh:dict-in-adam-dir branch Mar 29, 2016

@heuermh

Member

heuermh commented Mar 29, 2016

Thanks!
