vcf2adam : Unsupported type ENUM #638

Closed
simplelive opened this Issue Apr 2, 2015 · 5 comments

Contributor

simplelive commented Apr 2, 2015

adam-submit vcf2adam small.vcf small.adam

0: jdbc:drill:zk=local> SELECT * FROM dfs./spark/work/app-20150402150016-0004/0/small.adam/part-r-00000.gz.parquet;
Query failed: Query failed: Failure while running fragment., Unsupported type ENUM [ 34e42d37-bb08-4c56-9b25-0e626e7a05a8 on localhost:31010 ]
[ 34e42d37-bb08-4c56-9b25-0e626e7a05a8 on localhost:31010 ]

Error: exception while executing query: Failure while executing query. (state=,code=0)
0: jdbc:drill:zk=local>

Member

fnothaft commented Apr 2, 2015

What query engine are you getting this error from? This is an error on their side. That being said, it may be useful to try running one of the new flatten commands (see #630) before loading the data inside of that engine.

pgrosu commented Apr 2, 2015

From the error I believe he's using Drill.

Contributor

simplelive commented Apr 3, 2015

The transform-and-query test works, but querying the file produced by vcf2adam fails.

Engine:
Drill 0.7.0

OS:
Linux V1 2.6.32-504.12.2.el6.x86_64 #1 SMP Wed Mar 11 22:03:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Transform and query:

[root@V1 ~]# adam-submit transform $ADAM_HOME/adam-core/src/test/resources/small.sam small.adam
Spark assembly has been built with Hive, including Datanucleus jars on classpath
2015-04-03 09:32:57 WARN NativeCodeLoader:52 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@V1 ~]#

0: jdbc:drill:zk=local> SELECT contig,start,oldPosition,end,mapq FROM dfs./spark/work/app-20150403093253-0009/0/small.adam/part-r-00000.gz.parquet ;
+------------+------------+-------------+------------+------------+
| contig | start | oldPosition | end | mapq |
+------------+------------+-------------+------------+------------+
| {"contigName":"1","contigLength":249250621} | 26472783 | null | 26472858 | 60 |
| {"contigName":"1","contigLength":249250621} | 240997787 | null | 240997862 | 60 |
| {"contigName":"1","contigLength":249250621} | 189606653 | null | 189606728 | 60 |
| {"contigName":"1","contigLength":249250621} | 207027738 | null | 207027813 | 60 |
| {"contigName":"1","contigLength":249250621} | 14397233 | null | 14397308 | 60 |
| {"contigName":"1","contigLength":249250621} | 240344442 | null | 240344517 | 24 |
| {"contigName":"1","contigLength":249250621} | 153978724 | null | 153978799 | 60 |
| {"contigName":"1","contigLength":249250621} | 237728409 | null | 237728484 | 28 |
| {"contigName":"1","contigLength":249250621} | 231911906 | null | 231911981 | 60 |
| {"contigName":"1","contigLength":249250621} | 50683371 | null | 50683446 | 60 |
| {"contigName":"1","contigLength":249250621} | 37577445 | null | 37577520 | 60 |
| {"contigName":"1","contigLength":249250621} | 195211965 | null | 195212040 | 60 |
| {"contigName":"1","contigLength":249250621} | 163841413 | null | 163841488 | 60 |
| {"contigName":"1","contigLength":249250621} | 101556378 | null | 101556453 | 60 |
| {"contigName":"1","contigLength":249250621} | 20101800 | null | 20101875 | 35 |
| {"contigName":"1","contigLength":249250621} | 186794283 | null | 186794358 | 60 |
| {"contigName":"1","contigLength":249250621} | 165341382 | null | 165341457 | 60 |
| {"contigName":"1","contigLength":249250621} | 5469106 | null | 5469181 | 60 |
| {"contigName":"1","contigLength":249250621} | 89554252 | null | 89554327 | 60 |
| {"contigName":"1","contigLength":249250621} | 169801933 | null | 169802008 | 40 |
+------------+------------+-------------+------------+------------+
20 rows selected (0.082 seconds)
0: jdbc:drill:zk=local>

Member

fnothaft commented Apr 3, 2015

Drill is throwing an error because our Genotype tables contain an ENUM type, which it can't handle. The AlignmentRecord schema doesn't contain any ENUMs, and thus works fine. This is an error in Drill; our storage format is properly formed Avro/Parquet. Do you know how Drill reads in Parquet files? E.g., is it through parquet-hive, parquet-mr, etc? I'm not familiar with Drill, so I unfortunately can't help much here. I would either open a JIRA against Drill or post to their user/dev lists for help.
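
To check ahead of time which fields might trip up an engine that cannot read ENUMs, one option is to walk the Avro schema and list every enum-typed field. The following is a minimal sketch under stated assumptions: the schema shown is a hypothetical, simplified example (it is not ADAM's actual Genotype schema), and `find_enum_fields` is an illustrative helper, not part of ADAM or Drill.

```python
import json

# Hypothetical, simplified Avro schema for illustration only.
# It is NOT ADAM's actual Genotype schema.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "ExampleGenotype",
  "fields": [
    {"name": "sampleId", "type": ["null", "string"], "default": null},
    {"name": "alleles", "type": {"type": "array", "items": {
        "type": "enum", "name": "ExampleAllele",
        "symbols": ["Ref", "Alt", "OtherAlt", "NoCall"]}}},
    {"name": "variant", "type": {"type": "record", "name": "ExampleVariant",
        "fields": [
          {"name": "start", "type": ["null", "long"], "default": null},
          {"name": "kind", "type": ["null", {"type": "enum",
              "name": "ExampleKind", "symbols": ["SNP", "INDEL"]}],
           "default": null}
        ]}}
  ]
}
"""


def find_enum_fields(schema, path=""):
    """Return dotted paths of enum-typed fields reachable from `schema`.

    Limitation: named-type references (a bare string naming an enum that
    was defined earlier in the schema) are not resolved in this sketch.
    """
    found = []
    if isinstance(schema, list):
        # Avro union: inspect every branch.
        for branch in schema:
            found.extend(find_enum_fields(branch, path))
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "enum":
            found.append(path)
        elif kind == "record":
            for field in schema["fields"]:
                child = f"{path}.{field['name']}" if path else field["name"]
                found.extend(find_enum_fields(field["type"], child))
        elif kind == "array":
            found.extend(find_enum_fields(schema["items"], path))
        elif kind == "map":
            found.extend(find_enum_fields(schema["values"], path))
        elif isinstance(kind, (dict, list)):
            # Nested {"type": {...}} wrapper.
            found.extend(find_enum_fields(kind, path))
    return found


enum_fields = find_enum_fields(json.loads(SCHEMA_JSON))
print(enum_fields)
```

A schema for which this returns an empty list has no inline enum definitions, which matches the observation above that a schema without ENUMs can be queried while one with ENUMs cannot.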

Member

fnothaft commented Jul 6, 2016

Closing as won't fix. Parquet's ENUM type is supported in the latest versions of Spark SQL, at the least.

@fnothaft fnothaft closed this Jul 6, 2016

@fnothaft fnothaft added the wontfix label Jul 6, 2016