
Hive 2.X support for Hoodie #154

Closed
alunarbeach opened this issue Apr 11, 2017 · 12 comments
@alunarbeach (Contributor) commented Apr 11, 2017

While following the quickstart guide, I got the following error from the Hive sync tool. The same error occurs if I follow the "Manually via beeline" option as well.

#command:
java -cp target/hoodie-hive-0.3.6-SNAPSHOT-jar-with-dependencies.jar:target/jars/* com.uber.hoodie.hive.HiveSyncTool --base-path file:///tmp/hoodie/sample-table/ --database default --table hoodie_test --user hive --pass hive --jdbc-url jdbc:hive2://test-m:10000/

#Exception

Exception in thread "main" com.uber.hoodie.hive.HoodieHiveDatasetException: Failed to sync dataset HoodieDatasetReference{tableName='hoodie_test', baseDatasetPath='file:///tmp/hoodie/sample-table/', databaseName='default'}
	at com.uber.hoodie.hive.HoodieHiveDatasetSyncTask.sync(HoodieHiveDatasetSyncTask.java:85)
	at com.uber.hoodie.hive.HiveSyncTool.sync(HiveSyncTool.java:66)
	at com.uber.hoodie.hive.HiveSyncTool.main(HiveSyncTool.java:80)
Caused by: com.uber.hoodie.hive.HoodieHiveDatasetException: Failed to sync dataset HoodieDatasetReference{tableName='hoodie_test', baseDatasetPath='file:///tmp/hoodie/sample-table/', databaseName='default'}
	at com.uber.hoodie.hive.HoodieHiveSchemaSyncTask.sync(HoodieHiveSchemaSyncTask.java:101)
	at com.uber.hoodie.hive.HoodieHiveDatasetSyncTask.sync(HoodieHiveDatasetSyncTask.java:73)
	... 2 more
Caused by: com.uber.hoodie.hive.HoodieHiveDatasetException: Failed in executing SQL CREATE EXTERNAL TABLE  IF NOT EXISTS default.hoodie_test( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `timestamp` double, `_row_key` string, `rider` string, `driver` string, `begin_lat` double, `begin_lon` double, `end_lat` double, `end_lon` double, `fare` double) PARTITIONED BY (datestr String) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'com.uber.hoodie.hadoop.HoodieInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:///tmp/hoodie/sample-table/'
	at com.uber.hoodie.hive.client.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:169)
	at com.uber.hoodie.hive.client.HoodieHiveClient.createTable(HoodieHiveClient.java:241)
	at com.uber.hoodie.hive.HoodieHiveSchemaSyncTask.sync(HoodieHiveSchemaSyncTask.java:88)
	... 3 more
@vinothchandar (Member)

Seems like parquet jars are missing? Let me see if I did anything special locally..

@alunarbeach (Author)

Thanks @vinothchandar

@alunarbeach (Author) commented Apr 11, 2017

I think the issue was with the Hive version. I am using Hive 2.1.0, while the examples target Hive 1.1.1 with CDH dependencies.
This worked for me after doing the following:

  • Overhauling import statements (changing parquet.* to org.apache.parquet.*)
  • Modifying dependencies in the pom.xml files
  • Changing a couple of method signatures, since Hive 2.1.0 no longer provides the originals
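The first bullet is essentially a package rename. A minimal sketch (the class names here are illustrative, not necessarily the ones the project touches; note this fragment only compiles with the corresponding parquet jars on the classpath):

```java
// Hive 1.x (e.g. CDH builds) bundles the pre-Apache "twitter" Parquet
// classes under the bare `parquet.*` package; Hive 2.x bundles Apache
// Parquet under the relocated `org.apache.parquet.*` package.

// Before (works against Hive 1.x):
// import parquet.filter2.predicate.FilterPredicate;
// import parquet.filter2.predicate.Operators;

// After (works against Hive 2.x):
import org.apache.parquet.filter2.predicate.FilterPredicate;
import org.apache.parquet.filter2.predicate.Operators;
```

This is also why the original report failed with `java.lang.ClassNotFoundException: parquet.filter2.predicate.Operators$Column`: code compiled against the old package cannot find those classes on a Hive 2.x classpath.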

Can I make a pull request?

@prazanna (Contributor)

Glad you got it working @alunarbeach

The fix is a little tricky. We have to create profiles to support multiple Hive versions. We cannot simply switch the parquet library from twitter parquet to Apache Parquet, because that would break support for Hive 1.x versions.

Let me think about this and let you know the best way to create a PR for this.
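The profile approach mentioned above could look something like the following pom.xml fragment. This is a hypothetical sketch only: the profile ids, property names, and version numbers are illustrative, not the project's actual configuration.

```xml
<!-- Hypothetical sketch of per-Hive-version Maven profiles.
     Dependency declarations elsewhere in the pom would reference
     ${hive.version} and ${parquet.groupid} instead of hard-coding them. -->
<profiles>
  <profile>
    <id>hive1</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <properties>
      <hive.version>1.1.1</hive.version>
      <!-- Hive 1.x bundles the pre-Apache (twitter) parquet classes -->
      <parquet.groupid>com.twitter</parquet.groupid>
    </properties>
  </profile>
  <profile>
    <id>hive2</id>
    <properties>
      <hive.version>2.1.0</hive.version>
      <parquet.groupid>org.apache.parquet</parquet.groupid>
    </properties>
  </profile>
</profiles>
```

A build against Hive 2.x would then be selected with `mvn package -Phive2`, while the default profile keeps Hive 1.x builds working unchanged.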

@alunarbeach (Author)

Thanks @prazanna. I understand.

@vinothchandar (Member)

@alunarbeach of course, let us know if there is a way to do this easily.. maybe a hoodie-hive2 module? @prazanna ?

@alunarbeach (Author)

@prazanna @vinothchandar any suggestions on how you want to proceed?

@vinothchandar (Member)

I am happy with hoodie-hive2.. @prazanna is the Hive registration expert :) Will wait for him to pitch in.

@alunarbeach (Author)

@prazanna Any suggestions?

@prazanna (Contributor)

@alunarbeach - I am still figuring out the best way to do this. Please go ahead with the hoodie-hive2 module for now. If we later need to change it to something like this (https://github.com/streamsets/datacollector/blob/master/pom.xml#L363), we can refactor then. Would be happy to sign off on a PR from you @alunarbeach - Thanks.

@alunarbeach (Author)

@prazanna Have a look at my pull request and let me know.

@vinothchandar vinothchandar changed the title java.lang.ClassNotFoundException: parquet.filter2.predicate.Operators$Column Hive 2.X support for Hoodie Mar 29, 2018
@vinothchandar (Member)

#420 fixes this

vinishjail97 pushed a commit to vinishjail97/hudi that referenced this issue Dec 15, 2023
* adding default labels to datadog/prometheus reporters

* renaming varaibles