when column type is decimal, should add precision and scale#753

Merged
vinothchandar merged 1 commit into apache:master from cdmikechen:decimal-fix2 on Jul 8, 2019

Conversation

@cdmikechen
Contributor

When using HoodieHiveClient to create a Hive table, I found that a decimal column only shows as DECIMAL in the schema, without precision and scale. Because of this, hoodie reports an error when it analyses the schema.
This PR tries to fix that problem.
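A minimal sketch of the idea behind the fix (class and method names here are hypothetical, not Hudi's actual code): when mapping a decimal field to a Hive column type, the converter should emit the precision and scale instead of a bare DECIMAL.

```java
// Hypothetical illustration of the fix described in this PR.
public class DecimalTypeSketch {

    // Before the fix: the converter produced just "DECIMAL",
    // dropping the precision and scale carried by the source schema.
    static String hiveTypeBefore() {
        return "DECIMAL";
    }

    // After the fix: precision and scale are included, e.g. DECIMAL(10,2),
    // matching what Hive stores for decimal columns.
    static String hiveTypeAfter(int precision, int scale) {
        return String.format("DECIMAL(%d,%d)", precision, scale);
    }

    public static void main(String[] args) {
        System.out.println(hiveTypeAfter(10, 2)); // prints DECIMAL(10,2)
    }
}
```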

@vinothchandar
Member

@cdmikechen thanks for the fix. does this happen when hive syncing a DECIMAL column?

@cdmikechen
Contributor Author

@vinothchandar yes. I have a Hive table with a decimal column.
When I used run_sync_tool.sh to sync hoodie data to the Hive table, hoodie used Hive JDBC to read the Hive schema and check for differences between Hive and hoodie. The decimal column in the Hive table had no precision and scale, but the hoodie table's did, so hoodie treated them as different column types and reported an error.
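An illustrative sketch of why the sync failed (the class and method names are hypothetical, not Hudi's actual code): the schema comparison boils down to matching the column type strings reported by Hive against the ones derived from the hoodie dataset, so a bare DECIMAL never matches DECIMAL(10,2).

```java
// Hypothetical sketch of the schema-difference check described above.
public class SchemaDiffSketch {

    // A simplified stand-in for the type comparison done during hive sync:
    // two columns match only if their type strings are equal (ignoring case).
    static boolean typesMatch(String hiveType, String hoodieType) {
        return hiveType.equalsIgnoreCase(hoodieType);
    }

    public static void main(String[] args) {
        // Before the fix Hive reported "DECIMAL" while hoodie expected
        // "DECIMAL(10,2)", so the column was flagged as a schema difference.
        System.out.println(typesMatch("DECIMAL", "DECIMAL(10,2)"));   // prints false
        System.out.println(typesMatch("DECIMAL(10,2)", "decimal(10,2)")); // prints true
    }
}
```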

@cdmikechen
Contributor Author

I found this problem while working on a change to stop using com.databricks:spark-avro_2.11 and let hoodie support the timestamp and decimal column types on Spark versions below 2.4.
I have completed the Spark tests for timestamp and decimal support but still have some problems testing Hive. When I finish testing I will open a PR.

@vinothchandar
Member

sg. In parallel, let me try to fully understand these gaps. overall lg otherwise

@vinothchandar
Member

@cdmikechen have not been able to spend time on this much.. any updates from your testing?

@vinothchandar
Member

This timestamp handling has dogged us for a while. :( if you understand it fully, can you please put up a HIP with your suggestions.. we can then divvy up the actual impl..

@cdmikechen
Contributor Author

@vinothchandar I have added a PR: #770
As you can see in that PR, the timestamp and decimal problems should be solved on the Spark side. But timestamp cannot be solved in Hive, because Hive doesn't support the parquet logical_type.

@bvaradar
Contributor

bvaradar commented Jul 3, 2019

@cdmikechen - @n3nash or myself will review this in a couple of days and will get back to you

@vinothchandar
Member

@cdmikechen finally understood :) .. Thanks. merging!

@vinothchandar vinothchandar merged commit 62ecb2d into apache:master Jul 8, 2019
thesuperzapper pushed a commit to thesuperzapper/incubator-hudi that referenced this pull request Jul 11, 2019
pkgajulapalli pushed a commit to pkgajulapalli/hudi that referenced this pull request Jun 14, 2024
… and Adding guards to catch spurious data files with clustering (apache#753)

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>