TIMESTAMP_MICROS handling #17222

@hudi-bot

Description

Hi Guys!
 
I am not able to save timestamp columns with microsecond precision using Hudi.
I would like to keep microsecond granularity, but only milliseconds are kept.
 
I have set this:
--conf spark.sql.parquet.outputTimestampType=TIMESTAMP_MICROS
and also this in the Hudi options:
"hoodie.parquet.outputtimestamptype": "TIMESTAMP_MICROS",
but when I read it back (with PySpark, using the load API), it only has millisecond precision. Unfortunately, I need microseconds in some cases, because without them I run into a Schrödinger's cat situation 😄: an entity has more than one state at the same time.
Can someone enlighten me as to what I should do?
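To make the "more than one state at the same time" problem concrete, here is a minimal sketch in plain Python (no Spark or Hudi involved). The helper `truncate_to_millis` and the two sample timestamps are hypothetical; they only illustrate what happens when distinct microsecond timestamps are stored at millisecond precision.

```python
from datetime import datetime


def truncate_to_millis(ts: datetime) -> datetime:
    """Drop the sub-millisecond part, as a millisecond-precision store would."""
    return ts.replace(microsecond=(ts.microsecond // 1000) * 1000)


# Two hypothetical state changes of one entity, 300 microseconds apart.
state_a = datetime(2022, 5, 12, 9, 29, 2, 173_400)
state_b = datetime(2022, 5, 12, 9, 29, 2, 173_700)

# At microsecond precision the two states are distinguishable...
assert state_a != state_b

# ...but after truncation both land on the same millisecond.
collided = truncate_to_millis(state_a) == truncate_to_millis(state_b)
print(collided)  # True
```

Once the two rows share a timestamp, ordering them by "ts" no longer tells which state is the latest.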
 
Before the save, everything is fine: the "ts" column still has microsecond precision.
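A rough sketch of how the before/after comparison could be scripted in PySpark. The two `TIMESTAMP_MICROS` settings are the ones quoted above; the table name, the record key field `id`, the use of `ts` as precombine field, and the helper names `count_sub_millisecond` and `round_trip` are assumptions made for illustration. The sketch only measures the precision loss; it does not claim that these settings make Hudi keep microseconds.

```python
from datetime import datetime

# Settings quoted in the report.
SPARK_CONF = {"spark.sql.parquet.outputTimestampType": "TIMESTAMP_MICROS"}
HUDI_OPTIONS = {
    "hoodie.parquet.outputtimestamptype": "TIMESTAMP_MICROS",
    # Hypothetical table settings; adjust to the real table.
    "hoodie.table.name": "events",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "ts",
}


def count_sub_millisecond(timestamps):
    """Count values whose microsecond part is not a whole millisecond."""
    return sum(
        1 for t in timestamps
        if t is not None and t.microsecond % 1000 != 0
    )


def round_trip(spark, df, path):
    """Write df with Hudi, read it back, and return (before, after) counts.

    `spark` is an existing SparkSession with the Hudi bundle on the
    classpath; `df` must contain the columns "id" and "ts".
    """
    for key, value in SPARK_CONF.items():
        spark.conf.set(key, value)

    before = count_sub_millisecond(r["ts"] for r in df.select("ts").collect())

    df.write.format("hudi").options(**HUDI_OPTIONS).mode("overwrite").save(path)

    read_back = spark.read.format("hudi").load(path)
    after = count_sub_millisecond(
        r["ts"] for r in read_back.select("ts").collect()
    )
    return before, after
```

If `before` is greater than zero and `after` is zero, the sub-millisecond part was lost somewhere between the write and the read, which matches the behavior described here.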

Darvi
Slack thread: https://apache-hudi.slack.com/archives/C4D716NPQ/p1652347742173779
 

Labels

from-jira, priority:critical (Production degraded; pipelines stalled), type:devtask (Development tasks and maintenance work)
