[HUDI-3402] Set hoodie.parquet.outputtimestamptype to TIMESTAMP_MICROS by default #4749
nsivabalan
left a comment
@YannByron: thanks for the fix. But let's be cautious about any breaking change. Think about a user who has been using Hudi for the past two or three releases (even just for upserts): when they upgrade to 0.11, wouldn't their updates be inadvertently treated as inserts? I assume a timestamp-based partition path would be impacted by this.
Can you help clarify, please?
@nsivabalan
Which config is being set to true/false in the above statement?
@codope
Thanks for the clarification.
Can we file a JIRA, please, while we try to reach consensus?
Thanks, Sagar, for bringing up a good point. Just playing devil's advocate: Spark in general has many incompatibility issues even after becoming very mature. So, given the benefit we get (the same behavior across bulk insert and other operations), I think we can go ahead with this change. Again, my assumption is that non-row-writer operations already write timestamp-type fields with microsecond precision. Even today, users have some inconsistencies here that they are living with.
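To make the "same behavior across bulk insert and other operations" point concrete, here is a minimal plain-Python sketch (not Hudi code) of what goes wrong when one write path stores timestamps as microseconds since the epoch and another stores them at a coarser unit. It assumes, as an illustration only, that the other side of the mismatch is millisecond precision; the helper names (`to_epoch_micros`, `to_epoch_millis`, and so on) are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as an aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(ts: datetime) -> int:
    """Encode an aware datetime as microseconds since the epoch (TIMESTAMP_MICROS-style)."""
    delta = ts - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def to_epoch_millis(ts: datetime) -> int:
    """Encode as milliseconds since the epoch (assumed TIMESTAMP_MILLIS-style); sub-millisecond digits are dropped."""
    return to_epoch_micros(ts) // 1_000


def from_epoch_micros(value: int) -> datetime:
    """Decode a value that the reader believes is in microseconds."""
    return EPOCH + timedelta(microseconds=value)


def from_epoch_millis(value: int) -> datetime:
    """Decode a value that the reader believes is in milliseconds."""
    return EPOCH + timedelta(milliseconds=value)


ts = datetime(2022, 2, 3, 10, 15, 30, 123456, tzinfo=timezone.utc)
micros = to_epoch_micros(ts)
millis = to_epoch_millis(ts)
print(from_epoch_micros(micros))  # same instant as ts
print(from_epoch_millis(millis))  # sub-millisecond digits lost
print(from_epoch_micros(millis))  # millis misread as micros: a date in January 1970
```

The sketch only shows why a single, consistent timestamp unit across write paths matters; it says nothing about how Hudi itself encodes or reads these values.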
codope
left a comment
@YannByron @nsivabalan I've verified the patch. It does not affect the partition path. Should be good to land.
Cool, thanks a lot, man!
…uet.outputtimestamptype (apache#4749)
Hoodie converts `Timestamp` to the TIMESTAMP_MICROS format for upsert and other operations, except `bulk_insert`. `bulk_insert` enables `hoodie.datasource.write.row.writer.enable` and uses `HoodieRowParquetWriteSupport` to write data. As reported in issue #4552, this causes problems with the default settings. So I suggest changing the default value of `hoodie.parquet.outputtimestamptype` to TIMESTAMP_MICROS for users' convenience.
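Users who would rather not depend on the default (for example, while upgrading across releases) could pin the setting explicitly on their `bulk_insert` writes. The following is a minimal PySpark-flavored sketch, assuming a Spark session with the Hudi bundle available. The option keys are taken from the description above, plus the standard `hoodie.table.name` and `hoodie.datasource.write.operation` write options. The helper names `hudi_bulk_insert_options` and `write_with_options` are hypothetical, and other options a real table needs (record key, precombine field, and so on) are deliberately omitted.

```python
def hudi_bulk_insert_options(table_name: str, timestamp_type: str = "TIMESTAMP_MICROS") -> dict:
    """Build Hudi datasource options for a row-writer bulk_insert (hypothetical helper).

    Record key, precombine, and partitioning options are omitted for brevity;
    a real write would need them.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "bulk_insert",
        # Row writer path, which writes through HoodieRowParquetWriteSupport.
        "hoodie.datasource.write.row.writer.enable": "true",
        # Pin the Parquet timestamp type explicitly instead of relying on the default.
        "hoodie.parquet.outputtimestamptype": timestamp_type,
    }


def write_with_options(df, path: str, options: dict) -> None:
    """Append `df` (assumed to be a pyspark.sql.DataFrame) to a Hudi table at `path`."""
    df.write.format("hudi").options(**options).mode("append").save(path)
```

Passing the value explicitly makes the intent visible in job code, so the behavior does not silently change if a default differs between releases.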