
Multiple ordering fields for partial update to handle out-of-order events #15445

@hudi-bot

Description


This feature aims to improve PartialUpdatePayload so that it handles multiple sources properly.
Here is some background on why multiple ordering fields are needed.
For example, suppose we have two sources and one target table:

  • source1's fields: id, ts, name
  • source2's fields: id, ts, price
  • target table's fields: id, ts, name, price

ts is the precombine (ordering) field.

In the 1st batch, we receive one record from each source:

Source1:

| id | ts | name |
|----|----|------|
| 1 | 1 | name_1 |

Source2:

| id | ts | price |
|----|----|-------|
| 1 | 3 | price_3 |

So the record in the target table should be:

| id | ts | name | price |
|----|----|------|-------|
| 1 | 3 | name_1 | price_3 |

Now suppose that in the 2nd batch we receive one event from source1:

Source1:

| id | ts | name |
|----|----|------|
| 1 | 2 | name_2 |

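However, name_2 will not be written to the target table, because its ts value (2) is smaller than the ts value stored in the target table (3). That stored ts=3 came from source2's price update, so source1's newer name is rejected even though the stored name_1 was written at ts=1.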
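To make the problem concrete, the two batches above can be replayed against a simplified model of a partial-update merge driven by a single ordering field. This is a sketch under stated assumptions, not Hudi's PartialUpdatePayload code: records are plain dicts, None stands for a column the source does not provide, and merge_single_ordering is a hypothetical helper. The record with the larger ts wins, and its missing columns are filled from the other record.

```python
from typing import Any, Dict, Optional

# A record is modeled as a plain dict; None means "this source does not provide the column".
Record = Dict[str, Optional[Any]]


def merge_single_ordering(existing: Record, incoming: Record, ordering_field: str = "ts") -> Record:
    """Simplified partial-update merge driven by a single ordering field.

    The record with the larger ordering value wins (ties go to `incoming`).
    Each column takes the winner's value unless it is None, in which case
    the other record's value is used.
    """
    if incoming[ordering_field] >= existing[ordering_field]:
        newer, older = incoming, existing
    else:
        newer, older = existing, incoming
    # Preserve column order: existing columns first, then any new ones from incoming.
    columns = list(dict.fromkeys([*existing, *incoming]))
    return {
        col: newer.get(col) if newer.get(col) is not None else older.get(col)
        for col in columns
    }


# Batch 1: one record from each source.
source1 = {"id": 1, "ts": 1, "name": "name_1", "price": None}
source2 = {"id": 1, "ts": 3, "name": None, "price": "price_3"}
after_batch1 = merge_single_ordering(source1, source2)
print(after_batch1)  # {'id': 1, 'ts': 3, 'name': 'name_1', 'price': 'price_3'}

# Batch 2: a late event from source1 with ts=2.
late_source1 = {"id": 1, "ts": 2, "name": "name_2", "price": None}
after_batch2 = merge_single_ordering(after_batch1, late_source1)
print(after_batch2)  # {'id': 1, 'ts': 3, 'name': 'name_1', 'price': 'price_3'}
```

Because the shared ts=3 was set by source2, it also shields name from source1's later update, even though name itself was last written at ts=1.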

This feature will allow users to perform partial updates across sub-tables/sources by determining the state of each set of columns in a row from that set's own ordering/precombine column.

As such, a table can have MULTIPLE ordering fields.

This use case suits wide Hudi tables that are built from smaller sub-tables, where each sub-table has its own precombine column and its records may be upserted out of order.
!image-2022-09-20-22-46-52-907.png!
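With multiple ordering fields, each group of columns carries its own ordering value, so a late event from one source is compared only against the columns that source owns. The sketch below is a hypothetical illustration, not a proposed Hudi API: ts1, ts2, ORDERING_GROUPS and merge_multi_ordering are invented names, and the target table is assumed to store one ordering value per group.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Optional[Any]]

# Hypothetical configuration: each ordering field owns one group of columns.
# "ts1" and "ts2" are illustrative names; the original schema has a single "ts".
ORDERING_GROUPS: Dict[str, List[str]] = {
    "ts1": ["name"],   # written by source1
    "ts2": ["price"],  # written by source2
}


def merge_multi_ordering(existing: Record, incoming: Record,
                         groups: Dict[str, List[str]]) -> Record:
    """Resolve each column group independently using that group's own ordering field."""
    merged = dict(existing)
    for ordering_field, columns in groups.items():
        incoming_order = incoming.get(ordering_field)
        if incoming_order is None:
            continue  # incoming record does not carry this group at all
        existing_order = merged.get(ordering_field)
        if existing_order is None or incoming_order >= existing_order:
            merged[ordering_field] = incoming_order
            for col in columns:
                if incoming.get(col) is not None:
                    merged[col] = incoming[col]
        # Otherwise the incoming group is stale and the stored values are kept.

    # Columns outside every group (e.g. the record key) are only filled if missing.
    owned = set(groups) | {col for cols in groups.values() for col in cols}
    for col, value in incoming.items():
        if col not in owned and merged.get(col) is None:
            merged[col] = value
    return merged


source1 = {"id": 1, "ts1": 1, "name": "name_1"}
source2 = {"id": 1, "ts2": 3, "price": "price_3"}
after_batch1 = merge_multi_ordering(source1, source2, ORDERING_GROUPS)
print(after_batch1)  # {'id': 1, 'ts1': 1, 'name': 'name_1', 'ts2': 3, 'price': 'price_3'}

late_source1 = {"id": 1, "ts1": 2, "name": "name_2"}
after_batch2 = merge_multi_ordering(after_batch1, late_source1, ORDERING_GROUPS)
print(after_batch2)  # {'id': 1, 'ts1': 2, 'name': 'name_2', 'ts2': 3, 'price': 'price_3'}
```

Replaying the same two batches, the late name_2 event now wins because it is compared against ts1=1 rather than a shared ts=3. Storing one ordering value per group is the key design choice in this sketch: a single ts column cannot record that name and price were last updated at different times.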
