
Row Level TTL Support for records stored in Hudi #14770

@hudi-bot

Description


For example: keep only records that were updated within the last month.

 

GH: #2743



Comments

31/Mar/21 00:43;vbalaji;[~shivnarayan] : FYI;;;


03/Apr/21 16:28;pratyakshsharma;I guess the same can be handled by this Jira: https://issues.apache.org/jira/browse/HUDI-349 ? [~vbalaji] [~shivnarayan];;;


05/Apr/21 15:25;aditiwari;[~pratyakshsharma] I guess that with a time-based cleaning policy, we might need some modifications in the compactor as well.

Even a recently updated base file may contain some records that are older.

A time-based cleaner, combined with filtering out records with an older commit time while compacting (in MOR) or rewriting (in COW) the base file, should solve the issue.;;;
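
A minimal sketch of the filtering step suggested above, in plain Python rather than Hudi's actual compactor code. It assumes each record exposes the `_hoodie_commit_time` meta column as a `yyyyMMddHHmmss` string (longer instants with milliseconds are truncated to the first 14 digits); `parse_commit_time` and `rewrite_base_file` are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta

# Assumption: commit instants start with a 14-digit yyyyMMddHHmmss prefix.
COMMIT_TIME_FORMAT = "%Y%m%d%H%M%S"


def parse_commit_time(instant: str) -> datetime:
    """Parse a commit instant, ignoring any millisecond suffix."""
    return datetime.strptime(instant[:14], COMMIT_TIME_FORMAT)


def rewrite_base_file(records, ttl: timedelta, now: datetime):
    """Return only the records whose commit time is within the TTL.

    Stands in for the record-level filter that a compaction (MOR) or
    base-file rewrite (COW) would apply; expired records are simply
    not carried over into the new file.
    """
    cutoff = now - ttl
    return [
        r for r in records
        if parse_commit_time(r["_hoodie_commit_time"]) >= cutoff
    ]


# A "recently updated" base file can still hold an old record.
base_file = [
    {"key": "a", "_hoodie_commit_time": "20210101000000"},
    {"key": "b", "_hoodie_commit_time": "20210401120000"},
]
kept = rewrite_base_file(base_file, timedelta(days=30), datetime(2021, 4, 5))
```

Here only record `b` survives the rewrite, even though both records live in the same base file, which is why a file-level cleaner alone would not be enough.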


28/Oct/22 03:09;nicholasjiang;[~shivnarayan], IMO, every Hudi record carries its commit time. The solution is to first apply the TTL on the read path: do not return expired data when querying, or even push the filter down to the data source directly. The expired data is then deleted during operations that need to rewrite the data, such as clustering. WDYT?

cc [~xleesf] ;;;
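
The read-path half of this proposal could look roughly like the sketch below. It is not Hudi API: `ttl_cutoff_instant`, `pushdown_predicate`, and `visible_records` are hypothetical helpers, and the sketch assumes commit instants are fixed-width digit strings beginning with `yyyyMMddHHmmss`, so that plain string comparison orders them by time.

```python
from datetime import datetime, timedelta

# Assumption: instants are digit strings starting with yyyyMMddHHmmss.
INSTANT_FORMAT = "%Y%m%d%H%M%S"


def ttl_cutoff_instant(now: datetime, ttl: timedelta) -> str:
    """Oldest commit instant that is still considered live."""
    return (now - ttl).strftime(INSTANT_FORMAT)


def pushdown_predicate(now: datetime, ttl: timedelta) -> str:
    """Filter expression that a reader could push down to the source."""
    return f"_hoodie_commit_time >= '{ttl_cutoff_instant(now, ttl)}'"


def visible_records(records, now: datetime, ttl: timedelta):
    """Hide expired records at query time without touching storage."""
    cutoff = ttl_cutoff_instant(now, ttl)
    # Lexicographic comparison works because instants are zero-padded digits.
    return [r for r in records if r["_hoodie_commit_time"] >= cutoff]


now = datetime(2022, 10, 28, 3, 9, 0)
ttl = timedelta(days=30)
rows = [
    {"key": "old", "_hoodie_commit_time": "20220901000000"},
    {"key": "new", "_hoodie_commit_time": "20221020101500"},
]
predicate = pushdown_predicate(now, ttl)
live = visible_records(rows, now, ttl)
```

With this split, expired rows stop appearing in query results immediately, while the physical delete can wait for the next clustering or other rewrite.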


28/Oct/22 03:29;xleesf;[~nicholasjiang] agree with the solution;;;
