Add a new config that controls whether the input RDD should be persisted before insert. #15596

@hudi-bot

Description

When the input data comes from a complex RDD lineage, a Hudi write causes that lineage to be computed repeatedly.
For example, the writer de-duplicates the input records by key, and during the tag-location step it collects all partitions that will be written. Each of these steps re-evaluates the input. So I think we should cache the data to be written, so that downstream steps can reuse it.
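
To illustrate the recomputation described above, here is a minimal sketch in plain Python. It is a toy model, not Spark or Hudi code: `LazyDataset`, `write`, `persist_input`, and `expensive_lineage` are hypothetical names invented for this example, and the record shape is assumed. It only shows that two actions on an uncached lazy dataset run the lineage twice, while persisting first runs it once.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Assumed record shape for illustration: (record key, partition path, value).
Record = Tuple[str, str, int]


class LazyDataset:
    """Toy stand-in for an RDD: every action recomputes the lineage unless persisted."""

    def __init__(self, compute: Callable[[], Iterable[Record]]) -> None:
        self._compute = compute
        self._persisted = False
        self._cached: Optional[List[Record]] = None
        self.evaluations = 0  # number of times the lineage actually ran

    def persist(self) -> "LazyDataset":
        # Only marks the dataset for caching; nothing runs until the first action.
        self._persisted = True
        return self

    def collect(self) -> List[Record]:
        # Serve from the cache when a previous action already materialized the data.
        if self._cached is not None:
            return self._cached
        self.evaluations += 1
        rows = list(self._compute())
        if self._persisted:
            self._cached = rows
        return rows


def expensive_lineage() -> List[Record]:
    # Placeholder for a long chain of upstream transformations.
    return [
        ("k1", "2024/01/01", 1),
        ("k1", "2024/01/01", 2),
        ("k2", "2024/01/02", 3),
    ]


def write(dataset: LazyDataset, persist_input: bool) -> Dict[str, list]:
    """Hypothetical write path with a flag that persists the input first."""
    if persist_input:
        dataset.persist()
    # Action 1: de-duplicate by record key (the last record per key wins).
    deduped = {key: (key, part, val) for key, part, val in dataset.collect()}
    # Action 2: find every partition that will be written.
    partitions = sorted({part for _, part, _ in dataset.collect()})
    return {"records": sorted(deduped.values()), "partitions": partitions}


without_cache = LazyDataset(expensive_lineage)
write(without_cache, persist_input=False)

with_cache = LazyDataset(expensive_lineage)
result = write(with_cache, persist_input=True)

print(without_cache.evaluations, with_cache.evaluations)  # prints "2 1"
```

In a real Spark job the trade-off is that persisting costs executor memory or disk, which is presumably why the proposal makes it a config rather than always-on behavior.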
