When our input data comes from a complex RDD lineage, writing it with Hudi causes that lineage to be recomputed several times.
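For example, the write path de-duplicates the input by record key, and the tag-location step then obtains all partitions that the data will be written to. Each of these steps runs its own Spark job over the same input, so the upstream lineage is evaluated again each time. I therefore think we should cache the data to be written so that downstream steps can reuse it.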
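To illustrate the recomputation, here is a toy Python model. It does not use the Spark or Hudi API. A lazily evaluated dataset re-runs its whole lineage on every action unless it has been cached. `LazyDataset`, `hypothetical_write`, and the sample records are all hypothetical stand-ins for the input RDD, the de-duplicate and tag-location steps, and the incoming data. In Spark itself, the corresponding change would be to call `persist()` (or `cache()`) on the input RDD before the write and `unpersist()` once the write finishes.

```python
from typing import Callable, List, Optional, Tuple

Record = Tuple[str, str, int]  # (record key, partition path, value); hypothetical schema


class LazyDataset:
    """Toy stand-in for a Spark RDD (NOT the real API): every action re-runs
    the upstream lineage unless cache() was called first."""

    def __init__(self, compute: Callable[[], List[Record]]):
        self._compute = compute
        self._should_cache = False
        self._cached: Optional[List[Record]] = None

    def cache(self) -> "LazyDataset":
        # Only marks the dataset; the first action materializes and stores it.
        self._should_cache = True
        return self

    def collect(self) -> List[Record]:
        if self._cached is not None:
            return self._cached          # reuse materialized records
        records = self._compute()        # re-run the (expensive) lineage
        if self._should_cache:
            self._cached = records
        return records


def hypothetical_write(input_ds: LazyDataset):
    # Step 1 (first action): de-duplicate by record key, keeping the last value.
    deduped = {}
    for key, partition, value in input_ds.collect():
        deduped[key] = (key, partition, value)
    # Step 2 (second action): "tag location" needs every partition path in the input.
    partitions = sorted({partition for _, partition, _ in input_ds.collect()})
    return list(deduped.values()), partitions


def make_source(counter: dict) -> Callable[[], List[Record]]:
    def expensive_lineage() -> List[Record]:
        counter["runs"] += 1  # count how often the upstream lineage executes
        return [("k1", "2020/01/01", 1), ("k2", "2020/01/02", 2), ("k1", "2020/01/01", 3)]
    return expensive_lineage


uncached_runs = {"runs": 0}
hypothetical_write(LazyDataset(make_source(uncached_runs)))

cached_runs = {"runs": 0}
hypothetical_write(LazyDataset(make_source(cached_runs)).cache())

print(uncached_runs["runs"], cached_runs["runs"])  # prints "2 1"
```

Without caching, each of the two steps triggers a full evaluation of the lineage. With caching, the lineage runs once and the second step reads the stored records. The trade-off is the memory (or disk) needed to hold the cached input, which is why it should be released after the write.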