Performance improvements for streaming DAG write with secondary index #17438

@hudi-bot

Description

A few follow-up improvements to HUDI-9340, mostly performance-related.

  1. When fetching the secondary key from a file group, we can project only the secondary key column instead of reading the entire record.
  2. In HoodieAppendHandle, we can avoid reading the file slice twice to compute the secondary index changes. Instead, we can merge the new records already available in the handle with the previous file slice and derive the secondary index changes from that.
  3. We currently call toString to get the string representation of the secondary key. We need to ensure this works with all data types, such as date and timestamp.
    [https://github.com/apache/hudi/blob/e017d85d76b5a2332e96ce0b7e4b2a552f98dadc/hudi-common/src/main/java/org/apache/hudi/metadata/SecondaryIndexRecordGenerationUtils.java#L259]
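To illustrate item 2, here is a minimal sketch of how secondary index changes could be derived in a single pass, assuming the previous file slice has already been reduced to a map from record key to secondary key and the handle exposes the same mapping for its new records. The class, method, and parameter names are hypothetical and are not Hudi APIs; a null secondary key in the incoming map is assumed to stand for a deleted record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, not Hudi code: plain maps stand in for the previous
// file slice and for the new records held by the append handle.
final class SecondaryIndexDiffSketch {

    /** One secondary index change for a (secondaryKey, recordKey) pair. */
    static final class IndexChange {
        final String secondaryKey;
        final String recordKey;
        final boolean isDelete;

        IndexChange(String secondaryKey, String recordKey, boolean isDelete) {
            this.secondaryKey = secondaryKey;
            this.recordKey = recordKey;
            this.isDelete = isDelete;
        }

        @Override
        public String toString() {
            return (isDelete ? "DELETE " : "INSERT ") + secondaryKey + " -> " + recordKey;
        }
    }

    /**
     * Compares the incoming records against the previous state and returns the
     * index entries to remove and to add. Unchanged secondary keys yield nothing.
     *
     * @param previousSecondaryKeys recordKey -> secondaryKey from the previous file slice
     * @param incomingSecondaryKeys recordKey -> secondaryKey for new records; null value = delete (assumption)
     */
    static List<IndexChange> computeChanges(Map<String, String> previousSecondaryKeys,
                                            Map<String, String> incomingSecondaryKeys) {
        List<IndexChange> changes = new ArrayList<>();
        for (Map.Entry<String, String> entry : incomingSecondaryKeys.entrySet()) {
            String recordKey = entry.getKey();
            String newKey = entry.getValue();
            String oldKey = previousSecondaryKeys.get(recordKey);
            if (Objects.equals(oldKey, newKey)) {
                continue; // secondary key unchanged (or record absent on both sides)
            }
            if (oldKey != null) {
                changes.add(new IndexChange(oldKey, recordKey, true));  // drop stale mapping
            }
            if (newKey != null) {
                changes.add(new IndexChange(newKey, recordKey, false)); // add new mapping
            }
        }
        return changes;
    }
}
```

Because only the incoming records are iterated, the previous file slice needs to be read once to build the lookup map, which is the saving item 2 is after. How the real handle represents deletes and record ordering may differ from this sketch.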

JIRA info

Metadata

Assignees

Labels

from-jira
priority:critical (Production degraded; pipelines stalled)
type:devtask (Development tasks and maintenance work)
