Performance improvements for streaming DAG write with secondary index

A couple of performance improvements on top of HUDI-9340:
1. When fetching the secondary key from a file group, we can project only the secondary key column instead of reading the entire record.
2. In HoodieAppendHandle, we can avoid reading the file slice twice to compute the secondary index changes. Instead, we can take the new records already available in the handle and merge them with the previous file slice to compute the changes.
3. We currently use toString to get the string representation of the secondary key. We need to ensure this works with all data types, such as date and timestamp.
[https://github.com/apache/hudi/blob/e017d85d76b5a2332e96ce0b7e4b2a552f98dadc/hudi-common/src/main/java/org/apache/hudi/metadata/SecondaryIndexRecordGenerationUtils.java#L259]
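
As a rough illustration of item 2, the sketch below computes secondary index changes from two in-memory maps: one built from a single read of the previous file slice (only the record key and secondary key, as in item 1) and one built from the records already held by the handle. This is a minimal sketch under stated assumptions, not Hudi's implementation: `SecondaryIndexDiffSketch`, `Change`, and `computeChanges` are hypothetical names, and using a `null` secondary key to mark a deleted record is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch; none of these names are Hudi APIs. */
class SecondaryIndexDiffSketch {

  /** One secondary index change: a (secondaryKey, recordKey) mapping to add or remove. */
  static final class Change {
    final String secondaryKey;
    final String recordKey;
    final boolean isDelete;

    Change(String secondaryKey, String recordKey, boolean isDelete) {
      this.secondaryKey = secondaryKey;
      this.recordKey = recordKey;
      this.isDelete = isDelete;
    }

    @Override
    public String toString() {
      return (isDelete ? "DELETE " : "INSERT ") + secondaryKey + " -> " + recordKey;
    }
  }

  /**
   * @param previous recordKey -> secondaryKey, read once from the previous file slice
   * @param incoming recordKey -> secondaryKey for records in the handle;
   *                 a null value marks a deleted record (assumption for this sketch)
   */
  static List<Change> computeChanges(Map<String, String> previous,
                                     Map<String, String> incoming) {
    List<Change> changes = new ArrayList<>();
    for (Map.Entry<String, String> entry : incoming.entrySet()) {
      String recordKey = entry.getKey();
      String newKey = entry.getValue();
      String oldKey = previous.get(recordKey);
      if (Objects.equals(oldKey, newKey)) {
        continue; // secondary key unchanged, so the index needs no update
      }
      if (oldKey != null) {
        changes.add(new Change(oldKey, recordKey, true)); // drop the stale mapping
      }
      if (newKey != null) {
        changes.add(new Change(newKey, recordKey, false)); // add the new mapping
      }
    }
    return changes;
  }

  public static void main(String[] args) {
    Map<String, String> previous = new HashMap<>();
    previous.put("r1", "a");
    previous.put("r2", "b");

    Map<String, String> incoming = new HashMap<>();
    incoming.put("r1", "a");  // unchanged
    incoming.put("r2", "z");  // secondary key updated
    incoming.put("r3", "c");  // new record

    computeChanges(previous, incoming).forEach(System.out::println);
  }
}
```

Only records present in the handle are visited, so the cost of the comparison is proportional to the size of the incoming batch, and the previous file slice is read a single time to build the `previous` map.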

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-9546
- Type: Sub-task
- Parent: https://issues.apache.org/jira/browse/HUDI-9616
- Fix version(s):
  - 1.1.0
