[HUDI-7045] fix evolution by using legacy ff for reader#10007

Closed
jonvex wants to merge 1 commit intoapache:masterfrom
jonvex:fix_schema_evolution_new_reader

Contributor

@jonvex jonvex commented Nov 7, 2023

Change Logs

Part of schema-on-write schema evolution happens inside the reader, so we need to use the legacy file format.

Impact

Schema evolution works with the new reader and file format (ff).

Risk level (write none, low, medium or high below)

low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

Collaborator

hudi-bot commented Nov 8, 2023

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

supportBatchResult
}

private def wrapWithBatchConverter(reader: PartitionedFile => Iterator[InternalRow]): PartitionedFile => Iterator[InternalRow] = {
Member

@codope codope Nov 8, 2023

Why is this needed? I think the flatMap per row could incur significant cost for a large batch. Instead of wrapping every time, can it be guarded so that it applies only in some cases, such as when schema on read is enabled?

Contributor

Right. @jonvex I think Spark internally handles the batch processing (InternalRow vs ColumnarBatch) based on the boolean that supportBatch returns. So we don't need the batch converter here?
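
For readers following the thread, here is a minimal sketch of the guard suggested above. It is not the PR's code: the body of wrapWithBatchConverter is not shown in this excerpt, so Row, Batch, PartitionedFile (a stand-in, not Spark's class), the Reader alias, maybeWrap and the needsRowConversion flag are all hypothetical names used only to illustrate the idea of flattening batches into rows when a condition requires it, and otherwise returning the reader untouched.

```scala
// Stand-in types for illustration only; these are NOT Spark's InternalRow,
// ColumnarBatch or PartitionedFile classes.
final case class Row(values: Seq[Any])
final case class Batch(rows: Seq[Row])
final case class PartitionedFile(path: String)

object BatchConverterSketch {
  // A reader may yield rows, batches, or a mix of both, so elements are typed as Any.
  type Reader = PartitionedFile => Iterator[Any]

  // Always-on wrapper: every batch is flattened into its rows (the per-row flatMap
  // cost raised in the review); non-batch elements pass through unchanged.
  def wrapWithBatchConverter(reader: Reader): Reader =
    file => reader(file).flatMap {
      case b: Batch => b.rows.iterator
      case other    => Iterator.single(other)
    }

  // Guarded variant: wrap only when the caller says row conversion is required
  // (hypothetical flag, e.g. derived from a schema-on-read setting).
  def maybeWrap(reader: Reader, needsRowConversion: Boolean): Reader =
    if (needsRowConversion) wrapWithBatchConverter(reader) else reader
}
```

With the flag off, the original reader is returned as-is, so batches flow through without any per-row work; whether that is safe in the real code path depends on what supportBatch reports to Spark.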

@yihua yihua added the priority:blocker (Production down; release blocker), release-1.0.0, and priority:critical (Production degraded; pipelines stalled) labels and removed the priority:blocker label Nov 8, 2023
@github-actions github-actions bot added the size:S (PR with lines of changes in (10, 100]) label Feb 26, 2024
@jonvex jonvex closed this Mar 8, 2024

Labels

priority:critical (Production degraded; pipelines stalled), release-1.0.0, size:S (PR with lines of changes in (10, 100])

4 participants