
[HUDI-4942] Fix RowSource schema provider #6817

Closed
codope wants to merge 2 commits into apache:master from codope:fix-row-source-schema-provider



codope (Member) commented Sep 28, 2022

Change Logs

Default values supplied by the schema provider are lost because RowSource sets a RowBasedSchemaProvider on the InputBatch. This PR fixes this by passing the user-specified schema provider instead.
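The problem described above can be sketched as a toy model. This is a minimal, hypothetical illustration, not Hudi code: Schema, UserSchemaProvider, RowDerivedSchemaProvider, fetchBefore and fetchAfter are illustrative names, and a "schema" is reduced to a map from field name to default value. A provider derived only from the row sees field names but not defaults, so a default survives only if the user-specified provider is passed through.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the RowSource schema-provider issue.
// None of these types are Hudi's real classes; they only mirror the idea.
public class RowSourceSchemaSketch {

  /** Toy "schema": field name -> default value (null means no default). */
  record Schema(Map<String, Object> fieldDefaults) {}

  interface SchemaProvider {
    Schema getSourceSchema();
  }

  /** Stands in for a user-configured provider (e.g. a schema registry) that knows defaults. */
  record UserSchemaProvider(Schema schema) implements SchemaProvider {
    public Schema getSourceSchema() { return schema; }
  }

  /** Stands in for a row-based provider: it only sees field names, so every default is null. */
  record RowDerivedSchemaProvider(Set<String> rowFields) implements SchemaProvider {
    public Schema getSourceSchema() {
      Map<String, Object> noDefaults = new LinkedHashMap<>();
      for (String field : rowFields) {
        noDefaults.put(field, null);
      }
      return new Schema(noDefaults);
    }
  }

  record InputBatch(Object rows, String checkpoint, SchemaProvider schemaProvider) {}

  /** Before the fix: the user provider is ignored and a row-derived one is attached. */
  static InputBatch fetchBefore(Set<String> rowFields, SchemaProvider userProvider) {
    return new InputBatch("rows", "ckpt-1", new RowDerivedSchemaProvider(rowFields));
  }

  /** After the fix: prefer the user-specified provider when one is configured. */
  static InputBatch fetchAfter(Set<String> rowFields, SchemaProvider userProvider) {
    SchemaProvider provider =
        userProvider != null ? userProvider : new RowDerivedSchemaProvider(rowFields);
    return new InputBatch("rows", "ckpt-1", provider);
  }

  public static void main(String[] args) {
    Map<String, Object> defaults = new LinkedHashMap<>();
    defaults.put("id", null);
    defaults.put("status", "ACTIVE");
    SchemaProvider user = new UserSchemaProvider(new Schema(defaults));
    Set<String> rowFields = Set.of("id", "status");

    Object before = fetchBefore(rowFields, user).schemaProvider().getSourceSchema().fieldDefaults().get("status");
    Object after = fetchAfter(rowFields, user).schemaProvider().getSourceSchema().fieldDefaults().get("status");
    System.out.println("before fix: " + before); // before fix: null
    System.out.println("after fix: " + after);   // after fix: ACTIVE
  }
}
```

In this sketch the row-derived provider is kept only as a fallback for when no user provider is configured.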

Impact

Describe any public API or user-facing feature change or any performance impact.

Risk level: none | low | medium | high

Choose one. If medium or high, explain what verification was done to mitigate the risks.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default values of existing configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here, and follow the instructions to make changes to the website.

Contributor's checklist

  • Read through the contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

codope (Member Author) commented Sep 28, 2022

@nsivabalan Can you please review this? I have yet to add a unit test, but I have tested this with my local Confluent Schema Registry setup. The main issue is that if a schema provider is overridden, RowSource does not take it into account; it simply derives the schema from the Row.

nsivabalan (Contributor) left a comment:


Is it possible to write tests?

xushiyan assigned xushiyan and unassigned nsivabalan and codope Sep 29, 2022
Comment on lines +44 to +46
if (overriddenSchemaProvider != null) {
  return new InputBatch<>(res.getKey(), res.getValue(), overriddenSchemaProvider);
}
Member


org.apache.hudi.utilities.sources.Source#fetchNext already checks for and uses overriddenSchemaProvider, and fetchNewData() is only called from fetchNext(). I think some misconfiguration caused the issue.
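The reviewer's point is that the override is applied in the base class, after the subclass returns its batch. A minimal sketch of that template-method flow follows, assuming a simplified model: the class names, the ToyRowSource subclass, and the use of java.util.Optional (standing in for Hudi's own Option type) are illustrative assumptions, not Hudi's actual implementation.

```java
import java.util.Optional;

// Hypothetical, simplified model of the flow described in the review comment:
// the base class's fetchNext() calls the subclass's fetchNewData() and then swaps
// in the overridden schema provider, if one was configured. Not Hudi's real code.
public class FetchNextSketch {

  interface SchemaProvider { String name(); }

  record NamedProvider(String name) implements SchemaProvider {}

  record InputBatch(String rows, String checkpoint, SchemaProvider schemaProvider) {}

  abstract static class Source {
    private final SchemaProvider overriddenSchemaProvider; // null when nothing is configured

    Source(SchemaProvider overriddenSchemaProvider) {
      this.overriddenSchemaProvider = overriddenSchemaProvider;
    }

    /** Implemented by each concrete source; may attach its own (e.g. row-derived) provider. */
    protected abstract InputBatch fetchNewData(Optional<String> lastCheckpoint, long sourceLimit);

    /** The only caller of fetchNewData(): applies the override after the subclass returns. */
    public final InputBatch fetchNext(Optional<String> lastCheckpoint, long sourceLimit) {
      InputBatch batch = fetchNewData(lastCheckpoint, sourceLimit);
      return overriddenSchemaProvider == null
          ? batch
          : new InputBatch(batch.rows(), batch.checkpoint(), overriddenSchemaProvider);
    }
  }

  /** Stands in for a RowSource that always attaches a row-derived provider. */
  static class ToyRowSource extends Source {
    ToyRowSource(SchemaProvider override) { super(override); }

    @Override
    protected InputBatch fetchNewData(Optional<String> lastCheckpoint, long sourceLimit) {
      return new InputBatch("rows", "ckpt-2", new NamedProvider("row-derived"));
    }
  }

  public static void main(String[] args) {
    Source withOverride = new ToyRowSource(new NamedProvider("user-registry"));
    Source withoutOverride = new ToyRowSource(null);
    System.out.println(withOverride.fetchNext(Optional.empty(), 100).schemaProvider().name());
    // user-registry
    System.out.println(withoutOverride.fetchNext(Optional.empty(), 100).schemaProvider().name());
    // row-derived
  }
}
```

If the real fetchNext behaves like this model, an override that reaches the source already replaces the row-derived provider, so a RowSource-level change would be redundant; that would be consistent with the unit test passing without the change.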

codope (Member Author) replied:


That's a valid point. I wrote a unit test with an evolving schema, but it passes even without this change. I think we should hold off on landing this PR; let me investigate further.

codope added the priority:critical label (Production degraded; pipelines stalled) and removed the priority:blocker label (Production down; release blocker) Sep 29, 2022
hudi-bot (Collaborator)

CI report:


nsivabalan (Contributor)

@codope: What's the status of this PR? Do we still need it? If not, do we still need to root-cause the original issue?

codope (Member Author) commented Nov 2, 2022

Closing this PR. We still need to root-cause the issue; something more is going on here.


Labels

area:ingest (Ingestion into Hudi), priority:critical (Production degraded; pipelines stalled)

Projects

Status: 🚧 Needs Repro
Archived in project


4 participants