[HUDI-5309] Support for Spark to automatically enable schema evolution when reading the Hoodie table#7301

Closed
fuyun2024 wants to merge 1 commit into apache:master from fuyun2024:automatically-read-schema-evolution

@fuyun2024
Contributor

@fuyun2024 fuyun2024 commented Nov 24, 2022

Support for Spark to automatically enable schema evolution when reading the Hoodie table.

Change Logs

Check if table has internal schema to enable schema evolution.
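
A minimal sketch of that check, not the PR's actual code. `InternalSchema`, `Lookup`, `resolve`, and `shouldEvolve` are hypothetical stand-ins; the real change wraps `schemaResolver.getTableInternalSchemaFromCommitMetadata` in a `Try`. Whether Hudi signals a missing internal schema with an empty result or with an exception is an assumption here, so the sketch treats both the same way.

```scala
import scala.util.Try

// Hypothetical stand-in for Hudi's internal schema type.
final case class InternalSchema(versionId: Long)

object AutoDetectSketch {
  // Hypothetical lookup: yields the table's internal schema if one was ever
  // recorded in commit metadata. It may throw, e.g. on unreadable metadata.
  type Lookup = () => Option[InternalSchema]

  // A failed lookup and an empty result both collapse to None.
  def resolve(lookup: Lookup): Option[InternalSchema] =
    Try(lookup()).toOption.flatten

  // Schema evolution is considered enabled exactly when a schema was found.
  def shouldEvolve(lookup: Lookup): Boolean = resolve(lookup).isDefined
}
```

With this shape, the reader never consults a config flag: the presence of an internal schema alone decides whether evolution is applied.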

Impact

none

Risk level (write none, low medium or high below)

none

Documentation Update

No doc or configuration changed.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@fuyun2024 fuyun2024 changed the title Support for Spark to automatically enable schema evolution when reading the Hoodie table [minor] Support for Spark to automatically enable schema evolution when reading the Hoodie table Nov 24, 2022
@fuyun2024 fuyun2024 changed the title [minor] Support for Spark to automatically enable schema evolution when reading the Hoodie table [MINOR] Support for Spark to automatically enable schema evolution when reading the Hoodie table Nov 24, 2022

@fuyun2024
Contributor Author

@alexeykudinkin Do you have time to help review this PR?

None
}
val internalSchemaOpt = Try {
specifiedQueryTimestamp.map(schemaResolver.getTableInternalSchemaFromCommitMetadata)
Member

@fuyun2024 Can you please create a JIRA for this change? This does not look like a minor change.

Here, commit metadata will always be scanned. Previously, it would be scanned only when schema evolution was enabled.
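
To make the cost difference concrete, here is a rough sketch under stated assumptions. `CountingResolver`, `guarded`, and `unconditional` are hypothetical names, not Hudi APIs; the counter only stands in for the commit metadata scan.

```scala
// Hypothetical resolver that counts how often commit metadata is scanned.
final class CountingResolver {
  var scans: Int = 0

  def internalSchemaFromCommitMetadata(): Option[String] = {
    scans += 1
    None // assume this table never used schema evolution
  }
}

object ScanCostSketch {
  // Previous behavior: the scan runs only when the flag is enabled.
  def guarded(enabled: Boolean, r: CountingResolver): Option[String] =
    if (enabled) r.internalSchemaFromCommitMetadata() else None

  // Behavior in this PR: the scan runs on every read, flag or no flag.
  def unconditional(r: CountingResolver): Option[String] =
    r.internalSchemaFromCommitMetadata()
}
```

A table that never used schema evolution pays for zero scans on the guarded path with the flag off, and one scan per read on the unconditional path.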

Contributor

@fuyun2024 unfortunately we can't do that -- @nsivabalan is currently working on the opposite change, which will make sure that Schema Evolution is guarded behind the flag so that we can force it disabled if need be (even though disabling it might not be safe for some use cases, there are cases where it is still necessary)

@codope codope added the area:schema (Schema evolution and data types) and priority:high (Significant impact; potential bugs) labels Nov 29, 2022
@fuyun2024 fuyun2024 changed the title [MINOR] Support for Spark to automatically enable schema evolution when reading the Hoodie table [HUDI-5309] Support for Spark to automatically enable schema evolution when reading the Hoodie table Dec 1, 2022
@fuyun2024 fuyun2024 requested a review from codope December 1, 2022 12:49
@nsivabalan
Contributor

@YannByron @xushiyan : is this safe to flip? can either of you review this please.

@YannByron
Contributor

> @YannByron @xushiyan : is this safe to flip? can either of you review this please.

I agree with enabling schema evolution without any config. Looks good.

Contributor

@alexeykudinkin alexeykudinkin left a comment

@nsivabalan to chime in, since you have a PR doing the opposite

@fuyun2024
Contributor Author

Thanks for your time, I will close this PR.
