[HUDI-5296] Allow disable schema on read after enabling #7421

Merged

Conversation

nsivabalan
Contributor

Change Logs

If someone enabled schema on read by mistake and never actually renamed or dropped a column, it should be possible to disable schema on read again. This patch makes that possible: on both the read and write paths, if the "hoodie.schema.on.read.enable" config is not set, Hudi falls back to the regular code path. Reads might fail, or users might miss data, if they have already performed irrevocable changes such as renames. In all other cases, this should work.
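
The fallback described above can be pictured with a minimal sketch. Only the config key comes from this PR; the object, the `SchemaPath` types, and the `tableHasEvolutionHistory` parameter are hypothetical names made up for illustration and are not Hudi's actual classes.

```scala
// Illustrative sketch only: apart from the config key, none of these names exist in Hudi.
object SchemaOnReadFlagSketch {
  // The config key quoted in the PR description.
  val SchemaOnReadKey = "hoodie.schema.on.read.enable"

  sealed trait SchemaPath
  case object RegularPath extends SchemaPath
  case object SchemaOnReadPath extends SchemaPath

  // True only when the key is present and set to "true"; an unset key counts as disabled.
  def isSchemaOnReadEnabled(props: Map[String, String]): Boolean =
    props.get(SchemaOnReadKey).exists(_.trim.equalsIgnoreCase("true"))

  // The path is chosen from the flag alone. Whether the table already carries
  // schema-on-read history (hypothetical parameter) is deliberately ignored,
  // which is what lets a user switch the feature back off.
  def choosePath(props: Map[String, String], tableHasEvolutionHistory: Boolean): SchemaPath =
    if (isSchemaOnReadEnabled(props)) SchemaOnReadPath else RegularPath
}
```

Under these assumptions, a table that once had the feature enabled still takes the regular path as soon as the key is unset or set to false, which is also why earlier renames can surface as failures or missing data.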

Impact

Users can now disable the schema on read feature if need be.

Testing:
Manually verified that we are able to read a table that had schema on read enabled. After disabling it, upserts succeed.

Risk level (write none, low, medium or high below)

low.

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@nsivabalan
Contributor Author

@alexeykudinkin: you can review the patch. For some reason I could not update the older patch #7333, so I have created a new one.

@nsivabalan added the priority:blocker and release-0.12.2 (Patches targetted for 0.12.2) labels Dec 9, 2022
Contributor

@alexeykudinkin left a comment

Overall LGTM, minor comments.

@nsivabalan did we check that the other places handling Schema Evolution are now controlled by the feature flag as well?

@nsivabalan
Contributor Author

@alexeykudinkin: addressed all comments.

@hudi-bot

CI report:

Bot commands
@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

} else {
None
} catch {
case _: Exception => None

I do vaguely remember that there was a lot of noise coming from TableSchemaResolver, but I can't recollect exactly what it was. Do you have context on what those errors are?

In general, swallowing exceptions in a blanket fashion without a log like this is not a good idea (it's fine for a particular exception that is known not to be an issue, but not for all exceptions).
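
One way to act on this suggestion, as a rough sketch rather than the change made in this PR: stay quiet only for the one exception known to be harmless, and log everything else before falling back to `None`. The `SchemaNotFoundException` class, the `loadSchemaOpt` helper, and the `load` callback are hypothetical stand-ins, not Hudi APIs.

```scala
import java.util.logging.{Level, Logger}
import scala.util.control.NonFatal

// Illustrative sketch only: these names are placeholders, not Hudi classes.
object SchemaLookupSketch {
  private val log = Logger.getLogger("SchemaLookupSketch")

  // Hypothetical stand-in for an expected, harmless failure,
  // for example a table that has no schema recorded yet.
  final class SchemaNotFoundException(msg: String) extends RuntimeException(msg)

  // `load` is a placeholder for whatever call resolves the schema.
  def loadSchemaOpt(load: () => String): Option[String] =
    try {
      Some(load())
    } catch {
      case _: SchemaNotFoundException =>
        // Expected case: safe to ignore without logging.
        None
      case NonFatal(e) =>
        // Unexpected case: still fall back, but leave a trace for debugging.
        log.log(Level.WARNING, "Failed to resolve schema; falling back to None", e)
        None
    }
}
```

Matching on `NonFatal` instead of `Exception` also keeps fatal JVM errors such as `OutOfMemoryError` from being swallowed.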

No need to address this here; we can take it as a follow-up.

@alexeykudinkin merged commit aacfe6d into apache:master Dec 12, 2022
nsivabalan added a commit that referenced this pull request Dec 14, 2022
alexeykudinkin pushed a commit to onehouseinc/hudi that referenced this pull request Dec 14, 2022
alexeykudinkin pushed a commit that referenced this pull request Dec 14, 2022
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
Labels
priority:blocker release-0.12.2 Patches targetted for 0.12.2