Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-6947] Refactored HoodieSchemaUtils.deduceWriterSchema with many flags #10810

Conversation

geserdugarov
Copy link
Contributor

@geserdugarov geserdugarov commented Mar 4, 2024

Change Logs

Current implementation of HoodieSchemaUtils.deduceWriterSchema is really complicated with many flags used in one place.
To figure out how this method is working, we need to draw directed acyclic graph. Here the current implementation:
before_refactoring

Yellow is used for marking of flags, and purple is used for marking of returned schemas or exceptions.

We couldn't remove any of the used flags without changing processing logic. The only possibility is to refactor deduceWriterSchema() for simplification, and preparing for future removing of deprecated hoodie.datasource.write.reconcile.schema.

This MR proposes to consider the following changed processing schema:
after_refactoring

Impact

Refactored HoodieSchemaUtils.deduceWriterSchema.

Risk level (write none, low medium or high below)

Low.

Documentation Update

No need.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@geserdugarov geserdugarov marked this pull request as ready for review March 4, 2024 11:54
@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Mar 4, 2024
@geserdugarov geserdugarov reopened this Mar 5, 2024
@geserdugarov geserdugarov force-pushed the HUDI-6947-refactored-hoodieschemautils-flags branch from d0f3399 to 137b32a Compare March 5, 2024 08:34
@danny0405
Copy link
Contributor

Hmm, good code quality is what we always in persuit with.

@geserdugarov
Copy link
Contributor Author

geserdugarov commented Mar 6, 2024

@jonvex , hi!
After your last commit I need to update this MR. Updated schema:
after_refactoring_rebased

All conflicts are resolved and forced pushed.

@geserdugarov geserdugarov force-pushed the HUDI-6947-refactored-hoodieschemautils-flags branch from 137b32a to 6e22785 Compare March 6, 2024 12:55
@geserdugarov
Copy link
Contributor Author

Please, look at diffs in a split regime. The main refactoring is in the deduceWriterSchemaWithoutReconcile() method, simplified to one if-else:
https://github.com/geserdugarov/hudi-open-source/blob/6e2278541811793cc208b4da940444103245da36/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSchemaUtils.scala#L116C1-L151C4

@hudi-bot
Copy link

hudi-bot commented Mar 6, 2024

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@jonvex
Copy link
Contributor

jonvex commented Mar 6, 2024

@geserdugarov thanks for the contribution! I will review this today

@jonvex jonvex requested review from jonvex and yihua March 6, 2024 15:57
Copy link
Contributor

@jonvex jonvex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. @yihua is going to review as well

Copy link
Contributor

@yihua yihua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skimmed the changes and LGTM

@yihua yihua merged commit 2e39bfb into apache:master Mar 7, 2024
40 checks passed
@geserdugarov geserdugarov deleted the HUDI-6947-refactored-hoodieschemautils-flags branch March 7, 2024 06:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:M PR with lines of changes in (100, 300]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants