[HUDI-6863] Revert auto-tuning of dedup parallelism #9722
Let's revisit the problems #6802 was tackling. The main issue it addressed was making our shuffle parallelism dynamic and relative to the number of partitions of the incoming DataFrame. That way, someone running thousands of pipelines doesn't need to statically set the right shuffle parallelism for each of them. Can you help me understand what issue we are hitting that warrants reverting it?
This PR does not revert the dynamic determination of the shuffle parallelism. The decided target shuffle parallelism is passed in with "…
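The precedence question in this thread can be illustrated with a minimal sketch. It assumes that an explicitly configured hoodie.upsert.shuffle.parallelism should win, and that only when it is unset does the parallelism follow the incoming data's partition count. The class ShuffleParallelism, its resolve method, and the convention that a non-positive configured value means "unset" are hypothetical illustrations, not Hudi's actual API.

```java
// Hypothetical helper illustrating parallelism precedence; not part of Hudi.
final class ShuffleParallelism {
    private ShuffleParallelism() {}

    /**
     * Resolves the shuffle parallelism for a dedup/upsert stage.
     *
     * @param configuredParallelism user-set value of hoodie.upsert.shuffle.parallelism;
     *                              assumed here that a value <= 0 means "not set"
     * @param inputPartitions       number of partitions of the incoming DataFrame/RDD
     * @return the parallelism to use for the shuffle
     */
    static int resolve(int configuredParallelism, int inputPartitions) {
        if (configuredParallelism > 0) {
            // An explicit user setting takes precedence over any auto-tuning.
            return configuredParallelism;
        }
        // Otherwise follow the input's partitioning, never going below 1.
        return Math.max(1, inputPartitions);
    }
}
```

For example, resolve(200, 1500) returns 200 and resolve(0, 1500) returns 1500. Under this assumed rule, large fleets of pipelines can leave the setting unset and still get input-relative parallelism, while an individual pipeline can still pin a value.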
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala
1 minor comment. The source code changes look good.
Change Logs
Before this PR, the auto-tuning logic for dedup parallelism dictated the write parallelism, so the user-configured hoodie.upsert.shuffle.parallelism was ignored. This PR reverts #6802 to fix the issue.
Impact
Performance fix
Risk level
low
Documentation Update
N/A
Contributor's checklist