[MINOR] Fix default config values if not specified in MultipleSparkJobExecutionStrategy#9625
@voonhous Please check the CI failures. |
c8d5378 to
2136b10
Compare
|
The affected tests
Since I changed the default of this config to true to align it with the config's global default, the tests started failing. I have fixed the tests by overriding the config back to false in them. I will open a separate PR to fix sorting for native row writers when performing clustering for ConsistentBucketClustering. |
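For readers following along, the test-side override described above could look roughly like the sketch below. The config key is the one from the diff in this PR; the `ClusteringTestProps` class and `withRddClustering` helper are hypothetical names used only for illustration and are not part of Hudi.

```java
import java.util.Properties;

class ClusteringTestProps {
    // Key taken from the diff in this PR.
    static final String ROW_WRITER_ENABLE = "hoodie.datasource.write.row.writer.enable";

    /**
     * Hypothetical helper: returns a copy of the base test properties with the
     * row writer explicitly disabled, so the test keeps using the RDD path.
     */
    static Properties withRddClustering(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        // Pin the value so the test does not depend on whatever the default is.
        props.setProperty(ROW_WRITER_ENABLE, "false");
        return props;
    }

    public static void main(String[] args) {
        Properties props = withRddClustering(new Properties());
        System.out.println(ROW_WRITER_ENABLE + "=" + props.getProperty(ROW_WRITER_ENABLE));
    }
}
```

Pinning the value in the test, rather than relying on the default, keeps the test stable if the default changes again.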
2136b10 to
0847642
Compare
|
Alright, added comments for future devs who are writing tests around this area. |
|
Looks like Azure CI still fails. I triggered a rerun. |
|
@voonhous There are still a lot of CI failures. Could you check them? |
|
@yihua Looked through the CI failures. They seem to be errors raised when invoking the RowWriter implementation while performing clustering. Prior to this change, the existing tests were using the RDD implementation. Because of the mismatch in configs, the RowWriter implementation was never really exercised by the tests that invoke clustering. Since this is a "[MINOR]" PR fix, I will add configs in the affected tests to ensure that they use the RDD implementation. We can create another PR to increase the coverage of the clustering writers after this. |
Sounds good. |
239190a to
51a71ae
Compare
49743c6 to
1655dcc
Compare
1655dcc to
c84d093
Compare
|
I have triggered a rerun of CI. CI has been flaky recently. |
|
@hudi-bot run azure |
|
@yihua @danny0405 @pratyakshsharma Me RN: |
|
+1. @voonhous Have you created follow-up JIRAs to fix the row writer in relevant write flows? |
Nope, no follow-up JIRAs to fix the tests yet. |
    Stream<HoodieData<WriteStatus>> writeStatusesStream = FutureUtils.allOf(
        clusteringPlan.getInputGroups().stream()
            .map(inputGroup -> {
              if (getWriteConfig().getBooleanOrDefault("hoodie.datasource.write.row.writer.enable", false)) {
@yihua: this was intentionally kept as false (the default value).
So the row writer is used for clustering only if the user has explicitly enabled it.
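To make the opt-in behaviour in this comment concrete, here is a minimal sketch of how the call-site default decides the write path when the key is unset. The `getBooleanOrDefault` below is a stand-in written for this example over a plain `Map`; it is not Hudi's implementation, and `RowWriterOptIn` and `choosePath` are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class RowWriterOptIn {
    static final String KEY = "hoodie.datasource.write.row.writer.enable";

    // Stand-in for a config lookup: an explicit value wins, otherwise the
    // default passed by the caller is used.
    static boolean getBooleanOrDefault(Map<String, String> conf, String key, boolean defaultVal) {
        String raw = conf.get(key);
        return raw == null ? defaultVal : Boolean.parseBoolean(raw);
    }

    // Hypothetical selector mirroring the branch in the diff above.
    static String choosePath(Map<String, String> conf, boolean callSiteDefault) {
        return getBooleanOrDefault(conf, KEY, callSiteDefault) ? "row-writer" : "rdd";
    }

    public static void main(String[] args) {
        Map<String, String> unset = Collections.emptyMap();
        Map<String, String> enabled = new HashMap<>();
        enabled.put(KEY, "true");

        System.out.println(choosePath(unset, false));   // key unset, call-site default false
        System.out.println(choosePath(unset, true));    // key unset, call-site default true
        System.out.println(choosePath(enabled, false)); // explicit value overrides the default
    }
}
```

With a call-site default of false, an unset key selects the RDD path, which is the opt-in behaviour described in the comment; changing only the default flips the path for every user who never set the key.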
The default values for the configs below are incorrect:

1. hoodie.datasource.write.row.writer.enable
2. hoodie.clustering.preserve.commit.metadata (getPreserveHoodieMetadata)

The default values are not loaded from `#defaultVal` as the configurations are defined in a module-scope that is inaccessible by the current scope. This is why config keys are defined as string here.

This commit fixes these inconsistencies first. Subsequent refactoring might be required to move these config-keys to a scope that is accessible by all other (relevant) modules.

**Note:** The existing test coverage does not cover clustering performed using the RowWriter API. Only RDD API is included as of now.

Co-authored-by: voon <voonhou.su@shopee.com>

Change Logs
The default values for the configs below are incorrect:
1. hoodie.datasource.write.row.writer.enable
2. hoodie.clustering.preserve.commit.metadata (getPreserveHoodieMetadata)
The default values are not loaded from `#defaultVal` because the configurations are defined in a module scope that is inaccessible from the current scope. This is why the config keys are defined as strings here.
Raising a PR to fix these inconsistencies first. Subsequent refactoring might be required to move these config keys to a scope that is accessible by all other (relevant) modules.
Note: The existing test coverage does not cover clustering performed using the RowWriter API. Only RDD API is included as of now.
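As a rough illustration of the follow-up refactoring mentioned above, the sketch below keeps a key and its default in one shared definition, so call sites cannot drift from it. `BooleanOption` and `SharedClusteringOptions` are hypothetical names invented for this example, not Hudi classes; the default of true for the row writer key is taken from the discussion in this PR.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder pairing a config key with its single declared default.
final class BooleanOption {
    final String key;
    final boolean defaultValue;

    BooleanOption(String key, boolean defaultValue) {
        this.key = key;
        this.defaultValue = defaultValue;
    }

    // Resolve against a plain map; an explicit value wins over the default.
    boolean resolve(Map<String, String> conf) {
        String raw = conf.get(key);
        return raw == null ? defaultValue : Boolean.parseBoolean(raw);
    }
}

class SharedClusteringOptions {
    // Declared once in a scope every relevant module can see (assumption).
    static final BooleanOption ROW_WRITER_ENABLE =
        new BooleanOption("hoodie.datasource.write.row.writer.enable", true);

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(ROW_WRITER_ENABLE.resolve(conf)); // falls back to the shared default
        conf.put(ROW_WRITER_ENABLE.key, "false");
        System.out.println(ROW_WRITER_ENABLE.resolve(conf)); // explicit override
    }
}
```

Because call sites never pass their own default, a mismatch like the one fixed in this PR cannot be reintroduced by a hard-coded literal.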
Impact
None - correctness + ease of debugging through consistency
Risk level (write none, low medium or high below)
None