Skip to content

Comments

[MINOR] hoodie.datasource.write.row.writer.enable should set to be true.#8200

Closed
renshangtao wants to merge 1 commit intoapache:masterfrom
renshangtao:modify_row_writer_defaultvalue
Closed

[MINOR] hoodie.datasource.write.row.writer.enable should set to be true.#8200
renshangtao wants to merge 1 commit intoapache:masterfrom
renshangtao:modify_row_writer_defaultvalue

Conversation

@renshangtao
Copy link

@renshangtao renshangtao commented Mar 16, 2023

Change Logs

Fix the mismatch between the value of hoodie.datasource.write.row.writer.enable and the document.

Impact

If the user does not find this configuration when sorting within the partition, the desired result will not be obtained.

Risk level

low

Documentation Update

No documentation update needed.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

Copy link
Contributor

@KnightChess KnightChess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it cause error result?

@renshangtao
Copy link
Author

does it cause error result?

Yes, when I tested clustering, I found that the files under the same partition after sorting can only ensure internal order of the files, but there is still no order between the files. The location code only found that this configuration and document settings are inconsistent

@KnightChess
Copy link
Contributor

on, I got it, the default value in config is true. But I think it will not lead to the differences of sorting results

@renshangtao
Copy link
Author

on, I got it, the default value in config is true. But I think it will not lead to the differences of sorting results

You can test it,if the value is false , it will create a RDDCustomColumnsSortPartitioner who's class description is "
A partitioner that does sorting based on specified column values for each RDD partition."

@codope codope added priority:high Significant impact; potential bugs configs labels Mar 20, 2023
clusteringPlan.getInputGroups().stream()
.map(inputGroup -> {
if (getWriteConfig().getBooleanOrDefault("hoodie.datasource.write.row.writer.enable", false)) {
if (getWriteConfig().getBooleanOrDefault("hoodie.datasource.write.row.writer.enable", true)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @nsivabalan .

Good catch. It looks like we cannot use the ConfigProperty directly due to circular dependency. Can you comb the codebase to see if there are similar cases ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guess this was intentional. unless user explicitly enables this config, we don't want to enable it for clustering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets also consider issues like #8259 before we can make it default.

@bvaradar
Copy link
Contributor

@renshangtao : Can you create a jira and add it to the PR description so that PR validation can succeed.

@vinothchandar vinothchandar added priority:critical Production degraded; pipelines stalled release-0.14.0 and removed priority:high Significant impact; potential bugs labels Apr 25, 2023
@codope codope changed the title The hoodie.datasource.write.row.writer.enable should set to be true. [MINOR] hoodie.datasource.write.row.writer.enable should set to be true. May 1, 2023
@codope codope force-pushed the modify_row_writer_defaultvalue branch from e0317dc to 6758894 Compare May 1, 2023 11:49
@hudi-bot
Copy link
Collaborator

hudi-bot commented May 1, 2023

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@xushiyan
Copy link
Member

xushiyan commented May 19, 2023

on, I got it, the default value in config is true. But I think it will not lead to the differences of sorting results

You can test it,if the value is false , it will create a RDDCustomColumnsSortPartitioner who's class description is " A partitioner that does sorting based on specified column values for each RDD partition."

@renshangtao Both RDDCustomColumnsSortPartitioner and RowCustomColumnsSortPartitioner should sort globally. If you observe sorting issue, then it's a different bug to be fixed. Flipping this default value here is irrelevant to sorting issue

@xushiyan
Copy link
Member

as row-writing for clustering is introduced recently and we want to keep row-writing to false for longer to fully stablize it, will close this for now.

@xushiyan xushiyan closed this May 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority:critical Production degraded; pipelines stalled release-0.14.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants