Skip to content

[HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners#10942

Closed
wombatu-kun wants to merge 1 commit intoapache:masterfrom
wombatu-kun:HUDI-7526_fix_constructors_for_bulkinsert_sort_partitioners
Closed

[HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners#10942
wombatu-kun wants to merge 1 commit intoapache:masterfrom
wombatu-kun:HUDI-7526_fix_constructors_for_bulkinsert_sort_partitioners

Conversation

@wombatu-kun
Copy link
Contributor

Change Logs

Our constructor for user defined sort partitioner takes in write config, while some of the partitioners used in out of the box sort mode, does not account for it.
Lets fix the sort partitioners to ensure anything can be used as user defined partitioners.
For eg, NoneSortMode does not have a constructor that takes in write config

Impact

none

Risk level (write none, low medium or high below)

none

Documentation Update

none

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Mar 31, 2024

/**
* Constructor to create as UserDefinedBulkInsertPartitioner class via reflection
* @param config HoodieWriteConfig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example how this partitioner got instantiated and customized?

Copy link
Contributor Author

@wombatu-kun wombatu-kun Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This partitioner will be instantiated when user define write config property hoodie.bulkinsert.user.defined.partitioner.class=org.apache.hudi.execution.bulkinsert.JavaGlobalSortPartitioner.
This constructor will be called via reflection in methods of DataSourceUtils class createUserDefinedBulkInsertPartitioner(HoodieWriteConfig config) and createUserDefinedBulkInsertPartitionerWithRows(HoodieWriteConfig config).
There is nothing to customize in this JavaGlobalSortPartitioner, but, for example, provided writeConfig is used for customization of RowSpatialCurveSortPartitioner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got confused because the "customized" HoodieWriteConfig does not really play a role here and it is ignored?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in this case HoodieWriteConfig is ignored just because this Partitioner is not configurable at all, but it does not mean that it should not be used as UserDefinedBulkInsertPartitioner.
So I think, the purpose of this task is not to make all BulkInsertPartitioners customizable with HoodieWriteConfig, but only to make them instantiable via reflection with already existing common approach for UserDefinedBulkInsertPartitioner (constructor with HoodieWriteConfig as the only parameter).
@nsivabalan am I right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in this case HoodieWriteConfig is ignored just because this Partitioner is not configurable at all, but it does not mean that it should not be used as UserDefinedBulkInsertPartitioner

That does not make sense for me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this fix:
if user wants to use JavaGlobalSortPartitioner and he set hoodie.bulkinsert.user.defined.partitioner.class=org.apache.hudi.execution.bulkinsert.JavaGlobalSortPartitioner, it will not work because this partitioner could not be instantiated via reflection (as it has no constructor with writeConfig parameter).
We create this constructor to add ability to use JavaGlobalSortPartitioner as user defined partitioner just by setting it's class name in writeConfig.

Don't know how to explain more clear. Let's wait for the author's reply.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We create this constructor to add ability to use JavaGlobalSortPartitioner as user defined partitioner just by setting it's class name in writeConfig.

But does it work correctly with the write config being igored?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think, yes it does. before the fix this partitioner had only default constructor, it was enough for it's instantiation and working, as it has no referencies to any writeconfig params

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danny0405 can you please assign this PR to @nsivabalan to summon him to this discussion?

@wombatu-kun
Copy link
Contributor Author

@nsivabalan Hi! Sorry to bother you, but you are reporter of this task. Could you please review my PR?
Or close the PR if i totally misunderstood the task and did it wrong.

@wombatu-kun wombatu-kun force-pushed the HUDI-7526_fix_constructors_for_bulkinsert_sort_partitioners branch from ea11f68 to 55fb13f Compare May 15, 2024 04:56
…re we could use it as user defined partitioners
@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:M PR with lines of changes in (100, 300]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants