refactor: Remove not used classes from org.apache.hudi.spark.internal#18211

Merged
wombatu-kun merged 1 commit into apache:master from geserdugarov:master-remove-internal-defaultsource
Feb 17, 2026

@geserdugarov
Contributor

@geserdugarov geserdugarov commented Feb 16, 2026

Describe the issue this Pull Request addresses

#2328 added support for bulk insert using DSv2 for Spark 3.0 in 2020.

Later, #8076 added bulk insert support for insert overwrite (table) by calling

String targetFormat;
targetFormat = "org.apache.hudi.spark3.internal";
records.write().format(targetFormat)
    .option(DataSourceInternalWriterHelper.INSTANT_TIME_OPT_KEY, instantTime)
    .options(opts)
    .options(customOpts)
    .options(optsOverrides)
    .mode(SaveMode.Append)
    .save();

in DatasetBulkInsertCommitActionExecutor, which calls org.apache.hudi.spark3.internal.DefaultSource.

#13301 moved the code from hudi-spark3-common to hudi-spark-common module.

Then #13360 unified the code paths and switched bulk insert to call HoodieDatasetBulkInsertHelper::bulkInsert without going through org.apache.hudi.spark.internal.

As a result, there are no internal calls to the org.apache.hudi.spark.internal classes anymore, and the following set of classes can be removed:

[Image: Not used classes]

Summary and Changelog

Removes classes in org.apache.hudi.spark.internal that are not used anymore.

Impact

No impact. First, the classes live in a package named internal. Second, no .../META-INF/services/org.apache.spark.sql.sources.DataSourceRegister file references org.apache.hudi.spark.internal.*, so users could not call those classes through a registered data source.
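
The second point can be checked mechanically. The sketch below is illustrative only and is not part of this PR: it scans every DataSourceRegister service file visible on the classpath and reports provider entries under a given package prefix. The class name RegisteredSourceCheck and its method names are hypothetical, and it assumes the Hudi jars of interest are on the classpath when it runs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Hudi: lists service-file entries under a package prefix.
public class RegisteredSourceCheck {
    // Standard ServiceLoader resource path for Spark's DataSourceRegister interface.
    static final String SERVICE_FILE =
        "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister";

    // Returns the provider class names in one service file that start with the prefix.
    static List<String> matchingProviders(List<String> lines, String packagePrefix) {
        List<String> hits = new ArrayList<>();
        for (String raw : lines) {
            // Service files allow '#' comments and blank lines; strip both.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (!line.isEmpty() && line.startsWith(packagePrefix)) {
                hits.add(line);
            }
        }
        return hits;
    }

    // Reads every copy of the service file on the classpath and collects matching entries.
    static List<String> scanClasspath(String packagePrefix) throws IOException {
        List<String> hits = new ArrayList<>();
        ClassLoader loader = RegisteredSourceCheck.class.getClassLoader();
        Enumeration<URL> urls = loader.getResources(SERVICE_FILE);
        while (urls.hasMoreElements()) {
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(urls.nextElement().openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            hits.addAll(matchingProviders(lines, packagePrefix));
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        List<String> hits = scanClasspath("org.apache.hudi.spark.internal.");
        System.out.println(hits.isEmpty()
            ? "no registered providers in that package"
            : "registered providers: " + hits);
    }
}
```

An empty result only shows that no service file on the scanned classpath registers a provider from that package. It does not cover other ways a class could be referenced.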

Risk Level

None

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:XL PR with lines of changes > 1000 label Feb 16, 2026
@geserdugarov geserdugarov changed the title refactor: Remove not used classes from org.apache.hudi.spark.internal refactor: Remove not used classes from org.apache.hudi.spark.internal Feb 16, 2026
@geserdugarov
Contributor Author

The test-hudi-trino-plugin pipeline failed due to an unrelated issue:

TestHudiMinioConnectorSmokeTest>AbstractTestQueryFramework.init:119->createQueryRunner:33 » 
IllegalState Previous attempts to find a Docker environment failed. Will not retry. Please see logs and check configuration

@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@wombatu-kun wombatu-kun merged commit dfe3220 into apache:master Feb 17, 2026
74 of 76 checks passed
@wombatu-kun
Contributor

nice!

@geserdugarov geserdugarov deleted the master-remove-internal-defaultsource branch February 17, 2026 04:47