Core: Support Hadoop bulk delete API. #15436

Draft
steveloughran wants to merge 5 commits into apache:main from
steveloughran:pr/12055-bulk-delete-2026

Conversation

@steveloughran
Contributor

@steveloughran steveloughran commented Feb 24, 2026

Uses the Hadoop 3.4.0+ BulkDelete API so that S3 object deletions can be done in pages of objects rather than one at a time.

The configuration option "iceberg.hadoop.bulk.delete.enabled" switches to bulk deletes.

All code using the API is in BulkDeleter.java, which also contains a probe for the availability of the operation. This ensures there is no accidental use of the method.

Reflection-based use of the Hadoop 3.4.1+ BulkDelete API so that
S3 object deletions can be done in pages of objects, rather
than one at a time.

* Configuration option "iceberg.hadoop.bulk.delete.enabled" to switch
  to bulk deletes.
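As a rough illustration of the "probe for availability" idea described above, the check can be done with a plain `Class.forName` lookup that does not initialize the class. The class and method names here are hypothetical sketches, not the actual contents of BulkDeleter.java; the real probe may differ.

```java
// Hypothetical sketch of an availability probe for the Hadoop BulkDelete API.
// Taking the class name as a parameter also makes the ClassNotFoundException
// path easy to exercise in tests by passing a nonexistent class.
public class BulkDeleteProbe {

  // The Hadoop 3.4.1+ interface the probe would look for.
  static final String BULK_DELETE_CLASS = "org.apache.hadoop.fs.BulkDelete";

  static boolean bulkDeleteAvailable(String className) {
    try {
      // initialize=false: just resolve the class, do not run static init.
      Class.forName(className, false, BulkDeleteProbe.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | NoClassDefFoundError e) {
      // Older Hadoop on the classpath: the API is absent.
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(bulkDeleteAvailable(BULK_DELETE_CLASS));
  }
}
```

Run without Hadoop on the classpath, the probe reports `false`; with Hadoop 3.4.1+ present it would report `true`.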
@steveloughran steveloughran marked this pull request as draft February 24, 2026 19:06
@steveloughran
Contributor Author

There's something else to consider here: do we need full reflection, given that the method is available at compile time? Instead, we could only use the operations when enabled, catch link failures, and report them better.

Then there'd be Spark tests where 4.0 and 4.1 verify the operation is there, and 3.x expects failure when requested.

Uses the API directly in iceberg-core, which is compiled against Hadoop 3.4.3.
But this is isolated to one class, org.apache.iceberg.hadoop.BulkDeleter, which is
only loaded when bulk delete is enabled with "iceberg.hadoop.bulk.delete.enabled".

There's no attempt at a graceful fallback: if bulk delete is enabled and the API
is not found, bulk delete will fail.
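The "enabled, else fail loudly" behaviour described above could look something like the following. This is a sketch with hypothetical names (the real gating lives in iceberg-core and is keyed off the actual configuration mechanism, not a bare `Map`).

```java
import java.util.Map;

// Hypothetical sketch of flag-gated class loading with no silent fallback:
// when the flag is off, stay on the classic one-at-a-time deletion path;
// when it is on and the API class is missing, fail with a clear message.
public class BulkDeleteLoader {

  static final String ENABLED_KEY = "iceberg.hadoop.bulk.delete.enabled";

  static Class<?> loadBulkDelete(Map<String, String> conf, String className) {
    if (!Boolean.parseBoolean(conf.getOrDefault(ENABLED_KEY, "false"))) {
      return null; // disabled: caller uses per-object deletion
    }
    try {
      return Class.forName(className);
    } catch (ClassNotFoundException e) {
      // Enabled but the Hadoop release on the classpath lacks the API:
      // surface the misconfiguration instead of degrading silently.
      throw new IllegalStateException(
          "Bulk delete enabled via " + ENABLED_KEY + " but " + className
              + " is not on the classpath", e);
    }
  }
}
```

This keeps the failure mode explicit: a deployment that asks for bulk deletes on Hadoop < 3.4.x gets an immediate, attributable error rather than quietly slower deletes.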
Testing is done with a new class in iceberg-spark 3.5.
This works by mocking the ClassNotFoundException (CNFE) failure condition in the
safety probe, allowing tests to point to a nonexistent class.
As a result it verifies that:
* if the class isn't found, bulk delete fails meaningfully;
* the API isn't used.

Ideally these tests would be run in the Spark 3.4/3.5 modules, but their classpath
still pulls in hadoop-3.4.3 and it would be hard work to remove.
