
SQL-Procedure -> archive_commits #18429

@lucabem


Bug Description

What happened:
I was using archive_commits with retain_commits = 200 because several writers write into a single table. I noticed that the documentation for this parameter says:

Archiving of instants is batched in best-effort manner, to pack more instants into a single archive log. This config controls such archival batch size.
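As a rough illustration of what this documented batch size controls, the sketch below packs a list of instants into consecutive groups of at most batchSize, one group per archive write. It is a simplified model, not Hudi's archiver; the class ArchivalBatching and the method toBatches are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of "batched" archiving: instants are packed into
// groups of at most batchSize, and each group would go into one archive log write.
// This does NOT reproduce Hudi's archiver; it only shows what a batch size controls.
final class ArchivalBatching {

  private ArchivalBatching() {}

  // Splits the instants to archive into consecutive batches of at most batchSize.
  static List<List<String>> toBatches(List<String> instants, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<String>> batches = new ArrayList<>();
    for (int start = 0; start < instants.size(); start += batchSize) {
      int end = Math.min(start + batchSize, instants.size());
      // Copy the sub-list so each batch does not depend on the input list.
      batches.add(new ArrayList<>(instants.subList(start, end)));
    }
    return batches;
  }
}
```

For example, with a batch size of 10, 25 instants would be split into batches of 10, 10 and 5. Under this reading, the parameter affects how archived instants are grouped, not how many commits are retained.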

However, in the code the HoodieWriteConfig is built as follows:

    HoodieWriteConfig config = HoodieWriteConfig.newBuilder().withPath(basePath)
        .withArchivalConfig(HoodieArchivalConfig.newBuilder().archiveCommitsWith(minCommits, maxCommits).build())
        .withCleanConfig(HoodieCleanConfig.newBuilder().retainCommits(commitsRetained).build())
        .withEmbeddedTimelineServerEnabled(false)
        .withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(enableMetadata).build())
        .build();

I think retain_commits should map to COMMITS_ARCHIVAL_BATCH_SIZE instead, because the current logs show that the value is applied as hoodie.cleaner.commits.retained:

2026-03-31 11:15:39.232 [Thread-10] WARN  org.apache.hudi.config.HoodieWriteConfig - Increase hoodie.keep.min.commits=4 to be greater than hoodie.cleaner.commits.retained=200 (there is risk of incremental pull missing data from few instants based on the current configuration). The Hudi archiver will automatically adjust the configuration regardless.
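The warning asks for hoodie.keep.min.commits to be greater than hoodie.cleaner.commits.retained, which suggests it fires whenever minCommits is less than or equal to the retained count. The sketch below restates that inferred condition with the values from the log; RetentionCheck, shouldWarn and warning are hypothetical names, and the exact check and adjustment inside Hudi are not shown in this issue.

```java
// Hypothetical re-statement of the condition implied by the WARN line above.
// Inferred from the message wording only: it asks for min commits to be
// "greater than" cleaner commits retained, so it warns when min <= retained.
final class RetentionCheck {

  private RetentionCheck() {}

  // True when the configuration would trigger the warning described in the log.
  static boolean shouldWarn(int minCommitsToKeep, int cleanerCommitsRetained) {
    return minCommitsToKeep <= cleanerCommitsRetained;
  }

  // Rebuilds the first part of the logged message for the given values.
  static String warning(int minCommitsToKeep, int cleanerCommitsRetained) {
    return String.format(
        "Increase hoodie.keep.min.commits=%d to be greater than hoodie.cleaner.commits.retained=%d",
        minCommitsToKeep, cleanerCommitsRetained);
  }
}
```

With the logged values (min commits 4, retained 200) the check is true, which matches the warning being emitted when retain_commits = 200 is routed to the cleaner setting.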

whereas the documented description belongs to this config:

  public static final ConfigProperty<String> COMMITS_ARCHIVAL_BATCH_SIZE = ConfigProperty
      .key("hoodie.commits.archival.batch")
      .defaultValue(String.valueOf(10))
      .markAdvanced()
      .withDocumentation("Archiving of instants is batched in best-effort manner, to pack more instants into a single"
          + " archive log. This config controls such archival batch size.");
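To make the discrepancy concrete, the sketch below contrasts where retain_commits ends up today (per the log) with where the documentation implies it should go. Only the two config keys come from this issue; RetainCommitsMapping, current and documented are hypothetical names, and the sketch does not call any Hudi API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the mismatch reported here; it builds plain
// key/value maps and does not use Hudi classes.
final class RetainCommitsMapping {

  private RetainCommitsMapping() {}

  // Observed behavior: the value reaches the cleaner setting
  // (hoodie.cleaner.commits.retained=200 in the WARN log).
  static Map<String, String> current(int retainCommits) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("hoodie.cleaner.commits.retained", String.valueOf(retainCommits));
    return props;
  }

  // Behavior implied by the documentation: the value sets the archival batch size.
  static Map<String, String> documented(int retainCommits) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("hoodie.commits.archival.batch", String.valueOf(retainCommits));
    return props;
  }
}
```

Either the mapping or the parameter documentation would need to change so that both describe the same config.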

Environment

Hudi version:
Query engine: (Spark/Flink/Trino etc)
Relevant configs:

Logs and Stack Trace

No response

Labels: type:bug
