Skip to content

Use restore Keeper retries for restored parts#104610

Merged
pamarcos merged 3 commits into
ClickHouse:masterfrom
pamarcos:restore-retry-restored-parts
May 19, 2026
Merged

Use restore Keeper retries for restored parts#104610
pamarcos merged 3 commits into
ClickHouse:masterfrom
pamarcos:restore-retry-restored-parts

Conversation

@pamarcos
Copy link
Copy Markdown
Member

RESTORE of replicated tables restores data by attaching parts through the same
sink path used by inserts. That path used insert_keeper_max_retries, so a
temporary Keeper outage or replica read-only period could fail a long-running
restore even when backup_restore_keeper_max_retries was configured to tolerate
it.

Pass the restore Keeper retry settings into restored part attachment, so the
data phase follows the same retry budget as the rest of RESTORE.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed RESTORE of replicated MergeTree tables so restored part attachment uses backup_restore_keeper_max_retries instead of the regular insert Keeper retry budget.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

RESTORE attaches replicated parts through the insert sink, which used the regular insert Keeper retry budget.

Pass the restore retry settings into that path so transient Keeper failures do not fail RESTORE prematurely.
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented May 11, 2026

Workflow [PR], commit [7d6b959]

Summary:


AI Review

Summary

This PR routes restored-part attachment in replicated MergeTree restore flows through backup_restore_keeper_* retry settings by threading ZooKeeperRetriesInfo from RestorerFromBackup into ReplicatedMergeTreeSink. The implementation is coherent across touched call sites, and the new stateless test exercises both the failing (backup_restore_keeper_max_retries = 0) and succeeding (backup_restore_keeper_max_retries = 1) paths with a targeted failpoint.

Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label May 11, 2026
Comment thread src/Storages/MergeTree/MergeTreeData.h Outdated
Forward declare ZooKeeperRetriesInfo instead of including the full Keeper retries header
from the high fan-out MergeTreeData header.
@CheSema CheSema self-assigned this May 12, 2026
@pamarcos pamarcos marked this pull request as ready for review May 12, 2026 16:32
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented May 12, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.10% +0.00%
Functions 91.10% 91.10% +0.00%
Branches 76.50% 76.60% +0.10%

Changed lines: 100.00% (61/61) | lost baseline coverage: 17 line(s) · Uncovered code

Full report · Diff report

@pamarcos pamarcos added this pull request to the merge queue May 19, 2026
Merged via the queue into ClickHouse:master with commit 52750b3 May 19, 2026
325 of 326 checks passed
@pamarcos pamarcos deleted the restore-retry-restored-parts branch May 19, 2026 16:34
@pamarcos pamarcos added pr-must-backport Pull request should be backported intentionally. Use this label with great care! v25.8-must-backport labels May 19, 2026
robot-ch-test-poll1 added a commit that referenced this pull request May 19, 2026
Cherry pick #104610 to 26.2: Use restore Keeper retries for restored parts
robot-ch-test-poll1 added a commit that referenced this pull request May 19, 2026
Cherry pick #104610 to 26.3: Use restore Keeper retries for restored parts
robot-ch-test-poll1 added a commit that referenced this pull request May 19, 2026
Cherry pick #104610 to 26.4: Use restore Keeper retries for restored parts
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label May 19, 2026
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label May 19, 2026
robot-ch-test-poll1 added a commit that referenced this pull request May 20, 2026
Cherry pick #104610 to 25.8: Use restore Keeper retries for restored parts
pamarcos added a commit that referenced this pull request May 20, 2026
Backport #104610 to 26.4: Use restore Keeper retries for restored parts
pamarcos added a commit that referenced this pull request May 20, 2026
Backport #104610 to 26.3: Use restore Keeper retries for restored parts
pamarcos added a commit that referenced this pull request May 20, 2026
Backport #104610 to 26.2: Use restore Keeper retries for restored parts
@robot-ch-test-poll robot-ch-test-poll added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label May 20, 2026
pamarcos added a commit that referenced this pull request May 20, 2026
Backport #104610 to 25.8: Use restore Keeper retries for restored parts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-bugfix Pull request with bugfix, not backported by default pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo v25.8-must-backport

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants