Skip to content

Fix max_execution_time not being applied for backup/restore#99205

Merged
kssenii merged 3 commits into
masterfrom
backup-restore-propogate-query-limit
Mar 11, 2026
Merged

Fix max_execution_time not being applied for backup/restore#99205
kssenii merged 3 commits into
masterfrom
backup-restore-propogate-query-limit

Conversation

@kssenii
Copy link
Copy Markdown
Member

@kssenii kssenii commented Mar 10, 2026

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix max_execution_time not being applied for backup/restore.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Note

Medium Risk
Changes when/where BACKUP/RESTORE query settings are applied, which can affect timeouts/limits and cancellation behavior for long-running backup/restore operations. Adds new failpoints used by integration tests; low runtime impact unless explicitly enabled.

Overview
Fixes BACKUP/RESTORE query SETTINGS handling so core execution limits (e.g. max_execution_time=0) are applied to the query context before the ProcessList entry and cancellation checker are created, preventing async backups/restores from being cancelled using profile-level timeouts.

Adds pauseable failpoints (backup_pause_on_start, restore_pause_on_start) and an integration regression test that stalls async backup/restore on a node with a short profile timeout to verify the timeout-disable setting is honored.

Written by Cursor Bugbot for commit f36922f. This will update automatically on new commits. Configure here.

@kssenii kssenii requested a review from vitlibar March 10, 2026 15:44
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Comment thread src/Interpreters/InterpreterSetQuery.cpp
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Mar 10, 2026

Workflow [PR], commit [f36922f]

Summary:


AI Review

1) Summary

This PR fixes BACKUP/RESTORE query-setting propagation so core settings (notably max_execution_time) are applied before ProcessList registration, and removes duplicate late application in BackupsWorker; it also adds targeted integration coverage with failpoints. High-level verdict: change is directionally correct and matches the reported root cause, with no clear correctness/safety regressions found in touched code.

2) Missing context (if any)

  • No full Praktika artifact logs were reviewed from this environment (only current check statuses).
  • No performance benchmark artifacts were provided for backup/restore startup path (expected impact appears negligible).

3) Findings (by severity)

  • No Blockers or Majors identified.
  • PR looks good from a correctness/safety standpoint for the touched paths.

4) Tests & Evidence

  • Positives: new integration test test_async_backup_restore_with_max_execution_time_zero validates both async BACKUP and async RESTORE under profile timeout pressure, directly exercising the original failure mode.
  • Negative/error-handling coverage: the new test validates non-failure under prior cancellation conditions; explicit invalid-setting/constraint-failure cases are not added in this PR.
  • Optional follow-up tests: add one case with constrained user profile to verify constraint rejection still occurs when backup/restore SETTINGS contains forbidden core settings; add one ON CLUSTER variant to confirm identical early-setting behavior on distributed path.

5) ClickHouse Compliance Checklist (Yes/No + short note)

  • Data deletions logged? Yes (no new deletion path added).
  • Serialization formats versioned? Yes (no serialization changes).
  • Experimental setting gate present? N/A (bug fix, not new feature).
  • Settings exposed for constants/thresholds? N/A (no new magic thresholds in server code).
  • Backward compatibility preserved? Yes (behavior aligns with explicit query settings semantics).
  • SettingsHistory.cpp updated for new/changed settings? N/A (no settings definitions changed).
  • Existing tests untouched (only additions)? Yes (adds test/config; no test relaxation).
  • Docs/user-facing notes updated? N/A for this targeted bug fix.
  • Core-area change got extra scrutiny? Yes (InterpreterSetQuery + backup worker lifecycle reviewed).

6) Performance & Safety Notes

  • Hot-path impact is minimal: one additional settings extraction/apply step during query setup for ASTBackupQuery, while duplicate application in worker constructors was removed.
  • Memory/concurrency: no new shared mutable state or lock-order changes introduced; failpoints are test-oriented and gated by explicit enable commands.
  • Benchmarks: none provided; if needed, run a minimal benchmark comparing async backup startup latency before/after with and without SETTINGS max_execution_time=0.

7) User-Lens Review

  • Behavior becomes more intuitive: explicit query-level SETTINGS for BACKUP/RESTORE now affect timeout/cancellation from query start, consistent with user expectation.
  • Error/actionability remains adequate; no regression in user-visible diagnostics observed in touched code.

8) Final Verdict

  • Status: Approve
  • Minimum required actions: none.

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 10, 2026
@pamarcos pamarcos self-assigned this Mar 10, 2026
@pamarcos pamarcos self-requested a review March 10, 2026 16:19
Copy link
Copy Markdown
Member

@pamarcos pamarcos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some minor comment here of a tiny improvement we could make, but I'm approving the PR as it is cause I don't think it's worth blocking just because of that

@pamarcos
Copy link
Copy Markdown
Member

Thanks for fixing this, @kssenii 🫂

@kssenii kssenii added pr-must-backport Pull request should be backported intentionally. Use this label with great care! v25.12-must-backport v25.10-must-backport labels Mar 10, 2026
if (!core_settings.empty())
{
context_->checkSettingsConstraints(core_settings, SettingSource::QUERY);
context_->applySettingsChanges(core_settings);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the lines

backup_context->checkSettingsConstraints(backup_settings.core_settings, SettingSource::QUERY);
backup_context->applySettingsChanges(backup_settings.core_settings);
now?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I think no as we do copy existing query context backup_context(Context::createCopy(query_context)), will remove

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the same code also from RestoreStarter then

@vitlibar vitlibar self-assigned this Mar 10, 2026
Comment thread src/Backups/BackupsWorker.cpp Outdated
namespace FailPoints
{
extern const char backup_pause_before_main_loop[];
extern const char restore_pause_before_main_loop[];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small thing about naming - backup and restore processes are not loops, they're linear processes from start to finish

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backup_pause_on_start?

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Mar 11, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 83.80% 83.80% +0.00%
Functions 23.90% 23.90% +0.00%
Branches 76.30% 76.30% +0.00%

PR changed lines: PR changed-lines coverage: 100.00% (48/48)
Diff coverage report
Uncovered code

@kssenii kssenii added this pull request to the merge queue Mar 11, 2026
Merged via the queue into master with commit 4374426 Mar 11, 2026
322 of 323 checks passed
@kssenii kssenii deleted the backup-restore-propogate-query-limit branch March 11, 2026 22:07
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Mar 11, 2026
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 11, 2026
robot-clickhouse added a commit that referenced this pull request Mar 11, 2026
robot-clickhouse added a commit that referenced this pull request Mar 11, 2026
robot-clickhouse added a commit that referenced this pull request Mar 11, 2026
kssenii added a commit that referenced this pull request Mar 12, 2026
Backport #99205 to 26.2: Fix max_execution_time not being applied for backup/restore
kssenii added a commit that referenced this pull request Mar 12, 2026
Backport #99205 to 26.1: Fix max_execution_time not being applied for backup/restore
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Mar 12, 2026
pull Bot pushed a commit to weiyilai/ClickHouse that referenced this pull request May 18, 2026
Issue: ClickHouse#103324

`BACKUP TABLE t TO S3('uri', 'k', 's') SETTINGS base_backup = S3({backup_name:String}, ...) ASYNC`
threw `Code: 36. BAD_ARGUMENTS Expected literal, got {backup_name:String}` since
PR ClickHouse#99205 (commit `e6721ef7b16`, "Apply settings changes from backup/restore
earlier"). The regression hit users on `26.1.5`, `25.12`, `25.10`, `25.8`,
`26.2`, `26.3`, `26.4` and `master`.

Root cause: PR ClickHouse#99205 added a call to `BackupSettings::fromBackupQuery` (and
`RestoreSettings::fromRestoreQuery`) inside `InterpreterSetQuery::applySettingsFromQuery`
to surface core settings (e.g. `max_execution_time`) before `ProcessList::insert`.
But `fromBackupQuery` unconditionally calls `BackupInfo::fromAST(*query.base_backup_name)`,
which only accepts `ASTLiteral` and throws `BAD_ARGUMENTS` on `ASTQueryParameter`.
`applySettingsFromQuery` is invoked on the client side from `ClientBase::processOrdinaryQuery`
BEFORE `ReplaceQueryParameterVisitor` substitutes parameters, so any
`{name:Type}` placeholder inside `base_backup` triggered the throw. The
destination URI was unaffected because `fromBackupQuery` does not call
`fromAST` on `query.backup_name`.

Fix: introduce two lightweight helpers,
`BackupSettings::extractCoreSettingsFromQuery` and
`RestoreSettings::extractCoreSettingsFromQuery`, that iterate over
`query.settings` and return only the non-backup/restore-specific entries.
They never look at `base_backup_name`, so they are safe to call before
parameter substitution. `InterpreterSetQuery::applySettingsFromQuery` now
uses these helpers. The original PR ClickHouse#99205 motivation is preserved -- core
settings such as `max_execution_time` still reach the context before
`ProcessList::insert`. The existing `BackupsWorker::doBackup` /
`BackupsWorker::doRestore` callers continue to use `fromBackupQuery` /
`fromRestoreQuery`; by the time they run, parameters have been substituted
and `BackupInfo::fromAST` works as before.

Bisect credit and proposed fix sketch from `@fm4v`.

Regression test:
`tests/queries/0_stateless/04213_base_backup_with_query_parameter.sh`
covers both `BACKUP` and `RESTORE` paths plus a sanity check that
`max_execution_time` can still be passed alongside the parameterized
`base_backup` (verifying PR ClickHouse#99205 behaviour). The test fails with
`BAD_ARGUMENTS` against unfixed master and passes against this fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-bugfix Pull request with bugfix, not backported by default pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo v25.10-must-backport v25.12-must-backport

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants