
[WIP] Add backup engine URL()#99626

Draft
vitlibar wants to merge 1 commit into ClickHouse:master from vitlibar:url-backup-engine

Conversation

@vitlibar
Member

@vitlibar vitlibar commented Mar 16, 2026

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Add backup engine URL().
Uploads and downloads files similarly to the url() table function:

BACKUP ... TO URL(url)
RESTORE ... FROM URL(url)

(the implementation is incomplete, don't review yet)

@clickhouse-gh
Contributor

clickhouse-gh bot commented Mar 16, 2026

Workflow [PR], commit [47daa95]

AI Review

Summary

This PR adds a new URL backup engine with HTTP(S) reader/writer support and registration in BackupFactory. The implementation is close, but I found three correctness/safety issues that should be fixed before merge: non-strict positional argument parsing, missing remote_host_filter enforcement on write/delete paths, and swallowed delete exceptions that can mask cleanup failures.

Missing context

  • ⚠️ No CI test results or benchmark output were provided in this review context.

Findings

❌ Blockers

  • [src/Backups/BackupIO_URL.cpp:216] writeFile issues outbound HTTP requests without remote_host_filter checks, while read paths enforce host filtering. This creates an SSRF/policy-bypass gap for BACKUP ... TO URL(...).
    Suggested fix: enforce host filtering for write targets (at writer construction and/or per-file URL before request creation).

  • [src/Backups/BackupIO_URL.cpp:267] removeFile catches all exceptions and only logs warnings. This can make corrupted-backup cleanup report success when delete requests actually failed.
    Suggested fix: propagate delete failures (except explicit 404/not-found cases if intended as non-fatal).

⚠️ Majors

  • [src/Backups/registerBackupEngineURL.cpp:82] In positional args mode, arity 2 is accepted but args[1] is silently ignored because credentials are read only when args.size() >= 3.
    Suggested fix: require exactly 1 or 3 args, consistent with S3 engine parsing.

ClickHouse Rules

Item                     | Status | Notes
Deletion logging         | ⚠️     | Delete failures are logged, but errors are swallowed, so failure semantics are weak
Serialization versioning |        |
Core-area scrutiny       |        |
No test removal          |        |
Experimental gate        |        |
No magic constants       |        |
Backward compatibility   | ⚠️     | URL positional argument contract is inconsistent (2 args accepted but partially ignored)
SettingsHistory.cpp      |        |
Safe rollout             | ⚠️     | Host-filter bypass on write/delete path is a security rollout risk
Compilation time         |        |

Performance & Safety

  • Missing host-filter enforcement on write/delete URLs is a safety issue (policy bypass).
  • Swallowing delete exceptions weakens failure handling and can leave backup artifacts undeleted while reporting success.

Final Verdict

  • Status: ⚠️ Request changes
  • Minimum required actions:
    • Enforce remote_host_filter for URL backup write/delete operations.
    • Stop suppressing delete exceptions in BackupWriterURL::removeFile (keep only intentional not-found handling if needed).
    • Make positional argument validation strict (1 or 3) for URL engine non-named-collection mode.

@clickhouse-gh clickhouse-gh bot added the pr-feature label (Pull request with new product feature) on Mar 16, 2026
}
else
{
    if (args.empty() || args.size() > 3)
args.size() == 2 currently passes validation but silently ignores args[1] (username), because credentials are read only when args.size() >= 3.

That makes BACKUP ... TO URL(url, user) look accepted while actually sending unauthenticated requests, which is a correctness/safety footgun.

Please make the arity check strict (like S3): allow only 1 or 3 arguments in the non-named-collection path.
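The strict arity rule the reviewer asks for could be sketched as a small self-contained check. The function name and error message below are illustrative, not ClickHouse's actual identifiers:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper mirroring the suggestion: in the non-named-collection
// path, accept only URL(url) or URL(url, user, password). A count of 2 is
// rejected instead of being silently half-ignored.
void validatePositionalArgCount(std::size_t count)
{
    if (count != 1 && count != 3)
        throw std::invalid_argument(
            "Backup engine URL requires 1 or 3 arguments: "
            "URL(url) or URL(url, user, password), got "
            + std::to_string(count));
}
```

This matches the S3 engine's contract mentioned in the comment: each valid arity corresponds to exactly one interpretation, so no argument can be accepted yet unused.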

std::unique_ptr<WriteBuffer> BackupWriterURL::writeFile(const String & file_name)
{
    auto file_uri = getURIForFile(file_name);
    return BuilderWriteBufferFromHTTP(file_uri)

writeFile doesn't apply remote_host_filter (reads use .withHostFilter(&context->getRemoteHostFilter()), writes do not).

This lets BACKUP ... TO URL(...) bypass host filtering for outbound write requests, while read paths are restricted.

Please add an explicit host-filter check for write targets (for example before creating the writer, and/or per-file URL in writeFile).
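A minimal stand-in for the missing check might look like this. `RemoteHostFilterSketch` is a hypothetical simplification of ClickHouse's `RemoteHostFilter` (the real class validates against the server's `remote_url_allow_hosts` configuration); it is shown only to illustrate where the check belongs on the write path:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_set>

// Simplified stand-in for the host filter: an allow-list of hosts.
// An empty allow-list means "no restriction", roughly matching the
// behavior when no filter is configured.
struct RemoteHostFilterSketch
{
    std::unordered_set<std::string> allowed_hosts;

    void checkHost(const std::string & host) const
    {
        if (!allowed_hosts.empty() && !allowed_hosts.count(host))
            throw std::runtime_error(
                "URL host '" + host + "' is not allowed by remote_host_filter");
    }
};

// Hypothetical write path: verify the target host *before* constructing the
// HTTP writer, mirroring what the read path already does via withHostFilter.
void checkWriteTarget(const RemoteHostFilterSketch & filter, const std::string & host)
{
    filter.checkHost(host);
    // ... proceed to build the write buffer for this URL ...
}
```

Placing the check before writer construction keeps the failure early and symmetric with reads, so BACKUP and RESTORE enforce the same outbound-host policy.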

file_uri.toString());
}
}
catch (...)

This catch (...) swallows every network/protocol exception and only logs a warning.

As a result, backup cleanup can report success even when HTTP DELETE failed, because BackupImpl::tryRemoveAllFiles sees no exception from writer->removeFiles.

Other backup writers (S3, Azure, Disk) don't suppress deletion errors this way. Please propagate failures (except maybe explicit 404/not-found handling).
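The suggested semantics — tolerate not-found, propagate everything else — could be sketched with a hypothetical `HTTPStatusError` type (the real HTTP client's error type differs; this is only a shape for the control flow):

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Illustrative exception carrying an HTTP status code, standing in for the
// actual HTTP client error type.
struct HTTPStatusError : std::runtime_error
{
    int status;
    HTTPStatusError(int status_, const char * what_)
        : std::runtime_error(what_), status(status_) {}
};

// Sketch of the suggested removeFile behavior: a 404 is tolerated (the object
// is already gone, so deletion is idempotent), while every other failure
// propagates so cleanup cannot report success spuriously.
void removeFileSketch(const std::function<void()> & send_http_delete)
{
    try
    {
        send_http_delete();
    }
    catch (const HTTPStatusError & e)
    {
        if (e.status == 404)
            return; // already deleted: treat as success
        throw;      // real failure: let the caller (e.g. tryRemoveAllFiles) see it
    }
}
```

With this shape, the caller's retry/reporting logic sees genuine delete failures, matching how the S3, Azure, and Disk writers behave.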

@clickhouse-gh
Contributor

clickhouse-gh bot commented Mar 17, 2026

LLVM Coverage Report

Metric    | Baseline | Current | Δ
Lines     | 83.80%   | 83.70%  | -0.10%
Functions | 23.90%   | 23.90%  | +0.00%
Branches  | 76.30%   | 76.30%  | +0.00%

PR changed-lines coverage: 3.72% (11/296, 0 noise lines excluded)
Diff coverage report
Uncovered code
