Skip to content

Restrict user-controllable S3 credential resolution#104889

Draft
pamarcos wants to merge 33 commits into
ClickHouse:masterfrom
pamarcos:harden-s3-env-credentials
Draft

Restrict user-controllable S3 credential resolution#104889
pamarcos wants to merge 33 commits into
ClickHouse:masterfrom
pamarcos:harden-s3-env-credentials

Conversation

@pamarcos
Copy link
Copy Markdown
Member

@pamarcos pamarcos commented May 13, 2026

Restricts user-controlled S3 credential resolution so SQL-visible S3 entry points cannot inherit server-managed credentials from environment/IMDS/IRSA, top-level <s3> config, or stale endpoint-scoped config.

The change covers s3(), the S3 engine, S3Queue through shared configuration, S3 named collections, dynamic S3 disks, BACKUP TO S3, and Glue catalog creation. It also redacts AWS ARNs from S3 exceptions and makes config reload drop revoked endpoint credentials instead of preserving stale clients.

Closes https://github.com/ClickHouse/clickhouse-private/issues/54202

Changelog category (leave one):

  • Critical Bug Fix (crash, data loss, RBAC)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Tightened validation of S3 credentials provided through user SQL to prevent S3 table functions, engines, named collections, dynamic disks, backups, and Glue catalogs from reusing server-managed credentials.

pamarcos added 10 commits May 13, 2026 16:52
Reject user-controlled `use_environment_credentials` and `role_arn` paths that could make
S3 clients use server environment credentials. Also redact AWS ARNs in S3 error messages.
Do not depend on whether a named collection came from SQL because user overrides of
config collections still enter the S3 named collection path. Apply the
`use_environment_credentials` and `role_arn` restrictions uniformly.
Replace the `role_arn` output parameter with an explicit result struct so
callers can read whether `extra_credentials` was present and whether it supplied
`role_arn` without mixing parsing metadata into the argument list.
The previous version dropped the `PreformattedMessage` and only forwarded
the sanitized text to `Exception`, so anonymized error reporting
(`Exception::callback`, `ErrorCodes` statistics, `message_format_string`)
stopped working for `S3Exception`. Keep the `PreformattedMessage` shape and
sanitize only the `text` field.
`object_storage_type` also accepts `s3_plain`, `s3_with_keeper`, and
`s3_plain_rewritable` (registered in `ObjectStorageFactory`), so the
previous exact `value_str == "s3"` check let a user write
`disk(object_storage_type=s3_plain, use_environment_credentials=1, ...)`
and bypass the hardening. Match `s3` and any `s3_*` variant.
The previous gtest only checked a single ARN in the middle of a message.
Add cases for: passthrough with no ARN, empty input, ARN at start and at
end of the message, multiple ARNs in one message, and the degenerate
`arn:` token.
Replace the previous SQL test with a shell test that scopes every
global name (named collections, table names, disk names) to
`$CLICKHOUSE_DATABASE`, so several invocations of the test can run in
parallel without colliding on shared state. Cover additional vectors
from the underlying ClickHouse Cloud finding: dynamic-disk subtypes
`s3_plain`, `s3_with_keeper`, `s3_plain_rewritable`; `disk()` with
`role_arn` but only one of the keys; nested `disk(cache, disk=disk(s3))`.
Also include positive cases that exercise the same code path with
`DESCRIBE TABLE` so we never touch the network.
`hasExplicitS3Credentials` looked at the merged `auth_settings` after the
global `<s3>` section and any endpoint-scoped config block had already been
loaded. An on-prem operator that configures static admin keys in `<s3>`
would therefore implicitly authorize any user-supplied `role_arn` for STS
`AssumeRole`, signed with the admin keys. Track key provenance instead so
the check only counts keys supplied in the same SQL scope as the user-
supplied `role_arn`. Applies to both the `s3()` table function and S3 named
collections.
…arn`

`BACKUP TO S3` builds its own `S3Client` outside of `StorageS3Configuration`,
so the hardening checks added there do not apply to backups. Reject
`use_environment_credentials = 1` in named-collection backups and reject
any user-supplied `role_arn` that is not paired with `access_key_id` and
`secret_access_key` in the same SQL scope, for both the named-collection
and the positional-args entry points.
Add stateless cases for the variants the new hardening introduces:
named collection with `role_arn` and only `access_key_id` (no secret),
`s3()` with one positional key empty alongside `extra_credentials(role_arn=...)`,
and four `BACKUP TO S3` variants (positional, partial keys, named
collection with `role_arn`, named collection with `use_environment_credentials = 1`).
Add an integration test under `test_s3_credentials_hardening` that puts
static admin keys in the top-level `<s3>` config block and asserts that
user SQL cannot reuse those keys via `role_arn`. This scenario cannot be
expressed in a stateless test because it depends on server-config state.
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented May 13, 2026

Workflow [PR], commit [0a30170]

Summary:


AI Review

Summary

This PR hardens user-visible S3 entry points so SQL cannot inherit server-managed credentials from environment providers, top-level <s3> config, or stale endpoint-scoped config. The direction is right, but the current patch still leaves one credential revocation path open and introduces a regression for internal BACKUP TO S3 work.

Findings

❌ Blockers

  • [src/Disks/DiskObjectStorage/ObjectStorages/S3/S3ObjectStorage.cpp:714-715] [dismissed by author -- https://github.com/Restrict user-controllable S3 credential resolution #104889#discussion_r3317042809] S3ObjectStorage::applyNewSettings still treats the whole auth payload as static when the SQL/catalog supplied any credential. If a table is created with user keys while a trusted endpoint also injects a header, role_arn, SSE credential, or other auth field, endpoint removal later reaches copyCredentialsFrom(*current_settings) and preserves that endpoint-derived field in memory. Endpoint credential revocation therefore does not take effect until restart/drop for mixed-provenance clients. The fix needs field-level provenance or an equivalent split: preserve user/catalog credentials, but replace or clear endpoint-derived credential fields on reload.

⚠️ Majors

  • [src/Backups/BackupIO_S3.cpp:226] applyEndpointCredentialsOrReset clears credentials even when ignore_user is true, which is the path used for internal BACKUP work via is_internal_backup. Before this change, a non-user-controlled internal backup with no explicit S3 keys could use the loaded top-level <s3> credentials; now, without a matching endpoint block, the reset leaves makeS3Client with empty credentials. Preserve the loaded <s3> credentials for ignore_user / is_internal_backup, or add a separate internal-backup credential path, and reset only for user-controlled backups.
Tests

⚠️ Add a focused reload test for the mixed-provenance case: create an S3 table with user credentials while an endpoint config injects a credential-bearing field, remove the endpoint config, run SYSTEM RELOAD CONFIG, and verify only the user credentials remain.

⚠️ Add coverage for internal BACKUP TO S3 with top-level <s3> credentials and no explicit keys, so the hardening does not regress the non-user-controlled backup path.

Final Verdict

Status: ❌ Block

Minimum required actions: fix the mixed-provenance endpoint credential revocation path, preserve or explicitly rework internal BACKUP credential handling, and add the two focused regression tests above.

@clickhouse-gh clickhouse-gh Bot added pr-critical-bugfix pr-must-backport Pull request should be backported intentionally. Use this label with great care! labels May 13, 2026
Comment thread src/Disks/getDiskConfigurationFromAST.cpp Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens ClickHouse’s S3 integration to prevent user-provided SQL (table function/engine args, named collections, dynamic disks, and BACKUP TO S3) from steering AWS credential resolution toward server-managed credentials (env/IMDS/IRSA or operator-configured <s3> keys), and adds ARN redaction in S3-related error messages.

Changes:

  • Add parse-time validation to reject user-controllable use_environment_credentials and to require explicit user-supplied key pairs when role_arn is provided via SQL scope.
  • Extend the same restrictions to dynamic disk configurations for all S3-backed disk types and to BACKUP TO S3 (positional args + named collections).
  • Add AWS ARN redaction for S3Exception messages plus unit/integration coverage for both hardening and sanitization.

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/queries/0_stateless/04235_s3_user_credentials_hardening.sh Stateless coverage for rejecting env creds / unauthorized role_arn across table function, named collections, dynamic disks, and backup.
tests/queries/0_stateless/04235_s3_user_credentials_hardening.reference Expected output for the stateless test’s “allowed” cases.
tests/integration/test_s3_credentials_hardening/test.py Integration regression test ensuring user SQL cannot piggy-back on server <s3> credentials for STS AssumeRole.
tests/integration/test_s3_credentials_hardening/configs/config.d/admin_s3.xml Test config injecting “admin” <s3> keys to validate the privilege-escalation scenario.
tests/integration/test_s3_credentials_hardening/init.py Initializes the new integration test package.
src/Storages/ObjectStorage/S3/Configuration.h Changes credential-collection API to return structured info about what user supplied.
src/Storages/ObjectStorage/S3/Configuration.cpp Implements the new hardening checks for named collections and table function arguments, including role_arn gating.
src/IO/tests/gtest_s3_uri.cpp Unit tests for AWS ARN sanitization behavior.
src/IO/S3Common.h Declares sanitizer helpers and wires sanitization into S3Exception.
src/IO/S3Common.cpp Implements ARN redaction for S3 error messages and preformatted exception messages.
src/IO/S3/Credentials.cpp Sanitizes STS AssumeRole failure log messages.
src/Disks/getDiskConfigurationFromAST.cpp Rejects env creds and unauthorized role_arn in dynamic S3 disk configs.
src/Backups/registerBackupEngineS3.cpp Applies the same restrictions to BACKUP TO S3 for named collections and extra_credentials(...).

Comment thread src/IO/S3Common.cpp
Comment thread src/IO/S3/Credentials.cpp Outdated
pamarcos added 2 commits May 14, 2026 11:49
When the dynamic `disk()` function is given `type`/`object_storage_type` via
`from_env`/`from_zk` (or augmented by an `include` directive), the literal
`s3`/`s3_*` token never appears in this function. The previous code only
checked for the literal value, so `is_s3_configuration` stayed `false` and
the new `use_environment_credentials` / `role_arn` ACCESS_DENIED checks were
skipped — even though the disk could still resolve to S3 after substitution.

Fail closed for indirect type sources: track when `type` is set via
`from_env`/`from_zk`, and when `include` is present, and apply the S3
credential checks in those cases too.

Also tighten the `role_arn` authorization check: only treat `access_key_id`
and `secret_access_key` as user-supplied keys when they are literals. Indirect
`from_env`/`from_zk` values must not authorize a user-controlled `role_arn`,
since they could resolve to credentials configured by the operator.

The stateless test `04235_s3_user_credentials_hardening` is extended with
coverage for each bypass scenario and tagged `no-fasttest` (the fast-test
build is compiled without S3 support, which made the existing positive cases
fail with `Unknown table function s3`).

Addresses review feedback on ClickHouse#104889
`sanitizeS3PreformattedMessage` previously only redacted ARNs in `text`. The
`format_string_args` vector — which holds stringified argument values — was
left untouched. Those args are propagated into structured logs and telemetry
(`Exception::message_format_string_args`,
`system.query_log.exception_format_string_args`), so an ARN passed as a
format argument to an `S3Exception` would still leak through that channel
even after the human-readable message was redacted.

Sanitize every element of `format_string_args` too. `format_string` stays
untouched: it is a literal template (e.g. `"{}"`) that never contains
user-supplied data, and preserving it keeps anonymized error reporting
(`Exception::callback`, `ErrorCodes` statistics) working for
`S3Exception`. The header docstring is updated to match.

Also fix a typo in the STS log line in
`AwsAuthSTSAssumeRoleCredentialsProvider::Reload`: the code path is
`AssumeRole`, not "AssumeRule". Making the spelling correct keeps logs
searchable.

The `gtest_s3_uri.SanitizePreformattedMessage` test covers both the
args-sanitization and the `format_string` passthrough.

Addresses Copilot review comments on
ClickHouse#104889
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated no new comments.

pamarcos added 2 commits May 18, 2026 19:40
Do not let user-controlled `S3` URLs inherit top-level `<s3>` credentials or the AWS environment provider chain unless a matching endpoint-scoped config exists. Preserve the parsed credential set when `S3ObjectStorage` refreshes settings so a later config update cannot reintroduce top-level credentials.

Force new dynamic `S3` disk configs to serialize `use_environment_credentials` as false when omitted, and reject Glue `aws_role_arn` unless Glue access keys are provided in the same settings.
Comment thread src/Databases/DataLake/DatabaseDataLake.cpp Outdated
`GlueCatalog::GlueCatalog` configured the AWS credential chain with
`use_environment_credentials = true`. Combined with the fact that
`DatabaseDataLake` is only reachable from user SQL (`CREATE DATABASE ENGINE =
DataLakeCatalog(...)` with `catalog_type = 'glue'`), this left a
user-controlled inheritance path open: a user could create a Glue database
without `aws_role_arn`, without explicit `aws_access_key_id` /
`aws_secret_access_key`, and the resulting AWS client would fall back to the
server's environment / IMDS / IRSA / STS credentials.

This is the same pattern the rest of this PR forbids elsewhere (`s3()`, S3
named collections, dynamic disks, `BACKUP TO S3`); the existing Glue check
only blocked the narrower `aws_role_arn` + missing-keys case.

Fail closed at validation time: `DatabaseDataLake::validateSettings` now
requires user-supplied `aws_access_key_id` and `aws_secret_access_key` for
Glue catalog. As defense in depth, also flip
`creds_config.use_environment_credentials` to `false` in `GlueCatalog` so
that even if validation is bypassed in the future, the AWS client cannot
silently inherit server credentials.

The integration test helper `create_clickhouse_glue_database` is updated to
forward the test minio keys as settings (in addition to the positional engine
args, which only authenticate the underlying object storage reads) when the
caller opts into credentials. A new test `test_glue_requires_explicit_keys`
covers the negative case where neither `aws_role_arn` nor explicit keys are
provided.

Addresses review feedback on ClickHouse#104889
Comment thread src/Databases/DataLake/DatabaseDataLake.cpp Outdated
pamarcos added 3 commits May 26, 2026 12:54
Limit the new `DatabaseDataLake::validateSettings` Glue credential checks to fresh metadata. Existing metadata still loads with the previous Glue credential chain so startup does not reject pre-existing `DataLakeCatalog` databases after upgrade.

Addresses review on ClickHouse#104889.
Clear configured S3 auth headers and encryption credentials when preparing a user-controlled S3 request with no matching endpoint config. This prevents top-level `<s3>` secrets from being sent to arbitrary URLs while preserving endpoint-scoped credentials.

The hardening integration mock now detects leaked top-level headers. See ClickHouse#104889.
Use endpoint-scoped server config where tests intentionally exercise environment credentials, and pass explicit user keys where tests exercise `extra_credentials(role_arn = ...)`.

Also update the Glue e2e credential-mode test to assert that fresh Glue catalogs cannot rely on server AWS environment credentials. See ClickHouse#104889.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated no new comments.

@pamarcos
Copy link
Copy Markdown
Member Author

Addressed latest CI failures and pushed fixes.

Commits:

  • 0ab4bcfc388: refresh matched endpoint-scoped S3 credentials instead of preserving stale sanitized credentials for trusted endpoint config.
  • 38da160acc4: add explicit Glue keys to test_sts_credential_refresh_on_expired_token, matching the new aws_role_arn hardening contract.

Verification:

  • cmake --build build -- clickhouse: passed.
  • test_s3_access_headers/test.py::test_custom_access_header[test_access_over_custom_header]: passed.
  • test_s3_access_headers/test.py::test_custom_access_header --count 4: 12 passed.
  • test_storage_s3/test.py::test_custom_auth_headers: passed.
  • test_storage_s3/test.py::test_custom_auth_headers --count 3: 3 passed.
  • test_database_glue/test.py::test_sts_credential_refresh_on_expired_token: passed.
  • test_database_glue/test.py::test_sts_credential_refresh_on_expired_token --count 3: 3 passed.
  • test_s3_credentials_hardening: 11 passed.
  • test_s3_credentials_hardening --count 4: 44 passed.
  • check_cpp.sh: passed.

various_checks.sh still reports unrelated pre-existing workspace issues: missing __init__.py in unrelated integration test directories and permission errors under old _instances* directories.

Comment thread src/Disks/DiskObjectStorage/ObjectStorages/S3/S3ObjectStorage.cpp Outdated
Refreshable `S3` storages should not keep endpoint-scoped credentials after the endpoint config stops matching the URL. Reset credential-bearing fields on the user-controlled refresh path, and keep endpoint settings above the top-level `s3` config when a trusted endpoint still matches.

Add an integration test covering endpoint add and removal through `SYSTEM RELOAD CONFIG`.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 1 comment.

Comment thread tests/queries/0_stateless/04235_s3_user_credentials_hardening.sh Outdated
Comment thread src/Disks/DiskObjectStorage/ObjectStorages/S3/S3ObjectStorage.cpp
pamarcos added 2 commits May 28, 2026 10:45
Decide whether `S3` object storage credentials are refreshable from the source of the credentials, not from merged settings. User and catalog credentials stay static across reloads, while endpoint-scoped credentials remain refreshable and revocable.

Extend the `test_s3_credentials_hardening` reload test to cover tables created while endpoint credentials already match.
The positive cases in `04235_s3_user_credentials_hardening` should prove that allowed forms still execute locally. Make the helper fail on any non-zero `clickhouse-client` result instead of only looking for `ACCESS_DENIED`.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 2 comments.

Comment thread src/Disks/DiskObjectStorage/ObjectStorages/S3/S3ObjectStorage.cpp Outdated
Comment thread src/Disks/DiskObjectStorage/ObjectStorages/S3/S3ObjectStorage.cpp Outdated
pamarcos added 3 commits May 28, 2026 12:52
When a disk-backed `S3` object storage reloads a still-matching endpoint block, start endpoint credentials from an empty credential surface before applying the fresh endpoint settings. This makes removing credential-bearing endpoint fields revoke the previous values instead of preserving them from the current client.
For refreshable user-controlled `S3` storages, do not merge freshly loaded endpoint settings into the previous credential surface. Clear credential-bearing fields first so removing credentials from a still-matching endpoint block takes effect immediately.

Extend the reload regression test to keep the endpoint block present while removing its credentials.
A matching endpoint block without endpoint-specific credentials inherits the top-level `s3` credentials by design. Keep the regression focused on proving that the old endpoint credentials are gone after reload.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated no new comments.

Resetting user-controlled credentials marks `access_key_id` as changed even when the resulting value is empty. `access_header` must still be attached in that state, otherwise endpoint header reloads silently drop the configured header.

Fixes `test_s3_access_headers/test.py::test_custom_access_header[test_access_over_custom_header]` from the CI report:

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104889&sha=5f436725f9418862170b684d0dab1b6c2eb1c030&name_0=PR

PR: ClickHouse#104889
return;
}

s3_settings.resetCredentialsForUserControlledRequest();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This resets credentials even when ignore_user is true, which is the path used for internal BACKUP work (is_internal_backup). Before this change, an internal backup with no explicit S3 keys could use the server-managed top-level <s3> credentials; now, if there is no matching endpoint block, this line clears them and makeS3Client receives empty credentials.

That breaks the non-user-controlled internal backup path while the hardening only needs to fail closed for user SQL. Please preserve the loaded <s3> credentials when ignore_user / is_internal_backup is true, or add an explicit internal-backup credential path, and reset only for user-controlled backups.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented May 28, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.40% 84.40% +0.00%
Functions 91.40% 91.40% +0.00%
Branches 77.00% 76.90% -0.10%

Changed lines: Changed C/C++ lines covered by tests: 372/400 (93.00%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 10 line(s) · Uncovered code

Full report · Diff report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-critical-bugfix pr-must-backport Pull request should be backported intentionally. Use this label with great care!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants