Restrict user-controllable S3 credential resolution#104889
Conversation
Reject user-controlled `use_environment_credentials` and `role_arn` paths that could make S3 clients use server environment credentials. Also redact AWS ARNs in S3 error messages.
Do not depend on whether a named collection came from SQL because user overrides of config collections still enter the S3 named collection path. Apply the `use_environment_credentials` and `role_arn` restrictions uniformly.
Replace the `role_arn` output parameter with an explicit result struct so callers can read whether `extra_credentials` was present and whether it supplied `role_arn` without mixing parsing metadata into the argument list.
The previous version dropped the `PreformattedMessage` and only forwarded the sanitized text to `Exception`, so anonymized error reporting (`Exception::callback`, `ErrorCodes` statistics, `message_format_string`) stopped working for `S3Exception`. Keep the `PreformattedMessage` shape and sanitize only the `text` field.
`object_storage_type` also accepts `s3_plain`, `s3_with_keeper`, and `s3_plain_rewritable` (registered in `ObjectStorageFactory`), so the previous exact `value_str == "s3"` check let a user write `disk(object_storage_type=s3_plain, use_environment_credentials=1, ...)` and bypass the hardening. Match `s3` and any `s3_*` variant.
The previous gtest only checked a single ARN in the middle of a message. Add cases for: passthrough with no ARN, empty input, ARN at start and at end of the message, multiple ARNs in one message, and the degenerate `arn:` token.
Replace the previous SQL test with a shell test that scopes every global name (named collections, table names, disk names) to `$CLICKHOUSE_DATABASE`, so several invocations of the test can run in parallel without colliding on shared state. Cover additional vectors from the underlying ClickHouse Cloud finding: dynamic-disk subtypes `s3_plain`, `s3_with_keeper`, `s3_plain_rewritable`; `disk()` with `role_arn` but only one of the keys; nested `disk(cache, disk=disk(s3))`. Also include positive cases that exercise the same code path with `DESCRIBE TABLE` so we never touch the network.
`hasExplicitS3Credentials` looked at the merged `auth_settings` after the global `<s3>` section and any endpoint-scoped config block had already been loaded. An on-prem operator that configures static admin keys in `<s3>` would therefore implicitly authorize any user-supplied `role_arn` for STS `AssumeRole`, signed with the admin keys. Track key provenance instead so the check only counts keys supplied in the same SQL scope as the user- supplied `role_arn`. Applies to both the `s3()` table function and S3 named collections.
…arn` `BACKUP TO S3` builds its own `S3Client` outside of `StorageS3Configuration`, so the hardening checks added there do not apply to backups. Reject `use_environment_credentials = 1` in named-collection backups and reject any user-supplied `role_arn` that is not paired with `access_key_id` and `secret_access_key` in the same SQL scope, for both the named-collection and the positional-args entry points.
Add stateless cases for the variants the new hardening introduces: named collection with `role_arn` and only `access_key_id` (no secret), `s3()` with one positional key empty alongside `extra_credentials(role_arn=...)`, and four `BACKUP TO S3` variants (positional, partial keys, named collection with `role_arn`, named collection with `use_environment_credentials = 1`). Add an integration test under `test_s3_credentials_hardening` that puts static admin keys in the top-level `<s3>` config block and asserts that user SQL cannot reuse those keys via `role_arn`. This scenario cannot be expressed in a stateless test because it depends on server-config state.
|
Workflow [PR], commit [0a30170] Summary: ✅
AI ReviewSummaryThis PR hardens user-visible Findings❌ Blockers
Tests
Final VerdictStatus: ❌ Block Minimum required actions: fix the mixed-provenance endpoint credential revocation path, preserve or explicitly rework internal |
There was a problem hiding this comment.
Pull request overview
This PR hardens ClickHouse’s S3 integration to prevent user-provided SQL (table function/engine args, named collections, dynamic disks, and BACKUP TO S3) from steering AWS credential resolution toward server-managed credentials (env/IMDS/IRSA or operator-configured <s3> keys), and adds ARN redaction in S3-related error messages.
Changes:
- Add parse-time validation to reject user-controllable
use_environment_credentialsand to require explicit user-supplied key pairs whenrole_arnis provided via SQL scope. - Extend the same restrictions to dynamic disk configurations for all S3-backed disk types and to
BACKUP TO S3(positional args + named collections). - Add AWS ARN redaction for
S3Exceptionmessages plus unit/integration coverage for both hardening and sanitization.
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/queries/0_stateless/04235_s3_user_credentials_hardening.sh | Stateless coverage for rejecting env creds / unauthorized role_arn across table function, named collections, dynamic disks, and backup. |
| tests/queries/0_stateless/04235_s3_user_credentials_hardening.reference | Expected output for the stateless test’s “allowed” cases. |
| tests/integration/test_s3_credentials_hardening/test.py | Integration regression test ensuring user SQL cannot piggy-back on server <s3> credentials for STS AssumeRole. |
| tests/integration/test_s3_credentials_hardening/configs/config.d/admin_s3.xml | Test config injecting “admin” <s3> keys to validate the privilege-escalation scenario. |
| tests/integration/test_s3_credentials_hardening/init.py | Initializes the new integration test package. |
| src/Storages/ObjectStorage/S3/Configuration.h | Changes credential-collection API to return structured info about what user supplied. |
| src/Storages/ObjectStorage/S3/Configuration.cpp | Implements the new hardening checks for named collections and table function arguments, including role_arn gating. |
| src/IO/tests/gtest_s3_uri.cpp | Unit tests for AWS ARN sanitization behavior. |
| src/IO/S3Common.h | Declares sanitizer helpers and wires sanitization into S3Exception. |
| src/IO/S3Common.cpp | Implements ARN redaction for S3 error messages and preformatted exception messages. |
| src/IO/S3/Credentials.cpp | Sanitizes STS AssumeRole failure log messages. |
| src/Disks/getDiskConfigurationFromAST.cpp | Rejects env creds and unauthorized role_arn in dynamic S3 disk configs. |
| src/Backups/registerBackupEngineS3.cpp | Applies the same restrictions to BACKUP TO S3 for named collections and extra_credentials(...). |
When the dynamic `disk()` function is given `type`/`object_storage_type` via `from_env`/`from_zk` (or augmented by an `include` directive), the literal `s3`/`s3_*` token never appears in this function. The previous code only checked for the literal value, so `is_s3_configuration` stayed `false` and the new `use_environment_credentials` / `role_arn` ACCESS_DENIED checks were skipped — even though the disk could still resolve to S3 after substitution. Fail closed for indirect type sources: track when `type` is set via `from_env`/`from_zk`, and when `include` is present, and apply the S3 credential checks in those cases too. Also tighten the `role_arn` authorization check: only treat `access_key_id` and `secret_access_key` as user-supplied keys when they are literals. Indirect `from_env`/`from_zk` values must not authorize a user-controlled `role_arn`, since they could resolve to credentials configured by the operator. The stateless test `04235_s3_user_credentials_hardening` is extended with coverage for each bypass scenario and tagged `no-fasttest` (the fast-test build is compiled without S3 support, which made the existing positive cases fail with `Unknown table function s3`). Addresses review feedback on ClickHouse#104889
`sanitizeS3PreformattedMessage` previously only redacted ARNs in `text`. The
`format_string_args` vector — which holds stringified argument values — was
left untouched. Those args are propagated into structured logs and telemetry
(`Exception::message_format_string_args`,
`system.query_log.exception_format_string_args`), so an ARN passed as a
format argument to an `S3Exception` would still leak through that channel
even after the human-readable message was redacted.
Sanitize every element of `format_string_args` too. `format_string` stays
untouched: it is a literal template (e.g. `"{}"`) that never contains
user-supplied data, and preserving it keeps anonymized error reporting
(`Exception::callback`, `ErrorCodes` statistics) working for
`S3Exception`. The header docstring is updated to match.
Also fix a typo in the STS log line in
`AwsAuthSTSAssumeRoleCredentialsProvider::Reload`: the code path is
`AssumeRole`, not "AssumeRule". Making the spelling correct keeps logs
searchable.
The `gtest_s3_uri.SanitizePreformattedMessage` test covers both the
args-sanitization and the `format_string` passthrough.
Addresses Copilot review comments on
ClickHouse#104889
Do not let user-controlled `S3` URLs inherit top-level `<s3>` credentials or the AWS environment provider chain unless a matching endpoint-scoped config exists. Preserve the parsed credential set when `S3ObjectStorage` refreshes settings so a later config update cannot reintroduce top-level credentials. Force new dynamic `S3` disk configs to serialize `use_environment_credentials` as false when omitted, and reject Glue `aws_role_arn` unless Glue access keys are provided in the same settings.
`GlueCatalog::GlueCatalog` configured the AWS credential chain with `use_environment_credentials = true`. Combined with the fact that `DatabaseDataLake` is only reachable from user SQL (`CREATE DATABASE ENGINE = DataLakeCatalog(...)` with `catalog_type = 'glue'`), this left a user-controlled inheritance path open: a user could create a Glue database without `aws_role_arn`, without explicit `aws_access_key_id` / `aws_secret_access_key`, and the resulting AWS client would fall back to the server's environment / IMDS / IRSA / STS credentials. This is the same pattern the rest of this PR forbids elsewhere (`s3()`, S3 named collections, dynamic disks, `BACKUP TO S3`); the existing Glue check only blocked the narrower `aws_role_arn` + missing-keys case. Fail closed at validation time: `DatabaseDataLake::validateSettings` now requires user-supplied `aws_access_key_id` and `aws_secret_access_key` for Glue catalog. As defense in depth, also flip `creds_config.use_environment_credentials` to `false` in `GlueCatalog` so that even if validation is bypassed in the future, the AWS client cannot silently inherit server credentials. The integration test helper `create_clickhouse_glue_database` is updated to forward the test minio keys as settings (in addition to the positional engine args, which only authenticate the underlying object storage reads) when the caller opts into credentials. A new test `test_glue_requires_explicit_keys` covers the negative case where neither `aws_role_arn` nor explicit keys are provided. Addresses review feedback on ClickHouse#104889
Limit the new `DatabaseDataLake::validateSettings` Glue credential checks to fresh metadata. Existing metadata still loads with the previous Glue credential chain so startup does not reject pre-existing `DataLakeCatalog` databases after upgrade. Addresses review on ClickHouse#104889.
Clear configured S3 auth headers and encryption credentials when preparing a user-controlled S3 request with no matching endpoint config. This prevents top-level `<s3>` secrets from being sent to arbitrary URLs while preserving endpoint-scoped credentials. The hardening integration mock now detects leaked top-level headers. See ClickHouse#104889.
Use endpoint-scoped server config where tests intentionally exercise environment credentials, and pass explicit user keys where tests exercise `extra_credentials(role_arn = ...)`. Also update the Glue e2e credential-mode test to assert that fresh Glue catalogs cannot rely on server AWS environment credentials. See ClickHouse#104889.
|
Addressed latest CI failures and pushed fixes. Commits:
Verification:
|
Refreshable `S3` storages should not keep endpoint-scoped credentials after the endpoint config stops matching the URL. Reset credential-bearing fields on the user-controlled refresh path, and keep endpoint settings above the top-level `s3` config when a trusted endpoint still matches. Add an integration test covering endpoint add and removal through `SYSTEM RELOAD CONFIG`.
Decide whether `S3` object storage credentials are refreshable from the source of the credentials, not from merged settings. User and catalog credentials stay static across reloads, while endpoint-scoped credentials remain refreshable and revocable. Extend the `test_s3_credentials_hardening` reload test to cover tables created while endpoint credentials already match.
The positive cases in `04235_s3_user_credentials_hardening` should prove that allowed forms still execute locally. Make the helper fail on any non-zero `clickhouse-client` result instead of only looking for `ACCESS_DENIED`.
When a disk-backed `S3` object storage reloads a still-matching endpoint block, start endpoint credentials from an empty credential surface before applying the fresh endpoint settings. This makes removing credential-bearing endpoint fields revoke the previous values instead of preserving them from the current client.
For refreshable user-controlled `S3` storages, do not merge freshly loaded endpoint settings into the previous credential surface. Clear credential-bearing fields first so removing credentials from a still-matching endpoint block takes effect immediately. Extend the reload regression test to keep the endpoint block present while removing its credentials.
A matching endpoint block without endpoint-specific credentials inherits the top-level `s3` credentials by design. Keep the regression focused on proving that the old endpoint credentials are gone after reload.
Resetting user-controlled credentials marks `access_key_id` as changed even when the resulting value is empty. `access_header` must still be attached in that state, otherwise endpoint header reloads silently drop the configured header. Fixes `test_s3_access_headers/test.py::test_custom_access_header[test_access_over_custom_header]` from the CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104889&sha=5f436725f9418862170b684d0dab1b6c2eb1c030&name_0=PR PR: ClickHouse#104889
| return; | ||
| } | ||
|
|
||
| s3_settings.resetCredentialsForUserControlledRequest(); |
There was a problem hiding this comment.
This resets credentials even when ignore_user is true, which is the path used for internal BACKUP work (is_internal_backup). Before this change, an internal backup with no explicit S3 keys could use the server-managed top-level <s3> credentials; now, if there is no matching endpoint block, this line clears them and makeS3Client receives empty credentials.
That breaks the non-user-controlled internal backup path while the hardening only needs to fail closed for user SQL. Please preserve the loaded <s3> credentials when ignore_user / is_internal_backup is true, or add an explicit internal-backup credential path, and reset only for user-controlled backups.
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 372/400 (93.00%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 10 line(s) · Uncovered code |
Restricts user-controlled S3 credential resolution so SQL-visible S3 entry points cannot inherit server-managed credentials from environment/IMDS/IRSA, top-level
<s3>config, or stale endpoint-scoped config.The change covers
s3(), theS3engine,S3Queuethrough shared configuration, S3 named collections, dynamic S3 disks,BACKUP TO S3, and Glue catalog creation. It also redacts AWS ARNs from S3 exceptions and makes config reload drop revoked endpoint credentials instead of preserving stale clients.Closes https://github.com/ClickHouse/clickhouse-private/issues/54202
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Tightened validation of S3 credentials provided through user SQL to prevent S3 table functions, engines, named collections, dynamic disks, backups, and Glue catalogs from reusing server-managed credentials.