Added support for s3 tables #1617

Open
subkanthi wants to merge 2 commits into antalya-26.1 from s3table_support

Conversation

@subkanthi
Collaborator

@subkanthi subkanthi commented Apr 5, 2026

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Support for S3 Tables (Iceberg REST catalog), closes ClickHouse#95340

Documentation entry for user-facing changes

Support for S3 Tables (Iceberg serverless REST catalog) using SigV4 authentication.
Can be enabled with allow_experimental_database_s3_tables.

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@github-actions

github-actions bot commented Apr 5, 2026

Workflow [PR], commit [dbcf541]

…s only support a single namespace; for every namespace, make a call to getTables. Use ThreadPoolCallbackRunnerLocal to perform the calls in a thread pool.

tryGetTableMetadata: inject https://s3.{region}.amazonaws.com so that requests don't go to bucket.s3.amazonaws.com. Pre-strips the s3://bucket-name/ prefix, leaving just the relative path metadata/00001-....json, so downstream code constructs the correct S3 key. Detects missing/empty credentials and injects the catalog's own IAM credentials from credentials_provider->GetAWSCredentials().
@subkanthi subkanthi marked this pull request as ready for review April 6, 2026 21:10
@subkanthi subkanthi force-pushed the s3table_support branch 3 times, most recently from 301f82e to dbcf541 Compare April 6, 2026 21:44
@subkanthi
Collaborator Author

subkanthi commented Apr 6, 2026

TESTING
Using Athena to create Iceberg tables in S3 Tables:

CREATE TABLE `nyc`.daily_sales (
sale_date date, 
product_category string, 
sales_amount double)
PARTITIONED BY (month(sale_date))
TBLPROPERTIES ('table_type' = 'iceberg')

INSERT INTO daily_sales
VALUES
(DATE '2024-01-15', 'Laptop', 900.00),
(DATE '2024-01-15', 'Monitor', 250.00),
(DATE '2024-01-16', 'Laptop', 1350.00),
(DATE '2024-02-01', 'Monitor', 300.00),
(DATE '2024-02-01', 'Keyboard', 60.00),
(DATE '2024-02-02', 'Mouse', 25.00),
(DATE '2024-02-02', 'Laptop', 1050.00),
(DATE '2024-02-03', 'Laptop', 1200.00),
(DATE '2024-02-03', 'Monitor', 375.00);

SET allow_experimental_database_s3_tables = 1;

CREATE DATABASE my_s3_tables_sales
ENGINE = DataLakeCatalog('https://s3tables.us-east-2.amazonaws.com/iceberg')
SETTINGS
    catalog_type = 's3tables',
    warehouse = 'arn:aws:s3tables:us-east-2:<account_id>:bucket/nyc-taxis',
    aws_access_key_id = '',
    aws_secret_access_key = '',
    region = 'us-east-2';

use my_s3_tables_sales;

USE my_s3_tables_sales

Query id: 6e72f6fd-568f-4dd2-95fa-b1ff6bddd7df

Ok.

0 rows in set. Elapsed: 0.002 sec. 

Ubuntu-2404-noble-amd64-base :) select * from my_s3_tables_sales.`nyc.daily_sales`;

SELECT *
FROM my_s3_tables_sales.`nyc.daily_sales`

Query id: 1cec644c-8927-4b9c-a2c3-b770375f7734

   ┌──sale_date─┬─product_category─┬─sales_amount─┐
1. │ 2024-01-15 │ Laptop           │          900 │
2. │ 2024-01-15 │ Monitor          │          250 │
3. │ 2024-01-16 │ Laptop           │         1350 │
4. │ 2024-02-01 │ Monitor          │          300 │
5. │ 2024-02-01 │ Keyboard         │           60 │
6. │ 2024-02-02 │ Mouse            │           25 │
7. │ 2024-02-02 │ Laptop           │         1050 │
8. │ 2024-02-03 │ Laptop           │         1200 │
9. │ 2024-02-03 │ Monitor          │          375 │
   └────────────┴──────────────────┴──────────────┘



@mkmkme
Collaborator

mkmkme commented Apr 7, 2026

AI audit note: This review comment was generated by AI (gpt-5.3-codex).

Audit update for PR #1617 (Added support for s3 tables / S3 Tables Iceberg REST catalog):

Confirmed defects:

No confirmed defects in reviewed scope.

Coverage summary:

  • Scope reviewed (PR diff only): settings/enums, `DatabaseDataLake` registration and validation, new `AWSV4Signer` + `S3TablesCatalog`, `TableMetadata::hasDataLakeSpecificProperties`, `RestCatalog` virtual hooks, `S3Credentials::isEmpty`, experimental-settings enablement
  • Categories failed: none
  • Categories passed: build/link assumptions (`add_headers_and_sources` for `Databases/DataLake`), SigV4 signer wiring vs AWS SDK `AWSAuthV4Signer` API, guarded catalog factory paths (`USE_AWS_S3`/`USE_SSL` vs TU `USE_AVRO`/`USE_PARQUET`), parallel `getTables` vs existing `RestCatalog` patterns, credential/endpoint/metadata-path overrides
  • Assumptions/limits: no build or integration tests were run; defects would require runtime/catalog-specific failures not evidenced in the diff alone

@arthurpassos
Collaborator

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dbcf541b8d



/// Iceberg REST catalog for Amazon S3 Tables (SigV4, signing name `s3tables`).
/// https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables-integrating-open-source.html
class S3TablesCatalog final : public RestCatalog

P1: Override S3 Tables drop to request purge

S3TablesCatalog inherits RestCatalog::dropTable, which hardcodes ?purgeRequested=False in the DELETE URL; AWS S3 Tables documents that purge=false is rejected with 400 Bad Request, so DROP TABLE against catalog_type='s3tables' will fail consistently. Please override the drop path for this catalog to send a purge-enabled delete request.


{
-   if (settings[DatabaseDataLakeSetting::catalog_type].value == DB::DatabaseDataLakeCatalogType::GLUE)
+   if (settings[DatabaseDataLakeSetting::catalog_type].value == DB::DatabaseDataLakeCatalogType::GLUE
+       || settings[DatabaseDataLakeSetting::catalog_type].value == DB::DatabaseDataLakeCatalogType::S3_TABLES)

P2: Enforce non-empty warehouse for s3tables

validateSettings() now groups S3_TABLES with Glue and skips the warehouse check, but this catalog still relies on warehouse for loadConfig() (/v1/config?warehouse=...) and for deriving config.prefix when the server does not return one; allowing an empty warehouse defers this to runtime failures on namespace/table requests instead of rejecting invalid configuration at CREATE DATABASE time.



LOG_DEBUG(log, "Requesting: {}", url.toString());

return DB::BuilderRWBufferFromHTTP(url)
Collaborator

I see the base class implementation has some sort of retry mechanism, why not have it here as well?

try
{
    return create_buffer(false);
}
catch (const DB::HTTPException & e)
{
    const auto status = e.getHTTPStatus();
    if (update_token_if_expired
        && (status == Poco::Net::HTTPResponse::HTTPStatus::HTTP_UNAUTHORIZED
            || status == Poco::Net::HTTPResponse::HTTPStatus::HTTP_FORBIDDEN))
    {
        return create_buffer(true);
    }
    throw;
}


const auto & context = getContext();

Poco::URI url(endpoint, /* enable_url_encoding */ false);
Collaborator

/// enable_url_encoding=false to allow using tables with encoded sequences in names like 'foo%2Fbar'


}

void signRequestWithAWSV4(
Collaborator

Nit: perhaps name it something like signHeaders, as the output of this method is a list of headers and not a request?


static constexpr bool sign_body = true;
if (!signer.SignRequest(request, region.c_str(), service.c_str(), sign_body))
throw DB::Exception(DB::ErrorCodes::LOGICAL_ERROR, "AWS SigV4 signing failed");
Collaborator

LOGICAL_ERROR will crash ClickHouse; it should be something else.

Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy::Always,
/* urlEscapePath = */ false);

config = loadConfig();
Collaborator

This is kind of dangerous. You've made createReadBuffer virtual, and loadConfig calls it. If S3TablesCatalog::createReadBuffer relies on members that are initialized after loadConfig, it is UB

Collaborator

Looking at the code, there doesn't seem to be an imminent problem, but I would at least add a comment about this?

I don't see a way around this "problem" tbh

Collaborator

return true;
}

DB::HTTPHeaderEntries S3TablesCatalog::getAuthHeaders(bool /* update_token */) const
Collaborator

Why not use this method to implement the signing instead of overriding the createReadBuffer method?


4 participants