Accessing tables as files in HTTP interface #46925
Comments
Maybe we don't need to have a separate feature for that?
Yes, let's check if it's already possible.
Looks like it would not work for file extensions.
If we could add |
My idea was to create an overlay S3 interface for DeltaLake/Hudi/Iceberg data already on S3. Right now ClickHouse provides read-only support for those. If full support for cooperatively transactional data on S3 is planned in the future, I think it is worth considering providing an S3-compatible interface on top, in addition to the standard ClickHouse HTTP interface, for the following reasons:
I haven't thought about an S3-compatible interface for an ordinary ClickHouse deployment, but maybe this is a good idea too. I suggested this functionality only for data that is already on S3 in DeltaLake/Hudi/Iceberg format. There are numerous technical challenges here. How to support range queries? How to support ETags/hashing? However, I think most of them are solvable somehow, and if not, then 90% compatibility is still better in my view than having the above problems, such as the inability to support derived data and streaming data, silent corruption, etc. As a prototype/demo, it would be nice to have Snowflake accessing a materialized view in ClickHouse, thinking it is an Iceberg table on S3.
Allow adding an HTTP handler that represents a database, or all databases, under a specified URL prefix:
prefix/table.tsv
prefix/database/table.json
It contains the database name, the table name, optional data format as a file extension, and optional additional parameters.
The request is processed as SELECT * FROM database.table FORMAT TSV.
The additional parameters are the same as what's available in the HTTP interface, including settings like offset and limit, but we should add more parameters for query construction:
columns - provide a subset of columns to return;
where - filter the data by the corresponding expression;
select - write a select query explicitly, omitting the FROM section.
The authentication and access control work as usual.
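To make the proposed mapping concrete, here is a minimal Python sketch of how such a handler might turn a URL into a query. It is an illustration only, not ClickHouse code: the function name build_query, the /prefix/ default, the fallback database name, the TSV default for a missing extension, and the extension table are all assumptions. It also reads the select parameter as a replacement for the expression list, which is one possible interpretation of the proposal.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical extension-to-format table; the values are existing ClickHouse format names.
FORMATS = {"tsv": "TSV", "csv": "CSV", "json": "JSON"}
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_ident(name):
    """Backtick-quote a simple identifier, rejecting anything unusual."""
    if not IDENT.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def build_query(url, prefix="/prefix/", default_database="default"):
    """Return (query, remaining_params) for prefix/table.ext or prefix/database/table.ext."""
    parts = urlsplit(url)
    if not parts.path.startswith(prefix):
        raise ValueError("URL is outside the handler prefix")
    segments = parts.path[len(prefix):].split("/")
    if len(segments) == 1:
        database, filename = default_database, segments[0]
    elif len(segments) == 2:
        database, filename = segments
    else:
        raise ValueError("expected prefix/table or prefix/database/table")

    table, dot, ext = filename.rpartition(".")
    if not dot:
        # Assumption: fall back to TSV when no extension is given.
        table, ext = filename, "tsv"
    fmt = FORMATS[ext.lower()]  # raises KeyError for an unknown extension

    params = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    select_list = params.pop("select", None)
    columns = params.pop("columns", None)
    if select_list is None:
        # Without an explicit select, use the columns subset or everything.
        select_list = (
            ", ".join(quote_ident(c) for c in columns.split(",")) if columns else "*"
        )

    query = f"SELECT {select_list} FROM {quote_ident(database)}.{quote_ident(table)}"
    where = params.pop("where", None)
    if where:
        # The expression is passed through as written; access control is
        # assumed to be enforced by the server, as the proposal states.
        query += f" WHERE {where}"
    query += f" FORMAT {fmt}"
    # Whatever is left (limit, offset, other settings) is returned untouched.
    return query, params
```

For example, build_query("/prefix/database/table.json?columns=a,b&limit=10") yields a query selecting a and b in JSON format and leaves limit in the returned parameters, to be applied the same way the existing HTTP interface applies it.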
Additional notes
It composes nicely with #38775.
Content-length, range requests, and ETag are either not implemented or can be implemented by caching the resulting content.
You might want a GraphQL interface, but it's out of scope.
We can also represent it under an S3-compatible interface, although it is much harder to do.
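The note about Content-Length, range requests, and ETag can be illustrated with a small Python sketch that assumes the full query result is already cached as bytes. The function name respond_from_cache and the use of a SHA-256 digest as the ETag are assumptions made for this example, and only the simple bytes=start-end form of the Range header is handled.

```python
import hashlib
import re

# Only "bytes=start-" and "bytes=start-end" are recognized in this sketch.
RANGE = re.compile(r"^bytes=(\d+)-(\d*)$")


def respond_from_cache(body, request_headers):
    """Return (status, headers, payload) for a cached result body."""
    # Assumption: the ETag is a content hash of the cached result.
    etag = '"' + hashlib.sha256(body).hexdigest() + '"'
    headers = {"ETag": etag, "Accept-Ranges": "bytes"}

    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""

    match = RANGE.match(request_headers.get("Range", ""))
    if match:
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else len(body) - 1
        end = min(end, len(body) - 1)
        if start > end:
            # The requested range starts past the end of the content.
            headers["Content-Range"] = f"bytes */{len(body)}"
            return 416, headers, b""
        chunk = body[start:end + 1]
        headers["Content-Range"] = f"bytes {start}-{end}/{len(body)}"
        headers["Content-Length"] = str(len(chunk))
        return 206, headers, chunk

    headers["Content-Length"] = str(len(body))
    return 200, headers, body
```

The trade-off is that the whole result has to be materialized before the first byte is served, which is what makes the length and hash known up front.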
Motivation
Exploration of the possibilities. Proposed by @camuel