-
Notifications
You must be signed in to change notification settings - Fork 8k
Description
Company or project name
ClickHouse
Use case
Let's suppose we have many ClickHouse instances of different customers, and all of them have some small tables of the same structure - for example, system tables. We don't want to store these tables on every customer cluster. Instead we want to use a single remote table on a shared cluster. This will save cost on storage due to better compression, lower resource usage on customer clusters, and combining with asynchronous inserts, lower the number of requests to S3.
Describe the solution you'd like
Create a new table engine MultiTenantRemote. It will accept the cluster name, remote database, and table name, like a Distributed table. It will use two parameters from the server configuration: customer_id and customer_secret. On INSERT, it will add two columns to the block: customer_id LowCardinality(String), customer_secret LowCardinality(String) and perform an INSERT into the remote table. On SELECT, it will query the remote server and send the customer_id and customer_secret as query parameters.
Restricting access to the data has to be implemented on the remote server side. It can be done in the following way:
The table has customer_id as the first column of the sorting key for fast queries.
It has a row policy that restricts selects to customer_id = {customer_id:String} AND customer_secret = {customer_secret:String}.
A more sophisticated row policy condition can support time-based key rotation. The customer_secret will contain the cutoff time and HMAC, using the key that is only known to the remote server. The row policy validates the HMAC, extracts the cutoff time, and filters records using this cutoff time. So that the access to the newer data will be restricted to newer keys, which will be periodically rotated.
The table can have a constraint that allows only legitimate inserts. This can be implemented by adding HMAC to the customer_secret, which is checked on the remote server side, using the key that is only known to the remote server.
Creating the tables on the remote server and configuring the remote server are out of scope of the implementation - it has to be done manually or with a script.
The tables cannot be altered.
Describe alternatives you've considered
No response
Additional context
No response