Table engine `QueryRunner`

### Company or project name

ClickHouse

### Use case

Records inserted into a `QueryRunner` table represent queries, that the table engine will run. The table engine can be used for asynchronous query execution, batch execution of generated queries, for directing queries on remote clusters, for benchmarks, fuzzing, and testing with shadow traffic.


### Describe the solution you'd like

```
CREATE TABLE runner
[DEFINER = { user | CURRENT_USER }] [SQL SECURITY { DEFINER | INVOKER | NONE }]
(
    query String,
    database String,
    settings Map(LowCardinality(String), String),
    delay_microseconds UInt64,
)
ENGINE = QueryRunner
SETTINGS
    cluster = 'cluster_name',
    mode = 'asynchronous'/'synchronous',
    threads = 4,
    max_queue_size = 1000;
```

The table must be created with a subset of allowed columns: `query`, `database`, `settings`, `delay_microseconds`. The column `query` is mandatory, and the other columns are optional.

The table allows only INSERTs and does not allow SELECT queries. On INSERT into this table, it will run the queries from the `query` column, using settings from the `settings` column (if it is present) and default database from the `database` column (if it is present), directing them to the specified cluster or running them locally. In the local case, the user under whom the queries will run is determined by the SQL SECURITY:  the queries will be run either from the specified user, or from the user who performed the INSERT, or from the user who created the table. The table allows synchronous and asynchronous mode. In the synchronous mode, the queries will be run at the time of INSERT, and in the asynchronous mode - in the background.

The table could run any type of queries, but in the case of INSERT, only queries with inline data are supported.

The queries are run in the "fire and forget" mode - in case of any exceptions, there are no retries. The success can be checked using the `system.query_log` table. The queries initiated by this engine, should be marked as `internal` in the query_log. The results of SELECT queries are not saved anywhere - the only way to keep results is to use INSERT SELECT.

When the `cluster` is specified, the queries will be sent to the specified cluster for execution entirely (up to the `Complete` stage). The DEFINER and SQL SECURITY only have effect in the local mode (without the specified cluster) and their presence with cluster should trigger an exception. The default is the local mode.

Both synchronous and asynchronous modes run queries in a set of background threads, the difference is only that the synchronous mode waits for completion of all the queries in the corresponding batch of records, inserted into the table. The default is the asynchronous mode.

Queries are run in a number of background threads, controlled by the `threads` setting. The table queues up to `max_queue_size` scheduled queries, and if it is exceeded, logs an error and ignores the overflown queries.

Queries can be run as soon as the background thread is ready to pick the next one (if there is no `delay_microseconds` column) or not earlier than with a delay, determined by the `delay_microseconds` column.

The data in the table (queued queries) is in-memory and ephemeral - it does not survive server restart. On server shutdown, no particular actions are performed for this table, the queued queries will be ignored.

### Describe alternatives you've considered

Queries are run in the "fire and forget" mode, and their results are ignored. This is fine for the most of usage scenarios. The success can be checked in the `system.query_log`. For simplicity, we don't provide any other system tables to observe the status.

The usage of this table might be undesirable for the general audience, as it makes it too easy to overload the cluster. We can easily prevent its usage with table engine grants (restricting creating this table). The table, if created with the default INVOKER SQL SECURITY, does not give any additional access to the system. However, with the DEFINER SQL SECURITY, it enables running all queries on behalf of another user, which requires understanding the consequences.

The engine can support INSERT queries with inline data, CREATE queries, etc., but is mostly meant to run SELECT queries.

The column `delay_microseconds` allows to run queries with a higher or lower rate than they were in the data (e.g., when inserted using a materialized view from the query_log). I've also considered using `event_time_microseconds` and using the time related to the first record, plus the `time_factor` setting - but this is both more complex and less flexible. The only downside is that `delay_microseconds` has to be calculated from the `query_log` using something like `event_time_microseconds - min(event_time_microseconds) OVER ()`.

I've also considered adding the `ordered` mode, which let's run queries in the same order as they were in the inserted data. But it gives too much hassle for small benefits.

I've also considered an option to throw an exception on the first failed query in the synchronous mode, or on the queue overflow. I don't want to complicate the code base.

There could many `QueryRunner` tables created on a single server - they work independently. 

I've considered allowing a `user` column, so we can pick it from the `query_log` to run queries on behalf of the same user. But compared to the already listed options, it sounds too complicated and also too unsafe.

### Additional context

## Usage scenarios

### Batch execution of generated queries

Let's say you want to run a set of programmatically generated queries, but don't want to write an external script. It will be possible (local, synchronous mode).

### Asynchronous query execution

You can write a query and schedule it after certain time, then expect an attempt to run it assuming the server was not restarted.

### Benchmarks

The engine lets you to do something similar to what `clickhouse-benchmark` does, but entirely inside the server.

### Fuzzing

You can set up a materialized view on top of `query_log`, which will select a subset of queries (for example, all previously successful SELECT queries without table functions), transform them with `fuzzQuery` and insert into a `QueryRunner` table. (In the local mode, don't forget to filter out internal queries from the `query_log` to avoid creating a loop)

This is similar to the server-side query fuzzer that we already have, but more high-level and more flexible.

### Shadow traffic

When you have a readonly replica with a different ClickHouse version or a build type, or even a different cluster, you can set up a materialized view on top of `query_log` (it can, as well, filter a subset of queries by type or by used table functions/engines), and direct them to the desired replica. It can even transform queries with `obfuscateQuery`.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Table engine `QueryRunner` #101833

Company or project name

Use case

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Usage scenarios

Batch execution of generated queries

Asynchronous query execution

Benchmarks

Fuzzing

Shadow traffic

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Table engine QueryRunner #101833

Description

Company or project name

Use case

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Usage scenarios

Batch execution of generated queries

Asynchronous query execution

Benchmarks

Fuzzing

Shadow traffic

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Table engine `QueryRunner` #101833