
add system.distributed_ddl_queue table #17656

Merged Jan 11, 2021 (20 commits)

Conversation


@bharatnc bharatnc commented Dec 1, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Adds a new table called system.distributed_ddl_queue that displays the queries in the DDL worker queue.

Detailed description / Documentation draft:

This PR adds a new table, system.distributed_ddl_queue, that lists all the queries currently in the DDL worker queue.

To accomplish this, the ZooKeeper path configured as distributed_ddl.path (default: /clickhouse/task_queue/ddl/) is polled for all queries, and for each query the subpaths /active and /finished are queried to get the list of nodes present under the active and finished ZooKeeper paths. The data is then populated into the table as follows:

Querying the system.distributed_ddl_queue table from one of the shards:

Querying the system.distributed_ddl_queue table from one of the shards:

SELECT *
FROM system.distributed_ddl_queue
WHERE cluster = 'test_cluster'
LIMIT 1
FORMAT Vertical

Query id: a1c53d35-5a84-4e3d-8176-13b1f4791141

Row 1:
──────
entry:             query-0000000000
host_name:         clickhouse01
host_address:      172.23.0.11
port:              9000
status:            Finished
cluster:           test_cluster
query:             CREATE DATABASE test_db UUID '4a82697e-c85e-4e5b-a01e-a36f2a758456' ON CLUSTER test_cluster
initiator:         clickhouse01:9000
query_start_time:  2020-12-30 13:07:51
query_finish_time: 2020-12-30 13:07:51
query_duration_ms: 6
exception_code:    ZOK

1 rows in set. Elapsed: 0.038 sec. 
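The polling described above can be sketched roughly as follows. This is a hypothetical Python illustration, not the actual C++ implementation: the ZooKeeper client is stubbed with an in-memory dict, and the entry names and hosts are illustrative only.

```python
# Hypothetical sketch of gathering the table's data: list entries under
# the DDL queue path, then read the /active and /finished children for
# each entry. The znode tree is stubbed as a dict (path -> child names).
DDL_PATH = "/clickhouse/task_queue/ddl"  # default distributed_ddl.path

znodes = {
    DDL_PATH: ["query-0000000000", "query-0000000001"],
    DDL_PATH + "/query-0000000000/active": [],
    DDL_PATH + "/query-0000000000/finished": ["clickhouse01:9000"],
    DDL_PATH + "/query-0000000001/active": ["clickhouse02:9000"],
    DDL_PATH + "/query-0000000001/finished": [],
}

def get_children(path):
    # stands in for a ZooKeeper getChildren(path) call
    return znodes.get(path, [])

def collect_queue_rows(ddl_path):
    rows = []
    for entry in get_children(ddl_path):
        base = f"{ddl_path}/{entry}"
        rows.append({
            "entry": entry,
            "active": get_children(base + "/active"),
            "finished": get_children(base + "/finished"),
        })
    return rows

rows = collect_queue_rows(DDL_PATH)
```

In the real table, the active/finished child lists are further unfolded into one row per host, as discussed in the review below.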

relates to #17082

@robot-clickhouse robot-clickhouse added doc-alert pr-feature Pull request with new product feature labels Dec 1, 2020
@tavplubix tavplubix self-assigned this Dec 1, 2020
@bharatnc bharatnc marked this pull request as ready for review December 6, 2020 19:17

bharatnc commented Dec 6, 2020

@tavplubix, @alesapin this is my current implementation of the ddl_worker_queue table. It's now ready for review.

The table structure is as follows:

:) describe system.ddl_worker_queue;

DESCRIBE TABLE system.ddl_worker_queue

Query id: 5c760afb-f63c-4ee9-b359-e4ee38c663f9

┌─name─────┬─type──────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ name     │ String        │              │                    │         │                  │                │
│ active   │ Array(String) │              │                    │         │                  │                │
│ finished │ Array(String) │              │                    │         │                  │                │
└──────────┴───────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

3 rows in set. Elapsed: 0.004 sec. 

Also, I rely on ZooKeeper for fetching the list of queries and the entries under the active and finished paths. I'm pretty sure there might be a better way to accomplish this end result; the table could probably also include more columns, or perhaps the structure is not the one you would have visualized. Do let me know in your reviews.

@azat azat (Collaborator) left a comment


also the table can include more columns probably

How about adding the following columns?

  • query itself (from query-X znode)
  • query_duration_ms (mtime for the parent znode (query-X) and query-X/finished/{node})
  • exception_code (finished/{node} znode)

This way it can be used to track errors and whether a query was eventually executed (since right now, if the timeout is reached, you will not know the status of the query in the client).

Also, since znodes in ZooKeeper are periodically cleaned up, adding system.ddl_worker_queue_log looks useful (but can be done as a separate step, I guess). It should not be enabled by default, of course, since the content is the same for the whole cluster.

And by the way, it's worth mentioning this new table in system-tables.md.
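The duration and exception-code derivation suggested here could work roughly as sketched below. This is a hedged illustration with stubbed values, not the actual implementation; a ZooKeeper znode's stat carries an mtime in milliseconds since epoch, and the finished/{host} znode data is assumed here to hold the exception code as text.

```python
# Sketch of deriving query_duration_ms from znode mtimes and
# exception_code from the finished/{host} znode data. The concrete
# mtime values and payloads below are illustrative only.

def query_duration_ms(entry_mtime_ms, finished_mtime_ms):
    # mtime of the parent query-X znode marks entry creation;
    # mtime of query-X/finished/{host} marks when that host finished.
    return finished_mtime_ms - entry_mtime_ms

def exception_code(finished_node_data):
    # assume the finished/{host} znode stores the code as text; 0 is OK
    return int(finished_node_data)

duration = query_duration_ms(1609333671000, 1609333671006)
code = exception_code(b"0")
```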

@tavplubix (Member)

Maybe unfold arrays in table structure? I mean something like:

┌─entry────────────┬─host───────┬─status──────┬─some_column─┐
│ query-0000000042 │ node1:9000 │ active      │ ...         │
│ query-0000000042 │ node2:9000 │ finished    │ ...         │
│ query-0000000042 │ node3:9000 │ unknown     │ ...         │
│ ...              │ ...        │ ...         │ ...         │
└──────────────────┴────────────┴─────────────┴─────────────┘

And I suggest the following columns:

  • entry String (or entry_name) instead of name
  • cluster String - cluster name, we can extract it from query, but it's easier to use where cluster='smth' in select
  • host String (or maybe two separate columns with host and port) - name of the host which should execute the query. We can obtain it from the hosts list in the DDL entry (query-xxxxxxxxxx node data). However, the hosts list in the DDL entry contains IP addresses rather than hostnames, so I'm not sure it will be convenient to use. We could also try to resolve them back to hostnames (see DDLWorker::parseQueryAndResolveHost(...)), but that's too complicated. Anyway, DDLQueryStatusInputStream relies on this hosts list when waiting for the query to be executed on all hosts, so we should take it into account and show all involved hosts in system.ddl_worker_queue, even if some of them have not started executing the query yet (and have not created the corresponding active node yet).
  • status Enum('unknown' = 1, 'active' = 2, 'finished' = 3, 'error' = 4, ...) - status of query execution on each host; unknown is for hosts which have created neither an active nor a finished node in zk, error is for finished nodes with a non-zero exception code.
  • query, query_duration_ms and exception_code will be useful too
  • Maybe some additional columns with name of replica which executes query and tries_to_execute count (if query should be executed on leader, see DDLWorker::tryExecuteQueryOnLeaderReplica(...))

As for system.ddl_worker_queue_log: all such queries are now logged into system.query_log with a special comment, so it's possible to get query execution status with select ... from clusterAllReplicas('cluster', 'system.query_log') where query like '/* ddl_entry=query-xxxxxxxxxx */%'. But that's a slightly different thing, and I agree that system.ddl_worker_queue_log may also be useful in some cases.

Another consideration: DDLWorker is a name of class in code, so name ddl_worker_queue is clear for developers, but not for users. Maybe distributed_ddl_queue or on_cluster_ddl_queue is more suitable?

@azat, @alesapin, do you have any comments?
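The per-host status resolution proposed above could be sketched like this. It's a hypothetical Python illustration of the suggested Enum semantics, with made-up host names and exception codes; the actual implementation is C++.

```python
# Sketch of the suggested status logic: every host in the DDL entry's
# hosts list gets a row, with status derived from the active/finished
# znodes. finished_codes maps host -> exception code from its
# finished/{host} znode; active_hosts holds hosts with an active node.

def host_status(host, active_hosts, finished_codes):
    if host in finished_codes:
        return "error" if finished_codes[host] != 0 else "finished"
    if host in active_hosts:
        return "active"
    # the host has created neither an active nor a finished node yet
    return "unknown"

hosts = ["node1:9000", "node2:9000", "node3:9000", "node4:9000"]
active = {"node1:9000"}
finished = {"node2:9000": 0, "node3:9000": 999}  # host -> exception code

statuses = {h: host_status(h, active, finished) for h in hosts}
```

Note how node4 gets "unknown" even though it has created no znode at all, matching the requirement to show all involved hosts from the DDL entry.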


azat commented Dec 7, 2020

host String (or maybe two separate columns with host and port) - name of host, which should execute the query

system.clusters has host_address/host_name and port, so better to use the same naming.

Maybe unfold arrays in table structure? I mean something like:

Completely agree with everything and unfolding arrays in particular.


bharatnc commented Dec 7, 2020

Thank you for the review, @azat and @tavplubix. Let me take some time to read the reviews more closely and make these changes.


bharatnc commented Dec 15, 2020

@tavplubix and @azat

I have implemented the following as per your review:

  • entry String (or entry_name) instead of name
  • cluster String - cluster name, we can extract it from query, but it's easier to use where cluster='smth' in select
  • host String (or maybe two separate columns with host and port) - name of host, which should execute the query .....
  • status Enum('unknown' = 1, 'active' = 2, 'finished' = 3, 'error' = 4, ...) - status of query execution on each host, .....
  • query, query_duration_ms and exception_code will be useful too
  • Maybe some additional columns with name of replica which executes query and tries_to_execute count (if query should be executed on leader, see DDLWorker::tryExecuteQueryOnLeaderReplica(...))
  • replaced getChildren() with tryGetChildren()
  • used the table name system.distributed_ddl_queue - let me know if you want me to change the file names to match as well, but I think DDLWorkerQueue sounds clearer for developers, as pointed out.

Additional points:

  • Current query output looks like the following:
Row 1:
──────
entry:             query-0000000000
host_name:         clickhouse01
host_address:      172.23.0.11
port:              9000
status:            finished
cluster:           test_cluster
values:            version: 1
query: CREATE DATABASE test_db UUID '40ac7692-70d3-48a9-bc29-4ade18957f59' ON CLUSTER test_cluster
hosts: ['clickhouse01:9000','clickhouse02:9000','clickhouse03:9000','clickhouse04:9000']
initiator: clickhouse01:9000
query_start_time:  2020-12-15 10:06:35
query_finish_time: 2020-12-15 10:06:35
query_duration_ms: 7
exception_code:    ZOK

1 rows in set. Elapsed: 0.037 sec. 
  • I've written the checks in such a way that if any of the columns is unavailable, due to an exception for example, the columns are populated on a best-effort basis.

Concerns:

  • I'm worried about the complexity of the code I've written. The current flow looks like:
    • I'm now using WHERE cluster='cluster' in the query.
    • For each cluster, I'm querying all replicas.
    • For all the queries, I'm populating all the relevant fields using the replicas, and also iterating through the list of queries to determine the status of each query.

Let me know if the implementation looks better now (it's doing more now, though). Thank you!

@bharatnc bharatnc changed the title add system.ddl_worker_queue table add system.distributed_ddl_queue table Dec 16, 2020
@bharatnc

@tavplubix I pushed a second round of changes since your last review and have updated the PR description to contain the latest output. Let me know if it looks good to you. I hope I've addressed most of the concerns.

One note about exception handling: if a zk exception occurs anywhere, the exception code is updated and we leave the column empty instead of throwing the exception. This is so that we don't fail all results when one of the zk queries fails or times out.
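That best-effort behavior can be illustrated with a small sketch. ZkError, the fetchers, and the -4 code are stand-ins for illustration, not the real ZooKeeper API or the PR's C++ code.

```python
# Sketch of best-effort column population: each ZooKeeper read is
# wrapped so a failure records the exception code and leaves the
# column at its default instead of aborting the whole result set.

class ZkError(Exception):
    def __init__(self, code):
        self.code = code

def best_effort(fetch, default=""):
    """Return (value, exception_code); never raise."""
    try:
        return fetch(), 0  # 0 stands for ZOK here
    except ZkError as e:
        return default, e.code

def failing_fetch():
    raise ZkError(-4)  # stand-in for a connection-loss style code

value, code = best_effort(lambda: "query-0000000000")
missing, err = best_effort(failing_fetch)
```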


bharatnc commented Jan 7, 2021

@tavplubix @azat just following up - let me know if you have any other inputs. I have implemented the changes based on your latest reviews.

@alexey-milovidov

@tavplubix is on vacation until Jan 11.

@azat azat (Collaborator) left a comment

LGTM

Review comments on src/Storages/System/StorageSystemDDLWorkerQueue.cpp and src/Interpreters/DDLWorker.h (outdated, resolved).