add system.distributed_ddl_queue table #17656
@tavplubix , @alesapin this is my current implementation of the The table structure is as follows:
Also, I rely on ZooKeeper to fetch the list of queries and the entries under the active and finished paths. There is probably a better way to achieve this result, the table can probably include more columns, and the structure might not be the one you had in mind. Do let me know in your reviews.
also the table can include more columns probably
How about adding the following columns?
- query itself (from the query-X znode)
- query_duration_ms (mtime for the parent znode (query-X) and query-X/finished/{node})
- exception_code (finished/{node} znode)
This way it can be used to track errors and to check whether the query was eventually executed (right now, if the timeout is reached, the client does not know the status of the query).
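The column derivations suggested above can be sketched as follows. This is only an illustration in Python, not the PR's implementation: the ZNode class is a hypothetical stand-in for a znode and its mtime, and the sketch assumes the finished/{node} payload starts with a numeric status code on its first line.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ZNode:
    """Hypothetical stand-in for a znode: its payload and its mtime in milliseconds."""
    data: str
    mtime_ms: int


def query_duration_ms(entry: ZNode, finished: Optional[ZNode]) -> Optional[int]:
    # Duration = mtime of query-X/finished/{node} minus mtime of the parent query-X znode.
    if finished is None:
        return None  # this host has not finished the query yet
    return max(0, finished.mtime_ms - entry.mtime_ms)


def exception_code(finished: Optional[ZNode]) -> Optional[int]:
    # Assumption: the first line of the payload is a numeric status code (0 = success).
    if finished is None:
        return None
    first_line = finished.data.split("\n", 1)[0].strip()
    try:
        return int(first_line)
    except ValueError:
        return None  # payload did not match the assumed format


entry = ZNode(data="version: 1", mtime_ms=1_608_026_795_000)
done = ZNode(data="0\n", mtime_ms=1_608_026_795_007)
print(query_duration_ms(entry, done), exception_code(done))
```

A host with no finished/{node} znode yields None for both columns, which is how a query that is still running (or that timed out on the client) would show up.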
Also, since znodes in ZooKeeper are periodically cleaned up, adding system.ddl_worker_queue_log seems useful (it can be done as a separate step, I guess), but of course it should not be enabled by default (since the content is the same for the whole cluster).
And by the way, it is worth mentioning this new table in system-tables.md.
Maybe unfold arrays in table structure? I mean something like:
And I suggest the following columns:
As for Another consideration:
Completely agree with everything and unfolding arrays in particular.
Thank you for the review, @azat and @tavplubix. Let me take some time to read the reviews more closely and make these changes.
@tavplubix and @azat I have implemented the following as per your review:
Additional points:
Row 1:
──────
entry: query-0000000000
host_name: clickhouse01
host_address: 172.23.0.11
port: 9000
status: finished
cluster: test_cluster
values: version: 1
query: CREATE DATABASE test_db UUID '40ac7692-70d3-48a9-bc29-4ade18957f59' ON CLUSTER test_cluster
hosts: ['clickhouse01:9000','clickhouse02:9000','clickhouse03:9000','clickhouse04:9000']
initiator: clickhouse01:9000
query_start_time: 2020-12-15 10:06:35
query_finish_time: 2020-12-15 10:06:35
query_duration_ms: 7
exception_code: ZOK
1 rows in set. Elapsed: 0.037 sec.
Concerns:
Let me know if the implementation looks better now (although it does more now). Thank you!
@tavplubix I pushed a second round of changes since your last review and have updated the PR description to contain the latest output. Let me know if it looks good to you. I hope that I've addressed most of the concerns. One note about exception handling: if a ZooKeeper exception occurs anywhere, the exception code is updated and we leave the column empty instead of throwing that exception. This is so that we don't fail on all results when one of the ZooKeeper queries fails or times out.
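The exception handling described in this comment can be sketched roughly as below. This is a Python illustration rather than the actual code in the PR: ZkError, safe_get, and the in-memory store are hypothetical names, and the /ddl/... paths are made up for the example.

```python
from typing import Callable, Optional, Tuple


class ZkError(Exception):
    """Hypothetical ZooKeeper error carrying a symbolic code such as 'ZNONODE'."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def safe_get(fetch: Callable[[str], str], path: str) -> Tuple[Optional[str], str]:
    """Return (value, code). On failure the value stays empty and the code names the error."""
    try:
        return fetch(path), "ZOK"
    except ZkError as err:
        # Record the error for this row only, so the remaining rows are still returned.
        return None, err.code


# Hypothetical in-memory store standing in for ZooKeeper.
store = {"/ddl/query-0000000000": "version: 1"}


def fetch(path: str) -> str:
    if path not in store:
        raise ZkError("ZNONODE")
    return store[path]


rows = [safe_get(fetch, p) for p in ("/ddl/query-0000000000", "/ddl/query-0000000001")]
print(rows)
```

The second lookup fails, but the first row is unaffected; only the failing row carries a non-ZOK code and an empty value.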
@tavplubix @azat just following up: let me know if you have any other input. I have implemented the changes based on your latest reviews.
@tavplubix is on vacation until Jan 11.
LGTM
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Adds a new table called system.distributed_ddl_queue that displays the queries in the DDL worker queue.
Detailed description / Documentation draft:
This PR adds a new table called system.distributed_ddl_queue that lists all the queries that are currently in the DDL worker queue.
To accomplish this, the ZooKeeper path for distributed_ddl.path (default is /clickhouse/task_queue/ddl/) is polled for all the queries, and for each query the subpaths /active and /finished are queried to get the list of nodes that are present under the active and finished ZooKeeper paths. The data is then populated into the table as follows:
Querying the system.distributed_ddl_queue table from one of the shards:
relates to #17082
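The polling described in the detailed description can be sketched as below. This is a minimal Python illustration under stated assumptions, not the C++ code from this PR: the dictionary tree is a hypothetical in-memory stand-in for ZooKeeper, and list_children and build_rows are illustrative names. It emits one row per (entry, host) pair, in line with the suggestion to unfold arrays.

```python
from typing import Dict, List

# Default queue path mentioned in the description (trailing slash dropped for joining).
QUEUE_PATH = "/clickhouse/task_queue/ddl"


def list_children(tree: Dict[str, List[str]], path: str) -> List[str]:
    # Hypothetical stand-in for a ZooKeeper "get children" call; a missing path has no children.
    return tree.get(path, [])


def build_rows(tree: Dict[str, List[str]]) -> List[Dict[str, str]]:
    rows = []
    for entry in sorted(list_children(tree, QUEUE_PATH)):
        base = f"{QUEUE_PATH}/{entry}"
        # Each host listed under /active or /finished becomes its own row.
        for status in ("active", "finished"):
            for host in list_children(tree, f"{base}/{status}"):
                rows.append({"entry": entry, "host": host, "status": status})
    return rows


tree = {
    QUEUE_PATH: ["query-0000000000", "query-0000000001"],
    f"{QUEUE_PATH}/query-0000000000/finished": ["clickhouse01:9000", "clickhouse02:9000"],
    f"{QUEUE_PATH}/query-0000000001/active": ["clickhouse01:9000"],
}

rows = build_rows(tree)
for row in rows:
    print(row)
```

Hosts that appear under neither subpath produce no row in this sketch; the real table may choose to report them differently.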