Add CFR and WTF diagnostics programs #88

Draft
wants to merge 8 commits into
base: main

@amotl amotl commented Dec 1, 2023

About

Many useful queries exist that draw on information from CrateDB's internal sys.* tables. This subsystem collects and bundles them, to provide easy access from CLI and HTTP interfaces, and to make them usable as building blocks for other software components.

Status

It is still a work in progress, and needs more attention and love. What can be done right now is outlined within the preview section. Any kind of help to expand this is much appreciated.

Documentation

Rendered in preview mode.

Both documents include directives that outline how operations work using Docker. However, the program can also be installed natively, even though it is not on PyPI yet. Please use one of the commands below to acquire the software. When using Docker, make sure to use the cratedb-toolkit:pr-88 image on GHCR for preview purposes.

# Native
pip install --upgrade 'cratedb-wtf @ git+https://github.com/crate-workbench/cratedb-toolkit@cratedb-wtf'
# OCI
docker pull ghcr.io/crate-workbench/cratedb-toolkit:pr-88

Help

ctk cfr --help
ctk wtf --help

Backlog

  • Export and import CrateDB system tables conveniently.
  • Report table sizes, like the Admin UI does.
  • Report shard imbalances.
  • Possibly tap into profiling, using JFR, profefe, and/or Grafana Pyroscope.
  • Handovers.

codecov bot commented Dec 1, 2023

Codecov Report

Attention: 151 lines in your changes are missing coverage. Please review.

Comparison is base (0f39ebd) 84.93% compared to head (2b2e8ff) 81.48%.

Files                                     Patch %   Lines
cratedb_toolkit/wtf/query_collector.py    42.74%    71 Missing ⚠️
cratedb_toolkit/wtf/http.py               0.00%     24 Missing ⚠️
cratedb_toolkit/util/platform.py          42.85%    20 Missing ⚠️
cratedb_toolkit/wtf/recorder.py           46.87%    17 Missing ⚠️
cratedb_toolkit/wtf/cli.py                88.17%    11 Missing ⚠️
cratedb_toolkit/util/service.py           0.00%     4 Missing ⚠️
cratedb_toolkit/wtf/model.py              95.45%    3 Missing ⚠️
cratedb_toolkit/util/data.py              85.71%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #88      +/-   ##
==========================================
- Coverage   84.93%   81.48%   -3.45%     
==========================================
  Files          48       58      +10     
  Lines        1805     2285     +480     
==========================================
+ Hits         1533     1862     +329     
- Misses        272      423     +151     
Flag        Coverage Δ
influxdb    41.83% <39.30%> (-0.83%) ⬇️
main        67.30% <68.93%> (+0.32%) ⬆️
mongodb     53.39% <39.30%> (-3.90%) ⬇️

@amotl amotl changed the title cratedb-wtf: Add cratedb-wtf diagnostics program Add cratedb-wtf diagnostics program Dec 1, 2023
Comment on lines 46 to 62
class Logs:
    # TODO: Implement `tail` in one way or another. -- https://stackoverflow.com/q/4714975
    # SELECT * FROM sys.jobs_log OFFSET -10;
    # SELECT * FROM sys.jobs_log OFFSET (SELECT count(*) FROM sys.jobs_log)-10;
    # https://cratedb.com/docs/crate/reference/en/latest/general/builtins/scalar-functions.html#to-char-expression-format-string
    # https://cratedb.com/docs/crate/reference/en/latest/general/builtins/scalar-functions.html#date-format-format-string-timezone-timestamp
    user_queries = """
        SELECT
            DATE_FORMAT('%Y-%m-%dT%H:%i:%s.%f', started) AS started,
            DATE_FORMAT('%Y-%m-%dT%H:%i:%s.%f', ended) AS ended,
            classification, stmt, username, node
        FROM
            sys.jobs_log
        WHERE
            stmt NOT LIKE '%sys.%' AND
            stmt NOT LIKE '%information_schema.%'
    """
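
A side note on the WHERE clause above: the two NOT LIKE predicates behave like plain, case-sensitive substring checks, so they may also drop user statements that merely mention `sys.` somewhere, for example inside a string literal. A minimal sketch to illustrate this, where `is_user_statement` and the sample statements are hypothetical and not part of the patch:

```python
def is_user_statement(stmt: str) -> bool:
    """Mimic the two NOT LIKE predicates as case-sensitive substring checks."""
    return "sys." not in stmt and "information_schema." not in stmt


# Hypothetical statements, for illustration only.
examples = [
    "SELECT * FROM doc.sensors",
    "SELECT * FROM sys.jobs_log",
    "SELECT * FROM information_schema.tables",
    "SELECT * FROM doc.metrics WHERE host = 'sys.example.org'",
]

for stmt in examples:
    print(is_user_statement(stmt), stmt)
```

The last statement is a regular user query, yet it is filtered out because its string literal contains `sys.`. Whether that matters probably depends on how precise the "user queries" view needs to be.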

@amotl amotl Dec 1, 2023

Dear @hlcianfagna, @hammerhead, and @WalBeh,

I would like to implement tailing the sys.jobs_log table in one way or another. More precisely, I am after tail --follow. Did you manage to do that yet, using any kind of OFFSET/LIMIT magic?

SELECT * FROM sys.jobs_log OFFSET -10;
SELECT * FROM sys.jobs_log OFFSET (SELECT count(*) FROM sys.jobs_log)-10;

Both statements above failed for me. Is there any way to feed the total record count from the subselect into the OFFSET parameter? Most probably, I am only too silly to make it work, so I am humbly asking for your support.

Cheers,
Andreas.

Contributor

SELECT CURRENT_TIMESTAMP AS last_timestamp,
       (ended / 10000) * 10000 + 5000 AS ended_time,
       COUNT(*) / 10.0 AS qps,
       TRUNC(AVG(ended::bigint - started::bigint), 2) AS duration,
       UPPER(regexp_matches(stmt, '^\s*(\w+).*')[1]) AS query_type
FROM sys.jobs_log
WHERE ended > now() - ('15 minutes')::interval
GROUP BY 1, 2, 5
ORDER BY ended_time ASC

This is the query used in the panel to show the QPS for the last 15 minutes. Perhaps it can serve as inspiration :P
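
To make the bucketing arithmetic of that query easier to follow, here is a small sketch in Python. It assumes that `ended` is an epoch timestamp in milliseconds and that the division in `(ended / 10000)` is an integer division; the per-statement-type grouping of the original query is left out. The function names and sample values are made up for illustration.

```python
from collections import Counter

BUCKET_MS = 10_000  # Window width in milliseconds, as in the query.


def bucket_center(ended_ms: int) -> int:
    """Equivalent of `(ended / 10000) * 10000 + 5000` with integer division."""
    return (ended_ms // BUCKET_MS) * BUCKET_MS + BUCKET_MS // 2


def qps_per_bucket(ended_values):
    """Count records per 10-second window, and divide by the window length in seconds."""
    counts = Counter(bucket_center(value) for value in ended_values)
    return {center: n / (BUCKET_MS / 1000) for center, n in sorted(counts.items())}


# Hypothetical `ended` values in milliseconds.
sample = [1_000, 4_000, 9_999, 10_000, 25_000]
print(qps_per_bucket(sample))
```

The first three values fall into the same 10-second window centered at 5000 ms, which yields 0.3 queries per second for that window. The two remaining values each land in their own window.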

@amotl amotl Dec 1, 2023

Thanks. So this emulates the limiting by filtering on a timestamp, compared against what is stored in the ended column? Hmm. Naturally, I'd favor a more generic solution, but in this case, it could make an acceptable workaround. Sweet.

What's stored in ended when, well, the query has not finished yet? To emulate tail -f, should I rather rely on started instead?

Contributor

The ended column seems to be empty when it has no value yet.

Something like this:

Query ran at 2023-12-02T11:02:47.288Z on CrateDB 5.5.0
SELECT
  DATE_FORMAT(started)
FROM
  "sys"."jobs_log"
ORDER BY
  started DESC
LIMIT
  10
QUERY OK, 10 record(s) returned in 0.0045s
date_format(started)
"2023-12-02T11:02:40.890000Z"
"2023-12-02T11:02:40.825000Z"
"2023-12-02T11:02:40.822000Z"
"2023-12-02T11:02:40.819000Z"
"2023-12-02T11:02:40.819000Z"
"2023-12-02T11:02:40.819000Z"
"2023-12-02T11:02:40.819000Z"
"2023-12-02T11:02:35.892000Z"
"2023-12-02T11:02:35.821000Z"
"2023-12-02T11:02:35.821000Z"

This seems to work just fine as tail -f.

To follow new entries, I would keep track of the last started timestamp observed, and query only for records newer than that.
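
The two suggestions, fetching the most recent records ordered by started and then polling for anything newer than the last started value seen, could be combined roughly like this. It is only a sketch under assumptions: `execute` is a hypothetical callable that runs a statement with parameters and returns rows of `(started, stmt)`, with `started` as epoch milliseconds. The `?` placeholders and the comparison of `started` against an integer are assumptions about the database driver, not something verified in this thread.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

Row = Tuple[int, str]  # (started in epoch milliseconds, stmt)
# Hypothetical callable: takes SQL plus parameters, returns rows.
Execute = Callable[[str, tuple], Iterable[Row]]

TAIL_SQL = "SELECT started, stmt FROM sys.jobs_log ORDER BY started DESC LIMIT ?"
FOLLOW_SQL = "SELECT started, stmt FROM sys.jobs_log WHERE started > ? ORDER BY started ASC"


def tail_follow(execute: Execute, lines: int = 10, polls: int = 1) -> Iterator[Row]:
    """Yield the last `lines` records, then poll `polls` times for newer ones."""
    last_started: Optional[int] = None
    # The result arrives newest-first; reverse it to read chronologically like `tail`.
    for row in reversed(list(execute(TAIL_SQL, (lines,)))):
        last_started = row[0]
        yield row
    for _ in range(polls):
        # Without any record seen so far, start from the epoch.
        for row in execute(FOLLOW_SQL, (last_started or 0,)):
            last_started = row[0]
            yield row
```

A real loop would sleep between polls and run until interrupted. Also note that the strict `started > ?` comparison can miss records which share the exact millisecond of the last one seen; the sample output above contains several identical timestamps, so a tie-breaker such as the job id may be needed.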

@amotl amotl force-pushed the cratedb-wtf branch 2 times, most recently from 0e3ca31 to 002a255 Compare December 4, 2023 04:04
@amotl amotl force-pushed the cratedb-wtf branch 3 times, most recently from 3f53a92 to 2b2e8ff Compare January 2, 2024 10:51

amotl commented Apr 2, 2024

Backlog: Check if the patch includes the relevant details from the article linked below.

https://community.cratedb.com/t/monitoring-an-on-premises-cratedb-cluster-with-prometheus-and-grafana/1236

codecov-commenter commented Apr 16, 2024

Codecov Report

Attention: Patch coverage is 71.31148%, with 210 lines in your changes missing coverage. Please review.

Project coverage is 77.01%. Comparing base (52f3a27) to head (c16ae9d).

Files                                     Patch %   Lines
cratedb_toolkit/wtf/query_collector.py    42.74%    71 Missing ⚠️
cratedb_toolkit/wtf/http.py               0.00%     24 Missing ⚠️
cratedb_toolkit/util/platform.py          42.85%    20 Missing ⚠️
cratedb_toolkit/wtf/recorder.py           46.87%    17 Missing ⚠️
cratedb_toolkit/cfr/systable.py           90.15%    13 Missing ⚠️
cratedb_toolkit/util/cli.py               23.52%    13 Missing ⚠️
cratedb_toolkit/sqlalchemy/patch.py       62.06%    11 Missing ⚠️
cratedb_toolkit/wtf/cli.py                88.17%    11 Missing ⚠️
cratedb_toolkit/cfr/cli.py                82.22%    8 Missing ⚠️
cratedb_toolkit/util/data.py              46.15%    7 Missing ⚠️
... and 4 more
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #88       +/-   ##
===========================================
+ Coverage   62.11%   77.01%   +14.89%     
===========================================
  Files          56       68       +12     
  Lines        1969     2684      +715     
===========================================
+ Hits         1223     2067      +844     
+ Misses        746      617      -129     
Flag        Coverage Δ
influxdb    36.10% <37.56%> (?)
main        64.49% <70.49%> (+2.38%) ⬆️
mongodb     45.93% <36.88%> (?)

@amotl amotl force-pushed the cratedb-wtf branch 6 times, most recently from 2f96848 to f0902b2 Compare April 17, 2024 00:42
@amotl amotl changed the title Add cratedb-wtf diagnostics program Add CFR and WTF diagnostics programs Apr 17, 2024
@amotl amotl force-pushed the cratedb-wtf branch 2 times, most recently from 5f62d1a to c3222b3 Compare April 17, 2024 08:15
@amotl amotl force-pushed the cratedb-wtf branch 2 times, most recently from 4d3b461 to 0b3a47c Compare May 8, 2024 09:40
4 participants