sql: add observability for SQL queries that require all nodes to complete #81922

Open
lunevalex opened this issue May 26, 2022 · 2 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects

Comments

@lunevalex
Collaborator

lunevalex commented May 26, 2022

Is your feature request related to a problem? Please describe.

CockroachDB already has a number of tools to track queries that perform a full table scan and to highlight to users that they should consider optimizing such queries. Recently, in a customer escalation, we observed a pattern in which a number of queries needed every single node in the cluster to be available in order to complete, which can be equally problematic for a customer workload. There should be a way for the customer to identify such queries, as they could be detrimental to application stability. It is not quite clear what a customer should do when they find this pattern, as that is going to be very workload dependent, but observability is a start.

Describe the solution you'd like
The ask is two-fold:

  • Improve the SQL observability tool set so that it can surface queries that require every node in the cluster to complete. This information should be surfaced the same way we surface full table scans today, i.e. in EXPLAIN / EXPLAIN ANALYZE output and through something similar to https://www.cockroachlabs.com/docs/stable/show-full-table-scans.html.
  • Add a telemetry data point that captures when a customer workload experiences this condition, so that CockroachDB R&D starts getting data on how prevalent such a pattern is in the wild (separate issue tracked here)
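
To make the first ask a bit more concrete, here is a rough sketch of the check such a tool might perform. It is not CockroachDB code: `StatementSample` and `statements_requiring_all_nodes` are hypothetical names, and the sketch assumes we already have, per statement fingerprint, the set of node IDs that took part in executing it. Note that "ran on every node" is only a proxy for "needs every node to complete", and on a single-node cluster every statement would be flagged.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class StatementSample:
    """Hypothetical record: a statement fingerprint and the nodes it ran on."""
    fingerprint: str
    node_ids: FrozenSet[int]


def statements_requiring_all_nodes(
    samples: Iterable[StatementSample],
    live_node_ids: Iterable[int],
) -> List[str]:
    """Return fingerprints whose execution involved every live node."""
    live = frozenset(live_node_ids)
    if not live:
        # With no known nodes there is nothing meaningful to flag.
        return []
    # A statement is flagged when the live node set is a subset of its nodes.
    return [s.fingerprint for s in samples if live <= s.node_ids]


samples = [
    StatementSample("SELECT count(*) FROM orders", frozenset({1, 2, 3})),
    StatementSample("SELECT * FROM orders WHERE id = $1", frozenset({2})),
]
flagged = statements_requiring_all_nodes(samples, {1, 2, 3})
print(flagged)
```

With the made-up samples above, only the unfiltered `count(*)` statement is flagged on a three-node cluster. A real implementation would need to decide where the per-statement node list comes from and how to treat nodes that are draining or decommissioned.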

Jira issue: CRDB-16329

@lunevalex lunevalex added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-observability Related to observability of the SQL layer labels May 26, 2022
@lunevalex lunevalex added this to Triage in Cluster Observability via automation May 26, 2022
@lunevalex
Collaborator Author

cc: @mwang1026 @maryliag @kevin-v-ngo

@lunevalex lunevalex added the O-postmortem Originated from a Postmortem action item. label May 26, 2022
@mari-crl mari-crl added sync-me and removed sync-me labels Jun 2, 2022
@maryliag maryliag added this to Triage in SQL Queries via automation Jun 8, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Jun 8, 2022
@kevin-v-ngo

@lunevalex, can you help provide context on this issue? Is there an escalation you can send me?

I agree we should collect telemetry on this scenario, but as for introducing user-facing observability, I'd love to understand what actions this information would lead the user to take.

@kevin-v-ngo kevin-v-ngo moved this from Triage to Backlog in Cluster Observability Jun 8, 2022
@mgartner mgartner moved this from Triage to Backlog in SQL Queries Jun 13, 2022
@maryliag maryliag removed the O-postmortem Originated from a Postmortem action item. label Mar 2, 2023
@maryliag maryliag removed A-sql-observability Related to observability of the SQL layer T-sql-observability labels Nov 3, 2023
@maryliag maryliag removed this from Backlog in Cluster Observability Nov 3, 2023
Projects
Status: Backlog
SQL Queries
Backlog (DO NOT ADD NEW ISSUES)
Development

No branches or pull requests

4 participants