Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surface the transaction retry chart more prominently in the DB Console and add documentation and tool tips on the Distributed metric dashboard #65856

Closed
kevin-v-ngo opened this issue May 28, 2021 · 7 comments · Fixed by #66973
Assignees
Labels
A-sql-console-timeseries SQL Observability issues related to console timeseries metrics and views A-sql-observability Related to observability of the SQL layer C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) E-quick-win Likely to be a quick win for someone experienced.

Comments

@kevin-v-ngo
Copy link

kevin-v-ngo commented May 28, 2021

We have a Transaction Restarts chart in the DB console under the distributed dashboard. Currently there is no documentation on this part of the console and we should also provide a way to surface this transaction restart view more prominently. One place to consider is the SQL dashboard where there are related counters for statements and transactions.

Documentation we should also update: https://www.cockroachlabs.com/docs/v21.1/ui-overview.html

Epic CRDB-6570

@kevin-v-ngo kevin-v-ngo added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-observability A-sql-console-timeseries SQL Observability issues related to console timeseries metrics and views labels May 28, 2021
@kevin-v-ngo kevin-v-ngo added this to Triage in Cluster Observability via automation May 28, 2021
@kevin-v-ngo kevin-v-ngo added the A-sql-observability Related to observability of the SQL layer label May 28, 2021
@bdarnell
Copy link
Contributor

We already export a metric txn.restarts as well as a bunch of reason-specifc metrics like txn.restarts.writetooold. These are shown in the KV Transaction Restarts chart on the distributed dashboard in the web console. Are these not what you want?

@kevin-v-ngo
Copy link
Author

Ah thanks for pointing this out Ben! Did not know this was here 😀 It would be good if this was more discoverable. Are there any differences between "KV" transactions and how we refer to SQL transactions on the SQL dashboard page?

@bdarnell
Copy link
Contributor

It would be good if this was more discoverable.

I may be misremembering but I think we used to have a transactions section of the dashboard (which apparently got folded into "distributed"?).

Are there any differences between "KV" transactions and how we refer to SQL transactions on the SQL dashboard page?

There are some, so you won't get the numbers to line up one-to-one. For example, there are some internal KV transactions that weren't initiated by SQL, and when a transaction auto-retries I think one SQL transaction may get counted as two KV transactions (I think it varies depending on exactly why there was a retry). But it's close enough that you can generally ignore the differences. For questions like "am I experiencing retries? and if so, why?" these metrics are what you want.

@exalate-issue-sync exalate-issue-sync bot changed the title Add time series counter for total transaction retries Surface the transaction retry chart more prominently in the DB Console and add documentation and tool tips on the "Distributed" metric dashboard. Jun 1, 2021
@exalate-issue-sync exalate-issue-sync bot changed the title Surface the transaction retry chart more prominently in the DB Console and add documentation and tool tips on the "Distributed" metric dashboard. Surface the transaction retry chart more prominently in the DB Console and add documentation and tool tips on the Distributed metric dashboard Jun 1, 2021
@maryliag maryliag moved this from Triage to Up next in Cluster Observability Jun 7, 2021
@kevin-v-ngo
Copy link
Author

Adding @Annebirzin and @ianjevans for visibility. @Annebirzin, the SQL activity page might be a good place for this when we introduce a metrics tab.

@ianjevans
Copy link
Contributor

Pinging @taroface. I'm having trouble finding any docs release note issues for the new metrics dashboards.

I think we're missing docs for the Distributed, Queues, and Slow Queries dashboards, but I think that's because there never was a release note string that triggered a doc issue. I don't know why that occurred, though.
E.g. https://github.com/cockroachdb/docs/issues?page=2&q=is%3Aissue+slow+queries

@ianjevans
Copy link
Contributor

So after some digging by Jordan, it appears that the missing dashboards have been in the codebase for quite some time, but have been undocumented. They likely were added before our RN text -> doc issue process began, so that's why we don't have doc issues for them. I'll create a separate issue to handle adding the docs for these dashboards.

@maryliag maryliag added the E-quick-win Likely to be a quick win for someone experienced. label Jun 16, 2021
@maryliag maryliag moved this from Up next to Quick Win in Cluster Observability Jun 25, 2021
@matthewtodd matthewtodd self-assigned this Jun 28, 2021
@kevin-v-ngo
Copy link
Author

To update this thread, we are moving the transaction restart chart (broken down by errors) into the SQL dashboard beneath transactions to make this more discoverable. @ianjevans, FYI, we should provide documentation on what to do for each error. I think this many are already captured in this reference documentation: https://www.cockroachlabs.com/docs/stable/transaction-retry-error-reference.html

craig bot pushed a commit that referenced this issue Jun 28, 2021
66973: ui: surface the transaction restarts chart r=matthewtodd a=matthewtodd

Resolves #65856
    
Release note (ui change): The KV transaction restarts chart was moved from the "distributed" metrics to the "sql" metrics page so as to be close to the SQL transactions chart, for more prominent visibility.

66979: opt: add cost penalty for scans with large cardinality r=rytaft a=rytaft

**opt: ensure we prefer a reverse scan to sorting a forward scan**

This commit fixes an issue where in some edge cases the optimizer would
prefer sorting the output of a forward scan over performing a reverse scan
(when there is no need to sort the output of the reverse scan).

Release note (performance improvement): The optimizer now prefers
performing a reverse scan over a forward scan + sort if the reverse
scan eliminates the need for a sort and the plans are otherwise
equivalent. This was the case before in most cases, but some edge
cases with a small number of rows have been fixed.

**opt: add cost penalty for scans with large cardinality**

This commit adds a new cost function, `largeCardinalityRowCountPenalty`,
which calculates a penalty that should be added to the row count of scans.
It is non-zero for expressions with unbounded maximum cardinality or with
maximum cardinality exceeding the row count estimate. Adding a few rows
worth of cost helps prevent surprising plans for very small tables or for
when stats are stale.

Fixes #64570

Release note (performance improvement): When choosing between index
scans that are estimated to have the same number of rows, the optimizer
now prefers indexes for which it has higher certainty about the maximum
number of rows over indexes for which there is more uncertainty in the
estimated row count. This helps to avoid choosing suboptimal plans for
small tables or if the statistics are stale.

Co-authored-by: Matthew Todd <todd@cockroachlabs.com>
Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
@craig craig bot closed this as completed in ef5135d Jun 28, 2021
Cluster Observability automation moved this from Quick Win to Done Jun 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-console-timeseries SQL Observability issues related to console timeseries metrics and views A-sql-observability Related to observability of the SQL layer C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) E-quick-win Likely to be a quick win for someone experienced.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants