sql: create new system observability tables and update job #104714

koorosh · 2023-06-12T08:51:51Z

This commit combines two separate work streams, brought together
so that the logic test fallout can be resolved in one pass.

The first, authored by @koorosh, is the creation of
system.transaction_exec_insights and system.statement_exec_insights.

The second, authored by @zachlite in #111365, is the creation of
system.mvcc_statistics and the MVCCStatisticsUpdate job.

Regarding persisted insights:
Previously, this data was kept in memory only, and only a limited
number of the latest insights was tracked. These tables will be used
to persist this data periodically.

The tables store the same information as the in-memory insights,
without aggregation.

To control the amount of data stored in the tables, a follow-up PR
will add a GC job that prunes old records.

To keep the tables flexible when some columns become obsolete,
most of the columns are defined as nullable.

Regarding persisted MVCC statistics:
The system.mvcc_statistics table stores historical MVCC data
for a tenant's SQL objects. Its purpose is to serve MVCC data for a
SQL object quickly: the span stats API is too slow to use in a hot path.
Storing data over time unlocks new use cases, such as showing a table's or
index's accumulated garbage over time.

The MVCCStatisticsUpdate job is responsible for managing the contents of
the table, decoupled from the read hot path.

Both the table and the job are baked in when a cluster bootstraps itself or upgrades
itself from a previous version.
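
The decoupling described above can be pictured with a small Go sketch. It is only an illustration under stated assumptions: `MVCCStats`, `StatsCache`, `Refresh`, and `Get` are hypothetical names, the in-memory map stands in for the system.mvcc_statistics table, and `slowFetch` stands in for the span stats API. None of this is the code from this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// MVCCStats is a hypothetical, simplified stand-in for one row of
// system.mvcc_statistics; the real table has a different schema.
type MVCCStats struct {
	LiveBytes    int64
	GarbageBytes int64
}

// String renders the stats for logging.
func (s MVCCStats) String() string {
	return fmt.Sprintf("live=%d garbage=%d", s.LiveBytes, s.GarbageBytes)
}

// StatsCache holds the most recently stored stats per object ID.
// Readers never call the slow source directly.
type StatsCache struct {
	mu    sync.RWMutex
	stats map[int64]MVCCStats
}

// NewStatsCache returns an empty cache.
func NewStatsCache() *StatsCache {
	return &StatsCache{stats: make(map[int64]MVCCStats)}
}

// Refresh is what a periodic job would call: it invokes the slow fetch
// function once per object and then swaps in the new snapshot.
func (c *StatsCache) Refresh(ids []int64, slowFetch func(id int64) MVCCStats) {
	fresh := make(map[int64]MVCCStats, len(ids))
	for _, id := range ids {
		fresh[id] = slowFetch(id)
	}
	c.mu.Lock()
	c.stats = fresh
	c.mu.Unlock()
}

// Get is the hot-path read: a map lookup under a read lock.
func (c *StatsCache) Get(id int64) (MVCCStats, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	s, ok := c.stats[id]
	return s, ok
}

func main() {
	cache := NewStatsCache()
	// slow is a placeholder for an expensive stats computation.
	slow := func(id int64) MVCCStats {
		return MVCCStats{LiveBytes: id * 100, GarbageBytes: id * 10}
	}
	cache.Refresh([]int64{1, 2}, slow)
	if s, ok := cache.Get(2); ok {
		fmt.Println(s)
	}
}
```

The point of the shape is that the cost of the slow source is paid only inside `Refresh`, so a reader's latency does not depend on it.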

This PR supersedes #111365 with the following changes:

- Descriptor fixes to the mvcc_statistics table. No logical changes,
  just housekeeping to make sure that the create table schema and descriptors
  produce the same table.
- Fixes to the job to make sure the job system can wind down.

Partially resolves: #104582
Epic: CRDB-25491
Release note: None

blathers-crl · 2023-06-12T08:51:56Z

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

_{🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.}

cockroach-teamcity · 2023-06-12T08:52:01Z

This change is

j82w

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @gtr, @koorosh, @maryliag, @THardy98, @xinhaoz, and @zachlite)

pkg/sql/catalog/systemschema/system.go line 997 at r1 (raw file):

	InsightsTableSchema = `
	CREATE TABLE system.insights (

We need 2 tables to match the current insights: 1 for transactions and 1 for statements. I think it will be too much data for 1 table, and it will make the queries painful because the data available for txn vs stmt is too different.

pkg/sql/catalog/systemschema/system.go line 998 at r1 (raw file):

	InsightsTableSchema = `
	CREATE TABLE system.insights (
		id UUID DEFAULT gen_random_uuid(),

How about using the fingerprint + start time for the primary key? Using a random GUID isn't useful and will require extra lookup queries just to find the id.

pkg/sql/catalog/systemschema/system.go line 999 at r1 (raw file):

	CREATE TABLE system.insights (
		id UUID DEFAULT gen_random_uuid(),
		insight JSONB NOT NULL,

Why not model the tables off the current insights virtual tables, and have a JSON column for additional context to make it easy to expand in the future? The problem with a JSON blob is indexes. For example, on statement_statistics a bunch of virtual columns had to be created to add the necessary indexes.

cockroach/pkg/sql/catalog/systemschema/system.go

Line 544 in f14710f

indexes_usage JSONB AS (` + indexUsageComputeExpr + `) VIRTUAL,

koorosh

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @gtr, @j82w, @maryliag, @THardy98, @xinhaoz, and @zachlite)

pkg/sql/catalog/systemschema/system.go line 997 at r1 (raw file):

Previously, j82w (Jake) wrote…

We need 2 tables to match the current insights: 1 for transactions and 1 for statements. I think it will be too much data for 1 table, and it will make the queries painful because the data available for txn vs stmt is too different.

Keeping insights for transactions and statements in separate tables loses the context of which statements were executed within a particular transaction. To retain this info, we would need to join these tables, and that also affects performance.
I'd prefer to reflect the table structure based on the Insight proto message (pkg/sql/sqlstats/insights/insights.proto) rather than based on current usage on the UI.

I agree that storing everything in JSON format looks odd and doesn't look right, but I feel it might be okay for read-only information that is periodically removed, and it provides flexibility for possible changes.

pkg/sql/catalog/systemschema/system.go line 998 at r1 (raw file):

Previously, j82w (Jake) wrote…

How about using the fingerprint + start time for the primary key? Using a random GUID isn't useful and will require extra lookup queries just to find the id.

Agreed, the id field should be better defined.

pkg/sql/catalog/systemschema/system.go line 999 at r1 (raw file):
Do we need to index other fields in addition to start/end time?
start_time and end_time are defined as computed stored fields rather than virtual fields to avoid problems with indexes.

Why not model the tables off the current insights virtual tables,

Do we need well-normalised data here? The created views still lose some strictness on fields like Problem and Cause (they're treated as strings).
I suggest that transaction and statement insights be used together and complement each other, i.e. so that we don't lose the context they were executed in.

j82w · 2023-06-12T14:54:26Z

pkg/sql/catalog/systemschema/system.go line 997 at r1 (raw file):

Keeping insights for transactions and statements in separate tables loses the context of which statements were executed within a particular transaction. To retain this info, we would need to join these tables, and that also affects performance.

Why would we need to join the tables? The transaction and statement pages are separate. Each table should have the necessary context to load its own page. The statement has the transaction id, so a simple filter can get all the stmt info for a transaction id if it is needed for the details page.
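
To make the "simple filter" concrete, here is a minimal Go sketch. It assumes a hypothetical, trimmed-down `StmtInsight` row type with a `TransactionID` field; the real table has many more columns, and in SQL this would be a plain WHERE clause on the transaction id rather than a join.

```go
package main

import "fmt"

// StmtInsight is a hypothetical, trimmed-down row of the statement
// insights table. Only the fields needed for the filter are shown.
type StmtInsight struct {
	StatementID   string
	TransactionID string
	Query         string
}

// String renders a row for display.
func (s StmtInsight) String() string {
	return fmt.Sprintf("%s (txn %s): %s", s.StatementID, s.TransactionID, s.Query)
}

// StatementsForTxn returns the statement insights recorded for one
// transaction, preserving input order. No join with a transaction
// table is needed because each row carries its transaction id.
func StatementsForTxn(rows []StmtInsight, txnID string) []StmtInsight {
	var out []StmtInsight
	for _, r := range rows {
		if r.TransactionID == txnID {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rows := []StmtInsight{
		{StatementID: "s1", TransactionID: "t1", Query: "SELECT 1"},
		{StatementID: "s2", TransactionID: "t2", Query: "SELECT 2"},
		{StatementID: "s3", TransactionID: "t1", Query: "SELECT 3"},
	}
	for _, r := range StatementsForTxn(rows, "t1") {
		fmt.Println(r)
	}
}
```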

I'd prefer to reflect the table structure based on the Insight proto message (pkg/sql/sqlstats/insights/insights.proto) rather than based on current usage on the UI.

insights.proto has two separate messages: one for statements and one for transactions. Storing all the transaction and statement info in a single JSON blob will likely cause issues. It's a lot of data because it has the query text for each statement, and I'm not sure what the max size is.

I agree that storing everything in JSON format looks odd and doesn't look right, but I feel it might be okay for read-only information that is periodically removed, and it provides flexibility for possible changes.

It's about discoverability and usability. There is no way to see what information is in that JSON blob. That makes it hard to discover what information is available, and it makes writing queries expensive, painful, and error-prone because it's not clear whether a field is optional or always available.

j82w · 2023-06-12T15:04:37Z

pkg/sql/catalog/systemschema/system.go line 999 at r1 (raw file):

Do we need to index other fields in addition to start/end time?
start_time and end_time are defined as computed stored fields rather than virtual fields to avoid problems with indexes.

Yes, because it's too much data. Returning 5k rows isn't helpful to users. The UI only shows the latest insight per fingerprint, if I remember correctly. I can see users wanting to filter by the problem or cause, for example to see only scenarios with high contention or missing indexes. It will be too much data to return all of it to the UI, just like it was for the statement/transaction statistics.

Do we need well-normalised data here? The created views still lose some strictness on fields like Problem and Cause (they're treated as strings).

For known columns that won't change, like fingerprint_id, I think it's worth having a separate column for the ability to index and for the additional validation from column constraints like NOT NULL. I do agree that some of the fields like problem/causes might be worth grouping into a JSON column for more flexibility.

koorosh · 2023-08-03T12:31:23Z

- Updated the tables to include all available fields in statement and txn insights. It is assumed insights will be persisted without aggregation.
- Added a created field to track the insertion time of each record.
- Defined ttl_expiration_expression to delete rows after 14 days (how long we need to persist insights should be discussed).
- The defined fields match those in the crdb_internal.cluster_txn_execution_insights and crdb_internal.cluster_execution_insights vtables, with the exception of the problem and cause enums: these fields are defined as INT8 to match enum ids instead of string values.

cc @j82w, @maryliag

j82w

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @maryliag, @rafiss, @THardy98, @xinhaoz, @yuzefovich, and @zachlite)

blathers-crl · 2023-10-04T20:25:31Z

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

We notice you have more than one commit in your PR. We try to break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
Please ensure your git commit message contains a release note.
When CI has completed, please ensure no errors have appeared.

_{🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.}

maryliag · 2023-10-05T13:14:33Z

please squash all commits before merging

j82w

just make sure there are tests that query all the new tables.

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @maryliag, @rafiss, @THardy98, @xinhaoz, @yuzefovich, and @zachlite)

koorosh

Done.

Added tests that query the mvcc_statistics table and check for the presence of the MVCCStatisticsJob job in the system.jobs table.
Tests for the insights tables were added earlier.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @j82w, @maryliag, @rafiss, @THardy98, @xinhaoz, @yuzefovich, and @zachlite)

koorosh · 2023-10-06T14:27:51Z

bors r+

craig · 2023-10-06T15:19:36Z

Build succeeded:

Bazel Essential CI (Cockroach)

yuzefovich

I have a couple of drive-by comments.

yuzefovich · 2023-10-06T16:19:19Z

pkg/sql/catalog/systemschema/system.go

@@ -3946,7 +4090,7 @@ var (
 				KeyColumnNames: []string{"id"},
 				KeyColumnDirections: []catenumpb.IndexColumn_Direction{
 					catenumpb.IndexColumn_ASC,
-					//catenumpb.IndexColumn_ASC,
+					// catenumpb.IndexColumn_ASC,


nit: this line should be removed.

yuzefovich · 2023-10-06T16:22:20Z

pkg/sql/logictest/testdata/logic_test/zone_config_system_tenant

@@ -130,8 +130,6 @@ FROM system.span_configurations
 WHERE end_key > (SELECT crdb_internal.table_span($t_id)[1])
 ORDER BY start_key
 ----
-/Table/110  {"gcPolicy": {"ttlSeconds": 90001}, "numReplicas": 3, "rangeMaxBytes": "67108864", "rangeMinBytes": "1048576"}


The removal of these two lines seems suspicious - why did it happen?

Thanks for catching this! Running logic tests with --rewrite sometimes removes unrelated parts of the test data, and I didn't notice this one. It will be brought back in PR #111920.

- revert removed rows by `--rewrite` from `zone_config_system_tenant` test data file;
- clean up unnecessary commented code;

Related to PR cockroachdb#104714

Release note: None

111920: sql: revert removed auto generated test data r=koorosh a=koorosh

- revert removed rows by `--rewrite` from `zone_config_system_tenant` test data file;
- clean up unnecessary commented code;

Related to PR #104714

Release note: None
Release justification: non-production code changes
Epic: None

111922: gcjob_test: use medium pool for RBE r=rail a=rickystewart

Epic: CRDB-8308
Release note: None

Co-authored-by: Andrii Vorobiov <and.vorobiov@gmail.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>

koorosh requested review from maryliag, zachlite, j82w, THardy98, xinhaoz and gtr June 12, 2023 08:51

blathers-crl bot added the O-community Originated from the community label Jun 12, 2023

j82w reviewed Jun 12, 2023

koorosh commented Jun 12, 2023

koorosh force-pushed the sql-add-system-insights-table branch 2 times, most recently from 372c906 to e8d454b Compare June 29, 2023 10:27

koorosh changed the title ~~WIP. sql: create system.insights table~~ WIP. sql: create system.txn_exec_insights and system.stmnt_exec_insights tables Jun 29, 2023

koorosh force-pushed the sql-add-system-insights-table branch 4 times, most recently from 03260f4 to 2c8c165 Compare August 3, 2023 12:00

koorosh force-pushed the sql-add-system-insights-table branch from 2c8c165 to 45abd15 Compare August 4, 2023 07:48

cockroachdb deleted a comment from blathers-crl bot Aug 4, 2023

koorosh force-pushed the sql-add-system-insights-table branch from 45abd15 to d47335a Compare August 8, 2023 08:21

koorosh marked this pull request as ready for review August 8, 2023 17:05

koorosh requested review from a team as code owners August 8, 2023 17:05

cockroachdb deleted a comment from blathers-crl bot Oct 3, 2023

j82w approved these changes Oct 4, 2023

koorosh force-pushed the sql-add-system-insights-table branch from 45811d6 to fa3a5cf Compare October 4, 2023 12:40

koorosh requested a review from a team as a code owner October 4, 2023 12:40

koorosh force-pushed the sql-add-system-insights-table branch from d2772fc to bb87c48 Compare October 5, 2023 10:15

cockroachdb deleted a comment from blathers-crl bot Oct 5, 2023

maryliag mentioned this pull request Oct 5, 2023

sql: create reset_insights_tables builtin #111833

Merged

cockroachdb deleted a comment from blathers-crl bot Oct 5, 2023

zachlite force-pushed the sql-add-system-insights-table branch from 655a906 to 0522d46 Compare October 5, 2023 21:26

j82w approved these changes Oct 5, 2023

zachlite changed the title ~~sql: create system.transaction_exec_insights and statement_exec_insights tables~~ sql: create new system observability tables and update job Oct 5, 2023

zachlite mentioned this pull request Oct 5, 2023

sql: create system.mvcc_statistics table and update job #111365

Closed

koorosh force-pushed the sql-add-system-insights-table branch from 0522d46 to a87f0c7 Compare October 6, 2023 09:34

koorosh commented Oct 6, 2023

craig bot merged commit 3b438b4 into cockroachdb:master Oct 6, 2023
7 of 8 checks passed

yuzefovich reviewed Oct 6, 2023

koorosh mentioned this pull request Oct 6, 2023

sql: revert removed auto generated test data #111920

Merged

RaduBerinde mentioned this pull request Apr 7, 2024

sql: add a test that verifies that the system database schema version is correct after upgrade #121914

Closed
