Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: create new system observability tables and update job #104714

Merged
merged 1 commit into from Oct 6, 2023

Conversation

koorosh
Copy link
Collaborator

@koorosh koorosh commented Jun 12, 2023

This commit is the combination of two separate work streams,
brought together for resolving logic test fallout simultaneously.

The first, authored by @koorosh is the creation of
system.transaction_exec_insights and system.statement_exec_insights.

The second, authored by @zachlite in #111365 is the creation of
system.mvcc_statistics and the MVCCStatisticsUpdate job.

Regarding persisted insights:
Before, this data was kept in memory only and tracked limited
number of latest insights. These tables will be used to persist
this data periodically.

Tables allow to store the same information as in memory insights
without aggregation.

To control the amount of data stored in tables, there will be
follow up PR to run GC job and prune old records.

To make tables flexible to changes when some columns might become
obsolete, most of the columns defined as nullable.

Regarding persisted MVCC Statistics:
The system.mvcc_statistics table stores historical mvcc data
for a tenant's SQL objects. It's purpose it to serve mvcc data for a
SQL object quickly - The span stats API is too slow to use in a hot path.
Storing data over time unlocks new use cases like showing a table or
index's accumulated garbage over time.

The MVCCStatisticsUpdate Job is responsible for managing the contents of
the table, decoupled from the read-hotpath.

Both the table and job are baked when a cluster bootstraps itself, or upgrades
itself from a previous version.

This PR supersedes #111365 with the following changes:

  • Descriptor fixes to the mvcc_statistics table. No logical changes,
    just housekeeping to make sure that the create table schema and descriptors
    produce the same table.
  • Fixes to the job to make sure the job system can wind down.

Partially resolves: #104582
Epic: CRDB-25491
Release note: None

@blathers-crl
Copy link

blathers-crl bot commented Jun 12, 2023

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added the O-community Originated from the community label Jun 12, 2023
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@j82w j82w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @gtr, @koorosh, @maryliag, @THardy98, @xinhaoz, and @zachlite)


pkg/sql/catalog/systemschema/system.go line 997 at r1 (raw file):

	InsightsTableSchema = `
	CREATE TABLE system.insights (

We need 2 tables to match the current insights. 1 for transaction and 1 for statements. I think it will be to much data for 1 table and it will make the queries painful because the data available for txn vs stmt is to different.


pkg/sql/catalog/systemschema/system.go line 998 at r1 (raw file):

	InsightsTableSchema = `
	CREATE TABLE system.insights (
		id UUID DEFAULT gen_random_uuid(),

How about using the fingerprint + start time for the primary key? Using a random guid isn't useful and will cause lookup queries to find the id.


pkg/sql/catalog/systemschema/system.go line 999 at r1 (raw file):

	CREATE TABLE system.insights (
		id UUID DEFAULT gen_random_uuid(),
		insight JSONB NOT NULL,

Why not model the tables off the current insights virtual tables, and have a JSON column for additional context to make it easy to expand in the future? The problem with JSON blob is indexes. For example on the statement_statistics a bunch of virtual columns had to be created to add necessary indexes.

indexes_usage JSONB AS (` + indexUsageComputeExpr + `) VIRTUAL,

Copy link
Collaborator Author

@koorosh koorosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @gtr, @j82w, @maryliag, @THardy98, @xinhaoz, and @zachlite)


pkg/sql/catalog/systemschema/system.go line 997 at r1 (raw file):

Previously, j82w (Jake) wrote…

We need to tables to match the current insights. 1 for transaction and 1 for statements. I think it will be to much data for 1 table and it will make the queries painful because the data available for txn vs stmt is to different.

Keeping insights for transactions and statements in separate tables loses context of what statements were executed within a particular transaction. To retain this info, we would need to join these tables and that's also
affects performance.
I'd prefer to reflect the table structure based on the Insight proto message (pkg/sql/sqlstats/insights/insights.proto) rather than based on current usage on the UI.

I agree that storing everything in JSON format looks odd and doesn't look right, but feel it might be okay for read only information that periodically removed and provides flexibility for possible changes.


pkg/sql/catalog/systemschema/system.go line 998 at r1 (raw file):

Previously, j82w (Jake) wrote…

How about using the fingerprint + start time for the primary key? Using a random guid isn't useful and will cause lookup queries to find the id.

Agree, id field should be better defined.


pkg/sql/catalog/systemschema/system.go line 999 at r1 (raw file):
Do we need to index other fields in addition to start/end time?
start_time and end_time defined as computed stored fields rather than virtual fields to avoid problems with indexes.

Why not model the tables off the current insights virtual tables,

  1. do we need well normalised data here? Created views still lose some strictness on fields like Problem, Cause (they're treated as strings).
  2. I suggest that both transaction and statements insights are used all together and accomplish each other, ie to not lose the context they were executed in.

@j82w
Copy link
Contributor

j82w commented Jun 12, 2023

pkg/sql/catalog/systemschema/system.go line 997 at r1 (raw file):

Keeping insights for transactions and statements in separate tables loses context of what statements were executed within a particular transaction. To retain this info, we would need to join these tables and that's also affects performance.

Why would need to join the tables? The transaction and statement pages are separate. Each table should have the necessary context to load the individual pages. The statement has the transaction id so a simple filter can get all the stmt info if necessary for a transaction id if needed for the details page.

I'd prefer to reflect the table structure based on the Insight proto message (pkg/sql/sqlstats/insights/insights.proto) rather than based on current usage on the UI.

insights.proto has two separate messages. One for statement and transaction id. Storing all the transaction and statement info into a single json blob will likely cause issues. It's a lot of data because it has the query text for each statement, and not sure what the max size is.

I agree that storing everything in JSON format looks odd and doesn't look right, but feel it might be okay for read only information that periodically removed and provides flexibility for possible changes.

It's about discoverability and usability. There is no way to see what information is in that JSON blob. It makes it hard to discover what information is available and it makes writing queries expensive, painful, and error prone because it's not clear if a field is optional or always available.

@j82w
Copy link
Contributor

j82w commented Jun 12, 2023

pkg/sql/catalog/systemschema/system.go line 999 at r1 (raw file):

Do we need to index other fields in addition to start/end time?
start_time and end_time defined as computed stored fields rather than virtual fields to avoid problems with indexes.

Yes, because it's to much data. Returning 5k rows isn't helpful to users. The UI only show the latest insight per a fingerprint if I remember correctly. I can see user wanting to filter by the problem or cause like they only want to see scenarios with high contention or missing indexes. It will be to much data to return all of it to the UI just like it was for the statement/transaction statistics.

  1. do we need well normalised data here? Created views still lose some strictness on fields like Problem, Cause (they're treated as strings).

For known columns that won't change like fingerprint_id I think it's worth having a separate column for the ability to index and the additional validation from the column constraints like not null. I do agree that some of the fields like problem/causes might be worth grouping into a json column for more flexibility.

@koorosh koorosh force-pushed the sql-add-system-insights-table branch 2 times, most recently from 372c906 to e8d454b Compare June 29, 2023 10:27
@koorosh koorosh changed the title WIP. sql: create system.insights table WIP. sql: create system.txn_exec_insights and system.stmnt_exec_insights tables Jun 29, 2023
@koorosh koorosh force-pushed the sql-add-system-insights-table branch 4 times, most recently from 03260f4 to 2c8c165 Compare August 3, 2023 12:00
@koorosh
Copy link
Collaborator Author

koorosh commented Aug 3, 2023

  • Updated tables to include all available fields in statement and txn insights. It is assumed insights will be persisted without aggregation.
  • Added created filed to track inserting time of record
  • Defined ttl_expiration_expression to delete rows after 14 days (should be discussed how long do we need to persist insights)
  • Defined fields match to defined fields in crdb_internal.cluster_txn_execution_insights and crdb_internal.cluster_execution_insights vtables with exception for problem and cause enums - these fields defined as INT8 to match enum ids instead of string values.

cc @j82w , @maryliag

@koorosh koorosh force-pushed the sql-add-system-insights-table branch from 2c8c165 to 45abd15 Compare August 4, 2023 07:48
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Aug 4, 2023
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Aug 4, 2023
@koorosh koorosh force-pushed the sql-add-system-insights-table branch from 45abd15 to d47335a Compare August 8, 2023 08:21
@koorosh koorosh marked this pull request as ready for review August 8, 2023 17:05
@koorosh koorosh requested review from a team as code owners August 8, 2023 17:05
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Oct 3, 2023
Copy link
Contributor

@j82w j82w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag, @rafiss, @THardy98, @xinhaoz, @yuzefovich, and @zachlite)

@koorosh koorosh force-pushed the sql-add-system-insights-table branch from 45811d6 to fa3a5cf Compare October 4, 2023 12:40
@koorosh koorosh requested a review from a team as a code owner October 4, 2023 12:40
@blathers-crl
Copy link

blathers-crl bot commented Oct 4, 2023

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

2 similar comments
@blathers-crl
Copy link

blathers-crl bot commented Oct 4, 2023

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl
Copy link

blathers-crl bot commented Oct 5, 2023

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@koorosh koorosh force-pushed the sql-add-system-insights-table branch from d2772fc to bb87c48 Compare October 5, 2023 10:15
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Oct 5, 2023
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Oct 5, 2023
@maryliag
Copy link
Contributor

maryliag commented Oct 5, 2023

please squash all commits before merging

@blathers-crl
Copy link

blathers-crl bot commented Oct 5, 2023

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@zachlite zachlite force-pushed the sql-add-system-insights-table branch from 655a906 to 0522d46 Compare October 5, 2023 21:26
Copy link
Contributor

@j82w j82w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: just make sure there are tests that query all the new tables.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @maryliag, @rafiss, @THardy98, @xinhaoz, @yuzefovich, and @zachlite)

@zachlite zachlite changed the title sql: create system.transaction_exec_insights and statement_exec_insights tables sql: create new system observability tables and update job Oct 5, 2023
This commit is the combination of two separate work streams,
brought together for resolving logic test fallout simultaneously.

The first, authored by @koorosh is the creation of
system.transaction_exec_insights and system.statement_exec_insights.

The second, authored by @zachlite in cockroachdb#111365 is the creation of
system.mvcc_statistics and the MVCCStatisticsUpdate job.

Regarding persisted insights:
Before, this data was kept in memory only and tracked limited
number of latest insights. These tables will be used to persist
this data periodically.

Tables allow to store the same information as in memory insights
without aggregation.

To control the amount of data stored in tables, there will be
follow up PR to run GC job and prune old records.

To make tables flexible to changes when some columns might become
obsolete, most of the columns defined as nullable.

Regarding persisted MVCC Statistics:
The system.mvcc_statistics table stores historical mvcc data
for a tenant's SQL objects. It's purpose it to serve mvcc data for a
SQL object quickly - The span stats API is too slow to use in a hot path.
Storing data over time unlocks new use cases like showing a table or
index's accumulated garbage over time.

The MVCCStatisticsUpdate Job is responsible for managing the contents of
the table, decoupled from the read-hotpath.

Both the table and job are baked when a cluster bootstraps itself, or upgrades
itself from a previous version.

This PR supersedes cockroachdb#111365 with the following changes:
- Descriptor fixes to the mvcc_statistics table. No logical changes,
just housekeeping to make sure that the create table schema and descriptors
produce the same table.
- Fixes to the job to make sure the job system can wind down.

Partially resolves: cockroachdb#104582
Epic: CRDB-25491
Release note: None
@koorosh koorosh force-pushed the sql-add-system-insights-table branch from 0522d46 to a87f0c7 Compare October 6, 2023 09:34
Copy link
Collaborator Author

@koorosh koorosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

  • Added tests to query mvcc_statistics table and presence of MVCCStatisticsJob job in system.jobs table;
  • Tests for insights tables were added before.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @j82w, @maryliag, @rafiss, @THardy98, @xinhaoz, @yuzefovich, and @zachlite)

@koorosh
Copy link
Collaborator Author

koorosh commented Oct 6, 2023

bors r+

@craig
Copy link
Contributor

craig bot commented Oct 6, 2023

Build succeeded:

@craig craig bot merged commit 3b438b4 into cockroachdb:master Oct 6, 2023
7 of 8 checks passed
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple of drive-by comments.

@@ -3946,7 +4090,7 @@ var (
KeyColumnNames: []string{"id"},
KeyColumnDirections: []catenumpb.IndexColumn_Direction{
catenumpb.IndexColumn_ASC,
//catenumpb.IndexColumn_ASC,
// catenumpb.IndexColumn_ASC,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this line should be removed.

@@ -130,8 +130,6 @@ FROM system.span_configurations
WHERE end_key > (SELECT crdb_internal.table_span($t_id)[1])
ORDER BY start_key
----
/Table/110 {"gcPolicy": {"ttlSeconds": 90001}, "numReplicas": 3, "rangeMaxBytes": "67108864", "rangeMinBytes": "1048576"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of these two lines seems suspicious - why did it happen?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this! --rewrite logic tests sometimes removes unrelated parts of test data and I didn't notice this one. It will brought back within this PR #111920

koorosh added a commit to koorosh/cockroach that referenced this pull request Oct 6, 2023
- revert removed rows by `--rewrite` from `zone_config_system_tenant` test data file;
- Clean up unnecessary commented code;

Related to PR cockroachdb#104714

Release note: None
craig bot pushed a commit that referenced this pull request Oct 6, 2023
111920: sql: revert removed auto generated test data r=koorosh a=koorosh

- revert removed rows by `--rewrite` from `zone_config_system_tenant` test data file;
- clean up unnecessary commented code;

Related to PR #104714

Release note: None

Release justification: non-production code changes

Epic: None

111922: gcjob_test: use medium pool for RBE r=rail a=rickystewart

Epic: CRDB-8308
Release note: None

Co-authored-by: Andrii Vorobiov <and.vorobiov@gmail.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-community Originated from the community
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Persist Insights to help users correlate problematic signals over time
7 participants