sql: create new system observability tables and update job #104714
Conversation
Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @gtr, @koorosh, @maryliag, @THardy98, @xinhaoz, and @zachlite)
pkg/sql/catalog/systemschema/system.go
line 997 at r1 (raw file):
InsightsTableSchema = ` CREATE TABLE system.insights (
We need 2 tables to match the current insights: 1 for transactions and 1 for statements. I think it will be too much data for 1 table, and it will make the queries painful because the data available for txn vs stmt is too different.
pkg/sql/catalog/systemschema/system.go
line 998 at r1 (raw file):
InsightsTableSchema = ` CREATE TABLE system.insights ( id UUID DEFAULT gen_random_uuid(),
How about using the fingerprint + start time for the primary key? A random UUID isn't useful on its own and forces lookup queries to first find the id.
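A sketch of the suggested key, with illustrative column names (the actual PR's columns may differ):

```sql
-- Hypothetical sketch: fingerprint + start time as the primary key.
-- Lookups by fingerprint (and time range) are then served directly
-- by the primary index, with no need to discover a random id first.
CREATE TABLE system.insights (
    stmt_fingerprint_id BYTES NOT NULL,
    start_time TIMESTAMPTZ NOT NULL,
    insight JSONB NOT NULL,
    PRIMARY KEY (stmt_fingerprint_id, start_time)
);
```

If uniqueness of the (fingerprint, start time) pair isn't guaranteed, a tie-breaker column would need to be appended to the key.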
pkg/sql/catalog/systemschema/system.go
line 999 at r1 (raw file):
CREATE TABLE system.insights ( id UUID DEFAULT gen_random_uuid(), insight JSONB NOT NULL,
Why not model the tables on the current insights virtual tables, and have a JSON column for additional context to make it easy to expand in the future? The problem with a JSON blob is indexing: on statement_statistics, for example, a bunch of virtual columns had to be created to add the necessary indexes.
indexes_usage JSONB AS (` + indexUsageComputeExpr + `) VIRTUAL,
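For readers unfamiliar with the pattern mentioned above: a computed column projects a field out of the JSONB payload so a regular secondary index can cover it. A minimal sketch with made-up names, not the real statement_statistics schema:

```sql
-- Hypothetical sketch: expose a JSONB field as a virtual computed
-- column purely so it can be indexed and filtered on cheaply.
CREATE TABLE example_statistics (
    id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    statistics JSONB NOT NULL,
    execution_count INT8 AS ((statistics->'stats'->>'cnt')::INT8) VIRTUAL,
    INDEX (execution_count)
);
```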
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @gtr, @j82w, @maryliag, @THardy98, @xinhaoz, and @zachlite)
pkg/sql/catalog/systemschema/system.go
line 997 at r1 (raw file):
Previously, j82w (Jake) wrote…
We need two tables to match the current insights: 1 for transactions and 1 for statements. I think it will be too much data for 1 table, and it will make the queries painful because the data available for txn vs stmt is too different.
Keeping insights for transactions and statements in separate tables loses the context of which statements were executed within a particular transaction. To retain this info we would need to join these tables, which also affects performance.
I'd prefer to base the table structure on the Insight proto message (pkg/sql/sqlstats/insights/insights.proto) rather than on current usage in the UI.
I agree that storing everything in JSON format looks odd and doesn't feel right, but it might be okay for read-only information that is periodically removed, and it provides flexibility for possible changes.
pkg/sql/catalog/systemschema/system.go
line 998 at r1 (raw file):
Previously, j82w (Jake) wrote…
How about using the fingerprint + start time for the primary key? A random UUID isn't useful on its own and forces lookup queries to first find the id.
Agree, the id field should be better defined.
pkg/sql/catalog/systemschema/system.go
line 999 at r1 (raw file):
Do we need to index other fields in addition to start/end time? start_time and end_time should be defined as computed stored fields rather than virtual fields, to avoid problems with indexes.

> Why not model the tables off the current insights virtual tables,

- Do we need well-normalised data here? The existing views still lose some strictness on fields like Problem and Cause (they're treated as strings).
- I'd suggest that transaction and statement insights are used together and complement each other, i.e. so as not to lose the context they were executed in.
Why would we need to join the tables? The transaction and statement pages are separate, and each table should have the necessary context to load the individual pages. The statement has the transaction id, so a simple filter can get all the stmt info for a transaction id if needed for the details page.

insights.proto has two separate messages: one for statements and one for transactions. Storing all the transaction and statement info in a single JSON blob will likely cause issues. It's a lot of data, because it includes the query text for each statement, and I'm not sure what the max size is.

It's about discoverability and usability. There is no way to see what information is in that JSON blob. That makes it hard to discover what information is available, and it makes writing queries expensive, painful, and error-prone because it's not clear whether a field is optional or always available.
Yes, because it's too much data. Returning 5k rows isn't helpful to users. The UI only shows the latest insight per fingerprint, if I remember correctly. I can see users wanting to filter by the problem or cause, e.g. only scenarios with high contention or missing indexes. It would be too much data to return all of it to the UI, just like it was for the statement/transaction statistics.
For known columns that won't change, like fingerprint_id, I think it's worth having a separate column for the ability to index, plus the additional validation from column constraints like NOT NULL. I do agree that some of the fields like problem/causes might be worth grouping into a JSON column for more flexibility.
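The compromise described above could look roughly like the following: stable fields get typed, constrained, indexable columns, while fields expected to evolve live in JSONB. All names here are illustrative, not the schema that was actually merged:

```sql
-- Hypothetical sketch: typed columns for stable, queryable fields;
-- JSONB for problems/causes, whose shape may change over time.
CREATE TABLE example_stmt_insights (
    transaction_id      UUID NOT NULL,
    stmt_fingerprint_id BYTES NOT NULL,
    start_time          TIMESTAMPTZ NOT NULL,
    query               STRING NOT NULL,
    problems            JSONB,
    causes              JSONB,
    PRIMARY KEY (transaction_id, stmt_fingerprint_id, start_time),
    INDEX (stmt_fingerprint_id)
);
```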
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @maryliag, @rafiss, @THardy98, @xinhaoz, @yuzefovich, and @zachlite)
Thank you for updating your pull request. Before a member of our team reviews your PR, I have some potential action items for you:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
please squash all commits before merging
just make sure there are tests that query all the new tables.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @maryliag, @rafiss, @THardy98, @xinhaoz, @yuzefovich, and @zachlite)
This commit is the combination of two separate work streams, brought together to resolve logic test fallout simultaneously. The first, authored by @koorosh, is the creation of system.transaction_exec_insights and system.statement_exec_insights. The second, authored by @zachlite in cockroachdb#111365, is the creation of system.mvcc_statistics and the MVCCStatisticsUpdate job.

Regarding persisted insights: before, this data was kept in memory only and tracked a limited number of the latest insights. These tables will be used to persist this data periodically. The tables store the same information as the in-memory insights, without aggregation. To control the amount of data stored in the tables, a follow-up PR will run a GC job to prune old records. To keep the tables flexible to change when some columns become obsolete, most of the columns are defined as nullable.

Regarding persisted MVCC statistics: the system.mvcc_statistics table stores historical MVCC data for a tenant's SQL objects. Its purpose is to serve MVCC data for a SQL object quickly; the span stats API is too slow to use in a hot path. Storing data over time unlocks new use cases, like showing a table or index's accumulated garbage over time. The MVCCStatisticsUpdate job is responsible for managing the contents of the table, decoupled from the read hot path. Both the table and job are created when a cluster bootstraps itself or upgrades from a previous version.

This PR supersedes cockroachdb#111365 with the following changes:
- Descriptor fixes to the mvcc_statistics table. No logical changes, just housekeeping to make sure that the CREATE TABLE schema and the descriptors produce the same table.
- Fixes to the job to make sure the job system can wind down.

Partially resolves: cockroachdb#104582
Epic: CRDB-25491
Release note: None
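The GC job lands in a follow-up PR and isn't shown here; as a rough idea of the pruning it would do (table name, column name, retention window, and batch size are all assumptions):

```sql
-- Hypothetical sketch: delete insights older than a retention window,
-- in bounded batches so each transaction stays small.
DELETE FROM system.statement_exec_insights
WHERE end_time < now() - INTERVAL '7 days'
LIMIT 1000;
```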
Done.
- Added tests to query the mvcc_statistics table and to check the presence of the MVCCStatisticsJob job in the system.jobs table;
- Tests for the insights tables were added before.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @j82w, @maryliag, @rafiss, @THardy98, @xinhaoz, @yuzefovich, and @zachlite)
bors r+
Build succeeded:
I have a couple of drive-by comments.
@@ -3946,7 +4090,7 @@ var (
 	KeyColumnNames: []string{"id"},
 	KeyColumnDirections: []catenumpb.IndexColumn_Direction{
 		catenumpb.IndexColumn_ASC,
-		//catenumpb.IndexColumn_ASC,
+		// catenumpb.IndexColumn_ASC,
nit: this line should be removed.
@@ -130,8 +130,6 @@ FROM system.span_configurations
 WHERE end_key > (SELECT crdb_internal.table_span($t_id)[1])
 ORDER BY start_key
 ----
-/Table/110 {"gcPolicy": {"ttlSeconds": 90001}, "numReplicas": 3, "rangeMaxBytes": "67108864", "rangeMinBytes": "1048576"}
The removal of these two lines seems suspicious - why did it happen?
Thanks for catching this! The `--rewrite` flag for logic tests sometimes removes unrelated parts of test data, and I didn't notice this one. It will be brought back in PR #111920.
- revert rows removed by `--rewrite` from the `zone_config_system_tenant` test data file;
- clean up unnecessary commented code;

Related to PR cockroachdb#104714
Release note: None
111920: sql: revert removed auto generated test data r=koorosh a=koorosh

- revert rows removed by `--rewrite` from the `zone_config_system_tenant` test data file;
- clean up unnecessary commented code;

Related to PR #104714
Release note: None
Release justification: non-production code changes
Epic: None

111922: gcjob_test: use medium pool for RBE r=rail a=rickystewart

Epic: CRDB-8308
Release note: None

Co-authored-by: Andrii Vorobiov <and.vorobiov@gmail.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>