sql: add query, query_summary, and database columns to statement statistics views#168357
Conversation
|
😎 Merged successfully - details. |
6894bff to
432a77d
Compare
|
Detected infrastructure failure (matched: ). Automatically rerunning failed jobs. (run link) |
1 similar comment
|
Detected infrastructure failure (matched: ). Automatically rerunning failed jobs. (run link) |
432a77d to
a45e6d3
Compare
|
Detected infrastructure failure (matched: ). Automatically rerunning failed jobs. (run link) |
1 similar comment
|
Detected infrastructure failure (matched: ). Automatically rerunning failed jobs. (run link) |
a45e6d3 to
75e33ad
Compare
|
Detected infrastructure failure (matched: ). Automatically rerunning failed jobs. (run link) |
1 similar comment
|
Detected infrastructure failure (matched: ). Automatically rerunning failed jobs. (run link) |
75e33ad to
46d3c6d
Compare
…istics views Add three new columns (query, query_summary, database) to the following crdb_internal virtual tables and views: - cluster_statement_statistics: populated from in-memory stats Key fields - statement_statistics: merged view (union of in-memory + persisted) - statement_statistics_persisted: populated via LEFT JOIN with system.statements, falling back to metadata JSON fields - statement_activity: populated via LEFT JOIN with system.statements, falling back to metadata JSON fields These columns surface the query text, summary, and originating database directly in the virtual table schema, avoiding the need for callers to parse the metadata JSONB column. The persisted and activity views use COALESCE to prefer data from system.statements (populated during flush) and fall back to extracting from the metadata JSONB when the statements table entry doesn't exist yet. Resolves: CRDB-62871 Epic: CRDB-62868 Release note (sql change): Added query, query_summary, and database columns to crdb_internal.cluster_statement_statistics, crdb_internal.statement_statistics, crdb_internal.statement_statistics_persisted, and crdb_internal.statement_activity virtual tables. These columns expose the statement text, summary, and originating database directly without requiring callers to parse the metadata JSONB column.
46d3c6d to
ea32fd6
Compare
| FROM | ||
| system.statement_statistics`, | ||
| system.statement_statistics AS ss | ||
| LEFT JOIN |
There was a problem hiding this comment.
will this (and the same LEFT JOIN below for crdb_internal.statement_activity run into problems for mixed-version / a cluster doing a rolling upgrade? i.e. if the migration that adds the table hasn't run yet - then nodes on the new binary will fail to query these virtual tables i think.
There was a problem hiding this comment.
This migration for the new table is in 26.2, so there shouldn't be an issue with this join in 26.3. There can be mixed version issues if you try to query these new columns on a mixed version cluster, since these new columns wont exist on the virtual tables / views, but right now there isn't anything that queries these columns. I plan on updating some APIs in the status server, but these will be version gated accordingly.
We do the left join specifically because the statements table likely won't contain rows for stats recorded before 26.3, so we fall back to the metadata->>* columns. In the future, when we stop recording this data to the metadata jsonb column, we will remove the coalesce all thogether
| database | ||
| FROM | ||
| system.statement_statistics | ||
| crdb_internal.statement_statistics_persisted |
There was a problem hiding this comment.
I'm guessing this is necessary because we want to select from the view / virtual table that did the join with system.statements - are we concerned at all that queries against cluster_statement_statistics now do this additional join?
I also think this runs into the same mixed version concern i noted below.
There was a problem hiding this comment.
I'm guessing this is necessary because we want to select from the view / virtual table that did the join with system.statements
Exactly. There is no point in duplicating the same join query. crdb_internal.statement_statistics_persisted is already a virtual view on top of system.statement_statistics so its just a drop in replacement.
are we concerned at all that queries against cluster_statement_statistics now do this additional join?
cluster_statement_statistics doesn't do a join, this view just exposes query, query_summary, and database as top level columns from the in-memory stats so there shouldn't be an impact . This is on L7154-7156
| @@ -1445,15 +1445,48 @@ statement ok | |||
| REVOKE SYSTEM VIEWACTIVITY FROM testuser | |||
|
|
|||
| # Regression tests for not setting SQL stats provider in the backfill (#76710). | |||
There was a problem hiding this comment.
is this comment still valid ? do we need to do anything else for this logic test?
There was a problem hiding this comment.
The comment still applies for the materialized view created below. I removed this table creation because the conditional creation of this table changes the output of the tests below, where we create procedures below, and I think these regression tests are't strictly necessary anymore, so deleting 1 of them is okay.
do we need to do anything else for this logic test
I think these tests cover enough. We have more tests in sqlstats_test.go that actually test the output of the V tables when querying them
| ss.p99_latency, | ||
| COALESCE(s.fingerprint, ss.metadata->>'query', '') AS query, | ||
| COALESCE(s.summary, ss.metadata->>'querySummary', '') AS query_summary, | ||
| COALESCE(s.db, ss.metadata->>'db', '') AS database |
There was a problem hiding this comment.
in hindsight i guess we should have named the field on system.statements the same (i.e. database vs db)
|
/trunk merge tftr! |
Summary
Add three new columns (
query,query_summary,database) to the followingcrdb_internal virtual tables and views:
system.statements, falling back to metadata JSON fieldssystem.statements,falling back to metadata JSON fields
These columns surface the query text, summary, and originating database
directly in the virtual table schema, avoiding the need for callers to
parse the metadata JSONB column. The persisted and activity views use
COALESCE to prefer data from
system.statements(populated during flush)and fall back to extracting from the metadata JSONB when the statements
table entry doesn't exist yet.
Resolves: CRDB-62871
Epic: CRDB-62868
Release note (sql change): Added query, query_summary, and database
columns to crdb_internal.cluster_statement_statistics,
crdb_internal.statement_statistics, crdb_internal.statement_statistics_persisted,
and crdb_internal.statement_activity virtual tables. These columns expose
the statement text, summary, and originating database directly without
requiring callers to parse the metadata JSONB column.