sql: fix sql activity cache for large cardinality #112350
Conversation
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @maryliag)
-- commits
line 34 at r1:
This fixes #111224, right?
pkg/server/combined_statement_stats.go
line 1016 at r1 (raw file):
// This does not use the activity tables because the statement information is
// aggregated to remove the transaction fingerprint id to keep the size of the
// statement_activity manageable when transaction have over 1k+ statement ids.
nit: "transaction have".
pkg/sql/sql_activity_update_job.go
line 368 at r1 (raw file):
// each of execution_count, total execution time, service_latency,cpu_sql_nanos,
// contention_time and insert into transaction_activity table.
// Up to 3000 rows (sql.stats.activity.top.max * 5) may be added to
Why did this change? Because we only need 5 columns? Also, do we need to reduce numberOfTopColumns to 5 too? Also, s/3000/2500/.
pkg/sql/sql_activity_update_job.go
line 388 at r1 (raw file):
merge_stats, '' AS query, (merge_stats -> 'statistics' ->> 'cnt')::int,
nit: seems like a few extra spaces upfront.
pkg/sql/sql_activity_update_job.go
line 522 at r1 (raw file):
merged_stats, max_plan, (select COALESCE(array_agg(o.rec::string), (array[]::string[])) FROM jsonb_array_elements_text(merged_stats -> 'index_recommendations') o(rec)) as idx_rec,
nit: indentation.
pkg/sql/sql_activity_update_job.go
line 537 at r1 (raw file):
max(ss.agg_interval) AS max_agg_interval, max(ss.plan) AS max_plan, merge_stats_metadata(ss.metadata) AS metadata,
nit: indentation.
pkg/sql/sql_activity_update_job.go
line 540 at r1 (raw file):
merge_statement_stats(ss.statistics) AS merged_stats FROM system.statement_statistics ss inner join limit_stmt_stats using (aggregated_ts, fingerprint_id, app_name)
nit: we usually capitalize INNER JOIN and GROUP BY
pkg/sql/opt/exec/execbuilder/testdata/observability
line 347 at r1 (raw file):
│ group by: fingerprint_id, plan_hash, app_name │
└── • apply join (left outer)
This apply join seems a bit concerning.
pkg/sql/opt/exec/execbuilder/testdata/observability
line 491 at r1 (raw file):
app_name, fingerprint_id), limit_stmt_stats AS (SELECT aggregated_ts,
nit: indentation.
pkg/sql/opt/exec/execbuilder/testdata/observability
line 606 at r1 (raw file):
│ group by: fingerprint_id, plan_hash, app_name │
└── • apply join (left outer)
Ditto.
pkg/sql/opt/exec/execbuilder/testdata/observability
line 885 at r1 (raw file):
----
distribution: local
vectorized: true
nit: indentation.
pkg/sql/sql_activity_update_job_test.go
line 379 at r1 (raw file):
WHERE app_name = $3;`, 10000, 1, "topTransaction") // Help check for primary key conflicts
nit: missing period.
Force-pushed from 1e10c95 to 4702fde.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @yuzefovich)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
This fixes #111224, right?
Right, added it
pkg/server/combined_statement_stats.go
line 1016 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: "transaction have".
Done
pkg/sql/sql_activity_update_job.go
line 368 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Why did this change? Because we only need 5 columns? Also, do we need to reduce numberOfTopColumns to 5 too? Also, s/3000/2500/.
For Stmt we have 6 and for Txn we now have 5, so I updated the comment here to 2500 since it is about Txn. I also created separate numberOfTopColumns constants for stmt and txn to make the distinction clear.
pkg/sql/sql_activity_update_job.go
line 388 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: seems like a few extra spaces upfront.
Fixed
pkg/sql/sql_activity_update_job.go
line 522 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: indentation.
Fixed
pkg/sql/sql_activity_update_job.go
line 537 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: indentation.
Fixed
pkg/sql/sql_activity_update_job.go
line 540 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: we usually capitalize INNER JOIN and GROUP BY
Fixed
pkg/sql/opt/exec/execbuilder/testdata/observability
line 347 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
This apply join seems a bit concerning.
any suggestions?
pkg/sql/opt/exec/execbuilder/testdata/observability
line 491 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: indentation.
Fixed
pkg/sql/opt/exec/execbuilder/testdata/observability
line 606 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Ditto.
any suggestions?
pkg/sql/opt/exec/execbuilder/testdata/observability
line 885 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: indentation.
Fixed
pkg/sql/sql_activity_update_job_test.go
line 379 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: missing period.
Fixed
Reviewed 2 of 5 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @maryliag)
pkg/sql/opt/exec/execbuilder/testdata/observability
line 606 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
any suggestions?
(select COALESCE(array_agg(o.rec::string), (array[]::string[])) FROM jsonb_array_elements_text(merged_stats -> 'index_recommendations') o(rec)) as idx_rec,
is what's causing us to use the apply join here. What we're trying to do is to array aggregate all index recommendations.
However, if we look at how merge_statement_stats does aggregation of index recommendations, StatementStatistics.Add simply keeps the latest IndexRecommendations as the code is written currently. Thus, I think we can change the query to simply use merged_stats -> 'index_recommendations' as idx_rec, which removes the need for apply join. WDYT?
Force-pushed from dfc830e to 15545b5.
Force-pushed from b48c477 to 285bc78.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @yuzefovich)
pkg/sql/opt/exec/execbuilder/testdata/observability
line 606 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
(select COALESCE(array_agg(o.rec::string), (array[]::string[])) FROM jsonb_array_elements_text(merged_stats -> 'index_recommendations') o(rec)) as idx_rec, is what's causing us to use the apply join here. What we're trying to do is to array aggregate all index recommendations.
However, if we look at how merge_statement_stats does aggregation of index recommendations, StatementStatistics.Add simply keeps the latest IndexRecommendations as the code is written currently. Thus, I think we can change the query to simply use merged_stats -> 'index_recommendations' as idx_rec, which removes the need for apply join. WDYT?
Query is now using the new builtin and the plan is updated. Let me know what you think
Reviewed 10 of 10 files at r4, 6 of 6 files at r5, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @maryliag)
-- commits
line 27 at r5:
nit: probably this sentence is no longer correct - we now use the last index recommendations that we happen to see when merging.
pkg/sql/sql_activity_update_job.go
line 68 at r5 (raw file):
settings.WithPublic)

const numberOfStmtTopColumns = 6
Should one of these be 5?
pkg/sql/appstatspb/app_stats.go
line 200 at r5 (raw file):
// Use the LastExecTimestamp to decide which object has the last error
// and index recommendations.
nit: the comment needs an update.
pkg/sql/opt/exec/execbuilder/testdata/observability
line 606 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
Query is now using the new builtin and the plan is updated. Let me know what you think
The plans no longer seem concerning.
Ideally, we'd run EXPLAIN ANALYZE of the read portions of these queries (similar to what I did here) on a reasonably active database to see the performance. I guess it requires usage of multiple different fingerprints - perhaps we could use sqlsmith to generate such random queries.
pkg/sql/opt/exec/execbuilder/testdata/observability
line 251 at r5 (raw file):
query T retry
EXPLAIN (VERBOSE) UPSERT
nit: perhaps squash UPSERT with the next line?
pkg/sql/opt/exec/execbuilder/testdata/observability
line 806 at r5 (raw file):
merge_stats, '' AS query, (merge_stats -> 'statistics' ->> 'cnt')::int,
nit: indentation is a bit off.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @yuzefovich)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: probably this sentence is no longer correct - we now use the last index recommendations that we happen to see when merging.
Fixed
pkg/sql/sql_activity_update_job.go
line 68 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Should one of these be 5?
🤦 duh, fixed
pkg/sql/appstatspb/app_stats.go
line 200 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: the comment needs an update.
Done
pkg/sql/opt/exec/execbuilder/testdata/observability
line 606 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
The plans no longer seem concerning.
Ideally, we'd run EXPLAIN ANALYZE of the read portions of these queries (similar to what I did here) on a reasonably active database to see the performance. I guess it requires usage of multiple different fingerprints - perhaps we could use sqlsmith to generate such random queries.
isn't this what the above part of INJECT STATISTICS is trying to simulate? We are adding stats to pretend those tables are full.
pkg/sql/opt/exec/execbuilder/testdata/observability
line 251 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: perhaps squash UPSERT with the next line?
Done
pkg/sql/opt/exec/execbuilder/testdata/observability
line 806 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: indentation is a bit off.
Fixed
Reviewed 6 of 6 files at r6, 6 of 6 files at r7, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @maryliag)
pkg/sql/opt/exec/execbuilder/testdata/observability
line 606 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
isn't this what the above part of INJECT STATISTICS is trying to simulate? We are adding stats to pretend those tables are full.
Yes, this shows the right logical plan, but I'm wondering whether the actual performance (thus EXPLAIN ANALYZE) is fast enough. It's hard to say that simply by glancing at the plan. I mean, it looks like the plan has improved, but it would still be nice to see the actual runtime of the queries, especially so given that we'd be backporting this to 23.1.x I think.
pkg/sql/sql_activity_update_job_test.go
line 99 at r7 (raw file):
verifyActivityTablesAreEmpty(t, db)
// Us random name to keep isolated during stress tests.
nit: s/Us/Use/.
Probably want to backport this to 23.2 and 23.1, right?
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @dhartunian and @maryliag)
pkg/sql/opt/exec/execbuilder/testdata/observability
line 606 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Yes, this shows the right logical plan, but I'm wondering whether the actual performance (thus EXPLAIN ANALYZE) is fast enough. It's hard to say that simply by glancing at the plan. I mean, it looks like the plan has improved, but it would still be nice to see the actual runtime of the queries, especially so given that we'd be backporting this to 23.1.x I think.
Oh, I forgot that we already addressed - probably - the main reason for slowness we observed in a customer ticket (#111303), and this PR seems to improve the plan further, so I no longer think that we need to run EXPLAIN ANALYZE to see the actual performance, so I'm ok with merging this.
That said, if you do see an easy / quick way to replicate the customer ticket that found these problems, getting a stmt bundle to see whether further improvements can be made would be a nice-to-have.
Force-pushed from e6c24ba to 398c76e.
This commit introduces the function `jsonb_array_to_string_array` that converts a JSONB array to a string array format. Epic: none Release note (sql change): New builtin `jsonb_array_to_string_array` that converts JSONB array to string array.
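As a rough illustration of what the new builtin does, here is a minimal Python sketch of converting a JSONB array to a string array. This is an assumption based on the commit message, not the builtin's actual implementation; the element-rendering choice (unwrap string elements, render other values in their JSON form) is hypothetical.

```python
import json

def jsonb_array_to_string_array(jsonb_text):
    # Sketch of the builtin's semantics: parse a JSONB array and
    # return its elements as a string array.
    arr = json.loads(jsonb_text)
    if not isinstance(arr, list):
        raise ValueError("input is not a JSON array")
    # String elements are unwrapped; other values keep their JSON rendering.
    return [e if isinstance(e, str) else json.dumps(e) for e in arr]

print(jsonb_array_to_string_array('["idx_rec_1", "idx_rec_2"]'))
# → ['idx_rec_1', 'idx_rec_2']
```

In the activity update job this replaces the per-row `array_agg` subquery over `jsonb_array_elements_text`, which is what had been forcing the apply join.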
Recreated from cockroachdb#112071

tldr: fix the activity update job failing from a conflict, caching too much info when there is high cardinality in the stats, and fix the cache to store the correct top stats.

1. Fix the activity update failure from upserting the same row multiple times. This was caused by 2 nodes having the same statement, but one only executed it 2 or 3 times while the other node executed it enough to generate an index recommendation. The different index recommendations caused duplicate rows (based on primary key) to be inserted. It now uses the index recommendation from the merged stats, which uses the latest existing recommendation (we don't update if there is no recommendation).
2. Customers can have transactions with over 1k unique statement fingerprint ids. This makes it unreasonable to cache all of the statement information for all the top transactions. This commit removes the logic to cache all the statement information for the top transactions, which could cause the table to grow to over 100k rows for a single hour. This requires the call to go to the statistics table to get all the necessary statement information for the txn details.
3. The activity cache logic was not correctly selecting the top CPU json column. It was using the computed index column name instead of the json field name. The values shown on the UI were correct because the endpoint still aggregated several columns using the aggregated stats rather than the separate column.
4. The activity cache logic was using the execution count instead of the statistics count to select the top queries. In most cases this still generates the correct top queries, and the UI uses the aggregated stats which showed the correct value.

Fixes: cockroachdb#111780
Fixes: cockroachdb#111224

Release note (sql change): Fix the SQL activity update job to avoid conflicts on update, reduce the amount of data cached to just what the overview page requires, and fix the correctness of the top queries.
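Points 3 and 4 of the commit message come down to ranking by the wrong column. A minimal Python sketch (with hypothetical row data, not the job's actual code) showing how selecting the top set by execution count versus statistics count can disagree:

```python
import heapq

# Hypothetical rows: (fingerprint, execution_count, statistics_count).
rows = [
    ("stmt-a", 100, 90),
    ("stmt-b", 80, 95),
    ("stmt-c", 60, 50),
]

def top_k(rows, k, key):
    # Pick the top-k fingerprints by the given ranking column.
    return [r[0] for r in heapq.nlargest(k, rows, key=key)]

# Ranking by execution count vs. statistics count yields different orders:
print(top_k(rows, 2, key=lambda r: r[1]))  # → ['stmt-a', 'stmt-b']
print(top_k(rows, 2, key=lambda r: r[2]))  # → ['stmt-b', 'stmt-a']
```

When the two columns diverge enough, the cached "top" set itself differs, which is why the fix switches the ranking column even though the UI's aggregated stats masked the bug in most cases.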
TFTRs! I'll go ahead and merge this now.

bors r+
Build succeeded.
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.

error creating merge commit from e252306 to blathers/backport-release-23.1-112350: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 23.1.x failed. See errors above.

error creating merge commit from e252306 to blathers/backport-release-23.2-112350: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 23.2.x failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Note to reviewers: first commit being reviewed on #112609