Postgres Performance Optimization: Cache baseline metrics and apply updates incrementally #17554
Conversation
I'm a bit concerned about the increased memory impact of `_baseline_metrics`, because the size of this dict is not bounded by `pg_stat_statements.max`. In a high-cardinality database with lots of evictions in `pg_stat_statements`, the size of `_baseline_metrics` could grow quickly.
Re the unbounded growth due to query eviction, I added an expiry time (currently set to 10 minutes) past which the baseline will be re-fetched: decf7e8.
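A minimal sketch of the expiry idea described in the comment above (hypothetical names, not the PR's actual code; the real change is in the linked commit): the baseline cache records when it was last populated, and is rebuilt from a full `pg_stat_statements` snapshot once it is older than the TTL, which also drops entries for evicted queryids.

```python
import time

BASELINE_TTL_S = 600  # 10 minutes, matching the expiry mentioned above


class BaselineCache:
    """Hypothetical sketch: full-baseline cache with time-based expiry."""

    def __init__(self, ttl_s=BASELINE_TTL_S):
        self._ttl_s = ttl_s
        self._baseline = {}  # queryid -> metrics row
        self._fetched_at = None

    def is_expired(self):
        # Never populated, or older than the TTL.
        return self._fetched_at is None or (time.time() - self._fetched_at) > self._ttl_s

    def refresh(self, full_rows):
        # Re-populate from a full pg_stat_statements snapshot. Queryids
        # evicted since the last refresh are dropped here, which is what
        # keeps the cache size bounded over time.
        self._baseline = {row["queryid"]: row for row in full_rows}
        self._fetched_at = time.time()


cache = BaselineCache()
assert cache.is_expired()  # nothing fetched yet
cache.refresh([{"queryid": 1, "calls": 10}])
assert not cache.is_expired()  # freshly populated
```

The key point is that expiry bounds staleness rather than size directly: the cache can still grow between refreshes, but every refresh discards rows for evicted queries.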
@lu-zhengda I tweaked the implementation slightly. I realized that aggregating the metric data by query signature in
Test Results: 296 tests, 256 ✅, 2m 0s ⏱️. For more details on these failures, see this check. Results for commit f7b78bd.
@@ -77,6 +77,93 @@ def test_dbm_enabled_config(integration_check, dbm_instance, dbm_enabled_key, db
    assert check._config.dbm_enabled == dbm_enabled

@requires_over_10
Note: this test is failing on PG 9. We may have to skip this optimization on that version. It seems that not all of the expected rows are returned by the QUERY_CALLS_QUERY, which prevents correct functioning.
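One way the version gating mentioned above could look (a hypothetical sketch, not the PR's actual code): compare the server's numeric version, as reported by `SHOW server_version_num`, against the Postgres 10 threshold and fall back to full fetches on older servers.

```python
# Hypothetical sketch of gating the optimization by server version:
# on Postgres < 10, fall back to fetching the full pg_stat_statements
# dataset on every run instead of taking the incremental path.

PG10_VERSION_NUM = 100000  # server_version_num for 10.0


def use_incremental_path(server_version_num):
    # server_version_num is the integer from SHOW server_version_num,
    # e.g. 90624 for 9.6.24 and 100023 for 10.23.
    return server_version_num >= PG10_VERSION_NUM


assert not use_incremental_path(90624)   # 9.6.x: use the full fetch
assert use_incremental_path(100023)      # 10.x and later: incremental
```

Using the numeric form avoids parsing version strings, and the pre-10 two-part numbering (9.6 → 90600) still compares correctly against the 100000 cutoff.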
LGTM. Minor change on the config default.
What does this PR do?
#17187 was reverted because we observed that it could lead to inflated metrics. The root cause is that `compute_derivative_rows` currently assumes that all rows in `pg_stat_statements` are passed as input. It does not compute correct results when only a subset of `pg_stat_statements` is passed in, because that can lead to taking the diff of two different `pg_stat_statements` rows, which produces incoherent results.

The approach taken here is to introduce a `baseline_metrics` cache, which holds one entry for each `pg_stat_statements` row and is populated once from the full `pg_stat_statements` dataset. On subsequent check runs, only queries that have been called since the previous run are returned, and `baseline_metrics` is updated for just those queryids.

This allows the full set of normalized rows to be constructed with one new function: `_apply_deltas`. It runs before `compute_derivative_rows` and uses `baseline_metrics` to reconstruct the full set of metrics rows. This relies on the fact that a query that hasn't been called between check runs has no change in metrics, so we can reuse the cached values.

Motivation
Additional Notes
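The reconstruction step described in the PR summary can be sketched as follows (a simplified illustration with hypothetical names, not the PR's exact `_apply_deltas` code): rows returned this run overwrite their cached baseline entries, and the merged dict yields the full row set to hand to the derivative computation.

```python
def apply_deltas(baseline_metrics, returned_rows):
    """Sketch: merge rows for recently-called queries into the cached
    baseline, then return the full reconstructed row set.

    baseline_metrics: dict of queryid -> metrics row (full snapshot)
    returned_rows: rows only for queries called since the previous run
    """
    for row in returned_rows:
        # A query that was called since the last run gets its fresh row.
        baseline_metrics[row["queryid"]] = row
    # Queries not called since the last run have unchanged cumulative
    # counters, so their cached rows are still valid as-is.
    return list(baseline_metrics.values())


baseline = {
    1: {"queryid": 1, "calls": 10},
    2: {"queryid": 2, "calls": 5},
}
# Only query 1 was called since the previous run.
full_rows = apply_deltas(baseline, [{"queryid": 1, "calls": 12}])
assert {r["queryid"]: r["calls"] for r in full_rows} == {1: 12, 2: 5}
```

Because `pg_stat_statements` counters are cumulative, reusing an untouched cached row is exactly equivalent to having re-read it, which is why the derivative computation downstream sees a coherent full snapshot.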
Review checklist (to be filled by reviewers)
- Add the `qa/skip-qa` label if the PR doesn't need to be tested during QA.
- Add a `backport/<branch-name>` label to the PR and it will automatically open a backport PR once this one is merged.