New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/sql/sqlstats: move failures out of stmt key/meta, into stats #120236
pkg/sql/sqlstats: move failures out of stmt key/meta, into stats #120236
Conversation
fef4fa3
to
3756c81
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to point out a few conditionals below that deserve a look, as we're switching away from the failed
bool in the stmt key, meaning conditionals that used the old boolean had to be reworked.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @koorosh, and @xinhaoz)
pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go
line 252 at r1 (raw file):
if stmtStats.mu.data.SensitiveInfo.LastErr == "" && statistics[i].Stats.SensitiveInfo.LastErr != "" { stmtStats.mu.data.SensitiveInfo.LastErr = statistics[i].Stats.SensitiveInfo.LastErr }
NB: Without key.Failed
, this seemed to me like an appropriate way to rework the conditional. PTAL.
Code quote:
// Setting all metadata fields.
if stmtStats.mu.data.SensitiveInfo.LastErr == "" && statistics[i].Stats.SensitiveInfo.LastErr != "" {
stmtStats.mu.data.SensitiveInfo.LastErr = statistics[i].Stats.SensitiveInfo.LastErr
}
pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go
line 446 at r1 (raw file):
if s.mu.data.SensitiveInfo.LastErr == "" && statistics.Stats.SensitiveInfo.LastErr != "" { s.mu.data.SensitiveInfo.LastErr = statistics.Stats.SensitiveInfo.LastErr }
Same as above, LMK if anyone disagrees with this reworked conditional.
Code quote:
// Setting all metadata fields.
if s.mu.data.SensitiveInfo.LastErr == "" && statistics.Stats.SensitiveInfo.LastErr != "" {
s.mu.data.SensitiveInfo.LastErr = statistics.Stats.SensitiveInfo.LastErr
}
pkg/server/application_api/sql_stats_test.go
line 453 at r1 (raw file):
var statementsInResponse []string for _, respStatement := range resp.Statements { if respStatement.Stats.FailureCount > 0 {
In this instance, instead of looking at Failed
in the key, we replace with a check against the failure count.
Code quote:
if respStatement.Stats.FailureCount > 0 {
pkg/server/application_api/sql_stats_test.go
line 799 at r1 (raw file):
// to avoid counting such failures. If a statement that we expect to be // successful is not found in the response, the test will fail later. if respStatement.Key.KeyData.App == testAppName && respStatement.Stats.FailureCount == 0 {
Ditto as above - we use failure count instead of the old Failed
bool from the key as a replacement.
Code quote:
if respStatement.Key.KeyData.App == testAppName && respStatement.Stats.FailureCount == 0 {
Ah... A bunch of tests need to be updated because the fingerprint hashes have changed with the removal of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, and @xinhaoz)
pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go
line 252 at r1 (raw file):
Previously, abarganier (Alex Barganier) wrote…
NB: Without
key.Failed
, this seemed to me like an appropriate way to rework the conditional. PTAL.
I would omit this check && statistics[i].Stats.SensitiveInfo.LastErr != ""
for simplicity reasons. In case statistics[i].Stats.SensitiveInfo.LastErr
is empty then it will reassign empty string with another empty string and preserve the same logic.
pkg/server/application_api/sql_stats_test.go
line 799 at r1 (raw file):
Previously, abarganier (Alex Barganier) wrote…
Ditto as above - we use failure count instead of the old
Failed
bool from the key as a replacement.
Curious, if it makes sense to have .Stats.HasFailures
bool stat?
afbf760
to
997fdf4
Compare
997fdf4
to
893bab1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, the majority of tests have been cleaned up! RFAL.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @koorosh, and @xinhaoz)
pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go
line 252 at r1 (raw file):
Previously, koorosh (Andrii Vorobiov) wrote…
I would omit this check
&& statistics[i].Stats.SensitiveInfo.LastErr != ""
for simplicity reasons. In casestatistics[i].Stats.SensitiveInfo.LastErr
is empty then it will reassign empty string with another empty string and preserve the same logic.
Good idea! Done.
pkg/server/application_api/sql_stats_test.go
line 799 at r1 (raw file):
Previously, koorosh (Andrii Vorobiov) wrote…
Curious, if it makes sense to have
.Stats.HasFailures
bool stat?
I think it would be duplicate information that's already represented by Stats.FailureCount
, no?
893bab1
to
de73f8e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, and @xinhaoz)
pkg/server/application_api/sql_stats_test.go
line 799 at r1 (raw file):
Previously, abarganier (Alex Barganier) wrote…
I think it would be duplicate information that's already represented by
Stats.FailureCount
, no?
You're right, my initial thoughts were to avoid checks like this and above with dedicated prop, but it definitely duplicates logic.
de73f8e
to
aa18f90
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work on this. lots of details to manage. Might be worth keeping as a single commit to keep it all in sync.
Reviewed 23 of 27 files at r1, 4 of 4 files at r2, 13 of 13 files at r5, 16 of 16 files at r6, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @xinhaoz)
pkg/sql/crdb_internal.go
line 1811 at r2 (raw file):
latency_seconds_p90 FLOAT, latency_seconds_p99 FLOAT, failure_count INT NOT NULL
nit: trailing whitespace
pkg/sql/logictest/testdata/logic_test/crdb_internal
line 845 at r6 (raw file):
# - one for the DROP SEQUENCE statement. # # We expect the SELECT entry to have max_retries > 0 because auto-retries
nit: there are two select entires below.
In the SQL stats feature, we consider whether the statement failed as part of the statement key/statement fingerprint ID, which is used to bucket statement statistics aggregations. This contributes to the cardinality of statement statistics by a constant factor of 2x. A statement that has failed will show up twice in the SQL Activity page, once for the successful executions, and once for the failed executions. Despite not being very useful, the contributions it makes to cardinality are detrimental to the SQL activity feature, as it has a non-negligible impact on data volume. To improve upon this, we should remove the failure bool from the statement key, and instead include failure counts as part of the statistics object. This patch does the following: * Removes the failure boolean from the statement key * Removes failure counts from the statement statistics metadata * Moves the failure counts into the aggrgeated statement statistics * Updates the Statement Details page to display "Failure Count" instead of "Failed?", with the associated numeric count instead of a bool val. Addresses: cockroachdb#97176 Release note (ui change): In the DB Console's SQL Activity page, the Statement Details page will now show "Failure Count" in place of "Failed?", showing the number of failed executions for the given statement fingerprint instead of a "Yes"/"No" value. Furthermore, previously in the SQL Activity table the same statement fingeprint would appear twice - once for failed executions, and once for successful executions. With these changes, we no longer consider these to be separate rows. Instead, we combine them into a single statement fingerprint. Release note (sql change): A new column has been added to `crdb_internal.node_statement_statistics`, `failure_count INT NOT NULL`. It represents the number of recorded statement execution failures for the given statement, as a new component of the overall statistics.
aa18f90
to
a44e43c
Compare
TFTR! bors r=dhartunian |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, forgot to submit my replies to your comments before I bors
'd 😅
Good call squashing the commits, I went ahead and did so.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dhartunian and @xinhaoz)
pkg/sql/crdb_internal.go
line 1811 at r2 (raw file):
Previously, dhartunian (David Hartunian) wrote…
nit: trailing whitespace
Done.
pkg/sql/logictest/testdata/logic_test/crdb_internal
line 845 at r6 (raw file):
Previously, dhartunian (David Hartunian) wrote…
nit: there are two select entires below.
Thanks for the catch. Fixed.
Build succeeded: |
In cockroachdb#120236, we removed the `failed` metadata from the `crdb_internal.statement_statistics` table. Fixes cockroachdb#120586. Fixes cockroachdb#120587. Fixes cockroachdb#120588. Fixes cockroachdb#120589. Release note: None
120857: compose: remove PG ComposeCompare test r=rafiss a=rafiss This test does not provide us much value and is too flaky to be useful. Most of the time it fails are due to minor differences in things like names, formatting, or precision, and accommodating each of these differences is not worth it. fixes #109400 fixes #116150 fixes #112154 Release note: None 121052: base,testutils,kvserver: disable lease queue in replication manual r=arulajmani a=kvoli A replica lease queue was introduced in #119155, which processes lease transfers for replicas. Previously, the replicate queue handled lease transfers. The linked change omitted updating the `ReplicationManual` test cluster mode to disable the lease queue, resulting in unexpected lease transfers in some tests. Disable the lease queue under `ReplicationManual` replication mode. Misc tests are also updated to disable/enable the lease queue appropriately. Epic: None Touches: #120605 Release note: None 121120: roachtest: deflake admission-control/multitenant-fairness r=sumeerbhola a=aadityasondhi In #120236, we removed the `failed` metadata from the `crdb_internal.statement_statistics` table. Fixes #120586. Fixes #120587. Fixes #120588. Fixes #120589. Release note: None 121132: batcheval: move test to `large` RBE pool r=rail a=rickystewart Epic: CRDB-8308 Release note: None Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com> Co-authored-by: Austen McClernon <austen@cockroachlabs.com> Co-authored-by: Aaditya Sondhi <20070511+aadityasondhi@users.noreply.github.com> Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Addresses: #97176
Epic: CRDB-24527
In the SQL stats feature, we consider whether the statement failed as part of the statement key/statement fingerprint ID, which is used to bucket statement statistics aggregations.
This contributes to the cardinality of statement statistics by a constant factor of 2x. A statement that has failed will show up twice in the SQL Activity page, once for the successful executions, and once for the failed executions. Despite not being very useful, the contributions it makes to cardinality are detrimental to the SQL activity feature, as it has a non-negligible impact on data volume.
To improve upon this, we should remove the failure bool from the statement key, and instead include failure counts as part of the statistics object.
This patch does the following:
Release note (ui change): In the DB Console's SQL Activity page, the Statement Details page will now show "Failure Count" in place of "Failed?", showing the number of failed executions for the given statement fingerprint instead of a "Yes"/"No" value. Furthermore, previously in the SQL Activity table the same statement fingeprint would appear twice - once for failed executions, and once for successful executions. With these changes, we no longer consider these to be separate rows. Instead, we combine them into a single statement fingerprint.
Release note (sql change): A new column has been added to
crdb_internal.node_statement_statistics
,failure_count INT NOT NULL
. It represents the number of recorded statement execution failures for the given statement, as a new component of the overall statistics.