-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: optimize persistedsqlstats flush size check #110173
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 4 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @j82w)
pkg/sql/sqlstats/persistedsqlstats/cluster_settings.go
line 140 at r1 (raw file):
) // sqlStatsLimitTableCheckInterval is the cluster setting that controls the
this comment is not very clear, since it doesn't mention this is controlling the frequency of the check and not if we allow or not to pass the limit
pkg/sql/sqlstats/persistedsqlstats/cluster_settings.go
line 146 at r1 (raw file):
settings.TenantWritable, "sql.stats.limit_table_size.check_interval", "controls what interval the check is done on if the statement and "+
this sentence is a little confusing (especially this "on")
pkg/sql/sqlstats/persistedsqlstats/flush_test.go
line 505 at r1 (raw file):
}) // Set table size check interval to 1 second.
comment needs to be updated
pkg/sql/sqlstats/persistedsqlstats/flush_test.go
line 611 at r1 (raw file):
require.False(t, limitReached) // Set table size check interval to .00001 second. So the next check doesn't
comment needs to be updated
Problem: The `persistedsqlstats` size check to make sure the table is not 1.5x the max size is done on every flush which is done on every node every 10 minutes by default. This can cause serialization issues as it is over the entire table. The check is unnecessary most of the time, because it should only fail if the compaction job is failing. Solution: 1. Reduce the check interval to only be done once an hour by default, and make it configurable. 2. The system table is split in to 8 shards. Instead of checking the entire table count limit it to only one shard. This reduces the scope of the check and reduces the chance of serialization issues. This was preivously reverted because of a flakey test because the size check is only done on a single shard. The tests are updated to increase the limit and the number of statements to make sure every shard has data. Fixes: #109619 Release note (sql change): The persistedsqlstats table max size check is now done once an hour instead of every 10 minutes. This reduces the risk of serialization errors on the statistics tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @maryliag)
pkg/sql/sqlstats/persistedsqlstats/cluster_settings.go
line 140 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
this comment is not very clear, since it doesn't mention this is controlling the frequency of the check and not if we allow or not to pass the limit
Done.
pkg/sql/sqlstats/persistedsqlstats/cluster_settings.go
line 146 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
this sentence is a little confusing (especially this "on")
Done.
pkg/sql/sqlstats/persistedsqlstats/flush_test.go
line 505 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
comment needs to be updated
Done.
pkg/sql/sqlstats/persistedsqlstats/flush_test.go
line 611 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
comment needs to be updated
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @j82w)
bors r+ |
Build failed (retrying...): |
Build succeeded: |
Problem:
The
persistedsqlstats
size check to make sure the table is not 1.5x the max size is done on every flush which is done on every node every 10 minutes by default. This can cause serialization issues as it is over the entire table. The check is unnecessary most of the time, because it should only fail if the compaction job is failing.Solution:
This was preivously reverted because of a flakey test because the size check is only done on a single shard. The tests are updated to increase the limit and the number of statements to make sure every shard has data.
Fixes: #109619
Release note (sql change): The persistedsqlstats table max size check is now done once an hour instead of every 10 minutes. This reduces the risk of serialization errors on the statistics tables.