sql: optimize persistedsqlstats flush size check #110173

j82w · 2023-09-07T13:43:41Z

Problem:
The persistedsqlstats size check, which verifies that the table has not grown past 1.5x the max size, runs on every flush, and by default every node flushes every 10 minutes. Because the check counts rows across the entire table, it can cause serialization issues. The check is unnecessary most of the time, because it should only fail if the compaction job is failing.

Solution:

1. Reduce the check frequency to once an hour by default, and make the interval configurable.
2. The system table is split into 8 shards. Instead of counting rows across the entire table, limit the check to a single shard. This narrows the scope of the check and reduces the chance of serialization issues.

This was previously reverted because of a flaky test: the size check is only done on a single shard. The tests are updated to increase the limit and the number of statements to make sure every shard has data.
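
To illustrate why a single-shard check can make a test flaky when only a few statements are written, here is a small sketch. The placement rule (`fingerprint % 8`) and the function names are assumptions for illustration only, not how the system table actually assigns rows to shards.

```go
package main

import "fmt"

const shardCount = 8

// shardCounts tallies rows per shard under the hypothetical placement
// rule shard = fingerprint % shardCount.
func shardCounts(fingerprints []uint64) [shardCount]int {
	var counts [shardCount]int
	for _, fp := range fingerprints {
		counts[fp%shardCount]++
	}
	return counts
}

// emptyShards returns how many shards a single-shard check could land on
// and find no rows at all.
func emptyShards(counts [shardCount]int) int {
	empty := 0
	for _, n := range counts {
		if n == 0 {
			empty++
		}
	}
	return empty
}

func main() {
	// Three statements that all happen to fall into the same shard.
	few := shardCounts([]uint64{1, 9, 17})
	// Enough statements to put at least one row in every shard.
	many := shardCounts([]uint64{0, 1, 2, 3, 4, 5, 6, 7, 8, 9})

	fmt.Println("few statements, empty shards:", emptyShards(few))
	fmt.Println("many statements, empty shards:", emptyShards(many))
}
```

With only a few statements, most randomly chosen shards are empty, so a test that expects the limit to trip can pass or fail depending on which shard is sampled. Writing enough statements to populate every shard removes that dependence.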

Fixes: #109619

Release note (sql change): The persistedsqlstats table max size check is now done once an hour instead of every 10 minutes. This reduces the risk of serialization errors on the statistics tables.

cockroach-teamcity · 2023-09-07T13:43:53Z

This change is

maryliag

Reviewed 3 of 4 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @j82w)

pkg/sql/sqlstats/persistedsqlstats/cluster_settings.go line 140 at r1 (raw file):

)

// sqlStatsLimitTableCheckInterval is the cluster setting that controls the

this comment is not very clear, since it doesn't mention this is controlling the frequency of the check and not if we allow or not to pass the limit

pkg/sql/sqlstats/persistedsqlstats/cluster_settings.go line 146 at r1 (raw file):

	settings.TenantWritable,
	"sql.stats.limit_table_size.check_interval",
	"controls what interval the check is done on if the statement and "+

this sentence is a little confusing (especially this "on")

pkg/sql/sqlstats/persistedsqlstats/flush_test.go line 505 at r1 (raw file):

	})

	// Set table size check interval to 1 second.

comment needs to be updated

pkg/sql/sqlstats/persistedsqlstats/flush_test.go line 611 at r1 (raw file):

	require.False(t, limitReached)

	// Set table size check interval to .00001 second. So the next check doesn't

comment needs to be updated

j82w

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @maryliag)

pkg/sql/sqlstats/persistedsqlstats/cluster_settings.go line 140 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

this comment is not very clear, since it doesn't mention this is controlling the frequency of the check and not if we allow or not to pass the limit

Done.

pkg/sql/sqlstats/persistedsqlstats/cluster_settings.go line 146 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

this sentence is a little confusing (especially this "on")

Done.

pkg/sql/sqlstats/persistedsqlstats/flush_test.go line 505 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

comment needs to be updated

Done.

pkg/sql/sqlstats/persistedsqlstats/flush_test.go line 611 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

comment needs to be updated

Done.

maryliag

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @j82w)

j82w · 2023-09-12T15:56:18Z

bors r+

craig · 2023-09-12T16:44:07Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

craig · 2023-09-12T18:31:58Z

Build succeeded:

Bazel Essential CI (Cockroach)

j82w requested a review from a team September 7, 2023 13:43

maryliag reviewed Sep 11, 2023

j82w commented Sep 12, 2023

maryliag approved these changes Sep 12, 2023

craig bot merged commit 40dd180 into cockroachdb:master Sep 12, 2023
3 checks passed

cockroach-teamcity mentioned this pull request Sep 13, 2023

PR #110173 - sql: optimize persistedsqlstats flush size check cockroachdb/docs#17877

Closed
