Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'DISCARD ALL' performance regression on v22.2 #95864

Closed
JeffSwenson opened this issue Jan 25, 2023 · 4 comments · Fixed by #95876
Closed

'DISCARD ALL' performance regression on v22.2 #95864

JeffSwenson opened this issue Jan 25, 2023 · 4 comments · Fixed by #95876
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@JeffSwenson
Copy link
Collaborator

JeffSwenson commented Jan 25, 2023

Describe the problem

DISCARD ALL is more expensive to execute on v22.2.2 when compared to v22.1.11. Some Serverless clients appear to run DISCARD ALL when a connection is returned to the connection pool. The extra overhead causes a noticeable load increase in lightly loaded clusters that are serving mostly health checks. It also increases the overhead of each operation using the pool. See https://cockroachlabs.atlassian.net/browse/CC-9132 for context.

To Reproduce

Connect to a v22.2 cluster and run SHOW DISCARD. The execution will take 5-30ms. On v22.1, it consistently took <1 ms.

I ran SHOW DISCARD with tracing enabled on v22.2.2 and v22.1.11.

# v22.2.2
jeff@sprobe-3284.6h5c.crdb.io:26257/sprobe> SHOW CLUSTER SETTING version;
  version
-----------
  22.2
(1 row)


Time: 42ms total (execution 3ms / network 39ms)

jeff@sprobe-3284.6h5c.crdb.io:26257/sprobe> SET tracing = 'on'
                                         -> ;
SET TRACING


Time: 37ms total (execution 0ms / network 37ms)

jeff@sprobe-3284.6h5c.crdb.io:26257/sprobe> DISCARD ALL;
NOTICE: datestyle_enabled no longer has any effect
NOTICE: experimental_enable_hash_sharded_indexes no longer has any effect
NOTICE: intervalstyle_enabled no longer has any effect
NOTICE: enable_drop_enum_value no longer has any effect
DISCARD ALL


Time: 54ms total (execution 14ms / network 39ms)

jeff@sprobe-3284.6h5c.crdb.io:26257/sprobe> SHOW COMPACT KV TRACE FOR SESSION;
        age       |                                                                                                                 message                                                                                                                 |                                               tag                                                |    operation
------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------
  00:00:00.118497 | sql/conn_executor_exec.go:697 rows affected: 1                                                                                                                                                                                          | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff]                                 | sql query
  00:00:02.553212 | sql/row/kv_batch_fetcher.go:406 Scan /Tenant/3284/Table/4/1/"jeff"{-/PrefixEnd}                                                                                                                                                         | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,intExec=read-users]              | colbatchscan
  00:00:02.553337 | kv/kvclient/kvcoord/range_iter.go:183 querying next range at /Tenant/3284/Table/4/1/"jeff"                                                                                                                                              | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,intExec=read-users,txn=264e72ec] | dist sender send
  00:00:02.553387 | kv/kvclient/kvcoord/range_iter.go:216 key: /Tenant/3284/Table/4/1/"jeff", desc: r10290:/Tenant/3284{-/Table/106} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=3361, sticky=1669765149.511403611,0]                                     | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,intExec=read-users,txn=264e72ec] | dist sender send
  00:00:02.553423 | kv/kvclient/kvcoord/dist_sender.go:2046 r10290: sending batch 1 Scan to (n1,s1):1                                                                                                                                                       | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,intExec=read-users,txn=264e72ec] | dist sender send
  00:00:02.555214 | sql/conn_executor_exec.go:697 rows affected: 1                                                                                                                                                                                          | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,intExec=read-users]              | sql query
  00:00:02.555896 | kv/kvclient/kvcoord/range_iter.go:183 querying next range at /Tenant/3284/Table/30/1/0/0                                                                                                                                                | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.555949 | kv/kvclient/kvcoord/range_iter.go:216 key: /Tenant/3284/Table/30/1/0/0, desc: r10290:/Tenant/3284{-/Table/106} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=3361, sticky=1669765149.511403611,0]                                       | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.55598  | kv/kvclient/kvcoord/dist_sender.go:2046 r10290: sending batch 1 Scan to (n1,s1):1                                                                                                                                                       | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.557178 | kv/kvclient/kvcoord/range_iter.go:183 querying next range at /Tenant/3284/Table/3/1/100/2/1                                                                                                                                             | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.557221 | kv/kvclient/kvcoord/range_iter.go:216 key: /Tenant/3284/Table/3/1/100/2/1, desc: r10290:/Tenant/3284{-/Table/106} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=3361, sticky=1669765149.511403611,0]                                    | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.557244 | kv/kvclient/kvcoord/dist_sender.go:2046 r10290: sending batch 4 Get to (n1,s1):1                                                                                                                                                        | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.558796 | kv/kvclient/kvcoord/range_iter.go:183 querying next range at /Tenant/3284/Table/3/1/101/2/1                                                                                                                                             | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.55884  | kv/kvclient/kvcoord/range_iter.go:216 key: /Tenant/3284/Table/3/1/101/2/1, desc: r10290:/Tenant/3284{-/Table/106} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=3361, sticky=1669765149.511403611,0]                                    | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.558863 | kv/kvclient/kvcoord/dist_sender.go:2046 r10290: sending batch 4 Get to (n1,s1):1                                                                                                                                                        | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.560559 | kv/kvclient/kvcoord/range_iter.go:183 querying next range at /Tenant/3284/Table/30/1/100/0/"pg_temp_1674680085229196954_2"/4/1                                                                                                          | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.560625 | kv/kvclient/kvcoord/range_iter.go:216 key: /Tenant/3284/Table/30/1/100/0/"pg_temp_1674680085229196954_2"/4/1, desc: r10290:/Tenant/3284{-/Table/106} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=3361, sticky=1669765149.511403611,0] | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.560646 | kv/kvclient/kvcoord/dist_sender.go:2046 r10290: sending batch 1 Get to (n1,s1):1                                                                                                                                                        | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.561551 | kv/kvclient/kvcoord/range_iter.go:183 querying next range at /Tenant/3284/Table/30/1/102/0/"pg_temp_1674680085229196954_2"/4/1                                                                                                          | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.561593 | kv/kvclient/kvcoord/range_iter.go:216 key: /Tenant/3284/Table/30/1/102/0/"pg_temp_1674680085229196954_2"/4/1, desc: r10290:/Tenant/3284{-/Table/106} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=3361, sticky=1669765149.511403611,0] | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.561613 | kv/kvclient/kvcoord/dist_sender.go:2046 r10290: sending batch 1 Get to (n1,s1):1                                                                                                                                                        | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.56246  | kv/kvclient/kvcoord/range_iter.go:183 querying next range at /Tenant/3284/Table/30/1/124/0/"pg_temp_1674680085229196954_2"/4/1                                                                                                          | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.562501 | kv/kvclient/kvcoord/range_iter.go:216 key: /Tenant/3284/Table/30/1/124/0/"pg_temp_1674680085229196954_2"/4/1, desc: r10290:/Tenant/3284{-/Table/106} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=3361, sticky=1669765149.511403611,0] | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.562521 | kv/kvclient/kvcoord/dist_sender.go:2046 r10290: sending batch 1 Get to (n1,s1):1                                                                                                                                                        | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.563434 | kv/kvclient/kvcoord/range_iter.go:183 querying next range at /Tenant/3284/Table/30/1/104/0/"pg_temp_1674680085229196954_2"/4/1                                                                                                          | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.563481 | kv/kvclient/kvcoord/range_iter.go:216 key: /Tenant/3284/Table/30/1/104/0/"pg_temp_1674680085229196954_2"/4/1, desc: r10290:/Tenant/3284{-/Table/106} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=3361, sticky=1669765149.511403611,0] | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.563505 | kv/kvclient/kvcoord/dist_sender.go:2046 r10290: sending batch 1 Get to (n1,s1):1                                                                                                                                                        | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.564333 | kv/kvclient/kvcoord/range_iter.go:183 querying next range at /Tenant/3284/Table/30/1/1/0/"pg_temp_1674680085229196954_2"/4/1                                                                                                            | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.564374 | kv/kvclient/kvcoord/range_iter.go:216 key: /Tenant/3284/Table/30/1/1/0/"pg_temp_1674680085229196954_2"/4/1, desc: r10290:/Tenant/3284{-/Table/106} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=3361, sticky=1669765149.511403611,0]   | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.564394 | kv/kvclient/kvcoord/dist_sender.go:2046 r10290: sending batch 1 Get to (n1,s1):1                                                                                                                                                        | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff,txn=264e72ec]                    | dist sender send
  00:00:02.565443 | sql/conn_executor_exec.go:697 rows affected: 0                                                                                                                                                                                          | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff]                                 | sql query
  00:00:02.720143 | sql/conn_executor_exec.go:697 rows affected: 1                                                                                                                                                                                          | [nsql2,peer=10.0.3.57:55876,client=71.190.30.48:51536,user=jeff]                                 | sql query
# v22.1.11
jeff@long-herder-3.jeffsw-883z.crdb.io:26257/defaultdb> SHOW CLUSTER SETTING version;
  version
-----------
  22.1
(1 row)


Time: 39ms total (execution 2ms / network 37ms)

jeff@long-herder-3.jeffsw-883z.crdb.io:26257/defaultdb> SET tracing = 'on';
SET TRACING


Time: 37ms total (execution 0ms / network 37ms)

jeff@long-herder-3.jeffsw-883z.crdb.io:26257/defaultdb> DISCARD ALL;
DISCARD


Time: 40ms total (execution 1ms / network 39ms)

jeff@long-herder-3.jeffsw-883z.crdb.io:26257/defaultdb> SHOW COMPACT KV TRACE FOR SESSION;
        age       |                    message                     |                               tag                                | operation
------------------+------------------------------------------------+------------------------------------------------------------------+------------
  00:00:00.107343 | sql/conn_executor_exec.go:747 rows affected: 1 | [nsql1,peer=10.0.2.13:54784,client=71.190.30.48:52433,user=jeff] | sql query
  00:00:04.403253 | sql/conn_executor_exec.go:747 rows affected: 0 | [nsql1,peer=10.0.2.13:54784,client=71.190.30.48:52433,user=jeff] | sql query
  00:00:04.521179 | sql/conn_executor_exec.go:747 rows affected: 1 | [nsql1,peer=10.0.2.13:54784,client=71.190.30.48:52433,user=jeff] | sql query
(3 rows)

Jira issue: CRDB-23779

@JeffSwenson JeffSwenson added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Jan 25, 2023
@ajwerner
Copy link
Contributor

The root cause is #86246. This code should consult in-memory data structures to determine what work there is to do and then should act accordingly.

@JeffSwenson
Copy link
Collaborator Author

Running DISCARD ALL after a connection is returned appears to be the default behavior for pg bouncer: https://www.pgbouncer.org/config.html#connection-sanity-checks-timeouts

craig bot pushed a commit that referenced this issue Jan 26, 2023
94153: sql: Remove type annotations from `column_default` in `information_schema.columns` r=ZhouXing19 a=e-mbrown

Informs: #87774

The type annotation on `column_default` caused confusion in postgres compatible apps. This change formats the `column_default` without the type annotation. It does not address the incorrect type annotation.

Release note (bug fix): The content of `column_default` in `information_schema.columns` no longer have type annotations.

95855: backupccl: run restore/tpce/32tb/aws/nodes=15/cpus=16 once a week r=benbardin a=msbutler

This patch adds the restore/tpce/32TB/aws/nodes=15/cpus=16 test to our weekly
test suite. The backup fixture was generated by initing a tpce cluster with 2
Million customers, and running this workload while 48 incremental backups were
taken at 15 minute increments. The roachtest restores from a system time
captured in the 24th backup.

The cluster used to restore the backup will run on aws with 15 nodes each with
16 vcpus.

To verify this test exists, run:
`roachtest list tag:weekly`

Release note: None

Epic: none

95861: kv: remove lastToReplica and lastFromReplica from raftMu r=nvanbenschoten a=nvanbenschoten

Extracted from #94165 without modification.

----

In 410ef29, we moved these fields from under the Replica.mu to under the Replica.raftMu. This was done to avoid lock contention.

In this commit, we move these fields under their own mutex so that they can be accessed without holding the raftMu. This allows us to send Raft messages from other goroutines.

The commit also switches from calling RawNode.ReportUnreachable directly in sendRaftMessage to using the more flexible unreachablesMu set, which defers the call to ReportUnreachable until the next Raft tick. As a result, the commit closes #84246.

Release note: None
Epic: None

95876: sql: remove round-trips from common DISCARD ALL r=ajwerner a=ajwerner

#### sql: avoid checking if a role exists when setting to the current role

This is an uncached round-trip. It makes `DISCARD` expensive.
In #86485 we implemented
`SET SESSION AUTHORIZATION DEFAULT`, this added an extra sql query to
`DISCARD ALL` to check if the role we'd become exists. This is almost
certainly unintentional in the case where the role we'd become is the
current session role. I think this just happened because of code
consolidation.

We now no longer check if the current session role exists. This removes
the last round-trip from DISCARD ALL.

#### sql: use in-memory session data to decide what to do in DISCARD

In #86246 we introduced logic to discard schemas when running DISCARD ALL and DISCARD TEMP. This logic did expensive kv operations unconditionally; if the session knew it had never created a temporary schema, we'd still fetch all databases and proceed to search all databases for a temp schema. This is very expensive.

Fixes: #95864

Release note (performance improvement): In 22.2, logic was added to make
`SET SESSION AUTHORIZATION DEFAULT` not a no-op. This implementation used
more general code for setting the role for a session which made sure that
the role exists. The check for whether a role exists is currently uncached.
We don't need to check if the role we already are exists. This improves the
performance of `DISCARD ALL` in addition to `SET SESSION AUTHORIZATION
DEFAULT`.


Release note (performance improvement): In 22.2, we introduced support for `DISCARD TEMP` and made `DISCARD ALL` actually discard temp tables. This implementation ran expensive logic to discover temp schemas rather than consulting in-memory data structures. As a result, `DISCARD ALL`, which is issued regularly by connection pools, became an expensive operation when it should be cheap. This problem is now resolved.

95997: ui: hide list of statement for non-admins r=maryliag a=maryliag

To get the list of fingerprints used by an index, we use the `system.statement_statistics` directly, but non-admins don't have access to system table.
Until #95756 or #95770 are done, we need to hide
this feature for non-admins.

Part Of #93087

Release note (ui change): Hide list of used fingerprint per index on Index Details page, for non-admin users.

95998: roachtest: update activerecord blocklist r=ZhouXing19 a=andyyang890

This patch removes a few tests from the blocklist that do not
exist in the test suite for rails 7.0.3, which were being run
prior to the fix for version pinning.

Informs #94211

Release note: None

96003: kv: add MVCC local timestamp details to ReadWithinUncertaintyIntervalError r=nvanbenschoten a=nvanbenschoten

This PR adds details about MVCC local timestamps to ReadWithinUncertaintyIntervalErrors. This provides more information about the cause of uncertainty errors.

The PR also includes a series of minor refactors to clean up this code.

Release note: None
Epic: None

Co-authored-by: e-mbrown <ebsonari@gmail.com>
Co-authored-by: Michael Butler <butler@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Andrew Werner <awerner32@gmail.com>
Co-authored-by: maryliag <marylia@cockroachlabs.com>
Co-authored-by: Andy Yang <yang@cockroachlabs.com>
@craig craig bot closed this as completed in 73befd9 Jan 26, 2023
SQL Sessions - Deprecated automation moved this from Triage to Done Jan 26, 2023
@ajwerner ajwerner reopened this Jan 26, 2023
SQL Sessions - Deprecated automation moved this from Done to Triage Jan 26, 2023
@ajwerner
Copy link
Contributor

Re-opening until the backport merges.

@ajwerner ajwerner self-assigned this Jan 26, 2023
ajwerner added a commit to ajwerner/cockroach that referenced this issue Jan 27, 2023
In cockroachdb#86246 we introduced logic to discard schemas when running DISCARD ALL and
DISCARD TEMP. This logic did expensive kv operations unconditionally; if the
session knew it had never created a temporary schema, we'd still fetch all
databases and proceed to search all databases for a temp schema. This is very
expensive.

Fixes: cockroachdb#95864

Release note (performance improvement): In 22.2, we introduced support for
`DISCARD TEMP` and made `DISCARD ALL` actually discard temp tables. This
implementation ran expensive logic to discover temp schemas rather than
consulting in-memory data structures. As a result, `DISCARD ALL`, which
is issued regularly by connection pools, became an expensive operation
when it should be cheap. This problem is now resolved.
@ajwerner
Copy link
Contributor

Fixed by #96102.

SQL Sessions - Deprecated automation moved this from Triage to Done Jan 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

2 participants