sql: optimize query for find-running-jobs-of-type #107405

mgartner · 2023-07-21T22:43:10Z

The RunningJobExists function issues this query to check for running auto-stats jobs:

SELECT id FROM public.jobs
WHERE job_type IN ('CREATE STATS', 'AUTO CREATE STATS')
  AND status IN ('running', 'pending', 'cancel-requested', 'pause-requested', 'reverting', 'paused')
ORDER BY created
LIMIT 1;

The query plan includes an inefficient index join:

  planning time: 750µs
  execution time: 617ms
  distribution: local
  vectorized: true
  rows read from KV: 101,520 (9.9 MiB, 25 gRPC calls)
  cumulative time spent in KV: 614ms
  maximum memory usage: 60 MiB
  network usage: 0 B (0 messages)
  sql cpu time: 84ms
  regions: aws-us-east-1

  • top-k
  │ nodes: n6
  │ regions: aws-us-east-1
  │ actual row count: 0
  │ estimated max memory allocated: 0 B
  │ estimated max sql temp disk usage: 0 B
  │ sql cpu time: 18µs
  │ order: +created
  │ k: 1
  │
  └── • filter
      │ nodes: n6
      │ regions: aws-us-east-1
      │ actual row count: 0
      │ sql cpu time: 2ms
      │ filter: status IN ('cancel-requested', 'pause-requested', 'paused', 'pending', 'reverting', 'running')
      │
      └── • index join
          │ nodes: n6
          │ regions: aws-us-east-1
          │ actual row count: 50,760
          │ KV time: 413ms
          │ KV contention time: 0µs
          │ KV rows read: 50,760
          │ KV bytes read: 7.3 MiB
          │ KV gRPC calls: 24
          │ estimated max memory allocated: 57 MiB
          │ estimated max sql temp disk usage: 0 B
          │ sql cpu time: 72ms
          │ table: jobs@primary
          │
          └── • scan
                nodes: n6
                regions: aws-us-east-1
                actual row count: 50,760
                KV time: 201ms
                KV contention time: 0µs
                KV rows read: 50,760
                KV bytes read: 2.7 MiB
                KV gRPC calls: 1
                estimated max memory allocated: 2.7 MiB
                sql cpu time: 11ms
                missing stats
                table: jobs@jobs_job_type_idx
                spans: [/'AUTO CREATE STATS' - /'AUTO CREATE STATS'] [/'CREATE STATS' - /'CREATE STATS']

This can cause excessive resource utilization when there are many jobs in the jobs table and the query is run frequently. There's a few potential ways we can make this more efficient:

Omit ORDER BY created from the query plan. This should make the optimizer prefer a full table scan, which, as long as the table isn't HUGE, should be faster than the plan with the index join.
Add an index on (job_type, status) and omit ORDER BY created from the query.
Add an index on (job_type, status) STORING (created) to avoid the index join.

Jira issue: CRDB-30024

The text was updated successfully, but these errors were encountered:

mgartner · 2023-07-21T22:46:16Z

Removing the ORDER BY created may not be possible due to this: https://github.com/cockroachdb/cockroach/blob/92b05ab40ecf023b9809593e65f0f20302d1a82e/pkg/jobs/utils.go#L86C1-L90.

But it might be ok to do so when ignoreJobID is jobspb.InvalidJobID.

This commit improves the query plan for the `find-running-jobs-of-type` query by removing an `ORDER BY` clause if a job ID to ignore was not given. This can eliminate an index join from the query plan in some cases, making the query play more efficient. Informs cockroachdb#107405 Release note: None

This commit fixes an oversight in cockroachdb#102637 which intended to enable stats collection on the jobs table, but was not successful. I've manually confirmed that stats are now collected on the jobs table in a local cluster: ``` 888074673664065537 | AUTO CREATE STATS | Table statistics refresh for system.public.jobs | CREATE STATISTICS __auto__ FROM [15] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s' | root | succeeded | NULL | 2023-08-03 19:01:22.343 ``` Informs cockroachdb#107405 Release note (performance improvement): We now automatically collect table statistics on the system.jobs table, which will enable the optimizer to produce better query plans for internal queries that access the system.jobs table. This may result in better performance of the system. Note: a previous attempt to enable stats on system.jobs starting in 23.1.0 was unsuccessful, but we have now fixed the oversight.

105477: sql: add plan gist matching to stmt diagnostics feature r=yuzefovich a=yuzefovich This commit extends the stmt diagnostics feature to add optional plan-gist-based matching. Previously, we filtered statements based only on the fingerprint but now we can optionally ask for a particular plan (by specifying the target plan gist). All other aspects of the feature (minimum execution latency, sampling probability) are unaffected. The caveat to the implementation is that the plan gist of the running statement is available after the optimizer has done its part, so whenever plan-gist-based matching is desired, the trace will not include the optimizer part as well as the plan string won't be available. This commit also made a minor change to always store the memo and the opt planning catalog in `planTop`. Previously, this was stored only when the bundle collection is enabled, but we now can enable it after the optimizer, at which point the memo and the catalog might be lost. The optimizer now stores it unconditionally, but then if we choose to not collect the bundle once the plan gist is available, we release these things. This allows us to still get `opt` files in the bundle. Epic: None Addresses: #96765. Addresses: #103018. Release note (sql change): Statement diagnostics feature has been extended to support collecting a bundle for a particular plan. Namely, the existing fingerprint-based matching has been extended to also include plan-gist-based matching. Such bundle will miss a couple of things: `plan.txt` file as well as the tracing of the optimizer. At the moment, the feature is only exposed via an overload to `crdb_internal.request_statement_bundle` builtin function. We now also support "anti-match" - i.e. collecting a bundle for any plan other than the provided plan gist. 108139: sql: fix logic to collect stats on system.jobs r=rytaft a=rytaft This commit fixes an oversight in #102637 which intended to enable stats collection on the jobs table, but was not successful. I've manually confirmed that stats are now collected on the jobs table in a local cluster: ``` 888074673664065537 | AUTO CREATE STATS | Table statistics refresh for system.public.jobs | CREATE STATISTICS __auto__ FROM [15] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s' | root | succeeded | NULL | 2023-08-03 19:01:22.343 ``` Informs #107405 Release note (performance improvement): We now automatically collect table statistics on the `system.jobs` table, which will enable the optimizer to produce better query plans for internal queries that access the `system.jobs` table. This may result in better performance of the system. Note: a previous attempt to enable stats on `system.jobs` starting in 23.1.0 was unsuccessful, but we have now fixed the oversight. Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>

This commit fixes an oversight in cockroachdb#102637 which intended to enable stats collection on the jobs table, but was not successful. I've manually confirmed that stats are now collected on the jobs table in a local cluster: ``` 888074673664065537 | AUTO CREATE STATS | Table statistics refresh for system.public.jobs | CREATE STATISTICS __auto__ FROM [15] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s' | root | succeeded | NULL | 2023-08-03 19:01:22.343 ``` Informs cockroachdb#107405 Release note (performance improvement): We now automatically collect table statistics on the system.jobs table, which will enable the optimizer to produce better query plans for internal queries that access the system.jobs table. This may result in better performance of the system. Note: a previous attempt to enable stats on system.jobs starting in 23.1.0 was unsuccessful, but we have now fixed the oversight.

mgartner · 2023-09-18T20:21:26Z

I'm applying the release-blocker label so this won't get lost. I really think we need to do something here before the release.

blathers-crl · 2023-09-18T20:21:39Z

Hi @mgartner, please add branch-* labels to identify which branch(es) this release-blocker affects.

_{🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.}

michae2 · 2023-10-05T21:05:28Z

[triage] we discussed during our team meeting today, plan is to merge this PR with an additional query hint, and then make the index change in 24.1.

This commit improves the query plan for the `find-running-jobs-of-type` query by adding a hint to use the `jobs_status_created_idx` and by removing an `ORDER BY` clause if a job ID to ignore was not given. This can eliminate an index join from the query plan in some cases, making the query play more efficient. Informs cockroachdb#107405 Release note: None

107589: sql: improve query plan for find-running-job-of-type query r=mgartner a=mgartner This commit improves the query plan for the `find-running-jobs-of-type` query by adding a hint to use the `jobs_status_created_idx` and by removing an `ORDER BY` clause if a job ID to ignore was not given. This can eliminate an index join from the query plan in some cases, making the query plan more efficient. Informs #107405 Release note: None 111905: streamingccl: physical replication stream from given timestamp r=adityamaru a=stevendanna This adds a RESUME TIMESTAMP option to CREATE VIRTUAL CLUSTER FROM REPLICATION. When provided, we allow the user to start a replication stream into an _existing_ virtual cluster. When the resume timestamp is provided, the replication stream will be started from that timestamp, with no initial scan. To facilitate this, we add a new argument to crdb_internal.start_replication_stream that allows us to pass a destination-choosen start timestamp. To avoid various catastrophic mistakes, we only allow this when: - The destination tenant must be in service mode None - The provided resume timestamp equals the last recorded "revert timestamp" of the destination tenant. The revert timestamp is set when the tenant has been forcibly reverted to a particular timestamp and is cleared when the tenant is modified in a way that may invalidate a resumption from that timestamp. - If the source tenant has a PreviousSourceTenant set, the new destination must match that previous source tenant field. WARNING: Using this correctly requires that the stream is resumed before garbage collection has progressed past the given resume timestamp. During normal operation, the replication stream maintains a protected timestamp to ensure this is the case. However, when resuming using this new feature, we have no such guarantee. Epic: none Release note: None Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com> Co-authored-by: Steven Danna <danna@cockroachlabs.com>

mgartner · 2023-10-19T13:55:39Z

#107589 has been merged so I'm removing the release-blocker label and moving this to the 24.1 bucket.

mgartner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 21, 2023

mgartner added this to Triage in SQL Queries via automation Jul 21, 2023

blathers-crl bot added the T-sql-queries SQL Queries Team label Jul 21, 2023

mgartner mentioned this issue Jul 26, 2023

sql: improve query plan for find-running-job-of-type query #107589

Merged

DrewKimball assigned mgartner Aug 1, 2023

rytaft mentioned this issue Aug 3, 2023

sql: fix logic to collect stats on system.jobs #108139

Merged

mgartner added the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Sep 18, 2023

mgartner added the branch-master Failures on the master branch. label Sep 18, 2023

mgartner removed their assignment Oct 10, 2023

mgartner removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Oct 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: optimize query for find-running-jobs-of-type #107405

sql: optimize query for find-running-jobs-of-type #107405

mgartner commented Jul 21, 2023 •

edited by cockroach-jira-scripts

mgartner commented Jul 21, 2023 •

edited

mgartner commented Sep 18, 2023

blathers-crl bot commented Sep 18, 2023

michae2 commented Oct 5, 2023

mgartner commented Oct 19, 2023

sql: optimize query for find-running-jobs-of-type #107405

sql: optimize query for find-running-jobs-of-type #107405

Comments

mgartner commented Jul 21, 2023 • edited by cockroach-jira-scripts

mgartner commented Jul 21, 2023 • edited

mgartner commented Sep 18, 2023

blathers-crl bot commented Sep 18, 2023

michae2 commented Oct 5, 2023

mgartner commented Oct 19, 2023

mgartner commented Jul 21, 2023 •

edited by cockroach-jira-scripts

mgartner commented Jul 21, 2023 •

edited