
Update optimizer docs for 20190115 alpha
Fixes #3996, #3998, #4238.

Summary of changes:

- Add section about table statistics, including:

   - How to generate statistics manually

   - How to use the new automatic statistics feature

- Add section describing the new query plan cache and how to use it.

- Add section listing the types of statements supported by the
  optimizer.  This is not exhaustive, but is meant to be a quick list of
  the types of statements supported by the optimizer.  It points the
  user to the "View query plan" section which has instructions for
  checking whether their query will be run with the optimizer.  When the
  heuristic planner is removed, this section can go away.

- Further deemphasize the heuristic planner by:

   - No longer mentioning that the optimizer is "Enabled by default",
     since its use is assumed in 2.2+.

   - Moving the instructions for turning off the CBO to the bottom of
     the page.
rmloveland committed Jan 16, 2019
1 parent 85681f3 commit 255886d
Showing 1 changed file with 78 additions and 5 deletions: `v2.2/cost-based-optimizer.md`
@@ -7,10 +7,10 @@ redirect_from: sql-optimizer.html

The cost-based optimizer seeks the lowest cost for a query, usually related to time.

In versions prior to v2.1, a heuristic planner was used to generate query execution plans. The heuristic planner is now only used in the following cases:

- If your query uses functionality that is not yet supported by the cost-based optimizer. For more information about the types of queries that are supported, see [Types of statements supported by the cost-based optimizer](#types-of-statements-supported-by-the-cost-based-optimizer).
- If you explicitly turn off the optimizer. For more information, see [How to turn the optimizer off](#how-to-turn-the-optimizer-off).

{% include {{ page.version.version }}/misc/beta-warning.md %}

@@ -69,6 +69,80 @@ In contrast, this query returns `pq: unsupported statement: *tree.Insert`, which
pq: unsupported statement: *tree.Insert
~~~

## Types of statements supported by the cost-based optimizer

The cost-based optimizer supports most SQL statements, including the following types (this list is not exhaustive):

- [`CREATE TABLE`](create-table.html)
- [`INSERT`](insert.html)
- [Sequences](create-sequence.html)
- [Views](views.html)

The following additional statements are supported by the optimizer if you set the `experimental_optimizer_updates` [cluster setting](set-cluster-setting.html) to `true`:

- [`UPDATE`](update.html)
- [`UPSERT`](upsert.html)

For instructions showing how to check whether a particular query will be run with the cost-based optimizer, see the [View query plan](#view-query-plan) section.
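Here is a minimal sketch of that check. It assumes the `EXPLAIN (OPT)` approach described in [View query plan](#view-query-plan): if the optimizer can plan a statement, `EXPLAIN (OPT)` prints a plan tree. Otherwise, it returns an error such as `pq: unsupported statement`. The `employees` columns used here (`name`, `dept`, `salary`) are hypothetical.

{% include copy-clipboard.html %}
~~~ sql
-- Hypothetical columns; substitute your own query.
> EXPLAIN (OPT) SELECT name FROM employees WHERE dept = 'sales';

-- Per the list above, UPDATE is only expected to be planned by the optimizer
-- when experimental_optimizer_updates is set to true.
> EXPLAIN (OPT) UPDATE employees SET salary = salary + 1000 WHERE dept = 'sales';
~~~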

## Table statistics

The cost-based optimizer can often find more performant query execution plans if it has access to statistical data on the contents of your database's tables. This statistical data needs to be generated from scratch for new tables, and regenerated periodically for existing tables.

There are two ways to generate table statistics:

1. Run the [`CREATE STATISTICS`](create-statistics.html) statement manually.
2. <span class="version-tag">New in v2.2</span> Enable the automatic table statistics feature.

Each method is described below.

### Manually generating table statistics

To manually generate statistics for a table, run a [`CREATE STATISTICS`](create-statistics.html) statement like the one shown below. The statement automatically determines which columns to collect statistics on: specifically, it chooses columns that are part of the primary key or an index.

{% include copy-clipboard.html %}
~~~ sql
> CREATE STATISTICS __auto__ FROM employees;
~~~
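The following sketch shows how you might check the result, or collect statistics on a column you choose instead of letting CockroachDB pick. It assumes your version supports the `SHOW STATISTICS` statement. The `salary` column and the `salary_stats` statistics name are hypothetical.

{% include copy-clipboard.html %}
~~~ sql
-- Collect statistics on one explicitly named (hypothetical) column.
> CREATE STATISTICS salary_stats ON salary FROM employees;

-- List the statistics that now exist for the table.
> SHOW STATISTICS FOR TABLE employees;
~~~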

### Automatic table statistics

<span class="version-tag">New in v2.2</span>: CockroachDB can generate table statistics automatically as tables are updated.

To turn on this feature:

1. For each table in the database, run [`CREATE STATISTICS`](create-statistics.html) manually **before** enabling automatic statistics. This prevents the system from being overloaded by statistics collection right after the feature is enabled.

    {% include copy-clipboard.html %}
    ~~~ sql
    > CREATE STATISTICS __auto__ FROM table1; -- Repeat for table2, table3, ..., tableN.
    ~~~

2. Turn on automatic statistics by enabling the following [cluster setting](set-cluster-setting.html):

    {% include copy-clipboard.html %}
    ~~~ sql
    > SET CLUSTER SETTING sql.defaults.experimental_automatic_statistics = true;
    ~~~
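After the setting is enabled, you can use a sketch like the following to confirm its value and to inspect a table's statistics as they are refreshed. It assumes `sql.defaults.experimental_automatic_statistics` is a cluster setting and that `SHOW STATISTICS` is available in your version. `table1` is the placeholder table name from step 1.

{% include copy-clipboard.html %}
~~~ sql
-- Confirm that automatic statistics are turned on.
> SHOW CLUSTER SETTING sql.defaults.experimental_automatic_statistics;

-- Inspect the table's statistics; rerun after the table has been updated
-- to see whether new statistics have been collected.
> SHOW STATISTICS FOR TABLE table1;
~~~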

## Query plan cache

<span class="version-tag">New in v2.2</span>: CockroachDB can use a cache for the query plans generated by the optimizer. This can lead to faster query execution since the database can reuse a query plan that was previously calculated, rather than computing a new plan each time a query is executed.

The query plan cache is disabled by default. To enable it, set the `sql.query_cache.enabled` [cluster setting](set-cluster-setting.html) to `true`:

{% include copy-clipboard.html %}
~~~ sql
> SET CLUSTER SETTING sql.query_cache.enabled = true;
~~~

{{site.data.alerts.callout_info}}
The query plan cache is still under development and has the following limitations:

- The cache is only used for non-prepared statements (i.e., queries that correspond to the ["simple query" pgwire message](https://www.postgresql.org/docs/10/protocol-flow.html#id-1.10.5.7.4)).
- The cache has simplistic memory management: it uses a fixed number of "slots" and rejects all plans above a certain size.
- If you use the query plan cache in conjunction with table statistics, cached plans do not yet get invalidated when new statistics are created.
{{site.data.alerts.end}}
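The following sketch shows how you might check whether the cache is currently on, and how to turn it back off if the limitations above affect your workload. It assumes `sql.query_cache.enabled` is a [cluster setting](set-cluster-setting.html).

{% include copy-clipboard.html %}
~~~ sql
-- Check the current value of the setting.
> SHOW CLUSTER SETTING sql.query_cache.enabled;

-- Turn the query plan cache off again.
> SET CLUSTER SETTING sql.query_cache.enabled = false;
~~~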

## How to turn the optimizer off

With the optimizer turned on, the performance of some workloads may change. If your workload performs worse than expected (e.g., lower throughput or higher latency), you can turn off the cost-based optimizer and use the heuristic planner.
@@ -93,8 +167,7 @@ Changing the cluster setting does not immediately turn the optimizer off; instea

## Known limitations

- Some features are not supported by the cost-based optimizer. For that functionality, the optimizer falls back to the heuristic planner. If performance is worse than in previous versions of CockroachDB, you can [turn the optimizer off](#how-to-turn-the-optimizer-off) to force CockroachDB to use the heuristic planner.
- Some [correlated subqueries](subqueries.html#correlated-subqueries) are not supported by the cost-based optimizer yet. If you come across an unsupported correlated subquery, please [file a GitHub issue](file-an-issue.html).

## See also