Update optimizer docs for 20190114 alpha #4239

Merged: rmloveland merged 1 commit into master from optimizer-january-alpha-updates on Jan 16, 2019

rmloveland

Fixes #3996, #3998, #4238.

Summary of changes:

  • Add section about table statistics, including:

    • How to generate statistics manually

    • How to use the new automatic statistics feature

  • Add section describing the new query plan cache and how to use it.

  • Add section listing the types of statements supported by the optimizer. This is not exhaustive, but is meant to be a quick list that a user can scan to get a sense of what is supported. It points the user to the "View query plan" section which has instructions for checking whether their specific query of interest will be run with the optimizer. When the heuristic planner is eventually removed, this section can go away.

  • Further deemphasize the heuristic planner by:

    • No longer mentioning that the optimizer is "Enabled by default", since its use is assumed in 2.2+.

    • Moving the instructions for turning off the CBO to the bottom of the page.

@rmloveland rmloveland force-pushed the optimizer-january-alpha-updates branch from 3e3d38b to 6fcaad0 Compare January 14, 2019 22:30
@rmloveland

@andy-kimball I added a section to the doc entitled Types of statements supported by the optimizer. It attempts to document the optimizer's CRUD support at a very high level with a basic list of statements, and points the user to the EXPLAIN (OPT) instructions to check their specific query.

Please take a look and let me know what you think. I'm sure it can be improved; at the least it provides something to point users to for basic guidance. Any feedback is appreciated.
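
For reference, a minimal sketch of the check that section points users to, assuming a hypothetical `accounts` table that is not part of this PR. The idea is to run `EXPLAIN (OPT)` on the statement of interest and see whether a plan comes back; the exact output and error text depend on the build.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE accounts (id INT PRIMARY KEY, balance DECIMAL);

-- Ask the optimizer for its plan for a specific query.
-- A plan tree in the output suggests the optimizer handles the statement;
-- an "unsupported" error suggests it falls back to the heuristic planner.
EXPLAIN (OPT) SELECT id, balance FROM accounts WHERE balance > 100;
```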

@rmloveland

@RaduBerinde I added a section to the doc entitled Query plan cache. Please take a look and let me know what you think. Suggestions for improvement are welcome.

@rmloveland

@andy-kimball or @RaduBerinde :

Would one of you be willing to review the new section on statistics (manual and automatic) since Rebecca is out this week?

(Sorry for putting several things into this PR all at once - figured it would be quicker to match the timing of the 20190114 alpha.)

@RaduBerinde RaduBerinde left a comment

:lgtm:

I reviewed all of it. Looks good to me, just a few minor comments.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


v2.2/cost-based-optimizer.md, line 80 at r1 (raw file):

- [`INSERT`](insert.html)
- [`UPSERT`](upsert.html)
- [Window functions](window-functions.html)

window functions are not supported yet


v2.2/cost-based-optimizer.md, line 99 at r1 (raw file):

### Manually generating table statistics

For each column in your schema, run a [`CREATE STATISTICS`](create-statistics.html) statement like the one shown below.

In v2.2, there is a version of this statement CREATE STATISTICS FROM table which automatically figures out which columns to get stats on (specifically: columns that are part of the primary key or an index). I would present that first and then note the other form for more advanced users. Note that in most cases we wouldn't want to run it on all columns in the table.


v2.2/cost-based-optimizer.md, line 116 at r1 (raw file):

    {% include copy-clipboard.html %}
    ~~~ sql
    > CREATE STATISTICS __auto__ FROM table1;  -- Repeat for table2, table3, ..., tableN.

This is the form I mentioned above
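
A short sketch contrasting the two forms discussed in this thread. The `__auto__` form is taken from the doc snippet above; the statistics name `balance_stats` and the column `balance` in the explicit form are hypothetical and only illustrate the shape of the statement.

```sql
-- Form from the doc under review: CockroachDB picks the columns
-- (per the comment above, those in the primary key or an index).
CREATE STATISTICS __auto__ FROM table1;

-- Explicit form for more advanced use: name the statistics and choose
-- the column yourself. `balance_stats` and `balance` are hypothetical.
CREATE STATISTICS balance_stats ON balance FROM table1;
```

As noted above, in most cases there is no need to run the explicit form on every column in the table.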


v2.2/cost-based-optimizer.md, line 126 at r1 (raw file):

    ~~~

## Query plan cache

This section looks great!


v2.2/cost-based-optimizer.md, line 128 at r1 (raw file):

## Query plan cache

<span class="version-tag">New in v2.2</span>: CockroachDB can use a cache for the query plans generated by the optimizer.  This can lead to faster query execution since the database can reuse a query plan that was previously calculated, rather than computing a new plan for a query that has already been executed.

[nit] maybe "computing a new plan each time a query is executed"
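
For context, a sketch of toggling the plan cache. It assumes the cluster setting is named `sql.query_cache.enabled`; that name does not appear in this thread, so treat it as an assumption and confirm it on the build in question.

```sql
-- List the cluster settings available on this build to confirm the name.
SHOW ALL CLUSTER SETTINGS;

-- Assumed setting name (not confirmed in this thread).
SET CLUSTER SETTING sql.query_cache.enabled = true;

-- Read the value back.
SHOW CLUSTER SETTING sql.query_cache.enabled;
```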


v2.2/cost-based-optimizer.md, line 168 at r1 (raw file):

## Known limitations

- Some features are not supported by the cost-based optimizer; however, the optimizer will fall back to the heuristic planner for this functionality. If performance in the new alpha is worse than v2.you can [turn the optimizer off](#how-to-turn-the-optimizer-off) to manually force it to fallback to the heuristic planner.

v2.0?

@andy-kimball andy-kimball left a comment

I added a few more comments as well, but overall :lgtm:.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained


v2.2/cost-based-optimizer.md, line 76 at r1 (raw file):

The cost-based optimizer supports most SQL statements. Specifically, the following types of statements are supported:

- [`CREATE TABLE`](create-table.html)

DELETE can also be on this list.


v2.2/cost-based-optimizer.md, line 79 at r1 (raw file):

- [`UPDATE`](update.html)
- [`INSERT`](insert.html)
- [`UPSERT`](upsert.html)

UPDATE, UPSERT, and DELETE currently require setting experimental_optimizer_mutations=true. The hope is that this will no longer be necessary after the current milestone.


v2.2/cost-based-optimizer.md, line 88 at r1 (raw file):

## Table statistics

In order for the cost-based optimizer to find a performant execution plan for a given query, it needs access to statistical data on the contents of your database's tables.  This statistical data needs to be generated from scratch for new tables, and regenerated periodically for existing tables.

I'd soften this, since a lot of the time the optimizer can find a good plan without stats. Maybe something like "The cost-based optimizer can often find more performant query execution plans if it has access to the statistical..."

@rmloveland rmloveland force-pushed the optimizer-january-alpha-updates branch from 6fcaad0 to 62d2731 Compare January 15, 2019 20:51
@rmloveland rmloveland left a comment

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained


v2.2/cost-based-optimizer.md, line 76 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

DELETE can also be on this list.

Fixed.


v2.2/cost-based-optimizer.md, line 79 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

UPDATE, UPSERT, and DELETE currently require setting experimental_optimizer_mutations=true. The hope is that this will no longer be necessary after the current milestone.

Andy, on my local build running what I believe is the alpha SHA 5058e4a, this variable appears to be called experimental_optimizer_updates (at least, experimental_optimizer_mutations does not appear in SHOW ALL). Can you confirm that is the correct name of the setting? Or perhaps I've built the wrong SHA?

Fixed by updating the doc with this info (modulo confirming the setting var name).


v2.2/cost-based-optimizer.md, line 80 at r1 (raw file):

Previously, RaduBerinde wrote…

window functions are not supported yet

Fixed.


v2.2/cost-based-optimizer.md, line 88 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I'd soften this, since a lot of the time the optimizer can find a good plan without stats. Maybe something like "The cost-based optimizer can often find more performant query execution plans if it has access to the statistical..."

Fixed by using your suggested edit - thanks!


v2.2/cost-based-optimizer.md, line 99 at r1 (raw file):

Previously, RaduBerinde wrote…

In v2.2, there is a version of this statement CREATE STATISTICS FROM table which automatically figures out which columns to get stats on (specifically: columns that are part of the primary key or an index). I would present that first and then note the other form for more advanced users. Note that in most cases we wouldn't want to run it on all columns in the table.

Fixed, thanks for the heads up about that variant of CREATE STATISTICS. Turns out it isn't documented yet, so in addition to the change here, I filed #4241.


v2.2/cost-based-optimizer.md, line 116 at r1 (raw file):

Previously, RaduBerinde wrote…

This is the form I mentioned above

Aha!


v2.2/cost-based-optimizer.md, line 126 at r1 (raw file):

Previously, RaduBerinde wrote…

This section looks great!

Thanks!


v2.2/cost-based-optimizer.md, line 128 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] maybe "computing a new plan each time a query is executed"

Fixed by updating to your suggested phrasing.


v2.2/cost-based-optimizer.md, line 168 at r1 (raw file):

Weird typo!

Fixed by updating to be a bit more generic (since we're already in a 2.2 world) to say

If performance is worse than in previous versions of CockroachDB, you can turn the optimizer off...

Of course I'd prefer to remove this line altogether as soon as possible :-)

@andy-kimball andy-kimball left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale)


v2.2/cost-based-optimizer.md, line 76 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Fixed.

Actually, I think DELETE must not be there; it must have gotten in after we cut the build.


v2.2/cost-based-optimizer.md, line 79 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Andy, on my local build running what I believe is the alpha SHA 5058e4a, this variable appears to be called experimental_optimizer_updates (at least, experimental_optimizer_mutations does not appear in SHOW ALL). Can you confirm that is the correct name of the setting? Or perhaps I've built the wrong SHA?

Fixed by updating the doc with this info (modulo confirming the setting var name).

Ah, I must have changed this after we cut the build. So you're correct.
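
To make the outcome of this thread concrete, a minimal sketch of enabling optimizer-planned mutations in the alpha, using the variable name confirmed above. The `accounts` table in the last statement is hypothetical.

```sql
-- Confirm the session variable exists on this build (it should also be
-- listed in SHOW ALL, as mentioned above).
SHOW experimental_optimizer_updates;

-- Enable it for the current session only.
SET experimental_optimizer_updates = true;

-- Check how a specific mutation is planned. `accounts` is a hypothetical
-- table used only for illustration.
EXPLAIN (OPT) UPDATE accounts SET balance = balance + 10 WHERE id = 1;
```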

Fixes #3996, #3998, #4238.

Summary of changes:

- Add section about table statistics, including:

   - How to generate statistics manually

   - How to use the new automatic statistics feature

- Add section describing the new query plan cache and how to use it.

- Add section listing the types of statements supported by the
  optimizer.  This is not exhaustive, but is meant to be a quick list of
  the types of statements supported by the optimizer.  It points the
  user to the "View query plan" section which has instructions for
  checking whether their query will be run with the optimizer.  When the
  heuristic planner is removed, this section can go away.

- Further deemphasize the heuristic planner by:

   - No longer mentioning that the optimizer is "Enabled by default",
     since its use is assumed in 2.2+.

   - Moving the instructions for turning off the CBO to the bottom of
     the page.
@rmloveland rmloveland force-pushed the optimizer-january-alpha-updates branch from 62d2731 to 255886d Compare January 16, 2019 15:27
@rmloveland rmloveland left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale)


v2.2/cost-based-optimizer.md, line 76 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Actually, I think DELETE must not be there; it must have gotten in after we cut the build.

OK, removed, thank you.


v2.2/cost-based-optimizer.md, line 79 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Ah, I must have changed this after we cut the build. So you're correct.

OK thanks!

@rmloveland rmloveland merged commit 5b9ed73 into master Jan 16, 2019
@rmloveland rmloveland deleted the optimizer-january-alpha-updates branch January 16, 2019 18:38