Update optimizer docs for 20190114 alpha #4239
3e3d38b to 6fcaad0 (Compare)
Direct link to the updated optimizer doc for reference: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/6fcaad08147d401fbb1a459617371694475c0529/dev/cost-based-optimizer.html
@andy-kimball I added a section to the doc entitled "Types of statements supported by the optimizer". It attempts to document the optimizer's CRUD support at a very high level with a basic list of statements, and points the user to the "View query plan" section for checking whether a specific query will be run with the optimizer. Please take a look and let me know what you think. I'm sure it can be improved; at the least, it provides something to point users to for basic guidance. Any feedback is appreciated.
@RaduBerinde I added a section to the doc entitled "Query plan cache". Please take a look and let me know what you think. Suggestions for improvement are welcome.
@andy-kimball or @RaduBerinde: Would one of you be willing to review the new section on statistics (manual and automatic), since Rebecca is out this week? (Sorry for putting several things into this PR all at once; I figured it would be quicker to match the timing of the 20190114 alpha.)
I reviewed all of it. Looks good to me, just a few minor comments.
Reviewable status: complete! 1 of 0 LGTMs obtained
v2.2/cost-based-optimizer.md, line 80 at r1 (raw file):
- [`INSERT`](insert.html)
- [`UPSERT`](upsert.html)
- [Window functions](window-functions.html)
window functions are not supported yet
v2.2/cost-based-optimizer.md, line 99 at r1 (raw file):
### Manually generating table statistics

For each column in your schema, run a [`CREATE STATISTICS`](create-statistics.html) statement like the one shown below.
In v2.2, there is a version of this statement, `CREATE STATISTICS FROM table`, which automatically figures out which columns to get stats on (specifically: columns that are part of the primary key or an index). I would present that first and then note the other form for more advanced users. Note that in most cases we wouldn't want to run it on all columns in the table.
v2.2/cost-based-optimizer.md, line 116 at r1 (raw file):
{% include copy-clipboard.html %} ~~~ sql > CREATE STATISTICS __auto__ FROM table1; -- Repeat for table2, table3, ..., tableN.
This is the form I mentioned above
v2.2/cost-based-optimizer.md, line 126 at r1 (raw file):
~~~ ## Query plan cache
This section looks great!
v2.2/cost-based-optimizer.md, line 128 at r1 (raw file):
## Query plan cache

<span class="version-tag">New in v2.2</span>: CockroachDB can use a cache for the query plans generated by the optimizer. This can lead to faster query execution since the database can reuse a query plan that was previously calculated, rather than computing a new plan for a query that has already been executed.
[nit] maybe "computing a new plan each time a query is executed"
v2.2/cost-based-optimizer.md, line 168 at r1 (raw file):
## Known limitations

- Some features are not supported by the cost-based optimizer; however, the optimizer will fall back to the heuristic planner for this functionality. If performance in the new alpha is worse than v2.you can [turn the optimizer off](#how-to-turn-the-optimizer-off) to manually force it to fallback to the heuristic planner.
v2.0?
I added a few more comments as well, but overall LGTM.
Reviewable status: complete! 2 of 0 LGTMs obtained
v2.2/cost-based-optimizer.md, line 76 at r1 (raw file):
The cost-based optimizer supports most SQL statements. Specifically, the following types of statements are supported:

- [`CREATE TABLE`](create-table.html)
`DELETE` can also be on this list.
v2.2/cost-based-optimizer.md, line 79 at r1 (raw file):
- [`UPDATE`](update.html)
- [`INSERT`](insert.html)
- [`UPSERT`](upsert.html)
`UPDATE`, `UPSERT`, and `DELETE` currently require setting `experimental_optimizer_mutations=true`. The hope is that this will no longer be necessary after the current milestone.
v2.2/cost-based-optimizer.md, line 88 at r1 (raw file):
## Table statistics

In order for the cost-based optimizer to find a performant execution plan for a given query, it needs access to statistical data on the contents of your database's tables. This statistical data needs to be generated from scratch for new tables, and regenerated periodically for existing tables.
I'd soften this, since a lot of the time the optimizer can find a good plan without stats. Maybe something like: "The cost-based optimizer can often find more performant query execution plans if it has access to the statistical..."
6fcaad0 to 62d2731 (Compare)
Reviewable status: complete! 2 of 0 LGTMs obtained
v2.2/cost-based-optimizer.md, line 76 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
`DELETE` can also be on this list.
Fixed.
v2.2/cost-based-optimizer.md, line 79 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
`UPDATE`, `UPSERT`, and `DELETE` currently require setting `experimental_optimizer_mutations=true`. The hope is that this will no longer be necessary after the current milestone.
Andy, on my local build running what I believe is the alpha SHA 5058e4a, this variable appears to be called `experimental_optimizer_updates` (at least, `experimental_optimizer_mutations` does not appear in `SHOW ALL`). Can you confirm that is the correct name of the setting? Or perhaps I've built the wrong SHA?
Fixed by updating the doc with this info (modulo confirming the setting var name).
v2.2/cost-based-optimizer.md, line 80 at r1 (raw file):
Previously, RaduBerinde wrote…
window functions are not supported yet
Fixed.
v2.2/cost-based-optimizer.md, line 88 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
I'd soften this, since a lot of the time the optimizer can find a good plan without stats. Maybe something like:
The cost-based optimizer can often find more performant query execution plans if it has access to the statistical...
Fixed by using your suggested edit - thanks!
v2.2/cost-based-optimizer.md, line 99 at r1 (raw file):
Previously, RaduBerinde wrote…
In v2.2, there is a version of this statement, `CREATE STATISTICS FROM table`, which automatically figures out which columns to get stats on (specifically: columns that are part of the primary key or an index). I would present that first and then note the other form for more advanced users. Note that in most cases we wouldn't want to run it on all columns in the table.
Fixed, thanks for the heads up about that variant of `CREATE STATISTICS`. Turns out it isn't documented yet, so in addition to the change here, I filed #4241.
v2.2/cost-based-optimizer.md, line 116 at r1 (raw file):
Previously, RaduBerinde wrote…
This is the form I mentioned above
Aha!
v2.2/cost-based-optimizer.md, line 126 at r1 (raw file):
Previously, RaduBerinde wrote…
This section looks great!
Thanks!
v2.2/cost-based-optimizer.md, line 128 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] maybe "computing a new plan each time a query is executed"
Fixed by updating to your suggested phrasing.
v2.2/cost-based-optimizer.md, line 168 at r1 (raw file):
Weird typo!
Fixed by updating to be a bit more generic (since we're already in a 2.2 world) to say: "If performance is worse than in previous versions of CockroachDB, you can turn the optimizer off..."
Of course I'd prefer to remove this line altogether as soon as possible :-)
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale)
v2.2/cost-based-optimizer.md, line 76 at r1 (raw file):
Previously, rmloveland (Rich Loveland) wrote…
Fixed.
Actually, I think `DELETE` must not be there; it must have gotten in after we cut the build.
v2.2/cost-based-optimizer.md, line 79 at r1 (raw file):
Previously, rmloveland (Rich Loveland) wrote…
Andy, on my local build running what I believe is the alpha SHA 5058e4a, this variable appears to be called `experimental_optimizer_updates` (at least, `experimental_optimizer_mutations` does not appear in `SHOW ALL`). Can you confirm that is the correct name of the setting? Or perhaps I've built the wrong SHA?
Fixed by updating the doc with this info (modulo confirming the setting var name).
Ah, I must have changed this after we cut the build. So you're correct.
62d2731 to 255886d (Compare)
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale)
v2.2/cost-based-optimizer.md, line 76 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Actually, I think `DELETE` must not be there; it must have gotten in after we cut the build.
OK, removed, thank you.
v2.2/cost-based-optimizer.md, line 79 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Ah, I must have changed this after we cut the build. So you're correct.
OK thanks!
Fixes #3996, #3998, #4238.

Summary of changes:

- Add section about table statistics, including:
  - How to generate statistics manually
  - How to use the new automatic statistics feature
- Add section describing the new query plan cache and how to use it.
- Add section listing the types of statements supported by the optimizer. This is not exhaustive, but is meant to be a quick list that a user can scan to get a sense of what is supported. It points the user to the "View query plan" section, which has instructions for checking whether their specific query of interest will be run with the optimizer. When the heuristic planner is eventually removed, this section can go away.
- Further deemphasize the heuristic planner by:
  - No longer mentioning that the optimizer is "Enabled by default", since its use is assumed in 2.2+.
  - Moving the instructions for turning off the CBO to the bottom of the page.