sql: segregate query plan preparation in a new expandPlan interface. #6622
The change is missing proper documentation for
|
@RaduBerinde agreed. I have improved the documentation in the comments accordingly. Also the interface is now named "FinalizePlan" for additional clarity.
|
I suggest we always rebase on a fresh master when exporting PRs with many commits like this, otherwise it's hard to tell which commits are obsolete and which aren't. (if you rebase on a new master, all the current commits are gonna be a suffix) I understand why we need to have a way to perform just 1-3 (for prepare), but why does
|
Reviewed 1 of 1 files at r1, 22 of 22 files at r2, 2 of 2 files at r3, 23 of 23 files at r4, 1 of 1 files at r5, 5 of 5 files at r6, 3 of 26 files at r8, 24 of 24 files at r9.
sql/database.go, line 91 [r6] (raw file):
"naked" returns are generally discouraged in Go unless there is a particularly good reason for them.
sql/limit.go, line 98 [r5] (raw file):
Why un-embed the
sql/plan.go, line 438 [r2] (raw file):
I like how this comment specifies that
sql/table.go, line 101 [r6] (raw file):
What is the point of defining these interfaces if they are never used for indirection? Just to logically group behavior together?
sql/update.go, line 292 [r9] (raw file):
Will this change enable us to clean this second
|
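For readers unfamiliar with the "naked" return remark on sql/database.go, here is a minimal sketch of the difference. The functions `lookupNaked` and `lookupExplicit` are hypothetical and are not taken from the PR; they only illustrate the Go idiom being discussed.

```go
package main

import (
	"errors"
	"fmt"
)

// lookupNaked uses named results and bare "return" statements. The reader
// has to scan the whole body to learn which values are returned.
func lookupNaked(m map[string]int, key string) (id int, err error) {
	v, ok := m[key]
	if !ok {
		err = errors.New("not found: " + key)
		return
	}
	id = v
	return
}

// lookupExplicit returns the same values, but every return site spells
// them out.
func lookupExplicit(m map[string]int, key string) (int, error) {
	v, ok := m[key]
	if !ok {
		return 0, errors.New("not found: " + key)
	}
	return v, nil
}

func main() {
	m := map[string]int{"system": 1}
	fmt.Println(lookupNaked(m, "system"))
	fmt.Println(lookupExplicit(m, "missing"))
}
```

Both functions behave identically; the explicit form is simply easier to audit at each return site.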
There are only 2 commits in this PR. I will try rebasing to clean up the mess as you suggest.
|
Review status: 0 of 27 files reviewed at latest revision, 17 unresolved discussions, all commit checks successful.
sql/plan.go, line 60 [r14] (raw file):
index selection could affect
sql/plan.go, line 64 [r14] (raw file):
This may depend on the index, so it's not really available until after
sql/plan.go, line 91 [r14] (raw file):
Thanks for the new comments! They look great. Also agree with the
One suggestion: could we reorder the functions according to how "early" they are available? First
|
I changed the interface name again, this time to expandPlan(), thanks to a suggestion by @petermattis that the word "Finalize" reminds the reader of finalizers and suggests the method is to be called after the node is not used any more. Also I implemented @andreimatei's suggestions to make expandPlan() invisible to the users of makePlan().
|
Should we review only the last 2 of the 5 commits? Are the other three part of another PR (or will they be)?
|
No actually all these commits belong here. You can review them separately though.
|
Ok cool.
|
Review status: 1 of 27 files reviewed at latest revision, 8 unresolved discussions, some commit checks pending. sql/plan.go, line 45 [r26] (raw file):
|
Please include rationale in the following commit messages:
The interfaces-for-planner commit is incredibly hard to review. Is it all code movement? The commit message suggests there is some refactoring involved, but the size of the diff makes it quite hard to spot the relevant parts. I'm also 👎 on that commit generally for reasons pointed out by @andreimatei.
|
I have added the requested text in the commit messages. Also clarified the "interfaces-for-planner" commit that it does not contain any logic changes. I did not know that the word "refactoring" necessarily implies logic changes; I removed that word entirely from the commit summary. I empathize with the fact that this is somewhat harder to review and I am willing to take steps to alleviate that. Meanwhile, introducing interfaces purely for documentation purposes has net benefits (improving code navigability and grokability), zero performance impact, so ... I am strongly in favor of keeping this in and even extending the practice to other places. I gather that this looks & feels new to you but I still haven't heard arguments against.
|
Well, the primary argument against is the presence of more code which is effectively dead. Worse, the presence of this code may encourage people to use it, which would have a performance impact.
|
The code is not dead because the next 2 lines assert that &planner{} Then about using the interface instead of the object... I'm not sure What is the problem you are trying to prevent? |
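As an illustration of the assertion mentioned in this reply, here is a minimal sketch of Go's compile-time interface check. `DatabaseAccessor` is named in the commit message, but its single method `getDatabaseID` and the `databases` field are assumptions made for this example, not the PR's actual definitions.

```go
package main

import "fmt"

// DatabaseAccessor is a documentation-style interface: it lists and
// documents one group of planner methods in a single place.
// The method set here is hypothetical.
type DatabaseAccessor interface {
	// getDatabaseID returns the ID of the named database.
	getDatabaseID(name string) (int, error)
}

// planner is a reduced stand-in for the real struct.
type planner struct {
	databases map[string]int
}

func (p *planner) getDatabaseID(name string) (int, error) {
	id, ok := p.databases[name]
	if !ok {
		return 0, fmt.Errorf("database %q does not exist", name)
	}
	return id, nil
}

// Compile-time assertion: the build fails if *planner stops implementing
// DatabaseAccessor. No value is kept and nothing runs at run time.
var _ DatabaseAccessor = &planner{}

func main() {
	p := &planner{databases: map[string]int{"system": 1}}
	// Callers keep using the concrete type, so there is no dynamic dispatch.
	id, err := p.getDatabaseID("system")
	fmt.Println(id, err)
}
```

Because callers still hold a `*planner`, the interface costs nothing at run time unless someone starts passing the interface type around, which is the concern raised earlier in the thread.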
I don't have an objection to the interfaces in this case. It's not a common pattern but I see it as a "blueprint" for how to later split planner into multiple components. I like that the functions are all documented in one place (and it won't bitrot like a huge comment or document). I would be wary of extending this practice though, as it has a few disadvantages:
Just my .02.
|
- makePlan() is used by the Executor and other places to translate a statement to a query plan.
- newPlan() is used recursively by the plan node constructors for intermediate syntax nodes.
Prior to this patch distinctNode would (ab)use the Go implicit delegation syntax to automatically import the methods of its sub-node. This makes reading the code slightly less straightforward (nearly all other nodes use an explicit delegation, so this comes as an exception) and makes debugging objectively harder since the distinctNode disappears from the stack trace entirely for the methods so "inherited". This patch alleviates the issue by ensuring that distinctNode implements all the planNode interface itself.
Prior to this patch limitNode would (ab)use the Go implicit delegation syntax to automatically import the methods of its sub-node. This makes reading the code slightly less straightforward (nearly all other nodes use an explicit delegation, so this comes as an exception) and makes debugging objectively harder since the limitNode disappears from the stack trace entirely for the methods so "inherited". This patch alleviates the issue by ensuring that limitNode implements all the planNode interface itself.
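The two delegation styles contrasted in these commit messages can be sketched as follows. The reduced `planNode` interface, `scanNode`, and the two limit types are hypothetical simplifications for illustration, not the code from the PR.

```go
package main

import "fmt"

// planNode is a reduced, hypothetical version of the real interface.
type planNode interface {
	Next() bool
	Columns() []string
}

// scanNode produces a fixed number of rows.
type scanNode struct{ rows, pos int }

func (s *scanNode) Next() bool        { s.pos++; return s.pos <= s.rows }
func (s *scanNode) Columns() []string { return []string{"k", "v"} }

// embeddedLimit embeds its sub-node: Columns() is promoted implicitly,
// so no method with this type as receiver appears in the source.
type embeddedLimit struct {
	planNode
	limit, seen int
}

func (n *embeddedLimit) Next() bool {
	if n.seen >= n.limit {
		return false
	}
	n.seen++
	return n.planNode.Next()
}

// explicitLimit names its sub-node and forwards every method by hand,
// so each planNode method has a visible definition on this type.
type explicitLimit struct {
	plan        planNode
	limit, seen int
}

func (n *explicitLimit) Next() bool {
	if n.seen >= n.limit {
		return false
	}
	n.seen++
	return n.plan.Next()
}

func (n *explicitLimit) Columns() []string { return n.plan.Columns() }

// Both styles satisfy the interface.
var _ planNode = &embeddedLimit{}
var _ planNode = &explicitLimit{}

// count drains a node and reports how many rows it produced.
func count(n planNode) int {
	c := 0
	for n.Next() {
		c++
	}
	return c
}

func main() {
	fmt.Println(count(&embeddedLimit{planNode: &scanNode{rows: 5}, limit: 3}))
	fmt.Println(count(&explicitLimit{plan: &scanNode{rows: 5}, limit: 3}))
}
```

The two types behave the same; the explicit form costs a few forwarding lines and in exchange every method is greppable and has a hand-written frame on the node type.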
NB: This patch only moves code around to different files, changes comments and adds non-functional interfaces (= no logic changes).
The "planner" object in the sql package is a bit of a showcase of the "God class" antipattern. This patch alleviates this situation by re-organizing the services of the planner in separate files that highlight its separate roles:
- its role as manager of DB descriptors, via a DescriptorAccessor interface which documents all the related methods (in descriptor.go).
- its role as accessor for database descriptors via a DatabaseAccessor interface which documents all the related methods (in database.go).
- its role as accessor for table descriptors and table schema updates, via a SchemaAccessor interface which documents all the related methods (in table.go).
The planner constructor and struct definition are moved away from `plan.go` into a new `planner.go`. `plan.go` remains to define the planNode interface and assert its implementation by the various concrete structs. The methods corresponding to each sub-interface are also moved to the primary source file for that interface.
The makePlan() interface was not well designed: it had too much responsibility. There are really 6 semantic phases on SQL statements:
1. structural syntax checking.
2. gathering the table descriptors and resolving qualified names.
3. type inference and type checking.
4. normalization and substituting placeholders with actual values.
5. building the query plan, including index selection.
6. running the query plan.
The pgwire Prepare phase needs to do steps 1-3 only.
The EXPLAIN statement needs to do steps 1-5 only.
Prior to this patch, the makePlan() interface does steps 1-5 in one go. This is arguably too much, although the overhead is not visible to users.
Prior to this patch, the Start() interface re-did steps 2 to 5, and also performed step 6 for some statements, in particular DELETE which has a "fast path" for when there is no WHERE clause or index updates.
Now EXPLAIN needs a full query plan to work properly, i.e. it needs to observe completion of step 5. This means EXPLAIN could not merely use makePlan() and had to call Start() to get the fully built query plan. But then because Start() also performs side effects (e.g. for DELETE), this would cause EXPLAIN to do too much work, in particular delete data for DELETE. This is a bug.
This patch addresses this issue by clearly separating:
- the role of newPlan(), which is to achieve steps 1-3, in some cases step 4.
- a new role played by a new interface planNode.expandPlan(), which is to achieve step 5, possibly re-running step 4, without ever starting step 6.
- the role of Start(), which may start step 6.
Using this, makePlan() invokes newPlan() then expandPlan(); EXPLAIN uses makePlan() and prepare() uses only newPlan().
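The separation described in this commit message can be sketched as below. This is a toy model under stated assumptions: `store`, `deleteNode`, `explain`, and `execute` are hypothetical, and the real newPlan(), makePlan(), and expandPlan() take different arguments. It only shows why EXPLAIN can obtain an expanded plan without triggering DELETE's side effects.

```go
package main

import (
	"errors"
	"fmt"
)

// planNode is a reduced, hypothetical version of the real interface.
type planNode interface {
	expandPlan() error // step 5 only: must not have side effects
	Start() error      // may begin step 6
}

// store is a toy table holding only a row count.
type store struct{ rows int }

// deleteNode models a DELETE without a WHERE clause.
type deleteNode struct {
	st       *store
	expanded bool
}

func (d *deleteNode) expandPlan() error {
	d.expanded = true // e.g. index selection would happen here
	return nil
}

func (d *deleteNode) Start() error {
	if !d.expanded {
		return errors.New("Start called before expandPlan")
	}
	d.st.rows = 0 // "fast path": the deletion happens in Start itself
	return nil
}

// newPlan stands in for steps 1-3 (and sometimes 4).
func newPlan(st *store) planNode { return &deleteNode{st: st} }

// makePlan invokes newPlan and then expandPlan, covering steps 1-5.
func makePlan(st *store) (planNode, error) {
	p := newPlan(st)
	if err := p.expandPlan(); err != nil {
		return nil, err
	}
	return p, nil
}

// explain builds the full plan but never calls Start.
func explain(st *store) (string, error) {
	p, err := makePlan(st)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%T", p), nil
}

// execute builds the plan and then starts it.
func execute(st *store) error {
	p, err := makePlan(st)
	if err != nil {
		return err
	}
	return p.Start()
}

func main() {
	st := &store{rows: 3}
	desc, _ := explain(st)
	fmt.Println(desc, st.rows) // the row count is unchanged after explain
	_ = execute(st)
	fmt.Println(st.rows) // the rows are gone after execute
}
```

In this model a prepare path would call only `newPlan`, mirroring the statement that prepare() needs steps 1-3 alone.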
@RaduBerinde thanks for the perspective. I think I agree; I will be careful not to push this further before we have more ample discussions on the topic. For now I'll merge this, and if we learn it was a terrible idea I'll volunteer to take the interfaces out later. |
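The makePlan() interface was not well designed: it had too much
responsibility. There are really 6 semantic phases on SQL statements:
1. structural syntax checking.
2. gathering the table descriptors and resolving qualified names.
3. type inference and type checking.
4. normalization and substituting placeholders with actual values.
5. building the query plan, including index selection.
6. running the query plan.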
The pgwire Prepare phase needs to do steps 1-3 only.
The EXPLAIN statement needs to do steps 1-5 only.
Prior to this patch, the makePlan() interface does steps 1-5 in one
go. This is arguably too much, although the overhead is not visible to
users.
Prior to this patch, the Start() interface re-did steps 2 to 5,
and also performed step 6 for some statements, in particular DELETE which
has a "fast path" for when there is no WHERE clause or index updates.
Now EXPLAIN needs a full query plan to work properly, i.e. it needs to
observe completion of step 5. This means EXPLAIN could not merely use
makePlan() and had to call Start() to get the fully built query
plan. But then because Start() also performs side effects (e.g. for
DELETE), this would cause EXPLAIN to do too much work, in particular
delete data for DELETE. This is a bug.
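This patch addresses this issue by clearly separating:
- the role of newPlan(), which is to achieve steps 1-3, in some cases
  step 4.
- a new role played by a new interface planNode.expandPlan(), which is
  to achieve step 5, possibly re-running step 4, without ever starting
  step 6.
- the role of Start(), which may start step 6.
Then makePlan() invokes newPlan() and then expandPlan().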
Fixes #6613.