sql: segregate query plan preparation in a new expandPlan interface. #6622

knz · 2016-05-11T11:28:33Z

##The makePlan() interface was not well designed, it had too much
responsibility. There are really 6 semantic phases on SQL statements:

structural syntax checking.
gathering the table descriptors and resolving qualified names.
type inference and type checking.
normalization and substituting placeholders with actual values.
building the query plan, including index selection.
running the query plan.

The pgwire Prepare phase needs to do steps 1-3 only.
The EXPLAIN statement needs to do steps 1-5 only.

Prior to this patch, the makePlan() interface does steps 1-5 in one
go. This is arguably too much, although the overhead is not visible to
users.

Prior to this patch, the Start() interface re-did steps 2 to 5 again,
and also does step 6 for some statements, in particular DELETE which
has a "fast path" for when there is no WHERE clause or index updates.

Now EXPLAIN needs a full query plan to work properly, ie it needs to
observe completion of step 5. This means EXPLAIN could not merely use
makePlan() and had to call Start() to get the fully build query
plan. But then because Start() also performs side effects (eg. for
DELETE), this would cause EXPLAIN to do too much work, in particular
delete data for DELETE. This is a bug.

This patch addresses this issue by clearly separating:

the role of newPlan(), which is to achieve steps 1-3, in some cases step 4.
a new role played by a new interface planNode.expandPlan(), which is
to achieve step 5, possibly re-running step 4, without ever starting
step 6.
the role of Start(), which may start step 6.

Then makePlan() invokes newPlan() and then expandPlan().

Fixes #6613.

This change is

knz · 2016-05-11T11:28:50Z

cc @nvanbenschoten @RaduBerinde.

RaduBerinde · 2016-05-11T13:20:36Z

The change is missing proper documentation for makePlan. We need a description of what exactly it returns (something that AFAICT is completely undefined at this point, even in the commit message), and we should mention the expectation that BuildPlan is to be called on the result.

Previously, knz (kena) wrote…

cc @nvanbenschoten @RaduBerinde.

Review status: 0 of 22 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.

sql/plan.go, line 431 [r3] (raw file):

  DebugValues() debugValues

  // BuildPlan finalizes type checking of placeholders and builds

"build a plan" completely ignores the fact that it is being run on a plan node/tree. The doc should explain how the plan it is run on changes.

sql/plan.go, line 434 [r3] (raw file):

  // the query plan. Returns an error if the initialization fails.
  // The SQL "prepare" phase, as well as the EXPLAIN statement, should
  // merely build the plan node(s) and call BuildPlan().

Related to the above - "build the plan node(s)" here is not really defined. We are saying that BuildPlan builds the plan..

Comments from Reviewable

knz · 2016-05-11T15:08:33Z

@RaduBerinde agreed. I have improved the documentation in the comments accordingly. Also the interface is now named "FinalizePlan" for additional clarity.

Previously, RaduBerinde wrote…

The change is missing proper documentation for makePlan. We need a description of what exactly it returns (something that AFAICT is completely undefined at this point, even in the commit message), and we should mention the expectation that BuildPlan is to be called on the result.

Review status: 0 of 27 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.

sql/plan.go, line 431 [r3] (raw file):

Previously, RaduBerinde wrote…

"build a plan" completely ignores the fact that it is being run on a plan node/tree. The doc should explain how the plan it is run on changes.

Agreed. See the new comments.

sql/plan.go, line 434 [r3] (raw file):

Previously, RaduBerinde wrote…

Related to the above - "build the plan node(s)" here is not really defined. We are saying that BuildPlan builds the plan..

Agreed. See the new comments.

Comments from Reviewable

andreimatei · 2016-05-11T16:15:28Z

I suggest we always rebase on a fresh master when exporting PRs with many commits like this, otherwise it's hard to tell which commits are obsolete and which aren't. (if you rebase on a new master, all the current commits are gonna be a suffix)

I understand why we need to have a way to perform just 1-3 (for prepare), but why does Finalize need to be a method on a node? Why can't it be embedded in makePlan (but not in planner.prepare())?

Previously, knz (kena) wrote…

@RaduBerinde agreed. I have improved the documentation in the comments accordingly. Also the interface is now named "FinalizePlan" for additional clarity.

Review status: 0 of 27 files reviewed at latest revision, 9 unresolved discussions, all commit checks successful.

sql/database.go, line 95 [r6] (raw file):

// DatabaseAccessor provides helper methods for using SQL database descriptors.
type DatabaseAccessor interface {

I don't know if I like all these new interfaces? What's the point, if they're still implemented by planner? You still have to look in a bunch of files to see all the things that the does. I don't think they help with segmenting anything off, they could be replaced with comments.
How about making these interfaces actual structs, sub-objects of the planner? Then you could say you've done something to split responsibilities better.

sql/plan.go, line 85 [r6] (raw file):

}

type planNodeFastPath interface {

can you hint as to who's gonna implement this and how it's used? Can't it be part of planNode?

sql/plan.go, line 25 [r9] (raw file):

)

type planMaker interface {

again, I don't know if this interface accomplishes anything. But it'd be nice if these two methods were isolated to some singleton (which should be called planner or planMaker, different from everything else on the current planner that's used during query execution.

sql/plan.go, line 39 [r9] (raw file):

  // Note: The autoCommit parameter enables operations to enable the
  // 1PC optimization. This is a bit hackish/preliminary at present.
  makePlan(stmt parser.Statement, desiredTypes []parser.Datum, autoCommit bool) (planNode, error)

can you suggest why FinalizePlan() is not part of makePlan()? Is it because "prepare" doesn't need it?
If so, it seems that autoCommit should be passed to FinalizedPlan() and not to this guy? Or even to Start()?
Or does it have something to do with subqueries? If so, is there a good reason why we can't handle subqueries during makePlan but we can during Finalize?

sql/plan.go, line 39 [r9] (raw file):

  // Note: The autoCommit parameter enables operations to enable the
  // 1PC optimization. This is a bit hackish/preliminary at present.
  makePlan(stmt parser.Statement, desiredTypes []parser.Datum, autoCommit bool) (planNode, error)

can we please add a comment for desiredTypes

sql/plan.go, line 46 [r9] (raw file):

  // will do just enough work to check the structural validity of the
  // SQL statement and determine types for placeholders. However it is
  // not appropriate to call BuildPlan(), Next() or Values() on a plan

s/BuildPlan/FinalizePlan ?

sql/planner.go, line 70 [r6] (raw file):

// Planner abstracts the services provided by a planner object
// to the other SQL front-end components.
type Planner interface {

what's up with this (new?) interface? Why do we need it / why does it need to be public?

Comments from Reviewable

nvanbenschoten · 2016-05-11T16:15:35Z

Reviewed 1 of 1 files at r1, 22 of 22 files at r2, 2 of 2 files at r3, 23 of 23 files at r4, 1 of 1 files at r5, 5 of 5 files at r6, 3 of 26 files at r8, 24 of 24 files at r9.
Review status: all files reviewed at latest revision, 14 unresolved discussions, all commit checks successful.

sql/database.go, line 91 [r6] (raw file):

  nameKey = sqlbase.MakeNameMetadataKey(keys.RootNamespaceID, dbDesc.GetName())
  descKey = sqlbase.MakeDescMetadataKey(dbDesc.ID)
  return

"naked" returns are generally discouraged in Go unless there is a particularly good reason for them.

sql/limit.go, line 98 [r5] (raw file):

type limitNode struct {
  plan      planNode

Why un-embed the planNode here and in a few other places if you're still going to defer primarily to the wrapped planNode?

sql/plan.go, line 438 [r2] (raw file):

  // Start begins the processing of the query/statement and starts
  // performing side effects for data-modifying statements. Returns an

I like how this comment specifies that Start will have side effects. You might want to add to the contract above that the others will not have side effects.

sql/table.go, line 101 [r6] (raw file):

// SchemaAccessor provides helper methods for using the SQL schema.
type SchemaAccessor interface {

What is the point of defining these interfaces if they are never used for indirection? Just to logically group behavior together?

sql/update.go, line 292 [r9] (raw file):

  // Create the workhorse select clause for rows that need updating.
  // TODO(knz): See comment above in Update().

Will this change enable us to clean this second SelectClause as well?

Comments from Reviewable

knz · 2016-05-11T16:28:05Z

There are only 2 commits in this PR. I will try rebasing to clean up the mess as you suggest.

Previously, andreimatei (Andrei Matei) wrote…

I suggest we always rebase on a fresh master when exporting PRs with many commits like this, otherwise it's hard to tell which commits are obsolete and which aren't. (if you rebase on a new master, all the current commits are gonna be a suffix)

I understand why we need to have a way to perform just 1-3 (for prepare), but why does Finalize need to be a method on a node? Why can't it be embedded in makePlan (but not in planner.prepare())?

Review status: all files reviewed at latest revision, 14 unresolved discussions, all commit checks successful.

sql/database.go, line 91 [r6] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"naked" returns are generally discouraged in Go unless there is a particularly good reason for them.

I'll queue this for a separate commit. This commit is only rearranging comments/methods and does not change any code. Let's keep it like that.

sql/database.go, line 95 [r6] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I don't know if I like all these new interfaces? What's the point, if they're still implemented by planner? You still have to look in a bunch of files to see all the things that the does. I don't think they help with segmenting anything off, they could be replaced with comments.
How about making these interfaces actual structs, sub-objects of the planner? Then you could say you've done something to split responsibilities better.

The interface is introduced purely for documentation purposes. It helps by grouping all the method comments in one place so they can be groked more quickly in the code editor. It has nothing to do with code semantics.

sql/limit.go, line 98 [r5] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why un-embed the planNode here and in a few other places if you're still going to defer primarily to the wrapped planNode?

Because it makes the code more regular and easier to read.

sql/plan.go, line 438 [r2] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I like how this comment specifies that Start will have side effects. You might want to add to the contract above that the others will not have side effects.

I think this is amply documented by the existing comments already.

sql/plan.go, line 85 [r6] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

can you hint as to who's gonna implement this and how it's used? Can't it be part of planNode?

Done.

sql/plan.go, line 25 [r9] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

again, I don't know if this interface accomplishes anything. But it'd be nice if these two methods were isolated to some singleton (which should be called planner or planMaker, different from everything else on the current planner that's used during query execution.

This interface is introduced purely for documentation purposes. A later PR may push this agenda further doing what you suggest.

sql/plan.go, line 39 [r9] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

can you suggest why FinalizePlan() is not part of makePlan()? Is it because "prepare" doesn't need it?
If so, it seems that autoCommit should be passed to FinalizedPlan() and not to this guy? Or even to Start()?
Or does it have something to do with subqueries? If so, is there a good reason why we can't handle subqueries during makePlan but we can during Finalize?

You are asking too many questions :)
Why FinalizePlan() is not part of makePlan() -> because prepare does not need it, indeed.

autoCommit can be moved, you are right, but this would belong to a different commit (which I could implement as a followup, I'll see).

sql/plan.go, line 39 [r9] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

can we please add a comment for desiredTypes

Done.

sql/plan.go, line 46 [r9] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/BuildPlan/FinalizePlan ?

Done.

sql/planner.go, line 70 [r6] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

what's up with this (new?) interface? Why do we need it / why does it need to be public?

it does not need to be public. However I cannot use the name "planner" in lowercase because the struct already has it. Feel free to suggest a better name.

The interface is otherwise introduced purely for documentation purposes.

sql/table.go, line 101 [r6] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What is the point of defining these interfaces if they are never used for indirection? Just to logically group behavior together?

The interface is introduced purely for documentation purposes. It groups all the method comments in one place, making them easy to read in the code editor.

sql/update.go, line 292 [r9] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Will this change enable us to clean this second SelectClause as well?

Yeah basically all this is a prerequisite for this other PR I have in the stove already (#6594)

Comments from Reviewable

RaduBerinde · 2016-05-11T17:06:55Z

Review status: 0 of 27 files reviewed at latest revision, 17 unresolved discussions, all commit checks successful.

sql/plan.go, line 60 [r14] (raw file):

  //
  // Available after makePlan(). (FIXME: also prepare()? )
  Columns() []ResultColumn

index selection could affect Columns() if we are looking at an internal node. E.g. SELECT y .. ORDER BY x. If we use an index on x, the selectNode won't need to return x. I'm not sure if we are referring here to just the root of the tree, which indeed I don't think can change after makePlan.

sql/plan.go, line 64 [r14] (raw file):

  // The indexes of the columns the output is ordered by.
  //
  // Available after makePlan(). (FIXME: also prepare()? )

This may depend on the index, so it's not really available until after FinalizePlan. Or maybe it is available but with the caveat that it might change.

sql/plan.go, line 91 [r14] (raw file):

  //
  // Available after makePlan().
  FinalizePlan() error

Thanks for the new comments! They look great. Also agree with the FinalizePlan name.

One suggestion: could we reorder the functions according to how "early" they are available? First FinalizePlan then Columns' (?), 'Ordering', 'Explain*, then 'MarkDebug', 'SetLimitHint, thenStartthenNext, thenValues/Debug`.

Comments from Reviewable

knz · 2016-05-11T17:16:17Z

I changed the interface name again, this time to expandPlan(), thanks to a suggestion by @petermattis that the word "Finalize" reminds the reader of finalizers and suggests the method is to be called after the node is not used any more.

Also I implemented @andreimatei's suggestions to make expandPlan() invisible to the users of makePlan().

Previously, knz (kena) wrote…

There are only 2 commits in this PR. I will try rebasing to clean up the mess as you suggest.

Review status: 0 of 27 files reviewed at latest revision, 14 unresolved discussions, some commit checks pending.

sql/plan.go, line 60 [r14] (raw file):

Previously, RaduBerinde wrote…

index selection could affect Columns() if we are looking at an internal node. E.g. SELECT y .. ORDER BY x. If we use an index on x, the selectNode won't need to return x. I'm not sure if we are referring here to just the root of the tree, which indeed I don't think can change after makePlan.

Done.

sql/plan.go, line 64 [r14] (raw file):

Previously, RaduBerinde wrote…

This may depend on the index, so it's not really available until after FinalizePlan. Or maybe it is available but with the caveat that it might change.

Done.

sql/plan.go, line 91 [r14] (raw file):

Previously, RaduBerinde wrote…

Thanks for the new comments! They look great. Also agree with the FinalizePlan name.

One suggestion: could we reorder the functions according to how "early" they are available? First FinalizePlan then Columns' (?), 'Ordering', 'Explain*, then 'MarkDebug', 'SetLimitHint, thenStartthenNext, thenValues/Debug`.

Done.

Comments from Reviewable

RaduBerinde · 2016-05-11T17:21:35Z

Sshould we review only the last 2 of the 5 commits? Are the other three part of another PR (or they will be)?

Previously, knz (kena) wrote…

There are only 2 commits in this PR. I will try rebasing to clean up the mess as you suggest.

Review status: 0 of 27 files reviewed at latest revision, 10 unresolved discussions, some commit checks pending.

sql/plan.go, line 58 [r20] (raw file):

  // the result column types.
  //
  // Available after newPlan().

newPlan isn't defined in these interfaces.. Even in the code it's not really documented, it just says "newPlan constructs a planNode from a statement"

Comments from Reviewable

knz · 2016-05-11T17:28:52Z

No actually all these commits belong here. You can review them separately though.

Previously, RaduBerinde wrote…

Sshould we review only the last 2 of the 5 commits? Are the other three part of another PR (or they will be)?

Review status: 0 of 27 files reviewed at latest revision, 9 unresolved discussions, some commit checks pending.

sql/plan.go, line 58 [r20] (raw file):

Previously, RaduBerinde wrote…

newPlan isn't defined in these interfaces.. Even in the code it's not really documented, it just says "newPlan constructs a planNode from a statement"

Done.

Comments from Reviewable

RaduBerinde · 2016-05-11T17:31:52Z

Ok cool.

Previously, knz (kena) wrote…

No actually all these commits belong here. You can review them separately though.

Comments from Reviewable

RaduBerinde · 2016-05-12T13:39:40Z

Previously, RaduBerinde wrote…

Ok cool.

Reviewed 1 of 27 files at r15, 3 of 5 files at r25, 23 of 23 files at r26, 1 of 1 files at r27.
Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks pending.

sql/plan.go, line 45 [r26] (raw file):

  // will do just enough work to check the structural validity of the
  // SQL statement and determine types for placeholders. However it is
  // not appropriate to call BuildPlan(), Next() or Values() on a plan

s/BuildPlan/expandPlan/

Comments from Reviewable

knz · 2016-05-12T13:43:57Z

Review status: 1 of 27 files reviewed at latest revision, 8 unresolved discussions, some commit checks pending.

sql/plan.go, line 45 [r26] (raw file):

Previously, RaduBerinde wrote…

s/BuildPlan/expandPlan/

Done.

Comments from Reviewable

tamird · 2016-05-12T14:24:19Z

Please include rationale in the following commit messages:

Make distinctNode a fully-fledged planNode.
nMake the limit node a fully fledged planNode.

The interfaces-for-planner commit is incredibly hard to review. Is it all code movement? the commit message suggests there is some refactoring involved, but the size of the diff makes it quite hard to spot the relevant parts. I'm also 👎 on that commit generally for reasons pointed out by @andreimatei.

Previously, RaduBerinde wrote…

:lgtm:

Reviewed 1 of 27 files at r15, 26 of 26 files at r28, 1 of 1 files at r29, 2 of 2 files at r30, 1 of 1 files at r31, 5 of 5 files at r32, 23 of 23 files at r33.
Review status: all files reviewed at latest revision, 13 unresolved discussions, some commit checks failed.

sql/empty.go, line 39 [r32] (raw file):

func (*emptyNode) MarkDebug(_ explainMode)             {}

// func (*emptyNode) BuildPlan() error                    { return nil }

nit: remove the addition of this code/comment from whichever commit added it

sql/limit.go, line 108 [r32] (raw file):

func (n *limitNode) ExplainTypes(f func(string, string)) { n.plan.ExplainTypes(f) }

// func (n *limitNode) BuildPlan() error                    { return n.plan.BuildPlan() }

ditto

sql/planner.go, line 70 [r6] (raw file):

Previously, knz (kena) wrote…

it does not need to be public. However I cannot use the name "planner" in lowercase because the struct already has it. Feel free to suggest a better name.

The interface is otherwise introduced purely for documentation purposes.

you could name it plannerI, but I still think these interfaces don't add value.

sql/planner.go, line 196 [r32] (raw file):

      return nil, err
  }
  //if err := plan.BuildPlan(); err != nil {

ditto

sql/planner.go, line 224 [r32] (raw file):

      return 0, err
  }
  //if err := plan.BuildPlan(); err != nil {

ditto

sql/testdata/delete, line 112 [r33] (raw file):


query ITT colnames
EXPLAIN DELETE FROM unindexed

what's being tested here?

Comments from Reviewable

knz · 2016-05-12T15:01:28Z

I have added the requested text in the commit messages. Also clarified the "interfaces-for-planner" commit that it does not contain any logic changes. I did not know that the word "refactoring" necessarily implies logic changes; I removed that word entirely from the commit summary.

I emphasize with the fact this is somewhat harder to review and I am willing to make steps to alleviate that.

Meanwhile, introducing interfaces purely for documentation purposes has net benefits (improving code navigability and grokability), zero performance impact, so ... I am strongly in favor of keeping this in and even extending the practice to other places. I gather that this looks & feels new to you but I still haven't heard arguments against.

Previously, tamird (Tamir Duberstein) wrote…

Please include rationale in the following commit messages:

Make distinctNode a fully-fledged planNode.

nMake the limit node a fully fledged planNode.

The interfaces-for-planner commit is incredibly hard to review. Is it all code movement? the commit message suggests there is some refactoring involved, but the size of the diff makes it quite hard to spot the relevant parts. I'm also 👎 on that commit generally for reasons pointed out by @andreimatei.

Review status: all files reviewed at latest revision, 12 unresolved discussions, some commit checks failed.

sql/empty.go, line 39 [r32] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: remove the addition of this code/comment from whichever commit added it

Done.

sql/limit.go, line 108 [r32] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

ditto

Done.

sql/planner.go, line 70 [r6] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

you could name it plannerI, but I still think these interfaces don't add value.

ok I went for queryRunner which is closer to what this does anyways.

sql/planner.go, line 196 [r32] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

ditto

Done.

sql/planner.go, line 224 [r32] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

ditto

Done.

sql/testdata/delete, line 112 [r33] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

what's being tested here?

Added a comment to clarify.

Comments from Reviewable

tamird · 2016-05-12T15:10:07Z

Well, the primary argument against is the presence of more code which is effectively dead. Worse, the presence of this code may encourage people to use it, which would have a performance impact.

Previously, knz (kena) wrote…

I have added the requested text in the commit messages. Also clarified the "interfaces-for-planner" commit that it does not contain any logic changes. I did not know that the word "refactoring" necessarily implies logic changes; I removed that word entirely from the commit summary.

I emphasize with the fact this is somewhat harder to review and I am willing to make steps to alleviate that.

Meanwhile, introducing interfaces purely for documentation purposes has net benefits (improving code navigability and grokability), zero performance impact, so ... I am strongly in favor of keeping this in and even extending the practice to other places. I gather that this looks & feels new to you but I still haven't heard arguments against.

Reviewed 3 of 5 files at r37, 23 of 23 files at r38.
Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks pending.

Comments from Reviewable

knz · 2016-05-12T15:16:19Z

The code is not dead because the next 2 lines assert that &planner{}
implement the interface. Should either the interface or the method
definitions change, the assertion would fail.

Then about using the interface instead of the object... I'm not sure
what the argument really is here. To me 1) it clearly does not make
sense to use this interface now 2) even if it did, the performance
impact of the interface indirection on the planner would be negligible
compared to that of, say, Statement or Expr 3) at some point in the
future we may want to split planner into multiple more independent
components, and it seems natural to me that whoever will make this step
will revisit these interfaces and refactor them towards their
refactoring accordingly.

What is the problem you are trying to prevent?

RaduBerinde · 2016-05-12T19:13:05Z

I don't have an objection to the interfaces in this case. It's not a common pattern but I see it as a "blueprint" for how to later split planner into multiple components. I like that the functions are all documented in one place (and it won't bitrot like a huge comment or document).

I would be weary of extending this practice though, as it has a few disadvantages:

you have to write prototypes twice
when you navigate code and jump to the implementation of a function, the documentation is not there (you have to go to the interface)
a first time reader will probably assume that there are multiple implementations to the interface.
it's reminiscent of C/C++ headers, not really consistent with go

Just my .02.

Previously, knz (kena) wrote…

The code is not dead because the next 2 lines assert that &planner{} implement the interface. Should either the interface or the method definitions change, the assertion would fail.

Then about using the interface instead of the object... I'm not sure what the argument really is here. To me 1) it clearly does not make sense to use this interface now 2) even if it did, the performance impact of the interface indirection on the planner would be negligible compared to that of, say, Statement or Expr 3) at some point in the future we may want to split planner into multiple more independent components, and it seems natural to me that whoever will make this step will revisit these interfaces and refactor them towards their refactoring accordingly.

What is the problem you are trying to prevent?

Review status: 4 of 27 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.

Comments from Reviewable

- makePlan() is used by the Executor and other places to translate a statement to a query plan. - newPlan() is used recursively by the plan node constructors for intermediate syntax nodes.

Prior to this patch distinctNode would (ab)use the Go implicit delegation syntax to automatically import the methods of its sub-node. This makes reading the code slightly less straightforward (nearly all other nodes use an explicit delegation, so this comes as an exception) and makes debugging objectively harder since the distinctNode disappears from the stack trace entirely for the methods so "inherited". This patch alleviates the issue by ensuring that distinctNode implements all the planNode interface itself.

Prior to this patch limitNode would (ab)use the Go implicit delegation syntax to automatically import the methods of its sub-node. This makes reading the code slightly less straightforward (nearly all other nodes use an explicit delegation, so this comes as an exception) and makes debugging objectively harder since the limitNode disappears from the stack trace entirely for the methods so "inherited". This patch alleviates the issue by ensuring that limitNode implements all the planNode interface itself.

NB: This patch only moves code around to different files, changes comments and adds non-functional interfaces (= no logic changes). The "planner" object in the sql package is a bit a showcase of the "God class" antipattern. This patch alleviates this situation by re-organizing the services of the planner in separate files that highlight its separate roles: - its role as manager of DB descriptors, via a DescriptorAccessor interface which documents all the related methods (in descriptor.go). - its role as accessor for database descriptors via a DatabaseAccessor interface which documents all the related methods (in database.go). - its role as accessor for table descriptors and table schema updates, via a SchemaAccessor interface which documents all the related methods (in table.go). The planner constructor and struct definition is moved away from `plan.go` into a new `planner.go`. `plan.go` remains to define the planNode interface and assert its implementation by the various concrete structs. The methods corresponding to each sub-interface are also moved to the primary source file for that interface.

The makePlan() interface was not well designed, it had too much responsibility. There are really 6 semantic phases on SQL statements: 1. structural syntax checking. 2. gathering the table descriptors and resolving qualified names. 3. type inference and type checking. 4. normalization and substituting placeholders with actual values. 5. building the query plan, including index selection. 6. running the query plan. The pgwire Prepare phase needs to do steps 1-3 only. The EXPLAIN statement needs to do steps 1-5 only. Prior to this patch, the makePlan() interface does steps 1-5 in one go. This is arguably too much, although the overhead is not visible to users. Prior to this patch, the Start() interface re-did steps 2 to 5 again, and also performed step 6 for some statements, in particular DELETE which has a "fast path" for when there is no WHERE clause or index updates. Now EXPLAIN needs a full query plan to work properly, ie it needs to observe completion of step 5. This means EXPLAIN could not merely use makePlan() and had to call Start() to get the fully build query plan. But then because Start() also performs side effects (eg. for DELETE), this would cause EXPLAIN to do too much work, in particular delete data for DELETE. This is a bug. This patch addresses this issue by clearly separating: - the role of newPlan(), which is to achieve steps 1-3, in some cases step 4. - a new role played by a new interface planNode.expandPlan(), which is to achieve step 5, possibly re-running step 4, without ever starting step 6. - the role of Start(), which may start step 6. Using this, makePlan() invokes newPlan() then expandPlan(); EXPLAIN uses makePlan() and prepare() uses only newPlan().

knz · 2016-05-12T21:39:26Z

@RaduBerinde thanks for the perspective. I think I agree; I will be careful not to push this further before we have more ample discussions on the topic. For now I'll merge this, and if we learn it was a terrible idea I'll volunteer to take the interfaces out later.

knz force-pushed the fix-start branch from 8ca6391 to cc9658a Compare May 11, 2016 11:32

knz force-pushed the fix-start branch from cc9658a to f93ce86 Compare May 11, 2016 15:05

knz force-pushed the fix-start branch 2 times, most recently from 92749ca to f3e7154 Compare May 11, 2016 15:15

knz changed the title ~~sql: segregate query plan preparation in a new BuildPlan interface.~~ sql: segregate query plan preparation in a new FinalizePlan interface. May 11, 2016

knz force-pushed the fix-start branch 2 times, most recently from bc79d4c to bfd9c35 Compare May 11, 2016 15:38

knz force-pushed the fix-start branch from bfd9c35 to 7dd6022 Compare May 11, 2016 16:29

knz force-pushed the fix-start branch from 7dd6022 to 41b364c Compare May 11, 2016 17:14

knz changed the title ~~sql: segregate query plan preparation in a new FinalizePlan interface.~~ sql: segregate query plan preparation in a new expandPlan interface. May 11, 2016

knz force-pushed the fix-start branch 3 times, most recently from d77f84d to 2064cdc Compare May 11, 2016 17:27

knz force-pushed the fix-start branch 2 times, most recently from c3820ee to fc19236 Compare May 12, 2016 13:38

knz force-pushed the fix-start branch 2 times, most recently from d63369d to 7230426 Compare May 12, 2016 13:43

knz force-pushed the fix-start branch from 7230426 to cadb77b Compare May 12, 2016 15:01

knz force-pushed the fix-start branch from cadb77b to 9e0c1b5 Compare May 12, 2016 15:19

knz added 6 commits May 12, 2016 20:20

Separate the makePlan() interface from the newPlan() constructor.

d78e57f

- makePlan() is used by the Executor and other places to translate a statement to a query plan. - newPlan() is used recursively by the plan node constructors for intermediate syntax nodes.

Move the empty planNode to its own source file.

8edc66b

knz force-pushed the fix-start branch from 9e0c1b5 to 2c6184c Compare May 12, 2016 20:25

knz merged commit e9be815 into cockroachdb:master May 12, 2016

knz deleted the fix-start branch May 12, 2016 21:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: segregate query plan preparation in a new expandPlan interface. #6622

sql: segregate query plan preparation in a new expandPlan interface. #6622

knz commented May 11, 2016 •

edited

Loading

knz commented May 11, 2016

RaduBerinde commented May 11, 2016

knz commented May 11, 2016

andreimatei commented May 11, 2016

nvanbenschoten commented May 11, 2016

knz commented May 11, 2016

RaduBerinde commented May 11, 2016

knz commented May 11, 2016

RaduBerinde commented May 11, 2016

knz commented May 11, 2016

RaduBerinde commented May 11, 2016

RaduBerinde commented May 12, 2016

knz commented May 12, 2016

tamird commented May 12, 2016

knz commented May 12, 2016

tamird commented May 12, 2016

knz commented May 12, 2016

RaduBerinde commented May 12, 2016

knz commented May 12, 2016

sql: segregate query plan preparation in a new expandPlan interface. #6622

sql: segregate query plan preparation in a new expandPlan interface. #6622

Conversation

knz commented May 11, 2016 • edited Loading

knz commented May 11, 2016

RaduBerinde commented May 11, 2016

knz commented May 11, 2016

andreimatei commented May 11, 2016

nvanbenschoten commented May 11, 2016

knz commented May 11, 2016

RaduBerinde commented May 11, 2016

knz commented May 11, 2016

RaduBerinde commented May 11, 2016

knz commented May 11, 2016

RaduBerinde commented May 11, 2016

RaduBerinde commented May 12, 2016

knz commented May 12, 2016

tamird commented May 12, 2016

knz commented May 12, 2016

tamird commented May 12, 2016

knz commented May 12, 2016

RaduBerinde commented May 12, 2016

knz commented May 12, 2016

knz commented May 11, 2016 •

edited

Loading