Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: segregate query plan preparation in a new expandPlan interface. #6622

Merged
merged 6 commits into from
May 12, 2016

Conversation

knz
Copy link
Contributor

@knz knz commented May 11, 2016

##The makePlan() interface was not well designed, it had too much
responsibility. There are really 6 semantic phases on SQL statements:

  1. structural syntax checking.
  2. gathering the table descriptors and resolving qualified names.
  3. type inference and type checking.
  4. normalization and substituting placeholders with actual values.
  5. building the query plan, including index selection.
  6. running the query plan.

The pgwire Prepare phase needs to do steps 1-3 only.
The EXPLAIN statement needs to do steps 1-5 only.

Prior to this patch, the makePlan() interface does steps 1-5 in one
go. This is arguably too much, although the overhead is not visible to
users.

Prior to this patch, the Start() interface re-did steps 2 to 5 again,
and also does step 6 for some statements, in particular DELETE which
has a "fast path" for when there is no WHERE clause or index updates.

Now EXPLAIN needs a full query plan to work properly, ie it needs to
observe completion of step 5. This means EXPLAIN could not merely use
makePlan() and had to call Start() to get the fully build query
plan. But then because Start() also performs side effects (eg. for
DELETE), this would cause EXPLAIN to do too much work, in particular
delete data for DELETE. This is a bug.

This patch addresses this issue by clearly separating:

  • the role of newPlan(), which is to achieve steps 1-3, in some cases step 4.
  • a new role played by a new interface planNode.expandPlan(), which is
    to achieve step 5, possibly re-running step 4, without ever starting
    step 6.
  • the role of Start(), which may start step 6.

Then makePlan() invokes newPlan() and then expandPlan().

Fixes #6613.


This change is Reviewable

@knz
Copy link
Contributor Author

knz commented May 11, 2016

cc @nvanbenschoten @RaduBerinde.

@RaduBerinde
Copy link
Member

The change is missing proper documentation for makePlan. We need a description of what exactly it returns (something that AFAICT is completely undefined at this point, even in the commit message), and we should mention the expectation that BuildPlan is to be called on the result.

Previously, knz (kena) wrote…

cc @nvanbenschoten @RaduBerinde.


Review status: 0 of 22 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


sql/plan.go, line 431 [r3] (raw file):

  DebugValues() debugValues

  // BuildPlan finalizes type checking of placeholders and builds

"build a plan" completely ignores the fact that it is being run on a plan node/tree. The doc should explain how the plan it is run on changes.


sql/plan.go, line 434 [r3] (raw file):

  // the query plan. Returns an error if the initialization fails.
  // The SQL "prepare" phase, as well as the EXPLAIN statement, should
  // merely build the plan node(s) and call BuildPlan().

Related to the above - "build the plan node(s)" here is not really defined. We are saying that BuildPlan builds the plan..


Comments from Reviewable

@knz
Copy link
Contributor Author

knz commented May 11, 2016

@RaduBerinde agreed. I have improved the documentation in the comments accordingly. Also the interface is now named "FinalizePlan" for additional clarity.

Previously, RaduBerinde wrote…

The change is missing proper documentation for makePlan. We need a description of what exactly it returns (something that AFAICT is completely undefined at this point, even in the commit message), and we should mention the expectation that BuildPlan is to be called on the result.


Review status: 0 of 27 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


sql/plan.go, line 431 [r3] (raw file):

Previously, RaduBerinde wrote…

"build a plan" completely ignores the fact that it is being run on a plan node/tree. The doc should explain how the plan it is run on changes.


Agreed. See the new comments.


sql/plan.go, line 434 [r3] (raw file):

Previously, RaduBerinde wrote…

Related to the above - "build the plan node(s)" here is not really defined. We are saying that BuildPlan builds the plan..


Agreed. See the new comments.


Comments from Reviewable

@knz knz force-pushed the fix-start branch 2 times, most recently from 92749ca to f3e7154 Compare May 11, 2016 15:15
@knz knz changed the title sql: segregate query plan preparation in a new BuildPlan interface. sql: segregate query plan preparation in a new FinalizePlan interface. May 11, 2016
@knz knz force-pushed the fix-start branch 2 times, most recently from bc79d4c to bfd9c35 Compare May 11, 2016 15:38
@andreimatei
Copy link
Contributor

I suggest we always rebase on a fresh master when exporting PRs with many commits like this, otherwise it's hard to tell which commits are obsolete and which aren't. (if you rebase on a new master, all the current commits are gonna be a suffix)

I understand why we need to have a way to perform just 1-3 (for prepare), but why does Finalize need to be a method on a node? Why can't it be embedded in makePlan (but not in planner.prepare())?

Previously, knz (kena) wrote…

@RaduBerinde agreed. I have improved the documentation in the comments accordingly. Also the interface is now named "FinalizePlan" for additional clarity.


Review status: 0 of 27 files reviewed at latest revision, 9 unresolved discussions, all commit checks successful.


sql/database.go, line 95 [r6] (raw file):

// DatabaseAccessor provides helper methods for using SQL database descriptors.
type DatabaseAccessor interface {

I don't know if I like all these new interfaces? What's the point, if they're still implemented by planner? You still have to look in a bunch of files to see all the things that the does. I don't think they help with segmenting anything off, they could be replaced with comments.
How about making these interfaces actual structs, sub-objects of the planner? Then you could say you've done something to split responsibilities better.


sql/plan.go, line 85 [r6] (raw file):

}

type planNodeFastPath interface {

can you hint as to who's gonna implement this and how it's used? Can't it be part of planNode?


sql/plan.go, line 25 [r9] (raw file):

)

type planMaker interface {

again, I don't know if this interface accomplishes anything. But it'd be nice if these two methods were isolated to some singleton (which should be called planner or planMaker, different from everything else on the current planner that's used during query execution.


sql/plan.go, line 39 [r9] (raw file):

  // Note: The autoCommit parameter enables operations to enable the
  // 1PC optimization. This is a bit hackish/preliminary at present.
  makePlan(stmt parser.Statement, desiredTypes []parser.Datum, autoCommit bool) (planNode, error)

can you suggest why FinalizePlan() is not part of makePlan()? Is it because "prepare" doesn't need it?
If so, it seems that autoCommit should be passed to FinalizedPlan() and not to this guy? Or even to Start()?
Or does it have something to do with subqueries? If so, is there a good reason why we can't handle subqueries during makePlan but we can during Finalize?


sql/plan.go, line 39 [r9] (raw file):

  // Note: The autoCommit parameter enables operations to enable the
  // 1PC optimization. This is a bit hackish/preliminary at present.
  makePlan(stmt parser.Statement, desiredTypes []parser.Datum, autoCommit bool) (planNode, error)

can we please add a comment for desiredTypes


sql/plan.go, line 46 [r9] (raw file):

  // will do just enough work to check the structural validity of the
  // SQL statement and determine types for placeholders. However it is
  // not appropriate to call BuildPlan(), Next() or Values() on a plan

s/BuildPlan/FinalizePlan ?


sql/planner.go, line 70 [r6] (raw file):

// Planner abstracts the services provided by a planner object
// to the other SQL front-end components.
type Planner interface {

what's up with this (new?) interface? Why do we need it / why does it need to be public?


Comments from Reviewable

@nvanbenschoten
Copy link
Member

Reviewed 1 of 1 files at r1, 22 of 22 files at r2, 2 of 2 files at r3, 23 of 23 files at r4, 1 of 1 files at r5, 5 of 5 files at r6, 3 of 26 files at r8, 24 of 24 files at r9.
Review status: all files reviewed at latest revision, 14 unresolved discussions, all commit checks successful.


sql/database.go, line 91 [r6] (raw file):

  nameKey = sqlbase.MakeNameMetadataKey(keys.RootNamespaceID, dbDesc.GetName())
  descKey = sqlbase.MakeDescMetadataKey(dbDesc.ID)
  return

"naked" returns are generally discouraged in Go unless there is a particularly good reason for them.


sql/limit.go, line 98 [r5] (raw file):

type limitNode struct {
  plan      planNode

Why un-embed the planNode here and in a few other places if you're still going to defer primarily to the wrapped planNode?


sql/plan.go, line 438 [r2] (raw file):

  // Start begins the processing of the query/statement and starts
  // performing side effects for data-modifying statements. Returns an

I like how this comment specifies that Start will have side effects. You might want to add to the contract above that the others will not have side effects.


sql/table.go, line 101 [r6] (raw file):

// SchemaAccessor provides helper methods for using the SQL schema.
type SchemaAccessor interface {

What is the point of defining these interfaces if they are never used for indirection? Just to logically group behavior together?


sql/update.go, line 292 [r9] (raw file):

  // Create the workhorse select clause for rows that need updating.
  // TODO(knz): See comment above in Update().

Will this change enable us to clean this second SelectClause as well?


Comments from Reviewable

@knz
Copy link
Contributor Author

knz commented May 11, 2016

There are only 2 commits in this PR. I will try rebasing to clean up the mess as you suggest.

Previously, andreimatei (Andrei Matei) wrote…

I suggest we always rebase on a fresh master when exporting PRs with many commits like this, otherwise it's hard to tell which commits are obsolete and which aren't. (if you rebase on a new master, all the current commits are gonna be a suffix)

I understand why we need to have a way to perform just 1-3 (for prepare), but why does Finalize need to be a method on a node? Why can't it be embedded in makePlan (but not in planner.prepare())?


Review status: all files reviewed at latest revision, 14 unresolved discussions, all commit checks successful.


sql/database.go, line 91 [r6] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"naked" returns are generally discouraged in Go unless there is a particularly good reason for them.


I'll queue this for a separate commit. This commit is only rearranging comments/methods and does not change any code. Let's keep it like that.


sql/database.go, line 95 [r6] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I don't know if I like all these new interfaces? What's the point, if they're still implemented by planner? You still have to look in a bunch of files to see all the things that the does. I don't think they help with segmenting anything off, they could be replaced with comments.
How about making these interfaces actual structs, sub-objects of the planner? Then you could say you've done something to split responsibilities better.


The interface is introduced purely for documentation purposes. It helps by grouping all the method comments in one place so they can be groked more quickly in the code editor. It has nothing to do with code semantics.


sql/limit.go, line 98 [r5] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why un-embed the planNode here and in a few other places if you're still going to defer primarily to the wrapped planNode?


Because it makes the code more regular and easier to read.


sql/plan.go, line 438 [r2] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I like how this comment specifies that Start will have side effects. You might want to add to the contract above that the others will not have side effects.


I think this is amply documented by the existing comments already.


sql/plan.go, line 85 [r6] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

can you hint as to who's gonna implement this and how it's used? Can't it be part of planNode?


Done.


sql/plan.go, line 25 [r9] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

again, I don't know if this interface accomplishes anything. But it'd be nice if these two methods were isolated to some singleton (which should be called planner or planMaker, different from everything else on the current planner that's used during query execution.


This interface is introduced purely for documentation purposes. A later PR may push this agenda further doing what you suggest.


sql/plan.go, line 39 [r9] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

can you suggest why FinalizePlan() is not part of makePlan()? Is it because "prepare" doesn't need it?
If so, it seems that autoCommit should be passed to FinalizedPlan() and not to this guy? Or even to Start()?
Or does it have something to do with subqueries? If so, is there a good reason why we can't handle subqueries during makePlan but we can during Finalize?


You are asking too many questions :)
Why FinalizePlan() is not part of makePlan() -> because prepare does not need it, indeed.

autoCommit can be moved, you are right, but this would belong to a different commit (which I could implement as a followup, I'll see).


sql/plan.go, line 39 [r9] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

can we please add a comment for desiredTypes


Done.


sql/plan.go, line 46 [r9] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/BuildPlan/FinalizePlan ?


Done.


sql/planner.go, line 70 [r6] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

what's up with this (new?) interface? Why do we need it / why does it need to be public?


it does not need to be public. However I cannot use the name "planner" in lowercase because the struct already has it. Feel free to suggest a better name.

The interface is otherwise introduced purely for documentation purposes.


sql/table.go, line 101 [r6] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What is the point of defining these interfaces if they are never used for indirection? Just to logically group behavior together?


The interface is introduced purely for documentation purposes. It groups all the method comments in one place, making them easy to read in the code editor.


sql/update.go, line 292 [r9] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Will this change enable us to clean this second SelectClause as well?


Yeah basically all this is a prerequisite for this other PR I have in the stove already (#6594)


Comments from Reviewable

@RaduBerinde
Copy link
Member

Review status: 0 of 27 files reviewed at latest revision, 17 unresolved discussions, all commit checks successful.


sql/plan.go, line 60 [r14] (raw file):

  //
  // Available after makePlan(). (FIXME: also prepare()? )
  Columns() []ResultColumn

index selection could affect Columns() if we are looking at an internal node. E.g. SELECT y .. ORDER BY x. If we use an index on x, the selectNode won't need to return x. I'm not sure if we are referring here to just the root of the tree, which indeed I don't think can change after makePlan.


sql/plan.go, line 64 [r14] (raw file):

  // The indexes of the columns the output is ordered by.
  //
  // Available after makePlan(). (FIXME: also prepare()? )

This may depend on the index, so it's not really available until after FinalizePlan. Or maybe it is available but with the caveat that it might change.


sql/plan.go, line 91 [r14] (raw file):

  //
  // Available after makePlan().
  FinalizePlan() error

Thanks for the new comments! They look great. Also agree with the FinalizePlan name.

One suggestion: could we reorder the functions according to how "early" they are available? First FinalizePlan then Columns' (?), 'Ordering', 'Explain*, then 'MarkDebug', 'SetLimitHint, thenStartthenNext, thenValues/Debug`.


Comments from Reviewable

@knz
Copy link
Contributor Author

knz commented May 11, 2016

I changed the interface name again, this time to expandPlan(), thanks to a suggestion by @petermattis that the word "Finalize" reminds the reader of finalizers and suggests the method is to be called after the node is not used any more.

Also I implemented @andreimatei's suggestions to make expandPlan() invisible to the users of makePlan().

Previously, knz (kena) wrote…

There are only 2 commits in this PR. I will try rebasing to clean up the mess as you suggest.


Review status: 0 of 27 files reviewed at latest revision, 14 unresolved discussions, some commit checks pending.


sql/plan.go, line 60 [r14] (raw file):

Previously, RaduBerinde wrote…

index selection could affect Columns() if we are looking at an internal node. E.g. SELECT y .. ORDER BY x. If we use an index on x, the selectNode won't need to return x. I'm not sure if we are referring here to just the root of the tree, which indeed I don't think can change after makePlan.


Done.


sql/plan.go, line 64 [r14] (raw file):

Previously, RaduBerinde wrote…

This may depend on the index, so it's not really available until after FinalizePlan. Or maybe it is available but with the caveat that it might change.


Done.


sql/plan.go, line 91 [r14] (raw file):

Previously, RaduBerinde wrote…

Thanks for the new comments! They look great. Also agree with the FinalizePlan name.

One suggestion: could we reorder the functions according to how "early" they are available? First FinalizePlan then Columns' (?), 'Ordering', 'Explain*, then 'MarkDebug', 'SetLimitHint, thenStartthenNext, thenValues/Debug`.


Done.


Comments from Reviewable

@knz knz changed the title sql: segregate query plan preparation in a new FinalizePlan interface. sql: segregate query plan preparation in a new expandPlan interface. May 11, 2016
@RaduBerinde
Copy link
Member

Sshould we review only the last 2 of the 5 commits? Are the other three part of another PR (or they will be)?

Previously, knz (kena) wrote…

There are only 2 commits in this PR. I will try rebasing to clean up the mess as you suggest.


Review status: 0 of 27 files reviewed at latest revision, 10 unresolved discussions, some commit checks pending.


sql/plan.go, line 58 [r20] (raw file):

  // the result column types.
  //
  // Available after newPlan().

newPlan isn't defined in these interfaces.. Even in the code it's not really documented, it just says "newPlan constructs a planNode from a statement"


Comments from Reviewable

@knz knz force-pushed the fix-start branch 3 times, most recently from d77f84d to 2064cdc Compare May 11, 2016 17:27
@knz
Copy link
Contributor Author

knz commented May 11, 2016

No actually all these commits belong here. You can review them separately though.

Previously, RaduBerinde wrote…

Sshould we review only the last 2 of the 5 commits? Are the other three part of another PR (or they will be)?


Review status: 0 of 27 files reviewed at latest revision, 9 unresolved discussions, some commit checks pending.


sql/plan.go, line 58 [r20] (raw file):

Previously, RaduBerinde wrote…

newPlan isn't defined in these interfaces.. Even in the code it's not really documented, it just says "newPlan constructs a planNode from a statement"


Done.


Comments from Reviewable

@RaduBerinde
Copy link
Member

Ok cool.

Previously, knz (kena) wrote…

No actually all these commits belong here. You can review them separately though.


Comments from Reviewable

@knz knz force-pushed the fix-start branch 2 times, most recently from c3820ee to fc19236 Compare May 12, 2016 13:38
@RaduBerinde
Copy link
Member

:lgtm:

Previously, RaduBerinde wrote…

Ok cool.


Reviewed 1 of 27 files at r15, 3 of 5 files at r25, 23 of 23 files at r26, 1 of 1 files at r27.
Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks pending.


sql/plan.go, line 45 [r26] (raw file):

  // will do just enough work to check the structural validity of the
  // SQL statement and determine types for placeholders. However it is
  // not appropriate to call BuildPlan(), Next() or Values() on a plan

s/BuildPlan/expandPlan/


Comments from Reviewable

@knz knz force-pushed the fix-start branch 2 times, most recently from d63369d to 7230426 Compare May 12, 2016 13:43
@knz
Copy link
Contributor Author

knz commented May 12, 2016

Review status: 1 of 27 files reviewed at latest revision, 8 unresolved discussions, some commit checks pending.


sql/plan.go, line 45 [r26] (raw file):

Previously, RaduBerinde wrote…

s/BuildPlan/expandPlan/


Done.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented May 12, 2016

Please include rationale in the following commit messages:

  • Make distinctNode a fully-fledged planNode.
  • nMake the limit node a fully fledged planNode.

The interfaces-for-planner commit is incredibly hard to review. Is it all code movement? the commit message suggests there is some refactoring involved, but the size of the diff makes it quite hard to spot the relevant parts. I'm also 👎 on that commit generally for reasons pointed out by @andreimatei.

Previously, RaduBerinde wrote…

:lgtm:


Reviewed 1 of 27 files at r15, 26 of 26 files at r28, 1 of 1 files at r29, 2 of 2 files at r30, 1 of 1 files at r31, 5 of 5 files at r32, 23 of 23 files at r33.
Review status: all files reviewed at latest revision, 13 unresolved discussions, some commit checks failed.


sql/empty.go, line 39 [r32] (raw file):

func (*emptyNode) MarkDebug(_ explainMode)             {}

// func (*emptyNode) BuildPlan() error                    { return nil }

nit: remove the addition of this code/comment from whichever commit added it


sql/limit.go, line 108 [r32] (raw file):

func (n *limitNode) ExplainTypes(f func(string, string)) { n.plan.ExplainTypes(f) }

// func (n *limitNode) BuildPlan() error                    { return n.plan.BuildPlan() }

ditto


sql/planner.go, line 70 [r6] (raw file):

Previously, knz (kena) wrote…

it does not need to be public. However I cannot use the name "planner" in lowercase because the struct already has it. Feel free to suggest a better name.

The interface is otherwise introduced purely for documentation purposes.


you could name it plannerI, but I still think these interfaces don't add value.


sql/planner.go, line 196 [r32] (raw file):

      return nil, err
  }
  //if err := plan.BuildPlan(); err != nil {

ditto


sql/planner.go, line 224 [r32] (raw file):

      return 0, err
  }
  //if err := plan.BuildPlan(); err != nil {

ditto


sql/testdata/delete, line 112 [r33] (raw file):


query ITT colnames
EXPLAIN DELETE FROM unindexed

what's being tested here?


Comments from Reviewable

@knz
Copy link
Contributor Author

knz commented May 12, 2016

I have added the requested text in the commit messages. Also clarified the "interfaces-for-planner" commit that it does not contain any logic changes. I did not know that the word "refactoring" necessarily implies logic changes; I removed that word entirely from the commit summary.

I emphasize with the fact this is somewhat harder to review and I am willing to make steps to alleviate that.

Meanwhile, introducing interfaces purely for documentation purposes has net benefits (improving code navigability and grokability), zero performance impact, so ... I am strongly in favor of keeping this in and even extending the practice to other places. I gather that this looks & feels new to you but I still haven't heard arguments against.

Previously, tamird (Tamir Duberstein) wrote…

Please include rationale in the following commit messages:

  • Make distinctNode a fully-fledged planNode.
  • nMake the limit node a fully fledged planNode.

The interfaces-for-planner commit is incredibly hard to review. Is it all code movement? the commit message suggests there is some refactoring involved, but the size of the diff makes it quite hard to spot the relevant parts. I'm also 👎 on that commit generally for reasons pointed out by @andreimatei.


Review status: all files reviewed at latest revision, 12 unresolved discussions, some commit checks failed.


sql/empty.go, line 39 [r32] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: remove the addition of this code/comment from whichever commit added it


Done.


sql/limit.go, line 108 [r32] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

ditto


Done.


sql/planner.go, line 70 [r6] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

you could name it plannerI, but I still think these interfaces don't add value.


ok I went for queryRunner which is closer to what this does anyways.


sql/planner.go, line 196 [r32] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

ditto


Done.


sql/planner.go, line 224 [r32] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

ditto


Done.


sql/testdata/delete, line 112 [r33] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

what's being tested here?


Added a comment to clarify.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented May 12, 2016

Well, the primary argument against is the presence of more code which is effectively dead. Worse, the presence of this code may encourage people to use it, which would have a performance impact.

Previously, knz (kena) wrote…

I have added the requested text in the commit messages. Also clarified the "interfaces-for-planner" commit that it does not contain any logic changes. I did not know that the word "refactoring" necessarily implies logic changes; I removed that word entirely from the commit summary.

I emphasize with the fact this is somewhat harder to review and I am willing to make steps to alleviate that.

Meanwhile, introducing interfaces purely for documentation purposes has net benefits (improving code navigability and grokability), zero performance impact, so ... I am strongly in favor of keeping this in and even extending the practice to other places. I gather that this looks & feels new to you but I still haven't heard arguments against.


Reviewed 3 of 5 files at r37, 23 of 23 files at r38.
Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks pending.


Comments from Reviewable

@knz
Copy link
Contributor Author

knz commented May 12, 2016

The code is not dead because the next 2 lines assert that &planner{}
implement the interface. Should either the interface or the method
definitions change, the assertion would fail.

Then about using the interface instead of the object... I'm not sure
what the argument really is here. To me 1) it clearly does not make
sense to use this interface now 2) even if it did, the performance
impact of the interface indirection on the planner would be negligible
compared to that of, say, Statement or Expr 3) at some point in the
future we may want to split planner into multiple more independent
components, and it seems natural to me that whoever will make this step
will revisit these interfaces and refactor them towards their
refactoring accordingly.

What is the problem you are trying to prevent?

@RaduBerinde
Copy link
Member

I don't have an objection to the interfaces in this case. It's not a common pattern but I see it as a "blueprint" for how to later split planner into multiple components. I like that the functions are all documented in one place (and it won't bitrot like a huge comment or document).

I would be weary of extending this practice though, as it has a few disadvantages:

  • you have to write prototypes twice
  • when you navigate code and jump to the implementation of a function, the documentation is not there (you have to go to the interface)
  • a first time reader will probably assume that there are multiple implementations to the interface.
  • it's reminiscent of C/C++ headers, not really consistent with go

Just my .02.

Previously, knz (kena) wrote…

The code is not dead because the next 2 lines assert that &planner{} implement the interface. Should either the interface or the method definitions change, the assertion would fail.

Then about using the interface instead of the object... I'm not sure what the argument really is here. To me 1) it clearly does not make sense to use this interface now 2) even if it did, the performance impact of the interface indirection on the planner would be negligible compared to that of, say, Statement or Expr 3) at some point in the future we may want to split planner into multiple more independent components, and it seems natural to me that whoever will make this step will revisit these interfaces and refactor them towards their refactoring accordingly.

What is the problem you are trying to prevent?


Review status: 4 of 27 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


Comments from Reviewable

knz added 6 commits May 12, 2016 20:20
- makePlan() is used by the Executor and other places to translate a
  statement to a query plan.

- newPlan() is used recursively by the plan node constructors for
  intermediate syntax nodes.
Prior to this patch distinctNode would (ab)use the Go implicit
delegation syntax to automatically import the methods of its
sub-node. This makes reading the code slightly less straightforward
(nearly all other nodes use an explicit delegation, so this comes as
an exception) and makes debugging objectively harder since the
distinctNode disappears from the stack trace entirely for the methods
so "inherited".

This patch alleviates the issue by ensuring that distinctNode
implements all the planNode interface itself.
Prior to this patch limitNode would (ab)use the Go implicit delegation
syntax to automatically import the methods of its sub-node. This makes
reading the code slightly less straightforward (nearly all other nodes
use an explicit delegation, so this comes as an exception) and makes
debugging objectively harder since the limitNode disappears from the
stack trace entirely for the methods so "inherited".

This patch alleviates the issue by ensuring that limitNode
implements all the planNode interface itself.
NB: This patch only moves code around to different files, changes
comments and adds non-functional interfaces (= no logic changes).

The "planner" object in the sql package is a bit a showcase of the
"God class" antipattern. This patch alleviates this situation by
re-organizing the services of the planner in separate files that
highlight its separate roles:

- its role as manager of DB descriptors, via a DescriptorAccessor
  interface which documents all the related methods (in
  descriptor.go).

- its role as accessor for database descriptors via a DatabaseAccessor
  interface which documents all the related methods (in database.go).

- its role as accessor for table descriptors and table schema updates,
  via a SchemaAccessor interface which documents all the related
  methods (in table.go).

The planner constructor and struct definition is moved away from
`plan.go` into a new `planner.go`. `plan.go` remains to define the
planNode interface and assert its implementation by the various
concrete structs.

The methods corresponding to each sub-interface are also moved to the
primary source file for that interface.
The makePlan() interface was not well designed, it had too much
responsibility. There are really 6 semantic phases on SQL statements:

1.  structural syntax checking.
2.  gathering the table descriptors and resolving qualified names.
3.  type inference and type checking.
4.  normalization and substituting placeholders with actual values.
5.  building the query plan, including index selection.
6.  running the query plan.

The pgwire Prepare phase needs to do steps 1-3 only.
The EXPLAIN statement needs to do steps 1-5 only.

Prior to this patch, the makePlan() interface does steps 1-5 in one
go. This is arguably too much, although the overhead is not visible to
users.

Prior to this patch, the Start() interface re-did steps 2 to 5 again,
and also performed step 6 for some statements, in particular DELETE
which has a "fast path" for when there is no WHERE clause or index
updates.

Now EXPLAIN needs a full query plan to work properly, ie it needs to
observe completion of step 5. This means EXPLAIN could not merely use
makePlan() and had to call Start() to get the fully build query
plan. But then because Start() also performs side effects (eg. for
DELETE), this would cause EXPLAIN to do too much work, in particular
delete data for DELETE. This is a bug.

This patch addresses this issue by clearly separating:

- the role of newPlan(), which is to achieve steps 1-3, in some cases step 4.

- a new role played by a new interface planNode.expandPlan(), which is
  to achieve step 5, possibly re-running step 4, without ever
  starting step 6.

- the role of Start(), which may start step 6.

Using this, makePlan() invokes newPlan() then expandPlan(); EXPLAIN
uses makePlan() and prepare() uses only newPlan().
@knz
Copy link
Contributor Author

knz commented May 12, 2016

@RaduBerinde thanks for the perspective. I think I agree; I will be careful not to push this further before we have more ample discussions on the topic. For now I'll merge this, and if we learn it was a terrible idea I'll volunteer to take the interfaces out later.

@knz knz merged commit e9be815 into cockroachdb:master May 12, 2016
@knz knz deleted the fix-start branch May 12, 2016 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants