Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upopt: Change Memo data structures #31173
Conversation
andy-kimball
requested review from
justinj,
rytaft and
RaduBerinde
Oct 10, 2018
andy-kimball
requested review from
cockroachdb/sql-execution-prs
as
code owners
Oct 10, 2018
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
andy-kimball
requested review from
cockroachdb/admin-ui-prs
as
code owners
Oct 11, 2018
justinj
reviewed
Oct 11, 2018
I looked through optbuilder and execbuilder changes, mostly seems pretty mechanical as far as those two packages go. I'll look through more of the rest of the changes later.
Reviewed 8 of 92 files at r2, 9 of 23 files at r5, 6 of 21 files at r8, 7 of 197 files at r9, 11 of 22 files at r12.
Reviewable status:complete! 0 of 0 LGTMs obtained
pkg/sql/opt/exec/execbuilder/relational_builder.go, line 398 at r8 (raw file):
} func (b *Builder) buildHashJoin(join memo.RelExpr) (execPlan, error) {
Did you consider making interfaces for these more specific subclasses of expressions, like this and GroupBys? Any reason you opted not to?
pkg/sql/opt/exec/execbuilder/testdata/explain, line 486 at r8 (raw file):
· row 1, expr 2 (6)[int] · · # TODO(rytaft): range operator needed to detect contradiction
is this test change a result of not having rebased yet?
pkg/sql/opt/optbuilder/scalar.go, line 110 at r8 (raw file):
els[i] = b.buildScalar(texpr, inScope, nil, nil, colRefs) } out = b.factory.ConstructArray(els, arrayType)
Did the fancy list storage not end up being a win in the long run? Or difficult to implement in this new representation?
pkg/sql/opt/optbuilder/scalar.go, line 160 at r8 (raw file):
b.factory.ConstructScalarGroupBy( s.node, memo.AggregationsExpr{{
Why is this not Constructed? Even if it's just a list underneath is that not an implementation detail?
pkg/sql/opt/optbuilder/testdata/aggregate, line 894 at r15 (raw file):
│ ├── scan kv │ │ └── columns: kv.k:5(int!null) kv.v:6(int) kv.w:7(int) kv.s:8(string) │ └── filters
nit (doesn't need to be part of this PR): it would probably be helpful if an empty filters was more obviously a constant True, this makes sense if you know that filters is basically and, but someone not familiar with opt inspecting these tests might be confused.
pkg/sql/opt/optbuilder/testdata/aggregate, line 2800 at r15 (raw file):
SELECT 123 r FROM kv ORDER BY max(v) ---- project
why did this plan change? is this a rebasing thing?
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
RaduBerinde
Oct 12, 2018
Member
Hi Andy,
I put an initial set of comments for the memo part here, as a diff (comments usually refer to the line below): https://gist.github.com/RaduBerinde/a3d238f44495b1822bc88fb77f6b1880
Feel free to comment in that gist, I think that's easier than interleaving all discussions in this PR.
|
Hi Andy, I put an initial set of comments for the Feel free to comment in that gist, I think that's easier than interleaving all discussions in this PR. |
andy-kimball
reviewed
Oct 15, 2018
Reviewable status:
complete! 0 of 0 LGTMs obtained
pkg/sql/opt/exec/execbuilder/relational_builder.go, line 398 at r8 (raw file):
Previously, justinj (Justin Jaffray) wrote…
Did you consider making interfaces for these more specific subclasses of expressions, like this and GroupBys? Any reason you opted not to?
Each interface requires a more casting back and forth, and more complications when generating code. I'd only do it if I had additional methods that were useful to add (like DataType for ScalarExpr or NextExpr for RelExpr). Too much polymorphism can be more trouble than its worth, and I decided to draw the line here as a compromise.
pkg/sql/opt/exec/execbuilder/testdata/explain, line 486 at r8 (raw file):
Previously, justinj (Justin Jaffray) wrote…
is this test change a result of not having rebased yet?
Yes, should be gone now.
pkg/sql/opt/optbuilder/scalar.go, line 110 at r8 (raw file):
Previously, justinj (Justin Jaffray) wrote…
Did the fancy list storage not end up being a win in the long run? Or difficult to implement in this new representation?
I ran into a bunch of issues when considering what to intern. I ended up deciding it's best to not intern the list types or the list item types.
- Lists are now represented as typed slices, which would be painful to intern, since we'd need a different list storage implementation for each list item type (
FiltersItem,ProjectionsItem, etc). - I don't know how to intern slices in the same way as we were interning lists before. Before, we had a level of indirection in the form of a
ListID. Now, we just have a regular Go slice. We wouldn't be able to store slices in one larger slice, because we have no way of updating underlying pointers when the larger slice is resized. - If we attempted to intern list expressions (
FiltersExpr,ProjectionsExpr, etc.) rather than slices of items, then we'd trigger additional allocations, since we'd have to allocate slice headers on the heap (i.e.*FiltersExpr). - We can't intern list item expressions (
FiltersItem,ProjectionsItem), because they are inlined into their parent slice. - We have many rules that rewrite lists, and today it's expensive to have to re-intern the entire list just in order to remove or replace a list item. I felt like we should just exclude list and list item types entirely from interning.
pkg/sql/opt/optbuilder/scalar.go, line 160 at r8 (raw file):
Previously, justinj (Justin Jaffray) wrote…
Why is this not
Constructed? Even if it's just a list underneath is that not an implementation detail?
I don't intern list or list item nodes; see above for explanation. Any Construct method would look different from all other Construct methods, because I want to avoid extra allocations:
ConstructAggregationsExpr(in memo.AggregationsExpr) memo.AggregationsExpr
Notice that these are not pointers to expressions, as with every other Construct method. Using pointers here would likely force the slice header onto the heap, as well as imply an expectation that the expressions would be interned. And if I don't use pointers, what would the Construct method do? Just be a dummy pass through? That possibility seemed strange to me.
pkg/sql/opt/optbuilder/testdata/aggregate, line 894 at r15 (raw file):
Previously, justinj (Justin Jaffray) wrote…
nit (doesn't need to be part of this PR): it would probably be helpful if an empty
filterswas more obviously a constantTrue, this makes sense if you know thatfiltersis basicallyand, but someone not familiar with opt inspecting these tests might be confused.
Yeah, I went back and forth on this. I could make it true, though it might lead to other confusion, since people might then think there's actually a true expression when there isn't. @radu, @rebecca, any opinion on this?
pkg/sql/opt/optbuilder/testdata/aggregate, line 2800 at r15 (raw file):
Previously, justinj (Justin Jaffray) wrote…
why did this plan change? is this a rebasing thing?
Yes, should be "no change" now that I've rebased.
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
RaduBerinde
Oct 15, 2018
Member
xform part LGTM, some minor comments: https://gist.github.com/RaduBerinde/1324028c377f85dbd051fcd797062f92
|
xform part LGTM, some minor comments: https://gist.github.com/RaduBerinde/1324028c377f85dbd051fcd797062f92 |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
RaduBerinde
Oct 15, 2018
Member
pkg/sql/opt/optbuilder/testdata/aggregate, line 894 at r15 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Yeah, I went back and forth on this. I could make it
true, though it might lead to other confusion, since people might then think there's actually atrueexpression when there isn't. @radu, @rebecca, any opinion on this?
I would make it show up as no-filters (or filters (none) or filters (true))
|
pkg/sql/opt/optbuilder/testdata/aggregate, line 894 at r15 (raw file): Previously, andy-kimball (Andy Kimball) wrote…
I would make it show up as |
rytaft
requested changes
Oct 15, 2018
Nice work! I've reviewed the Optgen change (r11), now looking at the norm change (r13).
Reviewed 8 of 197 files at r9, 18 of 18 files at r11.
Reviewable status:complete! 0 of 0 LGTMs obtained
pkg/sql/opt/optgen/cmd/optgen/exprs_gen.go, line 99 at r11 (raw file):
// } // func (g *exprsGen) genGroupExpr(define *lang.DefineExpr) {
[nit] for consistency, maybe this should be genGroupStruct?
pkg/sql/opt/optgen/cmd/optgen/exprs_gen.go, line 186 at r11 (raw file):
for _, field := range define.Fields { // If field's name is "_", then use Go embedding syntax. fieldName := g.md.fieldName(field)
you only need fieldName if isEmbeddedField is false
pkg/sql/opt/optgen/cmd/optgen/exprs_gen.go, line 480 at r11 (raw file):
// Generate the Op method. fmt.Fprintf(g.w, "func (e *%s) Op() opt.Operator {\n", opTyp.name)
Is there a benefit to having pointer receivers for slice types?
pkg/sql/opt/optgen/cmd/optgen/exprs_gen.go, line 491 at r11 (raw file):
// Generate the Child method. fmt.Fprintf(g.w, "func (e *%s) Child(nth int) opt.Expr {\n", opTyp.name) if opTyp.listItemType.isGenerated {
Why does isGenerated mean that it's a pointer type?
pkg/sql/opt/optgen/cmd/optgen/factory_gen.go, line 250 at r11 (raw file):
} else { g.w.writeIndent("before := list[i]\n") g.w.writeIndent("after := replace(before).(opt.ScalarExpr)\n")
[nit] this line is redundant and can be moved out of the if block
pkg/sql/opt/optgen/cmd/optgen/factory_gen.go, line 358 at r11 (raw file):
if itemType.isGenerated { firstFieldName := g.md.fieldName(itemDefine.Fields[0]) g.w.writeIndent("dst[i].%s = f.assignPlaceholders(src[i].%s).(opt.ScalarExpr)\n",
Why are we only assigning placeholders to the first field?
pkg/sql/opt/optgen/cmd/optgen/metadata.go, line 220 at r11 (raw file):
// Remove package prefix from types in the same package. if strings.HasPrefix(typ.fullName, pkg+".") { typ.name = typ.fullName[len(pkg)+1:]
what about pointers? don't you need an extra char in that case?
pkg/sql/opt/optgen/cmd/optgen/metadata.go, line 248 at r11 (raw file):
// fieldName maps the Optgen field name to the corresponding Go field name. In // particular, fields named "_" are mapped to the name of a Go embedded field, // which is equal to the field's type name.
does the type name include the package name?
pkg/sql/opt/optgen/cmd/optgen/metadata.go, line 269 at r11 (raw file):
// a private field and will not be returned. func (m *metadata) childFields(define *lang.DefineExpr) lang.DefineFieldsExpr { // Skip until non-node field is found.
non-expr?
pkg/sql/opt/optgen/cmd/optgen/metadata.go, line 293 at r11 (raw file):
// the operator, then privateField returns nil. func (m *metadata) privateField(define *lang.DefineExpr) *lang.DefineFieldExpr { // Skip until non-node field is found.
ditto
pkg/sql/opt/optgen/cmd/optgen/metadata.go, line 329 at r11 (raw file):
// Since SetPrivate values are passed by reference, they must be dereferenced // before copying them to a target field. func (m *metadata) fieldStorePrefix(field *lang.DefineFieldExpr) string {
Seems like you could have more helper functions like this to encapsulate some of the complexity elsewhere due to the isPointer, isGenerated and passByVal fields
pkg/sql/opt/optgen/cmd/optgen/ops_gen.go, line 21 at r11 (raw file):
"fmt" "io"
[nit] extra line
pkg/sql/opt/optgen/cmd/optgen/rule_gen.go, line 57 at r11 (raw file):
// like this instead: // // _select := _innerJoin.(*SelectExpr)
Maybe you mean _innerJoin.Left.(*SelectExpr)?
pkg/sql/opt/optgen/cmd/optgen/rule_gen.go, line 184 at r11 (raw file):
// // In that example, the context starts out as "elems", which is the name of the // top-level Tuple oeprator fields that's being list matched. Then, the context
oeprator -> operator
fields -> field
pkg/sql/opt/optgen/cmd/optgen/rule_gen.go, line 187 at r11 (raw file):
// recursively becomes _item, which is bound to one of the expressions in the // list. And finally, it becomes _eq.Left, which returns the left operand of the // Eq operator, and which is matched against in the innermost if statement.
I think you need to add one more if statement to the example
pkg/sql/opt/optgen/cmd/optgen/rule_gen.go, line 580 at r11 (raw file):
} else { // Match tag name. fmt.Fprintf(&buf, "opt.Is%sOp(%s)", define.Name, context)
Isn't define nil here?
pkg/sql/opt/optgen/cmd/optgen/rule_gen.go, line 788 at r11 (raw file):
// Only types passed by reference need to use "&". argTyp := g.md.typeOf(opDef.Fields[i]) if !argTyp.passByVal || argTyp.isPointer {
shouldn't this be if argTyp.passByVal || argTyp.isPointer { according to your comment?
pkg/sql/opt/optgen/cmd/optgen/validator.go, line 74 at r11 (raw file):
break } v.addErrorf(t.Source(), "list match operator cannot match field of type %s", extType.Name)
[nit] why not switch this around and test typ == nil || typ.listItemType == nil
pkg/sql/opt/optgen/cmd/optgen/testdata/exprs, line 176 at r11 (raw file):
} func (g *projectGroup) setBestProps(physical *props.Physical, cost Cost) {
I thought the best expr in a group was relative to a particular set of physical props. So you could potentially have multiple best exprs for a given group...
pkg/sql/opt/optgen/cmd/optgen/testdata/exprs, line 279 at r11 (raw file):
type SortExpr struct { Input opt.Expr Input RelExpr
why are there two inputs?
pkg/sql/opt/optgen/cmd/optgen/testdata/factory, line 221 at r11 (raw file):
} func (f *Factory) assignPlaceholders(src opt.Expr) (dst opt.Expr) {
Is it worth checking here if src has HasPlaceholders=true?
pkg/sql/opt/optgen/cmd/optgen/testdata/factory, line 244 at r11 (raw file):
case *memo.FiltersExpr: return t
why are we not assigning placeholders here?
pkg/sql/opt/optgen/cmd/optgen/testdata/rulenames, line 23 at r11 (raw file):
(Join $r:* $s:*) => (Join $s $r) ---- test.opt:14:18: list match operator cannot match field of type Expr
Did you want to rewrite these tests so they aren't creating errors?
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
andy-kimball
Oct 15, 2018
Contributor
I would make it show up as
no-filters(orfilters (none)orfilters (true))
Good idea, I like filters (true).
Good idea, I like |
andy-kimball
reviewed
Oct 16, 2018
Reviewable status:
complete! 0 of 0 LGTMs obtained
pkg/sql/opt/optgen/cmd/optgen/exprs_gen.go, line 99 at r11 (raw file):
Previously, rytaft wrote…
[nit] for consistency, maybe this should be
genGroupStruct?
I changed it to genExprGroupDef to be symmetric with genExprDef, since it's also generating methods.
pkg/sql/opt/optgen/cmd/optgen/exprs_gen.go, line 186 at r11 (raw file):
Previously, rytaft wrote…
you only need
fieldNameifisEmbeddedFieldis false
Done.
pkg/sql/opt/optgen/cmd/optgen/exprs_gen.go, line 480 at r11 (raw file):
Previously, rytaft wrote…
Is there a benefit to having pointer receivers for slice types?
They need to be that way. For example, the Child method of a Select operator needs to return &e.Filters, which is a pointer to a slice, rather than the slice itself. If we were to return the slice itself, it would force a heap allocation.
pkg/sql/opt/optgen/cmd/optgen/exprs_gen.go, line 491 at r11 (raw file):
Previously, rytaft wrote…
Why does
isGeneratedmean that it's a pointer type?
It's the other way around; if it's generated then it's a type like FiltersItem that's inlined in the slice, and so to get an opt.Expr we need to take the address of the slice entry.
However, that's not too clear, so I changed it to check isPointer instead (i.e. when list item type is opt.ScalarExpr).
pkg/sql/opt/optgen/cmd/optgen/factory_gen.go, line 250 at r11 (raw file):
Previously, rytaft wrote…
[nit] this line is redundant and can be moved out of the if block
Done.
pkg/sql/opt/optgen/cmd/optgen/factory_gen.go, line 358 at r11 (raw file):
Previously, rytaft wrote…
Why are we only assigning placeholders to the first field?
I added a comment:
// All list item types generated by Optgen can have at most one input
// field (always the first field). Any other fields must be privates.
// And placeholders only need to be assigned for input fields.
pkg/sql/opt/optgen/cmd/optgen/metadata.go, line 220 at r11 (raw file):
Previously, rytaft wrote…
what about pointers? don't you need an extra char in that case?
None of the pointer types are in the packages into which we generate files. If that changes, it's easy enough to extend this code. I just don't like to handle cases that are impossible to test.
pkg/sql/opt/optgen/cmd/optgen/metadata.go, line 248 at r11 (raw file):
Previously, rytaft wrote…
does the type name include the package name?
Optgen syntax does not allow that, since it only allows alphanumeric type names. I added more commentary on this.
pkg/sql/opt/optgen/cmd/optgen/metadata.go, line 269 at r11 (raw file):
Previously, rytaft wrote…
non-expr?
Done.
pkg/sql/opt/optgen/cmd/optgen/metadata.go, line 293 at r11 (raw file):
Previously, rytaft wrote…
ditto
Done.
pkg/sql/opt/optgen/cmd/optgen/metadata.go, line 329 at r11 (raw file):
Previously, rytaft wrote…
Seems like you could have more helper functions like this to encapsulate some of the complexity elsewhere due to the
isPointer,isGeneratedandpassByValfields
I originally tried this, but it turns out that every case is unique, and so I was just calling them once. It felt even more confusing to have all these methods that looked almost the same, but were each different. I couldn't even figure out good names for them.
pkg/sql/opt/optgen/cmd/optgen/ops_gen.go, line 21 at r11 (raw file):
Previously, rytaft wrote…
[nit] extra line
Done.
pkg/sql/opt/optgen/cmd/optgen/rule_gen.go, line 57 at r11 (raw file):
Previously, rytaft wrote…
Maybe you mean
_innerJoin.Left.(*SelectExpr)?
Done.
pkg/sql/opt/optgen/cmd/optgen/rule_gen.go, line 184 at r11 (raw file):
Previously, rytaft wrote…
oeprator -> operator
fields -> field
Done.
pkg/sql/opt/optgen/cmd/optgen/rule_gen.go, line 187 at r11 (raw file):
Previously, rytaft wrote…
I think you need to add one more if statement to the example
Done.
pkg/sql/opt/optgen/cmd/optgen/rule_gen.go, line 580 at r11 (raw file):
Previously, rytaft wrote…
Isn't define nil here?
Good catch. It will never actually be nil, so the second branch is never taken. This means we're not checking tags as I intended (it will work as-is, but won't be as performant). Fixed.
pkg/sql/opt/optgen/cmd/optgen/rule_gen.go, line 788 at r11 (raw file):
Previously, rytaft wrote…
shouldn't this be
if argTyp.passByVal || argTyp.isPointer {according to your comment?
The comment is wrong. I updated to this:
// Non-pointer types that are passed by value (e.g. FiltersExpr and
// opt.ColumnID) may have "&" applied to them, so extract them.
pkg/sql/opt/optgen/cmd/optgen/validator.go, line 74 at r11 (raw file):
Previously, rytaft wrote…
[nit] why not switch this around and test
typ == nil || typ.listItemType == nil
Done.
pkg/sql/opt/optgen/cmd/optgen/testdata/exprs, line 176 at r11 (raw file):
Previously, rytaft wrote…
I thought the best expr in a group was relative to a particular set of physical props. So you could potentially have multiple best exprs for a given group...
It's good that you thought about that. I added this comment to Optimizer.setLowestCostTree by way of explanation:
// Note that a memo group can have multiple lowest cost expressions, each for a
// different set of physical properties. During optimization, these are retained
// in the groupState map. However, only one of those lowest cost expressions
// will be used in the final tree; the others are simply discarded. This is
// because there is never a case where a relational expression is referenced
// multiple times in the final tree, but with different physical properties
// required by each of those references.
My entire rewrite is based on that statement being true. It turns out that we never reference a relational expression more than once, since that would mean reusing ColumnIDs, which we can't do for other reasons. The only other case I could think of would be a relational expression with no columns (like ScanExpr with all cols pruned). But none of those expressions would ever provide any physical properties, so even if they were reused, an enforcer would be providing the properties (and those are always created as separate expressions).
pkg/sql/opt/optgen/cmd/optgen/testdata/exprs, line 279 at r11 (raw file):
Previously, rytaft wrote…
why are there two inputs?
The definition should not include a field. Fixed.
pkg/sql/opt/optgen/cmd/optgen/testdata/factory, line 221 at r11 (raw file):
Previously, rytaft wrote…
Is it worth checking here if
srchasHasPlaceholders=true?
assignPlaceholders does double-duty. It makes a copy of the memo at the same time it's assigning placeholders. Here's the comment on Factory.AssignPlaceholders:
// AssignPlaceholders is used just before execution of a prepared Memo. It makes
// a copy of the given memo, but with any placeholder values replaced by their
// assigned values. This can trigger additional normalization rules that can
// substantially rewrite the tree. Once all placeholders are assigned, the
// exploration phase can begin.
That said, it might be safe to reuse scalar subexpressions that have no placeholders and do not have subqueries. I'll have to think about that more. I'd do that in a follow-on PR, though, since it'd be a perf item, not a correctness issue.
pkg/sql/opt/optgen/cmd/optgen/testdata/factory, line 244 at r11 (raw file):
Previously, rytaft wrote…
why are we not assigning placeholders here?
We should never reach those cases, since we're special-casing lists in their parent operators (see the FiltersExpr case). I changed the code generator to not generate those cases so we'll error if we hit them.
pkg/sql/opt/optgen/cmd/optgen/testdata/rulenames, line 23 at r11 (raw file):
Previously, rytaft wrote…
Did you want to rewrite these tests so they aren't creating errors?
Done.
rytaft
requested changes
Oct 16, 2018
Reviewed 18 of 18 files at r18.
Reviewable status:complete! 0 of 0 LGTMs obtained
pkg/sql/opt/optgen/cmd/optgen/metadata.go, line 220 at r11 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
None of the pointer types are in the packages into which we generate files. If that changes, it's easy enough to extend this code. I just don't like to handle cases that are impossible to test.
Makes sense - maybe just add a comment so that's clear?
pkg/sql/opt/optgen/cmd/optgen/metadata.go, line 248 at r11 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Optgen syntax does not allow that, since it only allows alphanumeric type names. I added more commentary on this.
This helps. Just a nit -- I think it would be better to change => to say "gets compiled into" so it doesn't look so much like an optgen rule.
pkg/sql/opt/optgen/cmd/optgen/testdata/exprs, line 176 at r11 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
It's good that you thought about that. I added this comment to
Optimizer.setLowestCostTreeby way of explanation:// Note that a memo group can have multiple lowest cost expressions, each for a // different set of physical properties. During optimization, these are retained // in the groupState map. However, only one of those lowest cost expressions // will be used in the final tree; the others are simply discarded. This is // because there is never a case where a relational expression is referenced // multiple times in the final tree, but with different physical properties // required by each of those references.My entire rewrite is based on that statement being true. It turns out that we never reference a relational expression more than once, since that would mean reusing
ColumnIDs, which we can't do for other reasons. The only other case I could think of would be a relational expression with no columns (likeScanExprwith all cols pruned). But none of those expressions would ever provide any physical properties, so even if they were reused, an enforcer would be providing the properties (and those are always created as separate expressions).
Got it - thanks for the explanation!
pkg/sql/opt/optgen/cmd/optgen/testdata/factory, line 221 at r11 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
assignPlaceholdersdoes double-duty. It makes a copy of the memo at the same time it's assigning placeholders. Here's the comment onFactory.AssignPlaceholders:// AssignPlaceholders is used just before execution of a prepared Memo. It makes // a copy of the given memo, but with any placeholder values replaced by their // assigned values. This can trigger additional normalization rules that can // substantially rewrite the tree. Once all placeholders are assigned, the // exploration phase can begin.That said, it might be safe to reuse scalar subexpressions that have no placeholders and do not have subqueries. I'll have to think about that more. I'd do that in a follow-on PR, though, since it'd be a perf item, not a correctness issue.
rytaft
requested changes
Oct 16, 2018
Still reviewing, but figured I'd post the comments I have so far.
Reviewed 5 of 42 files at r20.
Reviewable status:complete! 0 of 0 LGTMs obtained
pkg/sql/opt/norm/custom_funcs.go, line 390 at r20 (raw file):
} // IsContradiction returns true if any of the given filter item contains a
any of the given -> the given
pkg/sql/opt/norm/custom_funcs.go, line 485 at r20 (raw file):
filters memo.FiltersExpr, cols opt.ColSet, ) memo.FiltersExpr { newFilters := make(memo.FiltersExpr, 0, len(filters)-1)
Is it true that we only call this function if there is at least one unbound condition? If so, add a comment to explain why the capacity is len(filters)-1.
pkg/sql/opt/norm/custom_funcs.go, line 501 at r20 (raw file):
filters memo.FiltersExpr, cols opt.ColSet, ) memo.FiltersExpr { newFilters := make(memo.FiltersExpr, 0, len(filters)-1)
Same here - the function should only be called if there is at least one bound condition.
pkg/sql/opt/norm/custom_funcs.go, line 543 at r20 (raw file):
for i := range inner { item := &inner[i] if passthrough.Contains(int(item.Col)) {
It would be good to add a comment that the outer synthesized columns do not contain references to item since CanMergeProjectsions should have returned true.
pkg/sql/opt/norm/custom_funcs.go, line 596 at r20 (raw file):
} // MakeEmptyColSet returns a column set with columns in it.
with columns -> with no columns
pkg/sql/opt/norm/custom_funcs.go, line 618 at r20 (raw file):
func (c *CustomFuncs) SimplifyFilters(filters memo.FiltersExpr) memo.FiltersExpr { // This code assumes that the NormalizeNestedAnds rule has already run and // ensured a left deep And tree. If not (maybe because it's a testing
Out of curiosity, why did you decide to change And and Or back to being binary operators?
pkg/sql/opt/norm/custom_funcs.go, line 621 at r20 (raw file):
// scenario), then this rule may rematch, but it should still make forward // progress). cnt := 0
It would help to explicitly say that this first nested for-loop is just used to count the number of conjuncts in a flattened list. I had to stare at it for a bit to figure that out.
pkg/sql/opt/norm/custom_funcs.go, line 646 at r20 (raw file):
// find nested And operators. It adds any conjuncts (ignoring True operators) to // the given FiltersExpr and returns true. If it finds a False or Null operator, // it returns propagates a false return value all the up the call stack, and
returns propagates -> propagates
pkg/sql/opt/norm/custom_funcs.go, line 810 at r20 (raw file):
// A OR (A AND B) => A // (A AND B) OR (A AND C) => A // (A AND B AND C) OR (A AND D OR E) => A
Is this correct? Not sure this works unless it's (A AND D AND E).
pkg/sql/opt/norm/custom_funcs.go, line 826 at r20 (raw file):
left = and.Left } else { if !c.isConjunct(left, right) {
[nit] to make these two cases parallel, why don't you check if c.IsConjunct(left, right) and swap the two return statements
pkg/sql/opt/norm/custom_funcs.go, line 863 at r20 (raw file):
// // These transformations are useful for finding a conjunct that can be pushed // down in the query tree. For example, if the redundant subclause A is
subclause -> conjunct
pkg/sql/opt/norm/custom_funcs.go, line 1045 at r20 (raw file):
// EqualsOpName returns true if the two operators are the same. func (c *CustomFuncs) EqualsOpName(left, right opt.Operator) bool {
[nit] This name is a bit confusing since it sounds like it should return the name of the EqualsOp operator. Why not just call it SameAs or something like like that?
pkg/sql/opt/norm/custom_funcs.go, line 1079 at r20 (raw file):
// ---------------------------------------------------------------------- // EqualsNumber returns true if the given private numeric value (decimal, float,
[nit] can remove the word "private"
pkg/sql/opt/norm/custom_funcs.go, line 1225 at r20 (raw file):
// UnifyComparison attempts to convert a constant expression to the type of the // variables expression, if that conversion can round-trip and is monotonic.
variables -> variable
pkg/sql/opt/norm/decorrelate.go, line 130 at r20 (raw file):
for i := range filters { item := &filters[i] if item.ScalarProps(c.mem).Rule.HasHoistableSubquery {
Do you need to check if this property is available?
pkg/sql/opt/norm/decorrelate.go, line 791 at r20 (raw file):
return r.f.Reconstruct(scalar, func(nd opt.Expr) opt.Expr { // Recursively hoist subqueries in each child that contains them. return r.hoistAll(nd.(opt.ScalarExpr))
No need to check HasHoistableSubquery anymore?
pkg/sql/opt/norm/factory_test.go, line 55 at r20 (raw file):
sel = f.ConstructSelect(vals, filters) if len(sel.(*memo.SelectExpr).Filters) != 2 { t.Fatalf("filters result should have filtered True operator")
shouldn't there only be one filter?
memo.FiltersExpr{{Condition: eq}, {Condition: &memo.TrueFilter}} reduces to memo.FiltersExpr{{Condition: eq}}
andy-kimball
added some commits
Oct 10, 2018
andy-kimball
reviewed
Oct 17, 2018
Reviewable status:
complete! 0 of 0 LGTMs obtained
pkg/sql/opt/norm/custom_funcs.go, line 390 at r20 (raw file):
Previously, rytaft wrote…
any of the given -> the given
Done.
pkg/sql/opt/norm/custom_funcs.go, line 485 at r20 (raw file):
Previously, rytaft wrote…
Is it true that we only call this function if there is at least one unbound condition? If so, add a comment to explain why the capacity is
len(filters)-1.
Actually, there may be zero unbound conditions. I'll get rid of the -1.
pkg/sql/opt/norm/custom_funcs.go, line 501 at r20 (raw file):
Previously, rytaft wrote…
Same here - the function should only be called if there is at least one bound condition.
Same as above.
pkg/sql/opt/norm/custom_funcs.go, line 543 at r20 (raw file):
Previously, rytaft wrote…
It would be good to add a comment that the outer synthesized columns do not contain references to item since
CanMergeProjectsionsshould have returned true.
I added an extra comment to the method header to that effect.
pkg/sql/opt/norm/custom_funcs.go, line 596 at r20 (raw file):
Previously, rytaft wrote…
with columns -> with no columns
Done.
pkg/sql/opt/norm/custom_funcs.go, line 618 at r20 (raw file):
Previously, rytaft wrote…
Out of curiosity, why did you decide to change
AndandOrback to being binary operators?
- It saves us slice allocations in the common case of
WHEREconjuncts, which just get rolled up into aFilterslist anyway. - It mirrors the AST expression. I hope to unify the two representations in the future.
- Most of the use cases for having a list are now handled by the
Filtersoperator. There were only a coupleAnduse cases left, and I changed those accordingly. The redundant subclause rules aren't quite as elegant now, but I think they're still pretty good.
pkg/sql/opt/norm/custom_funcs.go, line 621 at r20 (raw file):
Previously, rytaft wrote…
It would help to explicitly say that this first nested for-loop is just used to count the number of conjuncts in a flattened list. I had to stare at it for a bit to figure that out.
Done.
pkg/sql/opt/norm/custom_funcs.go, line 646 at r20 (raw file):
Previously, rytaft wrote…
returns propagates -> propagates
Done.
pkg/sql/opt/norm/custom_funcs.go, line 810 at r20 (raw file):
Previously, rytaft wrote…
Is this correct? Not sure this works unless it's
(A AND D AND E).
I added parentheses to fix precedence.
pkg/sql/opt/norm/custom_funcs.go, line 826 at r20 (raw file):
Previously, rytaft wrote…
[nit] to make these two cases parallel, why don't you check
if c.IsConjunct(left, right)and swap the two return statements
Done.
pkg/sql/opt/norm/custom_funcs.go, line 863 at r20 (raw file):
Previously, rytaft wrote…
subclause -> conjunct
Done.
pkg/sql/opt/norm/custom_funcs.go, line 1045 at r20 (raw file):
Previously, rytaft wrote…
[nit] This name is a bit confusing since it sounds like it should return the name of the
EqualsOpoperator. Why not just call itSameAsor something like like that?
I changed to OpsAreSame.
pkg/sql/opt/norm/custom_funcs.go, line 1079 at r20 (raw file):
Previously, rytaft wrote…
[nit] can remove the word "private"
Done.
pkg/sql/opt/norm/custom_funcs.go, line 1225 at r20 (raw file):
Previously, rytaft wrote…
variables -> variable
Done.
pkg/sql/opt/norm/decorrelate.go, line 130 at r20 (raw file):
Previously, rytaft wrote…
Do you need to check if this property is available?
FiltersItem, ProjectionsItem, and AggregationsItem expressions always have scalar props defined. They're lazily populated by the call to ScalarProps(c.mem) (via walking the subtree and then caching the derived props).
pkg/sql/opt/norm/decorrelate.go, line 791 at r20 (raw file):
Previously, rytaft wrote…
No need to check
HasHoistableSubqueryanymore?
We can't do that any longer, b/c every scalar operator no longer has scalar props on it.
pkg/sql/opt/norm/factory_test.go, line 55 at r20 (raw file):
Previously, rytaft wrote…
shouldn't there only be one filter?
memo.FiltersExpr{{Condition: eq}, {Condition: &memo.TrueFilter}}reduces tomemo.FiltersExpr{{Condition: eq}}
Whoops, that's a bug in the test. Fixed.
pkg/sql/opt/optgen/cmd/optgen/metadata.go, line 220 at r11 (raw file):
Previously, rytaft wrote…
Makes sense - maybe just add a comment so that's clear?
Done.
pkg/sql/opt/optgen/cmd/optgen/metadata.go, line 248 at r11 (raw file):
Previously, rytaft wrote…
This helps. Just a nit -- I think it would be better to change
=>to say "gets compiled into" so it doesn't look so much like an optgen rule.
Done.
rytaft
requested changes
Oct 17, 2018
Almost there... still a bit more to review
Reviewed 1 of 112 files at r23, 30 of 42 files at r25.
Reviewable status:complete! 0 of 0 LGTMs obtained
pkg/sql/opt/norm/custom_funcs.go, line 618 at r20 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
- It saves us slice allocations in the common case of
WHEREconjuncts, which just get rolled up into aFilterslist anyway.- It mirrors the AST expression. I hope to unify the two representations in the future.
- Most of the use cases for having a list are now handled by the
Filtersoperator. There were only a coupleAnduse cases left, and I changed those accordingly. The redundant subclause rules aren't quite as elegant now, but I think they're still pretty good.
Makes sense - thanks!
pkg/sql/opt/norm/inline.go, line 22 at r25 (raw file):
) // HasDuplicateRefs returns true if the target projections expression reference
expression -> expressions
pkg/sql/opt/norm/inline.go, line 23 at r25 (raw file):
// HasDuplicateRefs returns true if the target projections expression reference // any outer column more than one time, or if it has a subquery. For example:
why did you remove the word "correlated"? I think uncorrelated subqueries are ok...
pkg/sql/opt/norm/inline.go, line 46 at r25 (raw file):
// already contains a reference to that column, then there is a duplicate. // findDupRefs returns true if the subtree contains at least one duplicate. var findDupRefs func(nd opt.Expr) bool
[nit] what is nd? This variable name is kind of cryptic
pkg/sql/opt/norm/inline.go, line 127 at r25 (raw file):
} // InlineProjectProject searches the input expression (with must be a Project
with -> which
pkg/sql/opt/norm/inline.go, line 130 at r25 (raw file):
// operator) for any variable references to columns from the given projections // expression. Each variable is replaced by the corresponding inlined projection // expression.
Isn't this backwards? It seems like it's searching projections for any variable references from input's projections.
pkg/sql/opt/norm/inline.go, line 162 at r25 (raw file):
// Recursively walk the tree looking for references to projection expressions // that need to be replaced. func (c *CustomFuncs) inlineProjections(nd opt.Expr, projections memo.ProjectionsExpr) opt.Expr {
[nit] another case of nd. I guess this is like "node"? Could use a better name. Maybe just e, which I've seen you use elsewhere.
pkg/sql/opt/norm/join.go, line 130 at r25 (raw file):
filters memo.FiltersExpr, src *memo.FiltersItem, dst memo.RelExpr, ) bool { // Fast path if condition is already bound by input.
Why did you change this comment from src/dst to condition/input?
pkg/sql/opt/norm/join.go, line 301 at r25 (raw file):
} // Normalize leftCol to come from leftCols.
This comment doesn't really make sense anymore
pkg/sql/opt/norm/join.go, line 560 at r25 (raw file):
// - a and b are not "bare" variables; // - a and b contain no correlated subqueries; // - neither a or b are constants.
Update comment -- a and b have been replaced by scalar
pkg/sql/opt/norm/project_builder.go, line 29 at r25 (raw file):
// e1 := pb.add(some expression) // e2 := pb.add(some other expression) // augmentedInput := pb.buildProject(input)
Missing passthrough columns
pkg/sql/opt/norm/rules/bool.opt, line 42 at r25 (raw file):
$right # SimplifyTrueAnd simplifies the And operator by discarding a True condition on
SimplifyTrueAnd -> SimplifyAndTrue
pkg/sql/opt/norm/rules/bool.opt, line 148 at r25 (raw file):
(And (Not $left) (Not $right)) # ExtractRedundantConjunct matches an OR expression in which the same conjunct
Nice rewriting of this rule
pkg/sql/opt/norm/rules/bool.opt, line 157 at r25 (raw file):
# # This transformation is useful for finding a conjunct that can be pushed down # in the query tree. For example, if the redundant subclause A is fully bound by
subclause -> conjunct
pkg/sql/opt/norm/rules/bool.opt, line 160 at r25 (raw file):
# one side of a join, it can be pushed through the join, even if B AND C cannot. [ExtractRedundantConjunct, Normalize] (Or $left:^(Or) $right:^(Or) & (Succeeded $subclause:(FindRedundantConjunct $left $right)))
subclause -> conjunct
pkg/sql/opt/norm/rules/decorrelate.opt, line 64 at r23 (raw file):
# PushFilterIntoJoinRight rule. A Select operator with outer columns is always # pulled up as far as possible before being pushed back down. #
How did you eliminate the need for cycle detection?
pkg/sql/opt/norm/rules/decorrelate.opt, line 693 at r25 (raw file):
# NormalizeJoinAnyFilter is similar to NormalizeSelectAnyFilter, except that it # operates on Any expressions within Join filters rather than Select filters.
Why did you need to split these into two different rules?
pkg/sql/opt/norm/rules/join.opt, line 124 at r25 (raw file):
SemiJoin | SemiJoinApply | AntiJoin | AntiJoinApply $left:* $right:* & ^(HasOuterCols $left)
Did you mean ^(HasOuterCols $right)?
rytaft
approved these changes
Oct 17, 2018
Reviewed 3 of 92 files at r2, 2 of 197 files at r9, 3 of 195 files at r16, 12 of 42 files at r25.
Reviewable status:complete! 1 of 0 LGTMs obtained
pkg/sql/opt/norm/testdata/rules/bool, line 224 at r25 (raw file):
opt SELECT k=1 OR true AND f=3.5 AS r FROM a
I don't think this is testing the right rules -- AND has precedence here
andy-kimball
added some commits
Oct 10, 2018
andy-kimball
reviewed
Oct 18, 2018
Reviewable status:
complete! 1 of 0 LGTMs obtained
pkg/sql/opt/norm/inline.go, line 22 at r25 (raw file):
Previously, rytaft wrote…
expression -> expressions
Done.
pkg/sql/opt/norm/inline.go, line 23 at r25 (raw file):
Previously, rytaft wrote…
why did you remove the word "correlated"? I think uncorrelated subqueries are ok...
For a little while during the rewrite it was all subqueries. I switched it back, but didn't switch the comment back.
pkg/sql/opt/norm/inline.go, line 46 at r25 (raw file):
Previously, rytaft wrote…
[nit] what is
nd? This variable name is kind of cryptic
It was short for Node. I didn't catch all the places when I renamed Node to Expr. Fixed.
pkg/sql/opt/norm/inline.go, line 127 at r25 (raw file):
Previously, rytaft wrote…
with -> which
Done.
pkg/sql/opt/norm/inline.go, line 130 at r25 (raw file):
Previously, rytaft wrote…
Isn't this backwards? It seems like it's searching
projectionsfor any variable references frominput's projections.
Done.
pkg/sql/opt/norm/inline.go, line 162 at r25 (raw file):
Previously, rytaft wrote…
[nit] another case of
nd. I guess this is like "node"? Could use a better name. Maybe juste, which I've seen you use elsewhere.
Done.
pkg/sql/opt/norm/join.go, line 130 at r25 (raw file):
Previously, rytaft wrote…
Why did you change this comment from
src/dsttocondition/input?
Was from an earlier iteration where that made sense, missed changing comment back.
pkg/sql/opt/norm/join.go, line 301 at r25 (raw file):
Previously, rytaft wrote…
This comment doesn't really make sense anymore
Done.
pkg/sql/opt/norm/join.go, line 560 at r25 (raw file):
Previously, rytaft wrote…
Update comment --
aandbhave been replaced byscalar
I switched it back to use a and b - at one point in my iterations it made sense to use scalar instead.
pkg/sql/opt/norm/project_builder.go, line 29 at r25 (raw file):
Previously, rytaft wrote…
Missing passthrough columns
Done.
pkg/sql/opt/norm/rules/bool.opt, line 42 at r25 (raw file):
Previously, rytaft wrote…
SimplifyTrueAnd -> SimplifyAndTrue
Done.
pkg/sql/opt/norm/rules/bool.opt, line 157 at r25 (raw file):
Previously, rytaft wrote…
subclause -> conjunct
Done.
pkg/sql/opt/norm/rules/bool.opt, line 160 at r25 (raw file):
Previously, rytaft wrote…
subclause -> conjunct
Done.
pkg/sql/opt/norm/rules/decorrelate.opt, line 64 at r23 (raw file):
Previously, rytaft wrote…
How did you eliminate the need for cycle detection?
I added HasOuterCols checks in the PushFilterIntoJoin rules. So we no longer push down in cases where it might trigger a cycle. This did result in at least one case where we can't decorrelate where we could before. I think it's still the right way to go, since these are edge cases that don't justify the complexity of cycle detection.
pkg/sql/opt/norm/rules/decorrelate.opt, line 693 at r25 (raw file):
Previously, rytaft wrote…
Why did you need to split these into two different rules?
Because we can no longer match on List operators like Filters, so we have to match on their parent operator. The new design does not intern or match list operators for various reasons (I listed them in another comment made by Justin). It's a downside, but I think justified by the benefits we get elsewhere (more perf, simpler construction, etc).
pkg/sql/opt/norm/rules/join.opt, line 124 at r25 (raw file):
Previously, rytaft wrote…
Did you mean
^(HasOuterCols $right)?
Good catch. I fixed and added regression cases.
pkg/sql/opt/norm/testdata/rules/bool, line 224 at r25 (raw file):
Previously, rytaft wrote…
I don't think this is testing the right rules -- AND has precedence here
I fixed this and added expectations.
andy-kimball commentedOct 10, 2018
•
edited by knz
Change how the Memo stores groups and expressions. Currently, both
are stored in Go slices which are resized as new groups and expressions
are added to the memo. Expressions store GroupIDs and PrivateIDs, which
are indexes into slices holding memo groups and expression-specific data.
While this design has worked OK, it has several downsides:
There is very little type safety, since all expressions are stored
in identical 16 byte structs so they can be packed into slices.
This can also make debugging tedious, as the debugger does not
know how to decode the expressions.
Expressions can have only 3 fields, since we pack them into 16
bytes. For many operators this is fine, but it requires awkward
workarounds for operators than need >3 fields.
Many nodes have "private" data fields that require the allocation
of a separate object to store. This cancels what would otherwise
be an advantage of the current design, since it ends up allocating
about the same number of objects as the new design for this and
other reasons. It also requires tedious calls to InternPrivate
and LookupPrivate methods in order to map to and from slice
indexes.
When the optimizer decides on a lowest cost expression tree, it
must store that information "off to the side" in BestExpr
structures so that ExprView knows which of the trees in the forest
to traverse. This triggers additional allocations.
Traversal of the tree requires an auxiliary ExprView data structure
that holds onto context necessary for the traversal (i.e. the Memo).
This makes the current design harder to understand and use.
All expressions allocate and store logical properties. While this is
necessary for relational expressions, it is not required for most
scalar operators.
The memo expression storage format is not compatible with nodes used
in other parts of Cockroach, like AST and planNodes. This makes mixing
and matching nodes difficult, and requires "wrapper" nodes to map
between different representations.
The new memo expression representation solves these problems using a more
natural Go-friendly design. Each expression is a differently-typed Go
struct that implements the Expr interface:
This is similar to how AST nodes and planNodes work. Optgen generates a
struct definition for each operator, as well as implementations for the
Expr interface methods. Private values are stored inline in the structs,
replacing the need for an additional allocation. Expressions can be
constructed using a very natural factory interface with no explicit
interning required.
Where it gets more interesting is for relational expressions, which are
part of memo groups that can contain multiple logically equivalent
expressions. The new representation links together expressions in the
same group using "next" pointers. In addition, the first expression in
the memo is allocated with extra inline storage to store relational and
physical properties. This avoids separate allocation for the properties
like the current representation requires. For a simple expression tree
with 1 expression per group, the tree requires just 1 object per memo
group.
Here are numbers for the old vs. new representations: