sql: try harder to trim the filter expression after processing scanNodes. #13373

Merged
merged 1 commit into from
Feb 9, 2017

Conversation

knz
Contributor

@knz knz commented Feb 2, 2017

Fixes #3473.
Fixes #13351.

cc @andreimatei.


This change is Reviewable

@knz
Contributor Author

knz commented Feb 3, 2017

Take your time, it's not too trivial I realize.

(But so far I know only Peter and yourself understand this code.)

@RaduBerinde
Member

Great stuff!


Review status: 0 of 6 files reviewed at latest revision, 8 unresolved discussions, all commit checks successful.


pkg/sql/index_selection.go, line 218 at r1 (raw file):

		if s.filter == parser.DBoolTrue {
			s.filter = nil
		} else if s.filter == parser.DBoolFalse {

[nit] easier to read the flow if we do if it's false { return } and we don't need the else


pkg/sql/index_selection.go, line 1426 at r1 (raw file):

// expandConstraint transforms a potentially complex constraint
// expression into one or more simple constraint suitable for the

constraints


pkg/sql/index_selection.go, line 1464 at r1 (raw file):

		case parser.LE:
			fallthrough
		case parser.GE:

this can just be case parser.LT, parser.LE,..:
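
A minimal Go sketch of this suggestion, using a hypothetical `cmpOp` type as a stand-in for the real `parser.ComparisonOperator` (the names here are illustrative assumptions, not the PR's code). A single `case` may list several values, which replaces a chain of `fallthrough` statements:

```go
package main

import "fmt"

// cmpOp is a hypothetical stand-in for parser.ComparisonOperator.
type cmpOp int

const (
	EQ cmpOp = iota
	LT
	LE
	GT
	GE
	NE
)

// isInequality reports whether op is one of <, <=, >, >=.
// One case with a value list replaces four cases chained by fallthrough.
func isInequality(op cmpOp) bool {
	switch op {
	case LT, LE, GT, GE:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(isInequality(LE), isInequality(EQ))
}
```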


pkg/sql/index_selection.go, line 1465 at r1 (raw file):

			fallthrough
		case parser.GE:
			// (a,b,c) < (x,y,z)  -> [a<x]

This is not true, it implies a <= x, e.g. (1, 2, 3) < (1, 2, 4). We need to fix the GT and LT cases. Also fix the function comment, plus a test would be good given that it wasn't caught.
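
To make the counterexample concrete, here is a small Go sketch of lexicographic tuple comparison over plain integers (a simplification; the real code compares datums). The helper `tupleLess` is hypothetical. It shows that (1, 2, 3) < (1, 2, 4) holds even though the first columns are equal, so only a <= x can be derived:

```go
package main

import "fmt"

// tupleLess reports whether a < b under lexicographic (row-wise) ordering.
// Both slices are assumed to have the same length.
func tupleLess(a, b []int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false // the tuples are equal
}

func main() {
	a, x := []int{1, 2, 3}, []int{1, 2, 4}
	fmt.Println(tupleLess(a, x)) // the tuple comparison holds
	fmt.Println(a[0] < x[0])     // the strict bound on the first column does not
	fmt.Println(a[0] <= x[0])    // only the non-strict bound is implied
}
```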


pkg/sql/index_selection.go, line 1522 at r1 (raw file):

// assuming that the expression c on the left (the constraint) is
// true.
func applyConstraint(c *parser.ComparisonExpr, t *parser.ComparisonExpr) parser.Expr {

s/evaluate/simplify?


pkg/sql/index_selection.go, line 1571 at r1 (raw file):

}

// applyConstraintFlat is the common branch of applyConstraint().

[nit] would still help to explain what the arguments are. Also it would help if the datum and t were together and cOp and cdatum were together.


pkg/sql/index_selection.go, line 1576 at r1 (raw file):

) parser.Expr {
	switch cOp {
	case parser.EQ:

This code is very tedious, a mistake could easily sneak in (and I doubt that we have tests that cover all the combinations). I have an idea to avoid having 5*5 cases for the comparison ops (NE excluded). Feel free to ignore it, though, if you don't agree it helps.

We could represent a constraint as a closed interval on the integer axis. Say we pick -100 to represent -infinity, +100 as +infinity. 0 represents the smaller of cdatum and datum (both if they are equal), 10 represents the larger (unused if they are equal).

First set tx = 0 if datum <= cdatum, or 10 otherwise.

Then e.g. if t.Operator=GE, t corresponds to [tx, +100]. If it is LE, [-100, tx]. If it is GT, [tx+1, +100].

The other is similar, we just set cx = 0 if datum >= cdatum or 10 otherwise.

Then it's simply a matter of detecting if the two intervals are disjoint (return False) or if c is inside t (return True), which are very simple conditions.


pkg/sql/index_selection.go, line 1774 at r1 (raw file):

}

func (v *applyConstraintsVisitor) VisitPost(expr parser.Expr) parser.Expr {

VisitPost is usually next to VisitPre


Comments from Reviewable

@knz
Contributor Author

knz commented Feb 4, 2017

Review status: 0 of 6 files reviewed at latest revision, 8 unresolved discussions, all commit checks successful.


pkg/sql/index_selection.go, line 218 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] easier to read the flow if we do if it's false { return } and we don't need the else

Done.


pkg/sql/index_selection.go, line 1426 at r1 (raw file):

Previously, RaduBerinde wrote…

constraints

Done.


pkg/sql/index_selection.go, line 1464 at r1 (raw file):

Previously, RaduBerinde wrote…

this can just be case parser.LT, parser.LE,..:

Done.


pkg/sql/index_selection.go, line 1465 at r1 (raw file):

Previously, RaduBerinde wrote…

This is not true, it implies a <= x, e.g. (1, 2, 3) < (1, 2, 4). We need to fix the GT and LT cases. Also fix the function comment, plus a test would be good given that it wasn't caught.

Done.


pkg/sql/index_selection.go, line 1522 at r1 (raw file):

Previously, RaduBerinde wrote…

s/evaluate/simplify?

Done.


pkg/sql/index_selection.go, line 1571 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] would still help to explain what the arguments are. Also it would help if the datum and t were together and cOp and cdatum were together.

Done.


pkg/sql/index_selection.go, line 1576 at r1 (raw file):

Previously, RaduBerinde wrote…

This code is very tedious, a mistake could easily sneak in (and I doubt that we have tests that cover all the combinations). I have an idea to avoid having 5*5 cases for the comparison ops (NE excluded). Feel free to ignore it, though, if you don't agree it helps.

We could represent a constraint as a closed interval on the integer axis. Say we pick -100 to represent -infinity, +100 as +infinity. 0 represents the smaller of cdatum and datum (both if they are equal), 10 represents the larger (unused if they are equal).

First set tx = 0 if datum <= cdatum, or 10 otherwise.

Then e.g. if t.Operator=GE, t corresponds to [tx, +100]. If it is LE, [-100, tx]. If it is GT, [tx+1, +100].

The other is similar, we just set cx = 0 if datum >= cdatum or 10 otherwise.

Then it's simply a matter of detecting if the two intervals are disjoint (return False) or if c is inside t (return True), which are very simple conditions.

"Simply a matter of ..." hahaha :trollface:

Nevertheless the idea was good, so I implemented it.


pkg/sql/index_selection.go, line 1774 at r1 (raw file):

Previously, RaduBerinde wrote…

VisitPost is usually next to VisitPre

Done.


Comments from Reviewable

@knz knz force-pushed the propagate-more-index-filters branch 2 times, most recently from 9a3baa0 to 6bd530f Compare February 4, 2017 14:13
@RaduBerinde
Member

Review status: 0 of 7 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


pkg/sql/index_selection.go, line 1576 at r1 (raw file):

Previously, knz (kena) wrote…

"Simply a matter of ..." hahaha :trollface:

Nevertheless the idea was good, so I implemented it.

I don't think my proposal quite made it across. An interval would (in the code, literally) be represented by two integers instead of datums and inclusive/exclusive flags. The actual datums are not important, just the relationship between them (cmp); so we represent them by two fixed values (0 and 10). Exclusivity is implemented by just adding or subtracting 1 from the integer (which is why I chose 0 and 10 and not 0 and 1). Possible values for integers are from the set: -100, -1, 0, +1, 9, 10, 11, 100.

I'll take an example. Say datum < cdatum and both constraints are LT:

tx=0 (as defined above); interval t would be [-100, -1]
cx=10; interval c would be [-100, 9]

Another example: datum > cdatum, t is LT, c is GE
tx=10; t is [-100, 9]
cx=0; c is [0,+100]

Once we have these intervals, yes, it's simply a matter of comparing the integers:

  • disjoint: c.start > t.end || t.start > c.end
  • included: t.start <= c.start && c.end <= t.end

The simplification is coming from the fact that we reduce the various combinations of cases (nil vs non-nil, inclusive vs exclusive) to a single case: inclusive intervals with comparable values at both ends.
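
The two worked examples can be checked with a few lines of Go. This is only a sketch: the `interval` type and the `disjoint` and `included` helpers are hypothetical names, and the containment test used is `t.start <= c.start && c.end <= t.end` (c lies entirely inside t). In both examples the intervals overlap without c being contained in t, so t cannot be simplified; a third case is added for contrast.

```go
package main

import "fmt"

// interval is a closed interval [start, end] on the integer axis.
type interval struct{ start, end int }

// disjoint reports whether c and t share no point.
func disjoint(c, t interval) bool { return c.start > t.end || t.start > c.end }

// included reports whether c lies entirely inside t.
func included(c, t interval) bool { return t.start <= c.start && c.end <= t.end }

func main() {
	// datum < cdatum, both LT: t = [-100, -1], c = [-100, 9].
	t, c := interval{-100, -1}, interval{-100, 9}
	fmt.Println(disjoint(c, t), included(c, t)) // false false

	// datum > cdatum, t is LT, c is GE: t = [-100, 9], c = [0, 100].
	t, c = interval{-100, 9}, interval{0, 100}
	fmt.Println(disjoint(c, t), included(c, t)) // false false

	// For contrast: datum > cdatum, both LT gives t = [-100, 9] and
	// c = [-100, -1], so c is inside t and t simplifies to true.
	t, c = interval{-100, 9}, interval{-100, -1}
	fmt.Println(disjoint(c, t), included(c, t)) // false true
}
```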


Comments from Reviewable

@RaduBerinde
Member

Review status: 0 of 7 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


pkg/sql/index_selection.go, line 1576 at r1 (raw file):

Previously, RaduBerinde wrote…

I don't think my proposal quite made it across. An interval would (in the code, literally) be represented by two integers instead of datums and inclusive/exclusive flags. The actual datums are not important, just the relationship between them (cmp); so we represent them by two fixed values (0 and 10). Exclusivity is implemented by just adding or subtracting 1 from the integer (which is why I chose 0 and 10 and not 0 and 1). Possible values for integers are from the set: -100, -1, 0, +1, 9, 10, 11, 100.

I'll take an example. Say datum < cdatum and both constraints are LT:

tx=0 (as defined above); interval t would be [-100, -1]
cx=10; interval c would be [-100, 9]

Another example: datum > cdatum, t is LT, c is GE
tx=10; t is [-100, 9]
cx=0; c is [0,+100]

Once we have these intervals, yes, it's simply a matter of comparing the integers:

  • disjoint: c.start > t.end || t.start > c.end
  • included: t.start <= c.start && c.end <= t.end

The simplification is coming from the fact that we reduce the various combinations of cases (nil vs non-nil, inclusive vs exclusive) to a single case: inclusive intervals with comparable values at both ends.

Something along the lines of:

makeInterval := func(op Operator, largerDatum bool) (bool, int, int) {
  x := 0
  if largerDatum {
    x = 10
  }

  switch op {
  case parser.EQ:
    return true, x, x
  case parser.LE:
    return true, -100, x
  case parser.LT:
    return true, -100, x-1
  case parser.GE:
    return true, x, 100
  case parser.GT:
    return true, x+1, 100
  case parser.NE:
    return false, x, x
  default:
    return false, 0, 0
  }
}

cmp := datum.Compare(cDatum)
tOk, tStart, tEnd := makeInterval(t.Operator, cmp > 0)
cOk, cStart, cEnd := makeInterval(c.Operator, cmp < 0)
if tOk && cOk {
  if cStart > tEnd || tStart > cEnd {
    return MakeDBool(false)
  }
  if tStart <= cStart && cEnd <= tEnd {
    return MakeDBool(true)
  }
  return t
}
// BOOM: just handled 5*5 combinations of operators!
// If one is NE and the other is an interval, we can still use the intervals
// (hence the NE case above).
if tOk && c.Operator == NE {
  if cStart < tStart || cStart > tEnd {
    return MakeDBool(true)
  }
  return t
}
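
Since the concern was that no test covers all operator combinations, a brute-force check is one way to gain confidence in the interval mapping. The sketch below is self-contained and uses plain integers and a hypothetical `op` type instead of datums and `parser` operators; `holds`, `simplify`, and `mismatches` are illustrative names, not part of the patch. It covers only the 5*5 interval case (NE is left out) and compares the interval answer with direct evaluation over a small domain.

```go
package main

import "fmt"

type op int

const (
	EQ op = iota
	LT
	LE
	GT
	GE
)

// holds evaluates "x o d" directly.
func holds(x int, o op, d int) bool {
	lt, eq := x < d, x == d
	return map[op]bool{EQ: eq, LT: lt, LE: lt || eq, GT: !lt && !eq, GE: !lt}[o]
}

// makeInterval maps an operator to a closed interval on the integer axis,
// with 0 standing for the smaller datum and 10 for the larger one.
func makeInterval(o op, largerDatum bool) (start, end int) {
	x := 0
	if largerDatum {
		x = 10
	}
	switch o {
	case EQ:
		return x, x
	case LE:
		return -100, x
	case LT:
		return -100, x - 1
	case GE:
		return x, 100
	default: // GT
		return x + 1, 100
	}
}

// simplify returns +1 if [x cOp cdatum] implies [x tOp datum],
// -1 if it contradicts it, and 0 if neither can be concluded.
func simplify(tOp op, datum int, cOp op, cdatum int) int {
	tStart, tEnd := makeInterval(tOp, datum > cdatum)
	cStart, cEnd := makeInterval(cOp, datum < cdatum)
	switch {
	case cStart > tEnd || tStart > cEnd:
		return -1
	case tStart <= cStart && cEnd <= tEnd:
		return 1
	}
	return 0
}

// mismatches counts operator/datum combinations where simplify disagrees
// with brute-force evaluation over the integers 0..10.
func mismatches() int {
	const cdatum = 5
	ops := []op{EQ, LT, LE, GT, GE}
	bad := 0
	for _, tOp := range ops {
		for _, cOp := range ops {
			// Datums are spaced by 2 so an integer lies strictly between them.
			for _, datum := range []int{3, 5, 7} {
				implied, contradicted := true, true
				for x := 0; x <= 10; x++ {
					if !holds(x, cOp, cdatum) {
						continue
					}
					if holds(x, tOp, datum) {
						contradicted = false
					} else {
						implied = false
					}
				}
				want := 0
				if implied {
					want = 1
				} else if contradicted {
					want = -1
				}
				if simplify(tOp, datum, cOp, cdatum) != want {
					bad++
				}
			}
		}
	}
	return bad
}

func main() {
	fmt.Println("mismatches:", mismatches())
}
```

The datums are deliberately two apart: with adjacent integers nothing lies strictly between them, and the brute-force answer would then differ from the interval answer, which assumes values can exist between the two datums.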

Comments from Reviewable

@knz
Contributor Author

knz commented Feb 4, 2017

Review status: 0 of 7 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


pkg/sql/index_selection.go, line 1576 at r1 (raw file):

Previously, RaduBerinde wrote…

Something along the lines of:

makeInterval := func(op Operator, largerDatum bool) (bool, int, int) {
  x := 0
  if largerDatum {
    x = 10
  }

  switch op {
  case parser.EQ:
    return true, x, x
  case parser.LE:
    return true, -100, x
  case parser.LT:
    return true, -100, x-1
  case parser.GE:
    return true, x, 100
  case parser.GT:
    return true, x+1, 100
  case parser.NE:
    return false, x, x
  default:
    return false, 0, 0
  }
}

cmp := datum.Compare(cDatum)
tOk, tStart, tEnd := makeInterval(t.Operator, cmp > 0)
cOk, cStart, cEnd := makeInterval(c.Operator, cmp < 0)
if tOk && cOk {
  if cStart > tEnd || tStart > cEnd {
    return MakeDBool(false)
  }
  if tStart <= cStart && cEnd <= tEnd {
    return MakeDBool(true)
  }
  return t
}
// BOOM: just handled 5*5 combinations of operators!
// If one is NE and the other is an interval, we can still use the intervals
// (hence the NE case above).
if tOk && c.Operator == NE {
  if cStart < tStart || cStart > tEnd {
    return MakeDBool(true)
  }
  return t
}

Now I understand how it works, thanks for the explanation and the example code. I changed the patch accordingly.
However I am not sure I would have arrived at this myself. Pray, where did you get this idea?


Comments from Reviewable

@knz knz force-pushed the propagate-more-index-filters branch from 6bd530f to c63218b Compare February 4, 2017 20:57
@RaduBerinde
Member

Review status: 0 of 7 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


pkg/sql/index_selection.go, line 1576 at r1 (raw file):

Previously, knz (kena) wrote…

Now I understand how it works, thanks for the explanation and the example code. I changed the patch accordingly.
However I am not sure I would have arrived at this myself. Pray, where did you get this idea?

Thanks! I'll take a final look over the entire change by Monday.

Haha, I don't know how I came up with it. I just hate formulas that have corner cases. I prefer to first apply some transformation if it eliminates checking for special cases. Some examples:

  • whenever I did anything involving computational geometry, I would always choose formulas that don't have special cases like division by zero (e.g. for a line I always used ax+by+c=0 or x=x0+t*dx; y=y0+t*dy instead of y=mx+b)
  • if a point was in a bad place for an algorithm (e.g. origin), I'd first translate all points rather than have a special case
  • if I'm doing a dynamic programming formula, I always fill in the 0th row and column with proper values instead of having ifs in the formula
  • if I'm doing some kind of "fill" on a 2d grid, I surround it with whatever values would prevent the traversal from continuing, rather than checking for the edges.
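
As an illustration of the last point, here is a hedged Go sketch of a grid fill that relies on a sentinel border instead of bounds checks. The `fill` and `withBorder` helpers and the cell encoding ('.' open, '#' wall) are assumptions made up for this example:

```go
package main

import (
	"fmt"
	"strings"
)

// fill counts the cells reachable from (r, c) through open cells ('.').
// The grid is assumed to be surrounded by '#' cells, so the traversal
// stops at the border without any explicit edge check.
func fill(grid [][]byte, r, c int) int {
	if grid[r][c] != '.' {
		return 0
	}
	grid[r][c] = '*' // mark as visited
	return 1 + fill(grid, r+1, c) + fill(grid, r-1, c) +
		fill(grid, r, c+1) + fill(grid, r, c-1)
}

// withBorder pads the rows with a wall on every side.
func withBorder(rows []string) [][]byte {
	wall := strings.Repeat("#", len(rows[0])+2)
	grid := [][]byte{[]byte(wall)}
	for _, row := range rows {
		grid = append(grid, []byte("#"+row+"#"))
	}
	return append(grid, []byte(wall))
}

func main() {
	grid := withBorder([]string{"..#", ".#.", "..."})
	// (1, 1) is the top-left cell of the original, unpadded grid.
	fmt.Println(fill(grid, 1, 1))
}
```

The same transformation shows up in the interval trick: -100 and +100 play the role of the border, so open-ended constraints need no special case.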

pkg/sql/index_selection.go, line 1711 at r2 (raw file):

	//
	// We represent a constraint as a closed interval on the integer
	// axis. Say we pick -100 to represent -infinity, +100 as

I wrote down something that might be better for this comment (we can leave out some of the details that can be easily gleaned from the code, and just explain the approach):

We map a few interesting points of the datum space to the integer axis as
follows:

                  /- the smaller between datum and cdatum
                  |
 -infinity        |      /- the larger between datum and cdatum
     |            |      |
     |            |      |            /- +infinity
     v            v      v            v
     |------------|------|------------|
   -100           0      10          100

If the datums are equal, they both map to 0.

Constraints become closed intervals on this axis. For example:
a "GE" constraint with the smaller datum becomes [0, 100].  For exclusive
constraints we add or subtract 1 from the datum value: a "GT" constraint with
the smaller datum becomes [1, 100]; a "LT" constraint becomes [-100, -1].

Checking for overlap between the constraints then becomes equivalent to
checking for overlap between the closed intervals (which is easy).

Comments from Reviewable

@knz
Contributor Author

knz commented Feb 7, 2017

@nvanbenschoten to check that I'm not doing anything here that conceptually conflicts with #13444.

@nvanbenschoten
Member

nvanbenschoten commented Feb 7, 2017

Reviewed 1 of 6 files at r1, 2 of 4 files at r2.
Review status: 3 of 7 files reviewed at latest revision, 16 unresolved discussions, all commit checks successful.


pkg/sql/index_selection.go, line 210 at r2 (raw file):

	if s.filter != nil {
		// Constraint propagation may have produced new constant sub-expressions.
		// Propagate them.

nit: Extend Propagate them. to something like Propagate them and check if filter can be applied prematurely.


pkg/sql/index_selection.go, line 1448 at r2 (raw file):

			return a
		}
		vals := *c.Right.(*parser.DTuple)

After #13444 goes in this will have to be vals := c.Right.(*parser.DTuple).D. Same throughout these changes. That said, nothing here conceptually conflicts and neither of the new normalization state flags will need to be considered.


pkg/sql/index_selection.go, line 1525 at r2 (raw file):

	switch t := expr.(type) {
	case *parser.AndExpr:
		if t.Left == parser.DBoolTrue && t.Right == parser.DBoolTrue {

You passed the Fizzbuzz test! 👏


pkg/sql/index_selection.go, line 1592 at r2 (raw file):

// [x OP datum] assuming that the expression [x cOp cdatum] (the
// constraint) is true.
func applyConstraintFlat(

nit: it's a little confusing that between applyConstraint and applyConstraintFlat the "truthy" expression switches places.


pkg/sql/index_selection.go, line 1711 at r2 (raw file):

Previously, RaduBerinde wrote…

I wrote down something that might be better for this comment (we can leave out some of the details that can be easily gleaned from the code, and just explain the approach):

We map a few interesting points of the datum space to the integer axis as
follows:

                  /- the smaller between datum and cdatum
                  |
 -infinity        |      /- the larger between datum and cdatum
     |            |      |
     |            |      |            /- +infinity
     v            v      v            v
     |------------|------|------------|
   -100           0      10          100

If the datums are equal, they both map to 0.

Constraints become closed intervals on this axis. For example:
a "GE" constraint with the smaller datum becomes [0, 100].  For exclusive
constraints we add or subtract 1 from the datum value: a "GT" constraint with
the smaller datum becomes [1, 100]; a "LT" constraint becomes [-100, -1].

Checking for overlap between the constraints then becomes equivalent to
checking for overlap between the closed intervals (which is easy).

Love the diagrams!


pkg/sql/index_selection.go, line 1734 at r2 (raw file):

	cOk, cStart, cEnd := makeInterval(cOp, cmp < 0)
	if tOk && cOk {
		if cStart > tEnd || tStart > cEnd {

Let's pull these two conditions into disjoint and overlapping booleans (with little diagrams if you're feeling artistic). Then at least disjoint can be shared with the NE/IsNot case.


pkg/sql/index_selection.go, line 1743 at r2 (raw file):

	}
	// BOOM: just handled 5*5 combinations of operators!
	// If one is NE and the other is an interval, we can still use the intervals

nit: qualify which "one is NE", since this matters


pkg/sql/index_selection.go, line 1744 at r2 (raw file):

	// BOOM: just handled 5*5 combinations of operators!
	// If one is NE and the other is an interval, we can still use the intervals
	// (hence the NE case above).

s/above/below/


pkg/sql/index_selection.go, line 1745 at r2 (raw file):

	// If one is NE and the other is an interval, we can still use the intervals
	// (hence the NE case above).
	if tOk && (cOp == parser.NE || cOp == parser.IsNot) {

I must be visualizing this incorrectly, but I feel like this should be if cOk && (t.Operator == parser.NE || t.Operator == parser.IsNot). We're assuming that cOp is true, so if cOp requires that x != 5, then we can't simplify x < 3 to true. Meanwhile, if cOp requires that x < 3, then we can simplify x != 5 to true. Where am I getting tripped up?
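
The asymmetry described here can be confirmed by brute force. The Go sketch below uses a hypothetical `implies` helper over a small integer domain (not code from the patch): knowing x < 3 makes x != 5 true, while knowing x != 5 does not settle x < 3.

```go
package main

import "fmt"

// implies reports whether every x in a small integer domain that
// satisfies given also satisfies target.
func implies(given, target func(int) bool) bool {
	for x := -10; x <= 10; x++ {
		if given(x) && !target(x) {
			return false
		}
	}
	return true
}

func main() {
	lt3 := func(x int) bool { return x < 3 }
	ne5 := func(x int) bool { return x != 5 }

	fmt.Println(implies(lt3, ne5)) // true: given x < 3, x != 5 simplifies to true
	fmt.Println(implies(ne5, lt3)) // false: given x != 5, x < 3 stays undecided
}
```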


pkg/sql/index_selection.go, line 1755 at r2 (raw file):

// makeInterval supports the range comparison in applyConstraintsFlat above.
// See the comment at the point of call for more details.
func makeInterval(op parser.ComparisonOperator, largerDatum bool) (bool, int, int) {

nit: is the boolean return value documented anywhere?

double nit: usually ok flags are the last return value

triple nit: makeInterval could benefit from a better name. Perhaps makeComparisonInterval


Comments from Reviewable

@RaduBerinde
Member

pkg/sql/index_selection.go, line 1745 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I must be visualizing this incorrectly, but I feel like this should be if cOk && (t.Operator == parser.NE || t.Operator == parser.IsNot). We're assuming that cOp is true, so if cOp requires that x != 5, then we can't simplify x < 3 to true. Meanwhile, if cOp requires that x < 3, then we can simplify x != 5 to true. Where am I getting tripped up?

I definitely got confused as to which is which, maybe we need to give them better names.


Comments from Reviewable

@knz knz force-pushed the propagate-more-index-filters branch from c63218b to ec76cb9 Compare February 9, 2017 21:43
@knz
Contributor Author

knz commented Feb 9, 2017

Quite a few things to fix with the rebase and the bug y'all have found, but I got there in the end. PTAL!


Review status: 3 of 7 files reviewed at latest revision, 16 unresolved discussions, some commit checks failed.


pkg/sql/index_selection.go, line 210 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: Extend Propagate them. to something like Propagate them and check if filter can be applied prematurely.

Done.


pkg/sql/index_selection.go, line 1448 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

After #13444 goes in this will have to be vals := c.Right.(*parser.DTuple).D. Same throughout these changes. That said, nothing here conceptually conflicts and neither of the new normalization state flags will need to be considered.

Done.


pkg/sql/index_selection.go, line 1592 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: it's a little confusing that between applyConstraint and applyConstraintFlat the "truthy" expression switches places.

Agreed, fixed.


pkg/sql/index_selection.go, line 1711 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Love the diagrams!

Done.


pkg/sql/index_selection.go, line 1734 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Let's pull these two conditions into disjoint and overlapping booleans (with little diagrams if you're feeling artistic). Then at least disjoint can be shared with the NE/IsNot case.

done


pkg/sql/index_selection.go, line 1743 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: qualify which "one is NE", since this matters

Done.


pkg/sql/index_selection.go, line 1744 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/above/below/

Done.


pkg/sql/index_selection.go, line 1745 at r2 (raw file):

Previously, RaduBerinde wrote…

I definitely got confused as to which is which, maybe we need to give them better names.

Done. Also added a test, since this was not caught earlier.


pkg/sql/index_selection.go, line 1755 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: is the boolean return value documented anywhere?

double nit: usually ok flags are the last return value

triple nit: makeInterval could benefit from a better name. Perhaps makeComparisonInterval

Done.


Comments from Reviewable

@RaduBerinde
Member

:lgtm:


Review status: 1 of 6 files reviewed at latest revision, 16 unresolved discussions, some commit checks pending.


Comments from Reviewable

@knz
Contributor Author

knz commented Feb 9, 2017

Ok let's pray this is not going to break everything. TFYR!

@knz knz merged commit ba56550 into cockroachdb:master Feb 9, 2017
@knz knz deleted the propagate-more-index-filters branch February 9, 2017 22:08