New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
inverted: fix intersection of SpanExpressions in some cases #118256
Conversation
66430bf
to
c1fec83
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/inverted/expression.go
line 703 at r1 (raw file):
if len(right.FactoredUnionSpans) == 0 && right.Operator == None && !trivialIntersectionWithSelf { right.FactoredUnionSpans = origRightSpans }
I think this is going to cause a regression. What if you leave this function as-is, but change tryPruneChildren
to the following:
// tryPruneChildren takes an expr with two child *SpanExpression and removes children when
// safe to do so.
func tryPruneChildren(expr *SpanExpression, op SetOperator) {
isEmptyExpr := func(e *SpanExpression) bool {
return len(e.FactoredUnionSpans) == 0 && e.Left == nil && e.Right == nil
}
if isEmptyExpr(expr.Left.(*SpanExpression)) {
expr.Left = nil
}
if isEmptyExpr(expr.Right.(*SpanExpression)) {
expr.Right = nil
}
if expr.Operator == SetUnion {
// Promotes the left and right sub-expressions of child to the parent expr, when
// the other child is empty.
promoteChild := func(child *SpanExpression) {
// For SetUnion, the FactoredUnionSpans for the child is already nil since
// it has been unioned into expr. Therefore, we don't need to update
// expr.FactoredUnionSpans.
expr.Operator = child.Operator
expr.Left = child.Left
expr.Right = child.Right
// If child.FactoredUnionSpans is non-empty, we need to recalculate
// SpansToRead since it may have contained some spans that were removed by
// discarding child.FactoredUnionSpans.
if child.FactoredUnionSpans != nil {
expr.SpansToRead = expr.FactoredUnionSpans
if expr.Left != nil {
expr.SpansToRead = unionSpans(expr.SpansToRead, expr.Left.(*SpanExpression).SpansToRead)
}
if expr.Right != nil {
expr.SpansToRead = unionSpans(expr.SpansToRead, expr.Right.(*SpanExpression).SpansToRead)
}
}
}
promoteLeft := expr.Left != nil && expr.Right == nil
promoteRight := expr.Left == nil && expr.Right != nil
if promoteLeft {
promoteChild(expr.Left.(*SpanExpression))
}
if promoteRight {
promoteChild(expr.Right.(*SpanExpression))
}
} else if expr.Operator == SetIntersection {
// The result of intersecting with the empty set is the empty set. In this
// case, we can discard the non-empty child.
if expr.Left == nil {
expr.Right = nil
} else if expr.Right == nil {
expr.Left = nil
}
}
if expr.Left == nil && expr.Right == nil {
expr.Operator = None
expr.SpansToRead = expr.FactoredUnionSpans
}
}
This addition to pkg/sql/inverted/testdata/expression
shows that it seems to work:
new-span-leaf name=a tight=true unique=true span=a
----
span expression
├── tight: true, unique: true
├── to read: ["a", "a"]
└── union spans: ["a", "a"]
new-span-leaf name=c tight=true unique=true span=c
----
span expression
├── tight: true, unique: true
├── to read: ["c", "c"]
└── union spans: ["c", "c"]
and result=b-and-c left=b right=c
----
span expression
├── tight: true, unique: true
├── to read: ["b", "d")
├── union spans: empty
└── INTERSECTION
├── span expression
│ ├── tight: true, unique: true
│ ├── to read: ["b", "b"]
│ └── union spans: ["b", "b"]
└── span expression
├── tight: true, unique: true
├── to read: ["c", "c"]
└── union spans: ["c", "c"]
or result=a-or-b-and-c left=a right=b-and-c
----
span expression
├── tight: true, unique: false
├── to read: ["a", "d")
├── union spans: ["a", "a"]
└── INTERSECTION
├── span expression
│ ├── tight: true, unique: true
│ ├── to read: ["b", "b"]
│ └── union spans: ["b", "b"]
└── span expression
├── tight: true, unique: true
├── to read: ["c", "c"]
└── union spans: ["c", "c"]
and result=a-and-a-or-b-and-c left=a right=a-or-b-and-c
----
span expression
├── tight: true, unique: false
├── to read: ["a", "a"]
└── union spans: ["a", "a"]
ac6f570
to
7084cb8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)
pkg/sql/inverted/expression.go
line 703 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think this is going to cause a regression. What if you leave this function as-is, but change
tryPruneChildren
to the following:// tryPruneChildren takes an expr with two child *SpanExpression and removes children when // safe to do so. func tryPruneChildren(expr *SpanExpression, op SetOperator) { isEmptyExpr := func(e *SpanExpression) bool { return len(e.FactoredUnionSpans) == 0 && e.Left == nil && e.Right == nil } if isEmptyExpr(expr.Left.(*SpanExpression)) { expr.Left = nil } if isEmptyExpr(expr.Right.(*SpanExpression)) { expr.Right = nil } if expr.Operator == SetUnion { // Promotes the left and right sub-expressions of child to the parent expr, when // the other child is empty. promoteChild := func(child *SpanExpression) { // For SetUnion, the FactoredUnionSpans for the child is already nil since // it has been unioned into expr. Therefore, we don't need to update // expr.FactoredUnionSpans. expr.Operator = child.Operator expr.Left = child.Left expr.Right = child.Right // If child.FactoredUnionSpans is non-empty, we need to recalculate // SpansToRead since it may have contained some spans that were removed by // discarding child.FactoredUnionSpans. if child.FactoredUnionSpans != nil { expr.SpansToRead = expr.FactoredUnionSpans if expr.Left != nil { expr.SpansToRead = unionSpans(expr.SpansToRead, expr.Left.(*SpanExpression).SpansToRead) } if expr.Right != nil { expr.SpansToRead = unionSpans(expr.SpansToRead, expr.Right.(*SpanExpression).SpansToRead) } } } promoteLeft := expr.Left != nil && expr.Right == nil promoteRight := expr.Left == nil && expr.Right != nil if promoteLeft { promoteChild(expr.Left.(*SpanExpression)) } if promoteRight { promoteChild(expr.Right.(*SpanExpression)) } } else if expr.Operator == SetIntersection { // The result of intersecting with the empty set is the empty set. In this // case, we can discard the non-empty child. if expr.Left == nil { expr.Right = nil } else if expr.Right == nil { expr.Left = nil } } if expr.Left == nil && expr.Right == nil { expr.Operator = None expr.SpansToRead = expr.FactoredUnionSpans } }
This addition to
pkg/sql/inverted/testdata/expression
shows that it seems to work:new-span-leaf name=a tight=true unique=true span=a ---- span expression ├── tight: true, unique: true ├── to read: ["a", "a"] └── union spans: ["a", "a"] new-span-leaf name=c tight=true unique=true span=c ---- span expression ├── tight: true, unique: true ├── to read: ["c", "c"] └── union spans: ["c", "c"] and result=b-and-c left=b right=c ---- span expression ├── tight: true, unique: true ├── to read: ["b", "d") ├── union spans: empty └── INTERSECTION ├── span expression │ ├── tight: true, unique: true │ ├── to read: ["b", "b"] │ └── union spans: ["b", "b"] └── span expression ├── tight: true, unique: true ├── to read: ["c", "c"] └── union spans: ["c", "c"] or result=a-or-b-and-c left=a right=b-and-c ---- span expression ├── tight: true, unique: false ├── to read: ["a", "d") ├── union spans: ["a", "a"] └── INTERSECTION ├── span expression │ ├── tight: true, unique: true │ ├── to read: ["b", "b"] │ └── union spans: ["b", "b"] └── span expression ├── tight: true, unique: true ├── to read: ["c", "c"] └── union spans: ["c", "c"] and result=a-and-a-or-b-and-c left=a right=a-or-b-and-c ---- span expression ├── tight: true, unique: false ├── to read: ["a", "a"] └── union spans: ["a", "a"]
Thanks, this fix makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 5 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/inverted/testdata/expression
line 592 at r2 (raw file):
<nil> new-span-leaf name=a tight=true unique=true span=a
nit: maybe add a note about what this series of things is testing and that it's a regression test
This commit fixes the logic for intersecting two SpanExpressions in some cases. In particular, consider the following example: - left expr is `ARRAY['str1'] <@ t.links` which translates to a SpanExpression in which only FactoredUnionSpans is set to the span containing `str1` - right expr is `ARRAY ['str1'] && t.links OR ...` which translates to a SpanExpression in which FactoredUnionSpans is set to exactly same span containing `str1` plus some other stuff in children expressions. In other words, we have `a AND (a OR b)`. When intersecting SpanExpressions, we intersect their FactoredUnionSpans, and then we update FactoredUnionSpans of expressions to subtract the shared ones. Previously, in the example above this would result in making the left expression empty, so it would be pruned. However, that is incorrect - we have the equivalent of `left AND right`, so we must ensure that `left` expression is satisfied, and previously this wouldn't be the case. The fix is that if one of the children expressions became empty, then intersection of two children is also empty, so instead of promoting non-empty child we now nil non-empty child out. In the example above, previously we would get `a OR b`, and now we'll correctly get `a`. Release note (bug fix): Previously, in some cases CockroachDB could incorrectly evaluate queries that scanned an inverted index and had a `WHERE` filter in which two sides of the `AND` expression had "similar" expressions (e.g. `ARRAY['str1'] <@ col AND (ARRAY['str1'] && col OR ...)`), and this is now fixed. The bug has been present since pre-22.2 version.
7084cb8
to
970f45d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build succeeded: |
This commit fixes the logic for intersecting two SpanExpressions in some cases. In particular, consider the following example:
ARRAY['str1'] <@ t.links
which translates to a SpanExpression in which only FactoredUnionSpans is set to the span containingstr1
ARRAY ['str1'] && t.links OR ...
which translates to a SpanExpression in which FactoredUnionSpans is set to exactly same span containingstr1
plus some other stuff in children expressions.In other words, we have
a AND (a OR b)
.When intersecting SpanExpressions, we intersect their FactoredUnionSpans, and then we update FactoredUnionSpans of expressions to subtract the shared ones. Previously, in the example above this would result in making the left expression empty, so it would be pruned. However, that is incorrect - we have the equivalent of
left AND right
, so we must ensure thatleft
expression is satisfied, and previously this wouldn't be the case. The fix is that if one of the children expressions became empty, then intersection of two children is also empty, so instead of promoting non-empty child we now nil non-empty child out. In the example above, previously we would geta OR b
, and now we'll correctly geta
.Fixes: #117979.
Release note (bug fix): Previously, in some cases CockroachDB could incorrectly evaluate queries that scanned an inverted index and had a
WHERE
filter in which two sides of theAND
expression had "similar" expressions (e.g.ARRAY['str1'] <@ col AND (ARRAY['str1'] && col OR ...)
), and this is now fixed. The bug has been present since pre-22.2 version.