Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inverted: fix intersection of SpanExpressions in some cases #118256

Merged
merged 1 commit into from Jan 26, 2024

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Jan 24, 2024

This commit fixes the logic for intersecting two SpanExpressions in some cases. In particular, consider the following example:

  • left expr is ARRAY['str1'] <@ t.links which translates to a SpanExpression in which only FactoredUnionSpans is set to the span containing str1
  • right expr is ARRAY ['str1'] && t.links OR ... which translates to a SpanExpression in which FactoredUnionSpans is set to exactly same span containing str1 plus some other stuff in children expressions.

In other words, we have a AND (a OR b).

When intersecting SpanExpressions, we intersect their FactoredUnionSpans, and then we update FactoredUnionSpans of expressions to subtract the shared ones. Previously, in the example above this would result in making the left expression empty, so it would be pruned. However, that is incorrect - we have the equivalent of left AND right, so we must ensure that left expression is satisfied, and previously this wouldn't be the case. The fix is that if one of the children expressions became empty, then intersection of two children is also empty, so instead of promoting non-empty child we now nil non-empty child out. In the example above, previously we would get a OR b, and now we'll correctly get a.

Fixes: #117979.

Release note (bug fix): Previously, in some cases CockroachDB could incorrectly evaluate queries that scanned an inverted index and had a WHERE filter in which two sides of the AND expression had "similar" expressions (e.g. ARRAY['str1'] <@ col AND (ARRAY['str1'] && col OR ...)), and this is now fixed. The bug has been present since pre-22.2 version.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich added backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. labels Jan 24, 2024
@yuzefovich yuzefovich force-pushed the inverted-fix branch 2 times, most recently from 66430bf to c1fec83 Compare January 24, 2024 01:36
@yuzefovich yuzefovich marked this pull request as ready for review January 24, 2024 02:17
@yuzefovich yuzefovich requested a review from a team as a code owner January 24, 2024 02:17
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/inverted/expression.go line 703 at r1 (raw file):

		if len(right.FactoredUnionSpans) == 0 && right.Operator == None && !trivialIntersectionWithSelf {
			right.FactoredUnionSpans = origRightSpans
		}

I think this is going to cause a regression. What if you leave this function as-is, but change tryPruneChildren to the following:

// tryPruneChildren takes an expr with two child *SpanExpression and removes children when
// safe to do so.  
func tryPruneChildren(expr *SpanExpression, op SetOperator) {  
  isEmptyExpr := func(e *SpanExpression) bool {  
    return len(e.FactoredUnionSpans) == 0 && e.Left == nil && e.Right == nil  
  }  
  if isEmptyExpr(expr.Left.(*SpanExpression)) {  
    expr.Left = nil  
  }  
  if isEmptyExpr(expr.Right.(*SpanExpression)) {  
    expr.Right = nil  
  }  
  if expr.Operator == SetUnion {  
    // Promotes the left and right sub-expressions of child to the parent expr, when  
    // the other child is empty.  
    promoteChild := func(child *SpanExpression) {  
      // For SetUnion, the FactoredUnionSpans for the child is already nil since  
      // it has been unioned into expr. Therefore, we don't need to update  
      // expr.FactoredUnionSpans.  
      expr.Operator = child.Operator  
      expr.Left = child.Left  
      expr.Right = child.Right  
  
      // If child.FactoredUnionSpans is non-empty, we need to recalculate  
      // SpansToRead since it may have contained some spans that were removed by  
      // discarding child.FactoredUnionSpans.  
      if child.FactoredUnionSpans != nil {  
        expr.SpansToRead = expr.FactoredUnionSpans  
        if expr.Left != nil {  
          expr.SpansToRead = unionSpans(expr.SpansToRead, expr.Left.(*SpanExpression).SpansToRead)  
        }  
        if expr.Right != nil {  
          expr.SpansToRead = unionSpans(expr.SpansToRead, expr.Right.(*SpanExpression).SpansToRead)  
        }  
      }  
    }  
    promoteLeft := expr.Left != nil && expr.Right == nil  
    promoteRight := expr.Left == nil && expr.Right != nil  
    if promoteLeft {  
      promoteChild(expr.Left.(*SpanExpression))  
    }  
    if promoteRight {  
      promoteChild(expr.Right.(*SpanExpression))  
    }  
  } else if expr.Operator == SetIntersection {  
    // The result of intersecting with the empty set is the empty set. In this  
    // case, we can discard the non-empty child.  
    if expr.Left == nil {  
      expr.Right = nil  
    } else if expr.Right == nil {  
      expr.Left = nil  
    }  
  }  
  if expr.Left == nil && expr.Right == nil {  
    expr.Operator = None  
    expr.SpansToRead = expr.FactoredUnionSpans  
  }  
}

This addition to pkg/sql/inverted/testdata/expression shows that it seems to work:

new-span-leaf name=a tight=true unique=true span=a  
----  
span expression  
├── tight: true, unique: true  
├── to read: ["a", "a"]  
└── union spans: ["a", "a"]  
  
new-span-leaf name=c tight=true unique=true span=c  
----  
span expression  
├── tight: true, unique: true  
├── to read: ["c", "c"]  
└── union spans: ["c", "c"]  
  
and result=b-and-c left=b right=c  
----  
span expression  
├── tight: true, unique: true  
├── to read: ["b", "d")  
├── union spans: empty  
└── INTERSECTION  
├── span expression  
│ ├── tight: true, unique: true  
│ ├── to read: ["b", "b"]  
│ └── union spans: ["b", "b"]  
└── span expression  
├── tight: true, unique: true  
├── to read: ["c", "c"]  
└── union spans: ["c", "c"]  
  
or result=a-or-b-and-c left=a right=b-and-c  
----  
span expression  
├── tight: true, unique: false  
├── to read: ["a", "d")  
├── union spans: ["a", "a"]  
└── INTERSECTION  
├── span expression  
│ ├── tight: true, unique: true  
│ ├── to read: ["b", "b"]  
│ └── union spans: ["b", "b"]  
└── span expression  
├── tight: true, unique: true  
├── to read: ["c", "c"]  
└── union spans: ["c", "c"]  
  
and result=a-and-a-or-b-and-c left=a right=a-or-b-and-c  
----  
span expression  
├── tight: true, unique: false  
├── to read: ["a", "a"]  
└── union spans: ["a", "a"]

@yuzefovich yuzefovich force-pushed the inverted-fix branch 2 times, most recently from ac6f570 to 7084cb8 Compare January 26, 2024 00:52
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/inverted/expression.go line 703 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think this is going to cause a regression. What if you leave this function as-is, but change tryPruneChildren to the following:

// tryPruneChildren takes an expr with two child *SpanExpression and removes children when
// safe to do so.  
func tryPruneChildren(expr *SpanExpression, op SetOperator) {  
  isEmptyExpr := func(e *SpanExpression) bool {  
    return len(e.FactoredUnionSpans) == 0 && e.Left == nil && e.Right == nil  
  }  
  if isEmptyExpr(expr.Left.(*SpanExpression)) {  
    expr.Left = nil  
  }  
  if isEmptyExpr(expr.Right.(*SpanExpression)) {  
    expr.Right = nil  
  }  
  if expr.Operator == SetUnion {  
    // Promotes the left and right sub-expressions of child to the parent expr, when  
    // the other child is empty.  
    promoteChild := func(child *SpanExpression) {  
      // For SetUnion, the FactoredUnionSpans for the child is already nil since  
      // it has been unioned into expr. Therefore, we don't need to update  
      // expr.FactoredUnionSpans.  
      expr.Operator = child.Operator  
      expr.Left = child.Left  
      expr.Right = child.Right  
  
      // If child.FactoredUnionSpans is non-empty, we need to recalculate  
      // SpansToRead since it may have contained some spans that were removed by  
      // discarding child.FactoredUnionSpans.  
      if child.FactoredUnionSpans != nil {  
        expr.SpansToRead = expr.FactoredUnionSpans  
        if expr.Left != nil {  
          expr.SpansToRead = unionSpans(expr.SpansToRead, expr.Left.(*SpanExpression).SpansToRead)  
        }  
        if expr.Right != nil {  
          expr.SpansToRead = unionSpans(expr.SpansToRead, expr.Right.(*SpanExpression).SpansToRead)  
        }  
      }  
    }  
    promoteLeft := expr.Left != nil && expr.Right == nil  
    promoteRight := expr.Left == nil && expr.Right != nil  
    if promoteLeft {  
      promoteChild(expr.Left.(*SpanExpression))  
    }  
    if promoteRight {  
      promoteChild(expr.Right.(*SpanExpression))  
    }  
  } else if expr.Operator == SetIntersection {  
    // The result of intersecting with the empty set is the empty set. In this  
    // case, we can discard the non-empty child.  
    if expr.Left == nil {  
      expr.Right = nil  
    } else if expr.Right == nil {  
      expr.Left = nil  
    }  
  }  
  if expr.Left == nil && expr.Right == nil {  
    expr.Operator = None  
    expr.SpansToRead = expr.FactoredUnionSpans  
  }  
}

This addition to pkg/sql/inverted/testdata/expression shows that it seems to work:

new-span-leaf name=a tight=true unique=true span=a  
----  
span expression  
├── tight: true, unique: true  
├── to read: ["a", "a"]  
└── union spans: ["a", "a"]  
  
new-span-leaf name=c tight=true unique=true span=c  
----  
span expression  
├── tight: true, unique: true  
├── to read: ["c", "c"]  
└── union spans: ["c", "c"]  
  
and result=b-and-c left=b right=c  
----  
span expression  
├── tight: true, unique: true  
├── to read: ["b", "d")  
├── union spans: empty  
└── INTERSECTION  
├── span expression  
│ ├── tight: true, unique: true  
│ ├── to read: ["b", "b"]  
│ └── union spans: ["b", "b"]  
└── span expression  
├── tight: true, unique: true  
├── to read: ["c", "c"]  
└── union spans: ["c", "c"]  
  
or result=a-or-b-and-c left=a right=b-and-c  
----  
span expression  
├── tight: true, unique: false  
├── to read: ["a", "d")  
├── union spans: ["a", "a"]  
└── INTERSECTION  
├── span expression  
│ ├── tight: true, unique: true  
│ ├── to read: ["b", "b"]  
│ └── union spans: ["b", "b"]  
└── span expression  
├── tight: true, unique: true  
├── to read: ["c", "c"]  
└── union spans: ["c", "c"]  
  
and result=a-and-a-or-b-and-c left=a right=a-or-b-and-c  
----  
span expression  
├── tight: true, unique: false  
├── to read: ["a", "a"]  
└── union spans: ["a", "a"]

Thanks, this fix makes sense.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 5 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/inverted/testdata/expression line 592 at r2 (raw file):

<nil>

new-span-leaf name=a tight=true unique=true span=a

nit: maybe add a note about what this series of things is testing and that it's a regression test

This commit fixes the logic for intersecting two SpanExpressions in some
cases. In particular, consider the following example:
- left expr is `ARRAY['str1'] <@ t.links` which translates to
a SpanExpression in which only FactoredUnionSpans is set to the span
containing `str1`
- right expr is `ARRAY ['str1'] && t.links OR ...` which translates to
a SpanExpression in which FactoredUnionSpans is set to exactly same span
containing `str1` plus some other stuff in children expressions.

In other words, we have `a AND (a OR b)`.

When intersecting SpanExpressions, we intersect their
FactoredUnionSpans, and then we update FactoredUnionSpans of expressions
to subtract the shared ones. Previously, in the example above this would
result in making the left expression empty, so it would be pruned.
However, that is incorrect - we have the equivalent of `left AND right`,
so we must ensure that `left` expression is satisfied, and previously
this wouldn't be the case. The fix is that if one of the children
expressions became empty, then intersection of two children is also
empty, so instead of promoting non-empty child we now nil non-empty
child out. In the example above, previously we would get `a OR b`, and
now we'll correctly get `a`.

Release note (bug fix): Previously, in some cases CockroachDB could
incorrectly evaluate queries that scanned an inverted index and had
a `WHERE` filter in which two sides of the `AND` expression had
"similar" expressions (e.g.
`ARRAY['str1'] <@ col AND (ARRAY['str1'] && col OR ...)`), and this is
now fixed. The bug has been present since pre-22.2 version.
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)

@craig
Copy link
Contributor

craig bot commented Jan 26, 2024

Build succeeded:

@craig craig bot merged commit ed5c092 into cockroachdb:master Jan 26, 2024
7 of 9 checks passed
@yuzefovich yuzefovich deleted the inverted-fix branch January 26, 2024 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WHERE clause on ARRAY incorrectly applied when GIN index exists
3 participants