Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix panic by rewriting how we move filter conditions up. #1570

Merged
merged 6 commits into from
Feb 1, 2023

Conversation

PavelSafronov
Copy link
Contributor

Fix panic by rewriting how we move filter conditions up.

Fixes dolthub/dolt#5214

Copy link
Contributor

@max-hoffman max-hoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried joins can still be tree roots, but otherwise mostly good and this helps eliminate redundant filters as a nice side-effect

return transform.NodeWithCtx(node, func(transform.Context) bool { return true }, func(c transform.Context) (sql.Node, transform.TreeIdentity, error) {
if c.Node != topJoin {
return c.Node, transform.SameTree, nil
resultNode, resultIdentity, err := transform.Node(node, func(n sql.Node) (sql.Node, transform.TreeIdentity, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this creates a CROSS_JOIN at the tree root in testing for me. i suspect the filter would be dropped with the new change

> explain select * from xy a cross join xy b on a.x = 2;
+-------------------------------------+
| plan                                |
+-------------------------------------+
| CrossJoin                           |
|  ├─ Filter                          |
|  │   ├─ (a.x = 2)                   |
|  │   └─ TableAlias(a)               |
|  │       └─ IndexedTableAccess(xy)  |
|  │           ├─ index: [xy.x]       |
|  │           ├─ filters: [{[2, 2]}] |
|  │           └─ columns: [x y]      |
|  └─ Exchange                        |
|      └─ TableAlias(b)               |
|          └─ Table                   |
|              ├─ name: xy            |
|              └─ columns: [x y]      |
+-------------------------------------+

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could check if node == topJoin before doing the tree search for that case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could also git checkout main -- sql/analyzer/optimization_rules_test.go in that case also

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I add the node == topJoin test, we still get the panic from populateSubgraph, even though this seems like the correct action. Investigating this in more detail, we might be trying to fix the wrong issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could also git checkout main -- sql/analyzer/optimization_rules_test.go in that case also

Sorry, not following yet, what does this do?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if topJoin can be the tree root, the changes the unit tests are unnecessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could check if node == topJoin before doing the tree search for that case

This causes a break in a different portion of the code. I suspect one of the downstream rules makes certain assumption about the query plan, and the node == topJoin approach violates one of those assumption. As discussed offline a bit, I'm keeping the PR as-is and we'll just use this approach of updating the topJoin parent, instead of updating the topJoin itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concerns are correctness related, I don't mind how it happens but CROSS_JOIN root nodes can't lose filters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concerns are correctness related, I don't mind how it happens but CROSS_JOIN root nodes can't lose filters.

Absolutely makes sense.
Here's the result of the explain select... you mentioned earlier, with this fix in place:

+-------------------------------------+
| plan                                |
+-------------------------------------+
| CrossJoin                           |
|  ├─ Filter                          |
|  │   ├─ (a.x = 2)                   |
|  │   └─ TableAlias(a)               |
|  │       └─ IndexedTableAccess(xy)  |
|  │           ├─ index: [xy.x]       |
|  │           ├─ filters: [{[2, 2]}] |
|  │           └─ columns: [x y]      |
|  └─ Exchange                        |
|      └─ TableAlias(b)               |
|          └─ Table                   |
|              ├─ name: xy            |
|              └─ columns: [x y]      |
+-------------------------------------+

The filter isn't being dropped. I can add this particular join query to our tests as well.

return n, transform.SameTree, nil
}

switch imp := n.(type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
switch imp := n.(type) {
switch n := n.(type) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels really strange to reassign n like this, but the scoping seems to work out, so going with this approach.

return plan.NewFilter(
expression.JoinAnd(append([]sql.Expression{n.Expression}, nonJoinFilters...)...),
n.Child), transform.NewTree, nil
newExpression := expression.JoinAnd(append([]sql.Expression{imp.Expression}, nonJoinFilters...)...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be a bit shorter

Suggested change
newExpression := expression.JoinAnd(append([]sql.Expression{imp.Expression}, nonJoinFilters...)...)
newExpression := expression.JoinAnd(append(nonJoinFilters, n.Expression)...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shorter, and we're also not expanding nonJoinFilters unnecessarily. Changing.

}
})

return resultNode, resultIdentity, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you don't use these vars, maybe return the transform result

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, this is testing/debug code, removing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are coming back. I'm adding a check after the transform.Node call to ensure that nonJoinFilters (which we've removed from the tree) are actually used and are properly added back to the tree. Wouldn't want to just remove these and drop them on the floor.

PavelSafronov and others added 2 commits February 1, 2023 11:20
- renamed argument in switch
- simplified joining of expression to nonJoinFilters
- added a panic that gets called if the removed nonJoinFilters are not re-added to the tree
}
})

if len(nonJoinFilters) > 0 {
panic("nonJoinFilters should have been used and cleared out")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the event we forget to apply filters, we should maybe return an error rather than panicking. I have not always been consistent about this in the past. We try to organize errors into sql/errors.go but we are also inconsistent about that in practice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, will add an entry in sql/errors.go and will return it here.

Copy link
Contributor

@max-hoffman max-hoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, we've talked through the last couple of changes and the panic is my last nit. Thanks for being patient through several iterations.

- add an explicit `node == topJoin` check to handle the case that there is nothing that contains topJoin (if it's literally the top of the tree)
- reverted all optimization_rules_test.go changes, which are specifically testing the case where the join is the top of the tree
- replaced `panic` with an error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dolt panic during JOIN
2 participants