Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #34450 -- Fixed multi-valued JOIN reuse when filtering by expressions. #16710

Closed
wants to merge 2 commits into from

Conversation

charettes
Copy link
Member

@charettes charettes commented Mar 31, 2023

Thanks Roman Odaisky for the report.


First commit addresses the issue, second one is an optimization for join promotion that was missing when passing expressions to filter().

tests/lookup/tests.py Outdated Show resolved Hide resolved
tests/lookup/tests.py Outdated Show resolved Hide resolved
Comment on lines +1539 to +1542
self.assertEqual(
Season.objects.get(Exact(F("games__home"), "NY") & Q(games__away="Boston")),
self.s1,
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s make sure changing the order doesn’t affect anything the way it happened in the ticket

Suggested change
self.assertEqual(
Season.objects.get(Exact(F("games__home"), "NY") & Q(games__away="Boston")),
self.s1,
)
self.assertEqual(
Season.objects.get(Exact(F("games__home"), "NY") & Q(games__away="Boston")),
self.s1,
)
self.assertEqual(
Season.objects.get(Q(games__away="Boston") & Exact(F("games__home"), "NY")),
self.s1,
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

@charettes
Copy link
Member Author

@rdaysky thanks for the suggestion but did you actually test the change to confirm they address the issue you reported?

Comment on lines +2014 to +2022
branch_negated=True,
current_negated=True,
can_reuse=can_reuse,
Copy link
Member Author

@charettes charettes Mar 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These kwargs were originally passed along but didn't carry any meaning since passing an expression to build_filter didn't make use of them until now.

Not doing so causes queries.tests.ManyToManyExcludeTest failures with the optimization enabled.

Comment on lines +1375 to +1383
pre_joins = self.alias_refcount.copy()
condition = filter_expr.resolve_expression(
self, allow_joins=allow_joins, summarize=summarize
self, allow_joins=allow_joins, reuse=can_reuse, summarize=summarize
)
if not isinstance(condition, Lookup):
condition = self.build_lookup(["exact"], condition, True)
return WhereNode([condition], connector=AND), []
used_joins = {
k for k, v in self.alias_refcount.items() if v > pre_joins.get(k, 0)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pre_joins and used_joins logic is not novel, it's something used in the typical filtering by (lookup: str, value: Any) case below but was just never implemented in this case. I wouldn't be surprise if this related to other reported issues when doing filter(obj: Expression) calls.

tests/lookup/tests.py Outdated Show resolved Hide resolved
Comment on lines +1539 to +1542
self.assertEqual(
Season.objects.get(Exact(F("games__home"), "NY") & Q(games__away="Boston")),
self.s1,
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

}
return (
WhereNode([condition], connector=AND),
used_joins if not current_negated else (),
Copy link
Member Author

@charettes charettes Mar 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The require_outer logic below is more complex so I think we'll want to add move coverage around usage of things like IsNull(expr, True) as well.

This logic is not flawless though as ticket-25946 demonstrates.

Comment on lines +1384 to +1388
require_outer = current_negated or (
isinstance(condition, IsNull)
and condition.rhs is True
and not current_negated
)
Copy link
Member Author

@charettes charettes Apr 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can get this logic any smarter until we add proper support for Expression.nullable which is the same reason why ticket-25946 is blocked for lookup transforms and ticket-32398 is blocked for expression references.

Copy link
Member Author

@charettes charettes Apr 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per Anssi's explanation on the topic

Currently, the only lookup that can match null values is isnull. However, it seems plausible that custom lookups can be used to create other lookups that match null values. This will be immediately obvious when expressions are allowed in the filter() clause. For example, .filter(Equal(Coalesce('favourite_book__title', 'first_book__title'), 'Foo')) must use left outer joins for the joins, as otherwise an author that has favourite_book with title
'Foo', but no first book would not be matched.

I think that's a strong enough rationale on why we can't merge the second commit here unfortunately.

In order to add proper support for this optimization we'd need to add a way to let the join promotion logic know that an expression has special null-handling semantics (e.g. IsNull, Coalesce) and have it taint expressions that contain them like we do with contains_aggregate and similar attributes.

Until we add support for that we'll have to live with LEFT OUTER joins when using expressions just like we do when filtering against annotations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, let's remove the 2nd commit.

Copy link
Member

@felixxm felixxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charettes Thanks 👍 Let's remove the 2nd commit and we're ready to go.

Comment on lines +1384 to +1388
require_outer = current_negated or (
isinstance(condition, IsNull)
and condition.rhs is True
and not current_negated
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, let's remove the 2nd commit.

Comment on lines 1360 to 1362
cls.g1 = Game.objects.create(season=cls.s1, home="NY", away="Boston")
cls.g2 = Game.objects.create(season=cls.s1, home="NY", away="Tampa")
cls.g3 = Game.objects.create(season=cls.s3, home="Boston", away="Tampa")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cls.g1 = Game.objects.create(season=cls.s1, home="NY", away="Boston")
cls.g2 = Game.objects.create(season=cls.s1, home="NY", away="Tampa")
cls.g3 = Game.objects.create(season=cls.s3, home="Boston", away="Tampa")
Game.objects.create(season=cls.s1, home="NY", away="Boston")
Game.objects.create(season=cls.s1, home="NY", away="Tampa")
Game.objects.create(season=cls.s3, home="Boston", away="Tampa")

…ssions.

Thanks Roman Odaisky for the report.
This required adjusting changes made in 8593e16 (refs #32143) to pass former
build_filter() negation and reusable joins kwargs to ensure proper join
promotion to outer on multi-valued exclusions.
@charettes
Copy link
Member Author

charettes commented Apr 4, 2023

@felixxm I've pushed an adjusted first commit but kept the second one in the PR for archival purposes since it might provide historical context on why some sub-optimal LEFT JOINs are generated when using lookups directly.

Is that okay with you if we only merge the first commit and close the PR afterwards? I can create a Trac issue for it and link it to ticket-25946 and ticket-32398 since they all relate to each other otherwise.

@felixxm
Copy link
Member

felixxm commented Apr 4, 2023

@felixxm I've pushed an adjusted first commit but kept the second one in the PR for archival purposes since it might provide historical context on why some sub-optimal LEFT JOINs are generated when using lookups directly.

Is that okay with you if we only merge the first commit and close the PR afterwards? I can create a Trac issue for it and link it to ticket-25946 and ticket-32398 since they all relate to each other otherwise.

Yes, it works for me. I opened a separate PR (#16721) to double-check with CI that everything is green with the first commit. I'm going to close it as soon as CI confirms.

@felixxm
Copy link
Member

felixxm commented Apr 4, 2023

Merged the 1st commit in 0e1aae7.

@felixxm felixxm closed this Apr 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants