-
-
Notifications
You must be signed in to change notification settings - Fork 31.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed #28900 -- Made SELECT respect the order specified by values()/values_list(). #16703
base: main
Are you sure you want to change the base?
Conversation
|
||
if queryset._fields: | ||
# extra(select=...) cols are always at the start of the row. | ||
names = [ | ||
*query.extra_select, | ||
*query.values_select, | ||
*query.annotation_select, | ||
] | ||
fields = [ | ||
*queryset._fields, | ||
*(f for f in query.annotation_select if f not in queryset._fields), | ||
] | ||
if fields != names: | ||
# Reorder according to fields. | ||
index_map = {name: idx for idx, name in enumerate(names)} | ||
rowfactory = operator.itemgetter(*[index_map[f] for f in fields]) | ||
return map( | ||
rowfactory, | ||
compiler.results_iter( | ||
chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size | ||
), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
queryset._fields
is only truthy when fields are provided to values_list
but when it's the case they are now properly selected in the right order so there is no need for re-positioning shenanigans!
This should have a noticeable positive impact on performance when using values_list
as no swapping of select order has to be performed when retrieving results.
f63f356
to
050f918
Compare
6718afc
to
fa9b201
Compare
annotation_mask = ( | ||
value | ||
for value in dict.fromkeys(self.annotation_select) | ||
if value != alias | ||
) | ||
self.set_annotation_mask(annotation_mask) | ||
self.set_annotation_mask(set(self.annotation_select).difference({alias})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This partially reverts the changes introduced by d6b6e5d which were too naive; the order doesn't only have to be preserved for the local annotation group it must be across extra_select + model_fields + annotations
.
fa9b201
to
a3b787f
Compare
self.assertIn("GROUP BY 2", sql) | ||
self.assertIn("ORDER BY 2", sql) | ||
self.assertIn("GROUP BY 1", sql) | ||
self.assertIn("ORDER BY 1", sql) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expected as the usage of annotate
after values
now append instead of insert at the beginning.
25fb2f3
to
ae1b1cb
Compare
) | ||
.values_list("zero", "num") | ||
) | ||
self.assertCountEqual(qs1.union(qs2), [(1, 0), (0, 2)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This use to fail because before it was doing
SELECT num, 0 AS zero
FROM queries_number
UNION
SELECT num, 0 AS zero
FROM queries_number
Completely disregarding the explicit order specified to values_list
Now it does
SELECT num, 0 AS zero
FROM queries_number
UNION
SELECT 0 AS zero, num
FROM queries_number
@@ -2200,7 +2210,7 @@ def test_col_alias_quoted(self): | |||
{"tag_per_parent__max": 2}, | |||
) | |||
sql = captured_queries[0]["sql"] | |||
self.assertIn("AS %s" % connection.ops.quote_name("col1"), sql) | |||
self.assertIn("AS %s" % connection.ops.quote_name("parent"), sql) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When values
is used to reference foreign keys their name is used as alias for the column names
Before
SELECT MAX("tag_per_parent")
FROM
(SELECT "queries_tag"."parent_id" AS "col1",
COUNT("queries_tag"."id") AS "tag_per_parent"
FROM "queries_tag"
GROUP BY 1) subquery;
After
SELECT MAX("tag_per_parent")
FROM
(SELECT "queries_tag"."parent_id" AS "parent",
COUNT("queries_tag"."id") AS "tag_per_parent"
FROM "queries_tag"
GROUP BY 1) subquery;
Now that we group by select index anyway I'm not sure how relevant this assertion is anymore but I figured I'd keep it around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charettes Thanks 👍 Fantastic!
ae1b1cb
to
19f3e1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only part that I think might be missing is a mention in the release notes. My thoughts are that it's possible that users tried to work around this bug in all sort of weird ways (mixing extras) or might not even realize that they have a buggy query because the swapped columns happen to share compatible types. They would benefit from being explained why this was solved and what to expect. Thoughts?
I think it's a good idea
Maybe putting something in the values()
docs explicitly mentioning the select ordering (with a versionchanged note explaining the change a little) might be a good start?
django/db/models/sql/query.py
Outdated
if self.selected: | ||
# Augment selected with all annotations as the mask was removed. | ||
for name in self.annotations: | ||
self.selected[name] = name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can also believe I can revert this without a test failure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting I'll try to see what this might be needed for by running the full suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is effectively is no way to trigger it anymore (this PR is a few months old at this point 😅).
I checked all the set_annotation_mask
call and the only one that can still pass names = None
is the one in get_aggregation
assuming self.annotation_select_mask = None
prior
django/django/db/models/sql/query.py
Lines 456 to 482 in e56a32b
annotation_select_mask = self.annotation_select_mask | |
for alias, aggregate_expr in aggregate_exprs.items(): | |
self.check_alias(alias) | |
aggregate = aggregate_expr.resolve_expression( | |
self, allow_joins=True, reuse=None, summarize=True | |
) | |
if not aggregate.contains_aggregate: | |
raise TypeError("%s is not an aggregate expression" % alias) | |
# Temporarily add aggregate to annotations to allow remaining | |
# members of `aggregates` to resolve against each others. | |
self.append_annotation_mask([alias]) | |
aggregate_refs = aggregate.get_refs() | |
refs_subquery |= any( | |
getattr(self.annotations[ref], "contains_subquery", False) | |
for ref in aggregate_refs | |
) | |
refs_window |= any( | |
getattr(self.annotations[ref], "contains_over_clause", True) | |
for ref in aggregate_refs | |
) | |
aggregate = aggregate.replace_expressions(replacements) | |
self.annotations[alias] = aggregate | |
replacements[Ref(alias, aggregate)] = aggregate | |
# Stash resolved aggregates now that they have been allowed to resolve | |
# against each other. | |
aggregates = {alias: self.annotations.pop(alias) for alias in aggregate_exprs} | |
self.set_annotation_mask(annotation_select_mask) |
I'm happy to remove the code if you'd like as it's not activated by any code path but I wonder if a good preventive measure here might be drop support for passing None
in the first place (and adjust get_aggregation
accordingly) as if a future use case come up we'll definitely run into issues with Query.selected
being out of sync.
The alternative would be write low level tests for Query.set_annotation_mask
directly to cover for a potential future use case. Whichever you prefer works for me, just lmk @sarahboyce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh interesting
I'm happy to remove the code if you'd like as it's not activated by any code path but I wonder if a good preventive measure here might be drop support for passing None in the first place (and adjust get_aggregation accordingly) as if a future use case come up we'll definitely run into issues with Query.selected being out of sync.
I think this is a good idea
I thought perhaps .set_annotation_mask(None)
was needed to clear the annotation mask but I can see in .clear_select_clause()
it's doing .set_annotation_mask(())
.
So I don't think we need it, I think you've concluded the same, hence agree dropping support is a good idea 👍
I appreciate this is unrelated to this change so if you'd prefer that we pick this up later in a separate PR, that's fine by me. Also can be swayed to the other suggestions, so if you had a secret preference, just shout 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright I figured I'd just drop the new logic after all as the gymnastic required in get_aggregation
where getting out of hand.
19f3e1b
to
cdeafdd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work @charettes 😍
From my narrow understanding, looks pretty sweet to me 👍
I'd definitely add a mention in the release notes, with a few concrete examples of changes perhaps.
list(book.items()), | ||
[("other_rating", self.b1.rating - 1), ("rating", self.b1.rating)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
items()
is used to check the insertion order of the key in the dict
but the test fails even by only comparing dict
when the changes in ValuesIterable.__iter__
are reverted.
f2dd4c2
to
91b62ea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this @charettes 🥳 looks really good
django/db/models/sql/query.py
Outdated
if self.selected: | ||
# Augment selected with all annotations as the mask was removed. | ||
for name in self.annotations: | ||
self.selected[name] = name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh interesting
I'm happy to remove the code if you'd like as it's not activated by any code path but I wonder if a good preventive measure here might be drop support for passing None in the first place (and adjust get_aggregation accordingly) as if a future use case come up we'll definitely run into issues with Query.selected being out of sync.
I think this is a good idea
I thought perhaps .set_annotation_mask(None)
was needed to clear the annotation mask but I can see in .clear_select_clause()
it's doing .set_annotation_mask(())
.
So I don't think we need it, I think you've concluded the same, hence agree dropping support is a good idea 👍
I appreciate this is unrelated to this change so if you'd prefer that we pick this up later in a separate PR, that's fine by me. Also can be swayed to the other suggestions, so if you had a secret preference, just shout 👍
docs/ref/models/querysets.txt
Outdated
The ``SELECT`` clause generated when using ``values()`` respects the | ||
order of the specified ``*fields`` and ``**expressions``. In previous | ||
versions, the order was based off the nature of the referenced model | ||
fields and annotations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ``SELECT`` clause generated when using ``values()`` respects the | |
order of the specified ``*fields`` and ``**expressions``. In previous | |
versions, the order was based off the nature of the referenced model | |
fields and annotations. | |
The ``SELECT`` clause generated when using ``values()`` was updated to | |
respect the order of the specified ``*fields`` and ``**expressions``. |
I think the description of what was happening previously is a little confusing and debating whether we shouldn't worry too much as I think how it's put in the release notes is nicer. Not sure 🤔
91b62ea
to
b82c3e1
Compare
…ected). Previously the order was always extra_fields + model_fields + annotations with respective local ordering inferred from the insertion order of *selected. This commits introduces a new `Query.selected` propery that keeps tracks of the global select order as specified by on values assignment. This is crucial feature to allow the combination of queries mixing annotations and table references. It also allows the removal of the re-ordering shenanigans perform by ValuesListIterable in order to re-map the tuples returned from the database backend to the order specified by values_list() as they'll be in the right order at query compilation time. Refs #28553 as the initially reported issue that was only partially fixed for annotations by d6b6e5d. Thanks Mariusz Felisiak and Sarah Boyce for review.
Previously, only the selected column aliases would be propagated and annotations were ignored.
This should ensure it never drifts from Query.selected while maintaining backward compatibility.
b82c3e1
to
3787a84
Compare
|
||
The ``SELECT`` clause generated when using ``values()`` was updated to | ||
respect the order of the specified ``*fields`` and ``**expressions``. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sarahboyce I applied your suggestion and I'll defer to you to determine if the release notes are enough.
IMO it might be as the previous behavior was borderline a bug and undocumented and based on the number of tickets I triaged where I had to teach reporters about it I doubt many users even knew it. Still it might be the extra admonition to at least point out it behaves correctly.
ticket-28900
The TL;DR here is that previously when using
values
orvalues_list
the resulting order of theSELECT
members was not matching the one explicitly specified.Instead the order was always
SELECT *extra_fields, *model_fields, *annotations
where each local group respected the explicitly specified order. This was leading to issues particularly when combining queries and trying to combine annotations with model fields as exemplified hereWhat this PR does is
Query.selected
ValuesListIterable
The only part that I think might be missing is a mention in the release notes. My thoughts are that it's possible that users tried to work around this bug in all sort of weird ways (mixing extras) or might not even realize that they have a buggy query because the swapped columns happen to share compatible types. They would benefit from being explained why this was solved and what to expect. Thoughts?