Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #28900 -- Made SELECT respect the order specified by values()/values_list(). #16703

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

charettes
Copy link
Member

@charettes charettes commented Mar 30, 2023

ticket-28900


The TL;DR here is that previously when using values or values_list the resulting order of the SELECT members was not matching the one explicitly specified.

Instead the order was always SELECT *extra_fields, *model_fields, *annotations where each local group respected the explicitly specified order. This was leading to issues particularly when combining queries and trying to combine annotations with model fields as exemplified here

What this PR does is

  1. Store a global order in Query.selected
  2. Remove the unnecessary re-mapping of order in ValuesListIterable
  3. Address ticket-28900 by having the combinator select inheritance logic rely on this new property

The only part that I think might be missing is a mention in the release notes. My thoughts are that it's possible that users tried to work around this bug in all sort of weird ways (mixing extras) or might not even realize that they have a buggy query because the swapped columns happen to share compatible types. They would benefit from being explained why this was solved and what to expect. Thoughts?

Comment on lines -223 to -247

if queryset._fields:
# extra(select=...) cols are always at the start of the row.
names = [
*query.extra_select,
*query.values_select,
*query.annotation_select,
]
fields = [
*queryset._fields,
*(f for f in query.annotation_select if f not in queryset._fields),
]
if fields != names:
# Reorder according to fields.
index_map = {name: idx for idx, name in enumerate(names)}
rowfactory = operator.itemgetter(*[index_map[f] for f in fields])
return map(
rowfactory,
compiler.results_iter(
chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size
),
)
Copy link
Member Author

@charettes charettes Mar 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

queryset._fields is only truthy when fields are provided to values_list but when it's the case they are now properly selected in the right order so there is no need for re-positioning shenanigans!

This should have a noticeable positive impact on performance when using values_list as no swapping of select order has to be performed when retrieving results.

@charettes charettes force-pushed the ticket-28553-compiler branch 2 times, most recently from f63f356 to 050f918 Compare May 23, 2023 04:55
@charettes charettes force-pushed the ticket-28553-compiler branch 3 times, most recently from 6718afc to fa9b201 Compare June 21, 2024 05:29
@charettes charettes changed the title Made SELECT respect the order specified by values()/values_list(). Fixed #28900 -- Made SELECT respect the order specified by values()/values_list(). Jun 21, 2024
@charettes charettes marked this pull request as ready for review June 21, 2024 05:32
Comment on lines -1198 to +1199
annotation_mask = (
value
for value in dict.fromkeys(self.annotation_select)
if value != alias
)
self.set_annotation_mask(annotation_mask)
self.set_annotation_mask(set(self.annotation_select).difference({alias}))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This partially reverts the changes introduced by d6b6e5d which were too naive; the order doesn't only have to be preserved for the local annotation group it must be across extra_select + model_fields + annotations.

self.assertIn("GROUP BY 2", sql)
self.assertIn("ORDER BY 2", sql)
self.assertIn("GROUP BY 1", sql)
self.assertIn("ORDER BY 1", sql)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expected as the usage of annotate after values now append instead of insert at the beginning.

@charettes charettes force-pushed the ticket-28553-compiler branch 2 times, most recently from 25fb2f3 to ae1b1cb Compare June 21, 2024 06:18
)
.values_list("zero", "num")
)
self.assertCountEqual(qs1.union(qs2), [(1, 0), (0, 2)])
Copy link
Member Author

@charettes charettes Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This use to fail because before it was doing

SELECT num, 0 AS zero
FROM queries_number
UNION
SELECT num, 0 AS zero
FROM queries_number

Completely disregarding the explicit order specified to values_list

Now it does

SELECT num, 0 AS zero
FROM queries_number
UNION
SELECT 0 AS zero, num
FROM queries_number

@@ -2200,7 +2210,7 @@ def test_col_alias_quoted(self):
{"tag_per_parent__max": 2},
)
sql = captured_queries[0]["sql"]
self.assertIn("AS %s" % connection.ops.quote_name("col1"), sql)
self.assertIn("AS %s" % connection.ops.quote_name("parent"), sql)
Copy link
Member Author

@charettes charettes Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When values is used to reference foreign keys their name is used as alias for the column names

Before

SELECT MAX("tag_per_parent")
FROM
  (SELECT "queries_tag"."parent_id" AS "col1",
          COUNT("queries_tag"."id") AS "tag_per_parent"
   FROM "queries_tag"
   GROUP BY 1) subquery;

After

SELECT MAX("tag_per_parent")
FROM
  (SELECT "queries_tag"."parent_id" AS "parent",
          COUNT("queries_tag"."id") AS "tag_per_parent"
   FROM "queries_tag"
   GROUP BY 1) subquery;

Now that we group by select index anyway I'm not sure how relevant this assertion is anymore but I figured I'd keep it around.

Copy link
Member

@felixxm felixxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charettes Thanks 👍 Fantastic!

django/db/models/sql/compiler.py Show resolved Hide resolved
Copy link
Contributor

@sarahboyce sarahboyce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only part that I think might be missing is a mention in the release notes. My thoughts are that it's possible that users tried to work around this bug in all sort of weird ways (mixing extras) or might not even realize that they have a buggy query because the swapped columns happen to share compatible types. They would benefit from being explained why this was solved and what to expect. Thoughts?

I think it's a good idea
Maybe putting something in the values() docs explicitly mentioning the select ordering (with a versionchanged note explaining the change a little) might be a good start?

django/db/models/query.py Show resolved Hide resolved
django/db/models/sql/compiler.py Outdated Show resolved Hide resolved
Comment on lines 2420 to 2423
if self.selected:
# Augment selected with all annotations as the mask was removed.
for name in self.annotations:
self.selected[name] = name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can also believe I can revert this without a test failure

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting I'll try to see what this might be needed for by running the full suite.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is effectively is no way to trigger it anymore (this PR is a few months old at this point 😅).

I checked all the set_annotation_mask call and the only one that can still pass names = None is the one in get_aggregation assuming self.annotation_select_mask = None prior

annotation_select_mask = self.annotation_select_mask
for alias, aggregate_expr in aggregate_exprs.items():
self.check_alias(alias)
aggregate = aggregate_expr.resolve_expression(
self, allow_joins=True, reuse=None, summarize=True
)
if not aggregate.contains_aggregate:
raise TypeError("%s is not an aggregate expression" % alias)
# Temporarily add aggregate to annotations to allow remaining
# members of `aggregates` to resolve against each others.
self.append_annotation_mask([alias])
aggregate_refs = aggregate.get_refs()
refs_subquery |= any(
getattr(self.annotations[ref], "contains_subquery", False)
for ref in aggregate_refs
)
refs_window |= any(
getattr(self.annotations[ref], "contains_over_clause", True)
for ref in aggregate_refs
)
aggregate = aggregate.replace_expressions(replacements)
self.annotations[alias] = aggregate
replacements[Ref(alias, aggregate)] = aggregate
# Stash resolved aggregates now that they have been allowed to resolve
# against each other.
aggregates = {alias: self.annotations.pop(alias) for alias in aggregate_exprs}
self.set_annotation_mask(annotation_select_mask)

I'm happy to remove the code if you'd like as it's not activated by any code path but I wonder if a good preventive measure here might be drop support for passing None in the first place (and adjust get_aggregation accordingly) as if a future use case come up we'll definitely run into issues with Query.selected being out of sync.

The alternative would be write low level tests for Query.set_annotation_mask directly to cover for a potential future use case. Whichever you prefer works for me, just lmk @sarahboyce.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh interesting

I'm happy to remove the code if you'd like as it's not activated by any code path but I wonder if a good preventive measure here might be drop support for passing None in the first place (and adjust get_aggregation accordingly) as if a future use case come up we'll definitely run into issues with Query.selected being out of sync.

I think this is a good idea
I thought perhaps .set_annotation_mask(None) was needed to clear the annotation mask but I can see in .clear_select_clause() it's doing .set_annotation_mask(()).
So I don't think we need it, I think you've concluded the same, hence agree dropping support is a good idea 👍

I appreciate this is unrelated to this change so if you'd prefer that we pick this up later in a separate PR, that's fine by me. Also can be swayed to the other suggestions, so if you had a secret preference, just shout 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright I figured I'd just drop the new logic after all as the gymnastic required in get_aggregation where getting out of hand.

django/db/models/sql/query.py Outdated Show resolved Hide resolved
Copy link
Member

@David-Wobrock David-Wobrock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work @charettes 😍
From my narrow understanding, looks pretty sweet to me 👍

I'd definitely add a mention in the release notes, with a few concrete examples of changes perhaps.

Comment on lines +577 to +578
list(book.items()),
[("other_rating", self.b1.rating - 1), ("rating", self.b1.rating)],
Copy link
Member Author

@charettes charettes Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

items() is used to check the insertion order of the key in the dict but the test fails even by only comparing dict when the changes in ValuesIterable.__iter__ are reverted.

@charettes charettes force-pushed the ticket-28553-compiler branch 3 times, most recently from f2dd4c2 to 91b62ea Compare June 27, 2024 05:23
Copy link
Contributor

@sarahboyce sarahboyce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this @charettes 🥳 looks really good

Comment on lines 2420 to 2423
if self.selected:
# Augment selected with all annotations as the mask was removed.
for name in self.annotations:
self.selected[name] = name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh interesting

I'm happy to remove the code if you'd like as it's not activated by any code path but I wonder if a good preventive measure here might be drop support for passing None in the first place (and adjust get_aggregation accordingly) as if a future use case come up we'll definitely run into issues with Query.selected being out of sync.

I think this is a good idea
I thought perhaps .set_annotation_mask(None) was needed to clear the annotation mask but I can see in .clear_select_clause() it's doing .set_annotation_mask(()).
So I don't think we need it, I think you've concluded the same, hence agree dropping support is a good idea 👍

I appreciate this is unrelated to this change so if you'd prefer that we pick this up later in a separate PR, that's fine by me. Also can be swayed to the other suggestions, so if you had a secret preference, just shout 👍

Comment on lines 750 to 753
The ``SELECT`` clause generated when using ``values()`` respects the
order of the specified ``*fields`` and ``**expressions``. In previous
versions, the order was based off the nature of the referenced model
fields and annotations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The ``SELECT`` clause generated when using ``values()`` respects the
order of the specified ``*fields`` and ``**expressions``. In previous
versions, the order was based off the nature of the referenced model
fields and annotations.
The ``SELECT`` clause generated when using ``values()`` was updated to
respect the order of the specified ``*fields`` and ``**expressions``.

I think the description of what was happening previously is a little confusing and debating whether we shouldn't worry too much as I think how it's put in the release notes is nicer. Not sure 🤔

…ected).

Previously the order was always extra_fields + model_fields + annotations with
respective local ordering inferred from the insertion order of *selected.

This commits introduces a new `Query.selected` propery that keeps tracks of the
global select order as specified by on values assignment. This is crucial
feature to allow the combination of queries mixing annotations and table
references.

It also allows the removal of the re-ordering shenanigans perform by
ValuesListIterable in order to re-map the tuples returned from the database
backend to the order specified by values_list() as they'll be in the right
order at query compilation time.

Refs #28553 as the initially reported issue that was only partially fixed
for annotations by d6b6e5d.

Thanks Mariusz Felisiak and Sarah Boyce for review.
Previously, only the selected column aliases would be propagated and
annotations were ignored.
This should ensure it never drifts from Query.selected while maintaining
backward compatibility.

The ``SELECT`` clause generated when using ``values()`` was updated to
respect the order of the specified ``*fields`` and ``**expressions``.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sarahboyce I applied your suggestion and I'll defer to you to determine if the release notes are enough.

IMO it might be as the previous behavior was borderline a bug and undocumented and based on the number of tickets I triaged where I had to teach reporters about it I doubt many users even knew it. Still it might be the extra admonition to at least point out it behaves correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants