Fixed #33928 -- Avoided unnecessary queries when cascade updating. #15969

Merged
merged 2 commits into from
Aug 27, 2022

Conversation

charettes
Member

Models that use SET, SET_NULL, and SET_DEFAULT as on_delete handler
don't have to fetch objects for the sole purpose of passing them back to
a follow-up UPDATE query filtered by the retrieved objects' primary keys.

This was achieved by flagging SET handlers as lazy and having the collector
logic defer object collection until the last minute. This should ensure that
the rare cases where custom on_delete handlers are defined remain uncalled
when dealing with an empty collection of instances.

This reduces the number of queries required to apply SET handlers from 2 to 1,
where the remaining UPDATE uses the same predicate as the now-removed SELECT
query.

In a lot of ways this is similar to the fast-delete optimization that was added
in #18676 but for updates this time. The conditions only happen to be simpler
in this case because SET handlers are always terminal; they never cascade to
more deletes that can be combined.

Thanks @rgehan for the report.
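
The query reduction described above can be sketched outside of Django with the standard-library sqlite3 module. This is only an illustration under stated assumptions: the author/book tables and the helpers make_db, run, set_null_eager and set_null_lazy are hypothetical, and the SQL Django actually emits may differ.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE author (id INTEGER PRIMARY KEY);
        CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER NULL);
        INSERT INTO author (id) VALUES (1), (2);
        INSERT INTO book (id, author_id) VALUES (10, 1), (11, 1), (12, 2);
        """
    )
    return conn


def run(conn, log, sql, params=()):
    """Execute a statement and record it so queries can be counted."""
    log.append(sql)
    return conn.execute(sql, params)


def set_null_eager(conn, author_id):
    """Two queries: SELECT the related rows, then UPDATE them by primary key."""
    log = []
    rows = run(conn, log, "SELECT id FROM book WHERE author_id = ?", (author_id,))
    pks = [row[0] for row in rows]
    if pks:
        placeholders = ", ".join("?" * len(pks))
        run(
            conn,
            log,
            f"UPDATE book SET author_id = NULL WHERE id IN ({placeholders})",
            pks,
        )
    return log


def set_null_lazy(conn, author_id):
    """One query: the UPDATE reuses the predicate the SELECT would have used."""
    log = []
    run(conn, log, "UPDATE book SET author_id = NULL WHERE author_id = ?", (author_id,))
    return log


if __name__ == "__main__":
    print(len(set_null_eager(make_db(), 1)), "queries (eager)")
    print(len(set_null_lazy(make_db(), 1)), "query (lazy)")
```

Both helpers leave the book table in the same state; the lazy variant simply never materializes the rows on the Python side.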

@charettes
Member Author

The only potentially undesirable side effect of this change is that a callable passed to SET, or a callable field default called by SET_DEFAULT, will now be called even when it is not necessary. If deemed necessary, this can be avoided by disabling this optimization with the following changes:

diff --git a/django/db/models/deletion.py b/django/db/models/deletion.py
index be2002d23e..b268f01540 100644
--- a/django/db/models/deletion.py
+++ b/django/db/models/deletion.py
@@ -53,6 +53,10 @@ def SET(value):
     if callable(value):

         def set_on_delete(collector, field, sub_objs, using):
+            # Avoid potential side effects (e.g. queries) that value()
+            # might incur.
+            if not sub_objs:
+                return
             collector.add_field_update(field, value(), sub_objs)

     else:
@@ -73,6 +77,10 @@ def SET_NULL(collector, field, sub_objs, using):


 def SET_DEFAULT(collector, field, sub_objs, using):
+    if callable(field.default) and not sub_objs:
+        # Avoid potential side effects (e.g. queries) that field.get_default()
+        # might incur.
+        return
     collector.add_field_update(field, field.get_default(), sub_objs)
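
To make the effect of the guard in the diff above concrete, here is a small standalone sketch that does not import Django. RecordingCollector and expensive_default are hypothetical stand-ins, sub_objs is a plain list rather than whatever the collector really passes, and SET is a simplified copy of the factory shown in the diff.

```python
calls = []


def expensive_default():
    """Hypothetical callable whose side effects we want to avoid."""
    calls.append("called")
    return 42


class RecordingCollector:
    """Hypothetical stand-in that only records requested field updates."""

    def __init__(self):
        self.field_updates = []

    def add_field_update(self, field, value, objs):
        self.field_updates.append((field, value, objs))


def SET(value):
    """Simplified SET factory including the empty-collection guard."""
    if callable(value):

        def set_on_delete(collector, field, sub_objs, using):
            # Skip the callable entirely when there is nothing to update.
            if not sub_objs:
                return
            collector.add_field_update(field, value(), sub_objs)

    else:

        def set_on_delete(collector, field, sub_objs, using):
            collector.add_field_update(field, value, sub_objs)

    return set_on_delete


if __name__ == "__main__":
    handler = SET(expensive_default)
    collector = RecordingCollector()
    handler(collector, "owner", [], "default")  # guard: callable not invoked
    handler(collector, "owner", ["obj"], "default")  # callable invoked once
    print(calls, collector.field_updates)
```

Without the `if not sub_objs` check, the first call would invoke expensive_default() even though no row needs updating.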

@charettes charettes force-pushed the ticket-33928 branch 2 times, most recently from c903fc4 to 8122b4a Compare August 18, 2022 01:21
for (field, value), instances in instances_for_fieldvalues.items():
for field, value, instances in self.field_updates:
if (
isinstance(instances, models.QuerySet)
Member Author

Using models.QuerySet instead of a direct QuerySet import to avoid circular imports.

@charettes charettes marked this pull request as ready for review August 18, 2022 12:26
@charettes charettes force-pushed the ticket-33928 branch 2 times, most recently from 7fd6e5d to edcffaa Compare August 18, 2022 18:04
Member

@felixxm felixxm left a comment

@charettes Thanks 👍 I really like the proposed approach 🥇

…updates.

Model instances retrieved for bulk field update purposes are not exposed
to the outside world and thus are not required to be kept up to
date.
Models that use SET, SET_NULL, and SET_DEFAULT as on_delete handler
don't have to fetch objects for the sole purpose of passing them back to
a follow up UPDATE query filtered by the retrieved objects primary key.

This was achieved by flagging SET handlers as _lazy_ and having the
collector logic defer object collections until the last minute. This
should ensure that the rare cases where custom on_delete handlers are
defined remain uncalled when dealing with an empty collection of
instances.

This reduces the number of queries required to apply SET handlers from
2 to 1, where the remaining UPDATE uses the same predicate as the
now-removed SELECT query.

In a lot of ways this is similar to the fast-delete optimization that
was added in #18676 but for updates this time. The conditions only
happen to be simpler in this case because SET handlers are always
terminal. They never cascade to more deletes that can be combined.

Thanks Renan GEHAN for the report.
@felixxm felixxm merged commit 0701bb8 into django:main Aug 27, 2022
@dutkulang

dutkulang commented Aug 28, 2022 via email

combined_updates = reduce(or_, updates)
combined_updates.update(**{field.name: value})
if objs:
model = objs[0].__class__

This might not be the best place to ask this, but is there any reason why this is not field.model instead?

When using polymorphism, this line causes database consistency to break, as it only updates the table of the derived class that corresponds to the first object in objs. However, if it were field.model instead, the base class's table would be updated and consistency would be honored.

Should I create a PR?
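
The concern raised in this comment can be illustrated with a toy, Django-free sketch. It has not been verified against Django: Base, ChildA, ChildB, FakeField and covered() are hypothetical, and the sketch assumes that an update issued through a model only reaches instances of that model.

```python
class Base:
    """Hypothetical base model."""


class ChildA(Base):
    """Hypothetical derived model."""


class ChildB(Base):
    """Hypothetical derived model."""


class FakeField:
    """Hypothetical stand-in for a field that knows its owning model."""

    def __init__(self, name, model):
        self.name = name
        self.model = model


def covered(model, objs):
    """Objects an update through `model` would reach (assumption of this sketch)."""
    return [obj for obj in objs if isinstance(obj, model)]


if __name__ == "__main__":
    objs = [ChildA(), ChildB()]
    field = FakeField("owner", Base)

    # Picking the model from the first object selects the derived class only.
    print(len(covered(objs[0].__class__, objs)))
    # Picking the model from the field selects the base class.
    print(len(covered(field.model, objs)))
```

Under these assumptions, choosing the model from the first object skips every object of a different derived class, while choosing it from the field reaches all of them.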

Member

Interesting, please create a new ticket in Trac with a sample project and follow our bug reporting guidelines. All bugfixes require an accepted ticket.
