Optimize OrderedModelBase._wrt_map() - remove unnecessary query #286

ibaguio · 2023-03-11T18:32:43Z

I've recently updated the version of django-ordered-model from one of my projects, and noticed that the change from 3.6 to 3.7 caused one of my SQL query count tests to fail. After some investigation, I found that there were additional (unnecessary) query caused by wrt_map() when initializing the instance. Specifically, one (1) SQL query was added for every field in order_with_respect_to

I found that wrt_map gets the actual related object from the database. And the .save() method compares the value from _original_wrt_map.

Now, I figured, that instead of actually comparing the objects, we could achieve the same result by comparing the _id of the fields instead. More in this below.

To visualize the problem
Let's use the Answer model from our tests/models.py

Fetching an object from the database yieds:

answer = Answer.objects.get()    
# 3 sql queries!
# 1. SELECT "tests_answer"."id", "tests_answer"."order", "tests_answer"."question_id", "tests_answer"."user_id" FROM "tests_answer" ORDER BY "tests_answer"."question_id" ASC, "tests_answer"."user_id" ASC, "tests_answer"."order" ASC LIMIT 1
# 2. SELECT "tests_question"."id" FROM "tests_question" WHERE "tests_question"."id" = 1 LIMIT 21
# 3. SELECT "tests_testuser"."id" FROM "tests_testuser" WHERE "tests_testuser"."id" = 1 LIMIT 21

this is caused by the invocation of _wrt_map() from OrderedModelBase.init to define the _original_wrt_map.

Proposed Solution

The changes I've introduced in this PR, uses the field _id instead of the related object when building the wrt_map. This way, the related object is not fetched from the database (if not yet set prior).

Caveat: This solution only works for order_with_respect_to fields that are actually ForeignKeys. As of writing, I have not yet tested it on reverse FK, ManyToMany, etc. I know my changes will not work for these. I'm hoping someone could contribute to solving the problem.

Alternate (Temporary) Solution

The easier solution here, and without code change, would be to hardcode _id when defining the order_with_respect_to. So instead of

    class Answer(models.Model):
        order_with_respect_to = ("question", "user")

we could define our model:

    class Answer(models.Model):
        order_with_respect_to = ("question_id", "user_id")

This of course, requires that we update the documentation, and also assumes that the package user (developer) is aware of this problem, and would require them to update existing code. I would not suggest this as a long term solution.

--
Additional notes:

We could probably add more assertNumQueries checks on other parts of the tests

the newly introduced OrderedModelBase._wrt_map method gets the value of the fields defined in order_with_respect_to. this forces django to query the object from the database, but only to be used for comparison. It is optimal to just compare the _id instead of the actual model object.

ibaguio · 2023-03-11T18:34:37Z

tests/utils.py

@@ -0,0 +1,37 @@
+# Query count helpers, copied from django source


i've added this utils file since this package supports django <3.1, and assertNumQueries was only introduced therein

shuckc · 2023-03-13T11:16:10Z

Firstly - thank you. I really appreciate the diligence it takes to uncover this sort of thing and then to raise it with us. I am happy to merge in the context manager and use it in tests to track this sort of regression. I didn't expect building the _original_wrt_map to fire off additional queries, but I lacked the tool to find out!

We should absolutely try to use the documented {field}_id trick to make this cheaper. There seems to be two ways to go ahead with it, one is to re-write the field lookups (as you have done), however to do so conditionally on the type of the field, probably when the model is elaborated rather than during usage. The second approach is to require the user to be explicit in what they want, and help them by raising a Django system check WARNING if a ForeignKey field is referenced without using _id - with wording to suggest this as a worthwhile optimisation.

I'll take a look at how best to move forward - we could use a test case for order_with_respect_to on a non-foreign key field, or at last some reasoning that this is a nonsensical configuration and reject it.

This will also improve other callers of _wrt_map() via _validate_orderdering_reference(), (notably swap, above and below for OWRT models) that would have triggered the extra queries, as well as rendering ordering buttons in the admin.

shuckc · 2023-03-13T12:38:53Z

Looking at our ItemGroup and GroupedItem example, it would reduce to one query if for double-underscore relations we pre-fetched the intermediate object with an automatic select_related(), as well as use _id.

So our order_with_respect_to = "group__user" would need to be re-written as follows:

    class ItemGroup(models.Model):
        user = models.ForeignKey(User, on_delete=models.CASCADE)
        general_info = models.CharField(max_length=100)

+    class GroupedItemManager(OrderedModelManager):
+        def get_queryset(self):
+            return super(GroupedItemManager, self).get_queryset().select_related('group')

    class GroupedItem(OrderedModel):
        group = models.ForeignKey(ItemGroup, on_delete=models.CASCADE)
        specific_info = models.CharField(max_length=100)
-       order_with_respect_to = 'group__user'
+       order_with_respect_to = 'group__user_id'
+       objects = GroupedItemManager()

I think I am in favour of updating our docs and raising a ChecksERROR for any value in order_with_respect_to that is not a ForeignKey field, or one or more ForeignKey traversals ending with a ForeignKey field, so that we can rely on the _id portion.

ibaguio · 2023-03-14T04:30:02Z

You're absolutely welcome.

I think we can implement the checks to raise the warning, and update the docs first. That shouldn't take a lot of effort to do, while we devise a proper solution on how to improve the _wrt_map. I implemented the _id for my project, which was the cheapest solution. And to protect myself, I've added a test to check all models that has the order_with_respect_to defined, and make sure it ends with _id.

From my perspective, mitigating the issue of added SQL queries is CRITICAL. Some poor django app out there using v3.7 this of this awesome library would have its performance impacted, especially if they didn't have any query count tests to monitor their ORM code.

Here's a snippet of the test code i have in my project, which we could probably add for the checks.
Note:

that there are additional checks i've added here, not just for the order_wrt
this is structrued as a test, and not a check

    def test_ordered_models(self):
        """Test that Models that inherit OrderedModel has the correct queryset.

        If you are seeing this error, either:
        - the model's queryset is not inheriting the OrderedModelQuerySet
        - the model does not have ordering defined
        - the model.order_with_respect_to does not use the _id of the field
        """

        misconfigured_models = []
        all_ordered_models = [
            model
            for model in self.all_app_models  # all_app_models is defined somewhere else
            if issubclass(model, OrderedModelBase)
        ]

        for model in all_ordered_models:
            # check to make sure that models.objects is a subclass of our Manager / Queryset
            # this is in case there is a custom manager / queryset added by the dev, and overwritten OrderedModel
            if not (
                issubclass(model.objects.__class__, OrderedModelManager) or
                issubclass(model.objects.none().__class__, OrderedModelQuerySet)
            ):
                misconfigured_models.append(model)
           
            # Meta.ordering is required for OrderedModel
            if not model._meta.ordering:
                misconfigured_models.append(model)
            
            # this is the codeblock we need to check the _id
            if model.order_with_respect_to:
                for field in model.order_with_respect_to:
                    if not field.endswith("_id"):
                        misconfigured_models.append(model)

        self.assertTrue(
            not misconfigured_models,
            "\nMisconfigured models found:\n\n  > %s\n\nMore info found in test docstring." % (
                "\n  > ".join([
                    f"{model._meta.app_label}.{model.__name__}"
                    for model in set(misconfigured_models)
                ]),
            )
        )

ibaguio · 2023-03-14T04:40:05Z

Regarding the double underscore relations, I would suggest not using select_related, because it takes all the data of the related table. Imagine if your related model has a lot of fields, your DB query will be a bombshell due to the select_related, but then again, you only actually want the id of one field. We can just annotate the _id of the relation.

Top of my head, it would look like something like this:

+    class GroupedItemManager(OrderedModelManager):
+        def get_queryset(self):
+            return super().get_queryset().annotate(order_wrt_id=F("group__user_id"))

that's just a rough idea, of course the multiple order_with_respect_to values needs to be supported. This would require some django ORM ' magic', i'm happy to dive deep into that once i get some free time myself.

Personally, this is not an urgent issue for me, since all my usage of the OrderedModel, only has at most 1 order_with_respect_to, but nevertheless, needs to be supported

…ses are returned

shuckc · 2023-03-14T11:46:45Z

Very nice - I've converted these to system checks in b416595 (all Errors) as they are a common misconfiguration.

There is a legitimate test failure, adding the _id suffix breaks rendering of up/down arrows in the admin for inlines:

FAIL: test_move_up_down_links_ordered_inline (tests.tests.OrderedModelAdminTest)
FAIL: test_move_up_down_proxy_stacked_inline (tests.tests.OrderedModelAdminTest)
FAIL: test_move_links (tests.tests.OrderedModelAdminWithCustomPKInlineTest)

Confirmed by trivially reverting the following lines. Changing the keys of the OWRT dictionary likely needs a corresponding change in the admin where they are lifted out into the URL - will look at this later.

@@ -128,8 +128,8 @@ class OrderedModelBase(models.Model):
     def _wrt_map(self):
         d = {}
         for order_field_name in self.get_order_with_respect_to():
-            if not order_field_name.endswith("_id"):
-                order_field_name = order_field_name + "_id"

shuckc · 2023-03-14T15:06:43Z

For now I have added a slow method OrderedModel._get_related_objects() known to trigger extra queries for use by the Admin. The admin already fired those queries prior to 3.7 during URL generation so it is not a regression here, although it looks like it could be improved. This passes all tests and gives us a no-worse than 3.6 release for the 3.7 series.

I've added documentation around the new Checks and if this looks good I'll release it at 3.7.2

ibaguio · 2023-03-14T15:10:31Z

I agree that admin should be improved later. I've seen first hand how an unsuspecting N+1 in the admin can crash a production environment. But for now, the fix for the wrt is good.

I'm good ✅ with your changes, and thank you for adding those django checks!

ibaguio added 2 commits March 12, 2023 01:39

Add query count tests

3f5aeaf

ibaguio commented Mar 11, 2023

View reviewed changes

ibaguio marked this pull request as draft March 11, 2023 18:36

ibaguio changed the title ~~WIP: Optimize OrderedModelBase._wrt_map() - reduce query count~~ Optimize OrderedModelBase._wrt_map() - reduce query count Mar 11, 2023

ibaguio changed the title ~~Optimize OrderedModelBase._wrt_map() - reduce query count~~ Optimize OrderedModelBase._wrt_map() - remove unnecessary query Mar 11, 2023

run black tests/utils.py

46e8bab

add system Check that OrderedQuerySet and OrderedModelManager subclas…

b416595

…ses are returned

shuckc force-pushed the remove_unecessary_query branch from 0aa0fd5 to b6520cf Compare March 14, 2023 14:37

shuckc added 2 commits March 14, 2023 14:45

check that order_with_respect_to values are ForeignKey fields

6b60ed6

add slowpath for admin that queries FK objects

048a270

shuckc force-pushed the remove_unecessary_query branch from b6520cf to 048a270 Compare March 14, 2023 14:46

update docs prior to release

53cd95b

shuckc marked this pull request as ready for review March 14, 2023 15:17

shuckc merged commit 71d6e07 into django-ordered-model:master Mar 14, 2023

ibaguio deleted the remove_unecessary_query branch March 15, 2023 04:07

shuckc mentioned this pull request Mar 29, 2023

If order_with_respect_to is a foreign key of a foreign key it generates aditional queries. #293

Open

shuckc mentioned this pull request Oct 19, 2023

order_with_respect_to with non-foreign keys #298

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize OrderedModelBase._wrt_map() - remove unnecessary query #286

Optimize OrderedModelBase._wrt_map() - remove unnecessary query #286

ibaguio commented Mar 11, 2023 •

edited

ibaguio Mar 11, 2023

shuckc commented Mar 13, 2023

shuckc commented Mar 13, 2023

ibaguio commented Mar 14, 2023 •

edited

ibaguio commented Mar 14, 2023 •

edited

shuckc commented Mar 14, 2023

shuckc commented Mar 14, 2023

ibaguio commented Mar 14, 2023

		@@ -0,0 +1,37 @@
		# Query count helpers, copied from django source

Optimize OrderedModelBase._wrt_map() - remove unnecessary query #286

Optimize OrderedModelBase._wrt_map() - remove unnecessary query #286

Conversation

ibaguio commented Mar 11, 2023 • edited

ibaguio Mar 11, 2023

Choose a reason for hiding this comment

shuckc commented Mar 13, 2023

shuckc commented Mar 13, 2023

ibaguio commented Mar 14, 2023 • edited

ibaguio commented Mar 14, 2023 • edited

shuckc commented Mar 14, 2023

shuckc commented Mar 14, 2023

ibaguio commented Mar 14, 2023

ibaguio commented Mar 11, 2023 •

edited

ibaguio commented Mar 14, 2023 •

edited

ibaguio commented Mar 14, 2023 •

edited