Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize OrderedModelBase._wrt_map() - remove unnecessary query #286

Merged
merged 7 commits into from
Mar 14, 2023

Conversation

ibaguio
Copy link
Contributor

@ibaguio ibaguio commented Mar 11, 2023

I've recently updated the version of django-ordered-model from one of my projects, and noticed that the change from 3.6 to 3.7 caused one of my SQL query count tests to fail. After some investigation, I found that there were additional (unnecessary) query caused by wrt_map() when initializing the instance. Specifically, one (1) SQL query was added for every field in order_with_respect_to

I found that wrt_map gets the actual related object from the database. And the .save() method compares the value from _original_wrt_map.

Now, I figured, that instead of actually comparing the objects, we could achieve the same result by comparing the _id of the fields instead. More in this below.

To visualize the problem
Let's use the Answer model from our tests/models.py

Fetching an object from the database yieds:

answer = Answer.objects.get()    
# 3 sql queries!
# 1. SELECT "tests_answer"."id", "tests_answer"."order", "tests_answer"."question_id", "tests_answer"."user_id" FROM "tests_answer" ORDER BY "tests_answer"."question_id" ASC, "tests_answer"."user_id" ASC, "tests_answer"."order" ASC LIMIT 1
# 2. SELECT "tests_question"."id" FROM "tests_question" WHERE "tests_question"."id" = 1 LIMIT 21
# 3. SELECT "tests_testuser"."id" FROM "tests_testuser" WHERE "tests_testuser"."id" = 1 LIMIT 21

this is caused by the invocation of _wrt_map() from OrderedModelBase.init to define the _original_wrt_map.

Proposed Solution

The changes I've introduced in this PR, uses the field _id instead of the related object when building the wrt_map. This way, the related object is not fetched from the database (if not yet set prior).

Caveat: This solution only works for order_with_respect_to fields that are actually ForeignKeys. As of writing, I have not yet tested it on reverse FK, ManyToMany, etc. I know my changes will not work for these. I'm hoping someone could contribute to solving the problem.

Alternate (Temporary) Solution

The easier solution here, and without code change, would be to hardcode _id when defining the order_with_respect_to. So instead of

    class Answer(models.Model):
        order_with_respect_to = ("question", "user")

we could define our model:

    class Answer(models.Model):
        order_with_respect_to = ("question_id", "user_id")

This of course, requires that we update the documentation, and also assumes that the package user (developer) is aware of this problem, and would require them to update existing code. I would not suggest this as a long term solution.

--
Additional notes:

  • We could probably add more assertNumQueries checks on other parts of the tests

the newly introduced OrderedModelBase._wrt_map method gets the
value of the fields defined in order_with_respect_to. this forces
django to query the object from the database, but only to be used
for comparison. It is optimal to just compare the _id instead of
the actual model object.
@@ -0,0 +1,37 @@
# Query count helpers, copied from django source
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've added this utils file since this package supports django <3.1, and assertNumQueries was only introduced therein

@ibaguio ibaguio marked this pull request as draft March 11, 2023 18:36
@ibaguio ibaguio changed the title WIP: Optimize OrderedModelBase._wrt_map() - reduce query count Optimize OrderedModelBase._wrt_map() - reduce query count Mar 11, 2023
@ibaguio ibaguio changed the title Optimize OrderedModelBase._wrt_map() - reduce query count Optimize OrderedModelBase._wrt_map() - remove unnecessary query Mar 11, 2023
@shuckc
Copy link
Member

shuckc commented Mar 13, 2023

Firstly - thank you. I really appreciate the diligence it takes to uncover this sort of thing and then to raise it with us. I am happy to merge in the context manager and use it in tests to track this sort of regression. I didn't expect building the _original_wrt_map to fire off additional queries, but I lacked the tool to find out!

We should absolutely try to use the documented {field}_id trick to make this cheaper. There seems to be two ways to go ahead with it, one is to re-write the field lookups (as you have done), however to do so conditionally on the type of the field, probably when the model is elaborated rather than during usage. The second approach is to require the user to be explicit in what they want, and help them by raising a Django system check WARNING if a ForeignKey field is referenced without using _id - with wording to suggest this as a worthwhile optimisation.

I'll take a look at how best to move forward - we could use a test case for order_with_respect_to on a non-foreign key field, or at last some reasoning that this is a nonsensical configuration and reject it.

This will also improve other callers of _wrt_map() via _validate_orderdering_reference(), (notably swap, above and below for OWRT models) that would have triggered the extra queries, as well as rendering ordering buttons in the admin.

@shuckc
Copy link
Member

shuckc commented Mar 13, 2023

Looking at our ItemGroup and GroupedItem example, it would reduce to one query if for double-underscore relations we pre-fetched the intermediate object with an automatic select_related(), as well as use _id.

So our order_with_respect_to = "group__user" would need to be re-written as follows:

    class ItemGroup(models.Model):
        user = models.ForeignKey(User, on_delete=models.CASCADE)
        general_info = models.CharField(max_length=100)

+    class GroupedItemManager(OrderedModelManager):
+        def get_queryset(self):
+            return super(GroupedItemManager, self).get_queryset().select_related('group')

    class GroupedItem(OrderedModel):
        group = models.ForeignKey(ItemGroup, on_delete=models.CASCADE)
        specific_info = models.CharField(max_length=100)
-       order_with_respect_to = 'group__user'
+       order_with_respect_to = 'group__user_id'
+       objects = GroupedItemManager()

I think I am in favour of updating our docs and raising a ChecksERROR for any value in order_with_respect_to that is not a ForeignKey field, or one or more ForeignKey traversals ending with a ForeignKey field, so that we can rely on the _id portion.

@ibaguio
Copy link
Contributor Author

ibaguio commented Mar 14, 2023

You're absolutely welcome.

I think we can implement the checks to raise the warning, and update the docs first. That shouldn't take a lot of effort to do, while we devise a proper solution on how to improve the _wrt_map. I implemented the _id for my project, which was the cheapest solution. And to protect myself, I've added a test to check all models that has the order_with_respect_to defined, and make sure it ends with _id.

From my perspective, mitigating the issue of added SQL queries is CRITICAL. Some poor django app out there using v3.7 this of this awesome library would have its performance impacted, especially if they didn't have any query count tests to monitor their ORM code.

Here's a snippet of the test code i have in my project, which we could probably add for the checks.
Note:

  • that there are additional checks i've added here, not just for the order_wrt
  • this is structrued as a test, and not a check
    def test_ordered_models(self):
        """Test that Models that inherit OrderedModel has the correct queryset.

        If you are seeing this error, either:
        - the model's queryset is not inheriting the OrderedModelQuerySet
        - the model does not have ordering defined
        - the model.order_with_respect_to does not use the _id of the field
        """

        misconfigured_models = []
        all_ordered_models = [
            model
            for model in self.all_app_models  # all_app_models is defined somewhere else
            if issubclass(model, OrderedModelBase)
        ]

        for model in all_ordered_models:
            # check to make sure that models.objects is a subclass of our Manager / Queryset
            # this is in case there is a custom manager / queryset added by the dev, and overwritten OrderedModel
            if not (
                issubclass(model.objects.__class__, OrderedModelManager) or
                issubclass(model.objects.none().__class__, OrderedModelQuerySet)
            ):
                misconfigured_models.append(model)
           
            # Meta.ordering is required for OrderedModel
            if not model._meta.ordering:
                misconfigured_models.append(model)
            
            # this is the codeblock we need to check the _id
            if model.order_with_respect_to:
                for field in model.order_with_respect_to:
                    if not field.endswith("_id"):
                        misconfigured_models.append(model)

        self.assertTrue(
            not misconfigured_models,
            "\nMisconfigured models found:\n\n  > %s\n\nMore info found in test docstring." % (
                "\n  > ".join([
                    f"{model._meta.app_label}.{model.__name__}"
                    for model in set(misconfigured_models)
                ]),
            )
        )

@ibaguio
Copy link
Contributor Author

ibaguio commented Mar 14, 2023

Regarding the double underscore relations, I would suggest not using select_related, because it takes all the data of the related table. Imagine if your related model has a lot of fields, your DB query will be a bombshell due to the select_related, but then again, you only actually want the id of one field. We can just annotate the _id of the relation.

Top of my head, it would look like something like this:

+    class GroupedItemManager(OrderedModelManager):
+        def get_queryset(self):
+            return super().get_queryset().annotate(order_wrt_id=F("group__user_id"))

that's just a rough idea, of course the multiple order_with_respect_to values needs to be supported. This would require some django ORM ' magic', i'm happy to dive deep into that once i get some free time myself.

Personally, this is not an urgent issue for me, since all my usage of the OrderedModel, only has at most 1 order_with_respect_to, but nevertheless, needs to be supported

@shuckc
Copy link
Member

shuckc commented Mar 14, 2023

Very nice - I've converted these to system checks in b416595 (all Errors) as they are a common misconfiguration.

There is a legitimate test failure, adding the _id suffix breaks rendering of up/down arrows in the admin for inlines:

FAIL: test_move_up_down_links_ordered_inline (tests.tests.OrderedModelAdminTest)
FAIL: test_move_up_down_proxy_stacked_inline (tests.tests.OrderedModelAdminTest)
FAIL: test_move_links (tests.tests.OrderedModelAdminWithCustomPKInlineTest)

Confirmed by trivially reverting the following lines. Changing the keys of the OWRT dictionary likely needs a corresponding change in the admin where they are lifted out into the URL - will look at this later.

@@ -128,8 +128,8 @@ class OrderedModelBase(models.Model):
     def _wrt_map(self):
         d = {}
         for order_field_name in self.get_order_with_respect_to():
-            if not order_field_name.endswith("_id"):
-                order_field_name = order_field_name + "_id"

@shuckc
Copy link
Member

shuckc commented Mar 14, 2023

For now I have added a slow method OrderedModel._get_related_objects() known to trigger extra queries for use by the Admin. The admin already fired those queries prior to 3.7 during URL generation so it is not a regression here, although it looks like it could be improved. This passes all tests and gives us a no-worse than 3.6 release for the 3.7 series.

I've added documentation around the new Checks and if this looks good I'll release it at 3.7.2

@ibaguio
Copy link
Contributor Author

ibaguio commented Mar 14, 2023

I agree that admin should be improved later. I've seen first hand how an unsuspecting N+1 in the admin can crash a production environment. But for now, the fix for the wrt is good.

I'm good ✅ with your changes, and thank you for adding those django checks!

@shuckc shuckc marked this pull request as ready for review March 14, 2023 15:17
@shuckc shuckc merged commit 71d6e07 into django-ordered-model:master Mar 14, 2023
@ibaguio ibaguio deleted the remove_unecessary_query branch March 15, 2023 04:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants