Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize admin queries #110

Closed
wants to merge 5 commits into from
Closed

Conversation

henribru
Copy link
Contributor

@henribru henribru commented Nov 25, 2022

Fixes #88

With these changes the number of queries should now be constant and independent of the number and nature of the tasks.

I must say, the make file in this repository is excellent! It makes contributing very simple.

Before:
Screenshot from 2022-11-25 13-34-57
Screenshot from 2022-11-25 13-35-00

After:
Screenshot from 2022-11-25 13-36-12
Screenshot from 2022-11-25 13-34-37

@Skrattoune
Copy link
Contributor

Thanks for this @henribru !
I was intitially not convinced by the rename of '+' into subtasks as it generates a migration ... even if the name made more sense for me.
But it seems it is mandatory to have it working.

For the ChangeList, it works very well.

It works very well for me, with and without the :

@cached_property
    def executing_dt(self):
        if hasattr(self, "_executing_dt"):
            return self._executing_dt

So I'm not completely sure about the use of this. Could you please explain?

For the ChangeForm, I still have n+1 query on TaskModel.state.
Do you have them as well ?

@Skrattoune
Copy link
Contributor

In fact I don't really know what is the specificity & role of execute_dt. I personally use create_dt and update_dt
Could you clarify it for me?

@henribru
Copy link
Contributor Author

In fact I don't really know what is the specificity & role of execute_dt. I personally use create_dt and update_dt

executing_dt is used in elapsed_sec which is used in human_throughput, and human_throughput is used in the admin, that's why I've had to optimize it.

It works very well for me, with and without the :
So I'm not completely sure about the use of this. Could you please explain?

The default implementation of executing_dt leads to a query per task, so if we don't annotate _executing_dt and then use it in executing_dt we still have n+1. The reason I'm doing the ugly hasattr trick is so the old implementation is still available for backwards compatibility for anyone using this outside of the admin.

For the ChangeForm, I still have n+1 query on TaskModel.state.
Do you have them as well ?

Probably, I didn't focus on optimizing ChangeForm just because I've never had particularly noticeable performance issues with it.

@henribru
Copy link
Contributor Author

I've been doing some testing on a production app where there are millions of TaskModel and SignalInfoModel. For TaskModel these changes definitely help, though it's still quite slow. One interesting thing I noticed is that a lot of the time is actually spent calculating the filters. Indexing TaskModel.name, SignalInfoModel.signal_name and SignalInfoModel.hostname seems to help with that issue, so it could be worth considering adding that.

The select_related on task for SignalInfoModel seems to cause issues on such a large data set though, to the point where the change list isn't even loading in any time I've bothered waiting. I haven't investigated precisely which query it hangs on yet, but it seems like perhaps this join ends up being a lot slower than just doing n+1 queries. I don't really understand why given that the select_relateds on TaskModel are improving performance. Maybe it will be more performant to do prefetch_related since that gets rid of the join at the cost of 1 additional query. I need to do some more investigation

@Skrattoune
Copy link
Contributor

For the ChangeForm, I still have n+1 query on TaskModel.state.
Do you have them as well ?

Probably, I didn't focus on optimizing ChangeForm just because I've never had particularly noticeable performance issues with it.

So I checked, the calls responsible for the ChangeForm n+1 calls are not related to the calls made by ChangList. Therefore it does not add any benefit to move the get_queryset from TaskModelChangeList to TaskModelAdmin

But as you say, we can live with those ChangeForm n+1 calls

@Skrattoune
Copy link
Contributor

Skrattoune commented Nov 29, 2022

It works very well for me, with and without the :
So I'm not completely sure about the use of this. Could you please explain?

The default implementation of executing_dt leads to a query per task, so if we don't annotate _executing_dt and then use it in executing_dt we still have n+1. The reason I'm doing the ugly hasattr trick is so the old implementation is still available for backwards compatibility for anyone using this outside of the admin.

What about implementing your idea, but without declaring _executing_dt? executing_dt is anyway declared as a cached_property. So we can directly set it with the get_queryset annotation:

qs = (
            qs.filter(parent_task__isnull=True)
            .prefetch_related(
                Prefetch(
                    "sub_tasks",
                    queryset=TaskModel.objects.select_related("state")
                    .annotate(executing_dt=executing_dt)
                    .order_by('-create_dt'),
                )
            )
            .annotate(executing_dt=executing_dt)
        )

and then we don't need the extra 2 lines:

        if hasattr(self, "_executing_dt"):
            return self._executing_dt

It seems to work fine for me
What do you think?

@henribru
Copy link
Contributor Author

Ah, I see what you mean. I thought maybe if I annotated executing_dt the property method would take precedence and still be called, but you're probably right that it's the opposite

@Skrattoune
Copy link
Contributor

Ah, I see what you mean. I thought maybe if I annotated executing_dt the property method would take precedence and still be called, but you're probably right that it's the opposite

because of the cached_property decorator, the system first looks into the cache if there is already a value.
If there is, it returns it
if there isn't, then it runs the code as defined in the property definition, and store the result into the cache for further use

When you annotate, you store data for a given property into the cache ... so you fill up the cache

huey_monitor/models.py Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Dec 1, 2022

Codecov Report

Merging #110 (28e728c) into master (c5a741e) will increase coverage by 0.18%.
The diff coverage is 84.61%.

@@            Coverage Diff             @@
##           master     #110      +/-   ##
==========================================
+ Coverage   83.92%   84.11%   +0.18%     
==========================================
  Files          40       41       +1     
  Lines         840      850      +10     
==========================================
+ Hits          705      715      +10     
  Misses        135      135              
Impacted Files Coverage Δ
huey_monitor/models.py 93.10% <50.00%> (-0.76%) ⬇️
huey_monitor/admin.py 86.90% <83.33%> (+1.71%) ⬆️
...tor/migrations/0010_alter_taskmodel_parent_task.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

huey_monitor/admin.py Outdated Show resolved Hide resolved
jedie pushed a commit that referenced this pull request Dec 27, 2022
@jedie jedie mentioned this pull request Dec 27, 2022
@jedie
Copy link
Collaborator

jedie commented Dec 27, 2022

@henribru i used your code changes and create a new PR with rebased code here: #114

@jedie jedie closed this Dec 27, 2022
@henribru
Copy link
Contributor Author

henribru commented Dec 27, 2022

@henribru i used your code changes and create a new PR with rebased code here: #114

Okay, great! Though if you could add me as a co-author on the commit with the syntax shown in https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors that'd be nice

jedie pushed a commit that referenced this pull request Dec 28, 2022
@jedie
Copy link
Collaborator

jedie commented Dec 28, 2022

@henribru
Copy link
Contributor Author

@jedie Don't think it quite worked, when it does it should show both me and you as authors, random example from another repo:
image

I think it's because you used a link to my profile in the brackets instead of an email address. You can use the one from my commits in this branch, i.e. 6639509+henribru@users.noreply.github.com

jedie pushed a commit that referenced this pull request Dec 28, 2022
Contributed by https://github.com/henribru in #110

Co-authored-by: henribru <6639509+henribru@users.noreply.github.com>
@jedie
Copy link
Collaborator

jedie commented Dec 28, 2022

fixed:
grafik

@henribru
Copy link
Contributor Author

Thanks, appreciate it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Database queries take too much time when loading the task list
4 participants