[Performance] use "defer" to avoid loading extra (possibly expensive) data in Task Manager #12157

Open
kdelee opened this issue May 3, 2022 · 1 comment
Labels
needs_refinement Apply to items in 'Backlog' that need refinement.

Comments

@kdelee
Member

kdelee commented May 3, 2022

In the task manager, we make a series of queries such as:

https://github.com/ansible/awx/blob/devel/awx/main/scheduler/task_manager.py#L90-L100

graph_workflow_jobs = [wf for wf in WorkflowJob.objects.filter(status='running')]

return [invsrc for invsrc in InventorySource.objects.filter(inventory_id__in=inventory_ids, update_on_launch=True)]

We do not need all the fields that come back from these queries. In #12045 I tried to cut down on how much data we load by using the only directive, but that change was later reverted: because of the way the polymorphic models carry secret fields, it resulted in MORE queries. (@AlanCoding helped me find this by hacking the dev environment and using the Django debug toolbar. We made the Task Manager run only when we hit an API endpoint in order to get this information.)

An alternative approach would be to use defer (https://docs.djangoproject.com/en/4.0/ref/models/querysets/#defer) to exclude specific fields.

I manually searched through the code and found that we reference ['celery_task_id', 'controller_node', 'created', 'execution_node', 'instance_group', 'job_explanation', 'name', 'pk', 'status'], so the work would consist of finding which fields on the models are NOT in that list and deferring those fields.
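As a rough sketch of that last step, the list of fields to defer could be computed as a set difference against the referenced list. The helper name `fields_to_defer`, the `pk_name` parameter, and the example field list are hypothetical and for illustration only. The `_meta.concrete_fields` lookup and the `.defer(*deferred)` call in the comments are assumptions about how this would be wired into the real queries, and the resulting query count would still need to be checked against the polymorphic models.

```python
# Field names the issue says the Task Manager actually reads.
REFERENCED_FIELDS = [
    'celery_task_id', 'controller_node', 'created', 'execution_node',
    'instance_group', 'job_explanation', 'name', 'pk', 'status',
]


def fields_to_defer(all_field_names, referenced=REFERENCED_FIELDS, pk_name='id'):
    """Return the field names that are NOT referenced, in their original order.

    The primary key is always kept, because Django never defers it.
    """
    keep = set(referenced) | {pk_name}
    return [name for name in all_field_names if name not in keep]


# Hypothetical field list, for illustration only. On a real model it could
# be gathered with something like:
#   [f.name for f in WorkflowJob._meta.concrete_fields]
example_fields = [
    'id', 'name', 'status', 'created',
    'extra_vars', 'result_traceback', 'job_explanation',
]

deferred = fields_to_defer(example_fields)
print(deferred)

# Assumed usage against the query from the issue (not run here):
#   WorkflowJob.objects.filter(status='running').defer(*deferred)
```

Computing the list from the model, rather than hard-coding it, means newly added fields would be deferred by default and only the referenced list would need maintenance.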

This came up while looking at #11671

@AlanCoding
Member

I would categorize this as performance rather than tech debt. I'm making that point now because I'm trying to organize issues.

@kdelee kdelee changed the title from '[Tech debt] use "defer" to avoid loading extra (possibly expensive) data in Task Manager' to '[Performance] use "defer" to avoid loading extra (possibly expensive) data in Task Manager' on May 12, 2022
@john-westcott-iv john-westcott-iv added the needs_refinement Apply to items in 'Backlog' that need refinement. label Jul 12, 2022