Improve task manager performance for task dependencies #5787
awx/main/scheduler/task_manager.py
Outdated
@@ -598,6 +536,8 @@ def process_tasks(self, all_sorted_tasks):
        self.process_running_tasks(running_tasks)

        pending_tasks = [t for t in all_sorted_tasks if t.status == 'pending']
        dependencies = self.generate_dependencies(pending_tasks)
Why not pass in only the tasks that have dependencies_processed=False?
Otherwise, you're attempting updates to lots of rows every 30 seconds. Even if they are no-ops, that sounds like a recipe for database conflicts.
good catch, fixed it.
Build failed.
The unit test is problematic. Although the [...] Therefore, measuring the elapsed time and comparing it to a flat expected value does not seem like a great idea. I will drop this test unless someone can think of a more reliable way to approach this.
Build failed.
I think that coverage in Zuul tests would be good. For example, create a job, run the task manager, and verify that the flag is swapped. Run it again, and verify that the generate_dependencies method is called without that job. Since this is an internal field, this is the only place we can have this kind of coverage (except indirectly of course).
awx/main/scheduler/task_manager.py
Outdated
def generate_dependencies(self, task):
dependencies = []
if type(task) is Job:
def generate_dependencies(self, pending_tasks):
might be better to call the argument something else, like undeped_tasks
changed variable name to undeped_tasks
@AlanCoding Wrote a new functional unit test that makes sure
Build failed.
cb7392e
to
8eecbc6
Compare
Build failed.
# .call_args is a tuple (positional_args, kwargs), so [0][0] is
# the first positional arg, i.e. the first argument of
# .generate_dependencies()
assert job not in tm.generate_dependencies.call_args[0][0]
Since there are no other jobs in the database here, I would prefer a stricter assertion of tm.generate_dependencies.call_args[0][0] == []
@AlanCoding because project.scm_update_on_launch == True, then .call_args[0][0] contains [ProjectUpdate]. I like the idea of having a stricter assertion though, so I will turn off the project update and assert the empty list.
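The test shape discussed here (run the task manager once, check the flag, run it again, then inspect what generate_dependencies received) can be sketched with unittest.mock. This is only an illustration under stated assumptions: FakeJob and FakeTaskManager are hypothetical stand-ins written for this sketch, not the AWX classes, and the real test runs against the database.

```python
from unittest import mock


class FakeJob:
    # Hypothetical stand-in for a Job; not the AWX model.
    def __init__(self):
        self.status = "pending"
        self.dependencies_processed = False


class FakeTaskManager:
    # Hypothetical, heavily simplified task manager used only for this sketch.
    def __init__(self, jobs):
        self.jobs = jobs

    def generate_dependencies(self, undeped_tasks):
        for task in undeped_tasks:
            task.dependencies_processed = True
        return []

    def schedule(self):
        pending = [j for j in self.jobs if j.status == "pending"]
        undeped = [j for j in pending if not j.dependencies_processed]
        self.generate_dependencies(undeped)


job = FakeJob()

# First run: the job is passed in and its flag is flipped.
FakeTaskManager([job]).schedule()
first_run_flag = job.dependencies_processed

# Second run: replace the method with a mock so the call can be inspected.
tm = FakeTaskManager([job])
tm.generate_dependencies = mock.MagicMock(return_value=[])
tm.schedule()

# .call_args is (positional_args, kwargs); [0][0] is the first positional arg.
second_run_arg = tm.generate_dependencies.call_args[0][0]
assert second_run_arg == []  # the stricter form of "job not in ..."
```

Because no other pending work exists in this sketch, the strict equality with an empty list holds; with a project update spawned on launch, as noted above, the list would not be empty.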
This is one of my favorite pull requests ever
Thanks for digging into this @fosterseth; this part of AWX can be gnarly, and I'm excited about this optimization - great work!
Once zuul is passing, and @elyezer signs off, let's merge this one 🚢
@fosterseth @ryanpetrello so the intent here is to improve the dependency graph generation without changing any current behavior, right? I think @squidboylan will be interested in taking a look at this one.
Yep, once zuul is passing I'll take a look and can sign off.
I like it.
e6c1899
to
edd7634
Compare
Build failed.
edd7634
to
90c8705
Compare
Build succeeded.
90c8705
to
51a7f20
Compare
Build succeeded.
51a7f20
to
25862dc
Compare
This adds a boolean "dependencies_processed" field to UnifiedJob model. The default value is False. Once the task manager generates dependencies for this task, it will not generate them again on subsequent runs. The changes also remove .process_dependencies(), as this method repeats the same code as .process_pending_tasks(), and is not needed. Once dependencies are generated, they are handled at .process_pending_tasks(). Adds a unit test that should catch regressions for this fix.
6d3aa0d
to
9b4b216
Compare
Build succeeded.
class Migration(migrations.Migration):

    dependencies = [
        ('main', '0107_v370_workflow_convergence_api_toggle'),
🎉
@elyezer yes, the intent is to preserve the overall behavior and just improve performance by removing redundant work.
Build succeeded (gate pipeline).
I am wondering what happens when a project update fails one consequence of this PR is that
old: If the project succeeds, then all is well and this list is never used
Line 572 in 13faa0e
I'm not sure if this method is intended to handle a lot of items, and it might bog the system down if a project failed with a ton of pending tasks behind it. I see a lot of queries, saves, and websocket messages.
Note: The old task manager would eventually fail the tasks through the pre_run_hook()
@fosterseth I actually think the new behavior (update them all) makes more sense in this failure scenario.
I'm not drastically concerned about this scenario; this doesn't sound like a very regularly occurring event to me.
For completeness' sake, I'll put what I said in the merge meeting here in the PR. Setup:
Before:
After:
I am fine with this. This is better, as long as the user can set cache_timeout=0 and, effectively, get the Before behavior. Why might this be important? Because I can imagine a customer running a Job in a workflow that does a git code update and relies on the next job syncing that update.
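To make the cache_timeout=0 point concrete, here is a minimal sketch of a cache timeout check. It assumes the decision is a comparison between the last update's finish time and the timeout; the function name needs_update and its parameters are hypothetical and are not taken from the AWX code.

```python
from datetime import datetime, timedelta


def needs_update(last_finished, cache_timeout, now):
    """Hypothetical check: is the last project update too old to reuse?

    Assumption for this sketch: a cache_timeout of 0 means "never reuse",
    so every job triggers a fresh update.
    """
    if cache_timeout == 0:
        return True
    return last_finished + timedelta(seconds=cache_timeout) < now


now = datetime(2020, 1, 1, 12, 0, 0)
recent = now - timedelta(seconds=10)

# A 60 second timeout lets the next job reuse an update from 10 seconds ago.
reuse_recent = needs_update(recent, cache_timeout=60, now=now)

# A timeout of 0 forces an update, so the next job syncs the latest code.
force_update = needs_update(recent, cache_timeout=0, now=now)
```

Under this assumption, the workflow described above (one job pushes code, the next job must sync it) is covered by setting the timeout to 0.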
SUMMARY
#5154
continuation of #5672
Task dependencies include project updates or inventory source updates. The task manager looks through each Job in "pending" state and determines if it needs to launch an associated project update or inventory source update. This part of the task manager is quite slow.
The task manager is not persistent. A new Task Manager object is launched each time (once per 30 seconds, and on external signals). Therefore, it must rediscover the state of running/pending jobs each time.
These changes help alleviate the redundant work the task manager performs each time it is called.
1. Remove the method .process_dependencies()
This method is nearly a copy and paste of .process_pending_tasks(), so we can just call that directly instead.
2. Add dependencies_processed boolean field to UnifiedJob model
This boolean should be interpreted as "Has the task manager checked and processed potential dependencies for this job". Not all jobs have dependencies by nature (ad hoc commands, for example). The task manager's job is to check if it has dependencies, and whether it should spawn those dependencies.
3. Only generate dependencies once for each task
The current task manager will generate/spawn dependencies for pending jobs each time it runs. We only need to do this once. We can check that task.dependencies_processed is False before running generate_dependencies().
Performance:
Calling TaskManager().schedule() on 5,000 "pending" state Jobs takes around 8 seconds (vs 200 seconds with the previous TaskManager).
Note, this is the elapsed time after the dependencies have been already generated. Generating dependencies for 5,000 jobs the first time takes a few minutes.
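The effect of the flag across task manager runs can be illustrated with a small in-memory simulation. This is a sketch under assumptions, not the AWX implementation: FakeJob and FakeTaskManager are hypothetical, nothing touches a database, and the counter only records how many jobs were visited by dependency generation on each pass.

```python
class FakeJob:
    # Hypothetical stand-in for a pending Job row.
    def __init__(self):
        self.status = "pending"
        self.dependencies_processed = False


class FakeTaskManager:
    """Hypothetical; a fresh instance is created for every scheduling pass."""

    def __init__(self, jobs):
        self.jobs = jobs
        self.generated = 0  # number of jobs visited by dependency generation

    def generate_dependencies(self, undeped_tasks):
        for task in undeped_tasks:
            # Real code would decide here whether to spawn a project update
            # or inventory source update; the sketch only records the visit.
            self.generated += 1
            task.dependencies_processed = True

    def schedule(self):
        pending_tasks = [j for j in self.jobs if j.status == "pending"]
        undeped_tasks = [t for t in pending_tasks if not t.dependencies_processed]
        self.generate_dependencies(undeped_tasks)
        return self.generated


jobs = [FakeJob() for _ in range(5000)]

# The task manager is not persistent, so each pass builds a new object.
first_pass = FakeTaskManager(jobs).schedule()
second_pass = FakeTaskManager(jobs).schedule()
```

The first pass visits every pending job, while later passes skip jobs whose flag is already set, which is the redundant work the summary describes removing.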
ISSUE TYPE
COMPONENT NAME
AWX VERSION
ADDITIONAL INFORMATION