
SCM inv source should trigger project update #12073

Merged: 3 commits merged on May 16, 2022

Conversation

fosterseth (Member) commented Apr 19, 2022

SUMMARY

SCM-based inventory sources should launch a project update prior to running the inventory update for that source (if update_on_launch is True for the project).

Implementation

The key change is to allow dependencies to have their own dependencies. A number of changes are needed to make that possible.

  1. Turn off symmetry for the dependent_jobs M2M field
    It's better that this field captures some directionality of the relationship. This will help when we need to determine whether a job is blocked because a dependency has yet to run.

With this change:

job.dependent_jobs.add(proj)

job.dependent_jobs.all() # returns [proj]
proj.dependent_jobs.all() # returns []
proj.unifiedjob_blocked_jobs.all() # returns [job]
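
For context, here is a minimal standalone Django sketch (a hypothetical Task model, not the actual AWX UnifiedJob class) of how symmetrical=False makes a self-referential M2M directional:

from django.db import models

class Task(models.Model):
    # Hypothetical model for illustration only.
    name = models.CharField(max_length=64)
    # With symmetrical=False, adding B to A.dependent_jobs does NOT also add
    # A to B.dependent_jobs; the reverse direction is reachable through the
    # related_name accessor instead (here it resolves to task_blocked_jobs).
    dependent_jobs = models.ManyToManyField(
        'self',
        symmetrical=False,
        related_name='%(class)s_blocked_jobs',
    )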
  2. Split up generate_dependencies() for Job and InventoryUpdate
    This keeps the code cleaner and easier to read.
if type(task) is Job:
    created_dependencies += self.gen_dep_for_job(task)
elif type(task) is InventoryUpdate:
    created_dependencies += self.gen_dep_for_inventory_update(task)
  3. Call generate_dependencies() twice
    Now that dependencies can have their own dependencies, we need to call generate_dependencies() a second time in the task manager loop. This need not be recursive, because the depth is only ever going to be a max of 2 (JobA > InventoryUpdateA > ProjectUpdateA); see the sketch below.
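
A sketch of the two-pass call described in item 3 (it matches the process_tasks diff discussed later in this thread):

# First pass: dependencies of the pending jobs themselves
# (inventory updates, project updates).
dependencies = self.generate_dependencies(undeped_tasks)
# Second pass: dependencies of those dependencies, i.e. the project updates
# needed by SCM inventory updates. Depth is bounded at 2
# (Job -> InventoryUpdate -> ProjectUpdate), so no recursion is needed.
deps_of_deps = self.generate_dependencies(dependencies)
dependencies += deps_of_deps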

Importantly, this all applies if you start an inventory source update manually; it doesn't have to be in the context of a job launch. If a project's update_on_launch is True, the project will be updated whenever a corresponding job or inventory source is launched.

ISSUE TYPE
  • Feature Pull Request
COMPONENT NAME
  • API
AWX VERSION
awx: 20.0.2.dev249+g38ab7fc48c

@AlanCoding (Member)

Thanks for the implementation notes. The CI check is telling you that you need to rebase and bump the migration number.

I'll try to get a good look at this tomorrow, and will want to sync up about the status of testing.

@@ -575,7 +575,8 @@ class Meta:
 dependent_jobs = models.ManyToManyField(
     'self',
     editable=False,
-    related_name='%(class)s_blocked_jobs+',
+    related_name='%(class)s_blocked_jobs',
+    symmetrical=False,
Contributor:

🔥

@@ -302,14 +304,10 @@ def create_inventory_update(self, task, inventory_source_task):
     # self.process_inventory_sources(inventory_sources)
     return inventory_task

def capture_chain_failure_dependencies(self, task, dependencies):
fosterseth (Member Author):

This was kind of misnamed -- our mixins have the method get_jobs_fail_chain() that is unique to types of tasks (ProjectUpdate, InventoryUpdate, Job, etc.).

In essence, all this method does is add dependencies with the activity stream off, so that is what we should call it.

with disable_activity_stream():
    task.dependent_jobs.add(*dependencies)

    for dep in dependencies:
        # Add task + all deps except self
        dep.dependent_jobs.add(*([task] + [d for d in dependencies if d != dep]))
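
A hedged sketch of what the renamed helper looks like after this change (the name add_dependencies is assumed from the comment above; only the first add remains, with the cross-linking loop removed):

def add_dependencies(self, task, dependencies):
    # Add the dependencies with the activity stream disabled; the old loop
    # that cross-linked dependencies to one another is dropped (see the
    # discussion below about why that linkage isn't needed).
    with disable_activity_stream():
        task.dependent_jobs.add(*dependencies)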
Member:

Probably related to the migration, but could you explain how you're able to remove this?

fosterseth (Member Author) commented Apr 29, 2022:

Yeah, this change here could have implications.

Consider this setup (JobA and JobB both share the projUpdate dependency, but only JobA depends on invUpdate):

            JobA                    JobB
            |  |                      |
            |  |                      |
            |  |                      |
            |  |                      |
invUpdate <-+  +--->  projUpdate <----+

Currently on devel, invUpdate and projUpdate are linked (because of the above code snippet that is in question). If one fails, it brings down the other. If invUpdate fails, it causes projUpdate to fail, and thus JobB would fail, even though JobB never depended on invUpdate in the first place.

I can't think of why this behavior is needed.

Because of the nature of sharing dependencies, if a project update or inventory update fails, it should only fail the jobs that directly depend on it, not other stuff.

fosterseth (Member Author):

The dependent_jobs field plays a role in determining which jobs fail, as I mentioned above.

The other role for dependent_jobs is to determine whether a job is blocked from running.

In task_manager.py:

if not task.dependent_jobs_finished():
    blocked_by = task.dependent_jobs.first()

On devel this really only comes into play for Jobs.

For type Job:

def dependent_jobs_finished(self):
    for j in self.dependent_jobs.all():
        if j.status in ['pending', 'waiting', 'running']:
            return False
    return True

But for ProjectUpdate and InventoryUpdate, this always returned True:

def dependent_jobs_finished(self):
    return True

Therefore, the removed code in the diff doesn't really have an effect here. Jobs still have the same dependent_jobs as before. The only difference is that InventoryUpdate and ProjectUpdate do not have each other as dependencies, but this doesn't affect the is_blocked behavior.

@fosterseth force-pushed the scm_invsrc_project_update branch 2 times, most recently from cbbe50d to c487f2b on May 6, 2022 at 16:38
blocked_jobs = list(self.unifiedjob_blocked_jobs.all())
other_updates = []
if blocked_jobs:
    for dep in blocked_jobs[0].dependent_jobs.all():
Member:

I'm a little bit suspicious of this. I don't know that the blocked_jobs list is ordered... it might be, but that's really non-obvious looking at it, and the blocked_jobs[0] reference always feels suspicious. If this is well-established, then that's fine, but some code comments could help. What is the first entry of the blocked jobs?

If the list is always length 1, then maybe you should just loop instead of using the first one?

Member:

Oh, that's probably the job. This is making more sense to me. It will fail its job if it fails... and any other unified jobs that are waiting on it.

fosterseth (Member Author):

Yeah, blocked_jobs are all the jobs that depend on this inventory update succeeding.

But how can we fail the other inventory updates that were started too (to retain the same behavior as devel)? Well, we can get those by looking at one of the blocked_jobs and getting its dependent_jobs. These dependent jobs might include project updates, so we need to filter the list down further by checking each entry's type:

if type(dep) is type(self) and dep.id != self.id:

We can really grab any of the blocked_jobs for this, as they are all guaranteed to have the same inventory, and thus the same set of inventory sources.
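
Putting those pieces together, a hedged sketch of the failure-propagation logic under review (variable names follow the snippets above; the surrounding method context is an assumption):

blocked_jobs = list(self.unifiedjob_blocked_jobs.all())
other_updates = []
if blocked_jobs:
    # blocked_jobs[0] is just a back-reference to a launching job; any entry
    # works, since all blocked jobs share the same inventory and therefore
    # the same set of inventory sources.
    for dep in blocked_jobs[0].dependent_jobs.all():
        # Keep only sibling updates of the same type, excluding self, so that
        # a failure here also fails the other updates started alongside it.
        if type(dep) is type(self) and dep.id != self.id:
            other_updates.append(dep)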

fosterseth (Member Author):

Added some comments around this, @AlanCoding.

Member:

Sounds good.

Ultimately, it looks like these feed into handle_work_error via the dispatcher errback. That part of the design isn't bad; it's pretty sensible to handle the triggers this way. On the other hand, handle_work_error looks even worse than the reaper in terms of changing job status from underneath a running process, and not even bothering to send a cancel signal. Problem for another day...

except ValueError:
    start_args = dict()
# generator for inventory sources related to this task
task_inv_sources = (invsrc for invsrc in self.all_inventory_sources if invsrc.inventory == task.inventory)
Member:

Especially when looping over a global jobs list like self.all_inventory_sources, I would like to stick with *_id fields for performance reasons (even if it may or may not make a difference). So here, I would make the conditional be if invsrc.inventory_id == task.inventory_id
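
A sketch of the suggested tweak (the same generator as in the snippet above, comparing the foreign-key id columns directly so no related object is fetched):

# generator for inventory sources related to this task
task_inv_sources = (invsrc for invsrc in self.all_inventory_sources if invsrc.inventory_id == task.inventory_id)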

Member:

Well, I get that you're just moving the code.

fosterseth (Member Author):

made the change

@@ -572,6 +596,8 @@ def process_tasks(self, all_sorted_tasks):
 pending_tasks = [t for t in all_sorted_tasks if t.status == 'pending']
 undeped_tasks = [t for t in pending_tasks if not t.dependencies_processed]
 dependencies = self.generate_dependencies(undeped_tasks)
+deps_of_deps = self.generate_dependencies(dependencies)
+dependencies += deps_of_deps
Member:

I remember you mentioning this, but now I get it. Basically, deps_of_deps can only be the project updates generated for inventory updates. Because you know it only goes this many levels deep, it's written like this rather than iterated indefinitely.

fosterseth (Member Author):

exactly

@AlanCoding (Member) left a comment:

I have 2 requests remaining:

  1. that we add some code comments to the effect that blocked_jobs[0] is just a back-reference to the job, and
  2. change related object references to the *_id field in one case; this will make future optimization work go more smoothly

After these, I'm happy to approve.

fosterseth (Member Author) commented May 11, 2022

My latest commit addresses an issue where the task manager can start a job even if a dependency has failed. job_blocked_by checks whether a dependency is in ACTIVE_STATES and, if not, proceeds to run. That isn't good, because the dependency could be in a failed or error state.

The job would run and probably fail during execution, because it detects that the last project update / inventory update has failed, stopping it in its tracks. But we can prevent the dispatcher from even getting that far by failing the task as soon as we know we should (in the task manager).

@AlanCoding please take a look at my most recent commit to see how I handled the above ^

b94b1ef

note: this is a current issue in devel as well

# dependency has failed or errored.
elif dep.status in ("error", "failed"):
    task.status = 'failed'
    task.job_explanation = 'Previous Task Failed: {"job_type": "%s", "job_name": "%s", "job_id": "%s"}' % (
Member:

My only note is that I'd like this message to be distinguishable from the similar message in the handle_work_error task - for ease of debugging. Any changing of the wording whatsoever would be sufficient to accomplish that.
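
For illustration only, one way the truncated assignment above could be completed while making the wording distinguishable from handle_work_error's message (the argument expressions here are hypothetical, not the PR's actual code):

task.job_explanation = 'Dependent task failed: {"job_type": "%s", "job_name": "%s", "job_id": "%s"}' % (
    dep.__class__.__name__.lower(),  # hypothetical; the real code may use a type helper
    dep.name,
    dep.id,
)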

- scm based inventory sources should launch project updates prior to running inventory updates for that source.
- fixes scenario where a job is based on projectA, but the inventory source is based on projectB. Running the job will likely trigger a sync for projectA, but not projectB.

comments

…ate to the inventoryupdate.source_project field