
Pass combined artifacts from nested workflows into downstream nodes #12223

Merged
merged 9 commits into from
Jun 23, 2022

Conversation

AlanCoding (Member)

SUMMARY

Connect #4481

The issue is a reasonable request, and I am trying not to overthink the problem.

As of putting up this PR, I am doing this without a schema change, which trades some extra processing cost for lower memory use. Because of that, I've tried to write the get_combined_artifacts method relatively efficiently, although you can still construct cases where it will slog.

ISSUE TYPE
  • Feature Pull Request
COMPONENT NAME
  • API
AWX VERSION
21.0.0
ADDITIONAL INFORMATION

I've written two test cases:

  • as the issue describes, a workflow with (sliced job) --> (receiving job), verifying that the receiving job gets the artifacts
  • a workflow inside a workflow with a normal job, and the receiving job downstream in the outer workflow, verifying that the receiving job gets the artifacts

Both test cases fail on devel, where the receiving job does not get the artifacts.

@AlanCoding AlanCoding marked this pull request as ready for review May 12, 2022 15:10
@AlanCoding (Member Author)

Test results look good, ready for review.

@AlanCoding (Member Author)

Having thought about it, I realize I do need to put in some recursion protection.
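The recursion protection mentioned above can be sketched like this (a hypothetical standalone version, using plain dicts instead of the real UnifiedJob models): thread a set of already-visited ids through the recursive calls, so a revisited node stops the walk instead of recursing forever.

```python
# Hypothetical sketch of recursion protection with a visited set, mirroring
# the parents_set idea in the diff below. Nodes are plain dicts here, not
# the real AWX models.
def get_effective_artifacts(node, parents_set=None):
    if parents_set is None:
        parents_set = set()
    if node["id"] in parents_set:
        return {}  # already visited; stop here to avoid infinite recursion
    new_parents_set = parents_set | {node["id"]}
    artifacts = dict(node.get("artifacts", {}))
    for child in node.get("children", []):
        # Later children overwrite earlier keys via dict.update()
        artifacts.update(get_effective_artifacts(child, parents_set=new_parents_set))
    return artifacts
```

Even on a structure with a cycle, the visited set guarantees each node contributes its artifacts at most once.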

@@ -57,6 +57,20 @@
logger = logging.getLogger('awx.main.models.workflow')


def get_artifacts_from_unified_job(job):
Contributor

Instead of this standalone function, why not just have different implementations of .get_combined_artifacts() on the different UnifiedJob subtypes?

Member Author

Took me a while to process, but I wound up going with exactly this suggestion. That allowed getting rid of this method altogether.

job_queryset = (
UnifiedJob.objects.filter(unified_job_node__workflow_job=self)
.defer('job_args', 'job_cwd', 'start_args', 'result_traceback')
.order_by('status', 'finished', 'id')
Contributor

I'm a bit iffy on ordering by status. This will wind up being alphabetical, which feels like it might have unintended consequences.

Member Author

haha, I was kind of fishing for a comment like this. The status (s)uccessful sorts after (f)ailed and (e)rror, so the successful jobs' artifacts take precedence.

In practice, I'm iffy on whether error or failed jobs should contribute artifacts in the first place. I'm still unsure what the best thing to do here is.
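The precedence trick can be illustrated with made-up job data: alphabetically, 'error' < 'failed' < 'successful', so successful jobs are processed last and their artifacts win each dict.update() collision.

```python
# Illustration (with invented job data, not the real queryset) of why
# ordering by status makes successful jobs' artifacts take precedence:
# they sort last alphabetically, so their update() happens last.
jobs = [
    {"status": "successful", "artifacts": {"key": "from-successful"}},
    {"status": "error", "artifacts": {"key": "from-error"}},
    {"status": "failed", "artifacts": {"key": "from-failed"}},
]
combined = {}
for job in sorted(jobs, key=lambda j: j["status"]):
    combined.update(job["artifacts"])
# combined["key"] is "from-successful"
```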

Contributor

.filter(status='successful') instead, then?

This brings to mind a thing that I was enthusiastic about when I noticed it early on in the Django upgrades, but that got buried under the weight of dealing with the difficult bits: Django 3 supports actual enumerated choices for choice fields. https://docs.djangoproject.com/en/3.2/ref/models/fields/#enumeration-types

It'd be nice to have these, and avoid having to hard-code choice strings. Not urgent, just kind of pleasant.
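The enumeration idea can be approximated with the stdlib alone; Django's models.TextChoices is essentially a str-based enum like this sketch (member names and values here are assumptions, not AWX's actual status list):

```python
from enum import Enum

class JobStatus(str, Enum):
    """Stdlib approximation of Django's models.TextChoices: members compare
    equal to their string values, so a filter on JobStatus.SUCCESSFUL would
    still match rows stored as the plain string 'successful'."""
    ERROR = "error"
    FAILED = "failed"
    SUCCESSFUL = "successful"

# No more hard-coded 'successful' strings scattered through the code:
assert JobStatus.SUCCESSFUL == "successful"
```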

)
if parents_set is None:
parents_set = set()
new_parents_set = parents_set | set([self.id])
Contributor

I'd like to start getting away from Python 2-isms like set([...]).
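For reference, the modern literal the reviewer prefers builds the same set without the intermediate list:

```python
# set([...]) is the older Python 2-era call form; the set literal {...}
# (Python 2.7+/3) produces an equal set without building a throwaway list.
workflow_job_id = 42  # placeholder value for illustration
old_style = set([workflow_job_id])
new_style = {workflow_job_id}
assert old_style == new_style
```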

Contributor

Is it even possible to have a cycle in the jobs? The job templates, yes, I could see that potentially happening, but even if the same JT is used in multiple places inside a nested workflow won't it just spawn distinct new jobs (thus having new pks)?

Member Author

No, as structured I don't think it would have any cycles. The map of workflow jobs would always be a tree, since a workflow job node can't (in practice) link to the same workflow job as another job.

I kept coming back to questioning whether we should track and skip duplicate workflow job templates, which is a much stronger criterion. A workflow can use the same WFJT inside of it multiple times, or it can be nested deeper. Skipping duplicates would cut down on queries, but I'm not sure about it, because multiple runs of the same WFJT could give different artifacts that users might actually want combined.

@@ -682,6 +681,28 @@ def get_ancestor_workflows(self):
wj = wj.get_workflow_job()
return ancestors

def get_effective_artifacts(self, parents_set=None):
Contributor

Can we just make this signature (self, **kwargs) as well, and get parents_set out of the kwargs dict? It will make things just a bit nicer if we wind up adding more things later on.
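The suggested signature might look like this sketch (a minimal stand-in class, not the merged code):

```python
class WorkflowJob:
    """Minimal stand-in class; only the signature pattern matters here."""

    def get_effective_artifacts(self, **kwargs):
        # Pull known options out of kwargs; future options can be added
        # without touching every subclass override's signature.
        parents_set = kwargs.get("parents_set") or set()
        return parents_set

wj = WorkflowJob()
```

Callers that pass nothing get the empty-set default, while `parents_set` still flows through by keyword.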

@@ -318,8 +318,7 @@ def get_job_kwargs(self):
for parent_node in self.get_parent_nodes():
is_root_node = False
aa_dict.update(parent_node.ancestor_artifacts)
if parent_node.job and hasattr(parent_node.job, 'artifacts'):
aa_dict.update(parent_node.job.artifacts)
aa_dict.update(parent_node.job.get_effective_artifacts(parents_set=set([self.workflow_job_id])))
Contributor

👀

I need to think through this. This is the part that I worry is going to cause a bunch of repeated queries.

Contributor

Never mind, I think this isn't a problem.

I still dislike set([...]), though.
