Use FQCN for roles where available #1021

AlanCoding · 2022-03-11T03:08:44Z

Connect #1019

Keeping track of the event data from these test cases is hard. Events go:

playbook_on_start
playbook_on_play_start
playbook_on_task_start
runner_on_start
runner_on_ok
playbook_on_stats

The bolded event types have historically had the role entry inside of the event_data dict.

The new resolved_action key is only present on "playbook_on_task_start". This is a break in pattern from other keys like task, which occurs through all events where that task is active, similar to role, those 3 bolded events. I tried to raise this point in #995, although without the full testing and context. Here's the full event data:

https://gist.github.com/AlanCoding/325a84d4121249188a573ad25249cda7

The ben.json file corresponds to the existing test that this follows the pattern of, and peanut.json is for the test added here.

Speaking in broad strokes, task and role keys are contextual in nature. A role will run multiple tasks, and everything in that context will be marked with that role. However, resolved_action gives the FQCN of a task only at the start of the task. This doesn't leave good options for the role FQCN. Introduce a new key, sure, but would this key be contextual or not? If it is non-contextual like resolved_action, then it would only appear in "playbook_on_task_start", which is really weird for a role, since the role has multiple tasks. So that would leave it as weird semi-contextual thing. To make it contextual, it would become inconsistent with the implementation of resolved_action.

So here I'm testing the waters to see if there's any pushback for the "cheap" option. Make the role key FQCN if it's a collection.

softwarefactory-project-zuul · 2022-03-11T03:13:11Z

Build succeeded.

ansible-build-python-tarball : SUCCESS in 3m 56s

AlanCoding · 2022-03-11T15:04:21Z

To help review of this change, here is the new test's event data, with everything removed except what I find potentially relevant.

[
  {
    "event": "playbook_on_start",
  },
  {
    "event": "playbook_on_play_start",
  },
  {
    "event": "playbook_on_task_start",
    "event_data": {
      "task": "debug",
      "task_action": "debug",
      "role": "groovy.peanuts.hello",
      "name": "groovy.peanuts.hello : debug",
      "resolved_action": "ansible.builtin.debug"
    }
  },
  {
    "event": "runner_on_start",
    "event_data": {
      "task": "debug",
      "task_action": "debug",
      "role": "groovy.peanuts.hello",
      "host": "testhost",
    }
  },
  {
    "event": "runner_on_ok",
    "event_data": {
      "task": "debug",
      "task_action": "debug",
      "role": "groovy.peanuts.hello",
      "host": "testhost",
    }
  },
  {
    "event": "playbook_on_stats",
  }
]

sivel · 2022-03-11T15:32:15Z

The new resolved_action key is only present on "playbook_on_task_start". This is a break in pattern from other keys like task, which occurs through all events where that task is active, similar to role, those 3 bolded events

That sounds like a bug in core where the data is not persisting somehow. All references to the task object should have that attribute. I'd file an issue to get it looked at.

AlanCoding · 2022-03-11T16:34:37Z

@sivel we already discussed the diff to do that

diff --git a/ansible_runner/display_callback/callback/awx_display.py b/ansible_runner/display_callback/callback/awx_display.py
index aa29f39..0aee596 100644
--- a/ansible_runner/display_callback/callback/awx_display.py
+++ b/ansible_runner/display_callback/callback/awx_display.py
@@ -409,6 +409,7 @@ class CallbackModule(DefaultCallbackModule):
             task=(task.name or task.action),
             task_uuid=str(task._uuid),
             task_action=task.action,
+            resolved_action=getattr(task, 'resolved_action', ''),
             task_args='',
         )
         try:
@@ -554,7 +555,6 @@ class CallbackModule(DefaultCallbackModule):
             name=task.get_name(),
             is_conditional=is_conditional,
             uuid=task_uuid,
-            resolved_action=getattr(task, 'resolved_action', '')
         )
         with self.capture_event_data('playbook_on_task_start', **event_data):
             super(CallbackModule, self).v2_playbook_on_task_start(task, is_conditional)

This was probably not done, because it's unclear what the objective is, or what the standards for runner's event_data is. I know it's been said many times that this is a stable API, so not changing long-standing fields (to some reasonable extent) is an objective. I am fudging that in this PR by changing the content of role from the name to the FQCN. I don't want an answer for resolved_action in isolation of the role FQCN.

For the overall objective, how should the JSON data I posted above look, with task and role FQCNs?

What is task_action anyway, why is it different from task, and why should it not be the FQCN?

If task is the name, then should role also be the name (as opposed to the FQCN)? And then we introduce another (contextual) field resolved_role? But the callback from Ansible core doesn't have any notion of a "resolved" role, so I have strong hesitations.

The resolved_role field was only just merged recently, so we're free to change that to be more consistent, but I don't know how I would.

AlanCoding · 2022-03-14T17:26:50Z

I put up #1023 to address the issue of the code coverage decline.

Ladas · 2022-03-17T13:04:24Z

@AlanCoding @sivel
Assuming the naming is the same in analytics:

task: task name, whatever user writes, e.g. "Install httpd module"

task_action: module used, not sure if it can be other resource too, e.g. "yum" (based on what user inputs, so can be "ansible.builtin.yum")

resolved_action: the FQCN of resource (we see mainly modules in events), e.g. ansible.builtin.yum

role: role name, based on what user inputs, it can be role name, FQCN, path to /tmp/xy/roles/role_x, relative path, etc.

Based on the above, it seems it'd be consistent to have resolved_role that would be role's FQCN.

My assumption is also that if the content doesn't come from a collection the resolved_* will be null.And if the content doesn't come from role the *role* will be null.

As for the main usecase, we need a good way to detect:

content comes from collection
collection namespace and name
resource name

We should be able to nicely get that by parsing the FQCN attributes, counting on schema always being {namespace}.{name}.{dotted_path?}{resource} or null (empty string will be acceptable too).

AlanCoding · 2022-03-17T16:10:47Z

This diff changes the role field to be FQCN, and I believe it should work for analytics needs.

Ladas · 2022-03-17T16:55:44Z

@AlanCoding I don't see test on what happens when the role is standalone (not part of a collection). I think with 1 field it might be harder to recognize?

Or will it be enough to just say, if the format is {namespace}.{name}.{dotted_path?}{resource}, the role comes from collection? If yes, one field should be usable. (the processing side in Insights will need Postgres Regexp though)

So for me, separate field will have more consistency and easier processing. But I am ok with whatever you guys decide, if we can detect role as part of collection, role without collection and no role.

AlanCoding · 2022-03-17T19:01:58Z

That's right! There's a pre-existing test for a role that's not from a collection.

ansible-runner/test/integration/test_events.py

Line 135 in 49062c4

    
           role_events = [event for event in r.events if event.get('event_data', {}).get('role', '') == "benthomasson.hello_role"]

The new test uses a make-believe role from a collection, groovy.peanuts.hello.

To make things painfully complicated, benthomasson.hello_role is the name of a folder, not a sub-folder benthomasson/hello_role/. I don't know of a test for that case... which could be sub-divided into within a collection and not within a collection.

I didn't think of the point you raised. It's true, this role name cannot distinguish between a collection and a dotted path (or name). Maybe you were trying to make this point earlier and I wasn't getting it.

So for me, separate field will have more consistency and easier processing. But I am ok with whatever you guys decide, if we can detect role as part of collection, role without collection and no role.

I don't think anyone wants to make a decision. Say we add another field for the role FQCN. This probably needs to go in all events that currently have role. That does argue for changing resolved_action to be in all event data that has task.

What would we call the new field? role_fqcn? resolved_role?

Ladas · 2022-03-18T13:25:57Z

@AlanCoding resolved_role sounds good to me. And I was expecting that resolved_action would be present for all tasks and null if the module doesn't come from a collection (so e.g. coming from a project SCM repo).

softwarefactory-project-zuul · 2022-03-18T15:19:15Z

Build succeeded.

ansible-build-python-tarball : SUCCESS in 4m 01s

AlanCoding · 2022-03-18T15:43:22Z

And I was expecting that resolved_action would be present for all tasks and null if the module doesn't come from a collection (so e.g. coming from a project SCM repo).

This is beyond the scope of the discussion so far, so I tested it manually. Using a local module in library/ next to the playbook I find this in the event data:

"task_action": "my_test", "resolved_action": "my_test"

I was designing the resolved_role to not be present in these cases. I tend to prefer that we not add irrelevant data because it bloats the amount of data users have to get through. I guess I could do the same thing for resolved_action, and I will see about pushing another commit for that now.

AlanCoding · 2022-03-18T15:48:13Z

^ maybe I'm not so confident in that. I have a little bit of hesitancy about adding that change onto this.

What we have now for the role case we care about looks like:

[
  {
    "event": "playbook_on_start",
  },
  {
    "event": "playbook_on_play_start",
  },
  {
    "event": "playbook_on_task_start",
    "event_data": {
      "task": "debug",
      "task_action": "debug",
      "role": "hello",
      "resolved_role": "groovy.peanuts.hello",
      "name": "groovy.peanuts.hello : debug",
      "resolved_action": "ansible.builtin.debug"
    }
  },
  {
    "event": "runner_on_start",
    "event_data": {
      "task": "debug",
      "task_action": "debug",
      "role": "hello",
      "resolved_role": "groovy.peanuts.hello",
      "host": "testhost",
    }
  },
  {
    "event": "runner_on_ok",
    "event_data": {
      "task": "debug",
      "task_action": "debug",
      "role": "hello",
      "resolved_role": "groovy.peanuts.hello",
      "host": "testhost",
    }
  },
  {
    "event": "playbook_on_stats",
  }
]

Ladas · 2022-03-18T17:17:33Z

@AlanCoding I think the resolved_role looks good like this. And it's good it's present on runner_on_ok. The Insights is looking mostly on the tasks with terminal states, so having every important info there is good.

Lets chat with @sivel when he's back. Ideally the resolved_action will behave the same as resolved_role and I think it should be there only for content from the actual Collections. (unless I am missing something, having just the user inserted name [task_action, role] for non-collection content seems ok to me)

sivel · 2022-03-23T14:34:17Z

I'm a bit behind in here, and was having trouble following what if anything was needed of me. @Ladas provided these topics:

We should have separate attr resolved_role. (to help recognize if the content comes from a collection and easier parsing)

I'd say that is fine. You can still use task._role.get_name() for both if you want. get_name(include_role_fqcn=True/False).

resolved_action and resolved_role should be empty when the content doesn't come from a collection

I guess this just depends on the goals of how the data will be used. The core code doesn't concern itself with whether resolved_action represents a collection or not, but you are free to inspect it and use it however it's most relevant to you. The regex for matching an FQCN is defined as VALID_FQCR_RE = re.compile(r'^\w+(\.\w+){2,}$')

resolved_action and resolved_role should be present in all events. (Insights is only looking at the terminal events, like runner_on_ok, to avoid unnecessary traversing of the related events)

I think this is the right thing to do. I don't think it should only be present in playbook_on_task_start

Ladas · 2022-03-23T15:20:52Z

@sivel right for 3.): One of our usecases is to recognize if the content comes from collection or not. Are you saying we're not able to do that?

Or are you saying that matching VALID_FQCR_RE = re.compile(r'^\w+(.\w+){2,}$') guarantees the resource is from a collection? From @AlanCoding comment #1021 (comment), it wasn't clear to me if we're e.g. able to craft standalone role, that would match the regexp.

sivel · 2022-03-23T15:26:16Z

Technically you could poke task._role._collection_name but it's supposed to be private, and exposed via get_name() instead. ._role is named like it should be private, but that was just lack of imagination that we didn't make it public. So effectively that is public, just the attrs hanging off of ._role really aren't.

As such, there isn't really a straight forward way to know if someone just created roles/my.cool.role, you wouldn't know if it was a collection or just a role name that looked like a collection resource.

If you do poke at ._collection_name just know that it could be renamed or modified in some backwards incompatible way.

Ladas · 2022-03-24T17:22:02Z

@AlanCoding @sivel Sounds like using private methods that can become backwards incompatible is not a great idea.

I think that if we'll put there FQCN in a format {namespace}.{name}.{dotted_path?}{resource}, Insights for AAP can parse the namespace.name from it and the we can join this to the installed_collections we have on job, to get accurate collection usage. That should filter out the edge cases, where non-collection resource has FQCN format. I think this would need to be done anyway, to get usage distribution for specific collection versions.

I assume the installed_collections sent in ansible/awx@0e80f66, should have a list of all valid collections, so this approach should be viable.

AlanCoding · 2022-03-25T15:10:10Z

As such, there isn't really a straight forward way to know if someone just created roles/my.cool.role, you wouldn't know if it was a collection or just a role name that looked like a collection resource.

Is there not? It sounded like you were converging on:

role = task._role.get_name(include_role_fqcn=False)
resolved_role = task._role.get_name(include_role_fqcn=True)

Having both of these, you could get the collection of the role without much more complex references to the list of installed collections. The approach of comparing to installed collections won't event work for my test case because I'm using a collection in source control next to the playbook.

My concern about task._role.get_name(include_role_fqcn=True/False) is that I don't have an ironclad of the versions of ansible-core this works with. I'm pretty sure we're testing with Ansible 2.9, but I'm unclear if the executor team has a "soft" goal of actually working with versions further back in time. But I could just write this in in a super ugly defensive way and be done with it.

sivel · 2022-03-25T15:15:37Z

include_role_fqcn was added in 2.9.10 to role.get_name()

ansible/ansible@1f3185d

AlanCoding · 2022-03-28T20:34:50Z

Well then, honestly, the current diff of this PR remains my preference unless someone has a highly specific change to request.

Ladas · 2022-03-31T14:19:47Z

@AlanCoding @sivel if there is no other way, I think that checking resolved_role with {namespace}.{name}.{dotted_path?}{resource} is acceptable (even when it is not 100%, that this means actual Collection content)

So I am fine with the current approach.

@AlanCoding do I need to file an issue for resolved_action being in all events? (and not just the playbook_on_task_start) Or is it captured somewhere? It'd be good to finish this for a current release. (or do you actually implement it here? Since I see the resolved_action is moving)

AlanCoding · 2022-03-31T14:24:17Z

@Ladas why don't you subtract the "role" from "resolved_role", from the test case:

      "role": "hello",
      "resolved_role": "groovy.peanuts.hello",

Like you're saying, there's no other correct way to do this. In the current implementation, any task that has "resolved_role" will also have "role" (not necessarily the other way around, when a role isn't from a collection).

do I need to file an issue for resolved_action being in all events?

No, that's currently implemented in this PR.

nitzmahone

Other than maybe a different default for old core versions, changes LGTM

nitzmahone · 2022-04-09T01:35:05Z

ansible_runner/display_callback/callback/awx_display.py

@@ -409,6 +409,7 @@ def set_task(self, task, local=False):
            task=(task.name or task.action),
            task_uuid=str(task._uuid),
            task_action=task.action,
+            resolved_action=getattr(task, 'resolved_action', ''),


Any reason not to make the default value here task_action instead of empty string? resolved_action isn't guaranteed to be a module/action FQCN if the actual module run was a legacy library module anyway, so in the "old core" case where the value is missing, it seems like presenting a less-accurate value is preferable to no value at all...

@nitzmahone I think both options should be ok. The processing side will likely do a regexp check for FQCN format on resolved_action, if it won't match we'll use task_action.

The only option that could make more sense to me would be to exclude resolved_action if it's the same as task.

I pushed a commit that does the original suggestion here, just fallback to the task.action.

softwarefactory-project-zuul · 2022-04-11T17:37:39Z

Unable to freeze job graph: Job ansible-buildset-registry does not specify a run playbook

softwarefactory-project-zuul · 2022-04-11T17:40:50Z

Unable to freeze job graph: Job ansible-buildset-registry does not specify a run playbook

AlanCoding · 2022-04-11T17:50:15Z

For purposes of final approval, I want to make sure I present the final data structure here. Script for doing this:

import json
import sys

import ansible_runner.interface


KEEP_KEYS = ['task', 'task_action', 'role', 'resolved_role', 'name', 'resolved_action']


r = ansible_runner.interface.run(
    private_data_dir=sys.argv[-2],
    playbook=sys.argv[-1]
)


events = []

for event in r.events:
    event_data = event['event_data']
    d = {"event": event['event']}
    for key in KEEP_KEYS:
        if key in event_data:
            d[key] = event_data[key]
    events.append(d)


print(json.dumps(events, indent=2))

Running that for the new test case python pair.py test/fixtures/projects/collection_role use_role.yml gives this structure:

[
  {
    "event": "playbook_on_start"
  },
  {
    "event": "playbook_on_play_start",
    "name": "all"
  },
  {
    "event": "playbook_on_task_start",
    "task": "debug",
    "task_action": "debug",
    "role": "hello",
    "resolved_role": "groovy.peanuts.hello",
    "name": "groovy.peanuts.hello : debug",
    "resolved_action": "ansible.builtin.debug"
  },
  {
    "event": "runner_on_start",
    "task": "debug",
    "task_action": "debug",
    "role": "hello",
    "resolved_role": "groovy.peanuts.hello",
    "resolved_action": "ansible.builtin.debug"
  },
  {
    "event": "runner_on_ok",
    "task": "debug",
    "task_action": "debug",
    "role": "hello",
    "resolved_role": "groovy.peanuts.hello",
    "resolved_action": "ansible.builtin.debug"
  },
  {
    "event": "playbook_on_stats",
    "resolved_role": "groovy.peanuts.hello",
    "resolved_action": "ansible.builtin.debug"
  }
]

Now that I look at it, the playbook_on_stats looks a bit problematic, but I don't believe that changes with this PR in any way.

AlanCoding · 2022-04-11T17:57:34Z

No wait, that was a real implementation problem, will push a fix.

softwarefactory-project-zuul · 2022-04-11T18:03:38Z

Unable to freeze job graph: Job ansible-buildset-registry does not specify a run playbook

AlanCoding · 2022-04-11T18:04:18Z

Updated structure

[
  {
    "event": "playbook_on_start"
  },
  {
    "event": "playbook_on_play_start",
    "name": "all"
  },
  {
    "event": "playbook_on_task_start",
    "task": "debug",
    "task_action": "debug",
    "role": "hello",
    "resolved_role": "groovy.peanuts.hello",
    "name": "groovy.peanuts.hello : debug",
    "resolved_action": "ansible.builtin.debug"
  },
  {
    "event": "runner_on_start",
    "task": "debug",
    "task_action": "debug",
    "role": "hello",
    "resolved_role": "groovy.peanuts.hello",
    "resolved_action": "ansible.builtin.debug"
  },
  {
    "event": "runner_on_ok",
    "task": "debug",
    "task_action": "debug",
    "role": "hello",
    "resolved_role": "groovy.peanuts.hello",
    "resolved_action": "ansible.builtin.debug"
  },
  {
    "event": "playbook_on_stats"
  }
]

Ladas · 2022-04-11T19:17:38Z

@AlanCoding this looks great 👍

AlanCoding · 2022-04-11T20:17:19Z

Looks like CI may be fixed in #1034

EDIT: pushed again to pick up CI fixes.

softwarefactory-project-zuul · 2022-04-12T15:30:34Z

Build succeeded.

ansible-build-python-tarball : SUCCESS in 5m 40s

AlanCoding · 2022-04-12T15:39:47Z

I have no idea about this new failure

+ /usr/bin/dnf install -y gcc krb5-devel krb5-workstation make python38-cffi python38-cryptography python38-devel python38-jinja2 python38-lxml python38-pycparser python38-requests python38-six python38-yaml rsync
Last metadata expiration check: 0:00:05 ago on Tue Apr 12 15:25:15 2022.
Error: 
 Problem: cannot install the best candidate for the job
  - nothing provides libgomp = 8.5.0-12.el8 needed by gcc-8.5.0-12.el8.x86_64
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
The command '/bin/sh -c assemble' returned a non-zero code: 1

softwarefactory-project-zuul · 2022-04-12T16:11:34Z

Build succeeded.

ansible-build-python-tarball : SUCCESS in 4m 24s

AlanCoding · 2022-04-14T15:39:42Z

last CI failure I mentioned looks like it was flake, it's passing now.

AlanCoding requested a review from a team as a code owner March 11, 2022 03:08

nitzmahone reviewed Apr 9, 2022

View reviewed changes

AlanCoding force-pushed the test_roles branch from a78f550 to d4038c2 Compare April 11, 2022 17:40

AlanCoding added 3 commits April 12, 2022 11:23

Use FQCN for roles where available

e748832

Consistently apply resolved_role similar to resolved_action

471c6cf

Change fallback value to normal task_action

eb4cc31

AlanCoding force-pushed the test_roles branch from 012df7e to ac43db4 Compare April 12, 2022 15:23

Fix issue where resolved_ vars were there on stats event

dd05eeb

AlanCoding force-pushed the test_roles branch from ac43db4 to dd05eeb Compare April 12, 2022 16:06

nitzmahone approved these changes Apr 14, 2022

View reviewed changes

Shrews merged commit 6a53a5d into ansible:devel Apr 14, 2022

AlanCoding mentioned this pull request Apr 18, 2022

Ship the resolved_role event data to analytics ansible/awx#12066

Merged

Akasurde mentioned this pull request Sep 13, 2022

Include FQCN of role in callback data #1019

Closed

Use FQCN for roles where available #1021

Use FQCN for roles where available #1021

Conversation

AlanCoding commented Mar 11, 2022

softwarefactory-project-zuul bot commented Mar 11, 2022

AlanCoding commented Mar 11, 2022

sivel commented Mar 11, 2022

AlanCoding commented Mar 11, 2022

AlanCoding commented Mar 14, 2022

Ladas commented Mar 17, 2022 • edited

AlanCoding commented Mar 17, 2022

Ladas commented Mar 17, 2022 • edited

AlanCoding commented Mar 17, 2022

Ladas commented Mar 18, 2022

softwarefactory-project-zuul bot commented Mar 18, 2022

AlanCoding commented Mar 18, 2022

AlanCoding commented Mar 18, 2022

Ladas commented Mar 18, 2022

sivel commented Mar 23, 2022

Ladas commented Mar 23, 2022

sivel commented Mar 23, 2022

Ladas commented Mar 24, 2022 • edited

AlanCoding commented Mar 25, 2022

sivel commented Mar 25, 2022

AlanCoding commented Mar 28, 2022

Ladas commented Mar 31, 2022 • edited

AlanCoding commented Mar 31, 2022

nitzmahone left a comment

Choose a reason for hiding this comment

nitzmahone Apr 9, 2022

Choose a reason for hiding this comment

Ladas Apr 11, 2022

Choose a reason for hiding this comment

AlanCoding Apr 11, 2022

Choose a reason for hiding this comment

softwarefactory-project-zuul bot commented Apr 11, 2022

softwarefactory-project-zuul bot commented Apr 11, 2022

AlanCoding commented Apr 11, 2022

AlanCoding commented Apr 11, 2022

softwarefactory-project-zuul bot commented Apr 11, 2022

AlanCoding commented Apr 11, 2022

Ladas commented Apr 11, 2022

AlanCoding commented Apr 11, 2022 • edited

softwarefactory-project-zuul bot commented Apr 12, 2022

AlanCoding commented Apr 12, 2022

softwarefactory-project-zuul bot commented Apr 12, 2022

AlanCoding commented Apr 14, 2022

Ladas commented Mar 17, 2022 •

edited

Ladas commented Mar 17, 2022 •

edited

Ladas commented Mar 24, 2022 •

edited

Ladas commented Mar 31, 2022 •

edited

AlanCoding commented Apr 11, 2022 •

edited