Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use FQCN for roles where available #1021

Merged
merged 4 commits into from
Apr 14, 2022
Merged

Conversation

AlanCoding
Copy link
Member

Connect #1019

Keeping track of the event data from these test cases is hard. Events go:

  • playbook_on_start
  • playbook_on_play_start
  • playbook_on_task_start
  • runner_on_start
  • runner_on_ok
  • playbook_on_stats

The bolded event types have historically had the role entry inside of the event_data dict.

The new resolved_action key is only present on "playbook_on_task_start". This is a break in pattern from other keys like task, which occurs through all events where that task is active, similar to role, those 3 bolded events. I tried to raise this point in #995, although without the full testing and context. Here's the full event data:

https://gist.github.com/AlanCoding/325a84d4121249188a573ad25249cda7

The ben.json file corresponds to the existing test that this follows the pattern of, and peanut.json is for the test added here.

Speaking in broad strokes, task and role keys are contextual in nature. A role will run multiple tasks, and everything in that context will be marked with that role. However, resolved_action gives the FQCN of a task only at the start of the task. This doesn't leave good options for the role FQCN. Introduce a new key, sure, but would this key be contextual or not? If it is non-contextual like resolved_action, then it would only appear in "playbook_on_task_start", which is really weird for a role, since the role has multiple tasks. So that would leave it as weird semi-contextual thing. To make it contextual, it would become inconsistent with the implementation of resolved_action.

So here I'm testing the waters to see if there's any pushback for the "cheap" option. Make the role key FQCN if it's a collection.

@AlanCoding AlanCoding requested a review from a team as a code owner March 11, 2022 03:08
@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@AlanCoding
Copy link
Member Author

To help review of this change, here is the new test's event data, with everything removed except what I find potentially relevant.

[
  {
    "event": "playbook_on_start",
  },
  {
    "event": "playbook_on_play_start",
  },
  {
    "event": "playbook_on_task_start",
    "event_data": {
      "task": "debug",
      "task_action": "debug",
      "role": "groovy.peanuts.hello",
      "name": "groovy.peanuts.hello : debug",
      "resolved_action": "ansible.builtin.debug"
    }
  },
  {
    "event": "runner_on_start",
    "event_data": {
      "task": "debug",
      "task_action": "debug",
      "role": "groovy.peanuts.hello",
      "host": "testhost",
    }
  },
  {
    "event": "runner_on_ok",
    "event_data": {
      "task": "debug",
      "task_action": "debug",
      "role": "groovy.peanuts.hello",
      "host": "testhost",
    }
  },
  {
    "event": "playbook_on_stats",
  }
]

@sivel
Copy link
Member

sivel commented Mar 11, 2022

The new resolved_action key is only present on "playbook_on_task_start". This is a break in pattern from other keys like task, which occurs through all events where that task is active, similar to role, those 3 bolded events

That sounds like a bug in core where the data is not persisting somehow. All references to the task object should have that attribute. I'd file an issue to get it looked at.

@AlanCoding
Copy link
Member Author

@sivel we already discussed the diff to do that

diff --git a/ansible_runner/display_callback/callback/awx_display.py b/ansible_runner/display_callback/callback/awx_display.py
index aa29f39..0aee596 100644
--- a/ansible_runner/display_callback/callback/awx_display.py
+++ b/ansible_runner/display_callback/callback/awx_display.py
@@ -409,6 +409,7 @@ class CallbackModule(DefaultCallbackModule):
             task=(task.name or task.action),
             task_uuid=str(task._uuid),
             task_action=task.action,
+            resolved_action=getattr(task, 'resolved_action', ''),
             task_args='',
         )
         try:
@@ -554,7 +555,6 @@ class CallbackModule(DefaultCallbackModule):
             name=task.get_name(),
             is_conditional=is_conditional,
             uuid=task_uuid,
-            resolved_action=getattr(task, 'resolved_action', '')
         )
         with self.capture_event_data('playbook_on_task_start', **event_data):
             super(CallbackModule, self).v2_playbook_on_task_start(task, is_conditional)

This was probably not done, because it's unclear what the objective is, or what the standards for runner's event_data is. I know it's been said many times that this is a stable API, so not changing long-standing fields (to some reasonable extent) is an objective. I am fudging that in this PR by changing the content of role from the name to the FQCN. I don't want an answer for resolved_action in isolation of the role FQCN.

For the overall objective, how should the JSON data I posted above look, with task and role FQCNs?

What is task_action anyway, why is it different from task, and why should it not be the FQCN?

If task is the name, then should role also be the name (as opposed to the FQCN)? And then we introduce another (contextual) field resolved_role? But the callback from Ansible core doesn't have any notion of a "resolved" role, so I have strong hesitations.

The resolved_role field was only just merged recently, so we're free to change that to be more consistent, but I don't know how I would.

@AlanCoding
Copy link
Member Author

I put up #1023 to address the issue of the code coverage decline.

@Ladas
Copy link

Ladas commented Mar 17, 2022

@AlanCoding @sivel
Assuming the naming is the same in analytics:

task: task name, whatever user writes, e.g. "Install httpd module"

task_action: module used, not sure if it can be other resource too, e.g. "yum" (based on what user inputs, so can be "ansible.builtin.yum")

resolved_action: the FQCN of resource (we see mainly modules in events), e.g. ansible.builtin.yum

role: role name, based on what user inputs, it can be role name, FQCN, path to /tmp/xy/roles/role_x, relative path, etc.

Based on the above, it seems it'd be consistent to have resolved_role that would be role's FQCN.

My assumption is also that if the content doesn't come from a collection the resolved_* will be null.And if the content doesn't come from role the *role* will be null.


As for the main usecase, we need a good way to detect:

  • content comes from collection
  • collection namespace and name
  • resource name

We should be able to nicely get that by parsing the FQCN attributes, counting on schema always being {namespace}.{name}.{dotted_path?}{resource} or null (empty string will be acceptable too).

@AlanCoding
Copy link
Member Author

This diff changes the role field to be FQCN, and I believe it should work for analytics needs.

@Ladas
Copy link

Ladas commented Mar 17, 2022

@AlanCoding I don't see test on what happens when the role is standalone (not part of a collection). I think with 1 field it might be harder to recognize?

Or will it be enough to just say, if the format is {namespace}.{name}.{dotted_path?}{resource}, the role comes from collection? If yes, one field should be usable. (the processing side in Insights will need Postgres Regexp though)


So for me, separate field will have more consistency and easier processing. But I am ok with whatever you guys decide, if we can detect role as part of collection, role without collection and no role.

@AlanCoding
Copy link
Member Author

That's right! There's a pre-existing test for a role that's not from a collection.

role_events = [event for event in r.events if event.get('event_data', {}).get('role', '') == "benthomasson.hello_role"]

The new test uses a make-believe role from a collection, groovy.peanuts.hello.

To make things painfully complicated, benthomasson.hello_role is the name of a folder, not a sub-folder benthomasson/hello_role/. I don't know of a test for that case... which could be sub-divided into within a collection and not within a collection.

I didn't think of the point you raised. It's true, this role name cannot distinguish between a collection and a dotted path (or name). Maybe you were trying to make this point earlier and I wasn't getting it.

So for me, separate field will have more consistency and easier processing. But I am ok with whatever you guys decide, if we can detect role as part of collection, role without collection and no role.

I don't think anyone wants to make a decision. Say we add another field for the role FQCN. This probably needs to go in all events that currently have role. That does argue for changing resolved_action to be in all event data that has task.

What would we call the new field? role_fqcn? resolved_role?

@Ladas
Copy link

Ladas commented Mar 18, 2022

@AlanCoding resolved_role sounds good to me. And I was expecting that resolved_action would be present for all tasks and null if the module doesn't come from a collection (so e.g. coming from a project SCM repo).

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@AlanCoding
Copy link
Member Author

And I was expecting that resolved_action would be present for all tasks and null if the module doesn't come from a collection (so e.g. coming from a project SCM repo).

This is beyond the scope of the discussion so far, so I tested it manually. Using a local module in library/ next to the playbook I find this in the event data:

"task_action": "my_test", "resolved_action": "my_test"

I was designing the resolved_role to not be present in these cases. I tend to prefer that we not add irrelevant data because it bloats the amount of data users have to get through. I guess I could do the same thing for resolved_action, and I will see about pushing another commit for that now.

@AlanCoding
Copy link
Member Author

^ maybe I'm not so confident in that. I have a little bit of hesitancy about adding that change onto this.

What we have now for the role case we care about looks like:

[
  {
    "event": "playbook_on_start",
  },
  {
    "event": "playbook_on_play_start",
  },
  {
    "event": "playbook_on_task_start",
    "event_data": {
      "task": "debug",
      "task_action": "debug",
      "role": "hello",
      "resolved_role": "groovy.peanuts.hello",
      "name": "groovy.peanuts.hello : debug",
      "resolved_action": "ansible.builtin.debug"
    }
  },
  {
    "event": "runner_on_start",
    "event_data": {
      "task": "debug",
      "task_action": "debug",
      "role": "hello",
      "resolved_role": "groovy.peanuts.hello",
      "host": "testhost",
    }
  },
  {
    "event": "runner_on_ok",
    "event_data": {
      "task": "debug",
      "task_action": "debug",
      "role": "hello",
      "resolved_role": "groovy.peanuts.hello",
      "host": "testhost",
    }
  },
  {
    "event": "playbook_on_stats",
  }
]

@Ladas
Copy link

Ladas commented Mar 18, 2022

@AlanCoding I think the resolved_role looks good like this. And it's good it's present on runner_on_ok. The Insights is looking mostly on the tasks with terminal states, so having every important info there is good.

Lets chat with @sivel when he's back. Ideally the resolved_action will behave the same as resolved_role and I think it should be there only for content from the actual Collections. (unless I am missing something, having just the user inserted name [task_action, role] for non-collection content seems ok to me)

@sivel
Copy link
Member

sivel commented Mar 23, 2022

I'm a bit behind in here, and was having trouble following what if anything was needed of me. @Ladas provided these topics:

  1. We should have separate attr resolved_role. (to help recognize if the content comes from a collection and easier parsing)

I'd say that is fine. You can still use task._role.get_name() for both if you want. get_name(include_role_fqcn=True/False).

  1. resolved_action and resolved_role should be empty when the content doesn't come from a collection

I guess this just depends on the goals of how the data will be used. The core code doesn't concern itself with whether resolved_action represents a collection or not, but you are free to inspect it and use it however it's most relevant to you. The regex for matching an FQCN is defined as VALID_FQCR_RE = re.compile(r'^\w+(\.\w+){2,}$')

  1. resolved_action and resolved_role should be present in all events. (Insights is only looking at the terminal events, like runner_on_ok, to avoid unnecessary traversing of the related events)

I think this is the right thing to do. I don't think it should only be present in playbook_on_task_start

@Ladas
Copy link

Ladas commented Mar 23, 2022

@sivel right for 3.): One of our usecases is to recognize if the content comes from collection or not. Are you saying we're not able to do that?

Or are you saying that matching VALID_FQCR_RE = re.compile(r'^\w+(.\w+){2,}$') guarantees the resource is from a collection? From @AlanCoding comment #1021 (comment), it wasn't clear to me if we're e.g. able to craft standalone role, that would match the regexp.

@sivel
Copy link
Member

sivel commented Mar 23, 2022

Technically you could poke task._role._collection_name but it's supposed to be private, and exposed via get_name() instead. ._role is named like it should be private, but that was just lack of imagination that we didn't make it public. So effectively that is public, just the attrs hanging off of ._role really aren't.

As such, there isn't really a straight forward way to know if someone just created roles/my.cool.role, you wouldn't know if it was a collection or just a role name that looked like a collection resource.

If you do poke at ._collection_name just know that it could be renamed or modified in some backwards incompatible way.

@Ladas
Copy link

Ladas commented Mar 24, 2022

@AlanCoding @sivel Sounds like using private methods that can become backwards incompatible is not a great idea.

I think that if we'll put there FQCN in a format {namespace}.{name}.{dotted_path?}{resource}, Insights for AAP can parse the namespace.name from it and the we can join this to the installed_collections we have on job, to get accurate collection usage. That should filter out the edge cases, where non-collection resource has FQCN format. I think this would need to be done anyway, to get usage distribution for specific collection versions.

I assume the installed_collections sent in ansible/awx@0e80f66, should have a list of all valid collections, so this approach should be viable.

@AlanCoding
Copy link
Member Author

As such, there isn't really a straight forward way to know if someone just created roles/my.cool.role, you wouldn't know if it was a collection or just a role name that looked like a collection resource.

Is there not? It sounded like you were converging on:

  • role = task._role.get_name(include_role_fqcn=False)
  • resolved_role = task._role.get_name(include_role_fqcn=True)

Having both of these, you could get the collection of the role without much more complex references to the list of installed collections. The approach of comparing to installed collections won't event work for my test case because I'm using a collection in source control next to the playbook.

My concern about task._role.get_name(include_role_fqcn=True/False) is that I don't have an ironclad of the versions of ansible-core this works with. I'm pretty sure we're testing with Ansible 2.9, but I'm unclear if the executor team has a "soft" goal of actually working with versions further back in time. But I could just write this in in a super ugly defensive way and be done with it.

@sivel
Copy link
Member

sivel commented Mar 25, 2022

include_role_fqcn was added in 2.9.10 to role.get_name()

ansible/ansible@1f3185d

@AlanCoding
Copy link
Member Author

Well then, honestly, the current diff of this PR remains my preference unless someone has a highly specific change to request.

@Ladas
Copy link

Ladas commented Mar 31, 2022

@AlanCoding @sivel if there is no other way, I think that checking resolved_role with {namespace}.{name}.{dotted_path?}{resource} is acceptable (even when it is not 100%, that this means actual Collection content)

So I am fine with the current approach.


@AlanCoding do I need to file an issue for resolved_action being in all events? (and not just the playbook_on_task_start) Or is it captured somewhere? It'd be good to finish this for a current release. (or do you actually implement it here? Since I see the resolved_action is moving)

@AlanCoding
Copy link
Member Author

@Ladas why don't you subtract the "role" from "resolved_role", from the test case:

      "role": "hello",
      "resolved_role": "groovy.peanuts.hello",

Like you're saying, there's no other correct way to do this. In the current implementation, any task that has "resolved_role" will also have "role" (not necessarily the other way around, when a role isn't from a collection).

do I need to file an issue for resolved_action being in all events?

No, that's currently implemented in this PR.

Copy link
Member

@nitzmahone nitzmahone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than maybe a different default for old core versions, changes LGTM

@@ -409,6 +409,7 @@ def set_task(self, task, local=False):
task=(task.name or task.action),
task_uuid=str(task._uuid),
task_action=task.action,
resolved_action=getattr(task, 'resolved_action', ''),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to make the default value here task_action instead of empty string? resolved_action isn't guaranteed to be a module/action FQCN if the actual module run was a legacy library module anyway, so in the "old core" case where the value is missing, it seems like presenting a less-accurate value is preferable to no value at all...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nitzmahone I think both options should be ok. The processing side will likely do a regexp check for FQCN format on resolved_action, if it won't match we'll use task_action.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only option that could make more sense to me would be to exclude resolved_action if it's the same as task.

I pushed a commit that does the original suggestion here, just fallback to the task.action.

@softwarefactory-project-zuul
Copy link
Contributor

Unable to freeze job graph: Job ansible-buildset-registry does not specify a run playbook

@softwarefactory-project-zuul
Copy link
Contributor

Unable to freeze job graph: Job ansible-buildset-registry does not specify a run playbook

@AlanCoding
Copy link
Member Author

For purposes of final approval, I want to make sure I present the final data structure here. Script for doing this:

import json
import sys

import ansible_runner.interface


KEEP_KEYS = ['task', 'task_action', 'role', 'resolved_role', 'name', 'resolved_action']


r = ansible_runner.interface.run(
    private_data_dir=sys.argv[-2],
    playbook=sys.argv[-1]
)


events = []

for event in r.events:
    event_data = event['event_data']
    d = {"event": event['event']}
    for key in KEEP_KEYS:
        if key in event_data:
            d[key] = event_data[key]
    events.append(d)


print(json.dumps(events, indent=2))

Running that for the new test case python pair.py test/fixtures/projects/collection_role use_role.yml gives this structure:

[
  {
    "event": "playbook_on_start"
  },
  {
    "event": "playbook_on_play_start",
    "name": "all"
  },
  {
    "event": "playbook_on_task_start",
    "task": "debug",
    "task_action": "debug",
    "role": "hello",
    "resolved_role": "groovy.peanuts.hello",
    "name": "groovy.peanuts.hello : debug",
    "resolved_action": "ansible.builtin.debug"
  },
  {
    "event": "runner_on_start",
    "task": "debug",
    "task_action": "debug",
    "role": "hello",
    "resolved_role": "groovy.peanuts.hello",
    "resolved_action": "ansible.builtin.debug"
  },
  {
    "event": "runner_on_ok",
    "task": "debug",
    "task_action": "debug",
    "role": "hello",
    "resolved_role": "groovy.peanuts.hello",
    "resolved_action": "ansible.builtin.debug"
  },
  {
    "event": "playbook_on_stats",
    "resolved_role": "groovy.peanuts.hello",
    "resolved_action": "ansible.builtin.debug"
  }
]

Now that I look at it, the playbook_on_stats looks a bit problematic, but I don't believe that changes with this PR in any way.

@AlanCoding
Copy link
Member Author

No wait, that was a real implementation problem, will push a fix.

@softwarefactory-project-zuul
Copy link
Contributor

Unable to freeze job graph: Job ansible-buildset-registry does not specify a run playbook

@AlanCoding
Copy link
Member Author

Updated structure

[
  {
    "event": "playbook_on_start"
  },
  {
    "event": "playbook_on_play_start",
    "name": "all"
  },
  {
    "event": "playbook_on_task_start",
    "task": "debug",
    "task_action": "debug",
    "role": "hello",
    "resolved_role": "groovy.peanuts.hello",
    "name": "groovy.peanuts.hello : debug",
    "resolved_action": "ansible.builtin.debug"
  },
  {
    "event": "runner_on_start",
    "task": "debug",
    "task_action": "debug",
    "role": "hello",
    "resolved_role": "groovy.peanuts.hello",
    "resolved_action": "ansible.builtin.debug"
  },
  {
    "event": "runner_on_ok",
    "task": "debug",
    "task_action": "debug",
    "role": "hello",
    "resolved_role": "groovy.peanuts.hello",
    "resolved_action": "ansible.builtin.debug"
  },
  {
    "event": "playbook_on_stats"
  }
]

@Ladas
Copy link

Ladas commented Apr 11, 2022

@AlanCoding this looks great 👍

@AlanCoding
Copy link
Member Author

AlanCoding commented Apr 11, 2022

Looks like CI may be fixed in #1034

EDIT: pushed again to pick up CI fixes.

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@AlanCoding
Copy link
Member Author

I have no idea about this new failure

+ /usr/bin/dnf install -y gcc krb5-devel krb5-workstation make python38-cffi python38-cryptography python38-devel python38-jinja2 python38-lxml python38-pycparser python38-requests python38-six python38-yaml rsync
Last metadata expiration check: 0:00:05 ago on Tue Apr 12 15:25:15 2022.
Error: 
 Problem: cannot install the best candidate for the job
  - nothing provides libgomp = 8.5.0-12.el8 needed by gcc-8.5.0-12.el8.x86_64
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
The command '/bin/sh -c assemble' returned a non-zero code: 1

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@AlanCoding
Copy link
Member Author

last CI failure I mentioned looks like it was flake, it's passing now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants