Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix notification timing issue by sending in the latter of 2 events #12110

Merged
merged 19 commits into from
Apr 29, 2022

Conversation

AlanCoding
Copy link
Member

SUMMARY

Connect #11422 as the primary issue for this.

As a secondary issue, I would suggest we close #8771 if we merge this. This PR adds the basic conflict management between event processing and status change. This does not attempt to assure that all events are processed, only that the "stats" event in processed, which is the "playbook_on_stats" event for playbooks, and EOF for other types.

See prior work at https://github.com/ansible/awx/compare/devel...chrismeyersfsu:job_done_done?expand=1 --> this tried to tackle the problem of a "job done" hook in conjunction with making the events queue durable. I'm not attacking the problem of event queue durability here.

I first implemented this fix in a different way here --> https://github.com/ansible/awx/compare/devel...AlanCoding:send_notifications?expand=1, which added a notifications_processed field, but that was probably poorly named. You could do what I'm doing here with a boolean like stats_event_processed, but that feels like it has no connection to other use cases, and would be useless for any other than this very narrow problem.

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME
  • API
ADDITIONAL INFORMATION

WIP upon opening because:

  • unit tests need to be updated
  • integration tests still need to be verified
  • I removed tracking of the stats event dispatch in the callback class, which I think is still needed to handle error events, and I'll probably add that back in and deal with that problem

@AlanCoding AlanCoding changed the title Host status counts Fix notification timing issue by sending in the later of 2 events Apr 26, 2022
@AlanCoding AlanCoding changed the title Fix notification timing issue by sending in the later of 2 events Fix notification timing issue by sending in the latter of 2 events Apr 26, 2022
@AlanCoding AlanCoding marked this pull request as ready for review April 27, 2022 11:38
Copy link
Contributor

@nixocio nixocio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve the UI changes.

Comment on lines 29 to 35
EVENT_MAP = {
'job_id': JobEvent,
'ad_hoc_command_id': AdHocCommandEvent,
'project_update_id': ProjectUpdateEvent,
'inventory_update_id': InventoryUpdateEvent,
'system_job_id': SystemJobEvent,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you inverted this map could you get rid of your new for loop on 185-189?

if key in body:
job_identifier = body[key]
break

self.last_event = f'\n\t- {cls.__name__} for #{job_identifier} ({body.get("event", "")} {body.get("uuid", "")})' # noqa

trigger_event_type = 'playbook_on_stats' if cls in (JobEvent, ProjectUpdateEvent, AdHocCommandEvent) else 'EOF'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(assuming you can invert EVENT_MAP from previous comment)
What about changing event map to include a field for this and then just doing:
trigger_event_map = EVENT_MAP[cls]['event_type']

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can dig that. Still need to consider the original loop that used event_map - if we made the values a dictionary, then that could look like

                for cls, props in EVENT_MAP.items():
                    if props['job_ref'] in body:
                        job_identifier = body[props['job_ref']]
                        break

There's a little bit of indirection going on there, but this data structure could be reused more broadly than what I have now. I would probably move it into another module.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a commit addressing this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I still missed this exact line, pushed another commit which should wrap it up.

@AlanCoding
Copy link
Member Author

All test results are looking good for this, so I'm pretty serious about it at this point.

# events are persisted _before_ the UJ.status
# changes from running -> successful
retries += 1
time.sleep(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

@AlanCoding
Copy link
Member Author

Tried to write up some details if someone wants to read. https://gist.github.com/AlanCoding/f7ae3d10ef04692c254c17507ef16b59

@@ -113,6 +113,32 @@ def work_loop(self, *args, **kw):
signal.signal(signal.SIGUSR1, self.toggle_profiling)
return super(CallbackBrokerWorker, self).work_loop(*args, **kw)

@staticmethod
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only see you calling this as a method on self, can we drop this decorator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a commit to try and address this comment. I get what you're saying, I was calling self.job_stats_wrapup which isn't great.

@AlanCoding AlanCoding merged commit 29d6084 into ansible:devel Apr 29, 2022
AlanCoding added a commit to AlanCoding/awx that referenced this pull request May 25, 2022
…r of 2 events (ansible#12110)"

This reverts some of commit 29d6084.

The reversion concern the addition of the host status counts field
that field is removed here and migrated away, replaced with different field
AlanCoding added a commit to AlanCoding/awx that referenced this pull request May 31, 2022
…r of 2 events (ansible#12110)"

This reverts some of commit 29d6084.

The reversion concern the addition of the host status counts field
that field is removed here and migrated away, replaced with different field
AlanCoding added a commit to AlanCoding/awx that referenced this pull request Jun 9, 2022
…r of 2 events (ansible#12110)"

This reverts some of commit 29d6084.

The reversion concern the addition of the host status counts field
that field is removed here and migrated away, replaced with different field
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Job done isn't always correct
6 participants