Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch SIGTERM or SIGINT and send offline message #13858

Merged
merged 7 commits into from
Apr 26, 2023

Conversation

jessicamack
Copy link
Member

SUMMARY

Handle SIGTERM and SIGINT in the hearbeet daemon. Relates to #13320

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
COMPONENT NAME
  • API
AWX VERSION
awx: 21.14.1
ADDITIONAL INFORMATION

@jessicamack jessicamack marked this pull request as draft April 13, 2023 19:24
@@ -56,6 +57,8 @@ def do_hearbeat_loop(self):
logger.debug('Sending heartbeat')
conn.notify('web_heartbeet', self.construct_payload())
time.sleep(settings.BROADCAST_WEBSOCKET_BEACON_FROM_WEB_RATE_SECONDS)
signal.signal(signal.SIGTERM, conn.notify('web_heartbeet', self.construct_payload(action='offline')))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the signal only needs to be called once I think. if so we may move this to before the while True loop

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need something in the loop to check that the signal had been received and break out of the loop too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the signal only needs to be called once I think. if so we may move this to before the while True loop

my understanding is that if we want to catch both we need to call signal on both similar to how we do in base.py https://github.com/ansible/awx/blob/devel/awx/main/dispatch/worker/base.py#L129-L130

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jessicamack you're right we need both for SIGTERM and SIGINT

what I meant is that I don't think we need to put them into the while True block

@AlanCoding
Copy link
Member

I tried to test this locally. Steps:

  • run docker-compose
  • in 2nd tab, exec into the awx_devel container, awx-manage run_heartbeeet, leave it
  • in 3rd tab, exec in again, ps aux --forest and find process, kill 1234

When I do this, in the 2nd tab, I find this:

(awx) bash-5.1$ awx-manage run_heartbeet

   **********   **********
 ************* *************
*****************************
 ***********HEART***********
  *************************
     *******************
       *************** _._
         *********** /`._ `'.      __
           *******   \  .\| \   _'`  `)
             ***  (``_)  \| ).'` /`- /
              *   `\ `;\_ `\\//`-'` /
                    \ `'.'.| /  __/`
                     `'--v_|/`'`
                         __||-._
                      /'` `-``  `'\\
                     /         .'` )
                     \  BEET  '    )
                      \.          /
                        '.     /'`
                           `) |
                            //
                            '(.
                             `\`.
                               ``
Traceback (most recent call last):
  File "/usr/local/bin/awx-manage", line 9, in <module>
    load_entry_point('awx', 'console_scripts', 'awx-manage')()
  File "/awx_devel/awx/__init__.py", line 200, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/__init__.py", line 413, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/base.py", line 354, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "/awx_devel/awx/main/management/commands/run_heartbeet.py", line 70, in handle
    self.do_hearbeat_loop()
  File "/awx_devel/awx/main/management/commands/run_heartbeet.py", line 60, in do_hearbeat_loop
    signal.signal(signal.SIGTERM, conn.notify('web_heartbeet', self.construct_payload(action='offline')))
  File "/usr/lib64/python3.9/signal.py", line 56, in signal
    handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
TypeError: signal handler must be signal.SIG_IGN, signal.SIG_DFL, or a callable object

Before your branch, it printed "Terminated" and exited, which is desirable.

@jessicamack jessicamack marked this pull request as ready for review April 19, 2023 20:38
@AlanCoding
Copy link
Member

you might have a problem with an outdated black version for the api-lint check failure. Just a guess.

Comment on lines 62 to 63
signal.signal(signal.SIGTERM, self.notify_listener_and_exit)
signal.signal(signal.SIGINT, self.notify_listener_and_exit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make more sense to register these in handle() instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we can drop the TODO above handle() then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

signal catches have been moved to handle() and TODO has been removed

@jessicamack jessicamack requested a review from relrod April 24, 2023 14:52
def notify_listener_and_exit(self, *args):
with pg_bus_conn(new_connection=False) as conn:
conn.notify('web_heartbeet', self.construct_payload(action='offline'))
sys.exit(1)
Copy link
Member

@relrod relrod Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the normal way to exit the program, so we probably should exit with 0 instead.

Copy link
Member

@relrod relrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more inline comment. I think after that fix, this is good to go.

@TheRealHaoLiu
Copy link
Member

related to #13322

Signed-off-by: jessicamack <jmack@redhat.com>
Signed-off-by: jessicamack <jmack@redhat.com>
Signed-off-by: jessicamack <jmack@redhat.com>
Signed-off-by: jessicamack <jmack@redhat.com>
Signed-off-by: jessicamack <jmack@redhat.com>
Signed-off-by: jessicamack <jmack@redhat.com>
Signed-off-by: jessicamack <jmack@redhat.com>
@jessicamack jessicamack merged commit 66a3cb6 into ansible:devel Apr 26, 2023
14 checks passed
relrod added a commit to relrod/awx that referenced this pull request Apr 27, 2023
This fixes two different exceptions in wsrelay.

- One resulted from heartbeet getting ability in ansible#13858 to gracefully
  shut down. When we saw the message come through, we didn't fully clean
  up the connection to the web node.

- The second resulted when Redis disappeared. We still want to exit in
  that case, but it's better to log a message and exit gracefully
  instead of crashing out.

Signed-off-by: Rick Elrod <rick@elrod.me>
relrod added a commit to relrod/awx that referenced this pull request Apr 27, 2023
This fixes two different exceptions in wsrelay.

- One resulted from heartbeet getting ability in ansible#13858 to gracefully
  shut down. When we saw the message come through, we didn't fully clean
  up the connection to the web node.

- The second resulted when Redis disappeared. We still want to exit in
  that case, but it's better to log a message and exit gracefully
  instead of crashing out.

Signed-off-by: Rick Elrod <rick@elrod.me>
relrod added a commit to relrod/awx that referenced this pull request Apr 27, 2023
This fixes two different exceptions in wsrelay.

- One resulted from heartbeet getting ability in ansible#13858 to gracefully
  shut down. When we saw the message come through, we didn't fully clean
  up the connection to the web node.

- The second resulted when Redis disappeared. We still want to exit in
  that case, but it's better to log a message and exit gracefully
  instead of crashing out.

Signed-off-by: Rick Elrod <rick@elrod.me>
relrod added a commit to relrod/awx that referenced this pull request May 11, 2023
This fixes two different exceptions in wsrelay.

* One resulted from heartbeet getting ability in ansible#13858 to gracefully
  shut down. When we saw the message come through, we didn't fully
  clean up the connection to the web node.

* The second resulted when Redis disappeared. We still want to exit in
  that case, but it's better to log a message and exit gracefully
  instead of crashing out.

Signed-off-by: Rick Elrod <rick@elrod.me>
relrod added a commit that referenced this pull request May 15, 2023
This fixes two different exceptions in wsrelay.

* One resulted from heartbeet getting ability in #13858 to gracefully
  shut down. When we saw the message come through, we didn't fully
  clean up the connection to the web node.

* The second resulted when Redis disappeared. We still want to exit in
  that case, but it's better to log a message and exit gracefully
  instead of crashing out.

Signed-off-by: Rick Elrod <rick@elrod.me>
AlanCoding pushed a commit to AlanCoding/awx that referenced this pull request Jun 5, 2023
Catch SIGTERM or SIGINT and send offline message
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants