Catch SIGTERM or SIGINT and send offline message #13858
@@ -56,6 +57,8 @@ def do_hearbeat_loop(self):
logger.debug('Sending heartbeat')
conn.notify('web_heartbeet', self.construct_payload())
time.sleep(settings.BROADCAST_WEBSOCKET_BEACON_FROM_WEB_RATE_SECONDS)
signal.signal(signal.SIGTERM, conn.notify('web_heartbeet', self.construct_payload(action='offline')))
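In the hunk above, the second argument to signal.signal() is the value returned by calling conn.notify(...), not a function. That call runs immediately, so the offline message would go out when the handler is registered rather than when SIGTERM arrives. signal.signal() also rejects anything that is not a callable, signal.SIG_DFL, or signal.SIG_IGN with TypeError. A minimal sketch of the difference, where send_offline() and the sent list are hypothetical stand-ins for the real conn.notify(...) call:

```python
import signal

sent = []  # hypothetical: records payloads instead of calling conn.notify()


def send_offline():
    """Hypothetical stand-in for conn.notify('web_heartbeet', <offline payload>)."""
    sent.append("offline")


def on_terminate(signum, frame):
    # Signal handlers receive (signum, frame) and run only when the signal arrives.
    send_offline()


# As in the diff: send_offline() is called right here, and its return value
# (None) is passed as the "handler", which signal.signal() rejects.
try:
    signal.signal(signal.SIGTERM, send_offline())
    rejected = False
except TypeError:
    rejected = True

# Passing the function object defers the notification until SIGTERM is delivered.
previous = signal.signal(signal.SIGTERM, on_terminate)
installed = signal.getsignal(signal.SIGTERM)
if previous is not None:
    signal.signal(signal.SIGTERM, previous)  # restore the old handler in this demo
```

Note that the broken form has already sent one offline message by the time signal.signal() raises.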
The signal handler only needs to be registered once, I think. If so, we could move this before the while True loop.
Do we also need something in the loop that checks whether the signal has been received and breaks out of the loop?
Yeah, you're right, we need to call sys.exit.
We can follow the pattern here: https://github.com/fosterseth/awx/blob/e42461d96ffe3979c8d5c4b9e978cee0809fef35/awx/main/scheduler/task_manager.py#L121
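The loop question above can be handled without an explicit check inside the loop. sys.exit() raises SystemExit, and when it is called from a signal handler that exception surfaces in the main thread wherever it happens to be (including inside time.sleep()), so it unwinds out of while True and still runs finally blocks. A minimal sketch, assuming a simple sleep-based loop; run_loop() and tick() are hypothetical, signal.raise_signal() (Python 3.8+) stands in for an external kill, and this is not the task_manager.py code linked above. The handlers are registered once, before the loop, for both SIGTERM and SIGINT.

```python
import signal
import sys
import time


def exit_on_signal(signum, frame):
    # sys.exit() raises SystemExit in the main thread, interrupting
    # whatever it was doing there (including time.sleep()).
    sys.exit(0)


def run_loop(on_tick, interval=0.01):
    """Hypothetical heartbeat-style loop; on_tick() stands in for sending a beat."""
    # Register once, before entering the loop, for both signals.
    previous = {
        sig: signal.signal(sig, exit_on_signal)
        for sig in (signal.SIGTERM, signal.SIGINT)
    }
    try:
        while True:  # no explicit "was a signal received?" check needed
            on_tick()
            time.sleep(interval)
    finally:
        # Runs while SystemExit propagates: restore the previous handlers.
        for sig, handler in previous.items():
            if handler is not None:
                signal.signal(sig, handler)


# Usage: simulate `kill -TERM <pid>` after the third tick.
ticks = []


def tick():
    ticks.append(len(ticks))
    if len(ticks) == 3:
        signal.raise_signal(signal.SIGTERM)


exit_code = None
try:
    run_loop(tick)
except SystemExit as exc:
    exit_code = exc.code
print("exit code:", exit_code, "ticks:", len(ticks))
```

An alternative, closer to what the question asks, is a handler that only sets a flag (for example a threading.Event) which the loop checks on each iteration before deciding to break.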
My understanding is that if we want to catch both signals, we need to call signal.signal() for each of them, similar to how we do in base.py: https://github.com/ansible/awx/blob/devel/awx/main/dispatch/worker/base.py#L129-L130
@jessicamack you're right, we need handlers for both SIGTERM and SIGINT.
What I meant is that I don't think we need to put them inside the while True block.
I tried to test this locally. Steps:
When I do this, I see the following in the second tab:
Before your branch, it printed "Terminated" and exited, which is the desired behavior.
Branch updated from ea26861 to 24fce59.
You might have a problem with an outdated
signal.signal(signal.SIGTERM, self.notify_listener_and_exit)
signal.signal(signal.SIGINT, self.notify_listener_and_exit)
It might make more sense to register these in handle() instead.
And we can drop the TODO above handle() then.
The signal handlers have been moved to handle(), and the TODO has been removed.
def notify_listener_and_exit(self, *args):
with pg_bus_conn(new_connection=False) as conn:
conn.notify('web_heartbeet', self.construct_payload(action='offline'))
sys.exit(1)
This is the normal way for the program to exit, so we should probably exit with 0 instead.
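Putting the review feedback together (handlers registered once in handle() rather than in the loop, the offline notice sent before exiting, and exit status 0), a minimal Django-free sketch. HeartbeetCommand, FakeConn, heartbeat_loop() and the JSON payload are hypothetical stand-ins; only the names handle(), construct_payload(), notify_listener_and_exit() and the web_heartbeet channel come from the diff.

```python
import json
import signal
import sys
import time


class FakeConn:
    """Hypothetical stand-in for the pg_bus_conn() connection; records NOTIFYs."""

    def __init__(self):
        self.sent = []

    def notify(self, channel, payload):
        self.sent.append((channel, payload))


class HeartbeetCommand:
    """Hypothetical, Django-free sketch of the heartbeet management command."""

    def __init__(self, conn, interval=1.0):
        self.conn = conn
        self.interval = interval

    def construct_payload(self, action="online"):
        # Illustrative payload only; the real command builds its own.
        return json.dumps({"hostname": "web-1", "action": action})

    def notify_listener_and_exit(self, *args):
        # Tell listeners this node is going away, then exit with status 0,
        # because a SIGTERM/SIGINT shutdown is the normal way this process ends.
        self.conn.notify("web_heartbeet", self.construct_payload(action="offline"))
        sys.exit(0)

    def heartbeat_loop(self, max_beats=None):
        # max_beats exists only so the demo terminates; None means run forever.
        beats = 0
        while max_beats is None or beats < max_beats:
            self.conn.notify("web_heartbeet", self.construct_payload())
            beats += 1
            time.sleep(self.interval)

    def handle(self, *args, **options):
        # Register both handlers once, up front, instead of inside the loop.
        signal.signal(signal.SIGTERM, self.notify_listener_and_exit)
        signal.signal(signal.SIGINT, self.notify_listener_and_exit)
        self.heartbeat_loop(options.get("max_beats"))


# Usage: two beats, then invoke the installed SIGTERM handler directly.
saved = {sig: signal.getsignal(sig) for sig in (signal.SIGTERM, signal.SIGINT)}
conn = FakeConn()
cmd = HeartbeetCommand(conn, interval=0)
cmd.handle(max_beats=2)
exit_code = None
try:
    signal.getsignal(signal.SIGTERM)(signal.SIGTERM, None)
except SystemExit as exc:
    exit_code = exc.code
finally:
    for sig, handler in saved.items():  # undo the demo's handler changes
        if handler is not None:
            signal.signal(sig, handler)
print(exit_code, [json.loads(p)["action"] for _, p in conn.sent])
```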
One more inline comment. I think after that fix, this is good to go.
Related to #13322
Signed-off-by: jessicamack <jmack@redhat.com>
Branch updated from 7799c7d to d282393.
This fixes two different exceptions in wsrelay.
* One resulted from heartbeet gaining the ability (in #13858) to shut down gracefully. When we saw the message come through, we didn't fully clean up the connection to the web node.
* The second resulted when Redis disappeared. We still want to exit in that case, but it's better to log a message and exit gracefully than to crash.
Signed-off-by: Rick Elrod <rick@elrod.me>
SUMMARY
Handle SIGTERM and SIGINT in the heartbeet daemon. Relates to #13320.
ISSUE TYPE
COMPONENT NAME
AWX VERSION
ADDITIONAL INFORMATION