
fix: catch more shutdown issues to resolve shutdown without dropping #65

Merged · 1 commit merged into main from fix/shutdown-issues on Apr 10, 2024

Conversation

@fubuloubu (Member) commented on Apr 8, 2024

What I did

Noticed that when you shut down unexpectedly, there are a lot of issues in the trace about async tasks not being handled properly

How I did it

some try/except statements
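
For context, a minimal sketch of the kind of guard this refers to, assuming asyncio-based tasks; run_task() and its body are illustrative, not the actual Silverback code:

import asyncio

async def run_task():
    try:
        while True:
            await asyncio.sleep(1)  # placeholder for real work
    except asyncio.CancelledError:
        # The task was cancelled during shutdown (e.g. after Ctrl+C); exit
        # quietly instead of letting the cancellation surface as a traceback.
        pass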

How to verify it

run an app and randomly press Ctrl+C

Checklist

  • Passes all linting checks (pre-commit and CI jobs)
  • New test cases have been added and are passing
  • Documentation has been updated
  • PR title follows Conventional Commit standard (will be automatically included in the changelog)

@mikeshultz (Member) left a comment


LGTM

@fubuloubu mentioned this pull request on Apr 9, 2024
@johnson2427 (Contributor) commented

It looks to me like we have one connection to the message broker per application. If so, then we need to add a task to the task queue that handles the connection termination before the application terminates, so we don't accidentally leave connections open.

import asyncio
import signal

class SigtermHandler:

    def __init__(self):
        self.should_terminate = False
        signal.signal(signal.SIGTERM, self._set_terminate)

    def _set_terminate(self, signum, frame):  # signal handlers receive (signum, frame)
        self.should_terminate = True

    def check_quit_signal(self):
        return self.should_terminate

# in the application class:
def __init__(self):
    ...
    self.sigterm_handler = SigtermHandler()  # instantiate the handler

async def close_conn(self):  # add this method to the list of tasks that you run in the gather
    while not self.sigterm_handler.check_quit_signal():
        await asyncio.sleep(1)
    terminate_connection()  # some function to terminate the connection to the message broker

This is the basic idea; I'll keep reviewing to make sure I'm not completely off base here. But this is what I've done with Kafka and with RabbitMQ in Python
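
As a possible variant (not part of the suggestion above): on an asyncio event loop the SIGTERM handler can be registered directly with loop.add_signal_handler, which avoids the one-second polling loop. A minimal sketch, where connection.close() is a stand-in for whatever actually closes the broker connection:

import asyncio
import signal

async def wait_and_close(connection):
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    # Have the event loop set the flag when SIGTERM arrives
    loop.add_signal_handler(signal.SIGTERM, stop.set)
    await stop.wait()
    await connection.close()  # hypothetical: closes the broker connection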

@fubuloubu (Member, Author) commented

> It looks to me like we have one connection to the message broker per application. If so, then we need to add a task to the task queue that handles the connection termination before the application terminates, so we don't accidentally leave connections open.

That seems reasonable

Just to lay out what we have going on here, there are a couple of different things in motion that need to gracefully shut down (a rough sketch follows this list):

  • the broker needs to stop producing tasks (tasks are created based on the RPC subscriptions module)
  • RPC subscriptions should be independently unsubscribed (so they stop producing new tasks)
  • websocket connections (which provide the conduit for RPC subscriptions to come in) should be disconnected
  • all pending tasks should be finished (using the silverback worker command this is a separate process)
  • finally, we can exit the Runner.run() method
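
A rough sketch of that order, assuming an asyncio-based runner; broker, subscriptions, websocket, and pending_tasks are illustrative names, not Silverback's actual API:

import asyncio

async def graceful_shutdown(broker, subscriptions, websocket, pending_tasks):
    # 1. Stop the broker from producing new tasks
    await broker.stop_producing()

    # 2. Unsubscribe each RPC subscription so no new tasks are created
    for sub in subscriptions:
        await sub.unsubscribe()

    # 3. Disconnect the websocket that carries the RPC subscriptions
    await websocket.disconnect()

    # 4. Let all pending tasks finish (with the worker command these run in a separate process)
    await asyncio.gather(*pending_tasks, return_exceptions=True)

    # 5. At this point Runner.run() can return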

@mikeshultz is working on some of this for the cluster (chat more offline about what the needs are there e.g. upgrading worker process to a new revision of the container), so I think it'd be great to collab more about what a "proper shutdown" scenario looks like, both for local dev (here in this SDK) and for the cluster (talk more about that outside of github)

@mikeshultz (Member) commented

> @mikeshultz is working on some of this for the cluster (chat more offline about what the needs are there e.g. upgrading worker process to a new revision of the container), so I think it'd be great to collab more about what a "proper shutdown" scenario looks like, both for local dev (here in this SDK) and for the cluster (talk more about that outside of github)

Graceful shutdown might be something we should look into at some point, though I don't think that's what this PR is about.

@fubuloubu (Member, Author) commented

> Graceful shutdown might be something we should look into at some point, though I don't think that's what this PR is about.

This PR was just some of my attempts at finding the spots where it is happening; I'm happy to close it in favor of a more comprehensive solution to the problem that considers graceful shutdown

@mikeshultz (Member) commented

> This PR was just some of my attempts at finding the spots where it is happening; I'm happy to close it in favor of a more comprehensive solution to the problem that considers graceful shutdown

No, this is a good improvement I think we should land. We shouldn't scrap incremental improvement just because it doesn't solve world hunger. We can iterate on graceful shutdown later.

@fubuloubu (Member, Author) commented

> No, this is a good improvement I think we should land. We shouldn't scrap incremental improvement just because it doesn't solve world hunger. We can iterate on graceful shutdown later.

Alright, will capture some of this in a new issue, and we can merge this now.

@fubuloubu merged commit 4a1f70b into main on Apr 10, 2024 (26 checks passed).
@fubuloubu deleted the fix/shutdown-issues branch on April 10, 2024 at 19:07.