
fix: catch more shutdown issues to resolve shutdown without dropping #65

Merged 1 commit into main on Apr 10, 2024

Conversation

@fubuloubu (Member) commented Apr 8, 2024

What I did

Noticed that when you shut down unexpectedly, there are a lot of issues in the traceback about async tasks not being handled properly

How I did it

some try...except statements around the shutdown paths
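
Roughly, the guards have this shape (just a sketch, not the actual diff; drain_task is a hypothetical helper name):

import asyncio

async def drain_task(task: asyncio.Task):
    # cancel the task and swallow the noise that would otherwise show up
    # in the trace when the event loop is torn down mid-flight
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected, since we cancelled it ourselves
    except Exception as err:
        print(f"ignoring error during shutdown: {err}")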

How to verify it

Run an app and randomly press Ctrl+C

Checklist

  • Passes all linting checks (pre-commit and CI jobs)
  • New test cases have been added and are passing
  • Documentation has been updated
  • PR title follows Conventional Commit standard (will be automatically included in the changelog)

@mikeshultz (Member) left a comment

LGTM

@fubuloubu mentioned this pull request Apr 9, 2024
@johnson2427 (Contributor) commented:

It looks to me like we have one connection to the message broker per application. If so, then we need to add a task to the task queue that terminates the connection before the application exits, so we don't leave connections accidentally open.

import asyncio
import signal

class SigtermHandler:

    def __init__(self):
        self.should_terminate = False
        # register the handler; signal handlers receive (signum, frame)
        signal.signal(signal.SIGTERM, self._set_terminate)

    def _set_terminate(self, signum, frame):
        self.should_terminate = True

    def check_quit_signal(self):
        return self.should_terminate

# in the application class:
def __init__(self):
    ...
    self.sigterm_handler = SigtermHandler()  # instantiate, don't store the class itself

async def close_conn(self):  # add this method to the list of tasks that you run in the gather
    while not self.sigterm_handler.check_quit_signal():
        await asyncio.sleep(1)
    terminate_connection()  # some function to terminate the connection to the message broker

This is the basic idea; I'll keep reviewing to make sure I'm not completely off base here, but this is what I've done with Kafka and RabbitMQ in Python.
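
For example, the wiring could look like this (a sketch with placeholder names; run_broker and run_worker are stand-ins for whatever long-running tasks the app already gathers):

import asyncio

async def main(app):
    # close_conn runs alongside the other long-running tasks, so the broker
    # connection gets torn down once SIGTERM flips the flag
    await asyncio.gather(
        app.run_broker(),
        app.run_worker(),
        app.close_conn(),
    )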

@fubuloubu (Member, Author) replied:

That seems reasonable

Just to lay out what we have going on here, we have a few different things in motion that need to shut down gracefully, roughly in this order (sketched in code below the list):

  • the broker needs to stop producing tasks (tasks are created based on the RPC subscriptions module)
  • RPC subscriptions should be independently unsubscribed (so they stop producing new tasks)
  • websocket connections (which provide the conduit for RPC subscriptions to come in) should be disconnected
  • all pending tasks should be finished (using the silverback worker command, this is a separate process)
  • finally, we can exit the Runner.run() method
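
Very roughly, in code (just a sketch of that ordering; every method name below is a placeholder, not the real Runner/broker API):

import asyncio

async def shutdown(runner):
    await runner.broker.stop_producing()         # 1. broker stops creating new tasks
    for sub in runner.subscriptions:
        await sub.unsubscribe()                  # 2. unsubscribe each RPC subscription
    await runner.websocket.disconnect()          # 3. close the websocket conduit
    await asyncio.gather(*runner.pending_tasks)  # 4. let pending tasks finish
    # 5. returning here lets Runner.run() exit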

@mikeshultz is working on some of this for the cluster (we can chat more offline about what the needs are there, e.g. upgrading the worker process to a new revision of the container), so I think it'd be great to collaborate more on what a "proper shutdown" scenario looks like, both for local dev (here in this SDK) and for the cluster (we can talk more about that outside of GitHub).

@mikeshultz (Member) replied:

Graceful shutdown might be something we should look into at some point, though I don't think that's what this PR is about.

@fubuloubu (Member, Author) replied:

This PR was just some of my attempts at finding the spots where it is happening. I'm happy to close it in favor of a more comprehensive solution that considers graceful shutdown.

@mikeshultz (Member) replied:

No, this is a good improvement I think we should land. We shouldn't scrap incremental improvement just because it doesn't solve world hunger. We can iterate on graceful shutdown later.

@fubuloubu (Member, Author) replied:

Alright, will capture some of this in a new issue, and we can merge this now.

@fubuloubu merged commit 4a1f70b into main on Apr 10, 2024 (26 checks passed)
@fubuloubu deleted the fix/shutdown-issues branch on Apr 10, 2024 at 19:07