
Drain connections for python3-http #68

Open
wants to merge 1 commit into
base: master


alexellis
Member

@alexellis alexellis commented May 25, 2023

Description

Drain connections for python3-http

Motivation and Context

When used with OpenFaaS Standard/Enterprise, the python3-http template's handler will now ignore SIGTERM, allowing the watchdog and Kubernetes to handle the shutdown.

When there are ongoing requests, these will be processed before exiting.

When there are no ongoing requests, the function will exit immediately.

How Has This Been Tested?

Tested with OpenFaaS Standard using a long-running sleep function whose pod went into a Terminating status. The function continued to execute its sleep for the whole duration, whilst the new replica came online and became ready in the meantime.

This is the same approach tested for the golang-http templates.

After:

py-long-7cfd5f7699-swwrd py-long 2023/05/25 10:48:04 SIGTERM: no new connections in 5s
py-long-7cfd5f7699-swwrd py-long 2023/05/25 10:48:04 Removing lock-file : /tmp/.lock
py-long-7cfd5f7699-swwrd py-long 2023/05/25 10:48:04 stderr: Function got SIGTERM, hanging on for up to 10m2s
py-long-7cfd5f7699-swwrd py-long 2023/05/25 10:48:09 No new connections allowed, draining: 0 requests
py-long-7cfd5f7699-swwrd py-long 2023/05/25 10:48:09 Exiting. Active connections: 0

Code for function: https://github.com/alexellis/go-long/blob/master/stack.yml#L32

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

This change should be harmless, but may require a minimum Python 3 version. I was able to execute the test code with 3.8, which is the lowest version we advertise support for:

python3 --version
Python 3.8.10

Example:

#!/usr/bin/env python3
import os
import sys
import signal
import time


def SignalHandler(SignalNumber, Frame):
    # Log and return: installing this handler replaces SIGTERM's default
    # action (terminating the process), so in-flight work keeps running.
    timeout = os.getenv("write_timeout")
    sys.stderr.write('Function got SIGTERM, draining for up to: {}\n'.format(timeout))
    sys.stderr.flush()

if __name__ == '__main__':

    signal.signal(signal.SIGTERM, SignalHandler)

    # Simulate HTTP server etc
    time.sleep(500)

The change is being made for Kubiya, who needed a graceful drain of long-running functions.

This is not new behaviour and is already used in the Go templates. Over time we will add it to all officially supported templates.

cc @shakedaskayo @koss110 @LucasRoesler

I'll get this merged. If there is feedback, please let me know; I'll be happy to consider changes suggested by people who are more experienced with Python than I am.

https://github.com/openfaas/golang-http-template/blob/master/template/golang-middleware/main.go#L45

Signed-off-by: Alex Ellis (OpenFaaS Ltd) <alexellis2@gmail.com>
@alexellis
Member Author

As per testing on the community call, the function will sleep for a maximum of healthcheck_interval and then exit once all connections to the watchdog have completed.


from function import handler

app = Flask(__name__)

def SignalHandler(SignalNumber, Frame):
Member


Recording my thoughts from the community call here.

I don't think this is required in any of the templates. The watchdog is capable of implementing the required graceful shutdown logic without any changes to, or even knowledge of, the specific template.

My definition of graceful shutdown is:

  1. The orchestrator sends the signal to stop the service/function.
  2. The function should now reject any new requests.
  3. The function should start a timer and allow any already running requests to finish.
  4. The function stops when either (a) all requests complete before the timer expires, or (b) the timer expires and the function is forcefully stopped.

Due to the design of the watchdog, it is capable of implementing all of this logic because it handles all of the requests and the signals. Once we are satisfied with the graceful shutdown implementation in of-watchdog, then any template that uses the watchdog should be considered as having a graceful shutdown.
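
To make the four steps in the definition above concrete, here is a minimal Python sketch of the bookkeeping a request-handling process could use. It is only an illustration under stated assumptions and is not the of-watchdog implementation: the `Drainer` class and its `try_begin`, `end` and `shutdown` methods are hypothetical names invented for this example, and step 1 (receiving the signal) is represented simply by the call to `shutdown`.

```python
import threading
import time


class Drainer:
    """Hypothetical helper that tracks in-flight requests for a graceful drain."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = 0        # number of requests currently being processed
        self._draining = False  # set once shutdown has been requested

    def try_begin(self):
        """Admit a request, or reject it if shutdown has started (step 2)."""
        with self._cond:
            if self._draining:
                return False
            self._active += 1
            return True

    def end(self):
        """Mark one in-flight request as finished and wake any waiter."""
        with self._cond:
            self._active -= 1
            self._cond.notify_all()

    def shutdown(self, timeout):
        """Stop admitting requests, then wait up to `timeout` seconds (step 3).

        Returns True if all requests finished in time (step 4a), or False if
        the timer expired first and the caller should force a stop (step 4b).
        """
        with self._cond:
            self._draining = True
            return self._cond.wait_for(lambda: self._active == 0, timeout=timeout)


if __name__ == "__main__":
    drainer = Drainer()
    drainer.try_begin()  # one request is already in flight

    # Simulate that request completing shortly after shutdown is requested.
    worker = threading.Thread(target=lambda: (time.sleep(0.2), drainer.end()))
    worker.start()

    print("drained cleanly:", drainer.shutdown(timeout=2.0))
    print("new request admitted:", drainer.try_begin())
    worker.join()
```

Because all of the state lives in the component that accepts requests and receives the signal, a sketch like this needs nothing from the code that actually serves each request, which is the point being made about the watchdog.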
