
Liveness probe kills Seldon engine container with long init waiting time (Python wrapper) #674

Closed
axsaucedo opened this issue Jul 3, 2019 · 7 comments · Fixed by #684

@axsaucedo
Contributor

The Seldon engine containers get killed if the init function takes more than 20 seconds. This is the default behaviour, as the init function of the Python wrapper is triggered after the Flask server has been initialised and the pod has been marked as initialised. This is problematic if the container needs to perform init work that depends on parameters passed to the init function (through Seldon deploy params). An example is the PyTorch Hub integration, which crashes if the selected model is large enough for the download to take longer than 20 seconds (#642).

It is possible to avoid this issue if the init work is done before the wrapping class definition (i.e. accessing the parameters directly from the env variables). The liveness probe doesn't get triggered in this case because the container is not marked as ready: the download executes synchronously before the wrapper definition is even reached.

As we look to use reusable model servers as a more common design pattern, where the model weights or binaries are downloaded from an object store, we will need to think of a standard way to handle longer loading times during initialisation.

This could be tackled by making the parameters accessible as env variables by default, or alternatively by adding a PRE function that would execute before Flask server initialisation.
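The module-level workaround described above can be sketched roughly as follows. This is a minimal sketch, not the wrapper's actual code: the `PREDICTIVE_UNIT_PARAMETERS` env variable name and its JSON shape are assumptions about how the Python wrapper passes deploy params, and `download_model` is a hypothetical placeholder for the slow init work.

```python
import json
import os

# Parameters are read from the environment at module import time, so any slow
# init work below finishes before the Flask server (and its probes) even starts.
# NOTE: the env var name and JSON shape are assumptions for this sketch.
raw = os.environ.get("PREDICTIVE_UNIT_PARAMETERS", "[]")
params = {p["name"]: p["value"] for p in json.loads(raw)}

# Placeholder for the expensive, synchronous init work (e.g. a large download):
# model_weights = download_model(params.get("model_name"))  # hypothetical helper
model_weights = None

class MyModel:
    def __init__(self):
        # By the time __init__ runs, the work above has already completed,
        # so the 20-second liveness window is never an issue.
        self.weights = model_weights

    def predict(self, X, features_names=None):
        return X
```

The trade-off is that the download blocks container start entirely, so failures surface as crash loops rather than probe kills, and the pattern can't be expressed through the class-based init function.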

@ukclivecox ukclivecox added this to To do in 0.4.0 via automation Jul 3, 2019
@ryandawsonuk
Contributor

I think it's 20 seconds init and then 3 retries, so effectively 35 seconds. I recently had to increase the retries, so now it's effectively 55 seconds (SeldonIO/seldon-operator#22). It would be good to make that configurable, as the linked issue shows this can be a problem even for cases that are not downloading a model.

@axsaucedo
Contributor Author

I've had a look + chat with @gsunner to see what changes are necessary. It seems this would require small modifications to the Python wrapper, together with potential changes to the Seldon Operator. Both are explained below.

For context, the readiness and liveness probes are currently both configured to hit the /ready endpoint. This endpoint becomes available once the Flask app gets initialised (even though it is not explicitly defined). For more flexible use of the liveness and readiness probes, we'll first have to separate the endpoints (i.e. have the readiness probe reach out to /ready and the liveness probe reach out to /liveness).

Python Wrapper Changes

Regarding the changes to the Python wrapper, a new load function would be added to the Wrapper class and run after Flask is initialised. The workflow would be the following:

  1. The user_object starts with its is_ready variable set to False
  2. Flask is initialised with two new API endpoints, /ready and /liveness: the liveness endpoint always returns True, while the ready endpoint only returns True once the load function has run
  3. The load function in the Python wrapper is called
  4. The user_object changes the is_ready variable to True
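The four steps above can be sketched with a dependency-free toy server. Python's stdlib http.server stands in for Flask here, and UserObject.load stands in for the proposed load hook; all names are illustrative, not the wrapper's actual API.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class UserObject:
    """Stand-in for the wrapped model class."""
    def __init__(self):
        self.is_ready = False          # step 1: start not ready

    def load(self):
        # step 3: long-running init (e.g. downloading weights) happens here,
        # after the server is already answering liveness checks
        time.sleep(0.1)                # placeholder for real work
        self.is_ready = True           # step 4

user_object = UserObject()

class ProbeHandler(BaseHTTPRequestHandler):
    # step 2: /liveness always succeeds; /ready reflects is_ready
    def do_GET(self):
        if self.path == "/liveness":
            self.send_response(200)
        elif self.path == "/ready":
            self.send_response(200 if user_object.is_ready else 503)
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, *args):      # keep the sketch quiet
        pass

def serve():
    """Start the probe server on an ephemeral port and return it."""
    server = HTTPServer(("127.0.0.1", 0), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With this split, a liveness probe hitting /liveness passes from the moment the process is up, while a readiness probe on /ready only passes once load has finished, so slow init no longer triggers container restarts.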

Operator changes

The operator would also have to change so that the Seldon engine containers are deployed with a liveness probe that reaches the /liveness endpoint instead of the /ready endpoint.
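A sketch of what the split probe configuration might look like, expressed as the dicts the operator would render into the container spec. The port and timing values are purely illustrative, not the operator's actual defaults.

```python
# Hypothetical probe split: liveness answers as soon as the process is up,
# readiness only once the model reports ready. All values are illustrative.
liveness_probe = {
    "httpGet": {"path": "/liveness", "port": 9000},
    "initialDelaySeconds": 20,
    "periodSeconds": 5,
    "failureThreshold": 3,
}
readiness_probe = {
    "httpGet": {"path": "/ready", "port": 9000},
    "initialDelaySeconds": 20,
    "periodSeconds": 5,
    "failureThreshold": 3,
}
```

Keeping the two probes on separate paths is what lets a slow load delay readiness (so no traffic is routed) without ever failing liveness (so the container isn't restarted).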

It would be great to get thoughts on these planned changes, especially from @cliveseldon, since it seems this will need changes on the operator (for the liveness endpoint).

@ukclivecox
Contributor

I think there are two issues to handle:

  1. Handle liveness/readiness in all our wrapped images. We would need to change the wrappers for all languages to conform to these new specs
  2. Handle arbitrary images which may have custom servers.

For 2., we could say you need to provide your own readiness and liveness probes.

@axsaucedo
Contributor Author

Oh interesting. I agree with those points.

One thing on my mind as well is that we may not need to modify the current wrappers to change the liveness probe URL.

The reasoning is that right now both the readiness and liveness probes point to the /ready path, which is not implemented in the Python wrapper, and after having a look at the Java wrapper it seems it's not implemented there either. Is there an implementation of the /ready URL in the current wrappers? If not, then changing the liveness probe to point to /liveness wouldn't break at least these two wrappers, and we could start with the change for the probes. After that, the work on the Python wrapper could add the load functionality so that it executes after the Flask wrapper has started, and we can then think about how this load SDK addition could be implemented in the Java (and R) wrappers.

@ukclivecox
Contributor

I don't think this is correct. See here

@ukclivecox ukclivecox moved this from To do to In progress in 0.4.0 Jul 25, 2019
@ukclivecox ukclivecox self-assigned this Jul 25, 2019
@ryandawsonuk
Contributor

ryandawsonuk commented Aug 1, 2019

Note that we have two sets of probes: the probes for the model container and the probes for the engine container. I suspect that what @axsaucedo was referring to, about liveness and readiness pointing to the same path, was the engine probes. This came up yesterday with a timeout for RH opendatahub. The fix for that was:

SeldonIO/seldon-operator#45

This is relevant because the engine ready endpoint checks that the whole graph is ready, including the model container.

@axsaucedo
Contributor Author

This can be closed, as it is resolvable by adding your own liveness probe to the model.

0.4.0 automation moved this from In progress to Done Aug 13, 2019