
Liveness probe kills Seldon engine container with long init waiting time (Python wrapper) #674

Closed
axsaucedo opened this issue Jul 3, 2019 · 7 comments · Fixed by #684

@axsaucedo
Contributor

The Seldon engine containers get killed if the init function takes more than 20 seconds. This is the default behaviour, as the init function of the Python wrapper is triggered after the Flask server has been initialised and the pod has been marked as initialised. This is problematic if the container needs to perform init work that depends on parameters passed to the init function (through Seldon deploy params). An example is the PyTorch Hub integration, which crashes if the selected model is large enough for the download to take longer than 20 seconds (#642).

It is possible to avoid this issue if the init work is done before the wrapping class definition (i.e. accessing the parameters directly from the env variables). The liveness probe doesn't get triggered in this case because the container is not marked as ready: the download executes synchronously before the wrapper definition is even reached.

As we look to use reusable model servers as a more common design pattern, where the model weights or binaries are downloaded from an object store, we will need to think of a standard way to handle longer loading times during initialisation.

This could be tackled by making the parameters accessible as env variables by default, or alternatively by adding a PRE function that would execute before Flask server initialisation.
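The module-level workaround described above can be sketched roughly as follows. This is a minimal sketch, not the wrapper's actual code: the `PREDICTIVE_UNIT_PARAMETERS` env variable name and its JSON shape are assumptions about how the Python wrapper passes deploy params, and `download_model` is a hypothetical placeholder for the slow init work.

```python
import json
import os

# Parameters are read from the environment at module import time, so any slow
# init work below finishes before the Flask server (and its probes) even starts.
# NOTE: the env var name and JSON shape are assumptions for this sketch.
raw = os.environ.get("PREDICTIVE_UNIT_PARAMETERS", "[]")
params = {p["name"]: p["value"] for p in json.loads(raw)}

# Placeholder for the expensive, synchronous init work (e.g. a large download):
# model_weights = download_model(params.get("model_name"))  # hypothetical helper
model_weights = None

class MyModel:
    def __init__(self):
        # By the time __init__ runs, the work above has already completed,
        # so the 20-second liveness window is never an issue.
        self.weights = model_weights

    def predict(self, X, features_names=None):
        return X
```

The trade-off is that the download blocks container start entirely, so failures surface as crash loops rather than probe kills, and the pattern can't be expressed through the class-based init function.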

@ukclivecox ukclivecox added this to To do in 0.4.0 via automation Jul 3, 2019
@ryandawsonuk
Contributor

I think it's 20 seconds init and then 3 retries, so effectively 35 seconds. I recently had to increase the retries, so now it's effectively 55 seconds (SeldonIO/seldon-operator#22). It would be good to make that configurable, as the linked issue shows this can be a problem even for cases that are not downloading a model.

@axsaucedo
Contributor Author

I've had a look + chat with @gsunner to see what changes are necessary. It seems this would require small modifications to the Python wrapper, together with potential changes to the Seldon Operator. Both are explained below.

For context, the readiness and liveness probes are currently both configured to hit the /ready endpoint. This endpoint becomes available once the Flask app gets initialised (even though it is not explicitly defined). For more flexible use of the liveness and readiness probes, we'll first have to separate the endpoints (i.e. have the readiness probe reach out to /ready and the liveness probe reach out to /liveness).

Python Wrapper Changes

Regarding the changes to the Python wrapper, a new load function would be added to the Wrapper class and run after Flask is initialised. The workflow would be the following:

  1. The user_object starts with its is_ready variable set to False
  2. Flask is initialised with two new API endpoints, /ready and /liveness: the liveness endpoint always returns True, while the ready endpoint only returns True once the load function has run
  3. The load function in the Python wrapper is called
  4. The user_object changes the is_ready variable to True
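The four steps above can be sketched with a dependency-free toy server. Python's stdlib http.server stands in for Flask here, and UserObject.load stands in for the proposed load hook; all names are illustrative, not the wrapper's actual API.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class UserObject:
    """Stand-in for the wrapped model class."""
    def __init__(self):
        self.is_ready = False          # step 1: start not ready

    def load(self):
        # step 3: long-running init (e.g. downloading weights) happens here,
        # after the server is already answering liveness checks
        time.sleep(0.1)                # placeholder for real work
        self.is_ready = True           # step 4

user_object = UserObject()

class ProbeHandler(BaseHTTPRequestHandler):
    # step 2: /liveness always succeeds; /ready reflects is_ready
    def do_GET(self):
        if self.path == "/liveness":
            self.send_response(200)
        elif self.path == "/ready":
            self.send_response(200 if user_object.is_ready else 503)
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, *args):      # keep the sketch quiet
        pass

def serve():
    """Start the probe server on an ephemeral port and return it."""
    server = HTTPServer(("127.0.0.1", 0), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With this split, a liveness probe hitting /liveness passes from the moment the process is up, while a readiness probe on /ready only passes once load has finished, so slow init no longer triggers container restarts.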

Operator changes

The operator would also have to change so that the Seldon engine containers are deployed with a liveness probe that reaches the /liveness endpoint instead of the /ready endpoint.
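A sketch of what the split probe configuration might look like, expressed as the dicts the operator would render into the container spec. The port and timing values are purely illustrative, not the operator's actual defaults.

```python
# Hypothetical probe split: liveness answers as soon as the process is up,
# readiness only once the model reports ready. All values are illustrative.
liveness_probe = {
    "httpGet": {"path": "/liveness", "port": 9000},
    "initialDelaySeconds": 20,
    "periodSeconds": 5,
    "failureThreshold": 3,
}
readiness_probe = {
    "httpGet": {"path": "/ready", "port": 9000},
    "initialDelaySeconds": 20,
    "periodSeconds": 5,
    "failureThreshold": 3,
}
```

Keeping the two probes on separate paths is what lets a slow load delay readiness (so no traffic is routed) without ever failing liveness (so the container isn't restarted).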

It would be great to get thoughts on these planned changes, especially from @cliveseldon, since it seems this will need changes on the operator (for the liveness endpoint).

@ukclivecox
Contributor

I think there are two issues to handle:

  1. Handle liveness/readiness in all our wrapped images. We would need to change the wrappers for all languages to conform to these new specs
  2. Handle arbitrary images which may have custom servers.

For 2., we could say you need to provide your own readiness and liveness probes.

@axsaucedo
Contributor Author

Oh interesting. I agree with those points.

One thing on my mind as well is that we may not need to modify the current wrappers to change the liveness probe URL.

The reasoning is that right now both the readiness and liveness probes point to the /ready path, which is not implemented in the Python wrapper, and after having a look at the Java wrapper it seems it's not implemented there either. Is there an implementation of the /ready URL in the current wrappers? If not, then changing the liveness probe to point to /liveness wouldn't break at least these two wrappers, and we could start with the change for the probes. After that, the work on the Python wrapper could add the load functionality so that it executes after the Flask wrapper has started, and we can then think about how this load SDK addition could be implemented in the Java (and R) wrappers.

@ukclivecox
Contributor

I don't think this is correct. See here

@ukclivecox ukclivecox moved this from To do to In progress in 0.4.0 Jul 25, 2019
@ukclivecox ukclivecox self-assigned this Jul 25, 2019
@ryandawsonuk
Contributor

ryandawsonuk commented Aug 1, 2019

Note that we have two sets of probes: the probes for the model container and the probes for the engine container. I suspect that what @axsaucedo was referring to, about liveness and readiness pointing to the same path, was the engine probes. This came up yesterday with a timeout for RH opendatahub. The fix for that was:

SeldonIO/seldon-operator#45

This is relevant because the engine ready endpoint checks that the whole graph is ready, including the model container.

@axsaucedo
Contributor Author

This can be closed, as it is resolvable by adding your own liveness probe to the model.

0.4.0 automation moved this from In progress to Done Aug 13, 2019