
WIP: Update python wrapper to use gunicorn #684

Merged
merged 9 commits into from
Aug 14, 2019

Conversation

ukclivecox
Contributor

@ukclivecox ukclivecox commented Jul 12, 2019

  • Adds a command line arg `--workers`, defaulting to 4 (gunicorn is only used when workers > 1)
  • Adds an optional `load()` method to the user_model prototype. This is called after `__init__` in each worker.
    • Needed for Tensorflow, where loading your graph must be done in each process rather than before gunicorn forks
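
A minimal sketch of how a user model using the optional `load()` hook might look (the class name and model-loading details are illustrative, not the wrapper's actual API):

```python
import os

class MyModel:
    """Hypothetical user model showing the optional load() hook."""

    def __init__(self):
        # Runs once in the parent process, before gunicorn forks.
        self.class_names = ["class:{}".format(i) for i in range(10)]
        self.model = None

    def load(self):
        # Called after __init__ in each worker, i.e. after the fork,
        # so per-process state (e.g. a TensorFlow session) belongs here.
        print("Loading model in pid", os.getpid())
        self.model = object()  # placeholder for real model loading

    def predict(self, X, feature_names=None):
        return X
```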

Fixes #453
Fixes #383
Fixes #674

@ukclivecox ukclivecox requested a review from gsunner July 12, 2019 10:17
@ukclivecox ukclivecox changed the title Update python wrapper to use gunicorn WIP: Update python wrapper to use gunicorn Jul 12, 2019
@ryandawsonuk ryandawsonuk mentioned this pull request Jul 22, 2019
@axsaucedo
Contributor

It would be worth adding the flag to the s2i wrapper (the run file) so that the number of workers can be changed at deploy time.

The reason it is important to be able to change the number of workers is that there are situations where you would only want 1 worker per pod - for example, tasks with heavy resource usage, such as memory.

If you have an ML graph that takes 50GB of RAM, for example, you may want to scale horizontally instead; if you request 50GB for a pod and each worker loads its own copy, users may be confused that their pods keep going over their requests.

@ukclivecox
Contributor Author

@axsaucedo You can set the env variable GUNICORN_WORKERS to limit the number of workers, so I don't think an s2i change is needed.

@axsaucedo
Contributor

Ohh my bad, I had not seen the os.environ.get("GUNICORN_WORKERS", "4") - awesome! Thanks for the clarification @cliveseldon

@seldondev seldondev added size/L and removed size/M labels Jul 26, 2019
@alexlatchford
Contributor

I've been evaluating Seldon this week and this was on my list, nice job as I'm guessing this will land well within our time frame!

Not sure if this is the place to request this, but it would be great to include an option to choose the worker type; some of our models go out to a feature store at inference time to enrich input requests. We'd likely be doing that as part of an input_transformer component, which we could set to have just a single worker and have it feed a model component with many times more workers.

PS. Apologies if there is already a different pattern for this!

@alexlatchford
Contributor

alexlatchford commented Jul 26, 2019

Digging in some more and looking at the server implementations: they load the models into memory on the first prediction. This happens after the process fork, resulting in the model being loaded into memory as many times as there are workers.

If you also expose the preload_app option for Gunicorn and set it to True, and you load the model during the initial phase (i.e. not on the first network call), then only one copy is kept in memory; rather than duplicating the memory space, it stays accessible to all child processes. This is something we do here at Zillow to keep the memory footprint down in our current iteration of serving infrastructure while keeping a larger number of worker processes 😄

This does have some downsides as it's incompatible with the reload option which enables hot-code reloading (useful for local development) but we just turn off preload_app locally to fix that.
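
For illustration, the pattern described above can be expressed in a gunicorn config file; this is a sketch of the idea, not configuration shipped by this PR:

```python
# gunicorn.conf.py (illustrative; not part of this PR)
workers = 4
preload_app = True   # load the app, and the model with it, once in the
                     # master process before forking; children then share
                     # those pages copy-on-write instead of duplicating them
reload = False       # hot-code reloading is incompatible with preload_app
```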

@axsaucedo
Contributor

Very interesting @alexlatchford - that's one of the challenges I outlined above. The preload_app functionality would certainly allow the workers to load only one copy of the model, reducing the memory footprint. Of course there is still a challenge if the extracted features are large, but that is a separate problem. I am not sure how much work would be involved to extend the wrapper to support this - definitely worth checking out - but depending on that we may have to open a separate issue once the initial/basic move to gunicorn has landed.

@seldondev seldondev added size/XL and removed size/L labels Aug 7, 2019
@ukclivecox ukclivecox added this to In progress in 0.4.0 via automation Aug 8, 2019
@ukclivecox ukclivecox requested review from axsaucedo and removed request for gsunner August 8, 2019 10:06
@ukclivecox ukclivecox self-assigned this Aug 9, 2019
```python
self.class_names = ["class:{}".format(str(i)) for i in range(10)]

def load(self):
    print("Loading model", os.getpid())
```

We could use log.debug as convention, not critical tho


### Gunicorn workers

By default 4 gunicorn workers will be created for your model. If you wish to change this default then add the environment variable GUNICORN_WORKERS to the container for your model. An example is shown below:
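
The example that followed appears to have been lost in extraction; a sketch, assuming a standard Kubernetes container spec (container and image names are illustrative):

```yaml
# Sketch: setting GUNICORN_WORKERS on the model container
- name: classifier
  image: my-model-image:0.1
  env:
    - name: GUNICORN_WORKERS
      value: "2"
```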

If you are using gunicorn workers and make this variable overridable, you probably also need to consider (1) the preload option to reduce the memory usage, and (2) max_requests (probably along with max_requests_jitter) to restart the workers to avoid their memory leak. These are the opts in gunicorn command, but I guess there should be corresponding APIs that provide those options too.

Ref: http://docs.gunicorn.org/en/stable/settings.html
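
A sketch of how such options could be wired from environment variables (only GUNICORN_WORKERS exists in this PR; the other variable names are assumptions for illustration):

```python
import os

# Only GUNICORN_WORKERS is read by the wrapper in this PR; the other
# environment variable names below are hypothetical.
options = {
    "workers": int(os.environ.get("GUNICORN_WORKERS", "4")),
    # keep one copy of the app in the master process before forking
    "preload_app": os.environ.get("GUNICORN_PRELOAD", "false").lower() == "true",
    # recycle each worker after this many requests to bound leaked memory
    "max_requests": int(os.environ.get("GUNICORN_MAX_REQUESTS", "0")),
    # stagger recycling so workers do not all restart simultaneously
    "max_requests_jitter": int(os.environ.get("GUNICORN_MAX_REQUESTS_JITTER", "0")),
}
```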

Contributor Author

I think we would need to make preload optional, as I'm not sure it will work for Tensorflow graphs that need a separate session in each fork.

@ukclivecox ukclivecox merged commit fa9b745 into SeldonIO:master Aug 14, 2019
0.4.0 automation moved this from In progress to Done Aug 14, 2019
@ukclivecox ukclivecox deleted the gunicorn branch February 14, 2020 11:37