
WIP: Update python wrapper to use gunicorn #684

Merged
merged 9 commits into from
Aug 14, 2019

Conversation

ukclivecox
Contributor

@ukclivecox ukclivecox commented Jul 12, 2019

  • Adds a command line arg `--workers`, defaulting to 4 (gunicorn is only used when workers > 1)
  • Adds an optional `load()` method to the user_model prototype. This is called after `__init__` in each worker.
    • Needed for Tensorflow, where loading your graph must be done in each process rather than before gunicorn forks
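
A minimal sketch of how a user model using the optional `load()` hook might look (the class name and model-loading details are illustrative, not the wrapper's actual API):

```python
import os

class MyModel:
    """Hypothetical user model showing the optional load() hook."""

    def __init__(self):
        # Runs once in the parent process, before gunicorn forks.
        self.class_names = ["class:{}".format(i) for i in range(10)]
        self.model = None

    def load(self):
        # Called after __init__ in each worker, i.e. after the fork,
        # so per-process state (e.g. a TensorFlow session) belongs here.
        print("Loading model in pid", os.getpid())
        self.model = object()  # placeholder for real model loading

    def predict(self, X, feature_names=None):
        return X
```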

Fixes #453
Fixes #383
Fixes #674

@ukclivecox ukclivecox requested a review from gsunner July 12, 2019 10:17
@ukclivecox ukclivecox changed the title Update python wrapper to use gunicorn WIP: Update python wrapper to use gunicorn Jul 12, 2019
@ryandawsonuk ryandawsonuk mentioned this pull request Jul 22, 2019
@axsaucedo
Contributor

It would be worth adding the flag to the s2i wrapper (the run file) so that the number of workers can be changed at deploy time.

The reason it is important to be able to change the number of workers is that there are situations where you would only want 1 worker per pod - for example, tasks with heavy resource usage, such as memory.

If you have an ML graph that takes 50GB of RAM, for example, you may want to scale horizontally instead; if you request 50GB for a pod and each worker loads its own copy, users may be confused that their pods keep going over their requests.

@ukclivecox
Contributor Author

@axsaucedo You can set the env variable GUNICORN_WORKERS to limit the number of workers, so I don't think an s2i change is needed.

@axsaucedo
Contributor

Ohh my bad, I had not seen the os.environ.get("GUNICORN_WORKERS", "4") - awesome! Thanks for the clarification @cliveseldon

@seldondev seldondev added size/L and removed size/M labels Jul 26, 2019
@alexlatchford
Contributor

I've been evaluating Seldon this week and this was on my list, nice job as I'm guessing this will land well within our time frame!

Not sure if this is the place to request this, but it would be great to include an option to choose the worker type; some of our models go out to a feature store at inference time to enrich input requests. We'd likely be doing that as part of an input_transformer component, which we could set to have just a single worker and have it feed a model component with many times more workers.

PS. Apologies if there is already a different pattern for this!

@alexlatchford
Contributor

alexlatchford commented Jul 26, 2019

Digging in some more and looking at the server implementations: they load the models into memory on the first prediction. This happens after the process fork, resulting in the model being loaded into memory as many times as there are workers.

If you also expose the preload_app option for Gunicorn and set it to True, and you load the model during the initial phase (i.e. not on the first network call), then only one copy is kept in memory; rather than duplicating the memory space, it stays accessible to all child processes. This is something we do here at Zillow to keep the memory footprint down in our current iteration of serving infrastructure while keeping a larger number of worker processes 😄

This does have some downsides as it's incompatible with the reload option which enables hot-code reloading (useful for local development) but we just turn off preload_app locally to fix that.
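
For illustration, the pattern described above can be expressed in a gunicorn config file; this is a sketch of the idea, not configuration shipped by this PR:

```python
# gunicorn.conf.py (illustrative; not part of this PR)
workers = 4
preload_app = True   # load the app, and the model with it, once in the
                     # master process before forking; children then share
                     # those pages copy-on-write instead of duplicating them
reload = False       # hot-code reloading is incompatible with preload_app
```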

@axsaucedo
Contributor

Very interesting @alexlatchford - that's one of the challenges I outlined above. The preload_app functionality would certainly allow the workers to load only one copy of the model, reducing the memory footprint. Of course there is still a challenge if the extracted features are large, but that is a separate problem. I am not sure how much work would be involved to extend the wrapper to support this - definitely worth checking out - but depending on that we may have to open a separate issue once the initial/basic move to gunicorn has landed.

@seldondev seldondev added size/XL and removed size/L labels Aug 7, 2019
@ukclivecox ukclivecox added this to In progress in 0.4.0 via automation Aug 8, 2019
@ukclivecox ukclivecox requested review from axsaucedo and removed request for gsunner August 8, 2019 10:06
@ukclivecox ukclivecox self-assigned this Aug 9, 2019
```python
self.class_names = ["class:{}".format(str(i)) for i in range(10)]

def load(self):
    print("Loading model", os.getpid())
```

We could use log.debug as convention, not critical tho


### Gunicorn workers

By default 4 gunicorn workers will be created for your model. If you wish to change this default then add the environment variable GUNICORN_WORKERS to the container for your model. An example is shown below:
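
The example that followed appears to have been lost in extraction; a sketch, assuming a standard Kubernetes container spec (container and image names are illustrative):

```yaml
# Sketch: setting GUNICORN_WORKERS on the model container
- name: classifier
  image: my-model-image:0.1
  env:
    - name: GUNICORN_WORKERS
      value: "2"
```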

If you are using gunicorn workers and make this variable overridable, you probably also need to consider (1) the preload option to reduce the memory usage, and (2) max_requests (probably along with max_requests_jitter) to restart the workers to avoid their memory leak. These are the opts in gunicorn command, but I guess there should be corresponding APIs that provide those options too.

Ref: http://docs.gunicorn.org/en/stable/settings.html
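
A sketch of how such options could be wired from environment variables (only GUNICORN_WORKERS exists in this PR; the other variable names are assumptions for illustration):

```python
import os

# Only GUNICORN_WORKERS is read by the wrapper in this PR; the other
# environment variable names below are hypothetical.
options = {
    "workers": int(os.environ.get("GUNICORN_WORKERS", "4")),
    # keep one copy of the app in the master process before forking
    "preload_app": os.environ.get("GUNICORN_PRELOAD", "false").lower() == "true",
    # recycle each worker after this many requests to bound leaked memory
    "max_requests": int(os.environ.get("GUNICORN_MAX_REQUESTS", "0")),
    # stagger recycling so workers do not all restart simultaneously
    "max_requests_jitter": int(os.environ.get("GUNICORN_MAX_REQUESTS_JITTER", "0")),
}
```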

Contributor Author

I think we would need to make preload optional, as I'm not sure it will work for Tensorflow graphs that need a separate session in each fork.

@ukclivecox ukclivecox merged commit fa9b745 into SeldonIO:master Aug 14, 2019
0.4.0 automation moved this from In progress to Done Aug 14, 2019
@ukclivecox ukclivecox deleted the gunicorn branch February 14, 2020 11:37