[Refactor] fork server processes earlier #1476
Changes from 7 commits
@@ -40,3 +40,6 @@ tracing:
+instrument:
+  namespace: BENTOML
+
 logging:
   level: INFO
@@ -0,0 +1,133 @@
# Copyright 2019 Atalaya Tech, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import logging
import multiprocessing
from typing import Optional

from bentoml.configuration.containers import BentoMLConfiguration, BentoMLContainer
from bentoml.utils import reserve_free_port

logger = logging.getLogger(__name__)


def start_prod_server(
    saved_bundle_path: str,
    port: Optional[int] = None,
    workers: Optional[int] = None,
    timeout: Optional[int] = None,
    enable_microbatch: Optional[bool] = None,
    enable_swagger: Optional[bool] = None,
    mb_max_batch_size: Optional[int] = None,
    mb_max_latency: Optional[int] = None,
    microbatch_workers: Optional[int] = None,
    config_file: Optional[str] = None,
):
    import psutil

    assert (
        psutil.POSIX
    ), "BentoML API Server production mode only supports POSIX platforms"

    config = BentoMLConfiguration(override_config_file=config_file)
    config.override(["api_server", "port"], port)
    config.override(["api_server", "workers"], workers)
    config.override(["api_server", "timeout"], timeout)
    config.override(["api_server", "enable_microbatch"], enable_microbatch)
    config.override(["api_server", "enable_swagger"], enable_swagger)
    config.override(["marshal_server", "max_batch_size"], mb_max_batch_size)
    config.override(["marshal_server", "max_latency"], mb_max_latency)
    config.override(["marshal_server", "workers"], microbatch_workers)

    if config.config['api_server'].get('enable_microbatch'):
(ssheng marked this conversation as resolved.)
        prometheus_lock = multiprocessing.Lock()
        # reserve a currently free port, then release it immediately so
        # the forked model server process can bind to it
        with reserve_free_port() as api_server_port:
            pass

        model_server_job = multiprocessing.Process(
            target=_start_prod_server,
            kwargs=dict(
                saved_bundle_path=saved_bundle_path,
                port=api_server_port,
                config=config,
                prometheus_lock=prometheus_lock,
            ),
            daemon=True,
        )
        model_server_job.start()

        try:
            _start_prod_batching_server(
                saved_bundle_path=saved_bundle_path,
                config=config,
                api_server_port=api_server_port,
                prometheus_lock=prometheus_lock,
            )
        finally:
            model_server_job.terminate()
    else:
        _start_prod_server(saved_bundle_path=saved_bundle_path, config=config)
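Note the `with reserve_free_port() as api_server_port: pass` idiom above: the context manager only picks an available port and releases it on exit, leaving it free for the child process to bind. A minimal sketch of how such a helper can work, assuming the usual bind-to-port-0 trick (illustrative, not necessarily BentoML's exact implementation):

```python
import socket
from contextlib import contextmanager


@contextmanager
def reserve_free_port(host="localhost"):
    # bind to port 0 so the OS assigns an unused port
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    port = sock.getsockname()[1]
    try:
        yield port
    finally:
        # release the port; there is a small race window before the
        # child process re-binds it
        sock.close()
```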
def _start_prod_server(
    saved_bundle_path: str,
    config: BentoMLConfiguration,
    port: Optional[int] = None,
Review thread:

Why do we need to explicitly have `port` here instead of taking it from the config?

I do not want to change the value of the config.

I understand the concern. Here is why we should restructure the config keys and terminology. Before we do that, I think there is an opportunity here to make the intermediate port injectable as well. First, we can introduce an intermediate port (for lack of a better name). In the container, we can introduce a provider that first checks if an intermediate port is defined and, if not, randomly reserves a port. Basically, a lot of the logic here can be refactored as a provider in …

It's great to have a provider for the …

Good call that …

UDS is superior, but we might have to leave the TCP option open to make remote marshaling a possibility. We can structure the config meaningfully to reflect this.
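A rough sketch of that provider idea, using the dependency_injector library that BentoML's containers are built on. The `intermediate_port` config key and `resolve_intermediate_port` helper are hypothetical names for illustration, not BentoML's actual API:

```python
from dependency_injector import containers, providers

from bentoml.utils import reserve_free_port


def resolve_intermediate_port(configured_port=None):
    # use the explicitly configured port when present
    if configured_port is not None:
        return configured_port
    # otherwise reserve a currently free port and hand it out
    with reserve_free_port() as port:
        pass  # released so the server process can bind it later
    return port


class Container(containers.DeclarativeContainer):
    config = providers.Configuration()
    # checks config first, falls back to a randomly reserved port
    intermediate_port = providers.Callable(
        resolve_intermediate_port,
        configured_port=config.api_server.intermediate_port,
    )
```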
… like this?

I'm thinking something like the following, where the user can choose the connector type or provide related configs. Basically, the schema for `connector` is a union of the UDS and TCP connector schemas. If UDS is chosen, … Or, if TCP is chosen, …

For UDS support, I'd like to have a simple … Your schema is fancy but looks too powerful for me.

After all, the config structure reflects our system design. We can continue the discussion in the new PR/issue.
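The inline config snippets in this thread did not survive extraction; as a purely hypothetical illustration (not BentoML's actual schema), a union-typed `connector` section could look like:

```yaml
# hypothetical schema sketch -- field names are illustrative
marshal_server:
  connector:
    type: uds                          # UDS variant
    path: /tmp/bentoml/api_server.sock
---
marshal_server:
  connector:
    type: tcp                          # TCP variant, keeps remote marshaling open
    host: localhost
    port: 5000
```

A tagged union like this would keep UDS as the default local transport while leaving TCP available for remote marshaling, as discussed above.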
    prometheus_lock: Optional[multiprocessing.Lock] = None,
):

    logger.info("Starting BentoML API server in production mode..")

    container = BentoMLContainer()
    container.config.from_dict(config.as_dict())

    from bentoml import server

    container.wire(packages=[server])

    if port is None:
        gunicorn_app = server.gunicorn_server.GunicornBentoServer(
            saved_bundle_path, prometheus_lock=prometheus_lock,
        )
    else:
        gunicorn_app = server.gunicorn_server.GunicornBentoServer(
            saved_bundle_path, port=port, prometheus_lock=prometheus_lock,
        )
    gunicorn_app.run()


def _start_prod_batching_server(
    saved_bundle_path: str,
    api_server_port: int,
Review thread:

Same question here. Why do we need to explicitly have `api_server_port` here instead of taking it from config?

Same as above.
    config: BentoMLConfiguration,
    prometheus_lock: Optional[multiprocessing.Lock] = None,
):

    logger.info("Starting BentoML Batching server in production mode..")

    container = BentoMLContainer()
    container.config.from_dict(config.as_dict())

    from bentoml import marshal, server

    container.wire(packages=[server, marshal])

    # avoid load model before gunicorn fork
    marshal_server = server.marshal_server.GunicornMarshalServer(
        bundle_path=saved_bundle_path,
        prometheus_lock=prometheus_lock,
        outbound_host="localhost",
        outbound_port=api_server_port,
    )
    marshal_server.run()
@@ -150,7 +150,9 @@ def __init__(
             BentoMLContainer.config.api_server.max_request_size
         ],
         zipkin_api_url: str = Provide[BentoMLContainer.config.tracing.zipkin_api_url],
+        outbound_unix_socket: str = None,
     ):
+        self.outbound_unix_socket = outbound_unix_socket
         self.outbound_host = outbound_host
         self.outbound_port = outbound_port
         self.outbound_workers = outbound_workers
@@ -178,6 +180,7 @@ def __init__(
             "or launch more microbatch instances to accept more concurrent connection.",
             self.CONNECTION_LIMIT,
         )
+        self._client = None
Review thread:

We probably don't need to lazily initialize the client here. The client is pretty much always needed, correct?

Yeah. But IMO we'd best only do value-assignment operations in `__init__`. In addition, in this case, an aiohttp client session should be initialized with a running asyncio event loop.
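To illustrate the second point: `aiohttp.ClientSession` attaches to the event loop that is running when it is constructed, so creating it in `__init__` (before the loop starts) risks binding it to the wrong loop or triggering deprecation warnings. A minimal sketch of the lazy pattern; the `Relay` class and the health-check endpoint are illustrative assumptions:

```python
import asyncio

import aiohttp


class Relay:
    def __init__(self):
        self._client = None  # assignment only; no I/O objects in __init__

    def get_client(self):
        # called from inside a coroutine, where a loop is already running
        if self._client is None:
            self._client = aiohttp.ClientSession(auto_decompress=False)
        return self._client


async def main():
    relay = Relay()
    client = relay.get_client()
    # assumes some service is listening on localhost:5000
    async with client.get("http://localhost:5000/healthz") as resp:
        print(resp.status)
    await client.close()


asyncio.run(main())
```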

     def set_outbound_port(self, outbound_port):
         self.outbound_port = outbound_port

@@ -187,6 +190,22 @@ def fetch_sema(self):
             self._outbound_sema = NonBlockSema(self.outbound_workers)
         return self._outbound_sema

||
def get_client(self): | ||
if self._client is None: | ||
jar = aiohttp.DummyCookieJar() | ||
if self.outbound_unix_socket: | ||
conn = aiohttp.UnixConnector(path=self.outbound_unix_socket,) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 for UDS. |
||
else: | ||
conn = aiohttp.TCPConnector(limit=30) | ||
self._client = aiohttp.ClientSession( | ||
connector=conn, auto_decompress=False, cookie_jar=jar, | ||
) | ||
return self._client | ||
|
||
def __del__(self): | ||
if self._client is not None and not self._client.closed: | ||
self._client.close() | ||
|
||
def add_batch_handler(self, api_route, max_latency, max_batch_size): | ||
''' | ||
Params: | ||
|
@@ -268,11 +287,14 @@ async def relay_handler(self, request):
                 span_name=f"[2]{url.path} relay",
             ) as trace_ctx:
                 headers.update(make_http_headers(trace_ctx))
-            async with aiohttp.ClientSession(auto_decompress=False) as client:
+            try:
+                client = self.get_client()
                 async with client.request(
                     request.method, url, data=data, headers=request.headers
                 ) as resp:
                     body = await resp.read()
+            except aiohttp.client_exceptions.ClientConnectionError:
+                return aiohttp.web.Response(status=503, body=b"Service Unavailable")
             return aiohttp.web.Response(
                 status=resp.status, body=body, headers=resp.headers,
             )
@@ -298,11 +320,9 @@ async def _batch_handler_template(self, requests, api_route):
             headers.update(make_http_headers(trace_ctx))
             reqs_s = DataLoader.merge_requests(requests)
             try:
-                async with aiohttp.ClientSession(auto_decompress=False) as client:
-                    async with client.post(
-                        api_url, data=reqs_s, headers=headers
-                    ) as resp:
-                        raw = await resp.read()
+                client = self.get_client()
+                async with client.post(api_url, data=reqs_s, headers=headers) as resp:
+                    raw = await resp.read()
             except aiohttp.client_exceptions.ClientConnectionError as e:
                 raise RemoteException(
                     e, payload=HTTPResponse(status=503, body=b"Service Unavailable")
Review thread:

What do you envision will be added under the /entrypoint package? What do we gain by moving to a new package?

The place before is /bentoml/server/__init__.py. I don't think it's a proper place to wire packages including bentoml.server itself.

It should be fine to wire one's own package. Taking a step back, we should think about how we'd like to expose our public APIs. Having a module like BentoMLController is one choice. What are some of the best practices in Python?

@bojiang @ssheng I think what @bojiang suggested last time, having an /api/ module for exposing public APIs, is probably the convention among Python-based DS/ML tools.

Discussed offline: we will keep these in the original location under server/__init__.py.
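As a side note on the wiring question above, here is a minimal, self-contained sketch of the dependency_injector pattern BentoML uses (`Provide`, `@inject`, `container.wire`), showing that wiring one's own module works. The container name and config keys are illustrative, not BentoML's:

```python
import sys

from dependency_injector import containers, providers
from dependency_injector.wiring import Provide, inject


class Container(containers.DeclarativeContainer):
    config = providers.Configuration()


@inject
def serve(port: int = Provide[Container.config.api_server.port]):
    # `port` is injected from the container's configuration
    print(f"serving on port {port}")


if __name__ == "__main__":
    container = Container()
    container.config.from_dict({"api_server": {"port": 5000}})
    container.wire(modules=[sys.modules[__name__]])  # wiring our own module
    serve()
```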