Describe the bug
In a configuration with a non-embedded runner, without batching and without worker parallelization, requests should be executed one after another in FIFO order. At the moment they are executed in LIFO order instead. In some cases this causes earlier requests to time out, because they never get a chance to run within the allotted time.
I worked around this for myself by replacing _queue.pop() with _queue.popleft() at the two places referenced below:
BentoML/src/bentoml/_internal/marshal/dispatcher.py, line 263 (commit eb0ad1a)
BentoML/src/bentoml/_internal/marshal/dispatcher.py, line 361 (commit eb0ad1a)
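For illustration only, a simplified sketch of the pattern involved (not the actual dispatcher code, which is more involved): incoming jobs are appended to a collections.deque, and whether the consumer takes them with pop() or popleft() decides the ordering.

```python
from collections import deque

# Producers append incoming jobs to the right end of the queue.
_queue = deque()
_queue.append("request-1")
_queue.append("request-2")
_queue.append("request-3")

# Current behavior: pop() takes from the right end, so the newest
# request is served first (LIFO).
newest_first = _queue.pop()       # "request-3"

# Proposed workaround: popleft() takes from the left end, so the
# oldest request is served first (FIFO).
oldest_first = _queue.popleft()   # "request-1"
```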
I'm not sure this workaround is free of side effects, but it works for me.
To reproduce
Make a simple service with a custom runner (a minimal sketch is given after these steps). Set the runner parameters:
SUPPORTED_RESOURCES = "nvidia.com/gpu"
SUPPORTS_CPU_MULTITHREADING = True
And use a BentoML configuration with adaptive batching and worker parallelization disabled.
Make sure the runner method takes a noticeable amount of time, for example 10 seconds.
Start the bentoml server and send several requests to your runner's API one after another, without waiting for the response to the previous one. I used several regular browser tabs.
The responses come back out of order: with 3 requests, for example, the answer arrives first for the first request, then for the third, and only then for the second.
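A minimal sketch of such a service, assuming the standard BentoML 1.x custom Runnable API; the names (SlowRunnable, slow_method, slow_service) and the 10-second sleep are placeholders for illustration, not my exact service:

```python
import time

import bentoml
from bentoml.io import Text


class SlowRunnable(bentoml.Runnable):
    # Runner parameters from the reproduction steps
    # (Runnable expects a tuple of resource names here).
    SUPPORTED_RESOURCES = ("nvidia.com/gpu",)
    SUPPORTS_CPU_MULTITHREADING = True

    @bentoml.Runnable.method(batchable=False)
    def slow_method(self, text: str) -> str:
        # Simulate work that takes a noticeable amount of time.
        time.sleep(10)
        return text


slow_runner = bentoml.Runner(SlowRunnable, name="slow_runner")
svc = bentoml.Service("slow_service", runners=[slow_runner])


@svc.api(input=Text(), output=Text())
async def slow(text: str) -> str:
    # Each API call is forwarded to the (non-embedded) runner.
    return await slow_runner.slow_method.async_run(text)
```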
Expected behavior
The BentoML server must respond in the order in which requests are received.
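One way to check the ordering without browser tabs is to fire several requests concurrently and record when each response returns. A rough sketch, assuming the service above is served locally on the default port 3000 and the endpoint is named slow (adjust the URL and payload to your service):

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:3000/slow"  # adjust to your service and endpoint


def call(i: int) -> tuple[int, float]:
    # Block until the response for request i arrives, then record the time.
    requests.post(
        URL,
        data=f"request-{i}",
        headers={"content-type": "text/plain"},
        timeout=60,
    )
    return i, time.monotonic()


with ThreadPoolExecutor(max_workers=3) as pool:
    futures = []
    for i in (1, 2, 3):
        # Submit requests in order, slightly apart, without waiting
        # for the previous response.
        futures.append(pool.submit(call, i))
        time.sleep(0.5)

# Sort by completion time to see which request finished first.
completed = sorted((f.result() for f in futures), key=lambda r: r[1])
print([i for i, _ in completed])
# FIFO dispatching should print [1, 2, 3]; the LIFO behavior described
# above comes back as [1, 3, 2] for three requests.
```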
Environment
Environment variable
System information
bentoml: 1.2.2
python: 3.10.13
platform: Linux-5.15.0-100-generic-x86_64-with-glibc2.35
uid_gid: 1000:1000
conda: 23.11.0
in_conda_env: True
conda_packages:
pip_packages: