I have a machine with two GPUs. I ran the model with the openllm start command and everything went well:
CUDA_VISIBLE_DEVICES=0,1 TRANSFORMERS_OFFLINE=1 openllm start mistral --model-id mymodel --dtype float16 --gpu-memory-utilization 0.95 --workers-per-resource 0.5
In this case two processes appear, one on each GPU: one for the service and another for the Ray instance.
When I run the start command without --gpu-memory-utilization 0.95 --workers-per-resource 0.5, only one GPU runs the service and a CUDA out-of-memory error occurs.
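As a rough sanity check (assuming a 7B-parameter Mistral-class model in float16; the parameter count is my guess, not measured), the weights alone need about 13 GiB, so a single GPU can easily run out of memory:
# Back-of-the-envelope estimate; 7B parameters is an assumption
python3 -c 'print(f"~{7_000_000_000 * 2 / 1024**3:.1f} GiB")'   # float16 = 2 bytes/param -> ~13.0 GiB, before KV cache
Sharding across both GPUs roughly halves the per-GPU weight footprint, which would explain why the run with --workers-per-resource 0.5 succeeds.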
However, when I build the image, follow the steps to create a container, and then run the Docker image, it raises a CUDA out-of-memory error, just like the second case without these args: --gpu-memory-utilization 0.95 --workers-per-resource 0.5
@aarnphm What is the difference between the previous two cases, such that the first case launches two processes, one for the Ray worker and the other for the BentoML service (i.e., when using --gpu-memory-utilization 0.95 --workers-per-resource 0.5)?
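My current understanding, stated as an assumption rather than something confirmed from the OpenLLM source, is that --workers-per-resource 0.5 on two GPUs amounts to tensor parallelism of degree 2, which is why vLLM spawns a Ray worker alongside the service. The vLLM-level equivalent of the working invocation would be roughly:
# Hedged sketch: approximate vLLM equivalent of the working openllm start flags
# (assumption: --workers-per-resource 0.5 on 2 GPUs ~ --tensor-parallel-size 2)
python -m vllm.entrypoints.api_server \
    --model mymodel \
    --dtype float16 \
    --gpu-memory-utilization 0.95 \
    --tensor-parallel-size 2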
Describe the bug
I have a machine with two GPUs. I ran the model with the openllm start command and everything went well:
CUDA_VISIBLE_DEVICES=0,1 TRANSFORMERS_OFFLINE=1 openllm start mistral --model-id mymodel --dtype float16 --gpu-memory-utilization 0.95 --workers-per-resource 0.5
However, when I build the image, follow the steps to create a container, and then run the Docker image, it raises a CUDA out-of-memory error, just like the second case without these args:
--gpu-memory-utilization 0.95 --workers-per-resource 0.5
Steps:
openllm build mymodel --backend vllm --serialization safetensors
bentoml containerize mymodel-service:12345 --opt progress=plain
docker run --rm --gpus all -p 3000:3000 -it mymodel-service:12345
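To rule out the container simply not seeing both GPUs, a quick sanity check is to run nvidia-smi inside the image (the NVIDIA container runtime should inject the binary when --gpus all is passed; overriding the entrypoint is only for this one-off check):
# One-off check: confirm both GPUs are visible inside the container
docker run --rm --gpus all --entrypoint nvidia-smi mymodel-service:12345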
To reproduce
No response
Logs
No response
Environment
$ bentoml -v
bentoml, version 1.1.11
$ openllm -v
openllm, 0.4.45.dev2 (compiled: False)
Python (CPython) 3.11.7
System information (Optional)
No response