
[Bug]: a single LoRA request error makes all processing requests fail #4879

Open
jinzhen-lin opened this issue May 17, 2024 · 0 comments · May be fixed by #5173
Labels
bug Something isn't working

Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

vLLM loads LoRA checkpoints during model execution:

https://github.com/vllm-project/vllm/blob/v0.4.2/vllm/worker/model_runner.py#L789-L790

https://github.com/vllm-project/vllm/blob/v0.4.2/vllm/lora/worker_manager.py#L138-L172

So when loading a LoRA checkpoint raises an error (e.g. the checkpoint's LoRA rank exceeds `max_lora_rank`), all in-flight requests fail, regardless of whether they use LoRA.
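For illustration, here is a minimal, self-contained sketch of the failure mode (not vLLM's actual code; `Request`, `load_lora`, and `MAX_LORA_RANK` are hypothetical stand-ins): because adapter loading happens lazily inside the per-step batch execution, one bad adapter raises out of the step and takes down every request in the batch, including requests that never asked for LoRA.

```python
# Simplified stand-in for the execute_model path described above.
MAX_LORA_RANK = 16  # plays the role of vLLM's max_lora_rank setting


class Request:
    def __init__(self, rid: str, lora_rank: int | None = None):
        self.rid = rid
        self.lora_rank = lora_rank  # None => request does not use LoRA


def load_lora(rank: int) -> None:
    # Mirrors the check that fails when a checkpoint's rank is too large.
    if rank > MAX_LORA_RANK:
        raise ValueError(f"LoRA rank {rank} > max_lora_rank {MAX_LORA_RANK}")


def execute_model(batch: list[Request]) -> dict[str, str]:
    # LoRA checkpoints are loaded inside the execution step for the whole
    # batch, so a single bad adapter aborts every request in the batch.
    for req in batch:
        if req.lora_rank is not None:
            load_lora(req.lora_rank)
    return {req.rid: "ok" for req in batch}


batch = [Request("plain"), Request("good-lora", 8), Request("bad-lora", 64)]
try:
    execute_model(batch)
except ValueError as e:
    # The exception propagates out of the step: all three requests fail,
    # including "plain", which does not use LoRA at all.
    print(f"whole batch failed: {e}")
```

A fix would presumably catch the adapter-load error before or during the step and fail only the offending request, which appears to be what the linked PR #5173 is intended to address.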
