Model freezing when handling simultaneous user requests (is this a multi-threading issue?) #1995


Description:

I am using the llama-cpp-python library to serve multiple simultaneous user requests. When I simulate two users making requests at the same time, the model freezes and neither request gets a response. I suspect a multi-threading or concurrency issue.

Steps to Reproduce:
Model: unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf

Scenario 1: Single User Request
1. Initialize the model using llama-cpp-python.
2. Send a single user request.
3. Observe that the model responds correctly and outputs the expected result.

Scenario 2: Two Concurrent User Requests
1. Initialize the model using llama-cpp-python.
2. Simulate two users by sending two requests at the same time (in separate threads or processes).
3. Observe that the model freezes and responds to neither request; both calls hang or time out.
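
A harness along the following lines can drive the two-request scenario and make the hang visible as a timeout rather than a stuck terminal. It is a minimal sketch under assumptions: `fire_concurrently` and `fake_generate` are hypothetical names, and `fake_generate` only stands in for a real call on a shared model instance (for example a bound `create_chat_completion` method), so the sketch runs without a model loaded.

```python
import threading
import time

TIMEOUT = "TIMEOUT"  # marker for a call that did not return in time


def fire_concurrently(generate, prompts, timeout_s=60.0):
    """Call generate(prompt) for all prompts at once, one thread per prompt.

    Returns a list aligned with prompts. Each entry is the call's result,
    an "ERROR: ..." string if the call raised, or TIMEOUT if it had not
    returned within timeout_s seconds overall.
    """
    results = [TIMEOUT] * len(prompts)
    barrier = threading.Barrier(len(prompts))

    def worker(index, prompt):
        barrier.wait()  # release all calls at (nearly) the same moment
        try:
            results[index] = generate(prompt)
        except Exception as exc:  # report errors instead of calling them timeouts
            results[index] = f"ERROR: {exc!r}"

    # Daemon threads, so a hung call cannot keep the interpreter alive.
    threads = [
        threading.Thread(target=worker, args=(i, p), daemon=True)
        for i, p in enumerate(prompts)
    ]
    for t in threads:
        t.start()

    deadline = time.monotonic() + timeout_s
    for t in threads:
        t.join(timeout=max(0.0, deadline - time.monotonic()))

    return list(results)  # snapshot; late finishers do not alter it


def fake_generate(prompt):
    # Hypothetical stand-in for the real model call.
    time.sleep(0.05)
    return f"echo: {prompt}"


if __name__ == "__main__":
    print(fire_concurrently(fake_generate, ["user A", "user B"], timeout_s=5.0))
```

Swapping `fake_generate` for the real model call should reproduce the freeze as two `TIMEOUT` entries, if the behavior is as described above.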
﻿
Expected Behavior:
The model should handle both requests concurrently and respond to both users without freezing or encountering delays.

Actual Behavior:
With two simultaneous requests, the model freezes and does not respond to either one.

System Information:
Operating System: Ubuntu 20.04
Python version: 3.10
llama-cpp-python version: 0.3.7
Hardware: 4 x RTX 4090

Additional Information:
I suspect the issue is related to multi-threading or multi-processing, since the threads or processes may be sharing resources such as memory or the GPU. Running the two requests in separate threads results in a freeze. This could be caused by a lack of thread safety in the library or by resource contention.
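
One way to test the shared-resource hypothesis is to put every call into the model behind a single lock and check whether the freeze goes away. The sketch below is an illustration under assumptions, not a statement about llama-cpp-python internals: `SerializedModel` and `peak_concurrency` are hypothetical names, and `generate` can be any callable, such as a bound method of one shared model instance.

```python
import threading
import time


class SerializedModel:
    """Hypothetical wrapper that lets only one thread call the model at a time."""

    def __init__(self, generate):
        self._generate = generate
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        # Other callers block here until the current call finishes.
        with self._lock:
            return self._generate(*args, **kwargs)


def peak_concurrency(wrap, n_threads=4):
    """Run n_threads callers against a fake model and return the largest
    number of callers that were inside the model call at the same time."""
    state = {"active": 0, "peak": 0}
    guard = threading.Lock()  # protects the counters only

    def fake_generate(prompt):
        # Hypothetical stand-in for the real model call.
        with guard:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.02)
        with guard:
            state["active"] -= 1
        return f"echo: {prompt}"

    model = SerializedModel(fake_generate) if wrap else fake_generate
    threads = [
        threading.Thread(target=model, args=(f"user {i}",))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


if __name__ == "__main__":
    print("peak callers with lock:", peak_concurrency(wrap=True))
```

If the freeze disappears with such a wrapper around the real call, that would point toward contention on the shared instance. The cost is that requests are then processed one at a time.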
﻿
Questions:
1. Is llama-cpp-python thread-safe, and does it support concurrent requests in a multi-threaded environment?
2. Are there recommended practices or solutions for handling multiple concurrent user requests?
3. Would using a multi-process approach be a better solution for this use case?
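
To make the second question concrete, one generic pattern is to give a single worker thread exclusive ownership of the model and feed it requests through a queue. Whether this is the recommended approach for llama-cpp-python is part of what is being asked here. `ModelWorker` is a hypothetical name, and `generate` is any callable standing in for the real model call.

```python
import queue
import threading
from concurrent.futures import Future


class ModelWorker:
    """Hypothetical single-owner worker: only its own thread calls generate."""

    def __init__(self, generate):
        self._generate = generate
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, prompt):
        """Enqueue a prompt and return a Future that will hold the reply."""
        future = Future()  # created by hand for illustration
        self._requests.put((prompt, future))
        return future

    def close(self):
        """Finish the queued work, then stop the worker thread."""
        self._requests.put(None)  # sentinel
        self._thread.join()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:
                break
            prompt, future = item
            try:
                future.set_result(self._generate(prompt))
            except Exception as exc:
                future.set_exception(exc)


if __name__ == "__main__":
    worker = ModelWorker(lambda prompt: f"echo: {prompt}")
    replies = [worker.submit("user A"), worker.submit("user B")]
    print([r.result(timeout=5) for r in replies])
    worker.close()
```

Any number of request threads can call `submit` safely, and requests are answered in arrival order. This avoids concurrent access to the model but does not run requests in parallel.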
﻿
I would appreciate any insights or guidance on how to resolve this issue.

Thank you!
