Skip to content

Model freezing when handling simultaneous user requests (is multi-threading issue?) #1995

@zyj13547810610

Description

@zyj13547810610

Description:

I am using the llama-cpp-python library to handle multiple simultaneous user requests. However, when simulating two users making requests at the same time, the model freezes and does not respond. I suspect there might be an issue with multi-threading or concurrency.

Steps to Reproduce:
use model:unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf

Scene 1: Single User Request
Initialize the model using llama-cpp-python.

Send a single user request.

Observe that the model responds correctly and outputs the expected result.

Scene 2: Simulating Two Concurrent User Requests
Initialize the model using llama-cpp-python.

Simulate two users by sending two requests at the same time (in separate threads or processes).

Observe that the model freezes and does not respond to either request, causing a timeout or hanging.

Expected Behavior:
The model should handle both requests concurrently, providing responses to both users without freezing or encountering delays.

Actual Behavior:
When simulating two users, the model freezes and does not respond to either request.

System Information:
Operating System: [ Ubuntu 20.04]

Python version: [3.10]

llama-cpp-python version: [0.3.7]

Hardware: [4090*4]

Additional Information:
I suspect this issue is related to multi-threading or multi-processing, as the model might be sharing resources (e.g., memory, GPU) between threads or processes. I have tried running both requests in separate threads, but it results in a freeze. This problem might be caused by a lack of thread-safety in the library or resource contention.

Questions:
Is llama-cpp-python thread-safe, and does it support concurrent requests in a multi-threaded environment?

Are there recommended practices or solutions for handling multiple concurrent user requests?

Would using a multi-process approach be a better solution for this use case?

I would appreciate any insights or guidance on how to resolve this issue.

Thank you!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions