bug: Not able to start tiiuae/falcon-7b #41
Falcon requires a lot of resources to run, even during inference. This has to do with the model having to compute all of the matrices through the attention layers. On 4× A10G, the average latency I'm seeing is around 140s.
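To put the resource comments in context, here is a back-of-envelope estimate of the GPU memory needed just to hold the weights of a 7B-parameter model; activations and the KV cache add more on top, which is consistent with a single consumer GPU struggling.

```python
# Rough GPU memory estimate for loading a 7B-parameter model.
# Weights only; activations and KV cache are not counted.
params = 7e9                 # tiiuae/falcon-7b parameter count
weights_fp16_gb = params * 2 / 1e9  # 2 bytes/param in fp16 -> ~14 GB
weights_fp32_gb = params * 4 / 1e9  # 4 bytes/param in fp32 -> ~28 GB
print(weights_fp16_gb, weights_fp32_gb)
```

This is why an A10G (24 GB) is tight in fp16 and insufficient in fp32, while an A100 (40 or 80 GB) should fit the weights comfortably.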
Hey, yes, I know, but in my case I don't think it is a resource problem. It's not about the response time; it is not responding at all. On Paperspace I created a dedicated A100 GPU instance with 12 CPUs and 90 GB of memory, with no additional services running on it. That's why I think the problem is visible in the logs:
Got it, I will take a look.
I was only able to run Falcon on g5.24xlarge, which has 96 GB GPU memory and 384 GB RAM :)
Wow |
Describe the bug
Hi there,
I followed the instruction on GitHub to start tiiuae/falcon-7b.
pip install "openllm[falcon]"
openllm start falcon --model-id tiiuae/falcon-7b
Then, when calling localhost:3000 for the first time, the request times out after 30 seconds.
On the second attempt it returns the output below (see the logs) and then also times out after some time.
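For reference, this is a minimal sketch of how the server can be queried once it is up. It assumes the `/v1/generate` route exposed by openllm 0.1.x (check the routes printed by `openllm start` for your version). The generous timeout matters here: the first request can also trigger model loading, which may take minutes for a model like Falcon, and a short client timeout would then look like an unresponsive server.

```python
import json
from urllib import request


def build_payload(prompt: str) -> bytes:
    # The generate endpoint in openllm 0.1.x accepts a JSON body
    # with a "prompt" key (an assumption; verify for your version).
    return json.dumps({"prompt": prompt}).encode("utf-8")


def query_llm(prompt: str,
              host: str = "http://localhost:3000",
              timeout: float = 600.0):
    req = request.Request(
        f"{host}/v1/generate",
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    # Long timeout: the first call may block on model loading.
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

If the request still hangs well past the model-load window, the problem is likely in the server itself rather than client-side timeouts, which is what the logs should show.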
Thanks in advance!
To reproduce
No response
Logs
Environment
bentoml: 1.0.22
openllm: 0.1.8
platform: paperspace