Why is the latest version 2x slower? #71
I also experienced a significant slowdown that took performance from "barely tolerable" to "unusable" on my system. It's particularly noticeable when initially loading the model (in oobabooga), which used to take single-digit seconds and now takes minutes.
Yes, very slow. The problem seems to be on llama.cpp's end; I've seen a few issues about speed recently.
Can we install the older version again until this is fixed? How? |
@CyberTimon totally, you can check out the PyPI history here -> https://pypi.org/project/llama-cpp-python/#history and try out older versions. You can just pin an older release with pip (see the sketch below).
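For example, assuming 0.1.27 (the pre-regression version mentioned in this thread) is the one you want to roll back to:

```bash
# Pin the older release from PyPI instead of the latest one
pip install llama-cpp-python==0.1.27
```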
I'm feeling this as well. Running llama.cpp manually is noticeably faster than running it through the Python API.
Just checked it. llama.cpp is 40 ms per token for me and the Python bindings are 200 ms per token, so the bindings are much slower. Sadly, downgrading to version 0.1.27 is still slow.
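For anyone who wants to reproduce a rough per-token number on the bindings side, here is a minimal sketch (the model path is a placeholder; `Llama` and its call interface are the standard llama-cpp-python API). The wall-clock time includes prompt evaluation, so treat the result as an approximation rather than a precise match for llama.cpp's own per-token timing:

```python
import time
from llama_cpp import Llama

# Placeholder path: point this at the same GGML model used with plain llama.cpp
llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")

start = time.perf_counter()
out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
elapsed = time.perf_counter() - start

# The response follows an OpenAI-style schema, so the generated token count is available
generated = out["usage"]["completion_tokens"]
print(f"{elapsed / generated * 1000:.1f} ms per token (wall clock, includes prompt eval)")
```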
There have been lots of performance enhancements in upstream llama.cpp. Can someone confirm that this performance regression is fixed?
0.1.32 is 2x slower than 0.1.27
I tried using `use_mlock=True`; it warned me about RLIMIT and I had to run `ulimit -l unlimited` temporarily, but it still didn't improve. Is anyone else getting the same speed dip?
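For reference, a minimal sketch of the setup being described (the model path is a placeholder; `use_mlock` is the constructor parameter exposed by the bindings, and it is what triggers the RLIMIT_MEMLOCK warning when the memlock limit is too low):

```python
from llama_cpp import Llama

# use_mlock=True asks llama.cpp to lock the model weights in RAM.
# Without a raised memlock limit (ulimit -l unlimited) the lock can fail
# and the library warns about RLIMIT_MEMLOCK.
llm = Llama(
    model_path="./models/7B/ggml-model-q4_0.bin",  # placeholder path
    use_mlock=True,
)

print(llm("Hello", max_tokens=8)["choices"][0]["text"])
```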