
Why is the latest version 2x slower? #71

Closed
Bloob-beep opened this issue Apr 12, 2023 · 7 comments

Comments

@Bloob-beep

0.1.32 is 2x slower than 0.1.27 for me.

I tried `use_mlock=True`; it warned me about RLIMIT, so I temporarily raised the limit with `ulimit -l unlimited`, but it still didn't improve.
Is anyone else seeing the same speed dip?
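For anyone hitting the same RLIMIT warning: you can check the memlock limit from Python before enabling `use_mlock`. This is a minimal stdlib-only sketch; `mlock_limit_ok` and the 4 GiB model size are illustrative assumptions, not part of llama-cpp-python's API (only `use_mlock=True` is):

```python
import resource


def mlock_limit_ok(required_bytes: int) -> bool:
    """Check whether RLIMIT_MEMLOCK allows locking `required_bytes`.

    With use_mlock=True the model weights get mlock()'d; if the soft
    limit is below the model size, the lock fails and the loader
    warns about RLIMIT, as described above.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == resource.RLIM_INFINITY or soft >= required_bytes


# A 7B q4 model is roughly 4 GiB (illustrative figure).
model_bytes = 4 * 1024**3
if not mlock_limit_ok(model_bytes):
    print("Raise the limit first (e.g. `ulimit -l unlimited`), "
          "or load with use_mlock=False.")
```

Note that `ulimit -l unlimited` only affects the shell it is run in and its children, which is why the change above was temporary.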

@cmoncure

I also experienced a significant slowdown that took performance from "barely tolerable" to "unusable" on my system. It's most noticeable when initially loading the model (in oobabooga), which used to take single-digit seconds and now takes minutes.

@nigh8w0lf

Yes, very slow here too. The problem seems to be on llama.cpp's end; I've seen a few issues about speed there recently.

@CyberTimon

Can we install the older version again until this is fixed? How?

@abetlen
Owner

abetlen commented Apr 12, 2023

@CyberTimon totally, you can check the PyPI release history here: https://pypi.org/project/llama-cpp-python/#history and try out older versions.

You can just `pip install llama-cpp-python==<version>`

@AndreiSva

I'm seeing this as well. Running llama.cpp directly is noticeably faster than running it through the API.

@CyberTimon

Just checked it. llama.cpp takes ~40 ms per token for me and the Python bindings take ~200 ms per token, so they're much slower. Sadly, downgrading to version 0.1.27 is still slow.
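For anyone wanting to reproduce a comparison like the one above, here is a rough timing harness. It is a sketch: `generate_token` is a hypothetical stand-in for whichever backend you're measuring (a llama.cpp subprocess, or a call into the Python bindings), not a real API of either project:

```python
import time
from typing import Callable


def ms_per_token(generate_token: Callable[[], str], n_tokens: int = 32) -> float:
    """Average wall-clock milliseconds per generated token."""
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate_token()
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / n_tokens


# Example with a dummy generator; swap in the backend you want to
# time to compare the two paths (e.g. the 40 ms vs 200 ms reported).
if __name__ == "__main__":
    dummy = lambda: "tok"
    print(f"{ms_per_token(dummy):.3f} ms/token")
```

Averaging over a few dozen tokens smooths out per-call jitter; for a fair comparison, discard the first call, since it typically includes one-time setup cost.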

@gjmulder
Contributor

gjmulder commented May 15, 2023

There have been lots of performance enhancements in upstream llama.cpp, and I'm now getting very good performance.

Can someone confirm that this performance regression is fixed?

@gjmulder closed this as not planned (won't fix, can't repro, duplicate, stale) on May 22, 2023
xaptronic pushed a commit to xaptronic/llama-cpp-python that referenced this issue on Jun 13, 2023