Why is the latest version 2x slower? #71
I also experienced a significant slowdown that took performance from "barely tolerable" to "unusable" on my system. It's particularly noticeable when initially loading the model (in oobabooga), which used to take single-digit seconds and now takes minutes.
Yes, very slow. The problem seems to be on llama.cpp's end; I've seen a few issues about speed recently.
Can we install the older version again until this is fixed? How? |
@CyberTimon totally, you can check out the PyPI history here -> https://pypi.org/project/llama-cpp-python/#history and try out older versions. You can just pin an older release with pip (see the sketch below).
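For example, assuming 0.1.27 (the pre-regression version mentioned in this thread) is the one you want to roll back to:

```bash
# Pin the older release from PyPI instead of the latest one
pip install llama-cpp-python==0.1.27
```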
I'm feeling this as well. Running llama.cpp manually is noticeably faster than running it through the Python API.
Just checked it. llama.cpp is 40 ms per token for me and the Python bindings are 200 ms per token, so the bindings are much slower. Sadly, downgrading to version 0.1.27 is still slow.
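For anyone who wants to reproduce a rough per-token number on the bindings side, here is a minimal sketch (the model path is a placeholder; `Llama` and its call interface are the standard llama-cpp-python API). The wall-clock time includes prompt evaluation, so treat the result as an approximation rather than a precise match for llama.cpp's own per-token timing:

```python
import time
from llama_cpp import Llama

# Placeholder path: point this at the same GGML model used with plain llama.cpp
llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")

start = time.perf_counter()
out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
elapsed = time.perf_counter() - start

# The response follows an OpenAI-style schema, so the generated token count is available
generated = out["usage"]["completion_tokens"]
print(f"{elapsed / generated * 1000:.1f} ms per token (wall clock, includes prompt eval)")
```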
There have been lots of performance enhancements in upstream llama.cpp. Can someone confirm that this performance regression is fixed?
0.1.32 is 2x slower than 0.1.27
I tried using `use_mlock=True`; it warned me about RLIMIT and I had to run `ulimit -l unlimited` temporarily, but it still didn't improve. Is anyone else getting the same speed dip?
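For reference, a minimal sketch of the setup being described (the model path is a placeholder; `use_mlock` is the constructor parameter exposed by the bindings, and it is what triggers the RLIMIT_MEMLOCK warning when the memlock limit is too low):

```python
from llama_cpp import Llama

# use_mlock=True asks llama.cpp to lock the model weights in RAM.
# Without a raised memlock limit (ulimit -l unlimited) the lock can fail
# and the library warns about RLIMIT_MEMLOCK.
llm = Llama(
    model_path="./models/7B/ggml-model-q4_0.bin",  # placeholder path
    use_mlock=True,
)

print(llm("Hello", max_tokens=8)["choices"][0]["text"])
```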