Garbled output on very long prompts #339
Thank you for reporting this. I think that this may be a problem with needing Llama cache shifting. Edit: I opened #341, can you please try to reproduce it there?
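For context on what "cache shifting" means here, a hedged sketch in plain Rust (the names and layout are hypothetical, not the mistral.rs API): once the KV cache reaches the sliding-window capacity, the oldest positions must be evicted; if a backend keeps appending past the window instead, attention over long prompts can degenerate.

```rust
// Hypothetical sketch of KV cache shifting, not the mistral.rs implementation.
struct KvCache {
    capacity: usize,       // e.g. the model's sliding window size
    keys: Vec<Vec<f32>>,   // one cached key vector per position
    values: Vec<Vec<f32>>, // one cached value vector per position
}

impl KvCache {
    fn push(&mut self, k: Vec<f32>, v: Vec<f32>) {
        if self.keys.len() == self.capacity {
            // Shift: drop the oldest position to make room (a real cache
            // would use a ring buffer rather than an O(n) remove).
            self.keys.remove(0);
            self.values.remove(0);
        }
        self.keys.push(k);
        self.values.push(v);
    }
}

fn main() {
    let mut cache = KvCache { capacity: 3, keys: vec![], values: vec![] };
    for t in 0..5 {
        cache.push(vec![t as f32], vec![t as f32]);
    }
    // After 5 pushes with capacity 3, only positions 2, 3, 4 remain.
    println!("{:?}", cache.keys);
}
```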
@LLukas22, it looks like this issue stretches as far back as 4ffe68d (v0.1.2). I can reproduce it with Mistral and also with Llama, but not with Phi3 128k. The sliding window appears to be at fault for Mistral; the Llama case is especially strange because the context length is 8k, so the 7368-token prompt fits entirely within it. The v0.1.2 code used the same masking strategy as the current Candle method, while we currently use one similar to the implementation here. Code from v0.1.2, using the Candle method:
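A minimal sketch of the kind of mask this builds, in plain Rust rather than Candle tensor ops (a hypothetical reconstruction, not the literal v0.1.2 snippet): query position `i` may attend to key position `j` only if `j` is not in the future and not more than `sliding_window` positions in the past; blocked entries get `-inf` so softmax zeroes them out.

```rust
// Candle-style combined causal/sliding-window mask (sketch).
fn causal_sliding_window_mask(tgt_len: usize, sliding_window: usize) -> Vec<f32> {
    (0..tgt_len)
        .flat_map(|i| {
            (0..tgt_len).map(move |j| {
                // Blocked if j is in the future, or too far in the past.
                if i < j || j + sliding_window < i {
                    f32::NEG_INFINITY
                } else {
                    0.0
                }
            })
        })
        .collect()
}

fn main() {
    // Print a small mask to eyeball the banded lower-triangular structure.
    let n = 6;
    for row in causal_sliding_window_mask(n, 3).chunks(n) {
        println!("{row:?}");
    }
}
```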
Core of current method:
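Again as a hedged sketch (the exact code is not reproduced here): a two-pass construction in the style of HF transformers' sliding-window causal mask, which the referenced implementation appears to resemble. First open the causal lower triangle, then re-block everything that has slid out of the window.

```rust
// HF-transformers-style sliding-window mask (sketch, not the actual code).
fn sliding_window_mask(tgt_len: usize, sliding_window: usize) -> Vec<Vec<f32>> {
    let mut mask = vec![vec![f32::NEG_INFINITY; tgt_len]; tgt_len];
    // Pass 1: causal triangle — each position sees itself and the past.
    for i in 0..tgt_len {
        for j in 0..=i {
            mask[i][j] = 0.0;
        }
    }
    // Pass 2: re-block positions sliding_window or more steps back,
    // i.e. those with i - j >= sliding_window.
    for i in 0..tgt_len {
        for j in 0..(i + 1).saturating_sub(sliding_window) {
            mask[i][j] = f32::NEG_INFINITY;
        }
    }
    mask
}

fn main() {
    for row in sliding_window_mask(6, 3) {
        println!("{row:?}");
    }
}
```

Note that these two sketches disagree by exactly one position at the window edge (allowed `i - j <= sliding_window` versus `i - j < sliding_window`). I am not asserting the real implementations disagree, but an off-by-one at the window boundary is precisely the kind of thing worth checking when the sliding window is suspect.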
Do you see anything wrong with the way the sliding window is done, even in the v0.1.2 code? I will try to get this fixed soon; perhaps we can try to reproduce it with the Candle version.
@LLukas22, I think I figured it out. It looks like it works when ISQ is not used, but breaks as soon as ISQ is enabled. This is probably because we quantize slightly differently than the GGUF implementation does?
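As a toy illustration of that hypothesis (neither quantizer below is the actual ISQ or GGUF Q4K kernel): two 4-bit schemes that differ only in how they pick the per-block scale already reconstruct different weights, and such small per-block differences can compound over a very long prompt.

```rust
// Quantize a block of f32 weights to a signed 4-bit grid [-8, 7] and back.
fn quantize_dequantize(block: &[f32], scale: f32) -> Vec<f32> {
    block
        .iter()
        .map(|&x| {
            let q = (x / scale).round().clamp(-8.0, 7.0);
            q * scale
        })
        .collect()
}

fn main() {
    let block = [0.11f32, -0.52, 0.37, 0.9, -0.08, 0.63, -0.71, 0.25];
    let absmax = block.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    // Two plausible scale conventions; the reconstructed weights differ.
    let a = quantize_dequantize(&block, absmax / 7.0);
    let b = quantize_dequantize(&block, absmax / 8.0);
    println!("{a:?}");
    println!("{b:?}");
}
```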
Perhaps #377 will help this?
Describe the bug
Models seem to produce garbled output on very long prompts.
If I use the script sketched below to send a 7368-token prompt to a `mistralrs` server, I receive output consisting of nothing but `!` characters, meaning that the server just filled the rest of the context length with `!`. If I send the same prompt to an `ollama` server, I get the correct answer for the given prompt.
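The original script is not reproduced in this thread; the following is a minimal sketch of such a client, assuming the OpenAI-compatible `/v1/chat/completions` endpoint that `mistralrs-server` exposes (the port, `max_tokens`, and request shape are my assumptions).

```rust
// Assumed dependencies: reqwest = { version = "0.11", features = ["blocking", "json"] },
// serde_json = "1".
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read the long prompt (the prompt.txt attached to this issue).
    let prompt = fs::read_to_string("prompt.txt")?;
    let body = serde_json::json!({
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        "messages": [{ "role": "user", "content": prompt }],
        "max_tokens": 256,
    });
    // POST to the server (port 1234 is an assumption; pass --port accordingly).
    let resp: serde_json::Value = reqwest::blocking::Client::new()
        .post("http://localhost:1234/v1/chat/completions")
        .json(&body)
        .send()?
        .json()?;
    println!("{}", resp["choices"][0]["message"]["content"]);
    Ok(())
}
```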
The prompt I used:
prompt.txt
The server parameters:
--isq Q4K plain -m meta-llama/Meta-Llama-3-8B-Instruct -a llama
Latest commit
Release 0.1.9