Endless response using the Phi-4-mini-instruct model #1450
After further testing, the space character in the prompt now appears to be unrelated; I continue to see the problem with or without it, which I suppose is not all that surprising.
This is a known issue. You need to use the latest onnxruntime-genai model builder to regenerate the model (select k_quant_mixed as the quantization algorithm), and use the latest onnxruntime-1.22 (which we just released) to run it.
I don't know if I followed your instructions correctly, but I still see the problem with the regenerated model. This is the command line I used to run the model builder. Can you confirm whether it is correct?
Note that I had to use onnxruntime-1.21.1 to run the model builder, as I was getting the following error with onnxruntime-1.22.
However, I did run the HelloPhi app with the latest ONNX Runtime and the regenerated model. To do this, I added a direct reference to the 1.22.0 package.
@f2bo As of today, if you want to regenerate this Phi-4 ONNX model, you need to build both onnxruntime and onnxruntime-genai from source, because part of the k_quant work is not included in onnxruntime 1.22. After that, this command works for me. Then, when you run the model, you can use onnxruntime 1.22.
I've never built either one from source. I suspect it will probably take more time than I had originally anticipated. Let me see if I can find the time to give it a try and see how it goes. Thanks!
I built onnxruntime and onnxruntime-genai from source and regenerated the model, and this time there's a difference. I've only tested briefly, but prompts that previously resulted in repeating text no longer do. I do notice slower performance (about a 25% drop in tokens/sec). I don't know if that's expected. Thanks again!
@f2bo I think this is expected. Previously it was a pure int4 model (MatMul); now what you generated should be a mixed-precision model with some int4 and some int8 (MatMul), so you should observe slower performance. I haven't checked how much of a drop is expected.
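As a rough illustration of the precision side of this trade-off (a minimal sketch using generic symmetric per-tensor quantization, not the actual k_quant_mixed implementation), int8 weights round-trip much closer to the original floats than int4 weights, at the cost of moving twice as many bytes per MatMul:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=1024).astype(np.float32)  # stand-in for a weight tensor

def fake_quantize(w, bits):
    # Symmetric per-tensor quantization: scale floats into the signed
    # integer range for `bits`, round, then dequantize back to float.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

err_int4 = np.abs(w - fake_quantize(w, 4)).mean()
err_int8 = np.abs(w - fake_quantize(w, 8)).mean()
assert err_int8 < err_int4  # int8 stays far closer to the original weights
```

This is why moving the most sensitive MatMuls from int4 to int8 can fix degenerate repeating output while costing some tokens/sec.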
@jiafatom Got it. I'll test a bit more to make sure and then I'll close the issue. Thank you!
@f2bo Sure, we are working on some performance improvements that may help your case. (You may need to build onnxruntime from source later to include the new feature.)
I'm experiencing problems using the Phi-4-mini-instruct model where it will generate responses that begin to repeat text until `max_length` is reached.
To Reproduce
I see this problem with my application, currently using the `OnnxRuntimeGenAIChatClient`, but also with the sample applications in this repo, for example, the HelloPhi app.
To reproduce it, execute this app using the `cpu` execution provider and the Phi-4-mini-instruct model with the prompt "Explain how lasers work".
Run:
This results in the following output (only the bottom lines are shown in the screenshot).
Expected behavior
Response should end normally.
Desktop:
Additional context
The problem does not occur when running the HelloPhi app with a different model, for example, Phi-3-mini-128K-instruct.
I also noticed that the problem does not seem to exist in the Python sample chat app. After some digging, I narrowed the difference to the prompt template being used.
The HelloPhi app uses the following template (onnxruntime-genai/examples/csharp/HelloPhi/Program.cs, line 119 in c6ee481):
whereas the Python app uses (onnxruntime-genai/examples/python/model-chat.py, line 53 in c6ee481):
Notice that, besides the newline characters, which do not seem to matter, there is a single space character following the {input} placeholder. Adding this space character to the prompt in the HelloPhi app seems to fix the problem. Conversely, the problem becomes reproducible in the Python code if the space character is removed.
Perhaps this is a known problem and the space was purposely added to the Python code, though it seems quite unexpected.
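The template difference described above can be sketched as follows. The template strings here are hypothetical approximations of the ones in Program.cs and model-chat.py (the exact strings live in those files); the sketch only demonstrates that, once newlines are ignored, the two prompts differ by a single space before the end token:

```python
# Assumed template strings, approximating the linked examples.
csharp_template = "<|user|>{input}<|end|><|assistant|>"       # HelloPhi (C#)
python_template = "<|user|>\n{input} <|end|>\n<|assistant|>"  # model-chat.py

def normalize(template: str) -> str:
    # Drop the newline differences, which the report says do not matter.
    return template.replace("\n", "")

prompt = "Explain how lasers work"
csharp_prompt = normalize(csharp_template).format(input=prompt)
python_prompt = normalize(python_template).format(input=prompt)

# The only remaining difference is the single space after {input}.
assert python_prompt == csharp_prompt.replace("<|end|>", " <|end|>")
```

A one-character difference in how the user turn is terminated changing decoding behavior suggests the space shifts tokenization around the `<|end|>` boundary, which is why it is worth confirming whether the Python template added it deliberately.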