Feature request: expose min_p #254
min_p really is a game changer
What API? Local?
Local, tbh. It doesn't have any kind of big performance hit, and it noticeably raises the coherency of interactions with just a minor tweak (I used to have a laptop with an experimental kobold.cpp build with min-p, and it fixed the majority of coherence issues even in much smaller models). I'm biased toward it right now since Android is literally all I have for inference, so consider me dedicated on-device Android feedback lol.
It will be easy enough to add. But out of curiosity, what is it meant to do? Is it a new parameter of llama.cpp or something?
I first heard of min-p through kalomaze's experimental kobold.cpp builds; it's quite fascinating. Personally I'm not 100% certain whether llama.cpp has implemented it, but it should be standardized. I can scour recent releases and see if llama.cpp has it. "Every possible token has a probability percentage attached to it. …" https://github.com/kalomaze/koboldcpp/releases/tag/minP Also, some other work from kalomaze that gave great results for me was dynamic temp and noisy sampling. Not sure if those are only in their experimental releases, but interesting nonetheless.
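For anyone curious what the sampler actually does: min-p keeps only the tokens whose probability is at least `min_p` times the probability of the most likely token, then renormalizes and samples from what's left. Here's a minimal Python sketch of that idea (the function name `min_p_sample` is just for illustration; the real implementations in kobold.cpp/llama.cpp are in C++ and differ in details):

```python
import math
import random

def min_p_sample(logits, min_p=0.05, temperature=1.0):
    """Sample a token index using min-p filtering (illustrative sketch)."""
    # Softmax with temperature (subtract max for numerical stability).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # min-p: keep tokens whose probability is at least
    # min_p * (probability of the top token).
    threshold = min_p * max(probs)
    kept = [(i, p) for i, p in enumerate(probs) if p >= threshold]

    # Renormalize over the surviving tokens and sample.
    z = sum(p for _, p in kept)
    r = random.random() * z
    acc = 0.0
    for i, p in kept:
        acc += p
        if acc >= r:
            return i
    return kept[-1][0]
```

The appeal over top-k or plain top-p is that the cutoff scales with the model's confidence: when one token dominates, almost everything else is filtered out; when the distribution is flat, many candidates survive.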
Ahh, OK. If it's a kobold.cpp exclusive feature I probably won't add it, but if it's in llama.cpp I'll add it, no problem.
Good news! Apparently llama.cpp did merge it as a feature :D
PR Merge:
Reddit:
Please add min_p to user parameters!
Thanks