How can you run inference with a local GGUF file? #295
Something to be aware of: PrunaAI uses a custom encoding scheme. I can't get their models to run under llama.cpp either, so I don't know if that's related or not.
Hi @jett06, thanks for letting me know. I think we have room to improve the verbosity when loading local files. As documented here, the way to load files locally is to specify a local path as the model ID. I'll try to improve the visibility and clarity of that section. In your case, loading from a local file may look like:
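Something along these lines, assuming the GGUF file and a matching tokenizer.json are in the current directory:

$ ./mistralrs-server gguf --tok-model-id . --quantized-model-id ./Phi-3-mini-128k-instruct-q3_K_S.gguf --quantized-filename ./Phi-3-mini-128k-instruct-q3_K_S.gguf --tokenizer-json tokenizer.json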
I made a few changes. In the backend, we start by searching locally. For brevity, you can simplify it by using the short arguments:
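Something like this, with your GGUF file filled in:

$ ./mistralrs-server gguf --tokenizer-json tokenizer.json -m ./Phi-3-mini-128k-instruct-q3_K_S.gguf -f ./Phi-3-mini-128k-instruct-q3_K_S.gguf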
In general, the model ID is a "path" to the files: a local path, or a Hugging Face Model ID.
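As an illustration (the Hugging Face repository name here is just an example), the same flag accepts either form:

$ ./mistralrs-server gguf --tok-model-id microsoft/Phi-3-mini-128k-instruct ...   # tokenizer files pulled from the Hub
$ ./mistralrs-server gguf --tok-model-id . ...                                    # tokenizer files read from the current directory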
@EricLBuehler Phi-3-mini-128k-instruct-q3_K_S.gguf actually is the path to my local GGUF file. I've symlinked it from my downloads directory to the current dir (Phi-3-mini-128k-instruct-q3_K_S.gguf => /home/jett/Downloads/llms/Phi-3-mini-128k-instruct-q3_K_S.gguf). Here's what happens when I run the final command you've given me (with the GGUF file path filled in):

$ ./mistralrs-server gguf --tokenizer-json tokenizer.json -m ./Phi-3-mini-128k-instruct-q3_K_S.gguf -f ./Phi-3-mini-128k-instruct-q3_K_S.gguf
error: the following required arguments were not provided:
--tok-model-id <TOK_MODEL_ID>
Usage: mistralrs-server gguf --tok-model-id <TOK_MODEL_ID> --quantized-model-id <QUANTIZED_MODEL_ID> --quantized-filename <QUANTIZED_FILENAME> --tokenizer-json <TOKENIZER_JSON>
For more information, try '--help'.

To be sure the issue didn't lie with the fact my model file was a symbolic link, I deleted the symlink and copied the file directly to the current directory (/home/jett/Downloads/llms/Phi-3-Mini-128k-instruct-q3_K_S.gguf copied to ./Phi-3-Mini-128k-instruct-q3_K_S.gguf), which bore the same error as above:

$ ./mistralrs-server gguf --tokenizer-json tokenizer.json -m ./Phi-3-mini-128k-instruct-q3_K_S.gguf -f ./Phi-3-mini-128k-instruct-q3_K_S.gguf
error: the following required arguments were not provided:
--tok-model-id <TOK_MODEL_ID>
Usage: mistralrs-server gguf --tok-model-id <TOK_MODEL_ID> --quantized-model-id <QUANTIZED_MODEL_ID> --quantized-filename <QUANTIZED_FILENAME> --tokenizer-json <TOKENIZER_JSON>
For more information, try '--help'.

This is with the latest commit on master.
I made a small mistake with the command; I actually deleted the wrong arg:
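Something along these lines, keeping --tok-model-id this time (here pointed at the current directory):

$ ./mistralrs-server gguf --tok-model-id . --tokenizer-json tokenizer.json -m ./Phi-3-mini-128k-instruct-q3_K_S.gguf -f ./Phi-3-mini-128k-instruct-q3_K_S.gguf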
We tell it where to find the tokenizer model files with --tok-model-id (a local directory, or a Hugging Face model ID), and which quantized GGUF file to load with -m and -f.
@EricLBuehler I see, that makes sense! I didn't realize the command arguments were so similar conceptually, regardless of whether you're using a local file or pulling from Hugging Face (e.g. either a local directory or an HF directory, then either a local file or an HF file), but I get it now. Thank you so much for your help, and for maintaining a wonderful project :)
Thank you! Glad to help.
I'm trying to play around with mistralrs-server using a file I've already downloaded (from PrunaAI/Phi-3-mini-128k-instruct-GGUF-Imatrix-smashed), but the program arguments are very confusing. I've messed with it a lot and this is the closest I've come to what I want to do:

As you can see, this command fails. I'm very confused, and I'm not sure why it's this confusing or whether I'm missing something when it comes to running inference on a local GGUF file that's already downloaded, the way llama.cpp can. I just want to do something like
./mistralrs-server gguf -m ./Phi-3-mini-128k-instruct-q3_K_S.gguf
and have it start a server with that local file, but my goal seems very complicated to achieve based on this program's arguments and the error messages, which imply it works best when downloading a remote file. Thanks in advance for any responses that'll help :)