The documentation says the local inference server can load Hugging Face models, but it is not clear what "load" means here. Does it download the model again? If I already have a model saved on disk, how do I tell the local inference server to use that copy instead of downloading another one?
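For reference, here is the kind of behavior I am hoping for. This is only a sketch, assuming the server follows the usual Hugging Face convention where a model argument can be either a hub repo ID or a plain local directory (the path below is hypothetical):

```python
from pathlib import Path
from transformers import AutoModelForCausalLM, AutoTokenizer

# A model already saved on disk, e.g. from a previous
# download or an earlier save_pretrained() call.
local_dir = Path.home() / "models" / "my-model"  # hypothetical path

# With the transformers library, passing a local directory
# loads from disk and skips any download; I am hoping the
# local inference server resolves model paths the same way.
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(local_dir)
```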
Actually, I am interested in loading mlx-optiq models. I know I can use the OpenAI API from mlx-optiq, but if the local inference server could load the model straight from disk, that would save a layer of communication.
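Right now I would go through the OpenAI-compatible endpoint, roughly like this (a sketch; the port and model name are placeholders, not mlx-optiq's actual defaults):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local server's
# OpenAI-compatible endpoint instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local address
    api_key="not-needed-locally",
)

response = client.chat.completions.create(
    model="my-local-model",  # placeholder model identifier
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```

If the server could read the weights straight from disk itself, I could drop this extra HTTP hop entirely.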