The documentation says the local inference server can load Hugging Face models, but it is not clear what "load" means here. Does it download the model again? If I already have a model saved on disk, how do I tell the local inference server to use that copy instead of downloading another one?
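For reference, here is the kind of behavior I am hoping for. This is only a sketch, assuming the server follows the usual Hugging Face convention where a model argument can be either a hub repo ID or a plain local directory (the path below is hypothetical):

```python
from pathlib import Path
from transformers import AutoModelForCausalLM, AutoTokenizer

# A model already saved on disk, e.g. from a previous
# download or an earlier save_pretrained() call.
local_dir = Path.home() / "models" / "my-model"  # hypothetical path

# With the transformers library, passing a local directory
# loads from disk and skips any download; I am hoping the
# local inference server resolves model paths the same way.
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(local_dir)
```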
Actually, I am interested in loading mlx-optiq models. I know I can use the OpenAI API from mlx-optiq, but if the local inference server could load the model straight from disk, that would save a layer of communication.
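Right now I would go through the OpenAI-compatible endpoint, roughly like this (a sketch; the port and model name are placeholders, not mlx-optiq's actual defaults):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local server's
# OpenAI-compatible endpoint instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local address
    api_key="not-needed-locally",
)

response = client.chat.completions.create(
    model="my-local-model",  # placeholder model identifier
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```

If the server could read the weights straight from disk itself, I could drop this extra HTTP hop entirely.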