Where to put local LLM models for the local inference server? #306

Description

@rguiscard

The documentation says the local inference server can load a Hugging Face model, but it is not clear what "load" means here. Does it download the model again? If I already have a model saved on disk, how do I tell the local inference server to use it instead of downloading another copy?
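To illustrate what I mean, this is the pattern I'm used to with plain Hugging Face tooling (the env var and paths below are the standard huggingface_hub conventions; I don't know whether this server follows them):

```python
import os

# Standard Hugging Face layout (assumption: the server follows it):
# downloaded models land in the hub cache, ~/.cache/huggingface/hub by
# default, and HF_HOME can relocate that cache.
os.environ["HF_HOME"] = "/data/hf-cache"  # hypothetical location

# With transformers, a model that is already on disk can be loaded by a
# local directory path instead of a repo id, so nothing is re-downloaded.
from transformers import AutoModelForCausalLM, AutoTokenizer

local_dir = "/data/models/my-model"  # hypothetical path to a saved model
model = AutoModelForCausalLM.from_pretrained(local_dir)
tokenizer = AutoTokenizer.from_pretrained(local_dir)
```

Is there an equivalent way to point the local inference server at such a directory?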

Actually, I am interested in loading mlx-optiq models. I know I can use the OpenAI API from mlx-optiq, but if the local inference server could load the model straight from disk, it would save a layer of communication.
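For reference, this is roughly what I do today: run the MLX model behind its own OpenAI-compatible server and have everything talk to it over HTTP (the URL, port, and model name below are placeholders):

```python
# Going through the OpenAI-compatible endpoint adds an extra HTTP hop;
# what I'm hoping for is that the local inference server can load the
# same weights directly from disk instead.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="local-mlx-model",  # placeholder model name
    messages=[{"role": "user", "content": "Hello from a local MLX model"}],
)
print(resp.choices[0].message.content)
```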
