Feat/centralized model resolution#49

Merged
alez007 merged 3 commits into main from feat/centralized-model-resolution
May 1, 2026

Conversation

Owner

@alez007 alez007 commented Apr 30, 2026

No description provided.

Alex M added 2 commits April 30, 2026 20:47
Pre-flight every built-in-loader model on the driver before any Ray
actor spins up: download with a universal `*.safetensors`-preferred
filter, validate local paths, and surface auth / missing-repo /
missing-file errors at startup instead of inside an UNHEALTHY replica.

GGUF variants are picked via a `model: repo:filename` syntax (glob
supported); the resolver detects multi-variant GGUF repos with no
selector and raises with the variant list. `hf_filename` is gone —
selectors live on the `model:` field for both HF and local sources.

Built-in loaders (vllm, transformers, diffusers, llama_cpp) now consume
`config._resolved_path` and fail fast when it isn't populated. Plugins
are unchanged: they keep managing their own downloads, and `model:` is
optional for `loader=custom`.

The resolver runs after the gateway comes up so /health and /readyz
answer during long downloads.
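
The pre-flight step can be sketched roughly as below, assuming `huggingface_hub` for the download side. The function name and the exact pattern list are invented for illustration; the hub import is deferred so local-path validation never touches it (and, per the follow-up fix later in this PR, so cache env vars are set before `huggingface_hub` latches them at import time):

```python
import os

# Patterns for a "*.safetensors-preferred" download filter (illustrative).
SAFETENSORS_FIRST = ["*.safetensors", "*.json", "*.txt", "tokenizer*"]

def preflight_resolve(model: str) -> str:
    """Resolve `model` to a local path on the driver, before Ray actors start."""
    # Anything that looks like a filesystem path is validated in place.
    if model.startswith(("/", "./", "~")) or os.path.exists(model):
        path = os.path.expanduser(model)
        if not os.path.exists(path):
            raise FileNotFoundError(f"local model path does not exist: {path}")
        return path

    # Deferred import: huggingface_hub reads HF_HOME at import time,
    # so cache env vars must already be set when this line runs.
    from huggingface_hub import snapshot_download
    from huggingface_hub.utils import GatedRepoError, RepositoryNotFoundError

    try:
        return snapshot_download(model, allow_patterns=SAFETENSORS_FIRST)
    except (GatedRepoError, RepositoryNotFoundError) as exc:
        # Surface auth / missing-repo errors at startup, not in a replica.
        raise RuntimeError(f"cannot resolve model '{model}': {exc}") from exc
```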

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a centralized model resolution system that supports HuggingFace repositories, local paths, and a new repo:filename syntax for specific file selection. By moving resolution to the driver during startup, the system can catch configuration errors early. The llama_cpp, vllm, transformers, and diffusers loaders have been updated to utilize these resolved paths, and the hf_filename field in the llama_cpp configuration has been removed. Review feedback highlights critical issues where the resolver returns directory paths instead of specific file paths for GGUF models—both in single-file and sharded scenarios—which would cause the llama_cpp loader to fail.

Three review comment threads on modelship/infer/model_resolver.py (two marked outdated).
Three follow-up fixes after the initial integration:

- Set HF_HOME / VLLM_CACHE_ROOT / FLASHINFER_CACHE_DIR at the very top of
  mship_deploy.py, before any import that pulls in huggingface_hub:
  huggingface_hub's HF_HOME constant is latched at import time, so
  driver-side downloads were landing in ~/.cache/huggingface instead of
  MSHIP_CACHE_DIR.

- Always return a file path (never a directory) when resolving GGUF-style
  models. Three branches updated to match what llama.cpp expects:
    * Local dir + selector matching multiple shards: sort and return the
      first shard's path (was raising RuntimeError).
    * HF repo + selector matching multiple shards: snapshot_download all
      shards, then return the first shard's full path inside the snapshot
      (was returning the snapshot dir).
    * HF repo with exactly one .gguf file and no selector: hf_hub_download
      it directly and return the file path (was returning the snapshot dir
      via the universal-filter snapshot_download fallback).
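
The local-dir branch of the "always return a file path" rule can be sketched as below: given GGUF shards matched by a selector, sort and hand llama.cpp the first shard rather than the directory (llama.cpp discovers sibling shards itself). A hypothetical helper under those assumptions, not the PR's actual code:

```python
import fnmatch
from pathlib import Path

def pick_gguf_file(model_dir: str, selector: str) -> str:
    """Return the path of the first GGUF file matching `selector`.

    Multiple matches mean a sharded model: return the first shard's
    path, never the directory (what llama.cpp expects to open)."""
    matches = sorted(
        (p for p in Path(model_dir).iterdir()
         if p.suffix == ".gguf" and fnmatch.fnmatch(p.name, selector)),
        key=lambda p: p.name,
    )
    if not matches:
        raise FileNotFoundError(
            f"no GGUF file matches {selector!r} in {model_dir}"
        )
    return str(matches[0])
```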
@alez007 alez007 merged commit 6725ee9 into main May 1, 2026
2 checks passed
