Merged
Conversation
added 2 commits
April 30, 2026 20:47
Pre-flight every built-in-loader model on the driver before any Ray actor spins up: download with a universal `*.safetensors`-preferred filter, validate local paths, and surface auth / missing-repo / missing-file errors at startup instead of inside an UNHEALTHY replica.

- GGUF variants are selected with a `model: repo:filename` syntax (globs supported); when a multi-variant GGUF repo has no selector, the resolver raises and lists the available variants.
- `hf_filename` is gone: selectors now live on the `model:` field for both HF and local sources.
- Built-in loaders (vllm, transformers, diffusers, llama_cpp) consume `config._resolved_path` and fail fast when it isn't populated.
- Plugins are unchanged: they keep managing their own downloads, and `model:` stays optional for `loader=custom`.
- The resolver runs after the gateway comes up, so /health and /readyz keep answering during long downloads.
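For reference, a minimal sketch of how such a `repo:filename` value could be split into a source and a selector; `split_model_selector` is a hypothetical helper, not the resolver's actual API, and it ignores edge cases such as Windows drive letters:

```python
def split_model_selector(model: str) -> tuple[str, str | None]:
    """Split a 'repo_or_path:filename' model value into (source, selector).

    'org/repo'              -> ('org/repo', None)
    'org/repo:Q4_K_M*.gguf' -> ('org/repo', 'Q4_K_M*.gguf')
    '/models/gemma:*.gguf'  -> ('/models/gemma', '*.gguf')
    """
    source, sep, selector = model.partition(":")
    return (source, selector) if sep else (model, None)
```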
Code Review
This pull request introduces a centralized model resolution system that supports HuggingFace repositories, local paths, and a new repo:filename syntax for specific file selection. By moving resolution to the driver during startup, the system can catch configuration errors early. The llama_cpp, vllm, transformers, and diffusers loaders have been updated to utilize these resolved paths, and the hf_filename field in the llama_cpp configuration has been removed. Review feedback highlights critical issues where the resolver returns directory paths instead of specific file paths for GGUF models—both in single-file and sharded scenarios—which would cause the llama_cpp loader to fail.
Three follow-up fixes after the initial integration:
- Set HF_HOME / VLLM_CACHE_ROOT / FLASHINFER_CACHE_DIR at the very top of
  mship_deploy.py, before any import that pulls in huggingface_hub: that
  package latches its HF_HOME constant at import time, so driver-side
  downloads were landing in ~/.cache/huggingface instead of MSHIP_CACHE_DIR
  (see the first sketch after this list).
- Always return a file path (never a directory) when resolving GGUF-style
  models. Three branches were updated to match what llama.cpp expects
  (see the second sketch after this list):
  * Local dir + selector matching multiple shards: sort and return the
    first shard's path (was raising RuntimeError).
  * HF repo + selector matching multiple shards: snapshot_download all
    shards, then return the first shard's full path inside the snapshot
    (was returning the snapshot dir).
  * HF repo with exactly one .gguf file and no selector: hf_hub_download
    it directly and return the file path (was returning the snapshot dir
    via the universal-filter snapshot_download fallback).
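A minimal sketch of the import-ordering fix, assuming the cache root is passed in via a MSHIP_CACHE_DIR environment variable (the default path below is illustrative): the cache variables must be exported before anything imports huggingface_hub, because that package reads HF_HOME into a module-level constant on import.

```python
# mship_deploy.py -- must execute before any import that pulls in huggingface_hub.
import os

_cache = os.environ.get("MSHIP_CACHE_DIR", "/opt/mship/cache")  # illustrative default
os.environ.setdefault("HF_HOME", os.path.join(_cache, "huggingface"))
os.environ.setdefault("VLLM_CACHE_ROOT", os.path.join(_cache, "vllm"))
os.environ.setdefault("FLASHINFER_CACHE_DIR", os.path.join(_cache, "flashinfer"))

# Only now is it safe to import modules that (transitively) import huggingface_hub,
# since huggingface_hub captures HF_HOME once at import time.
from huggingface_hub import snapshot_download  # noqa: E402
```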
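And a sketch of the three GGUF branches described above, assuming huggingface_hub is available; `resolve_gguf` and its signature are illustrative, not the project's actual resolver API.

```python
import fnmatch
import glob
import os

from huggingface_hub import hf_hub_download, list_repo_files, snapshot_download


def resolve_gguf(repo_or_dir: str, selector: str | None) -> str:
    """Always return a .gguf file path (never a directory), as llama.cpp expects."""
    if os.path.isdir(repo_or_dir):
        # Local dir: glob the selector; with multiple shards, sort and return the first.
        matches = sorted(glob.glob(os.path.join(repo_or_dir, selector or "*.gguf")))
        if not matches:
            raise FileNotFoundError(f"no GGUF matching {selector!r} in {repo_or_dir}")
        return matches[0]

    ggufs = sorted(f for f in list_repo_files(repo_or_dir) if f.endswith(".gguf"))
    if selector:
        shards = sorted(f for f in ggufs if fnmatch.fnmatch(os.path.basename(f), selector))
        if not shards:
            raise FileNotFoundError(f"no GGUF matching {selector!r} in {repo_or_dir}")
        if len(shards) > 1:
            # Multi-shard match: download every shard, then return the first shard's
            # path inside the snapshot so the loader still receives a file, not a dir.
            snapshot_dir = snapshot_download(repo_or_dir, allow_patterns=shards)
            return os.path.join(snapshot_dir, shards[0])
        return hf_hub_download(repo_or_dir, shards[0])
    if len(ggufs) == 1:
        # Exactly one .gguf and no selector: download just that file directly.
        return hf_hub_download(repo_or_dir, ggufs[0])
    raise RuntimeError(
        f"multiple GGUF variants in {repo_or_dir}; select one with 'repo:filename': {ggufs}"
    )
```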