v0.1.6 — Hugging Face model download

@DominguesM released this 15 Jun 17:41 · 8 commits to develop since this release · d6779f8

Highlights

With the opt-in hf-hub feature enabled, LlamaParams now accepts a Hugging Face repository id directly. Pass "TheBloke/Llama-2-7B-Chat-GGUF" and the library downloads the GGUF file to the standard Hugging Face cache (~/.cache/huggingface/hub) before loading it. Local paths work exactly as before, and the Tauri plugin inherits the new behavior automatically.

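use llama_crab::{Llama, LlamaParams};

// Requires the hf-hub cargo feature; `?` needs an enclosing function that returns Result.
let mut llama = Llama::load(
    // A Hugging Face repo id in place of a local path.
    LlamaParams::new("TheBloke/Llama-2-7B-Chat-GGUF")
        // Name the file explicitly: auto-pick errors when a repo has more than one .gguf.
        .with_hf_filename("llama-2-7b-chat.Q4_K_M.gguf")
        .with_n_ctx(2048),
)?;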
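
For readers wondering where the download lands, here is a minimal sketch of the cache-location rule as these notes describe it (HF_HOME respected, default ~/.cache/huggingface/hub). The helper name `hub_cache_dir` and its parameters are hypothetical and not part of llama-crab. It assumes the usual Hugging Face convention that the hub cache is the `hub` subdirectory of HF_HOME.

```rust
use std::path::PathBuf;

/// Hypothetical helper (not a llama-crab API): resolve the hub cache directory.
/// `hf_home` is the value of the HF_HOME env var, if set.
/// `home` is the user's home directory.
fn hub_cache_dir(hf_home: Option<&str>, home: &str) -> PathBuf {
    match hf_home {
        // Assumption: HF_HOME replaces ~/.cache/huggingface; "hub" is appended.
        Some(dir) => PathBuf::from(dir).join("hub"),
        // Default location named in the highlights above.
        None => PathBuf::from(home)
            .join(".cache")
            .join("huggingface")
            .join("hub"),
    }
}
```

In a real program the two arguments would come from `std::env::var("HF_HOME").ok()` and the platform's home directory. An explicit with_hf_cache_dir presumably takes priority over both, though these notes do not spell out the precedence.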

What's new

Library

  • hf-hub cargo feature (opt-in): gates all of the new functionality, mirroring the existing mtmd pattern.
  • HfDownloader trait + MockHfDownloader (always compiled, for tests) + RealHfDownloader (gated, uses hf-hub 0.5 sync API).
  • 5 new builders on LlamaParams: with_hf_filename, with_hf_revision, with_hf_token, with_hf_cache_dir, with_hf_endpoint.
  • LlamaError::ModelDownload(String) variant for download errors.
  • HF_TOKEN and HF_ENDPOINT env vars honored (read in RealHfDownloader::new, never logged).
  • HF_HOME respected for cache location.
  • Auto-detect heuristic: a string is treated as a repo id only if it matches ^[A-Za-z0-9._-]+(/[A-Za-z0-9._-]+)?$ and !Path::new(s).exists() holds. Existing paths and ambiguous local-path names (models/, model/) fall through to local loading.
  • Auto-pick logic: 0 .gguf → error; 1 → auto-pick; >1 → error suggesting with_hf_filename.
  • tracing::info! at download start/end with repo, filename, size_bytes, elapsed_ms.
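
The auto-detect and auto-pick rules above can be sketched with the standard library alone. This is an illustration of the stated rules, not the crate's actual code. All function and type names (`matches_repo_pattern`, `looks_like_hf_repo`, `pick_gguf`, `PickError`) are hypothetical, the regex is hand-rolled to avoid a dependency, and the case-sensitive `.gguf` suffix check is an assumption.

```rust
use std::path::Path;

/// True if `seg` is non-empty and uses only [A-Za-z0-9._-].
fn valid_segment(seg: &str) -> bool {
    !seg.is_empty()
        && seg
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '.' | '_' | '-'))
}

/// Hand-rolled equivalent of ^[A-Za-z0-9._-]+(/[A-Za-z0-9._-]+)?$
fn matches_repo_pattern(s: &str) -> bool {
    let mut parts = s.split('/');
    let first = parts.next().unwrap_or("");
    let second = parts.next();
    let extra = parts.next();
    // One or two valid segments, never three or more.
    valid_segment(first) && second.map_or(true, valid_segment) && extra.is_none()
}

/// Repo id only if the pattern matches AND nothing exists at that path.
fn looks_like_hf_repo(s: &str) -> bool {
    matches_repo_pattern(s) && !Path::new(s).exists()
}

/// Hypothetical error type for the auto-pick step.
#[derive(Debug, PartialEq)]
enum PickError {
    /// The repo listing contains no .gguf file.
    NoGguf,
    /// More than one .gguf file; the caller should use with_hf_filename.
    Ambiguous(Vec<String>),
}

/// 0 .gguf files: error. Exactly 1: pick it. More than 1: error with candidates.
fn pick_gguf(files: &[&str]) -> Result<String, PickError> {
    let ggufs: Vec<&str> = files
        .iter()
        .copied()
        .filter(|f| f.ends_with(".gguf"))
        .collect();
    match ggufs.as_slice() {
        [] => Err(PickError::NoGguf),
        [only] => Ok((*only).to_string()),
        many => Err(PickError::Ambiguous(
            many.iter().map(|f| (*f).to_string()).collect(),
        )),
    }
}
```

Note that a trailing slash (`models/`) fails the pattern outright, while a bare name such as `models` passes the pattern and is rejected only when a directory or file of that name exists. This is why the existence check is part of the heuristic.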

Server

  • --hf-filename <NAME> CLI flag (env LLAMA_CRAB_HF_FILENAME).
  • hf-hub server feature (opt-in): cargo install llama-crab-server --features hf-hub --force.

Tauri plugin

  • Always pulls in the hf-hub feature so end-user Tauri apps can use HF repo ids without extra build config.

Install / Upgrade

# Library
cargo add llama-crab --features hf-hub

# Server
cargo install llama-crab-server --features hf-hub --force

Test

# Default (skip) state: without LLAMA_CRAB_RUN_HF_INTEGRATION set, the integration test skips cleanly
cargo test -p llama-crab --features hf-hub --test hf_download

# End-to-end (downloads TinyLlama, ~636 MB, verifies cache hit, loads into Metal)
LLAMA_CRAB_RUN_HF_INTEGRATION=1 cargo test \
  -p llama-crab --features hf-hub --test hf_download
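
An env-gated test of this kind is commonly written as an early return when the variable is absent. The sketch below shows the pattern only. The helper names are hypothetical, and treating exactly "1" as "enabled" is an assumption based on the command above; the actual test may accept other values.

```rust
/// Hypothetical gate: `value` is the raw LLAMA_CRAB_RUN_HF_INTEGRATION value, if set.
/// Assumption: only the exact string "1" enables the end-to-end run.
fn integration_enabled(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Reads the real environment variable and applies the gate.
fn integration_enabled_from_env() -> bool {
    integration_enabled(
        std::env::var("LLAMA_CRAB_RUN_HF_INTEGRATION")
            .ok()
            .as_deref(),
    )
}
```

A test body would start with `if !integration_enabled_from_env() { return; }`, so the default `cargo test` run passes without touching the network.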

Verification

Check                                                        Result
cargo build -p llama-crab --no-default-features              OK
cargo build -p llama-crab --features hf-hub                  OK
cargo build -p llama-crab-server --features hf-hub           OK
cargo clippy --all-targets --features hf-hub -- -D warnings  clean
cargo test --lib (no-default-features)                       120/120 pass
cargo test --lib (hf-hub)                                    120/120 pass (2 env-gated ignored)
cargo test --doc (both states)                               11/11 pass
cargo test --test hf_download (skip)                         clean skip
CI (16 jobs, Linux + macOS)                                  16/16 pass
Release workflow (crates + npm)                              success

PRs

  • #12: feat: add Hugging Face model download from LlamaParams
  • #13: chore(release): bump version to 0.1.6
  • #14: chore(release): bump npm package versions to 0.1.6

Guardrails (per design review)

  • No dep:tokio in the hf-hub feature (sync API only)
  • No #[from] hf_hub::Error (would leak the gated type into the always-compiled error enum)
  • No SHA256 verification (delegated to hf-hub etag mechanism; documented limitation)
  • No async / progress callback API in v1
  • No hf: URL prefix syntax
  • No token / auth-bearing URLs at tracing::info! level
  • Server hf-hub is opt-in (kept out of default = [])
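
To illustrate the `#[from]` guardrail: converting the download error to a String at the call site keeps the always-compiled error enum free of any feature-gated type. The sketch below uses simplified stand-ins. This `LlamaError` has only the one variant named in these notes, `GatedDownloadError` is a hypothetical placeholder for `hf_hub::Error`, and the `repo: message` format is an assumption.

```rust
use std::fmt;

/// Simplified stand-in for the always-compiled error enum.
#[derive(Debug, PartialEq)]
enum LlamaError {
    /// Carries a plain String, so no gated type appears in the signature.
    ModelDownload(String),
}

/// Hypothetical placeholder for the feature-gated hf_hub::Error.
#[derive(Debug)]
struct GatedDownloadError(&'static str);

impl fmt::Display for GatedDownloadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Explicit conversion in place of `#[from]`: works for any Display error.
/// Assumption: the message is formatted as "<repo>: <error>".
fn to_model_download<E: fmt::Display>(repo: &str, err: E) -> LlamaError {
    LlamaError::ModelDownload(format!("{repo}: {err}"))
}
```

Gated code would call this through `.map_err(|e| to_model_download(repo, e))`. Callers built without the feature still compile against the same enum.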