Releases: DominguesM/llama-crab
v0.1.8
What Changed
Dependency Updates
| Crate | From | To |
|---|---|---|
| thiserror | 1.x | 2.x |
| bindgen | 0.69.x | 0.72.x |
| tokenizers | 0.20.x | 0.23.x |
| axum | 0.7.x | 0.8.x |
| tower-http | 0.6.x | 0.7.x |
| cc | 1.2.63 | 1.2.64 |
| smallvec | 1.15.1 | 1.15.2 |
Version Bump
All crates and npm packages bumped from 0.1.7 to 0.1.8.
Documentation
- Fixed documentation URLs to include `.html` extensions.
v0.1.7 — Fix use-after-move of LlamaModel
What's Changed
Critical Bug Fix
Fixed a use-after-move of `LlamaModel` that caused SIGSEGV when `Llama` crossed a return boundary (e.g. `Llama::load` returning by value, or `Llama` being moved across scope).
The self-referential `&'a LlamaModel` field on `LlamaContext`, papered over by a `PhantomData<*mut ()>` + `transmute` in `Llama::load`, has been replaced with a heap-allocated `Box<LlamaModel>` and a `NonNull<LlamaModel>` raw pointer on the context.
Symptoms fixed:
- `n_embd` reading as `0` in the embedding context
- `n_vocab` reading stale or zeroed data in `logits_ith`/`sampled_probs_ith`
- Random crashes in `rerank` and any flow that crossed a `return` boundary with a `Llama` value
🔧 Changes
- `Llama.model` is now `Box<LlamaModel>` (heap-allocated, stable address)
- `LlamaContext.model` is now `NonNull<LlamaModel>` (raw pointer owned by the context)
- Removed the `'a` lifetime parameter from `LlamaContext`
- Fields in `Llama` reordered to `context → model → _backend → _not_send_sync` to enforce correct drop order
- Added 9 regression tests covering embeddings, rerank, infill, and streaming APIs
- Bumped version to 0.1.7
- Workspace members now use the `crates/*` glob pattern
- Updated changelog and versioned docs.rs links
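The fix rests on two properties: a heap allocation keeps its address when its owner is moved, and the context must be torn down before the model it points at. The sketch below illustrates both with hypothetical stand-ins (`Model`, `Context`, `Llama`), not the crate's real types. One deliberate swap: the release keeps a `Box<LlamaModel>` field and relies on declaration-order drops, whereas this sketch turns the box into a raw pointer with `Box::leak` and sequences the frees explicitly with `ManuallyDrop`, so the derived pointer never coexists with a live `Box` that could be reborrowed.

```rust
use std::mem::ManuallyDrop;
use std::ptr::NonNull;

/// Hypothetical stand-in for `LlamaModel`.
struct Model {
    n_embd: usize,
}

/// Hypothetical stand-in for `LlamaContext`: it reaches the model through a
/// raw pointer instead of `&'a Model`, so it has no lifetime parameter.
struct Context {
    model: NonNull<Model>,
}

impl Context {
    fn n_embd(&self) -> usize {
        // SAFETY: `Llama` keeps the allocation alive for as long as the
        // context exists and frees it only after the context is dropped.
        unsafe { self.model.as_ref().n_embd }
    }
}

/// Hypothetical stand-in for `Llama`. `NonNull` also makes it !Send/!Sync.
pub struct Llama {
    context: ManuallyDrop<Context>,
    // Heap allocation created in `load`; moving `Llama` does not move it.
    model: NonNull<Model>,
}

impl Llama {
    /// Returning by value moves `Llama`, but not the heap-allocated model.
    pub fn load(n_embd: usize) -> Llama {
        let model = NonNull::from(Box::leak(Box::new(Model { n_embd })));
        Llama {
            context: ManuallyDrop::new(Context { model }),
            model,
        }
    }

    pub fn n_embd(&self) -> usize {
        self.context.n_embd()
    }
}

impl Drop for Llama {
    fn drop(&mut self) {
        // SAFETY: each part is dropped exactly once, context first, then the
        // model allocation that the context pointed to.
        unsafe {
            ManuallyDrop::drop(&mut self.context);
            drop(Box::from_raw(self.model.as_ptr()));
        }
    }
}
```

For example, `let v = vec![Llama::load(768)];` moves the value twice, yet `v[0].n_embd()` still reads `768` because only the small `Llama` struct moved.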
Verification
All 9 new regression tests pass:
- `embeddings_seq_returns_768_dim_unit_norm_vector`
- `embed_called_twice_returns_consistent_dim`
- `logits_ith_after_decode_reads_n_vocab_floats`
- `rerank_scores_documents_and_top_match_is_rust`
- `rerank_empty_documents_returns_empty_vec`
- `infill_returns_some_content`
- `infill_called_twice_is_consistent`
- `streaming_completion_collects_tokens`
- `streaming_completion_can_stop_early`
Migration Notes
No public API breakage. All existing code continues to work without changes.
v0.1.6 — Hugging Face model download
Highlights
LlamaParams now accepts a Hugging Face repository id directly. Pass "TheBloke/Llama-2-7B-Chat-GGUF" and the library downloads the GGUF to the official HF cache (~/.cache/huggingface/hub) before loading. Local paths still work unchanged; the Tauri plugin inherits the new behavior automatically.
```rust
use llama_crab::{Llama, LlamaParams};

let mut llama = Llama::load(
    LlamaParams::new("TheBloke/Llama-2-7B-Chat-GGUF")
        .with_hf_filename("llama-2-7b-chat.Q4_K_M.gguf")
        .with_n_ctx(2048),
)?;
```
What's new
Library
- `hf-hub` cargo feature (opt-in): gates the new functionality. Mirror of the existing `mtmd` pattern.
- `HfDownloader` trait + `MockHfDownloader` (always compiled, for tests) + `RealHfDownloader` (gated, uses the `hf-hub` 0.5 sync API).
- 5 new builders on `LlamaParams`: `with_hf_filename`, `with_hf_revision`, `with_hf_token`, `with_hf_cache_dir`, `with_hf_endpoint`.
- `LlamaError::ModelDownload(String)` variant for download errors.
- `HF_TOKEN` and `HF_ENDPOINT` env vars honored (read in `RealHfDownloader::new`, never logged).
- `HF_HOME` respected for cache location.
- Auto-detect heuristic: `^[A-Za-z0-9._-]+(/[A-Za-z0-9._-]+)?$` + `!Path::new(s).exists()`, falling through to local for existing paths and ambiguous local-path names (`models/`, `model/`).
- Auto-pick logic: 0 `.gguf` → error; 1 → auto-pick; >1 → error suggesting `with_hf_filename`.
- `tracing::info!` at download start/end with repo, filename, size_bytes, elapsed_ms.
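The auto-detect and auto-pick rules above can be sketched with the standard library alone. `looks_like_hf_repo` and `pick_gguf` are hypothetical helper names, the regex is hand-expanded into segment checks rather than using a regex crate, and the error strings are illustrative; the crate's real code and messages may differ.

```rust
use std::path::Path;

/// True if `seg` matches `[A-Za-z0-9._-]+`.
fn is_repo_segment(seg: &str) -> bool {
    !seg.is_empty()
        && seg
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '.' | '_' | '-'))
}

/// Hypothetical helper mirroring the documented heuristic:
/// `^[A-Za-z0-9._-]+(/[A-Za-z0-9._-]+)?$` and `!Path::new(s).exists()`.
pub fn looks_like_hf_repo(s: &str) -> bool {
    let mut parts = s.split('/');
    let first = parts.next().unwrap_or("");
    let second = parts.next();
    // One or two valid segments, and nothing after a second slash.
    let shape_ok = is_repo_segment(first)
        && second.map_or(true, is_repo_segment)
        && parts.next().is_none();
    // Existing local paths always win over the repo interpretation.
    shape_ok && !Path::new(s).exists()
}

/// Hypothetical helper for the auto-pick rule over a repo's file listing.
pub fn pick_gguf(files: &[&str]) -> Result<String, String> {
    let ggufs: Vec<&str> = files
        .iter()
        .copied()
        .filter(|f| f.ends_with(".gguf"))
        .collect();
    match ggufs.as_slice() {
        [] => Err("repository contains no .gguf file".to_string()),
        [only] => Ok(only.to_string()),
        many => Err(format!(
            "repository contains {} .gguf files; choose one with with_hf_filename",
            many.len()
        )),
    }
}
```

Under these rules `models/` is treated as local because the trailing slash leaves an empty second segment, and `.` is treated as local because the path exists.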
Server
- `--hf-filename <NAME>` CLI flag (env `LLAMA_CRAB_HF_FILENAME`).
- `hf-hub` server feature (opt-in): `cargo install llama-crab-server --features hf-hub --force`.
Tauri plugin
- Always pulls in the `hf-hub` feature so end-user Tauri apps can use HF repo ids without extra build config.
Install / Upgrade
```bash
# Library
cargo add llama-crab --features hf-hub
# Server
cargo install llama-crab-server --features hf-hub --force
```
Test
```bash
# Skip state: without LLAMA_CRAB_RUN_HF_INTEGRATION set, the integration test is skipped
cargo test -p llama-crab --features hf-hub --test hf_download
# End-to-end (downloads TinyLlama, ~636 MB, verifies cache hit, loads into Metal)
LLAMA_CRAB_RUN_HF_INTEGRATION=1 cargo test \
  -p llama-crab --features hf-hub --test hf_download
```
Verification
| Check | Result |
|---|---|
| `cargo build -p llama-crab --no-default-features` | OK |
| `cargo build -p llama-crab --features hf-hub` | OK |
| `cargo build -p llama-crab-server --features hf-hub` | OK |
| `cargo clippy --all-targets --features hf-hub -- -D warnings` | clean |
| `cargo test --lib` (no-default-features) | 120/120 pass |
| `cargo test --lib` (hf-hub) | 120/120 pass (2 env-gated ignored) |
| `cargo test --doc` (both states) | 11/11 pass |
| `cargo test --test hf_download` (skip) | clean skip |
| CI (16 jobs, Linux + macOS) | 16/16 pass |
| Release workflow (crates + npm) | success |
PRs
- #12: feat: add Hugging Face model download from LlamaParams
- #13: chore(release): bump version to 0.1.6
- #14: chore(release): bump npm package versions to 0.1.6
Guardrails (per design review)
- No `dep:tokio` in the `hf-hub` feature (sync API only)
- No `#[from] hf_hub::Error` (would leak the gated type into the always-compiled error enum)
- No SHA256 verification (delegated to the `hf-hub` etag mechanism; documented limitation)
- No async / progress callback API in v1
- No `hf:` URL prefix syntax
- No token / auth-bearing URLs at `tracing::info!` level
- Server `hf-hub` is opt-in (kept out of `default = []`)
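The `#[from]` guardrail amounts to converting at the boundary instead of embedding the optional crate's error type in the public enum. A minimal sketch, assuming a hypothetical `GatedDownloadError` that stands in for the feature-gated `hf-hub` error and a stand-in `fetch` function; only the `ModelDownload(String)` variant name comes from these notes, and the `Display` wording is illustrative. In a real crate the gated pieces would sit behind `#[cfg(feature = "hf-hub")]`, which is omitted here so the example compiles on its own.

```rust
use std::fmt;

/// Hypothetical always-compiled error enum: the download variant carries a
/// `String`, so it never names a type from an optional dependency.
#[derive(Debug, PartialEq)]
pub enum LlamaError {
    ModelDownload(String),
}

impl fmt::Display for LlamaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LlamaError::ModelDownload(msg) => write!(f, "model download failed: {msg}"),
        }
    }
}

impl std::error::Error for LlamaError {}

/// Stand-in for an error type that only exists when the optional feature is on.
#[derive(Debug)]
pub struct GatedDownloadError {
    pub status: u16,
}

impl fmt::Display for GatedDownloadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "HTTP status {}", self.status)
    }
}

/// Stand-in for the gated download call.
fn fetch(status: u16) -> Result<Vec<u8>, GatedDownloadError> {
    if status == 200 {
        Ok(b"GGUF".to_vec())
    } else {
        Err(GatedDownloadError { status })
    }
}

/// Converts with `map_err` rather than `#[from]`, so the gated type stays
/// private to the download code path.
pub fn download(status: u16) -> Result<Vec<u8>, LlamaError> {
    fetch(status).map_err(|e| LlamaError::ModelDownload(e.to_string()))
}
```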
v0.1.5
[0.1.5] - 2026-06-15
Changed
- Moved the documentation site out of this repository. The site is now
published at https://llama-crab.nlp.rocks/ instead of the previous
GitHub Pages URLs. The `docs/` folder and the
`Publish docs site` GitHub Actions workflow have been removed from this
repo. README files and crate-level docs throughout this workspace were
updated to point at the new URL.
Added
- `tauri-plugin-llama-crab`: added a `Config` struct and `init_with_config`
entry point so consumers can apply plugin-wide defaults (`n_ctx`, `n_batch`,
`n_ubatch`, `n_threads`, `n_threads_batch`, `n_gpu_layers`, default model name)
at startup. Anything left as `None` lets the per-request field win, with
the `llama-crab` defaults as the final fallback.
- `tauri-plugin-llama-crab`: added the `mtmd` cargo feature. When enabled,
`load_model` can take an `mmproj_path` and the chat pipeline runs
multimodal (vision) inference through `llama.cpp`'s `mtmd` projector.
Image inputs are accepted as `data:image/...;base64,...` URLs and as
local file paths.
- `tauri-plugin-llama-crab`: added granular `PluginError` kinds
(`workerSpawnFailed`, `workerDisconnected`, `workerPanicked`,
`multimodalNotEnabled`, `multimodalSetup`, `mediaDecode`) so the
TypeScript client can distinguish failure modes instead of collapsing
every error into `worker`.
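The two accepted image input forms can be told apart before any decoding happens. This sketch uses a hypothetical `ImageInput` enum and `classify_image_input` function (not the plugin's API); it only splits a `data:image/...;base64,...` URL into MIME type and payload and treats anything else as a local path. Base64 decoding and file reading, which the plugin would still need, are left out.

```rust
/// Hypothetical classification of an image input string.
#[derive(Debug, PartialEq)]
pub enum ImageInput<'a> {
    /// `data:image/<subtype>;base64,<payload>`; payload is still encoded.
    DataUrl { mime: &'a str, base64: &'a str },
    /// Anything that is not a data URL is treated as a local file path.
    LocalPath(&'a str),
}

/// Returns `None` for data URLs that are not base64-encoded images.
pub fn classify_image_input(s: &str) -> Option<ImageInput<'_>> {
    if let Some(rest) = s.strip_prefix("data:") {
        // Split "image/png;base64" from the payload at the first comma.
        let (meta, payload) = rest.split_once(',')?;
        let mime = meta.strip_suffix(";base64")?;
        if !mime.starts_with("image/") {
            return None;
        }
        Some(ImageInput::DataUrl { mime, base64: payload })
    } else {
        Some(ImageInput::LocalPath(s))
    }
}
```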
Changed
- `tauri-plugin-llama-crab`: `JoinError` from `spawn_blocking` now maps
to `workerPanicked`; `mpsc::RecvError` maps to `workerDisconnected`;
thread-spawn failures map to `workerSpawnFailed`.
- `@llama-crab/tauri`: the Support Matrix entry for multimodal now
reflects that the Rust plugin must be built with the `mtmd` cargo
feature for image parts to be processed.
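The three worker failure modes map naturally onto the standard library's own error surfaces. The plugin uses tokio's `spawn_blocking` (hence `JoinError`); this std-only analogue instead uses `std::thread::Builder::spawn` (an `io::Error` when the thread cannot be created), `JoinHandle::join` (an `Err` when the worker panicked) and `std::sync::mpsc` (a `RecvError` when the worker drops its sender without replying). `PluginError` and `run_on_worker` here are hypothetical names for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical subset of the plugin's worker error kinds.
#[derive(Debug, PartialEq)]
pub enum PluginError {
    WorkerSpawnFailed(String),
    WorkerDisconnected,
    WorkerPanicked,
}

/// Runs `job` on a dedicated thread and waits for one reply on a channel.
pub fn run_on_worker<F>(job: F) -> Result<u32, PluginError>
where
    F: FnOnce(mpsc::Sender<u32>) + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let handle = thread::Builder::new()
        .spawn(move || job(tx))
        .map_err(|e| PluginError::WorkerSpawnFailed(e.to_string()))?;

    // Errors if the worker dropped its sender (normally or while unwinding).
    let received = rx.recv();

    // A panic takes priority over the disconnect it also causes.
    if handle.join().is_err() {
        return Err(PluginError::WorkerPanicked);
    }
    received.map_err(|_| PluginError::WorkerDisconnected)
}
```

Checking `join` before inspecting the `recv` result matters: a panicking worker also drops its sender, so without that ordering every panic would be reported as a disconnect.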
v0.1.3x
Added
- Added high-level streaming completion APIs, including
`create_completion_stream`, `create_completion_stream_with_sampler`,
`CompletionChunk`, `StreamControl` and richer completion logprob
metadata.
- Added `llama-crab-server`, an HTTP server binary for local inference
with completions, chat completions, embeddings, reranking,
tokenization, detokenization, SSE streaming and optional multimodal
chat support.
- Added OpenAI-style high-level convenience helpers for text, chat and
embeddings with token accounting.
- Added the `server_lfm` example wrapper and an `lfm-text` download
target for launching the HTTP server with LFM text models.
- Added the `streaming` example to demonstrate callback-driven text
generation.
- Added mobile-oriented runtime presets through `MobilePreset` and
`LlamaParams::with_mobile_preset`.
- Added broader tool-call streaming support, including OpenAI-style
tool-call deltas.
- Added documentation deployment for the project guide.
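Callback-driven streaming with an early-stop signal follows a simple loop: hand each new piece to the caller and stop as soon as the caller says so. This is a minimal sketch with a locally defined, hypothetical `StreamControl` enum and a `stream_pieces` function that replays pre-generated text pieces in place of real token decoding; it does not reproduce the signatures of `create_completion_stream` or the crate's own `StreamControl`.

```rust
/// Hypothetical control signal returned by a streaming callback.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StreamControl {
    Continue,
    Stop,
}

/// Replays `pieces` to `on_chunk`, stopping early when the callback returns
/// `StreamControl::Stop`. A real generator would decode one token per step
/// instead of iterating over a slice. Returns the text emitted so far.
pub fn stream_pieces<F>(pieces: &[&str], mut on_chunk: F) -> String
where
    F: FnMut(&str) -> StreamControl,
{
    let mut out = String::new();
    for &piece in pieces {
        out.push_str(piece);
        if on_chunk(piece) == StreamControl::Stop {
            break;
        }
    }
    out
}
```

A caller that returns `Stop` after its second chunk receives only the first two pieces, which is the behavior a test such as `streaming_completion_can_stop_early` would be expected to check.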
Changed
- Migrated the user guide from mdBook to Material for MkDocs, with
English and Portuguese documentation trees and expanded server,
mobile, streaming, chat, embeddings and grammar coverage.
- README files now point users to the new MkDocs guide hosted at the
GitHub Pages site.
- CI and release workflows now build, test and publish
`llama-crab-server` alongside the library crates.
- CI workflows now run through manual dispatch instead of push triggers,
and documentation jobs use nightly Cargo where required.
- The `hf-tokenizer` dependency now enables the `onig` feature for
tokenizer compatibility.
- Rustdoc crate logos now reference the current Canarim Crab asset.
Fixed
- Removed unused placeholder OpenAI-compat wrapper bindings from
`llama-crab-sys` and the old chat module export.
- Gated the Metal backend build configuration to macOS targets.
- Hardened documentation builds and docs deployment workflow behavior.
- Cleaned up server and example runner support for the new server and
mobile workflows.
v0.1.4
[0.1.4] - 2026-06-14
Added
- Added high-level streaming completion APIs, including
`create_completion_stream`, `create_completion_stream_with_sampler`,
`CompletionChunk`, `StreamControl` and richer completion logprob
metadata.
- Added `llama-crab-server`, an HTTP server binary for local inference
with completions, chat completions, embeddings, reranking,
tokenization, detokenization, SSE streaming and optional multimodal
chat support.
- Added OpenAI-style high-level convenience helpers for text, chat and
embeddings with token accounting.
- Added the `server_lfm` example wrapper and an `lfm-text` download
target for launching the HTTP server with LFM text models.
- Added the `streaming` example to demonstrate callback-driven text
generation.
- Added `tauri-plugin-llama-crab`, a Tauri IPC runtime for loading
GGUF models and exposing OpenAI-like chat, completion, embedding,
rerank, tokenization and model-management commands.
- Added the `@llama-crab/core` and `@llama-crab/tauri` TypeScript
packages with shared OpenAI-like contracts, request mappers and a
Tauri client.
- Added the `tauri-chat-lfm` desktop example and smoke coverage for
the Tauri chat workflow.
- Added mobile-oriented runtime presets through `MobilePreset` and
`LlamaParams::with_mobile_preset`.
- Added broader tool-call streaming support, including OpenAI-style
tool-call deltas.
- Added documentation deployment for the project guide.
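OpenAI-style tool-call deltas arrive as fragments keyed by an `index`: the id and function name typically appear in the first fragment, and the JSON arguments string is spread across later ones. A client has to merge them by index. The sketch below assumes hypothetical `ToolCallDelta` and `ToolCall` structs modelled loosely on that wire shape; they are not types exported by llama-crab.

```rust
use std::collections::BTreeMap;

/// Hypothetical streamed fragment, loosely modelled on OpenAI's
/// `delta.tool_calls[]` entries.
#[derive(Debug, Clone, Default)]
pub struct ToolCallDelta {
    pub index: usize,
    pub id: Option<String>,
    pub name: Option<String>,
    pub arguments: Option<String>,
}

/// Fully assembled tool call.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: String,
}

/// Merges deltas by `index`, keeping calls ordered by index.
/// Assumes id and name arrive whole; argument text arrives in pieces.
pub fn accumulate_tool_calls(deltas: &[ToolCallDelta]) -> Vec<ToolCall> {
    let mut calls: BTreeMap<usize, ToolCall> = BTreeMap::new();
    for d in deltas {
        let call = calls.entry(d.index).or_default();
        if let Some(id) = &d.id {
            call.id = id.clone();
        }
        if let Some(name) = &d.name {
            call.name = name.clone();
        }
        if let Some(args) = &d.arguments {
            call.arguments.push_str(args);
        }
    }
    calls.into_values().collect()
}
```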
Changed
- Migrated the user guide from mdBook/MkDocs-era documentation to
Docusaurus, with expanded server, mobile, Tauri, TypeScript,
streaming, chat, embeddings and grammar coverage.
- Reorganized the repository into `crates/` and `packages/` workspaces
so Rust crates, TypeScript packages and examples share one release
surface.
- README files now point users to the new Docusaurus guide hosted at the
GitHub Pages site.
- CI and release workflows now build, test and publish
`llama-crab-server`, `tauri-plugin-llama-crab` and TypeScript
packages alongside the library crates.
- CI workflows now run through manual dispatch instead of push triggers,
and documentation jobs use nightly Cargo where required.
- The `hf-tokenizer` dependency now enables the `onig` feature for
tokenizer compatibility.
- Rustdoc crate logos now reference the current Canarim Crab asset.
Fixed
- Removed unused placeholder OpenAI-compat wrapper bindings from
`llama-crab-sys` and the old chat module export.
- Gated the Metal backend build configuration to macOS targets.
- Hardened documentation builds and docs deployment workflow behavior.
- Cleaned up server and example runner support for the new server and
mobile workflows.
llama-crab v0.1.201
What's Changed
- chore(release): prepare v0.1.201 by @DominguesM in #1
New Contributors
- @DominguesM made their first contribution in #1
Full Changelog: v0.1.2...v0.1.201