Releases · DominguesM/llama-crab

16 Jun 14:56

DominguesM

v0.1.8

cbd040b

v0.1.8 Latest

What's Changed

Dependency Updates

Crate	From	To
`thiserror`	1.x	2.x
`bindgen`	0.69.x	0.72.x
`tokenizers`	0.20.x	0.23.x
`axum`	0.7.x	0.8.x
`tower-http`	0.6.x	0.7.x
`cc`	1.2.63	1.2.64
`smallvec`	1.15.1	1.15.2

Version Bump

All crates and npm packages bumped from 0.1.7 to 0.1.8.

Documentation

Fixed documentation URLs to include .html extensions.

16 Jun 12:42

DominguesM

v0.1.7

8f8aa41

v0.1.7 — Fix use-after-move of LlamaModel

What's Changed

Critical Bug Fix

Fixed a use-after-move of LlamaModel that caused SIGSEGV when Llama crossed a return boundary (e.g. Llama::load returning by value, or Llama being moved across scope).

The self-referential &'a LlamaModel field on LlamaContext, papered over by a PhantomData<*mut ()> + transmute in Llama::load, has been replaced with a heap-allocated Box<LlamaModel> and a NonNull<LlamaModel> raw pointer on the context.

Symptoms fixed:

n_embd reading as 0 in the embedding context
n_vocab reading stale or zeroed data in logits_ith / sampled_probs_ith
Random crashes in rerank and any flow that crossed a return boundary with a Llama value

🔧 Changes

Llama.model is now Box<LlamaModel> (heap-allocated, stable address)
LlamaContext.model is now NonNull<LlamaModel> (raw pointer owned by the context)
Removed the 'a lifetime parameter from LlamaContext
Fields in Llama reordered to context → model → _backend → _not_send_sync to enforce correct drop order
Added 9 regression tests covering embeddings, rerank, infill, and streaming APIs
Bumped version to 0.1.7
Workspace members now use crates/* glob pattern
Updated changelog and versioned docs.rs links
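
The reasoning behind the Box<LlamaModel> change and the field reordering can be illustrated with a small std-only sketch. The types below (Model, Context, Owner) are hypothetical stand-ins, not the crate's actual definitions. The sketch only demonstrates two language-level facts the fix appears to rely on: a boxed value keeps its heap address when its owner is moved, and struct fields are dropped in declaration order.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared log used to record the order in which values are dropped.
type DropLog = Rc<RefCell<Vec<&'static str>>>;

// Hypothetical stand-ins for the real model and context types.
struct Model {
    n_embd: usize,
    log: DropLog,
}

struct Context {
    log: DropLog,
}

impl Drop for Model {
    fn drop(&mut self) {
        self.log.borrow_mut().push("model");
    }
}

impl Drop for Context {
    fn drop(&mut self) {
        self.log.borrow_mut().push("context");
    }
}

// Fields are declared context-first, so the context is dropped before the model.
#[allow(dead_code)]
struct Owner {
    context: Context,
    model: Box<Model>,
}

fn load(log: &DropLog) -> Owner {
    Owner {
        context: Context { log: Rc::clone(log) },
        model: Box::new(Model { n_embd: 768, log: Rc::clone(log) }),
    }
}

// Address of the heap allocation that holds the model.
fn model_addr(owner: &Owner) -> usize {
    &*owner.model as *const Model as usize
}

// Returns (model address unchanged after moves, observed drop order).
fn demo() -> (bool, Vec<&'static str>) {
    let log: DropLog = Rc::new(RefCell::new(Vec::new()));
    let owner = load(&log); // the Owner crosses a return boundary here
    let before = model_addr(&owner);
    let relocated = Box::new(owner); // moves the whole Owner to a new location
    let after = model_addr(&relocated);
    assert_eq!(relocated.model.n_embd, 768);
    drop(relocated);
    let order: Vec<&'static str> = log.borrow().clone();
    (before == after, order)
}

fn main() {
    let (stable, order) = demo();
    println!("address stable: {stable}, drop order: {order:?}");
}
```

A pointer stored inside the context can only stay valid if both properties hold: the pointee must not move when the owner moves, and the context must be torn down while the model is still alive.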

Verification

All 9 new regression tests pass:

embeddings_seq_returns_768_dim_unit_norm_vector
embed_called_twice_returns_consistent_dim
logits_ith_after_decode_reads_n_vocab_floats
rerank_scores_documents_and_top_match_is_rust
rerank_empty_documents_returns_empty_vec
infill_returns_some_content
infill_called_twice_is_consistent
streaming_completion_collects_tokens
streaming_completion_can_stop_early

Migration Notes

No public API breakage. All existing code continues to work without changes.

15 Jun 17:41

DominguesM

v0.1.6

d6779f8

v0.1.6 — Hugging Face model download

Highlights

LlamaParams now accepts a Hugging Face repository id directly. Pass "TheBloke/Llama-2-7B-Chat-GGUF" and the library downloads the GGUF to the official HF cache (~/.cache/huggingface/hub) before loading. Local paths still work unchanged; the Tauri plugin inherits the new behavior automatically.

use llama_crab::{Llama, LlamaParams};

let mut llama = Llama::load(
    LlamaParams::new("TheBloke/Llama-2-7B-Chat-GGUF")
        .with_hf_filename("llama-2-7b-chat.Q4_K_M.gguf")
        .with_n_ctx(2048),
)?;

What's new

Library

hf-hub cargo feature (opt-in) — gates the new functionality. Mirror of the existing mtmd pattern.
HfDownloader trait + MockHfDownloader (always compiled, for tests) + RealHfDownloader (gated, uses hf-hub 0.5 sync API).
5 new builders on LlamaParams: with_hf_filename, with_hf_revision, with_hf_token, with_hf_cache_dir, with_hf_endpoint.
LlamaError::ModelDownload(String) variant for download errors.
HF_TOKEN and HF_ENDPOINT env vars honored (read in RealHfDownloader::new, never logged).
HF_HOME respected for cache location.
Auto-detect heuristic: ^[A-Za-z0-9._-]+(/[A-Za-z0-9._-]+)?$ + !Path::new(s).exists() — falls through to local for existing paths and ambiguous local-path names (models/, model/).
Auto-pick logic: 0 .gguf → error; 1 → auto-pick; >1 → error suggesting with_hf_filename.
tracing::info! at download start/end with repo, filename, size_bytes, elapsed_ms.
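
The auto-detect heuristic and the auto-pick rule above can be sketched with the standard library alone. This is a minimal illustration, not the crate's code: the function names are hypothetical, the pattern is hand-rolled instead of using a regex crate, the case-sensitive ".gguf" suffix match is an assumption, and the error strings are placeholders.

```rust
use std::path::Path;

// One repo-id segment: non-empty, only ASCII letters, digits, '.', '_' or '-'.
fn is_repo_segment(s: &str) -> bool {
    !s.is_empty()
        && s.bytes()
            .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'))
}

// Hand-rolled equivalent of ^[A-Za-z0-9._-]+(/[A-Za-z0-9._-]+)?$
fn matches_repo_id_shape(s: &str) -> bool {
    let mut parts = s.split('/');
    let first = parts.next().unwrap_or("");
    let second = parts.next();
    let has_third = parts.next().is_some();
    is_repo_segment(first) && second.map_or(true, is_repo_segment) && !has_third
}

// Treat the string as a repo id only if it has the right shape AND
// no file or directory with that name exists locally.
fn looks_like_hf_repo(s: &str) -> bool {
    matches_repo_id_shape(s) && !Path::new(s).exists()
}

// Auto-pick: exactly one .gguf file is accepted, anything else is an error.
fn pick_gguf<'a>(files: &[&'a str]) -> Result<&'a str, String> {
    let ggufs: Vec<&'a str> = files
        .iter()
        .copied()
        .filter(|f| f.ends_with(".gguf"))
        .collect();
    match ggufs.as_slice() {
        [] => Err("no .gguf file found in repository".to_string()),
        [only] => Ok(*only),
        many => Err(format!(
            "{} .gguf files found; choose one with with_hf_filename",
            many.len()
        )),
    }
}

fn main() {
    for candidate in ["TheBloke/Llama-2-7B-Chat-GGUF", "models/", "."] {
        println!("{candidate:?} -> repo id? {}", looks_like_hf_repo(candidate));
    }
    println!("{:?}", pick_gguf(&["README.md", "llama-2-7b-chat.Q4_K_M.gguf"]));
}
```

Note that a trailing slash, as in models/, already fails the shape check because the second segment is empty, while a shape-valid name is still treated as local whenever the path exists.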

Server

--hf-filename <NAME> CLI flag (env LLAMA_CRAB_HF_FILENAME).
hf-hub server feature (opt-in): cargo install llama-crab-server --features hf-hub --force.

Tauri plugin

Always pulls in the hf-hub feature so end-user Tauri apps can use HF repo ids without extra build config.

Install / Upgrade

# Library
cargo add llama-crab --features hf-hub

# Server
cargo install llama-crab-server --features hf-hub --force

Test

# Skip state: without LLAMA_CRAB_RUN_HF_INTEGRATION set, the test skips cleanly
cargo test -p llama-crab --features hf-hub --test hf_download

# End-to-end (downloads TinyLlama, ~636 MB, verifies cache hit, loads into Metal)
LLAMA_CRAB_RUN_HF_INTEGRATION=1 cargo test \
  -p llama-crab --features hf-hub --test hf_download

Verification

Check	Result
`cargo build -p llama-crab --no-default-features`	OK
`cargo build -p llama-crab --features hf-hub`	OK
`cargo build -p llama-crab-server --features hf-hub`	OK
`cargo clippy --all-targets --features hf-hub -- -D warnings`	clean
`cargo test --lib` (no-default-features)	120/120 pass
`cargo test --lib` (hf-hub)	120/120 pass (2 env-gated ignored)
`cargo test --doc` (both states)	11/11 pass
`cargo test --test hf_download` (skip)	clean skip
CI (16 jobs, Linux + macOS)	16/16 pass
Release workflow (crates + npm)	success

PRs

#12: feat: add Hugging Face model download from LlamaParams
#13: chore(release): bump version to 0.1.6
#14: chore(release): bump npm package versions to 0.1.6

Guardrails (per design review)

No dep:tokio in the hf-hub feature (sync API only)
No #[from] hf_hub::Error (would leak the gated type into the always-compiled error enum)
No SHA256 verification (delegated to hf-hub etag mechanism; documented limitation)
No async / progress callback API in v1
No hf: URL prefix syntax
No token / auth-bearing URLs at tracing::info! level
Server hf-hub is opt-in (kept out of default = [])

15 Jun 10:31

DominguesM

v0.1.5

4e51e49

v0.1.5

[0.1.5] - 2026-06-15

Changed

Moved the documentation site out of this repository. The site is now published at https://llama-crab.nlp.rocks/ instead of the previous GitHub Pages URLs. The docs/ folder and the Publish docs site GitHub Actions workflow have been removed from this repo. README files and crate-level docs throughout this workspace were updated to point at the new URL.

Added

tauri-plugin-llama-crab: added a Config struct and init_with_config entry point so consumers can apply plugin-wide defaults (n_ctx, n_batch, n_ubatch, n_threads, n_threads_batch, n_gpu_layers, default model name) at startup. Anything left as None lets the per-request field win, with the llama-crab defaults as the final fallback.
tauri-plugin-llama-crab: added the mtmd cargo feature. When enabled, load_model can take an mmproj_path and the chat pipeline runs multimodal (vision) inference through llama.cpp's mtmd projector. Image inputs are accepted as data:image/...;base64,... URLs and as local file paths.
tauri-plugin-llama-crab: added granular PluginError kinds (workerSpawnFailed, workerDisconnected, workerPanicked, multimodalNotEnabled, multimodalSetup, mediaDecode) so the TypeScript client can distinguish failure modes instead of collapsing every error into worker.
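
The fallback chain described for Config can be sketched as a three-level lookup. This reads the sentence literally and is an assumption about precedence: a value set in the plugin-wide config is used first, a None there defers to the per-request field, and a built-in default applies last. If the plugin instead lets a request override a configured value, the first two arguments would swap. The struct shapes and the default value are hypothetical placeholders, not the plugin's real types or llama-crab's real defaults.

```rust
// Hypothetical, trimmed-down stand-ins for the plugin config and a load request.
#[derive(Default)]
struct Config {
    n_ctx: Option<u32>,
}

#[derive(Default)]
struct LoadRequest {
    n_ctx: Option<u32>,
}

// Placeholder value, not llama-crab's actual default.
const FALLBACK_N_CTX: u32 = 512;

// Config value first, then the per-request value, then the built-in fallback.
fn resolve(config: Option<u32>, request: Option<u32>, fallback: u32) -> u32 {
    config.or(request).unwrap_or(fallback)
}

fn effective_n_ctx(config: &Config, request: &LoadRequest) -> u32 {
    resolve(config.n_ctx, request.n_ctx, FALLBACK_N_CTX)
}

fn main() {
    let config = Config { n_ctx: None };
    let request = LoadRequest { n_ctx: Some(2048) };
    // Config leaves n_ctx unset, so the request's value is used.
    println!("n_ctx = {}", effective_n_ctx(&config, &request));
}
```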

Changed

tauri-plugin-llama-crab: JoinError from spawn_blocking now maps to workerPanicked; mpsc::RecvError maps to workerDisconnected; thread-spawn failures map to workerSpawnFailed.
@llama-crab/tauri: the Support Matrix entry for multimodal now reflects that the Rust plugin must be built with the mtmd cargo feature for image parts to be processed.
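
The three worker failure modes can be told apart with a small std-only sketch. It is an illustration under assumptions, not the plugin's code: std::thread and std::sync::mpsc stand in for the plugin's runtime, a failed std thread join stands in for the JoinError from spawn_blocking, and ErrorKind and run_on_worker are hypothetical names that only mirror the kinds listed above.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical error kinds mirroring the names in the release notes.
#[derive(Debug, PartialEq)]
enum ErrorKind {
    WorkerSpawnFailed,
    WorkerDisconnected,
    WorkerPanicked,
}

// Runs `job` on a worker thread; the job replies through the sender it is given.
fn run_on_worker<F>(job: F) -> Result<i32, ErrorKind>
where
    F: FnOnce(mpsc::Sender<i32>) + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let handle = thread::Builder::new()
        .name("worker".to_string())
        .spawn(move || job(tx))
        .map_err(|_| ErrorKind::WorkerSpawnFailed)?; // the OS refused to spawn

    // Blocks until a reply arrives or every sender has been dropped.
    let received = rx.recv();

    // Check the join result first so a panic is reported as a panic
    // rather than as a closed channel.
    if handle.join().is_err() {
        return Err(ErrorKind::WorkerPanicked);
    }
    received.map_err(|_: mpsc::RecvError| ErrorKind::WorkerDisconnected)
}

fn main() {
    println!("{:?}", run_on_worker(|tx| {
        let _ = tx.send(7);
    }));
    println!("{:?}", run_on_worker(|tx| drop(tx)));
}
```

Keeping the kinds separate lets a caller react differently, for example retrying after a disconnect but surfacing a panic as a bug.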

14 Jun 13:23

DominguesM

v0.1.300

4bc82c1

v0.1.3x

Added

Added high-level streaming completion APIs, including create_completion_stream, create_completion_stream_with_sampler, CompletionChunk, StreamControl and richer completion logprob metadata.
Added llama-crab-server, an HTTP server binary for local inference with completions, chat completions, embeddings, reranking, tokenization, detokenization, SSE streaming and optional multimodal chat support.
Added OpenAI-style high-level convenience helpers for text, chat and embeddings with token accounting.
Added the server_lfm example wrapper and an lfm-text download target for launching the HTTP server with LFM text models.
Added the streaming example to demonstrate callback-driven text generation.
Added mobile-oriented runtime presets through MobilePreset and LlamaParams::with_mobile_preset.
Added broader tool-call streaming support, including OpenAI-style tool-call deltas.
Added documentation deployment for the project guide.

Changed

Migrated the user guide from mdBook to Material for MkDocs, with English and Portuguese documentation trees and expanded server, mobile, streaming, chat, embeddings and grammar coverage.
README files now point users to the new MkDocs guide hosted at the GitHub Pages site.
CI and release workflows now build, test and publish llama-crab-server alongside the library crates.
CI workflows now run through manual dispatch instead of push triggers, and documentation jobs use nightly Cargo where required.
The hf-tokenizer dependency now enables the onig feature for tokenizer compatibility.
Rustdoc crate logos now reference the current Canarim Crab asset.

Fixed

Removed unused placeholder OpenAI-compat wrapper bindings from llama-crab-sys and the old chat module export.
Gated the Metal backend build configuration to macOS targets.
Hardened documentation builds and docs deployment workflow behavior.
Cleaned up server and example runner support for the new server and mobile workflows.

14 Jun 23:08

DominguesM

v0.1.4

9357775

v0.1.4

[0.1.4] - 2026-06-14

Added

Added high-level streaming completion APIs, including create_completion_stream, create_completion_stream_with_sampler, CompletionChunk, StreamControl and richer completion logprob metadata.
Added llama-crab-server, an HTTP server binary for local inference with completions, chat completions, embeddings, reranking, tokenization, detokenization, SSE streaming and optional multimodal chat support.
Added OpenAI-style high-level convenience helpers for text, chat and embeddings with token accounting.
Added the server_lfm example wrapper and an lfm-text download target for launching the HTTP server with LFM text models.
Added the streaming example to demonstrate callback-driven text generation.
Added tauri-plugin-llama-crab, a Tauri IPC runtime for loading GGUF models and exposing OpenAI-like chat, completion, embedding, rerank, tokenization and model-management commands.
Added the @llama-crab/core and @llama-crab/tauri TypeScript packages with shared OpenAI-like contracts, request mappers and a Tauri client.
Added the tauri-chat-lfm desktop example and smoke coverage for the Tauri chat workflow.
Added mobile-oriented runtime presets through MobilePreset and LlamaParams::with_mobile_preset.
Added broader tool-call streaming support, including OpenAI-style tool-call deltas.
Added documentation deployment for the project guide.

Changed

Migrated the user guide from mdBook/MkDocs-era documentation to Docusaurus, with expanded server, mobile, Tauri, TypeScript, streaming, chat, embeddings and grammar coverage.
Reorganized the repository into crates/ and packages/ workspaces so Rust crates, TypeScript packages and examples share one release surface.
README files now point users to the new Docusaurus guide hosted at the GitHub Pages site.
CI and release workflows now build, test and publish llama-crab-server, tauri-plugin-llama-crab and TypeScript packages alongside the library crates.
CI workflows now run through manual dispatch instead of push triggers, and documentation jobs use nightly Cargo where required.
The hf-tokenizer dependency now enables the onig feature for tokenizer compatibility.
Rustdoc crate logos now reference the current Canarim Crab asset.

Fixed

Removed unused placeholder OpenAI-compat wrapper bindings from llama-crab-sys and the old chat module export.
Gated the Metal backend build configuration to macOS targets.
Hardened documentation builds and docs deployment workflow behavior.
Cleaned up server and example runner support for the new server and mobile workflows.

13 Jun 17:07

DominguesM

v0.1.201

c973b69

llama-crab v0.1.201

What's Changed

chore(release): prepare v0.1.201 by @DominguesM in #1

New Contributors

@DominguesM made their first contribution in #1

Full Changelog: v0.1.2...v0.1.201
