Releases: DominguesM/llama-crab

v0.1.8

16 Jun 14:56
cbd040b

What's Changed

Dependency Updates

Crate        From      To
thiserror    1.x       2.x
bindgen      0.69.x    0.72.x
tokenizers   0.20.x    0.23.x
axum         0.7.x     0.8.x
tower-http   0.6.x     0.7.x
cc           1.2.63    1.2.64
smallvec     1.15.1    1.15.2

Version Bump

All crates and npm packages bumped from 0.1.7 to 0.1.8.

Documentation

  • Fixed documentation URLs to include .html extensions.

v0.1.7 — Fix use-after-move of LlamaModel

16 Jun 12:42
8f8aa41

What's Changed

Critical Bug Fix

Fixed a use-after-move of LlamaModel that caused SIGSEGV when Llama crossed a return boundary (e.g. Llama::load returning by value, or Llama being moved across scope).

The self-referential &'a LlamaModel field on LlamaContext, papered over by a PhantomData<*mut ()> + transmute in Llama::load, has been replaced with a heap-allocated Box<LlamaModel> and a NonNull<LlamaModel> raw pointer on the context.

Symptoms fixed:

  • n_embd reading as 0 in the embedding context
  • n_vocab reading stale or zeroed data in logits_ith / sampled_probs_ith
  • Random crashes in rerank and any flow that crossed a return boundary with a Llama value

🔧 Changes

  • Llama.model is now Box<LlamaModel> (heap-allocated, stable address)
  • LlamaContext.model is now NonNull<LlamaModel> (raw pointer held by the context; the model itself stays owned by Llama)
  • Removed the 'a lifetime parameter from LlamaContext
  • Fields in Llama reordered to context → model → _backend → _not_send_sync to enforce correct drop order
  • Added 9 regression tests covering embeddings, rerank, infill, and streaming APIs
  • Bumped version to 0.1.7
  • Workspace members now use crates/* glob pattern
  • Updated changelog and versioned docs.rs links
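
The two properties this fix relies on can be checked in isolation with safe code. The sketch below is not the crate's implementation: Model, Context and Owner are hypothetical stand-ins for LlamaModel, LlamaContext and Llama, and the drop log exists only for illustration. It shows that a boxed value keeps its heap address when its owner is returned by value and moved again, and that fields are dropped in declaration order, so the context goes away before the model it points at.

```rust
use std::cell::RefCell;
use std::ptr::NonNull;
use std::rc::Rc;

// Shared log used only to observe drop order in this sketch.
type DropLog = Rc<RefCell<Vec<&'static str>>>;

// Hypothetical stand-in for LlamaModel.
struct Model {
    log: DropLog,
}

// Hypothetical stand-in for LlamaContext: a raw pointer, no lifetime parameter.
struct Context {
    model: NonNull<Model>,
    log: DropLog,
}

// Hypothetical stand-in for Llama. Rust drops fields in declaration order,
// so `context` is dropped before the `model` it points at.
struct Owner {
    context: Context,
    model: Box<Model>,
}

impl Drop for Model {
    fn drop(&mut self) {
        self.log.borrow_mut().push("model");
    }
}

impl Drop for Context {
    fn drop(&mut self) {
        self.log.borrow_mut().push("context");
    }
}

fn load(log: &DropLog) -> Owner {
    let model = Box::new(Model { log: Rc::clone(log) });
    // Address of the heap allocation, not of the `Box` handle on the stack.
    let ptr = NonNull::from(&*model);
    Owner {
        context: Context { model: ptr, log: Rc::clone(log) },
        model,
    }
}

// True when the context's pointer still targets the owned model.
// Only addresses are compared; nothing is dereferenced.
fn pointer_matches(owner: &Owner) -> bool {
    std::ptr::eq(owner.context.model.as_ptr() as *const Model, &*owner.model)
}

// Builds an owner, drops it, and reports the order in which fields were dropped.
fn drop_order() -> Vec<&'static str> {
    let log: DropLog = Rc::new(RefCell::new(Vec::new()));
    drop(load(&log));
    let order = log.borrow().clone();
    order
}

fn main() {
    let log: DropLog = Rc::new(RefCell::new(Vec::new()));
    let owner = load(&log); // crosses a return boundary
    let moved = Box::new(owner); // moved once more
    assert!(pointer_matches(&moved));
    assert_eq!(drop_order(), vec!["context", "model"]);
}
```

The sketch deliberately never dereferences the raw pointer; doing so soundly is the part that needs the unsafe code and invariants the release describes.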

Verification

All 9 new regression tests pass:

  • embeddings_seq_returns_768_dim_unit_norm_vector
  • embed_called_twice_returns_consistent_dim
  • logits_ith_after_decode_reads_n_vocab_floats
  • rerank_scores_documents_and_top_match_is_rust
  • rerank_empty_documents_returns_empty_vec
  • infill_returns_some_content
  • infill_called_twice_is_consistent
  • streaming_completion_collects_tokens
  • streaming_completion_can_stop_early

Migration Notes

No public API breakage. All existing code continues to work without changes.

v0.1.6 — Hugging Face model download

15 Jun 17:41
d6779f8

Highlights

LlamaParams now accepts a Hugging Face repository id directly. Pass "TheBloke/Llama-2-7B-Chat-GGUF" and the library downloads the GGUF to the official HF cache (~/.cache/huggingface/hub) before loading. Local paths still work unchanged; the Tauri plugin inherits the new behavior automatically.

use llama_crab::{Llama, LlamaParams};

let mut llama = Llama::load(
    LlamaParams::new("TheBloke/Llama-2-7B-Chat-GGUF")
        .with_hf_filename("llama-2-7b-chat.Q4_K_M.gguf")
        .with_n_ctx(2048),
)?;

What's new

Library

  • hf-hub cargo feature (opt-in): gates the new functionality, mirroring the existing mtmd feature pattern.
  • HfDownloader trait + MockHfDownloader (always compiled, for tests) + RealHfDownloader (gated, uses hf-hub 0.5 sync API).
  • 5 new builders on LlamaParams: with_hf_filename, with_hf_revision, with_hf_token, with_hf_cache_dir, with_hf_endpoint.
  • LlamaError::ModelDownload(String) variant for download errors.
  • HF_TOKEN and HF_ENDPOINT env vars honored (read in RealHfDownloader::new, never logged).
  • HF_HOME respected for cache location.
  • Auto-detect heuristic: ^[A-Za-z0-9._-]+(/[A-Za-z0-9._-]+)?$ + !Path::new(s).exists() — falls through to local for existing paths and ambiguous local-path names (models/, model/).
  • Auto-pick logic: 0 .gguf → error; 1 → auto-pick; >1 → error suggesting with_hf_filename.
  • tracing::info! at download start/end with repo, filename, size_bytes, elapsed_ms.

Server

  • --hf-filename <NAME> CLI flag (env LLAMA_CRAB_HF_FILENAME).
  • hf-hub server feature (opt-in): cargo install llama-crab-server --features hf-hub --force.

Tauri plugin

  • Always pulls in the hf-hub feature so end-user Tauri apps can use HF repo ids without extra build config.

Install / Upgrade

# Library
cargo add llama-crab --features hf-hub

# Server
cargo install llama-crab-server --features hf-hub --force

Test

# Skip state (without LLAMA_CRAB_RUN_HF_INTEGRATION set, the integration test skips cleanly)
cargo test -p llama-crab --features hf-hub --test hf_download

# End-to-end (downloads TinyLlama, ~636 MB, verifies cache hit, loads into Metal)
LLAMA_CRAB_RUN_HF_INTEGRATION=1 cargo test \
  -p llama-crab --features hf-hub --test hf_download

Verification

Check                                                              Result
cargo build -p llama-crab --no-default-features                    OK
cargo build -p llama-crab --features hf-hub                        OK
cargo build -p llama-crab-server --features hf-hub                 OK
cargo clippy --all-targets --features hf-hub -- -D warnings        clean
cargo test --lib (no-default-features)                             120/120 pass
cargo test --lib (hf-hub)                                          120/120 pass (2 env-gated ignored)
cargo test --doc (both states)                                     11/11 pass
cargo test --test hf_download (skip)                               clean skip
CI (16 jobs, Linux + macOS)                                        16/16 pass
Release workflow (crates + npm)                                    success

PRs

  • #12: feat: add Hugging Face model download from LlamaParams
  • #13: chore(release): bump version to 0.1.6
  • #14: chore(release): bump npm package versions to 0.1.6

Guardrails (per design review)

  • No dep:tokio in the hf-hub feature (sync API only)
  • No #[from] hf_hub::Error (would leak the gated type into the always-compiled error enum)
  • No SHA256 verification (delegated to hf-hub etag mechanism; documented limitation)
  • No async / progress callback API in v1
  • No hf: URL prefix syntax
  • No token / auth-bearing URLs at tracing::info! level
  • Server hf-hub is opt-in (kept out of default = [])
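
The "no #[from]" guardrail can be illustrated with a small sketch. Everything here is a stand-in: GatedDownloadError plays the role of an error type that only exists when an optional feature is enabled, and SketchError mirrors the shape of an always-compiled enum with a ModelDownload(String) variant. The real crate may differ in details; the point is only that converting to a String at the boundary keeps the gated type out of the public enum.

```rust
use std::fmt;

// Stand-in for an error type that only exists behind an optional feature.
#[derive(Debug)]
struct GatedDownloadError(&'static str);

impl fmt::Display for GatedDownloadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "download backend error: {}", self.0)
    }
}

// Always-compiled error enum. It stores a String, so it never names
// the gated type and compiles with or without the feature.
#[derive(Debug, PartialEq)]
enum SketchError {
    ModelDownload(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::ModelDownload(msg) => write!(f, "model download failed: {msg}"),
        }
    }
}

impl std::error::Error for SketchError {}

// Hypothetical gated call that fails.
fn gated_download() -> Result<(), GatedDownloadError> {
    Err(GatedDownloadError("404 Not Found"))
}

// Boundary function: the gated error is flattened to text right here.
fn fetch_model() -> Result<(), SketchError> {
    gated_download().map_err(|e| SketchError::ModelDownload(e.to_string()))
}

fn main() {
    let err = fetch_model().unwrap_err();
    assert_eq!(
        err,
        SketchError::ModelDownload("download backend error: 404 Not Found".to_string())
    );
    assert_eq!(
        err.to_string(),
        "model download failed: download backend error: 404 Not Found"
    );
}
```

The trade-off is that callers lose the structured source error and can only inspect the message text.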

v0.1.5

15 Jun 10:31
4e51e49

[0.1.5] - 2026-06-15

Changed

  • Moved the documentation site out of this repository. The site is now
    published at https://llama-crab.nlp.rocks/ instead of the previous
    GitHub Pages URLs. The docs/ folder and the
    Publish docs site GitHub Actions workflow have been removed from this
    repo. README files and crate-level docs throughout this workspace were
    updated to point at the new URL.

Added

  • tauri-plugin-llama-crab: added a Config struct and init_with_config
    entry point so consumers can apply plugin-wide defaults (n_ctx, n_batch,
    n_ubatch, n_threads, n_threads_batch, n_gpu_layers, default model name)
    at startup. Anything left as None lets the per-request field win, with
    the llama-crab defaults as the final fallback.
  • tauri-plugin-llama-crab: added the mtmd cargo feature. When enabled,
    load_model can take an mmproj_path and the chat pipeline runs
    multimodal (vision) inference through llama.cpp's mtmd projector.
    Image inputs are accepted as data:image/...;base64,... URLs and as
    local file paths.
  • tauri-plugin-llama-crab: added granular PluginError kinds
    (workerSpawnFailed, workerDisconnected, workerPanicked,
    multimodalNotEnabled, multimodalSetup, mediaDecode) so the
    TypeScript client can distinguish failure modes instead of collapsing
    every error into worker.

Changed

  • tauri-plugin-llama-crab: JoinError from spawn_blocking now maps
    to workerPanicked; mpsc::RecvError maps to workerDisconnected;
    thread-spawn failures map to workerSpawnFailed.
  • @llama-crab/tauri: the Support Matrix entry for multimodal now
    reflects that the Rust plugin must be built with the mtmd cargo
    feature for image parts to be processed.
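
The error mapping described above can be approximated with the standard library alone. This is a sketch under assumptions, not the plugin's code: the plugin maps a JoinError from spawn_blocking, whereas this example uses std::thread's join result as a stand-in, and the enum and function names are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical Rust-side counterpart of three of the error kinds.
#[derive(Debug, PartialEq)]
enum PluginErrorKind {
    WorkerSpawnFailed,
    WorkerDisconnected,
    WorkerPanicked,
}

// Runs `job` on a worker thread and waits for a single result.
fn run_worker<F>(job: F) -> Result<i32, PluginErrorKind>
where
    F: FnOnce(mpsc::Sender<i32>) + Send + 'static,
{
    let (tx, rx) = mpsc::channel();

    // Thread-spawn failure maps to WorkerSpawnFailed.
    let handle = thread::Builder::new()
        .name("worker".into())
        .spawn(move || job(tx))
        .map_err(|_| PluginErrorKind::WorkerSpawnFailed)?;

    // Blocks until a value arrives or every sender has been dropped.
    let received = rx.recv();

    // Check the join result first so a panic is reported as a panic,
    // not as the disconnect it also causes.
    if handle.join().is_err() {
        return Err(PluginErrorKind::WorkerPanicked);
    }

    // A clean exit without a message maps to WorkerDisconnected.
    received.map_err(|_: mpsc::RecvError| PluginErrorKind::WorkerDisconnected)
}

fn main() {
    assert_eq!(run_worker(|tx| tx.send(42).unwrap()), Ok(42));
    assert_eq!(
        run_worker(|tx| drop(tx)),
        Err(PluginErrorKind::WorkerDisconnected)
    );
    assert_eq!(
        run_worker(|_tx| panic!("boom")),
        Err(PluginErrorKind::WorkerPanicked)
    );
}
```

Keeping the three cases distinct is what lets a client decide whether to retry, reload the model, or report a bug.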

v0.1.3x

14 Jun 13:23
4bc82c1

Added

  • Added high-level streaming completion APIs, including
    create_completion_stream, create_completion_stream_with_sampler,
    CompletionChunk, StreamControl and richer completion logprob
    metadata.
  • Added llama-crab-server, an HTTP server binary for local inference
    with completions, chat completions, embeddings, reranking,
    tokenization, detokenization, SSE streaming and optional multimodal
    chat support.
  • Added OpenAI-style high-level convenience helpers for text, chat and
    embeddings with token accounting.
  • Added the server_lfm example wrapper and an lfm-text download
    target for launching the HTTP server with LFM text models.
  • Added the streaming example to demonstrate callback-driven text
    generation.
  • Added mobile-oriented runtime presets through MobilePreset and
    LlamaParams::with_mobile_preset.
  • Added broader tool-call streaming support, including OpenAI-style
    tool-call deltas.
  • Added documentation deployment for the project guide.

Changed

  • Migrated the user guide from mdBook to Material for MkDocs, with
    English and Portuguese documentation trees and expanded server,
    mobile, streaming, chat, embeddings and grammar coverage.
  • README files now point users to the new MkDocs guide hosted at the
    GitHub Pages site.
  • CI and release workflows now build, test and publish
    llama-crab-server alongside the library crates.
  • CI workflows now run through manual dispatch instead of push triggers,
    and documentation jobs use nightly Cargo where required.
  • The hf-tokenizer dependency now enables the onig feature for
    tokenizer compatibility.
  • Rustdoc crate logos now reference the current Canarim Crab asset.

Fixed

  • Removed unused placeholder OpenAI-compat wrapper bindings from
    llama-crab-sys and the old chat module export.
  • Gated the Metal backend build configuration to macOS targets.
  • Hardened documentation builds and docs deployment workflow behavior.
  • Cleaned up server and example runner support for the new server and
    mobile workflows.

v0.1.4

14 Jun 23:08
9357775

[0.1.4] - 2026-06-14

Added

  • Added high-level streaming completion APIs, including
    create_completion_stream, create_completion_stream_with_sampler,
    CompletionChunk, StreamControl and richer completion logprob
    metadata.
  • Added llama-crab-server, an HTTP server binary for local inference
    with completions, chat completions, embeddings, reranking,
    tokenization, detokenization, SSE streaming and optional multimodal
    chat support.
  • Added OpenAI-style high-level convenience helpers for text, chat and
    embeddings with token accounting.
  • Added the server_lfm example wrapper and an lfm-text download
    target for launching the HTTP server with LFM text models.
  • Added the streaming example to demonstrate callback-driven text
    generation.
  • Added tauri-plugin-llama-crab, a Tauri IPC runtime for loading
    GGUF models and exposing OpenAI-like chat, completion, embedding,
    rerank, tokenization and model-management commands.
  • Added the @llama-crab/core and @llama-crab/tauri TypeScript
    packages with shared OpenAI-like contracts, request mappers and a
    Tauri client.
  • Added the tauri-chat-lfm desktop example and smoke coverage for
    the Tauri chat workflow.
  • Added mobile-oriented runtime presets through MobilePreset and
    LlamaParams::with_mobile_preset.
  • Added broader tool-call streaming support, including OpenAI-style
    tool-call deltas.
  • Added documentation deployment for the project guide.

Changed

  • Migrated the user guide from mdBook/MkDocs-era documentation to
    Docusaurus, with expanded server, mobile, Tauri, TypeScript,
    streaming, chat, embeddings and grammar coverage.
  • Reorganized the repository into crates/ and packages/ workspaces
    so Rust crates, TypeScript packages and examples share one release
    surface.
  • README files now point users to the new Docusaurus guide hosted at the
    GitHub Pages site.
  • CI and release workflows now build, test and publish
    llama-crab-server, tauri-plugin-llama-crab and TypeScript
    packages alongside the library crates.
  • CI workflows now run through manual dispatch instead of push triggers,
    and documentation jobs use nightly Cargo where required.
  • The hf-tokenizer dependency now enables the onig feature for
    tokenizer compatibility.
  • Rustdoc crate logos now reference the current Canarim Crab asset.

Fixed

  • Removed unused placeholder OpenAI-compat wrapper bindings from
    llama-crab-sys and the old chat module export.
  • Gated the Metal backend build configuration to macOS targets.
  • Hardened documentation builds and docs deployment workflow behavior.
  • Cleaned up server and example runner support for the new server and
    mobile workflows.

llama-crab v0.1.201

13 Jun 17:07
c973b69

Full Changelog: v0.1.2...v0.1.201