
lfm 0.1.0 — Rust ONNX inference for LFM2.5-VL with llmtask Task contract #1

Merged
uqio merged 2 commits into main from 0.1.0 on May 10, 2026

Conversation


@uqio commented on May 9, 2026

First publishable cut of the crate: a Rust ONNX Runtime port of LiquidAI/LFM2.5-VL-450M, with schema-constrained sampling via llguidance and the full engine-agnostic llmtask::Task surface.

Three layers

  • lfm::Engine — sync, single-threaded; built on ort 2.0. Engine::generate(messages, images, opts) is the unconstrained free-form path. Engine::run<T: llmtask::Task>(task, messages, images, opts) is the constrained path: any Task whose Grammar is JSON Schema, Lark, or Regex routes through llguidance and the result is decoded by the task's parse impl. (A usage sketch follows this list.)
  • lfm::ImageAnalysisTask — built-in image-analysis preset that produces the canonical llmtask::ImageAnalysis output type, sharing the schema and resilient parser with qwen3-vl.
  • lfm::preproc — wasm-friendly image preprocessing surface (Preprocessor, TileGrid, EXIF-aware decode helpers). Compiles under --no-default-features --features decoders for use in contexts that don't need the inference runtime.
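
Putting the two Engine entry points together, a hypothetical calling sketch: the generate/run shapes follow the first bullet above, but `Message`, `Image`, and `Options` are placeholders for the crate's actual parameter types, and `ImageAnalysisTask::default()` is an assumption.

```rust
use lfm::{Engine, ImageAnalysisTask};

// `Message`, `Image`, and `Options` are stand-ins for the crate's real types.
fn describe(
    engine: &mut Engine,
    messages: &[Message],
    images: &[Image],
) -> Result<(), Box<dyn std::error::Error>> {
    // Unconstrained free-form path: plain sampling, raw text out.
    let text = engine.generate(messages, images, Options::default())?;
    println!("free-form: {text}");

    // Constrained path: the task's Grammar is compiled by llguidance into a
    // token mask, and the completion is decoded by the task's `parse` impl.
    let analysis =
        engine.run(&ImageAnalysisTask::default(), messages, images, Options::default())?;
    println!("typed: {analysis:?}");
    Ok(())
}
```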

llmtask-driven generic engine

The whole inference path takes &impl Task<Value = Value>, so a Task written once against the llmtask contract runs through lfm (llguidance) and qwen3-vl (mistralrs) without translation. Because lfm's backend is llguidance, all three Grammar variants (JSON Schema, Lark, Regex) are accepted; engines that only speak JSON Schema reject the others via UnsupportedGrammar and the caller can route to lfm.
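
The routing described in the last sentence can live entirely in caller code. A hedged sketch: the `UnsupportedGrammar` name comes from the paragraph above, but the engine types, error enum, and signatures below are placeholders, not the contract's literal API.

```rust
use serde_json::Value;

// Assumed shape of caller-side routing: try the JSON-Schema-only engine
// first, fall back to lfm when the task's grammar variant is unsupported.
fn run_routed<T: llmtask::Task<Value = Value>>(
    qwen: &mut qwen3_vl::Engine,
    lfm: &mut lfm::Engine,
    task: &T,
    messages: &[Message],
    images: &[Image],
    opts: &Options,
) -> Result<Value, EngineError> {
    match qwen.run(task, messages, images, opts) {
        // mistralrs only speaks JSON Schema; Lark and Regex grammars are
        // rejected up front...
        Err(EngineError::UnsupportedGrammar(_)) => {
            // ...while llguidance accepts all three variants.
            lfm.run(task, messages, images, opts)
        }
        other => other,
    }
}
```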

Strict from_dir + bundled escape hatch

  • Engine::from_dir byte-validates the supplied tokenizer.json, chat_template.jinja, preprocessor_config.json, and the text_config.max_position_embeddings field of config.json against the bundled blobs. A model directory whose ONNX shapes pass but whose tokenizer/template/preprocessor drifted would silently corrupt prompts; this fail-closed check forces the drift into a clear load-time error (a sketch of the check follows this list).
  • Engine::from_onnx_dir (under bundled feature) accepts an ONNX-only directory; the bundled tokenizer / chat template / configs are written to a temp file on first use.
  • Engine::from_paths is the unchecked escape hatch for advanced callers pairing custom tokenizers with custom ONNX.
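
Conceptually, the fail-closed check is nothing more than a byte comparison against blobs compiled into the crate. A minimal sketch, assuming a hypothetical helper (not the crate's actual code):

```rust
use std::{fs, io, path::Path};

/// Compare a supplied sidecar file byte-for-byte against the bundled blob,
/// turning silent drift into a load-time error. Illustrative only.
fn validate_sidecar(dir: &Path, name: &str, bundled: &[u8]) -> io::Result<()> {
    let supplied = fs::read(dir.join(name))?;
    if supplied != bundled {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("{name} differs from the bundled blob; refusing to load"),
        ));
    }
    Ok(())
}
```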

Architecture

Per-image vision encoding → text+image embedding splice → hybrid KV/conv-state cache decoder loop → optional schema-constrained sampling.

| Graph | Role |
| --- | --- |
| `vision_encoder.onnx` | SigLIP2 encoder; single image per call (multi-image batching produces silently wrong embeddings). |
| `embed_tokens.onnx` | Token embedding lookup. |
| `decoder_model_merged.onnx` | LFM2 hybrid LM: 10 conv-state + 6 KV-attn layers at sparse indices. The `Decoder` manages the non-contiguous cache layout transparently. |
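
In outline, one request flows through the three graphs like this. A hedged sketch: every method name below is a placeholder; only the graph boundaries and the one-image-per-call restriction come from the table above.

```rust
// Illustrative outline of the three-graph pipeline; not the crate's code.
fn forward(engine: &mut Engine, prompt_ids: &[i64], images: &[Image]) -> Result<String> {
    // 1. Vision: one vision_encoder.onnx call PER image -- batching several
    //    images into one call yields silently-wrong embeddings.
    let image_embeds: Vec<Embedding> = images
        .iter()
        .map(|img| engine.encode_image(img))
        .collect::<Result<_>>()?;

    // 2. Text: embed_tokens.onnx lookup, then splice the image embeddings
    //    into the token-embedding sequence at the image-placeholder positions.
    let mut embeds = engine.embed_tokens(prompt_ids)?;
    engine.splice_images(&mut embeds, &image_embeds)?;

    // 3. Decode: decoder_model_merged.onnx step loop over the hybrid cache
    //    (10 conv-state + 6 KV-attn layers at sparse indices), with optional
    //    llguidance token masking before each sample.
    engine.decode_loop(embeds)
}
```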

Compatibility with the current LFM2.5-VL-450M-ONNX exports

Two fixes were needed against the published HF repo:

  1. Cross-axis pad for pixel_values. The SigLIP2 NaFlex pos_embed Resize-target is computed as (max_h, max_w) = ReduceMax(spatial_shapes, axis=0) per axis and reshaped to [max_h * max_w, dim]. So pixel_values.shape[1] must equal max_h * max_w (cross-axis product), not the per-entry max(h * w). flatten_to_patches now pads accordingly; per-entry spatial_shapes and pixel_attention_mask still describe each entry's actual layout.
  2. Empty KV cache via allocator path. ort 2.0's Tensor::from_array rejects any zero-dim shape with "Invalid dimension #N; all dimensions must be >= 1". This broke Decoder::new_cache initialising the empty [1, 8, 0, 64] attn cache. The empty cache is now routed through Tensor::<f32>::new(allocator, shape) (the ONNX Runtime allocator path), which accepts zero-element shapes. (Sketches of both fixes follow.)
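
Fix 1 reduces to a few lines of shape arithmetic and fix 2 to a one-line construction swap. A sketch of both, with illustrative names: `padded_patch_len` is not the crate's API, and the `allocator` is assumed to come from the live session.

```rust
/// Fix 1 (sketch): padded sequence length for pixel_values, matching the
/// graph's per-axis ReduceMax -- max_h * max_w across ALL entries, which can
/// exceed every individual entry's h * w.
fn padded_patch_len(spatial_shapes: &[(usize, usize)]) -> usize {
    let max_h = spatial_shapes.iter().map(|&(h, _)| h).max().unwrap_or(0);
    let max_w = spatial_shapes.iter().map(|&(_, w)| w).max().unwrap_or(0);
    max_h * max_w
}

#[test]
fn cross_axis_product_not_per_entry_max() {
    // 32x8 and 8x32 are each 256 patches, but the graph expects
    // pixel_values.shape[1] == 32 * 32 = 1024.
    assert_eq!(padded_patch_len(&[(32, 8), (8, 32)]), 1024);
}

// Fix 2 (sketch): route zero-element cache shapes through the allocator,
// since from_array rejects any dim of 0. `allocator` is assumed to come
// from the live session:
//
//   let empty = ort::value::Tensor::<f32>::new(&allocator, [1, 8, 0, 64])?;
```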

Admission-control DoS guards

Bounded request-shape cap (max messages, max content parts), text-size cap, image-count lower bound from min_image_tokens, header-time decoded-buffer cap, and a special-token denylist seeded from the live tokenizer's added_vocabulary. All run BEFORE any image decode or template render.
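
The ordering is the point: every gate is computable from request metadata alone. A minimal self-contained sketch of the shape/size caps (limit names, types, and messages are illustrative; the min_image_tokens bound and the special-token denylist are omitted for brevity):

```rust
// Illustrative admission gate: all checks run before any image decode or
// chat-template render, so a hostile request fails before it costs anything.
struct Limits {
    max_messages: usize,
    max_parts_per_message: usize,
    max_text_bytes: usize,
    max_decoded_image_bytes: u64,
}

struct Part {
    text_len: usize,
    // Decoded size read from the image header alone, without decoding.
    image_header_decoded_size: Option<u64>,
}
struct Message {
    parts: Vec<Part>,
}

fn admit(messages: &[Message], lim: &Limits) -> Result<(), &'static str> {
    if messages.len() > lim.max_messages {
        return Err("too many messages");
    }
    let mut text_total = 0usize;
    for m in messages {
        if m.parts.len() > lim.max_parts_per_message {
            return Err("too many content parts");
        }
        for p in &m.parts {
            text_total += p.text_len;
            // Header-time cap: reject before allocating the decoded buffer.
            if let Some(sz) = p.image_header_decoded_size {
                if sz > lim.max_decoded_image_bytes {
                    return Err("decoded image would exceed buffer cap");
                }
            }
        }
    }
    if text_total > lim.max_text_bytes {
        return Err("total text exceeds cap");
    }
    Ok(())
}
```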

Hardware backends

ORT execution providers gated behind feature flags: cuda, tensorrt, directml, rocm, coreml. None are required for CPU inference; each requires its vendor SDK at build time.

Test plan

  • cargo test --lib --all-features — 138 lib tests, 0 failures
  • cargo check --no-default-features — bare wasm-friendly preproc surface compiles
  • cargo check --no-default-features --features decoders — pure preprocessing build (no ort, no tokenizers)
  • Integration suite vs LiquidAI/LFM2.5-VL-450M-ONNX (HEAD revision): 9/9 tests pass, including a cross-engine ImageAnalysis comparison against the airport thumbnails shared with qwen3-vl.
  • Codex adversarial review approved on round 1.

🤖 Generated with Claude Code

@uqio changed the title from 0.1.0 to lfm 0.1.0 — Rust ONNX inference for LFM2.5-VL with llmtask Task contract on May 10, 2026
… mismatch

The Windows CI test job fails to link with:

    libesaxx_rs … error LNK2038: mismatch detected for 'RuntimeLibrary':
      value 'MT_StaticRelease' doesn't match value 'MD_DynamicRelease'
      in libort_sys
    fatal error LNK1319: 1 mismatches detected

`esaxx-rs` is the C++ suffix-array trainer for tokenizers' Unigram
training. It's pulled in by tokenizers' default `esaxx_fast` feature
and built with the static CRT `/MT`, which conflicts with `ort_sys`
built with the dynamic CRT `/MD`.

lfm only uses `tokenizers::Tokenizer::from_file` + encode/decode at
inference time — the trainer paths are dead code for us. Drop
default features (`progressbar`, `onig`, `esaxx_fast`) and re-enable
just `fancy-regex`, the pure-Rust regex backend that JSON-defined
BPE tokenizers need at runtime. This matches the feature set the
sibling `toktrie_hf_tokenizers` 1.7 already uses on tokenizers 0.21
transitively, so the dep tree stays consistent.
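
For context, the entirety of what lfm asks of `tokenizers` at runtime looks roughly like this (file path illustrative); none of it touches the Unigram trainer that drags in `esaxx-rs`:

```rust
use tokenizers::Tokenizer;

fn roundtrip() -> Result<(), Box<dyn std::error::Error>> {
    // Load a pre-built JSON tokenizer -- no training, no suffix arrays.
    let tok = Tokenizer::from_file("tokenizer.json")?;

    // encode/decode are the only paths exercised at inference time; the BPE
    // pre-tokenizer regex runs on fancy-regex, pure Rust.
    let enc = tok.encode("a photo of an airport", false)?;
    let text = tok.decode(enc.get_ids(), true)?;
    println!("{:?} -> {text}", enc.get_ids());
    Ok(())
}
```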

138/138 lib tests pass locally with the new feature set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@uqio merged commit 0e957d1 into main on May 10, 2026 (2 of 11 checks passed).
@uqio deleted the 0.1.0 branch on May 10, 2026 at 05:54.