ONNX export and inference tools for LFM2 models.
| Family | Precisions |
|---|---|
| LFM2.5, LFM2 | fp32, fp16, q4, q8 |
| LFM2.5-VL, LFM2-VL | fp32, fp16, q4, q8 |
| LFM2-MoE | fp32, fp16, q4, q4f16 |
git clone https://github.com/Liquid4All/onnx-export.git
cd onnx-export
uv sync
# For GPU inference support
uv sync --extra gpu
# For development (testing, benchmarking)
uv sync --extra dev

# All precisions
uv run lfm2-export LiquidAI/LFM2.5-1.2B-Instruct --precision

# All precisions
uv run lfm2-vl-export LiquidAI/LFM2.5-VL-1.6B --precision
# Conv2d vision format (alternative to default tiled)
uv run lfm2-vl-export LiquidAI/LFM2.5-VL-1.6B --vision-format conv2d

# All precisions
uv run lfm2-moe-export LiquidAI/LFM2-MoE-8B-A1B --precision

All inference commands provide interactive multi-turn chat with streaming output. They automatically detect CUDA availability and fall back to CPU if needed.
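For reference, the fallback works like the following onnxruntime provider selection (a sketch of the idea, not the CLIs' actual code; the model path is just an example):

```python
import onnxruntime as ort

def make_session(onnx_path: str, force_cpu: bool = False) -> ort.InferenceSession:
    # Prefer CUDA when onnxruntime reports it as available; otherwise use CPU.
    providers = ["CPUExecutionProvider"]
    if not force_cpu and "CUDAExecutionProvider" in ort.get_available_providers():
        # Providers are tried in order, so CUDA is used when usable and CPU otherwise.
        providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ort.InferenceSession(onnx_path, providers=providers)

session = make_session("LFM2.5-1.2B-Instruct-ONNX/onnx/model_q4.onnx")
print(session.get_providers())
```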
# Interactive chat (starts conversation loop)
uv run lfm2-infer --model LFM2.5-1.2B-Instruct-ONNX/onnx/model_q4.onnx
# Single prompt (non-interactive)
uv run lfm2-infer --model LFM2.5-1.2B-Instruct-ONNX/onnx/model_q4.onnx \
--prompt "Explain quantum computing"
# Force CPU execution
uv run lfm2-infer --model LFM2.5-1.2B-Instruct-ONNX/onnx/model_q4.onnx --cpu

# Single image analysis
uv run lfm2-vl-infer --model LFM2.5-VL-1.6B-ONNX \
--images photo.jpg \
--prompt "What do you see in this image?"
# Multi-image comparison (up to 2 images)
uv run lfm2-vl-infer --model LFM2.5-VL-1.6B-ONNX \
--images image1.jpg image2.jpg \
--prompt "Compare these two images"
# Text-only (no images)
uv run lfm2-vl-infer --model LFM2.5-VL-1.6B-ONNX \
--prompt "Hello, how are you?"Note: VL inference requires the model directory path (not a single .onnx file) since it loads multiple components:
embed_tokens.onnx,embed_images.onnx, anddecoder.onnx.
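A minimal sketch of that three-session layout (the directory structure and file lookup below are assumptions; inspect the exported folder for the exact paths and input names):

```python
from pathlib import Path
import onnxruntime as ort

# Locate the three exported graphs anywhere under the model directory
# (file names may carry a precision suffix, hence the wildcards).
model_dir = Path("LFM2.5-VL-1.6B-ONNX")
sessions = {
    name: ort.InferenceSession(str(next(model_dir.rglob(f"{name}*.onnx"))),
                               providers=["CPUExecutionProvider"])
    for name in ("embed_tokens", "embed_images", "decoder")
}

# Rough flow: embed_tokens maps token ids to embeddings, embed_images encodes the
# image(s), the two are merged at the image placeholder positions, and decoder
# runs over the combined embedding sequence.
for name, sess in sessions.items():
    print(name, [i.name for i in sess.get_inputs()])
```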
# Interactive chat
uv run lfm2-moe-infer --model LFM2-MoE-8B-A1B-ONNX/onnx/model_q4.onnx
# Force CPU (when model does not fit VRAM)
uv run lfm2-moe-infer --model LFM2-MoE-8B-A1B-ONNX/onnx/model_q4.onnx --cpu

Tests verify ONNX exports against PyTorch reference models.
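The idea behind those tests is to run the same input through the PyTorch reference and the exported graph and require matching predictions; a toy sketch of that comparison (random arrays stand in for real logits, and the tolerance is illustrative, not the repo's actual threshold):

```python
import numpy as np

def check_parity(ref_logits: np.ndarray, onnx_logits: np.ndarray, atol: float = 1e-2) -> None:
    # Quantized exports are expected to keep small element-wise error and, in
    # particular, the same next-token (top-1) prediction as the reference.
    assert ref_logits.shape == onnx_logits.shape
    max_err = float(np.abs(ref_logits - onnx_logits).max())
    same_top1 = bool((ref_logits.argmax(-1) == onnx_logits.argmax(-1)).all())
    print(f"max abs error: {max_err:.4f}, top-1 match: {same_top1}")
    assert same_top1 and max_err < atol

# Toy usage: a perturbed copy of a fake logits tensor plays the role of the ONNX output.
rng = np.random.default_rng(0)
ref = rng.normal(size=(1, 8, 65536)).astype(np.float32)
check_parity(ref, ref + rng.normal(scale=1e-3, size=ref.shape).astype(np.float32))
```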
# Install dev dependencies
uv sync --extra dev
# LFM2 text model tests
uv run pytest tests/test_lfm2/test_decoder.py -v -k "q4"
# LFM2-VL vision-language tests
uv run pytest tests/test_lfm2_vl/test_decoder.py -v -k "450M"
uv run pytest tests/test_lfm2_vl/test_vision_encoder.py -v
# LFM2-MoE tests
uv run pytest tests/test_lfm2_moe/test_decoder.py -v

Benchmarks compare the CPU inference speed of the ONNX export against the PyTorch reference model.
# Text model benchmark
uv run lfm2-bench --model LiquidAI/LFM2.5-1.2B-Instruct \
--onnx LFM2.5-1.2B-Instruct-ONNX/onnx/model_q4.onnx

Text models:
- LiquidAI/LFM2.5-1.2B-Base-ONNX
- LiquidAI/LFM2.5-1.2B-Instruct-ONNX
- LiquidAI/LFM2.5-1.2B-JP-ONNX
- LiquidAI/LFM2-2.6B-Transcript-ONNX
Vision-Language:
Text models:
- onnx-community/LFM2-350M-ONNX
- onnx-community/LFM2-700M-ONNX
- onnx-community/LFM2-1.2B-ONNX
- onnx-community/LFM2-2.6B-ONNX
- onnx-community/LFM2-2.6B-Exp-ONNX
Specialized:
- onnx-community/LFM2-350M-ENJP-MT-ONNX — translation
- onnx-community/LFM2-350M-Extract-ONNX
- onnx-community/LFM2-350M-Math-ONNX
- onnx-community/LFM2-1.2B-Extract-ONNX
- onnx-community/LFM2-1.2B-RAG-ONNX
- onnx-community/LFM2-1.2B-Tool-ONNX
Vision-Language:
MoE:
Note: The onnx-community models are exported using Transformers.js tooling with a different export pipeline. This project aims to produce compatible graph structures and file naming conventions to ensure interoperability with Transformers.js and other ONNX consumers.
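As a quick interoperability check, a community export can be opened with plain onnxruntime using the same onnx/model_<precision>.onnx layout (the repo id and file name below are examples; adjust to whichever model and precision you need):

```python
from huggingface_hub import snapshot_download
import onnxruntime as ort

# Fetch just the q4 graph and config files to keep the download small.
repo_dir = snapshot_download(
    "onnx-community/LFM2-350M-ONNX",
    allow_patterns=["onnx/model_q4.onnx*", "*.json"],
)
session = ort.InferenceSession(
    f"{repo_dir}/onnx/model_q4.onnx",
    providers=["CPUExecutionProvider"],
)
print([i.name for i in session.get_inputs()])
```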
Special thanks to Joshua Lochner for his work on Transformers.js and the onnx-community models, which inspired and informed this project's ONNX export approach.
See LICENSE for details.