Correctness first. Transparency always.
LIS is a CPU-only local inference runtime for causal decoder-only models, built for engineers and researchers who need a system they can inspect, validate, reproduce, and optimise with confidence. It prioritises correctness, clear diagnostics, reproducibility, and performance transparency over broad feature coverage.
LIS is an independent personal project. The initial codebase is personally authored.
- Correctness-first — reference execution path with verified token parity
- Inspectability — opt-in machine-readable execution artifacts and diagnostics
- Reproducibility — bounded, versioned run reports with content-addressable fingerprints
- Performance transparency — opt-in per-stage and per-token wall-clock instrumentation
- Artifact-friendly execution — structured JSON reports, Markdown companions, and diagnostic traces without telemetry or uploads
- Conservative support boundaries — documented subset, explicit rejection of unsupported inputs
- CPU-only local execution
- Causal decoder-only models within the documented plain-RoPE Llama-family scope
- A narrow Qwen3 Dense BF16 merged-safetensors path (does not imply broad Qwen-family support)
- Prompts are passed as raw tokenizer text. LIS does not apply model-specific chat templates or expose thinking-mode controls, so reasoning-oriented models may produce extended explanatory output even for short prompts.
- Local HuggingFace-style directories containing `config.json`, a merged `model.safetensors`, and a compatible `tokenizer.json`
- Supported floating dtypes:
  - Llama-family path: F32, F16, BF16
  - Qwen3 Dense path: BF16 only
- HuggingFace BPE `tokenizer.json` subset, `LIS_VOCAB_V1`, and direct token IDs
- Greedy decode only
- Opt-in artifact and diagnostic outputs
- LIS Inspect currently supports `run_report` JSON and optional perf stderr logs
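
Greedy decoding means that each step emits the single highest-scoring token, with no temperature or sampling involved. The Python sketch below illustrates that selection rule only; it is not LIS's C implementation, `greedy_pick` is a hypothetical name, and the lowest-index tie-break is an assumption.

```python
from typing import Sequence


def greedy_pick(logits: Sequence[float]) -> int:
    """Return the index of the largest logit (greedy token selection).

    Assumption for this sketch: ties resolve to the lowest token ID.
    """
    if not logits:
        raise ValueError("logits must not be empty")
    best = 0
    for token_id in range(1, len(logits)):
        # Strict '>' keeps the earliest index when two values are equal.
        if logits[token_id] > logits[best]:
            best = token_id
    return best
```

Because the rule is deterministic, the same prompt and model state select the same token, which is what makes token-parity checks meaningful.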
- GPU backend
- Serving / HTTP endpoint
- Distributed inference
- Continuous batching
- Sampling frameworks (temperature, top-p, top-k, beam search, speculative decoding)
- GGUF / GGML
- PyTorch `.bin`, `.pt`, `.pth`
- Index-only sharded safetensors loading
- LoRA / QLoRA / adapters
- Quantised formats beyond current floating dtype scope
- Broad Qwen-family support, Qwen2/Qwen2.5, Qwen3 MoE, multimodal/VL
- Mistral, GPT-2, and other model families unless separately implemented
- RoPE scaling, YaRN, sliding window, long-context variants
- Chat-template / Jinja execution
- LIS Inspect rendering for `decode_trace`, `layer_trace`, or KV visualisation (deferred)
LIS requires a C11 compiler, the C standard library, and POSIX threads (pthreads). It has no other external dependencies.
git clone <repo-url> LIS
cd LIS
make build

The built binary is `srcs/libs/lis`. `make build` requires no private model artifacts.

make test

`make test` requires no private model artifacts. It builds the binary and runs the core, loader, backend, runtime, CLI, tokenizer, and threading test suites.
Model-backed execution requires a user-supplied local model. The examples below use a placeholder path; replace it with your own plain-RoPE Llama-family model directory.
MODEL_DIR=/path/to/plain-rope-llama
./srcs/libs/lis \
--model "$MODEL_DIR" \
--config "$MODEL_DIR/config.json" \
--hf-tokenizer "$MODEL_DIR/tokenizer.json" \
--prompt "Write one short sentence about the sea." \
--context 128 \
--batch 1 \
--generate 8 \
--threads 1 \
--report-json /tmp/lis_run.json

`/tmp/lis_*` paths are example output locations; you may choose any writable path.
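
The invocation above can also be driven from a script when the stderr output of a `--perf` run needs to be saved next to the JSON report. The sketch below is one way to do that with Python's standard `subprocess` module; `build_lis_argv` and `run_and_capture` are hypothetical helper names, and the flag values simply mirror the example command.

```python
import subprocess
from pathlib import Path
from typing import List


def build_lis_argv(model_dir: str, prompt: str, report_json: str,
                   binary: str = "./srcs/libs/lis") -> List[str]:
    """Assemble the example invocation shown above, plus --perf."""
    model = Path(model_dir)
    return [
        binary,
        "--model", str(model),
        "--config", str(model / "config.json"),
        "--hf-tokenizer", str(model / "tokenizer.json"),
        "--prompt", prompt,
        "--context", "128",
        "--batch", "1",
        "--generate", "8",
        "--threads", "1",
        "--report-json", report_json,
        "--perf",  # per-stage timings are written to stderr
    ]


def run_and_capture(argv: List[str], stderr_log: str) -> int:
    """Run the command, write its stderr to stderr_log, return the exit code."""
    with open(stderr_log, "wb") as log:
        return subprocess.run(argv, stderr=log).returncode
```

A call such as `run_and_capture(build_lis_argv(model_dir, prompt, "/tmp/lis_run.json"), "/tmp/lis_run.stderr")` leaves both files on disk, which is the pair of inputs that LIS Inspect accepts.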
Model-backed targets require explicit environment variables. Unset variables yield a clear error message; no target falls back to a private path.
make verify-token-parity VERIFY_MODEL=/path/to/plain-rope-llama
make verify-qwen3-sanity VERIFY_QWEN3_MODEL=/path/to/qwen3-dense
make bench BENCH_MODEL=/path/to/plain-rope-llama

`VERIFY_CONFIG` and `VERIFY_HF_TOKENIZER` may be supplied explicitly when the default derived paths are not suitable.
All artifact and diagnostic surfaces are opt-in.
| Flag | Purpose |
|---|---|
| `--report-json PATH` | Canonical machine-readable execution artifact (`lis.execution_artifact/v1`) |
| `--report-md PATH` | Human-readable Markdown companion report |
| `--trace-json PATH` | Bounded decode-step trace artifact |
| `--layer-trace-json PATH` | Compact per-layer / per-stage trace artifact (requires `--layer-checkpoints`) |
| `--diagnostics` | Opt-in generation diagnostics to stderr |
| `--perf` | Per-stage wall-clock timings and summary to stderr |
| `--perf-per-token` | Implies `--perf`; adds per-decode-step latency lines |
| `--forced-prefix "ID ..."` | Forced token IDs for diagnostic comparison |
| `--layer-checkpoints STEP` | Layer checkpoint stats at the given step |
- `lis: perf-stage` / `lis: perf-summary` / `lis: perf-per-token`: performance instrumentation
- `lis: generation-diagnostic*`: token-selection diagnostics
- `lis: precision path=`: resolved precision path summary
- `lis: kv-cache:`: KV cache diagnostics
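
Since every diagnostic family carries its own prefix, a captured stderr log can be split by family with a simple prefix match. The sketch below assumes each diagnostic line starts with its prefix at the beginning of the line; `group_by_prefix` is a hypothetical helper, and nothing here parses the payload after the prefix, whose format is not described in this document.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Prefixes copied from the list above. None of them is a prefix of
# another, so the first match is unambiguous.
PREFIXES = (
    "lis: perf-stage",
    "lis: perf-summary",
    "lis: perf-per-token",
    "lis: generation-diagnostic",
    "lis: precision path=",
    "lis: kv-cache:",
)


def group_by_prefix(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Bucket stderr lines by the documented prefix they start with.

    Lines that match no prefix are ignored.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        for prefix in PREFIXES:
            if line.startswith(prefix):
                groups[prefix].append(line)
                break
    return dict(groups)
```

With a saved log, `group_by_prefix(open("/tmp/lis_run.stderr"))` returns one list per family that actually appeared in the run.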
- `report.kv_cache`: deterministic KV cache structural accounting
- `manifest.runtime.precision_path`: run precision summary in `f32_accum;weights=<dtype>;kv=<dtype>` form
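
The precision path string is compact enough to split by hand, but a small parser makes comparisons between runs less error-prone. The sketch below handles only the `f32_accum;weights=<dtype>;kv=<dtype>` form quoted above; `parse_precision_path` is a hypothetical name, the `accum` key is an invented label for the first segment, and the dtype spellings in the usage note are placeholders rather than confirmed LIS output.

```python
from typing import Dict


def parse_precision_path(value: str) -> Dict[str, str]:
    """Split 'f32_accum;weights=<dtype>;kv=<dtype>' into named parts."""
    head, *rest = value.split(";")
    if not head:
        raise ValueError("empty precision path")
    # 'accum' is a label chosen for this sketch, not a LIS field name.
    parsed = {"accum": head}
    for item in rest:
        key, sep, dtype = item.partition("=")
        if not sep or not key or not dtype:
            raise ValueError(f"malformed field: {item!r}")
        parsed[key] = dtype
    return parsed
```

For example, `parse_precision_path("f32_accum;weights=bf16;kv=f32")` yields a dictionary with `accum`, `weights`, and `kv` entries.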
The JSON run_report is the canonical machine-readable source of truth. The Markdown report is a human-readable companion. decode_trace and layer_trace are bounded artifact outputs; current LIS Inspect is not required to render them.
LIS Inspect is a post-execution TUI inspector (Textual-based) that reads the canonical --report-json artifact and optional captured stderr from a --perf run. It provides Overview, Perf, Per-Token, Artifact, Raw, and Issues tabs. With two report inputs it launches a two-run compare view.
PYTHONPATH=tools python -m lis_inspect \
--report-json /tmp/lis_run.json \
--stderr-log /tmp/lis_run.stderr

LIS Inspect currently supports `run_report` JSON and optional perf stderr logs. Trace, layer, and KV rendering are deferred.
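
Outside the TUI, two run reports can also be compared with a schema-agnostic diff. The sketch below is not how LIS Inspect implements its compare view; it makes no assumptions about field names in the report and simply flattens any JSON document into dotted paths before comparing leaves. `load_report`, `flatten`, and `diff_reports` are hypothetical helpers.

```python
import json
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # marker for a path present on one side only


def load_report(path: str) -> Any:
    """Read a JSON document from disk."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def flatten(node: Any, path: str = "") -> Dict[str, Any]:
    """Map dotted paths to leaf values for nested dicts and lists.

    Empty containers contribute no entries in this sketch.
    """
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return {path: node}
    flat: Dict[str, Any] = {}
    for key, child in items:
        child_path = f"{path}.{key}" if path else str(key)
        flat.update(flatten(child, child_path))
    return flat


def diff_reports(a: Any, b: Any) -> Dict[str, Tuple[Any, Any]]:
    """Return {path: (left, right)} for every leaf that differs."""
    left, right = flatten(a), flatten(b)
    return {
        path: (left.get(path, MISSING), right.get(path, MISSING))
        for path in sorted(set(left) | set(right))
        if left.get(path, MISSING) != right.get(path, MISSING)
    }
```

Calling `diff_reports(load_report(first), load_report(second))` on two `--report-json` outputs lists only the paths whose values changed between runs.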
- Reproducibility and Execution Artifacts
- Precision Policy
- HuggingFace tokenizer.json Compatibility
- HuggingFace Llama Compatibility
- Loader Format Scope
See SECURITY.md for vulnerability reporting. Please use GitHub private vulnerability reporting / GitHub Security Advisories.
See CONTRIBUTING.md for development setup, coding style, compatibility expectations, and pull request guidelines.
Licensed under the Apache License, Version 2.0 (LICENSE). SPDX identifier: Apache-2.0.
See NOTICE for attribution, including third-party dependency attribution.