NeuroBrix 0.2.1
First publicly usable release of the universal inference runtime — not final, just usable by people. One engine, any model, any modality, zero model-specific code.
Highlights
- 4-mode execution matrix green on orpheus-snac, hat-l-x4, Flex.1-alpha, deepseek-moe-16b; each validated in a clean-room venv (none of the 14 vendor libraries installed), greedy and cross-mode-consistent. The four modes are two independent branches × two variants: PyTorch (`--sequential` op-by-op oracle, `--compiled` fused torch+cuDNN/cuBLAS) and Triton (`--triton-sequential`, `--triton`; NeuroBrix `@triton.jit` kernels, zero torch).
- Audio, LLM, image, and upscaler families run end-to-end.
- New `aten::im2col` Triton kernel (HAT OCAB) → `hat-l-x4` now 4/4.
- Flex/FLUX CLIP-pooler advanced-index fix (`_meta_index`) → `Flex.1-alpha` triton coherent.
- R30 triton-sequential cross-device (pipeline-parallel) → `deepseek-moe-16b` 4/4.
- Zero Outsider: SNAC traced into orpheus's `.nbx` (no runtime HF download); internal tokenizer/mel/g2p/filterbank runners; engine runs from the `.nbx` with only `torch` (compiled) / `triton`+`NBXTensor` (triton).
- Engine docs aligned to the real CLI (9 families, 4 modes, upscalers).
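To make the "two branches × two variants" matrix and the cross-mode-consistency claim concrete, here is a minimal sketch of how such a check could be scored. It is not NeuroBrix code: `run` is a hypothetical callable standing in for "execute the model in this mode and return its greedy output", the variant labels `sequential` and `fused` are illustrative, and only the four flag strings come from the notes above.

```python
from itertools import product
from typing import Callable, Dict, List

# The four mode flags from the release notes, arranged as branch x variant.
# The keys ("pytorch", "triton", "sequential", "fused") are illustrative labels.
MODES: Dict[str, Dict[str, str]] = {
    "pytorch": {"sequential": "--sequential", "fused": "--compiled"},
    "triton": {"sequential": "--triton-sequential", "fused": "--triton"},
}


def mode_flags() -> List[str]:
    """Return the four flags in a stable, branch-major order."""
    return [MODES[b][v] for b, v in product(MODES, ("sequential", "fused"))]


def check_cross_mode(run: Callable[[str], List[int]]) -> Dict[str, bool]:
    """Compare every mode's greedy output against the --sequential oracle.

    `run` is a hypothetical callable (an assumption, not a NeuroBrix API):
    it takes a mode flag and returns the greedy output for a fixed input,
    represented here as a list of ints.
    """
    oracle = run("--sequential")
    return {flag: run(flag) == oracle for flag in mode_flags()}


def matrix_score(results: Dict[str, bool]) -> str:
    """Format the pass count the way the notes do, e.g. '4/4'."""
    return f"{sum(results.values())}/{len(results)}"
```

Under this reading, a model counts as 4/4 only when all four modes reproduce the oracle output for the same input.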
Documented debt — Qwen3-30B-A3B --sequential
Validated by construction, not executed. The original frozen-seq-dim view crash is eliminated by re-tracing (the graph now carries {mul,s0,s1} and 0 frozen views), and correctness is established via compiled mode (same graph, "Paris.") plus the SymbolicShapeResolver shared by both paths. The op-by-op sequential oracle was not run to completion on 30B (≈115k ops × per-op dispatch is impractically slow). This is honest debt and is not counted as a 4/4 pass. Qwen3 ships and is validated in compiled mode. The other four matrix models are 4/4.
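The difference between a frozen view and a symbolic one can be illustrated with a small sketch. This is not the actual SymbolicShapeResolver: the expression encoding, the function names, and the reading of `s0` and `s1` as runtime-bound dimensions (for example batch and sequence length) are all assumptions made for illustration.

```python
from typing import Dict, Tuple, Union

# A dimension is a concrete int, a symbol name, or an ("op", lhs, rhs) expression.
Dim = Union[int, str, Tuple[str, "Dim", "Dim"]]


def resolve_dim(dim: Dim, bindings: Dict[str, int]) -> int:
    """Evaluate one dimension against the runtime symbol bindings."""
    if isinstance(dim, int):
        return dim
    if isinstance(dim, str):
        return bindings[dim]
    op, lhs, rhs = dim
    if op != "mul":
        raise ValueError(f"unsupported op: {op}")
    return resolve_dim(lhs, bindings) * resolve_dim(rhs, bindings)


def resolve_shape(shape: Tuple[Dim, ...], bindings: Dict[str, int]) -> Tuple[int, ...]:
    """Resolve every dimension of a symbolic view shape."""
    return tuple(resolve_dim(d, bindings) for d in shape)


def check_view(numel: int, shape: Tuple[int, ...]) -> None:
    """Raise if `shape` cannot hold exactly `numel` elements."""
    total = 1
    for d in shape:
        total *= d
    if total != numel:
        raise ValueError(f"view {shape} is invalid for {numel} elements")


# A view traced at s0=1, s1=8 and frozen would be stored as (8, 64).
FROZEN_VIEW = (8, 64)
# The re-traced graph keeps the expression instead of the traced value.
SYMBOLIC_VIEW = (("mul", "s0", "s1"), 64)
```

With a runtime input where `s1` is 13, the tensor holds 1 × 13 × 64 elements: `check_view` rejects `FROZEN_VIEW`, while `resolve_shape(SYMBOLIC_VIEW, {"s0": 1, "s1": 13})` yields a shape that passes. Because the shape expressions are resolved the same way regardless of how the ops are dispatched, a result from compiled mode says something about the sequential path as well, though it does not replace running it.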
Install
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install neurobrix

Full details: CHANGELOG.md. Model licenses are each vendor's own (engine is Apache-2.0; it carries no model-license machinery).
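After the two install commands, a quick environment check along these lines may be useful. It is a sketch using only the Python standard library. It assumes the `neurobrix` package imports under the same name as its pip distribution, and it leaves the vendor-library list empty because these notes do not name the 14 libraries.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


def present_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that are importable."""
    return [name for name in names if find_spec(name) is not None]


if __name__ == "__main__":
    # Assumption: `pip install neurobrix` provides a module named `neurobrix`.
    required = ["torch", "neurobrix"]
    # Fill in the vendor libraries to exclude; the notes do not list them.
    vendor_libs: List[str] = []

    print("missing required:", missing_modules(required))
    print("vendor libs still installed:", present_modules(vendor_libs))
```

An empty "missing required" list means both packages are visible to the interpreter. An empty "vendor libs still installed" list, once `vendor_libs` is filled in, approximates the clean-room condition described in the highlights.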