Releases · bwiemz/NSL

19 Mar 17:52

github-actions

v0.9.0

f5531a7

v0.9.0 Latest

Full Changelog: v0.8.0...v0.9.0

19 Mar 00:08

bwiemz

v0.8.0

7ae7a4d

v0.8.0: Full Roadmap Complete — M9 through M51

Milestone: Full Roadmap Delivered

NeuralScript v0.8.0 marks the completion of the entire M9–M51 roadmap. Every milestone now has its infrastructure layer in place: analysis modules, runtime FFI, semantic validation, and unit tests.

New in v0.8.0 (Phase 9: Type System Extensions)

M49: Shape Algebra

Symbolic dimension solver with equality, divisibility, and range proofs
DimExpr extended with Mod variant + Eq/Hash derives
shape_assert decorator recognition
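
As a rough illustration of the divisibility checks described above, the sketch below models a small dimension-expression type and evaluates it under concrete bindings. Only the `DimExpr` name, the `Mod` variant, and the `Eq`/`Hash` derives come from the release notes; the other variants, `eval`, and `divides` are hypothetical, and a real solver would prove these facts symbolically rather than by evaluation.

```rust
use std::collections::HashMap;

/// Hypothetical dimension expression. The notes confirm only the `Mod`
/// variant and the `Eq`/`Hash` derives; the other variants are assumed.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum DimExpr {
    Const(u64),
    Sym(String),
    Mul(Box<DimExpr>, Box<DimExpr>),
    Mod(Box<DimExpr>, Box<DimExpr>),
}

impl DimExpr {
    /// Evaluate under concrete symbol bindings. Returns `None` for an
    /// unbound symbol, an overflowing product, or a modulus of zero.
    pub fn eval(&self, env: &HashMap<String, u64>) -> Option<u64> {
        match self {
            DimExpr::Const(c) => Some(*c),
            DimExpr::Sym(s) => env.get(s).copied(),
            DimExpr::Mul(a, b) => a.eval(env)?.checked_mul(b.eval(env)?),
            DimExpr::Mod(a, b) => a.eval(env)?.checked_rem(b.eval(env)?),
        }
    }
}

/// Checks `expr % divisor == 0` for one concrete binding of the symbols.
pub fn divides(expr: &DimExpr, divisor: u64, env: &HashMap<String, u64>) -> Option<bool> {
    let rem = DimExpr::Mod(Box::new(expr.clone()), Box::new(DimExpr::Const(divisor)));
    rem.eval(env).map(|r| r == 0)
}
```

With `seq` bound to 128, `divides` on `seq * 3` and 64 reports `Some(true)`, while an unbound symbol yields `None` instead of a guess.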

M50: Sparse Tensors

NslSparseTensor repr(C) struct with COO/CSR/CSC/BSR format support
Format-aware kernel dispatch (SparseOp × Format × Device)
Sparsity-preserving type inference rules
sparse(pattern="2:4") decorator validation
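
To make the format and pattern terms concrete, here is a minimal sketch of a 2:4 structured-sparsity check and a dense-to-CSR conversion. It does not reproduce `NslSparseTensor` or its `repr(C)` layout; the `Csr` struct and both function names are assumptions made for illustration, and "2:4" is read in the usual way (at most two non-zeros in every aligned group of four).

```rust
/// Hypothetical CSR container (not the project's `NslSparseTensor`).
#[derive(Debug, PartialEq)]
pub struct Csr {
    pub row_ptr: Vec<usize>,
    pub col_idx: Vec<usize>,
    pub values: Vec<f32>,
}

/// True if the length is a multiple of 4 and every aligned group of
/// 4 elements holds at most 2 non-zeros.
pub fn is_2_4_sparse(row: &[f32]) -> bool {
    row.len() % 4 == 0
        && row
            .chunks(4)
            .all(|group| group.iter().filter(|v| **v != 0.0).count() <= 2)
}

/// Convert a row-major dense matrix to CSR, keeping only non-zeros.
pub fn dense_to_csr(dense: &[f32], rows: usize, cols: usize) -> Csr {
    assert_eq!(dense.len(), rows * cols, "dense buffer must hold rows * cols values");
    let mut csr = Csr { row_ptr: vec![0], col_idx: Vec::new(), values: Vec::new() };
    for r in 0..rows {
        for c in 0..cols {
            let v = dense[r * cols + c];
            if v != 0.0 {
                csr.col_idx.push(c);
                csr.values.push(v);
            }
        }
        // row_ptr[r + 1] marks where row r ends in col_idx/values.
        csr.row_ptr.push(csr.values.len());
    }
    csr
}
```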

M51: Effect System

EffectSet bitset tracking IO, Random, Mutation, Communication
3-phase EffectChecker: local inference → call graph propagation → assertion validation
pure enforcement (no effects), checkpoint (requires pure), deterministic (no Random)
~40 known-pure builtins, conservative default for unknowns
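
The bitset and the call-graph propagation phase can be sketched as follows. The four effect names come from the notes; the bit layout, method names, and the `propagate` helper are assumptions, and the real `EffectChecker` may be organized differently.

```rust
/// Hypothetical bitset over the four effects named in the notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct EffectSet(u8);

impl EffectSet {
    pub const PURE: EffectSet = EffectSet(0);
    pub const IO: EffectSet = EffectSet(1 << 0);
    pub const RANDOM: EffectSet = EffectSet(1 << 1);
    pub const MUTATION: EffectSet = EffectSet(1 << 2);
    pub const COMMUNICATION: EffectSet = EffectSet(1 << 3);

    pub fn union(self, other: EffectSet) -> EffectSet {
        EffectSet(self.0 | other.0)
    }

    pub fn contains(self, other: EffectSet) -> bool {
        (self.0 & other.0) == other.0
    }

    pub fn is_pure(self) -> bool {
        self.0 == 0
    }
}

/// Call-graph propagation sketch: `local[f]` is the locally inferred
/// effect set of function `f`, and `calls[f]` lists the functions it calls.
/// Iterates to a fixpoint, so recursion and cycles are handled.
pub fn propagate(local: &[EffectSet], calls: &[Vec<usize>]) -> Vec<EffectSet> {
    let mut total = local.to_vec();
    let mut changed = true;
    while changed {
        changed = false;
        for (f, callees) in calls.iter().enumerate() {
            let mut acc = total[f];
            for &g in callees {
                acc = acc.union(total[g]);
            }
            if acc != total[f] {
                total[f] = acc;
                changed = true;
            }
        }
    }
    total
}
```

Assertion validation then reduces to bit tests on the propagated sets: a function asserted pure must satisfy `is_pure()`, and a deterministic one must not `contains(EffectSet::RANDOM)`.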

Code Quality (from external review)

All CLI flags now wired through to compiler
5 hotspot files refactored (tensor.rs, expr.rs, compiler.rs, checker.rs, autodiff.rs)
14 panic points replaced with graceful error handling
Clippy strict clean (--all-targets)
CHANGELOG covers all versions
README now separates shipped features from infrastructure-only features

Stats

726 unit tests passing
43 milestones (M9–M51) with infrastructure complete
Clippy strict clean

Full Changelog: v0.7.0...v0.8.0

18 Mar 22:11

bwiemz

v0.7.0

08a84ff

v0.7.0: Phase 8 — Developer Experience, Debugging & Multimodal

What's New

Tensor Debugger (M45)

Binary trace recording (124-byte fixed-size entries) with per-op stats (min/max/mean/std)
NaN/Inf sentinel detection with automatic halt
Compile-time NaN risk analysis (log/sqrt/div patterns)
Trace diffing for non-determinism diagnosis
Chrome tracing export
@no_trace and @trace_breakpoint decorators
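
A minimal sketch of the per-op statistics and the NaN/Inf sentinel, assuming a single pass over an `f32` buffer. The type and function names are hypothetical, the 124-byte trace entry layout is not modeled, and the standard deviation here is the population form (divide by `n`), which may differ from what the debugger records.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct OpStats {
    pub min: f32,
    pub max: f32,
    pub mean: f32,
    pub std: f32,
}

#[derive(Debug, PartialEq)]
pub enum TraceError {
    Empty,
    /// Index of the first NaN or infinite value; a caller could halt here.
    NonFinite { index: usize },
}

/// Compute min/max/mean/std for one op output, refusing non-finite data.
pub fn op_stats(data: &[f32]) -> Result<OpStats, TraceError> {
    if data.is_empty() {
        return Err(TraceError::Empty);
    }
    if let Some(index) = data.iter().position(|v| !v.is_finite()) {
        return Err(TraceError::NonFinite { index });
    }
    // Accumulate in f64 to limit rounding error on large tensors.
    let n = data.len() as f64;
    let mean = data.iter().map(|&v| v as f64).sum::<f64>() / n;
    let var = data.iter().map(|&v| (v as f64 - mean).powi(2)).sum::<f64>() / n;
    let min = data.iter().copied().fold(f32::INFINITY, f32::min);
    let max = data.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    Ok(OpStats { min, max, mean: mean as f32, std: var.sqrt() as f32 })
}
```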

Reproducibility Mode (M46)

--deterministic flag with compile-time non-determinism detection
4 non-determinism categories: GPU atomics (auto-fixed), algorithm selection (auto-fixed), implicit RNG (error), external (warning)
Deterministic kernel variant selection (sort-based reduction, fixed cuBLAS)
RNG seed tracking (ExplicitSeed/Derived/Implicit)
Graph hash computation for checkpoint fingerprinting
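
The idea behind a sort-based reduction can be shown on the CPU: floating-point addition is not associative, so the order in which partial sums arrive changes the result, and fixing the order removes that variation. The function below is an illustrative analogy with a hypothetical name, not the GPU kernel the release describes.

```rust
/// Sum values in a canonical order so the result does not depend on the
/// order in which they were produced. This fixes the order only; it does
/// not by itself make the sum more accurate.
pub fn deterministic_sum(values: &[f32]) -> f32 {
    let mut sorted = values.to_vec();
    // total_cmp gives a total order over f32, including NaN and signed zero.
    sorted.sort_by(|a, b| a.total_cmp(b));
    sorted.iter().fold(0.0f32, |acc, v| acc + v)
}
```

Any two permutations of the same inputs sort to the same sequence, so they produce bit-identical sums.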

Multimodal Primitives (M48)

PatchEmbed config with compile-time validation (image_size % patch_size)
MelSpectrogram with compile-time mel filterbank (hz-to-mel triangular filters)
CrossAttention config with Q/K dim matching and head divisibility
Modality classification heuristic (Vision/Audio/Text by rank+dtype)
@multimodal decorator validation
7 preprocessing FFI stubs (patch_embed, mel, cross_attention, resize, normalize, stft, resample)
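
Two of the checks above are easy to sketch. The patch-count function assumes square images and square patches, and the mel conversion uses the common HTK-style formula; the notes do not say which mel scale NSL uses, so treat both the formula and the function names as assumptions.

```rust
/// Validate `image_size % patch_size == 0` and return the patch count
/// for a square image split into square patches.
pub fn num_patches(image_size: usize, patch_size: usize) -> Result<usize, String> {
    if patch_size == 0 {
        return Err("patch_size must be non-zero".to_string());
    }
    if image_size % patch_size != 0 {
        return Err(format!(
            "image_size {image_size} is not divisible by patch_size {patch_size}"
        ));
    }
    let per_side = image_size / patch_size;
    Ok(per_side * per_side)
}

/// Hz to mel, HTK-style: mel = 2595 * log10(1 + hz / 700).
pub fn hz_to_mel(hz: f64) -> f64 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

/// Inverse of `hz_to_mel`.
pub fn mel_to_hz(mel: f64) -> f64 {
    700.0 * (10f64.powf(mel / 2595.0) - 1.0)
}
```

A triangular filterbank would place its band edges at equal spacing on the mel axis and map them back with `mel_to_hz`.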

Stats

678 unit tests passing
Clippy clean

Full Changelog: v0.5.0...v0.7.0

18 Mar 20:14

bwiemz

v0.5.0

0f66638

v0.5.0: Phase 6 — Deployment, Portability & Testing Infrastructure

Multi-Backend KIR Foundation (M47a)

Kernel IR — 40+ instruction SSA-form intermediate representation
PTX Backend — KIR to PTX lowering with typed register allocation
GpuTarget — CUDA/ROCm/Metal/WebGPU with per-backend feature capability tables
GpuBackend trait — alloc/free/copy/launch/sync interface for all backends
target(backend) — conditional compilation per GPU target
--target — CLI flag for backend selection
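
One plausible shape for the backend interface is sketched below. The five operation names come from the notes; every signature, the `BufferId` type, and the `HostBackend` test double are assumptions, so the real `GpuBackend` trait may look quite different.

```rust
use std::collections::HashMap;

pub type BufferId = usize;

/// Hypothetical signatures for the alloc/free/copy/launch/sync interface.
pub trait GpuBackend {
    fn alloc(&mut self, bytes: usize) -> BufferId;
    fn free(&mut self, buf: BufferId);
    fn copy(&mut self, dst: BufferId, src: &[u8]) -> Result<(), String>;
    fn launch(&mut self, kernel: &str, args: &[BufferId]) -> Result<(), String>;
    fn sync(&mut self);
}

/// In-memory stand-in that records launches instead of running kernels.
#[derive(Default)]
pub struct HostBackend {
    buffers: HashMap<BufferId, Vec<u8>>,
    next_id: BufferId,
    pub launched: Vec<String>,
}

impl HostBackend {
    pub fn read(&self, buf: BufferId) -> Option<&[u8]> {
        self.buffers.get(&buf).map(|b| b.as_slice())
    }
}

impl GpuBackend for HostBackend {
    fn alloc(&mut self, bytes: usize) -> BufferId {
        let id = self.next_id;
        self.next_id += 1;
        self.buffers.insert(id, vec![0; bytes]);
        id
    }

    fn free(&mut self, buf: BufferId) {
        self.buffers.remove(&buf);
    }

    fn copy(&mut self, dst: BufferId, src: &[u8]) -> Result<(), String> {
        let buf = self
            .buffers
            .get_mut(&dst)
            .ok_or_else(|| format!("unknown buffer {dst}"))?;
        if src.len() > buf.len() {
            return Err(format!("copy of {} bytes exceeds buffer of {}", src.len(), buf.len()));
        }
        buf[..src.len()].copy_from_slice(src);
        Ok(())
    }

    fn launch(&mut self, kernel: &str, args: &[BufferId]) -> Result<(), String> {
        if let Some(bad) = args.iter().find(|a| !self.buffers.contains_key(*a)) {
            return Err(format!("kernel {kernel} references unknown buffer {bad}"));
        }
        self.launched.push(kernel.to_string());
        Ok(())
    }

    fn sync(&mut self) {
        // Nothing is asynchronous in the host stand-in.
    }
}
```

Keeping the trait this narrow is what would let CUDA, ROCm, Metal, and WebGPU implementations sit behind the same call sites.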

vmap AST Transform (M39b)

VmapTransformer — FnDef-to-FnDef AST rewriting producing _batched variants
Matmul/reduction/transpose rewriting with batch status propagation
nsl_vmap_check_batch runtime FFI

Testing Infrastructure

Snapshot testing (insta) — 7 PTX/KIR/fusion snapshots catching silent codegen regressions
Differential oracle testing — same script with/without --disable-fusion, assert numerical equivalence
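
The equivalence assertion at the heart of differential testing can be sketched as an element-wise tolerance check. The function name and the tolerance formula (the familiar `|a - b| <= atol + rtol * |b|`) are assumptions; the notes do not state which tolerances the test suite uses.

```rust
/// True if both outputs have the same length and every pair of elements
/// agrees within `atol + rtol * |b|`. NaN never compares as close.
pub fn allclose(a: &[f32], b: &[f32], rtol: f32, atol: f32) -> bool {
    a.len() == b.len()
        && a.iter()
            .zip(b)
            .all(|(x, y)| (x - y).abs() <= atol + rtol * y.abs())
}
```

A harness would run the same script once with fusion enabled and once with `--disable-fusion`, then call `allclose` on the two output buffers.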

Full Changelog: v0.4.0...v0.5.0

18 Mar 18:16

bwiemz

v0.4.0

11bade0

v0.4.0

v0.4.0: Phase 5 — Inference Optimization & Compile-Time Moat Features

Milestones M41, M42, M44 complete (M36, M37 shipped in v0.3.0).

New in v0.4.0:

M41: Disaggregated inference (prefill/decode worker separation, KV transfer, router scheduling)
M42: KV-cache compression (INT8/INT4/FP8 quantization, sliding window, H2O eviction)
M44: Constrained decoding (compiled FSM, JSON Schema/BNF grammars, token-level DFA, logit masking)
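
For the constrained-decoding item, the sketch below shows logit masking driven by a token-level DFA: tokens with no transition out of the current state get a logit of negative infinity, so sampling cannot select them. The `TokenDfa` layout and function names are hypothetical, and compiling a JSON Schema or BNF grammar into such a table is not shown.

```rust
/// Hypothetical token-level DFA: `transitions[state][token]` is the next
/// state if `token` is allowed in `state`, or `None` if it is forbidden.
pub struct TokenDfa {
    pub transitions: Vec<Vec<Option<usize>>>,
}

impl TokenDfa {
    /// Next state after emitting `token`, if the grammar allows it.
    pub fn step(&self, state: usize, token: usize) -> Option<usize> {
        self.transitions.get(state)?.get(token).copied().flatten()
    }
}

/// Set the logit of every token the DFA forbids in `state` to -inf.
pub fn mask_logits(dfa: &TokenDfa, state: usize, logits: &mut [f32]) {
    for (token, logit) in logits.iter_mut().enumerate() {
        if dfa.step(state, token).is_none() {
            *logit = f32::NEG_INFINITY;
        }
    }
}
```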

Full Changelog: v0.3.0...v0.4.0

18 Mar 02:56

bwiemz

v0.3.0

9ef22e7

v0.3.0

What's New

Scaling Infrastructure (M32-M34)

Mixture of Experts — @moe decorator with top-k gating, capacity-based routing, and load-balancing aux loss
Speculative Decoding — @speculative with tree attention, rejection sampling, and @medusa multi-head prediction
Ring Attention — @context_parallel(ring_size=N) for cross-GPU sequence parallelism with causal masking
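
Top-k gating, as used by mixture-of-experts routers, can be sketched for a single token: pick the `k` largest router logits and renormalize them with a softmax. This is a generic formulation with a hypothetical function name; capacity-based routing and the load-balancing auxiliary loss are not modeled, and the `@moe` decorator may compute its weights differently.

```rust
/// Select the `k` experts with the largest router logits and return
/// `(expert_index, weight)` pairs whose weights sum to 1.
/// Ties are broken toward the lower expert index.
pub fn top_k_gate(logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]).then(a.cmp(&b)));
    idx.truncate(k);

    // Softmax over the selected logits, shifted by the max for stability.
    let max = idx.iter().map(|&i| logits[i]).fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = idx.iter().map(|&i| (logits[i] - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    idx.into_iter().zip(exps).map(|(i, e)| (i, e / sum)).collect()
}
```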

Quantization (M35)

FP8 Compute — @fp8_compute decorator with E4M3/E5M2 scale management and automatic Tensor Core dispatch
AWQ 4-bit — quant { dtype: awq4 } with in-register dequantize-in-GEMM (zero memory round-trip)
GPTQ 4-bit/8-bit — quant { dtype: gptq4 } with Hessian-based optimal quantization

Compiler Intelligence (M36-M37)

Memory Planning — compile-time tensor liveness analysis, interference graph, first-fit-decreasing slab assignment with 256-byte GPU alignment
Roofline Cost Model — per-operation FLOP/byte/arithmetic-intensity analysis against a built-in GPU database (A100, H100, RTX-4090, RTX-3090, L40S); table, JSON, and Chrome tracing output formats
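
The memory-planning step can be sketched as a greedy first-fit-decreasing placement over liveness intervals. The 256-byte alignment comes from the notes; the `Tensor` struct, the interval representation, and the `plan` function are assumptions, and the actual planner builds an explicit interference graph rather than rescanning intervals.

```rust
pub const GPU_ALIGN: usize = 256;

/// Round `n` up to the next multiple of `align`.
pub fn align_up(n: usize, align: usize) -> usize {
    (n + align - 1) / align * align
}

/// Hypothetical tensor record: size plus inclusive live range in op indices.
#[derive(Debug, Clone, Copy)]
pub struct Tensor {
    pub bytes: usize,
    pub first_use: usize,
    pub last_use: usize,
}

/// Assign each tensor an offset in one slab. Tensors are placed largest
/// first, each at the lowest offset that avoids every already-placed
/// tensor whose live range overlaps. Returns (offsets, slab size).
pub fn plan(tensors: &[Tensor]) -> (Vec<usize>, usize) {
    let mut order: Vec<usize> = (0..tensors.len()).collect();
    order.sort_by(|&a, &b| tensors[b].bytes.cmp(&tensors[a].bytes).then(a.cmp(&b)));

    let mut offsets = vec![0usize; tensors.len()];
    let mut placed: Vec<usize> = Vec::new();
    let mut slab = 0;
    for &t in &order {
        let size = align_up(tensors[t].bytes, GPU_ALIGN);
        // Byte ranges held by placed tensors that are live at the same time as `t`.
        let mut busy: Vec<(usize, usize)> = placed
            .iter()
            .filter(|&&p| {
                tensors[p].first_use <= tensors[t].last_use
                    && tensors[t].first_use <= tensors[p].last_use
            })
            .map(|&p| (offsets[p], offsets[p] + align_up(tensors[p].bytes, GPU_ALIGN)))
            .collect();
        busy.sort();
        // First fit: take the first gap large enough, else go past the last range.
        let mut offset = 0;
        for (lo, hi) in busy {
            if offset + size <= lo {
                break;
            }
            offset = offset.max(hi);
        }
        offsets[t] = offset;
        placed.push(t);
        slab = slab.max(offset + size);
    }
    (offsets, slab)
}
```

Two tensors whose live ranges never overlap can share the same offset, which is where the memory saving comes from.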

Language Features (M38-M40)

Linear Types — ownership checker with use-after-move detection, branch consumption symmetry, loop consumption prevention, and @shared escape hatch
Autodiff Safety — BackwardAccess classification for all 36 TapeOp variants (ShapeOnly/DataRequired/AuxDataRequired)
vmap — @vmap(batch_dim=0) automatic batching with batch-variant/invariant tracking, dimension shifting, and matmul rewrite classification
Source-to-Source AD — Wengert list extraction, 18 reverse-mode adjoint rules (reviewer-verified correct), dead gradient elimination, saved tensor analysis
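
Reverse-mode differentiation over a Wengert list can be sketched with two adjoint rules, one for addition and one for multiplication. The `Op` enum and both functions are hypothetical and far smaller than the 18-rule set the release describes; dead gradient elimination and saved-tensor analysis are not shown.

```rust
/// One entry in a Wengert list; operands refer to earlier entries by index.
#[derive(Debug, Clone, Copy)]
pub enum Op {
    Input,
    Add(usize, usize),
    Mul(usize, usize),
}

/// Forward pass: compute the value of every entry, consuming one input
/// value per `Input` entry, in order.
pub fn forward(tape: &[Op], inputs: &[f64]) -> Vec<f64> {
    let mut vals: Vec<f64> = Vec::with_capacity(tape.len());
    let mut next_input = inputs.iter();
    for op in tape {
        let v = match *op {
            Op::Input => *next_input.next().expect("one value per Input entry"),
            Op::Add(a, b) => vals[a] + vals[b],
            Op::Mul(a, b) => vals[a] * vals[b],
        };
        vals.push(v);
    }
    vals
}

/// Reverse pass: adjoint of the last entry with respect to every entry.
pub fn backward(tape: &[Op], vals: &[f64]) -> Vec<f64> {
    let mut adj = vec![0.0; tape.len()];
    if let Some(last) = adj.last_mut() {
        *last = 1.0; // seed: d(output)/d(output) = 1
    }
    for (i, op) in tape.iter().enumerate().rev() {
        let upstream = adj[i];
        match *op {
            Op::Input => {}
            // d(a + b): the upstream adjoint flows to both operands unchanged.
            Op::Add(a, b) => {
                adj[a] += upstream;
                adj[b] += upstream;
            }
            // d(a * b): each operand receives upstream times the other's value.
            Op::Mul(a, b) => {
                adj[a] += upstream * vals[b];
                adj[b] += upstream * vals[a];
            }
        }
    }
    adj
}
```

For f(x, y) = x * y + x the tape is `[Input, Input, Mul(0, 1), Add(2, 0)]`, and the reverse pass accumulates both uses of `x` into a single adjoint.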

Stats

408 unit tests passing across all crates
4,738 new lines of code across 42 files
18 new source modules + 7 implementation plans
Clippy clean, release build verified

Breaking Changes

None. All new features are additive. Existing code compiles unchanged.

Known Limitations

The --linear-types, --vram-budget, and --perf CLI flags are parsed but not yet wired through compile_entry() (same status as --fusion-report)
Source AD and vmap are infrastructure-only: the analysis libraries are complete, and codegen integration is in progress
E2E tests that invoke nsl run require a C compiler (gcc, clang, or MSVC) on PATH for linking

Full Changelog: v0.2.0...v0.3.0

15 Mar 18:26

github-actions

v0.2.0

ff100d3

v0.2.0

Full Changelog: v0.1.0...v0.2.0

13 Mar 03:12

github-actions

v0.1.0

a7b6e71

v0.1.0

What's Changed

feat(m19): Data Pipeline + Inference Sampling by @bwiemz in #1

New Contributors

@bwiemz made their first contribution in #1

Full Changelog: https://github.com/bwiemz/NSL/commits/v0.1.0

Contributors

bwiemz
