GitHub - MenkeTechnologies/fusevm

 ███████╗██╗   ██╗███████╗███████╗██╗   ██╗███╗   ███╗
 ██╔════╝██║   ██║██╔════╝██╔════╝██║   ██║████╗ ████║
 █████╗  ██║   ██║███████╗█████╗  ██║   ██║██╔████╔██║
 ██╔══╝  ██║   ██║╚════██║██╔══╝  ╚██╗ ██╔╝██║╚██╔╝██║
 ██║     ╚██████╔╝███████║███████╗ ╚████╔╝ ██║ ╚═╝ ██║
 ╚═╝      ╚═════╝ ╚══════╝╚══════╝  ╚═══╝  ╚═╝     ╚═╝

`[LANGUAGE-AGNOSTIC BYTECODE VM WITH FUSED SUPERINSTRUCTIONS]`

"One VM to run them all."

A language-agnostic bytecode virtual machine with fused superinstructions and Cranelift JIT. Any language frontend compiles to fusevm opcodes and gets fused hot-loop dispatch, extension opcode tables, stack-based execution with slot-indexed fast paths, and native code compilation via Cranelift — for free. 129 opcodes across 10 categories. Cranelift 0.130 behind jit feature flag.

cargo add fusevm --features jit   # with Cranelift JIT
cargo add fusevm                  # interpreter only

`Docs` · `API Reference` · `Crates.io` · `strykelang` · `zshrs`

[0x00] OVERVIEW

fusevm is the shared execution engine behind strykelang, zshrs, and awkrs. All three compile to the same Op enum. The VM doesn't care which language produced the bytecodes.

stryke source ──► stryke compiler ──┐
                                     │
zshrs source  ──► shell compiler  ──┼──► fusevm::Op ──► VM::run()
                                     │         │
awkrs source  ──► awk compiler    ──┘    JitCompiler::try_run_linear()
                                                │
                                          Cranelift 0.130
                                         native x86-64 / aarch64

Fused superinstructions — the compiler detects hot patterns and emits single ops instead of multi-op sequences
Extension dispatch — language-specific opcodes via Extended(u16, u8) with registered handler tables
Stack + slots — stack-based execution with slot-indexed fast paths for locals
Cranelift JIT — eligibility analysis and compilation for hot chunks
Zero-clone dispatch — ops borrowed from chunk, in-place array/hash mutation, Cow<str> string coercion
Zero runtime dependencies — pure Rust, no allocator tricks, no unsafe

[0x01] INSTALL

cargo add fusevm
# or from source
git clone https://github.com/MenkeTechnologies/fusevm && cd fusevm && cargo build

[0x02] USAGE

use fusevm::{Op, ChunkBuilder, VM, VMResult, Value};

let mut b = ChunkBuilder::new();
b.emit(Op::LoadInt(40), 1);
b.emit(Op::LoadInt(2), 1);
b.emit(Op::Add, 1);

let mut vm = VM::new(b.build());
match vm.run() {
    VMResult::Ok(val) => println!("result: {}", val.to_str()),  // "42"
    VMResult::Error(e) => eprintln!("error: {}", e),
    VMResult::Halted => {}
}

[0x03] ARCHITECTURE

                  ┌──────────────────────────────────┐
                  │         Language Frontend         │
                  │   (stryke, zshrs, or your own)    │
                  └──────────────┬───────────────────┘
                                 │ compile
                                 ▼
                  ┌──────────────────────────────────┐
                  │       ChunkBuilder::emit()       │
                  │   Op enum ──► Chunk (bytecodes)  │
                  └──────────────┬───────────────────┘
                                 │
                    ┌────────────┴────────────┐
                    ▼                         ▼
          ┌─────────────────┐     ┌─────────────────────┐
          │   VM::run()     │     │   JitCompiler       │
          │  match-dispatch │     │  Cranelift codegen   │
          ��  interpreter    │     │  (eligible chunks)   │
          └─────────────────┘     └─────────────────────┘

[0x04] FUSED SUPERINSTRUCTIONS

The performance secret. The compiler detects hot patterns and emits single ops instead of multi-op sequences:

Fused Op	Replaces	Effect
`AccumSumLoop(sum, i, limit)`	`GetSlot + GetSlot + Add + SetSlot + PreInc + NumLt + JumpIfFalse`	Entire counted sum loop in one dispatch
`SlotIncLtIntJumpBack(slot, limit, target)`	`PreIncSlot + SlotLtIntJumpIfFalse`	Loop backedge in one dispatch
`ConcatConstLoop(const, s, i, limit)`	`LoadConst + ConcatAppendSlot + SlotIncLtIntJumpBack`	String append loop in one dispatch
`PushIntRangeLoop(arr, i, limit)`	`GetSlot + PushArray + ArrayLen + Pop + SlotIncLtIntJumpBack`	Array push loop in one dispatch

Each fused op eliminates N-1 dispatch cycles, stack pushes, and branch mispredictions from the hot path.

[0x05] OP CATEGORIES

129 opcodes across 10 categories:

Category	Count	Examples
Constants & Stack	~12	`LoadInt`, `LoadFloat`, `Pop`, `Dup`, `Swap`
Variables	~8	`GetVar`, `SetVar`, `GetSlot`, `SetSlot`
Arrays & Hashes	~25	`ArrayPush`, `HashGet`, `MakeArray`, `HashKeys`
Arithmetic	~9	`Add`, `Sub`, `Mul`, `Div`, `Pow`
Comparison	~14	`NumEq`, `StrLt`, `Spaceship`
Control Flow	~5	`Jump`, `JumpIfFalse`, `JumpIfTrueKeep`
Functions	~3	`Call`, `Return`, `PushFrame`
Shell Ops	~24	`Exec`, `PipelineBegin`, `Redirect`, `Glob`, `TestFile`
Fused	~8	`AccumSumLoop`, `SlotIncLtIntJumpBack`
Extension	2	`Extended(u16, u8)`, `ExtendedWide(u16, usize)`

[0x06] EXTENSION MECHANISM

Language-specific opcodes use Extended(u16, u8) which dispatches through a handler table registered by the frontend:

let mut vm = VM::new(chunk);
vm.set_extension_handler(Box::new(|vm, id, arg| {
    match id {
        0 => { /* language-specific op 0 */ }
        1 => { /* language-specific op 1 */ }
        _ => {}
    }
}));

stryke registers ~450 extended ops. zshrs registers ~20. awkrs registers ~95. They don't conflict — each frontend owns its own ID space.

[0x07] JIT COMPILATION

The JitCompiler compiles eligible chunks to native code via Cranelift 0.130. Enable with cargo add fusevm --features jit.

use fusevm::{JitCompiler, ChunkBuilder, Op, Value};

let mut b = ChunkBuilder::new();
b.emit(Op::LoadInt(40), 1);
b.emit(Op::LoadInt(2), 1);
b.emit(Op::Add, 1);
let chunk = b.build();

let jit = JitCompiler::new();
if jit.is_linear_eligible(&chunk) {
    // Compiles to native x86-64/aarch64, caches, and runs
    let result = jit.try_run_linear(&chunk, &[]);  // Some(Int(42))
}

Linear JIT — eligible ops

Category	JIT'd Ops
Constants	`LoadInt`, `LoadFloat`, `LoadConst` (int/float), `LoadTrue`, `LoadFalse`
Arithmetic	`Add`, `Sub`, `Mul`, `Div`, `Mod`, `Pow`, `Negate`, `Inc`, `Dec`
Comparison	`NumEq`/`Ne`/`Lt`/`Gt`/`Le`/`Ge`, `Spaceship`
Bitwise	`BitAnd`/`Or`/`Xor`/`Not`, `Shl`, `Shr`
Logic	`LogNot`
Stack	`Pop`, `Dup`, `Swap`, `Rot`
Slots	`GetSlot`, `SetSlot`, `PreIncSlot`, `PreIncSlotVoid`, `AddAssignSlotVoid`

Int/float promotion: when either operand is float, both are promoted to f64. Cranelift emits iadd/fadd/fcvt_from_sint as needed. Runtime helpers for Pow (wrapping integer + f64::powf) and Mod (float fmod).

[0x08] VALUE REPRESENTATION

Value is a tagged enum with fast-path immediates:

Variant	Representation	Size
`Undef`	Tag only	0 bytes payload
`Int(i64)`	Inline	8 bytes
`Float(f64)`	Inline	8 bytes
`Bool(bool)`	Inline	1 byte
`Str(Arc<String>)`	Heap	pointer
`Array(Vec<Value>)`	Heap, in-place mutation	3 words
`Hash(HashMap<String, Value>)`	Heap, in-place mutation	7 words
`Status(i32)`	Inline	4 bytes
`Ref(Box<Value>)`	Heap	pointer
`NativeFn(u16)`	Inline	2 bytes

String coercion returns Cow<str> via as_str_cow() — borrows the inner Arc<String> for Str variants, avoiding allocation on string comparisons, concatenation, hash key lookup, and I/O.

Array and hash mutations (ArrayPush, ArrayPop, ArrayShift, ArraySet, HashSet, HashDelete) operate in-place on globals — no clone-modify-writeback cycle. Read-only access (ArrayGet, ArrayLen, HashGet, HashExists, HashKeys, HashValues) borrows directly from the globals vector.

[0x09] BENCHMARKS

All benchmarks run via criterion on Apple M-series. cargo bench for all, cargo bench --features jit --bench jit_vs_interp for JIT comparisons. HTML report at target/criterion/report/index.html.

Classic algorithms

Benchmark	Time	Ops/sec
`fib_iterative(35)`	2.7 µs	374k
`fib_recursive(20)` — 21,891 calls	1.28 ms	783
`ackermann(3,4)` — 10,547 calls	774 µs	1.3k
`sum(1..1M)` fused `AccumSumLoop`	142 ns	7.0M
`sum(1..1M)` unfused loop ops	31.0 ms	32
`nested_loop(100×100)`	352 µs	2.8k
`dispatch_nop_1M` — raw dispatch overhead	819 µs	1.22 Gops/sec
`string_build(10k)` via `ConcatConstLoop`	11.9 µs	84k

Interpreter vs Cranelift JIT vs native Rust

Slot-based inputs prevent constant folding — honest apples-to-apples comparison:

Workload	Interpreter	JIT (cached)	Native Rust	JIT vs interp	JIT vs native
`slot_mixed × 100`	2.2 µs	75 ns	42 ns	29x faster	1.8x slower
`slot_bitwise × 200`	6.6 µs	130 ns	74 ns	51x faster	1.8x slower
`slot_float × 200`	3.1 µs	246 ns	137 ns	13x faster	1.8x slower

JIT cache lookup is O(1) — chunk hash precomputed at build time (24ns overhead). The linear JIT is consistently ~1.8x slower than LLVM -O3 on real computation and 13–51x faster than the interpreter.

Block JIT — loops and branches compiled to native code

The block JIT handles real control flow — loops, conditionals, fused backedges:

Benchmark	Interpreter	Block JIT	Speedup
`sum(1..1M)` unfused loop	30.0 ms	315 µs	95x
`nested_loop(100×100)`	340 µs	9.5 µs	36x

The block JIT compiles the full CFG to native code via Cranelift. All mutable state flows through the slots pointer (*mut i64), and AccumSumLoop is register-allocated with block parameters — no memory traffic in the inner loop.

Tracking improvements

cargo bench --bench vm_bench -- --save-baseline before   # save baseline
# ... make changes ...
cargo bench --bench vm_bench -- --baseline before        # compare
open target/criterion/report/index.html                  # HTML graphs

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
.github/workflows		.github/workflows
benches		benches
docs		docs
src		src
tests		tests
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

`[LANGUAGE-AGNOSTIC BYTECODE VM WITH FUSED SUPERINSTRUCTIONS]`

`Docs` · `API Reference` · `Crates.io` · `strykelang` · `zshrs`

Table of Contents

[0x00] OVERVIEW

[0x01] INSTALL

[0x02] USAGE

[0x03] ARCHITECTURE

[0x04] FUSED SUPERINSTRUCTIONS

[0x05] OP CATEGORIES

[0x06] EXTENSION MECHANISM

[0x07] JIT COMPILATION

Linear JIT — eligible ops

[0x08] VALUE REPRESENTATION

[0x09] BENCHMARKS

Classic algorithms

Interpreter vs Cranelift JIT vs native Rust

Block JIT — loops and branches compiled to native code

Tracking improvements

[0xFF] LICENSE

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

[LANGUAGE-AGNOSTIC BYTECODE VM WITH FUSED SUPERINSTRUCTIONS]

Docs · API Reference · Crates.io · strykelang · zshrs

Table of Contents

[0x00] OVERVIEW

[0x01] INSTALL

[0x02] USAGE

[0x03] ARCHITECTURE

[0x04] FUSED SUPERINSTRUCTIONS

[0x05] OP CATEGORIES

[0x06] EXTENSION MECHANISM

[0x07] JIT COMPILATION

Linear JIT — eligible ops

[0x08] VALUE REPRESENTATION

[0x09] BENCHMARKS

Classic algorithms

Interpreter vs Cranelift JIT vs native Rust

Block JIT — loops and branches compiled to native code

Tracking improvements

[0xFF] LICENSE

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

`[LANGUAGE-AGNOSTIC BYTECODE VM WITH FUSED SUPERINSTRUCTIONS]`

`Docs` · `API Reference` · `Crates.io` · `strykelang` · `zshrs`

Packages