v0.4.0 — Multi-Threading
Multi-Threading Support
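Perry v0.4.0 introduces real OS-level multi-threading inside a single program, something JavaScript runtimes do not offer directly. Node.js, Deno, and Bun execute JavaScript on one thread per isolate, so parallel work has to go through separate workers. Perry compiles to native code and has no such limitation.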
perry/thread Module
Three primitives with compile-time safety:
```typescript
import { parallelMap, parallelFilter, spawn } from "perry/thread";

// Process a large array across all CPU cores
const results = parallelMap(data, (item) => heavyComputation(item));

// Filter a large dataset in parallel
const active = parallelFilter(users, (u) => u.score > threshold);

// Run expensive work on a background thread
const answer = await spawn(() => computeHash(largeFile));
```

- Compile-time safety: Closures cannot capture mutable variables, so data races are eliminated by design
- Zero-cost for numbers: Numeric values cross threads as 64-bit copies, no serialization
- Automatic core detection: Arrays split across all available CPU cores
- Small array optimization: Skips threading for trivial inputs — no overhead
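
The core detection and small-array behavior listed above can be pictured with a short sketch. This is not Perry's implementation: `planChunks`, `chunkedMap`, and the `MIN_PARALLEL_LENGTH` threshold are hypothetical names and values chosen for illustration, and the chunks are processed sequentially here where a real runtime would hand each one to its own thread.

```typescript
// Hypothetical helpers for illustration only; none of these are part of perry/thread.
interface ChunkRange {
  start: number; // inclusive
  end: number;   // exclusive
}

// Assumed cutoff below which threading is skipped; the real value is not documented here.
const MIN_PARALLEL_LENGTH = 1024;

// Split `length` items into near-equal contiguous ranges, one per core.
function planChunks(
  length: number,
  cores: number,
  minParallel: number = MIN_PARALLEL_LENGTH,
): ChunkRange[] {
  if (length === 0) return [];
  // Small inputs (or a single core): one chunk, handled on the calling thread.
  if (length < minParallel || cores <= 1) return [{ start: 0, end: length }];

  const workers = Math.min(cores, length);
  const base = Math.floor(length / workers);
  const extra = length % workers; // the first `extra` chunks take one more item

  const chunks: ChunkRange[] = [];
  let start = 0;
  for (let i = 0; i < workers; i++) {
    const size = base + (i < extra ? 1 : 0);
    chunks.push({ start, end: start + size });
    start += size;
  }
  return chunks;
}

// Sequential stand-in for a parallel map: each chunk would run on its own thread.
function chunkedMap<T, U>(data: readonly T[], fn: (item: T) => U, cores: number): U[] {
  const out = new Array<U>(data.length);
  for (const { start, end } of planChunks(data.length, cores)) {
    data.slice(start, end).forEach((item, k) => {
      out[start + k] = fn(item);
    });
  }
  return out;
}
```

Because every chunk writes to a disjoint range of the output and the callback only reads its own item, the result does not depend on the order in which chunks finish, which is the property the mutable-capture restriction is meant to guarantee.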
Parallel Compiler Pipeline
The Perry compiler itself now uses multi-threading via rayon:
- Module codegen: Cranelift code generation runs across all CPU cores
- Transform passes: HIR transforms (JS imports, native instances, monomorphization) parallelized
- Symbol scanning: `nm` invocations run in parallel
Array.sort() — O(n²) → O(n log n)
Array.sort() has been upgraded from insertion sort to a TimSort-style hybrid: insertion sort for small arrays (≤32 elements) and bottom-up merge sort for larger ones. Worst-case cost drops from O(n²) to O(n log n), which matters most when sorting large datasets.
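
A minimal sketch of such a hybrid is shown below. It illustrates the general technique, not Perry's actual runtime code: `hybridSort` and `insertionSortRange` are hypothetical names, and pre-sorting 32-element runs before merging is an assumption borrowed from TimSort, not something stated above.

```typescript
// Illustrative sketch only; function names are hypothetical.
const SMALL = 32; // cutoff quoted in the release notes

type Cmp<T> = (x: T, y: T) => number;

// Stable insertion sort of a[lo, hi).
function insertionSortRange<T>(a: T[], lo: number, hi: number, cmp: Cmp<T>): void {
  for (let i = lo + 1; i < hi; i++) {
    const v = a[i] as T;
    let j = i - 1;
    // Strict `> 0` keeps equal elements in their original order.
    while (j >= lo && cmp(a[j] as T, v) > 0) {
      a[j + 1] = a[j] as T;
      j--;
    }
    a[j + 1] = v;
  }
}

// Returns a sorted copy; the input array is left untouched.
function hybridSort<T>(input: readonly T[], cmp: Cmp<T>): T[] {
  let src = input.slice();
  const n = src.length;

  // Small arrays: insertion sort alone.
  if (n <= SMALL) {
    insertionSortRange(src, 0, n, cmp);
    return src;
  }

  // Assumption: seed the merge phase with insertion-sorted runs of SMALL elements.
  for (let lo = 0; lo < n; lo += SMALL) {
    insertionSortRange(src, lo, Math.min(lo + SMALL, n), cmp);
  }

  // Bottom-up merge: double the run width each pass, ping-ponging between buffers.
  let dst = new Array<T>(n);
  for (let width = SMALL; width < n; width *= 2) {
    for (let lo = 0; lo < n; lo += 2 * width) {
      const mid = Math.min(lo + width, n);
      const hi = Math.min(lo + 2 * width, n);
      let i = lo;
      let j = mid;
      let k = lo;
      while (i < mid && j < hi) {
        // `<= 0` takes from the left run on ties, which keeps the merge stable.
        dst[k++] = cmp(src[i] as T, src[j] as T) <= 0 ? (src[i++] as T) : (src[j++] as T);
      }
      while (i < mid) dst[k++] = src[i++] as T;
      while (j < hi) dst[k++] = src[j++] as T;
    }
    [src, dst] = [dst, src];
  }
  return src;
}
```

For example, `hybridSort([5, 3, 9, 1], (a, b) => a - b)` takes the insertion-sort path and returns `[1, 3, 5, 9]`. Each merge pass touches every element once and the number of passes grows logarithmically with the array length, which is where the O(n log n) bound comes from.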
Documentation
Comprehensive Multi-Threading documentation with 4 pages covering the API, examples, performance tips, and safety model.
Comparison with JavaScript Runtimes
| | Node.js / Deno / Bun | Perry |
|---|---|---|
| Parallel compute | worker_threads (separate isolates, structured clone) | parallelMap / parallelFilter — one line |
| Background work | worker_threads + postMessage ceremony | await spawn(() => work()) |
| Data transfer | Structured clone (slow for large objects) | Zero-cost for numbers, efficient deep-copy |
| Safety | Runtime SharedArrayBuffer footguns | Compile-time mutable capture rejection |
| Overhead | ~2MB per worker (separate V8 isolate) | Lightweight OS thread (~8MB stack) |