v0.4.0 — Multi-Threading

@proggeramlug released this 23 Mar 05:20

Multi-Threading Support

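Perry v0.4.0 introduces real OS-level multi-threading, something mainstream JavaScript runtimes do not offer for ordinary code. Node.js, Deno, and Bun run each JavaScript isolate on a single thread, so parallel work there requires separate workers that communicate by message passing. Perry compiles to native code and has no such limitation.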

perry/thread Module

Three primitives with compile-time safety:

import { parallelMap, parallelFilter, spawn } from "perry/thread";

// Process a large array across all CPU cores
const results = parallelMap(data, (item) => heavyComputation(item));

// Filter a large dataset in parallel
const active = parallelFilter(users, (u) => u.score > threshold);

// Run expensive work on a background thread
const answer = await spawn(() => computeHash(largeFile));

  • Compile-time safety: Closures cannot capture mutable variables, so data races are eliminated by design
  • Zero-cost for numbers: Numeric values cross threads as 64-bit copies, no serialization
  • Automatic core detection: Arrays split across all available CPU cores
  • Small array optimization: Skips threading for trivial inputs — no overhead
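
These notes do not describe how Perry actually partitions work, so the following is only a rough sketch of the last two points. The helpers `planChunks` and `chunkedMap` and the `minParallelLength` cutoff are hypothetical names used for illustration and are not part of perry/thread. The sketch splits an index range evenly across a given core count and falls back to a single chunk for small inputs. It runs sequentially; in a threaded runtime each chunk would be handed to its own thread.

```typescript
// Hypothetical illustration only: none of these names come from perry/thread.
interface ChunkRange {
  start: number; // inclusive
  end: number;   // exclusive
}

// Split [0, length) into at most `cores` contiguous ranges of near-equal size.
// Inputs shorter than `minParallelLength` (an assumed cutoff) stay in one chunk,
// which models skipping threading for trivial inputs.
function planChunks(length: number, cores: number, minParallelLength: number): ChunkRange[] {
  if (length === 0) return [];
  if (length < minParallelLength || cores <= 1) {
    return [{ start: 0, end: length }];
  }
  const workers = Math.min(cores, length);
  const base = Math.floor(length / workers);
  const extra = length % workers; // the first `extra` chunks get one more item
  const ranges: ChunkRange[] = [];
  let start = 0;
  for (let i = 0; i < workers; i++) {
    const size = base + (i < extra ? 1 : 0);
    ranges.push({ start, end: start + size });
    start += size;
  }
  return ranges;
}

// Sequential stand-in for a parallel map: each chunk is processed in turn here,
// whereas a threaded runtime could process the chunks concurrently.
function chunkedMap<T, U>(
  data: readonly T[],
  fn: (item: T) => U,
  cores: number,
  minParallelLength: number,
): U[] {
  const out: U[] = new Array<U>(data.length);
  for (const { start, end } of planChunks(data.length, cores, minParallelLength)) {
    for (let i = start; i < end; i++) {
      out[i] = fn(data[i]);
    }
  }
  return out;
}
```

Because every chunk writes to a disjoint slice of the output and the callback reads no shared mutable state, the chunks could run in any order without changing the result, which is the property the compile-time capture rule is meant to guarantee.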

Parallel Compiler Pipeline

The Perry compiler itself now uses multi-threading via rayon:

  • Module codegen: Cranelift code generation runs across all CPU cores
  • Transform passes: HIR transforms (JS imports, native instances, monomorphization) parallelized
  • Symbol scanning: nm invocations run in parallel

Array.sort() — O(n²) → O(n log n)

Array.sort() was upgraded from insertion sort to a TimSort-style hybrid: insertion sort for small arrays (≤32 elements) and bottom-up merge sort for larger arrays. This lowers the worst case from O(n²) to O(n log n) comparisons, which matters most when sorting large datasets.
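
The hybrid described above can be sketched in TypeScript as follows. This is not Perry's runtime code: `hybridSort` and `insertionSort` are illustrative names, and seeding the merge phase with insertion-sorted runs of 32 elements is an assumption about what "TimSort-style" means here. Only the 32-element threshold comes from the release notes.

```typescript
// Illustrative sketch, not Perry's implementation.
const SMALL = 32; // threshold taken from the release notes

type Cmp<T> = (a: T, b: T) => number;

// Stable insertion sort of the half-open range a[lo, hi).
function insertionSort<T>(a: T[], lo: number, hi: number, cmp: Cmp<T>): void {
  for (let i = lo + 1; i < hi; i++) {
    const x = a[i];
    let j = i - 1;
    while (j >= lo && cmp(a[j], x) > 0) {
      a[j + 1] = a[j];
      j--;
    }
    a[j + 1] = x;
  }
}

// Returns a sorted copy of `input`; the input array is not modified.
function hybridSort<T>(input: readonly T[], cmp: Cmp<T>): T[] {
  let src = input.slice();
  const n = src.length;
  if (n <= SMALL) {
    insertionSort(src, 0, n, cmp);
    return src;
  }
  // Assumption: pre-sort fixed runs of SMALL elements before merging.
  for (let lo = 0; lo < n; lo += SMALL) {
    insertionSort(src, lo, Math.min(lo + SMALL, n), cmp);
  }
  // Bottom-up merge: double the run width on each pass, ping-ponging buffers.
  let dst: T[] = new Array<T>(n);
  for (let width = SMALL; width < n; width *= 2) {
    for (let lo = 0; lo < n; lo += 2 * width) {
      const mid = Math.min(lo + width, n);
      const hi = Math.min(lo + 2 * width, n);
      let i = lo;
      let j = mid;
      let k = lo;
      while (i < mid && j < hi) {
        // Taking from the left run on ties keeps the sort stable.
        dst[k++] = cmp(src[i], src[j]) <= 0 ? src[i++] : src[j++];
      }
      while (i < mid) dst[k++] = src[i++];
      while (j < hi) dst[k++] = src[j++];
    }
    [src, dst] = [dst, src];
  }
  return src;
}
```

Each merge pass touches every element once and the run width doubles per pass, so an array of n elements needs about log2(n / 32) passes, which is where the O(n log n) bound comes from.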

Documentation

The new Multi-Threading documentation has four pages covering the API, examples, performance tips, and the safety model.

Comparison with JavaScript Runtimes

| | Node.js / Deno / Bun | Perry |
|---|---|---|
| Parallel compute | worker_threads (separate isolates, structured clone) | parallelMap / parallelFilter, one line |
| Background work | worker_threads + postMessage ceremony | await spawn(() => work()) |
| Data transfer | Structured clone (slow for large objects) | Zero-cost for numbers, efficient deep-copy |
| Safety | Runtime SharedArrayBuffer footguns | Compile-time mutable capture rejection |
| Overhead | ~2MB per worker (separate V8 isolate) | Lightweight OS thread (~8MB stack) |