swap regex backend to librure (rust-lang/regex c abi) — 27x faster than posix regex.h #574

Merged
cs01 merged 4 commits into main from feat/regex-rure
Apr 20, 2026

@cs01 commented on Apr 20, 2026

Summary

Replaces the POSIX <regex.h> regex backend in c_bridges/regex-bridge.c with librure — the C ABI for Rust's regex crate (rust-lang/regex). Same exported cs_regex_* symbols, so the codegen layer is unchanged.

Why

POSIX regex.h is a 1980s backtracking NFA: no JIT, no SIMD, no DFA caching. On a regex-heavy workload (100k matches, anchored pattern with one capture group) it took 51 ms. The new backend lands the same workload in 1.9 ms — a 27× speedup.
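For reference, the workload shape described above (100k inputs, an anchored pattern with one capture group, a hit count as the checked result) could look roughly like the sketch below. The actual benchmark source is not shown in this PR, so the pattern, the input generator, and the names `makeLines` and `countHits` are assumptions made for illustration only.

```typescript
// Hypothetical stand-in for the regex-heavy workload; not the real benchmark.
// Anchored pattern with exactly one capture group (the numeric id).
const ID_PATTERN = /^user-(\d+)@example\.com$/;

// Build n inputs; even indices match the pattern, odd indices do not.
function makeLines(n: number): string[] {
  const lines: string[] = [];
  for (let i = 0; i < n; i++) {
    lines.push(i % 2 === 0 ? `user-${i}@example.com` : `guest ${i}`);
  }
  return lines;
}

// Count inputs where the pattern matches and the capture group is non-empty.
function countHits(lines: string[], pattern: RegExp): number {
  let hits = 0;
  for (const line of lines) {
    const m = pattern.exec(line);
    const id = m === null ? undefined : m[1];
    if (id !== undefined && id.length > 0) {
      hits++;
    }
  }
  return hits;
}

const hits = countHits(makeLines(100000), ID_PATTERN);
console.log(`hits: ${hits}`);
```

Comparing the hit count across backends is a cheap correctness check, since this particular pattern only uses syntax that JavaScript and Rust's regex crate both accept.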

librure is a stable C ABI with no transitive deps beyond libc (+ Security/CoreFoundation on macOS, which rustc statically requires).

Speed

| | Time | Speedup |
| --- | --- | --- |
| chadscript (POSIX, before) | 51 ms | |
| chadscript (librure, this PR) | 1.9 ms | 27× |

Three runs: 1.91 ms, 1.83 ms, 1.86 ms. Same hit count (correctness preserved).
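The 27× figure can be re-derived from the numbers quoted above. The sketch below is only arithmetic on those timings; the helper names `mean` and `speedup` are illustrative.

```typescript
// Timings quoted in the PR description, in milliseconds.
const posixMs = 51;
const librureRunsMs = [1.91, 1.83, 1.86];

// Arithmetic mean of a list of timings.
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Ratio of the old time to the new time.
function speedup(beforeMs: number, afterMs: number): number {
  return beforeMs / afterMs;
}

const meanMs = mean(librureRunsMs);
console.log(`mean: ${meanMs.toFixed(2)} ms`);
console.log(`speedup: ${Math.round(speedup(posixMs, meanMs))}x`);
```

Both the mean of the three runs (about 1.87 ms) and the rounded 1.9 ms figure give a ratio that rounds to 27.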

Side benefits

  • Linear-time guarantee: librure is immune to ReDoS catastrophic backtracking. Real safety win for server-side regex on user input.
  • JS-shaped Unicode by default: RURE_FLAG_UNICODE is on. Closer to JavaScript regex semantics than POSIX.
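  • Named groups and broader character class support. Note that the regex crate does not support look-around or backreferences; leaving them out is what makes the linear-time guarantee possible.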
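To illustrate what the linear-time guarantee rules out, the sketch below times a classic nested-quantifier pattern in a backtracking engine. It runs in a JavaScript runtime such as V8, whose built-in RegExp backtracks, and not against librure. The pattern, the input sizes, and the `probeBacktracking` helper are illustrative assumptions, and the input lengths are kept small so the example finishes quickly.

```typescript
// Nested quantifier: a backtracking engine tries exponentially many ways to
// split the run of 'a's before concluding that the trailing '!' cannot match.
const NESTED = /^(a+)+$/;

interface Probe {
  length: number;
  matched: boolean;
  elapsedMs: number;
}

// Time one failing match attempt per requested input length.
function probeBacktracking(lengths: number[]): Probe[] {
  return lengths.map((length) => {
    const input = "a".repeat(length) + "!";
    const start = Date.now();
    const matched = NESTED.test(input);
    return { length, matched, elapsedMs: Date.now() - start };
  });
}

for (const p of probeBacktracking([12, 16, 20])) {
  console.log(`${p.length} chars: matched=${p.matched}, ${p.elapsedMs} ms`);
}
```

On a backtracking engine the elapsed time tends to grow roughly twofold per added character, so inputs only slightly longer than these can stall a server. An automaton-based engine like the regex crate bounds the work by the pattern size times the input length instead.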

Cost

  • Binaries that use regex grow from 263 KB → 4.1 MB (librure carries Unicode tables + DFA caches). Conditional bridge linking already scopes this — non-regex binaries are unchanged.
  • Contributors building from source need cargo (one-time rustup install). End users installing via the release tarball receive a prebuilt librure.a and never need rustc.

Files

  • c_bridges/regex-bridge.c — full rewrite, same exported symbols
  • scripts/vendor-pins.sh — pinned RUST_REGEX_TAG="1.11.1"
  • scripts/build-vendor.sh — adds librure build step (clones + cargo build --release + copies librure.a/rure.h)
  • scripts/build-target-sdk.sh — packages librure.a into target SDKs
  • src/compiler.ts + src/native-compiler-lib.ts — link librure.a (+ macOS frameworks) when usesRegex
  • .github/workflows/ci.yml + cross-compile.yml — install rustup; bump cache key to vendor-rure1-*; add librure.a to lib verify list and release packaging
  • BUILDING.md — documents Rust as contributor-only build dep
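The conditional linking described for src/compiler.ts and src/native-compiler-lib.ts can be pictured as a small pure function. This is a minimal sketch under assumptions: the real compiler code is not shown here, and `regexLinkArgs`, `LinkInputs`, and `vendorDir` are hypothetical names. Only the librure.a archive and the two macOS frameworks come from the description above.

```typescript
type Platform = "linux" | "darwin";

// Hypothetical inputs; the real compiler's data structures are not shown in this PR.
interface LinkInputs {
  usesRegex: boolean; // true when the program references the regex bridge
  platform: Platform;
  vendorDir: string; // assumed directory holding the vendored static libraries
}

// Return the extra linker arguments needed for the regex backend, if any.
function regexLinkArgs(inputs: LinkInputs): string[] {
  if (!inputs.usesRegex) {
    return []; // non-regex binaries link nothing extra
  }
  const args = [`${inputs.vendorDir}/librure.a`];
  if (inputs.platform === "darwin") {
    // Frameworks named in the PR description as required on macOS.
    args.push("-framework", "Security", "-framework", "CoreFoundation");
  }
  return args;
}

console.log(regexLinkArgs({ usesRegex: true, platform: "darwin", vendorDir: "vendor/lib" }).join(" "));
```

Keeping the decision in one function makes it easy to check that programs without regex usage pick up no extra link inputs, which is the property the size figures in the Cost section rely on.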

Verification

All 4 existing regex fixtures pass with the new backend:

  • regex-character-classes
  • regex-constructor
  • regex-exec
  • regex-exec-dynamic (exercises chad-shape string[] return path)
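A fixture in the spirit of regex-exec-dynamic might look like the following. The real fixture source is not part of this description, so the pattern, the input, and the `execDynamic` helper are hypothetical. The point is only that a pattern built at runtime returns its match and capture groups as a string[].

```typescript
// Compile a pattern from a runtime string and return the full match followed
// by its capture groups, or an empty array when there is no match.
function execDynamic(source: string, input: string): string[] {
  const re = new RegExp(source);
  const m = re.exec(input);
  if (m === null) {
    return [];
  }
  // Copy into a plain string[]; unmatched optional groups become "".
  const out: string[] = [];
  for (let i = 0; i < m.length; i++) {
    out.push(m[i] ?? "");
  }
  return out;
}

const parts = execDynamic("^(\\d+)-(\\d+)$", "12-34");
console.log(parts.join(",")); // prints "12-34,12,34"
```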

Out of scope (follow-ups)

  • musl static targets — librure.a ABI compat with musl needs verification
  • arm64 Linux SDK — needs rustup target add aarch64-unknown-linux-gnu in cross-compile.yml; TODO comment added
  • Phase 2: prebuilt librure.a per-arch fetched from GH Releases instead of built via cargo (drops the rustc requirement entirely for vendor builds)

Test plan

  • CI green on Ubuntu (build-linux-glibc, benchmarks)
  • CI green on macOS (build-macos)
  • test-artifact job: smoke-test regex bridge passes after release-tarball install
  • vendor cache miss on first run rebuilds librure cleanly

github-actions bot commented Apr 20, 2026

Benchmark Results (Linux x86-64)

| Benchmark | C | ChadScript | Go | Node | Place |
| --- | --- | --- | --- | --- | --- |
| Cold Start | 1.1ms | 0.9ms | 1.2ms | 28.9ms | 🥇 |
| Fibonacci | 0.815s | 0.764s | 1.560s | 3.168s | 🥇 |
| Hash Map Lookup | 0.102s | 0.064s | 0.093s | 0.153s | 🥇 |
| Regex Match | 0.016s | 0.005s | 0.022s | 0.005s | 🥇 |
| Binary Trees | 1.529s | 1.381s | 2.774s | 1.236s | 🥈 |
| File I/O | 0.123s | 0.094s | 0.090s | 0.210s | 🥈 |
| JSON Parse/Stringify | 0.035s | 0.053s | 0.184s | 0.139s | 🥈 |
| N-Body Simulation | 1.671s | 2.130s | 2.205s | 2.384s | 🥈 |
| SQLite | 0.053s | 0.376s | 0.492s | 0.472s | 🥈 |
| Monte Carlo Pi | 0.389s | 0.410s | 0.405s | 2.249s | 🥉 |
| Quicksort | 0.215s | 0.246s | 0.213s | 0.263s | 🥉 |
| Sieve of Eratosthenes | 0.013s | 0.027s | 0.018s | 0.038s | 🥉 |
| String Manipulation | 0.008s | 0.019s | 0.017s | 0.037s | 🥉 |
| Matrix Multiply | 0.464s | 0.688s | 0.566s | 0.371s | #4 |

CLI Tool Benchmarks

| Benchmark | ChadScript | grep | node | xxd | Place |
| --- | --- | --- | --- | --- | --- |
| Hex Dump | 0.567s | | 1.002s | 0.129s | 🥈 |
| Recursive Grep | 0.021s | 0.011s | 0.106s | | 🥈 |

@cs01 changed the title from "swap regex backend to librure (rust-lang/regex c abi) — 27x faster, beats perry by 1.6x" to "swap regex backend to librure (rust-lang/regex c abi) — 27x faster than posix" on Apr 20, 2026
…bi) — 27x faster on the regex_match microbench (51ms posix → 1.9ms librure). same exported cs_regex_* symbols so codegen layer unchanged. linear-time guarantee (no redos), js-shaped unicode by default. ci installs rustup; release tarball ships prebuilt librure.a so end users dont need rustc. binary cost: regex-using binaries grow 263kb to 4.1mb, non-regex binaries unchanged.
@cs01 force-pushed the feat/regex-rure branch from 25ba753 to f9d9a76 on April 20, 2026 06:03
@cs01 changed the title from "swap regex backend to librure (rust-lang/regex c abi) — 27x faster than posix" to "swap regex backend to librure (rust-lang/regex c abi) — 27x faster than posix regex.h" on Apr 20, 2026
cs01 added 3 commits April 19, 2026 23:14
… 100k objects. regex_match is the workload the librure swap was motivated by; map_lookup exercises hash-keyed lookup at realistic scale (100k entries, 1m gets); json scaled up so yyjson's parser/serializer have enough work to be measurable. all three benches include c, go, chadscript, node implementations producing matching outputs (verified locally).
…e prepared statements + bound params for apples-to-apples vs go's database/sql layer (c jumps 0.347s to 0.032s = 3m qps), add regex_match+map_lookup to assemble_json.py meta block (json desc updated to 100k), install rustup in update-benchmarks.yml workflow + bump cache key for librure vendor build.
@cs01 merged commit 3d48827 into main on Apr 20, 2026
13 checks passed
@cs01 deleted the feat/regex-rure branch on April 20, 2026 06:41