WebAssembly port of PCRE2 - High-performance regular expression library with Unicode support and SIMD optimization.
A WebAssembly fork of the industry-standard PCRE2 regular expression library, featuring SIMD optimizations and a TypeScript API.
- Full PCRE2 Functionality - Complete implementation with Unicode support
- SIMD Optimization - 1.2-11.3x performance improvements using WebAssembly SIMD
- TypeScript Support - Complete type definitions and modern JavaScript API
- Universal Compatibility - Works in browsers and Node.js environments
- Dual Build System - SIDE_MODULE for dynamic linking + MAIN_MODULE for standalone usage
- Lightweight - Optimized bundle sizes
# NPM
npm install @discere-os/pcre2.wasm
# pnpm
pnpm add @discere-os/pcre2.wasm
# Yarn
yarn add @discere-os/pcre2.wasm
import PCRE2 from '@discere-os/pcre2.wasm'
// Initialize the library
const pcre2 = new PCRE2()
await pcre2.initialize()
// Simple pattern matching
const isEmail = pcre2.test(
  // [A-Za-z] rather than [A-Z|a-z]: inside a character class, | is a literal pipe
  '\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b',
  'user@example.com'
)
console.log(isEmail) // true
// Compile patterns for reuse
const datePattern = pcre2.compile('(\\d{4})-(\\d{2})-(\\d{2})')
const match = datePattern.exec('Today is 2023-12-25')
console.log(match.matches[1]) // '2023'
console.log(match.matches[2]) // '12'
console.log(match.matches[3]) // '25'
// Pattern replacement
const result = datePattern.replace('Date: 2023-12-25', 'Date: $3/$2/$1')
console.log(result.result) // 'Date: 25/12/2023'
// Clean up
datePattern.destroy()
// Unicode support
const unicodePattern = pcre2.compile('\\p{L}+', { utf: true, ucp: true })
console.log(unicodePattern.test('café')) // true
// Case-insensitive matching
const ciPattern = pcre2.compile('HELLO', { caseless: true })
console.log(ciPattern.test('hello')) // true
// Global replacement
const numbers = pcre2.compile('\\d+')
const replaced = numbers.replaceAll('I have 123 apples and 456 oranges', 'many')
console.log(replaced.result) // 'I have many apples and many oranges'
// Performance metrics
const metrics = pcre2.getMetrics()
console.log(`Compiled ${metrics.patternsCompiled} patterns`)
// System capabilities
const capabilities = pcre2.getSystemCapabilities()
console.log(`SIMD support: ${capabilities.wasmSimd}`)
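Compiled patterns are freed only when `destroy()` is called, so it can help to pair every `compile()` with a `try`/`finally`. The helper below is a sketch, not part of the library: `withPattern`, `Destroyable` and `PatternCompiler` are hypothetical names, and the only thing assumed about the API is what the examples above show, namely that `compile(pattern, options)` returns an object with a `destroy()` method. A `RegExp`-backed stand-in is used so the snippet runs without the WASM module.

```typescript
// Hypothetical minimal shapes, modelled on the usage examples above.
interface Destroyable {
  destroy(): void
}
interface PatternCompiler<P extends Destroyable> {
  compile(pattern: string, options?: Record<string, boolean>): P
}

// Compile a pattern, run `use` with it, and always free it afterwards.
function withPattern<P extends Destroyable, R>(
  compiler: PatternCompiler<P>,
  pattern: string,
  use: (compiled: P) => R,
  options?: Record<string, boolean>
): R {
  const compiled = compiler.compile(pattern, options)
  try {
    return use(compiled)
  } finally {
    compiled.destroy() // runs even if `use` throws
  }
}

// Stand-in compiler backed by the built-in RegExp, only so the sketch runs
// on its own; a real PCRE2 instance would be passed instead.
let destroyed = 0
const fakeCompiler = {
  compile(pattern: string) {
    const re = new RegExp(pattern)
    return {
      test: (subject: string) => re.test(subject),
      destroy: () => { destroyed++ },
    }
  },
}

const hasDigits = withPattern(fakeCompiler, '\\d+', (p) => p.test('abc 123'))
console.log(hasDigits, destroyed) // true 1
```

If `use` is asynchronous, the pattern would be destroyed before the returned promise settles, so an `async` variant that awaits inside the `try` would be needed in that case.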
`pcre2.initialize(options?)`
Initialize the WASM module.
Options:
- `modulePath?: string` - Custom path to WASM module
- `enableMetrics?: boolean` - Enable performance metrics collection
`pcre2.compile(pattern, options?)`
Compile a regular expression pattern.
Compile Options:
- `caseless?: boolean` - Case-insensitive matching
- `multiline?: boolean` - Multiline mode (`^` and `$` match at line breaks)
- `dotall?: boolean` - Dot matches all characters, including newlines
- `extended?: boolean` - Extended syntax (ignore whitespace in the pattern)
- `utf?: boolean` - UTF-8 mode
- `ucp?: boolean` - Unicode properties support
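For readers who know the C API, each option above corresponds to a PCRE2 compile flag. The sketch below shows one way a wrapper could fold the options object into the bitmask passed to `pcre2_compile()`. The flag values are those defined in `pcre2.h`; the `CompileOptions` interface and `toCompileFlags` function are illustrative names, and whether this library performs the mapping this way is an assumption.

```typescript
// Illustrative interface mirroring the option names listed above.
interface CompileOptions {
  caseless?: boolean
  multiline?: boolean
  dotall?: boolean
  extended?: boolean
  utf?: boolean
  ucp?: boolean
}

// Option bits as defined in pcre2.h.
const PCRE2_FLAGS: Record<keyof CompileOptions, number> = {
  caseless: 0x00000008,  // PCRE2_CASELESS
  dotall: 0x00000020,    // PCRE2_DOTALL
  extended: 0x00000080,  // PCRE2_EXTENDED
  multiline: 0x00000400, // PCRE2_MULTILINE
  ucp: 0x00020000,       // PCRE2_UCP
  utf: 0x00080000,       // PCRE2_UTF
}

// OR together the bits of every option that is set to true.
function toCompileFlags(options: CompileOptions = {}): number {
  let flags = 0
  for (const key of Object.keys(PCRE2_FLAGS) as (keyof CompileOptions)[]) {
    if (options[key]) flags |= PCRE2_FLAGS[key]
  }
  return flags >>> 0 // keep the result an unsigned 32-bit value
}

console.log(toCompileFlags({ utf: true, ucp: true }).toString(16)) // 'a0000'
```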
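`pcre2.test(pattern, subject)`
Quick pattern test (compile and match in one call).
`pattern.test(subject)`
Test whether the compiled pattern matches the subject.
`pattern.exec(subject)`
Execute the pattern and return detailed match information.
`pattern.execAll(subject)`
Find all matches in the subject string.
`pattern.replace(subject, replacement)`
Replace the first match in the subject.
`pattern.replaceAll(subject, replacement)`
Replace all matches in the subject.
`pattern.destroy()`
Free the compiled pattern's memory.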
interface MatchResult {
  success: boolean
  captures: number
  offsets: [number, number][]
  matches: string[]
  error?: string
}
interface SubstituteResult {
  success: boolean
  result: string
  substitutions: number
  error?: string
}
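Both result types carry a `success` flag and an optional `error`, so callers will usually want to check them before reading `matches` or `result`. The following is a minimal sketch of that pattern. The helper names `capture` and `unwrap` are hypothetical, and the sample objects are written by hand to mirror the date example earlier in this README; the `offsets` and `captures` values in particular are assumed, not taken from the library's output.

```typescript
interface MatchResult {
  success: boolean
  captures: number
  offsets: [number, number][]
  matches: string[]
  error?: string
}
interface SubstituteResult {
  success: boolean
  result: string
  substitutions: number
  error?: string
}

// Capture group `group` (0 = whole match), or undefined if the match
// failed or the group is absent.
function capture(result: MatchResult, group: number): string | undefined {
  return result.success ? result.matches[group] : undefined
}

// Unwrap a substitution, turning a reported failure into an exception.
function unwrap(result: SubstituteResult): string {
  if (!result.success) {
    throw new Error(result.error ?? 'substitution failed')
  }
  return result.result
}

// Hand-written samples (assumed values, for illustration only).
const sampleMatch: MatchResult = {
  success: true,
  captures: 3,
  offsets: [[9, 19], [9, 13], [14, 16], [17, 19]],
  matches: ['2023-12-25', '2023', '12', '25'],
}
const sampleSubstitution: SubstituteResult = {
  success: true,
  result: 'Date: 25/12/2023',
  substitutions: 1,
}

console.log(capture(sampleMatch, 1))     // '2023'
console.log(unwrap(sampleSubstitution))  // 'Date: 25/12/2023'
```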
Our WebAssembly SIMD optimizations deliver speedups of 1.2x to 11.3x over the scalar implementation across the benchmarked regex operations:
Pattern Type | Size | SIMD Time | Scalar Time | Speedup | Throughput |
---|---|---|---|---|---|
Character Search | 1KB | 6.5ms | 61.1ms | 9.4x | 0.1 MB/s |
Character Search | 10KB | 7.1ms | 79.6ms | 11.3x | 1.4 MB/s |
Character Search | 100KB | 13.0ms | 115.7ms | 8.9x | 7.4 MB/s |
Phone Numbers | 5KB | 6.5ms | 9.0ms | 1.4x | 0.7 MB/s |
Phone Numbers | 50KB | 6.1ms | 11.3ms | 1.8x | 7.8 MB/s |
Email Validation | 5KB | 5.8ms | 7.0ms | 1.2x | 0.8 MB/s |
Email Validation | 50KB | 6.9ms | 11.3ms | 1.6x | 6.9 MB/s |
URL Matching | 5KB | 9.8ms | 18.4ms | 1.9x | 0.5 MB/s |
URL Matching | 50KB | 6.2ms | 9.9ms | 1.6x | 7.7 MB/s |
Whitespace Normalization | 10KB | 10.2ms | 15.7ms | 1.5x | 0.9 MB/s |
Whitespace Normalization | 100KB | 10.5ms | 16.1ms | 1.5x | 9.1 MB/s |
Hex Color Codes | 5KB | 5.2ms | 6.1ms | 1.2x | 0.9 MB/s |
Hex Color Codes | 50KB | 11.6ms | 15.3ms | 1.3x | 4.1 MB/s |
- Average Speedup: 3.4x across all test cases
- Maximum Speedup: 11.3x for character search operations
- Minimum Speedup: 1.2x for complex patterns on small data
- Peak Throughput: 9.1 MB/s for text processing operations
- 100% Accuracy: Identical results to scalar implementation
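The summary figures can be reproduced from the Speedup column of the table. The short sketch below does the arithmetic. It also computes the median, which is not reported above but is worth having next to the mean because the three character search rows pull the average well above the typical case.

```typescript
// Speedup column of the benchmark table, in row order.
const speedups = [9.4, 11.3, 8.9, 1.4, 1.8, 1.2, 1.6, 1.9, 1.6, 1.5, 1.5, 1.2, 1.3]

const average = speedups.reduce((sum, s) => sum + s, 0) / speedups.length

// Median of an odd-length list: the middle element after sorting.
const sorted = [...speedups].sort((a, b) => a - b)
const median = sorted[Math.floor(sorted.length / 2)]

console.log(average.toFixed(1))    // '3.4'
console.log(Math.max(...speedups)) // 11.3
console.log(Math.min(...speedups)) // 1.2
console.log(median)                // 1.6
```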
- Platform: Node.js v22.19.0 on Linux x64
- SIMD Support: WebAssembly SIMD 128-bit vectors enabled
- Test Data: Real-world patterns with varied text sizes (1KB-100KB)
- Iterations: 100-1000 per test case for statistical accuracy
- Character Operations (8-11x speedup)
  - Single character search: `wasm_i8x16_eq()` with bitmask extraction
  - Character counting: Parallel run-length encoding
  - Memory scanning: 16-byte parallel processing
- Pattern Matching (1.2-1.8x speedup)
  - Substring search: SIMD-enhanced Boyer-Moore algorithm
  - Character classes: Parallel range comparisons for `[0-9]`, `\s`, etc.
  - Complex patterns: Optimized character class evaluation
- Text Processing (1.2-1.7x speedup)
  - UTF-8 validation: Fast ASCII detection with selective validation
  - Line ending detection: Parallel newline scanning
  - Memory operations: Optimized `memchr`/`memcmp` replacements
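To make the single character search concrete, the sketch below emulates in plain TypeScript what a 16-lane equality compare followed by bitmask extraction computes for each block. It is an illustration of the technique only: the library's real implementation is C compiled with WebAssembly SIMD intrinsics, where the inner loop collapses into a couple of vector instructions, and the function name `findByte` is hypothetical.

```typescript
// Scalar emulation of a SIMD byte search. For each 16-byte block, build a
// 16-bit mask with bit k set when lane k equals the needle (this is what an
// i8x16 equality compare plus bitmask extraction yields), then locate the
// lowest set bit.
function findByte(haystack: Uint8Array, needle: number): number {
  const LANES = 16
  let i = 0
  for (; i + LANES <= haystack.length; i += LANES) {
    let mask = 0
    for (let lane = 0; lane < LANES; lane++) {
      if (haystack[i + lane] === needle) mask |= 1 << lane
    }
    if (mask !== 0) {
      // mask & -mask isolates the lowest set bit; clz32 turns it into an index.
      const firstLane = 31 - Math.clz32(mask & -mask)
      return i + firstLane
    }
  }
  // Tail shorter than one block: fall back to a byte-by-byte scan.
  for (; i < haystack.length; i++) {
    if (haystack[i] === needle) return i
  }
  return -1
}

const header = new TextEncoder().encode('Content-Length: 42')
console.log(findByte(header, 0x3a)) // 14, the index of ':'
```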
Benchmark environment (WebAssembly SIMD enabled; the library targets Chrome/Edge 113+, and the figures above were measured under Node.js):
- CPU: x64 architecture with SIMD support
- Environment: Node.js v22.19.0 on Linux
- Methodology: 100-1000 iterations per test, averaged results
- Memory: Optimized alignment for 16-byte SIMD operations
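The methodology above (many iterations, averaged) can be sketched as a small timing harness. This is not the project's benchmark code: `bench` and `BenchResult` are hypothetical names, the warm-up pass is an added assumption, and the workload shown is a stand-in using built-in string search rather than PCRE2.

```typescript
interface BenchResult {
  iterations: number
  meanMs: number
}

// Run `fn` repeatedly and report the mean wall-clock time per call.
function bench(fn: () => void, iterations = 1000, warmup = 10): BenchResult {
  for (let i = 0; i < warmup; i++) fn() // untimed runs so the JIT can settle
  const start = performance.now()
  for (let i = 0; i < iterations; i++) fn()
  const elapsed = performance.now() - start
  return { iterations, meanMs: elapsed / iterations }
}

// Stand-in workload: search a 10KB buffer for a character near the end.
const text = 'a'.repeat(10 * 1024) + 'z'
const result = bench(() => { text.indexOf('z') })
console.log(`${result.iterations} iterations, ${result.meanMs.toFixed(4)} ms each`)
```

To compare two builds, the same workload would be timed against each and the two `meanMs` values divided to obtain a speedup.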
PCRE2.wasm provides two build targets:
MAIN_MODULE (standalone usage)
- Artifact: `install/wasm/pcre2-main.js` + `pcre2-main.wasm`
- Performance: WebAssembly SIMD required; optimized for Chrome/Edge 113+
- Use Case: Deno demos/tests and npm/jsr distribution
SIDE_MODULE (dynamic linking)
- Size: 169KB WASM (standalone)
- Performance: SIMD-optimized with dynamic loading capability
- Features: Designed for `dlopen()` integration
- Compatibility: Requires SIMD-capable browsers + main module host
- Use Case: Integration with larger WebAssembly applications
SIMD required: This library fails fast if WebAssembly SIMD is unavailable. Use Chrome/Edge 113+ (or Node with WASM SIMD) and ensure SIMD is enabled.
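Since loading fails without SIMD, an application may prefer to probe for it first and show its own message. A common approach is to ask the engine to validate a tiny module that uses a `v128` instruction. The sketch below does that with the standard `WebAssembly.validate()` call; `hasWasmSimd` and `SIMD_PROBE` are hypothetical names and are not part of this library.

```typescript
// A minimal module whose only function returns a v128 and uses two SIMD
// opcodes (i8x16.splat and i8x16.popcnt). Engines without SIMD reject it.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0,                    // "\0asm" magic, version 1
  1, 5, 1, 96, 0, 1, 123,                         // type section: () -> v128
  3, 2, 1, 0,                                     // function section: one function
  10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11,   // code section: SIMD body
])

// True when the current engine accepts WebAssembly SIMD instructions.
function hasWasmSimd(): boolean {
  return typeof WebAssembly === 'object' && WebAssembly.validate(SIMD_PROBE)
}

if (!hasWasmSimd()) {
  console.warn('WebAssembly SIMD is unavailable; PCRE2.wasm requires it')
}
```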
// Dynamic loading as SIDE_MODULE
const pcre2Handle = await dlopen('https://wasm.discere.cloud/pcre2@v10.44.0/side/pcre2-side.wasm')
// Standard NPM import (standalone applications)
import PCRE2 from '@discere-os/pcre2.wasm'
const pcre2 = new PCRE2()
await pcre2.initialize()
// Process log files
const logPattern = pcre2.compile('\\[(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\] (ERROR|WARN|INFO): (.+)')
const results = logPattern.execAll(logFileContent)
console.log(`Found ${results.length} log entries`)
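The three capture groups in the log pattern above (timestamp, level, message) map naturally onto a typed record. The sketch below shows one way to do that conversion. It assumes that `execAll()` returns an array of `MatchResult` objects with group 0 as the whole match, which the example suggests but does not state; `LogEntry` and `toLogEntries` are hypothetical names, and the sample data is written by hand.

```typescript
interface MatchResult {
  success: boolean
  captures: number
  offsets: [number, number][]
  matches: string[]
  error?: string
}

type LogLevel = 'ERROR' | 'WARN' | 'INFO'

interface LogEntry {
  timestamp: string
  level: LogLevel
  message: string
}

// Turn successful matches into typed entries, skipping incomplete ones.
function toLogEntries(results: MatchResult[]): LogEntry[] {
  return results
    .filter((r) => r.success && r.matches.length >= 4)
    .map((r) => ({
      timestamp: r.matches[1],
      level: r.matches[2] as LogLevel,
      message: r.matches[3],
    }))
}

// Hand-written sample standing in for the output of execAll().
const sampleResults: MatchResult[] = [
  {
    success: true,
    captures: 3,
    offsets: [],
    matches: ['[2023-12-25 10:00:00] ERROR: disk full', '2023-12-25 10:00:00', 'ERROR', 'disk full'],
  },
  {
    success: true,
    captures: 3,
    offsets: [],
    matches: ['[2023-12-25 10:00:05] INFO: retrying', '2023-12-25 10:00:05', 'INFO', 'retrying'],
  },
]

const entries = toLogEntries(sampleResults)
const errorCount = entries.filter((e) => e.level === 'ERROR').length
console.log(entries.length, errorCount) // 2 1
```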
# MAIN
meson setup build-main --cross-file=scripts/emscripten.cross -Dmain_module=true -Dside_module=false -Dsimd=true --prefix=$PWD/install -Dlibdir=wasm -Dbindir=wasm
meson compile -C build-main
meson install -C build-main
# SIDE
meson setup build-side --cross-file=scripts/emscripten.cross -Dmain_module=false -Dside_module=true -Dsimd=true --prefix=$PWD/install -Dlibdir=wasm -Dbindir=wasm
meson compile -C build-side
meson install -C build-side
# Deno tasks
deno task build:main:meson
deno task build:side:meson
deno task build:wasm:meson
Run the comprehensive test suite:
# Run all tests
pnpm test
# Run with coverage
pnpm test:coverage
# Run in watch mode
pnpm test:watch
# Run UI mode
pnpm test:ui
# Run SIMD-specific benchmarks
pnpm benchmark:simd # Comprehensive SIMD performance benchmarks
./test-functionality.cjs # Core functionality verification
pnpm benchmark # Production-ready pattern benchmarks
The SIMD optimizations include comprehensive testing infrastructure:
- Unit Tests: 150+ test cases covering all SIMD functions
- Integration Tests: Full PCRE2 regression suite with SIMD enabled
- Performance Tests: Cross-platform benchmarking with statistical analysis
- Edge Case Tests: Boundary conditions, alignment, large datasets
All optimizations are validated with rigorous testing:
# Build all SIMD variants
deno task build:wasm:meson # Build MAIN and SIDE modules via Meson
# Run comprehensive validation
./test-functionality.cjs # Verify API functionality
pnpm benchmark # Production pattern benchmarks
pnpm benchmark:simd # Detailed SIMD performance analysis
Build the library from source:
# Install dependencies
pnpm install
# Build TypeScript + WASM modules
pnpm build
# Build only WASM modules
pnpm build:wasm
# Clean build artifacts
pnpm clean
Browser | Version | WASM Support | SIMD Support |
---|---|---|---|
Chrome | 57+ | ✅ | 91+ |
Firefox | 52+ | ✅ | 89+ |
Safari | 11+ | ✅ | 16.4+ |
Edge | 16+ | ✅ | 91+ |
Node.js | 16.4+ | ✅ | 16.4+ |
# Clone repository
git clone https://github.com/discere-os/pcre2.wasm.git
cd pcre2.wasm
# Install dependencies
pnpm install
# Install Emscripten
git clone https://github.com/emscripten-core/emsdk.git
cd emsdk && ./emsdk install 4.0.14 && ./emsdk activate 4.0.14
source ./emsdk_env.sh
# Return to the project root before building
cd ..
# Build and test
pnpm build
pnpm test
PCRE2.wasm is licensed under the BSD-3-Clause License, the same license as the original PCRE2 library.
This project includes:
- Original PCRE2 library © 1997-2024 University of Cambridge
- WebAssembly port © 2025 Superstruct Ltd, New Zealand
- PCRE2 Team: For the excellent regular expression library
- Emscripten Team: For the outstanding WebAssembly compiler
- Contributors: Everyone who helped improve this library