@discere-os/pcre2.wasm

WebAssembly port of PCRE2 - High-performance regular expression library with Unicode support and SIMD optimization.

A WebAssembly fork of the industry-standard PCRE2 regular expression library, featuring SIMD optimizations and a TypeScript API.

✨ Features

  • 🚀 Full PCRE2 Functionality - Complete implementation with Unicode support
  • ⚡ SIMD Optimization - 1.2-11.3x performance improvements using WebAssembly SIMD
  • 📘 TypeScript Support - Complete type definitions and modern JavaScript API
  • 🌐 Universal Compatibility - Works in browsers and Node.js environments
  • 🔧 Dual Build System - SIDE_MODULE for dynamic linking + MAIN_MODULE for standalone usage
  • 📦 Lightweight - Optimized bundle sizes

🚀 Quick Start

Installation

# NPM
npm install @discere-os/pcre2.wasm

# pnpm
pnpm add @discere-os/pcre2.wasm

# Yarn
yarn add @discere-os/pcre2.wasm

Basic Usage

import PCRE2 from '@discere-os/pcre2.wasm'

// Initialize the library
const pcre2 = new PCRE2()
await pcre2.initialize()

// Simple pattern matching
const isEmail = pcre2.test(
  '\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b',
  'user@example.com'
)
console.log(isEmail) // true

// Compile patterns for reuse
const datePattern = pcre2.compile('(\\d{4})-(\\d{2})-(\\d{2})')
const match = datePattern.exec('Today is 2023-12-25')
// exec() returns null when there is no match, so check before reading groups
if (match) {
  console.log(match.matches[1]) // '2023'
  console.log(match.matches[2]) // '12'
  console.log(match.matches[3]) // '25'
}

// Pattern replacement
const result = datePattern.replace('Date: 2023-12-25', 'Date: $3/$2/$1')
console.log(result.result) // 'Date: 25/12/2023'

// Clean up
datePattern.destroy()

Advanced Features

// Unicode support
const unicodePattern = pcre2.compile('\\p{L}+', { utf: true, ucp: true })
console.log(unicodePattern.test('café')) // true

// Case-insensitive matching
const ciPattern = pcre2.compile('HELLO', { caseless: true })
console.log(ciPattern.test('hello')) // true

// Global replacement
const numbers = pcre2.compile('\\d+')
const result = numbers.replaceAll('I have 123 apples and 456 oranges', 'many')
console.log(result.result) // 'I have many apples and many oranges'

// Performance metrics
const metrics = pcre2.getMetrics()
console.log(`Compiled ${metrics.patternsCompiled} patterns`)

// System capabilities
const capabilities = pcre2.getSystemCapabilities()
console.log(`SIMD support: ${capabilities.wasmSimd}`)

🔧 API Reference

PCRE2 Class

initialize(options?: InitializationOptions): Promise<void>

Initialize the WASM module.

Options:

  • modulePath?: string - Custom path to WASM module
  • enableMetrics?: boolean - Enable performance metrics collection

compile(pattern: string, options?: CompileOptions): CompiledPattern

Compile a regular expression pattern.

Compile Options:

  • caseless?: boolean - Case-insensitive matching
  • multiline?: boolean - Multiline mode (^ and $ match line breaks)
  • dotall?: boolean - Dot matches all characters including newlines
  • extended?: boolean - Extended syntax (ignore whitespace)
  • utf?: boolean - UTF-8 mode
  • ucp?: boolean - Unicode properties support

test(pattern: string, subject: string): boolean

Quick pattern test (compile and match in one call).

CompiledPattern Class

test(subject: string, options?: MatchOptions): boolean

Test if pattern matches subject.

exec(subject: string, options?: MatchOptions): MatchResult | null

Execute pattern and return detailed match information.

execAll(subject: string, options?: MatchOptions): MatchResult[]

Find all matches in subject string.

replace(subject: string, replacement: string, options?: MatchOptions): SubstituteResult

Replace first match in subject.

replaceAll(subject: string, replacement: string, options?: MatchOptions): SubstituteResult

Replace all matches in subject.

destroy(): void

Free compiled pattern memory.
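
Because compiled patterns hold WASM memory until `destroy()` is called, it can help to tie the cleanup to a scope. The following is a minimal sketch of that idea: `withPattern` is a hypothetical helper (not part of the documented API), and the demo uses a stand-in object so the snippet runs without the WASM module.

```typescript
// Minimal structural type: anything with the destroy() method documented above.
interface Destroyable {
  destroy(): void
}

// Hypothetical helper: run `use` with a pattern and always free it afterwards,
// even if `use` throws.
function withPattern<P extends Destroyable, R>(pattern: P, use: (p: P) => R): R {
  try {
    return use(pattern)
  } finally {
    pattern.destroy()
  }
}

// Stand-in for a compiled pattern; with the real library you would pass
// pcre2.compile('\\d+') instead.
const calls: string[] = []
const fakePattern = {
  test: (subject: string): boolean => /\d+/.test(subject),
  destroy: (): void => {
    calls.push('destroy')
  },
}

const found = withPattern(fakePattern, (p) => p.test('abc 123'))
console.log(found, calls.length) // true 1
```

The `try`/`finally` keeps the cleanup in one place, so an exception thrown while matching does not leak the compiled pattern.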

Result Types

interface MatchResult {
  success: boolean
  captures: number
  offsets: [number, number][]
  matches: string[]
  error?: string
}

interface SubstituteResult {
  success: boolean
  result: string
  substitutions: number
  error?: string
}
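
Since `exec()` is typed as `MatchResult | null` and the result carries its own `success` flag, callers need to check both before reading capture groups. A small sketch of that check follows; `captureOrUndefined` is a hypothetical helper, the interface is copied from above, and the sample values (including how `captures` is counted) are illustrative assumptions rather than output recorded from the library.

```typescript
// Local copy of the MatchResult shape documented above.
interface MatchResult {
  success: boolean
  captures: number
  offsets: [number, number][]
  matches: string[]
  error?: string
}

// Hypothetical helper: return capture group `index`, or undefined when there
// is no result, the match failed, or the group does not exist.
function captureOrUndefined(result: MatchResult | null, index: number): string | undefined {
  if (result === null || !result.success) return undefined
  return result.matches[index]
}

// Hand-built result for '(\\d{4})-(\\d{2})-(\\d{2})' against 'Today is 2023-12-25'.
// Assumption: offsets are [start, end) pairs and `captures` counts the groups.
const sample: MatchResult = {
  success: true,
  captures: 3,
  offsets: [[9, 19], [9, 13], [14, 16], [17, 19]],
  matches: ['2023-12-25', '2023', '12', '25'],
}

console.log(captureOrUndefined(sample, 1)) // '2023'
console.log(captureOrUndefined(null, 1)) // undefined
```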

📊 Performance

SIMD Optimization Results

Our WebAssembly SIMD optimizations speed up every benchmarked pattern, by 1.2x to 11.3x relative to the scalar build:

Real-World Pattern Performance

| Pattern Type | Size | SIMD Time | Scalar Time | Speedup | Throughput |
|---|---|---|---|---|---|
| Character Search | 1KB | 6.5ms | 61.1ms | 9.4x | 0.1 MB/s |
| Character Search | 10KB | 7.1ms | 79.6ms | 11.3x | 1.4 MB/s |
| Character Search | 100KB | 13.0ms | 115.7ms | 8.9x | 7.4 MB/s |
| Phone Numbers | 5KB | 6.5ms | 9.0ms | 1.4x | 0.7 MB/s |
| Phone Numbers | 50KB | 6.1ms | 11.3ms | 1.8x | 7.8 MB/s |
| Email Validation | 5KB | 5.8ms | 7.0ms | 1.2x | 0.8 MB/s |
| Email Validation | 50KB | 6.9ms | 11.3ms | 1.6x | 6.9 MB/s |
| URL Matching | 5KB | 9.8ms | 18.4ms | 1.9x | 0.5 MB/s |
| URL Matching | 50KB | 6.2ms | 9.9ms | 1.6x | 7.7 MB/s |
| Whitespace Normalization | 10KB | 10.2ms | 15.7ms | 1.5x | 0.9 MB/s |
| Whitespace Normalization | 100KB | 10.5ms | 16.1ms | 1.5x | 9.1 MB/s |
| Hex Color Codes | 5KB | 5.2ms | 6.1ms | 1.2x | 0.9 MB/s |
| Hex Color Codes | 50KB | 11.6ms | 15.3ms | 1.3x | 4.1 MB/s |

Performance Summary

  • 🚀 Average Speedup: 3.4x across all test cases
  • ⚡ Maximum Speedup: 11.3x for character search operations
  • 🎯 Minimum Speedup: 1.2x for complex patterns on small data
  • 📈 Peak Throughput: 9.1 MB/s for text processing operations
  • ✅ 100% Accuracy: Identical results to scalar implementation
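
The summary figures can be re-derived from the Speedup column of the table above. The snippet below is only that arithmetic, with the column copied by hand; the median is an extra statistic not stated in the summary, included because the three character-search rows pull the mean well above the typical case.

```typescript
// Speedup column copied from the table above, in row order.
const speedups = [9.4, 11.3, 8.9, 1.4, 1.8, 1.2, 1.6, 1.9, 1.6, 1.5, 1.5, 1.2, 1.3]

const mean = speedups.reduce((sum, x) => sum + x, 0) / speedups.length
const max = Math.max(...speedups)
const min = Math.min(...speedups)

// Median of an odd-length list: the middle element after sorting.
const sorted = [...speedups].sort((a, b) => a - b)
const median = sorted[Math.floor(sorted.length / 2)]

// A single speedup is scalar time divided by SIMD time,
// here for the 1KB character search row.
const speedup1KB = 61.1 / 6.5

console.log(mean.toFixed(1), max, min, median, speedup1KB.toFixed(1)) // 3.4 11.3 1.2 1.6 9.4
```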

Benchmark Environment

  • Platform: Node.js v22.19.0 on Linux x64
  • SIMD Support: WebAssembly SIMD 128-bit vectors enabled
  • Test Data: Real-world patterns with varied text sizes (1KB-100KB)
  • Iterations: 100-1000 per test case for statistical accuracy

SIMD Optimization Categories

  1. Character Operations (8-11x speedup)

    • Single character search: wasm_i8x16_eq() with bitmask extraction
    • Character counting: Parallel run-length encoding
    • Memory scanning: 16-byte parallel processing
  2. Pattern Matching (1.2-1.8x speedup)

    • Substring search: SIMD-enhanced Boyer-Moore algorithm
    • Character classes: Parallel range comparisons for [0-9], \s, etc.
    • Complex patterns: Optimized character class evaluation
  3. Text Processing (1.2-1.7x speedup)

    • UTF-8 validation: Fast ASCII detection with selective validation
    • Line ending detection: Parallel newline scanning
    • Memory operations: Optimized memchr/memcmp replacements
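
To illustrate the "compare 16 bytes, extract a bitmask" idea behind the single-character search, here is a plain TypeScript emulation. It is not SIMD code and not the library's implementation: the inner loop stands in for what a single `wasm_i8x16_eq` plus a bitmask instruction would do in the C source, and `findByte` is a hypothetical name.

```typescript
// Scalar emulation of a 16-lane byte search. Real SIMD code compares all 16
// lanes at once; here each lane is one step of the inner loop.
function findByte(haystack: Uint8Array, needle: number): number {
  const LANES = 16
  let base = 0
  // Full 16-byte blocks: build a mask with bit i set when lane i matches.
  for (; base + LANES <= haystack.length; base += LANES) {
    let mask = 0
    for (let lane = 0; lane < LANES; lane++) {
      if (haystack[base + lane] === needle) mask |= 1 << lane
    }
    if (mask !== 0) {
      // The lowest set bit marks the first matching lane in this block.
      const firstLane = 31 - Math.clz32(mask & -mask)
      return base + firstLane
    }
  }
  // Tail shorter than one block: plain byte-by-byte scan.
  for (let i = base; i < haystack.length; i++) {
    if (haystack[i] === needle) return i
  }
  return -1
}

const sentence = 'The quick brown fox jumps over the lazy dog'
const bytes = Uint8Array.from(sentence, (c) => c.charCodeAt(0))

console.log(findByte(bytes, 'x'.charCodeAt(0))) // 18
```

The point of the mask is that a block with no match is rejected with one test (`mask !== 0`), which is where a real SIMD version saves work on long inputs.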

Browser Performance

Benchmarks conducted on Chrome 113+ with WebAssembly SIMD enabled:

  • CPU: x64 architecture with SIMD support
  • Environment: Node.js v22.19.0 on Linux
  • Methodology: 100-1000 iterations per test, averaged results
  • Memory: Optimized alignment for 16-byte SIMD operations

🏗️ Build Targets

PCRE2.wasm provides two build targets:

MAIN Module (Recommended)

  • Artifact: install/wasm/pcre2-main.js + pcre2-main.wasm
  • Performance: WebAssembly SIMD required; optimized for Chrome/Edge 113+
  • Use Case: Deno demos/tests and npm/jsr distribution

SIDE Module (Dynamic Linking)

  • Size: 169KB WASM (standalone)
  • Performance: SIMD-optimized with dynamic loading capability
  • Features: Designed for dlopen() integration
  • Compatibility: Requires SIMD-capable browsers + main module host
  • Use Case: Integration with larger WebAssembly applications

SIMD required: This library fails fast if WebAssembly SIMD is unavailable. Use Chrome/Edge 113+ (or Node with WASM SIMD) and ensure SIMD is enabled.
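
Since the library fails fast without SIMD, an application may want to probe for support before loading it. One common technique is to ask `WebAssembly.validate` about a tiny module that uses a v128 instruction. The sketch below follows that approach; `hasWasmSimd` and `SIMD_PROBE` are hypothetical names, and nothing here is taken from this package's own detection code.

```typescript
// Tiny module whose only function uses v128 instructions.
// Engines without SIMD support reject it during validation.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0, // "\0asm" magic, version 1
  1, 5, 1, 96, 0, 1, 123, // type section: () -> v128
  3, 2, 1, 0, // function section: one function of type 0
  10, 10, 1, 8, 0, // code section: one body, no locals
  65, 0, 253, 15, 253, 98, 11, // i32.const 0; i8x16.splat; i8x16.popcnt; end
])

function hasWasmSimd(): boolean {
  // Look WebAssembly up on globalThis so this compiles without DOM typings.
  const wasm = (globalThis as unknown as {
    WebAssembly?: { validate(bytes: Uint8Array): boolean }
  }).WebAssembly
  return wasm !== undefined && wasm.validate(SIMD_PROBE)
}

if (!hasWasmSimd()) {
  console.warn('WebAssembly SIMD is unavailable in this runtime.')
}
```

Running the probe before `initialize()` lets an application show its own fallback or error message instead of surfacing the library's failure.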

🔗 Integration

// Dynamic loading as SIDE_MODULE
const pcre2Handle = await dlopen('https://wasm.discere.cloud/pcre2@v10.44.0/side/pcre2-side.wasm')

// Standard NPM import (standalone applications)
import PCRE2 from '@discere-os/pcre2.wasm'

const pcre2 = new PCRE2()
await pcre2.initialize()

// Process log files
const logPattern = pcre2.compile('\\[(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\] (ERROR|WARN|INFO): (.+)')
const results = logPattern.execAll(logFileContent)

console.log(`Found ${results.length} log entries`)
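
The `execAll()` results for the log pattern above can be turned into typed records with a small mapping step. This is a sketch under stated assumptions: `MatchResult` is copied from the Result Types section, `toLogEntries`, `LogEntry` and `isLevel` are hypothetical names, and the sample results are hand-built rather than produced by the library.

```typescript
// Local copy of the MatchResult shape from the Result Types section.
interface MatchResult {
  success: boolean
  captures: number
  offsets: [number, number][]
  matches: string[]
  error?: string
}

const LEVELS = ['ERROR', 'WARN', 'INFO'] as const
type Level = (typeof LEVELS)[number]

interface LogEntry {
  timestamp: string
  level: Level
  message: string
}

function isLevel(value: string | undefined): value is Level {
  return value !== undefined && (LEVELS as readonly string[]).indexOf(value) !== -1
}

// Hypothetical helper: groups 1..3 of the log pattern become one record each.
function toLogEntries(results: MatchResult[]): LogEntry[] {
  const entries: LogEntry[] = []
  for (const r of results) {
    if (!r.success) continue
    const [, timestamp, level, message] = r.matches
    if (timestamp === undefined || message === undefined || !isLevel(level)) continue
    entries.push({ timestamp, level, message })
  }
  return entries
}

// Hand-built stand-ins for logPattern.execAll(logFileContent).
const sampleResults: MatchResult[] = [
  {
    success: true,
    captures: 3,
    offsets: [],
    matches: ['[2023-12-25 10:00:00] ERROR: disk full', '2023-12-25 10:00:00', 'ERROR', 'disk full'],
  },
  { success: false, captures: 0, offsets: [], matches: [], error: 'no match' },
]

const entries = toLogEntries(sampleResults)
console.log(entries.length, entries[0].level) // 1 ERROR
```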

🔨 Building with Meson

# MAIN
meson setup build-main --cross-file=scripts/emscripten.cross -Dmain_module=true -Dside_module=false -Dsimd=true --prefix=$PWD/install -Dlibdir=wasm -Dbindir=wasm
meson compile -C build-main
meson install -C build-main

# SIDE
meson setup build-side --cross-file=scripts/emscripten.cross -Dmain_module=false -Dside_module=true -Dsimd=true --prefix=$PWD/install -Dlibdir=wasm -Dbindir=wasm
meson compile -C build-side
meson install -C build-side

# Deno tasks
deno task build:main:meson
deno task build:side:meson
deno task build:wasm:meson

🧪 Testing

Run the test suite:

# Run all tests
pnpm test

# Run with coverage
pnpm test:coverage

# Run in watch mode
pnpm test:watch

# Run UI mode
pnpm test:ui

# Run SIMD-specific benchmarks
pnpm benchmark:simd   # Comprehensive SIMD performance benchmarks
./test-functionality.cjs  # Core functionality verification
pnpm benchmark   # Production-ready pattern benchmarks

SIMD Testing & Benchmarking

The SIMD optimizations are covered by dedicated tests and benchmarks:

Test Categories

  • Unit Tests: 150+ test cases covering all SIMD functions
  • Integration Tests: Full PCRE2 regression suite with SIMD enabled
  • Performance Tests: Cross-platform benchmarking with statistical analysis
  • Edge Case Tests: Boundary conditions, alignment, large datasets

Benchmark Results Verification

To reproduce the results, build both modules, then rerun the functionality check and the benchmarks:

# Build all SIMD variants
deno task build:wasm:meson    # Build MAIN and SIDE modules via Meson

# Run comprehensive validation
./test-functionality.cjs      # Verify API functionality
pnpm benchmark          # Production pattern benchmarks
pnpm benchmark:simd          # Detailed SIMD performance analysis

📦 Building

Build the library from source:

# Install dependencies
pnpm install

# Build TypeScript + WASM modules
pnpm build

# Build only WASM modules
pnpm build:wasm

# Clean build artifacts
pnpm clean

🌐 Browser Compatibility

Browser Version WASM Support SIMD Support
Chrome 57+ βœ… 91+
Firefox 52+ βœ… 89+
Safari 11+ βœ… 14.1+
Edge 16+ βœ… 91+
Node.js 16.4+ βœ… 16.4+

Development Setup

# Clone repository
git clone https://github.com/discere-os/pcre2.wasm.git
cd pcre2.wasm

# Install dependencies
pnpm install

# Install Emscripten
git clone https://github.com/emscripten-core/emsdk.git
cd emsdk && ./emsdk install 4.0.14 && ./emsdk activate 4.0.14
source ./emsdk_env.sh
cd ..   # return to the pcre2.wasm checkout before building

# Build and test
pnpm build
pnpm test

📄 License

PCRE2.wasm is licensed under the BSD-3-Clause License, the same license as the original PCRE2 library.

This project includes:

  • Original PCRE2 library © 1997-2024 University of Cambridge
  • WebAssembly port © 2025 Superstruct Ltd, New Zealand

🙏 Acknowledgments

  • PCRE2 Team: For the excellent regular expression library
  • Emscripten Team: For the outstanding WebAssembly compiler
  • Contributors: Everyone who helped improve this library

📚 Resources
