High-performance string similarity and fuzzy matching via WASM bindings to rapidfuzz-rs.
This library provides blazing-fast string similarity metrics through WASM bindings to the Rust rapidfuzz-rs library, plus TypeScript implementations of advanced fuzzy matching algorithms. It combines the performance of compiled Rust/WASM with the flexibility of TypeScript for a comprehensive text similarity toolkit.
Features:
- WASM-powered distance metrics: Levenshtein, Damerau-Levenshtein, OSA, Jaro, Jaro-Winkler, Indel, LCS
- Fuzzy matching: Token-based comparison (order-insensitive, set-based)
- Process helpers: Find best matches from arrays with configurable scoring
- Unified API: Consistent interface across all metrics
- TypeScript extensions: Substring similarity, normalization presets, suggestions API
- Multi-runtime: Node.js, Bun, Deno support
Prerequisites:
- Rust toolchain via rustup
- wasm-pack (pinned to the version we build against)
Install wasm-pack once per machine:
cargo install wasm-pack --version 0.13.1
Install the package from npm:
npm install string-metrics-wasm
Basic usage:
import { levenshtein, ratio, tokenSortRatio, extractOne, score } from 'string-metrics-wasm';
// Basic edit distance
const dist = levenshtein('kitten', 'sitting');
console.log(dist); // 3
// Fuzzy matching (0-100 scale)
const fuzzy = ratio('hello', 'hallo');
console.log(fuzzy); // 80.0
// Order-insensitive comparison
const tokens = tokenSortRatio('new york mets', 'mets york new');
console.log(tokens); // 100.0
// Find best match from array
const choices = ['Atlanta Falcons', 'New York Jets', 'Dallas Cowboys'];
const best = extractOne('new york', choices);
console.log(best); // { choice: 'New York Jets', score: 57.14, index: 1 }
// Unified scoring API (0-1 scale)
const similarity = score('hello', 'world', 'jaroWinkler');
console.log(similarity); // 0.4666...
Compatibility: All examples use camelCase option names and metric identifiers. For ecosystems that standardize on snake_case (e.g., Fulmen/Crucible fixtures), the same snake_case names are accepted as aliases and normalized internally.
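The alias handling described above can be pictured as a small key-normalization step. The sketch below is only an assumption about the general approach, not the library's actual code: `toCamelCase` and `normalizeOptionKeys` are hypothetical helper names, and `min_score` is used purely as an illustrative alias.

```typescript
// Hypothetical helper: convert a snake_case identifier to camelCase.
// Identifiers that are already camelCase pass through unchanged.
function toCamelCase(name: string): string {
  return name.replace(/_+([a-zA-Z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

// Hypothetical helper: rewrite every key of an options object to camelCase.
function normalizeOptionKeys(options: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(options)) {
    out[toCamelCase(key)] = value;
  }
  return out;
}

const metricName = toCamelCase('jaro_winkler'); // 'jaroWinkler'
const normalizedOptions = normalizeOptionKeys({ min_score: 0.6, maxSuggestions: 3 });
// normalizedOptions: { minScore: 0.6, maxSuggestions: 3 }
```

Normalizing once at the API boundary keeps the rest of the code working with a single naming style.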
Edit distance metrics return raw integer distances (lower = more similar):
Minimum edits (insertions, deletions, substitutions) to transform a into b.
levenshtein('kitten', 'sitting'); // 3
Levenshtein + transpositions (unrestricted).
damerau_levenshtein('abcd', 'abdc'); // 1
Optimal String Alignment (restricted Damerau-Levenshtein).
osa_distance('abcd', 'abdc'); // 1
Insertions and deletions only (no substitutions).
indel_distance('hello', 'hallo'); // 2
Longest Common Subsequence distance.
lcs_seq_distance('AGGTAB', 'GXTXAYB'); // 3
Normalized similarity scores (0.0-1.0 scale, higher = more similar):
Normalized Levenshtein similarity.
normalized_levenshtein('kitten', 'sitting'); // 0.5714
Jaro similarity.
jaro('kitten', 'sitting'); // 0.7460
Jaro-Winkler similarity (boosts prefix matches).
jaro_winkler('kitten', 'sitting'); // 0.7460
Normalized indel similarity.
indel_normalized_similarity('hello', 'hallo'); // 0.8
Normalized LCS similarity.
lcs_seq_normalized_similarity('AGGTAB', 'GXTXAYB'); // 0.5714
Fuzzy string comparison metrics (0-100 scale):
Basic fuzzy similarity using Indel distance.
ratio('kitten', 'sitting'); // 61.54
Best matching substring using sliding window.
partialRatio('fuzzy', 'fuzzy wuzzy was a bear'); // 100.0
Order-insensitive token comparison (sorts tokens first).
tokenSortRatio('new york mets', 'mets york new'); // 100.0
Set-based token comparison (handles duplicates and order).
tokenSetRatio('hello world world', 'world hello'); // 100.0
Find best matches from arrays:
Find the single best match.
Options:
- scorer?: (a: string, b: string) => number - Scoring function (default: ratio)
- processor?: (str: string) => string - Preprocessing function
- scoreCutoff?: number - Minimum score threshold (default: 0)
const choices = ['Atlanta Falcons', 'New York Jets', 'Dallas Cowboys'];
const best = extractOne('jets', choices, { scoreCutoff: 30 });
// { choice: 'New York Jets', score: 35.29, index: 1 }
Find top N matches (sorted by score).
Options:
- scorer?: (a: string, b: string) => number - Scoring function
- processor?: (str: string) => string - Preprocessing function
- scoreCutoff?: number - Minimum score threshold
- limit?: number - Maximum results to return
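// This example needs a list that includes 'New York Giants' at index 2
const teams = ['Atlanta Falcons', 'New York Jets', 'New York Giants', 'Dallas Cowboys'];
const results = extract('new york', teams, { limit: 2, scoreCutoff: 40 });
// [
// { choice: 'New York Jets', score: 57.14, index: 1 },
// { choice: 'New York Giants', score: 52.17, index: 2 }
// ]
Metric-selectable interface with consistent scales: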
Calculate edit distance using any metric (returns raw distance).
Supported metrics: 'levenshtein' (default), 'damerauLevenshtein', 'osa', 'indel', 'lcsSeq'
distance('hello', 'world'); // 4 (default: levenshtein)
distance('hello', 'world', 'indel'); // 8
Calculate similarity using any metric (returns 0-1 normalized score).
Supported metrics: 'jaroWinkler' (default), 'levenshtein', 'damerauLevenshtein', 'osa', 'jaro', 'indel', 'lcsSeq', 'ratio', 'partialRatio', 'tokenSortRatio', 'tokenSetRatio'
score('hello', 'world'); // 0.4666... (default: jaroWinkler)
score('new york mets', 'mets york new', 'tokenSortRatio'); // 1.0
// Fulmen/Crucible users: override default metric if needed
score('hello', 'world', 'levenshtein'); // 0.2 (edit distance-based: 1 - 4/5)
Normalize text for comparison with optional locale-specific case folding.
Presets: 'none', 'minimal', 'default', 'aggressive'
Locales: 'tr' (Turkish), 'az' (Azerbaijani), 'lt' (Lithuanian), or undefined (default Unicode casefold)
normalize('Naïve Café', 'default'); // 'naïve café'
// Turkish/Azerbaijani: dotted/dotless I handling
normalize('İstanbul', 'default', 'tr'); // 'istanbul' (İ→i)
normalize('IĞDIR', 'default', 'tr'); // 'ığdır' (I→ı dotless)
// Default Unicode casefold (no locale)
normalize('İstanbul', 'default'); // 'i̇stanbul' (İ→i + combining dot)
Note: Most applications don't need locale-specific normalization. Use it only when processing Turkish, Azerbaijani, or Lithuanian text where the dotted/dotless I distinction matters.
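To see why the locale matters, here is a minimal sketch of Turkic-aware lowercasing in plain TypeScript. It is not the library's `normalize` implementation: `lowerTurkic` is a hypothetical helper that covers only the precomposed dotted/dotless I mapping and ignores the Lithuanian rules.

```typescript
// Hypothetical helper: lowercase with Turkish/Azerbaijani I rules.
// U+0130 (İ) maps to 'i' and 'I' maps to U+0131 (ı); every other character
// falls through to the default Unicode lowercase mapping.
function lowerTurkic(text: string): string {
  return text
    .replace(/\u0130/g, 'i')
    .replace(/I/g, '\u0131')
    .toLowerCase();
}

const turkicProvince = lowerTurkic('IĞDIR');        // 'ığdır'
const turkicCity = lowerTurkic('İstanbul');         // 'istanbul'
const defaultCity = 'İstanbul'.toLowerCase();       // 'i' + U+0307 + 'stanbul'
```

Without the Turkic mapping, the default lowercase of İ keeps a combining dot, so the two results differ in length and will not compare equal.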
Get ranked suggestions with detailed scoring.
const suggestions = suggest('pythn', ['python', 'java', 'javascript'], {
metric: 'jaroWinkler',
minScore: 0.6,
maxSuggestions: 3,
});
// [
// { value: 'python', score: 0.9555, ... },
// ...
// ]
See the Suggestions API docs for full details.
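The ranking behind a suggestions call can be sketched as score, filter, sort, and truncate. The code below is an illustration under stated assumptions, not the library's `suggest` implementation: `editDistance`, `similarity`, and `rankSuggestions` are hypothetical names, and the scorer is a pure-TypeScript normalized Levenshtein similarity rather than Jaro-Winkler, so the scores differ from the example above.

```typescript
// Plain Levenshtein distance with a rolling row (reference implementation,
// standing in for a WASM-backed scorer).
function editDistance(a: string, b: string): number {
  const s = Array.from(a);
  const t = Array.from(b);
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[t.length];
}

// Normalized similarity in [0, 1]: 1 - distance / longer length.
function similarity(a: string, b: string): number {
  const longest = Math.max(Array.from(a).length, Array.from(b).length);
  return longest === 0 ? 1 : 1 - editDistance(a, b) / longest;
}

interface Suggestion {
  value: string;
  score: number;
}

// Hypothetical ranking step: score every candidate, drop those below the
// threshold, sort best-first, and keep at most maxSuggestions entries.
function rankSuggestions(
  input: string,
  candidates: string[],
  minScore: number = 0.6,
  maxSuggestions: number = 3,
): Suggestion[] {
  return candidates
    .map((value) => ({ value, score: similarity(input, value) }))
    .filter((s) => s.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, maxSuggestions);
}

const ranked = rankSuggestions('pythn', ['python', 'java', 'javascript']);
// Only 'python' clears the 0.6 threshold here: 1 - 1/6 ≈ 0.8333
```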
This library uses a hybrid approach for optimal performance and flexibility:
WASM Implementations (fastest):
- Core distance metrics: levenshtein, damerau_levenshtein, osa_distance, jaro, jaro_winkler
- RapidFuzz metrics: ratio, indel_*, lcs_seq_*
TypeScript Implementations (flexible):
- Token-based fuzzy matching: partialRatio, tokenSortRatio, tokenSetRatio
- Process helpers: extractOne, extract
- Unified API: distance(), score()
- Suggestions and normalization
Token-based metrics benefit from TypeScript's array operations and avoid WASM serialization overhead. The unified API provides a convenient abstraction over both WASM and TypeScript implementations.
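As an illustration of the TypeScript side, the sketch below shows how a token-sort comparison and a name-based dispatcher could be written with ordinary array operations. It is a simplified stand-in, not the library's source: `lcsLength`, `ratioSketch`, `tokenSortRatioSketch`, and `scoreSketch` are hypothetical names, and the ratio is computed with a pure-TypeScript LCS instead of the WASM binding.

```typescript
// Length of the longest common subsequence, using a rolling DP row.
function lcsLength(a: string, b: string): number {
  const s = Array.from(a);
  const t = Array.from(b);
  let prev = new Array<number>(t.length + 1).fill(0);
  for (let i = 1; i <= s.length; i++) {
    const curr = new Array<number>(t.length + 1).fill(0);
    for (let j = 1; j <= t.length; j++) {
      curr[j] = s[i - 1] === t[j - 1] ? prev[j - 1] + 1 : Math.max(prev[j], curr[j - 1]);
    }
    prev = curr;
  }
  return prev[t.length];
}

// Indel-based ratio on a 0-100 scale: 100 * 2 * LCS / (|a| + |b|).
function ratioSketch(a: string, b: string): number {
  const total = Array.from(a).length + Array.from(b).length;
  return total === 0 ? 100 : (200 * lcsLength(a, b)) / total;
}

// Sort whitespace-separated tokens, then compare the rejoined strings.
function tokenSortRatioSketch(a: string, b: string): number {
  const sortTokens = (s: string): string => s.split(/\s+/).filter(Boolean).sort().join(' ');
  return ratioSketch(sortTokens(a), sortTokens(b));
}

// Hypothetical dispatcher: look up a scorer by name and rescale 0-100 to 0-1.
const scorers: Record<string, (a: string, b: string) => number> = {
  ratio: ratioSketch,
  tokenSortRatio: tokenSortRatioSketch,
};

function scoreSketch(a: string, b: string, metric: string): number {
  const scorer = scorers[metric];
  if (!scorer) throw new Error(`Unknown metric: ${metric}`);
  return scorer(a, b) / 100;
}

const sortedScore = tokenSortRatioSketch('new york mets', 'mets york new'); // 100
const plainScore = ratioSketch('hello', 'hallo'); // 80
const unitScore = scoreSketch('new york mets', 'mets york new', 'tokenSortRatio'); // 1
```

The token step is just split, sort, and join, which is why it sits comfortably in TypeScript while the character-level comparison is the part worth delegating to WASM.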
Supported runtimes:
- Node.js 16+ (ESM and CommonJS)
- Bun (native ESM support)
- Deno (use the npm: specifier)
Building from source:
- Install dependencies and tooling: make bootstrap
- Build WASM: npm run build:wasm or make build
- Build TS: npm run build:ts
This project uses a Makefile for common tasks:
make help # Show all available targets
make build # Build WASM and TypeScript (with version check)
make test # Run tests
make clean # Remove build artifacts
# Code quality
make quality # Run all quality checks (format-check, lint, rust checks)
make format # Format all code (Biome + Prettier + rustfmt)
make format-check # Check formatting without changes
make lint # Lint TypeScript code with Biome
make lint-fix # Lint and auto-fix TypeScript code
# Version management
make version-check # Verify package.json and Cargo.toml versions match
make bump-patch # Bump patch version (0.1.0 -> 0.1.1)
make bump-minor # Bump minor version (0.1.0 -> 0.2.0)
make bump-major # Bump major version (0.1.0 -> 1.0.0)
make set-version VERSION=x.y.z # Set explicit version
Explore the rest of the documentation under docs/. Start with the high-level overview or jump straight to the contributor guide in docs/development.md.
This project uses modern, fast tooling for code quality:
- TypeScript/JavaScript: Biome for linting and formatting
- JSON/YAML/Markdown: Prettier for formatting
- Rust: rustfmt for formatting, clippy for linting
Run make quality before committing to ensure all checks pass.
This project maintains version sync between package.json (npm) and Cargo.toml (Rust). The
Makefile provides targets to bump versions and keep them in sync. Additionally, the test suite
includes a version consistency check that will fail if versions drift.
Important: Always use make bump-* or make set-version commands to update versions. This
ensures both files stay synchronized.
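A version consistency check of this kind can be sketched in a few lines. The code below is an assumption about how such a check might look, not the project's actual test: `npmVersion`, `cargoVersion`, and `versionsInSync` are hypothetical helpers, the manifest contents are inline placeholders, and a regex stands in for a real TOML parser.

```typescript
// Hypothetical helper: read the "version" field from package.json text.
function npmVersion(packageJsonText: string): string {
  const parsed = JSON.parse(packageJsonText) as { version?: string };
  if (!parsed.version) throw new Error('package.json has no version field');
  return parsed.version;
}

// Hypothetical helper: read version from the [package] table of Cargo.toml.
// Only the [package] table is searched, so dependency versions are ignored.
function cargoVersion(cargoTomlText: string): string {
  const parts = cargoTomlText.split(/^\[package\][ \t]*$/m);
  if (parts.length < 2) throw new Error('Cargo.toml has no [package] table');
  const section = parts[1].split(/^\[/m)[0];
  const match = section.match(/^version\s*=\s*"([^"]+)"/m);
  if (!match) throw new Error('Cargo.toml has no [package] version');
  return match[1];
}

function versionsInSync(packageJsonText: string, cargoTomlText: string): boolean {
  return npmVersion(packageJsonText) === cargoVersion(cargoTomlText);
}

// Placeholder manifests that have drifted apart.
const pkgJson = '{ "name": "example", "version": "0.1.1" }';
const cargoToml = '[package]\nname = "example"\nversion = "0.1.0"\n\n[dependencies]\n';
const inSync = versionsInSync(pkgJson, cargoToml); // false: 0.1.1 vs 0.1.0
```

In a test suite, the two texts would come from reading the real files, and the test would fail whenever `versionsInSync` returns false.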
All string comparison operations complete in < 1ms:
- WASM metrics: 0.0003-0.0005ms per operation
- Token-based metrics: 0.0003-0.0017ms per operation
- Process helpers: 0.0008-0.001ms per operation
- Unified API: minimal dispatch overhead
Run node benchmark-phase1b.js for detailed benchmarks.
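For a rough sense of how per-operation figures like these can be measured, here is a minimal timing sketch. It is not the project's benchmark script: `msPerOp` is a hypothetical helper, the workload is a trivial stand-in, and results will vary with hardware, runtime, and input length.

```typescript
// Hypothetical helper: average wall-clock time per call, in milliseconds.
function msPerOp(fn: () => number, iterations: number = 100000): number {
  let sink = 0; // accumulate results so the calls are not optimized away
  for (let i = 0; i < 1000; i++) sink += fn(); // warm-up for the JIT
  const start = Date.now();
  for (let i = 0; i < iterations; i++) sink += fn();
  const elapsed = Date.now() - start;
  if (Number.isNaN(sink)) throw new Error('unexpected NaN result');
  return elapsed / iterations;
}

// Stand-in workload; swap in a real call such as levenshtein('kitten', 'sitting').
const perOp = msPerOp(() => 'kitten'.length + 'sitting'.length);
```

Because `Date.now()` has millisecond resolution, the iteration count needs to be large enough for the total run to take many milliseconds.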
This project includes comprehensive test coverage:
- 119 unit tests covering all functions
- 80 YAML fixture test cases for reproducibility
- 100% regression-free across all releases
Run tests with npm test or make test.
- rapidfuzz-rs - Rust implementation of RapidFuzz
- rapidfuzz - Original Python implementation
- strsim-rs - String similarity metrics (deprecated in favor of rapidfuzz-rs)
This project follows Semantic Versioning. Version history is maintained in CHANGELOG.md.
Current Status: See latest release for the current version and changes.
This project is licensed under the MIT License.
Contributions welcome! Please see our contributing guidelines:
- Development setup: docs/development.md
- Release workflow (maintainers): docs/publishing.md
- Authoritative policies repository: https://github.com/3leaps/oss-policies/
- Code of Conduct: https://github.com/3leaps/oss-policies/blob/main/CODE_OF_CONDUCT.md
- Security Policy: https://github.com/3leaps/oss-policies/blob/main/SECURITY.md
- Contributing Guide: https://github.com/3leaps/oss-policies/blob/main/CONTRIBUTING.md
⚡ Fast Strings. Accurate Matches. ⚡
High-performance text similarity for modern TypeScript applications
Built with ⚡ by the 3 Leaps team
String Metrics • Fuzzy Matching • WASM Performance