regexp-tree: Optimizer module

Overview

The optimizer transforms your regexp into an optimized version by replacing some sub-expressions with more idiomatic equivalents.

Advantages:

Optimized regexps are smaller -- good for minification.
Optimized regexps may be easier to read.
Some optimizations reduce the risk of ReDoS (regular expression denial of service).

Example:

/[a-zA-Z_0-9][A-Z_\da-z]*\e{1,}/

becomes:

/\w+e+/
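The two patterns above should accept the same strings. `[a-zA-Z_0-9]` and `[A-Z_\da-z]` both cover exactly the `\w` characters. `\e` is an identity escape for `e` when the `u` flag is absent. `{1,}` is the same as `+`. The snippet below only spot-checks this on a few hand-picked sample strings; it illustrates the claim but does not prove equivalence. It uses plain JavaScript `RegExp` and does not depend on this module.

```javascript
// Spot-check (not a proof) that the original and optimized patterns agree.
// Without the `u` flag, `\e` is an identity escape matching a literal "e".
const original = /[a-zA-Z_0-9][A-Z_\da-z]*\e{1,}/;
const optimized = /\w+e+/;

const samples = ['free', 'tree', 'e', 'ee', 'x_9ee', 'abc', '', 'hello world', 'ab-cde'];

// Keep every sample on which the two patterns produce different matches.
const mismatches = samples.filter((s) => {
  const a = s.match(original);
  const b = s.match(optimized);
  return (a && a[0]) !== (b && b[0]);
});

console.log(mismatches.length === 0 ? 'patterns agree on all samples' : mismatches);
```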

API

optimize(regexp, {whitelist: [transformsWhitelist], blacklist: [transformsBlacklist]}): Optimize the regexp. Optionally restrict which transforms are applied.

Note that this API differs from regexp-tree's top-level optimize method. That method expects the whitelist as an array in its second argument, and an object with a blacklist property as its third argument.

Transforms will be applied until no further optimization progress is made.
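The "apply until no further progress" rule can be pictured as a fixed-point loop. This is a minimal sketch, not the module's code. The transforms are modeled as hypothetical string-rewriting functions (`toyCharClassToMeta`, `toyCharClassToSingleChar`), whereas the real transforms operate on the regexp AST. The helper `optimizeToFixedPoint` and its `maxPasses` cap are also assumptions for illustration.

```javascript
// Minimal sketch: run every transform, repeat until a full pass changes nothing.
// Transforms here are hypothetical string -> string functions; the real
// module rewrites an AST instead of pattern strings.
function optimizeToFixedPoint(source, transforms, maxPasses = 100) {
  let current = source;
  for (let pass = 0; pass < maxPasses; pass++) {
    let next = current;
    for (const transform of transforms) {
      next = transform(next);
    }
    if (next === current) {
      return current; // no transform made progress: fixed point reached
    }
    current = next;
  }
  return current; // safety cap reached
}

// Toy stand-ins for charClassToMeta ([0-9] -> [\d]) and
// charClassToSingleChar ([\d] -> \d), applying only to these literal strings.
const toyCharClassToMeta = (s) => s.replace(/\[0-9\]/g, '[\\d]');
const toyCharClassToSingleChar = (s) => s.replace(/\[\\d\]/g, '\\d');

// The single-char transform runs first, so one pass only reaches "[\d]+";
// a second pass is needed to finish the job.
console.log(optimizeToFixedPoint('[0-9]+', [toyCharClassToSingleChar, toyCharClassToMeta]));
```

Because each transform may enable another one, repeating whole passes makes the result independent of the order in which the transforms are listed, at the cost of at least one extra pass to confirm nothing changed.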

If you wish to specify a whitelist, give an array of transform names from the table below.

const optimizer = require('./index.js');
const inefficient = /[0-9]/;

const optimized1 = optimizer.optimize(inefficient);
const optimized2 = optimizer.optimize(inefficient, {
  whitelist: [
    'charClassToMeta',       // [0-9] -> [\d]
    'charClassToSingleChar', // [\d] -> \d
  ]
});

console.log(`${inefficient} -> ${optimized1} === ${optimized2}`);

You can also pass a blacklist to disable specific transforms. If no whitelist is provided, all transforms are enabled by default, and the blacklist is subtracted from that set:

const optimizer = require('./index.js');

const original = /[åä]/;

// charClassClassrangesMerge reorders the class here: ä (U+00E4) before å (U+00E5).
const optimized1 = optimizer.optimize(original); // [åä] -> [äå]
const optimized2 = optimizer.optimize(original, { // [åä] (does not change)
  blacklist: [
    'charClassClassrangesMerge'
  ]
});

console.log(`${original} -> ${optimized1}, ${optimized2}`);
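The selection rule described above (all transforms unless a whitelist is given, minus anything blacklisted) can be sketched as a small filter. This is a hypothetical illustration, not the module's implementation. `selectTransforms` and the shortened `ALL_TRANSFORMS` list are assumptions; only the transform names come from the table below.

```javascript
// Hypothetical sketch of whitelist/blacklist selection.
// ALL_TRANSFORMS is a shortened, assumed list of default transform names.
const ALL_TRANSFORMS = [
  'charClassToMeta',
  'charClassToSingleChar',
  'charClassClassrangesMerge',
  'quantifierRangeToSymbol',
];

function selectTransforms({ whitelist, blacklist = [] } = {}) {
  // No (or empty) whitelist: every default transform is enabled.
  const enabled = whitelist && whitelist.length > 0 ? whitelist : ALL_TRANSFORMS;
  // The blacklist is then removed from whatever is enabled.
  return enabled.filter((name) => !blacklist.includes(name));
}

console.log(selectTransforms());
console.log(selectTransforms({ blacklist: ['charClassClassrangesMerge'] }));
console.log(selectTransforms({ whitelist: ['charClassToMeta', 'charClassToSingleChar'] }));
```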

Transforms

Here is the list of transforms supported by the Optimizer module.

Transform name	Description	Example
charSurrogatePairToSingleUnicode	Replace a surrogate pair with a single Unicode code point escape	`\ud83d\ude80` -> `\u{1f680}`
charCodeToSimpleChar	Don't use fancy char codes unless we have to	`\u0061` -> `a`
charCaseInsensitiveLowerCaseTransform	If regex is case insensitive, use lower-case everywhere	`/Aa/i` -> `/aa/i`
charClassRemoveDuplicates	Remove duplicates from char classes	`[\d\d]` -> `[\d]`
quantifiersMerge	Merge quantifiers where possible	`a{1,2}a{2,3}` -> `a{3,5}`
quantifierRangeToSymbol	Replace quantifier ranges with shorthand symbols where possible	`a{1,}` -> `a+`
charClassClassrangesToChars	Replace char class ranges with chars	`[a-a]` -> `[a]`
charClassClassrangesMerge	Merge adjacent class ranges	`[a-de-f]` -> `[a-f]`
charClassToMeta	Use meta-chars like `\d` and `\w` where possible	`[0-9]` -> `[\d]`
charClassToSingleChar	Replace a char class with a single meta-char where possible	`[\d]` -> `\d`
charEscapeUnescape	Remove unnecessary escapes	`\e` -> `e`
disjunctionRemoveDuplicates*	Remove duplicate disjunctions	`(ab|ab)` -> `(ab)`
groupSingleCharsToCharClass*	Reduce disjunction complexity	`(a|b|c)` -> `[abc]`
removeEmptyGroup	Remove empty groups	`(?:)a` -> `a`
ungroup	Remove unnecessary groups	`(?:a)` -> `a`
combineRepeatingPatterns	Replace repetition with quantifiers where possible	`abcabcabc` -> `(?:abc){3}`

*: May reduce the risk of ReDoS in your regexes.
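For several rows of the table, the rewrite comes down to simple arithmetic on code points or repetition counts. The helpers below (`surrogatePairToCodePoint`, `mergeQuantifiers`, `rangeToSymbol`, `mergeRanges`) are hypothetical illustrations on plain numbers, not the module's transforms, which operate on the regexp AST. In particular, `mergeQuantifiers` assumes both quantifiers apply to the same single atom, and an unbounded upper limit is represented as `Infinity`.

```javascript
// Hypothetical helpers illustrating the arithmetic behind a few transforms.

// charSurrogatePairToSingleUnicode: combine a UTF-16 high/low surrogate pair.
function surrogatePairToCodePoint(high, low) {
  return (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
}

// quantifiersMerge: for the same atom x, x{a,b}x{c,d} matches x{a+c,b+d}.
function mergeQuantifiers([min1, max1], [min2, max2]) {
  return [min1 + min2, max1 + max2];
}

// quantifierRangeToSymbol: use *, + or ? when the range allows it.
function rangeToSymbol(min, max) {
  if (min === 0 && max === Infinity) return '*';
  if (min === 1 && max === Infinity) return '+';
  if (min === 0 && max === 1) return '?';
  return max === Infinity ? `{${min},}` : `{${min},${max}}`;
}

// charClassClassrangesMerge: merge overlapping or adjacent code-point ranges.
function mergeRanges(ranges) {
  const sorted = [...ranges].sort((a, b) => a[0] - b[0]);
  const merged = [];
  for (const [from, to] of sorted) {
    const last = merged[merged.length - 1];
    if (last && from <= last[1] + 1) {
      last[1] = Math.max(last[1], to); // extend the previous range
    } else {
      merged.push([from, to]); // start a new range
    }
  }
  return merged;
}

const cp = (ch) => ch.codePointAt(0);
const show = ([from, to]) => `${String.fromCodePoint(from)}-${String.fromCodePoint(to)}`;

console.log(surrogatePairToCodePoint(0xd83d, 0xde80).toString(16)); // 1f680
console.log(mergeQuantifiers([1, 2], [2, 3])); // a{1,2}a{2,3} -> a{3,5}
console.log(rangeToSymbol(1, Infinity)); // a{1,} -> a+
console.log(mergeRanges([[cp('a'), cp('d')], [cp('e'), cp('f')]]).map(show)); // [a-de-f] -> [a-f]
```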
