Skip to content

rbranson/glob-trie.js

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GlobTrie

A highly optimized way to match strings against a large set of pattern matching expressions quickly. It breaks down the pattern matchers into pieces and organizes them in a Trie (prefix search tree) structure, allowing them to be searched in roughly logarithmic time, versus linear time for just an array of regular expressions, for instance.

Installation

Using NPM:

$ npm install glob-trie.js

Now a simple require will bring it on:

var GlobTrie = require("glob-trie.js");

Notes

  • In some rare cases, like a match class sandwiched between asterisks, you'll get duplicate matches.
  • GlobTrie.walk looks wonky because I've performance optimized it.
  • There are probably some edge cases where I'm not checking things properly and syntax errors in expressions will create odd output.
  • No captures right now, but that's on the list, if I can get it to work without impacting mainline performance.
  • The expression syntax WILL change in the future as I add features, so use a specific version.

Expression Support

Currently GlobTrie only supports a very small set of pattern matching tools, similar to what's commonly available for file path matching.

    •  will match any character 0 to infinity times
      
  • ? will match any character once
  • \ will escape * and ? and [ and ]
  • [...] will match a RegExp-compatible character class once
  • anything else gets matched at face value

Use It

var trie = new GlobTrie();

trie.add("http://www.*.com/", "A .com website!");
trie.add("http://www.*.net/", "A .net website!");
trie.add("http://www.*.org/", "A .org website!");
trie.add("http://*.??/", "A 2-character TLD website!");
trie.add("http://*/", "A website!");

trie.collect("http://www.nodejs.org/"); // => [ "A .org website", "A website!" ]
trie.collect("ftp://ftp.cdrom.com/");   // => []
trie.collect("http://t.co/");           // => [ "A 2-character TLD website!", "A website!" ]

trie.remove("http://*/", "A website!");
trie.collect("http://t.co/");           // => [ "A 2-character TLD website!" ]

Performance

Using the brute-perf.js and perf.js scripts, performance can be compared.

For the GlobTrie implementation:

$ time node perf.js
Total Expressions: 9720
Total Operations: 10000
Total Found: 76000
Effective Operations: 97200000

real	0m0.618s

For the array of regular expressions implementation:

$ time node brute-perf.js
Total Expressions: 9720
Total Operations: 10000
Total Found: 76000
Effective Operations: 97200000

real	0m54.990s

Big difference! Note that because it's a logarithmic algorithm, the difference between the two will get wider and wider as the size of the expression list grows.

About

High performance pattern matching on large sets of patterns in node.js

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published