Skip to content
/ regex Public
forked from xysun/regex

Regular expression engine in Python using Thompson's algorithm.

Notifications You must be signed in to change notification settings

dchllngr/regex

 
 

Repository files navigation

A regex engine in Python following Thompson's Algorithm. This will perform significantly better than the backtracking approach implemented in Python's re module on some pathological patterns.

I wrote a blog post about this project here

It has the same interface as Python's re module:

import regex
n = 20
p = 'a?' * n + 'a' * n
nfa = regex.compile(p)
input_string = 'a' * n
matched = nfa.match(input_string)
print(matched) # True

Currently it supports the following:

  • Repetition operators: * + ?
  • Parenthesis
  • Characters (no character sets)

regex.py is the interface, parse.py holds main implementation logic, sample.py shows some samples.

You can run python3 testing.py -v to ensure it passes all test cases in test_suite.dat

Performance

This regex engine underperforms Python's re module on normal inputs (using Glenn Fowler's test suite -- see below), however it outperforms significantly on pathological inputs.

normal pathological

Credits

  • Test suite is based on Glenn Fowler's regex test suites.

  • Russ Cox has an excellent collection of articles on regex.

About

Regular expression engine in Python using Thompson's algorithm.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • HTML 67.4%
  • Python 32.6%