Regular Expression Engines
Python's native regular expression engine uses Perl-style regular expressions, which cannot be compiled to fast DFAs. For regular expressions that don't use Perl-style extensions (such as look-behinds), the re2 library and its Python wrapper can produce code that runs about 100 times faster. Sefaria uses this library to compile the regular expression that finds book titles in text. If 're2' isn't installed on the system, the code falls back to the built-in 're' module.
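The fallback described above can be sketched roughly as follows. This is a minimal illustration, not Sefaria's actual code: the names `ENGINE`, `build_title_regex`, `find_titles`, and the sample `TITLES` list are hypothetical, and it assumes the installed wrapper exposes the same `compile`, `escape`, and `finditer` calls as the built-in 're' module.

```python
# Prefer the re2 wrapper when it is importable; otherwise use the
# standard-library engine under the same name.
try:
    import re2 as re
    ENGINE = "re2"
except ImportError:
    import re
    ENGINE = "re"


def build_title_regex(titles):
    """Compile one alternation that matches any of the given titles.

    Longer titles are listed first so that "Genesis Rabbah" is tried
    before its prefix "Genesis". The pattern uses no look-behinds or
    other Perl-style extensions, so either engine can compile it.
    """
    ordered = sorted(titles, key=len, reverse=True)
    pattern = "|".join(re.escape(title) for title in ordered)
    return re.compile(pattern)


def find_titles(regex, text):
    """Return every title occurrence in text, in order of appearance."""
    return [match.group(0) for match in regex.finditer(text)]


# Hypothetical sample data, for illustration only.
TITLES = ["Genesis", "Genesis Rabbah", "Exodus"]
title_regex = build_title_regex(TITLES)
found = find_titles(title_regex, "See Genesis Rabbah 1:1 and Exodus 2:3.")
print(ENGINE, found)
```

Because the rest of the code only refers to the name `re`, nothing else has to change depending on which engine was imported.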
To install the Python re2 engine, first install the Python development headers and a build environment with g++, then Google's re2 library, and finally the Python re2 wrapper.
Linux: Use your package manager to install the headers and build tools, e.g.:
sudo apt-get install python-dev; sudo apt-get install build-essential
Mac OS: You'll likely need to install the Xcode dev tools.
Compile and install Google's re2 code (https://github.com/google/re2/wiki/Install) with the following:
git clone https://code.googlesource.com/re2
cd re2
make test
make install
make testinstall
Some systems see pthread errors during compilation. (See: https://code.google.com/p/re2/issues/detail?id=100)
If you see this error, change the following in Makefile:
In this case, you can ignore the errors in
Python re2 wrappers
Compile and install pyre2 from the repository (https://github.com/axiak/pyre2). As of this writing, the version installable with pip is out of date.
git clone git://github.com/axiak/pyre2.git
cd pyre2
sudo python setup.py install # If using a virtualenv, don't use sudo
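After installing, one quick way to check whether the wrapper is visible to the interpreter is to ask Python whether a module named `re2` can be found. This is only a sketch: the helper name `re2_available` is made up for illustration, and it confirms that the module can be located, not that the underlying re2 library was built correctly.

```python
import importlib.util


def re2_available():
    """Return True if a module named 're2' can be located on this interpreter."""
    return importlib.util.find_spec("re2") is not None


print("re2 wrapper importable:", re2_available())
```

If this reports False inside a virtualenv, the wrapper was probably installed for a different Python interpreter.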