Modern nogil
Python bindings for the Pcre2 library with stdlib.re
api compatibility.
- 10/12/2025 0.2.3: 🤗 Full
GIL=0
compliance for Python >= 3.13T. Reduced cache thread contention. Improved performance for all api. Expanded ci testing coverage. FreeBSD, Solaris, and Windows compatibility validated. - 10/09/2025 0.1.0: 🎉 First release. Thread safe, auto JIT, auto pattern caching and optimistic linking to system library for fast install.
PyPcre is a modern Pcre2 binding designed to be both super fast and thread-safe in the GIL=0
world. In the old days of global interpreter locks, Python had real threads but mostly fake concurrency (with the exception of some low-level apis and packages). In 2025, Python is moving toward full GIl=0
design which will unlock true multi-threaded concurrency and finally bring Python in parity with other modern languages.
Many Python regular expression packages will either out-right segfault due to safety under GIL=0
or suffer sub-optimal performance due to non-threaded design mindset.
PyPcre is fully ci tested where every single api and Pcre2 flag is tested in a continuous development environment backed by the ModelCloud.AI team. Fuzz (clobber) tests are also performed to catch any memory safety, accuracy, or memory leak regressions.
Safety first: PyPcre will optimistically link to the os provided libpcre2
package for maximum safetey since PyPcre will automatically enjoy upstream security patches. You can force full source compile via PYPCRE_BUILD_FROM_SOURCE=1
env toggle.
pip install PyPcre
The package prioritizes linking against the libpcre2-8
shared library in system for fast install and max security protection which gets latest patches from OS. See Building for manual build details.
Linux
, MacOS
, Windows
, WSL
, FreeBSD
, Solaris
If you already rely on the standard library re
, migrating is as
simple as changing your import:
import pcre as re
The module-level entry points (match
, search
, fullmatch
, findall
,
finditer
, split
, sub
, subn
, compile
, escape
, purge
) expose the
same call signatures as their re
counterparts, making existing code work
unchanged. Every standard flag with a PCRE2 equivalent—IGNORECASE
,
MULTILINE
, DOTALL
, VERBOSE
, ASCII
, and friends—is supported via the
re-exported constants and the pcre.Flag
enum.
from pcre import match, search, findall, compile, Flag
if match(r"(?P<word>\\w+)", "hello world"):
print("found word")
pattern = compile(rb"\d+", flags=Flag.MULTILINE)
numbers = pattern.findall(b"line 1\nline 22")
pcre
mirrors the core helpers from Python’s standard library re
module
match
, search
, fullmatch
, finditer
, findall
, and compile
while
exposing PCRE2’s extended flag set through the Pythonic Flag
enum
(Flag.CASELESS
, Flag.MULTILINE
, Flag.UTF
, ...).
- Module-level helpers and the
Pattern
class follow the same call shapes as the standard libraryre
module, includingpos
,endpos
, andflags
behaviour. Pattern
mirrorsre.Pattern
attributes like.pattern
,.groupindex
, and.groups
, whileMatch
objects surface the familiar.re
,.string
,.pos
,.endpos
,.lastindex
,.lastgroup
,.regs
, and.expand()
API.- Substitution helpers enforce the same type rules as the standard library
re
module: string patterns require string replacements, byte patterns require bytes-like replacements, and callable replacements receive the wrappedMatch
. compile()
accepts nativeFlag
values as well as compatiblere.RegexFlag
members from the standard library. Supported stdlib flags map 1:1 to PCRE2 options (IGNORECASE→CASELESS
,MULTILINE→MULTILINE
,DOTALL→DOTALL
,VERBOSE→EXTENDED
); passing unsupported stdlib flags raises a compatibilityValueError
to prevent silent divergences.pcre.escape()
delegates directly tore.escape
for byte and text patterns so escaping semantics remain identical.
The regex
package interprets
\uXXXX
and \UXXXXXXXX
escapes as UTF-8 code points, while PCRE2 expects
hexadecimal escapes to use the \x{...}
form. Enable Flag.COMPAT_UNICODE_ESCAPE
to
translate those escapes automatically when compiling patterns:
from pcre import compile, Flag
pattern = compile(r"\\U0001F600", flags=Flag.COMPAT_UNICODE_ESCAPE)
assert pattern.pattern == r"\\x{0001F600}"
Set the default behaviour globally with pcre.configure(compat_regex=True)
so that subsequent calls to compile()
and the module-level helpers apply
the conversion without repeating the flag.
pcre.compile()
caches the final Pattern
wrapper for up to 128
unique (pattern, flags)
pairs when the pattern object is hashable. By default
the cache is thread-local, keeping per-thread LRU stores so workers do not
contend with one another. Adjust the capacity with pcre.set_cache_limit(n)
—pass
0
to disable caching completely or None
for an unlimited cache—and check the
current limit with pcre.get_cache_limit()
. The cache can be emptied at any time
with pcre.clear_cache()
.
Applications that prefer the historic global cache can opt back in before any
compilation takes place by setting PYPCRE_CACHE_PATTERN_GLOBAL=1
in the
environment before importing pcre
. Runtime switching is no longer
supported; altering the value after patterns have been compiled raises
RuntimeError
.
String patterns follow the same defaults as Python’s re
module,
automatically enabling the Flag.UTF
and Flag.UCP
options so Unicode
pattern and character semantics “just work.” Byte patterns remain raw by
default—neither option is activated—so you retain full control over
binary-oriented matching. Explicitly set Flag.NO_UTF
/Flag.NO_UCP
if you
need to opt out for strings, or add the UTF/UCP flags yourself when compiling
bytes.
compile()
accepts either a pattern literal or an existingPattern
instance, making it easy to mix compiled objects with the convenience helpers.Pattern.match/search/fullmatch/finditer/findall
accept optionalpos
,endpos
, andoptions
arguments, mirroring the standard libraryre
module while letting you thread PCRE2 execution flags through individual calls.
pcre.parallel_map()
fans out work across a shared thread pool formatch
,search
,fullmatch
, andfindall
. The helper preserves the order of the provided subjects and returns the same result objects you’d normally receive from thePattern
methods.- The threaded backend activates only on machines with at least eight CPU cores; otherwise execution falls back to the sequential path regardless of flags or configuration.
- Threading is opt-in by default when Python runs without the GIL
(e.g. Python with
-X gil=0
orPYTHON_GIL=0
). When the GIL is active the default falls back to sequential execution to avoid needless overhead. - With auto threading enabled (
configure_threads(enabled=True)
), the pool is only engaged when at least one subject is larger than the configured threshold (60 kB by default). Smaller jobs run sequentially to avoid the cost of thread hand-offs; adjust the boundary viaconfigure_threads(threshold=...)
. - Use
Flag.THREADS
to force threaded execution for a specific pattern orFlag.NO_THREADS
to lock it to sequential mode regardless of global settings. pcre.configure_thread_pool(max_workers=...)
controls the size of the shared executor (capped to half the available CPUs); call it withpreload=True
to spin the pool up eagerly, andshutdown_thread_pool()
to tear it down manually if needed.
-
Precompile for hot loops. The module-level helpers mirror the
re
API and route through the shared compilation cache, but the extra call plumbing still adds overhead. With a simple pattern like"fo"
, using the low-levelpcre_ext_c.Pattern
directly costs ~0.60 µs per call, whereas the high-levelpcre.match()
helper lands at ~4.4 µs per call under the same workload. For sustained loops, create aPattern
object once and reuse it. -
Benchmark toggles. The extension defaults to the fastest safe configuration, but you can flip individual knobs back to the legacy behaviour by setting environment variables before importing
pcre
:Env var Effect (per-call, pattern.match("fo")
)(baseline) 0.60 µs PYPCRE_DISABLE_CONTEXT_CACHE=1
0.60 µs PYPCRE_FORCE_JIT_LOCK=1
0.60 µs pcre.match()
helper4.43 µs The toggles reintroduce the legacy GIL hand-off, per-call match-context allocation, and explicit locks so you can quantify the impact of each optimisation on your workload. Measurements were taken on CPython 3.14 (rc3) with 200 000 evaluations of
pcre_ext_c.compile("fo").match("foobar")
; absolute values will vary by platform, but the relative differences are representative. Leave the variables unset in production to keep the new fast paths active.
Pcre2’s JIT compiler is enabled by default for every compiled pattern. The wrapper exposes two complementary ways to adjust that behaviour:
-
Toggle the global default at runtime with
pcre.configure(jit=False)
to turn JIT off (callpcre.configure(jit=True)
to turn it back on). -
Override the default per pattern using the Python-only flags
Flag.JIT
andFlag.NO_JIT
:from pcre import compile, configure, Flag configure(jit=False) # disable JIT globally baseline = compile(r"expr") # JIT disabled fast = compile(r"expr", flags=Flag.JIT) # force-enable for this pattern slow = compile(r"expr", flags=Flag.NO_JIT) # force-disable for this pattern
pcre.compile()
caches hashable(pattern, flags)
pairs, keeping up to 128 entries per thread by default.- Set
PYPCRE_CACHE_PATTERN_GLOBAL=1
before importingpcre
if you need a shared, process-wide cache instead of isolated thread stores. - Use
pcre.clear_cache()
when you need to free the active cache proactively. - Non-hashable pattern objects skip the cache and are compiled each time.
- String patterns enable
Flag.UTF
andFlag.UCP
automatically so behaviour matchesre
. - Byte patterns keep both flags disabled; opt in manually if Unicode semantics are desired.
- Explicitly supply
Flag.NO_UTF
/Flag.NO_UCP
to override the defaults for strings.
- All top-level helpers (
match
,search
,fullmatch
,finditer
,findall
) defer to the cached compiler. - Compiled
Pattern
objects expose.pattern
,.flags
,.jit
, and.groupindex
for introspection. - Execution helpers accept
pos
,endpos
, andoptions
, allowing you to thread PCRE2 execution flags per call.
- By default PyPcre uses CPython's
PyMem
allocator. - Override the allocator explicitly by setting
PYPCRE_ALLOCATOR
to one ofpymem
,malloc
,jemalloc
, ortcmalloc
before importing the module. The optional allocators are still loaded withdlopen
, so no additional link flags are required when they are absent. - Call
pcre_ext_c.get_allocator()
to inspect which backend is active at runtime.
The extension links against an existing PCRE2 installation (the libpcre2-8
variant). Install the development headers for your platform before building,
for example apt install libpcre2-dev
on Debian/Ubuntu, dnf install pcre2-devel
on Fedora/RHEL derivatives, or brew install pcre2
on macOS.
If the headers or library live in a non-standard location you can export one
or more of the following environment variables prior to invoking the build
(pip install .
, python -m build
, etc.):
PYPCRE_ROOT
PYPCRE_INCLUDE_DIR
PYPCRE_LIBRARY_DIR
PYPCRE_LIBRARY_PATH
(pathsep-separated directories or explicit library files to prioritise when resolvinglibpcre2-8
)PYPCRE_LIBRARIES
PYPCRE_CFLAGS
PYPCRE_LDFLAGS
When pkg-config
is available the build will automatically pick up the
required include and link flags via pkg-config --cflags/--libs libpcre2-8
.
Without pkg-config
, the build script scans common installation prefixes for
Linux distributions (Debian, Ubuntu, Fedora/RHEL/CentOS, openSUSE, Alpine),
FreeBSD, macOS (including Homebrew), and Solaris to locate the headers and
libraries.
If your system ships libpcre2-8
under /usr
but you also maintain a
manually built copy under /usr/local
, export PYPCRE_LIBRARY_PATH
(and, if
needed, a matching PYPCRE_INCLUDE_DIR
) so the build links against the desired
location.