This is @andrew-d playing around with calling Rust's excellent `regex` crate from Go... without paying the overhead of cgo on every function call. As mentioned in some work done by @BurntSushi (here), the per-call overhead of cgo can eat up the gains you get from using the Rust regex engine.
First off, a huge thanks to Filippo Valsorda (@FiloSottile) for his blog post and work on ed25519-dalek-rustgo; that project was used as a starting point for this one.
The essentials behind how this works are very similar to those described in the above blog post, with some modifications detailed below:
- We don't build the Rust code as `no_std`, since the `regex` crate doesn't support it.
- I've wrapped the underlying Rust functions in a simple Go wrapper, exposing an API that requires no knowledge of the fact that it's using Rust under the hood. Note that this is still a bit of a work-in-progress 😃
- I've switched from using `go tool` in the Makefile to generating `.syso` files, which Go automatically includes when running `go build`.
- For the general `Regex` type, we allocate new thread stacks with `mmap` and switch to them before calling our Rust code. This lets the Rust code use arbitrarily large stacks without having to worry about Go's small, growable goroutine stacks, and the stacks are pooled so they can be reused across calls. As a slight bonus, running on our own stacks also lets us mark the assembly function as `NOSPLIT`, which removes the prologue that checks the stack size, saving us a couple of instructions :-)
- I've also added a `STRegex` type, which preallocates a single stack for use by that object. This is faster, since we avoid the overhead of `sync.Pool`, but it means that each regex allocates a full stack's worth of memory (currently 3 MiB). It also means that a `STRegex` cannot be used concurrently, since every method on the object reuses the same stack.
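To make the stack-pooling idea above a bit more concrete, here is a minimal Go sketch of how `mmap`'d stacks could be handed out from a `sync.Pool`, next to a single-stack variant in the spirit of `STRegex`. It is not the project's code: `newStack`, `withStack`, `singleStack`, and the finalizer-based cleanup are all hypothetical, and the real stack switch (an assembly trampoline that moves the stack pointer and calls into Rust) is replaced by a plain Go callback that only receives the stack's top address. It assumes Linux or OS X, where `syscall.Mmap` and `syscall.MAP_ANON` are available.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"syscall"
	"unsafe"
)

// stackSize matches the 3 MiB per-stack figure quoted above.
const stackSize = 3 << 20

// stack is an mmap'd region intended to serve as a foreign (Rust) stack.
type stack struct {
	mem []byte
}

// newStack maps a private, anonymous read/write region outside the Go heap.
func newStack() (*stack, error) {
	mem, err := syscall.Mmap(-1, 0, stackSize,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_PRIVATE|syscall.MAP_ANON)
	if err != nil {
		return nil, err
	}
	s := &stack{mem: mem}
	// The Go GC does not manage mmap'd memory, so unmap it once the
	// *stack itself becomes unreachable (e.g. after sync.Pool drops it).
	runtime.SetFinalizer(s, func(s *stack) { _ = syscall.Munmap(s.mem) })
	return s, nil
}

// top returns the address just past the end of the region; x86_64 stacks
// grow downward, so this is where a stack pointer would start.
func (s *stack) top() uintptr {
	return uintptr(unsafe.Pointer(&s.mem[0])) + uintptr(len(s.mem))
}

// stackPool hands out reusable stacks (hypothetical, Regex-style).
var stackPool = sync.Pool{
	New: func() any {
		s, err := newStack()
		if err != nil {
			panic(err)
		}
		return s
	},
}

// withStack borrows a pooled stack for the duration of call. In the real
// project, call would be an assembly trampoline that switches to the stack
// and invokes Rust; here it is a hypothetical Go placeholder.
func withStack(call func(top uintptr)) {
	s := stackPool.Get().(*stack)
	defer stackPool.Put(s) // also keeps s alive until call returns
	call(s.top())
}

// singleStack owns one preallocated stack, like STRegex: no pool overhead,
// but it must not be used from more than one goroutine at a time.
type singleStack struct {
	s *stack
}

func (ss *singleStack) withStack(call func(top uintptr)) {
	call(ss.s.top())
	// Prevent the finalizer from unmapping the stack while call runs.
	runtime.KeepAlive(ss.s)
}

func main() {
	withStack(func(top uintptr) {
		fmt.Println("pooled stack top is 16-byte aligned:", top%16 == 0)
	})

	s, err := newStack()
	if err != nil {
		panic(err)
	}
	single := &singleStack{s: s}
	single.withStack(func(top uintptr) {
		fmt.Println("single stack top is 16-byte aligned:", top%16 == 0)
	})
}
```

The finalizer is one possible design choice here: `sync.Pool` may drop idle items at any garbage collection, and memory obtained from `mmap` is invisible to Go's collector, so without an explicit `syscall.Munmap` dropped stacks would simply leak.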
For now, this only works on OS X and Linux, and only on x86_64 (since it requires some assembly glue per architecture and per calling convention). If you're on one of these platforms, great! After cloning the repository, you can run `make bench` to run a quick, non-scientific benchmark demonstrating the speedup. On my reasonably modern Linux desktop, I see the following results:
BenchmarkGoRegexp 1000000 1466 ns/op
BenchmarkRustRegexp 5000000 255 ns/op
BenchmarkSTRustRegexp 10000000 198 ns/op
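For readers who want to reproduce the pure-Go baseline without the Makefile, the following is a rough sketch of how a `BenchmarkGoRegexp`-style measurement can be taken with the standard `regexp` and `testing` packages. The pattern and input are hypothetical stand-ins (the benchmark's real inputs aren't listed here), so the ns/op it reports will not match the numbers above. `testing.Init` is called because `testing.Benchmark` is being used outside `go test`.

```go
package main

import (
	"fmt"
	"regexp"
	"testing"
)

// Hypothetical pattern and input; not the ones used by `make bench`.
var (
	benchPattern = regexp.MustCompile(`[a-z]+ing\b`)
	benchInput   = []byte("the quick brown fox is jumping over the lazy dog")
)

// benchmarkGoRegexp matches the input once per iteration using Go's regexp.
func benchmarkGoRegexp(b *testing.B) {
	for i := 0; i < b.N; i++ {
		if !benchPattern.Match(benchInput) {
			b.Fatal("expected a match")
		}
	}
}

// runGoBenchmark runs the benchmark from a plain program. testing.Init
// registers the testing flags; it has no effect if already called.
func runGoBenchmark() testing.BenchmarkResult {
	testing.Init()
	return testing.Benchmark(benchmarkGoRegexp)
}

func main() {
	res := runGoBenchmark()
	fmt.Printf("BenchmarkGoRegexp\t%s\n", res.String())
}
```

Comparing against the Rust-backed types would mean swapping the loop body for the equivalent call on `Regex` or `STRegex`; that API isn't reproduced here.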