This repository contains the implementation for the thesis "High-Performance Regular Expression Matching with Parabix and LLVM", which can also be found here.
This project was done as part of the TUM Database Implementation practical course.
This repository contains both the iterative and the LLVM codegen approaches for Parabix; they are located in parabix_cpp and parabix_llvm, respectively.
You may also want to check the Parabix compiler (parabix_compiler.cc), which generates code through the LLVM IRBuilder API.
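To give a feel for the bit-parallel idea behind Parabix, here is a minimal sketch that matches the benchmark pattern a[0-9]*z on a single block of up to 64 bytes. It is not taken from parabix_cpp or parabix_llvm: the names `class_stream`, `match_star`, and `match_a_digits_z` are hypothetical, the character-class streams are built with a plain per-byte loop instead of transposed basis bit streams, and carries between blocks are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Build a bit stream for one block: bit i is set when pred(text[i]) holds.
// Assumption: a per-byte loop stands in for the transposition step that a
// real Parabix implementation would use to derive class streams.
template <typename Pred>
std::uint64_t class_stream(std::string_view text, Pred pred) {
    assert(text.size() <= 64);
    std::uint64_t bits = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (pred(text[i])) bits |= std::uint64_t{1} << i;
    }
    return bits;
}

// MatchStar(M, C): keep every marker in M and also move it forward through
// any run of positions in class C, using one addition for the whole block.
std::uint64_t match_star(std::uint64_t m, std::uint64_t c) {
    return (((m & c) + c) ^ c) | m;
}

// Returns a bit stream whose bit i is set when a match of a[0-9]*z ends at
// text[i] (the position of the closing 'z').
std::uint64_t match_a_digits_z(std::string_view text) {
    const std::uint64_t a = class_stream(text, [](char ch) { return ch == 'a'; });
    const std::uint64_t digit =
        class_stream(text, [](char ch) { return ch >= '0' && ch <= '9'; });
    const std::uint64_t z = class_stream(text, [](char ch) { return ch == 'z'; });

    const std::uint64_t after_a = a << 1;                       // cursor just past each 'a'
    const std::uint64_t after_digits = match_star(after_a, digit);  // skip [0-9]*
    return after_digits & z;                                    // cursor must sit on 'z'
}
```

For the input `a12z az a1x z`, the returned stream has bits 3 and 6 set, the positions of the two `z` characters that close a match. Every step works on all 64 positions at once, which is where the approach gets its speed.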
The PDF document used during the presentation can be found here.
The relevant files are generator and benchmark.
size/algo | std::regex | parabix-cpp | parabix-llvm |
---|---|---|---|
10MB | 0.22 | 0.12 | 0.016 |
100MB | 2.2 | 1.2 | 0.12 |
500MB | 11 | 6 | 0.6 |
1GB | 23 | 13 | 1.2 |
1.2GB | 25 | 15 | 1.4 |
NOTE: Time to read input data from a file is excluded from the elapsed times. The pattern is a[0-9]*z.
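As an illustration of how the std::regex column could be measured with file reading excluded, the sketch below times only the matching pass over data that is already in memory. It is an assumption about the setup, not the code in benchmark: `TimedCount` and `count_matches_timed` are hypothetical names, and whether the repository's benchmark includes regex compilation in the timed region is not stated here (this sketch leaves it out).

```cpp
#include <chrono>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical result type: number of matches and elapsed wall time in seconds.
struct TimedCount {
    std::size_t matches;
    double seconds;
};

// Counts non-overlapping matches of `pattern` in `input`, timing only the scan.
// The caller is expected to have loaded `input` beforehand, so I/O is excluded.
TimedCount count_matches_timed(const std::string& input, const std::string& pattern) {
    const std::regex re(pattern);  // compiled outside the timed region (assumption)

    const auto start = std::chrono::steady_clock::now();
    const auto first = std::sregex_iterator(input.begin(), input.end(), re);
    const auto count = std::distance(first, std::sregex_iterator());
    const auto stop = std::chrono::steady_clock::now();

    return {static_cast<std::size_t>(count),
            std::chrono::duration<double>(stop - start).count()};
}
```

Calling `count_matches_timed(data, "a[0-9]*z")` on a buffer read from the generated input file would give a number comparable in spirit to the std::regex column, though absolute values depend on the machine and compiler flags.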
mkdir build
cd build
cmake -G Ninja ../
# generate input file
ninja generator
./generator 1000 ../1gb.txt
# run benchmark
ninja benchmark
./benchmark
# run vgrep
ninja vgrep_llvm
./vgrep_llvm ../1gb.txt "a[0-9]*z"