fastlex is a tiny C++23 header-only library for fast ASCII character
classification and ASCII case conversion.
It is built for lexer, parser, tokenizer, and protocol hot paths where the input
is bytes and the grammar is ASCII. In those places, the usual cctype functions
often do more work than needed: they accept int, have locale-shaped semantics,
and are not designed around compile-time custom character classes.
fastlex keeps the domain explicit:
input byte: unsigned char
domain: 0..255
semantics: ASCII only
tables: generated at compile time
dispatch: branchless
Include the library:
#include <fastlex/ascii.hpp>Use the cctype-style API for explicit ASCII classification:
fastlex::ascii::isalpha('A');
fastlex::ascii::isdigit('7');
fastlex::ascii::isspace('\n');
fastlex::ascii::isxdigit('f');
fastlex::ascii::tolower('Q'); // returns unsigned char{'q'}The full ASCII cctype-style set is available:
isalnum isalpha isblank iscntrl
isascii isdigit isgraph islower
isprint ispunct isspace isupper
isxdigit tolower toupper
The functions live in fastlex::ascii and intentionally take unsigned char.
If your input is a char, cast once at the boundary of your scanner:
auto const byte = static_cast<unsigned char>(input[pos]);
if(fastlex::ascii::isalpha(byte)) {
// ...
}Build your own compile-time character classes with custom:
using namespace std::literals;
constexpr auto separators = fastlex::ascii::custom(",:;|"sv);
static_assert(separators(','));
static_assert(!separators('x'));The argument is a std::string_view containing the accepted byte values.
custom is consteval, so the matcher table is built during compilation and
the returned object is a tiny constexpr callable.
Longer classes are just longer strings:
constexpr auto ident_start =
fastlex::ascii::custom("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"_"sv);Repeated characters are harmless. A character class is a set, so putting the same byte into the string twice sets the same table entry twice and does not change the result:
constexpr auto digits = fastlex::ascii::custom("00112233445566778899"sv);
static_assert(digits('7'));
static_assert(!digits('x'));Custom matchers and most built-in predicates can use one of two backends:
enum class matcher_backend : std::uint8_t {
BITMAP,
LOOKUP,
};BITMAP is the default for custom:
constexpr auto compact =
fastlex::ascii::custom<fastlex::ascii::matcher_backend::BITMAP>("0123456789"sv);It stores the whole 256-byte domain in four std::uint64_t words, so the table
is only 32 bytes:
return (table[x >> 6] & (std::uint64_t{1} << (x & 63u))) != 0u;The bitmap backend is compact and usually a great fit for small, simple classes.
The cost per query is a word load, a shift, an and, and a comparison.
LOOKUP stores one byte per possible input value:
constexpr auto direct =
fastlex::ascii::custom<fastlex::ascii::matcher_backend::LOOKUP>("0123456789"sv);The lookup table is 256 bytes and contains 0 or 1 for each byte:
return lookup[x] != 0u;The lookup backend uses more memory, but the query is just an indexed byte load and a comparison. It can win for larger, compound, or irregular classes where avoiding bitmap arithmetic matters more than the extra table size.
As a rule of thumb:
BITMAP smaller table, excellent for compact/simple classes
LOOKUP larger table, often faster for compound/irregular classes
The best choice is workload and compiler dependent, so the built-in predicates choose defaults from benchmarks, and callers can override those defaults.
The built-ins are implemented with the same matcher machinery that powers
custom. Each predicate owns a C++23 local static constexpr matcher, so the
class definition stays next to the function while the table is still generated
at compile time.
For example, isdigit is conceptually:
template<fastlex::ascii::matcher_backend BACKEND = fastlex::ascii::matcher_backend::BITMAP>
constexpr auto isdigit(unsigned char x) noexcept -> bool
{
using namespace std::literals;
static constexpr auto matcher = detail::make_matcher<BACKEND>("0123456789"sv);
return matcher(x);
}Predicates can be instantiated with an explicit backend:
fastlex::ascii::isdigit<fastlex::ascii::matcher_backend::BITMAP>('7');
fastlex::ascii::isalnum<fastlex::ascii::matcher_backend::LOOKUP>('A');Current defaults:
LOOKUP isalnum isalpha isblank iscntrl ispunct isspace isxdigit
BITMAP isdigit isgraph islower isprint isupper
isascii does not need a table:
return x <= 0x7f;tolower and toupper use direct 256-entry byte maps:
return lower_map[x];
return upper_map[x];That makes case conversion branchless for every byte. Bytes outside A-Z for
tolower and outside a-z for toupper map to themselves.
Every character class starts as a std::string_view of accepted bytes.
make_bitmap turns that string into a 32-byte bitmap:
char_bitmap table{};
for(unsigned char c : chars) {
table[c >> 6] |= std::uint64_t{1} << (c & 63u);
}If the requested backend is BITMAP, that bitmap becomes the matcher storage.
If the requested backend is LOOKUP, make_lookup expands the bitmap into a
256-byte 0/1 table.
The important detail is that all of this happens at compile time:
custom("..."sv)
-> consteval make_bitmap(...)
-> optional consteval make_lookup(...)
-> constexpr matcher object
-> branchless operator()(unsigned char)
There are three table shapes:
bitmap matcher 4 x uint64_t 32 bytes
lookup matcher 256 x unsigned char 256 bytes
case map 256 x unsigned char 256 bytes
fastlex is deliberately not locale-aware. For example, isalpha means only:
A-Z
a-z
Bytes 0x80..0xff are not treated as alphabetic, printable, punctuation, or
space. Case conversion only changes bytes in A-Z or a-z.
isxdigit follows cctype:
0-9
A-F
a-f
Use the exported interface target:
find_package(fastlex CONFIG REQUIRED)
target_link_libraries(my_target PRIVATE fastlex::fastlex)From this repository:
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DFASTLEX_BUILD_TESTS=ON -DFASTLEX_BUILD_BENCHMARKS=ON
cmake --build build
ctest --test-dir build --output-on-failureCMake options:
FASTLEX_BUILD_TESTS Build gtest coverage
FASTLEX_BUILD_BENCHMARKS Build Google Benchmark executables
FASTLEX_BUILD_EXAMPLES Build the small example program
Benchmarks live in benchmark/ascii_benchmark.cpp and compare:
fastlex branchless predicates
branch-based ASCII predicates
std::cctype predicates
Run them after building:
./build/benchmark/fastlex_ascii_benchmarkThese numbers were measured with:
CPU: AMD Ryzen 9 7950X, 16 cores / 32 threads
Caches: L1d 32 KiB x16, L2 1 MiB x16, L3 32 MiB x2
OS: NixOS, Linux 7.0.6, x86_64
Compiler: clang++ 22.1.5
Command: ./build/benchmark/fastlex_ascii_benchmark --benchmark_min_time=0.05s --benchmark_repetitions=5 --benchmark_report_aggregates_only=true
The table reports input throughput, not total memory traffic. Each benchmark processes a fixed 4096-byte input buffer and calls:
state.SetBytesProcessed(state.iterations() * data.size());So Google Benchmark reports real byte throughput with bytes_per_second.
The benchmark also prints time_per_byte, which is the inverse throughput for a
single input byte.
Predicate fastlex branch std::cctype
isalnum 5.01 GB/s 1.94 GB/s 547 MB/s
isalpha 4.98 GB/s 2.99 GB/s 547 MB/s
isblank 3.97 GB/s 3.97 GB/s 498 MB/s
iscntrl 5.02 GB/s 3.31 GB/s 547 MB/s
isdigit 3.90 GB/s 3.26 GB/s 3.26 GB/s
isgraph 3.90 GB/s 3.26 GB/s 547 MB/s
islower 3.92 GB/s 3.27 GB/s 607 MB/s
isprint 3.82 GB/s 3.25 GB/s 546 MB/s
ispunct 4.99 GB/s 1.83 GB/s 545 MB/s
isspace 4.99 GB/s 3.03 GB/s 545 MB/s
isupper 3.90 GB/s 3.26 GB/s 547 MB/s
isxdigit 5.00 GB/s 1.94 GB/s 547 MB/s
tolower 7.01 GB/s 3.27 GB/s 497 MB/s
toupper 7.01 GB/s 3.26 GB/s 495 MB/s
The same run also reports per-byte latency:
Predicate fastlex branch std::cctype
isalnum 198 ps 515 ps 1.82 ns
isalpha 198 ps 332 ps 1.81 ns
isblank 250 ps 250 ps 2.01 ns
iscntrl 199 ps 301 ps 1.82 ns
isdigit 255 ps 305 ps 304 ps
isgraph 254 ps 305 ps 1.82 ns
islower 254 ps 305 ps 1.82 ns
isprint 255 ps 308 ps 1.82 ns
ispunct 200 ps 597 ps 2.00 ns
isspace 202 ps 327 ps 1.64 ns
isupper 254 ps 306 ps 1.81 ns
isxdigit 198 ps 515 ps 1.82 ns
tolower 142 ps 305 ps 2.00 ns
toupper 142 ps 304 ps 2.01 ns
Enter the development shell:
nix developBuild and run checks through the flake:
nix build
nix flake check