Fast text file analyzer with memory-mapped I/O and SIMD. Counts lines, detects encoding, finds delimiters. ~1.4 GB/s throughput.

AGDNoob/FastLog

FastLog

Text file analyzer. Counts lines, bytes, delimiters, detects encoding. Runs at ~1.4 GB/s on NVMe.

$ fastlog server.log

file=server.log
lines=1,000,001
bytes=89.65 MB
empty_lines=0.0%
avg_line_length=93.0
max_line_length=96
encoding=UTF-8 (BOM)
line_ending=LF
top_delimiters: '=' ',' ':'
ascii_ratio=100.00%

Installation

Windows: Run installer from dist/. Done. No dependencies.

From source:

cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release
cmake --build build

Requires C++17 and CMake 3.16+.

Usage

fastlog <file>              # Analyze single file
fastlog <directory>         # Analyze all text files recursively
fastlog <directory> --flat  # Analyze directory without subdirectories
fastlog -h                  # Show help
fastlog -v                  # Show version
fastlog --update            # Check for updates

Options

Option          Description
-h, --help      Show usage information
-v, --version   Show version number
--update        Check GitHub for new releases
--flat          Disable recursive directory scanning

How it works

Memory-mapped I/O

No fread(), no std::ifstream. The file gets mapped directly into address space:

// Windows
CreateFileW(..., FILE_FLAG_SEQUENTIAL_SCAN, ...);
CreateFileMappingW(...);
MapViewOfFile(...);

// Linux
mmap(..., MAP_PRIVATE, ...);
madvise(..., MADV_SEQUENTIAL);  // advice values are not bit flags,
madvise(..., MADV_WILLNEED);    // so each hint needs its own call

The OS handles prefetching, and the data is read in place without being copied into a user-space buffer.
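For illustration, here is a minimal POSIX-only sketch of the mapping path, ending in a plain `memchr` newline count. The function name `count_newlines_mmap` and its error convention (returning -1) are assumptions made for this example and are not taken from FastLog's source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper, not FastLog's API. Returns -1 on any error.
long count_newlines_mmap(const char* path) {
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  // mmap rejects length 0

    size_t len = static_cast<size_t>(st.st_size);
    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return -1;

    // Hints only; failures are ignored. One call per advice value.
    madvise(p, len, MADV_SEQUENTIAL);
    madvise(p, len, MADV_WILLNEED);

    const char* data = static_cast<const char*>(p);
    const char* cur = data;
    long count = 0;
    while (true) {
        size_t remaining = len - static_cast<size_t>(cur - data);
        const void* hit = memchr(cur, '\n', remaining);
        if (hit == nullptr) break;
        ++count;
        cur = static_cast<const char*>(hit) + 1;  // continue after the match
    }
    munmap(p, len);
    return count;
}
```

The file contents are never copied into a buffer owned by the program; `memchr` reads straight from the mapped pages.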

SIMD character counting

Instead of checking bytes one by one, we process 32 bytes at a time with AVX2:

__m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
__m256i cmp = _mm256_cmpeq_epi8(v, vnewline);
// cmp is 0xFF for matches, 0x00 otherwise
acc_nl = _mm256_sub_epi8(acc_nl, cmp);

Each 8-bit lane can only count to 255, so after at most 255 iterations the accumulator is flushed with a horizontal sum:

__m256i sad = _mm256_sad_epu8(acc_nl, _mm256_setzero_si256());
// 4x 64-bit sums, add them up

SSE2 fallback for older CPUs. Every x86-64 has SSE2.
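Putting the two snippets together, a complete newline counter might look like the sketch below. It assumes GCC or Clang on x86-64; the names `count_newlines_avx2` and `count_newlines_scalar` are made up for this example, and the real code in simd.hpp may be organized differently. The caller is expected to check for AVX2 before calling the vector version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <immintrin.h>

// Hypothetical helper: count '\n' bytes with AVX2 (GCC/Clang attribute syntax).
__attribute__((target("avx2")))
size_t count_newlines_avx2(const unsigned char* data, size_t len) {
    assert(data != nullptr || len == 0);
    const __m256i vnewline = _mm256_set1_epi8('\n');
    const __m256i zero = _mm256_setzero_si256();
    size_t total = 0;
    size_t i = 0;

    while (len - i >= 32) {
        __m256i acc = zero;  // 32 independent 8-bit counters
        // Each lane grows by at most 1 per iteration, so flush after <= 255.
        for (int iter = 0; iter < 255 && len - i >= 32; ++iter, i += 32) {
            __m256i v = _mm256_loadu_si256(
                reinterpret_cast<const __m256i*>(data + i));
            __m256i cmp = _mm256_cmpeq_epi8(v, vnewline);  // 0xFF on match
            acc = _mm256_sub_epi8(acc, cmp);  // x - (-1) == x + 1
        }
        // Sum of absolute differences against zero: four 64-bit partial sums.
        __m256i sad = _mm256_sad_epu8(acc, zero);
        alignas(32) uint64_t parts[4];
        _mm256_store_si256(reinterpret_cast<__m256i*>(parts), sad);
        total += parts[0] + parts[1] + parts[2] + parts[3];
    }
    for (; i < len; ++i) total += (data[i] == '\n');  // scalar tail (< 32 bytes)
    return total;
}

// Scalar reference, useful for checking the vector version.
size_t count_newlines_scalar(const unsigned char* data, size_t len) {
    size_t n = 0;
    for (size_t i = 0; i < len; ++i) n += (data[i] == '\n');
    return n;
}
```

Subtracting the comparison mask works because 0xFF is -1 as a signed byte, so each match adds one to its lane without a separate masking step.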

Runtime dispatch

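inline bool cpu_has_avx2() noexcept {
    static int cached = -1;
    if (cached >= 0) return cached != 0;

#ifdef _WIN32
    // Checks the CPU feature bit only, not OS support for YMM state (XGETBV).
    int info[4];
    __cpuid(info, 0);
    cached = 0;  // default when CPUID leaf 7 is unavailable
    if (info[0] >= 7) {
        __cpuidex(info, 7, 0);
        cached = (info[1] & (1 << 5)) ? 1 : 0;  // EBX bit 5 = AVX2
    }
#else
    cached = __builtin_cpu_supports("avx2") ? 1 : 0;
#endif
    return cached != 0;
}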

Functions use __attribute__((target("avx2"))) / target("sse2") so GCC compiles both versions into one binary.
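A minimal sketch of how the two compiled variants could be selected at runtime follows, assuming GCC or Clang on x86-64. All names here (`count_nl_sse2`, `count_nl_avx2`, `select_count_fn`, `count_newlines`) are hypothetical. For brevity the kernels use movemask plus popcount instead of the accumulate-and-SAD scheme from the SIMD section.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>  // SSE2
#include <immintrin.h>  // AVX2

// Hypothetical SSE2 kernel: 16 bytes per step.
__attribute__((target("sse2")))
static size_t count_nl_sse2(const unsigned char* p, size_t n) {
    const __m128i nl = _mm_set1_epi8('\n');
    size_t total = 0, i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(v, nl));  // one bit per byte
        total += static_cast<size_t>(__builtin_popcount(static_cast<unsigned>(mask)));
    }
    for (; i < n; ++i) total += (p[i] == '\n');
    return total;
}

// Hypothetical AVX2 kernel: 32 bytes per step.
__attribute__((target("avx2")))
static size_t count_nl_avx2(const unsigned char* p, size_t n) {
    const __m256i nl = _mm256_set1_epi8('\n');
    size_t total = 0, i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(p + i));
        int mask = _mm256_movemask_epi8(_mm256_cmpeq_epi8(v, nl));
        total += static_cast<size_t>(__builtin_popcount(static_cast<unsigned>(mask)));
    }
    for (; i < n; ++i) total += (p[i] == '\n');
    return total;
}

using CountFn = size_t (*)(const unsigned char*, size_t);

// Pick the widest kernel the running CPU supports; decided once.
inline CountFn select_count_fn() {
    static const CountFn fn =
        __builtin_cpu_supports("avx2") ? count_nl_avx2 : count_nl_sse2;
    return fn;
}

inline size_t count_newlines(const unsigned char* p, size_t n) {
    assert(p != nullptr || n == 0);
    return select_count_fn()(p, n);
}
```

Resolving the function pointer once keeps the feature check out of the hot loop; callers only ever see `count_newlines`.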

Threading

For files >4 MB:

  1. SIMD counts all characters (single-threaded, I/O bound anyway)
  2. Find all newline positions
  3. Calculate line lengths in parallel

std::atomic<size_t> next_chunk{0};

auto worker = [&]() {
    while (true) {
        size_t idx = next_chunk.fetch_add(1, std::memory_order_relaxed);
        if (idx >= num_chunks) break;
        // process chunk
    }
};

for (int i = 1; i < num_threads; i++)
    threads.emplace_back(worker);
worker();  // main thread works too
for (auto& t : threads) t.join();

Dynamic scheduling via a shared atomic counter. No fixed chunk assignment; threads grab the next chunk until none are left.
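To make step 3 concrete, here is a self-contained sketch that fills in the "process chunk" part of the loop above. It assumes the newline offsets from step 2 are already in a vector. The `LineStats` struct and `line_stats_parallel` function are hypothetical and not FastLog's `ChunkStats` or API. CRLF stripping and a final unterminated line are left out to keep it short.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct LineStats {  // hypothetical; not FastLog's ChunkStats
    size_t max_len = 0;
    size_t sum_len = 0;
    size_t empty = 0;
};

// nl holds the byte offset of every '\n' in ascending order.
// Line k ends at nl[k]; it starts at 0 (k == 0) or at nl[k-1] + 1.
LineStats line_stats_parallel(const std::vector<size_t>& nl,
                              size_t lines_per_chunk, int num_threads) {
    assert(lines_per_chunk > 0 && num_threads > 0);
    const size_t num_chunks =
        (nl.size() + lines_per_chunk - 1) / lines_per_chunk;
    std::vector<LineStats> partial(num_chunks);  // one slot per chunk
    std::atomic<size_t> next_chunk{0};

    auto worker = [&]() {
        while (true) {
            size_t idx = next_chunk.fetch_add(1, std::memory_order_relaxed);
            if (idx >= num_chunks) break;
            size_t first = idx * lines_per_chunk;
            size_t last = std::min(first + lines_per_chunk, nl.size());
            LineStats s;
            for (size_t k = first; k < last; ++k) {
                size_t start = (k == 0) ? 0 : nl[k - 1] + 1;
                size_t len = nl[k] - start;
                s.max_len = std::max(s.max_len, len);
                s.sum_len += len;
                s.empty += (len == 0);
            }
            partial[idx] = s;  // each chunk index is claimed by one thread only
        }
    };

    std::vector<std::thread> threads;
    for (int i = 1; i < num_threads; i++) threads.emplace_back(worker);
    worker();                          // main thread works too
    for (auto& t : threads) t.join();  // join publishes the partial results

    LineStats total;
    for (const LineStats& s : partial) {
        total.max_len = std::max(total.max_len, s.max_len);
        total.sum_len += s.sum_len;
        total.empty += s.empty;
    }
    return total;
}
```

Relaxed ordering is enough for the counter because it only hands out indices; the results are read after `join()`, which provides the needed synchronization.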

Directory scanning

std::filesystem is slow on large trees: it can cost multiple syscalls per file. On Windows we use:

FindFirstFileExW(
    path,
    FindExInfoBasic,           // skip 8.3 name lookup
    &find_data,
    FindExSearchNameMatch,
    nullptr,
    FIND_FIRST_EX_LARGE_FETCH  // batch fetching
);

Matters when you have 5000+ files.

What gets measured

Metric            How                                         Accuracy
lines             count \n, +1 if file doesn't end with \n    exact
bytes             file size from OS                           exact
empty_lines       lines with length 0 after CRLF strip        exact
avg_line_length   sum / count                                 exact
max_line_length   max over all lines                          exact
encoding          BOM detection, else ASCII/UTF-8 guess       ~95%, can't detect Latin-1
line_ending       CRLF if \r before \n, else LF               exact
top_delimiters    count , ; : = \t | space                    exact counts
ascii_ratio       ascii_chars / total * 100                   exact
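The line-related rules in the table can be written down as a slow scalar reference, which is handy for checking a SIMD implementation against. This is a sketch: `BasicStats` and `analyze_scalar` are hypothetical names. Two assumptions go beyond the table: a UTF-8 BOM is not counted as line content, and an empty file has zero lines.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct BasicStats {  // hypothetical; not FastLog's FileStats
    size_t lines = 0;
    size_t empty_lines = 0;
    size_t max_line_length = 0;
    bool crlf = false;
    bool utf8_bom = false;
};

BasicStats analyze_scalar(const std::string& buf) {
    BasicStats s;
    size_t pos = 0;
    // A UTF-8 BOM is the byte sequence EF BB BF at offset 0.
    if (buf.size() >= 3 && buf.compare(0, 3, "\xEF\xBB\xBF") == 0) {
        s.utf8_bom = true;
        pos = 3;  // assumption: BOM bytes are not part of the first line
    }
    size_t line_start = pos;
    for (size_t i = pos; i <= buf.size(); ++i) {
        const bool at_end = (i == buf.size());
        if (!at_end && buf[i] != '\n') continue;
        // Nothing after the last '\n' (or an empty file): no extra line.
        if (at_end && line_start == buf.size()) break;

        size_t len = i - line_start;
        if (!at_end && len > 0 && buf[i - 1] == '\r') {
            s.crlf = true;  // "\r before \n" means CRLF
            --len;          // strip the CR before measuring
        }
        ++s.lines;  // counts each '\n', plus one for an unterminated tail
        s.empty_lines += (len == 0);
        if (len > s.max_line_length) s.max_line_length = len;
        line_start = i + 1;
    }
    return s;
}
```

For example, `analyze_scalar("a\r\n\r\nbcd")` sees two `\n` bytes plus an unterminated tail, so it reports three lines, one of them empty after the CR is stripped.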

Performance

Tested on Kingston NV1 500GB (1812 MB/s sequential read):

File     Time     Throughput
90 MB    78 ms    1.15 GB/s
448 MB   320 ms   1.40 GB/s

That's 77% of raw SSD speed for the 448 MB file (1.40 of 1.81 GB/s). The rest is NTFS overhead, page faults, and the actual processing.

Files

src/
├── main.cpp       CLI, argument parsing, output formatting
├── analyzer.cpp   memory mapping, threading, directory scanning
├── analyzer.hpp   public API
├── stats.hpp      data structures (ChunkStats, FileStats, etc.)
└── simd.hpp       AVX2/SSE2 character counting

Requirements

  • C++17
  • CMake 3.16+
  • x86-64 (SSE2 minimum, AVX2 for full speed)
  • Windows or Linux (both tested)

License

MIT


Built this because I got tired of waiting for slow log parsers. Turns out when you stop abstracting everything and just let the CPU do what it's good at, things get fast. Who knew.

— AGDNoob
