Skip to content

SIMD accelerated yEnc en/decode C library

Notifications You must be signed in to change notification settings

animetosho/rapidyenc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This C compatible library provides functions for implementing yEnc where speed is important.

Note that it only handles the underlying encoding/decoding routines - yEnc headers aren’t handled.

Features

  • implementation uses x86/ARM/RISC-V SIMD capabilities, with support for ARMv7 NEON, ARMv8 ASIMD or the following x86 SIMD extensions: SSE2, SSSE3, AVX, AVX2, AVX512-BW (128/256-bit), AVX512-VBMI2 (or AVX10.1/256)
  • CPU detection and dynamic dispatch (i.e. select best implementation for currently running CPU)
  • incremental processing, including detection of yEnc/NNTP end sequences in decoder
  • raw yEnc encoding with the ability to specify line length. A single thread can achieve >450MB/s on a Raspberry Pi 3, or >5GB/s on a Core-i series CPU.
  • yEnc decoding, with and without NNTP layer dot unstuffing. A single thread can achieve >300MB/s on a Raspberry Pi 3, or >4.5GB/s on a Core-i series CPU.
  • CRC32 implementation via crcutil or PCLMULQDQ instruction, ARMv8’s CRC instructions, or RISC-V’s Zb(k)c extension (>1GB/s on a low power Atom/ARM CPU, >15GB/s on a modern Intel CPU)
  • ability to combine two CRC32 hashes into one (useful for amalgamating pcrc32s into a crc32 for yEnc), as well as quickly compute the CRC32 of a sequence of null bytes

Building

A build file/project can be created using CMake.

mkdir build
cd build
cmake ..
cmake --build . --config Release

After compilation, a shared and static library should be generated, as well as a benchmark and sample CLI application.

Build Options

The following options can be passed into CMake:

  • BUILD_NATIVE: Optimise for and target only the build host’s CPU; this build may not be re-distributable
  • DISABLE_AVX256: Disable the use of 256-bit AVX instructions on x86 processors
  • DISABLE_CRCUTIL: Disable crcutil usage. crcutil is only ever enabled for x86 builds
  • DISABLE_ENCODE: Remove yEnc encode functionality from build. rapidyenc_encode* functions, except rapidyenc_encode_max_length, will be unavailable
  • DISABLE_DECODE: Remove yEnc decode functionality from build. rapidyenc_decode* functions will be unavailable
  • DISABLE_CRC: Remove CRC32 functionality from build. rapidyenc_crc* functions will be unavailable. Implies DISABLE_CRCUTIL

API

Before any encoding/decoding/CRC functions can be used, the respective _init function must be called. These functions set up the necessary state for computation. Note that _init functions aren’t thread-safe, but all others are.

Functions documented in the header file.

cli.c is a simple command-line application which encodes/decodes stdin to stdout. It demonstrates how to do incremental encoding/decoding/CRC32 using this library.

Other Language Bindings

Algorithm

A brief description of how the SIMD yEnc encoding algorithm works can be found here. I may eventually write up something more detailed, regarding optimizations and such used.

License

This module is Public Domain or CC0 (or equivalent) if PD isn’t recognised.

crcutil, used for CRC32 calculation, is licensed under the Apache License 2.0

zlib-ng, from where the CRC32 calculation using folding approach was stolen, is under a zlib license