This is not the LAZ/LASzip LiDAR format. This is a standalone general-purpose lossless compressor.
LAZ reconstructs the essence of data storage using a "first principles" approach. Through literal casting and physical bit-width stripping techniques, it achieves a lossless compression algorithm that outperforms traditional algorithms (such as Zstandard and LZ4) in specific industrial data domains (sparse data, structured logs, binary vectors). It differs from general compression schemes that rely on probabilistic statistical models (Huffman/ANS) or sliding window dictionaries (LZ77).
The core of the algorithm lies in converting fixed-width (typically 8-bit) data into a variable-length sequence of minimal significant bits.
Standard Parsing Example:
- Original File Read: Read 8-bit physical data
00110010from the buffer. - Visual and Normative Extraction: Under the current ASCII standard, this byte maps to the character
'2', with a hexadecimal value of0x32. - Traditional Redundancy Analysis: The high bits
0x3in0x32serve solely as a type identifier within the ASCII standard (indicating it's a numeric character). - Literal Casting Operation: Ignore the high-order type identifier and directly extract its core literal numerical value,
0x2. - Binary Dimensionality Reduction Storage: Convert
0x2into its minimal binary sequence without leading zeros:10(occupying 2 bits).
The LAZ algorithm introduces a floor bandwidth parameter (default
-
Rule: If the absolute value of the data, when converted to binary, has a length less than
$B_f$ , it is padded with0s in the high bits until it equals$B_f$ . If the length is$\ge B_f$ , it remains as is, without any truncation. -
Example: When
$B_f = 2$ , the decimal value0converts to binary0(length 1), which is forcefully padded to00(length 2); the value6converts to binary110(length 3) and is preserved as is.
Problem Statement: After literal casting, both ASCII '2' (0x32) and original hexadecimal data 0x02 will be reduced to the binary 10. This leads to ambiguity during decoding when restoring the original values.
Solution: Introduce a Type_Profile field in the file header to declare the data semantics environment of the current data block.
- Profile 0x01 (Numeric ASCII): Declares this data block as numeric ASCII text. Upon extracting
10(i.e.,0x2), the decoder performs a bitwise OR operation0x2 | 0x30, restoring it to0x32('2'). - Profile 0x02 (Hex ASCII): Declares this data block as hexadecimal ASCII text (e.g., log dumps). The decoder, based on the numerical range, restores it to
0x30~0x39('0'-'9') or0x41~0x46('A'-'F'). - Profile 0x00 (Raw Binary): Declares it as a native binary stream. The decoded
10(0x2) is directly padded to0x02(00000010) for output.
LAZ incorporates multiple Profiles optimized for industrial scenarios, automatically matching the best compression strategy:
- Numeric/Hex ASCII: For sensor logs, hexadecimal dumps.
- Sparse 16/32: Specifically designed for machine words with numerous zero prefixes.
- Base64 Bridged: For text-encoded binary streams.
- UTF-8 Mixed: Strips fixed prefix redundancy from multi-byte characters.
- Dual CRC32 Checksum: Independent verification of Payload (for transmission security) and Source (for lossless restoration), ensuring end-to-end data integrity.
- 64KB Adaptive Chunking: Strictly limits memory usage, supporting streaming processing and random access.
- Adaptive Escape Mechanism: For data blocks that yield no positive compression gain, automatically falls back to Raw mode, ensuring an ultra-low inflation rate below 0.1%.
LAZ adheres to a zero-dependency principle, requiring only a C17 standard-compliant compiler and CMake.
cmake -S . -B build -DLAZ_BUILD_TESTS=ON -DLAZ_BUILD_CLI=ON
cmake --build build
ctest --test-dir build --output-on-failure# Compress
./lazctl compress input.log output.laz
# Decompress
./lazctl decompress output.laz restored.log#include "laz/laz.h"LAZ strictly follows byte alignment and little-endian conventions. The format layout is as follows:
- Global Header (64B): Contains the magic number
LAZ, version, dual CRC32, and payload statistics. - Chunk Descriptor Table: Records the Profile, bit-width, and offset for each 64KB chunk.
- Bitstream Zone: Tightly packed pointer stream and data stream.
For detailed format specifications, please refer to docs/FORMAT.md.