Columnar compression for JSON logs. 30-50% smaller than zstd -19 on structured logs, at roughly 20x the speed.
```sh
cargo install logpack

# Compress / decompress
logpack compress app.jsonl -o app.lpk --chunk-size 50000
logpack decompress app.lpk -o app.jsonl

# Decompress a trusted file with raised safety limits
logpack decompress app.lpk --max-frame-bytes $((1024*1024*1024)) --max-column-bytes $((512*1024*1024))
```

Generic compressors treat logs as opaque bytes. logpack understands the structure: it extracts schemas, splits values into typed columns, detects log message templates, and encodes each data type optimally. The result is significantly better compression on structured JSON logs, the format used by most modern logging systems.
All benchmarks are lossless roundtrips. Speed: 35-45 MB/s (vs zstd -19 at ~2 MB/s).
| Dataset | Rows | Input size | logpack ratio | zstd -19 ratio | logpack vs zstd -19 |
|---|---|---|---|---|---|
| Webserver logs | 500K | 128 MB | 47.4x | 22.1x | 53% smaller |
| Mixed service logs | 100K | 12 MB | 30.1x | 16.8x | 44% smaller |
| Logs with UUIDs | 500K | 158 MB | 24.2x | 16.8x | 30% smaller |
**Real-World Logs (LogHub)**
| Dataset | Rows | logpack ratio | zstd -19 ratio | logpack vs zstd -19 |
|---|---|---|---|---|
| OpenStack | 100K | 10.4x | 9.9x | 5% smaller |
| Spark | 2K | 32.0x | 29.7x | 6% smaller |
| HPC | 2K | 16.6x | 15.7x | 5% smaller |
| Zookeeper | 2K | 22.8x | 22.0x | 3% smaller |
| BGL | 2K | 11.5x | 11.4x | 1% smaller |
| HDFS | 100K | 9.5x | 10.3x | 8% larger |
| Linux syslog | 100K | 8.8x | 11.7x | 33% larger |
logpack wins on structured data. On text-heavy logs with high-cardinality fields (60K unique PIDs, non-standard timestamps), zstd's sliding window has an edge.
```rust
use logpack::{compress, decompress, decompress_with_options, CompressOptions, DecompressOptions};
use std::fs::File;
use std::io::BufWriter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Compress app.jsonl into app.lpk, 50K rows per chunk.
    let input = File::open("app.jsonl")?;
    let output = BufWriter::new(File::create("app.lpk")?);
    let compress_opts = CompressOptions { chunk_size: 50_000, ..Default::default() };
    compress(input, output, &compress_opts)?;

    // Decompress with the default safety limits.
    let input = File::open("app.lpk")?;
    let output = BufWriter::new(File::create("app.jsonl")?);
    decompress(input, output)?;

    // For trusted files, decompress with raised limits.
    let input = File::open("app.lpk")?;
    let output = BufWriter::new(File::create("trusted.jsonl")?);
    let decompress_opts = DecompressOptions {
        max_frame_bytes: 1024 * 1024 * 1024,  // 1 GiB per frame
        max_column_bytes: 512 * 1024 * 1024,  // 512 MiB per column
        ..Default::default()
    };
    decompress_with_options(input, output, &decompress_opts)?;
    Ok(())
}
```

- Schema extraction -- JSON keys repeat on every line. Store the structure once.
- Columnar split -- Group values by field. All timestamps together, all log levels together (sketched, together with schema extraction, after this list).
- Template extraction -- Log messages like `"Request from 10.0.1.5 took 42ms"` decompose into a template (`"Request from <IP> took <N>ms"`) plus typed variables (the IP as 4 bytes, the integer as a varint); sketched after this list.
- Type-aware encoding -- 16 encoding strategies, selected per column:
  - Timestamps: delta-of-delta with frame-of-reference bit-packing (sketched after this list)
  - Categoricals: dictionary, RLE, or bit-packing, depending on cardinality
  - Integers: delta encoding, or dictionary for low-cardinality columns
  - IPs: 4-byte binary. UUIDs: 16-byte binary. Hex strings: nibble-packed.
  - Booleans: 1 bit each
- Adaptive zstd -- Each column is compressed independently with tuned zstd levels (sketched after this list).
- Streaming -- Processes input in 100K-row chunks. Constant memory regardless of file size.
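To make the first two steps concrete, here is a minimal sketch of schema extraction and columnar splitting, assuming a `serde_json` dependency. It illustrates the idea only; it is not logpack's internal code.

```rust
use serde_json::Value;
use std::collections::BTreeMap;

fn main() {
    let lines = [
        r#"{"ts":1700000000,"level":"info","msg":"Request from 10.0.1.5 took 42ms"}"#,
        r#"{"ts":1700000001,"level":"info","msg":"Request from 10.0.1.9 took 7ms"}"#,
        r#"{"ts":1700000002,"level":"warn","msg":"Request from 10.0.1.5 took 980ms"}"#,
    ];

    // Schema extraction: the key set repeats on every line, so store it once.
    let schema: Vec<String> = serde_json::from_str::<BTreeMap<String, Value>>(lines[0])
        .unwrap()
        .into_keys()
        .collect();
    println!("schema: {schema:?}"); // ["level", "msg", "ts"]

    // Columnar split: group values by field so each column holds one kind
    // of data and compresses together.
    let mut columns: BTreeMap<String, Vec<Value>> = BTreeMap::new();
    for line in &lines {
        let obj: BTreeMap<String, Value> = serde_json::from_str(line).unwrap();
        for (key, value) in obj {
            columns.entry(key).or_default().push(value);
        }
    }
    for (name, values) in &columns {
        println!("column {name}: {values:?}");
    }
}
```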
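The template step can be pictured with a toy tokenizer. This one is invented for illustration (logpack's real detector is internal): it splits on spaces, recognizes IPv4 addresses and integers with unit suffixes, and emits a template plus typed variables.

```rust
use std::net::Ipv4Addr;

#[derive(Debug)]
enum Var {
    Ip([u8; 4]), // stored as 4 raw bytes
    Int(u64),    // stored as a varint in the real format
}

// Split "42ms" into ("42", "ms"); None if the token has no leading digits.
fn split_leading_digits(token: &str) -> Option<(&str, &str)> {
    let end = token.find(|c: char| !c.is_ascii_digit()).unwrap_or(token.len());
    if end == 0 { None } else { Some(token.split_at(end)) }
}

fn templatize(msg: &str) -> (String, Vec<Var>) {
    let mut template = Vec::new();
    let mut vars = Vec::new();
    for token in msg.split(' ') {
        if let Ok(ip) = token.parse::<Ipv4Addr>() {
            template.push("<IP>".to_string());
            vars.push(Var::Ip(ip.octets()));
        } else if let Some((digits, unit)) = split_leading_digits(token) {
            template.push(format!("<N>{unit}"));
            vars.push(Var::Int(digits.parse().unwrap()));
        } else {
            template.push(token.to_string());
        }
    }
    (template.join(" "), vars)
}

fn main() {
    let (template, vars) = templatize("Request from 10.0.1.5 took 42ms");
    println!("{template}"); // Request from <IP> took <N>ms
    println!("{vars:?}");   // [Ip([10, 0, 1, 5]), Int(42)]
}
```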
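For timestamps, a sketch of the delta-of-delta idea; zigzag mapping stands in here for the frame-of-reference bit-packing that logpack actually applies to the residuals.

```rust
// Delta-of-delta: log timestamps are nearly evenly spaced, so the second
// difference is usually a tiny number centered on zero.
fn delta_of_delta(ts: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(ts.len());
    let (mut prev, mut prev_delta) = (0i64, 0i64);
    for (i, &t) in ts.iter().enumerate() {
        if i == 0 {
            out.push(t); // first value stored verbatim
        } else {
            let delta = t - prev;
            out.push(delta - prev_delta);
            prev_delta = delta;
        }
        prev = t;
    }
    out
}

// Zigzag maps small signed values to small unsigned ones (-1 -> 1, 1 -> 2, ...)
// so the residuals fit in very few bits.
fn zigzag(v: i64) -> u64 {
    ((v << 1) ^ (v >> 63)) as u64
}

fn main() {
    let ts = [1700000000, 1700000010, 1700000020, 1700000031, 1700000040];
    let residuals = delta_of_delta(&ts);
    println!("{residuals:?}"); // [1700000000, 10, 0, 1, -2]
    let packed: Vec<u64> = residuals[1..].iter().map(|&v| zigzag(v)).collect();
    println!("{packed:?}"); // [20, 0, 2, 3] (tiny values, cheap to bit-pack)
}
```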
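And per-column compression, sketched with the `zstd` crate; the level heuristic below is made up for illustration, logpack's actual tuning is internal.

```rust
// Each column is compressed independently, so the level can be chosen
// per column. The size threshold here is a hypothetical heuristic.
fn compress_column(name: &str, bytes: &[u8]) -> std::io::Result<Vec<u8>> {
    let level = if bytes.len() > 1 << 20 { 19 } else { 3 };
    let compressed = zstd::encode_all(bytes, level)?;
    println!("{name}: {} -> {} bytes", bytes.len(), compressed.len());
    Ok(compressed)
}

fn main() -> std::io::Result<()> {
    let level_column = b"info info info warn info info error info".to_vec();
    compress_column("level", &level_column)?;
    Ok(())
}
```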
logpack now rejects suspicious or excessively large frames during decompression instead of allowing unbounded memory growth.
- Default frame limit: 512 MiB decompressed per frame
- Default column limit: 256 MiB decompressed per column payload
- Default frame row limit: 1,000,000 rows
For trusted files, you can raise the limits:
```sh
logpack decompress app.lpk --max-frame-bytes $((1024*1024*1024)) --max-column-bytes $((512*1024*1024))
```

To avoid hitting these limits on newly written files, reduce the chunk size at compression time (fewer rows per frame):
```sh
logpack compress app.jsonl -o app.lpk --chunk-size 50000
```

`.lpk` files use a chunked frame format:

```
LPAK + version byte
[frame]*
  row_count | zstd_dictionary | schema_registry | schema_ids
  per column: name | encoding | null_bitmap | compressed_data
end marker (row_count = 0)
```
Version 4 frames also store exact uncompressed sizes for compressed sections, which lets the decoder validate expected output sizes precisely.
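A minimal sniff of the container header, based only on the layout above; the wire encoding of frame internals (row counts, section headers) is not specified here, so this stops at the magic and version byte.

```rust
use std::fs::File;
use std::io::Read;

fn main() -> std::io::Result<()> {
    let mut file = File::open("app.lpk")?;
    // The file starts with the 4-byte magic "LPAK" followed by a version byte.
    let mut header = [0u8; 5];
    file.read_exact(&mut header)?;
    assert_eq!(&header[..4], b"LPAK", "not a logpack file");
    println!("logpack container, format version {}", header[4]);
    Ok(())
}
```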
Lossless roundtrip. Key order may differ (JSON objects are unordered by spec).
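Because only key order may differ, a byte-for-byte diff can report false mismatches; comparing parsed values checks the guarantee that actually holds. A sketch with `serde_json`, whose `Value` equality does not depend on key order (file names here are placeholders):

```rust
use serde_json::Value;
use std::fs::File;
use std::io::{BufRead, BufReader};

// Compare two JSONL files line by line as parsed values, so a roundtrip
// that only reorders keys still counts as identical. (A complete check
// would also confirm both files have the same number of lines.)
fn jsonl_eq(a: &str, b: &str) -> std::io::Result<bool> {
    let lines_a = BufReader::new(File::open(a)?).lines();
    let lines_b = BufReader::new(File::open(b)?).lines();
    for (la, lb) in lines_a.zip(lines_b) {
        let va: Value = serde_json::from_str(&la?).expect("invalid JSON");
        let vb: Value = serde_json::from_str(&lb?).expect("invalid JSON");
        if va != vb {
            return Ok(false);
        }
    }
    Ok(true)
}

fn main() -> std::io::Result<()> {
    println!("lossless: {}", jsonl_eq("app.jsonl", "roundtrip.jsonl")?);
    Ok(())
}
```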
Supported input:

- JSONL (newline-delimited JSON), one object per line
- Mixed schemas (different objects can have different keys)
- UTF-8
License: MIT