logpack

Columnar compression for JSON logs. 30-50% smaller than zstd, 20x faster.

cargo install logpack
logpack compress app.jsonl -o app.lpk --chunk-size 50000
logpack decompress app.lpk -o app.jsonl
logpack decompress app.lpk --max-frame-bytes $((1024*1024*1024)) --max-column-bytes $((512*1024*1024))

Why

Generic compressors treat logs as opaque bytes. logpack understands the structure: it extracts schemas, splits values into typed columns, detects log message templates, and encodes each data type optimally. The result is significantly better compression on structured JSON logs — the format used by every modern logging system.
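
A toy illustration of the "extract schemas, split into typed columns" idea (this is not logpack's internal code; it only assumes the serde_json crate is available): parse two JSONL lines and regroup the values by key, so each key's values land in their own column.

use std::collections::BTreeMap;
use serde_json::Value;

fn main() -> serde_json::Result<()> {
    // Two log lines with the same keys: the key set is the schema, stored once.
    let lines = [
        r#"{"ts":1700000000000,"level":"info","msg":"Request from 10.0.1.5 took 42ms"}"#,
        r#"{"ts":1700000000105,"level":"info","msg":"Request from 10.0.1.9 took 37ms"}"#,
    ];
    // Regroup values by key: each key's values form a column (all timestamps
    // together, all levels together), which compresses far better than the
    // interleaved original.
    let mut columns: BTreeMap<String, Vec<Value>> = BTreeMap::new();
    for line in lines {
        let obj: BTreeMap<String, Value> = serde_json::from_str(line)?;
        for (key, value) in obj {
            columns.entry(key).or_default().push(value);
        }
    }
    for (key, values) in &columns {
        println!("{key}: {values:?}");
    }
    Ok(())
}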

Benchmarks

All benchmarks are lossless roundtrips. Speed: 35-45 MB/s (vs zstd -19 at ~2 MB/s).

Structured JSON Logs

| Dataset | Rows | Input | logpack | zstd -19 | vs zstd -19 |
|---|---|---|---|---|---|
| Webserver logs | 500K | 128 MB | 47.4x | 22.1x | 53% smaller |
| Mixed service logs | 100K | 12 MB | 30.1x | 16.8x | 44% smaller |
| Logs with UUIDs | 500K | 158 MB | 24.2x | 16.8x | 30% smaller |

Real-World Logs (LogHub)

| Dataset | Rows | logpack | zstd -19 | vs zstd -19 |
|---|---|---|---|---|
| OpenStack | 100K | 10.4x | 9.9x | 5% smaller |
| Spark | 2K | 32.0x | 29.7x | 6% smaller |
| HPC | 2K | 16.6x | 15.7x | 5% smaller |
| Zookeeper | 2K | 22.8x | 22.0x | 3% smaller |
| BGL | 2K | 11.5x | 11.4x | 1% smaller |
| HDFS | 100K | 9.5x | 10.3x | 8% larger |
| Linux syslog | 100K | 8.8x | 11.7x | 33% larger |

logpack wins on structured data. On text-heavy logs with high-cardinality fields (60K unique PIDs, non-standard timestamps), zstd's sliding window has an edge.

Library Usage

use logpack::{compress, decompress, decompress_with_options, CompressOptions, DecompressOptions};
use std::fs::File;
use std::io::BufWriter;

// Compress app.jsonl into app.lpk, chunking the input into 50,000-row frames.
let input = File::open("app.jsonl")?;
let output = BufWriter::new(File::create("app.lpk")?);
let compress_opts = CompressOptions { chunk_size: 50_000, ..Default::default() };
compress(input, output, &compress_opts)?;

// Decompress with the default safety limits.
let input = File::open("app.lpk")?;
let output = BufWriter::new(File::create("app.jsonl")?);
decompress(input, output)?;

// Decompress a trusted file with raised frame and column limits.
let input = File::open("app.lpk")?;
let output = BufWriter::new(File::create("trusted.jsonl")?);
let mut decompress_opts = DecompressOptions::default();
decompress_opts.max_frame_bytes = 1024 * 1024 * 1024;
decompress_opts.max_column_bytes = 512 * 1024 * 1024;
decompress_with_options(input, output, &decompress_opts)?;

How It Works

  1. Schema extraction -- JSON keys repeat on every line. Store the structure once.
  2. Columnar split -- Group values by field. All timestamps together, all log levels together.
  3. Template extraction -- Log messages like "Request from 10.0.1.5 took 42ms" decompose into a template ("Request from <IP> took <N>ms") plus typed variables (IP as 4 bytes, integer as varint); a sketch of this decomposition follows the list.
  4. Type-aware encoding -- 16 encoding strategies selected per column:
    • Timestamps: delta-of-delta with frame-of-reference bit-packing (also sketched after the list)
    • Categoricals: dictionary, RLE, or bit-packed depending on cardinality
    • Integers: delta encoding or dictionary for low-cardinality
    • IPs: 4-byte binary. UUIDs: 16-byte binary. Hex strings: nibble-packed.
    • Booleans: 1 bit each
  5. Adaptive zstd -- Each column compressed independently with tuned zstd levels.
  6. Streaming -- Processes input in 100K-row chunks. Constant memory regardless of file size.
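
The template extraction in step 3 can be pictured with a small standalone sketch. This is not logpack's detector (the real classifier, placeholder names, and variable encodings are internal to the crate); it only shows the idea of splitting a message into a reusable template plus typed variables, using the standard library.

fn extract_template(msg: &str) -> (String, Vec<String>) {
    let mut template = Vec::new();
    let mut vars = Vec::new();
    for token in msg.split_whitespace() {
        // Split off a leading run of digits so "42ms" is seen as number + unit.
        let digits_end = token.find(|c: char| !c.is_ascii_digit()).unwrap_or(token.len());
        let (num, rest) = token.split_at(digits_end);
        if token.split('.').count() == 4 && token.split('.').all(|p| p.parse::<u8>().is_ok()) {
            template.push("<IP>".to_string());
            vars.push(token.to_string()); // an IPv4 address packs into 4 bytes
        } else if !num.is_empty() && rest.chars().all(|c| c.is_ascii_alphabetic()) {
            template.push(format!("<N>{rest}"));
            vars.push(num.to_string()); // an integer packs into a varint
        } else {
            template.push(token.to_string()); // literal token stays in the template
        }
    }
    (template.join(" "), vars)
}

fn main() {
    let (template, vars) = extract_template("Request from 10.0.1.5 took 42ms");
    assert_eq!(template, "Request from <IP> took <N>ms");
    assert_eq!(vars, vec!["10.0.1.5", "42"]);
}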
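
Step 4's timestamp strategy (delta-of-delta with a frame-of-reference value) can also be sketched in a few lines. The real encoder's zigzag and bit-packing details and its block layout are not shown here; this only illustrates why near-regular timestamps shrink to a few bits per value.

fn delta_of_delta(ts: &[i64]) -> Vec<i64> {
    if ts.is_empty() {
        return Vec::new();
    }
    let mut out = Vec::with_capacity(ts.len());
    let (mut prev, mut prev_delta) = (ts[0], 0i64);
    out.push(ts[0]); // the first value is the frame of reference, stored as-is
    for &t in &ts[1..] {
        let delta = t - prev;
        out.push(delta - prev_delta); // usually zero or tiny for steady log traffic
        prev = t;
        prev_delta = delta;
    }
    out
}

fn main() {
    let ts = [1_700_000_000_000i64, 1_700_000_000_100, 1_700_000_000_205, 1_700_000_000_310];
    // Prints [1700000000000, 100, 5, 0]: after the reference, every value is
    // small enough to bit-pack into a handful of bits.
    println!("{:?}", delta_of_delta(&ts));
}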

Safety Limits

logpack rejects suspicious or excessively large frames during decompression instead of allowing unbounded memory growth.

  • Default frame limit: 512 MiB decompressed across one frame
  • Default column limit: 256 MiB decompressed for one column payload
  • Default frame row limit: 1,000,000 rows

For trusted files, you can raise the limits from the CLI (the library equivalent is decompress_with_options with a custom DecompressOptions, shown under Library Usage):

logpack decompress app.lpk --max-frame-bytes $((1024*1024*1024)) --max-column-bytes $((512*1024*1024))

To avoid hitting these limits on newly written files, reduce the chunk size at compression time, which bounds how large each frame can grow:

logpack compress app.jsonl -o app.lpk --chunk-size 50000

Format

.lpk files use a chunked frame format:

LPAK + version byte
[frame]*
  row_count | zstd_dictionary | schema_registry | schema_ids
  per column: name | encoding | null_bitmap | compressed_data
end marker (row_count = 0)

Version 4 frames also store exact uncompressed sizes for compressed sections, which lets the decoder validate expected output sizes precisely.
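
A minimal header check against the layout above, assuming the magic is the four ASCII bytes LPAK followed by a one-byte version (the frame contents beyond that are not parsed here):

use std::fs::File;
use std::io::Read;

fn read_header(path: &str) -> std::io::Result<u8> {
    let mut file = File::open(path)?;
    let mut magic = [0u8; 4];
    file.read_exact(&mut magic)?;
    // Assumed magic: the layout above lists "LPAK + version byte".
    assert_eq!(&magic, b"LPAK", "not a logpack file");
    let mut version = [0u8; 1];
    file.read_exact(&mut version)?;
    Ok(version[0])
}

fn main() -> std::io::Result<()> {
    // Version 4 frames also carry exact uncompressed sizes, as noted above.
    println!("format version {}", read_header("app.lpk")?);
    Ok(())
}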

Lossless roundtrip. Key order may differ (JSON objects are unordered by spec).

Supported Input

  • JSONL (newline-delimited JSON), one object per line
  • Mixed schemas (different objects can have different keys)
  • UTF-8
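
For example, both of these lines are valid in one input file even though their key sets differ (the values are made up for illustration):

{"ts":1700000000000,"level":"info","msg":"service started"}
{"ts":1700000000250,"level":"error","msg":"upstream timeout","request_id":"a1b2c3","status":504}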

License

MIT
