Alex256-core/AdaptiveZip


AdaptiveZip

Adaptive compression control module for automatic optimization of compression parameters on heterogeneous data.

What this is

AdaptiveZip is not a new compression algorithm.

It is a mathematical control module that dynamically optimizes compression parameters of existing compressors (currently zstd) during execution, based on observed compression behavior.

Its purpose is to automatically approach the locally optimal trade-off between compression ratio and compression speed on real-world, heterogeneous data, without manual tuning.

Why this matters

Most modern compressors expose multiple compression levels that represent different points on the speed–ratio trade-off curve.

In practice, a single global compression level is almost never optimal for an entire file:

  • different regions of the same file often have very different structure;
  • the optimal compression level varies along the data stream;
  • manual tuning is brittle and workload-specific.

As a result, fixed-level compression systematically leaves performance on the table: it either spends time on regions that gain little from a high level, or misses compression potential on regions that would benefit from one.

AdaptiveZip addresses this by treating compression as an online optimization problem rather than a static parameter choice.

Core idea

The algorithm observes how compression behavior changes between consecutive blocks and uses this information as a structural signal.

Intuitively:

  • if small changes in data produce large changes in compression behavior, the local structure is unstable and aggressive compression is inefficient;
  • if compression behavior is stable, higher compression levels become computationally justified.

Based on this signal, the compression level is adjusted gradually and deterministically, continuously tracking a near-optimal operating regime.
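
The rule described above can be pictured as a small controller. The sketch below is an illustration only: the struct name, the two thresholds, the step size of one level, and the use of relative ratio change as the signal are all assumptions and are not taken from adaptivezip.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical controller. Names, thresholds, and step size are illustrative
// assumptions, not the values used by AdaptiveZip.
struct LevelController {
    int level;                // current compression level
    int min_level;            // lower bound on the level (assumed)
    int max_level;            // upper bound on the level (assumed)
    double stable_eps;        // relative change below this counts as "stable"
    double unstable_eps;      // relative change above this counts as "unstable"
    double prev_ratio = 0.0;  // ratio of the previous block; 0 means "none yet"

    LevelController(int start, int lo, int hi, double stable, double unstable)
        : level(start), min_level(lo), max_level(hi),
          stable_eps(stable), unstable_eps(unstable) {
        assert(lo <= start && start <= hi);
        assert(0.0 <= stable && stable <= unstable);
    }

    // Feed the ratio (original size / compressed size) of the block that was
    // just compressed. Returns the level to use for the next block.
    int update(double ratio) {
        assert(ratio > 0.0);
        if (prev_ratio > 0.0) {
            double change = std::fabs(ratio - prev_ratio) / prev_ratio;
            if (change > unstable_eps) {
                level -= 1;  // behavior is shifting: back off
            } else if (change < stable_eps) {
                level += 1;  // behavior is steady: spend more effort
            }
            // Keep the adjustment bounded.
            level = std::max(min_level, std::min(level, max_level));
        }
        prev_ratio = ratio;
        return level;
    }
};
```

With thresholds of, say, 0.02 and 0.20, a run of blocks whose ratio barely moves walks the level up one step at a time until it reaches the upper bound, and a sudden jump in ratio steps it back down. Changes that fall between the two thresholds leave the level unchanged.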

No pre-analysis, heuristics, or machine learning are used.

What AdaptiveZip optimizes

AdaptiveZip optimizes the trajectory of compression, that is, the sequence of compression levels chosen along the data stream, rather than a single metric.

In practical terms, this means:

  • higher average compression efficiency on mixed data;
  • reduced time spent in suboptimal compression modes;
  • automatic adaptation to unknown or changing workloads.

The algorithm does not attempt to beat state-of-the-art compressors on their best-case benchmarks. Instead, it reduces the gap between best-case and real-world performance.

How it works (high-level)

  • input data is processed in fixed-size blocks;
  • each block is compressed using an existing backend (zstd);
  • the observed compression ratio is measured;
  • changes in compression behavior between blocks are interpreted as a structural signal;
  • the compression level for subsequent blocks is adjusted accordingly.

The controller is deterministic, lightweight, and operates entirely online.
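
The steps above can be sketched as a single loop. This is a minimal illustration under stated assumptions, not the project's code. `CompressFn` is a hypothetical stand-in for the backend and `PolicyFn` for the controller, which keeps the sketch independent of any particular library. The README does not specify how blocks are framed in the output file, so the sketch simply appends the compressed blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// All names below are hypothetical.
using Bytes = std::vector<std::uint8_t>;
// Backend stand-in: (block pointer, block size, level) -> compressed bytes.
using CompressFn = std::function<Bytes(const std::uint8_t*, std::size_t, int)>;
// Controller stand-in: (current level, ratio of last block) -> next level.
using PolicyFn = std::function<int(int, double)>;

struct BlockRecord {
    int level;     // level used for this block
    double ratio;  // original size / compressed size
};

// Compresses `input` block by block, appending results to `output`, and
// returns one record per block so the level trajectory can be inspected.
std::vector<BlockRecord> compress_adaptively(const Bytes& input,
                                             std::size_t block_size,
                                             int start_level,
                                             const CompressFn& compress,
                                             const PolicyFn& next_level,
                                             Bytes& output) {
    assert(block_size > 0);
    std::vector<BlockRecord> log;
    int level = start_level;
    for (std::size_t off = 0; off < input.size(); off += block_size) {
        // The last block may be shorter than block_size.
        std::size_t n = std::min(block_size, input.size() - off);
        Bytes packed = compress(input.data() + off, n, level);
        assert(!packed.empty());
        output.insert(output.end(), packed.begin(), packed.end());

        double ratio = static_cast<double>(n) /
                       static_cast<double>(packed.size());
        log.push_back({level, ratio});

        // The observed ratio only influences blocks that come later.
        level = next_level(level, ratio);
    }
    return log;
}
```

With zstd as the backend, `CompressFn` could wrap `ZSTD_compress`, with the destination buffer sized by `ZSTD_compressBound` and the return value checked with `ZSTD_isError`. Passing the backend in as a function also makes the control loop testable with a fake compressor.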

Current implementation

  • backend: zstd
  • control granularity: block-level
  • adaptation: incremental, bounded adjustments
  • overhead: negligible relative to compression cost

The current implementation serves as a reference prototype for evaluating adaptive compression control.

Build

Requirements:

  • C++17 compatible compiler
  • zstd development library

Build example (Linux/macOS):

g++ -std=c++17 -O3 adaptivezip.cpp -lzstd -o adaptivezip

Usage

./adaptivezip input.file output.az

Scope

This project focuses on compression control, not compression primitives.

It intentionally:

  • does not modify internal compression algorithms;
  • does not require workload-specific tuning;
  • does not rely on probabilistic models or training.

Discussion and feedback

Feedback, benchmarks, and technical discussion are welcome.

This project is primarily intended for engineers interested in compression systems, performance optimization, and adaptive algorithms.
