Release v0.2.0 · cachevector/comprexx

New compression techniques

Unstructured pruning: magnitude or random, with an optional gradual cubic schedule
N:M sparsity: default 2:4, for NVIDIA Ampere sparse tensor cores
Weight-only INT4/INT8 quantization: group-wise, symmetric or asymmetric
Low-rank decomposition: truncated SVD for Linear layers, with rank-ratio or energy-threshold selection
Operator fusion: Conv2d + BatchNorm2d folding via torch.fx
Weight clustering: per-layer k-means codebook
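
To illustrate the two pruning items above, here is a minimal sketch in plain Python (no PyTorch, so it runs anywhere). It is not comprexx's implementation. The function names `cubic_sparsity` and `nm_mask` are hypothetical. The schedule assumes the commonly used cubic ramp from an initial to a final sparsity, and the mask assumes N:M means "keep the N largest-magnitude weights in every group of M".

```python
def cubic_sparsity(step, total_steps, final_sparsity, initial_sparsity=0.0):
    """Sparsity target at `step`, ramping cubically from initial to final."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps outside the ramp hold the end values.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def nm_mask(weights, n=2, m=4):
    """Return a 0/1 mask keeping the n largest-magnitude weights per group of m."""
    if len(weights) % m != 0:
        raise ValueError("length must be a multiple of m")
    mask = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n largest magnitudes within this group.
        keep = set(sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n])
        mask.extend(1 if i in keep else 0 for i in range(m))
    return mask
```

With this schedule, `cubic_sparsity(50, 100, 0.8)` is about 0.7, so most of the pruning happens early and the ramp flattens as it approaches the final target.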
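
Group-wise symmetric weight quantization can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper names are hypothetical, weights are a flat Python list, and each group gets its own scale derived from its largest magnitude. The asymmetric variant, which adds a zero point, is not shown.

```python
def quantize_group_symmetric(group, bits=4):
    """Quantize one group to signed integers with a single shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for INT4, 127 for INT8
    max_abs = max(abs(w) for w in group)
    # An all-zero group gets scale 1.0 to avoid dividing by zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # round() is round-half-to-even; the clamp is only a safeguard.
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in group]
    return codes, scale


def quantize_groupwise(weights, group_size=4, bits=4):
    """Split a flat weight list into groups and quantize each one separately."""
    return [
        quantize_group_symmetric(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize_groupwise(groups):
    """Reconstruct approximate float weights from (codes, scale) pairs."""
    return [q * scale for codes, scale in groups for q in codes]
```

Smaller groups track local weight ranges more closely at the cost of storing more scales.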
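
The arithmetic behind Conv2d + BatchNorm2d folding is the standard inference-time identity: each output channel's weights are multiplied by gamma / sqrt(running_var + eps), and the bias is shifted to match. The sketch below shows only that per-channel arithmetic on plain lists. The function name `fold_batchnorm` and the flat per-channel weight layout are assumptions for illustration, and the torch.fx graph rewrite itself is not shown.

```python
import math


def fold_batchnorm(conv_weight, conv_bias, gamma, beta, running_mean, running_var, eps=1e-5):
    """Fold BatchNorm statistics into conv weights and biases, per output channel.

    conv_weight is a list with one flat list of weights per output channel.
    """
    folded_weight, folded_bias = [], []
    for c, channel_weights in enumerate(conv_weight):
        # Per-channel factor that BatchNorm applies after normalization.
        s = gamma[c] / math.sqrt(running_var[c] + eps)
        folded_weight.append([w * s for w in channel_weights])
        folded_bias.append((conv_bias[c] - running_mean[c]) * s + beta[c])
    return folded_weight, folded_bias
```

The identity holds only in eval mode, where BatchNorm uses its running statistics rather than batch statistics.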

cx.analyze_sensitivity(): probes each Conv2d/Linear layer with a prune or noise perturbation, ranks layers by metric drop, and can suggest exclude_layers for layers whose drop exceeds a threshold
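
The probe-and-rank idea can be sketched generically as below. This does not use or reproduce the `cx.analyze_sensitivity()` signature, which is not given here. Every name is hypothetical, the "model" is a plain dict mapping layer names to weight lists, and `evaluate` is any caller-supplied function that returns a metric where higher is better.

```python
def prune_smallest_half(weights):
    """Example perturbation: zero the smaller-magnitude half of a layer's weights."""
    k = len(weights) // 2
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def rank_sensitivity(layers, evaluate, perturb, threshold=None):
    """Perturb one layer at a time and rank layers by the resulting metric drop."""
    baseline = evaluate(layers)
    drops = {}
    for name in layers:
        # Shallow copy so only the probed layer differs from the original.
        probed = dict(layers)
        probed[name] = perturb(layers[name])
        drops[name] = baseline - evaluate(probed)
    # Most sensitive layers (largest drop) come first.
    ranking = sorted(drops.items(), key=lambda item: item[1], reverse=True)
    exclude = [name for name, drop in ranking if threshold is not None and drop > threshold]
    return ranking, exclude
```

Layers returned in `exclude` are the ones a later compression pass would leave untouched.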

163 tests passing (up from 91).