v0.2.0
New compression techniques
- Unstructured pruning: magnitude or random, with optional gradual cubic schedule
- N:M sparsity: default 2:4, for NVIDIA Ampere sparse tensor cores
- Weight-only INT4/INT8 quantization: group-wise, symmetric or asymmetric
- Low-rank decomposition: truncated SVD for Linear layers, with rank-ratio or energy-threshold selection
- Operator fusion: Conv2d + BatchNorm2d folding via torch.fx
- Weight clustering: per-layer k-means codebook
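
For readers new to two of the items above, here is a minimal, dependency-free sketch of a cubic sparsity ramp and an N:M mask. It is an illustration only, not this library's implementation: the function names `cubic_sparsity` and `nm_mask` are hypothetical, the schedule shown is the commonly used cubic ramp (sparsity rises quickly early and flattens near the end), and the exact schedule used by this release may differ.

```python
def cubic_sparsity(step, total_steps, final_sparsity, initial_sparsity=0.0):
    """Target sparsity at `step` under a cubic ramp (assumed form, hypothetical name)."""
    # Clamp progress to [0, 1] so steps outside the schedule stay at the endpoints.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def nm_mask(weights, n=2, m=4):
    """Return a 0/1 mask keeping the n largest-magnitude values in each group of m."""
    if len(weights) % m != 0:
        raise ValueError("number of weights must be a multiple of m")
    mask = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Positions of the n largest |w| within this group of m consecutive weights.
        order = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        mask.extend(1 if i in keep else 0 for i in range(m))
    return mask


if __name__ == "__main__":
    row = [0.1, -0.9, 0.5, 0.05, 1.0, 0.0, -2.0, 0.3]
    print(nm_mask(row))                    # 2:4 pattern over a flat row of weights
    print(cubic_sparsity(50, 100, 0.8))    # target sparsity halfway through the ramp
```

With the default 2:4 setting, every group of four consecutive weights keeps exactly two nonzeros, which is the layout Ampere sparse tensor cores expect.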
New analysis tools
- cx.analyze_sensitivity(): probes each Conv2d/Linear layer with a prune or noise perturbation, ranks layers by metric drop, and can suggest exclude_layers above a threshold
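
The ranking idea behind this tool can be sketched without the library. The snippet below is a conceptual illustration, not the signature or behavior of `cx.analyze_sensitivity()`: `rank_sensitivity`, its parameters, and the toy metric values are all hypothetical stand-ins. It assumes the caller supplies an `evaluate` callable that returns a metric with one layer perturbed (or the baseline when passed `None`).

```python
def rank_sensitivity(layer_names, evaluate, threshold=None):
    """Rank layers by metric drop and list those whose drop exceeds `threshold`.

    evaluate(name) returns the metric with layer `name` perturbed;
    evaluate(None) returns the unperturbed baseline. (Hypothetical interface.)
    """
    baseline = evaluate(None)
    drops = {name: baseline - evaluate(name) for name in layer_names}
    # Most sensitive layers (largest drop) come first.
    ranked = sorted(drops.items(), key=lambda item: item[1], reverse=True)
    exclude = []
    if threshold is not None:
        exclude = [name for name, drop in ranked if drop > threshold]
    return ranked, exclude


if __name__ == "__main__":
    # Toy metric table standing in for real evaluation runs (made-up numbers).
    toy_metric = {None: 0.90, "conv1": 0.70, "conv2": 0.85, "fc": 0.89}
    ranked, exclude = rank_sensitivity(["conv1", "conv2", "fc"], toy_metric.get, threshold=0.03)
    print([name for name, _ in ranked])
    print(exclude)
```

Layers returned in `exclude` are the ones a user would typically leave uncompressed, which is the role the suggested exclude_layers list plays here.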
163 tests passing (up from 91).