v0.2.0

maskedsyntax released this 07 Apr 10:07

New compression techniques

  • Unstructured pruning: magnitude or random, with optional gradual cubic schedule
  • N:M sparsity: default 2:4, for NVIDIA Ampere sparse tensor cores
  • Weight-only INT4/INT8 quantization: group-wise, symmetric or asymmetric
  • Low-rank decomposition: truncated SVD for Linear layers, with rank-ratio or energy-threshold selection
  • Operator fusion: Conv2d + BatchNorm2d folding via torch.fx
  • Weight clustering: per-layer k-means codebook
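
The unstructured pruning item above combines a magnitude criterion with an optional gradual cubic schedule. Below is a minimal plain-Python sketch over a flat weight list. It assumes the widely used cubic ramp of Zhu and Gupta (2017), s_t = s_f + (s_i - s_f)(1 - progress)^3; whether this release uses exactly that form is an assumption. The names cubic_sparsity and magnitude_mask are hypothetical and are not this release's API. A random variant would pick the indices to zero uniformly at random instead of by |w|.

```python
def cubic_sparsity(step, start_step, end_step, initial_sparsity=0.0, final_sparsity=0.9):
    """Target sparsity at `step`, ramping from initial to final along a cubic curve."""
    if step <= start_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    # Prunes quickly early on, then flattens out as sparsity nears the final target.
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def magnitude_mask(weights, sparsity):
    """0/1 mask that removes the `sparsity` fraction of smallest-magnitude weights."""
    n_prune = int(round(sparsity * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))  # smallest |w| first
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


# Example: recompute the mask at a few points along a 100-step schedule.
weights = [0.5, -0.1, 0.03, -0.8, 0.2, 0.0, 0.7, -0.05]
for step in (0, 50, 100):
    s = cubic_sparsity(step, start_step=0, end_step=100)
    print(step, round(s, 4), magnitude_mask(weights, s))
```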
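
N:M sparsity keeps at most N nonzero weights in every group of M consecutive weights; the default 2:4 pattern is the one Ampere sparse tensor cores accelerate. The sketch below builds a 2:4 mask for one row by keeping the two largest-magnitude entries per group of four. It is an illustration only: nm_mask is a hypothetical name, and grouping along each row of the weight matrix is an assumption about how the release lays out its groups.

```python
def nm_mask(weights, n=2, m=4):
    """0/1 mask keeping the n largest-|w| entries in each consecutive group of m."""
    if len(weights) % m != 0:
        raise ValueError("row length must be a multiple of m")
    mask = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Positions within this group, ranked by magnitude; keep the top n.
        keep = set(sorted(range(m), key=lambda j: abs(group[j]), reverse=True)[:n])
        mask.extend(1 if j in keep else 0 for j in range(m))
    return mask


row = [0.9, -0.1, 0.4, 0.05, -0.3, 0.2, 0.0, -0.7]
mask = nm_mask(row)                            # 2:4 by default
pruned = [w * k for w, k in zip(row, mask)]    # exactly half the row is zeroed
```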
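
Group-wise weight-only quantization gives each small group of weights its own scale (and, in the asymmetric case, a zero point), so one outlier only degrades its own group. The following plain-Python sketch shows both variants; the function names, the restricted symmetric range [-(2^(b-1) - 1), 2^(b-1) - 1], and forcing the asymmetric range to include 0.0 are common conventions assumed here, not confirmed details of this release.

```python
def quantize_group(values, bits=4, symmetric=True):
    """Quantize one group of weights; returns (integer codes, scale, zero_point)."""
    if symmetric:
        qmax = 2 ** (bits - 1) - 1                       # 7 for INT4, 127 for INT8
        max_abs = max(abs(v) for v in values)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
        return codes, scale, 0
    qmin, qmax = 0, 2 ** bits - 1                        # 0..15 for INT4, 0..255 for INT8
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # keep 0.0 exactly representable
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = qmin - round(lo / scale)                # integer code that represents 0.0
    codes = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize_group(codes, scale, zero_point):
    """Map integer codes back to approximate float weights."""
    return [(q - zero_point) * scale for q in codes]


def quantize_weights(weights, group_size=4, bits=4, symmetric=True):
    """Split a flat weight list into groups and quantize each with its own parameters."""
    return [quantize_group(weights[i:i + group_size], bits, symmetric)
            for i in range(0, len(weights), group_size)]


weights = [0.7, -0.35, 0.1, 0.0, 2.0, -1.0, 0.5, 0.25]
for codes, scale, zp in quantize_weights(weights, group_size=4, bits=4, symmetric=False):
    print(codes, round(scale, 4), zp)
```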
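
Truncated SVD replaces a Linear weight W (out x in) with W ≈ U_k Σ_k V_k^T, i.e. two smaller Linear layers (in -> k, then k -> out), which saves parameters only when k < out·in / (out + in). The sketch below covers just the rank-selection step from a descending list of singular values (for example as returned by torch.linalg.svd). The helper names are hypothetical, and defining "energy" as the sum of squared singular values is an assumption about how the release measures it.

```python
import math


def rank_from_ratio(singular_values, rank_ratio):
    """Keep a fixed fraction of the full rank, and always at least one component."""
    return max(1, math.ceil(rank_ratio * len(singular_values)))


def rank_from_energy(singular_values, threshold):
    """Smallest k whose top-k squared singular values reach `threshold` of the total.

    Assumes singular_values is sorted in descending order, as SVD routines return it.
    """
    energies = [s * s for s in singular_values]
    target = threshold * sum(energies)
    running = 0.0
    for k, e in enumerate(energies, start=1):
        running += e
        if running >= target:
            return k
    return len(energies)


def factorized_params(out_features, in_features, k):
    """Weight count of the two factors (in -> k and k -> out) that replace W."""
    return k * (in_features + out_features)


sv = [4.0, 2.0, 1.0, 1.0]
k = rank_from_energy(sv, threshold=0.9)
```

For a 512 x 512 layer at k = 64, the two factors hold 65,536 weights versus 262,144 in the dense matrix.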
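
Conv2d + BatchNorm2d folding works because, in eval mode, BatchNorm computes y = γ(z - μ)/√(σ² + ε) + β on the conv output z, which is affine per output channel. Scaling each output channel's kernel by γ/√(σ² + ε) and setting the bias to (b - μ)·γ/√(σ² + ε) + β gives the same output from a single conv. The sketch below shows only that arithmetic on plain lists; locating Conv2d -> BatchNorm2d pairs with torch.fx is not shown, and fold_conv_bn is a hypothetical name.

```python
import math


def fold_conv_bn(weight, bias, gamma, beta, running_mean, running_var, eps=1e-5):
    """Fold eval-mode BatchNorm2d parameters into the preceding conv, per output channel.

    weight: one flat list of kernel weights per output channel
    bias:   per-channel conv bias, or None if the conv was built with bias=False
    """
    out_channels = len(weight)
    if bias is None:
        bias = [0.0] * out_channels
    new_weight, new_bias = [], []
    for c in range(out_channels):
        scale = gamma[c] / math.sqrt(running_var[c] + eps)
        new_weight.append([w * scale for w in weight[c]])
        new_bias.append((bias[c] - running_mean[c]) * scale + beta[c])
    return new_weight, new_bias


# Check on one input patch: conv followed by BN vs. the folded conv.
x = [0.3, -0.7]
w, b = [[1.0, 2.0]], [0.1]
g, beta, mu, var = [2.0], [0.3], [0.5], [4.0]
z = sum(wi * xi for wi, xi in zip(w[0], x)) + b[0]
bn_out = g[0] * (z - mu[0]) / math.sqrt(var[0] + 1e-5) + beta[0]
fw, fb = fold_conv_bn(w, b, g, beta, mu, var)
folded_out = sum(wi * xi for wi, xi in zip(fw[0], x)) + fb[0]
assert abs(bn_out - folded_out) < 1e-9
```

Folding uses the running statistics, so it is only valid for inference; in training mode BatchNorm normalizes with per-batch statistics instead.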
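
Per-layer weight clustering replaces every weight in a layer with one of k shared values, so storage becomes a log2(k)-bit index per weight plus a small codebook. Below is a one-dimensional k-means sketch in plain Python; the linear initialization across [min, max] is one common choice and an assumption here, as is the name kmeans_codebook.

```python
def kmeans_codebook(weights, k=4, iters=20):
    """Cluster scalar weights into k shared values; returns (codebook, assignments)."""
    lo, hi = min(weights), max(weights)
    # Linear initialization: k centroids spread evenly across [min, max].
    codebook = [lo + (hi - lo) * j / (k - 1) for j in range(k)] if k > 1 else [lo]

    def assign():
        # Assignment step: index of the nearest centroid for each weight.
        return [min(range(k), key=lambda j: abs(w - codebook[j])) for w in weights]

    for _ in range(iters):
        assignments = assign()
        # Update step: move each centroid to the mean of the weights assigned to it.
        for j in range(k):
            members = [w for w, a in zip(weights, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                codebook[j] = sum(members) / len(members)
    return codebook, assign()


layer = [-1.0, -0.9, 0.0, 0.1, 1.0, 1.1]
codebook, idx = kmeans_codebook(layer, k=3)
clustered = [codebook[i] for i in idx]   # weights rebuilt from the shared values
```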

New analysis tools

  • cx.analyze_sensitivity(): probes each Conv2d/Linear layer with a prune or noise perturbation, ranks layers by the resulting metric drop, and can suggest exclude_layers (the layers whose drop exceeds a threshold)
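
The probing loop described above can be sketched in plain Python: perturb one layer at a time, re-evaluate, record the metric drop, rank layers, and flag those above a threshold. This is not the signature of cx.analyze_sensitivity(); sensitivity_sketch, the dict-of-lists model, and the evaluate callback are hypothetical stand-ins, and the default prune amount and noise level are made up for illustration.

```python
import random


def prune_perturb(weights, amount):
    """Copy of `weights` with the `amount` fraction of smallest-|w| entries zeroed."""
    n = int(round(amount * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    out = list(weights)
    for i in order[:n]:
        out[i] = 0.0
    return out


def noise_perturb(weights, std, rng):
    """Copy of `weights` with Gaussian noise of standard deviation `std` added."""
    return [w + rng.gauss(0.0, std) for w in weights]


def sensitivity_sketch(layers, evaluate, mode="prune", amount=0.5, std=0.01,
                       threshold=None, seed=0):
    """Perturb one layer at a time; returns (ranking, drops, suggested_excludes)."""
    rng = random.Random(seed)
    baseline = evaluate(layers)
    drops = {}
    for name, weights in layers.items():
        if mode == "prune":
            perturbed = prune_perturb(weights, amount)
        else:
            perturbed = noise_perturb(weights, std, rng)
        trial = dict(layers)   # shallow copy: every other layer is left untouched
        trial[name] = perturbed
        drops[name] = baseline - evaluate(trial)
    ranking = sorted(drops, key=drops.get, reverse=True)   # most sensitive first
    excludes = [] if threshold is None else [n for n in ranking if drops[n] > threshold]
    return ranking, drops, excludes


# Toy usage: the "metric" here is just total absolute weight mass.
def metric(ls):
    return sum(abs(w) for ws in ls.values() for w in ws)


layers = {"conv1": [1.0, 0.1], "fc": [0.2, 0.05]}
ranking, drops, excludes = sensitivity_sketch(layers, metric, threshold=0.07)
print(ranking, excludes)   # ['conv1', 'fc'] ['conv1']
```

In a real run, evaluate would compute a validation metric such as accuracy, and the suggested list could then be passed as exclude_layers to a compression call.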

163 tests passing (up from 91).