
feat: add ASVD (activation-aware SVD) to FC_Decomposer #38

Merged

nathanhubens merged 14 commits into master from feature/asvd on Apr 13, 2026

Conversation

@nathanhubens
Collaborator

Summary

Phase 3 of misc module revamp. Adds activation-aware SVD (ASVD) from Yuan et al. 2024.

What it does

Standard SVD treats all input channels equally. ASVD weights channels by their actual activation magnitude before decomposing — channels the model uses heavily are harder to truncate.

Usage

# Standard SVD (existing behavior)
FC_Decomposer().decompose(model, percent_removed=0.5)

# ASVD: pass calibration data for better decomposition
FC_Decomposer().decompose(model, percent_removed=0.5, data=[calibration_batch])

Algorithm

  1. Hook all Linear layers, run calibration data, collect per-channel RMS
  2. Scale weight columns: W_scaled = W * diag(rms + eps)
  3. SVD on W_scaled, truncate to rank k
  4. Undo scaling in first decomposed layer: W2 = W2 / diag(rms + eps)

The scaling cancels out exactly — only the truncation decision changes.
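
For concreteness, here is a minimal sketch of those four steps on a single nn.Linear (illustrative only; asvd_decompose, calib_batches, and the eps handling are hypothetical names, not the PR's actual code):

import torch
import torch.nn as nn

def asvd_decompose(layer: nn.Linear, calib_batches, rank: int, eps: float = 1e-6):
    # 1. Per-channel input RMS, collected with a forward hook
    sq_sum = torch.zeros(layer.in_features, device=layer.weight.device)
    count = 0
    def hook(_, inputs, __):
        nonlocal sq_sum, count
        x = inputs[0].reshape(-1, layer.in_features)
        sq_sum += x.pow(2).sum(dim=0)
        count += x.shape[0]
    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        for xb in calib_batches:
            layer(xb)
    handle.remove()
    rms = (sq_sum / max(count, 1)).sqrt() + eps

    # 2. Scale weight columns: W_scaled = W @ diag(rms)
    W_scaled = layer.weight.data * rms            # rms broadcasts over columns

    # 3. SVD on the scaled weight, truncated to rank k
    U, S, Vh = torch.linalg.svd(W_scaled, full_matrices=False)
    U, S, Vh = U[:, :rank], S[:rank], Vh[:rank]

    # 4. Undo the scaling in the input-side factor: W2 = W2 @ diag(rms)^-1
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = Vh / rms
    second.weight.data = U * S
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)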

Also includes (from PR #37)

  • energy_threshold for automatic rank selection
  • layers/exclude for per-layer control
  • _collect_activation_rms helper (reusable by Conv_Decomposer)

Test plan

  • ASVD produces finite outputs
  • data=None matches standard SVD behavior
  • All existing tests pass unchanged
  • Full nbdev-test suite passes

Reference: ASVD4LLM

Phase 1 of misc module revamp:
- Remove unused `import F` from bn_folding and fc_decomposer
- Replace cpu_optimizer with optimize_for_cpu (torch.compile backend)
  - Old accelerate_model_for_cpu deprecated with shim
  - Fixed bug: torch.jit.script doesn't use example_input (was dead param)
  - Removed dependency on deprecated optimize_for_mobile
  - Added tests (was skip_exec with zero coverage)
- Add conv_decomposer.ipynb and cpu_optimizer.ipynb to _quarto.yml sidebar
- Add cpu_optimizer to misc/all.py exports
- Fix rank_ratio → percent_removed doc bug in fc_decomposer tutorial

Phase 2 of misc module revamp:

FC_Decomposer + Conv_Decomposer:
- Add energy_threshold: auto rank selection via singular value energy
  retention (e.g., 0.99 keeps 99% of energy; see the sketch below).
  Mutually exclusive with percent_removed.
- Add layers/exclude: per-layer control using exact layer names
  (matching Sparsifier dict-based pattern, not regex)
- Shared helpers: _rank_from_energy, _should_decompose
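
A plausible sketch of the energy rule behind _rank_from_energy (an assumed form; the actual helper may differ):

import torch

def rank_from_energy(S: torch.Tensor, energy_threshold: float) -> int:
    # Smallest k such that sum(S[:k]^2) / sum(S^2) >= energy_threshold
    energy = S.pow(2).cumsum(0) / S.pow(2).sum()
    k = int((energy < energy_threshold).sum().item()) + 1
    return min(k, S.numel())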

Conv_Decomposer:
- Expose n_iter (default 10, was hardcoded 5) and tol (1e-4) for HOOI
- Early stopping: HOOI exits when factor matrices converge within tol

Traversal refactored from recursive _modules to named_modules() +
parent replacement (cleaner, handles nested modules correctly).
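
That traversal pattern looks roughly like this (illustrative sketch, not the PR's exact code):

import torch.nn as nn

def replace_linears(model: nn.Module, should_decompose, decompose):
    # Snapshot names first: mutating while iterating named_modules() is unsafe
    targets = [name for name, m in model.named_modules()
               if isinstance(m, nn.Linear) and should_decompose(name)]
    for name in targets:
        parent_name, _, child_name = name.rpartition('.')
        parent = model.get_submodule(parent_name) if parent_name else model
        setattr(parent, child_name, decompose(getattr(parent, child_name)))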

All backward compatible — new params have defaults matching old behavior.

Pass calibration data to get a better decomposition — channels with
higher activations are prioritized during SVD truncation.

Algorithm (from Yuan et al., 2024):
1. Collect per-channel activation RMS via forward hooks
2. Scale weight columns: W_scaled = W * diag(rms)
3. SVD on W_scaled → truncate to rank k
4. Undo scaling: W2 = W2 / diag(rms)

The scaling cancels out exactly — only the truncation decision changes.
Backward compatible: data=None gives standard SVD.

Usage:
  FC_Decomposer().decompose(model, 0.5, data=[calibration_batch])

Conv_Decomposer now supports two methods:
- method='tucker' (default): 3 layers — pointwise compress + spatial + pointwise expand
- method='svd' (new): 2 layers — spatial at reduced rank + pointwise expand

SVD reshapes the 4D weight to (C_out, C_in*K*K), applies standard SVD,
then splits into a spatial conv (C_in → R) and pointwise conv (R → C_out).
Simpler, with less overhead, and better when moderate compression is enough.

Usage:
  Conv_Decomposer().decompose(model, 0.5, method='svd')    # 2 layers
  Conv_Decomposer().decompose(model, 0.5, method='tucker')  # 3 layers (default)
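
A self-contained sketch of that 2-layer split (svd_decompose_conv is a hypothetical helper; the library's implementation may handle groups, dilation, etc. differently):

import torch
import torch.nn as nn

def svd_decompose_conv(conv: nn.Conv2d, rank: int):
    # Reshape the 4D weight (C_out, C_in, K, K) to a matrix (C_out, C_in*K*K)
    C_out, C_in, Kh, Kw = conv.weight.shape
    W = conv.weight.data.reshape(C_out, -1)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U, S, Vh = U[:, :rank], S[:rank], Vh[:rank]

    # Spatial conv at reduced rank: C_in -> R with the original K×K kernel
    spatial = nn.Conv2d(C_in, rank, (Kh, Kw), stride=conv.stride,
                        padding=conv.padding, bias=False)
    spatial.weight.data = Vh.reshape(rank, C_in, Kh, Kw)

    # Pointwise conv to expand: R -> C_out with a 1×1 kernel
    pointwise = nn.Conv2d(rank, C_out, 1, bias=conv.bias is not None)
    pointwise.weight.data = (U * S).reshape(C_out, rank, 1, 1)
    if conv.bias is not None:
        pointwise.bias.data = conv.bias.data.clone()
    return nn.Sequential(spatial, pointwise)
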
Conv_Decomposer now supports 4 decomposition methods:

| Method    | Layers | Decomposes           | Structure                      |
|-----------|--------|----------------------|--------------------------------|
| 'tucker'  | 3      | Both channels        | 1×1 + K×K + 1×1                |
| 'svd'     | 2      | Output channels      | K×K + 1×1                      |
| 'spatial' | 2      | Kernel (K×K→K×1+1×K) | K×1 + 1×K (grouped)            |
| 'cp'      | 4      | Everything           | 1×1 + K×1(dw) + 1×K(dw) + 1×1  |

Usage:
  Conv_Decomposer().decompose(model, 0.5, method='spatial')  # K×K → K×1 + 1×K
  Conv_Decomposer().decompose(model, 0.5, method='cp')       # max compression

- Replace _unfold with _mode_unfold using rearrange (clearer intent)
- Spatial: vectorize with batched SVD (no more O(C_out×C_in) loop; see
  the sketch after this list)
- CP: vectorize spatial decomposition with batched SVD
- Tucker/SVD: use rearrange for weight reshaping (replaces unsqueeze chains)
- All methods: cleaner, faster on GPU, same results
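
For example, the spatial rank-1 split can be batched like this (a sketch; names are illustrative):

import torch
from einops import rearrange

def spatial_rank1_factors(weight: torch.Tensor):
    # weight: (C_out, C_in, K, K); torch.linalg.svd batches over the leading
    # dimension, replacing a per-(c_out, c_in) Python loop
    kernels = rearrange(weight, 'o i kh kw -> (o i) kh kw')
    U, S, Vh = torch.linalg.svd(kernels, full_matrices=False)
    s = S[:, 0].sqrt()
    vertical = U[:, :, 0] * s[:, None]     # K×1 factor per kernel
    horizontal = Vh[:, 0, :] * s[:, None]  # 1×K factor per kernel
    return vertical, horizontal
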
…work docs

- Tucker: pass activation RMS as input_scale to HOOI — weights mode-1
  unfolding by activation statistics (distribution-aware Tucker)
- SVD: scale input channels by activation RMS before SVD, undo after
  (same ASVD pattern as FC_Decomposer)
- Usage: Conv_Decomposer().decompose(model, 0.5, data=[batch])
- Backward compatible: data=None = standard decomposition
- Document future work: LayerNorm_Folder, NuclearNormCallback,
  latency-aware rank selection

Activation-aware Tucker/SVD increases raw reconstruction error on small
CNNs (15% accuracy drop on Pets/ResNet-18). The 4D tensor structure
makes exact scale/unscale (which works for FC's 2D SVD) incorrect —
the weighted HOOI optimizes a different objective than standard HOOI,
and projecting the original weight onto the scaled factors introduces
error.

Keep ASVD only in FC_Decomposer (2D SVD, exact scale/unscale).
Document Conv activation-aware as future work pending reference impl.

Strip execution metadata to pass CI's clean-checkout check.

In CI (Python 3.12), torch.jit.trace may emit additional warnings.
Filter for DeprecationWarning specifically instead of asserting an exact count.
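
The test-side pattern, roughly (a sketch; the actual test may differ):

import warnings

def deprecation_count(fn) -> int:
    # Record everything, then count only DeprecationWarning, so extra
    # warnings from torch.jit.trace on newer Pythons don't break the test
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        fn()
    return sum(issubclass(w.category, DeprecationWarning) for w in caught)
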
nathanhubens merged commit f1087d4 into master on Apr 13, 2026
2 checks passed