feat: add ASVD (activation-aware SVD) to FC_Decomposer (#38)
Merged
Phase 1 of misc module revamp:

- Remove unused `import F` from bn_folding and fc_decomposer
- Replace cpu_optimizer with optimize_for_cpu (torch.compile backend)
- Deprecate old accelerate_model_for_cpu with a shim
- Fix bug: torch.jit.script doesn't use example_input (was a dead param)
- Remove dependency on deprecated optimize_for_mobile
- Add tests (was skip_exec with zero coverage)
- Add conv_decomposer.ipynb and cpu_optimizer.ipynb to _quarto.yml sidebar
- Add cpu_optimizer to misc/all.py exports
- Fix rank_ratio → percent_removed doc bug in fc_decomposer tutorial
Phase 2 of misc module revamp.

FC_Decomposer + Conv_Decomposer:

- Add energy_threshold: auto rank selection via singular value energy retention (e.g., 0.99 keeps 99% of energy). Mutually exclusive with percent_removed.
- Add layers/exclude: per-layer control using exact layer names (matching Sparsifier's dict-based pattern, not regex)
- Shared helpers: _rank_from_energy, _should_decompose

Conv_Decomposer:

- Expose n_iter (default 10, was hardcoded 5) and tol (1e-4) for HOOI
- Early stopping: HOOI exits when factor matrices converge within tol

Traversal refactored from recursive _modules to named_modules() + parent replacement (cleaner, handles nested modules correctly). All backward compatible — new params have defaults matching old behavior.
Pass calibration data to get better decomposition — channels with higher activations are prioritized during SVD truncation.

Algorithm (from Yuan et al., 2024):

1. Collect per-channel activation RMS via forward hooks
2. Scale weight columns: W_scaled = W * diag(rms)
3. SVD on W_scaled → truncate to rank k
4. Undo scaling: W2 = W2 / diag(rms)

The scaling cancels out exactly — only the truncation decision changes. Backward compatible: data=None gives standard SVD.

Usage: FC_Decomposer().decompose(model, 0.5, data=[calibration_batch])
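The four steps above can be sketched in plain PyTorch. This is a minimal illustration, not the repo's code; the `eps` stabilizer and the exact factor layout are assumptions:

```python
import torch

def asvd_factors(W: torch.Tensor, rms: torch.Tensor, k: int):
    """Activation-aware SVD truncation (sketch, not the repo's exact code).
    W is (out_features, in_features); rms is the per-input-channel
    activation RMS, length in_features."""
    eps = 1e-8                            # assumed stabilizer for tiny RMS values
    s = rms + eps
    # 2. Scale weight columns by activation RMS (broadcasts over columns)
    W_scaled = W * s
    # 3. SVD on the scaled weight, truncate to rank k
    U, S, Vh = torch.linalg.svd(W_scaled, full_matrices=False)
    W1 = U[:, :k] * S[:k]                 # (out, k)
    W2 = Vh[:k, :]                        # (k, in), still in scaled space
    # 4. Undo the scaling on the input side
    W2 = W2 / s
    return W1, W2                         # W1 @ W2 approximates W
```

At full rank the scaling cancels exactly and `W1 @ W2` reproduces `W`; only when `k` is below full rank does the activation weighting change which subspace is kept.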
Conv_Decomposer now supports two methods:

- method='tucker' (default): 3 layers — pointwise compress + spatial + pointwise expand
- method='svd' (new): 2 layers — spatial at reduced rank + pointwise expand

SVD reshapes the 4D weight to (C_out, C_in*K*K), applies standard SVD, then splits into a spatial conv (C_in → R) and pointwise conv (R → C_out). Simpler, less overhead, better when moderate compression is enough.

Usage:

Conv_Decomposer().decompose(model, 0.5, method='svd')     # 2 layers
Conv_Decomposer().decompose(model, 0.5, method='tucker')  # 3 layers (default)
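The reshape-then-split step can be sketched as below. This is an illustration under simplifying assumptions (no dilation or groups handling; bias kept on the pointwise layer), not the repo's implementation:

```python
import torch
import torch.nn as nn

def conv_svd_decompose(conv: nn.Conv2d, rank: int) -> nn.Sequential:
    """Sketch of the 2-layer 'svd' method: reshape the 4D weight to
    (C_out, C_in*K*K), SVD-truncate, then split into a spatial conv
    (C_in -> rank) and a pointwise conv (rank -> C_out)."""
    C_out, C_in, Kh, Kw = conv.weight.shape
    W2d = conv.weight.reshape(C_out, C_in * Kh * Kw)
    U, S, Vh = torch.linalg.svd(W2d, full_matrices=False)
    # Spatial conv carries Vh (rank x C_in*Kh*Kw), reshaped back to 4D;
    # it inherits the original stride and padding
    spatial = nn.Conv2d(C_in, rank, (Kh, Kw), stride=conv.stride,
                        padding=conv.padding, bias=False)
    spatial.weight.data = Vh[:rank].reshape(rank, C_in, Kh, Kw)
    # Pointwise conv carries U*S (C_out x rank) and the original bias
    pointwise = nn.Conv2d(rank, C_out, 1, bias=conv.bias is not None)
    pointwise.weight.data = (U[:, :rank] * S[:rank]).reshape(C_out, rank, 1, 1)
    if conv.bias is not None:
        pointwise.bias.data = conv.bias.data
    return nn.Sequential(spatial, pointwise)
```

At full rank (R = min(C_out, C_in*K*K)) the two-layer stack is numerically equivalent to the original conv, which makes the routine easy to sanity-check before compressing.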
Conv_Decomposer now supports 4 decomposition methods:

| Method    | Layers | Decomposes           | Structure                      |
|-----------|--------|----------------------|--------------------------------|
| 'tucker'  | 3      | Both channels        | 1×1 + K×K + 1×1                |
| 'svd'     | 2      | Output channels      | K×K + 1×1                      |
| 'spatial' | 2      | Kernel (K×K→K×1+1×K) | K×1 + 1×K (grouped)            |
| 'cp'      | 4      | Everything           | 1×1 + K×1(dw) + 1×K(dw) + 1×1  |

Usage:

Conv_Decomposer().decompose(model, 0.5, method='spatial')  # K×K → K×1 + 1×K
Conv_Decomposer().decompose(model, 0.5, method='cp')       # max compression
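The 'spatial' idea, applied to a single K×K kernel, can be sketched with an SVD of the kernel itself. The repo applies this per channel via grouped convs; those details are assumed here:

```python
import torch

def separate_kernel(kernel: torch.Tensor, rank: int = 1):
    """Approximate one K×K kernel as a sum of `rank` separable
    K×1 and 1×K filters via truncated SVD (illustrative sketch)."""
    U, S, Vh = torch.linalg.svd(kernel)                     # kernel is (K, K)
    # Split sqrt(S) between the two factors so neither blows up
    vertical = U[:, :rank] * S[:rank].sqrt()                # (K, rank): K×1 filters
    horizontal = Vh[:rank] * S[:rank].sqrt().unsqueeze(1)   # (rank, K): 1×K filters
    return vertical, horizontal                             # vertical @ horizontal ≈ kernel
```

A genuinely separable kernel (e.g. a Sobel filter, which is an outer product) is recovered exactly at rank 1; generic kernels lose only the energy in the discarded singular values.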
- Replace _unfold with _mode_unfold using rearrange (clearer intent)
- Spatial: vectorize with batched SVD (no more O(C_out×C_in) loop)
- CP: vectorize spatial decomposition with batched SVD
- Tucker/SVD: use rearrange for weight reshaping (replaces unsqueeze chains)
- All methods: cleaner, faster on GPU, same results
…work docs

- Tucker: pass activation RMS as input_scale to HOOI — weights the mode-1 unfolding by activation statistics (distribution-aware Tucker)
- SVD: scale input channels by activation RMS before SVD, undo after (same ASVD pattern as FC_Decomposer)
- Usage: Conv_Decomposer().decompose(model, 0.5, data=[batch])
- Backward compatible: data=None = standard decomposition
- Document future work: LayerNorm_Folder, NuclearNormCallback, latency-aware rank selection
Activation-aware Tucker/SVD increases raw reconstruction error on small CNNs (15% accuracy drop on Pets/ResNet-18). The 4D tensor structure makes exact scale/unscale (which works for FC's 2D SVD) incorrect — the weighted HOOI optimizes a different objective than standard HOOI, and projecting the original weight onto the scaled factors introduces error. Keep ASVD only in FC_Decomposer (2D SVD, exact scale/unscale). Document Conv activation-aware as future work pending reference impl.
This was referenced Apr 13, 2026
Strip execution metadata to pass CI's clean-checkout check.
In CI (Python 3.12), torch.jit.trace may emit additional warnings. Filter for DeprecationWarning specifically instead of asserting exact count.
Summary
Phase 3 of misc module revamp. Adds activation-aware SVD (ASVD) from Yuan et al. 2024.
What it does
Standard SVD treats all input channels equally. ASVD weights channels by their actual activation magnitude before decomposing — channels the model uses heavily are harder to truncate.
Usage

FC_Decomposer().decompose(model, 0.5, data=[calibration_batch])
Algorithm
1. Scale weight columns: W_scaled = W * diag(rms + eps)
2. SVD on W_scaled, truncate to rank k
3. Unscale: W2 = W2 / diag(rms + eps)

The scaling cancels out exactly — only the truncation decision changes.
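A tiny numeric illustration of that last claim, with toy sizes and a made-up RMS vector (channel 0 standing in for a heavily-activated channel):

```python
import torch

torch.manual_seed(0)
W = torch.randn(8, 8)
rms = torch.tensor([10., 1., 1., 1., 1., 1., 1., 1.])  # channel 0 dominates

def truncate(W, k):
    """Standard rank-k SVD truncation."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * S[:k]) @ Vh[:k]

plain = truncate(W, 4)                 # plain SVD at rank 4
scaled = truncate(W * rms, 4) / rms    # scale, truncate, unscale (ASVD)

# ASVD reconstructs the heavily-activated column much more faithfully,
# at the cost of the lightly-activated ones
err_plain = (plain[:, 0] - W[:, 0]).norm()
err_asvd = (scaled[:, 0] - W[:, 0]).norm()
```

At full rank (`k = 8`) the scale/unscale round-trip returns `W` exactly, so passing `data` changes nothing unless truncation actually occurs.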
Also includes (from PR #37)
- energy_threshold for automatic rank selection
- layers/exclude for per-layer control
- _collect_activation_rms helper (reusable by Conv_Decomposer)

Test plan
- data=None matches standard SVD behavior
- nbdev testsuite passes

Reference: ASVD4LLM