## Overview

**FC Layer Decomposition** uses Singular Value Decomposition (SVD) to factorize large fully-connected layers into smaller, more efficient layers. This is particularly effective for models with large FC layers like VGG.

### How It Works

A weight matrix $W \in \mathbb{R}^{m \times n}$ is decomposed as:
$$W \approx U \cdot S \cdot V^T$$

By keeping only the top-$k$ singular values, we replace one large layer with two smaller layers:
- Original: `Linear(n → m)` with $m \times n$ parameters
- Decomposed: `Linear(n → k)` + `Linear(k → m)` with $k \times (m + n)$ parameters

The factorization saves parameters whenever $k < \frac{mn}{m + n}$. When $k \ll \min(m, n)$, the reduction is substantial.
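To make the factorization concrete, here is a minimal PyTorch sketch of splitting a single `nn.Linear` with a truncated SVD. It is an illustration, not `FC_Decomposer`'s actual implementation: the helper name `decompose_linear` is hypothetical, and the rank rule `k = int(rank_ratio * min(m, n))` is an assumption.

```python
import torch
import torch.nn as nn

def decompose_linear(layer: nn.Linear, rank_ratio: float = 0.5) -> nn.Sequential:
    """Split one nn.Linear into two via truncated SVD (illustrative only)."""
    W = layer.weight.detach()                  # shape (m, n) = (out, in)
    m, n = W.shape
    k = max(1, int(rank_ratio * min(m, n)))    # assumed rank rule
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)  # W = U @ diag(S) @ Vh

    first = nn.Linear(n, k, bias=False)                    # holds S_k V_k^T
    second = nn.Linear(k, m, bias=layer.bias is not None)  # holds U_k and the original bias
    with torch.no_grad():
        first.weight.copy_(S[:k].unsqueeze(1) * Vh[:k])    # (k, n)
        second.weight.copy_(U[:, :k])                      # (m, k)
        if layer.bias is not None:
            second.bias.copy_(layer.bias)
    return nn.Sequential(first, second)

torch.manual_seed(0)
layer = nn.Linear(64, 32)
x = torch.randn(8, 64)
full = decompose_linear(layer, rank_ratio=1.0)   # keeps every singular value
half = decompose_linear(layer, rank_ratio=0.5)   # keeps the top 16 of 32

with torch.no_grad():
    err_full = (layer(x) - full(x)).abs().max().item()
    err_half = (layer(x) - half(x)).abs().max().item()
print(f"max abs error, full rank: {err_full:.2e}; half rank: {err_half:.2e}")
```

At full rank the two-layer version reproduces the original output up to floating-point error. Dropping singular values introduces an approximation error that grows as $k$ shrinks.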

### When to Use FC Decomposition

| Model Type | FC Layer Size | Recommendation |
|------------|---------------|----------------|
| VGG-style | Very large (25088×4096, 4096×4096) | ✅ **Highly effective** |
| ResNet-style | Small (512×classes) | ❌ Not needed |
| Transformers | Medium (hidden × 4·hidden) | ⚠️ May help |

**Best for:** Models where FC layers dominate the parameter count (e.g., VGG has ~90% of parameters in FC layers).
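Whether a given layer benefits can be estimated from its shape alone. The sketch below computes the break-even rank $\frac{mn}{m+n}$ implied by the parameter counts above (weights only, biases ignored). The helper name `breakeven_rank` is hypothetical, and the ResNet head size assumes 1000 output classes.

```python
def breakeven_rank(m: int, n: int) -> float:
    """Rank below which k * (m + n) is smaller than m * n (weights only)."""
    return m * n / (m + n)

layers = {
    "VGG first FC (25088 -> 4096)": (4096, 25088),
    "VGG second FC (4096 -> 4096)": (4096, 4096),
    "ResNet head (512 -> 1000, assumed)": (1000, 512),
}

for name, (m, n) in layers.items():
    k_max = breakeven_rank(m, n)
    share = k_max / min(m, n)
    print(f"{name}: saves parameters only if k < {k_max:.0f} "
          f"({share:.0%} of min(m, n))")
```

Strongly rectangular layers tolerate a higher rank before the factorization stops paying off, while a square layer breaks even at exactly half of its full rank.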

In [None]:
#| include: false
from fastai.vision.all import *
from fasterai.misc.all import *

import torch
import torch.nn as nn
import torch.nn.functional as F

## 1. Setup and Data

In [None]:
path = untar_data(URLs.PETS)
files = get_image_files(path/"images")

def label_func(f): return f[0].isupper()

dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(64))

## 2. Train the Model

We use VGG16 with batch normalization, a model with very large FC layers:

In [None]:
learn = Learner(dls, vgg16_bn(num_classes=2), metrics=accuracy)
learn.fit_one_cycle(5, 1e-5)

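| epoch | train_loss | valid_loss | accuracy | time |
|-------|------------|------------|----------|------|
| 0 | 0.696439 | 0.601888 | 0.686739 | 00:03 |
| 1 | 0.654363 | 0.559391 | 0.701624 | 00:03 |
| 2 | 0.624809 | 0.566203 | 0.697564 | 00:03 |
| 3 | 0.592064 | 0.534797 | 0.730717 | 00:03 |
| 4 | 0.591283 | 0.531738 | 0.736130 | 00:03 |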


## 3. Apply FC Decomposition

Use `FC_Decomposer` to factorize the fully-connected layers:

In [None]:
#| include: false
def count_parameters(model):
    return sum(p.numel() for p in model.parameters())

In [None]:
fc = FC_Decomposer()
new_model = fc.decompose(learn.model)

Notice how each FC layer is now replaced by a `Sequential` of two smaller layers. For example:
- Original: `Linear(25088 → 4096)` = 25088 × 4096 ≈ 102.8M weights
- Decomposed: `Linear(25088 → 2048)` + `Linear(2048 → 4096)` = 2048 × (25088 + 4096) ≈ 59.8M weights

In [None]:
new_model

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace=True)
    (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (7): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (9): ReLU(inplace=True)
    (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (12): ReLU(inplace=True)
    (13): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (14): Conv2d(128, 256

## 4. Compare Results

### Parameter Reduction

In [None]:
count_parameters(learn.model)

134277186

In [None]:
count_parameters(new_model)

91281476

A reduction of **~43 million parameters** (~32% smaller)!
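The reported difference can be cross-checked by hand. The sketch below assumes that each layer keeps `k = int(rank_ratio * min(n_in, n_out))` singular values and that the number of bias terms is unchanged. Both are assumptions about `FC_Decomposer`, not confirmed behaviour, and `weights_saved` is a hypothetical helper.

```python
def weights_saved(n_in: int, n_out: int, rank_ratio: float = 0.5) -> int:
    """Weights removed by replacing Linear(n_in -> n_out) with a rank-k pair."""
    k = max(1, int(rank_ratio * min(n_in, n_out)))   # assumed rank rule
    return n_in * n_out - k * (n_in + n_out)

# The three Linear layers of the VGG16 classifier with num_classes=2
classifier = [(25088, 4096), (4096, 4096), (4096, 2)]

per_layer = [weights_saved(n_in, n_out) for n_in, n_out in classifier]
saved = sum(per_layer)
reported = 134277186 - 91281476   # the two counts shown above

print(per_layer, saved, reported)
```

Under these assumptions the per-layer savings are 42,991,616, 0 and 4,094 weights, which sum to the 42,995,710 difference between the two counts. Almost all of the reduction comes from the first FC layer; the square 4096×4096 layer sits exactly at its break-even rank.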

### Accuracy Trade-off

SVD decomposition is an **approximation**, so some accuracy loss is expected. The accuracy depends on how many singular values are retained:

In [None]:
new_learn = Learner(dls, new_model, metrics=accuracy)
new_learn.validate()

[0.5516967177391052, 0.7050067782402039]

Accuracy drops from ~73.6% (the last training epoch above) to ~70.5% right after decomposition. To recover the lost accuracy, **fine-tune** the decomposed model:

```python
new_learn = Learner(dls, new_model, metrics=accuracy)
new_learn.fit_one_cycle(5, 1e-4)  # Fine-tune with small learning rate
```

## 5. Parameter Reference

### FC_Decomposer Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `rank_ratio` | `0.5` | Fraction of singular values to keep (0-1). Lower = more compression, more accuracy loss |

### Choosing rank_ratio

| rank_ratio | Compression | Accuracy Impact |
|------------|-------------|-----------------|
| `0.8` | Low | Minimal |
| `0.5` | Medium | Moderate |
| `0.25` | High | Significant (requires fine-tuning) |
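How much a given `rank_ratio` actually compresses depends on the layer's shape. The sketch below estimates the fraction of weights that remain for the two large VGG16 FC layers, again assuming the rank is `int(rank_ratio * min(n_in, n_out))`. That rule and the helper `decomposed_fraction` are illustrative assumptions.

```python
def decomposed_fraction(n_in: int, n_out: int, rank_ratio: float) -> float:
    """Decomposed weight count divided by the original weight count."""
    k = max(1, int(rank_ratio * min(n_in, n_out)))   # assumed rank rule
    return k * (n_in + n_out) / (n_in * n_out)

for ratio in (0.8, 0.5, 0.25):
    wide = decomposed_fraction(25088, 4096, ratio)
    square = decomposed_fraction(4096, 4096, ratio)
    print(f"rank_ratio={ratio}: 25088x4096 keeps {wide:.0%} of its weights, "
          f"4096x4096 keeps {square:.0%}")
```

A value above 100% means the factorized pair holds more weights than the original layer. Under the assumed rule this happens for a square layer whenever `rank_ratio` exceeds 0.5, so high ratios mainly pay off on strongly rectangular layers.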

## Summary

| Metric | Original VGG16 | Decomposed | Change |
|--------|----------------|------------|--------|
| Parameters | 134M | 91M | **-32%** |
| FC Layer Params | ~120M | ~77M | **-36%** |
| Accuracy (before fine-tune) | 73.6% | 70.5% | Needs fine-tuning |

### Recommended Workflow

```python
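from fastai.vision.all import *
from fasterai.misc.all import *

# 1. Train model (assumes `dls` from the Setup and Data section)
learn = Learner(dls, vgg16_bn(num_classes=2), metrics=accuracy)
learn.fit_one_cycle(5)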

# 2. Decompose FC layers
fc = FC_Decomposer(rank_ratio=0.5)
new_model = fc.decompose(learn.model)

# 3. Fine-tune to recover accuracy
new_learn = Learner(dls, new_model, metrics=accuracy)
new_learn.fit_one_cycle(3, 1e-4)

# 4. (Optional) Apply other compressions
# - Pruning, sparsification, quantization
```

---

## See Also

- [BN Folding](bn_folding.html) - Combine with BN folding for more optimization
- [Pruner](../../prune/pruner.html) - Apply structured pruning after decomposition
- [ONNX Export](../export/onnx_export.html) - Export optimized model for deployment
- [Sparsifier](../../sparse/sparsifier.html) - Add sparsity for further compression