
Spectral Compact Training (SCT)

Train 70B-class neural networks on a Steam Deck.

SCT stores every weight matrix as W = U diag(s) V^T and never builds the dense matrix. Exact gradients flow through the small spectral factors via standard backpropagation. After each optimizer step, U and V are retracted to the Stiefel manifold via QR decomposition. That's the entire method.

Dense 70B + Adam:  1,245 GB
SCT 70B + Adam:      7.2 GB
Compression:          172x
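
The headline figures follow from simple arithmetic. A back-of-envelope check, assuming 16 bytes per trainable parameter (fp32 weight, gradient, and the two Adam moment buffers) and using the 77.8B dense-equivalent and 452M spectral parameter counts from the Results section below:

```python
# Assumption: fp32 weight + gradient + Adam m + Adam v = 16 bytes/param.
BYTES_PER_PARAM = 4 * 4

dense_params = 77.8e9    # dense-equivalent parameter count
sct_params   = 452e6     # spectral parameters actually stored

dense_gb = dense_params * BYTES_PER_PARAM / 1e9
sct_gb   = sct_params   * BYTES_PER_PARAM / 1e9

print(f"Dense 70B + Adam: {dense_gb:,.0f} GB")        # ~1,245 GB
print(f"SCT 70B + Adam:   {sct_gb:.1f} GB")           # ~7.2 GB
print(f"Compression:      {dense_gb / sct_gb:.0f}x")  # ~172x
```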

Patent Pending — Irish Short-Term Patent Application PTIE20260000000219, filed March 27, 2026.


Results

70B Architecture on Consumer Hardware

Hardware               Peak Memory   Forward   Backward   Total Step
Apple M4 Pro (48 GB)   7,938 MB      -         2.15s      -
Steam Deck (16 GB)     7,235 MB      0.43s     0.92s      6.28s

80 layers, d=8192, ffn=28672, SwiGLU activation. LLaMA-3-70B proportions. 452M spectral parameters representing 77.8B dense equivalent. Orthonormality error after retraction: 1.30e-06.

MLP Proof (from-scratch training)

Task                       Dense+AdamW   SFT+DFA   SCT (ours)
Sine regression (loss)     0.000002      0.0768    0.0000680
XOR classification (acc)   100%          85.5%     100%

SCT recovers dense-level training quality: perfect XOR accuracy and a sine-regression loss 1,129x lower than DFA's.

Compression Scales with Model Size

Model          Layer          SCT (k=32)   Compression
SmolLM2-135M   576 x 1536     1.1 MB       13x
SmolLM2-1.7B   2048 x 8192    5.2 MB       51x
LLaMA-7B       4096 x 11008   7.7 MB       93x
Qwen3.5-27B    4096 x 17408   11.0 MB      104x
LLaMA-70B      8192 x 28672   18.9 MB      199x
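
The per-layer numbers follow directly from the factor shapes: a dense m x n layer stores mn parameters, while SCT stores k(m + n + 1) -- U is m x k, V is n x k, and s has k entries. A small check, again assuming 16 bytes per fp32 trainable parameter:

```python
def sct_compression(m, n, k=32, bytes_per_param=16):
    """Storage (MB) and compression ratio for one m x n layer at rank k."""
    dense = m * n              # full weight matrix
    sct = k * (m + n + 1)      # U (m x k) + V (n x k) + s (k,)
    return sct * bytes_per_param / 1e6, dense / sct

for name, m, n in [("SmolLM2-135M", 576, 1536),
                   ("LLaMA-70B", 8192, 28672)]:
    mb, ratio = sct_compression(m, n)
    print(f"{name}: {mb:.1f} MB, {ratio:.0f}x")
```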

How It Works

Forward Pass

h  = x @ U       # [batch, k]     project into spectral basis
hs = h * s       # [batch, k]     scale by singular values
y  = hs @ V.T    # [batch, out]   reconstruct in output space

Three small matmuls. Cost: O(bk(m+n)) instead of O(bmn). Never builds the m x n matrix.
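
A self-contained, runnable version of the forward pass above (shape check only; the variable names here are illustrative, not the library's API):

```python
import torch

m, n, k, batch = 4096, 11008, 32, 8
U = torch.randn(m, k)   # left factor (orthonormal columns after retraction)
s = torch.randn(k)      # singular values
V = torch.randn(n, k)   # right factor

x = torch.randn(batch, m)
h  = x @ U          # [batch, k]    project into spectral basis
hs = h * s          # [batch, k]    scale by singular values
y  = hs @ V.T       # [batch, n]    reconstruct in output space

assert y.shape == (batch, n)   # the m x n matrix is never materialized
```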

Backward Pass

PyTorch autograd computes dL/dU, dL/ds, dL/dV exactly through the same three operations. The gradients have shapes (m x k), (k,), and (n x k); no m x n gradient ever exists.
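
This is easy to verify: mark the three factors as leaves requiring gradients and inspect the gradient shapes after a backward pass (illustrative sketch, not the library's code):

```python
import torch

m, n, k = 64, 96, 8
U = torch.randn(m, k, requires_grad=True)
s = torch.randn(k, requires_grad=True)
V = torch.randn(n, k, requires_grad=True)

x = torch.randn(4, m)
y = ((x @ U) * s) @ V.T   # same three ops as the forward pass
y.sum().backward()        # exact gradients via standard autograd

assert U.grad.shape == (m, k)   # dL/dU
assert s.grad.shape == (k,)     # dL/ds
assert V.grad.shape == (n, k)   # dL/dV -- no m x n tensor anywhere
```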

Stiefel Retraction

After Adam updates U and V, they're no longer orthonormal. QR retraction fixes this:

Q, R = torch.linalg.qr(U_updated)
U = Q * torch.sign(torch.diag(R))  # sign correction for stability

Cost: O(mk^2) per layer. This is what makes SCT a training method, not just compression.
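
A self-contained sketch of the retraction step, including the orthonormality error it repairs. The `retract` and `ortho_error` helpers here are assumptions for illustration, not necessarily the repository's `retract_all`:

```python
import torch

def retract(M):
    """QR retraction of M's columns back onto the Stiefel manifold."""
    Q, R = torch.linalg.qr(M)
    return Q * torch.sign(torch.diag(R))  # sign fix for stability

def ortho_error(M):
    """Max deviation of M.T @ M from the identity."""
    k = M.shape[1]
    return (M.T @ M - torch.eye(k)).abs().max().item()

U = torch.linalg.qr(torch.randn(512, 32))[0]   # start orthonormal
U = U + 1e-3 * torch.randn_like(U)             # simulate an Adam update
print(f"before retraction: {ortho_error(U):.1e}")  # ~1e-3 drift
U = retract(U)
print(f"after retraction:  {ortho_error(U):.1e}")  # back near machine precision
```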


Quick Start

70B Architecture Test (fits on any machine with 8+ GB RAM)

pip install torch transformers
python examples/sct_steamdeck.py

Fine-tuning SmolLM2

pip install torch transformers datasets
python examples/macbook_m4pro/sct_smollm2.py --energy 0.95 --steps 400

Core Implementation

The entire method is one class:

from spectral_compact_training import SpectralLinear, retract_all

# Use like nn.Linear
layer = SpectralLinear(in_features=4096, out_features=11008, rank=32)
y = layer(x)

# After optimizer.step(), retract to Stiefel manifold
optimizer.step()
retract_all(model)
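
For orientation, a minimal sketch of what such a layer can look like internally. This is an illustration of the technique under the assumptions described above, not the repository's actual `spectral_layer.py`:

```python
import torch
import torch.nn as nn

class SpectralLinearSketch(nn.Module):
    """Illustrative rank-k spectral layer: y = ((x @ U) * s) @ V.T."""
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        # Orthonormal initialization for U and V; ones for s.
        self.U = nn.Parameter(torch.linalg.qr(torch.randn(in_features, rank))[0])
        self.V = nn.Parameter(torch.linalg.qr(torch.randn(out_features, rank))[0])
        self.s = nn.Parameter(torch.ones(rank))

    def forward(self, x):
        return ((x @ self.U) * self.s) @ self.V.T

    @torch.no_grad()
    def retract(self):
        # QR retraction of U and V after each optimizer step.
        for name in ("U", "V"):
            Q, R = torch.linalg.qr(getattr(self, name))
            getattr(self, name).copy_(Q * torch.sign(torch.diag(R)))

layer = SpectralLinearSketch(4096, 11008, rank=32)
y = layer(torch.randn(2, 4096))
assert y.shape == (2, 11008)
```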

Repository Structure

sct/
  spectral_compact_training/       Core library
    __init__.py
    spectral_layer.py              SpectralLinear implementation
  examples/
    sct_steamdeck.py               70B architecture validation
    macbook_m4pro/
      sct_70b_flex.py              70B on M4 Pro (MPS backend)
      sct_smollm2.py               SmolLM2 fine-tuning
      sct_vs_dense.py              Head-to-head Dense vs SCT
  proof/
    SteamDeck-Demo.mp4             Video: Steam Deck running 70B
    SteamDeck-Konsole.mp4          Video: terminal output
    SteamDeck-Konsole-Output.txt   Raw console log (v2)
    sct_smollm2_results.json       SmolLM2 fine-tuning results
    sct_vs_dense_results.json      Dense vs SCT comparison
    patent_pending.webp            Filing confirmation
  docs/
    SCT_Patent_Application.pdf     Patent specification
    SCT_Whitepaper.pdf             Technical whitepaper
    paper.tex                      arXiv preprint source

Important Notes

What SCT is: A training method that stores and updates weights exclusively in spectral form with exact gradients and Stiefel manifold constraints. The 70B results are architectural validation: a full training step (forward, backward, optimizer, retraction) fits in 7.2 GB.

What SCT is not: A finished 70B model. Training a model to completion requires compute time proportional to the dataset size, which SCT does not change. SCT changes how much memory you need to do that training.

Scaling: SCT compression improves with model size. Models below ~360M parameters (hidden dim < 1024) don't benefit meaningfully at practical ranks. The sweet spot is 1.7B+ where rank 32 gives 50x+ compression.


Citation

@misc{kohlberger2026sct,
  title={Spectral Compact Training: Memory-Efficient Neural Network Training
         via Truncated SVD Factorization with Stiefel Manifold Retraction},
  author={Kohlberger, Bj{\"o}rn Roman},
  year={2026},
  note={Irish Patent Application PTIE20260000000219}
}

License

Apache 2.0

Author

Bjorn Roman Kohlberger -- EctoSpace
