Train 70B-class neural networks on a Steam Deck.
SCT stores every weight matrix as W = U diag(s) V^T and never builds the dense matrix. Exact gradients flow through the small spectral factors via standard backpropagation. After each optimizer step, U and V are retracted to the Stiefel manifold via QR decomposition. That's the entire method.
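The mechanics can be sanity-checked in a few lines of NumPy. This is a toy sketch under my own assumptions, not the library's implementation: store only U, s, V, backpropagate through the three small matmuls by hand, take a plain SGD step on the factors, then retract U and V with QR.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k, b, lr = 64, 48, 8, 16, 0.05

# Spectral factors: the dense W = U @ diag(s) @ V.T is never materialized.
U = np.linalg.qr(rng.standard_normal((m, k)))[0]   # (m, k), orthonormal
V = np.linalg.qr(rng.standard_normal((n, k)))[0]   # (n, k), orthonormal
s = np.ones(k)                                     # (k,) singular values

x = rng.standard_normal((b, m))
t = rng.standard_normal((b, n))                    # regression targets

losses = []
for _ in range(200):
    h = x @ U                          # forward: three small matmuls
    hs = h * s
    y = hs @ V.T
    losses.append(0.5 * np.sum((y - t) ** 2) / b)

    dy = (y - t) / b                   # exact gradients, all factor-shaped
    dV = dy.T @ hs                     # (n, k)
    ds = ((dy @ V) * h).sum(0)         # (k,)
    dU = x.T @ ((dy @ V) * s)          # (m, k)

    U -= lr * dU                       # SGD step on the factors only
    s -= lr * ds
    V -= lr * dV

    Q, R = np.linalg.qr(U)             # retract U back onto the Stiefel manifold
    U = Q * np.sign(np.diag(R))
    Q, R = np.linalg.qr(V)             # ... and V
    V = Q * np.sign(np.diag(R))

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print("orthonormality error:", np.abs(U.T @ U - np.eye(k)).max())
```

The loss falls while U and V stay orthonormal to machine precision, which is the whole point: training happens entirely in the factor shapes.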
- Dense 70B + Adam: 1,245 GB
- SCT 70B + Adam: 7.2 GB
- Compression: 172x
Patent Pending — Irish Short-Term Patent Application PTIE20260000000219, filed March 27, 2026.
| Hardware | Peak Memory | Forward | Backward | Total Step |
|---|---|---|---|---|
| Apple M4 Pro (48 GB) | 7,938 MB | - | 2.15s | - |
| Steam Deck (16 GB) | 7,235 MB | 0.43s | 0.92s | 6.28s |
80 layers, d=8192, ffn=28672, SwiGLU activation. LLaMA-3-70B proportions. 452M spectral parameters representing 77.8B dense equivalent. Orthonormality error after retraction: 1.30e-06.
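These counts can be reproduced with plain arithmetic. Assuming a LLaMA-style block of 4 attention projections (d x d) and 3 SwiGLU projections (d x ffn), ignoring embeddings and norms, and rank k=32 (the rank used in the per-layer tables below):

```python
layers, d, ffn, k = 80, 8192, 28672, 32

def spectral_params(m, n):
    return (m + n) * k + k            # U (m,k) + V (n,k) + s (k,)

dense_per_layer = 4 * d * d + 3 * d * ffn
spectral_per_layer = 4 * spectral_params(d, d) + 3 * spectral_params(d, ffn)

dense_total = layers * dense_per_layer
spectral_total = layers * spectral_per_layer

print(f"dense-equivalent: {dense_total / 1e9:.1f}B")    # ~77.8B
print(f"spectral:         {spectral_total / 1e6:.0f}M") # ~451M
```

This lands within rounding of the reported 452M (the small gap is presumably parameters this sketch omits).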
| Task | Dense+AdamW | SFT+DFA | SCT (ours) |
|---|---|---|---|
| Sine regression (loss) | 2.0e-6 | 7.68e-2 | 6.80e-5 |
| XOR classification (acc) | 100% | 85.5% | 100% |
SCT tracks dense training quality on both tasks, with 1,129x lower loss than DFA on sine regression.
| Model | Layer shape | SCT memory (k=32) | Compression |
|---|---|---|---|
| SmolLM2-135M | 576 x 1536 | 1.1 MB | 13x |
| SmolLM2-1.7B | 2048 x 8192 | 5.2 MB | 51x |
| LLaMA-7B | 4096 x 11008 | 7.7 MB | 93x |
| Qwen3.5-27B | 4096 x 17408 | 11.0 MB | 104x |
| LLaMA-70B | 8192 x 28672 | 18.9 MB | 199x |
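The table follows directly from the parameter counts: a dense m x n matrix holds mn values, the spectral form holds (m + n)k + k, and each spectral parameter needs 16 bytes once the fp32 weight, gradient, and Adam's two moment buffers are counted (that per-parameter accounting is my assumption). A quick check against the rows above:

```python
rows = [
    ("SmolLM2-135M", 576, 1536),
    ("SmolLM2-1.7B", 2048, 8192),
    ("LLaMA-7B", 4096, 11008),
    ("Qwen3.5-27B", 4096, 17408),
    ("LLaMA-70B", 8192, 28672),
]
k = 32
results = []
for name, m, n in rows:
    spectral = (m + n) * k + k        # U, V, s entries
    mb = spectral * 16 / 1e6          # fp32 weight + grad + Adam m, v
    ratio = m * n / spectral          # dense params / spectral params
    results.append((name, round(mb, 1), round(ratio)))
    print(f"{name:13s} {mb:5.1f} MB  {ratio:.0f}x")
```

Every row reproduces the table's memory and compression figures.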
```python
h = x @ U      # [batch, k]   project into spectral basis
hs = h * s     # [batch, k]   scale by singular values
y = hs @ V.T   # [batch, out] reconstruct in output space
```
Three small matmuls. Cost: O(bk(m+n)) instead of O(bmn). Never builds the m x n matrix.
PyTorch autograd computes dL/dU, dL/ds, dL/dV exactly through the same three operations. Gradients are shapes (m x k), (k,), (n x k). No m x n gradient ever exists.
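That claim is easy to verify with a few lines of standalone PyTorch (not using the library): register U, s, V as leaf tensors, run the three matmuls, and inspect the gradient shapes.

```python
import torch

b, m, n, k = 4, 512, 768, 32
x = torch.randn(b, m)
U = torch.randn(m, k, requires_grad=True)
s = torch.randn(k, requires_grad=True)
V = torch.randn(n, k, requires_grad=True)

y = ((x @ U) * s) @ V.T   # forward: three small matmuls
y.sum().backward()        # exact gradients via autograd

# All gradients live in the factor shapes; no (m, n) tensor ever appears.
print(U.grad.shape, s.grad.shape, V.grad.shape)
```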
After Adam updates U and V, they are no longer orthonormal. QR retraction fixes this:

```python
Q, R = torch.linalg.qr(U_updated)
U = Q * torch.sign(torch.diag(R))  # sign correction for stability
```

Cost: O(mk^2) per layer. This is what makes SCT a training method, not just compression.
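A quick way to see what retraction buys (a standalone sketch mirroring the QR step above): perturb an orthonormal U as an optimizer step would, measure the orthonormality error, retract, and measure again.

```python
import torch

m, k = 1024, 32
U, _ = torch.linalg.qr(torch.randn(m, k))          # start orthonormal
U = U + 1e-3 * torch.randn(m, k)                   # simulate an Adam update

err = (U.T @ U - torch.eye(k)).abs().max().item()  # drift off the manifold

Q, R = torch.linalg.qr(U)
U = Q * torch.sign(torch.diag(R))                  # retract + sign correction

err_after = (U.T @ U - torch.eye(k)).abs().max().item()
print(f"before: {err:.1e}  after: {err_after:.1e}")
```

The error drops from roughly the perturbation scale back to float32 machine precision, which is why the 70B run above reports an orthonormality error of only 1.30e-06.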
Steam Deck:

```bash
pip install torch transformers
python examples/sct_steamdeck.py
```

MacBook M4 Pro:

```bash
pip install torch transformers datasets
python examples/macbook_m4pro/sct_smollm2.py --energy 0.95 --steps 400
```

The entire method is one class:
```python
from spectral_compact_training import SpectralLinear, retract_all

# Use like nn.Linear
layer = SpectralLinear(in_features=4096, out_features=11008, rank=32)
y = layer(x)

# After optimizer.step(), retract to the Stiefel manifold
optimizer.step()
retract_all(model)
```

```
sct/
  spectral_compact_training/        Core library
    __init__.py
    spectral_layer.py               SpectralLinear implementation
  examples/
    sct_steamdeck.py                70B architecture validation
    macbook_m4pro/
      sct_70b_flex.py               70B on M4 Pro (MPS backend)
      sct_smollm2.py                SmolLM2 fine-tuning
      sct_vs_dense.py               Head-to-head Dense vs SCT
  proof/
    SteamDeck-Demo.mp4              Video: Steam Deck running 70B
    SteamDeck-Konsole.mp4           Video: terminal output
    SteamDeck-Konsole-Output.txt    Raw console log (v2)
    sct_smollm2_results.json        SmolLM2 fine-tuning results
    sct_vs_dense_results.json       Dense vs SCT comparison
    patent_pending.webp             Filing confirmation
  docs/
    SCT_Patent_Application.pdf      Patent specification
    SCT_Whitepaper.pdf              Technical whitepaper
    paper.tex                       arXiv preprint source
```
What SCT is: A training method that stores and updates weights exclusively in spectral form with exact gradients and Stiefel manifold constraints. The 70B results are architectural validation: a full training step (forward, backward, optimizer, retraction) fits in 7.2 GB.
What SCT is not: A finished 70B model. Training a model to completion requires compute time proportional to the dataset size, which SCT does not change. SCT changes how much memory you need to do that training.
Scaling: SCT compression improves with model size. Models below ~360M parameters (hidden dim < 1024) don't benefit meaningfully at practical ranks. The sweet spot is 1.7B+ where rank 32 gives 50x+ compression.
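The scaling claim follows from the same arithmetic: for a square d x d layer at rank k, compression is d^2 / ((2d + 1)k) ~ d / (2k), so it grows linearly with width. A quick sweep at k=32:

```python
k = 32
for d in (512, 768, 1024, 2048, 4096, 8192):
    spectral = 2 * d * k + k
    print(f"d={d:5d}  compression ~{d * d / spectral:.0f}x")
```

Below d=1024 the ratio falls into the low double digits, which is the cutoff the paragraph above describes; at d=8192 it reaches ~128x for a single square layer (rectangular FFN layers do better still).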
```bibtex
@misc{kohlberger2026sct,
  title={Spectral Compact Training: Memory-Efficient Neural Network Training
         via Truncated SVD Factorization with Stiefel Manifold Retraction},
  author={Kohlberger, Bj{\"o}rn Roman},
  year={2026},
  note={Irish Patent Application PTIE20260000000219}
}
```

Apache 2.0
Bjorn Roman Kohlberger -- EctoSpace