# WeightWatcher-Examples

A curated collection of real-world examples, notebooks, and experiments using WeightWatcher, the open-source tool for analyzing layer-wise spectra, heavy-tailed behavior, power-law exponents (α), correlation traps, and model quality throughout training.
These examples span small MLPs, double descent, and billion-parameter LLMs.
## Examples

- Single layer example
- How to analyze fine-tuned models
- How varying the batch size and/or learning rate affects convergence
- Explaining epoch-wise double descent
- Comparing the inductive biases of AdamW and Muon
- MLP3 on CIFAR10: extreme overfitting in the first layer
- Post-analysis of the paper "Overtrained Language Models Are Harder to Fine-Tune"
- The Magic of Mistral: Dragon Kings blog
- Experiment method: SVD smoothing
- [The original 1989 Double Descent experiment](https://calculatedcontent.com/2024/03/01/describing-double-descent-with-weightwatcher/)
- Comparing BERT, RoBERTa, and XLNet
- ONNX format
- Older experiments on random labels
## What these examples cover

- How α < 2 identifies overfitting & correlation traps
- Spectral phase transitions during training
- Epoch-wise double descent behavior
- Optimizer differences (Muon vs AdamW vs SGD)
- Fine‑tuning shifts between underfit → well‑fit → overfit
- Diagnostics for memorization and rank collapse
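To make the α diagnostic above concrete, here is a rough, self-contained sketch of the kind of spectral analysis WeightWatcher automates for each layer: form the empirical spectral density of the correlation matrix WᵀW/N and estimate the power-law tail exponent α with a Hill estimator. This is an illustrative approximation in plain NumPy, not WeightWatcher's actual fitting procedure (which uses a full power-law fit), and the random matrix below stands in for a trained layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "layer": a random weight matrix standing in for a trained layer's weights.
W = rng.standard_normal((300, 100))

# Empirical spectral density: eigenvalues of the correlation matrix W^T W / N.
N = W.shape[0]
evals = np.linalg.eigvalsh(W.T @ W / N)
evals = np.sort(evals[evals > 1e-12])[::-1]  # descending, drop numerical zeros

# Hill estimator of the power-law tail exponent alpha over the top-k eigenvalues.
k = 20
tail = evals[:k]
alpha = 1.0 + k / np.sum(np.log(tail / tail[k - 1]))

print(f"estimated tail exponent alpha ~ {alpha:.2f}")
```

For a well-trained layer the fitted α typically lands in roughly the 2–6 range; values below 2 are the heavy-tailed signature these notebooks associate with overfitting and correlation traps. (A random Gaussian layer like the one above is not heavy-tailed, so its estimate will be large.)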
## Quick start

```sh
git clone https://github.com/CalculatedContent/WeightWatcher-Examples.git
cd WeightWatcher-Examples
pip install weightwatcher
jupyter notebook
```

MIT License — see LICENSE