We introduce a straightforward yet effective method to empirically measure and regularize memorization in deep neural networks for classification tasks. Our approach augments each training sample with auxiliary random labels, which are then predicted by a random label prediction head (RLP-head). RLP-heads can be attached at arbitrary depths of a network, predicting random labels from the corresponding intermediate representation and thereby enabling analysis of how memorization capacity evolves across layers. By interpreting the RLP-head performance as an empirical estimate of Rademacher complexity, we obtain a direct measure of both sample-level memorization and model capacity. We leverage this random label accuracy metric to analyze generalization and overfitting in different models and datasets. Building on this approach, we further propose a novel regularization technique based on the output of the RLP-head, which demonstrably reduces memorization. Interestingly, our experiments reveal that reducing memorization can either improve or impair generalization, depending on the dataset and training setup. These findings challenge the traditional assumption that overfitting is equivalent to memorization and suggest new hypotheses to reconcile these seemingly contradictory results.
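The measurement principle can be illustrated with a small, self-contained sketch: treat an intermediate representation as fixed features, fit a linear probe to uniformly random auxiliary labels, and read the probe's training accuracy as a proxy for memorization capacity. This is only an illustration of the idea, not the repository's implementation; the function name `random_label_accuracy` and the closed-form ridge solver standing in for a trained RLP-head are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_label_accuracy(features, n_labels=10, ridge=1e-3):
    """Fit a linear probe to uniformly random labels and return its
    training accuracy -- a rough proxy for how much the given
    representation can memorize (illustrative, not the repo's API)."""
    n, d = features.shape
    y = rng.integers(0, n_labels, size=n)   # auxiliary random labels
    Y = np.eye(n_labels)[y]                 # one-hot targets
    # Closed-form ridge regression as a stand-in for a trained RLP-head.
    W = np.linalg.solve(features.T @ features + ridge * np.eye(d),
                        features.T @ Y)
    pred = (features @ W).argmax(axis=1)
    return (pred == y).mean()

# A wide (high-capacity) representation memorizes random labels far
# better than a narrow one, given the same number of samples.
X_wide = rng.standard_normal((200, 400))
X_narrow = rng.standard_normal((200, 5))
acc_wide = random_label_accuracy(X_wide)
acc_narrow = random_label_accuracy(X_narrow)
print(acc_wide, acc_narrow)
```

In this toy setting the wide representation fits the random labels almost perfectly while the narrow one stays far below, mirroring how RLP-head accuracy can track memorization capacity across layers of different width and depth.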
```shell
git clone https://github.com/MarlonBecker/RandomLabelHeads
cd RandomLabelHeads

# Create and activate the conda environment
conda env create
conda activate RandomLabelHeads
```
```shell
# Run with all defaults
python -m torch.distributed.run main.py

# Switch to ImageNet data with a Vision Transformer by providing the corresponding
# parameter file (--ifile option). Adjust the number of random labels, epochs, and
# batch size, and enable verbose output.
python -m torch.distributed.run main.py --subLabels 100000 --verbose --ifile configs/ImageNet_ViT_adam.toml --epochs 10 --batchSize 2

# Additionally, adjust the regularization factor (default: 0, no regularization)
# and use the 'Big' Vision Transformer.
python -m torch.distributed.run main.py --subLabels 100000 --verbose --ifile configs/ImageNet_ViT_adam.toml --epochs 10 --batchSize 2 --ViTSize B --reg 1e3

# Display the help output (to see all options)
python -m torch.distributed.run main.py -h
```

| Option | Description | Default |
|---|---|---|
| `--logDir` | (Main) directory to store logs | `./logs` |
| `--logSubDir` | Subdirectory to store logs | `RLP-heads_test` |
| `--subLabels` | Number of random labels (learned by the RLP-head) | 10 |
| `--regFactor` | Regularization strength | 0 |
| `--epochs` | Total number of epochs | 200 |
| `--contin` | Continue training from a checkpoint | false |
| `--truncate` | Overwrite any existing logs | false |
| `--model` | Classification network to be used | WRN |
| `--dataset` | Dataset to be trained on (classification task) | CIFAR100 |
| `--labelNoise` | Fraction of label noise applied to the classification labels | 0 |
Call `python -m torch.distributed.run main.py -h` for an overview of all available arguments.
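The `--labelNoise` option corrupts a fraction of the classification labels, which is useful for studying memorization under controlled noise. A minimal sketch of one common noise scheme (uniform relabeling of a random subset; the repository's exact scheme may differ, and `apply_label_noise` is a hypothetical name):

```python
import numpy as np

def apply_label_noise(labels, fraction, n_classes, seed=0):
    """Replace `fraction` of the labels with uniformly drawn classes.
    Illustrative only; the repo's implementation may differ."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    n_flip = int(fraction * len(labels))
    idx = rng.choice(len(labels), size=n_flip, replace=False)  # which samples to corrupt
    noisy[idx] = rng.integers(0, n_classes, size=n_flip)       # redraw their labels
    return noisy

labels = np.arange(100) % 10                     # clean 10-class labels
noisy = apply_label_noise(labels, 0.5, n_classes=10)
print((noisy != labels).mean())                  # roughly 0.45: some redraws hit the old label
```

Note that uniform redrawing occasionally reproduces the original label, so the effective noise rate is slightly below the requested fraction.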
If you find RLP-heads helpful, please consider citing our paper in your work:
```bibtex
@inproceedings{becker2026rlpheads,
  title={Random Label Prediction Heads for Studying Memorization in Deep Neural Networks},
  author={Marlon Becker and Jonas Konrad and Luis Garcia Rodriguez and Benjamin Risse},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=qBknFL81JO}
}
```