Parameter Efficiency Leaderboard

MNIST, 99% accuracy

Parameters	Model	Links	Authors
910	ConvNet with customized Hough voting layer	code	Satoshi Tanaka
1,398	Three-layer Sharpened Cosine Similarity with paired depthwise and pointwise operations.	code	Raphael Pisoni

Fashion MNIST, 90% accuracy

Parameters	Model	Links	Authors
1,890	ConvNet with customized Hough voting layer	code	Satoshi Tanaka
2,764	Four-layer SCS (layers include depthwise and pointwise operations) with 20 kernels per layer	code	Brandon Rohrer
7,156	WaveMix Lite-8/5.	code, paper	Pranav Jeevan P, Amit Sethi

Fashion MNIST, 95% accuracy

Parameters	Model	Links	Authors

CIFAR-10, 80% accuracy

Parameters	Model	Links	Authors
25,214	Three-layer Sharpened Cosine Similarity with Mixer layer (paired depthwise and pointwise layers).	code	Brandon Rohrer
37,058	WaveMix Lite-32/7 (Replaced DeConv with Upsample)	code, paper	Pranav Jeevan P, Amit Sethi
37,086	Three-layer Sharpened Cosine Similarity with 56 kernels in each layer.	code	Brandon Rohrer
45,962	WaveMix Lite-32/4 (ff=16, mult=1, dropout=0.25).	code, paper	Pranav Jeevan P, Amit Sethi
47,643	Three-layer Sharpened Cosine Similarity with 30 5x5 kernels in each layer.	code	Brandon Rohrer

CIFAR-10, 90% accuracy

Parameters	Model	Links	Authors
103,000	ConvMixer-128/4, achieved 91.26%.	paper, code	Asher Trockman, J. Zico Kolter
520,106	WaveMix Lite-64/6	code, paper	Pranav Jeevan P, Amit Sethi
639,702	kEffNet-B0, an EfficientNet with paired pointwise convolutions, achieved 91.64%.	paper	Joao Paulo Schwarz Schuler, Santiago Romani, Mohamed Abdel-Nasser, Hatem Rashwan, Domenec Puig
1.2M	SCS-based network achieved 91.3%.	code	Håkon Hukkelås

CIFAR-10, 95% accuracy

Parameters	Model	Links	Authors
594,000	ConvMixer-256/8	paper, code	Asher Trockman, J. Zico Kolter

ImageNet top-1, 80% accuracy

Parameters	Model	Links	Authors
21.1M	ConvMixer-768/32	paper, code	Asher Trockman, J. Zico Kolter

ImageNet top-1, 90% accuracy

Parameters	Model	Links	Authors
390M	EfficientNet-B6-Wide with Meta Pseudo Labels, using 300M unlabeled images from JFT	paper, code	Hieu Pham, Zihang Dai, Qizhe Xie, Quoc V. Le

Why parameter efficiency?

A model's performance has many dimensions, and parameter efficiency is one that gets overlooked. If two models have similar accuracy but one has fewer parameters, it will probably be cheaper to store, run, distribute, and maintain. Some model families are inherently more parameter-efficient than others, but those differences aren't showcased in accuracy leaderboards. This is a chance for parameter-efficient architectures to get their time in the spotlight.
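
To make the storage argument concrete, here is a minimal sketch of how parameter counts compare between a standard convolution and a paired depthwise and pointwise convolution. The helper names and the layer sizes (32 channels, 3x3 kernels, 4 bytes per parameter) are illustrative assumptions, not the counting code or configuration behind any leaderboard entry. Sharpened Cosine Similarity layers also carry a few extra learned parameters, so their totals will differ from these plain-convolution counts.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Trainable parameters in a standard 2D convolution (hypothetical helper)."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def depthwise_pointwise_params(in_channels, out_channels, kernel_size, bias=True):
    """Trainable parameters in a depthwise conv followed by a 1x1 pointwise conv."""
    # Depthwise: one kernel_size x kernel_size filter per input channel.
    depthwise = in_channels * kernel_size * kernel_size + (in_channels if bias else 0)
    # Pointwise: a 1x1 convolution that mixes channels.
    pointwise = in_channels * out_channels + (out_channels if bias else 0)
    return depthwise + pointwise


def storage_bytes(n_params, bytes_per_param=4):
    """Approximate weight storage, assuming 32-bit floats by default."""
    return n_params * bytes_per_param


# Illustrative layer sizes only.
standard = conv2d_params(32, 32, 3)
paired = depthwise_pointwise_params(32, 32, 3)

print(f"standard conv:         {standard} parameters")
print(f"depthwise + pointwise: {paired} parameters")
print(f"910-parameter model:   about {storage_bytes(910)} bytes of weights")
```

Under these assumptions the paired layer needs roughly one seventh of the parameters of the standard layer, which is why several entries above lean on depthwise and pointwise operations.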

Isn't this just a cherry-picked metric that sharpened cosine similarity does well on?

Yes.
