How many parameters are needed to reach 99% accuracy on MNIST?
Well, 697 parameters and 5 convolution layers later, we have an upper bound!
Inspired by https://github.com/ruslangrimov/mnist-minimal-model
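For context on why so few parameters are possible: a convolution layer with a kh x kw kernel, c_in input channels, and c_out filters has (kh * kw * c_in + 1) * c_out weights, counting biases. A quick arithmetic check, where the five layer shapes below are hypothetical and chosen only to land near this size, not the actual architecture:

```python
# Parameters of one conv layer: (kh * kw * c_in + 1) * c_out; the +1 is the bias.
def conv_params(kh, kw, c_in, c_out):
    return (kh * kw * c_in + 1) * c_out

# Hypothetical 5-layer stack, not the repo's actual 697-parameter model.
stack = [(3, 3, 1, 3), (3, 3, 3, 4), (3, 3, 4, 5), (3, 3, 5, 6), (1, 1, 6, 10)]
print(sum(conv_params(*shape) for shape in stack))  # 673, the same ballpark as 697
```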
Components optimized (a concrete sketch follows the list):
- Activation function
- Kernel initialization
- Layer count and number of kernels per layer
- Kernel filter sizes/shapes
- Dropout %
- Optimizer (type, decay, learning rates)
- Learning-rate scheduling
- Augmentation (none performed best)
- Probably several other things I'm forgetting.
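To make those knobs concrete, here is a minimal Keras sketch of a 5-conv-layer model in this size range, using the same hypothetical stack as above (673 parameters). The layer widths, kernel shapes, initializer, dropout rate, and optimizer settings are stand-ins for the tuned components, not the winning configuration, and Keras itself is an assumption here:

```python
# Hypothetical sketch of the kind of tiny model searched here; every
# hyperparameter below is illustrative, not the repo's best configuration.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_tiny_cnn():
    init = "he_uniform"  # kernel initialization was one tuned component
    return models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(3, (3, 3), activation="relu", kernel_initializer=init),
        layers.MaxPooling2D(),
        layers.Conv2D(4, (3, 3), activation="relu", kernel_initializer=init),
        layers.MaxPooling2D(),
        layers.Conv2D(5, (3, 3), activation="relu", kernel_initializer=init),
        layers.Dropout(0.1),  # dropout % was tuned
        layers.Conv2D(6, (3, 3), activation="relu", kernel_initializer=init),
        layers.Conv2D(10, (1, 1)),       # 10 class logits
        layers.GlobalAveragePooling2D(),
        layers.Activation("softmax"),
    ])

model = build_tiny_cnn()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # type/decay/LR tuned
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()  # prints the per-layer and total parameter counts
```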
Roughly 700 of these 697-parameter models were trained (305 of them plotted).
https://github.com/ThomasWarn contributed ~35 models.