This neural network uses a standard architecture with 2 hidden layers. The activation function is the Leaky Rectified Linear Unit (Leaky ReLU), and optimization is done with mini-batch gradient descent.
For this project I used dropout (p = 0.5) as the regularization method. Training this network with the hyperparameters below, after one week of continuous computation on my desktop computer, yielded 98.02% accuracy on the cross-validation set.
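The two building blocks named above, Leaky ReLU and dropout, can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual code; the function names and the inverted-dropout scaling convention are my own choices.

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    # Pass positive inputs through unchanged; scale negatives by a small slope
    # so the unit never has an exactly-zero gradient.
    return np.where(z > 0, z, slope * z)

def dropout(a, p=0.5, rng=None):
    # Inverted dropout: zero each activation with probability p during training,
    # and scale the survivors by 1/(1-p) so the expected activation is unchanged
    # (no rescaling is then needed at test time).
    rng = np.random.default_rng(0) if rng is None else rng
    mask = rng.random(a.shape) >= p
    return a * mask / (1.0 - p)

z = np.array([-2.0, 0.5, 3.0])
print(leaky_relu(z))          # negatives shrink by the slope factor
print(dropout(np.ones((2, 4))))  # roughly half the entries zeroed, rest doubled
```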
- Training/CV split: 60K / 10K images
- Hidden layers: 2
- Hidden units per layer: 4096
- Learning rate (alpha): 0.003
- Batch size: 100
- Regularization: Dropout (p = 0.5)
- Epochs: 1500
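The training loop these hyperparameters feed into is plain mini-batch gradient descent: shuffle the data each epoch, then take a gradient step per batch. A minimal sketch follows; `grad_fn` and the least-squares toy problem are placeholders standing in for the network's backpropagated gradient, not the project's actual model.

```python
import numpy as np

def minibatch_sgd(X, y, grad_fn, w, alpha=0.003, batch_size=100, epochs=10, seed=0):
    # Mini-batch gradient descent: each epoch, visit the data in a fresh
    # random order and update the parameters on every batch's gradient.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            w = w - alpha * grad_fn(w, X[idx], y[idx])
    return w

# Toy usage: a least-squares gradient recovers the true linear coefficients.
def lsq_grad(w, Xb, yb):
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
w = minibatch_sgd(X, y, lsq_grad, np.zeros(3), alpha=0.05, epochs=200)
```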