
Fashion-MNIST

CNN architecture and parameter tuning for the Fashion-MNIST dataset

Five CNN architectures were implemented. The best one, Conv2Dsimple2, has two convolutional layers (each followed by max pooling, batch normalization and 25% dropout) and two fully connected layers (the first followed by 25% dropout). It achieves 93.6% test accuracy with 0.6M parameters.
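For illustration, below is a minimal Keras sketch of such a two-block layout. The filter counts (32/64), kernel sizes and dense width (128) are assumptions chosen for readability, not the repository's exact hyperparameters.

```python
# A minimal sketch of a Conv2Dsimple2-like network, assuming illustrative
# filter counts and layer widths (not the repository's exact values).
from tensorflow.keras import layers, models

def build_conv2dsimple2_like(num_classes=10):
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),  # Fashion-MNIST grayscale images
        # Convolutional block 1: conv -> max pooling -> batch norm -> 25% dropout
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.BatchNormalization(),
        layers.Dropout(0.25),
        # Convolutional block 2: same pattern, more filters
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.BatchNormalization(),
        layers.Dropout(0.25),
        # Two fully connected layers, the first followed by 25% dropout
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.25),
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model

model = build_conv2dsimple2_like()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # compare the parameter count against the reported ~0.6M
```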

Conv2Dsimple2 test metrics



The shirt class is the hardest for this classifier to learn: both its recall (equivalently, its miss rate, 1 − recall, is the highest) and its precision are the lowest of all classes.

Conv2Dsimple2 10-class confusion matrix on the test set



Here one can see more clearly what lies behind the metrics in the previous image (rows are the classifier's predictions, columns represent the ground truth): *shirt* and *top* are confused most often. Setting a higher class weight for this class does not do much good, as it decreases recall. The inability to distinguish between these two classes is the main drawback of *Conv2Dsimple2*.
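The sketch below shows one way to reproduce these per-class metrics and the class-weight experiment. The variable names (`model`, `x_test`, `y_test`, `x_train`, `y_train`) are placeholders rather than the repository's code, and the weight value 2.0 is an arbitrary example.

```python
# A sketch of per-class metrics from the confusion matrix, plus the
# class-weight experiment; names and values are illustrative assumptions.
import numpy as np
from sklearn.metrics import confusion_matrix

y_pred = np.argmax(model.predict(x_test), axis=1)

# sklearn puts ground truth in rows; transpose so that, as in the figure,
# rows are the classifier's predictions and columns the ground truth.
cm = confusion_matrix(y_test, y_pred).T

precision = np.diag(cm) / cm.sum(axis=1)  # correct / all predicted as class
recall    = np.diag(cm) / cm.sum(axis=0)  # correct / all true class members
miss_rate = 1.0 - recall                  # fraction of each class that is missed

# Re-train with a higher weight on the shirt class (label 6 in the standard
# Fashion-MNIST labeling); the factor 2.0 is an arbitrary example value.
class_weight = {c: 1.0 for c in range(10)}
class_weight[6] = 2.0
model.fit(x_train, y_train, epochs=10, class_weight=class_weight)
```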

Conv2Dsimple2 architecture

CNN architecture choices and motivation:

  • First, Conv2Dnet (3.2M parameters) was built after researching the subject, drawing on publicly available models listed in the benchmarks on the Fashion-MNIST GitHub page. The inspiration was their fairly high test-set accuracy (>93%) combined with a small number of convolutional layers (up to two) and few parameters (~2M). The drawback of this architecture is its limited capacity to learn the training data (it plateaus very early in training).

  • Second, InceptionCNN (20.1M parameters) was implemented: a shallow CNN consisting of only one Inception block of the popular GoogLeNet architecture (sketched after this list). The idea was to try out multiple parallel convolutional branches with different filter sizes plus max pooling, and to see whether this might increase both train and validation accuracy. The network did reach >99.5% accuracy on the training data, but validation accuracy was comparatively low (92.7%), i.e. it overfits.

  • Third, Conv2Dsimple (0.4M parameters) was a test build: I wanted to see whether the 1×1 convolution followed immediately by a regular 2D convolution (a convolutional layer with dimensionality reduction, or “network in network”) taken from the Inception module could still give decent results (see the second sketch after this list). Train accuracy was still low, but the network performs well enough on the test set: 92.8% accuracy.

  • After that, the Conv2Dsimple2 (0.6M parameters) and Conv2Dsimple3 (5.6M parameters) architectures were built by stacking one more convolutional layer with dimensionality reduction; Conv2Dsimple2 uses a smaller number of filters, and Conv2Dsimple3 also adds an extra fully connected layer. Training/validation accuracies were 96%/93% for Conv2Dsimple2 and 98%/93.8% for Conv2Dsimple3, with test-set accuracies of 93.6% and 93.9% respectively.
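As a reference for the InceptionCNN bullet, here is a rough sketch of a single Inception-style block: parallel branches with different filter sizes plus max pooling, concatenated along the channel axis. The branch widths are assumptions, not the values used in InceptionCNN.

```python
# A sketch of one Inception-style block (GoogLeNet pattern); branch widths
# are illustrative assumptions.
from tensorflow.keras import layers

def inception_block(x, f1=64, f3=96, f5=32, fp=32):
    b1 = layers.Conv2D(f1, (1, 1), activation="relu", padding="same")(x)
    b3 = layers.Conv2D(f3, (3, 3), activation="relu", padding="same")(x)
    b5 = layers.Conv2D(f5, (5, 5), activation="relu", padding="same")(x)
    bp = layers.MaxPooling2D((3, 3), strides=1, padding="same")(x)
    bp = layers.Conv2D(fp, (1, 1), activation="relu", padding="same")(bp)
    # Concatenate all branches along the channel axis
    return layers.Concatenate()([b1, b3, b5, bp])
```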
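And here is the “network in network” trick the Conv2Dsimple bullet refers to: a 1×1 convolution first shrinks the channel dimension, then a regular convolution operates on the reduced volume, which cuts the parameter count. The channel counts are again illustrative assumptions.

```python
# A sketch of convolution with 1x1 dimensionality reduction ("network in
# network"); channel counts are illustrative assumptions.
from tensorflow.keras import layers

def reduced_conv(x, reduce_to=16, filters=64):
    # 1x1 convolution compresses the channel dimension first
    x = layers.Conv2D(reduce_to, (1, 1), activation="relu", padding="same")(x)
    # The 3x3 convolution then sees far fewer input channels
    return layers.Conv2D(filters, (3, 3), activation="relu", padding="same")(x)
```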
