In this project, we implement a neural network to predict the class labels of a given image without using any deep learning packages.
- This dataset contains 10,000 images in total, divided into 20 classes (500 images per class). The class labels are as follows.

label index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
text description | goldfish | frog | koala | jellyfish | penguin | dog | yak | house | bucket | instrument | nail | fence | cauliflower | bell pepper | mushroom | orange | lemon | banana | coffee | beach |
- The dataset is curated from the Tiny ImageNet dataset. You may want to visit the source page for more details.
- All training images are color images of 64 × 64 pixels. The testing images have the same size as the training images.
- Input: an image.
- Output: the confidence scores of all classes. If the input image belongs to one of the 20 classes, the confidence score of its true class is expected to be the largest; otherwise, the confidence score of the label “unknown” is expected to be the largest. Thus, we actually have 21 class labels.
- For each testing image, the classifier outputs the three class labels with the highest confidence scores. If the true class label is among these top three, we say the classifier has correctly predicted the class label of the input image; otherwise, the classifier made a mistake.
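The top-3 decision rule above can be sketched as follows (a minimal illustration in plain Python; the score values and function name are hypothetical, not taken from the project's actual code):

```python
# Hypothetical confidence scores for one test image over the 21 labels:
# indices 0-19 are the known classes, index 20 is "unknown".
scores = [0.0] * 21
scores[3], scores[7], scores[20] = 0.5, 0.3, 0.9  # illustrative values

# Indices of the three largest scores, highest first (the top-3 prediction).
top3 = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:3]

def is_top3_correct(true_label, top3):
    """Correct under the top-3 criterion if the true label is among the top three."""
    return true_label in top3
```

With these illustrative scores, the classifier would be counted as correct for a true label of 3 or 20, and as a mistake for any other label.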
- For the “unknown” class, we collected 1,000 64 × 64 RGB images from the Internet as its training data.
- We chose six different structures and tested their performance on this problem separately. The six structures are as follows.
structure1 | structure2 | structure3 | structure4 | structure5 | structure6 |
---|---|---|---|---|---|
4-layers model | 6-layers model | residual model | VGG-like model 1 | VGG-like model 2 | VGG-like model 3 |
conv_relu(32,3,3) max_pooling(2,2) fc(100) fc(21) | conv_relu_bn(32,5,5) max_pooling(2,2) conv_relu(64,3,3) conv_relu_bn(64,3,3) max_pooling(2,2) fc(128) fc(21) | conv_relu_bn(32,5,5) max_pooling(2,2) res_block res_block max_pooling(2,2) fc(256) fc(21) | conv_relu(32,3,3) max_pooling(2,2) conv_relu_bn(64,3,3) max_pooling(2,2) conv_relu(128,3,3) conv_relu_bn(128,3,3) max_pooling(2,2) conv_relu(256,3,3) conv_relu_bn(256,3,3) max_pooling(2,2) fc(256) fc(256) fc(21) | conv_relu(32,3,3) max_pooling(2,2) conv_relu_bn(64,3,3) max_pooling(2,2) conv_relu(128,3,3) conv_relu_bn(128,3,3) max_pooling(2,2) conv_relu(256,3,3) conv_relu_bn(256,3,3) max_pooling(2,2) fc(1024) fc(1024) fc(21) | conv_relu(64,3,3) max_pooling(2,2) conv_relu_bn(128,3,3) max_pooling(2,2) conv_relu(256,3,3) conv_relu_bn(256,3,3) max_pooling(2,2) conv_relu(512,3,3) conv_relu_bn(512,3,3) max_pooling(2,2) fc(1024) fc(1024) fc(21) |
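As a sanity check on these layer specifications, the feature-map sizes can be traced through, for example, VGG-like model 1 (a small sketch; it assumes the 3×3 convolutions use "same" padding with stride 1, which the table does not state, so the resulting numbers are an assumption):

```python
# Trace the feature-map size through VGG-like model 1, assuming
# 'same'-padded 3x3 convolutions (stride 1) and 2x2 max pooling (stride 2).
size = 64      # input images are 64 x 64
channels = 3   # RGB input
layers = [
    ("conv", 32), ("pool", None),
    ("conv", 64), ("pool", None),
    ("conv", 128), ("conv", 128), ("pool", None),
    ("conv", 256), ("conv", 256), ("pool", None),
]
for kind, out_ch in layers:
    if kind == "conv":
        channels = out_ch   # 'same' padding keeps the spatial size
    else:
        size //= 2          # 2x2 max pooling halves each spatial dimension

flattened = channels * size * size  # input width of the first fc(256) layer
```

Under these assumptions, the four pooling layers shrink the maps from 64 × 64 to 4 × 4, so the first fully connected layer would see a 256 × 4 × 4 = 4096-dimensional input.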
- We trained each of the six models for 20 epochs on the training set and evaluated its accuracy on the validation set after each epoch. For each structure, we report the best accuracy achieved over the 20 epochs.
- The test results are as follows. (The accuracy below is measured under the top-3 criterion.)
structure1 | structure2 | structure3 | structure4 | structure5 | structure6 |
---|---|---|---|---|---|
4-layers model | 6-layers model | residual model | VGG-like model 1 | VGG-like model 2 | VGG-like model 3 |
0.715 | 0.724 | 0.753 | 0.835 | 0.768 | 0.807 |
- Based on these test results, we chose VGG-like model 1 as our final model. Its forward propagation process is as follows.