
Deep learning competition on grocery images dataset

By (in alphabetical order): Alexandru Hodorogea, Andrei Olariu, Catalin Tiseanu

Synopsis

We competed in an image classification competition and finished in 7th place with an accuracy of 95%. The dataset consisted of images of grocery products on store shelves.

The problem

The task was an image classification problem on a dataset of about 5,000 images spanning 27 classes. It was run on HackerEarth, in a style similar to Kaggle [1].

There is an existing paper about the dataset, "The Freiburg Groceries Dataset" (https://arxiv.org/abs/1611.05799), which we only found out about later in the competition. The authors report 78.9% accuracy.

Overview

We used Keras. We started with pretrained ImageNet models from Keras (VGG16, VGG19, ResNet50, InceptionResNetV2), adding dense layers on top:

import numpy as np
from keras import applications, optimizers
from keras.models import Model, Sequential
from keras.layers import BatchNormalization, Dense, Dropout, Flatten

# Pre-trained convolutional base with ImageNet weights, minus the classifier head
base_model = applications.InceptionResNetV2(
    weights='imagenet', include_top=False, input_shape=x_train.shape[1:])

# Freeze the pre-trained layers before compiling so only the new head is trained
base_model.trainable = False

# New classification head on top of the convolutional features
add_model = Sequential()
add_model.add(Flatten(input_shape=base_model.output_shape[1:]))
add_model.add(BatchNormalization())
add_model.add(Dense(256, activation='relu'))
add_model.add(BatchNormalization())
add_model.add(Dropout(0.5))
add_model.add(Dense(np.max(y_train) + 1, activation='softmax'))

model = Model(inputs=base_model.input, outputs=add_model(base_model.output))
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=optimizers.Adam(lr=1e-5),
              metrics=['accuracy'])
We froze the pre-trained layers and trained this to an accuracy of 80%. The next step was to add image augmentation, using the Keras ImageDataGenerator:

from keras.preprocessing.image import ImageDataGenerator

# Random transformations applied to each training image on the fly
train_datagen = ImageDataGenerator(
        rotation_range=30,
        width_shift_range=0.1,
        height_shift_range=0.1,
        shear_range=0.1,
        zoom_range=0.1,
        horizontal_flip=True,
        vertical_flip=True,
)

The workflow was:

  • Start by training a model like the one above with no augmentation for ~10 epochs
  • As soon as it started to overfit (validation accuracy lagging behind training accuracy), stop the training and add more aggressive augmentation (bigger rotations, shifts, scaling, zooms, etc.)
  • Repeat this for 6-7 cycles (see the sketch after this list)
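
A minimal sketch of one way to express this cycle in code; the make_datagen helper, the strength schedule, and the x_val/y_val split are illustrative assumptions, not our exact setup:

# Hypothetical helper: scale all augmentation knobs by a single factor
def make_datagen(strength):
    return ImageDataGenerator(
        rotation_range=30 * strength,
        width_shift_range=0.1 * strength,
        height_shift_range=0.1 * strength,
        shear_range=0.1 * strength,
        zoom_range=0.1 * strength,
        horizontal_flip=strength > 0,
        vertical_flip=strength > 0,
    )

for cycle in range(7):
    # Cycle 0 runs with no augmentation; later cycles get progressively heavier
    datagen = make_datagen(strength=cycle / 2.0)
    model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                        steps_per_epoch=len(x_train) // 32,
                        epochs=10,
                        validation_data=(x_val, y_val))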

With this we reached single-model performance (the accuracy of a single CNN model) of about 91-92%. Two more things also helped a lot, described below: ensembling and augmented prediction.

Lessons

Taking out-of-the-box models (pre-trained deep nets) gives you around 80% accuracy. With extra work, tuning, and insights, a gain of up to 15 percentage points is possible, reflecting our final score of 95%, starting from ~80%. This was the single most surprising insight for me: that extra work and insights produce such a large gain, even when starting with the latest, hottest pre-trained networks. The main factors contributing to that gain are, in decreasing order of importance:

Heavy augmentation

Adding progressively heavier image augmentation was key to getting better and better single-model performance (the score generated by using a single CNN). The key was to add the augmentation progressively: start with just the normal train set and train for 10 epochs; add some augmentation and train for 10 more epochs; add heavier augmentation and train for 10 epochs; and so on. Doing this raised single-model accuracy from the initial 80% to 91-92%.

Ensembling

Ensembling added around 2-3%. There were two main points. First, model variety helps a lot: we used VGG16, VGG19, ResNet50, and InceptionResNetV2 as pretrained models. Second, adding different checkpoints of the same model also helps: while training a model, we saved checkpoints at different points in time (e.g., after every 10 epochs). Adding multiple checkpoints of the same model was better than adding only its latest, highest-scoring checkpoint.
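
A minimal sketch of the voting scheme, assuming models holds the loaded checkpoints (different architectures plus several checkpoints per architecture):

from collections import Counter

def vote_predict(models, x_test):
    # Each model/checkpoint casts one vote (its argmax class) per image
    votes = np.array([np.argmax(m.predict(x_test), axis=1) for m in models])
    # For each image, the label with the most votes across models wins
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])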

Augmented Prediction

Using augmentation at test time. When classifying an image from the test set (for which we don't have labels and which we would like to predict), instead of predicting on the image alone, use augmentation to generate 10 augmented versions of that image. Predict on all augmented versions and average the predictions into the final prediction. The general idea is that, given that we use augmentation during training, it makes sense to also use it when predicting on the test set.
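
A minimal sketch of this, assuming test_datagen is an ImageDataGenerator configured like the training one:

def predict_with_tta(model, x_test, datagen, n_aug=10):
    # Average class probabilities over n_aug randomly augmented copies
    preds = sum(model.predict(next(datagen.flow(x_test,
                                                batch_size=len(x_test),
                                                shuffle=False)))
                for _ in range(n_aug))
    return preds / n_aug

test_probs = predict_with_tta(model, x_test, test_datagen)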

What didn't work / what we didn't try

Pseudo-labeling

Try to assign labels to images in the test set in order to use them in the training process (enrich the labeled dataset by guessing labels for the test set).
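
Since we didn't try it, this is purely a hypothetical sketch of what it could look like here (the 0.95 confidence threshold is an arbitrary assumption):

probs = model.predict(x_test)
confident = probs.max(axis=1) > 0.95      # keep only high-confidence guesses
pseudo_x = x_test[confident]
pseudo_y = probs.argmax(axis=1)[confident]

# Fold the pseudo-labeled images into the training set and retrain
x_train_aug = np.concatenate([x_train, pseudo_x])
y_train_aug = np.concatenate([y_train, pseudo_y])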

Different methods of Ensembling

Instead of using a voting ensemble (multiple models vote on the label for a test image; the label with the most votes wins), use a standard ML algorithm on top, such as gradient boosting trees or logistic regression. I plan to try this, since ensembling methods are key to squeezing out more performance in almost any ML task.
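
A hedged sketch of what such stacking could look like (untried in the competition; the x_val/y_val holdout split is assumed), using scikit-learn's LogisticRegression on the concatenated per-model class probabilities:

from sklearn.linear_model import LogisticRegression

# Meta-features: each base model contributes its 27 class probabilities
meta_train = np.hstack([m.predict(x_val) for m in models])
meta_model = LogisticRegression(max_iter=1000).fit(meta_train, y_val)

# Apply the same stacking transform to the test set
meta_test = np.hstack([m.predict(x_test) for m in models])
final_pred = meta_model.predict(meta_test)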

References

[1] https://www.hackerearth.com/challenge/competitive/deep-learning-challenge-1/
[2] http://webmining.olariu.org/top-ten-finish-in-the-deep-learning-hackerearth-challenge
[3] https://my.memo.ai/external/LDQ80W6NTcGAp7TtB0s0