
Classification research


Machine learning research for classification

Models

Codes

Image classification

  • Comparison of the performance of the latest models
  • NoisyStudent (on all EfficientNets)
    • We train a "teacher" model on labelized images, then we use it to predict the class of unlabelized images. Then we train a "student" model on the mix sure images + images labelized by the teacher. Le student model then becomes teacher : it is used to re-labelize uncertian images, then we train a new student. We do that on various iterations.
    • +3% accuracy on ImageNet ➡️ For Basegun we are worried about model drift; this method seems too unreliable.
  • FixEfficientNet (on all EfficientNets)
    • Improves the training method: training usually takes a random crop of the input image, while for test/inference we take the middle crop. Method detail: use a larger crop size for the input image to avoid taking into account only details. Quite effective (see the crop sketch after this list).
    • Uses a 2x larger size for prediction than for training. To compensate for the size difference, adds a layer with a parametric Fréchet model after the max-pooling layer. Ineffective?
    • +3% accuracy on ImageNet
  • EfficientNetV2
    • Changes the architecture of EfficientNet to get something lighter with fewer parameters, which makes training much faster.
    • During training, uses “progressive learning”, which consists of gradually increasing the input size as the epochs progress while introducing more data augmentation = “regularization” (rotations, for example) to avoid overfitting (see the schedule sketch after this list).
    • +2% accuracy on ImageNet, but no improvement for transfer learning compared to B6 if we keep the model <100M parameters...
  • EfficientNetL2 ➡️ too large for Basegun (several hundred million parameters; we want a few tens of millions at most since we need to be able to run on CPU)
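
A minimal sketch of the NoisyStudent pseudo-labeling loop described above, using scikit-learn on toy data. The dataset, classifier, confidence threshold, and iteration count are all illustrative assumptions; the real method also injects noise (dropout, strong augmentation) when training the student.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy stand-ins for a small labeled set and a large unlabeled pool.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_lab, X_unlab, y_lab, _ = train_test_split(X, y, train_size=0.2, random_state=0)

# Teacher trained on the labeled images only.
teacher = RandomForestClassifier(random_state=0).fit(X_lab, y_lab)

for iteration in range(3):
    # Teacher labels the unlabeled pool; keep only confident predictions.
    probs = teacher.predict_proba(X_unlab)
    confident = probs.max(axis=1) >= 0.8  # assumed confidence threshold
    X_pseudo = X_unlab[confident]
    y_pseudo = probs[confident].argmax(axis=1)

    # Student trained on the mix: sure labels + teacher pseudo-labels.
    student = RandomForestClassifier(random_state=iteration).fit(
        np.vstack([X_lab, X_pseudo]),
        np.concatenate([y_lab, y_pseudo]),
    )
    teacher = student  # the student becomes the next teacher
```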
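
A minimal sketch of the FixEfficientNet train/test resolution trick with torchvision transforms; the exact sizes (224 for training, 448 for prediction) are illustrative assumptions.

```python
from torchvision import transforms

train_size, test_size = 224, 448  # prediction at ~2x the training resolution

# Training: random crop of the input image, as usual.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(train_size),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Test/inference: resize, then take the middle (center) crop at the larger size.
test_tf = transforms.Compose([
    transforms.Resize(int(test_size * 1.15)),
    transforms.CenterCrop(test_size),
    transforms.ToTensor(),
])
```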
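
And a minimal sketch of EfficientNetV2-style progressive learning, where input size and augmentation strength ramp up together across epochs; the schedule values and the use of RandAugment are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from torchvision import transforms

epochs = 30
sizes = np.linspace(128, 300, epochs)    # input size grows each epoch
magnitudes = np.linspace(5, 15, epochs)  # augmentation gets stronger

for epoch in range(epochs):
    train_tf = transforms.Compose([
        transforms.RandomResizedCrop(int(sizes[epoch])),
        transforms.RandAugment(magnitude=int(magnitudes[epoch])),
        transforms.ToTensor(),
    ])
    # ...rebuild the DataLoader with train_tf and train for one epoch...
```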

ML Techniques

Overcome label noise

Improve the reliability of confidence scores

The output of a classification model applied to an image is an array whose size is the number of classes and which contains, for each class, the probability that the image belongs to that class. For instance, if we have 5 classes, the output [0.1, 0.2, 0.5, 0.05, 0.15] means that the algorithm thinks:

  • there is a 10% chance this image is of class 1
  • there is a 20% chance this image is of class 2
  • etc.

Consequently, we consider the image to be most probably of class 3, since that is the class with the highest confidence score. However, research has shown that the confidence scores of deep neural networks are usually deceptive: a score of 0.5 does not mean that the algorithm would be correct 50% of the time for such an image. Therefore, a technique called calibration aims to correct these scores so that they are better aligned with the statistics they are supposed to represent (see the sketch below).
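
A minimal sketch of one common calibration technique, temperature scaling: learn a single scalar T on a held-out validation set and divide the logits by it before the softmax. The logits and labels below are random placeholders standing in for the trained model's validation outputs.

```python
import torch
import torch.nn as nn

# Placeholders for validation logits (5 classes) and ground-truth labels.
val_logits = torch.randn(1000, 5)
val_labels = torch.randint(0, 5, (1000,))

# Learn the temperature T by minimizing cross-entropy on the validation set.
temperature = nn.Parameter(torch.ones(1))
optimizer = torch.optim.LBFGS([temperature], lr=0.01, max_iter=50)

def closure():
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(val_logits / temperature, val_labels)
    loss.backward()
    return loss

optimizer.step(closure)

# Dividing by T never changes the predicted class (the argmax is unchanged);
# it only makes the scores better reflect how often the model is right.
calibrated = torch.softmax(val_logits / temperature, dim=1)
```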

Improve prediction time

  • Use ONNX format ➡️ Tested on the training VM, slower than using the original .pth. TODO: try on the inference VM, where it might do better (see the sketch below).
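
A minimal sketch of the ONNX path, assuming a torchvision EfficientNet as a stand-in for the Basegun model; the checkpoint, input size, and file names are placeholders.

```python
import torch
from torchvision import models
import onnxruntime as ort

# Stand-in model; in practice load the trained weights, e.g.
# model.load_state_dict(torch.load("model.pth", map_location="cpu")).
model = models.efficientnet_b0(weights=None)
model.eval()

# Export to ONNX with a dummy input of the expected shape.
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["logits"])

# Run inference on CPU with onnxruntime.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
logits = session.run(None, {"input": dummy.numpy()})[0]
```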

Continue learning after training

  • Class-incremental learning: survey and performance evaluation ➡️ Not explored yet. We thought of using inputs from users of the Basegun app, who can vote when they think the prediction result is right, but after analyzing these votes we saw that user inputs often contain mistakes -> firearms identification requires too much expertise, so any new images for the dataset must be double-checked by experts.