Implement Image Classification on dermaMNIST #20

Closed
bdzyubak opened this issue Apr 4, 2024 · 3 comments · Fixed by #21

bdzyubak (Owner) commented Apr 4, 2024

As I have mostly done image segmentation in the past, let's do a classification project!

dermaMNIST is a publicly available dataset of small RGB images of skin tumors with multi-class disease labels. Unlike the handwritten digit dataset, which is easy, this one is challenging: the benchmark accuracy published in Nature with ResNet-50 is only 0.73. https://www.nature.com/articles/s41597-022-01721-8/tables/4

Let's see if we can do better.
pip install medmnist
from medmnist import DermaMNIST
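To make the snippet concrete, here is a minimal loading sketch; it assumes torchvision for the transform, and the batch size of 100 matches the tensor shapes logged later in this thread:

```python
# Minimal DermaMNIST loading sketch; the transform choice is illustrative.
from medmnist import DermaMNIST
from torch.utils.data import DataLoader
from torchvision import transforms

transform = transforms.ToTensor()  # 28x28 RGB PIL images -> float tensors in [0, 1]
train_set = DermaMNIST(split="train", transform=transform, download=True)
train_loader = DataLoader(train_set, batch_size=100, shuffle=True)

images, labels = next(iter(train_loader))
print(images.shape, labels.shape)  # torch.Size([100, 3, 28, 28]) torch.Size([100, 1])
```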

bdzyubak added the enhancement (New feature or request) label Apr 4, 2024
bdzyubak added this to the May Release milestone Apr 4, 2024
bdzyubak self-assigned this Apr 4, 2024
bdzyubak (Owner, Author) commented Apr 5, 2024

A basic tutorial-style CNN + FCN gets to 0.69 train accuracy in one epoch and then stays there.

ResNet50 from torchvision gets:
Fresh weights: [98/100] train_loss: 0.333 - train_acc: 0.884 - eval_loss: 1.127 - eval_acc: 0.704
Pretrained weights: [100/100] train_loss: 0.207 - train_acc: 0.932 - eval_loss: 1.095 - eval_acc: 0.754

Using pretrained rather than random weights as a starting point is much better for optimization and generalizability, as always. Another way to help generalizability would be to train just some of the layers (e.g. add additional heads); see the sketch below.
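A minimal sketch of that setup, assuming a recent torchvision (>= 0.13, where the weights enum exists); the 7 output classes are DermaMNIST's, and the freezing shown is one way to train just some of the layers:

```python
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)  # pretrained; weights=None for fresh
model.fc = nn.Linear(model.fc.in_features, 7)             # new head for 7 DermaMNIST classes

# Optional: freeze the backbone so only the new head trains.
for name, param in model.named_parameters():
    if not name.startswith("fc."):
        param.requires_grad = False
```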

bdzyubak (Owner, Author) commented Apr 8, 2024

Some fun with the basics, using
D:\Source\torch-control\projects\ComputerVision\dermMNIST\train_basic_network.py
(a rough sketch of this kind of network appears after the list):

A) A network bottlenecked to 1x1 by conv + maxpooling layers, i.e. torch.Size([100, 4096, 1, 1]) going into the dense layers, will still train OK.

B) But a network that has another conv layer (no maxpool) following this bottleneck won't, likely because a 1x1 feature map is almost entirely padding by the time a 3x3 Conv2d runs on it.

C) Maxpooling while increasing channel count, versus stacking raw CNN layers, helps optimization but has little impact on final accuracy. The latter may not hold on a more difficult dataset.
Maxpool:
[22/100] train_loss: 0.129 - train_acc: 0.956 - eval_loss: 1.831 - eval_acc: 0.699

No maxpool:
[22/100] train_loss: 0.510 - train_acc: 0.809 - eval_loss: 1.175 - eval_acc: 0.620
[52/100] train_loss: 0.012 - train_acc: 0.996 - eval_loss: 2.166 - eval_acc: 0.737

D) Without activation layers, the network takes longer to optimize. It still fits the train data okay but fails to generalize, even with dropout:
[22/100] train_loss: 0.609 - train_acc: 0.769 - eval_loss: 0.725 - eval_acc: 0.727
[70/100] train_loss: 0.143 - train_acc: 0.936 - eval_loss: 2.066 - eval_acc: 0.686
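For reference, a rough sketch of this kind of network (the real train_basic_network.py is in the repo; this layout and the flag names are assumptions for illustration):

```python
import torch.nn as nn

class BasicCNN(nn.Module):
    # Hypothetical toggles mirroring experiments C) and D) above.
    def __init__(self, num_classes=7, use_maxpool=True, use_activation=True):
        super().__init__()
        act = nn.ReLU() if use_activation else nn.Identity()
        pool = nn.MaxPool2d(2) if use_maxpool else nn.Identity()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), act, pool,   # 28x28 -> 14x14 when pooling
            nn.Conv2d(32, 64, 3, padding=1), act, pool,  # 14x14 -> 7x7 when pooling
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(128), act, nn.Dropout(0.5),
            nn.Linear(128, num_classes),  # raw logits out (see the next comment)
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```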

bdzyubak (Owner, Author) commented Apr 9, 2024

The issue of the network underfitting this data was caused by a bug in the basic implementation. PyTorch's CrossEntropyLoss applies log_softmax internally and needs to be passed raw logits. If it is passed nn.LogSoftmax outputs it still works fine, but if it is passed nn.Softmax outputs, this really hurts optimization.
With nn.Softmax() activation - stuck at 0.67 train acc:
[22/100] train_loss: 1.499 - train_acc: 0.670 - eval_loss: 1.467 - eval_acc: 0.669

No nn.Softmax() layer - just pass logits
[22/100] train_loss: 0.129 - train_acc: 0.956 - eval_loss: 1.831 - eval_acc: 0.699
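A minimal repro of the difference (the class count and batch shapes are DermaMNIST's; the data is random):

```python
import torch
from torch import nn

logits = torch.randn(100, 7)           # raw network outputs for 7 classes
targets = torch.randint(0, 7, (100,))  # class indices
criterion = nn.CrossEntropyLoss()      # applies log_softmax internally

loss_correct = criterion(logits, targets)                   # pass raw logits
loss_buggy = criterion(nn.Softmax(dim=1)(logits), targets)  # softmax-then-log_softmax
```

Because softmax outputs are confined to [0, 1], the internal log_softmax then sees a very compressed input range, which bounds how low the loss can go and squashes the gradients.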

In the end, fitting the train data in dermaMNIST turned out to be very easy. There is still a class-imbalance and generalizability issue on the val data, which will be addressed in a future issue.
