<a href="https://colab.research.google.com/github/yulinlina/MedMnist/blob/OragenSMNIST/ConvNeXt_for_OrgansMNIST.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Use this notebook to fine-tune a ConvNeXt-Small model on the OrganSMNIST dataset from [MedMNIST](https://github.com/MedMNIST/MedMNIST). The [official ConvNeXt repository](https://github.com/facebookresearch/ConvNeXt) is instrumented with [Weights and Biases](https://wandb.ai/site), so you can log your train/test metrics and version-control your model checkpoints with Weights and Biases.

# ⚽️ Installation and Setup

The following installation instructions are based on the [INSTALL.md](https://github.com/facebookresearch/ConvNeXt/blob/main/INSTALL.md) provided by the official ConvNeXt repository.

In [1]:
# Before running, open the "Runtime" menu, choose "Change runtime type", and select GPU.
!pip install -qq torch==1.8.0+cu111 torchvision==0.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
!pip install -qq wandb timm==0.3.2 six tensorboardX

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchtext 0.14.1 requires torch==1.13.1, but you have torch 1.8.0+cu111 which is incompatible.
torchaudio 0.13.1+cu116 requires torch==1.13.1, but you have torch 1.8.0+cu111 which is incompatible.

Download the official ConvNeXt repository.

In [2]:
!git clone https://github.com/facebookresearch/ConvNeXt

Cloning into 'ConvNeXt'...
remote: Enumerating objects: 252, done.
remote: Counting objects: 100% (249/249), done.
remote: Compressing objects: 100% (117/117), done.
remote: Total 252 (delta 129), reused 193 (delta 111), pack-reused 3
Receiving objects: 100% (252/252), 69.37 KiB | 13.87 MiB/s, done.
Resolving deltas: 100% (129/129), done.


# 🏀 Download the Dataset

We will be fine-tuning on the OrganSMNIST dataset. To use a custom dataset (OrganSMNIST here) with the ConvNeXt training script, arrange it as an image folder in the layout shown below:

```
/path/to/dataset/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
```
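Before training, it can help to confirm that a dataset folder really follows this layout. The sketch below is a minimal check using only the standard library; the helper name `summarize_image_folder` and the accepted file suffixes are assumptions made for illustration, not part of the ConvNeXt repository.

```python
import tempfile
from pathlib import Path

# Assumed set of image suffixes; extend it if the dataset uses other formats.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def summarize_image_folder(root):
    """Return {split: {class_name: image_count}} for a split/class/image layout."""
    summary = {}
    for split_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts = {}
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
        summary[split_dir.name] = counts
    return summary


# Demo on a throwaway directory that mirrors the layout shown above.
with tempfile.TemporaryDirectory() as tmp:
    for rel in (
        "train/class1/img1.jpeg",
        "train/class2/img2.jpeg",
        "val/class1/img3.jpeg",
        "val/class2/img4.jpeg",
    ):
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    demo_summary = summarize_image_folder(tmp)

print(demo_summary)
```

A split with missing class folders, or a class folder with zero images, shows up directly in the returned dictionary.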



In [3]:
!pip install --upgrade git+https://github.com/MedMNIST/MedMNIST.git

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/MedMNIST/MedMNIST.git
  Cloning https://github.com/MedMNIST/MedMNIST.git to /tmp/pip-req-build-vu841h64
  Running command git clone --filter=blob:none --quiet https://github.com/MedMNIST/MedMNIST.git /tmp/pip-req-build-vu841h64
  Resolved https://github.com/MedMNIST/MedMNIST.git to commit 16e3ead23ceb3e1c5f7b9b04032c30cea7a4b1d8
  Preparing metadata (setup.py) ... done
Collecting fire
  Downloading fire-0.5.0.tar.gz (88 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: medmnist, fire
  Building wheel for medmnist (setup.py) ... done
  Created wheel for medmnist: filename=medmnist-2.1.0-py3-none-any.whl size=21734 sha256=0faf481aa1583c53b8b409745a0b02d27e408be7e

In [2]:
from tqdm import tqdm
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as data
import torchvision.transforms as transforms

import medmnist
from medmnist import INFO, Evaluator
data_flag = 'organsmnist'
# data_flag = 'dermamnist'
# data_flag = 'breastmnist'
download = True
info = INFO[data_flag]
task = info['task']
n_channels = info['n_channels']
n_classes = len(info['label'])

DataClass = getattr(medmnist, info['python_class'])
# preprocessing
data_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[.5], std=[.5])
])
# load the data
train_dataset = DataClass(split='train', transform=data_transform, download=download)
test_dataset = DataClass(split='test', transform=data_transform, download=download)

pil_dataset = DataClass(split='train', download=download)

Downloading https://zenodo.org/record/6496656/files/organsmnist.npz?download=1 to /root/.medmnist/organsmnist.npz


  0%|          | 0/16528536 [00:00<?, ?it/s]

Using downloaded and verified file: /root/.medmnist/organsmnist.npz
Using downloaded and verified file: /root/.medmnist/organsmnist.npz


In [3]:
!python -m medmnist save --flag=organsmnist --folder=MedMNIST/ --postfix=jpeg

Saving organsmnist train...
100% 13940/13940 [00:02<00:00, 6300.12it/s]
Saving organsmnist val...
100% 2452/2452 [00:00<00:00, 6451.76it/s]
Saving organsmnist test...
100% 8829/8829 [00:01<00:00, 6655.95it/s]


In [1]:
# Dataset layout script: rearranges the exported images into a CIFAR-10-like class-folder format.
# Messages such as "mv: cannot stat 'val802_6.jpeg': No such file or directory" can be ignored.
# Note: the index ranges and the seven class folders below do not cover every exported file
# (the export above reports 13940 train, 2452 val and 8829 test images). Files with a higher
# index, or with a label suffix other than 0-6, stay unsorted in /content/MedMNIST/organsmnist/.
%cd /content/MedMNIST/organsmnist/
%mkdir train val test
%cd /content/MedMNIST/organsmnist/train/
%mkdir class1 class2 class3 class4 class5 class6 class7
%cd /content/MedMNIST/organsmnist/test/
%mkdir class1 class2 class3 class4 class5 class6 class7
%cd /content/MedMNIST/organsmnist/val/
%mkdir class1 class2 class3 class4 class5 class6 class7
%cd /content/MedMNIST/organsmnist/
%mv train{0..7006}_0.jpeg /content/MedMNIST/organsmnist/train/class1/
%mv train{0..7006}_1.jpeg /content/MedMNIST/organsmnist/train/class2/
%mv train{0..7006}_2.jpeg /content/MedMNIST/organsmnist/train/class3/
%mv train{0..7006}_3.jpeg /content/MedMNIST/organsmnist/train/class4/
%mv train{0..7006}_4.jpeg /content/MedMNIST/organsmnist/train/class5/
%mv train{0..7006}_5.jpeg /content/MedMNIST/organsmnist/train/class6/
%mv train{0..7006}_6.jpeg /content/MedMNIST/organsmnist/train/class7/
%cd /content/MedMNIST/organsmnist/
%mv test{0..2004}_0.jpeg /content/MedMNIST/organsmnist/test/class1/
%mv test{0..2004}_1.jpeg /content/MedMNIST/organsmnist/test/class2/
%mv test{0..2004}_2.jpeg /content/MedMNIST/organsmnist/test/class3/
%mv test{0..2004}_3.jpeg /content/MedMNIST/organsmnist/test/class4/
%mv test{0..2004}_4.jpeg /content/MedMNIST/organsmnist/test/class5/
%mv test{0..2004}_5.jpeg /content/MedMNIST/organsmnist/test/class6/
%mv test{0..2004}_6.jpeg /content/MedMNIST/organsmnist/test/class7/
%cd /content/MedMNIST/organsmnist/
%mv val{0..1002}_0.jpeg /content/MedMNIST/organsmnist/val/class1/
%mv val{0..1002}_1.jpeg /content/MedMNIST/organsmnist/val/class2/
%mv val{0..1002}_2.jpeg /content/MedMNIST/organsmnist/val/class3/
%mv val{0..1002}_3.jpeg /content/MedMNIST/organsmnist/val/class4/
%mv val{0..1002}_4.jpeg /content/MedMNIST/organsmnist/val/class5/
%mv val{0..1002}_5.jpeg /content/MedMNIST/organsmnist/val/class6/
%mv val{0..1002}_6.jpeg /content/MedMNIST/organsmnist/val/class7/

Streaming output truncated to the last 5000 lines.
mv: cannot stat 'val15_2.jpeg': No such file or directory
mv: cannot stat 'val16_2.jpeg': No such file or directory
...
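The `mv` commands in the cell above depend on hard-coded index ranges and a fixed number of class folders. As an alternative, the sorting step can be sketched in Python so that the split and label are read from each file name. This assumes the exported files are named `<split><index>_<label>.jpeg`, which is inferred from names such as `val15_2.jpeg` in the output; the function name `sort_into_class_folders` is hypothetical.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Assumed file name pattern, inferred from names like 'val15_2.jpeg'.
FILENAME_RE = re.compile(r"^(train|val|test)(\d+)_(\d+)\.jpeg$")


def sort_into_class_folders(folder):
    """Move '<split><index>_<label>.jpeg' files into '<split>/class<label+1>/'."""
    folder = Path(folder)
    moved = 0
    # Materialize the listing first so new subfolders do not affect iteration.
    for path in sorted(folder.iterdir()):
        match = FILENAME_RE.match(path.name) if path.is_file() else None
        if match is None:
            continue  # leave unrelated files and existing folders untouched
        split, _index, label = match.groups()
        target = folder / split / f"class{int(label) + 1}"
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target / path.name))
        moved += 1
    return moved


# Demo on a throwaway directory with a few made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("train0_0.jpeg", "train13939_10.jpeg", "val15_2.jpeg", "README.txt"):
        Path(tmp, name).touch()
    demo_moved = sort_into_class_folders(tmp)
    demo_layout = sorted(
        p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*.jpeg")
    )

print(demo_moved, demo_layout)
```

With this approach the number of class folders follows the labels present in the data, so the `--nb_classes` value passed to training would need to match the number of folders created. torchvision's `ImageFolder` assigns class indices from the sorted folder names, so the mapping stays consistent across splits as long as every split contains the same class folders.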

# 🏈 Download Pretrained Weights

We will fine-tune the ConvNeXt-Small model pretrained on ImageNet-22K (`convnext_small_22k_224.pth`).

In [4]:
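# First check that the images under /content/MedMNIST/organsmnist have all been sorted into train, val and test.
%cd /content/ConvNeXt/
# Download the pretrained weights (here the ImageNet-22K ConvNeXt-Small checkpoint).
# Starting from pretrained weights matters; without them the results are poor (you can try training without them to compare).
!wget https://dl.fbaipublicfiles.com/convnext/convnext_small_22k_224.pth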

/content/ConvNeXt
--2023-03-30 12:04:34--  https://dl.fbaipublicfiles.com/convnext/convnext_small_22k_224.pth
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 18.165.83.35, 18.165.83.79, 18.165.83.44, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|18.165.83.35|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 265112135 (253M) [binary/octet-stream]
Saving to: ‘convnext_small_22k_224.pth.1’


2023-03-30 12:04:35 (250 MB/s) - ‘convnext_small_22k_224.pth.1’ saved [265112135/265112135]
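In the output above, `wget` saved the file as `convnext_small_22k_224.pth.1`, which it does when a file with the requested name already exists in the directory. The training command later refers to `convnext_small_22k_224.pth`, so it may be worth checking which copies are complete. The sketch below compares file sizes against the `Length` value reported by `wget` (265112135 bytes); the helper name `find_complete_downloads` is an assumption for illustration.

```python
import tempfile
from pathlib import Path


def find_complete_downloads(directory, name, expected_bytes):
    """Return paths named `name` or `name.<suffix>` whose size equals expected_bytes."""
    directory = Path(directory)
    candidates = [directory / name] + sorted(directory.glob(name + ".*"))
    return [
        p for p in candidates
        if p.is_file() and p.stat().st_size == expected_bytes
    ]


# Demo with small stand-in files: one complete copy and one truncated copy.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "weights.pth").write_bytes(b"0" * 10)
    Path(tmp, "weights.pth.1").write_bytes(b"0" * 4)
    demo_names = [p.name for p in find_complete_downloads(tmp, "weights.pth", 10)]

print(demo_names)

# In this notebook the call would look like:
# find_complete_downloads("/content/ConvNeXt", "convnext_small_22k_224.pth", 265112135)
```

A size match is only a quick sanity check; it does not verify the file contents.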



# 🎾 Train with Weights and Biases

To log the training and evaluation metrics with Weights and Biases, pass `--enable_wandb true`.

You can also save the fine-tuned checkpoints as version-controlled W&B [Artifacts](https://docs.wandb.ai/guides/artifacts) by passing `--wandb_ckpt true`.
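When running several experiments, it can be convenient to assemble the `main.py` command in Python and toggle the two W&B flags there. The following is only a sketch: `build_train_command` is a hypothetical helper, and it uses only flags that appear in this notebook's training command.

```python
import shlex


def build_train_command(options, enable_wandb=False, wandb_ckpt=False):
    """Build the argument list for main.py from a dict of flag names and values."""
    args = ["python", "main.py"]
    for flag, value in options.items():
        args += [f"--{flag}", str(value)]
    if enable_wandb:
        args += ["--enable_wandb", "true"]  # log train/eval metrics to W&B
    if wandb_ckpt:
        args += ["--wandb_ckpt", "true"]  # save checkpoints as W&B Artifacts
    return args


command = build_train_command(
    {"model": "convnext_small", "epochs": 30, "nb_classes": 7},
    enable_wandb=True,
    wandb_ckpt=True,
)
print(shlex.join(command))
# prints: python main.py --model convnext_small --epochs 30 --nb_classes 7 --enable_wandb true --wandb_ckpt true
```

The resulting list can be passed to `subprocess.run(command)` from the ConvNeXt directory, or the printed string can be pasted into a `!` cell.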



In [5]:
!python main.py --epochs 30 \
                --model convnext_small \
                --data_set image_folder \
                --data_path /content/MedMNIST/organsmnist/train \
                --eval_data_path /content/MedMNIST/organsmnist/test \
                --nb_classes 7 \
                --num_workers 8 \
                --warmup_epochs 0 \
                --save_ckpt true \
                --output_dir model_ckpt \
                --cutmix 0 \
                --mixup 0 --lr 4e-4 \
                --enable_wandb true --wandb_ckpt true \
                --finetune convnext_small_22k_224.pth 

Not using distributed mode
Namespace(batch_size=64, epochs=30, update_freq=1, model='convnext_small', drop_path=0, input_size=224, layer_scale_init_value=1e-06, model_ema=False, model_ema_decay=0.9999, model_ema_force_cpu=False, model_ema_eval=False, opt='adamw', opt_eps=1e-08, opt_betas=None, clip_grad=None, momentum=0.9, weight_decay=0.05, weight_decay_end=None, lr=0.0004, layer_decay=1.0, min_lr=1e-06, warmup_epochs=0, warmup_steps=-1, color_jitter=0.4, aa='rand-m9-mstd0.5-inc1', smoothing=0.1, train_interpolation='bicubic', crop_pct=None, reprob=0.25, remode='pixel', recount=1, resplit=False, mixup=0.0, cutmix=0.0, cutmix_minmax=None, mixup_prob=1.0, mixup_switch_prob=0.5, mixup_mode='batch', finetune='convnext_small_22k_224.pth', head_init_scale=1.0, model_key='model|module', model_prefix='', data_path='/content/MedMNIST/organsmnist/train', eval_data_path='/content/MedMNIST/organsmnist/test', nb_classes=7, imagenet_default_mean_and_std=True, data_set='image_folder', output_dir='mo

In [6]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [7]:
# Save the dataset and the ConvNeXt folder (including the checkpoints in model_ckpt) to Google Drive
%cd /content
%cp -r MedMNIST drive/MyDrive/MedMNIST
%cp -r ConvNeXt drive/MyDrive/ConvNeXt

/content


# 🏐 Conclusion

* **The above setting gives a top-1 accuracy of ~95%.**
* The ConvNeXt repository comes with modern training regimes and is easy to fine-tune on a custom dataset.
* The fine-tuned model achieves competitive results.

* By passing two arguments (`--enable_wandb true` and `--wandb_ckpt true`) you get the following:

  * Repository of all your experiments (train and test metrics) as a [W&B Project](https://docs.wandb.ai/ref/app/pages/project-page). You can easily compare experiments to find the best performing model.
  * Hyperparameters (Configs) used to train individual models. 
  * System (CPU/GPU/Disk) metrics.
  * Model checkpoints saved as W&B Artifacts. They are versioned and easy to share. 

  Check out the associated [W&B run page](https://wandb.ai/ayut/convnext/runs/16vi9e31).