<a href="https://colab.research.google.com/github/ssktotoro/neuro/blob/colab_notebook/Neuro%20UNet%20Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neuro UNet/ MeshnetTutorial

Authors: [Kevin Wang] (), [Alex Fedorov] (), [Sergey Kolesnikov](https://github.com/Scitator)

[![Catalyst logo](https://raw.githubusercontent.com/catalyst-team/catalyst-pics/master/pics/catalyst_logo.png)](https://github.com/catalyst-team/catalyst)

### Colab setup

First of all, do not forget to change the runtime type to GPU. <br/>
To do so click `Runtime` -> `Change runtime type` -> Select `\"Python 3\"` and `\"GPU\"` -> click `Save`. <br/>
After that you can click `Runtime` -> `Run all` and watch the tutorial.

## Requirements

Download and install the latest versions of catalyst and other libraries required for this tutorial.

In [1]:
%%bash 
git clone https://github.com/ssktotoro/neuro.git
git pull
pip install -r neuro/requirements/requirements.txt


Collecting alchemy==20.4
  Downloading https://files.pythonhosted.org/packages/e1/d0/29085429e2f6203ee206a4aa93cb20cdafbdc2aa649d7b20de24eeb7fb69/alchemy-20.4-py2.py3-none-any.whl
Collecting catalyst==20.10.1
  Downloading https://files.pythonhosted.org/packages/1c/1f/7c0591a256990e146b377c282f17e2cd2717b25ac7e489c97dc972ed7248/catalyst-20.10.1-py2.py3-none-any.whl (475kB)
Collecting reaction==20.2
  Downloading https://files.pythonhosted.org/packages/75/9b/c549eb02e2b5caf8e2dcfb6386fa82645ffaaf2e7fc3c6d682f0591d8187/reaction-20.2-py2.py3-none-any.whl
Collecting osfclient
  Downloading https://files.pythonhosted.org/packages/2d/2f/b24d24c6376f6087048e1aaf93b0a4a7a6a2f5709ef74b7a0bbe267f8d52/osfclient-0.0.4-py2.py3-none-any.whl
Collecting requests==2.22.0
  Downloading https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl (57kB)
Collecting deprecation
  Downloading https://files.pythonhosted.org/pa

Cloning into 'neuro'...
fatal: not a git repository (or any of the parent directories): .git
ERROR: google-colab 1.0.0 has requirement requests~=2.23.0, but you'll have requests 2.22.0 which is incompatible.
ERROR: datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.3 which is incompatible.


In [2]:
from typing import Callable, List, Tuple

import os
import torch
import catalyst
from catalyst import utils

print(f"torch: {torch.__version__}, catalyst: {catalyst.__version__}")

os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # "" - CPU, "0" - 1 GPU, "0,1" - MultiGPU

SEED = 42
utils.set_global_seed(SEED)
utils.prepare_cudnn(deterministic=True)

torch: 1.7.0+cu101, catalyst: 20.10.1


# Dataset

We'll be using the Mindboggle 101 dataset for a multiclass 3d segmentation task.
The dataset can be downloaded off osf with the following command from osfclient after you register with osf.

`osf -p 9ahyp clone .`

Otherwise you can download it using a Catalyst utility `download-gdrive` which downloads a version from the Catalyst Google Drive

`usage: download-gdrive {FILE_ID} {FILENAME}`

In [3]:
cd neuro

/content/neuro


In [4]:
%%bash
mkdir Mindboggle_data 
mkdir -p data/Mindboggle_101/
osf -p 9ahyp clone Mindboggle_data/
cp -r Mindboggle_data/osfstorage/Mindboggle101_volumes/ data/Mindboggle_101/
find data/Mindboggle_101 -name '*.tar.gz'| xargs -i tar zxvf {} -C data/Mindboggle_101
find data/Mindboggle_101 -name '*.tar.gz'| xargs -i rm {}

OASIS-TRT-20_volumes/
OASIS-TRT-20_volumes/OASIS-TRT-20-3/
OASIS-TRT-20_volumes/OASIS-TRT-20-4/
OASIS-TRT-20_volumes/OASIS-TRT-20-5/
OASIS-TRT-20_volumes/OASIS-TRT-20-2/
OASIS-TRT-20_volumes/OASIS-TRT-20-13/
OASIS-TRT-20_volumes/OASIS-TRT-20-14/
OASIS-TRT-20_volumes/OASIS-TRT-20-15/
OASIS-TRT-20_volumes/OASIS-TRT-20-12/
OASIS-TRT-20_volumes/OASIS-TRT-20-7/
OASIS-TRT-20_volumes/OASIS-TRT-20-9/
OASIS-TRT-20_volumes/OASIS-TRT-20-8/
OASIS-TRT-20_volumes/OASIS-TRT-20-1/
OASIS-TRT-20_volumes/OASIS-TRT-20-6/
OASIS-TRT-20_volumes/OASIS-TRT-20-19/
OASIS-TRT-20_volumes/OASIS-TRT-20-17/
OASIS-TRT-20_volumes/OASIS-TRT-20-10/
OASIS-TRT-20_volumes/OASIS-TRT-20-11/
OASIS-TRT-20_volumes/OASIS-TRT-20-16/
OASIS-TRT-20_volumes/OASIS-TRT-20-20/
OASIS-TRT-20_volumes/OASIS-TRT-20-18/
OASIS-TRT-20_volumes/OASIS-TRT-20-18/labels.DKT31.manual.nii.gz
OASIS-TRT-20_volumes/OASIS-TRT-20-18/t1weighted_brain.MNI152.nii.gz
OASIS-TRT-20_volumes/OASIS-TRT-20-18/t1weighted.MNI152.nii.gz
OASIS-TRT-20_volumes/OASIS-TRT-20

0files [00:00, ?files/s]
  0%|          | 0.00/3.22M [00:00<?, ?bytes/s][A100%|██████████| 3.22M/3.22M [00:00<00:00, 208Mbytes/s]
1files [00:03,  3.43s/files]
  0%|          | 0.00/3.66k [00:00<?, ?bytes/s][A100%|██████████| 3.66k/3.66k [00:00<00:00, 19.4Mbytes/s]
2files [00:04,  2.68s/files]
  0%|          | 0.00/843M [00:00<?, ?bytes/s][A
  0%|          | 4.21M/843M [00:00<00:27, 31.0Mbytes/s][A
  1%|          | 8.40M/843M [00:00<00:29, 27.9Mbytes/s][A
  4%|▍         | 33.6M/843M [00:00<00:21, 37.8Mbytes/s][A
  7%|▋         | 56.0M/843M [00:00<00:15, 50.4Mbytes/s][A
  8%|▊         | 67.8M/843M [00:00<00:14, 55.3Mbytes/s][A
 11%|█         | 94.0M/843M [00:00<00:10, 72.4Mbytes/s][A
 13%|█▎        | 109M/843M [00:01<00:12, 58.6Mbytes/s] [A
 15%|█▍        | 126M/843M [00:01<00:09, 72.9Mbytes/s][A
 17%|█▋        | 140M/843M [00:01<00:08, 82.9Mbytes/s][A
 19%|█▉        | 160M/843M [00:01<00:06, 101Mbytes/s] [A
 21%|██        | 176M/843M [00:01<00:06, 104Mbyt

Run the prepare data script that limits the labels to the DKT human labels (60 labels).

`usage: python ../neuro/scripts/prepare_data.py ../data/Mindboggle_101 {N_labels)`

In [5]:
%%bash 

python neuro/scripts/prepare_data.py data/Mindboggle_101/ 60

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100


Import Catalyst and Torch utils for training

In [6]:
import torch
import collections

from catalyst.contrib.utils.pandas import read_csv_data
from torch.utils.data import RandomSampler
from torch.utils.data import DataLoader
from torchvision import transforms
from catalyst.data import Augmentor, ReaderCompose
from torch.optim.lr_scheduler import CosineAnnealingLR
from catalyst.dl import SupervisedRunner
from catalyst.callbacks.logging import TensorboardLogger
from catalyst.callbacks import SchedulerCallback, CheckpointCallback
from torchvision.transforms import ToTensor
from torch.nn import functional as F

In [7]:
from torch.nn import functional as F

Here we import a BrainDataSet, which reads T1 scans and labels and samples either random patches of 38x38x38 samples from them or nonoverlapping patches of 38x38x38 for validation.  More detail can be found in brain_dataset.py and generator_coords.py  

In [8]:
cd training/

/content/neuro/training


In [9]:
from brain_dataset import BrainDataset
from reader import NiftiReader_Image, NiftiReader_Mask
from custom_metrics import CustomDiceCallback
from model import UNet

In [10]:
open_fn = ReaderCompose(                                                                                                                                                                            
    readers=[                                                                                                                                                                                       
        NiftiReader_Image(input_key="images", output_key="images"),                                                                                                                                 
        NiftiReader_Mask(input_key="nii_labels", output_key="targets"),
    ]
)

In [11]:

def get_loaders(
    random_state: int,
    volume_shape: List[int],
    subvolume_shape: List[int],
    in_csv_train: str = None,                                                                                                                                                                           
    in_csv_valid: str = None,                                                                                                                                                                           
    in_csv_infer: str = None,
    batch_size: int = 16,
    num_workers: int = 10,
) -> dict:

    df, df_train, df_valid, df_infer = read_csv_data(                                                                                                                                                   
    in_csv_train=in_csv_train,                                                                                                                                                                      
    in_csv_valid=in_csv_valid,                                                                                                                                                                      
    in_csv_infer=in_csv_infer,                                                                                                                                                                      
    ) 

    datasets = {}

    train_dataset = BrainDataset(shared_dict={},                                                                                                                                                             
                    list_data=df_train, list_shape=volume_shape, list_sub_shape=subvolume_shape,                                                                                                              
                    open_fn=open_fn,                                                                                                         
                    n_samples=100, mode='train', input_key="images",                                                                                                                                     
                    output_key="targets")
    valid_dataset = BrainDataset(shared_dict={},                                                                                                                                                             
                    list_data=df_valid, list_shape=volume_shape, list_sub_shape=subvolume_shape,                                                                                                              
                    open_fn=open_fn,                                                                                                         
                    n_samples=100, mode='valid', input_key="images",                                                                                                                                     
                    output_key="targets")
    test_dataset = BrainDataset(shared_dict={},                                                                                                                                                             
                    list_data=df_infer, list_shape=volume_shape, list_sub_shape=subvolume_shape,                                                                                                              
                    open_fn=open_fn,                                                                                                         
                    n_samples=100, mode='valid', input_key="images",                                                                                                                                     
                    output_key="targets")

    train_random_sampler = RandomSampler(data_source=train_dataset,                                                                                                                                   
                                          replacement=True,
                                          num_samples=80 * 128)

    valid_random_sampler = RandomSampler(data_source=valid_dataset,  
                                          replacement=True,
                                          num_samples=20*216)

    train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, sampler=train_random_sampler, 
                              num_workers=10, pin_memory=True)
    valid_loader = DataLoader(dataset=valid_dataset, batch_size=batch_size, sampler=valid_random_sampler, 
                              num_workers=10, pin_memory=True,drop_last=True)
    loaders = collections.OrderedDict()
    loaders["train"] = train_loader
    loaders["valid"] = valid_loader

    return loaders

In [12]:
loaders = get_loaders(0, [256, 256, 256], [38, 38, 38], 
                      "../data/dataset_train.csv", "../data/dataset_valid.csv", "../data/dataset_infer.csv", )

# Model Training

We'll train the model 30 epochs

An Adam Optimizer with a cosine annealing schedule starting at a learning rate of .01 is used for this experiment.

CrossEntropyLoss is the criterion/ loss function be minimized 

In [13]:
cd ..

/content/neuro


In [14]:
class CustomRunner(catalyst.dl.Runner):

    def predict_batch(self, batch):
        # model inference step
        return self.model(batch[0].to(self.device))

    def _handle_batch(self, batch):
        # model train/valid step
        x, y = batch['images'], batch['targets']
        y_hat = self.model(x)

        loss = F.cross_entropy(y_hat, y)
        self.batch_metrics.update({"loss": loss, })

        if self.is_train_loader:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()

In [None]:
unet = UNet(n_channels=1, n_classes=60)

num_epochs = 30
logdir = "logs/unet"

optimizer = torch.optim.Adam(unet.parameters(), lr=0.01, weight_decay=0.0001)
scheduler = CosineAnnealingLR(optimizer, T_max=30)

callbacks = [
    TensorboardLogger(),
    SchedulerCallback(reduced_metric='loss'),
    CustomDiceCallback(),
    CheckpointCallback(),
]
runner = CustomRunner()
runner.train(model=unet, optimizer=optimizer, loaders=loaders, num_epochs=30, logdir=logdir, verbose=True)

1/30 * Epoch (train):   1% 6/640 [18:17<32:10:12, 182.67s/it, loss=2.083]