model parallel training research (#616)
* adds network

* adds basic training

* update loading

* working prototype

* update validation set

* [MONAI] Add author; paper info; PDDCA18 (#6)

+ Author
+ Early accept
+ PDDCA18 link

* Update README.md

* adds network

* adds basic training

* update loading

* working prototype

* update validation set

* [MONAI] Update TRAIN_PATH, VAL_PATH (#8)

+ Update TRAIN_PATH, VAL_PATH

* [MONAI] Add data link (#7)

+ Add data link https://drive.google.com/file/d/1A2zpVlR3CkvtkJPvtAF3-MH0nr1WZ2Mn/view?usp=sharing

* fixes typos

* tested new dataset

* print more info, checked new dataset

* [MONAI] Add paper link (#9)

Add paper link https://arxiv.org/abs/2006.12575

* [MONAI] Use dice loss + focal loss to train (#10)

Use dice loss + focal loss to train

* [MONAI] Support non-one-hot ground truth (#11)

Support non-one-hot ground truth

* fixes format and docstrings, adds argparser options

* resume the focal_loss

* adds tests

* [MONAI] Support non-one-hot ground truth (#11)

Support non-one-hot ground truth

* adds tests

* update docstring

* [MONAI] Keep track of best validation scores (#12)

Keep track of best validation scores

* model saving

* adds window sampling

* update readme

* update docs

* fixes flake8 error

* update window sampling

* fixes model name

* fixes channel size issue

* [MONAI] Update --pretrain, --lr (#13)

+ lr from 5e-4 to 1e-3 because we use mean for class channel instead of sum for class channel.
+ pretrain path is consistent with current model_name.

* [MONAI] Pad image; elastic; best class model (#14)

* [MONAI] Pad image; elastic; best class model

+ Pad image bigger than crop_size, avoid potential issues in RandCropByPosNegLabeld
+ Use Rand3DElasticd
+ Save best model for each class

* Update train.py

Co-authored-by: Wenqi Li <wenqil@nvidia.com>

* flake8 fixes

* removes -1 cropsize deform

* testing commands

* fixes unit tests

* update spatial padding

* [MONAI] Add full image deform augmentation (#15)

+ Add full image deform augmentation by Rand3DElasticd
+ Please use latest MONAI in #623

* Adding py.typed

* updating setup.py to comply with black

* update based on comments

* excluding research from packaging

* update tests

* update setup.py

Co-authored-by: Wentao Zhu <wentaozhu1991@gmail.com>
Co-authored-by: Neil Tenenholtz <ntenenz@users.noreply.github.com>
Co-authored-by: Nic Ma <nma@nvidia.com>
4 people committed Jun 26, 2020
1 parent f262355 commit 379c959
Showing 8 changed files with 595 additions and 1 deletion.
53 changes: 53 additions & 0 deletions research/lamp-automated-model-parallelism/README.md
@@ -0,0 +1,53 @@
# LAMP: Large Deep Nets with Automated Model Parallelism for Image Segmentation

<p>
<img src="./fig/acc_speed_han_0_5hor.png" alt="LAMP on Head and Neck Dataset" width="500"/>
</p>


> If you use this work in your research, please cite the paper.

A reimplementation of the LAMP system originally proposed by:

Wentao Zhu, Can Zhao, Wenqi Li, Holger Roth, Ziyue Xu, and Daguang Xu (2020)
"LAMP: Large Deep Nets with Automated Model Parallelism for Image Segmentation."
MICCAI 2020 (Early Accept, paper link: https://arxiv.org/abs/2006.12575)


## To run the demo:

### Prerequisites
- Install the latest version of MONAI: `git clone https://github.com/Project-MONAI/MONAI`, then run `pip install -e .` from inside the cloned `MONAI` directory.
- Install torchgpipe: `pip install torchgpipe`

### Data
```bash
mkdir ./data;
cd ./data;
```
Head and Neck CT dataset

Please download the archive below and unzip the images into the `./data` folder.

- `HaN.zip`: https://drive.google.com/file/d/1A2zpVlR3CkvtkJPvtAF3-MH0nr1WZ2Mn/view?usp=sharing
```bash
unzip HaN.zip; # unzip
```

Please find more details of the dataset at https://github.com/wentaozhu/AnatomyNet-for-anatomical-segmentation.git


### Minimal hardware requirements for full image training
- U-Net (`n_feat=32`): 2x 16 GB GPUs
- U-Net (`n_feat=64`): 4x 16 GB GPUs
- U-Net (`n_feat=128`): 2x 32 GB GPUs


### Commands
The number of features in the first block (`--n_feat`) can be 32, 64, or 128.
```bash
mkdir ./log;
python train.py --n_feat=128 --crop_size='64,64,64' --bs=16 --ep=4800 --lr=0.001 > ./log/YOURLOG.log
python train.py --n_feat=128 --crop_size='128,128,128' --bs=4 --ep=1200 --lr=0.001 --pretrain='./HaN_32_16_1200_64,64,64_0.001_*' > ./log/YOURLOG.log
python train.py --n_feat=128 --crop_size='-1,-1,-1' --bs=1 --ep=300 --lr=0.001 --pretrain='./HaN_32_16_1200_64,64,64_0.001_*' > ./log/YOURLOG.log
```
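
The `--crop_size` values above are comma-separated strings, and the last command passes `'-1,-1,-1'` for what the hardware section calls full image training. `train.py` is not shown here, so the following is only a sketch of how such an argument could be parsed; the helper name `parse_crop_size` and the convention of returning `None` for "no cropping" are assumptions, not the script's actual code.

```python
from typing import Optional, Tuple


def parse_crop_size(text: str) -> Optional[Tuple[int, ...]]:
    """Turn '64,64,64' into (64, 64, 64); return None for '-1,-1,-1'.

    Hypothetical helper: None stands for "use the whole volume".
    """
    dims = tuple(int(part) for part in text.split(","))
    if len(dims) != 3:
        raise ValueError(f"expected three comma-separated integers, got {text!r}")
    if all(d == -1 for d in dims):
        return None  # assumed convention: no cropping, train on the full image
    if any(d <= 0 for d in dims):
        raise ValueError(f"crop sizes must be positive or all -1, got {text!r}")
    return dims


# The three crop settings used in the commands above.
for arg in ("64,64,64", "128,128,128", "-1,-1,-1"):
    print(arg, "->", parse_crop_size(arg))
```

Rejecting mixed values such as `'64,-1,64'` is a design choice of this sketch; the real script may treat them differently.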
10 changes: 10 additions & 0 deletions research/lamp-automated-model-parallelism/__init__.py
@@ -0,0 +1,10 @@
# Copyright 2020 MONAI Consortium
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
66 changes: 66 additions & 0 deletions research/lamp-automated-model-parallelism/data_utils.py
@@ -0,0 +1,66 @@
# Copyright 2020 MONAI Consortium
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import numpy as np
from monai.transforms import DivisiblePad

STRUCTURES = (
    "BrainStem",
    "Chiasm",
    "Mandible",
    "OpticNerve_L",
    "OpticNerve_R",
    "Parotid_L",
    "Parotid_R",
    "Submandibular_L",
    "Submandibular_R",
)


def get_filenames(path, maskname=STRUCTURES):
    """
    create file names according to the predefined folder structure.
    Args:
        path: data folder name
        maskname: target structure names
    """
    maskfiles = []
    for seg in maskname:
        if os.path.exists(os.path.join(path, "./structures/" + seg + "_crp_v2.npy")):
            maskfiles.append(os.path.join(path, "./structures/" + seg + "_crp_v2.npy"))
        else:
            # the mask for structure `seg` is missing in this folder; keep a None placeholder
            maskfiles.append(None)
    return os.path.join(path, "img_crp_v2.npy"), maskfiles


def load_data_and_mask(data, mask_data):
    """
    Load data filename and mask_data (list of file names)
    into a dictionary of {'image': array, "label": list of arrays, "name": str}.
    """
    pad_xform = DivisiblePad(k=32)
    img = np.load(data)  # z y x
    # add a channel axis for the transform, then drop it again
    img = pad_xform(img[None])[0]
    item = dict(image=img, label=[])
    for idx, maskfnm in enumerate(mask_data):
        if maskfnm is None:
            # missing structure: use an all-zero mask with the padded image shape
            ms = np.zeros(img.shape, np.uint8)
        else:
            ms = np.load(maskfnm).astype(np.uint8)
            assert ms.min() == 0 and ms.max() == 1
        mask = pad_xform(ms[None])[0]
        item["label"].append(mask)
    assert len(item["label"]) == 9
    item["name"] = str(data)
    return item
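
`load_data_and_mask` pads the image and every mask with `DivisiblePad(k=32)`, so each spatial size becomes a multiple of 32. The sketch below illustrates that rounding in plain Python, without MONAI. The helper names are hypothetical, and splitting the padding evenly between both ends of an axis is an assumption about how the padding is distributed, so treat it as an illustration of the arithmetic rather than the transform's exact behaviour.

```python
import math
from typing import List, Sequence, Tuple


def divisible_pad_widths(shape: Sequence[int], k: int = 32) -> List[Tuple[int, int]]:
    """Per-axis (before, after) padding that rounds each size up to a multiple of k."""
    widths = []
    for size in shape:
        target = math.ceil(size / k) * k  # smallest multiple of k that is >= size
        total = target - size
        before = total // 2  # assumed symmetric split; any odd remainder goes after
        widths.append((before, total - before))
    return widths


def padded_shape(shape: Sequence[int], k: int = 32) -> Tuple[int, ...]:
    """Shape after applying the padding computed by divisible_pad_widths."""
    widths = divisible_pad_widths(shape, k)
    return tuple(size + before + after for size, (before, after) in zip(shape, widths))


# A made-up (z, y, x) volume size, used only for illustration.
print(divisible_pad_widths((100, 160, 96)))
print(padded_shape((100, 160, 96)))
```

Sizes that are multiples of 32 can be halved five times and still give integer dimensions, which is presumably why the loader pads before the volumes reach the U-Net.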
52 changes: 52 additions & 0 deletions research/lamp-automated-model-parallelism/test_unet_pipe.py
@@ -0,0 +1,52 @@
# Copyright 2020 MONAI Consortium
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import unittest

import torch
from parameterized import parameterized

from unet_pipe import UNetPipe

TEST_CASES = [
    [  # 1-channel 3D, batch 12
        {"spatial_dims": 3, "out_channels": 2, "in_channels": 1, "depth": 3, "n_feat": 8},
        torch.randn(12, 1, 32, 64, 48),
        (12, 2, 32, 64, 48),
    ],
    [  # 1-channel 3D, batch 16
        {"spatial_dims": 3, "out_channels": 2, "in_channels": 1, "depth": 3},
        torch.randn(16, 1, 32, 64, 48),
        (16, 2, 32, 64, 48),
    ],
    [  # 2-channel 3D, batch 16
        {"spatial_dims": 3, "out_channels": 3, "in_channels": 2},
        torch.randn(16, 2, 64, 64, 64),
        (16, 3, 64, 64, 64),
    ],
]


class TestUNETPipe(unittest.TestCase):
    @parameterized.expand(TEST_CASES)
    def test_shape(self, input_param, input_data, expected_shape):
        net = UNetPipe(**input_param)
        if torch.cuda.is_available():
            net = net.to(torch.device("cuda"))
            input_data = input_data.to(torch.device("cuda"))
        net.eval()
        with torch.no_grad():
            result = net.forward(input_data.float())
            self.assertEqual(result.shape, expected_shape)


if __name__ == "__main__":
    unittest.main()