U-Net model for lung lesion segmentation model does not run using colab #67

Closed
aficionadoai opened this issue Nov 10, 2020 · 8 comments
Labels
question Further information is requested

Comments

@aficionadoai

aficionadoai commented Nov 10, 2020

I tried running the following command per the instructions: python run_net.py train --data_folder "COVID-19-20_v2/Train" --model_folder "runs". I get the output shown in example A. The model does not seem to be training. I also tried running the inference command python run_net.py infer --data_folder "COVID-19-20_v2/Validation" --model_folder "runs", and I get the error shown in example B.

When I check the runs folder, I see no indication that the model ran or that any checkpoints were saved.

I am using Google Colab to train the model.

example A

MONAI version: 0.3.0+57.g70650b8
Python version: 3.6.9 (default, Oct  8 2020, 12:12:24)  [GCC 8.4.0]
OS version: Linux (4.19.112+)
Numpy version: 1.18.5
Pytorch version: 1.7.0+cu101
MONAI flags: HAS_EXT = False, USE_COMPILED = False

Optional dependencies:
Pytorch Ignite version: 0.4.2
Nibabel version: 3.0.2
scikit-image version: 0.16.2
Pillow version: 7.0.0
Tensorboard version: 2.3.0
gdown version: 3.6.4
TorchVision version: 0.8.1+cu101
ITK version: NOT INSTALLED or UNKNOWN VERSION.
tqdm version: 4.51.0

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies

INFO:root:training: image/label (199) folder: COVID-19-20_v2/Train
INFO:root:training: train 160 val 39, folder: COVID-19-20_v2/Train
INFO:root:batch size 2
Load and cache transformed data: 100% 160/160 [05:18<00:00,  1.99s/it]
Load and cache transformed data: 100% 39/39 [01:21<00:00,  2.10s/it]
BasicUNet features: (32, 32, 64, 128, 256, 32).
^C

example B

MONAI version: 0.3.0+57.g70650b8
Python version: 3.6.9 (default, Oct  8 2020, 12:12:24)  [GCC 8.4.0]
OS version: Linux (4.19.112+)
Numpy version: 1.18.5
Pytorch version: 1.7.0+cu101
MONAI flags: HAS_EXT = False, USE_COMPILED = False

Optional dependencies:
Pytorch Ignite version: 0.4.2
Nibabel version: 3.0.2
scikit-image version: 0.16.2
Pillow version: 7.0.0
Tensorboard version: 2.3.0
gdown version: 3.6.4
TorchVision version: 0.8.1+cu101
ITK version: NOT INSTALLED or UNKNOWN VERSION.
tqdm version: 4.51.0

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies

Traceback (most recent call last):
  File "run_net.py", line 264, in <module>
    infer(data_folder=data_folder, model_folder=args.model_folder)
  File "run_net.py", line 179, in infer
    ckpt = ckpts[-1]
IndexError: list index out of range
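The IndexError means the list of checkpoints found in the model folder was empty. This is consistent with training never reaching the point where a checkpoint is saved (example A stops right after the network is built). Below is a minimal sketch of a more defensive lookup that fails with a readable message. The helper name latest_checkpoint and the "*.pt" filename pattern are assumptions for illustration; they are not the actual code in run_net.py.

```python
import glob
import os


def latest_checkpoint(model_folder: str, pattern: str = "*.pt") -> str:
    """Return the last checkpoint (by sorted filename) in model_folder.

    Hypothetical helper: the "*.pt" pattern is an assumption, not taken from run_net.py.
    """
    ckpts = sorted(glob.glob(os.path.join(model_folder, pattern)))
    if not ckpts:
        # Fail with an actionable message instead of an IndexError on ckpts[-1]
        raise FileNotFoundError(
            f"no checkpoints matching {pattern!r} in {model_folder!r}; "
            "did training run long enough to save one?"
        )
    return ckpts[-1]
```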
@wyli
Contributor

wyli commented Nov 10, 2020

For your reference, the expected training output would look something like this:

Python version: 3.6.10 |Anaconda, Inc.| (default, May  8 2020, 02:54:21)  [GCC 7.3.0]
OS version: Linux (4.15.0-50-generic)
Numpy version: 1.19.1
Pytorch version: 1.7.0a0+8deb4fe
MONAI flags: HAS_EXT = False, USE_COMPILED = False

Optional dependencies:
Pytorch Ignite version: 0.4.2
Nibabel version: 3.2.0
scikit-image version: 0.15.0
Pillow version: 8.0.1
Tensorboard version: 1.15.0+nv
gdown version: 3.12.2
TorchVision version: 0.8.0a0
ITK version: 5.1.1
tqdm version: 4.51.0

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies

INFO:root:training: image/label (199) folder: COVID-19-20_v2/Train
INFO:root:training: train 160 val 39, folder: COVID-19-20_v2/Train
INFO:root:batch size 2
BasicUNet features: (32, 32, 64, 128, 256, 32).
INFO:root:epochs 500, lr 0.0001, momentum 0.95
INFO:ignite.engine.engine.SupervisedTrainer:Engine run resuming from iteration 0, epoch 0 until 500 epochs
INFO:ignite.engine.engine.SupervisedTrainer:Epoch: 1/500, Iter: 1/80 -- train_loss: 1.5370 
INFO:ignite.engine.engine.SupervisedTrainer:Epoch: 1/500, Iter: 2/80 -- train_loss: 1.5101 
INFO:ignite.engine.engine.SupervisedTrainer:Epoch: 1/500, Iter: 3/80 -- train_loss: 1.4932 
...

@aficionadoai
Author

@wyli

Thank you! Right now I'm trying to figure out why the run ends with ^C right after the line BasicUNet features: (32, 32, 64, 128, 256, 32).

@wyli
Contributor

wyli commented Nov 10, 2020

I haven't tried it with a Colab instance, but perhaps you need to put

!pip install "git+https://github.com/Project-MONAI/MONAI#egg=monai[nibabel,ignite,tqdm]"

as the very first cell.
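After the install cell has run, a quick standard-library check can confirm that MONAI and the optional packages named in the extras can be found by the notebook's Python. This is a sketch. Note that the import names are assumed to match the extras here (pytorch-ignite installs the ignite module).

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to True if it is importable from sys.path."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # MONAI plus the nibabel/ignite/tqdm extras from the pip command above
    for name, ok in check_modules(["monai", "nibabel", "ignite", "tqdm"]).items():
        print(f"{name:8s} {'OK' if ok else 'MISSING'}")
```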

@aficionadoai
Author

@wyli

I'm testing it out now to see if that works.

@aficionadoai
Author

Putting it in the first cell in Colab doesn't work.

@aficionadoai
Author

@wyli

Tentatively, I think it may be because I'm running out of memory in Colab.

Google Colab setting a '^C' in the process

Colab finishes with a ^C
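One way to test the out-of-memory hypothesis is to launch the script from Python and inspect its exit status. On POSIX systems, subprocess reports a negative return code -N when the child was terminated by signal N. The Linux OOM killer uses SIGKILL (so -9), whereas an uncaught Python exception prints a traceback and exits with code 1. This is a diagnostic sketch. The command list simply repeats the training command from the issue, and describe_exit is a hypothetical helper.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a human-readable reason (POSIX semantics)."""
    if returncode < 0:
        # Negative values mean the child was terminated by signal -returncode
        sig = signal.Signals(-returncode)
        hint = " (often the OOM killer)" if sig == signal.SIGKILL else ""
        return f"killed by {sig.name}{hint}"
    return f"exited with code {returncode}"


if __name__ == "__main__":
    # Same training command as in the issue; adjust paths to your Colab layout
    cmd = [sys.executable, "run_net.py", "train",
           "--data_folder", "COVID-19-20_v2/Train", "--model_folder", "runs"]
    result = subprocess.run(cmd)
    print(describe_exit(result.returncode))
```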

@wyli
Contributor

wyli commented Nov 10, 2020

OK, if that's the case, perhaps you could reduce the number of features for the network, which is currently features=(32, 32, 64, 128, 256, 32).
BTW, the full training will take more than 24 hours. I'm not sure whether a Colab instance supports a session that long.
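To see why shrinking the feature widths helps, note that a 3D convolution with kernel size k has k^3 · c_in · c_out weights. Halving every channel count therefore cuts the convolution weights by roughly 4x, while activation memory (which scales with the channel count at a fixed volume size) roughly halves. The sketch below is a back-of-the-envelope estimate under stated assumptions. It models a simplified encoder with two 3×3×3 convolutions per level and no biases. It is not BasicUNet's exact parameter count, and the halved tuple is only an example.

```python
def encoder_conv_weights(features, in_channels=1, kernel=3):
    """Rough weight count for a simplified 3D encoder.

    Assumption: each level applies two kernel^3 convolutions (c_in -> c, then c -> c)
    with no bias terms. This is an illustration, not MONAI's BasicUNet layout.
    """
    total = 0
    c_in = in_channels
    for c in features:
        total += kernel ** 3 * (c_in * c + c * c)
        c_in = c
    return total


default = encoder_conv_weights((32, 32, 64, 128, 256))  # encoder part of the default features
halved = encoder_conv_weights((16, 16, 32, 64, 128))    # example: every width halved
print(default, halved, round(default / halved, 2))      # 3567456 892080 4.0
```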

@Nic-Ma Nic-Ma added the question Further information is requested label Nov 11, 2020
@wyli wyli changed the title U-Net model for lung lesion segmentation model does not run U-Net model for lung lesion segmentation model does not run using colab Nov 11, 2020
@wyli wyli closed this as completed Nov 11, 2020
@aficionadoai
Author

@wyli I was able to get it working in Colab. You have to increase the RAM.
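Since the fix turned out to be more RAM, it can help to check how much memory the runtime offers before the "Load and cache transformed data" step starts. Below is a minimal sketch for Linux (the OS line in the config output shows Colab runs Linux) that reads MemTotal and MemAvailable from /proc/meminfo. It assumes the standard kernel format, in which values are reported in kB (KiB). The helper names are hypothetical.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; sizes are in KiB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def ram_gib(path="/proc/meminfo"):
    """Return (total, available) RAM in GiB, or None if path is missing (non-Linux)."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info["MemTotal"] / 2**20, info["MemAvailable"] / 2**20


if __name__ == "__main__":
    print(ram_gib())
```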
