#3 Segmentation Algorithm #147
As requested in the ticket, I implemented a very simple segmentation algorithm that can be trained using Keras; its model is saved in the
I created training data pairs based on the LIDC dataset using pylidc. The label image is a binary mask whose pixels are 1 if there is an annotation in the LIDC dataset whose malignancy is greater than or equal to 3 (out of 5). The label images are saved in the
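As a rough sketch of how such a label mask could be assembled (a hypothetical helper for illustration; the PR's actual code and the pylidc calls that produce per-annotation masks, e.g. `Annotation.boolean_mask()`, may differ), combining per-annotation masks under a malignancy threshold might look like:

```python
import numpy as np

def build_label_mask(masks, malignancies, threshold=3):
    """Combine per-annotation boolean masks into one binary label mask.

    A pixel is 1 wherever at least one annotation with
    malignancy >= threshold marks it. Hypothetical helper, not the
    PR's actual implementation.
    """
    label = np.zeros(masks[0].shape, dtype=np.uint8)
    for mask, malignancy in zip(masks, malignancies):
        if malignancy >= threshold:
            # OR this annotation's mask into the running label image
            label |= mask.astype(np.uint8)
    return label
```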
Reference to official issue
This addresses #3
Motivation and Context
We want to suggest to the radiologist nodules that were automatically found by one of our algorithms, so that they can have a closer look at them.
How Has This Been Tested?
I wrote a test that checks whether the response of the segmentation endpoint has the correct format.
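For illustration, a format check on the endpoint's JSON response could look like the sketch below. The key names (`binary_mask_path`, `volumes`) are assumptions made up for this example, not the PR's actual schema; the real check lives in the PR's test suite.

```python
def check_segment_response(payload):
    """Return True if a (hypothetical) segmentation response has the
    expected shape: a dict with a mask path and a list of volumes."""
    if not isinstance(payload, dict):
        return False
    if 'binary_mask_path' not in payload or 'volumes' not in payload:
        return False
    return isinstance(payload['volumes'], list)
```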
I'd highly appreciate any feedback!
This looks great.
I think a 3D U-Net (sometimes called V-Net?) would probably take the prize for segmentation, but as you say, memory is a concern with 1 mm voxel sizes. I don't think 32 GB of memory use for evaluation can be right, though - I'm currently training a V-Net on 2 GPUs for an unrelated project, and with a batch size of 4 I'm still not out of memory. For inference, the memory use for a 144x144x144 volume should be less than 1 GB.
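Some back-of-the-envelope arithmetic supports the sub-gigabyte inference estimate (a sketch only; actual memory use depends on the architecture, channel counts, and framework overhead):

```python
# Rough activation-memory arithmetic for a 144x144x144 float32 volume.
voxels = 144 ** 3                     # 2,985,984 voxels
input_mib = voxels * 4 / 2 ** 20      # float32 = 4 bytes -> ~11.4 MiB raw volume
layer_mib = input_mib * 32            # e.g. one 32-channel feature map at full resolution
# Even a handful of full-resolution feature maps stays well under 1 GB,
# and deeper layers are spatially downsampled, so they cost less.
print(input_mib, layer_mib)
```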
I am wondering how we benchmark these approaches fairly. We can't really evaluate on LIDC, as that's what everyone will use to train the models. Do we have a good validation dataset? I only have access to radiotherapy ones, and that's not quite the same thing.
Thanks @dchansen !
I'll check my exact configuration and memory use as soon as I can, though it might not be before Wednesday. Sorry. I think it was 128x128x128, but I'll check. Total memory was 2x12 GB.
I believe part of the efficiency of V-Net comes from the use of strided convolutions rather than pooling layers. I do know that strided convolutions can lead to checkerboard artefacts though, so it might not give as nice results. The original V-Net paper used a size of 128x128x64 and was able to fit 2 batches in 8 GB of memory for training. That should be enough to fit most (all?) nodules.
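To illustrate the downsampling that strided convolutions provide (in Keras this would be e.g. `Conv3D(filters, 2, strides=2)` standing in for a pooling layer), the output-size arithmetic can be sketched as:

```python
import math

def strided_conv_output(size, stride=2, kernel=2, padding='same'):
    """Spatial size after one strided convolution, the downsampling
    step V-Net uses in place of pooling.

    'same' padding: ceil(size / stride); 'valid': floor((size - kernel) / stride) + 1.
    """
    if padding == 'same':
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1

# A 128-voxel axis halves at each downsampling stage: 128 -> 64 -> 32 -> 16 -> 8
sizes = [128, 64, 32, 16, 8]
assert [strided_conv_output(s) for s in sizes[:-1]] == sizes[1:]
```

The same halving is what a 2x2x2 max-pooling layer would give; the strided convolution just folds the downsampling into a learned layer.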
My concern with LIDC is that it might encourage overfitting to that dataset. Doing something like 5-fold cross validation would be quite difficult, as some of these models literally take weeks to train on a single Titan X.
`if CROP_SIZE != CUBE_SIZE: cube_img = helpers.rescale_patient_images2`
Does this stuff need comments? If we're honest, this shouldn't even have made it into the current code base - I introduced it in #118 but only noticed now that it currently isn't executed and was probably used by Julian de Wit to play around with the
Hi @rracinskij ,
@rracinskij We use the scans of the LIDC dataset. You can already find 3 patient scans in tests/assets/test_image_data/full. The neat thing is that there is a library (pylidc) that makes it really easy to query scan and annotation information. After downloading the dataset, set up pylidc by creating a
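For reference, pylidc reads the dataset location from a small configuration file (per the pylidc documentation, `~/.pylidcrc` on Linux/Mac; the path below is a placeholder for wherever you downloaded LIDC-IDRI):

```ini
; ~/.pylidcrc - tells pylidc where the downloaded LIDC-IDRI DICOMs live
[dicom]
path = /path/to/LIDC-IDRI
warn = True
```

After that, scans and their annotations can be queried with `pl.query(pl.Scan)`.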
referenced this pull request on Oct 18, 2017
It's looking good! With a few tweaks I got everything to run and was able to check the output of the routes via
Even though #142 won't be merged, I did like the simplicity of it since it doesn't seem to require training a model. I also like that it includes a
So my question to both of you is this: Is there value in porting the
Also, one thing I wasn't able to resolve was the inability to directly compare both algorithms. This one errors when passed
prediction/src/algorithms/identify/prediction.py  | 3 ++-
prediction/src/algorithms/segment/src/training.py | 8 ++++++--
prediction/src/preprocess/lung_segmentation.py    | 2 --
diff --git a/prediction/src/algorithms/identify/prediction.py b/prediction/src/algorithms/identify/prediction.py
--- a/prediction/src/algorithms/identify/prediction.py
+++ b/prediction/src/algorithms/identify/prediction.py
 from keras.layers import Input, Convolution3D, MaxPooling3D, Flatten, AveragePoo
 from keras.metrics import binary_accuracy, binary_crossentropy, mean_absolute_error
 from keras.models import Model
 from keras.optimizers import SGD
-from src.preprocess.lung_segmentation import rescale_patient_images
+
+from ...preprocess.lung_segmentation import rescale_patient_images

 CUBE_SIZE = 32
 MEAN_PIXEL_VALUE = 41
diff --git a/prediction/src/algorithms/segment/src/training.py b/prediction/src/algorithms/segment/src/training.py
--- a/prediction/src/algorithms/segment/src/training.py
+++ b/prediction/src/algorithms/segment/src/training.py
 import os

 import numpy as np
 import pylidc as pl
-from config import Config
 from keras.callbacks import ModelCheckpoint
 from keras.models import load_model

-from src.preprocess.lung_segmentation import save_lung_segments
 from .model import simple_model_3d
+from ....preprocess.lung_segmentation import save_lung_segments
+
+try:
+    from .....config import Config
+except ValueError:
+    from config import Config


 def get_data_shape():
diff --git a/prediction/src/preprocess/lung_segmentation.py b/prediction/src/preprocess/lung_segmentation.py
--- a/prediction/src/preprocess/lung_segmentation.py
+++ b/prediction/src/preprocess/lung_segmentation.py
 from skimage.measure import label, regionprops
 from skimage.morphology import disk, binary_erosion, binary_closing
 from skimage.segmentation import clear_border
-from ..algorithms.identify.helpers import rescale_patient_images