Continuous improvement of nodule segmentation and volume estimates #283

Open
caseyfitz opened this Issue Jan 10, 2018 · 4 comments

caseyfitz commented Jan 10, 2018

After exploring the segmentation code under prediction/src/algorithms/segment/, we have identified a few outstanding issues with the segmentation functionality and volume calculations. The issues are interrelated, but we've tried to divide them into two general categories (whose code paths start in segment/trained_model.py):

  1. Model architecture / complexity (trained_model.predict)

    • Model output shape (512, 512, 1024) - the .npy mask saved to segment_path - should not have 1024 slices. Most slices after 200 are uniform; in LIDC-IDRI-0003, for example, they hold the constant value 0.45197698, while the overall range is around -0.35 to 0.8.
    • The simple_3d_model.py and unet_3d_model.py each use the same best_model_Simple3DModel and make identical predictions. However, the full unet will only process some full size test images without throwing a MemoryError.
    • It may be too much to try to retrain a new model this late, but it is desirable to have at least one model that accepts any appropriately sized input and outputs the correct shape.
  2. Nodule volume calculation (trained_model.calculate_volume)

    • The naive approach uses numpy.bincount to sum the non-zero values in the binary mask saved as lung-mask.npy. It does not use centroid information, so it yields a (poor) total volume rather than a distinct volume for each centroid in centroids. One consequence is that for n centroids, the predicted volume is just this total volume, repeated n times.
    • More advanced brute-force approaches using a convex hull (scipy.spatial.ConvexHull, skimage.morphology.convex_hull_image) are either too memory intensive or only work with 2d arrays. Plus, it's not clear that a standard convex hull approach would be best anyway, since we're interested in subsets of the lungs rather than the entire lungs (perhaps something like skimage.morphology.convex_hull_object would fit, but it only works on 2d arrays).
    • The ideal function (as specified in the doc string for trained_model.calculate_volume) takes a list of centroids as inputs and calculates e.g. 3d connected components given those centroids.
  • Note that in the current Simple3DModel, masking of nodules does not perform well and it's possible that there is essentially one large connected component spanning ~200 slices.
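
As a rough illustration of the per-centroid idea in item 2, here is a minimal sketch that labels 3d connected components once and then reads off the voxel count of the component under each centroid. It is not the project's calculate_volume; the function name volumes_per_centroid, the voxel_volume parameter and the toy mask are assumptions made for the example. The centroid dicts use the same 'x', 'y', 'z' keys that appear elsewhere in this thread.

```python
import numpy as np
from scipy import ndimage


def volumes_per_centroid(mask, centroids, voxel_volume=1.0):
    """Return one volume per centroid, measured on its own connected component.

    Hypothetical helper for illustration only.
    """
    # Label the 3D connected components of the binary mask (0 is background).
    labels, n_components = ndimage.label(mask > 0)
    # Voxel count for every label; index 0 holds the background count.
    counts = np.bincount(labels.ravel(), minlength=n_components + 1)
    volumes = []
    for c in centroids:
        label = labels[c['x'], c['y'], c['z']]
        # A centroid that lands on background has no component: volume 0.
        volumes.append(float(counts[label]) * voxel_volume if label else 0.0)
    return volumes


# Toy mask with two separate blobs of 8 and 27 voxels.
mask = np.zeros((10, 10, 10), dtype=np.uint8)
mask[1:3, 1:3, 1:3] = 1
mask[5:8, 5:8, 5:8] = 1
centroids = [
    {'x': 1, 'y': 1, 'z': 1},
    {'x': 6, 'y': 6, 'z': 6},
    {'x': 0, 'y': 9, 'z': 0},  # background voxel
]
print(volumes_per_centroid(mask, centroids))  # [8.0, 27.0, 0.0]
```

If the mask really does collapse into one large component, every centroid would report the same volume here too, so this only helps once the segmentation separates nodules.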

The approach to exploring these issues has been to use an interactive jupyter notebook, rooted in the prediction directory of the application. From there, one can use from src.algorithms.segment.trained_model import predict to start playing with the outputs directly and testing changes on the fly. (Pro tip: use the magics %load_ext autoreload and %autoreload 2 to reload the functions with your changes every time you call them.)
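
For the output-shape issue in item 1, one thing that can be checked from such a notebook is which slices of the saved prediction are constant. The sketch below is only an assumption about how one might do that: uniform_slices is a hypothetical helper, and the synthetic pred array stands in for the real .npy file (which could be loaded with np.load from segment_path).

```python
import numpy as np


def uniform_slices(volume, axis=-1, tol=0.0):
    """Flag slices along `axis` whose values are all equal (within `tol`).

    Hypothetical helper for illustration only.
    """
    # Move the slice axis to the front and flatten each slice to one row.
    flat = np.moveaxis(volume, axis, 0).reshape(volume.shape[axis], -1)
    # A slice is uniform when its max and min differ by at most `tol`.
    return (flat.max(axis=1) - flat.min(axis=1)) <= tol


# Synthetic stand-in for the saved prediction: 6 slices, the last 4 constant.
rng = np.random.default_rng(0)
pred = np.full((4, 4, 6), 0.45197698, dtype=np.float32)
pred[..., :2] = rng.uniform(-0.35, 0.8, size=(4, 4, 2))

flags = uniform_slices(pred)
first_uniform = int(np.argmax(flags))  # index of the first uniform slice
print(flags.tolist(), first_uniform)
```

On the real mask, the index of the first uniform slice would show where meaningful output ends and could suggest where to crop.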

And as always, please update documentation too with any new changes for easy points! (The segment predict docs are pretty weak right now.)

Contributor

vessemer commented Jan 10, 2018

@caseyfitz did you carefully read the code of trained_model.calculate_volume? This code treats centroids as connected components.

caseyfitz commented Jan 10, 2018

Ah, thanks @vessemer! I thought the functionality was clear to me, but I must have been confused because labels = [mask[centroid['x'], centroid['y'], centroid['z']] for centroid in centroids] was returning [1 1 1 1 1 1] on the six centroids I was passing it (for LIDC-0003). I didn't realize that scipy.ndimage.label has a default structure parameter with squared connectivity equal to one, which should be sufficient for this stage of the project.

The problem, then, seems to be that the image has only one connected component, yes? If so, then 2 in the issue statement above should be good to go for now (in which case I'll edit the issue) and the immediate problems are just those in 1.
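
To make the connectivity point concrete, here is a small sketch on a toy array (not the project's data) comparing the default structure of scipy.ndimage.label with a fully connected one from scipy.ndimage.generate_binary_structure. It also shows how a per-centroid label lookup returns the same label for every centroid once they all sit in a single component.

```python
import numpy as np
from scipy import ndimage

# Two voxels that touch only at a corner (diagonal neighbours).
mask = np.zeros((3, 3, 3), dtype=np.uint8)
mask[0, 0, 0] = 1
mask[1, 1, 1] = 1

# Default structure: only face neighbours connect (squared connectivity 1).
_, n_default = ndimage.label(mask)

# Full 3x3x3 structure: faces, edges and corners all connect.
full = ndimage.generate_binary_structure(3, 3)
labels_full, n_full = ndimage.label(mask, structure=full)
print(n_default, n_full)  # 2 1

# With a single component, every centroid lookup yields the same label.
centroids = [{'x': 0, 'y': 0, 'z': 0}, {'x': 1, 'y': 1, 'z': 1}]
labels = [int(labels_full[c['x'], c['y'], c['z']]) for c in centroids]
print(labels)  # [1, 1]
```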

Make sense?

Contributor

vessemer commented Jan 10, 2018

Yes, sure. I'll add some comments in trained_model.calculate_volume with my next commit, since there is some obscurity :)

Contributor

WGierke commented Jan 24, 2018

@caseyfitz Are you planning to merge the changes you made in your branch into master at some point? And by the way: nice notebook! :)
