New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented simple nodule segmentation algorithm #142

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
5 participants
@dchansen
Copy link
Contributor

dchansen commented Sep 29, 2017

This pull request implements a simple level-set based segmentation algorithm for nodule segmentation.

Description

This implements a simple segmentation algorithm based on SITK, and returns binary mask, diameter and volume.

Reference to official issue

#3

Motivation and Context

Part of minimal viable product.

How Has This Been Tested?

The segmentation has been tested manually on a few LIDC images. A better approach would be a test based on pylidc and report dice values.

Screenshots (if appropriate):

Nodule Segmentation

CLA

  • I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well

init_ls = sitk.SignedMaurerDistanceMap(seg, insideIsPositive=True, useImageSpacing=True)
lsFilter = sitk.ThresholdSegmentationLevelSetImageFilter()
lsFilter.SetLowerThreshold(-600)

This comment has been minimized.

@WGierke

WGierke Sep 29, 2017

Contributor

Could you explain these magic numbers in meaningful constant variables, please?

This comment has been minimized.

@lamby

lamby Oct 1, 2017

Contributor

Mmm, please do so :)

@WGierke

This comment has been minimized.

Copy link
Contributor

WGierke commented Sep 29, 2017

Nice stuff! Do you have some measurement for the accuracy of your segmentation algorithm on the LIDC dataset? That would make it easier to compare it against further approaches :)

@dchansen

This comment has been minimized.

Copy link
Contributor

dchansen commented Sep 29, 2017

@WGierke This was very much a minimal implementation, so I haven't tested it that thoroughly.
Maybe we could open a seperate issue to create evaluation scripts for this feature?
I'll add some documentation for the variables Monday.

@vessemer

This comment has been minimized.

Copy link
Contributor

vessemer commented Sep 29, 2017

@WGierke, I guess, the Hausdorff distance or the simple Dice coefficient will suit it well. If you are mind to open an issue aimed at segmentation evaluation, I'll be glad to implement this feature.

@WGierke WGierke referenced this pull request Sep 30, 2017

Closed

Segmentation Evaluation #143

0 of 3 tasks complete

init_ls = sitk.SignedMaurerDistanceMap(seg, insideIsPositive=True, useImageSpacing=True)
lsFilter = sitk.ThresholdSegmentationLevelSetImageFilter()
lsFilter.SetLowerThreshold(-600)

This comment has been minimized.

@lamby

lamby Oct 1, 2017

Contributor

Mmm, please do so :)

@dchansen dchansen force-pushed the dchansen:nodule_segmentation branch from 2ce4e30 to 04a289b Oct 2, 2017

:param nodule: dict containing the nodule coordinates
:return:
"""

This comment has been minimized.

@reubano

reubano Oct 2, 2017

Contributor

Would you mind using Google Style Docstrings?

This comment has been minimized.

@dchansen

dchansen Oct 3, 2017

Contributor

Sorry. Oversight on my part. Fixed.

@dchansen dchansen force-pushed the dchansen:nodule_segmentation branch from 04a289b to fa06ffc Oct 3, 2017

lsFilter.SetUpperThreshold(upper_threshold)
# Sets the error tolerance and number of iterations for the levelset solver
lsFilter.SetMaximumRMSError(0.02)
lsFilter.SetNumberOfIterations(1000)

This comment has been minimized.

@lamby

lamby Oct 3, 2017

Contributor

I think someone previously mentioned here in review about explaining or making these constants/or at least documenting the magic numbers a bit? See also BinaryDilat(seg, 3). Three!

This comment has been minimized.

@dchansen

dchansen Oct 3, 2017

Contributor

@lamby, I am a little unsure what you are asking.
The algorithm is iterative, and RMSError and number of iterations are the stopping criteria. None of the parameters are "special", and it seems a bit weird to have a line like

maximumRMSError = 0.02
lsFilter.SetMaximumRMSError(maximumRMSError)

Most of the parameters were taken from the SITK tutorial on segmentation, but some (curvature and thresholds) were changed to work better for lung segmentation.
The parameters are either very easy to understand (the thresholds), or quite mathematically involved (curvature). I've tried to add a few more comments, but as I said, I'm not entirely sure what you're looking for.

This comment has been minimized.

@reubano

reubano Oct 3, 2017

Contributor

In this case it would be upper case for constants.... so MAX_RMS_ERROR perhaps. It may look weird at first, but it's usually best to declare constants at the top of the function or file.

This comment has been minimized.

@lamby

lamby Oct 4, 2017

Contributor

👍

This comment has been minimized.

@dchansen

dchansen Oct 4, 2017

Contributor

Hmm... not quite sure it increases readability in this case. After more consideration, I've moved the parameters to the args of the function, with reasonable defaults.

volumes.append(volume)
diameters.append(diameter)
binary_mask |= sitk.GetArrayFromImage(mask).astype(bool)

segment_path = os.path.join(os.path.dirname(__file__),

This comment has been minimized.

@lamby

lamby Oct 3, 2017

Contributor

Out of interest, don't we have some kind of settings.TEST_DATA directory setting? :)

This comment has been minimized.

@dchansen

dchansen Oct 3, 2017

Contributor

Not to my knowledge? But this is not test data at all - this is the binary mask produced by the algorithm. Ideally, we would use a temporary file, but someone has to clean it up after use, so I thought it would be better to just reuse the same filename.

This comment has been minimized.

@reubano

reubano Oct 3, 2017

Contributor

What's wrong with a tempfile? Specifically, tempfile.NamedTemporaryFile

This function operates exactly as TemporaryFile() does, except that the file is guaranteed to have a visible name in the file system (on Unix, the directory entry is not unlinked). That name can be retrieved from the name attribute of the returned file-like object. Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later). If delete is true (the default), the file is deleted as soon as it is closed.

This comment has been minimized.

@dchansen

dchansen Oct 4, 2017

Contributor

The only problem with tempfile is who has the responsibility of removing it? The client would then have to do the cleanup, wouldn't it?
I can certainly implement this, but I just didn't want us ending up in a situation where we have a harddrive filled with random binary masks.

This comment has been minimized.

@reubano

reubano Oct 4, 2017

Contributor

Edited the above comment to highlight this part

If delete is true (the default), the file is deleted as soon as it is closed.

Is the problem that the file object would never be closed?

This comment has been minimized.

@dchansen

dchansen Oct 4, 2017

Contributor

Well, more likely we would close the file after writing, which would delete it before the client could load it.

This comment has been minimized.

@lamby

lamby Oct 5, 2017

Contributor

I've implemented something similar here:

https://anonscm.debian.org/git/reproducible/diffoscope.git/tree/diffoscope/tempfiles.py#n29

.. notice how it adds files to a global which is called when we exit.

This comment has been minimized.

@dchansen

dchansen Oct 5, 2017

Contributor

Ok, I've done something similar, and added cleanup via atexit. Still means we will have to restart the service occasionally, but I guess that shouldn't be a problem.

@dchansen dchansen force-pushed the dchansen:nodule_segmentation branch from 17bd8bb to c63822a Oct 4, 2017

@drivendataorg drivendataorg deleted a comment from dchansen Oct 5, 2017

@dchansen dchansen force-pushed the dchansen:nodule_segmentation branch 2 times, most recently from 7b20edb to cf295ce Oct 5, 2017

binary_mask |= sitk.GetArrayFromImage(mask).astype(bool)

segment_path = tempfile.NamedTemporaryFile(suffix='.npy')
_FILES.append(segment_path)

This comment has been minimized.

@lamby

lamby Oct 6, 2017

Contributor

Sorry to mess you around here, but once I saw this implementation I just can't quite help feeling it doesn't feel right. :(

I'm not really sure about the best solution. Can't we just pass around the TemporaryFile object? Or change the API such that people are passing it into this method? At the very very least we should factor our the _FILES mangling into a global utility (eg. get_named_temporary_file) that isn't local to this module.

Other suggestions highly highly welcome!

This comment has been minimized.

@dchansen

dchansen Oct 6, 2017

Contributor

The problem is that our output is just the path to the mask - we have no way of knowing when the client is done with the file.
I think it makes sense to move this into a global utility instead.

This comment has been minimized.

@reubano

reubano Oct 6, 2017

Contributor

Just to throw out another idea, how about having a segment_path kwarg? something like:

def predict(*args, segment_path='default_path.npy'):
    pass

Plus some additional boilerplate to ensure it gets placed into the correct directory.

This comment has been minimized.

@dchansen

dchansen Oct 6, 2017

Contributor

@lamby Where would we want such a util to live? prediction/src/utils.py?

@reubano I think we still have the fundamental problem that either the client needs to clean up, or the prediction service does?

This comment has been minimized.

@lamby

lamby Oct 7, 2017

Contributor

@dchansen Sounds reasonable, does that "feel" good when you look at the code? Could probably live one level up if we have shared utils...?

This comment has been minimized.

@dchansen

dchansen Oct 10, 2017

Contributor

Ok. Moved the functionality into utils. I think this is a reasonable solution?

This comment has been minimized.

@reubano

reubano Oct 11, 2017

Contributor

Looks good... but you still need the exit handler right? FYI, it also works as a decorator. From the docs:

import atexit

@atexit.register
def goodbye():
    print("You are now leaving the Python sector.")

This comment has been minimized.

@dchansen

dchansen Oct 12, 2017

Contributor

No, that was a mistake on my part. All the files are closed (and thus deleted) when the program exits, so this should be ok.

This comment has been minimized.

@lamby

lamby Oct 12, 2017

Contributor

No, that was a mistake on my part. All the files are closed (and thus deleted) when the program exits, so this should be ok.

Can you add that to the docstring to the util you created?

Also, what's the point of _FILES if we aren't cleaning up?

@dchansen dchansen force-pushed the dchansen:nodule_segmentation branch from cf295ce to 29ffd7d Oct 10, 2017

@lamby

This comment has been minimized.

Copy link
Contributor

lamby commented Oct 12, 2017

No, that was a mistake on my part. All the files are closed (and thus deleted) when the program exits, so this should be ok.

Can you add that to the docstring to the util you created? Wait, what's the point of _FILES if we aren't cleaning up then?

I'm confused :)

@lamby

This comment has been minimized.

Copy link
Contributor

lamby commented Oct 18, 2017

@dchansen Gentle ping here :)

@dchansen

This comment has been minimized.

Copy link
Contributor

dchansen commented Oct 18, 2017

I am closing the pull request.
The point of it was to implement something simple, to quickly get the prediction service fully featured - 19 days later I think we somewhat missed that.
It is also directly competing with pull request #147, but with no objective way to compare them (we can't use the LIDC data).
Either way, whichever pull request does not get selected will essentially be wasted work.

@dchansen dchansen closed this Oct 18, 2017

@reubano reubano referenced this pull request Oct 21, 2017

Merged

#3 Segmentation Algorithm #147

1 of 1 task complete
@reubano

This comment has been minimized.

Copy link
Contributor

reubano commented Oct 21, 2017

Hi @dchansen please see my comment here.

@reubano reubano referenced this pull request Oct 25, 2017

Closed

Segmentation doesn't work across all valid nodules #187

0 of 1 task complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment