Fixed coordinates scaling for classification prediction #272

Serhiy-Shekhovtsov · 2018-01-03T21:36:32Z

We had a problem - the preprocessing is zooming the image. In other words, the pixel matrix is resized to real proportions. But coordinates we are using for prediction are not scaled accordingly and the actual patch we are making prediction on differs from the expected patch.
For example, the zooming factor for full LIDC-IDRI-0003 image is [2.5, 0.820312, 0.820312]. The prediction result for coordinates of the real nodule {'x': 367, 'y': 350, 'z': 72} is 0.0066. As you can see, it's very low. But if you scale these coordinates by the current zooming factor ({'x': 301, 'y': 286, 'z': 180}) we will get much higher probability of concerning 0.42.
More details here.

Reference to official issue

#268
TODO: Med prediction bugfix

CLA

I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well

WGierke · 2018-01-03T21:43:56Z

prediction/src/preprocess/preprocess_ct.py

@@ -34,7 +34,7 @@ class Params:
        preprocess.preprocess_dicom.Params
    """

-    def __init__(self, clip_lower=None, clip_upper=None, spacing=None, order=0,  # noqa: C901
+    def __init__(self, clip_lower=None, clip_upper=None, spacing=False, order=0,  # noqa: C901


According to the [doc string]https://github.com/Serhiy-Shekhovtsov/concept-to-clinic/blob/4887eeb6778013fce10e7e91fd846c84cb7cb248/prediction/src/preprocess/preprocess_ct.py#L19) spacing should be a float or float sequence. Could you update that, please?

Sure. Thanks.

Nice catch!

Serhiy-Shekhovtsov · 2018-01-04T09:43:44Z

Any comments on suggested approach in general? :)

caseyfitz · 2018-01-04T22:10:06Z

Thanks for these crucial updates, @Serhiy-Shekhovtsov! We had noticed the scaling issue on our end when testing the identification service, but had not assessed how pervasive the problem was.

Have you tested your update on a nodule_list with len(nodule_list) > 1? Manually testing the gtr123_model.predict(ct_path, nodule_list) method on my end works fine when using your test: test_classify_real_nodule_full_dicom(dicom_path_003, model_path).

However, that test uses the hard-coded nodule_list == [{'x': 369, 'y': 350, 'z': 5}]. When the length of the list is extended – either with a dummy nodule, e.g. [{'x': 369, 'y': 350, 'z': 5}, {'x': 0, 'y': 0, 'z': 0}], or with actual output from the identification service – prediction fails as follows:

~/concept-to-clinic/prediction/src/algorithms/classify/src/gtr123_model.py in predict(ct_path, nodule_list, model_path)
    275     results = []
    276 
--> 277     for nodule, (cropped_image, coords) in zip(nodule_list, patches):
    278         cropped_image = Variable(torch.from_numpy(cropped_image[np.newaxis, np.newaxis]).float())
    279         cropped_image.volatile = True

ValueError: too many values to unpack (expected 2)

This happens because the contents of the list patches are inconsistent (some elements are tuples consisting of numpy.ndarray and coordinate lists, and some are just numpy.ndarray). Here is some ipynb debugging which demonstrates that the issue stems from the output of patches_from_ct (inputs are after ipydb>):

> /home/ubuntu/concept-to-clinic/prediction/src/algorithms/classify/src/gtr123_model.py(274)predict()
    272     patches = patches_from_ct(ct_array, meta, config['crop_size'], centroids,
    273                               stride=config['stride'], pad_value=config['filling_value'])
--> 274     raise Exception
    275 
    276     results = []

ipdb> # forced exception to look at patches
ipdb> len(patches)
2
ipdb> len(patches[0])
2
ipdb> len(patches[1])
96
ipdb> type(patches[1])
<class 'numpy.ndarray'>
ipdb> type(patches[0][0])
<class 'numpy.ndarray'>
ipdb> type(patches[0][1])
<class 'numpy.ndarray'>
ipdb> patches[0][0].shape
(96, 96, 96)
ipdb> patches[0][1].shape
(3, 24, 24, 24)
ipdb> patches[1].shape
(96, 96, 96)
ipdb> q

We'll need to figure this out before merging so that identification and prediction can play nicely with each other 😉 Thanks again for your awesome work!

Serhiy-Shekhovtsov · 2018-01-05T07:23:12Z

@caseyfitz, yes I saw this problem. Will try to solve it today. And also will improve the approach a bit and fix the failing tests.

lamby · 2018-01-05T08:32:26Z

Thanks @Serhiy-Shekhovtsov :)

Serhiy-Shekhovtsov · 2018-01-05T12:55:50Z

@caseyfitz turned out, one little else at this line can solve the issue with multiple centroids. I will push the fix and tests soon.

More details [here](drivendataorg#272 (comment)).

Bug fixed. When the length of the nodules list is more then 1 – prediction fails as follows: ``` ~/concept-to-clinic/prediction/src/algorithms/classify/src/gtr123_model.py in predict(ct_path, nodule_list, model_path) 275 results = [] 276 --> 277 for nodule, (cropped_image, coords) in zip(nodule_list, patches): 278 cropped_image = Variable(torch.from_numpy(cropped_image[np.newaxis, np.newaxis]).float()) 279 cropped_image.volatile = True ValueError: too many values to unpack (expected 2) ``` More details [here](drivendataorg#272 (comment)). ## CLA - [x] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well

Bug fixed. When the length of the nodules list is more then 1 – prediction fails as follows: ``` ~/concept-to-clinic/prediction/src/algorithms/classify/src/gtr123_model.py in predict(ct_path, nodule_list, model_path) 275 results = [] 276 --> 277 for nodule, (cropped_image, coords) in zip(nodule_list, patches): 278 cropped_image = Variable(torch.from_numpy(cropped_image[np.newaxis, np.newaxis]).float()) 279 cropped_image.volatile = True ValueError: too many values to unpack (expected 2) ``` More details [here](#272 (comment)). ## CLA - [x] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well

lamby · 2018-01-05T21:52:42Z

turned out, one little else at this line can solve the issue with multiple centroids. I will push the fix and tests soon.

"with this one weird trick discovered by a schoolteacher", huh? Looking forward to tests and resolving the conflicts... :)

Serhiy-Shekhovtsov · 2018-01-07T18:35:09Z

@lamby finally got them all green.

reubano · 2018-01-08T06:44:48Z

Was the issue about converting spacing from a number to a boolean resolved? I wasn't sure if the back and forth reached a conclusion.

Serhiy-Shekhovtsov · 2018-01-08T07:57:49Z

@reubano converting to boolean is a fix for an issue of lost meta.spacing. Now params.spacing is boolean and it tells if we need to scale the image. But it doesn't wipe out the meta.spacing anymore.

reubano · 2018-01-08T13:18:11Z

@Serhiy-Shekhovtsov great! And just to be clear, this fixes #268 right? Is there any other test (besides that one) I should run to make sure this PR works as expected?

Serhiy-Shekhovtsov · 2018-01-08T13:45:41Z

To be precise - the reported issue was caused by wrong coordinates. The coordinates specified in the test were good for full size DICOM but test was running on a small image. But I have found an other issue with classification. It was confirmed by @caseyfitz here. This PR includes tests for real nodules. The problem was - coordinates were not scaled together with an image. So the centroid location was misinterpreted and the cropped patch was wrong, so was the prediction.

reubano · 2018-01-08T15:29:38Z

Ran this a few times and while @caseyfitz's use cases looks solved, the test_classify_real_nodule_full_dicom is failing for me:

sudo docker-compose -f local.yml run prediction pytest -vrsk src/tests/test_classification.py

_____________________________________ test_classify_real_nodule_full_dicom _____________________________________

dicom_paths = ['/images_full/LIDC-IDRI-0002/1.3.6.1.4.1.14519.5.2.1.6279.6001.490157381160200744295382098329/1.3.6.1.4.1.14519.5.2.1...14519.5.2.1.6279.6001.298806137288633453246975630178/1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192']
model_path = '/app/src/algorithms/classify/assets/gtr123_model.ckpt'

    def test_classify_real_nodule_full_dicom(dicom_paths, model_path):
        predicted = trained_model.predict(dicom_paths[2], [{'x': 367, 'y': 349, 'z': 75}], model_path)
        assert predicted
>       assert 0.3 <= predicted[0]['p_concerning'] <= 1
E       assert 0.3 <= 0.0021293163299560547

src/tests/test_classification.py:23: AssertionError

sudo sh tests/test_docker.sh

+ docker-compose -f local.yml run prediction pytest -rsx
Starting base ... 
Starting base ... done
============================================= test session starts ==============================================
platform linux -- Python 3.6.3, pytest-3.1.3, py-1.5.2, pluggy-0.4.0
rootdir: /app, inifile:
collected 59 items 

src/tests/test_classification.py ...F.
...

Serhiy-Shekhovtsov · 2018-01-08T20:04:39Z

@reubano, coordinates for this test has been hand crafted to match the real nodule on the third image: LIDC-IDRI-0003. But the glob method returns an unsorted list. By coincidence on your machine the third item is LIDC-IDRI-0002. I will fix the test to sort the list.

now machgrid is generated relativaly to symmetrical padding

lamby · 2018-01-09T03:22:12Z

@Serhiy-Shekhovtsov Is this ready to merge from your point of view? :)

Serhiy-Shekhovtsov · 2018-01-09T06:57:16Z

@lamby yes, it is.

reubano · 2018-01-09T13:22:51Z

Yes, tests pass for me now... maybe the travis error was a fluke? I just reran the build.

Serhiy-Shekhovtsov · 2018-01-09T14:26:44Z

@reubano it looks so.

reubano · 2018-01-09T14:36:46Z

great work!!

Serhiy-Shekhovtsov · 2018-01-09T14:37:48Z

Thank you!

Bug fixed. When the length of the nodules list is more then 1 – prediction fails as follows: ``` ~/concept-to-clinic/prediction/src/algorithms/classify/src/gtr123_model.py in predict(ct_path, nodule_list, model_path) 275 results = [] 276 --> 277 for nodule, (cropped_image, coords) in zip(nodule_list, patches): 278 cropped_image = Variable(torch.from_numpy(cropped_image[np.newaxis, np.newaxis]).float()) 279 cropped_image.volatile = True ValueError: too many values to unpack (expected 2) ``` More details [here](drivendataorg#272 (comment)). ## CLA - [x] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well

…g#272) * updated gitignore to ignore jpyter notebook and temporary files * fixed coordinates scaling for classification * fixed mashgrid generation for cropped patch now machgrid is generated relativaly to symmetrical padding * added tests for real modules classification * fixed preprocessing tests

Serhiy-Shekhovtsov mentioned this pull request Jan 3, 2018

Classification throws RuntimeError for real nodule location #268

Closed

5 tasks

WGierke reviewed Jan 3, 2018

View reviewed changes

Serhiy-Shekhovtsov force-pushed the fixes/classification-prediction-fix branch 2 times, most recently from 0fbb81e to 436552a Compare January 4, 2018 20:44

Serhiy-Shekhovtsov added a commit to Serhiy-Shekhovtsov/concept-to-clinic that referenced this pull request Jan 5, 2018

fixed classification prediction for multiple nodules

debfede

More details [here](drivendataorg#272 (comment)).

Serhiy-Shekhovtsov mentioned this pull request Jan 5, 2018

fixed classification prediction for multiple nodules Serhiy-Shekhovtsov/concept-to-clinic#1

Closed

1 task

Serhiy-Shekhovtsov mentioned this pull request Jan 5, 2018

fixed classification prediction for multiple nodules #273

Merged

1 task

Serhiy-Shekhovtsov force-pushed the fixes/classification-prediction-fix branch from 436552a to 7671e9a Compare January 5, 2018 16:58

Serhiy-Shekhovtsov force-pushed the fixes/classification-prediction-fix branch 3 times, most recently from 741e59a to 7a53619 Compare January 7, 2018 17:46

Serhiy-Shekhovtsov added 4 commits January 8, 2018 22:22

updated gitignore to ignore jpyter notebook and temporary files

c4269f7

fixed coordinates scaling for classification

81883e7

fixed mashgrid generation for cropped patch

8d40000

now machgrid is generated relativaly to symmetrical padding

added tests for real modules classification

df27382

fixed preprocessing tests

245b5dd

Serhiy-Shekhovtsov force-pushed the fixes/classification-prediction-fix branch from 7a53619 to 245b5dd Compare January 8, 2018 20:24

reubano merged commit 413bc74 into drivendataorg:master Jan 9, 2018

vessemer mentioned this pull request Jan 26, 2018

Classification model pipeline #298

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed coordinates scaling for classification prediction #272

Fixed coordinates scaling for classification prediction #272

Serhiy-Shekhovtsov commented Jan 3, 2018 •

edited by reubano

Loading

WGierke Jan 3, 2018

Serhiy-Shekhovtsov Jan 3, 2018

lamby Jan 4, 2018

Serhiy-Shekhovtsov commented Jan 4, 2018

caseyfitz commented Jan 4, 2018

Serhiy-Shekhovtsov commented Jan 5, 2018

lamby commented Jan 5, 2018

Serhiy-Shekhovtsov commented Jan 5, 2018

lamby commented Jan 5, 2018

Serhiy-Shekhovtsov commented Jan 7, 2018

reubano commented Jan 8, 2018

Serhiy-Shekhovtsov commented Jan 8, 2018

reubano commented Jan 8, 2018

Serhiy-Shekhovtsov commented Jan 8, 2018 •

edited

Loading

reubano commented Jan 8, 2018 •

edited

Loading

Serhiy-Shekhovtsov commented Jan 8, 2018 •

edited

Loading

lamby commented Jan 9, 2018 •

edited

Loading

Serhiy-Shekhovtsov commented Jan 9, 2018

reubano commented Jan 9, 2018

Serhiy-Shekhovtsov commented Jan 9, 2018

reubano commented Jan 9, 2018

Serhiy-Shekhovtsov commented Jan 9, 2018

Fixed coordinates scaling for classification prediction #272

Fixed coordinates scaling for classification prediction #272

Conversation

Serhiy-Shekhovtsov commented Jan 3, 2018 • edited by reubano Loading

Reference to official issue

CLA

WGierke Jan 3, 2018

Choose a reason for hiding this comment

Serhiy-Shekhovtsov Jan 3, 2018

Choose a reason for hiding this comment

lamby Jan 4, 2018

Choose a reason for hiding this comment

Serhiy-Shekhovtsov commented Jan 4, 2018

caseyfitz commented Jan 4, 2018

Serhiy-Shekhovtsov commented Jan 5, 2018

lamby commented Jan 5, 2018

Serhiy-Shekhovtsov commented Jan 5, 2018

lamby commented Jan 5, 2018

Serhiy-Shekhovtsov commented Jan 7, 2018

reubano commented Jan 8, 2018

Serhiy-Shekhovtsov commented Jan 8, 2018

reubano commented Jan 8, 2018

Serhiy-Shekhovtsov commented Jan 8, 2018 • edited Loading

reubano commented Jan 8, 2018 • edited Loading

Serhiy-Shekhovtsov commented Jan 8, 2018 • edited Loading

lamby commented Jan 9, 2018 • edited Loading

Serhiy-Shekhovtsov commented Jan 9, 2018

reubano commented Jan 9, 2018

Serhiy-Shekhovtsov commented Jan 9, 2018

reubano commented Jan 9, 2018

Serhiy-Shekhovtsov commented Jan 9, 2018

Serhiy-Shekhovtsov commented Jan 3, 2018 •

edited by reubano

Loading

Serhiy-Shekhovtsov commented Jan 8, 2018 •

edited

Loading

reubano commented Jan 8, 2018 •

edited

Loading

Serhiy-Shekhovtsov commented Jan 8, 2018 •

edited

Loading

lamby commented Jan 9, 2018 •

edited

Loading