Skip to content
This repository has been archived by the owner on Feb 22, 2020. It is now read-only.

Fixed coordinates scaling for classification prediction #272

Conversation

Serhiy-Shekhovtsov
Copy link
Contributor

@Serhiy-Shekhovtsov Serhiy-Shekhovtsov commented Jan 3, 2018

We had a problem - the preprocessing is zooming the image. In other words, the pixel matrix is resized to real proportions. But coordinates we are using for prediction are not scaled accordingly and the actual patch we are making prediction on differs from the expected patch.
For example, the zooming factor for full LIDC-IDRI-0003 image is [2.5, 0.820312, 0.820312]. The prediction result for coordinates of the real nodule {'x': 367, 'y': 350, 'z': 72} is 0.0066. As you can see, it's very low. But if you scale these coordinates by the current zooming factor ({'x': 301, 'y': 286, 'z': 180}) we will get much higher probability of concerning 0.42.
More details here.

Reference to official issue

#268
TODO: Med prediction bugfix

CLA

  • I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well

@@ -34,7 +34,7 @@ class Params:
preprocess.preprocess_dicom.Params
"""

def __init__(self, clip_lower=None, clip_upper=None, spacing=None, order=0, # noqa: C901
def __init__(self, clip_lower=None, clip_upper=None, spacing=False, order=0, # noqa: C901
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the [doc string]https://github.com/Serhiy-Shekhovtsov/concept-to-clinic/blob/4887eeb6778013fce10e7e91fd846c84cb7cb248/prediction/src/preprocess/preprocess_ct.py#L19) spacing should be a float or float sequence. Could you update that, please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

@Serhiy-Shekhovtsov
Copy link
Contributor Author

Any comments on suggested approach in general? :)

@Serhiy-Shekhovtsov Serhiy-Shekhovtsov force-pushed the fixes/classification-prediction-fix branch 2 times, most recently from 0fbb81e to 436552a Compare January 4, 2018 20:44
@caseyfitz
Copy link

Thanks for these crucial updates, @Serhiy-Shekhovtsov! We had noticed the scaling issue on our end when testing the identification service, but had not assessed how pervasive the problem was.

Have you tested your update on a nodule_list with len(nodule_list) > 1? Manually testing the gtr123_model.predict(ct_path, nodule_list) method on my end works fine when using your test: test_classify_real_nodule_full_dicom(dicom_path_003, model_path).

However, that test uses the hard-coded nodule_list == [{'x': 369, 'y': 350, 'z': 5}]. When the length of the list is extended – either with a dummy nodule, e.g. [{'x': 369, 'y': 350, 'z': 5}, {'x': 0, 'y': 0, 'z': 0}], or with actual output from the identification service – prediction fails as follows:

~/concept-to-clinic/prediction/src/algorithms/classify/src/gtr123_model.py in predict(ct_path, nodule_list, model_path)
    275     results = []
    276 
--> 277     for nodule, (cropped_image, coords) in zip(nodule_list, patches):
    278         cropped_image = Variable(torch.from_numpy(cropped_image[np.newaxis, np.newaxis]).float())
    279         cropped_image.volatile = True

ValueError: too many values to unpack (expected 2)

This happens because the contents of the list patches are inconsistent (some elements are tuples consisting of numpy.ndarray and coordinate lists, and some are just numpy.ndarray). Here is some ipynb debugging which demonstrates that the issue stems from the output of patches_from_ct (inputs are after ipydb>):

> /home/ubuntu/concept-to-clinic/prediction/src/algorithms/classify/src/gtr123_model.py(274)predict()
    272     patches = patches_from_ct(ct_array, meta, config['crop_size'], centroids,
    273                               stride=config['stride'], pad_value=config['filling_value'])
--> 274     raise Exception
    275 
    276     results = []

ipdb> # forced exception to look at patches
ipdb> len(patches)
2
ipdb> len(patches[0])
2
ipdb> len(patches[1])
96
ipdb> type(patches[1])
<class 'numpy.ndarray'>
ipdb> type(patches[0][0])
<class 'numpy.ndarray'>
ipdb> type(patches[0][1])
<class 'numpy.ndarray'>
ipdb> patches[0][0].shape
(96, 96, 96)
ipdb> patches[0][1].shape
(3, 24, 24, 24)
ipdb> patches[1].shape
(96, 96, 96)
ipdb> q

We'll need to figure this out before merging so that identification and prediction can play nicely with each other 😉 Thanks again for your awesome work!

@Serhiy-Shekhovtsov
Copy link
Contributor Author

@caseyfitz, yes I saw this problem. Will try to solve it today. And also will improve the approach a bit and fix the failing tests.

@lamby
Copy link
Contributor

lamby commented Jan 5, 2018

Thanks @Serhiy-Shekhovtsov :)

@Serhiy-Shekhovtsov
Copy link
Contributor Author

@caseyfitz turned out, one little else at this line can solve the issue with multiple centroids. I will push the fix and tests soon.

Serhiy-Shekhovtsov added a commit to Serhiy-Shekhovtsov/concept-to-clinic that referenced this pull request Jan 5, 2018
Serhiy-Shekhovtsov added a commit to Serhiy-Shekhovtsov/concept-to-clinic that referenced this pull request Jan 5, 2018
Bug fixed. When the length of the nodules list is more then 1 – prediction fails as follows:

```
~/concept-to-clinic/prediction/src/algorithms/classify/src/gtr123_model.py in predict(ct_path, nodule_list, model_path)
    275     results = []
    276 
--> 277     for nodule, (cropped_image, coords) in zip(nodule_list, patches):
    278         cropped_image = Variable(torch.from_numpy(cropped_image[np.newaxis, np.newaxis]).float())
    279         cropped_image.volatile = True

ValueError: too many values to unpack (expected 2)
```

More details [here](drivendataorg#272 (comment)).

## CLA
- [x] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
@Serhiy-Shekhovtsov Serhiy-Shekhovtsov force-pushed the fixes/classification-prediction-fix branch from 436552a to 7671e9a Compare January 5, 2018 16:58
reubano pushed a commit that referenced this pull request Jan 5, 2018
Bug fixed. When the length of the nodules list is more then 1 – prediction fails as follows:

```
~/concept-to-clinic/prediction/src/algorithms/classify/src/gtr123_model.py in predict(ct_path, nodule_list, model_path)
    275     results = []
    276 
--> 277     for nodule, (cropped_image, coords) in zip(nodule_list, patches):
    278         cropped_image = Variable(torch.from_numpy(cropped_image[np.newaxis, np.newaxis]).float())
    279         cropped_image.volatile = True

ValueError: too many values to unpack (expected 2)
```

More details [here](#272 (comment)).

## CLA
- [x] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
@lamby
Copy link
Contributor

lamby commented Jan 5, 2018

turned out, one little else at this line can solve the issue with multiple centroids. I will push the fix and tests soon.

"with this one weird trick discovered by a schoolteacher", huh? Looking forward to tests and resolving the conflicts... :)

@Serhiy-Shekhovtsov Serhiy-Shekhovtsov force-pushed the fixes/classification-prediction-fix branch 3 times, most recently from 741e59a to 7a53619 Compare January 7, 2018 17:46
@Serhiy-Shekhovtsov
Copy link
Contributor Author

@lamby finally got them all green.

@reubano
Copy link
Contributor

reubano commented Jan 8, 2018

Was the issue about converting spacing from a number to a boolean resolved? I wasn't sure if the back and forth reached a conclusion.

@Serhiy-Shekhovtsov
Copy link
Contributor Author

@reubano converting to boolean is a fix for an issue of lost meta.spacing. Now params.spacing is boolean and it tells if we need to scale the image. But it doesn't wipe out the meta.spacing anymore.

@reubano
Copy link
Contributor

reubano commented Jan 8, 2018

@Serhiy-Shekhovtsov great! And just to be clear, this fixes #268 right? Is there any other test (besides that one) I should run to make sure this PR works as expected?

@Serhiy-Shekhovtsov
Copy link
Contributor Author

Serhiy-Shekhovtsov commented Jan 8, 2018

To be precise - the reported issue was caused by wrong coordinates. The coordinates specified in the test were good for full size DICOM but test was running on a small image. But I have found an other issue with classification. It was confirmed by @caseyfitz here. This PR includes tests for real nodules. The problem was - coordinates were not scaled together with an image. So the centroid location was misinterpreted and the cropped patch was wrong, so was the prediction.

@reubano
Copy link
Contributor

reubano commented Jan 8, 2018

Ran this a few times and while @caseyfitz's use cases looks solved, the test_classify_real_nodule_full_dicom is failing for me:

sudo docker-compose -f local.yml run prediction pytest -vrsk src/tests/test_classification.py

_____________________________________ test_classify_real_nodule_full_dicom _____________________________________

dicom_paths = ['/images_full/LIDC-IDRI-0002/1.3.6.1.4.1.14519.5.2.1.6279.6001.490157381160200744295382098329/1.3.6.1.4.1.14519.5.2.1...14519.5.2.1.6279.6001.298806137288633453246975630178/1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192']
model_path = '/app/src/algorithms/classify/assets/gtr123_model.ckpt'

    def test_classify_real_nodule_full_dicom(dicom_paths, model_path):
        predicted = trained_model.predict(dicom_paths[2], [{'x': 367, 'y': 349, 'z': 75}], model_path)
        assert predicted
>       assert 0.3 <= predicted[0]['p_concerning'] <= 1
E       assert 0.3 <= 0.0021293163299560547

src/tests/test_classification.py:23: AssertionError
sudo sh tests/test_docker.sh

+ docker-compose -f local.yml run prediction pytest -rsx
Starting base ... 
Starting base ... done
============================================= test session starts ==============================================
platform linux -- Python 3.6.3, pytest-3.1.3, py-1.5.2, pluggy-0.4.0
rootdir: /app, inifile:
collected 59 items 

src/tests/test_classification.py ...F.
...

@Serhiy-Shekhovtsov
Copy link
Contributor Author

Serhiy-Shekhovtsov commented Jan 8, 2018

@reubano, coordinates for this test has been hand crafted to match the real nodule on the third image: LIDC-IDRI-0003. But the glob method returns an unsorted list. By coincidence on your machine the third item is LIDC-IDRI-0002. I will fix the test to sort the list.

image

@Serhiy-Shekhovtsov Serhiy-Shekhovtsov force-pushed the fixes/classification-prediction-fix branch from 7a53619 to 245b5dd Compare January 8, 2018 20:24
@lamby
Copy link
Contributor

lamby commented Jan 9, 2018

@Serhiy-Shekhovtsov Is this ready to merge from your point of view? :)

@Serhiy-Shekhovtsov
Copy link
Contributor Author

@lamby yes, it is.

@reubano
Copy link
Contributor

reubano commented Jan 9, 2018

Yes, tests pass for me now... maybe the travis error was a fluke? I just reran the build.

@Serhiy-Shekhovtsov
Copy link
Contributor Author

@reubano it looks so.

@reubano reubano merged commit 413bc74 into drivendataorg:master Jan 9, 2018
@reubano
Copy link
Contributor

reubano commented Jan 9, 2018

great work!!

@Serhiy-Shekhovtsov
Copy link
Contributor Author

Thank you!

louisgv pushed a commit to louisgv/concept-to-clinic that referenced this pull request Jan 16, 2018
Bug fixed. When the length of the nodules list is more then 1 – prediction fails as follows:

```
~/concept-to-clinic/prediction/src/algorithms/classify/src/gtr123_model.py in predict(ct_path, nodule_list, model_path)
    275     results = []
    276 
--> 277     for nodule, (cropped_image, coords) in zip(nodule_list, patches):
    278         cropped_image = Variable(torch.from_numpy(cropped_image[np.newaxis, np.newaxis]).float())
    279         cropped_image.volatile = True

ValueError: too many values to unpack (expected 2)
```

More details [here](drivendataorg#272 (comment)).

## CLA
- [x] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
louisgv pushed a commit to louisgv/concept-to-clinic that referenced this pull request Jan 16, 2018
…g#272)

* updated gitignore to ignore jpyter notebook and temporary files

* fixed coordinates scaling for classification

* fixed mashgrid generation for cropped patch

now machgrid is generated relativaly to symmetrical padding

* added tests for real modules classification

* fixed preprocessing tests
@vessemer vessemer mentioned this pull request Jan 26, 2018
1 task
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants