New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed coordinates scaling for classification prediction #272

Merged
merged 5 commits into from Jan 9, 2018

Conversation

Projects
None yet
5 participants
@Serhiy-Shekhovtsov
Copy link
Contributor

Serhiy-Shekhovtsov commented Jan 3, 2018

We had a problem - the preprocessing is zooming the image. In other words, the pixel matrix is resized to real proportions. But coordinates we are using for prediction are not scaled accordingly and the actual patch we are making prediction on differs from the expected patch.
For example, the zooming factor for full LIDC-IDRI-0003 image is [2.5, 0.820312, 0.820312]. The prediction result for coordinates of the real nodule {'x': 367, 'y': 350, 'z': 72} is 0.0066. As you can see, it's very low. But if you scale these coordinates by the current zooming factor ({'x': 301, 'y': 286, 'z': 180}) we will get much higher probability of concerning 0.42.
More details here.

Reference to official issue

#268
TODO: Med prediction bugfix

CLA

  • I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
@@ -34,7 +34,7 @@ class Params:
preprocess.preprocess_dicom.Params
"""

def __init__(self, clip_lower=None, clip_upper=None, spacing=None, order=0, # noqa: C901
def __init__(self, clip_lower=None, clip_upper=None, spacing=False, order=0, # noqa: C901

This comment has been minimized.

@WGierke

WGierke Jan 3, 2018

Contributor

According to the [doc string]https://github.com/Serhiy-Shekhovtsov/concept-to-clinic/blob/4887eeb6778013fce10e7e91fd846c84cb7cb248/prediction/src/preprocess/preprocess_ct.py#L19) spacing should be a float or float sequence. Could you update that, please?

This comment has been minimized.

@Serhiy-Shekhovtsov

Serhiy-Shekhovtsov Jan 3, 2018

Contributor

Sure. Thanks.

This comment has been minimized.

@lamby

lamby Jan 4, 2018

Contributor

Nice catch!

@Serhiy-Shekhovtsov

This comment has been minimized.

Copy link
Contributor

Serhiy-Shekhovtsov commented Jan 4, 2018

Any comments on suggested approach in general? :)

@Serhiy-Shekhovtsov Serhiy-Shekhovtsov force-pushed the Serhiy-Shekhovtsov:fixes/classification-prediction-fix branch 2 times, most recently from 0fbb81e to 436552a Jan 4, 2018

@caseyfitz

This comment has been minimized.

Copy link

caseyfitz commented Jan 4, 2018

Thanks for these crucial updates, @Serhiy-Shekhovtsov! We had noticed the scaling issue on our end when testing the identification service, but had not assessed how pervasive the problem was.

Have you tested your update on a nodule_list with len(nodule_list) > 1? Manually testing the gtr123_model.predict(ct_path, nodule_list) method on my end works fine when using your test: test_classify_real_nodule_full_dicom(dicom_path_003, model_path).

However, that test uses the hard-coded nodule_list == [{'x': 369, 'y': 350, 'z': 5}]. When the length of the list is extended – either with a dummy nodule, e.g. [{'x': 369, 'y': 350, 'z': 5}, {'x': 0, 'y': 0, 'z': 0}], or with actual output from the identification service – prediction fails as follows:

~/concept-to-clinic/prediction/src/algorithms/classify/src/gtr123_model.py in predict(ct_path, nodule_list, model_path)
    275     results = []
    276 
--> 277     for nodule, (cropped_image, coords) in zip(nodule_list, patches):
    278         cropped_image = Variable(torch.from_numpy(cropped_image[np.newaxis, np.newaxis]).float())
    279         cropped_image.volatile = True

ValueError: too many values to unpack (expected 2)

This happens because the contents of the list patches are inconsistent (some elements are tuples consisting of numpy.ndarray and coordinate lists, and some are just numpy.ndarray). Here is some ipynb debugging which demonstrates that the issue stems from the output of patches_from_ct (inputs are after ipydb>):

> /home/ubuntu/concept-to-clinic/prediction/src/algorithms/classify/src/gtr123_model.py(274)predict()
    272     patches = patches_from_ct(ct_array, meta, config['crop_size'], centroids,
    273                               stride=config['stride'], pad_value=config['filling_value'])
--> 274     raise Exception
    275 
    276     results = []

ipdb> # forced exception to look at patches
ipdb> len(patches)
2
ipdb> len(patches[0])
2
ipdb> len(patches[1])
96
ipdb> type(patches[1])
<class 'numpy.ndarray'>
ipdb> type(patches[0][0])
<class 'numpy.ndarray'>
ipdb> type(patches[0][1])
<class 'numpy.ndarray'>
ipdb> patches[0][0].shape
(96, 96, 96)
ipdb> patches[0][1].shape
(3, 24, 24, 24)
ipdb> patches[1].shape
(96, 96, 96)
ipdb> q

We'll need to figure this out before merging so that identification and prediction can play nicely with each other 😉 Thanks again for your awesome work!

@Serhiy-Shekhovtsov

This comment has been minimized.

Copy link
Contributor

Serhiy-Shekhovtsov commented Jan 5, 2018

@caseyfitz, yes I saw this problem. Will try to solve it today. And also will improve the approach a bit and fix the failing tests.

@lamby

This comment has been minimized.

Copy link
Contributor

lamby commented Jan 5, 2018

@Serhiy-Shekhovtsov

This comment has been minimized.

Copy link
Contributor

Serhiy-Shekhovtsov commented Jan 5, 2018

@caseyfitz turned out, one little else at this line can solve the issue with multiple centroids. I will push the fix and tests soon.

Serhiy-Shekhovtsov added a commit to Serhiy-Shekhovtsov/concept-to-clinic that referenced this pull request Jan 5, 2018

Serhiy-Shekhovtsov added a commit to Serhiy-Shekhovtsov/concept-to-clinic that referenced this pull request Jan 5, 2018

fixed classification prediction for multiple nodules
Bug fixed. When the length of the nodules list is more then 1 – prediction fails as follows:

```
~/concept-to-clinic/prediction/src/algorithms/classify/src/gtr123_model.py in predict(ct_path, nodule_list, model_path)
    275     results = []
    276 
--> 277     for nodule, (cropped_image, coords) in zip(nodule_list, patches):
    278         cropped_image = Variable(torch.from_numpy(cropped_image[np.newaxis, np.newaxis]).float())
    279         cropped_image.volatile = True

ValueError: too many values to unpack (expected 2)
```

More details [here](drivendataorg#272 (comment)).

## CLA
- [x] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well

@Serhiy-Shekhovtsov Serhiy-Shekhovtsov force-pushed the Serhiy-Shekhovtsov:fixes/classification-prediction-fix branch from 436552a to 7671e9a Jan 5, 2018

reubano added a commit that referenced this pull request Jan 5, 2018

fixed classification prediction for multiple nodules
Bug fixed. When the length of the nodules list is more then 1 – prediction fails as follows:

```
~/concept-to-clinic/prediction/src/algorithms/classify/src/gtr123_model.py in predict(ct_path, nodule_list, model_path)
    275     results = []
    276 
--> 277     for nodule, (cropped_image, coords) in zip(nodule_list, patches):
    278         cropped_image = Variable(torch.from_numpy(cropped_image[np.newaxis, np.newaxis]).float())
    279         cropped_image.volatile = True

ValueError: too many values to unpack (expected 2)
```

More details [here](#272 (comment)).

## CLA
- [x] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
@lamby

This comment has been minimized.

Copy link
Contributor

lamby commented Jan 5, 2018

turned out, one little else at this line can solve the issue with multiple centroids. I will push the fix and tests soon.

"with this one weird trick discovered by a schoolteacher", huh? Looking forward to tests and resolving the conflicts... :)

@Serhiy-Shekhovtsov Serhiy-Shekhovtsov force-pushed the Serhiy-Shekhovtsov:fixes/classification-prediction-fix branch 3 times, most recently from 741e59a to 7a53619 Jan 7, 2018

@Serhiy-Shekhovtsov

This comment has been minimized.

Copy link
Contributor

Serhiy-Shekhovtsov commented Jan 7, 2018

@lamby finally got them all green.

@reubano

This comment has been minimized.

Copy link
Contributor

reubano commented Jan 8, 2018

Was the issue about converting spacing from a number to a boolean resolved? I wasn't sure if the back and forth reached a conclusion.

@Serhiy-Shekhovtsov

This comment has been minimized.

Copy link
Contributor

Serhiy-Shekhovtsov commented Jan 8, 2018

@reubano converting to boolean is a fix for an issue of lost meta.spacing. Now params.spacing is boolean and it tells if we need to scale the image. But it doesn't wipe out the meta.spacing anymore.

@reubano

This comment has been minimized.

Copy link
Contributor

reubano commented Jan 8, 2018

@Serhiy-Shekhovtsov great! And just to be clear, this fixes #268 right? Is there any other test (besides that one) I should run to make sure this PR works as expected?

@Serhiy-Shekhovtsov

This comment has been minimized.

Copy link
Contributor

Serhiy-Shekhovtsov commented Jan 8, 2018

To be precise - the reported issue was caused by wrong coordinates. The coordinates specified in the test were good for full size DICOM but test was running on a small image. But I have found an other issue with classification. It was confirmed by @caseyfitz here. This PR includes tests for real nodules. The problem was - coordinates were not scaled together with an image. So the centroid location was misinterpreted and the cropped patch was wrong, so was the prediction.

@reubano

This comment has been minimized.

Copy link
Contributor

reubano commented Jan 8, 2018

Ran this a few times and while @caseyfitz's use cases looks solved, the test_classify_real_nodule_full_dicom is failing for me:

sudo docker-compose -f local.yml run prediction pytest -vrsk src/tests/test_classification.py

_____________________________________ test_classify_real_nodule_full_dicom _____________________________________

dicom_paths = ['/images_full/LIDC-IDRI-0002/1.3.6.1.4.1.14519.5.2.1.6279.6001.490157381160200744295382098329/1.3.6.1.4.1.14519.5.2.1...14519.5.2.1.6279.6001.298806137288633453246975630178/1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192']
model_path = '/app/src/algorithms/classify/assets/gtr123_model.ckpt'

    def test_classify_real_nodule_full_dicom(dicom_paths, model_path):
        predicted = trained_model.predict(dicom_paths[2], [{'x': 367, 'y': 349, 'z': 75}], model_path)
        assert predicted
>       assert 0.3 <= predicted[0]['p_concerning'] <= 1
E       assert 0.3 <= 0.0021293163299560547

src/tests/test_classification.py:23: AssertionError
sudo sh tests/test_docker.sh

+ docker-compose -f local.yml run prediction pytest -rsx
Starting base ... 
Starting base ... done
============================================= test session starts ==============================================
platform linux -- Python 3.6.3, pytest-3.1.3, py-1.5.2, pluggy-0.4.0
rootdir: /app, inifile:
collected 59 items 

src/tests/test_classification.py ...F.
...
@Serhiy-Shekhovtsov

This comment has been minimized.

Copy link
Contributor

Serhiy-Shekhovtsov commented Jan 8, 2018

@reubano, coordinates for this test has been hand crafted to match the real nodule on the third image: LIDC-IDRI-0003. But the glob method returns an unsorted list. By coincidence on your machine the third item is LIDC-IDRI-0002. I will fix the test to sort the list.

image

@Serhiy-Shekhovtsov Serhiy-Shekhovtsov force-pushed the Serhiy-Shekhovtsov:fixes/classification-prediction-fix branch from 7a53619 to 245b5dd Jan 8, 2018

@lamby

This comment has been minimized.

Copy link
Contributor

lamby commented Jan 9, 2018

@Serhiy-Shekhovtsov Is this ready to merge from your point of view? :)

@Serhiy-Shekhovtsov

This comment has been minimized.

Copy link
Contributor

Serhiy-Shekhovtsov commented Jan 9, 2018

@lamby yes, it is.

@reubano

This comment has been minimized.

Copy link
Contributor

reubano commented Jan 9, 2018

Yes, tests pass for me now... maybe the travis error was a fluke? I just reran the build.

@Serhiy-Shekhovtsov

This comment has been minimized.

Copy link
Contributor

Serhiy-Shekhovtsov commented Jan 9, 2018

@reubano it looks so.

@reubano reubano merged commit 413bc74 into drivendataorg:master Jan 9, 2018

2 checks passed

concept-to-clinic/cla @Serhiy-Shekhovtsov has signed the CLA.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@reubano

This comment has been minimized.

Copy link
Contributor

reubano commented Jan 9, 2018

great work!!

@Serhiy-Shekhovtsov

This comment has been minimized.

Copy link
Contributor

Serhiy-Shekhovtsov commented Jan 9, 2018

Thank you!

louisgv added a commit to louisgv/concept-to-clinic that referenced this pull request Jan 16, 2018

fixed classification prediction for multiple nodules
Bug fixed. When the length of the nodules list is more then 1 – prediction fails as follows:

```
~/concept-to-clinic/prediction/src/algorithms/classify/src/gtr123_model.py in predict(ct_path, nodule_list, model_path)
    275     results = []
    276 
--> 277     for nodule, (cropped_image, coords) in zip(nodule_list, patches):
    278         cropped_image = Variable(torch.from_numpy(cropped_image[np.newaxis, np.newaxis]).float())
    279         cropped_image.volatile = True

ValueError: too many values to unpack (expected 2)
```

More details [here](drivendataorg#272 (comment)).

## CLA
- [x] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well

louisgv added a commit to louisgv/concept-to-clinic that referenced this pull request Jan 16, 2018

Fixed coordinates scaling for classification prediction (drivendataor…
…g#272)

* updated gitignore to ignore jpyter notebook and temporary files

* fixed coordinates scaling for classification

* fixed mashgrid generation for cropped patch

now machgrid is generated relativaly to symmetrical padding

* added tests for real modules classification

* fixed preprocessing tests

@vessemer vessemer referenced this pull request Jan 26, 2018

Merged

Classification model pipeline #298

1 of 1 task complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment