Enhance C GLSZM #257
Conversation
This algorithm change does require more memory, as it instantiates a temporary array (size equal to the number of segmented voxels) to hold the found indices.
Big win! Nice work. I don't suspect the memory will be a constraining factor in most use cases, but somewhere in the documentation we should have a list of practical considerations about estimated resource requirements for various processing scenarios (expected time and memory requirements for different input types).
I don't think so either, especially if you consider it's just one array, equal in size to the ROI. Alternative performance enhancements, such as multithreading, would require at least as much extra memory, if not more.
Nice! Should we ask Dan to test this branch with the data that was problematic for him in the past?
@pieper, I ran this on the brain1 test case with full mask and tested it on the 5 test cases used for profiling; I included the results for GLSZM. Furthermore, the matrix results are compared between C- and python-calculated results as part of the testing. Do you believe it is necessary to run it on more test cases? Aside from the fact that we still want to add additional test cases.
-------- Original message --------
From: Steve Pieper <notifications@github.com>
Date: 22-05-17 17:24 (GMT+01:00)
To: Radiomics/pyradiomics <pyradiomics@noreply.github.com>
Cc: Joost van Griethuysen <j.v.griethuysen@nki.nl>, Mention <mention@noreply.github.com>
Subject: Re: [Radiomics/pyradiomics] Enhance C GLSZM (#257)
Good idea: @blezek<https://github.com/blezek> can you have a look?
Oh, and by the way, @JoostJM<https://github.com/joostjm> have you re-run some datasets to confirm the results match? Would be good to do a few real-world examples in case there are edge cases.
I know it's not possible to run exhaustive testing, but if it's just a matter of re-running an existing script with the updated code, it would be a useful sanity check. I know in some cases changing the implementation might lead to slightly different results due to rounding errors or whatever, but here I think we'd expect identical results, so checking should be easy.
@pieper, I took a look at @JoostJM's code changes. For my testing I was simply looking at some of the whole brain segmentations from the http://www.oasis-brains.org/ study. Not terribly representative of what we might wish to do with pyradiomics... Was gearing up to look at the ADNI hippocampus as a test. Very interesting results. If I comment out line 29 of shape.py, the
As a related issue, I think it may be misleading to users of the library that, although we create the impression that calculation of individual features is possible, in reality it depends on the feature class. Most of the shape features are calculated in the constructor, and it is not possible for the user to disable, for example, the Feret diameter calculation. Should we consider passing the list of features that should be calculated to the feature class constructor?
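The constructor-based proposal above could look something like this hypothetical sketch (class and method names are invented, not the actual pyradiomics API; the expensive calculation is a placeholder):

```python
# Hypothetical sketch: a feature class whose constructor receives the set of
# enabled features, so expensive values are only precomputed when requested.
class ShapeFeatures:
    EXPENSIVE = {'Maximum3DDiameter'}  # features with costly precomputation

    def __init__(self, mask, enabled_features=None):
        self.mask = mask
        self.enabled = set(enabled_features or [])
        # Precompute only when no explicit list is given (old behaviour)
        # or when an expensive feature is actually enabled.
        if not self.enabled or self.EXPENSIVE & self.enabled:
            self._max_diameter = self._compute_max_diameter()
        else:
            self._max_diameter = None

    def _compute_max_diameter(self):
        # Placeholder for the expensive (e.g. Feret diameter) calculation
        return 42.0

    def getMaximum3DDiameterFeatureValue(self):
        # Lazily compute if the feature is requested after construction
        if self._max_diameter is None:
            self._max_diameter = self._compute_max_diameter()
        return self._max_diameter
```

With such a signature, a user enabling only, say, Sphericity would skip the expensive precomputation entirely, while the default (no list passed) keeps the current behaviour.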
Also for reference, here's the ITK Feret diameter calculation. I'm guessing it could be optimized.
Indeed, a lot of features are pre-calculated, and this should be mentioned in the documentation at the very least. In terms of performance, only the Feret diameter is the real issue, although I'm working on a C extension. Current profiling is down to 90 seconds (as opposed to 270 seconds).
If I follow the python implementation it even went down to 10 seconds, but I also discovered that this could produce erroneous results (I created an artificial segmentation where the calculated diameter was 82, but the real diameter was 92).
@thewtex Just to keep you in the loop, no action item from your side.
This looks very similar to what I implemented in C. The main difference is that I also calculate the maximum 2D diameters in each direction, and my implementation requires the input to have 3 dimensions. (Latest profiling: +/- 2 minutes for the fully masked brain1 case.) In essence, it compares all possible pairs of voxels on the border, which is on the order of numVox² comparisons. The PyRadiomics implementation compared only voxels on the upper bound (for either z, y, or x) against voxels on the lower bound (for either z, y, or x), which requires significantly fewer comparisons. However, there are edge cases where the maximum diameter is NOT between a voxel on the lower bound and a voxel on the upper bound. Let's discuss this at the hangout today.
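The brute-force approach described above can be sketched in python (a simplified sketch, not the actual C extension; function names are invented, and border detection here assumes 6-connectivity):

```python
import numpy as np
from itertools import combinations

def border_voxels(mask):
    # A voxel is on the border if any of its 6 face-neighbours is outside
    # the ROI (or outside the image). Padding with False handles the edges.
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1)
    interior = np.ones_like(mask)
    for ax in range(3):
        for shift in (1, -1):
            neigh = np.roll(padded, shift, axis=ax)[1:-1, 1:-1, 1:-1]
            interior &= neigh
    return np.argwhere(mask & ~interior)

def max_3d_diameter(mask, spacing=(1.0, 1.0, 1.0)):
    # O(n^2): compare every pair of border voxels (physical coordinates)
    coords = border_voxels(mask) * np.asarray(spacing)
    best = 0.0
    for a, b in combinations(coords, 2):
        best = max(best, float(np.linalg.norm(a - b)))
    return best
```

For a 3x3x3 cube with unit spacing, the maximum diameter is the corner-to-corner distance sqrt(12) ≈ 3.46. The bound-only optimization mentioned above would restrict the pairs to voxels lying on opposite bounding-box faces, which is where the edge cases arise.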
Enhance the C extension algorithm for calculating GLSZM. Instead of searching over the entire image for voxels to process when growing a region, instantiate a stack (maximum size equal to the number of voxels) to hold the indices of voxels that still have to be processed for the current region. When an unprocessed voxel belonging to the ROI is found, a region is started by pushing the index of that voxel onto the stack. Then a loop is started in which the last-added voxel is popped from the stack and its neighbours are checked; if unprocessed and belonging to the same region (same gray level), the indices of those voxels are also pushed onto the stack. The loop runs until the stack is exhausted (i.e. no new neighbours are available, indicating the region is complete). The number of loop iterations is counted and equals the size of the zone. Reset the data type for the mask from signed char back to char, as negative values are no longer needed.
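The stack-based region growing described above can be sketched in python (a simplified sketch of the technique, not the C extension itself; the function name is invented, and 26-connectivity in 3D is assumed):

```python
import numpy as np

def glszm_zones(image, mask):
    """Return (gray_level, zone_size) pairs via stack-based region growing.

    `image` holds discretised gray levels; `mask` marks the ROI."""
    image = np.asarray(image)
    mask = np.asarray(mask, dtype=bool)
    processed = np.zeros_like(mask)
    # All 26 neighbour offsets in 3D
    offsets = [(dz, dy, dx)
               for dz in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dz, dy, dx) != (0, 0, 0)]
    zones = []
    for start in zip(*np.nonzero(mask)):
        if processed[start]:
            continue  # voxel already absorbed into an earlier zone
        level = image[start]
        stack = [start]           # max size bounded by the number of ROI voxels
        processed[start] = True
        size = 0
        while stack:              # pop until the region is exhausted
            z, y, x = stack.pop()
            size += 1             # each pop is one voxel of the zone
            for dz, dy, dx in offsets:
                n = (z + dz, y + dy, x + dx)
                if all(0 <= n[i] < image.shape[i] for i in range(3)) \
                        and mask[n] and not processed[n] and image[n] == level:
                    processed[n] = True
                    stack.append(n)
        zones.append((int(level), size))
    return zones
```

Each ROI voxel is pushed and popped exactly once, so the cost is linear in the ROI size, instead of re-scanning the whole image per region.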
Add a C extension function to calculate maximum diameters in 2D and 3D. Additionally, the python implementation of maximum diameter can return incorrect values in some edge cases. After discussion, we decided to remove support for python calculation of the maximum diameters and have PyRadiomics return NaN when a user tries to compute these features in full-python mode.
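A minimal sketch of the fallback behaviour described above (function name and signature are hypothetical, not the actual PyRadiomics API):

```python
def maximum_3d_diameter(c_extension_result=None):
    # When the C extension is unavailable (full-python mode), report NaN
    # rather than run the python path that can return incorrect values.
    if c_extension_result is None:
        return float('nan')
    return c_extension_result
```

Returning NaN keeps the feature present in the output (so downstream tables keep a consistent shape) while making it unmistakable that the value was not computed.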
Related to [AIM-Harvard#257 in pypa/wheel](pypa/wheel#257). The newest version of wheel expects the requirements object returned by `pkg_resources.Requirement.parse(req)` to have a `url` attribute. Force the `pkg_resources` version to at least 0.3.0 to ensure this is the case.
Related to [AIM-Harvard#257 in pypa/wheel](pypa/wheel#257). The newest version of wheel expects the requirements object returned by `pkg_resources.Requirement.parse(req)` to have a `url` attribute. Change the installation of setuptools to ensure the correct setuptools version is used.
Profiling results (previous C extension implementation --> new implementation):
5 test cases, normal mask: 56 ms --> 6 ms
brain1 test case, full mask: 1340015 ms --> 4855 ms (roughly a 275x speedup)
Performance of the GLSZM calculation is now comparable to that of GLRLM (approx. 2.5x slower than GLCM).
cc @Radiomics/developers
Fixes #256