Enhance C GLSZM #257
Conversation
This algorithm change does require more memory, as it instantiates a temporary array (size equal to the number of segmented voxels) to hold the found indices.
Big win! Nice work. I don't suspect the memory will be a constraining factor in most use cases, but somewhere in the documentation we should have a list of practical considerations about estimated resource requirements for various processing scenarios (expected time and memory requirements for different input types).
I don't think so either, especially if you consider it's just one array, equal in size to the ROI. Alternative performance enhancements, such as multithreading, would require at least as much extra memory, if not more.
Nice! Should we ask Dan to test this branch with the data that was problematic for him in the past?
@pieper, I ran this on the brain1 test case with full mask and tested it on the 5 test cases used for profiling; I included the results for GLSZM. Furthermore, the matrix results are compared between C- and python-calculated results as part of the testing. Do you believe it is necessary to run it on more test cases? Aside from the fact that we still want to add additional test cases.
-------- Original message --------
From: Steve Pieper <notifications@github.com>
Date: 22-05-17 17:24 (GMT+01:00)
To: Radiomics/pyradiomics <pyradiomics@noreply.github.com>
Cc: Joost van Griethuysen <j.v.griethuysen@nki.nl>, Mention <mention@noreply.github.com>
Subject: Re: [Radiomics/pyradiomics] Enhance C GLSZM (#257)
Good idea: @blezek<https://github.com/blezek> can you have a look?
Oh, and by the way, @JoostJM<https://github.com/joostjm> have you re-run some datasets to confirm the results match? Would be good to do a few real-world examples in case there are edge cases.
I know it's not possible to run exhaustive testing, but if it's just a matter of re-running an existing script with the updated code, it would be a useful sanity check. I know in some cases changing the implementation might lead to slightly different results due to rounding errors or whatever, but here I think we'd expect identical results, so checking should be easy.
@pieper, I took a look at @JoostJM's code changes. For my testing I was simply looking at some of the whole brain segmentations from the http://www.oasis-brains.org/ study. Not terribly representative of what we might wish to do with pyradiomics... Was gearing up to look at the ADNI hippocampus as a test. Very interesting results. If I comment out line 29 of shape.py, the
As a related issue, I think it may be misleading to users of the library that, although we create the impression that calculation of individual features is possible, in reality it depends on the feature class. Most of the shape features are calculated in the constructor, and it is not possible for the user to disable, for example, the Feret diameter calculation. Should we consider passing the list of features that should be calculated to the feature class constructor?
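The constructor-based proposal above could look something like this hypothetical sketch (class and method names are invented, not the actual pyradiomics API; the expensive calculation is a placeholder):

```python
# Hypothetical sketch: a feature class whose constructor receives the set of
# enabled features, so expensive values are only precomputed when requested.
class ShapeFeatures:
    EXPENSIVE = {'Maximum3DDiameter'}  # features with costly precomputation

    def __init__(self, mask, enabled_features=None):
        self.mask = mask
        self.enabled = set(enabled_features or [])
        # Precompute only when no explicit list is given (old behaviour)
        # or when an expensive feature is actually enabled.
        if not self.enabled or self.EXPENSIVE & self.enabled:
            self._max_diameter = self._compute_max_diameter()
        else:
            self._max_diameter = None

    def _compute_max_diameter(self):
        # Placeholder for the expensive (e.g. Feret diameter) calculation
        return 42.0

    def getMaximum3DDiameterFeatureValue(self):
        # Lazily compute if the feature is requested after construction
        if self._max_diameter is None:
            self._max_diameter = self._compute_max_diameter()
        return self._max_diameter
```

With such a signature, a user enabling only, say, Sphericity would skip the expensive precomputation entirely, while the default (no list passed) keeps the current behaviour.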
Also for reference, here's the ITK Feret diameter calculation. I'm guessing it could be optimized.
Indeed, a lot of features are pre-calculated, and this should be mentioned in the documentation at the very least. In terms of performance, only the Feret diameter is the real issue, although I'm working on a C extension. Current profiling is down to 90 seconds (as opposed to 270 seconds).
If I follow the python implementation it even went down to 10 seconds, but I also discovered that this could produce erroneous results (I created an artificial segmentation where the calculated diameter was 82, but the real diameter was 92).
@thewtex Just to keep you in the loop, no action item from your side.
This looks very similar to what I implemented in C. The main difference is that I also calculate the maximum 2D diameters in each direction, and my implementation requires the input to have 3 dimensions. (Latest profiling: +/- 2 minutes for the fully masked brain1 case.) In essence, it compares all possible pairs of voxels on the border, which is on the order of numVox² comparisons. The PyRadiomics implementation compared only voxels on the upper bound (for either z, y, or x) against voxels on the lower bound (for either z, y, or x), which requires significantly fewer comparisons. However, there are edge cases where the maximum diameter is NOT between a voxel on the lower bound and a voxel on the upper bound. Let's discuss this at the hangout today.
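The brute-force approach described above can be sketched in python (a simplified sketch, not the actual C extension; function names are invented, and border detection here assumes 6-connectivity):

```python
import numpy as np
from itertools import combinations

def border_voxels(mask):
    # A voxel is on the border if any of its 6 face-neighbours is outside
    # the ROI (or outside the image). Padding with False handles the edges.
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1)
    interior = np.ones_like(mask)
    for ax in range(3):
        for shift in (1, -1):
            neigh = np.roll(padded, shift, axis=ax)[1:-1, 1:-1, 1:-1]
            interior &= neigh
    return np.argwhere(mask & ~interior)

def max_3d_diameter(mask, spacing=(1.0, 1.0, 1.0)):
    # O(n^2): compare every pair of border voxels (physical coordinates)
    coords = border_voxels(mask) * np.asarray(spacing)
    best = 0.0
    for a, b in combinations(coords, 2):
        best = max(best, float(np.linalg.norm(a - b)))
    return best
```

For a 3x3x3 cube with unit spacing, the maximum diameter is the corner-to-corner distance sqrt(12) ≈ 3.46. The bound-only optimization mentioned above would restrict the pairs to voxels lying on opposite bounding-box faces, which is where the edge cases arise.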
Enhance the C extension algorithm for calculating GLSZM. Instead of searching over the entire image for voxels to process when growing a region, instantiate a stack (maximum size equal to the number of voxels) to hold the indices of voxels that still have to be processed for the current region. When an unprocessed voxel belonging to the ROI is found, a region is started by pushing the index of that voxel onto the stack. Then a loop is started in which the last-added voxel is popped from the stack and its neighbours are checked; if unprocessed and belonging to the same region (same gray level), the indices of those voxels are also pushed onto the stack. The loop runs until the stack is exhausted (i.e. no new neighbours are available, indicating the region is complete). The number of loop iterations is counted and equals the size of the zone. Reset the data type for the mask from signed char back to char, as negative values are no longer needed.
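The stack-based region growing described above can be sketched in python (a simplified sketch of the technique, not the C extension itself; the function name is invented, and 26-connectivity in 3D is assumed):

```python
import numpy as np

def glszm_zones(image, mask):
    """Return (gray_level, zone_size) pairs via stack-based region growing.

    `image` holds discretised gray levels; `mask` marks the ROI."""
    image = np.asarray(image)
    mask = np.asarray(mask, dtype=bool)
    processed = np.zeros_like(mask)
    # All 26 neighbour offsets in 3D
    offsets = [(dz, dy, dx)
               for dz in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dz, dy, dx) != (0, 0, 0)]
    zones = []
    for start in zip(*np.nonzero(mask)):
        if processed[start]:
            continue  # voxel already absorbed into an earlier zone
        level = image[start]
        stack = [start]           # max size bounded by the number of ROI voxels
        processed[start] = True
        size = 0
        while stack:              # pop until the region is exhausted
            z, y, x = stack.pop()
            size += 1             # each pop is one voxel of the zone
            for dz, dy, dx in offsets:
                n = (z + dz, y + dy, x + dx)
                if all(0 <= n[i] < image.shape[i] for i in range(3)) \
                        and mask[n] and not processed[n] and image[n] == level:
                    processed[n] = True
                    stack.append(n)
        zones.append((int(level), size))
    return zones
```

Each ROI voxel is pushed and popped exactly once, so the cost is linear in the ROI size, instead of re-scanning the whole image per region.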
Add a C extension function to calculate maximum diameters in 2D and 3D. Additionally, the python implementation of maximum diameter can return incorrect values in some edge cases. After discussion, we decided to remove support for python calculation of the maximum diameters and have PyRadiomics return NaN when a user tries to compute these features in full-python mode.
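A minimal sketch of the fallback behaviour described above (function name and signature are hypothetical, not the actual PyRadiomics API):

```python
def maximum_3d_diameter(c_extension_result=None):
    # When the C extension is unavailable (full-python mode), report NaN
    # rather than run the python path that can return incorrect values.
    if c_extension_result is None:
        return float('nan')
    return c_extension_result
```

Returning NaN keeps the feature present in the output (so downstream tables keep a consistent shape) while making it unmistakable that the value was not computed.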
Related to [AIM-Harvard#257 in pypa/wheel](pypa/wheel#257). The newest version of wheel expects the requirements object returned by `pkg_resources.Requirement.parse(req)` to have a `url` attribute. Force the `pkg_resources` version to at least 0.3.0 to ensure this is the case.
Related to [AIM-Harvard#257 in pypa/wheel](pypa/wheel#257). The newest version of wheel expects the requirements object returned by `pkg_resources.Requirement.parse(req)` to have a `url` attribute. Change the installation of setuptools to ensure the correct setuptools version is used.
Profiling results (previous C extension implementation --> new implementation):
5 test cases, normal mask: 56 ms --> 6 ms
brain1 test case, full mask: 1340015 ms --> 4855 ms (roughly a 275x speedup)
Performance of the GLSZM calculation is now comparable to that of GLRLM (approx. 2.5x slower than GLCM).
cc @Radiomics/developers
Fixes #256