Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for faster C-based matrix and shape surface computation #200

Closed
wants to merge 9 commits into from

Conversation

jcfr
Copy link
Collaborator

@jcfr jcfr commented Feb 15, 2017

This topic is "equivalent" to #158 but was rebased on the current master on 2017-02-15

Note that it doesn't (yet) include the following:

  • Update of C extensions module to support Python 3
  • Integration of scikit-build to support seamless building on multiple platform

@jcfr jcfr mentioned this pull request Feb 15, 2017
JoostJM and others added 6 commits February 15, 2017 19:03
Stats (Profiling 5 testcases):
  GLCM 9653 ms -> 7 ms
  GLDM 348 ms -> 7 ms
  NGTDM 133 ms -> 4ms

This commit adds two C Python extensions and associated tests:

* _cmatrices: C implementation for matrix computation associayed
  with GLCM, GLDM, GLDZM, GLRLM, GLSZM. and NGTDM features

* _cshape: C implementation for Shape surface computation

* tests/test_cmatrices: testing for matrix equality: Tests
  whether the python generated matrix is equal to the matrix
  generated by the C implementations (allows for machine precision errors)

Details
-------

* Add docstring to C modules

* Use of C implementation optional. At initialization, the package
  tries to use C, but if loading has failed, or calculation is forced
  to python, python is used. Note that the import of _cmatrices is done
  after initialization of logger. This ensures error will be logged
  if import fails.

* GLDM, NGTDM: C implementation accepts only one
  direction set of angles (i.e. 13 angles in a 3D image,
  4 angles in a 2D).

* GLSZM: Use "char" datatype for mask. (It is signed char
  in GLSZM).

* C code is consistent with C89 convention. All variables (pointers
  for python objects) are initialized at top of each block.

Optimizations
-------------

* GLSZM:

  - Uses "while" loops. This allows to reduce the memory usage.
    We observed that with recursive functions it was 'unexpectedly'
    failing.

  - Optimized search that finds a new index to process in
    the region growing.

Co-authored-by: Jean-Christophe Fillion-Robin <jchris.fillionr@kitware.com>
Fix a datatype bug in the python calculation of surface area (must be
in a number-like datatype, otherwise the interpolate function will not
work properly).
Remove C code for calculating the matrix in these classes.
Remove the tests for these classes from test_cmatrices
The inner  loop here runs from d2 = 1 to size[mDims[2]], meaning
it will run from x = 1 to size(x).
This is to prevent running over x = 0 (these start voxels are covered
in the previous loop, when x is fixed at 0).

However this applies to the case where angle[x] = 1. When angle[x] = -1,
x is fixed at size(x) in the previous loop and x has to run
from x = 0 to x = size(x) - 1. This should be handled by:
`if (angles[a * 3 + mDims[2]] < 0) jd[mDims[2]] --;`
However, mDims was set to mDims[1], meaning it applied the correction
to the wrong dimension. This caused the start voxels with x = 0 to be
erroneously omitted and start voxels with x = size(x) to be counted
twice. Additionally, when angly(y) is -1, the runs that would start
at y = 0 would be runs without elements, as it is erroneously set to
start at -1, which causes the while loop to exit immediately.
…ibution

Source distribution is generation using:

  python setup.py sdist

For reference:

  https://docs.python.org/2/distutils/sourcedist.html#commands
@jcfr jcfr closed this Feb 16, 2017
@jcfr jcfr force-pushed the c-matrices-rebased-jcfr-20170215 branch from 00df0a6 to cc78841 Compare February 16, 2017 00:13
@jcfr jcfr reopened this Feb 16, 2017
This commit include correct header to fix

  warning: implicit declaration of function ‘Py_InitModule3’ [-Wimplicit-function-declaration]

and it update the signature of the "interpolate" function to fix this one:

  warning: control reaches end of non-void function [-Wreturn-type]
Python 3 has a revamped extension module initialization
system. (See PEP 3121.)"

See https://docs.python.org/3/howto/cporting.html#cporting-howto
@@ -50,6 +50,7 @@ def __init__(self, inputImage, inputMask, **kwargs):
self.SurfaceArea = self._calculateSurfaceArea()

def _calculateSurfaceArea(self):
self.maskArray = self.maskArray.astype('int') # needed for the interpolate function to work correctly
Copy link
Collaborator

@JoostJM JoostJM Feb 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcfr, this is not necessary, if you look at L42, you can see that the datatype is already forced to int.

@JoostJM
Copy link
Collaborator

JoostJM commented Feb 16, 2017

@Radiomics/developers, what is the preference, as this concerns a branch which was created before the repository was made public. I was working in parallel to @jcfr to update the C extensions for integration with the much changed master. However, I opted to make a clean branch from the current master, cherry picking and making new commits. This is a somewhat more 'clean' branch, with the old bugfixes quashed into the first commits, and without first adding, then deleting the C extension for the experimental feature classes. It is located here.

@JoostJM
Copy link
Collaborator

JoostJM commented Feb 16, 2017

@jcfr, Do you want to combine the integration of scikit-build with the C extensions in this PR, or do you want to address that in a different PR? Seeing as the tests are currently passing, I think it may be more logical to address this in a different PR.

@jcfr
Copy link
Collaborator Author

jcfr commented Feb 16, 2017

Do you want to combine the integration of scikit-build with the C extensions in this PR, or do you want
to address that in a different PR? Seeing as the tests are currently passing, I think it may be more
logical to address this in a different PR.

A different PR will make more sens

more 'clean' branch, with the old bugfixes quashed into the first commits, and without first adding,
then deleting the C extension for the experimental feature classes.

Just because, here is an updated branch without adding/deleting the feature. It now contains only three commits. ( Initially, I just thought it would be nice to have the "not ready for production" implementation of the other features still available in the history.)
See https://github.com/Radiomics/pyradiomics/compare/c-matrices-rebased-jcfr-20170216?expand=1

May be we could keep part of this commit message: 6461197

All of that said, it makes sense to move forward with your topic, it is cleaner. You will find below an updated version where I

  • squashed the style and compilation fixes where they belong
  • remove the dependency on cython (it is not used as far as i can tell ... no pyx files)

See #202

Comparing the two topic (c-matrices-rebased-jcfr-20170216 [jcfr] and c-matrices_revised [JoostJM]), the only major differences is in that in your topic, the weighting in glcm.py is removed. What this intentional ?

This link illustrate the difference: c-matrices-rebased-jcfr-20170216...c-matrices-rebased-jcfr-20170216-diff-joost-revised#diff-c8aee871337c890a9b744f600b6e8d67

@JoostJM
Copy link
Collaborator

JoostJM commented Feb 16, 2017

Comparing the two topic (c-matrices-rebased-jcfr-20170216 [jcfr] and c-matrices_revised [JoostJM]), the only major differences is in that in your topic, the weighting in glcm.py is removed. What this intentional

It is not removed as such. Weighting and removal of empty angles is done in python for both the python calculated and C calculated matrices. To prevent bugs, I put this functionality in a separate function (_applyMatrixOptions) and subsequently call that from the _calculateMatrix and _calculateCMatrix functions.

There are some other differences as well, I'll list them here:

  • Standardized the naming of matrix calculation functions (_calculateMatrix and _calculateCMatrix). This is needed to enable a simplified testing script for test_cmatrices.
  • In C calculation of GLSZM, I used a temporary array to store the calculated zones, with zones defined as pairs of two elements, with the first item defining the gray level, and the second the size. This is much less memory intensive (only 2 * NumVoxels + 1: this size is needed when ROI consists of only zones of size 1). This matrix is then loaded into the output GLSZM, which can be initialized directly in its correct size (instead of NumGrayLevels * NumVoxels, which is required otherwise). Before, I created this temporary array as a python numpy array object, and getting a pointer to the data. Here, I directly allocate memory for the array using calloc and release it with free, as the temporary data only exists in C.
  • I renamed the function to force full python to what I think is a more logical name, and added some code to prevent an import error in the featureclasses if the C extensions don't work (cMatrices and cShape are imported into the featureclass, even if not loaded, so if they are not loaded, they must still be defined, I achieve this by assigning None if the import fails.

Finally, I want to parameterize whether full-python mode should be forced, so it can be incorporated into a parameterfile, or passed in kwargs to featureextractor. I will push this later.

@JoostJM
Copy link
Collaborator

JoostJM commented Feb 16, 2017

May be we could keep part of this commit message: 6461197

I added an updated version of this commit message to the discussion thread in #202. Furthermore, I think we should incorporate this in some form in the release notes when we declare the next stable release that has this integrated.

@jcfr
Copy link
Collaborator Author

jcfr commented Feb 16, 2017

To prevent bugs, I put this functionality in a separate function

Well done. Look at it again, makes sense.

some other differences as well, I'll list them here

Thanks for the details explanation 👍

@jcfr
Copy link
Collaborator Author

jcfr commented Feb 16, 2017

Closing this PR. It is superseded by #202

@jcfr jcfr closed this Feb 16, 2017
@jcfr jcfr deleted the c-matrices-rebased-jcfr-20170215 branch February 16, 2017 15:46
@JoostJM JoostJM added this to Rejected PRs in PyRadiomics Performance Feb 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
PyRadiomics Performance
  
Rejected PRs
Development

Successfully merging this pull request may close these issues.

None yet

2 participants