Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C matrices revised jcfr 20170216 #202

Merged
merged 7 commits into from
Feb 20, 2017
Merged

Conversation

jcfr
Copy link
Collaborator

@jcfr jcfr commented Feb 16, 2017

No description provided.

JoostJM and others added 6 commits February 16, 2017 04:14
Affects GLCM, GLRLM and GLSZM calculation. On initialization,
extension is loaded and enabled by default. If an error occurs,
a warning is logged and matrices will be calculated in
full-python mode.

Alternatively, full-python mode can be forced by a call
to `radiomics.enableCExtensions(False)`.

Added `requirements-setup.txt` to specify dependencies needed
during setup (requires numpy and cython)
Affects shape class. Loading, enabling and disabling is done
similarly to the `cmatrices` extension.
Instantiate temporary data array directly in C, without first
creating and then deleting a numpy array.
Testing compares matrices calculated by python algorithms to
those calculated by the C extension. Tests the calculation of
surface area in a similar manner. Testing fails if C extension
is not available or if any element in the matrix differs more
than 1e-3 from the corresponding element in the other matrix.
…ibution

Source distribution is generation using:

  python setup.py sdist

For reference:

  https://docs.python.org/2/distutils/sourcedist.html#commands

(cherry picked from commit c237fc0)
Python 3 has a revamped extension module initialization
system. (See PEP 3121.)"

See https://docs.python.org/3/howto/cporting.html#cporting-howto

(cherry picked from commit 40d92c0)
@JoostJM
Copy link
Collaborator

JoostJM commented Feb 16, 2017

This topic is an alternative to #200 and supersedes #158.

Add a new parameter ('enableCExtensions') to the settings that can be specified in the parameter file or via `kwargs` at initialization of `featureextractor`. Add this parameter also to the parameter file schema.
Apply this upon execute, as nothing happens if setting is not changed, but is checked regularly.

Finally, fix a small bug, where a incorrect warning is logged when `enableCExtensions` is called when extensions are already enabled. In this case, nothing should happen.
@JoostJM
Copy link
Collaborator

JoostJM commented Feb 16, 2017

Stats (Profiling 5 testcases using python 2.7):
GLCM 6913 ms -> 3 ms
GLSZM 12064 ms -> 58 ms
GLRLM 1850 ms -> 10 ms
Surface Area 3241 -> 1 ms

This PR adds two C Python extensions and associated tests:

  • _cmatrices: C implementation for matrix computation associated with GLCM, GLSZM. and GLRLM features.

  • _cshape: C implementation for Shape surface computation.

  • tests/test_cmatrices: testing for matrix equality: Tests whether the python generated matrix is equal to the matrix generated by the C implementations (allows for machine precision errors). Also compares the python and C generated surface computations similarly.

Details

  • Standardize function names for calculating matrices in python and with C extensions to _calculateMatrix and _calculateCMatrix, respectively.

  • Add docstring to C modules.

  • Use of C implementation optional. At initialization, the package tries to use C, but if loading has failed, or calculation is forced to python, python is used. Note that the import of _cmatrices is done after initialization of logger. This ensures error will be logged if import fails. This can be controlled through the enableCExtensions setting, which can be provided in a parameter file or as part of the kwargs dictionary in featureextractor.

  • GLSZM: Use "char" datatype for mask. (It is signed char in GLSZM).

  • C code is consistent with C89 convention. All variables (pointers for python objects) are initialized at top of each block.

  • GLSZM: Use calloc and free for the temporary array holding the calculated zones.

Optimizations

  • GLSZM:
  • Define temporary array for holding the calculated zones. During calculation, the matrix must be able to store all possible zones, ranging from zone size 1 to total number of voxels (Ns), for each gray level (Ng). In this case, the GLSZM would be initialized with size Ng * Ns, which is very memory intensive. Instead, use a temporary array of size (Ns * 2) + 1, which stores all calculated zones in pairs of 2 elements: the first element holds the gray level, the second the size of the calculated zone. The first element after the last zone is set to -1 to serve as a stop sign for the second function, which translates the temporary array into the final GLSZM, which can be directly initialized at optimum size.

  • Uses "while" loops. This allows to reduce the memory usage. We observed that with recursive functions it was 'unexpectedly' failing.

  • Optimized search that finds a new index to process in the region growing.

@jcfr
Copy link
Collaborator Author

jcfr commented Feb 16, 2017

And at some point, we should probably look into running the C code through tool like valgrind --memcheck.

@jcfr
Copy link
Collaborator Author

jcfr commented Feb 16, 2017

Otherwise, LGTM

@JoostJM
Copy link
Collaborator

JoostJM commented Feb 17, 2017

@jcfr Here are two stack overflow threads covering using valgrind for python C extensions checking:

I will try to look into this in the coming weeks. Would you propose to wait with merging until complete, or list it as an issue to be resolved a bit later?

@JoostJM JoostJM merged commit 9f3efd9 into master Feb 20, 2017
@jcfr jcfr deleted the c-matrices_revised-jcfr-20170216 branch October 11, 2017 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants