Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cythonize DirectionGetter and whatnot #1342

Merged
merged 8 commits into from
Nov 21, 2017
Merged

Conversation

nilgoyette
Copy link
Contributor

Disclaimer: there are 2 failing tests.

I discovered after much testing that calling def pyhon code from cython is slow. Much slower than one could think! I tested on LocalTracking and PFT Tracking, though pft doesn't appear in this PR. in both cases, the speedup is around 2x, less for LocalTracking, more for PFT. One could do better by removing checks and cythonizing more but I concentrated on the important parts: DirectionGetter::get_direction, PmfGen::get_pmf and calling C code as much as possible.

What you might not like:

  • I removed some checks and I would remove more! The verifications should be done one time at the start of the algo, not in the heart of the cython code!
  • I removed many calls to numpy and coded the functions myself. Calling numpy has a cost and it's now worth it, IMO. Of course, I don't like duplicating code from numpy... :(
  • I often have a def name() function that simply calls a cdef name_c() function, which is a fast version without check. This can cause some problems, and it is indeed causing a problem :) 2 LocalTracking tests are failing because of this. My cython code is directly calling the C version and the test overrides the python version, so SimpleTissueClassifier and SimpleDirectionGetter are ignored. Of course, I could call the def or cpdef version but there's a cost to this! To discuss.

What I don't like:

  • cdf = self._adj_matrix[tuple(*direction)] This will always be slow and (!) it's using 3 floats as a dictionnary key :( I don't know the code enough to fix it but it would be much better/faster if it was a simple integer. Maybe a struct: int index, double[3] direction?
  • Using np.random() or random(), not because it's random but because it's the only remaining slow call! I was tempted to use the old C rand() but I've heard so much bad things about it that I didn't dare. Do you think it's worth it?

@nilgoyette
Copy link
Contributor Author

BTW, I think the right merge order should be 1) this PR 2) PFT (merge master, fix conflits) 3) parallel local/pft tracking

Copy link
Contributor

@gabknight gabknight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR. 2X is super!! See my comments, mostly PEP8 comments. This makes sense to recode those small numpy functions here, as they are called at every tracking step (sometimes more than once per step for PFT).

  • I don't grasp the problem with def name()versus cdef name_c(). Can we modify the tests here?

  • cdf = self._adj_matrix[tuple(*direction)]. Those 3 float could be changed to a single int. In fact, the direction is always one of the vertex of the sphere. To do that, we would need to keep track of the index of the vertex on the sphere (and the direction), instead of solely the direction. This is fine for all pmf-based model as they all used a discrete sphere. However, this is not the case for the direction of EuDX (PeaksAndMetricsDirectionGetter). This might need a bit of work to be done nicely. I suggest keeping this for another PR.

@@ -202,14 +209,15 @@ def _set_adjacency_matrix(self, sphere, cos_similarity):
each value is a boolean array indicating which directions are less than
max_angle degrees from the key"""
matrix = np.dot(sphere.vertices, sphere.vertices.T)
matrix = abs(matrix) >= cos_similarity
matrix = (abs(matrix) >= cos_similarity).astype('uint8')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why uint8 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because cython doesn't recognize bool. Numpy does so the line was "ok" but there are warning when we try to use it later in cython.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thx.

cpdef trilinear_interpolate4d(double[:, :, :, :] data, double[:] point,
np.ndarray out=*)
cpdef trilinear_interpolate4d(
double[:, :, :, :] data,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here the function parameters should be aligned with the (.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think PEP8 says anything on this subject. I can revert the [style] change but I personally find it harder to read because of the random numbers of vars on each line.

Idem for all other alignment comments.

cdef int _trilinear_interpolate_c_4d(double[:, :, :, :] data, double[:] point,
double[::1] result) nogil:
cdef int trilinear_interpolate4d_c(
double[:, :, :, :] data,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alignment here?

for i in range(3):
b[i] = a[i]


def local_tracker(
DirectionGetter dg, TissueClassifier tc,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alignment here?

np.ndarray[np.float_t, ndim=2, mode='c'] streamline,
double stepsize,
int fixedstep):
cdef int _local_tracker(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alignment here?

double point[3], dir[3], vs[3], voxdir[3]
double[::1] pview = point, dview = dir
void (*step)(double*, double*, double) nogil
void (*step)(double * , double*, double) nogil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original line was OK, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err, yes, sorry, I accidently added a space :) Will remove.

cdef:
double result
int err

err = _trilinear_interpolate_c_4d(self.metric_map[..., None], point,
self.interp_out_view)
err = trilinear_interpolate4d_c(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alignment here?

include_err = _trilinear_interpolate_c_4d(self.include_map[..., None],
point, self.interp_out_view)
include_err = trilinear_interpolate4d_c(
self.include_map[..., None],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alignment here?

@nilgoyette
Copy link
Contributor Author

Thank you for looking at this @gabknight

My explanation of def name() and def name_c() was a little fast. Let me try again :) Take TissueClassifier for example:

cdef class TissueClassifier:
    cpdef TissueClass check_point(self, double[::1] point):
        if point.shape[0] != 3:
            raise ValueError("Point has wrong shape")

        return self.check_point_c(&point[0])

    cdef TissueClass check_point_c(self, double* point):
         pass

The tests define their own TissueClassifier::check_point (the python version, not _c). My new cython code directly calls the _c version, thus ignoring the new test version.

  • I can't call the python version because ... that's the point of this PR! We should never call Python code when running Cython if we want to be fast.
  • I could put the new tests definition in the .pxd/.pyx but, meh, it's ugly. I'm not sure what else I can do though.
  • Well, yes, we could use cpdef instead but I read that it's quite costly. And I can't use pointers in cpdef function.

I think I need to put more emphasis on this. The def name()/def name_c() method is indeed fast but you can't override those method anymore in normal python code.

@gabknight
Copy link
Contributor

OK, thx @nilgoyette. I get a better picture now. I think we will always want to override those methods in cython anyway.

I notice that line 94 in /reconst/peak_direction_getter.pyx should read:
cdef int get_direction_c(self, double* point, double* direction):
instead of:
cpdef int get_direction(self, double[::1] point, double[::1] direction) except -1:
This caused PeaksAndMetricsDirectionGetter to produce straight line only. This was not covered by tests.

I could do a PR on your branch to modify the failing tests and add the missing test for PeaksAndMetricsDirectionGetter.

@nilgoyette
Copy link
Contributor Author

@gabknight, if you want and have the time to do it, sure! Thank you.

Copy link
Member

@skoudoro skoudoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR @nilgoyette!
I just made some cosmetic comment in your PR.

I removed some checks and I would remove more! ...

totally agree

I removed many calls to numpy and coded the functions myself.

I think for this case, we should put these functions in dipy.utils and document it like we did in dipy.utils.six. It can be quite confusing if we let it there. Moreover, if for any reason, someone wants to use this function somewhere else, it's quite ugly to call them from this package. What do you think? What is the good way for that @arokem ? any rules?

I often have a def name() function that simply calls a cdef name_c() function, which is a fast version without check. This can cause some problems....

I need to think more about it

Using np.random() or random(), .....

I think you should implement your own if you think that it will really improve your performance. Like you, I heard too many bad things about the old C implementation so +1 to avoid it.

@@ -137,7 +137,7 @@ def peak_directions(odf, sphere, relative_peak_threshold=.5,
elif n == 1:
return sphere.vertices[indices], values, indices

odf_min = odf.min()
odf_min = np.min(odf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious about the difference here, so I made the small test below and it seems that odf.min() is faster than np.min(odf) in general. (Numpy 1.13.1, Python 3.6). I do not know if it is true for other versions of Numpy or Python. Any idea?

import numpy as np
a = np.random.rand(3000,160000)
%timeit a.min()
172 ms ± 2.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit np.min(a)
177 ms ± 14.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, thank you for the test :) My goal, for once, wasn't to be faster. I changed it because this method is called from Cython and odf is not a np.array anymore, it's a memoryview. A memoryview doesn't have any of the numpy method (well, yes, some but now much).

tldr np.func(data) accepts np.array AND memoryview. data.func() doesn't.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok ! Good to know, Thanks !

trilinear_interpolate4d_c(self.shcoeff, point, self.coeff)
for i in range(len_pmf):
_sum = 0
for j in range(self.B.shape[1]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like you did above, you should avoid to call shape function on each iteration so I think it will be better for your performance to put self.B.shape[1] in a size_t variable.

raise ValueError("%s is not a known basis type." % basis_type)
self._B, m, n = basis(sh_order, sphere.theta, sphere.phi)
import numpy as np
cimport numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can be confusing. Can you use cnp as an alias?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can, but I'll wait for another request because there's no name collision and imo it's not confusing. The cimport version is only used for the variable type, so in cdef and argument definition. For example np.float_t clearly use the cimport, while np.dot doesn't.

As I said, I'll change it to cnp if it confuses the team!

@nilgoyette
Copy link
Contributor Author

I think for this case, we should put these functions in dipy.utils

Yes, sure! I put them just above my code and, of course, I shouldn't do that :) I think I even duplicated them 2 times in different files. Having a fast_numpy pyx library file is indeed a good idea.

Using np.random() or random()
I think you should implement your own if you think that it will really improve your performance.

Well, no, I don't actually want to code my own random! My question was more: do we have actual proofs that rand() sucks or any reason why we shouldn't use it? My research on this matter wasn't conclusive. I think it provides an uniform distribution but the interval is divided by RAND_MAX, which can be as "low" as 32767. Is this a problem? Any connoiseur here?

Fixed tracking tests and PeaksAndMetricsDirectionGetter
@codecov-io
Copy link

codecov-io commented Oct 27, 2017

Codecov Report

Merging #1342 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1342      +/-   ##
==========================================
- Coverage   87.01%   86.97%   -0.04%     
==========================================
  Files         228      227       -1     
  Lines       29086    28977     -109     
  Branches     3131     3119      -12     
==========================================
- Hits        25309    25204     -105     
+ Misses       3068     3067       -1     
+ Partials      709      706       -3
Impacted Files Coverage Δ
dipy/tracking/local/tests/test_local_tracking.py 97.32% <100%> (-0.13%) ⬇️
dipy/direction/peaks.py 79.32% <100%> (ø) ⬆️
dipy/tracking/local/__init__.py 100% <100%> (ø) ⬆️
dipy/core/graph.py 75% <0%> (+1.19%) ⬆️
...ipy/tracking/local/tests/test_tissue_classifier.py 95.74% <0%> (+2.12%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6f205f3...573e7e5. Read the comment docs.

@gabknight
Copy link
Contributor

@nilgoyette I added an issue for the dictionary keys #1372.

Copy link
Contributor

@gabknight gabknight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I will update PR #1340 once merged.

@gabknight
Copy link
Contributor

@skoudoro do you have further comments on this? If not, I will go ahead merge this PR and move forward with PR #1340 and #1184.

@skoudoro
Copy link
Member

Yes, sure! I put them just above my code and, of course, I shouldn't do that :) I think I even duplicated them 2 times in different files. Having a fast_numpy pyx library file is indeed a good idea.

I just wonder if we should do it now or in a new PR. What do you think @gabknight @nilgoyette ?

Otherwise, LGTM, you can go ahead @gabknight

@nilgoyette
Copy link
Contributor Author

@skoudoro the move is done. You can merge, if you think everything is done.

@skoudoro
Copy link
Member

Thanks @nilgoyette @gabknight !

ShreyasFadnavis pushed a commit to ShreyasFadnavis/dipy that referenced this pull request Sep 20, 2018
Cythonize DirectionGetter and whatnot
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants