BF: nan entries cause segfault #690
@@ -144,6 +144,8 @@ def set_affine(self, affine):
             self.affine_inv = None
             return
         try:
+            if np.isnan(np.sum(affine)):
Using sum is faster than, e.g., min for finding NaNs, according to this: http://stackoverflow.com/questions/6736590/fast-check-for-nan-in-numpy
Surely these two lines should go above the try block? I think the np.sum trick is to avoid large temporary arrays; for me a simple np.any(np.isnan(x)) is faster still, and I think it's easier to read.
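The two NaN checks discussed above can be compared side by side. This is only a sketch: the helper names `has_nan_sum` and `has_nan_any` are made up for illustration and are not part of dipy. Which variant is faster depends on array size and platform, so no timing is claimed here.

```python
import numpy as np


def has_nan_sum(x):
    # NaN propagates through the sum, so one scalar check is enough.
    # Caveat: an array holding both +inf and -inf also sums to NaN.
    return bool(np.isnan(np.sum(x)))


def has_nan_any(x):
    # Builds a temporary boolean array, but states the intent directly.
    return bool(np.any(np.isnan(x)))


# Example affine with a single NaN entry.
affine = np.eye(4)
affine[0, 3] = np.nan
print(has_nan_sum(affine), has_nan_any(affine))
```

Both helpers agree on arrays without infinities; they can differ when `+inf` and `-inf` appear together, which may matter if infinite entries are possible.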
Does this actually address all of the errors mentioned in #654? In particular, one of the things reported there was an error in test_imaffine:test_mi_gradient (for example, see https://travis-ci.org/nipy/dipy/jobs/73233253#L2162), which is not a segfault. Is this addressed through these changes?
I don't know if this will solve the other bug, but the segmentation fault occurs in several buildbots: The other bug appears in this buildbot:
OK - just wanted to make sure that I have the full picture.
I didn't run a I'm not sure about the other bug; it may be as simple as a precision issue (e.g. the result was something like 0.9989, very close to the assertion value but still failing). I need to reproduce the bug and find out why it fails there and not on other platforms.
This is just option 3?
Omar - I'll check with the owner of that buildbot machine and the OSX machine - will get back to you.
Now I think I see that this is option 2 and 3.
Thanks @matthew-brett! Yes, this is option 2 and 3. I just reproduced the bug on the buildbot. The root cause is that iteration over dictionary keys is no longer deterministic in Python 3: This explains the "intermittent" behavior. The assertion was failing because the inner product between numeric and analytical gradients was about 0.994. I think it is safe to reduce the threshold to 0.99. For the non-deterministic behavior, I guess the way to go is to replace the dictionary with a list. What do you think?
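For readers unfamiliar with this kind of check, here is a rough sketch of comparing a numeric gradient against an analytical one through a normalized inner product. It assumes both gradients are scaled to unit length before the inner product is taken; the objective `f(x) = sum(x**2)` and the helper names are hypothetical and unrelated to the actual test_mi_gradient code.

```python
import numpy as np


def numeric_gradient(f, x, h=1e-6):
    # Central finite differences, one coordinate at a time.
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad


def gradient_agreement(g_numeric, g_analytic):
    # Inner product of the unit-normalized gradients (cosine similarity).
    norms = np.linalg.norm(g_numeric) * np.linalg.norm(g_analytic)
    return float(np.dot(g_numeric, g_analytic) / norms)


# Hypothetical objective with a known gradient: f(x) = sum(x**2), grad = 2x.
x0 = np.array([1.0, -2.0, 0.5])
agreement = gradient_agreement(
    numeric_gradient(lambda x: np.sum(x ** 2), x0), 2 * x0)
assert agreement > 0.99  # same style of threshold as proposed above
```

A value close to 1 means the two gradients point in nearly the same direction; a threshold slightly below 1 leaves room for finite-difference and interpolation error.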
Where is the iteration-over-dictionary code?
Here is the dictionary:
How about
Sure! That's fine too. I'll do the fix.
Alright! This now addresses both failures (checked directly on the failing buildbots).
@@ -172,7 +172,7 @@ def test_align_origins_3d():
 
 
 def test_affreg_all_transforms():
     # Test affine registration using all transforms with typical settings
-    for ttype in factors.keys():
+    for ttype in sorted(factors):
Comment to explain why the factor keys must be sorted (in order to preserve relationship of random numbers to dict key / values)? Ditto for other instances.
Comments added. Actually only one of the tests needed the sort, but I think it is ok to still sort the keys in the other three places, just in case we extend the tests in the future.
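A small sketch of why sorting matters when random numbers are drawn once per key. The `factors` dictionary below is a hypothetical stand-in, not the one from the dipy tests. Since Python 3.7, dicts preserve insertion order, so the example reverses the keys explicitly to imitate a different iteration order.

```python
import random

# Hypothetical stand-in for the test's dictionary of transform settings.
factors = {"TRANSLATION": 1.0, "ROTATION": 0.5, "SCALING": 2.0}


def draw_per_key(keys, seed=0):
    # One random draw per key, in iteration order, from a seeded generator.
    rng = random.Random(seed)
    return {key: rng.random() for key in keys}


# Sorting fixes the order, so each key always receives the same draw.
a = draw_per_key(sorted(factors))
b = draw_per_key(sorted(reversed(list(factors))))
assert a == b

# Without sorting, a different iteration order pairs keys with other draws.
c = draw_per_key(reversed(list(factors)))
d = draw_per_key(list(factors))
print(c == d)
```

With the sort in place, the relationship between a seeded random stream and each key no longer depends on how the interpreter happens to order the keys.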
There seems to be a new merge commit here - 33451ca - introducing lots of changes not relevant to this PR?
Right, now I see what happened... fortunately there is
@@ -143,6 +143,8 @@ def set_affine(self, affine):
         if self.affine is None:
             self.affine_inv = None
Sorry to do small suggestions not relevant to this PR - but how about putting this line after self.affine = affine so that self.affine_inv is always defined (as None or an affine) even if the inverse raises an error. Otherwise it's set to whatever it was before, which could be confusing. Probably also worth noting that the method sets self.affine_inv in the docstring too.
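The suggestion above could look roughly like the following. This is a sketch on a hypothetical `AffineHolder` class, not the actual dipy code; raising `ValueError` for NaN entries is an assumption made for the example.

```python
import numpy as np


class AffineHolder:
    # Hypothetical minimal class; not the actual dipy implementation.

    def set_affine(self, affine):
        """Store `affine` and set `self.affine_inv` (None or the inverse)."""
        self.affine = affine
        # Reset first so a failed inversion cannot leave a stale inverse.
        self.affine_inv = None
        if affine is None:
            return
        # Assumed behavior: reject NaN entries before attempting inversion.
        if np.any(np.isnan(affine)):
            raise ValueError("affine contains NaN entries")
        self.affine_inv = np.linalg.inv(affine)


holder = AffineHolder()
holder.set_affine(2.0 * np.eye(4))
```

Because `affine_inv` is reset before any check can raise, a caller that catches the error never sees an inverse left over from a previous call.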
Actually, don't worry, I'll do a PR for that.
Great - thanks for fixing - merging now.
MRG: fix bugs in affine registration
NaN entries in affines causing segfault on some platforms. Relaxing similarity threshold for random number tests.
This fixes the segmentation fault caused when attempting to interpolate an image at nan entries.