Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BF: nan entries cause segfault #690

Merged
merged 4 commits into from Aug 1, 2015
Merged

Conversation

omarocegueda
Copy link
Contributor

This fixes the segmentation fault caused when attempting to interpolate an image at nan entries.

@@ -144,6 +144,8 @@ def set_affine(self, affine):
self.affine_inv = None
return
try:
if np.isnan(np.sum(affine)):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using sum is faster than, e.g., min for finding nan's, according to this:
http://stackoverflow.com/questions/6736590/fast-check-for-nan-in-numpy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely these two lines should go above the try block? I think the np.sum trick is to avoid large temporary arrays, for me a simple np.any(np.isnan(x)) is faster still, and I think it's easier to read.

@arokem
Copy link
Contributor

arokem commented Jul 29, 2015

Does this actually address all of the errors mentioned in #654?

In particular, one of the things reported there was an error in test_imaffine:test_mi_gradient (for example, see https://travis-ci.org/nipy/dipy/jobs/73233253#L2162), which is not a segfault. Is this addressed through these changes?

@omarocegueda
Copy link
Contributor Author

@arokem
Copy link
Contributor

arokem commented Jul 29, 2015

OK - just wanted to make sure that I have the full picture.
On Wed, Jul 29, 2015 at 4:38 PM, Omar Ocegueda notifications@github.com
wrote:

I don't know if this will solve the other bug, but the segmentation fault
occurs in several buildbots:

http://nipy.bic.berkeley.edu/builders/dipy-bdist32-33/builds/138/steps/shell_8/logs/stdio

http://nipy.bic.berkeley.edu/builders/dipy-py2.6-32/builds/557/steps/shell_6/logs/stdio

http://nipy.bic.berkeley.edu/builders/dipy-py2.7-osx-10.7/builds/310/steps/shell_6/logs/stdio

http://nipy.bic.berkeley.edu/builders/dipy-py2.7-osx-10.8/builds/378/steps/shell_6/logs/stdio

Yes - I expect this PR will solve these issues. Have you tried a a
try_branch with any of these?

The other bug appears in this buildbot:

http://nipy.bic.berkeley.edu/builders/dipy-py3.4/builds/192/steps/shell_6/logs/stdio
@matthew-brett https://github.com/matthew-brett, can I have access to
that machine too?
Do you guys think it will be better to add both fixes to this PR?

Do you have some sense of what is causing it? I don't think it matters much
if this one also gets resolved here: If you think this resolves the
segfaults, I am ready to merge this one, and you can take the other fix on
another PR, but if you prefer to keep going here, that's fine too. Your
call.


Reply to this email directly or view it on GitHub
#690 (comment).

@omarocegueda
Copy link
Contributor Author

I didn't run a try_branch but I tested the fix manually on the buildbot Matthew gave access to (It didn't brake after the fix).

I'm not sure about the other bug, it may be as simple as a precission issue (e.g. the result was something like 0.9989, very close to the assertion value but still failing). I need to reproduce the bug and find out why it fails there and not in other platforms.

@matthew-brett
Copy link
Contributor

This is just option 3?

@matthew-brett
Copy link
Contributor

Omar - I'll check with the owner of that buildbot machine and the OSX machine - will get back to you.

@matthew-brett
Copy link
Contributor

Now I think I see that this is option 2 and 3.

@omarocegueda
Copy link
Contributor Author

Thanks @matthew-brett!, yes this is option 2 and 3. I just reproduced the bug on the buildbot. The root cause is that iteration over dictionary keys is no longer deterministic in Python 3:
http://stackoverflow.com/questions/14956313/dictionary-ordering-non-deterministic-in-python3

This explains the "intermitent" behavior. The assertion was failing because the inner product between numeric and analytical gradients was about 0.994. I think it is safe to reduce the threshold to 0.99. For the non-deterministic behavior, I guess the way to go is to replace the dictionary with a list. What do you think?

@matthew-brett
Copy link
Contributor

Where is the iteration-over-dictionary code?

@omarocegueda
Copy link
Contributor Author

@matthew-brett
Copy link
Contributor

How about for ttype in sorted(factors) to make it deterministic?

@omarocegueda
Copy link
Contributor Author

sure! that's fine too. I'll do the fix.

@omarocegueda
Copy link
Contributor Author

Alright!, this now addresses both failures (checked directly on the failing buildbots)

@@ -172,7 +172,7 @@ def test_align_origins_3d():

def test_affreg_all_transforms():
# Test affine registration using all transforms with typical settings
for ttype in factors.keys():
for ttype in sorted(factors):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment to explain why the factor keys must be sorted (in order to preserve relationship of random numbers to dict key / values)? Ditto for other instances.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments added. Actually only one of the tests needed the sort, but I think it is ok to still sort the keys in the other three places, just in case we extend the tests in the future.

@matthew-brett
Copy link
Contributor

There seems to be a new merge commit here - 33451ca - introducing lots of changes not relevant to this PR?

@omarocegueda
Copy link
Contributor Author

Right, now I see what happened... fortunately there is cherry-pick!
Thanks!

@@ -143,6 +143,8 @@ def set_affine(self, affine):
if self.affine is None:
self.affine_inv = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to do small suggestions not relevant to this PR - but how about putting this line after self.affine = affine so that self.affine_inv is always defined (as None or an affine) even if the inverse raises an error. Otherwise it's set to whatever it was before, which could be confusing. Probably also worth noting that the method sets self.affine_inv in the docstring too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, don't worry, I'll do a PR for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthew-brett
Copy link
Contributor

Great - thanks for fixing - merging now.

matthew-brett added a commit that referenced this pull request Aug 1, 2015
MRG: fix bugs in affine registration

NaN entries in affines causing segfault on some platforms.

Relaxing similarity threshold for random number tests.
@matthew-brett matthew-brett merged commit 4943433 into dipy:master Aug 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants