New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unit tests and some bug fixes #110
Unit tests and some bug fixes #110
Conversation
antnieszka
commented
Dec 5, 2016
•
edited
edited
- Implements Profile.__new__() is somewhat bugged when running simple tests using Python 2.7 #99
- Improves Add unit tests #19
- Fixes Make beprof working with numpy 1.8 #33, Remove scipy from appveyor build #117, Test inclusion in package #111, Curve object creation #116, Provide some better error handling for non-float data #113, Travis 'docs' started to fail? #118, Investigate print(Profile object) error #115
@@ -113,7 +113,8 @@ def change_domain(self, domain): | |||
raise ValueError('in change_domain():' 'the old domain does not include the new one') | |||
|
|||
y = np.interp(domain, self.x, self.y) | |||
obj = self.__class__(np.stack((domain, y), axis=1), **self.__dict__['metadata']) | |||
# np.dstack(...)[0] is used to extract nested array (previously used np.stack which behaved different) | |||
obj = self.__class__(np.dstack((domain, y))[0], **self.__dict__['metadata']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment why dstack is used here ? Why is it needed, what is it doing ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like "stack is available from numpy 1.10 and to preserve compatibility we use older function which does the same thing as np.stack((arrays), axis=1)" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, I had in mind a description for a fellow programmer who takes over the project and is trying to figure out what is going on in these few lines of code, sth like:
We need to ... domain and y because .... We pass it as argument to ... because it takes .... Dstast is ... and this is the reason why we use [0]
I'm writing missing test for normalize and this came out: >>> a
Profile([[ 1, 1],
[ 2, 20],
[ 3, 40]])
>>> a.normalize(0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ant6/.virtualenvs/in_beprof3/lib/python3.4/site-packages/beprof/profile.py", line 150, in normalize
self.y /= ave
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
>>> ave = np.average(a.y[np.fabs(a.x) <= 0])
>>> ave
nan
>>> a /= ave
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind'' Our method looks like this: def normalize(self, dt):
"""
Normalize to 1 over [-dt, +dt] area
:param dt:
:return:
"""
logger.info('Running {0}.normalize(dt={1})'.format(self.__class__, dt))
try:
ave = np.average(self.y[np.fabs(self.x) <= dt])
except RuntimeWarning as e:
logger.error('in normalize(). self class is {0}, dt={1}'.format(self.__class__, dt))
raise Exception("Scaling factor error:\n" + str(e))
self.y /= ave # <-- extra try-except? So it's not idiot-proof enough. |
It's something with division... >>> a
Profile([[ 1, 1],
[ 2, 20],
[ 3, 40]])
>>> ave = np.average(a.y[np.fabs(a.x) <= 10])
>>> ave
20.333333333333332
>>> a.y /= ave
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
|
@grzanka that's sad:
More here: dipy/dipy#730 and numpy/numpy#6464
when |
In other words - Profile.y data has to be floats or it errors... |
Current coverage is 67.96% (diff: 98.21%)@@ master #110 diff @@
==========================================
Files 5 7 +2
Lines 502 768 +266
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 211 522 +311
+ Misses 291 246 -45
Partials 0 0
|
Typically we will work with floats. But you are right - we should make a separate issue for Y data with ints and methods which use division. |
assert np.isnan(p.x_at_y(4.9999999)) | ||
assert np.isnan(p.x_at_y(20.0000001)) | ||
assert np.isnan(p.x_at_y(7.5, reverse=True)) | ||
assert np.isnan(self.p.x_at_y(-1.0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why assert
instead of self.assertTrue
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably inspired by numpy docs... I'll change it in the upcoming commit.
@grzanka I think, we can now change required version of numpy in setup.py to 1.8.0? |
Yes, change it please |
@@ -113,7 +113,13 @@ def change_domain(self, domain): | |||
raise ValueError('in change_domain():' 'the old domain does not include the new one') | |||
|
|||
y = np.interp(domain, self.x, self.y) | |||
obj = self.__class__(np.stack((domain, y), axis=1), **self.__dict__['metadata']) | |||
# We need to join together domain and y because we are recreating 2 dimensional Curve object | |||
# (we pass it as argument to self.__class__ to do so and it takes 2 or 3 dimensional arrays as argument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have a discussion on curve with 3-D data. Check if it is implemented correctly, test methods etc... - separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: #114
@@ -144,11 +146,14 @@ def normalize(self, dt): | |||
except RuntimeWarning as e: | |||
logger.error('in normalize(). self class is {0}, dt={1}'.format(self.__class__, dt)) | |||
raise Exception("Scaling factor error:\n" + str(e)) | |||
self.y /= ave | |||
self.y = self.y / ave |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change was made ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing error mentioned above in comments:
>>> c = np.array([1,2,3])
>>> c
array([1, 2, 3])
>>> c /= 2.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: ufunc 'divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
>>> c
array([1, 2, 3])
>>> c = c / 2.
>>> c
array([ 0.5, 1. , 1.5])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we have only integers - should I roll back this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can keep this change, but add info in docstring what happens if you have array of integers, explaining what if you:
- normalize with integer factor
- normalize with floating-point factor
But before proceeding further: consider these two snippets: self.y /= ave
and self.y = self.y / ave
- which one requires additional memory allocation ?
What if self.y
is large array or floats (say 1e6) and we normalize by factor 0.5. Which one of abovementioned snippets will run faster ?
Reading https://docs.scipy.org/doc/numpy/reference/internals.html might help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I have to ideas for rescale
and normalize
divisions:
def rescale(self, factor=1.0, allow_cast=True):
if allow_cast: # will work in 99.9% cases
try:
self.y /= factor
# there is this case, where "TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced
# to provided output parameter (typecode 'l') according to the casting rule ''same_kind''" i thrown
except TypeError as e:
logger.info("Division in place is impossible, casting...\n%s" % e)
self.y = self.y / factor
elif not np.issubdtype(self.y.dtype.type, type(factor)):
raise TypeError("Type mismatch - array dtype: %s, factor type: %s\nallow_cast flag set to True should help"
% (self.y.dtype.type, type(factor)))
else:
try:
self.y /= factor
except TypeError as e:
raise TypeError("%s\nallow_cast flag set to True should help" % e)
and simpler(?) one:
def rescale(self, factor=1.0, allow_cast=True):
try:
self.y /= factor
except TypeError as e:
logger.info("Division in place is impossible.\n%s" % e)
if allow_cast:
self.y = self.y / factor
else:
raise TypeError("%s\nallow_cast flag set to True should help" % e)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second one better, but please add also proper docstring with doctests (see i.e. change_domain and rebinned methods).
Read also this: http://www.ianbicking.org/blog/2007/09/re-raising-exceptions.html
https://blog.ionelmc.ro/2014/08/03/the-most-underrated-feature-in-python-3/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, the second came to my mind when I was writing this reply, so...
What about the div zero
case? Should we somehow catch it, or just let it error or whatever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Print nice error message and let numpy exception be thrown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the above is not yet ready...
>a
Curve([[ 0, 0],
[ 5, 5],
[10, 10]])
>a.rescale(3.7)
>a
Curve([[ 0, 0],
[ 5, 1],
[10, 2]])
>a.rescale(22)
>a
Curve([[ 0, 0],
[ 5, 0],
[10, 0]])
# todo: investigate why super().__str__() errors in python 2.7 | ||
# after change of implementation of __new__ | ||
ret = curve.Curve.__str__(self) | ||
# ret = super(curve.Curve).__str__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you added second commented line ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was the previous version left to discussion, but since we're using git - it's pointless, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove it, also "investigate TODO" can directly go as new issue, instead of dirty comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It already has issue: #99
|
||
# case - less or equal to 0 | ||
p1 = Profile([[1, 1], [2, 20], [3, 40]]) | ||
p1.normalize(-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to normalize this way ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but we probably should inform user about it. Maybe raise ValueError()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Such exception is OK, but add also some message in it, guiding user what type of input is expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably "Expected positive input" would be enough?
…d minimal version of numpy to 1.8
if allow_cast: | ||
self.y = self.y / factor | ||
else: | ||
logger.error("allow_cast flag set to True should help") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does code execution continue or stops with some exception being rethrown ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It stops, we could go for re-raise at the end - after calculating with self.y = self.y / factor
or ignore TypeErrors thrown by self.y /= factor
or... just log it?
try:
self.y /= factor
except TypeError as e:
logger.error("Division in place is impossible")
if allow_cast:
self.y = self.y / factor
else:
logger.error("allow_cast flag set to True should help")
raise e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore error case (clean, but...):
>>> a
Curve([[1, 5],
[2, 6],
[3, 7],
[4, 8]])
>>> a.rescale(2)
Division in place is impossible
>>> a
Curve([[1, 2],
[2, 3],
[3, 3],
[4, 4]])
>>> a.rescale(2, allow_cast=False)
Division in place is impossible
allow_cast flag set to True should help
>>> a
Curve([[1, 2],
[2, 3],
[3, 3],
[4, 4]])
Works like this for code below:
def rescale(self, factor=1.0, allow_cast=True):
try:
self.y /= factor
except TypeError as e:
logger.error("Division in place is impossible")
if allow_cast:
self.y = self.y / factor
else:
logger.error("allow_cast flag set to True should help")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is - at least for me if something throws exception - it probably did not finish it's job (and possibly rolled back). So raising exception after successful division is... could be misinterpreted?
self.y = self.y / ave | ||
else: | ||
logger.error("Division in place impossible - allow_cast flag set to True should help") | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the correct way of re-throwing an exception ?
See i.e. http://blog.bstpierre.org/python-exception-handling-cleanup-and-reraise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. Another try-catch would be nice and reporting original exception - like reraiser4()
?
Edit: some of this examples are no longer supported e.g. in Python3.4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For:
try:
self.y /= factor
except TypeError as e:
# (_, _, traceback) = sys.exc_info()
logger.warning("Division in place is impossible: {0}".format(e))
if allow_cast:
self.y = self.y / factor
else:
logger.error("allow_cast flag set to True should help")
raise
result:
>>> a.rescale(0, False)
Division in place is impossible: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
allow_cast flag set to True should help
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/home/ant6/PycharmProjects/beprof/beprof/curve.py", line 111, in rescale
self.y /= factor
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
result2
>>> a.rescale(0, True)
Division in place is impossible: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
/home/ant6/PycharmProjects/beprof/beprof/curve.py:116: RuntimeWarning: divide by zero encountered in true_divide
self.y = self.y / factor
/home/ant6/PycharmProjects/beprof/beprof/curve.py:116: RuntimeWarning: invalid value encountered in true_divide
self.y = self.y / factor
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/home/ant6/PycharmProjects/beprof/beprof/curve.py", line 111, in rescale
self.y /= factor
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
Both exceptions are reported
@grzanka ready for re-review |