Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit tests and some bug fixes #110

Merged
merged 24 commits into from Dec 10, 2016

Conversation

@antnieszka antnieszka added this to the 0.1.1 milestone Dec 5, 2016
@@ -113,7 +113,8 @@ def change_domain(self, domain):
raise ValueError('in change_domain():' 'the old domain does not include the new one')

y = np.interp(domain, self.x, self.y)
obj = self.__class__(np.stack((domain, y), axis=1), **self.__dict__['metadata'])
# np.dstack(...)[0] is used to extract nested array (previously used np.stack which behaved different)
obj = self.__class__(np.dstack((domain, y))[0], **self.__dict__['metadata'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment why dstack is used here ? Why is it needed, what is it doing ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like "stack is available from numpy 1.10 and to preserve compatibility we use older function which does the same thing as np.stack((arrays), axis=1)" ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, I had in mind a description for a fellow programmer who takes over the project and is trying to figure out what is going on in these few lines of code, sth like:

We need to ... domain and y because .... We pass it as argument to ... because it takes .... Dstast is ... and this is the reason why we use [0]

@antnieszka
Copy link
Contributor Author

I'm writing missing test for normalize and this came out:

>>> a
Profile([[ 1,  1],
       [ 2, 20],
       [ 3, 40]])
>>> a.normalize(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ant6/.virtualenvs/in_beprof3/lib/python3.4/site-packages/beprof/profile.py", line 150, in normalize
    self.y /= ave
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
>>> ave = np.average(a.y[np.fabs(a.x) <= 0])
>>> ave
nan
>>> a /= ave
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''

Our method looks like this:

    def normalize(self, dt):
        """
        Normalize to 1 over [-dt, +dt] area
        :param dt:
        :return:
        """
        logger.info('Running {0}.normalize(dt={1})'.format(self.__class__, dt))
        try:
            ave = np.average(self.y[np.fabs(self.x) <= dt])
        except RuntimeWarning as e:
            logger.error('in normalize(). self class is {0}, dt={1}'.format(self.__class__, dt))
            raise Exception("Scaling factor error:\n" + str(e))
        self.y /= ave  # <-- extra try-except?

So it's not idiot-proof enough.

@antnieszka
Copy link
Contributor Author

antnieszka commented Dec 5, 2016

It's something with division...

>>> a
Profile([[ 1,  1],
       [ 2, 20],
       [ 3, 40]])
>>> ave = np.average(a.y[np.fabs(a.x) <= 10])
>>> ave
20.333333333333332
>>> a.y /= ave
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
(in_beprof3){16-12-05 22:18} ~ ant6% ➜ python -V
Python 3.4.5
(in_beprof3){16-12-05 22:18} ~ ant6% ➜ pip freeze
beprof==0.post0.dev1801480862596
numpy==1.11.2

@antnieszka
Copy link
Contributor Author

antnieszka commented Dec 5, 2016

@grzanka that's sad:

>>> import numpy as np
>>> a = np.array([1,2,3])
>>> b = 2.0
>>> a /= b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''

More here: dipy/dipy#730 and numpy/numpy#6464
And this works:

>>> a = np.array([1.0, 2.0, 3.0])
>>> a /= 2.0
>>> a
array([ 0.5,  1. ,  1.5])

when a is filled with floats...

@antnieszka
Copy link
Contributor Author

In other words - Profile.y data has to be floats or it errors...

@codecov-io
Copy link

codecov-io commented Dec 5, 2016

Current coverage is 67.96% (diff: 98.21%)

Merging #110 into master will increase coverage by 25.93%

@@             master       #110   diff @@
==========================================
  Files             5          7     +2   
  Lines           502        768   +266   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            211        522   +311   
+ Misses          291        246    -45   
  Partials          0          0          

Powered by Codecov. Last update 298c6d6...2315c9e

@grzanka
Copy link
Collaborator

grzanka commented Dec 6, 2016

Typically we will work with floats. But you are right - we should make a separate issue for Y data with ints and methods which use division.
There should be better user-friendy error handling and we can also provide some optional casting to float in such method.

assert np.isnan(p.x_at_y(4.9999999))
assert np.isnan(p.x_at_y(20.0000001))
assert np.isnan(p.x_at_y(7.5, reverse=True))
assert np.isnan(self.p.x_at_y(-1.0))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why assert instead of self.assertTrue ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably inspired by numpy docs... I'll change it in the upcoming commit.

@antnieszka
Copy link
Contributor Author

@grzanka I think, we can now change required version of numpy in setup.py to 1.8.0?

@grzanka
Copy link
Collaborator

grzanka commented Dec 6, 2016

Yes, change it please

@@ -113,7 +113,13 @@ def change_domain(self, domain):
raise ValueError('in change_domain():' 'the old domain does not include the new one')

y = np.interp(domain, self.x, self.y)
obj = self.__class__(np.stack((domain, y), axis=1), **self.__dict__['metadata'])
# We need to join together domain and y because we are recreating 2 dimensional Curve object
# (we pass it as argument to self.__class__ to do so and it takes 2 or 3 dimensional arrays as argument.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a discussion on curve with 3-D data. Check if it is implemented correctly, test methods etc... - separate issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: #114

@@ -144,11 +146,14 @@ def normalize(self, dt):
except RuntimeWarning as e:
logger.error('in normalize(). self class is {0}, dt={1}'.format(self.__class__, dt))
raise Exception("Scaling factor error:\n" + str(e))
self.y /= ave
self.y = self.y / ave
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change was made ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing error mentioned above in comments:

>>> c = np.array([1,2,3])
>>> c
array([1, 2, 3])
>>> c /= 2.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
>>> c
array([1, 2, 3])
>>> c =  c / 2.
>>> c
array([ 0.5,  1. ,  1.5])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we have only integers - should I roll back this one?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can keep this change, but add info in docstring what happens if you have array of integers, explaining what if you:

  • normalize with integer factor
  • normalize with floating-point factor

But before proceeding further: consider these two snippets: self.y /= ave and self.y = self.y / ave - which one requires additional memory allocation ?

What if self.y is large array or floats (say 1e6) and we normalize by factor 0.5. Which one of abovementioned snippets will run faster ?

Reading https://docs.scipy.org/doc/numpy/reference/internals.html might help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I have to ideas for rescale and normalize divisions:

    def rescale(self, factor=1.0, allow_cast=True):
        if allow_cast:  # will work in 99.9% cases
            try:
                self.y /= factor
            # there is this case, where "TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced
            # to provided output parameter (typecode 'l') according to the casting rule ''same_kind''" i thrown
            except TypeError as e: 
                logger.info("Division in place is impossible, casting...\n%s" % e)
                self.y = self.y / factor
        elif not np.issubdtype(self.y.dtype.type, type(factor)):
            raise TypeError("Type mismatch - array dtype: %s, factor type: %s\nallow_cast flag set to True should help"
                            % (self.y.dtype.type, type(factor)))
        else:
            try:
                self.y /= factor
            except TypeError as e:
                raise TypeError("%s\nallow_cast flag set to True should help" % e)

and simpler(?) one:

def rescale(self, factor=1.0, allow_cast=True):
    try:
        self.y /= factor
    except TypeError as e: 
        logger.info("Division in place is impossible.\n%s" % e)
        if allow_cast:
            self.y = self.y / factor
        else:
            raise TypeError("%s\nallow_cast flag set to True should help" % e)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second one better, but please add also proper docstring with doctests (see i.e. change_domain and rebinned methods).

Read also this: http://www.ianbicking.org/blog/2007/09/re-raising-exceptions.html
https://blog.ionelmc.ro/2014/08/03/the-most-underrated-feature-in-python-3/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, the second came to my mind when I was writing this reply, so...

What about the div zero case? Should we somehow catch it, or just let it error or whatever?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Print nice error message and let numpy exception be thrown

Copy link
Contributor Author

@antnieszka antnieszka Dec 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the above is not yet ready...

>a
Curve([[ 0,  0],
       [ 5,  5],
       [10, 10]])
>a.rescale(3.7)
>a
Curve([[ 0,  0],
       [ 5,  1],
       [10,  2]])
>a.rescale(22)
>a
Curve([[ 0,  0],
       [ 5,  0],
       [10,  0]])

# todo: investigate why super().__str__() errors in python 2.7
# after change of implementation of __new__
ret = curve.Curve.__str__(self)
# ret = super(curve.Curve).__str__()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you added second commented line ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was the previous version left to discussion, but since we're using git - it's pointless, yes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove it, also "investigate TODO" can directly go as new issue, instead of dirty comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It already has issue: #99


# case - less or equal to 0
p1 = Profile([[1, 1], [2, 20], [3, 40]])
p1.normalize(-1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to normalize this way ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but we probably should inform user about it. Maybe raise ValueError()?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Such exception is OK, but add also some message in it, guiding user what type of input is expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably "Expected positive input" would be enough?

if allow_cast:
self.y = self.y / factor
else:
logger.error("allow_cast flag set to True should help")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does code execution continue or stops with some exception being rethrown ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It stops, we could go for re-raise at the end - after calculating with self.y = self.y / factor or ignore TypeErrors thrown by self.y /= factor or... just log it?

        try:
            self.y /= factor
        except TypeError as e:
            logger.error("Division in place is impossible")
            if allow_cast:
                self.y = self.y / factor
            else:
                logger.error("allow_cast flag set to True should help")
            raise e

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore error case (clean, but...):

>>> a
Curve([[1, 5],
       [2, 6],
       [3, 7],
       [4, 8]])
>>> a.rescale(2)
Division in place is impossible
>>> a
Curve([[1, 2],
       [2, 3],
       [3, 3],
       [4, 4]])
>>> a.rescale(2, allow_cast=False)
Division in place is impossible
allow_cast flag set to True should help
>>> a
Curve([[1, 2],
       [2, 3],
       [3, 3],
       [4, 4]])

Works like this for code below:

    def rescale(self, factor=1.0, allow_cast=True):
        try:
            self.y /= factor
        except TypeError as e:
            logger.error("Division in place is impossible")
            if allow_cast:
                self.y = self.y / factor
            else:
                logger.error("allow_cast flag set to True should help")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is - at least for me if something throws exception - it probably did not finish it's job (and possibly rolled back). So raising exception after successful division is... could be misinterpreted?

self.y = self.y / ave
else:
logger.error("Division in place impossible - allow_cast flag set to True should help")
raise e
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the correct way of re-throwing an exception ?
See i.e. http://blog.bstpierre.org/python-exception-handling-cleanup-and-reraise

Copy link
Contributor Author

@antnieszka antnieszka Dec 9, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. Another try-catch would be nice and reporting original exception - like reraiser4()?

Edit: some of this examples are no longer supported e.g. in Python3.4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For:

        try:
            self.y /= factor
        except TypeError as e:
            # (_, _, traceback) = sys.exc_info()
            logger.warning("Division in place is impossible: {0}".format(e))
            if allow_cast:
                self.y = self.y / factor
            else:
                logger.error("allow_cast flag set to True should help")
            raise

result:

>>> a.rescale(0, False)
Division in place is impossible: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
allow_cast flag set to True should help
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/ant6/PycharmProjects/beprof/beprof/curve.py", line 111, in rescale
    self.y /= factor
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''

result2

>>> a.rescale(0, True)
Division in place is impossible: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
/home/ant6/PycharmProjects/beprof/beprof/curve.py:116: RuntimeWarning: divide by zero encountered in true_divide
  self.y = self.y / factor
/home/ant6/PycharmProjects/beprof/beprof/curve.py:116: RuntimeWarning: invalid value encountered in true_divide
  self.y = self.y / factor
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/ant6/PycharmProjects/beprof/beprof/curve.py", line 111, in rescale
    self.y /= factor
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''

Both exceptions are reported

@antnieszka
Copy link
Contributor Author

@grzanka ready for re-review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants