Feature Request - tolerance (in datetime or any other float) #30

egemenimre · 2020-05-31T10:49:44Z

Hi,

Thank you for the excellent library. I am writing a library that harnesses astropy Time classes and I needed TimeInterval and TimeIntervalList classes. Your library is exactly what I need. However, I find that due to floating-point artefacts, the functionality is not always as expected.

My question is, would you consider augmenting the relevant methods with an optional, user-defined tolerance, passed as a parameter?

Details:
There is a problem I'm having with Time classes. Let's say you have a time at t0 and you want to test contains() this for an interval of (t_0, t_1] i.e., interval.contains(t_0). Due to round-off errors and suchlike, the answer may or may not be False.

I assume that a similar error is possible in the native datetime class, or any other problem where the underlying comparison is handled with a float.

I found that I have to do a tolerance check at the edges. As an example, please have a look at my is_in_interval method, which the equivalent of portion's contains():

def is_in_interval(self, time):
        # check upper and lower boundaries for tolerance
        if _are_times_almost_equal(self.start, time):
            # time at starting edge - is edge closed?
            if self._interval.left:
                return True
            else:
                return False

        if _are_times_almost_equal(self.end, time):
            # time at end edge - is edge closed?
            if self._interval.right:
                return True
            else:
                return False

        # Time not edges, do a regular check
        return self._interval.contains(time)

And the _are_times_almost_equal is a simple tolerance check (where EPS_TIME is a very small duration 10 nanosec for me):

def _are_times_almost_equal(t1, t2):
    return abs(t1 - t2) < _EPS_TIME

Edit: Code formatting mishap fixed

The text was updated successfully, but these errors were encountered:

AlexandreDecan · 2020-05-31T12:28:14Z

Hello,

Thank you for this feedback! I understand that floating-point artifacts could be an issue in these cases. However, dealing with these issues is slightly out-of-scope of portion since we do not make any assumption about the objects being compared. For some of them, a tolerance threshold is not required, while for some other, the tolerance is not even a single value. Moreover, supporting such a threshold means that objects require to support subtraction (or more generally, a notion of distance between two of these objects) which is a strong assumption unfortunately :-/

What I can suggest you however is to implement a tiny wrapper for your objects, in which you override the various comparison methods (e.g. __eq__ obviously, but also, __lt__, __gt__, etc. if needed, depending on which ones are already implemented in your objects) such that their computation can take into account the "tolerance threshold" before delegating the comparison to the wrapped object.

I hope this helps you! Meanwhile, I'll think about the possibility to support such wrap/unwrap operations on the fly in portion, since it is also something that could be useful for #29.

egemenimre · 2020-05-31T12:58:31Z

Hallo,

Thanks for the prompt reply. I understand that the beauty of portion is generality, and I can see how subtraction can be a limitation. Unfortunately, I do not have control of the Time class from astropy to implement a custom __eq__ (with tolerances) and I wouldn't want to use a subclass just for that. (Is this what you meant?)

So, I will have to guard against them in my implementation of TimeInterval, which uses portion under the hood. Thanks anyway! :)

AlexandreDecan · 2020-06-01T12:24:25Z

Hello,

Not necessarily a subclass, but a thin wrapper like this:

class Wrapper:
  def __init__(self, wrapped):
    self.wrapped = wrapped
  def __eq__(self, other): 
    # fancy stuff here to support comparison

It should also be possible to define __getattr__ to delegate all attributes/methods access to the wrapped object to make it more convenient to use them. Obviously, all of this requires to wrap/unwrap objects before using them with portion :-/ (that could be inconvenient though).

AlexandreDecan · 2020-06-08T14:00:47Z

Hello,

Did the workaround work? Can I close this issue for now? :-) It's on my todo-list to find a way to deal with this kind of situations, but I've no short plan to work on it :(

egemenimre · 2020-06-08T18:11:19Z

Sure, please do so. Thanks!

…

On Mon, 8 Jun 2020 at 16:01, Alexandre Decan ***@***.***> wrote: Hello, Did the workaround work? Can I close this issue for now? :-) It's on my todo-list to find a way to deal with this kind of situations, but I've no short plan to work on it :( — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#30 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AO7QEK23H3N54GT5XPYKDY3RVTVJ7ANCNFSM4NPDOEKA> .

eehusky · 2020-09-07T16:35:44Z

It aint pretty (I had a specific use case in mind and made the minimal changes i needed to make it work) but I forked your library and made some changes to support using math.isclose for floating point operations. I had to drop all support for intervals based on types that can not be coerced into a floating point and needed to make some tweaks to _Pinf to make it work seamlessly with float('inf')

Essentially, this

# Item is a value
for i in self._intervals:
    left = (item >= i.lower) if i.left == Bound.CLOSED else (item > i.lower)
    right = (item <= i.upper) if i.right == Bound.CLOSED else (item < i.upper)
    if left and right:
        return True
return False

became this.

# Item is a value
for i in self._intervals:
    left = op.ge(item, i.lower) if i.left is Bound.CLOSED else op.gt(item, i.lower)
    right = op.le(item, i.upper) if i.right is Bound.CLOSED else op.lt(item, i.upper)
    if left and right:
        return True
return False

where op.[ge,le,lt,gt,eq] can be replaced with functions like this, or replaced with the functions from import operator as op for default behavior.

def lt(a, b, rel_tol=__rel_tol__, abs_tol=__abs_tol__):
    a, b = float(a), float(b)  # Catches any type that cant be coerced, not strictly needed.
    return (a < b) and (not eq(a, b, rel_tol=rel_tol, abs_tol=abs_tol))

It should still pass your unit tests, albeit with some tests removed or modified to remove items that are based on strings.

Just food for thought, I like your library and thought I would kick back an idea for ya.

https://github.com/eehusky/portion

Cheers!

AlexandreDecan · 2020-09-07T18:54:19Z

Thanks for this! It's very interesting and I would really like to support this directly in portion without loss of generality. That's definitely something I'll try to address in a next major release.

BTW, was it easy to deal with portion codebase? Are there things that were not easy to deal with? I think at least I should try now to make any extension as easy as possible ;-)

eehusky · 2020-09-07T22:30:54Z

Yeah, once I found the right corner I needed to peel back it took me a couple hours of messing around. I was using sympy sets but wanted something a little lighter and this fit the bill perfectly once I found it...'set' is a very overloaded term when googling python stuff.

I haven't backported removing all the non applicable unit tests with strings, but I think the last commit I have should allow you to add custom operators pretty easily. By default two operator classes come built in (classic and floating point). Adding one for dates shouldnt be too difficult if you have a notion of how you want them to work.

https://github.com/eehusky/portion/blob/master/portion/fuzzy_operator.py

This was added to the top level of the interval.py module. After importing the library you can change the module level comparisons with the two functions. I set it up to default to classic <,<=,>=,==,!= so it should still work as you intended.

# From interval.py
from .fuzzy_operator import FuzzyOperator, ClassicOperator

op = ClassicOperator()

def set_operator(operator):
    global op
    op = operator

def set_tolerance(rel_tol, abs_tol):
    global op
    if not isinstance(op,FuzzyOperator):
        op = FuzzyOperator()

    op.rel_tol = rel_tol
    op.abs_tol = abs_tol

This has the obvious caveat of only module level granularity and not per instance, but crossing that bridge was going to be way more work than I wanted to put in :) I think you could do something like importing the module multiple times and having each import with a separate operator class. Maybe something like this?

import portion as P
P.set_tolerance(1e-11, 1e-13)
P.closed(1, 2) | P.closed(3, 4) | P.closed(2, 3)

import portion as Q
Q.closed(1, 2) | Q.closed(3, 4) | Q.closed(2, 3)

I wasn't thinking when i started this and now I wish I hadn't run autopep8 on the source code and messed with the formatting in interval.py

Some of the changes like switching to is for checking bound types was done with the intent of preventing me from goofing up.

<,<=,>=,>, ==,!= Only used to compare intervals to intervals. Which end up getting delegated to the operator classes.
is Checking Bound.Open or Bound.Closed
op.[eq,ne,le,lt,gt,ge] is only used for comparing items in intervals.

AlexandreDecan · 2020-09-08T06:31:53Z

Thanks for your answer! I'll have a look at the code of your fork and see what I can do to ease the implementation of custom comparisons in portion directly. It is very likely that I will ask you to review/test the changes before releasing this, if you agree :-)

AlexandreDecan · 2020-09-08T07:54:58Z

The fork is working correctly as long as the values do not define a new order (i.e. if lt(a, b) , then a < b has to hold). This prevents, for example, defining an order that is exactly the opposite of the existing one (e.g. lt(a,b) = a > b; le(a,b) = a >= b, gt(a,b) = a < b, and so on). The reason it won't work is because we rely on sort, min and maxat some places:
https://github.com/eehusky/portion/blob/master/portion/interval.py#L134

For min and max, it is easy to rewrite them using lt/gt. For sort, it requires to define a new key function that should be built from a cmp function (not something very hard to do, though).

I have to be honest and admit that integrating all this represents a lot of changes in the code, and above all a lot of risk of error in the event of future changes.

I still believe that using "manually" a wrap/unwrap mechanism on values provided to intervals still seems the best approach to me (the "safest" from the point of view of portion, anyway ^^) to manage these cases.

eehusky · 2020-09-13T17:27:07Z

To be fair, that same statement could be said now, I cant redefine the ordering of the values of the intervals. I guess I'm not following the logic behind why one would replace operators with ones that are defined backwards but okay.

But in the interest of being helpful, you could replace the calls to min/max/and sort with replacements in the new operators class (change sort to sorted). However....

The thing about min,max, and sort is that they aren't really impacted by the inexactness of floating point.
In the case of min,max, there can only be a single value returned from a collection of values that are not identical, so different tolerances have no impact. If two values are exactly equal only a single value will be returned.

Same goes for sort, the only thing that will happen if you sort floating points with a relative tolerance is that values that are ~= have the chance to be ordered differently. Most like in order of access. However, in this particular case it wont matter because those same values that are ~= are going to be close enough together to be merged into a single interval.

The point of these changes isnt to support reversing intervals, its to allow you to declare that two relatively close values can be considered the same (typically relatively close means a few ULPs). Or, in the case of the original issue, maybe that two dates are equal if they are within a few seconds?

Disclaimer, I'm coming at this from a mathematics point of view so defining < as > is madness.

AlexandreDecan · 2020-09-14T07:16:42Z

Hello,

I completely agree with you :-) I know that the proposed solution aims to solve the problem of "close comparisons", and it does it well ;)

With portion, I try to be as generic as possible, that's why I'm aiming at a solution a little more "universal" that would also allow to redefine the semantics of <, >, ==, <= and >= as well. I agree that it makes little sense, a priori, to "reverse the order" of values. It was a somewhat artificial example, but having the possibility to redefine the order allows, for example, to support objects that are not natively comparable, or to be able to sort values on the basis of another criterion (for example, book titles that would normally be sorted by lexicographical order, but that one wishes to sort by date of publication).

Another problem in the proposed solution is the use of a "module" variable, which shares its state globally. This means that it is not possible to define this variable twice (and thus to manipulate two different "interval types" at the same time). Even if you import the module twice, the variable will be shared (unless you do an absolute import and a relative import, or play with the Python import machine, which would be a bit risky). Having said that, it's a "lesser evil", and I think I'll go for a solution based on this idea, because it's simpler to implement without touching the code too much (the alternative would be to have Intervals' that you could parametrize at creation time, and create subtypes on the fly with type()' to specialize the intervals on certain types of values. I have a local branch with such an approach, but it poses other problems).

I'll do some testing as soon as I have some time, to support your use-case (as well as the "toy examples" above) directly in portion, probably by defining a module variable (and probably a function like cmp' that the user can redefine. The advantage of cmp' is that that's all it takes for ``sort'' to still work properly, and it also works in min' and max', even though the latter two were not fundamentally problematic to bypass ;-)

AlexandreDecan · 2020-09-14T11:52:45Z

Hello again,

I was working on this when I found a problematic example, and I need your lights to see how to better address it.
Let's assume we consider for the example that two floats are "close enough" if their absolute difference is, let's say, 0.1.

Let's create the following interval P.closed(0, 0.1). Should we assume that this interval is a singleton, or not?
If we assume it is not a singleton, then I believe its repr should be [0,0.1] which may seem strange given that the two bounds are considered equal (this is even more "obvious" if we consider [0, 0.0000000000001] for which I clearly expect [0] instead).

If we assume it is a singleton, then its repr should be either [0] or [1], but which one to choose? In the first case (the other is symmetrical) it means that [0,0.1] | [0.15,1] will resolve to [0] | [0.15,1] where one would expect it to resolve to [0,1] (since 0.1 and 0.15 are "close enough").

AlexandreDecan added enhancement New feature or request question Issue is a question labels May 31, 2020

AlexandreDecan mentioned this issue Jun 1, 2020

Add typing info #27

Open

AlexandreDecan closed this as completed Jun 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature Request - tolerance (in datetime or any other float) #30

Feature Request - tolerance (in datetime or any other float) #30

egemenimre commented May 31, 2020 •

edited

AlexandreDecan commented May 31, 2020

egemenimre commented May 31, 2020

AlexandreDecan commented Jun 1, 2020

AlexandreDecan commented Jun 8, 2020 •

edited

egemenimre commented Jun 8, 2020 via email

eehusky commented Sep 7, 2020

AlexandreDecan commented Sep 7, 2020

eehusky commented Sep 7, 2020

AlexandreDecan commented Sep 8, 2020

AlexandreDecan commented Sep 8, 2020

eehusky commented Sep 13, 2020

AlexandreDecan commented Sep 14, 2020

AlexandreDecan commented Sep 14, 2020

Feature Request - tolerance (in datetime or any other float) #30

Feature Request - tolerance (in datetime or any other float) #30

Comments

egemenimre commented May 31, 2020 • edited

AlexandreDecan commented May 31, 2020

egemenimre commented May 31, 2020

AlexandreDecan commented Jun 1, 2020

AlexandreDecan commented Jun 8, 2020 • edited

egemenimre commented Jun 8, 2020 via email

eehusky commented Sep 7, 2020

AlexandreDecan commented Sep 7, 2020

eehusky commented Sep 7, 2020

AlexandreDecan commented Sep 8, 2020

AlexandreDecan commented Sep 8, 2020

eehusky commented Sep 13, 2020

AlexandreDecan commented Sep 14, 2020

AlexandreDecan commented Sep 14, 2020

egemenimre commented May 31, 2020 •

edited

AlexandreDecan commented Jun 8, 2020 •

edited