
Question: Which usage is preferable? #53

Closed
sphh opened this issue Jan 21, 2021 · 9 comments
Labels
question Issue is a question

Comments

@sphh

sphh commented Jan 21, 2021

What would be the preferred way to set an optional interval check?

I want to make a callable class that checks whether its argument is in a given interval. The interval is an optional parameter, as in these two classes F and G:

import portion as P

class F:
    def __init__(self, interval=None):
        self.interval = interval

    def __call__(self, x):
        if self.interval is None:
            return x*x
        elif x in self.interval:
            return x*x
        else:
            raise ValueError(f'Argument x not in interval. Got {x}')

class G:
    def __init__(self, interval=P.open(-P.inf, P.inf)):
        self.interval = interval

    def __call__(self, x):
        if x in self.interval:
            return x*x
        else:
            raise ValueError(f'Argument x not in interval. Got {x}')

Which implementation (class F or G) is preferable?

Thanks for your help and insights!

@AlexandreDecan AlexandreDecan added the question Issue is a question label Jan 22, 2021
@AlexandreDecan
Owner

Hello,

I would be in favour of the G implementation, so as to avoid an extra 'if' at each call.

Note that I would not put P.open(...) directly as a default value in the signature of __init__ for G. I would do the same as for F, i.e. interval=None, and in the body of __init__ write if interval is None: interval = P.open(...). There is no problem with putting P.open(...) as a default value in the signature here, but that is usually considered bad practice when the default object is mutable (which is not the case for intervals, so no worry), e.g. a list or a dict.

@sphh
Author

sphh commented Jan 22, 2021

Thanks. The idea behind F was that if I mostly have unlimited intervals, F should be faster, because it needs just one comparison instead of a full interval check. But G should be preferable if intervals are used often (and G is more readable).

@AlexandreDecan
Owner

I quickly timed the two implementations with timeit, once with no interval provided and once with an interval:

In [8]: F1, G1, F2, G2 = F(), G(), F(P.closed(0, 2)), G(P.closed(0, 2))

In [9]: %timeit F1(1)
206 ns ± 4.23 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [10]: %timeit G1(1)
1.38 µs ± 14.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [11]: %timeit F2(1)
1.02 µs ± 3.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [12]: %timeit G2(1)
976 ns ± 2.31 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Checking for None is much faster than a membership test against an infinite interval (the difference between F1 and G1). Checking for None also adds a slight overhead when an interval is actually defined (F2 vs G2), but that difference is negligible.

So based on this, if performance is an issue, I would go for F: it is nearly 5 times faster when no interval is provided, and it only adds a small overhead when one is defined.

@AlexandreDecan
Owner

Btw, given the values (in µs and even in ns), and given the number of background processes running on my computer when I executed this code, this "benchmark" should be taken with care :-)

@sphh
Author

sphh commented Jan 22, 2021

Good idea to time it! I extended your tests to include checks outside the interval. I also replaced the return x*x and raise ValueError(…) statements with a simple pass. I get the following values:

>>> import timeit
>>> F1, G1, F2, G2 = F(), G(), F(P.closed(0, 2)), G(P.closed(0, 2))
>>> sorted(timeit.repeat(lambda: F1(1), repeat=5, number=1000000))
[0.14253522500803228, 0.1521446229889989, 0.16290795200620778, 0.1756877709995024, 0.18972873299208004]


>>> sorted(timeit.repeat(lambda: G1(1), repeat=5, number=1000000))
[0.9611717640073039, 1.0052876260015182, 1.0283825780061306, 1.0961824119876837, 1.1008262019895483]


>>> sorted(timeit.repeat(lambda: F2(-1), repeat=5, number=1000000))
[0.6657288259884808, 0.6957280539936619, 0.7324803389929002, 0.7841811819962459, 0.8237168110063067]


>>> sorted(timeit.repeat(lambda: G2(-1), repeat=5, number=1000000))
[0.6685592129942961, 0.6988589969987515, 0.7048056549974717, 0.7982992489996832, 0.8394842549896566]


>>> sorted(timeit.repeat(lambda: F2(1), repeat=5, number=1000000))
[0.7109784969943576, 0.749733508011559, 0.75496754499909, 0.7824073730007512, 0.8159977729956154]


>>> sorted(timeit.repeat(lambda: G2(1), repeat=5, number=1000000))
[0.7004794439999387, 0.7452119939989643, 0.8100131639948813, 0.8110726650047582, 0.836370117001934]


>>> sorted(timeit.repeat(lambda: F2(3), repeat=5, number=1000000))
[0.6367974419990787, 0.6983797549910378, 0.7055405920109479, 0.7218278600048507, 0.7507733389938949]


>>> sorted(timeit.repeat(lambda: G2(3), repeat=5, number=1000000))
[0.6672344270045869, 0.7339863839879399, 0.7599738859862555, 0.7782973190041957, 0.786124837002717]

Using the minimum execution times, I found that when actually checking an interval (F2 and G2), the difference in computational time is an inconsistent ±4%, but with no interval defined, checking for None is 5.8 to 6.5 times faster than checking against the infinite interval (as you had already found out).

So, to boil it down:

  • If performance is paramount, go for F (but it makes the code less readable).
  • Otherwise use G.

Thanks for your help!!

@AlexandreDecan
Owner

You're welcome :-)

Another option would be to define two implementations of __call__: one for the case where an interval is set, and one for the case where none is set. Then, depending on the value provided to __init__, you pick one implementation or the other. While this gives you the best of both worlds, it is likely to make the class less readable (and, depending on how it is implemented, to add an extra lookup at each call to find the chosen implementation on the fly).
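That dispatch idea can be sketched like this (a minimal sketch; the class name H and the _impl attribute are made up for illustration, and anything supporting the in operator works as the interval, so a plain range stands in for a portion interval to keep the sketch self-contained):

```python
class H:
    """Square x, optionally restricted to an interval-like container."""

    def __init__(self, interval=None):
        # Choose the implementation once at construction time,
        # instead of re-testing `interval is None` at every call.
        if interval is None:
            self._impl = self._call_unchecked
        else:
            self.interval = interval
            self._impl = self._call_checked

    def _call_unchecked(self, x):
        return x * x

    def _call_checked(self, x):
        if x in self.interval:
            return x * x
        raise ValueError(f'Argument x not in interval. Got {x}')

    def __call__(self, x):
        # This indirection is the extra per-call lookup mentioned above.
        return self._impl(x)
```

H()(3) returns 9 without any interval test, while H(P.closed(0, 2))(1) would perform the membership check.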

Speaking of an extra lookup per call, I am also surprised by these benchmarks: looking for x in (-inf, +inf) is more costly than looking for it in a "classical" interval. I confirmed this:

In [5]: x, y, z = P.open(-P.inf, P.inf), P.open(-float('inf'), float('inf')), P.open(-10000, 10000)

In [6]: %timeit 0 in x
823 ns ± 2.57 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [7]: %timeit 0 in y
586 ns ± 2.14 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [8]: %timeit 0 in z
567 ns ± 1.91 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

I guess that's because the P.inf values are "special objects" and integers do not know how to compare with them, so the comparison is delegated to the class of P.inf, hence an extra call :-/ I'll dig into this, since every ns is worth taking given that infinities are used everywhere in the code :-)
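That delegation can be seen with a toy class (a hypothetical sketch, not portion's actual code): when 0 < inf is evaluated, int.__lt__ returns NotImplemented because it does not know the other type, so Python falls back to the reflected Inf.__gt__, which is the extra call:

```python
class Inf:
    """Toy stand-in for a positive-infinity object."""

    def __gt__(self, other):
        # Reached via reflection: (0).__lt__(inf) returns NotImplemented,
        # so Python retries the comparison as inf.__gt__(0).
        return not isinstance(other, Inf)

inf = Inf()
print(0 < inf)          # True, computed by Inf.__gt__
print((0).__lt__(inf))  # NotImplemented
```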

@sphh
Author

sphh commented Jan 22, 2021

Switching between different __call__ functions is a good idea. Unfortunately I cannot use it in my code, because I am checking all arguments, whose number can change from call to call: __call__(self, *args).

@AlexandreDecan
Owner

Ok :-/

I looked at what happens with P.inf (why comparisons with it run slower than with any other value). Most of the time is actually spent in the underlying isinstance call (basically, the only way for a value to be, e.g., greater than or equal to P.inf is to be P.inf itself, hence the isinstance check). Given that the infinities are singletons, I replaced these calls with an is _PInf() identity check (_PInf is the internal class used to represent positive infinity). It works, but it is slightly slower because it needs several lookups to get the singleton instance. I'll see how I can improve this. If I can make it as fast as a "traditional" comparison, then the speed-up gained by using is None will be more limited (probably 3-4 times instead of 5-6 times).

@AlexandreDecan
Owner

In the end the gain is very limited, only around 10%, and the code is, obviously, much less readable.
