
Question: Which usage is preferable? #53

Closed
sphh opened this issue Jan 21, 2021 · 9 comments
Labels
question Issue is a question

Comments

@sphh

sphh commented Jan 21, 2021

What would be the preferred way to set an optional interval check?

I want to make a callable class that checks whether its argument is in a given interval. The interval is an optional parameter, as in these two classes F and G:

import portion as P

class F:
    def __init__(self, interval=None):
        self.interval = interval

    def __call__(self, x):
        if self.interval is None:
            return x*x
        elif x in self.interval:
            return x*x
        else:
            raise ValueError(f'Argument x not in interval. Got {x}')

class G:
    def __init__(self, interval=P.open(-P.inf, P.inf)):
        self.interval = interval

    def __call__(self, x):
        if x in self.interval:
            return x*x
        else:
            raise ValueError(f'Argument x not in interval. Got {x}')

Which implementation (class F or G) is preferable?

Thanks for your help and insights!

@AlexandreDecan AlexandreDecan added the question Issue is a question label Jan 22, 2021
@AlexandreDecan
Owner

Hello,

I would be in favour of the G implementation, so as to avoid an extra 'if' at each call.

Note that I would not put P.open(...) directly as a default value in the signature of __init__ for G. I would do the same as for F, i.e. interval=None, and in the body of __init__ write if interval is None: interval = P.open(...). There is no problem with putting P.open(...) as a default value in the signature here, but that is usually considered bad practice when the default object is mutable (which is not the case for intervals, so no worry), e.g. a list or a dict.

@sphh
Author

sphh commented Jan 22, 2021

Thanks. The idea behind F was that if I mostly have unlimited intervals, F should be faster, because it needs just one comparison instead of a full interval check. But G should be preferable if intervals are used often (and G is more readable).

@AlexandreDecan
Owner

I quickly timed the two implementations with timeit, once with no interval provided and once with an interval:

In [8]: F1, G1, F2, G2 = F(), G(), F(P.closed(0, 2)), G(P.closed(0, 2))

In [9]: %timeit F1(1)
206 ns ± 4.23 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [10]: %timeit G1(1)
1.38 µs ± 14.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [11]: %timeit F2(1)
1.02 µs ± 3.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [12]: %timeit G2(1)
976 ns ± 2.31 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Checking for None is much faster than a membership test against an infinite interval (the difference between F1 and G1). Checking for None also adds a slight overhead when an interval is actually defined (F2 vs G2), but that difference is negligible.

So based on this, if performance is an issue, I would go for F: it is nearly 5 times faster when no interval is provided, and it only adds a small overhead when one is defined.

@AlexandreDecan
Owner

Btw, given the values (in µs and even in ns), and given the number of background processes running on my computer when I executed this code, this "benchmark" should be taken with care :-)

@sphh
Author

sphh commented Jan 22, 2021

Good idea to time it! I extended your tests to include checks outside the interval. I also replaced the return x*x and raise ValueError(…) statements with a simple pass. I get the following values:

>>> import timeit
>>> F1, G1, F2, G2 = F(), G(), F(P.closed(0, 2)), G(P.closed(0, 2))
>>> sorted(timeit.repeat(lambda: F1(1), repeat=5, number=1000000))
[0.14253522500803228, 0.1521446229889989, 0.16290795200620778, 0.1756877709995024, 0.18972873299208004]


>>> sorted(timeit.repeat(lambda: G1(1), repeat=5, number=1000000))
[0.9611717640073039, 1.0052876260015182, 1.0283825780061306, 1.0961824119876837, 1.1008262019895483]


>>> sorted(timeit.repeat(lambda: F2(-1), repeat=5, number=1000000))
[0.6657288259884808, 0.6957280539936619, 0.7324803389929002, 0.7841811819962459, 0.8237168110063067]


>>> sorted(timeit.repeat(lambda: G2(-1), repeat=5, number=1000000))
[0.6685592129942961, 0.6988589969987515, 0.7048056549974717, 0.7982992489996832, 0.8394842549896566]


>>> sorted(timeit.repeat(lambda: F2(1), repeat=5, number=1000000))
[0.7109784969943576, 0.749733508011559, 0.75496754499909, 0.7824073730007512, 0.8159977729956154]


>>> sorted(timeit.repeat(lambda: G2(1), repeat=5, number=1000000))
[0.7004794439999387, 0.7452119939989643, 0.8100131639948813, 0.8110726650047582, 0.836370117001934]


>>> sorted(timeit.repeat(lambda: F2(3), repeat=5, number=1000000))
[0.6367974419990787, 0.6983797549910378, 0.7055405920109479, 0.7218278600048507, 0.7507733389938949]


>>> sorted(timeit.repeat(lambda: G2(3), repeat=5, number=1000000))
[0.6672344270045869, 0.7339863839879399, 0.7599738859862555, 0.7782973190041957, 0.786124837002717]

Using the minimum execution times, I found that when actually checking an interval (F2 and G2), the difference in computational time is an inconsistent ±4%, but with no interval defined, checking for None is 5.8 to 6.5 times faster than checking against the infinite interval (as you had already found out).

So, to boil it down:

  • If performance is paramount, go for F (but it makes the code less readable).
  • Otherwise use G.

Thanks for your help!!

@AlexandreDecan
Owner

You're welcome :-)

Another option would be to define two implementations of __call__: one for the case where an interval is set, and one for the case where none is set. Then, depending on the value provided to __init__, you pick one implementation or the other. While this gives you the best of both worlds, it is likely to make the class less readable (and, depending on how it is implemented, to add an extra lookup at each call to find the chosen implementation on the fly).
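That dispatch idea can be sketched like this (a minimal sketch; the class name H and the _impl attribute are made up for illustration, and anything supporting the in operator works as the interval, so a plain range stands in for a portion interval to keep the sketch self-contained):

```python
class H:
    """Square x, optionally restricted to an interval-like container."""

    def __init__(self, interval=None):
        # Choose the implementation once at construction time,
        # instead of re-testing `interval is None` at every call.
        if interval is None:
            self._impl = self._call_unchecked
        else:
            self.interval = interval
            self._impl = self._call_checked

    def _call_unchecked(self, x):
        return x * x

    def _call_checked(self, x):
        if x in self.interval:
            return x * x
        raise ValueError(f'Argument x not in interval. Got {x}')

    def __call__(self, x):
        # This indirection is the extra per-call lookup mentioned above.
        return self._impl(x)
```

H()(3) returns 9 without any interval test, while H(P.closed(0, 2))(1) would perform the membership check.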

Speaking of an extra lookup per call, I am also surprised by these benchmarks: looking for x in (-inf, +inf) is more costly than looking for it in a "classical" interval. I confirmed this:

In [5]: x, y, z = P.open(-P.inf, P.inf), P.open(-float('inf'), float('inf')), P.open(-10000, 10000)

In [6]: %timeit 0 in x
823 ns ± 2.57 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [7]: %timeit 0 in y
586 ns ± 2.14 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [8]: %timeit 0 in z
567 ns ± 1.91 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

I guess that's because the P.inf values are "special objects" and integers do not know how to compare with them, so the comparison is delegated to the class of P.inf, hence an extra call :-/ I'll dig into this, since every ns is worth taking given that infinities are used everywhere in the code :-)
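That delegation can be seen with a toy class (a hypothetical sketch, not portion's actual code): when 0 < inf is evaluated, int.__lt__ returns NotImplemented because it does not know the other type, so Python falls back to the reflected Inf.__gt__, which is the extra call:

```python
class Inf:
    """Toy stand-in for a positive-infinity object."""

    def __gt__(self, other):
        # Reached via reflection: (0).__lt__(inf) returns NotImplemented,
        # so Python retries the comparison as inf.__gt__(0).
        return not isinstance(other, Inf)

inf = Inf()
print(0 < inf)          # True, computed by Inf.__gt__
print((0).__lt__(inf))  # NotImplemented
```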

@sphh
Author

sphh commented Jan 22, 2021

Switching between different __call__ functions is a good idea. Unfortunately I cannot use it in my code, because I am checking all arguments, whose number can change from call to call: __call__(self, *args).

@AlexandreDecan
Owner

Ok :-/

I looked at what happens with P.inf (why comparisons with it run slower than with any other value). Most of the time is actually spent in the underlying isinstance call (basically, the only way for a value to be, e.g., greater than or equal to P.inf is to be P.inf itself, hence the isinstance check). Given that the infinities are singletons, I replaced these calls with an is _PInf() identity check (_PInf is the internal class used to represent positive infinity). It works, but it is slightly slower because it needs several lookups to get the singleton instance. I'll see how I can improve this. If I can make it as fast as a "traditional" comparison, then the speed-up gained by using is None will be more limited (probably 3-4 times instead of 5-6 times).

@AlexandreDecan
Owner

In the end the gain is very limited, only around 10%, and the code is, obviously, much less readable.
