Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slowness at inverting datetime intervals #20

Closed
antoine-gallix opened this issue Mar 4, 2020 · 9 comments
Closed

Slowness at inverting datetime intervals #20

antoine-gallix opened this issue Mar 4, 2020 · 9 comments
Labels
enhancement New feature or request

Comments

@antoine-gallix
Copy link

Hello!

We are using python intervals for our current project, and it's fits very well to our needs. I noticed yesterday that the interval inversion operation is very slow at medium scales. I have compound intervals of datetimes of the scale of hundreds of atomic intervals, and on a reasonably recent computer, the inversion of such intervals takes seconds to complete, which I find strange because I wouldn't say that kind of computation would be that heavy. Here below is some data on the processing time I'm observing. Maybe that reveals some possible optimizations for this operator.

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ Nb Intervals ┃ Processing Time     ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ 92           │ 0.13                │
│ 92           │ 0.12                │
│ 92           │ 0.12                │
│ 801          │ 7.42                │
│ 1396         │ 22.70               │
│ 1185         │ 16.29               │
│ 639          │ 4.76                │
│ 1029         │ 12.98               │
│ 1662         │ 35.29               │
│ 2266         │ 67.45               │
│ 891          │ 10.37               │
│ 1052         │ 14.55               │
│ 640          │ 5.34                │
│ 1316         │ 22.73               │
│ 762          │ 7.56                │
│ 870          │ 9.77                │
│ 1187         │ 18.50               │
│ 1002         │ 13.06               │
│ 921          │ 10.63               │
│ 650          │ 5.22                │
│ 861          │ 9.07                │
│ 1585         │ 30.88               │
│ 926          │ 10.45               │
│ 1599         │ 31.22               │
│ 1226         │ 18.58               │
│ 663          │ 5.87                │
│ 218          │ 0.58                │
│ 430          │ 2.27                │
│ 1527         │ 31.98               │
│ 92           │ 0.10                │
│ 92           │ 0.10                │
│ 92           │ 0.10                │
│ 1581         │ 30.89               │
│ 1988         │ 47.65               │
└──────────────┴─────────────────────┘
@AlexandreDecan
Copy link
Owner

Hello,

Thank your for raising this issue!

The library has not been specifically designed to handle large amounts of intervals, but there is nothing that should prevent it from doing so ;-) So I'll try to investigate where is/are the bottleneck(s) :)

Could you please indicate what's the version of python-intervals you're using? Also, would it be possible to share the code used to create the "performance report" you posted, so I can run it locally and use it to detect the bottlenecks and to see what are the changes that have a positive impact on the execution time?

Thank you!

@AlexandreDecan
Copy link
Owner

Or, if you cannot share this code (regardless of why ;-)), give some examples of intervals you try to invert. It seems, according to your message, that each interval is composed of hundreds of atomic ones. Is it right?

In that case, it is very likely that the forecoming 2.0.0 release will already exhibit some speed-up when computing the complement of such intervals. In 1.x.x, complement of an union (i.e. an Interval composed of many AtomicInterval instances) is computed by returning the intersection of the complements of each atomic interval which is very inefficient as the number of underlying atomic intervals increase. In the forecoming release, I compute the complement by iterating on pairs of consecutive atomic intervals, computing the "gap" between these pairs, and unioning them. This highly reduce the number of intermediate atomic intervals to compute (n+1 for n atomics, where it was 2n before), the number of method calls (no need to call an operation on each atomic interval), and the costly intersection at the end of the process (which was probably the most costly part of the operation).

@AlexandreDecan AlexandreDecan added the enhancement New feature or request label Mar 4, 2020
@AlexandreDecan AlexandreDecan modified the milestone: 2.0.0 Mar 4, 2020
@AlexandreDecan
Copy link
Owner

I've just tested the following snippet with 1.10.0 and the forecoming 2.0.0:

import intervals as I 
i = I.Interval(*(I.closed(i, i+1) for i in range(0, 10000, 2)))
assert ~~i == i

The last line was timeit-ed both in 1.10.0 and 2.0.0. On my laptop, I got 93.2 ms ± 984 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) for 2.0.0. After 10 minutes, this line was still running with 1.10.0 :-)

So it seems the change I made for __invert__ in fc32c52 is an efficient one ;-)

@antoine-gallix
Copy link
Author

The version that we have is 1.10.0, but it seems that you already solved the problem with the latest commit, so we will use that version instead. As a side note, I'm impressed by the fast response and investigation. That's true public service spirit! Thank you for your work on that lib.

@AlexandreDecan
Copy link
Owner

You're welcome ;-)

Note that 2.0.0 is not yet released on PyPI. I plan to release it in the next few days, but I still have some things to do before (e.g. I would like to add stub files for type annotations, I would like to write a migration guide from 1.x.x to 2.0.0 and now I would like to check for performance/bottleneck issues ^^).

TL;DR: do not consider the master branch of 2.0.0 to be stable at the moment ;-)

Also, I'm currently considering changing the name of python-intervals (1) to avoid having "python" in it (not useful), and (2) mainly to avoid any confusion with the intervals library that is also distributed on PyPI.

@AlexandreDecan
Copy link
Owner

I confirm you shouldn't rely on 2.0.0 for now, I've just found a bug in __contains__. I'll try to write a test tomorrow and to fix this (and also to speed up intersection and containment if I've time ^^). For instance, I think (I.closed(0,1) | I.closed(2, 3)).contains(I.closed(0,0.5) | I.closed(2.5, 3)) returns False where it should return True. That's a regression w.r.t. 1.10.0.

@antoine-gallix
Copy link
Author

Ok thanks for the info. We'll work around the interval inversion for now until v2

@AlexandreDecan
Copy link
Owner

I've just released 2.0.0. Notice that python-intervals is now distributed as portion.
The list of changes is available on https://github.com/AlexandreDecan/portion/blob/master/CHANGELOG.md, and there are some backwards incompatible ones (the ones that are likely to break are i.is_empty() and i.is_atomic()). I would be happy to help if you have issues with this new release.

Performance-wise, I created a small informal report here : #21

@antoine-gallix
Copy link
Author

Ok, we will do the update and adapt the code. It's still fresh so that should be no problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants