question: performant iteration of multi-thousand-points intervals #50
Hello, I'll have a look at your code and see how we can do this in a not-so-time-consuming way ;-) In the meantime, can you clarify what's the value of …
I already see some "small improvements" (things that could speed up the computation but shouldn't make a huge difference):
(Yeah, I know, I said I wouldn't have time until tomorrow, but it was on my mind ;-)
I'll need to think about and try your suggestions tomorrow, thanks! I thought I could use …
Thanks for the answer. Could you elaborate a little bit on the need to maintain two separate (modified) outputs? What if, for example, we define the function provided to …
(I was away yesterday, now back to this mini-project.)
Using tuples as values is an interesting idea! (Looking at the code of …) OK, that took roughly 6 minutes.
I expected … I think it's probably better to copy & paste the code of … Anyway, I share your feeling that an alternative implementation is probably the way to go :( I'm afraid `IntervalDict` is not suited for such a large dataset :-/
I've reimplemented the whole thing with numpy arrays (it turned out to be much easier than I thought) and ended up with ~30 seconds of running time for everything (reading, filtering, merging same-value intervals, and writing out in the original format). I guess I didn't really need the features offered by … Thank you for all the support!
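The numpy reimplementation itself is not shown in the thread; one possible sketch of the "merging same-value intervals" step under a flat-array representation (all names and data here are illustrative, not the author's code):

```python
import numpy as np

# Intervals as parallel arrays, sorted by position, half-open [start, end).
starts = np.array([0, 4, 6, 9])
ends   = np.array([4, 6, 9, 12])
values = np.array([1, 1, 2, 2])

# A run break occurs where the value changes or the intervals don't touch.
breaks = np.flatnonzero((values[1:] != values[:-1]) | (starts[1:] != ends[:-1]))
run_starts = np.concatenate(([0], breaks + 1))   # first interval of each run
run_ends = np.concatenate((breaks, [len(values) - 1]))  # last interval of each run

# Collapse each run of contiguous equal-value intervals into one interval.
merged_starts = starts[run_starts]
merged_ends = ends[run_ends]
merged_values = values[run_starts]
```

This is vectorized end to end, which is consistent with the ~30-second total runtime reported above for a dataset of this size.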
You're welcome. I'm sorry I couldn't help you much with …
As a continuation of #49 :), I'm now trying to read back from two `IntervalDict`s, compare the values assigned to each interval, modify the values, and write everything back to files.

[As a follow-up to #49: I have anywhere between 1 and 18,700 intervals per value (with ~2,800 values), the average being ~216 intervals per value. (I did not collect the median, so no idea about the actual distribution 🙂)]
I am now trying to iterate over all points (sequentially or not doesn't really matter, as long as I check all the points) with this code (sorry for the verbosity, I like seeing it do stuff):
The speed is very variable, mostly in the 6-12 positions/sec range, but I've also seen peaks of 98 pos/sec.
So far, 13k positions have been processed in 15 minutes; if this speed sample is representative, the whole run will take about 26 hours.
@AlexandreDecan , you mentioned a smarter way of iterating, can you please share it? 🙂