-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
qbegin(idx::nearest(pt, 100)) fast but qbegin(idx::nearest(pt, tree_size)) really slow #867
Comments
trying to reproduce with a simple test:
|
the test files: |
Thanks for the report! I'll look into this next week. |
@awulkiew can you reproduce the problem? |
@karme Sorry that it takes so long. Yes, I was able to reproduce it with msvc. The time depends on
There is also another issue, i.e. |
@karme I prepared a fix for this issue. The times decreased to:
so they're too short to be reliably compared for this dataset. Besides the two issues I mentioned above I found the third one. Details are in the PR. If you want to test the fix you can pull this branch: https://github.com/awulkiew/geometry/tree/fix/rtree_nearest |
looks very good so far - thanks for looking into this! |
in a bigger test (388521289 linestrings) the performance isn't good qbegin(idx::nearest(pt, 100)) 398usec |
You're testing with the fix right? I tried to reproduce it with some a synthetic test (uniform distribution with overlap):
but was unable to:
The internals of the iterator looks good too. Regardless of
Is there something specific about the data you're testing against? Btw, here is the branch where
to see whether or not there is any difference. |
i will need some time to answer the other questions / prepare a "small" test but a quick test of the last one:
shows: |
Right, something like this may happen if there is big number of nodes and values covering the point so the distance to all of them is 0. I think I have an idea what I could do about this. |
it is openstreetmap street data
I use:
to get the number of intersections. => typically there are only 1-2 bounding boxes intersecting the point |
a really bad example is:
gosh> (mercator⁻¹ '(858855 6502943 0)) |
Ah so these are hiking paths? So they may be close to each other to the point of bounding boxes being the same. And even if the distance to them is greater than 0 (overlapping point) it can be the same for all of them. Ok, I pushed a fix to the same branch. Could you check it out? |
yes, also hiking paths (but also streets / all OSM ways with highway tag)
=> definitely an improvement but still to slow |
Ah ok, by hiking paths I meant GPS tracks uploaded by the users. But you're talking about regular OSM ways/highways.
This is interesting. After the fix even the case with k=100 shouldn't gather all of the neighbors but should return right away if the closest neighbor was as far as the next closest node. I tested a case like this, with high number of overlapping boxes (100M boxes in rtree with around 10k overlapping each other):
The result without today's fix:
The result with today's fix:
So I think I'll wait for your data to reproduce it. Alternatively you could display some additional debug info like 10 first closest nodes and values in the same branch as before (and maybe in the one below, the one with
|
I can't reproduce the problem with the simple test - not even using a |
Interesting. It's possible that data access is simply that much slower. But I'd like to play with it. Maybe I'll be able to reproduce it somehow. Are you using Boost.Interprocess allocator and Btw, now that the fix is merged I'm experimenting in a different branch: |
I cannibalized the real program and now have a relatively small test case:
it is not as slow as the huge dataset but still ~2ms |
the test is at https://karme.de/delme/rtreedebug.zip |
Ok, got it. I have something to play with. And I see that you're indeed using Interprocess. |
just tested against this branch (didn't really verify the results) - looks really promising:
|
does it help if I test this branch (https://github.com/awulkiew/geometry/tree/feature/rtree_nearest2) or is it work in progress? |
@karme Yes, please. I had some problems with running your test case on a machine with windows so I'll wait for you with running it elsewhere. |
@karme I pushed another change. The internal nodes are now also prioritized based on level so the least possible number of internal nodes should be visited even when the distance to various nodes was equal (e.g. to 0) on the path to the first leaf and therefore the first value. |
@awulkiew tests look very good! |
@karme Ok then. I'll create a PR and close this issue after it's merged. Thanks for testing! |
@karme I had to resolve some conflicts. Could you please check if everything is still fine with performance? |
performance and results look good |
@karme thanks! |
If I do a
t->qbegin(idx::nearest(pt, 100))
it is really fast (<1ms) butif I do a
t->qbegin(idx::nearest(pt, tree_size))
with tree_size=37133 it is reallyslow (~500ms)
s.a. boost users mailing list thread:
"geometry: knn search of point to many linestrings?"
https://lists.boost.org/boost-users/2021/05/90908.php
especially:
https://lists.boost.org/boost-users/2021/06/90929.php
The text was updated successfully, but these errors were encountered: