Outliers and knn performance with full_rec_dim #20
That's interesting. I was worried that only benchmarking on random numbers would come back to bite me. I was going to suggest moving to https://github.com/KristofferC/NearestNeighbors.jl, where I actually removed that whole code path. I now see that I have an intermittent test failure here; I am pretty sure it is related to the same setting.
Ok, the test failures actually occur only when using that code path.
Ah right, I wasn't aware of NearestNeighbors.jl; I'm happy to use it instead. I assume KDTrees.jl will be deprecated at some stage soon? I found that a good way to debug what's happening is to measure the branching factor of the search algorithm. Here's output from the same test as above, with a lightly instrumented version of KDTrees:
Here I'm measuring the effective branching factor of the search.
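A minimal sketch of this kind of instrumentation (in Python, not the actual instrumented KDTrees code from this thread): thread a visit counter through a recursive kd-tree nearest-neighbour search, so search effort can be compared between query distributions. The thread's exact branching-factor formula isn't shown, so the raw visited-node count stands in as the statistic here.

```python
# Hypothetical sketch: a minimal median-split kd-tree whose NN search counts
# the nodes it visits. Names (Node, build, nn) are illustrative, not KDTrees API.
import random

class Node:
    def __init__(self, point, axis, left, right):
        self.point, self.axis, self.left, self.right = point, axis, left, right

def build(points, depth=0):
    """Median-split kd-tree over a list of equal-length tuples."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def nn(node, query, best=None, stats=None):
    """Return (squared_distance, point) of the nearest neighbour to query."""
    if node is None:
        return best
    if stats is not None:
        stats["visited"] += 1
    d2 = sum((a - b) ** 2 for a, b in zip(node.point, query))
    if best is None or d2 < best[0]:
        best = (d2, node.point)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nn(near, query, best, stats)
    if diff * diff < best[0]:  # search ball crosses the split plane: descend both sides
        best = nn(far, query, best, stats)
    return best

# Compare search effort for a query inside the planar cloud vs one offset
# from it, mirroring the aligned/misaligned experiment in this thread.
random.seed(0)
pts = [(random.random(), random.random(), 0.0) for _ in range(2000)]
tree = build(pts)
aligned, offset = {"visited": 0}, {"visited": 0}
nn(tree, (0.5, 0.5, 0.0), stats=aligned)   # in-distribution query
nn(tree, (0.6, 0.6, 0.1), stats=offset)    # query shifted off the plane
```

A query lying off the sampled surface keeps a larger search ball, which intersects more splitting planes and forces the recursion down both subtrees more often.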
Yes, my plan is to deprecate KDTrees.jl. There is some debug functionality which basically says that if you set an environment variable, extra counting is enabled. I currently do some basic counting of how many nodes are visited, but it could be expanded to give much richer statistics.
Finally found that bug... KristofferC/NearestNeighbors.jl@76a5c4b

I tried your code:

```julia
using NearestNeighbors

# Find nearest neighbour in ref_tree for each point in query_points
function matchall(query_points, ref_tree)
    match_inds = NearestNeighbors.knn(ref_tree, query_points, 1)
end

N = 100000

# Reference point cloud: random points in the plane
ref_points = zeros(3, N)
ref_points[1:2, :] = rand(2, N)
ref_tree = NearestNeighbors.KDTree(ref_points)

# Random point cloud aligned spatially with ref_points
aligned_points = zeros(3, N)
aligned_points[1:2, :] = rand(2, N)

# Point cloud misaligned by adding some translation vector
misaligned_points = aligned_points .+ [0.1, 0.1, 0.1]

matchall(aligned_points, ref_tree)  # warm up jit

println("Time for query with aligned clouds")
@time matchall(aligned_points, ref_tree)
println("Time for query with misaligned clouds")
@time matchall(misaligned_points, ref_tree);
```

and get:

```
Time for query with aligned clouds
  0.073362 seconds (300.01 k allocations: 19.837 MB, 5.76% gc time)
Time for query with misaligned clouds
  0.069745 seconds (300.01 k allocations: 19.837 MB, 7.35% gc time)
```

compared to `full_rec_dim = 0`:

```
Time for query with aligned clouds
  0.188462 seconds (698.98 k allocations: 32.791 MB, 19.04% gc time)
Time for query with misaligned clouds
  0.137693 seconds (698.98 k allocations: 32.791 MB, 1.89% gc time)
```

and `full_rec_dim = 6`:

```
Time for query with aligned clouds
  0.073093 seconds (698.98 k allocations: 32.791 MB, 3.36% gc time)
Time for query with misaligned clouds
  3.126432 seconds (698.98 k allocations: 32.791 MB, 0.04% gc time)
```
Nice, stats like that are exactly what I needed for chasing the KDTrees performance weirdness I was seeing. I'll move over to NearestNeighbors.jl very soon, thanks. Maybe the thing to come out of this issue is that performance tuning is subtle, and it would be worth having some test cases simulating various nasty things which occur in real data sets. So far I've seen the following:
Those performance numbers with NearestNeighbors.jl look good. What's the relationship with the other package of the same name, https://github.com/johnmyleswhite/NearestNeighbors.jl? It seems the implementations are very different under the hood.
First public release got tagged, but you might want to be on master anyway. I have now changed the way the debug flag is activated, from an environment variable to a global variable in the main package file; the environment variable approach didn't work well with package precompilation because it was conditionally eval'ing stuff. I put up some small documentation about it too.

It would be really good to have some proper benchmarks for a variety of (standardized?) data sets and compare against a well used, respected NN package. I think the difference between my package and the other one is that mine is a bit more feature-complete and has been given a bit more care in designing the data structures to get good performance and reduce memory usage. The other package:

```
julia> @time KDTree(data)
  3.142755 seconds (27.42 M allocations: 601.563 MB, 25.09% gc time)

julia> @time for i = 1:10^4 nearest(tree, rand(5), 5) end
  1.862179 seconds (238.59 k allocations: 8.371 MB)
```

Mine:

```
julia> @time KDTree(data);
  0.601531 seconds (100.02 k allocations: 59.510 MB, 9.61% gc time)

julia> @time knn(tree, rand(5, 10^4), 5);
  0.138314 seconds (30.02 k allocations: 2.976 MB)
```
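The "proper benchmarks" wished for above mostly come down to timing the build phase and the query phase separately per data set, as the REPL comparison does. A minimal harness sketch for that split (in Python with stdlib only, since this is purely illustrative; the packages under discussion are Julia):

```python
# Hypothetical harness: time index construction and per-query cost separately.
# bench() is an illustrative name, not part of any package in this thread.
import bisect
import random
import time

def bench(build_fn, query_fn, n_queries=1000):
    """Return (build_seconds, seconds_per_query) for one data set."""
    t0 = time.perf_counter()
    index = build_fn()                 # build the search structure once
    build_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in range(n_queries):         # amortize timer overhead over many queries
        query_fn(index)
    per_query_s = (time.perf_counter() - t0) / n_queries
    return build_s, per_query_s

# Usage with a sorted list + binary search standing in for a spatial index:
data = [random.random() for _ in range(10_000)]
build_s, per_query = bench(lambda: sorted(data),
                           lambda idx: bisect.bisect_left(idx, random.random()))
```

Running the same harness over several data sets (uniform, clustered, surface-sampled) would capture the aligned/misaligned pathology seen earlier in this thread, not just the random-numbers case.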
Yes, I saw quite a bit of care went into the layout of the data structures in KDTrees; it's hard to imagine getting much more memory efficient there. The numbers speak for themselves! A C++ implementation that I like is nanoflann.hpp; it might be worth looking at as a comparison point. I don't really know if it's among the most efficient knn libraries around, but I've used it and the implementation looks really solid to me. It does most of the obvious efficiency tricks I'd consider in C++ (type specialization based on the metric and optionally the vector length; a pooled tree-node allocator).
In KDTrees I realized I actually wasted some memory by saving the full hyperrectangle for each node instead of just the high and low value for each splitting dimension. I changed that in NN.jl, so I now store the full hyperrectangle only for the original point cloud, and for each tree node I just have a KDNode type (https://github.com/KristofferC/NearestNeighbors.jl/blob/master/src/kd_tree.jl#L1) which saves the high, low and split value together with the split dimension. I hoped it would help a bit with performance since it avoids pulling a bunch of unnecessary values into the cache line; in practice it didn't seem to make that much difference though.

My main problem right now with regard to performance is that I don't actually know what the bottleneck is and what I should focus on optimizing. Is it evaluating the distance function? Is it the cpu stalling while waiting for memory? Is there significant overhead in unpacking the values from the immutables in each recursive call, or does the compiler realize that those values are constant, etc.? Some time ago I tried some profiling by building Julia with Intel's compiler and then using VTune. It's quite powerful because you can see a bunch of profiling data taken straight from the cpu. I should probably try that again.

Thanks for the link to nanoflann, I hadn't seen it before. Running something like

```julia
cloud = rand(3, 10^6)
tree = KDTree(cloud)
(@elapsed knn(tree, rand(3, 10^6), 1)) / 10^6
# 1.939144678e-6
```

it seems I do about one search per 2 microseconds for a 10^6 point 3D cloud, which seems to be in the same ballpark as their figure at https://github.com/jlblancoc/nanoflann#32-benchmark-original-flann-vs-nanoflann

When Base threading support gets mature enough it would be cool to add some threading to the tree building and the searching. The tree builder shouldn't be too hard to thread, I think (just do a few iterations on one thread and then throw each sub tree to a different thread). If you have any other ideas or suggestions please let me know :)
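The per-node saving from the KDNode layout described above can be sketched with back-of-the-envelope arithmetic. The sizes here are my assumptions (8-byte floats, a 4-byte split dimension index, no struct padding), not measurements of the actual Julia types:

```python
# Hypothetical per-node memory arithmetic for the two layouts discussed above.
def full_hyperrect_bytes(dim):
    # old layout: a full hyperrectangle per node = lo and hi coordinate per dimension
    return 2 * dim * 8

def kdnode_bytes():
    # new layout: lo, hi and split value (3 floats) plus the split dimension (int)
    return 3 * 8 + 4

savings_3d = full_hyperrect_bytes(3) - kdnode_bytes()   # bytes saved per node in 3D
```

The hyperrectangle layout grows linearly with dimension while the KDNode layout stays constant, so the saving per node only increases for higher-dimensional data.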
Maybe you could get something useful from Linux perf events if you don't want to mess with icc and VTune? Last time I looked briefly, perf seemed like it might be very impressive, but also with quite a learning curve; maybe the tools have improved since then.

Maybe I should move some of this to a "performance benchmarking" issue in NearestNeighbors.jl and close this one?

By the way, I think you forgot to push your tag to NearestNeighbors.jl? I see it registered in METADATA.jl (thanks!) but the tag isn't public. I've moved my ICP code to NearestNeighbors and it's definitely better behaved for outliers now.
Thanks for telling me about the missing tag, I have hopefully fixed it now. I had some trouble getting VTune to work well again, but I didn't spend that much time on it. If I can't get it to work I will look at perf, though I remember VTune having a quite nice and intuitive interface. Like you say, let's move further discussion to NearestNeighbors.jl.
Closing in favor of KristofferC/NearestNeighbors.jl#7. Thanks for all the information!
I'm using KDTrees for implementing an iterative closest point algorithm for point cloud matching. `knn` with k=1 works very well for this, but I've found that there can be a large performance hit when the two point clouds don't sample the same part of the geometry. Part of the problem seems to be that the low-dimensional approximation turned on with `full_rec_dim` isn't a good tradeoff in this case. Here's a test case:

Example output:
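The ICP use described above needs one nearest-neighbour match per query point per iteration, which is why `knn` with k=1 dominates the runtime. A hedged sketch of that matching step (in Python, with brute-force search standing in for the kd-tree; `match_all` and `translation_update` are illustrative names, not the author's actual code):

```python
# Hypothetical ICP correspondence step: match every query point to its
# nearest reference point, then derive the simplest possible alignment
# update (a pure translation) from the matches.
def match_all(query_points, ref_points):
    """For each query point, return the index of its nearest reference point."""
    matches = []
    for q in query_points:
        d2 = [sum((a - b) ** 2 for a, b in zip(q, r)) for r in ref_points]
        matches.append(d2.index(min(d2)))
    return matches

def translation_update(query_points, ref_points, matches):
    """Mean offset from query points to their matched reference points:
    a translation-only stand-in for the full ICP rigid-transform estimate."""
    dim, n = len(query_points[0]), len(query_points)
    return [sum(ref_points[m][k] - q[k] for q, m in zip(query_points, matches)) / n
            for k in range(dim)]
```

When the clouds are misaligned, every query lands some distance from the reference surface, which is exactly the query distribution where the `full_rec_dim` approximation hurts.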