Fix breakage from the new `ReinterpretArray` #60

pablosanjose · 2017-12-06T14:38:34Z

Fix for issue #59 (see details there)

codecov-io · 2017-12-06T14:49:40Z

Codecov Report

Merging #60 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master      #60   +/-   ##
=======================================
  Coverage   93.89%   93.89%           
=======================================
  Files          14       14           
  Lines         508      508           
=======================================
  Hits          477      477           
  Misses         31       31

Impacted Files	Coverage Δ
src/kd_tree.jl	`100% <ø> (ø)`	⬆️
src/tree_data.jl	`100% <ø> (ø)`	⬆️
src/inrange.jl	`95.83% <ø> (ø)`	⬆️
src/hyperspheres.jl	`100% <ø> (ø)`	⬆️
src/ball_tree.jl	`100% <ø> (ø)`	⬆️
src/utilities.jl	`100% <ø> (ø)`	⬆️
src/hyperrectangles.jl	`69.23% <ø> (ø)`	⬆️
src/knn.jl	`96.42% <ø> (ø)`	⬆️
src/tree_ops.jl	`89.83% <ø> (ø)`	⬆️
src/datafreetree.jl	`82.14% <ø> (ø)`	⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 012a3da...12ec911. Read the comment docs.

KristofferC

Thanks for the PR. I have thought about the Reinterpret stuff for a while but haven't fixed it yet. I will need to do some benchmarks here to see if it is perhaps worth just doing a copy to a new array instead of reinterpreting.

KristofferC · 2017-12-06T15:46:21Z

src/ball_tree.jl

@@ -4,7 +4,7 @@
 # The tree uses the triangle inequality to prune the search space
 # when finding the neighbors to a point,
 struct BallTree{V <: AbstractVector,N,T,M <: Metric} <: NNTree{V,M}
-    data::Vector{V}
+    data::AbstractVector{V}


This will be bad for performance because this is an abstract field. See (https://docs.julialang.org/en/stable/manual/performance-tips/#Avoid-containers-with-abstract-type-parameters-1),

Perhaps we need to just do like

BallTree{V <: AbstractVector, ...} data::V . . end

Indeed, this was a mistake on my part. I intended to leave data::Vector{V} there, like in BruteTree. In the latter it was necessary to do a convert in the BruteTree constructor, but not with BallTree I think.

pablosanjose · 2017-12-06T16:30:31Z

Ok! But I would have thought that copying to a new array would always be slower, no? Or perhaps there is a substantial overhead with the lazy ReinterpretArray wrapper?

KristofferC · 2017-12-06T16:31:54Z

Yes, indeed, the question is if there is any overhead with the lazy wrapper. Only benchmarking can tell :)

pablosanjose · 2017-12-08T17:47:07Z

I just did some quick tests and there is indeed a rather alarming overhead with this PR.

julia> using NearestNeighbors, BenchmarkTools
julia> srand(1234); data = rand(3, 10^4);

julia v0.6.1 + NearestNeighbors.jl (master)

julia> @btime KDTree($data);
  2.225 ms (50 allocations: 424.77 KiB)

julia v0.7 master + NearestNeighbors.jl (this PR)

julia> @btime KDTree($data);
  16.353 ms (574 allocations: 501.80 KiB)

I also created another branch ("no-reinterpret") on my fork in which all reinterpret calls were swapped to the following reinterpret_or_copy function

if VERSION < v"0.7.0-DEV.2008"
    @inline reinterpret_or_copy(::Type{T}, data, ::Val{dim}) where {T, dim} =
        reinterpret(SVector{dim,T}, data, (length(data) ÷ dim,))
else
    @inline reinterpret_or_copy(::Type{T}, data, ::Val{dim}) where {T, dim} =
        [SVector{dim,T}(ntuple(i -> data[n+i], Val(dim))) for n in 0:dim:(length(data)-1)]
end

Using this I can once more run on julia master, with similar (although a little slower) timings

julia v0.7 master + NearestNeighbors.jl ("no-reinterpret" branch)

julia> @btime KDTree($data);
  2.820 ms (475 allocations: 724.03 KiB)

So you were right! Which makes me wonder, if I this PR follows the expected usage of the 'ReinterpretArray' wrapper, and the overhead is real, how can we get the old reinterpret behaviour back (which is clearly better in this case)? I suspect, by the large number of allocations with this PR, that there is however something fixable here.

Also, should I open a PR with the "no-reinterpret" branch?

KristofferC · 2017-12-08T18:05:53Z

We need a function barrier there. So the dynamic dispatch on dim only happens once.

pablosanjose · 2017-12-08T18:14:57Z

By "there", you mean in the reinterpret_or_copy function, I assume. Decorating it with a @noinline indeed makes the timing a little better.

julia v0.7 master + NearestNeighbors.jl ("no-reinterpret" branch)

julia> @btime KDTree($data);
  2.660 ms (486 allocations: 724.20 KiB)

However, I'd like to understand why the ReinterpretArray is so slow... Copying feels like surrendering :-P

KristofferC · 2017-12-08T19:12:19Z

I think we just gotta resort to copying...

pablosanjose · 2017-12-08T20:04:38Z

That would be too bad! I have posted a question in discourse, to try to understand if the performance problem can be overcome somehow...

pablosanjose · 2017-12-09T17:08:44Z

I'm closing this in favor of #61. Perhaps in the future, when/if ReinterpretArray is once more as performant as the old reinterpret this simpler (and potentially faster) approach could be revisited, reverting #61.

In such case, it might be good to keep the @compat Array{T}(uninitialized, ...) commit from #61. (That commit should perhaps have been a a separate PR... noobness!)

KristofferC · 2018-06-14T12:26:33Z

Perhaps we can resurrect this. I think new optimizations for reinterpret got merged into master.

pablosanjose added 3 commits December 6, 2017 14:40

data::Vector to data::AbstractVector in signatures

6e992dc

convert to Vector in BruteTree builder

c9110ad

points::Vector to points::AbstractVector

a246eaf

KristofferC reviewed Dec 6, 2017

View reviewed changes

corrected abstract type in BallTree

979f34a

reinterpret -> reinterpret |> reshape

12ec911

pablosanjose mentioned this pull request Dec 9, 2017

Update to 0.7 #61

Merged

pablosanjose closed this Dec 9, 2017

pablosanjose deleted the fix-reinterpret branch January 18, 2018 13:30

KristofferC mentioned this pull request Jun 14, 2018

Add support for Julia 0.7 #66

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix breakage from the new `ReinterpretArray` #60

Fix breakage from the new `ReinterpretArray` #60

pablosanjose commented Dec 6, 2017

codecov-io commented Dec 6, 2017 •

edited

Loading

KristofferC left a comment

KristofferC Dec 6, 2017

pablosanjose Dec 6, 2017

pablosanjose commented Dec 6, 2017

KristofferC commented Dec 6, 2017

pablosanjose commented Dec 8, 2017 •

edited

Loading

KristofferC commented Dec 8, 2017

pablosanjose commented Dec 8, 2017

KristofferC commented Dec 8, 2017

pablosanjose commented Dec 8, 2017

pablosanjose commented Dec 9, 2017

KristofferC commented Jun 14, 2018

Fix breakage from the new ReinterpretArray #60

Fix breakage from the new ReinterpretArray #60

Conversation

pablosanjose commented Dec 6, 2017

codecov-io commented Dec 6, 2017 • edited Loading

Codecov Report

KristofferC left a comment

Choose a reason for hiding this comment

KristofferC Dec 6, 2017

Choose a reason for hiding this comment

pablosanjose Dec 6, 2017

Choose a reason for hiding this comment

pablosanjose commented Dec 6, 2017

KristofferC commented Dec 6, 2017

pablosanjose commented Dec 8, 2017 • edited Loading

KristofferC commented Dec 8, 2017

pablosanjose commented Dec 8, 2017

KristofferC commented Dec 8, 2017

pablosanjose commented Dec 8, 2017

pablosanjose commented Dec 9, 2017

KristofferC commented Jun 14, 2018

Fix breakage from the new `ReinterpretArray` #60

Fix breakage from the new `ReinterpretArray` #60

codecov-io commented Dec 6, 2017 •

edited

Loading

pablosanjose commented Dec 8, 2017 •

edited

Loading