Countmap is 100x slower than it should be in some cases #864

Open
LilithHafner opened this issue May 31, 2023 · 7 comments

@LilithHafner
Contributor

```julia
julia> @btime countmap(x) setup=(x=rand(Int16.(1:10), 30));
  41.375 μs (6 allocations: 512.47 KiB)

julia> @btime countmap(x) setup=(x=rand(1:10, 30));
  266.611 ns (5 allocations: 848 bytes)
```

@andreasnoack
Member

See #339. There is a specialized method for small integer types that is faster when there are many elements.
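For readers unfamiliar with the tradeoff, here is an illustrative sketch (not StatsBase's actual implementation) of the two strategies: a `Dict`-based count that pays a hashing cost per element, versus a dense table over the whole value range of a small integer type, which pays a fixed cost proportional to the type's range up front:

```julia
# Hypothetical sketch for illustration only; the real countmap code differs.

# Dict-based: cost per element is dominated by hashing/lookup.
function dict_count(x)
    d = Dict{eltype(x),Int}()
    for v in x
        d[v] = get(d, v, 0) + 1
    end
    d
end

# Dense-table: allocate one counter per possible Int16 value (2^16 slots),
# then do a cheap array increment per element. Fast per element, but the
# allocation and final scan cost the same regardless of length(x).
function dense_count(x::AbstractVector{Int16})
    counts = zeros(Int, 2^16)
    for v in x
        counts[Int(v) + 2^15 + 1] += 1   # shift typemin(Int16) to index 1
    end
    Dict(Int16(i - 2^15 - 1) => c for (i, c) in pairs(counts) if c > 0)
end
```

The dense method's fixed cost is exactly why it loses on short arrays like the 30-element benchmark above.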

@LilithHafner
Contributor Author

Yeah, the runtime of both methods is O(n); the alternate method has a better multiplicative constant factor but a much worse additive constant, which is good for many elements but not for few. This is a rough approximation of when each method is better. There are some methodological issues, but I think it's accurate to within an order of magnitude in all cases.
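The break-even point implied by that cost model can be written down directly. The constants below are made up for illustration, not measured:

```julia
# Cost model from the comment above:
#   dict method  ≈ c1*n        (larger per-element constant, no setup)
#   dense method ≈ c2*n + A    (smaller per-element constant, fixed setup A)
# They tie when c1*n = c2*n + A, i.e. n = A / (c1 - c2).
crossover(c1, c2, A) = A / (c1 - c2)

crossover(10.0, 1.0, 40_000.0)  # ≈ 4444 elements for these toy numbers
```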
[figure.png: "When is the method added in #339 good?" (number of elements vs. range of values, log-log)]

Code for figure:

```julia
using StatsBase, Plots

function linear_regression(X, Y, x)
    X0 = mean(X); Y0 = mean(Y)
    m = sum((X .- X0) .* (Y .- Y0)) / sum((X .- X0).^2)
    (x - X0)*m + Y0
end

t(T, a, b) = (x = rand(T.(1:a), b); minimum(@elapsed(countmap(x)) for _ in 1:100))
f(a, b) = t(Int16, a, b)/t(Int64, a, b)

function g(a, target=1)
    b = 10
    while f(a, b) > target
        b *= 10
    end
    input_sizes = round.(Int, exp.(range(log(b/10), log(b), length=10)))
    time_ratios = f.(a, input_sizes)
    exp(linear_regression(log.(time_ratios), log.(input_sizes), log(target)))
end

x = floor.(exp.(2:.2:log(typemax(Int16))))
@time y_even = g.(x)
@time y_slowdown = g.(x, 3)
x_speedup = vcat(x[5:5:end-3], last(x))
@time y_speedup = g.(x_speedup, 1/3)

plot(x_speedup, y_speedup, label="3x speedup")
plot!(x, y_even, xlabel="range of values", ylabel="number of elements",
    xaxis=:log10, yaxis=:log10, xticks=7, yticks=7, label="even",
    title="When is the method added in #339 good?")
plot!(x, y_slowdown, label="3x slowdown")
ylims!(10^2, 10^7)
savefig("figure.png")
```


@nalimilan
Member

Thanks for benchmarking. It could make sense to use that method only when there are more than e.g. 10_000 or 100_000 entries. Is the threshold different for Int16, Int8 and Bool?

@LilithHafner
Contributor Author

`length(x) > Base.max_values(eltype(x))` seems like it would be a reasonable threshold (though avoid calling `Base.max_values` directly, because it is internal).
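A sketch of that cutoff, with a hypothetical helper standing in for the internal `Base.max_values` (the helper name and the surrounding dispatch are assumptions, not StatsBase code):

```julia
# n_distinct(T) is the number of values T can represent; the dense table's
# fixed cost is proportional to it, so it amortizes once length(x) exceeds it.
n_distinct(::Type{T}) where {T<:Union{Int8,UInt8}}   = 2^8
n_distinct(::Type{T}) where {T<:Union{Int16,UInt16}} = 2^16
n_distinct(::Type{Bool})                             = 2

# Hypothetical predicate choosing between the two existing code paths.
use_dense_method(x::AbstractVector) = length(x) > n_distinct(eltype(x))
```

This gives a per-type threshold automatically, answering the question above about `Int16`, `Int8`, and `Bool` having different cutoffs.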

@nalimilan
Member

Why not. So you mean that for Int8 the specialized method is faster for arrays longer than 256 elements (roughly)? Would you make a pull request?

@LilithHafner
Contributor Author

I'm not eager to add heuristics and additional code complexity without robust performance analysis, and unfortunately I don't have the bandwidth right now to conduct extensive benchmarking.

@nalimilan
Member

Given what you showed, I would say that a rough heuristic is better than the current situation, but as you prefer.
