Countmap is 100x slower than it should be in some cases #864

Open
LilithHafner opened this issue May 31, 2023 · 7 comments

@LilithHafner
Contributor

```julia
julia> @btime countmap(x) setup=(x=rand(Int16.(1:10), 30));
  41.375 μs (6 allocations: 512.47 KiB)

julia> @btime countmap(x) setup=(x=rand(1:10, 30));
  266.611 ns (5 allocations: 848 bytes)
```

@andreasnoack
Member

See #339. There is a specialized method for small integer types that is faster when there are many elements.
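For readers unfamiliar with the tradeoff, here is an illustrative sketch (not StatsBase's actual implementation) of the two strategies: a `Dict`-based count that pays a hashing cost per element, versus a dense table over the whole value range of a small integer type, which pays a fixed cost proportional to the type's range up front:

```julia
# Hypothetical sketch for illustration only; the real countmap code differs.

# Dict-based: cost per element is dominated by hashing/lookup.
function dict_count(x)
    d = Dict{eltype(x),Int}()
    for v in x
        d[v] = get(d, v, 0) + 1
    end
    d
end

# Dense-table: allocate one counter per possible Int16 value (2^16 slots),
# then do a cheap array increment per element. Fast per element, but the
# allocation and final scan cost the same regardless of length(x).
function dense_count(x::AbstractVector{Int16})
    counts = zeros(Int, 2^16)
    for v in x
        counts[Int(v) + 2^15 + 1] += 1   # shift typemin(Int16) to index 1
    end
    Dict(Int16(i - 2^15 - 1) => c for (i, c) in pairs(counts) if c > 0)
end
```

The dense method's fixed cost is exactly why it loses on short arrays like the 30-element benchmark above.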

@LilithHafner
Contributor Author

Yeah, the runtime of both methods is O(n); the alternate method has a better multiplicative constant factor but a much worse additive constant, which is good for many elements but not for few. This is a rough approximation of when each method is better. There are some methodological issues, but I think it's accurate to within an order of magnitude in all cases.
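The break-even point implied by that cost model can be written down directly. The constants below are made up for illustration, not measured:

```julia
# Cost model from the comment above:
#   dict method  ≈ c1*n        (larger per-element constant, no setup)
#   dense method ≈ c2*n + A    (smaller per-element constant, fixed setup A)
# They tie when c1*n = c2*n + A, i.e. n = A / (c1 - c2).
crossover(c1, c2, A) = A / (c1 - c2)

crossover(10.0, 1.0, 40_000.0)  # ≈ 4444 elements for these toy numbers
```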
[figure.png: "When is the method added in #339 good?" (number of elements vs. range of values, log-log)]

Code for figure:

```julia
using StatsBase, Plots

function linear_regression(X, Y, x)
    X0 = mean(X); Y0 = mean(Y)
    m = sum((X .- X0) .* (Y .- Y0)) / sum((X .- X0).^2)
    (x - X0)*m + Y0
end

t(T, a, b) = (x = rand(T.(1:a), b); minimum(@elapsed(countmap(x)) for _ in 1:100))
f(a, b) = t(Int16, a, b)/t(Int64, a, b)

function g(a, target=1)
    b = 10
    while f(a, b) > target
        b *= 10
    end
    input_sizes = round.(Int, exp.(range(log(b/10), log(b), length=10)))
    time_ratios = f.(a, input_sizes)
    exp(linear_regression(log.(time_ratios), log.(input_sizes), log(target)))
end

x = floor.(exp.(2:.2:log(typemax(Int16))))
@time y_even = g.(x)
@time y_slowdown = g.(x, 3)
x_speedup = vcat(x[5:5:end-3], last(x))
@time y_speedup = g.(x_speedup, 1/3)

plot(x_speedup, y_speedup, label="3x speedup")
plot!(x, y_even, xlabel="range of values", ylabel="number of elements",
    xaxis=:log10, yaxis=:log10, xticks=7, yticks=7, label="even",
    title="When is the method added in #339 good?")
plot!(x, y_slowdown, label="3x slowdown")
ylims!(10^2, 10^7)
savefig("figure.png")
```


@nalimilan
Member

Thanks for benchmarking. It could make sense to use that method only when there are more than e.g. 10_000 or 100_000 entries. Is the threshold different for Int16, Int8 and Bool?

@LilithHafner
Contributor Author

`length(x) > Base.max_values(eltype(x))` seems like it would be a reasonable threshold (though avoid calling `Base.max_values` directly, because it is internal).
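A sketch of that cutoff, with a hypothetical helper standing in for the internal `Base.max_values` (the helper name and the surrounding dispatch are assumptions, not StatsBase code):

```julia
# n_distinct(T) is the number of values T can represent; the dense table's
# fixed cost is proportional to it, so it amortizes once length(x) exceeds it.
n_distinct(::Type{T}) where {T<:Union{Int8,UInt8}}   = 2^8
n_distinct(::Type{T}) where {T<:Union{Int16,UInt16}} = 2^16
n_distinct(::Type{Bool})                             = 2

# Hypothetical predicate choosing between the two existing code paths.
use_dense_method(x::AbstractVector) = length(x) > n_distinct(eltype(x))
```

This gives a per-type threshold automatically, answering the question above about `Int16`, `Int8`, and `Bool` having different cutoffs.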

@nalimilan
Member

Why not. So you mean that for Int8 the specialized method is faster for arrays longer than 256 elements (roughly)? Would you make a pull request?

@LilithHafner
Contributor Author

I'm not eager to add heuristics and additional code complexity without robust performance analysis, and unfortunately I don't have the bandwidth right now to conduct extensive benchmarking.

@nalimilan
Member

Given what you showed, I would say that a rough heuristic is better than the current situation, but as you prefer.
