Countmap is 100x slower than it should be in some cases #864
Comments
See #339. There is a specialized method for small integer types which is faster when there are a lot of elements. |
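For context, the specialized method for small integer types counts occurrences with a dense table indexed by the value itself, instead of hashing every element into a `Dict`. A minimal sketch of the two strategies (the function names here are illustrative, not StatsBase internals):

```julia
# Generic strategy: hash each element into a Dict, as countmap
# does for arbitrary element types.
function dict_count(x::AbstractVector{T}) where {T}
    counts = Dict{T,Int}()
    for v in x
        counts[v] = get(counts, v, 0) + 1
    end
    return counts
end

# Specialized strategy for small integer types: one pass over a dense
# vector covering the whole value range, indexed by the value itself.
function dense_count(x::AbstractVector{T}) where {T<:Union{Int8,UInt8,Int16,UInt16}}
    offset = Int(typemin(T))
    counts = zeros(Int, Int(typemax(T)) - offset + 1)
    for v in x
        counts[Int(v) - offset + 1] += 1
    end
    # Keep only values that actually occurred, mirroring countmap's output.
    return Dict{T,Int}(T(i + offset - 1) => c for (i, c) in pairs(counts) if c != 0)
end
```

For `Int16` the dense table has 65 536 entries, so when the input is short the cost of allocating, zero-filling, and scanning that table dominates, which is the slowdown this issue is about.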
Yeah, the runtime of both methods is…

*Figure: benchmark comparing the two `countmap` methods (image not preserved).*

Code for figure:

```julia
using StatsBase, Plots

function linear_regression(X, Y, x)
    X0 = mean(X); Y0 = mean(Y)
    m = sum((X .- X0) .* (Y .- Y0)) / sum((X .- X0).^2)
    (x - X0) * m + Y0
end

t(T, a, b) = (x = rand(T.(1:a), b); minimum(@elapsed(countmap(x)) for _ in 1:100))

x = floor.(exp.(2:.2:log(typemax(Int16))))
plot(x_speedup, y_speedup, label="3x speedup")
```
Thanks for benchmarking. It could make sense to use that method only when there are more than e.g. 10_000 or 100_000 entries. Is the threshold different for different element types?
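Such a heuristic could be a simple length check that picks a code path before counting. A sketch, with the two code paths passed in as functions since the actual implementations live inside StatsBase, and with a placeholder cutoff rather than a measured one:

```julia
# Hypothetical cutoff; the right value would have to come from
# benchmarking and may differ between element types.
const DENSE_THRESHOLD = 10_000

# Dispatch on input length: only pay for the dense counting table
# when the input is large enough to amortize zero-filling it.
# `dense` and `generic` stand in for the two countmap code paths.
function choose_count(x::AbstractVector; dense, generic)
    length(x) >= DENSE_THRESHOLD ? dense(x) : generic(x)
end
```

The cutoff would likely need tuning per element type, since the dense table for `Int8` is 256 entries but 65 536 for `Int16`.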
Why not. So you mean that for inputs below the threshold we would fall back to the generic method?
I'm not eager to add heuristics and additional code complexity without robust performance analysis, and unfortunately I don't have the bandwidth right now to conduct extensive benchmarking.
Given what you showed, I would say that a rough heuristic is better than the current situation, but as you prefer.