Dict: decrement h.ndel when overwriting deleted entry
#47825
Conversation
Fixes #47823. Previously we never lowered `h.ndel` when overwriting a deleted entry, leading us to rehash unnecessarily.
Co-authored-by: Christian Rorvik <christian.rorvik@gmail.com>
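For context, `Dict` uses open addressing: `delete!` leaves a tombstone in the slot array and bumps the tombstone counter `h.ndel`, and once tombstones accumulate past a threshold the table is rehashed. When `setindex!` lands on a tombstone it reuses the slot, so the counter should go back down. Here is a minimal sketch of that bookkeeping on a toy slot array (the `ToySlots` type and the names below are made up for illustration, not Base internals):

```julia
# Toy open-addressing slot states (hypothetical, not Base's encoding).
const SLOT_EMPTY   = 0x0  # never used
const SLOT_FILLED  = 0x1  # holds a live entry
const SLOT_MISSING = 0x2  # tombstone left behind by delete!

mutable struct ToySlots
    state::Vector{UInt8}
    ndel::Int  # tombstone count; past a threshold, the table is rehashed
end

function toy_store!(h::ToySlots, index::Int)
    # Reusing a tombstone removes it from the table, so the tombstone
    # count must drop -- the bookkeeping this PR adds to Base.
    if h.state[index] == SLOT_MISSING
        h.ndel -= 1
    end
    h.state[index] = SLOT_FILLED
    return h
end
```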
base/dict.jl (outdated)
```julia
h.vals[index] = v
h.count += 1
h.age += 1
h.ndel = ifelse(@inbounds(isslotmissing(h, index)), h.ndel - 1, h.ndel)
```
Looks to me like the `@inbounds` should be used by the caller of this function (ref how the index accesses a few lines above do not have `@inbounds` on them).
My question was just whether the inbounds propagation carries to the next level in the call chain; without a firm understanding of that, it felt safer and harmless to add the `@inbounds` here, since any bounds violation would already have thrown at this point.
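For reference, an `@inbounds` annotation at a call site does not by itself reach into the callee's body; the callee has to opt in with `Base.@propagate_inbounds`. A minimal sketch (the function names here are made up):

```julia
# Without @propagate_inbounds, the indexing inside the callee keeps its
# bounds check even when the caller is in an @inbounds block.
getfirst(v) = v[1]

# With @propagate_inbounds, the caller's @inbounds context applies to
# the indexing inside this function's body.
Base.@propagate_inbounds getfirst_prop(v) = v[1]

function demo(v)
    @inbounds begin
        a = getfirst(v)       # still bounds-checked inside getfirst
        b = getfirst_prop(v)  # bounds check can be elided here
        return a + b
    end
end
```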
Title changed: "Dict: decrement h.ndel when overwritind deleted entry" → "Dict: decrement h.ndel when overwriting deleted entry"
petvana left a comment:
Although the PR is a clear improvement, it cannot solve #47823 fully, because the size of the dictionary stays at only 16. Thus, after every ~20 `delete!` operations it needs to rehash and allocate memory, because the keys are different each time. A possible solution is to implement an in-place `rehash!` for the case where the size does not change (a sketch follows the benchmark below).
```julia
julia> using BenchmarkTools

julia> function foo(d, reps)
           for i in 1:reps
               d[i] = 1
               delete!(d, i)
           end
           nothing
       end
foo (generic function with 1 method)

julia> @btime foo($(Dict{Int, Int}()), 1000) # This PR
  23.679 μs (249 allocations: 37.61 KiB)
```
Co-authored-by: Petr Vana <petvana@centrum.cz>
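A hedged sketch of that in-place `rehash!` idea for a linear-probing table, reusing the `SLOT_*` constants from the sketch above (the `ToyTable` type and `inplace_rehash!` are hypothetical, not the Base implementation). The trick is a marker state: tombstones become empty, live entries become pending, and each pending entry is then re-placed along its probe chain, displacing other pending entries as needed, with no new slot array allocated:

```julia
const SLOT_PENDING = 0x3  # live entry awaiting re-placement during cleanup

mutable struct ToyTable
    keys::Vector{Int}
    state::Vector{UInt8}  # SLOT_EMPTY / SLOT_FILLED / SLOT_MISSING / SLOT_PENDING
    ndel::Int
end

home(h::ToyTable, k) = mod(hash(k), length(h.keys)) % Int + 1

function inplace_rehash!(h::ToyTable)
    n = length(h.keys)
    # Pass 1: tombstones become empty; live entries await re-placement.
    for i in 1:n
        s = h.state[i]
        h.state[i] = s == SLOT_MISSING ? SLOT_EMPTY :
                     s == SLOT_FILLED  ? SLOT_PENDING : s
    end
    h.ndel = 0
    # Pass 2: re-place every pending entry by linear probing.
    for i in 1:n
        h.state[i] == SLOT_PENDING || continue
        k = h.keys[i]
        h.state[i] = SLOT_EMPTY
        while true
            j = home(h, k)
            while h.state[j] == SLOT_FILLED   # finalized slot: keep probing
                j = mod(j, n) + 1
            end
            if h.state[j] == SLOT_PENDING     # displace it, re-place it next
                k, h.keys[j] = h.keys[j], k
                h.state[j] = SLOT_FILLED
            else                              # empty slot: place and stop
                h.keys[j] = k
                h.state[j] = SLOT_FILLED
                break
            end
        end
    end
    return h
end
```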
Yeah, it's unfortunate that this doesn't fix it better. It would be a good idea to investigate whether there are places where we can clean out more tombstones during …
Now it provably helps ;-)

```julia
julia> @btime foo($(Dict{Int, Int}()), 1000)
  17.825 μs (126 allocations: 19.03 KiB)
```
Would it be possible to clean them out after hitting the rehash threshold and aborting rehash if cleanup takes you back under?
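A hedged sketch of that policy, building on the hypothetical `ToyTable` above: on hitting the threshold, first check whether dropping the tombstones alone would bring the load back under, and only grow (and allocate) when the live entries really are too dense:

```julia
function maybe_rehash!(h::ToyTable; max_load = 0.75)
    n = length(h.keys)
    live = count(==(SLOT_FILLED), h.state)
    (live + h.ndel) / n > max_load || return h  # under threshold: nothing to do
    if live / n <= max_load
        inplace_rehash!(h)  # tombstones alone pushed us over: cleanup suffices
    else
        # Genuinely too dense: a real implementation would grow the table
        # and do a full (allocating) rehash here.
    end
    return h
end
```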
… and didn't take advantage of the chance to delete tombstones.
@vtjnash pointed out that we were also unconditionally inserting tombstones on deletion, but we don't always have to (and can in fact delete them in some cases). When we do this properly, we get 20% faster with 0 rehashes for this benchmark. (There can still be unnecessary rehashes in more complicated scenarios.)
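A hedged sketch of that deletion rule for linear probing, again on the hypothetical toy model above (the real Base change differs in detail): if the next probe slot is empty, no lookup can ever probe past the deleted slot, so it can be marked empty instead of becoming a tombstone, and any run of tombstones immediately before it becomes unreachable and can be cleared too:

```julia
function toy_delete!(h::ToyTable, i::Int)
    n = length(h.keys)
    nexti = mod(i, n) + 1
    if h.state[nexti] == SLOT_EMPTY
        # Probing always stops at an empty slot, so this slot can become
        # empty too, and trailing tombstones behind it are unreachable.
        h.state[i] = SLOT_EMPTY
        previ = mod(i - 2, n) + 1
        while h.state[previ] == SLOT_MISSING
            h.state[previ] = SLOT_EMPTY
            h.ndel -= 1
            previ = mod(previ - 2, n) + 1
        end
    else
        h.state[i] = SLOT_MISSING  # a probe chain may continue past this slot
        h.ndel += 1
    end
    return h
end
```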
@nanosoldier
Your benchmark job has completed - no performance regressions were detected. A full report can be found here.
I think this is ready for us to merge.
Fixes #47823. Previously we never lowered `h.ndel` when overwriting a deleted entry, leading us to rehash unnecessarily. I'm not sure if this counts as a bugfix.