Skip to content

Conversation

@oscardssmith
Copy link
Member

fixes #47823. Previously we never lowered this leading us to rehash unnecessarily. I'm not sure if this counts as a bugfix.

fixes #47823. Previously we never lowered this leading us to rehash unnecessarily.
Co-authored-by: Christian Rorvik <christian.rorvik@gmail.com>
base/dict.jl Outdated
h.vals[index] = v
h.count += 1
h.age += 1
h.ndel = ifelse(@inbounds(isslotmissing(h, index)), h.ndel - 1, h.ndel)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks to me like the @inbounds should be used by the caller of this function (ref how the index access a few lines above does not have @inbounds on them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question was just if the inbounds propagation carries to the next level in the call chain, and without firm understanding of that it felt safer and harmless to add the inbounds here as any bounds violation would have thrown already at this point

@giordano giordano changed the title decrement h.ndel when overwritind deleted entry decrement h.ndel when overwriting deleted entry Dec 7, 2022
Copy link
Member

@petvana petvana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the PR is a clear improvement, it cannot solve #47823 fully because the size of the dictionary remains only 16. Thus, after every ~20 remove operations, it needs to rehash and allocate memory. This is because the keys are different. A possible solution is to implement an in-place rehash! in case the size is not changed.

julia> using BenchmarkTools

julia> function foo(d, reps)
           for i in 1:reps
               d[i] = 1
               delete!(d, i)
           end
           nothing
       end
foo (generic function with 1 method)

julia> @btime foo($(Dict{Int, Int}()), 1000) # This PR
  23.679 μs (249 allocations: 37.61 KiB)

oscardssmith and others added 2 commits December 7, 2022 13:02
Co-authored-by: Petr Vana <petvana@centrum.cz>
@oscardssmith
Copy link
Member Author

Yeah, it's unfortunate that this doesn't fix it better. It would be a good idea to investigate if there are places where we can clean out more tombstones during setindexing. Also somewhat hilariously, 3 people reviewed this and none of them realized that I was doing the check after updating the slots making the check useless (now fixed).

@petvana
Copy link
Member

petvana commented Dec 7, 2022

Also somewhat hilariously, 3 people reviewed this and none of them realized that I was doing the check after updating the slots making the check useless (now fixed).

Now it provably helps ;-)

julia> @btime foo($(Dict{Int, Int}()), 1000)
  17.825 μs (126 allocations: 19.03 KiB)

@ancapdev
Copy link
Contributor

ancapdev commented Dec 7, 2022

Yeah, it's unfortunate that this doesn't fix it better. It would be a good idea to investigate if there are places where we can clean out more tombstones during setindexing.

Would it be possible to clean them out after hitting the rehash threshold and aborting rehash if cleanup takes you back under?

and didn't take advantage of the chance to delete tombstones.
@oscardssmith
Copy link
Member Author

@vtjnash pointed out that we also were unconditionally inserting tombstones on deletion, but we don't have to do so always (and can in fact delete them in some cases). When we do this properly, we get 20% faster with 0 rehashes for this benchmark. (there still will be possible unnecessary rehashes in more complicated scenarios)

@DilumAluthge DilumAluthge changed the title decrement h.ndel when overwriting deleted entry Dict: decrement h.ndel when overwriting deleted entry Dec 7, 2022
@petvana
Copy link
Member

petvana commented Dec 9, 2022

@nanosoldier runbenchmarks("collection", vs = ":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

@oscardssmith
Copy link
Member Author

I think this is ready for us to merge.

@oscardssmith oscardssmith merged commit f082329 into master Dec 11, 2022
@oscardssmith oscardssmith deleted the oscardssmith-decrement-Dict-deleted-count branch December 11, 2022 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Must go faster

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dict doesn't reuse deleted slots

7 participants