Skip to content

Dict doesn't reuse deleted slots #47823

@ancapdev

Description

@ancapdev

I noticed this a couple of years ago and posted it to Discourse. My observations repeated here:

I don’t know much about hash table implementations, so maybe this is a known weakness of the algorithm chosen. I did notice ndel is only ever increased (in _delete!), then used as a condition for rehashing (if more than 3/4 slots are deleted), and then reset to 0 on rehash. Does that mean deleted slots are never re-used? I thought commonly deleted slots are treated as occupied on probe, and empty for reuse on insert.

DataStructures.RobinDict doesn’t exhibit the same behavior.

Code for validation:

using BenchmarkTools
using DataStructures

function foo(d, reps)
    for i in 1:reps
        d[i] = 1
        delete!(d, i)
    end
    nothing
end

@btime foo($(Dict{Int, Int}()), 1000)
#  24.200 μs (249 allocations: 42.80 KiB)
@btime foo($(RobinDict{Int, Int}()), 1000)
#  33.101 μs (0 allocations: 0 bytes)

I've confirmed this behavior on 1.7, 1.8, and 1.9. It's forced us to use RobinDict for no other reason than this allocation issue, and it's not quite as well supported and tested (we hit a nasty infinite loop bug with it today).

Metadata

Metadata

Assignees

No one assigned

    Labels

    collectionsData structures holding multiple items, e.g. setsperformanceMust go faster

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions