Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix Dict sizehint! to allow enough space for the requested length #37248

Merged
merged 1 commit into from Aug 31, 2020

Conversation

JeffBezanson
Copy link
Sponsor Member

fixes #9909
noticed in #37208

Test code:

function grow_dict(szh::Bool)
    d = Dict{Int,Nothing}()
    szh && sizehint!(d, 32768)
    for i in 1:32768
        d[i] = nothing
    end
    length(d.keys)
end

before:

julia> @btime grow_dict(true)
  704.747 μs (12 allocations: 1.41 MiB)
131072

julia> @btime grow_dict(false)
  702.613 μs (27 allocations: 770.16 KiB)
65536

after:

julia> @btime grow_dict(true)
  464.426 μs (7 allocations: 576.66 KiB)
65536

julia> @btime grow_dict(false)
  702.545 μs (27 allocations: 770.16 KiB)
65536

@bkamins You'll probably want to rerun the benchmarks for allunique!

@JeffBezanson JeffBezanson added performance Must go faster collections Data structures holding multiple items, e.g. sets labels Aug 28, 2020
@bkamins
Copy link
Member

bkamins commented Aug 28, 2020

All looks much better now.

Also the original use case of allunique is improved (now allunique is almost 2x faster than before my PR). Details below.

Setup:

function allunique2(C)
    seen = Dict{eltype(C), Nothing}()
    x = iterate(C)
    if haslength(C) && length(C) > 1000
        for i in OneTo(1000)
            v, s = x
            idx = ht_keyindex2!(seen, v)
            idx > 0 && return false
            _setindex!(seen, nothing, v, -idx)
            x = iterate(C, s)
        end
        sizehint!(seen, length(C))
    end
    while x !== nothing
        v, s = x
        idx = ht_keyindex2!(seen, v)
        idx > 0 && return false
        _setindex!(seen, nothing, v, -idx)
        x = iterate(C, s)
    end
    return true
end

function allunique3(C)
    seen = Dict{eltype(C), Nothing}()
    x = iterate(C)
    while x !== nothing
        v, s = x
        idx = ht_keyindex2!(seen, v)
        idx > 0 && return false
        _setindex!(seen, nothing, v, -idx)
        x = iterate(C, s)
    end
    return true
end

Before this PR:

julia> for i in 0:8
           x = [1:8000 + 20000i;]
           @show length(x)
           @btime allunique2($x)
           @btime allunique3($x)
       end
length(x) = 8000
  189.005 μs (26 allocations: 410.20 KiB)
  157.838 μs (22 allocations: 193.86 KiB)
length(x) = 28000
  650.336 μs (27 allocations: 1.46 MiB)
  651.134 μs (27 allocations: 770.16 KiB)
length(x) = 48000
  1.304 ms (27 allocations: 2.86 MiB)
  1.522 ms (32 allocations: 3.00 MiB)
length(x) = 68000
  1.188 ms (22 allocations: 1.17 MiB)
  1.863 ms (32 allocations: 3.00 MiB)
length(x) = 88000
  2.706 ms (27 allocations: 3.42 MiB)
  2.269 ms (32 allocations: 3.00 MiB)
length(x) = 108000
  3.181 ms (27 allocations: 3.42 MiB)
  2.769 ms (32 allocations: 3.00 MiB)
length(x) = 128000
  3.783 ms (27 allocations: 3.42 MiB)
  3.289 ms (32 allocations: 3.00 MiB)
length(x) = 148000
  2.863 ms (22 allocations: 2.30 MiB)
  3.854 ms (32 allocations: 3.00 MiB)
length(x) = 168000
  3.415 ms (22 allocations: 2.30 MiB)
  4.521 ms (32 allocations: 3.00 MiB)

After this PR:

julia> for i in 0:8
           x = [1:8000 + 20000i;]
           @show length(x)
           @btime allunique2($x)
           @btime allunique3($x)
       end
length(x) = 8000
  129.007 μs (22 allocations: 193.86 KiB)
  157.937 μs (22 allocations: 193.86 KiB)
length(x) = 28000
  382.389 μs (22 allocations: 625.86 KiB)
  653.205 μs (27 allocations: 770.16 KiB)
length(x) = 48000
  707.788 μs (22 allocations: 1.17 MiB)
  1.523 ms (32 allocations: 3.00 MiB)
length(x) = 68000
  1.181 ms (22 allocations: 1.17 MiB)
  1.862 ms (32 allocations: 3.00 MiB)
length(x) = 88000
  1.319 ms (22 allocations: 2.30 MiB)
  2.251 ms (32 allocations: 3.00 MiB)
length(x) = 108000
  1.762 ms (22 allocations: 2.30 MiB)
  2.690 ms (32 allocations: 3.00 MiB)
length(x) = 128000
  2.223 ms (22 allocations: 2.30 MiB)
  3.183 ms (32 allocations: 3.00 MiB)
length(x) = 148000
  2.847 ms (22 allocations: 2.30 MiB)
  3.752 ms (32 allocations: 3.00 MiB)
length(x) = 168000
  3.400 ms (22 allocations: 2.30 MiB)
  5.032 ms (32 allocations: 3.00 MiB)

Original use case:

julia> x = [1:10^8;];

julia> @time allunique(x) # version on Julia 1.5
 17.199410 seconds (82 allocations: 4.499 GiB, 1.75% gc time)
true

julia> @time allunique2(x) # version after my PR
  9.658649 seconds (22 allocations: 2.250 GiB, 0.60% gc time)
true

@JeffBezanson JeffBezanson merged commit 2154472 into master Aug 31, 2020
@JeffBezanson JeffBezanson deleted the jb/dictsizehint branch August 31, 2020 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sizehint!(::Dict, n) doesn't grow Dicts enough
2 participants