New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance improvements for IdDict #51091
base: master
Are you sure you want to change the base?
Conversation
Thanks, any speedup numbers to whet the appetite? I'm not really comfortable duplicating all the lookup code, and the try-catch around the conversion is not a good pattern to use. This needs a different design. Maybe we can copy how |
Here's some performance numbers, function f!(d)
for x in 1:10^4
d[x] = 1
end
end
function g!(d)
for x in 1:10^4
get!(d, x, 1)
end
end On master julia> versioninfo()
Julia Version 1.11.0-DEV.356
Commit 4768d7ab8fe (2023-08-28 12:54 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × AMD Ryzen 7 1700 Eight-Core Processor
WORD_SIZE: 64
LLVM: libLLVM-15.0.7 (ORCJIT, znver1)
Threads: 1 on 16 virtual cores
julia> @btime f!(d) setup=(d=IdDict{Int, Int}(); sizehint!(d, 10^4))
409.015 μs (9489 allocations: 148.27 KiB)
julia> @btime g!(d) setup=(d=IdDict{Int, Int}(); sizehint!(d, 10^4))
639.763 μs (18978 allocations: 296.53 KiB) On this pr @btime f!(d) setup=(d=IdDict{Int, Int}(); sizehint!(d, 10^4))
324.515 μs (9489 allocations: 148.27 KiB)
@btime g!(d) setup=(d=IdDict{Int, Int}(); sizehint!(d, 10^4))
330.156 μs (9489 allocations: 148.27 KiB) |
base/iddict.jl
Outdated
if !(val isa V) # avoid a dynamic call | ||
val = convert(V, val)::V | ||
end | ||
keyindex = ccall(:jl_eqtable_keyindex, Cssize_t, (Any, Any), d, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this need to be implemented in C?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to call jl_table_assign_bp
directly from Julia as it can change d.ht
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There used to be ways, but not anymore with the generational GC. The usual way now to implement this is to switch the return object around as such:
keyindex = ccall(:jl_eqtable_keyindex, Cssize_t, (Any, Any), d, key) | |
keyindex = Ref(UInt(0)) | |
d.ht = ccall(:jl_table_assign_bp, Any, (Any, Any, Ref{UInt}), d.ht, key, keyindex) | |
keyindex = keyindex[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change this around still also? It will avoid us from needing to reflect the iddict struct into C
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to note, I believe this would remove the ~20% speedup (see comments above) we get from not having to heap allocate for setindex!
. Happy to make this change if that's fine.
75060fe
to
74e0ca6
Compare
Remove old comment Switch to atomic pointers Fixes errors in ASAN as previously we did atomic stores to non atomic pointers Split empty! change into seperate pr Remove duplication in C code Add tests 15% speed up for setindex! Remove try catch Signifcant performance improvement for path where error used to be caught. Fix bug Actually fix bug Restore performance by not boxing key twice Fix comment Add comment about IdDict being mirrored in julia.h Fix comment Fix another comment Allow writing NULL into iddict Reverts the change of behaviour just in case. Remove clutter
Added inbounds as dict.jl seems to use them as well
Unfortunately, cost is ~106 now so still too high, JuliaLang#51912 is tracking issue about high cost. Lowers inlining cost of setproperty as we know rhs isa Memory{Any}
About 15% faster on microbenchmark, `f!(d)` example in pr comments
This reverts commit 7eb2ed6.
This reverts commit 519dfe7.
This reverts commit 7f20833.
Added inbounds as dict.jl seems to use them as well
Fixes bug if for example the call to `default` mutated the dict.
CI looks good, ASAN just timed out I think. |
function setindex!(d::IdDict{K,V}, @nospecialize(val), @nospecialize(key)) where {K, V} | ||
keyindex, inserted = ht_keyindex!(d, key) | ||
if !(val isa V) # avoid a dynamic call | ||
val = convert(V, val)::V |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a discrepancy here in the state of the IdDict during this convert call or if this convert failed. The ht_keyindex!
call has fully inserted the key, but not the value. For Dict
, neither has occurred yet, and all insertion and accounting happens at once in _setindex!
. But in IdDict here, it looks like half of the insertion occurs in ht_keyindex!
(effectively inserting it in the deleted state) but then the final accounting update for it doesn't happen until after the _setindex!
call later in this method. Do we update ndel
over this call so the value is consistent?
There also may need to be an age
counter added to the IdDict, so it can detect concurrent modifications that happen in this convert
call (or move this convert call back to the top of the function)
Conform to dict API, by moving `@inbounds` and changing `ht_keyindex!`. Deal with failure case when `convert` calls fails. TODO: think about concurrent modifications, should only be a problem in `get!`. Add tests for `convert` failing.
base/iddict.jl
Outdated
|
||
if keyindex < 0 | ||
# If convert call fails we need the key to be deleted | ||
@inbounds d[-keyindex] = nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@inbounds d[-keyindex] = nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the deleted state not require they key to be nothing
? Though this prevents a segfault so I'm probably missing something.
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, the segfault was because I meant to write d.ht[-keyindex] = nothing
. My concern is if the convert
fails we have a valid key inserted and the value is NULL. This will prevent us from ever updating the value as we will hit this assert
Lines 63 to 69 in e22db88
if (jl_atomic_load_relaxed(&tab[index + 1]) != NULL) { | |
jl_atomic_store_release(&tab[index + 1], val); | |
jl_gc_wb(a, val); | |
return 0; | |
} | |
// `nothing` is our sentinel value for deletion, so need to keep searching if it's also our search key | |
assert(key == jl_nothing); |
I can move the key updating code out of ht_keyindex2!
if that helps.
Caused segfaults when ht_keyindex2! caused a rehash
The main improvement is from
get!
which no longer calculates the hash twice when it has to setindex and so is twice as fast. This is my first time working with the Julia C code so there might be some mistakes there but it seems to work fine for me.This is useful for a global value numbering pass I've written.