WeakKeyDict based on object ids #3002

tanmaykm · 2013-05-03T09:46:59Z

Is this the correct behaviour?

julia> v1 = [1,2]
2-element Int64 Array:
 1
 2

julia> v2 = copy(v1)
2-element Int64 Array:
 1
 2

julia> 

julia> w1 = WeakRef(v1)
WeakRef([1, 2])

julia> w2 = WeakRef(v2)
WeakRef([1, 2])

julia> 

julia> object_id(w1.value) == object_id(w2.value)
false

julia> object_id(w1) == object_id(w2)
false

julia> 

julia> w1 == w2
true

julia> hash(w1) == hash(w2)
true

Since v1 and v2 are distinct, shouldn't hash(w1) and hash(w2) be different for WeakKeyDict to work correctly?

The text was updated successfully, but these errors were encountered:

JeffBezanson · 2013-05-03T18:47:45Z

== and === (is) are different equality functions. The functions used by Dict and WeakKeyDict are == and hash. ObjectIdDict uses === and object_id.

tanmaykm · 2013-05-03T19:20:37Z

Yes. And I think that is what my concern is about, in WeakKeyDict.

As I understand, WeakKeyDict entries are supposed to be removed when the keys (particular instances of a type) get garbage collected. I think that would not work reliably if WeakRefs to two different instances produce the same hash value.

Shouldn't the hash value of WeakRef be based on object ids of the referenced value instead of the value itself? Or may be WeakKeyDict should be implemented differently?

JeffBezanson · 2013-05-03T19:44:16Z

I think both combinations make sense. One case where you want the current behavior of WeakKeyDict is hash consing, where you construct a new object and then see if something equal to it already exists in a WeakKeyDict.

I could modify ObjectIdDict to work correctly with WeakRef entries, and then it would be easy to define a WeakKeyObjectIdDict.

StefanKarpinski · 2013-05-03T19:51:32Z

I feel like this should ideally be more composable:

WeakKeyDict{K,V} should just be Dict{WeakRef{K},V}
WeakKeyObjectIdDict should just be ObjectIdDict{WeakRef,Any}

etc. Not entirely sure if this is possible at the moment, but it seems like it's pretty close to doable at least.

StefanKarpinski · 2013-05-03T19:55:51Z

I realize this may make weak-key dicts less convenient to use since you'd have to both look each item up and dereference the actual value from the WeakRef container, but I feel like that's acceptable – it's not like WeakKeyDicts are something that ought to be used every day.

tanmaykm · 2013-05-03T19:58:13Z

But then that results in the following scenario:

ulia> wd = WeakKeyDict{Vector, Int}();

julia> v1 = [1,2];

julia> v2 = copy(v1);

julia> wd[v1] = 1;

julia> wd[v2] = 2;

julia> length(wd)
1

julia> v1 = v2 = [1,2];

julia> gc()
ERROR: key not found: [1, 2]
 in delete! at dict.jl:488
 in anonymous at dict.jl:518
 in gc at base.jl:92

I also noticed the usage of WeakKeyDict in multi.jl for RemoteRefs relies on the current behavior. But I wonder if it is correct... It seems to be using this behavior to get the same RemoteRef object instance in the constructor.

JeffBezanson · 2013-05-03T20:09:39Z

The RemoteRef constructor works because it uses getkey(). This is just the classic hash-consing pattern. I agree that other usage patterns should be supported too though.

The error is an implementation bug. It can be fixed several ways; simplest is just to tolerate missing keys in the finalizer that calls delete!.

tanmaykm · 2013-05-03T20:33:06Z

Thanks. That sounds good.

I think then a WeakKeyObjectIdDict is what would be useful as you suggested.

I would be using this in the ChainedVectors package... where I try to create sub vectors of larger buffers without making a copy and need the larger buffer to hang around till any of the sub vectors are still being used.

tanmaykm · 2013-05-07T05:58:33Z

Here's an implementation of WeakObjectIdKeyDict, in the same lines as WeakKeyDict. Does it look good?

type WeakObjectIdKeyDict{K,V} <: Associative{K,V}
    ht::Dict{Uint,Tuple}

    WeakObjectIdKeyDict() = new((Uint=>Tuple)[])
end
WeakObjectIdKeyDict() = WeakObjectIdKeyDict{Any,Any}()

function setindex!{K,V}(wih::WeakObjectIdKeyDict{K,V}, v, key)      
    oid = object_id(key::K)
    wih.ht[oid] = (WeakRef(key), v::V)
    finalizer(key, x->delete!(wih.ht, oid, nothing))
end

function getkey{K}(wih::WeakObjectIdKeyDict{K}, kk, deflt)
    k = getkey(wih.ht, object_id(kk::K), secret_table_token)
    if is(k, secret_table_token)
        return deflt
    end
    kk
end

function get{K}(wih::WeakObjectIdKeyDict{K}, key, def)
    v = get(wih.ht, object_id(key::K), def)
    is(v, def) && return v
    v[2]
end
delete!{K}(wih::WeakObjectIdKeyDict{K}, key) = delete!(wih.ht, object_id(key::K))[2]
function delete!{K}(wih::WeakObjectIdKeyDict{K}, key, def)
    v = delete!(wih.ht, object_id(key::K), def)
    is(v, def) && return v
    v[2]
end
empty!(wih::WeakObjectIdKeyDict)  = (empty!(wih.ht); wih)
haskey{K}(wih::WeakObjectIdKeyDict{K}, key) = haskey(wih.ht, object_id(key::K))
getindex{K}(wih::WeakObjectIdKeyDict{K}, key) = getindex(wih.ht, object_id(key::K))[2]
isempty(wih::WeakObjectIdKeyDict) = isempty(wih.ht)

start(wih::WeakObjectIdKeyDict) = start(wih.ht)
done(wih::WeakObjectIdKeyDict, i) = done(wih.ht, i)
function next{K}(wih::WeakObjectIdKeyDict{K}, i)
    kv, i = next(wih.ht, i)
    vv = kv[2]
    ((vv[1].value::K, vv[2]), i)
end
length(wih::WeakObjectIdKeyDict) = length(wih.ht)

[pao: syntax highlighting]

JeffBezanson · 2013-05-07T20:22:19Z

Looks good. It's interesting to note that for most purposes the key doesn't actually need to be stored in the dictionary; it's only needed in next. If you didn't need iteration, it could be implemented much more efficiently.

tanmaykm · 2013-05-07T20:48:11Z

Yes. Do you feel we should skip the iterator in favor of efficiency?

mauro3 · 2017-12-06T11:25:23Z

Another thing which seems surprising to me is:

julia> w1,w2 = WeakRef([1]), WeakRef([1])                                                                                                                                
(WeakRef([1]), WeakRef([1]))                                                                                                                                             

julia> w1==w2                                                                                                                                                            
true                                                                                                                                                                     

julia> r1,r2 = Ref([1]), Ref([1])                                                                                                                                        
(Base.RefArray{Int64,Array{Int64,1},Void}([1], 1, nothing), Base.RefArray{Int64,Array{Int64,1},Void}([1], 1, nothing))                                                   

julia> r1==r2                                                                                                                                                            
false

WeakRef compares their values with ==, Ref compares their values with ===.

I would argue that it would be clearer for WeakRef to behave like Ref, i.e. use === and object_id (and WeakKeyDict equally). Presumably, we'd then want to define a WeakKeyDictValue (needs a better name) to take up the current WeakKeyDict functionality.

EDIT: x-ref I came across #12198 (comment)

mauro3 · 2017-12-06T13:01:42Z

I just tried bulding Julia with WeakRef using === and object_id. This is the diff:

diff --git a/base/gcutils.jl b/base/gcutils.jl
index 0ff53bceb6..0f26be2af7 100644
--- a/base/gcutils.jl
+++ b/base/gcutils.jl
@@ -1,8 +1,8 @@
 # This file is a part of Julia. License is MIT: https://julialang.org/license
 
-==(w::WeakRef, v::WeakRef) = isequal(w.value, v.value)
-==(w::WeakRef, v) = isequal(w.value, v)
-==(w, v::WeakRef) = isequal(w, v.value)
+# ==(w::WeakRef, v::WeakRef) = isequal(w.value, v.value)
+# ==(w::WeakRef, v) = isequal(w.value, v)
+# ==(w, v::WeakRef) = isequal(w, v.value)
 
 """
     finalizer(f, x)
diff --git a/base/hashing.jl b/base/hashing.jl
index e2847342a8..a415f85a5b 100644
--- a/base/hashing.jl
+++ b/base/hashing.jl
@@ -14,7 +14,7 @@ Typically, any type that implements `hash` should also implement its own `==` (h
 `isequal`) to guarantee the property mentioned above.
 """
 hash(x::Any) = hash(x, zero(UInt))
-hash(w::WeakRef, h::UInt) = hash(w.value, h)
+# hash(w::WeakRef, h::UInt) = hash(w.value, h)
 
 ## hashing general objects ##

It builds fine and the only error in the test-set is from deserializing a function:

    JULIA test/serialize                        
Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
serialize: Test Failed at /home/mauro/julia/julia-master/test/serialize.jl:320
  Expression: deserialize(ds) === g2
   Evaluated: getfield(Base.Serializer.__deserialized_types__, Symbol("##976"))() === getfield(Base.Serializer.__deserialized_types__, Symbol("#g#57"))()
Stacktrace:
 [1] (::getfield(, Symbol("##53#56")))(::Base.GenericIOBuffer{Array{UInt8,1}}) at /home/mauro/julia/julia-master/test/serialize.jl:320
 [2] create_serialization_stream(::getfield(, Symbol("##53#56"))) at /home/mauro/julia/julia-master/test/serialize.jl:12
 [3] top-level scope at /home/mauro/julia/julia-master/test/serialize.jl:309
 [4] eval(::Module, ::Expr) at /home/mauro/julia/julia-master/test/testdefs.jl:13
 [5] top-level scope
 [6] include_relative(::Module, ::String) at ./loading.jl:509
 [7] include at ./sysimg.jl:15 [inlined]
 [8] include(::String) at /home/mauro/julia/julia-master/test/testdefs.jl:13
 [9] macro expansion at /home/mauro/julia/julia-master/test/testdefs.jl:22 [inlined]
 [10] macro expansion at /home/mauro/julia/julia-master/usr/share/julia/site/v0.7/Test/src/Test.jl:953 [inlined]
 [11] macro expansion at /home/mauro/julia/julia-master/test/testdefs.jl:21 [inlined]
 [12] macro expansion at util.jl:310 [inlined]
 [13] top-level scope at /home/mauro/julia/julia-master/test/testdefs.jl:19
Worker 1 failed running test serialize:
Some tests did not pass: 104 passed, 1 failed, 0 errored, 0 broken.serialize: Test Failed at /home/mauro/julia/julia-master/test/serialize.jl:320
  Expression: deserialize(ds) === g2
   Evaluated: getfield(Base.Serializer.__deserialized_types__, Symbol("##976"))() === getfield(Base.Serializer.__deserialized_types__, Symbol("#g#57"))()
Stacktrace:
 [1] record(::Test.DefaultTestSet, ::Test.Fail) at /home/mauro/julia/julia-master/usr/share/julia/site/v0.7/Test/src/Test.jl:660
 [2] (::getfield(, Symbol("##42#46")))() at /home/mauro/julia/julia-master/test/runtests.jl:184
 [3] cd(::getfield(, Symbol("##42#46")), ::String) at ./file.jl:70
 [4] top-level scope
 [5] include_relative(::Module, ::String) at ./loading.jl:509
 [6] include(::Module, ::String) at ./sysimg.jl:15
 [7] process_options(::Base.JLOptions) at ./client.jl:340
 [8] _start() at ./client.jl:406

Test Summary: | Pass  Fail  Total
  Overall     |  104     1    105
    serialize |  104     1    105
    FAILURE

So at least a WeakRef test should be added, which explicitly tests the == behavior.

This was referenced Jun 12, 2014

WIP/RFC: metadata implemented, REPL-help ported, user-made doc now possible (issue #3988) #5572

Closed

WeakObjectIdDict & more general dicts JuliaCollections/DataStructures.jl#42

Closed

ViralBShah removed the feature label Feb 14, 2015

mauro3 mentioned this issue Dec 6, 2017

SharedArray access in @parallel for causes Segmentation fault #14764

Closed

mauro3 mentioned this issue Dec 6, 2017

WeakKeyDict should not convert keys as otherwise it can drop them #24941

Closed

This was referenced Jul 19, 2018

WeakKeyIdDict implementation (keeping WeakKeyDict) #28182

Open

WIP/RFC: Converted WeakKeyDict to hash & compare with object-id #28161

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WeakKeyDict based on object ids #3002

WeakKeyDict based on object ids #3002

tanmaykm commented May 3, 2013

JeffBezanson commented May 3, 2013

tanmaykm commented May 3, 2013

JeffBezanson commented May 3, 2013

StefanKarpinski commented May 3, 2013

StefanKarpinski commented May 3, 2013

tanmaykm commented May 3, 2013

JeffBezanson commented May 3, 2013

tanmaykm commented May 3, 2013

tanmaykm commented May 7, 2013

JeffBezanson commented May 7, 2013

tanmaykm commented May 7, 2013

mauro3 commented Dec 6, 2017 •

edited

mauro3 commented Dec 6, 2017

WeakKeyDict based on object ids #3002

WeakKeyDict based on object ids #3002

Comments

tanmaykm commented May 3, 2013

JeffBezanson commented May 3, 2013

tanmaykm commented May 3, 2013

JeffBezanson commented May 3, 2013

StefanKarpinski commented May 3, 2013

StefanKarpinski commented May 3, 2013

tanmaykm commented May 3, 2013

JeffBezanson commented May 3, 2013

tanmaykm commented May 3, 2013

tanmaykm commented May 7, 2013

JeffBezanson commented May 7, 2013

tanmaykm commented May 7, 2013

mauro3 commented Dec 6, 2017 • edited

mauro3 commented Dec 6, 2017

mauro3 commented Dec 6, 2017 •

edited