-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster creation of OrderedDict, revisited #221
Conversation
…back for infinite iterators is the original loop over k/v pairs
…mpatible with 0.4
Current coverage is 89.67% (diff: 53.65%)@@ master #221 diff @@
==========================================
Files 28 29 +1
Lines 2338 2363 +25
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
- Hits 2156 2119 -37
- Misses 182 244 +62
Partials 0 0
|
…elta to pass for this PR. It is difficult because most of the code is version specific.
…n specific code choice
…on method too, so less code 0.4 only and more test coverage in 0.5
next, done, keys, values, setdiff, setdiff!, | ||
union, union!, intersect, filter, filter!, | ||
hash, eltype, KeyIterator, ValueIterator, convert, copy, | ||
merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if this particular spacing change weren't part of this commit.
@@ -6,8 +6,10 @@ | |||
@test isa(OrderedDict(Pair(1, 1.0)), OrderedDict{Int,Float64}) | |||
@test isa(OrderedDict(Pair(1, 1.0), Pair(2, 2.0)), OrderedDict{Int,Float64}) | |||
@test isa(OrderedDict(Pair(1, 1.0), Pair(2, 2.0), Pair(3, 3.0)), OrderedDict{Int,Float64}) | |||
@test isa(OrderedDict([Pair(1, 1.0), Pair(2, 2.0), Pair(3, 3.0)]), OrderedDict{Int,Float64}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though Pair
was used above, I suspect these tests were written before 1 => 1.0
was equivalent to Pair(1,1.0)
. I think it would be clearer to change all of these to use =>
notation.
|
||
dict_with_eltype{K, V}(kv, ::Type{Tuple{K, V}}) = OrderedDict{K, V}(kv) | ||
dict_with_eltype{K, V}(kv, ::Type{Pair{K, V}}) = OrderedDict{K, V}(kv) | ||
dict_with_eltype(kv, t) = OrderedDict{Any, Any}(kv) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: extra space before t
@static if VERSION >= v"0.5.0" | ||
function OrderedDict(kv, isz::Union{HasLength, HasShape}) | ||
n = length(kv) | ||
slotsz = max(16, (n*3)>>1 ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR could be a lot smaller if you did something like the following:
@static if VERSION >= v"0.5.0"
get_n_slotsz(kv) = get_slot_size(kv, iteratorsize(kv))
get_n_slotsz(kv, isz::SizeUnknown) = (0,16)
function get_n_slotsz(kv, isz::Union{HasLength, HasShape})
n = length(kv)
slotsz = max(16, (n*3)>>1)
return n, slotsz
end
else
get_n_slotsz(kv) = (0,16)
end
index = hashindex(k,slotsz) | ||
h.keys[i] = k | ||
h.vals[i] = v | ||
h.slots[index] = i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work, as it doesn't handle index collisions. Use ht_keyindex2
.
|
||
dict_with_eltype{K,V}(kv, ::Type{Tuple{K,V}}) = OrderedDict{K,V}(kv, iteratorsize(kv)) | ||
dict_with_eltype{K,V}(kv, ::Type{Pair{K,V}}) = OrderedDict{K,V}(kv, iteratorsize(kv)) | ||
dict_with_eltype(kv, t) = OrderedDict{Any,Any}(kv, iteratorsize(kv)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block can be removed if you define get_n_slotsz
above. (There is probably a better name for that function...)
…t in order to handle hash collisions
Thanks for the thoughtful review @kmsquire . I think I've implemented all of your suggestions. |
index = ht_keyindex2(h, k) | ||
@inbounds h.keys[i] = k | ||
@inbounds h.vals[i] = v | ||
@inbounds h.slots[-index] = i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closer, but there needs to be a little more defensive programming here. Since kv
is an arbitrary set of key-value pairs, it's possible that a key is repeated, so you'll need to check the return value of index
and respond accordingly. You should double check against setindex!
to make sure there aren't any other corner cases.
I think this should do it. I spent some time with setindex! and friends. We don't have to worry about converting the types of the incoming keys and values as we have created the new OrderedDict to match those K and V types. ht_keyindex2 should handle any necessary rehashing. This commit handles incoming duplicated keys. I think that's it. Please let me know if you can think of anything else. |
end | ||
return h | ||
end | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nitpick: spaces at end of line.
I've pushed a commit that cleans up the extra spaces. Pete Peter M. Haverty, Ph.D. On Mon, Nov 21, 2016 at 3:52 PM, Kevin Squire notifications@github.com
|
Overall, I think this is good. I'm slightly concerned that That said, that's no worse than the current situation, and it should be rare. I'll try to test a little later (but if you have time and want to take a poke at it, please do). Either way, I plan to merge this evening (PST) if there are no other comments, and if a fix is needed for the above, we can fix this constructor at the same time. |
This new constructor algo works for very short OrderedDicts. Let me think about this one a bit more. |
This PR enables much faster creation of OrderedDict objects in julia 0.5 and is backwards compatible with 0.4.