Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster creation of OrderedDict, revisited #221

Closed
wants to merge 19 commits into from
Closed

Faster creation of OrderedDict, revisited #221

wants to merge 19 commits into from

Conversation

phaverty
Copy link

This PR enables much faster creation of OrderedDict objects in julia 0.5 and is backwards compatible with 0.4.

Peter M. Haverty added 2 commits July 27, 2016 21:36
@codecov-io
Copy link

codecov-io commented Oct 29, 2016

Current coverage is 89.67% (diff: 53.65%)

Merging #221 into master will decrease coverage by 2.54%

@@             master       #221   diff @@
==========================================
  Files            28         29     +1   
  Lines          2338       2363    +25   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits           2156       2119    -37   
- Misses          182        244    +62   
  Partials          0          0          

Powered by Codecov. Last update 52615e8...2678dd4

Peter M. Haverty added 2 commits November 14, 2016 10:55
…elta to pass for this PR. It is difficult because most of the code is version specific.
@phaverty phaverty closed this Nov 16, 2016
@phaverty phaverty reopened this Nov 16, 2016
next, done, keys, values, setdiff, setdiff!,
union, union!, intersect, filter, filter!,
hash, eltype, KeyIterator, ValueIterator, convert, copy,
merge
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if this particular spacing change weren't part of this commit.

@@ -6,8 +6,10 @@
@test isa(OrderedDict(Pair(1, 1.0)), OrderedDict{Int,Float64})
@test isa(OrderedDict(Pair(1, 1.0), Pair(2, 2.0)), OrderedDict{Int,Float64})
@test isa(OrderedDict(Pair(1, 1.0), Pair(2, 2.0), Pair(3, 3.0)), OrderedDict{Int,Float64})
@test isa(OrderedDict([Pair(1, 1.0), Pair(2, 2.0), Pair(3, 3.0)]), OrderedDict{Int,Float64})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though Pair was used above, I suspect these tests were written before 1 => 1.0 was equivalent to Pair(1,1.0). I think it would be clearer to change all of these to use => notation.


dict_with_eltype{K, V}(kv, ::Type{Tuple{K, V}}) = OrderedDict{K, V}(kv)
dict_with_eltype{K, V}(kv, ::Type{Pair{K, V}}) = OrderedDict{K, V}(kv)
dict_with_eltype(kv, t) = OrderedDict{Any, Any}(kv)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: extra space before t

@static if VERSION >= v"0.5.0"
function OrderedDict(kv, isz::Union{HasLength, HasShape})
n = length(kv)
slotsz = max(16, (n*3)>>1 )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR could be a lot smaller if you did something like the following:

    @static if VERSION >= v"0.5.0"
        get_n_slotsz(kv) = get_slot_size(kv, iteratorsize(kv))
        get_n_slotsz(kv, isz::SizeUnknown) = (0,16)
        function get_n_slotsz(kv, isz::Union{HasLength, HasShape})
            n = length(kv)
            slotsz = max(16, (n*3)>>1)
            return n, slotsz
        end
    else
        get_n_slotsz(kv) = (0,16)
    end

index = hashindex(k,slotsz)
h.keys[i] = k
h.vals[i] = v
h.slots[index] = i
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work, as it doesn't handle index collisions. Use ht_keyindex2.


dict_with_eltype{K,V}(kv, ::Type{Tuple{K,V}}) = OrderedDict{K,V}(kv, iteratorsize(kv))
dict_with_eltype{K,V}(kv, ::Type{Pair{K,V}}) = OrderedDict{K,V}(kv, iteratorsize(kv))
dict_with_eltype(kv, t) = OrderedDict{Any,Any}(kv, iteratorsize(kv))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block can be removed if you define get_n_slotsz above. (There is probably a better name for that function...)

@phaverty
Copy link
Author

Thanks for the thoughtful review @kmsquire . I think I've implemented all of your suggestions.

index = ht_keyindex2(h, k)
@inbounds h.keys[i] = k
@inbounds h.vals[i] = v
@inbounds h.slots[-index] = i
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closer, but there needs to be a little more defensive programming here. Since kv is an arbitrary set of key-value pairs, it's possible that a key is repeated, so you'll need to check the return value of index and respond accordingly. You should double check against setindex! to make sure there aren't any other corner cases.

@phaverty
Copy link
Author

I think this should do it. I spent some time with setindex! and friends. We don't have to worry about converting the types of the incoming keys and values as we have created the new OrderedDict to match those K and V types. ht_keyindex2 should handle any necessary rehashing. This commit handles incoming duplicated keys. I think that's it. Please let me know if you can think of anything else.

end
return h
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick: spaces at end of line.

@phaverty
Copy link
Author

I've pushed a commit that cleans up the extra spaces.

Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty@gene.com

On Mon, Nov 21, 2016 at 3:52 PM, Kevin Squire notifications@github.com
wrote:

@kmsquire commented on this pull request.

In src/ordered_dict.jl
#221 (review)
:

     end
     return h
 end

Minor nitpick: spaces at end of line.


You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
#221 (review),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AH02K01ojtnbmLrFRm-iDlfrCA9h5-Ceks5rAi7AgaJpZM4KkPgh
.

@kmsquire
Copy link
Member

Overall, I think this is good.

I'm slightly concerned that OrderedDicts (and this PR) might be susceptible to JuliaLang/julia#15077, but the implementation has changed enough since then that it's hard to test (we pass the test in Base, but the arrays are not sized the same as described here, and I believe if it were sized the same, the test would fail, since we don't have the appropriate checks in place. Cc: @StefanKarpinski

That said, that's no worse than the current situation, and it should be rare. I'll try to test a little later (but if you have time and want to take a poke at it, please do). Either way, I plan to merge this evening (PST) if there are no other comments, and if a fix is needed for the above, we can fix this constructor at the same time.

@phaverty
Copy link
Author

This new constructor algo works for very short OrderedDicts. Let me think about this one a bit more.

@phaverty phaverty closed this Nov 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants