-
Notifications
You must be signed in to change notification settings - Fork 262
Filling a new OrderedDict quickly #211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…he creation of new OrderedDicts was one of the slower parts of NamedArray subsetting. This PR creates an inner constructor for making a new, empty OrderedDict of a specified size. As this new OrderedDict is known to be empty and of the correct size, it can be filled in a simplified fashion. This is good for a 1.6X speedup for creating an OrderedDict with 1e3 key/value pairs. I have added tests for the new constructor.
|
Closed and re-opened to get CI to run. Sorry for the noise. This version will be the keeper. |
Current coverage is 91.16% (diff: 88.46%)@@ master #211 diff @@
==========================================
Files 27 27
Lines 2288 2297 +9
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 2057 2094 +37
+ Misses 231 203 -28
Partials 0 0
|
|
CI failure is on nightly only and is due to absence of Base.complement in 0.5. |
|
CI passes on both 0.5 and 0.4 now after the "complement" fix. |
src/ordered_dict.jl
Outdated
| end | ||
| function OrderedDict(kv) | ||
| h = OrderedDict{K,V}() | ||
| h = OrderedDict{K,V}(length(kv)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This introduces an assumption that kv supports length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for having a look. I assumed that there had to be a 'length' method
for 'kv' because it was used in a for loop
for (k,v) in kv
, but I guess 'for' only requires the three functions necessary for an
iterator. Do I want/need to do some kind of Traits thing to dispatch on
whether or not 'kv' has 'length' and then fall back to the old version if
it does not?
While I've got you on the line, do you have any thoughts about having an
inner constructor that makes an empty OrderedDict that can only be used in
the context of these filling functions? Perhaps it would be better to
copy/paste
slotsz = max(16, (n*3)>>1 )
h = new(zeros(Int32,slotsz), Array(K,n), Array(V,n), 0, false)
into the two relevant places.
Pete
Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty@gene.com
On Wed, Jul 27, 2016 at 2:45 PM, Keno Fischer notifications@github.com
wrote:
In src/ordered_dict.jl
#211 (comment)
:function OrderedDict(kv)
h = OrderedDict{K,V}()h = OrderedDict{K,V}(length(kv))This introduces an assumption that kv supports length.
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
https://github.com/JuliaLang/DataStructures.jl/pull/211/files/6e26b3fd7faa3ad1303fec37e180b3adcfe9171c#r72530227,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AH02K9RnHMKNrLSC7RY_I4MrxsB7uY6Uks5qZ9GPgaJpZM4JVhGj
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There already is Base.HasLength(): https://github.com/JuliaLang/julia/blob/bcc2121dc31647fc0999d09c10df91d35f216838/base/generator.jl#L44
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HasLength is a great idea, thanks. However, I can't see a way to use that
for 0.4. Are infinite iterators really that common? Perhaps it is better
to fail for rare iterators than to require 0.5?
Pete
Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty@gene.com
On Wed, Jul 27, 2016 at 3:22 PM, Kristoffer Carlsson <
notifications@github.com> wrote:
In src/ordered_dict.jl
#211 (comment)
:function OrderedDict(kv)
h = OrderedDict{K,V}()h = OrderedDict{K,V}(length(kv))There already is Base.HasLength():
https://github.com/JuliaLang/julia/blob/bcc2121dc31647fc0999d09c10df91d35f216838/base/generator.jl#L44—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
https://github.com/JuliaLang/DataStructures.jl/pull/211/files/6e26b3fd7faa3ad1303fec37e180b3adcfe9171c#r72535317,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AH02KxAQMCfHP-GTtvtIQxZaBHyfI-Mrks5qZ9oMgaJpZM4JVhGj
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just retain the old version for 0.4?
…back for infinite iterators is the original loop over k/v pairs
…back for infinite iterators is the original loop over k/v pairs
…back for infinite iterators is the original loop over k/v pairs
|
Now that 0.5 is out, is there a path forward for this PR? It does use one item that is specific to 0.5. |
While working on using OrderedDict within NamedArrays, I found that the creation of new OrderedDicts was one of the slower parts of NamedArray subsetting. This PR creates an inner constructor for making a new, empty OrderedDict of a specified size. As this new OrderedDict is known to be empty and of the correct size, it can be filled in a simplified fashion. This is good for a 1.6X speedup for creating an OrderedDict with 1e3 key/value pairs.
I have added tests for the new constructor.