Skip to content

Conversation

@phaverty
Copy link

While working on using OrderedDict within NamedArrays, I found that the creation of new OrderedDicts was one of the slower parts of NamedArray subsetting. This PR creates an inner constructor for making a new, empty OrderedDict of a specified size. As this new OrderedDict is known to be empty and of the correct size, it can be filled in a simplified fashion. This is good for a 1.6X speedup for creating an OrderedDict with 1e3 key/value pairs.

I have added tests for the new constructor.

…he creation of

new OrderedDicts was one of the slower parts of NamedArray subsetting. This PR
creates an inner constructor for making a new, empty OrderedDict of a specified
size. As this new OrderedDict is known to be empty and of the correct size, it
can be filled in a simplified fashion. This is good for a 1.6X speedup for
creating an OrderedDict with 1e3 key/value pairs.

I have added tests for the new constructor.
@phaverty phaverty closed this Jul 26, 2016
@phaverty phaverty reopened this Jul 26, 2016
@phaverty
Copy link
Author

Closed and re-opened to get CI to run. Sorry for the noise. This version will be the keeper.

@codecov-io
Copy link

codecov-io commented Jul 27, 2016

Current coverage is 91.16% (diff: 88.46%)

Merging #211 into master will increase coverage by 1.25%

@@             master       #211   diff @@
==========================================
  Files            27         27          
  Lines          2288       2297     +9   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           2057       2094    +37   
+ Misses          231        203    -28   
  Partials          0          0          

Powered by Codecov. Last update ccc1e2f...7fa0e05

@phaverty
Copy link
Author

CI failure is on nightly only and is due to absence of Base.complement in 0.5.

@phaverty phaverty closed this Jul 27, 2016
@phaverty phaverty reopened this Jul 27, 2016
@phaverty
Copy link
Author

CI passes on both 0.5 and 0.4 now after the "complement" fix.

end
function OrderedDict(kv)
h = OrderedDict{K,V}()
h = OrderedDict{K,V}(length(kv))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces an assumption that kv supports length.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for having a look. I assumed that there had to be a 'length' method
for 'kv' because it was used in a for loop

for (k,v) in kv

, but I guess 'for' only requires the three functions necessary for an
iterator. Do I want/need to do some kind of Traits thing to dispatch on
whether or not 'kv' has 'length' and then fall back to the old version if
it does not?

While I've got you on the line, do you have any thoughts about having an
inner constructor that makes an empty OrderedDict that can only be used in
the context of these filling functions? Perhaps it would be better to
copy/paste

slotsz = max(16, (n*3)>>1 )
h = new(zeros(Int32,slotsz), Array(K,n), Array(V,n), 0, false)

into the two relevant places.

Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty@gene.com

On Wed, Jul 27, 2016 at 2:45 PM, Keno Fischer notifications@github.com
wrote:

In src/ordered_dict.jl
#211 (comment)
:

 function OrderedDict(kv)
  •    h = OrderedDict{K,V}()
    
  •    h = OrderedDict{K,V}(length(kv))
    

This introduces an assumption that kv supports length.


You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
https://github.com/JuliaLang/DataStructures.jl/pull/211/files/6e26b3fd7faa3ad1303fec37e180b3adcfe9171c#r72530227,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AH02K9RnHMKNrLSC7RY_I4MrxsB7uY6Uks5qZ9GPgaJpZM4JVhGj
.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HasLength is a great idea, thanks. However, I can't see a way to use that
for 0.4. Are infinite iterators really that common? Perhaps it is better
to fail for rare iterators than to require 0.5?

Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty@gene.com

On Wed, Jul 27, 2016 at 3:22 PM, Kristoffer Carlsson <
notifications@github.com> wrote:

In src/ordered_dict.jl
#211 (comment)
:

 function OrderedDict(kv)
  •    h = OrderedDict{K,V}()
    
  •    h = OrderedDict{K,V}(length(kv))
    

There already is Base.HasLength():
https://github.com/JuliaLang/julia/blob/bcc2121dc31647fc0999d09c10df91d35f216838/base/generator.jl#L44


You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
https://github.com/JuliaLang/DataStructures.jl/pull/211/files/6e26b3fd7faa3ad1303fec37e180b3adcfe9171c#r72535317,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AH02KxAQMCfHP-GTtvtIQxZaBHyfI-Mrks5qZ9oMgaJpZM4JVhGj
.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just retain the old version for 0.4?

Peter M. Haverty added 4 commits July 27, 2016 21:36
…back for infinite iterators is the original loop over k/v pairs
…back for infinite iterators is the original loop over k/v pairs
…back for infinite iterators is the original loop over k/v pairs
@phaverty
Copy link
Author

phaverty commented Oct 4, 2016

Now that 0.5 is out, is there a path forward for this PR? It does use one item that is specific to 0.5.

@phaverty phaverty closed this Oct 6, 2016
@phaverty phaverty reopened this Oct 6, 2016
@phaverty phaverty closed this Nov 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants