RFC: Allow sorting on OrderedDicts #225

Merged
merged 1 commit into from
Nov 18, 2016
kmsquire
Member

  • Also define sort(d::Dict) to yield an OrderedDict

Supersedes #43
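
For readers skimming the thread, here is a minimal sketch of the behavior described above. It assumes the methods added by this PR are loaded through `using DataStructures` and that the default sort is by key; the sample data is made up for illustration.

```julia
using DataStructures

# A plain Dict has no defined iteration order.
d = Dict('c' => 3, 'a' => 1, 'b' => 2)

# As described in this PR, sort(d::Dict) yields an OrderedDict
# (sorted by key, assuming that is the default).
sorted = sort(d)

# sort! reorders an existing OrderedDict in place.
od = OrderedDict('z' => 26, 'y' => 25, 'x' => 24)
sort!(od)
```

The original `Dict` is left untouched; only the returned `OrderedDict` carries the sorted order.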

@kmsquire
Member Author

Cc: @femtotrader

@quinnj
Contributor

quinnj commented Nov 14, 2016

Shouldn't this produce a SortedDict instead? "Ordered" refers to insertion order, right? Not to any kind of sort ordering?

@codecov-io

codecov-io commented Nov 14, 2016

Current coverage is 92.21% (diff: 92.30%)

Merging #225 into master will increase coverage by <.01%

@@             master       #225   diff @@
==========================================
  Files            28         29     +1   
  Lines          2338       2351    +13   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           2156       2168    +12   
- Misses          182        183     +1   
  Partials          0          0          

Powered by Codecov. Last update 5d53946...a7ff9fa

@kmsquire
Member Author

@quinnj, that's an option if you only want to sort by keys, and you can already do this with:

julia> using DataStructures

julia> d = OrderedDict(zip(reverse('a':'z'), 26:-1:1));

julia> SortedDict(d)
DataStructures.SortedDict{Char,Int64,Base.Order.ForwardOrdering} with 26 entries:
  'a' => 1
  'b' => 2
  'c' => 3
  'd' => 4
  'e' => 5
  'f' => 6
  'g' => 7
  'h' => 8
  'i' => 9
  'j' => 10
  'k' => 11
  'l' => 12
  'm' => 13
  'n' => 14
  'o' => 15
  'p' => 16
  'q' => 17
  'r' => 18
  's' => 19
  't' => 20
  'u' => 21
  'v' => 22
  'w' => 23
  'x' => 24
  'y' => 25
  'z' => 26

However:

  1. this code also lets one sort by value, and an OrderedDict is the easiest structure to store the output in.
  2. it may be more efficient to insert a large number of keys, followed by sorting, rather than using a SortedDict. (Untested, and probably depends on a number of factors.)
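
A sketch of point 1, sorting by value and storing the result in an OrderedDict. The first route uses only `Base.sort` on the collected pairs. The second assumes the `byvalue` keyword on the `sort`/`sort!` methods for OrderedDict, which is not spelled out in this thread, so treat it as an assumption. The sample data is hypothetical.

```julia
using DataStructures

counts = OrderedDict("pear" => 3, "apple" => 7, "fig" => 1)

# Route 1: sort the key/value pairs by value, then rebuild an OrderedDict.
by_value = OrderedDict(sort(collect(counts); by = last))

# Route 2: assumed `byvalue` keyword on the sort method for OrderedDict.
by_value_kw = sort(counts; byvalue = true)
```

Both routes return a new OrderedDict and leave `counts` in its original insertion order.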

As for the meaning of "Ordered", the default ordering is, of course, insertion order, but I think it can also be taken to mean "maintaining order" (à la arrays). Of course, it doesn't support all the operations that arrays do.

@femtotrader seems to have a use for this, so perhaps he can comment on his use case.

@kmsquire
Member Author

kmsquire commented Nov 14, 2016

Julia nightly is timing out on linux. All other builds pass.

@femtotrader
Contributor

femtotrader commented Nov 15, 2016

OrderedDict is ordered by insertion, and so can be reordered in many ways.

But I wonder if there is an associated key/value data structure that is ordered by value (SortedDict is ordered by key)?

That would probably be much more efficient when the dict has a huge number of keys.

Use case is shown in JuliaStats/StatsBase.jl#223

@kmsquire
Member Author

I haven't looked at the code recently, but it should be possible to modify a SortedDict to sort by value instead of key.

I still think the code here is useful, though.

@kmsquire kmsquire changed the title Allow sorting on OrderedDicts RFC: Allow sorting on OrderedDicts Nov 15, 2016
@femtotrader
Contributor

+1 I also think the code here is useful (even if it's less efficient than using an appropriate datastructure).

Having a SortedDict by values is also something that should be implemented.

Should I open an issue for that?

@StephenVavasis
Contributor

A data structure that supports lookup by keys or values could be implemented via two SortedDicts, one for keys=>values and the other for values=>keys.

However, I'm skeptical about adding this data structure to the library DataStructures.jl. The reason for my skepticism is that there are many variants on this idea, and they would each require a different implementation. For example, perhaps keys=>values would be a SortedDict while values=>keys would be a SortedMultiDict (if multiple keys yield the same value). Or maybe one direction would be a Dict if you need support only for lookup but not for ordering. Furthermore, if each (key,value) pair has additional nonindexable data associated, i.e., there are actually triples of the form (key,value,otherstuff), then this would lead to further implementation choices regarding where the other stuff is stored. It seems to me that each implementation choice involves a different piece of code in DataStructures.jl. Every time a new piece of code is added to DataStructures.jl, a whole bunch of boilerplate code (eltype(), eachindex(), isempty(), etc.) needs to be written, and all of this needs testing. This is a lot of code to write!
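
To make the simplest variant concrete, here is a rough sketch of the two-SortedDict idea. It assumes values are unique (otherwise the reverse map would need something like a SortedMultiDict, as noted above); `fwd`, `rev`, and `set_pair!` are hypothetical names, not part of DataStructures.jl.

```julia
using DataStructures

fwd = SortedDict{String,Int}()   # key => value, iterates in key order
rev = SortedDict{Int,String}()   # value => key, iterates in value order

# Hypothetical helper: keeps both maps in sync. Assumes values are unique.
function set_pair!(fwd, rev, k, v)
    # Drop the stale reverse entry when k is being overwritten.
    if haskey(fwd, k)
        delete!(rev, fwd[k])
    end
    fwd[k] = v
    rev[v] = k
    return nothing
end

set_pair!(fwd, rev, "pear", 3)
set_pair!(fwd, rev, "apple", 7)
set_pair!(fwd, rev, "pear", 1)   # overwrite: the entry 3 => "pear" is removed

keys_in_key_order = collect(keys(fwd))
keys_in_value_order = collect(values(rev))
```

Even this small version needs a decision about duplicate values, which illustrates why each variant would end up as its own implementation.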

@kmsquire
Member Author

kmsquire commented Nov 16, 2016

@StephenVavasis, I think the recommendation was that the dictionary be sorted by value for iteration purposes. You are right that this would require a different data structure (e.g., a hash which points into a tree sorted by value), and this isn't the simple change I suggested above.
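
One way to picture "a hash which points into a tree sorted by value" is the sketch below. It is only an illustration, not a proposed implementation: it pairs a `Dict` for lookup with a `SortedSet` of `(value, key)` tuples for value-ordered iteration, and `index`, `by_value`, and `set_value!` are hypothetical names.

```julia
using DataStructures

index = Dict{String,Int}()                  # hash: key => current value
by_value = SortedSet{Tuple{Int,String}}()   # tree ordered by (value, key)

# Hypothetical helper: updates the hash and the tree together.
function set_value!(index, by_value, k, v)
    # Remove the old (value, key) entry from the tree before updating.
    if haskey(index, k)
        delete!(by_value, (index[k], k))
    end
    index[k] = v
    push!(by_value, (v, k))
    return nothing
end

set_value!(index, by_value, "pear", 3)
set_value!(index, by_value, "apple", 7)
set_value!(index, by_value, "fig", 3)     # duplicate values are fine here
set_value!(index, by_value, "apple", 1)   # update moves "apple" to the front

keys_by_value = [k for (v, k) in by_value]
```

Storing the key alongside the value in the tree breaks ties between equal values, at the cost of keeping each key twice.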

* Also define sort(Dict) to yield an OrderedDict
@kmsquire
Member Author

Tests are passing. I plan to merge this later today PST (i.e., in about 4-6 hours) if there aren't any further comments.
Cc: @rawls238 @DanielArndt

@DanielArndt
Collaborator

Wish I could give this the time it deserves, but I took a quick look through and it looks good to me!

@kmsquire kmsquire merged commit af2c97e into master Nov 18, 2016
@kmsquire kmsquire deleted the kms/dict_sorting branch November 18, 2016 03:28
@kmsquire
Member Author

@DanielArndt, thanks for the quick look!

7 participants