This commit adds two new data structures to the library: SortedMultiDict #88

StephenVavasis · 2015-04-29T22:31:50Z

and SortedSet. See README.rst for more info.

StephenVavasis · 2015-05-03T21:39:04Z

Jiahao,

I will remove the println's (I assume you mean for "test1", "test2" etc.) in the next commit. As for the inbound statements, I put them in the inner routines in balancedTree.jl; these routines are not supposed to be called directly by the user but instead are called by the wrapper routines which provide their own correctness checks. However, as you say, there is not much impact on performance so it is no problem to remove them. Finally, I didn't understand the point of declaring allowdups=false; sometimes checkcorrectness is called with allowdups=true and other times false, so I'm not clear on the benefit of setting a default value.

jiahao · 2015-05-03T22:32:43Z

Just to clarify,

f(x=0) = x+1

defines two methods

f(x) = x+1
f() = f(0)

Specifying a default simply allows users to not have to specify that argument always. It doesn't mean that the variable always has the default variable.

StephenVavasis · 2015-05-04T00:46:44Z

Jiahao,

This is fine; I can also add the default argument to checkcorrectness in the next commit. Thanks for taking the time to read my code.

Just so you know, I have in mind to make one more major commit to this sub-package (sorted containers), namely, I would like to use a cache-optimized tree structure. Right now I'm looking at the 2005 SIAM J. Comput. paper by Bender, Demaine and Farach-Colton, which appears to describe an appropriate structure.

-- Steve

StephenVavasis · 2015-05-08T20:36:28Z

Jiahao,

Should I make the updates discussed in your previous messages and submit another pull request, or are you still going through the code?

jiahao · 2015-05-08T20:41:17Z

Go ahead with the updates; I haven't read more since.

StephenVavasis · 2015-05-09T01:22:19Z

Jiahao,

I have made a new commit and pull request addressing the topics discussed. I would like to mention also that the DataStructures test set failed in the latest build of Julia (commit 3ab9af1*). It failed in the test_ordereddict with an UndefVarError, line 48, 'Ranges not defined'. This is not my code so I won't attempt to fix it (unless you want me to try).

StephenVavasis · 2015-05-15T01:37:22Z

Jiahao,

The discussion concerning issue #93 shows that the sorted containers need more work. In particular, the whole token interface has a performance issue, so I will need to create a higher-performance semitoken interface that will involve several new functions. So please put this pull request on hold until I make another commit with the new semitoken interface.

-- Steve

kmsquire · 2015-05-15T05:47:58Z

Hi Stephen, the standard way to do this is to edit the title and add "WIP" (work in progress) to the front.

StephenVavasis · 2015-05-29T04:41:37Z

The branch called 'SortedContainers' in https://github.com/StephenVavasis/DataStructures.jl has a new commit that is ready to be merged into the master; I have fixed the performance issue and switched to a semitoken-based interface. I'm not sure if I need to submit a new pull request since there is already a pending pull request from that branch.

tkelman · 2015-05-29T07:18:11Z

See at the top of the page,

StephenVavasis wants to merge 8 commits into JuliaLang:master from StephenVavasis:SortedContainers

So any additional commits you push to your SortedContainers branch at https://github.com/StephenVavasis/DataStructures.jl will automatically be added to this same pull request. According to Travis CI, there's an error when running on Julia 0.3. Do you have a local copy of Julia 0.3 where you could run Pkg.test("DataStructures") with this branch checked out in ~/.julia/v0.3/DataStructures ?

StephenVavasis · 2015-05-29T16:39:15Z

@tkelman, sorry, I should have run the tests in 0.3 to notice that I forgot some compat declarations. This is fixed now in the latest commit on the SortedContainers branch. By the way, the package testing gives a deprecation warning concerning OrderedDict (not my code) in both 0.3 and 0.4.

kmsquire · 2015-05-30T05:31:10Z

src/containerloops.jl

What does SC stand for?

I suppose it stands for Sorted Container, which is a bit redundant, isn't
it? I can fix this name.

-- Steve

On Fri, 29 May 2015, Kevin Squire wrote:

In src/containerloops.jl:

@@ -0,0 +1,240 @@
+import Base.keys
+import Base.values
+
+## These are the containers that can be looped over
+## The suffix 0 is for SortedDict and SortedMultiDict
+## The suffix 1 is for SortedSet. No suffix means any of the
+## three.
+
+typealias SCContainers0 Union(SortedDict, SortedMultiDict)
+typealias SCContainers Union(SCContainers0, SortedSet)

What does SC stand for?

—
Reply to this email directly or view it on GitHub.[AHhEh8nA6KxKxoW-hT2ff45Uu3MfoInQks5oOUKfgaJpZM4EMDTc.gif]

Okay. Rather than suffixing types and functions with 0 or 1, would you mind being slightly more descriptive? It's great that you described the naming convention in a comment (and your descriptive comments overall are very much appreciated), but it would be nice to be able to look at the code and immediately understand what it means.

Of course, the names themselves do get pretty long at some point... perhaps an SDD prefix for SortedDict+SortedMultiDict and and SS for SortedSet? I'm not wedded to this, so if you have a better idea, go for it.

kmsquire · 2015-05-30T06:18:40Z

Although not documented (AFAICT--cc: @vtjnash), the new Ref and RefValue types are dereferenced using x[] (i.e. getindex(x::RefValue)), as

ulia> x = 1
1

julia> r = Ref(x)
Base.RefValue{Int64}(1)

julia> r[]
1

julia> typeof(r[])
Int64

Because it's conceptually the same operation as deref(x::Token), we should probably have a similar shorthand for tokens. (In truth, I find it a little strange, but it'll probably stick.)

kmsquire · 2015-05-30T06:24:39Z

README.rst

For Julia 0.4, the standard way to create dictionaries and arrays (edit: and sets) will be to explicitly supply the types, as

julia> Dict{Int, String}() Dict{Int64,AbstractString} with 0 entries julia> Array{Int,2}(2,3) 2x3 Array{Int64,2}: 21474836485 0 0 1 0 0 julia> Vector{Float64}() 0-element Array{Float64,1} julia> Set{UTF8String}() Set{UTF8String}()

@kmsquire and @vtjnash,

I looked at refpointer.jl, and here is my interpretation of the purpose of this code. The idea is to be able to pass an immutable object, say x, a Float64, to a function f and yet the function can still mutate it. This is accomplished as follows. If x is bare Float64, then it is wrapped in a type. If x is an element of an array a at position i, then (a,i) are passed to f. However, f does not need to know what is the source of x since its formal argument can be Ref{Float64}, which covers both cases.

I can certainly extend this to the three sorted containers (@kmsqure: I'm not sure this is directly related to the deref operation in my code, but maybe you can explain more) I have a couple of questions about this construction.

(1) Is there a plan to extend this to associative? In other words, if D is of type Dict{ASCIIString,Float} and k is a string (key), then one could say x = RefDict(D,k) to get another Ref{Float} object? If so, is the extension to associative already written?

(2) You are probably already aware of this, but there is a performance issue with this construct. Specifically, if one says x = Ref(a,i) where a is an array and i is a subscript, then the new object x is created on the heap rather than the stack (even though internally it consists of just a pointer and an integer), so you pay a price to use this in an inner loop.

I believe this was in response to this comment: #88 (comment)

See my response there.
(On github, it's attached inline to an unrelated comment).

kmsquire · 2015-05-30T18:13:56Z

From #88 (comment):

I looked at refpointer.jl, and here is my interpretation of the purpose of this code. The idea is to be able to pass an immutable object, say x, a Float64, to a function f and yet the function can still mutate it. This is accomplished as follows. If x is bare Float64, then it is wrapped in a type. If x is an element of an array a at position i, then (a,i) are passed to f. However, f does not need to know what is the source of x since its formal argument can be Ref{Float64}, which covers both cases.

Sorry, I really should have been more specific. I was only suggesting that you add

getindex(t::SDToken) = deref(t)

so that t[] becomes an alias for deref(t).

Sorry to lead you down a rabbit hole!

StephenVavasis · 2015-06-08T02:50:23Z

@kmsquire,

I have uploaded a new commit of DataStructures that addresses your comments. Unfortunately, it doesn't run under 0.3.9 according to Travis because I made a reference to Base.Ref, which is not in 0.3.9. I should have protected the references to Base.Ref with some version checks. And compounding this mistake, I am not currently able to install 0.3.9 to fix and test this myself because of

JuliaLang/julia#11612

So before you look at this commit, please wait a day or two until I can figure out how to install 0.3.9 and then update my code with version-checks so that it passes Travis.

Meanwhile, I'll just mention again to @vtjnash that I have implemented Ref{T} for SortedDict and SortedMultiDict in this commit even though AFAIK nobody yet has implemented Ref{T} for the base Dict{K,T}. Ideally, someone will implement it for Dict soon so that I can make my implementation compatible.

kmsquire · 2015-06-08T03:29:41Z

Roger that. Just ping again when it's ready for review.

StephenVavasis · 2015-06-08T16:06:46Z

Kevin, I fixed the 0.3.9 compatibility issue, so the package is ready for another round of reviewing. I did not yet address the two related pull requests #90 and #99.

kmsquire · 2015-06-26T08:52:12Z

README.rst

While I remember the discussion where you found this to be true, I'm find this description challenging to understand. In particular, this syntax seems to indicate that I can pass a tuple (m,s) to deref, yet I don't see that function anywhere--did I miss it? Can you point to the original discussion where you discovered this?

StephenVavasis · 2015-06-28T03:50:51Z

Kevin,

Thanks for the discussion. A couple of points: First, I have redefined the token type for this pull request to be a tuple (m,s) where m is the container and s is the semitoken. In the previous version of my code, token was a small immutable structure. This redefinition is precisely so that deref((m,s)) and similar functions can execute without necessarily creating an auxiliary object on the heap to hold the token. However, the auxiliary object is created if the code is invoked like this:

   t = (m,s)
   k,v = deref(t)

rather than like this

   k,v = deref((m,s))

Second, with regard to Ref{T}, I found the documentation of Ref{T} in the manual (I think this documentation may be new; I couldn't find it when I looked last month) and re-read the code in refpointer.jl. I think now that my implementation is incomplete because I also need to write an 'unsafe_convert' operation so that my ref_value would work in ccall. .

But perhaps it is premature to include this feature at all? It is not yet implemented for the built-in Dict, so maybe it is best for me to postpone including it until it is implemented for Dict.

Also, regarding Ref{T}, it is true that the manual classifies it as part of the ccall interface, but as far as I can see, it also provides a useful mechanism for Julia itself. In particular, it can be used to pass a Float64 (for example) to a function in a way so that the function can store its answer at the passed-in Float64 and pass it back to the caller. This works even though the called function does not know whether the passed-in location is a bare Float64 or a Float64 that occurs inside some larger structure.

kmsquire · 2015-06-29T02:15:42Z

Hi Stephen, thanks for the explanation. I had missed the change of Tokens from immutable types to tuples. In truth, I like find the immutable type version clearer, but I understand the reasoning.

Regarding the Ref type, thank you for your thoughts. I knew of the introduction of Ref, which is why I mentioned it, but I hadn't considered its use beyond ccall. It makes sense that it could be used in this way, and since you have a strong C++ background, it makes sense that you would gravitate toward this use.

I'm going to agree with your last statement, though--for now, let's wait on this until the Ref implementation shakes out a little more. In one case (JuliaLang/julia#11683), uses of Ref were reverted because of performance issues, and it's unclear on the recommended use with regards to Julia code at this point. But by all means, if use of it is introduced for Dict objects, it should certainly be introduced here.

kmsquire · 2015-07-15T04:35:10Z

src/tokens2.jl

Just curious, is there a reason you chose to use keytype and datatype instead of writing this as (the more standard)

@inline function setindex!{K,V}(m::SortedDict{K,V}, d_, i::IntSemiToken) has_data((m,i)) m.bt.data[i.address] = KDRec{K,V}(m.bt.data[i.address].parent, m.bt.data[i.address].k, convert(V,d_))

Either way, setindex! would be specialized for the different K and V types, (and there's no need to capture the Ordering if you don't use it).

Would you mind changing this (and the other occurrences indicated below) to this style?

kmsquire · 2015-07-15T04:42:35Z

src/sortedDict.jl

Even if you make the changes I suggested, I did see that this might be useful when using typealiases, such as SDMContainer, since those aren't parameterized. I'll note that these can be written as

@inline keytype{K}(m::SortedDict{K}) = K @inline datatype{K,D}(m::SortedDict{K,D}) = D

Minor: For consistency with the other dictionary types (here and in Base), the D type parameter is typically written as V for value. But I don't care if that is changed here--I'd rather get this finished and in soon.

StephenVavasis · 2015-07-19T01:43:58Z

Kevin,

I did some small-scale checking, and it appears that the parameterized convert(K,k_) version and the nonparameterized convert(keytype(m), k_) version are generating the same code. I used the latter partly because in an earlier round of reviewing you made a comment that too many parameters make the code hard to read, and I figured this was a way to reduce the number of parameters.

Meanwhile, the idea of using small functions returning a type rather than explicitly spelling out all the parameters does have a precedent in Julia. For example, here is a snippet from dict.jl:

_tt1{A,B}(::Type{Tuple{A,B}}) = A
_tt2{A,B}(::Type{Tuple{A,B}}) = B
eltype{D}(::Type{KeyIterator{D}}) = _tt1(eltype(D))
eltype{D}(::Type{ValueIterator{D}}) = _tt2(eltype(D))

The most recent commit on the SortedContainres branch is failing because I forgot to update test_sortedcontainers to match the change in the eltype function. I am about to post a new commit that fixes this.

kmsquire · 2015-07-19T04:04:26Z

Ok, fair enough--this is an improvement over having all three type parameters. I personally find the version with one type parameter cleaner still, but I won't hold you to it.

kmsquire · 2015-07-19T04:10:12Z

@yuyichao, if you have time, it would be great if this PR got another set of eyes. I try to review it as I find the time, but I usually only find the time to take it in small parts, Stephen makes changes responding to my comments, and I look at it again a week or two later. It goes pretty slowly.

yuyichao · 2015-07-19T05:17:34Z

Unfortunately I don't think I'm qualified to review this.

I don't see anything obviously wrong and I basically don't know anything about data structures... (The only time I learnt about it is in a one semester undergrad level course......)

ScottPJones · 2015-07-19T15:37:55Z

@kmsquire If you don't mind me being the extra set of eyes, I'll go over it this afternoon.
This is useful stuff for me anyway, and I've done a lot of this sort of thing before (although in C).

kmsquire · 2015-07-20T01:42:46Z

@ScottPJones, sure that would be fine. Generally, I believe the algorithm to be fine. What I'm mostly looking for are things which might be done more efficiently, or done in a more Julian way. Or documentation that might be unclear. If you see any of that, can you highlight it? When you get the chance, of course.

ScottPJones · 2015-07-20T02:20:17Z

Done more efficiently, I can handle, done in a more Julian way, well, I'll try to apply all that I've learned from you all over the last three months! I've already started going over it.

kmsquire · 2015-07-20T02:22:36Z

@StephenVavasis, at this point, it would probably be good if you went through the inline comments at the github page with your changes, and review to see which comments you have or have not addressed.

(Sometimes it's not obvious, because you respond mostly by email, but this is the interface used to review your changes.)

Once all looks good, it would then be good to squash the commits (hopefully, @yuyichao's directions on your other PR helped).

StephenVavasis · 2015-07-21T07:30:52Z

Kevin,

I've gone through the comments in this thread to make sure they have been addressed. One comment you made on May 30 implies that you would like to see more constructors for SortedDict. In particular, you seem to be implying that SortedDict should implement the same constructors as Dict. Is this correct? If so, I can write more constructors and make another commit. Aside from that, I think I have addressed all the other comments. Similarly, I can also make SortedSet implement the same constructors as Set.

-- Steve

kmsquire · 2015-07-21T14:47:56Z

Perfect. Why don't you squash the commits, make sure the patch applies, and we'll merge. We can address anything else in future commits.

kmsquire · 2015-07-21T14:50:18Z

And yes, it would be good if SortedDict and SortedSet had the same kinds of constructors as those in base.

kmsquire · 2015-07-21T15:01:06Z

Also, Jeffrey Sarnoff left some feedback on the julia-dev mailing list, in a thread where I asked for help reviewing this PR. I'll post a link later if you need it.

StephenVavasis · 2015-07-22T10:47:04Z

Kevin,

I have made the documentation edits suggested by Jeffrey Sarnoff and squashed the commits. I decided to postpone writing the additional constructors because there are some subteties that need more thought. For example, the following is a valid Dict: d = Dict(4=>"a", 5.0=>"b") even though the first key is an Int and the second is a Float64. How should the corresponding call to the SortedDict constructor behave? I need to mull this over.

-- Steve

yuyichao · 2015-07-22T10:53:13Z

julia> @which Dict(4=>"a", 5.0=>"b")
call{V}(::Type{Dict{K,V}}, ps::Pair{K,V}...) at dict.jl:427

And in general you should refer to the Dict constuctors in dict.jl

StephenVavasis · 2015-07-22T12:57:26Z

Yichao,

Yes, I will try to mimic all the constructors in set.jl and dict.jl. But meanwhile, maybe we can proceed with this pull request without the additional constructors. Also, after this pull request is done I'll return to correcting the functionality for 'in(k, keys(...))'.

-- Steve

yuyichao · 2015-07-22T13:02:33Z

I agree and just to be clear, I'm not saying you should implement that before this is merged, just want to point you to the relavant code you should mirror.

I think @kmsquire should decide when to merge this although you may need a rebase before that.

kmsquire · 2015-07-22T17:29:32Z

Hi Stephen, thanks for squashing. You still have 2 commits, which is fine, but even merging these together would be good.

As @yuyichao pointed out, can you also rebase on the current master?

You might be familiar with this (or you might not), but just in case, this should look something like

git fetch
git rebase origin/master
# edit any files with merge conflicts
git add <file_which_had_conflict.jl> <other_file_which_had_conflict>
git rebase --continue
git push -f <your repo> SortedContainers  # force push is needed

For merge conflicts, I usually edit by hand, but you might look at http://stackoverflow.com/questions/161813/fix-merge-conflicts-in-git for inspiration.

@inbounds

and SortedSet. See README.rst for more info. Deleted @inbounds; deleted extraneous println; inserted default argument allowdups as per Jiahao's suggestions. Rewrote container loops to partially address issue number 93 (excessive heap allocation) Corrected tiny typo Fixed more bugs in containerloops All sorted containers updated with semitoken interface instead of token interface As mentioned in issue 93 and ensuing discussion, the previous 'token' interface for indexing containers suffers from a performance issue. It has been replaced with a semitoken interface. Fix error on 0.4-dev, allow running tests without installing Fixed some documentation typos regarding new semitoken interface. Add @compat declarations for backwards compatibility with Julia 0.3 This commit addresses Kevin Squire's recent comments. Edits to previous commit to make package compatible with Julia 0.3.9 Remove ref_value and Ref{T} type from source, test, and documentation Added 'eltype' functions for all container loops; fixed old-style tuple-type eltypes Fix the statements that test eltype Edit README.rst according to comments by Jeffrey Sarnoff

StephenVavasis · 2015-07-24T21:41:28Z

I have rebased to the most recent master. -- Steve

kmsquire · 2015-08-06T11:57:46Z

Merging. As always, sorry for the delay. Thanks for your work and your patience, @StephenVavasis.

This commit adds two new data structures to the library: SortedMultiDict

IainNZ mentioned this pull request May 8, 2015

add Multidict #90

Closed

kmsquire reviewed May 30, 2015
View reviewed changes

StephenVavasis mentioned this pull request Jun 7, 2015

On 0.4 allow SortedDicts to be constructed like regular Dicts #99

Closed

kmsquire referenced this pull request in StephenVavasis/DataStructures.jl Jun 26, 2015

This commit addresses Kevin Squire's recent comments.

91b92c5

kmsquire reviewed Jun 26, 2015
View reviewed changes

kmsquire reviewed Jul 15, 2015
View reviewed changes

kmsquire mentioned this pull request Jul 19, 2015

keys don't seem to be properly typed #103

Closed

StephenVavasis force-pushed the SortedContainers branch from 7e42fa9 to ece4940 Compare July 22, 2015 10:43

StephenVavasis force-pushed the SortedContainers branch from ece4940 to 41b31d3 Compare July 24, 2015 21:29

kmsquire added a commit that referenced this pull request Aug 6, 2015

Merge pull request #88 from StephenVavasis/SortedContainers

013e57e

This commit adds two new data structures to the library: SortedMultiDict

kmsquire merged commit 013e57e into JuliaCollections:master Aug 6, 2015

This commit adds two new data structures to the library: SortedMultiDict #88

This commit adds two new data structures to the library: SortedMultiDict #88

Uh oh!

Conversation

StephenVavasis commented Apr 29, 2015

Uh oh!

StephenVavasis commented May 3, 2015

Uh oh!

jiahao commented May 3, 2015

Uh oh!

StephenVavasis commented May 4, 2015

Uh oh!

StephenVavasis commented May 8, 2015

Uh oh!

jiahao commented May 8, 2015

Uh oh!

StephenVavasis commented May 9, 2015

Uh oh!

StephenVavasis commented May 15, 2015

Uh oh!

kmsquire commented May 15, 2015

Uh oh!

StephenVavasis commented May 29, 2015

Uh oh!

tkelman commented May 29, 2015

Uh oh!

StephenVavasis commented May 29, 2015

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kmsquire commented May 30, 2015

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kmsquire commented May 30, 2015

Uh oh!

StephenVavasis commented Jun 8, 2015

Uh oh!

kmsquire commented Jun 8, 2015

Uh oh!

StephenVavasis commented Jun 8, 2015

Uh oh!

Choose a reason for hiding this comment

Uh oh!

StephenVavasis commented Jun 28, 2015

Uh oh!

kmsquire commented Jun 29, 2015

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

StephenVavasis commented Jul 19, 2015

Uh oh!

kmsquire commented Jul 19, 2015

Uh oh!

kmsquire commented Jul 19, 2015

Uh oh!

yuyichao commented Jul 19, 2015

Uh oh!

ScottPJones commented Jul 19, 2015

Uh oh!

kmsquire commented Jul 20, 2015

Uh oh!

ScottPJones commented Jul 20, 2015

Uh oh!

kmsquire commented Jul 20, 2015

Uh oh!

StephenVavasis commented Jul 21, 2015

Uh oh!

kmsquire commented Jul 21, 2015

Uh oh!