Skip to content

Conversation

StephenVavasis
Copy link
Contributor

and SortedSet. See README.rst for more info.

@StephenVavasis
Copy link
Contributor Author

Jiahao,

I will remove the println's (I assume you mean for "test1", "test2" etc.) in the next commit. As for the inbound statements, I put them in the inner routines in balancedTree.jl; these routines are not supposed to be called directly by the user but instead are called by the wrapper routines which provide their own correctness checks. However, as you say, there is not much impact on performance so it is no problem to remove them. Finally, I didn't understand the point of declaring allowdups=false; sometimes checkcorrectness is called with allowdups=true and other times false, so I'm not clear on the benefit of setting a default value.

@jiahao
Copy link
Contributor

jiahao commented May 3, 2015

Just to clarify,

f(x=0) = x+1

defines two methods

f(x) = x+1
f() = f(0)

Specifying a default simply allows users to not have to specify that argument always. It doesn't mean that the variable always has the default variable.

@StephenVavasis
Copy link
Contributor Author

Jiahao,

This is fine; I can also add the default argument to checkcorrectness in the next commit. Thanks for taking the time to read my code.

Just so you know, I have in mind to make one more major commit to this sub-package (sorted containers), namely, I would like to use a cache-optimized tree structure. Right now I'm looking at the 2005 SIAM J. Comput. paper by Bender, Demaine and Farach-Colton, which appears to describe an appropriate structure.

-- Steve

@IainNZ IainNZ mentioned this pull request May 8, 2015
@StephenVavasis
Copy link
Contributor Author

Jiahao,

Should I make the updates discussed in your previous messages and submit another pull request, or are you still going through the code?

@jiahao
Copy link
Contributor

jiahao commented May 8, 2015

Go ahead with the updates; I haven't read more since.

@StephenVavasis
Copy link
Contributor Author

Jiahao,

I have made a new commit and pull request addressing the topics discussed. I would like to mention also that the DataStructures test set failed in the latest build of Julia (commit 3ab9af1*). It failed in the test_ordereddict with an UndefVarError, line 48, 'Ranges not defined'. This is not my code so I won't attempt to fix it (unless you want me to try).

@StephenVavasis
Copy link
Contributor Author

Jiahao,

The discussion concerning issue #93 shows that the sorted containers need more work. In particular, the whole token interface has a performance issue, so I will need to create a higher-performance semitoken interface that will involve several new functions. So please put this pull request on hold until I make another commit with the new semitoken interface.

-- Steve

@kmsquire
Copy link
Member

Hi Stephen, the standard way to do this is to edit the title and add "WIP" (work in progress) to the front.

@StephenVavasis
Copy link
Contributor Author

The branch called 'SortedContainers' in https://github.com/StephenVavasis/DataStructures.jl has a new commit that is ready to be merged into the master; I have fixed the performance issue and switched to a semitoken-based interface. I'm not sure if I need to submit a new pull request since there is already a pending pull request from that branch.

@tkelman
Copy link
Contributor

tkelman commented May 29, 2015

See at the top of the page,

StephenVavasis wants to merge 8 commits into JuliaLang:master from StephenVavasis:SortedContainers

So any additional commits you push to your SortedContainers branch at https://github.com/StephenVavasis/DataStructures.jl will automatically be added to this same pull request. According to Travis CI, there's an error when running on Julia 0.3. Do you have a local copy of Julia 0.3 where you could run Pkg.test("DataStructures") with this branch checked out in ~/.julia/v0.3/DataStructures ?

@StephenVavasis
Copy link
Contributor Author

@tkelman, sorry, I should have run the tests in 0.3 to notice that I forgot some compat declarations. This is fixed now in the latest commit on the SortedContainers branch. By the way, the package testing gives a deprecation warning concerning OrderedDict (not my code) in both 0.3 and 0.4.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does SC stand for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it stands for Sorted Container, which is a bit redundant, isn't
it? I can fix this name.

-- Steve

On Fri, 29 May 2015, Kevin Squire wrote:

In src/containerloops.jl:

@@ -0,0 +1,240 @@
+import Base.keys
+import Base.values
+
+## These are the containers that can be looped over
+## The suffix 0 is for SortedDict and SortedMultiDict
+## The suffix 1 is for SortedSet. No suffix means any of the
+## three.
+
+typealias SCContainers0 Union(SortedDict, SortedMultiDict)
+typealias SCContainers Union(SCContainers0, SortedSet)

What does SC stand for?


Reply to this email directly or view it on GitHub.[AHhEh8nA6KxKxoW-hT2ff45Uu3MfoInQks5oOUKfgaJpZM4EMDTc.gif]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Rather than suffixing types and functions with 0 or 1, would you mind being slightly more descriptive? It's great that you described the naming convention in a comment (and your descriptive comments overall are very much appreciated), but it would be nice to be able to look at the code and immediately understand what it means.

Of course, the names themselves do get pretty long at some point... perhaps an SDD prefix for SortedDict+SortedMultiDict and and SS for SortedSet? I'm not wedded to this, so if you have a better idea, go for it.

@kmsquire
Copy link
Member

Although not documented (AFAICT--cc: @vtjnash), the new Ref and RefValue types are dereferenced using x[] (i.e. getindex(x::RefValue)), as

ulia> x = 1
1

julia> r = Ref(x)
Base.RefValue{Int64}(1)

julia> r[]
1

julia> typeof(r[])
Int64

Because it's conceptually the same operation as deref(x::Token), we should probably have a similar shorthand for tokens. (In truth, I find it a little strange, but it'll probably stick.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Julia 0.4, the standard way to create dictionaries and arrays (edit: and sets) will be to explicitly supply the types, as

julia> Dict{Int, String}()
Dict{Int64,AbstractString} with 0 entries

julia> Array{Int,2}(2,3)
2x3 Array{Int64,2}:
 21474836485  0  0
           1  0  0

julia> Vector{Float64}()
0-element Array{Float64,1}

julia> Set{UTF8String}()
Set{UTF8String}()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kmsquire and @vtjnash,

I looked at refpointer.jl, and here is my interpretation of the purpose of this code. The idea is to be able to pass an immutable object, say x, a Float64, to a function f and yet the function can still mutate it. This is accomplished as follows. If x is bare Float64, then it is wrapped in a type. If x is an element of an array a at position i, then (a,i) are passed to f. However, f does not need to know what is the source of x since its formal argument can be Ref{Float64}, which covers both cases.

I can certainly extend this to the three sorted containers (@kmsqure: I'm not sure this is directly related to the deref operation in my code, but maybe you can explain more) I have a couple of questions about this construction.

(1) Is there a plan to extend this to associative? In other words, if D is of type Dict{ASCIIString,Float} and k is a string (key), then one could say x = RefDict(D,k) to get another Ref{Float} object? If so, is the extension to associative already written?

(2) You are probably already aware of this, but there is a performance issue with this construct. Specifically, if one says x = Ref(a,i) where a is an array and i is a subscript, then the new object x is created on the heap rather than the stack (even though internally it consists of just a pointer and an integer), so you pay a price to use this in an inner loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this was in response to this comment: #88 (comment)

See my response there.
(On github, it's attached inline to an unrelated comment).

@kmsquire
Copy link
Member

From #88 (comment):

I looked at refpointer.jl, and here is my interpretation of the purpose of this code. The idea is to be able to pass an immutable object, say x, a Float64, to a function f and yet the function can still mutate it. This is accomplished as follows. If x is bare Float64, then it is wrapped in a type. If x is an element of an array a at position i, then (a,i) are passed to f. However, f does not need to know what is the source of x since its formal argument can be Ref{Float64}, which covers both cases.

Sorry, I really should have been more specific. I was only suggesting that you add

getindex(t::SDToken) = deref(t)

so that t[] becomes an alias for deref(t).

Sorry to lead you down a rabbit hole!

@StephenVavasis
Copy link
Contributor Author

@kmsquire,

I have uploaded a new commit of DataStructures that addresses your comments. Unfortunately, it doesn't run under 0.3.9 according to Travis because I made a reference to Base.Ref, which is not in 0.3.9. I should have protected the references to Base.Ref with some version checks. And compounding this mistake, I am not currently able to install 0.3.9 to fix and test this myself because of

JuliaLang/julia#11612

So before you look at this commit, please wait a day or two until I can figure out how to install 0.3.9 and then update my code with version-checks so that it passes Travis.

Meanwhile, I'll just mention again to @vtjnash that I have implemented Ref{T} for SortedDict and SortedMultiDict in this commit even though AFAIK nobody yet has implemented Ref{T} for the base Dict{K,T}. Ideally, someone will implement it for Dict soon so that I can make my implementation compatible.

@kmsquire
Copy link
Member

kmsquire commented Jun 8, 2015

Roger that. Just ping again when it's ready for review.

@StephenVavasis
Copy link
Contributor Author

Kevin, I fixed the 0.3.9 compatibility issue, so the package is ready for another round of reviewing. I did not yet address the two related pull requests #90 and #99.

kmsquire referenced this pull request in StephenVavasis/DataStructures.jl Jun 26, 2015
README.rst Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I remember the discussion where you found this to be true, I'm find this description challenging to understand. In particular, this syntax seems to indicate that I can pass a tuple (m,s) to deref, yet I don't see that function anywhere--did I miss it? Can you point to the original discussion where you discovered this?

@StephenVavasis
Copy link
Contributor Author

Kevin,

Thanks for the discussion. A couple of points: First, I have redefined the token type for this pull request to be a tuple (m,s) where m is the container and s is the semitoken. In the previous version of my code, token was a small immutable structure. This redefinition is precisely so that deref((m,s)) and similar functions can execute without necessarily creating an auxiliary object on the heap to hold the token. However, the auxiliary object is created if the code is invoked like this:

   t = (m,s)
   k,v = deref(t)

rather than like this

   k,v = deref((m,s))

Second, with regard to Ref{T}, I found the documentation of Ref{T} in the manual (I think this documentation may be new; I couldn't find it when I looked last month) and re-read the code in refpointer.jl. I think now that my implementation is incomplete because I also need to write an 'unsafe_convert' operation so that my ref_value would work in ccall. .

But perhaps it is premature to include this feature at all? It is not yet implemented for the built-in Dict, so maybe it is best for me to postpone including it until it is implemented for Dict.

Also, regarding Ref{T}, it is true that the manual classifies it as part of the ccall interface, but as far as I can see, it also provides a useful mechanism for Julia itself. In particular, it can be used to pass a Float64 (for example) to a function in a way so that the function can store its answer at the passed-in Float64 and pass it back to the caller. This works even though the called function does not know whether the passed-in location is a bare Float64 or a Float64 that occurs inside some larger structure.

@kmsquire
Copy link
Member

Hi Stephen, thanks for the explanation. I had missed the change of Tokens from immutable types to tuples. In truth, I like find the immutable type version clearer, but I understand the reasoning.

Regarding the Ref type, thank you for your thoughts. I knew of the introduction of Ref, which is why I mentioned it, but I hadn't considered its use beyond ccall. It makes sense that it could be used in this way, and since you have a strong C++ background, it makes sense that you would gravitate toward this use.

I'm going to agree with your last statement, though--for now, let's wait on this until the Ref implementation shakes out a little more. In one case (JuliaLang/julia#11683), uses of Ref were reverted because of performance issues, and it's unclear on the recommended use with regards to Julia code at this point. But by all means, if use of it is introduced for Dict objects, it should certainly be introduced here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, is there a reason you chose to use keytype and datatype instead of writing this as (the more standard)

@inline function setindex!{K,V}(m::SortedDict{K,V},
                                d_,
                                i::IntSemiToken)
     has_data((m,i))
     m.bt.data[i.address] = KDRec{K,V}(m.bt.data[i.address].parent,
                                       m.bt.data[i.address].k, 
                                       convert(V,d_))

Either way, setindex! would be specialized for the different K and V types, (and there's no need to capture the Ordering if you don't use it).

Would you mind changing this (and the other occurrences indicated below) to this style?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if you make the changes I suggested, I did see that this might be useful when using typealiases, such as SDMContainer, since those aren't parameterized. I'll note that these can be written as

@inline keytype{K}(m::SortedDict{K}) = K
@inline datatype{K,D}(m::SortedDict{K,D}) = D

Minor: For consistency with the other dictionary types (here and in Base), the D type parameter is typically written as V for value. But I don't care if that is changed here--I'd rather get this finished and in soon.

@StephenVavasis
Copy link
Contributor Author

Kevin,

I did some small-scale checking, and it appears that the parameterized convert(K,k_) version and the nonparameterized convert(keytype(m), k_) version are generating the same code. I used the latter partly because in an earlier round of reviewing you made a comment that too many parameters make the code hard to read, and I figured this was a way to reduce the number of parameters.

Meanwhile, the idea of using small functions returning a type rather than explicitly spelling out all the parameters does have a precedent in Julia. For example, here is a snippet from dict.jl:

_tt1{A,B}(::Type{Tuple{A,B}}) = A
_tt2{A,B}(::Type{Tuple{A,B}}) = B
eltype{D}(::Type{KeyIterator{D}}) = _tt1(eltype(D))
eltype{D}(::Type{ValueIterator{D}}) = _tt2(eltype(D))

The most recent commit on the SortedContainres branch is failing because I forgot to update test_sortedcontainers to match the change in the eltype function. I am about to post a new commit that fixes this.

@kmsquire
Copy link
Member

Ok, fair enough--this is an improvement over having all three type parameters. I personally find the version with one type parameter cleaner still, but I won't hold you to it.

@kmsquire
Copy link
Member

@yuyichao, if you have time, it would be great if this PR got another set of eyes. I try to review it as I find the time, but I usually only find the time to take it in small parts, Stephen makes changes responding to my comments, and I look at it again a week or two later. It goes pretty slowly.

@yuyichao
Copy link
Contributor

Unfortunately I don't think I'm qualified to review this.

I don't see anything obviously wrong and I basically don't know anything about data structures... (The only time I learnt about it is in a one semester undergrad level course......)

@ScottPJones
Copy link
Contributor

@kmsquire If you don't mind me being the extra set of eyes, I'll go over it this afternoon.
This is useful stuff for me anyway, and I've done a lot of this sort of thing before (although in C).

@kmsquire
Copy link
Member

@ScottPJones, sure that would be fine. Generally, I believe the algorithm to be fine. What I'm mostly looking for are things which might be done more efficiently, or done in a more Julian way. Or documentation that might be unclear. If you see any of that, can you highlight it? When you get the chance, of course.

@ScottPJones
Copy link
Contributor

Done more efficiently, I can handle, done in a more Julian way, well, I'll try to apply all that I've learned from you all over the last three months! I've already started going over it.

@kmsquire
Copy link
Member

@StephenVavasis, at this point, it would probably be good if you went through the inline comments at the github page with your changes, and review to see which comments you have or have not addressed.

(Sometimes it's not obvious, because you respond mostly by email, but this is the interface used to review your changes.)

Once all looks good, it would then be good to squash the commits (hopefully, @yuyichao's directions on your other PR helped).

@StephenVavasis
Copy link
Contributor Author

Kevin,

I've gone through the comments in this thread to make sure they have been addressed. One comment you made on May 30 implies that you would like to see more constructors for SortedDict. In particular, you seem to be implying that SortedDict should implement the same constructors as Dict. Is this correct? If so, I can write more constructors and make another commit. Aside from that, I think I have addressed all the other comments. Similarly, I can also make SortedSet implement the same constructors as Set.

-- Steve

@kmsquire
Copy link
Member

Perfect. Why don't you squash the commits, make sure the patch applies, and we'll merge. We can address anything else in future commits.

@kmsquire
Copy link
Member

And yes, it would be good if SortedDict and SortedSet had the same kinds of constructors as those in base.

@kmsquire
Copy link
Member

Also, Jeffrey Sarnoff left some feedback on the julia-dev mailing list, in a thread where I asked for help reviewing this PR. I'll post a link later if you need it.

@StephenVavasis
Copy link
Contributor Author

Kevin,

I have made the documentation edits suggested by Jeffrey Sarnoff and squashed the commits. I decided to postpone writing the additional constructors because there are some subteties that need more thought. For example, the following is a valid Dict: d = Dict(4=>"a", 5.0=>"b") even though the first key is an Int and the second is a Float64. How should the corresponding call to the SortedDict constructor behave? I need to mull this over.

-- Steve

@yuyichao
Copy link
Contributor

julia> @which Dict(4=>"a", 5.0=>"b")
call{V}(::Type{Dict{K,V}}, ps::Pair{K,V}...) at dict.jl:427

And in general you should refer to the Dict constuctors in dict.jl

@StephenVavasis
Copy link
Contributor Author

Yichao,

Yes, I will try to mimic all the constructors in set.jl and dict.jl. But meanwhile, maybe we can proceed with this pull request without the additional constructors. Also, after this pull request is done I'll return to correcting the functionality for 'in(k, keys(...))'.

-- Steve

@yuyichao
Copy link
Contributor

I agree and just to be clear, I'm not saying you should implement that before this is merged, just want to point you to the relavant code you should mirror.

I think @kmsquire should decide when to merge this although you may need a rebase before that.

@kmsquire
Copy link
Member

Hi Stephen, thanks for squashing. You still have 2 commits, which is fine, but even merging these together would be good.

As @yuyichao pointed out, can you also rebase on the current master?

You might be familiar with this (or you might not), but just in case, this should look something like

git fetch
git rebase origin/master
# edit any files with merge conflicts
git add <file_which_had_conflict.jl> <other_file_which_had_conflict>
git rebase --continue
git push -f <your repo> SortedContainers  # force push is needed

For merge conflicts, I usually edit by hand, but you might look at http://stackoverflow.com/questions/161813/fix-merge-conflicts-in-git for inspiration.

and SortedSet.  See README.rst for more info.

Deleted @inbounds; deleted extraneous println; inserted default argument allowdups as per Jiahao's suggestions.

Rewrote container loops to partially address issue number 93 (excessive heap allocation)

Corrected tiny typo

Fixed more bugs in containerloops

All sorted containers updated with semitoken interface instead of token interface
As mentioned in issue 93 and ensuing discussion, the previous 'token' interface
for indexing containers suffers from a performance issue.  It has been replaced
with a semitoken interface.

Fix error on 0.4-dev, allow running tests without installing

Fixed some documentation typos regarding new semitoken interface.

Add @compat declarations for backwards compatibility with Julia 0.3

This commit addresses Kevin Squire's recent comments.

Edits to previous commit to make package compatible with Julia 0.3.9

Remove ref_value and Ref{T} type from source, test, and documentation

Added 'eltype' functions for all container loops; fixed old-style tuple-type eltypes

Fix the statements that test eltype

Edit README.rst according to comments by Jeffrey Sarnoff
@StephenVavasis
Copy link
Contributor Author

I have rebased to the most recent master. -- Steve

@kmsquire
Copy link
Member

kmsquire commented Aug 6, 2015

Merging. As always, sorry for the delay. Thanks for your work and your patience, @StephenVavasis.

kmsquire added a commit that referenced this pull request Aug 6, 2015
This commit adds two new data structures to the library: SortedMultiDict
@kmsquire kmsquire merged commit 013e57e into JuliaCollections:master Aug 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants