Reintroduce concise syntax for Dict construction? #12930

Closed
malmaud opened this issue Sep 3, 2015 · 44 comments
Labels
needs decision A decision on this change is needed

Comments

@malmaud
Contributor

malmaud commented Sep 3, 2015

As discussed in https://groups.google.com/forum/#!topic/julia-users/1bwx3fjSO5A, many are unhappy with how verbose Dict literal construction has become in 0.4. I'm aware there were real problems with the old syntax, but maybe we can still think of a way to allow a more parsimonious syntax going forward.

@stevengj stevengj added the needs decision A decision on this change is needed label Sep 3, 2015
@stevengj
Member

stevengj commented Sep 3, 2015

See #6739 for discussion on the original change.

@stevengj
Member

stevengj commented Sep 3, 2015

What's so verbose about Dict(3=>4, 5=>6)? It is only four more characters than [3=>4, 5=>6]. (I first wrote three; I can't count.)

@kmsquire
Member

kmsquire commented Sep 3, 2015

To add a little more context, this is especially true when dealing with Dicts of Dicts (e.g., when printing Julia representation of JSON objects):

julia> using JSON

julia> a="{\"menu\": {
                \"id\": \"file\",
                \"value\": \"File\",
                \"popup\": {
                  \"menuitem\": [
                    {\"value\": \"New\", \"onclick\": \"CreateNewDoc()\"},
                    {\"value\": \"Open\", \"onclick\": \"OpenDoc()\"},
                    {\"value\": \"Close\", \"onclick\": \"CloseDoc()\"}
                  ]
                }
              }}
              "
"{\"menu\": {\n         \"id\": \"file\",\n         \"value\": \"File\",\n         \"popup\": {\n           \"menuitem\": [\n             {\"value\": \"New\", \"onclick\": \"CreateNewDoc()\"},\n             {\"value\": \"Open\", \"onclick\": \"OpenDoc()\"},\n             {\"value\": \"Close\", \"onclick\": \"CloseDoc()\"}\n           ]\n         }\n       }}\n       "

julia> println(JSON.parse(a))
Dict{AbstractString,Any}("menu"=>Dict{AbstractString,Any}("id"=>"file","value"=>"File","popup"=>Dict{AbstractString,Any}("menuitem"=>Any[Dict{AbstractString,Any}("onclick"=>"CreateNewDoc()","value"=>"New"),Dict{AbstractString,Any}("onclick"=>"OpenDoc()","value"=>"Open"),Dict{AbstractString,Any}("onclick"=>"CloseDoc()","value"=>"Close")])))

@stevengj
Member

stevengj commented Sep 3, 2015

@kmsquire, even if you need to specify the type, the old syntax was still nearly as verbose (three fewer characters): (AbstractString=>Any)[ ... ] vs. Dict{AbstractString,Any}( ... ).

@malmaud
Contributor Author

malmaud commented Sep 3, 2015

Just to concisely summarize the motivation for the original change, it seems like the decision to make => first-class (which I totally agree with) is what disallows [a=>b, c=>d] (since it would be ambiguous with a Vector of Pairs). What was the problem with curly-brace syntax?

@stevengj
Member

stevengj commented Sep 3, 2015

@malmaud, curly braces used to be Any[...] (à la Matlab cell arrays), and this needs to be deprecated for at least one major release before it can be repurposed.

Also, punctuation is precious. Even when curly braces are available to be repurposed, is it really worth using them to save typing 3-4 characters?

@malmaud
Contributor Author

malmaud commented Sep 3, 2015

Ah right. So maybe this is as simple as repurposing curlies in 0.5.

@mbauman
Sponsor Member

mbauman commented Sep 3, 2015

I think the leading contender is tuple types: #8470

With 85f4597, curly braces could maybe be used for both Tuples (with types) and something else (with values, or more specifically just Pairs). Sure, it's two meanings for the same syntax, but they're used in very different contexts with very different content between the braces.

@ScottPJones
Contributor

My current problem is more the inconsistencies between the type inference with [ ] and Dict( )
(see https://groups.google.com/forum/#!topic/julia-users/1bwx3fjSO5A)
Some more issues:
Dict("a"=>1,"b"=>2) gives Dict{ASCIIString,Int64}, but Dict("á"=>1,"b"=>2) gives Dict{Any,Int64}.
That could have come back as Dict{UTF8String,Int64}, or at least Dict{AbstractString,Int64}.

@JeffBezanson
Sponsor Member

I'm against going back on this.

@IainNZ
Member

IainNZ commented Sep 3, 2015

The comparison in the OP of the julia-users thread is not even remotely fair, because they are specifying the type in one case and not the other.

@lobingera

@stevengj, yes, dicts are a mighty tool (see Python); spending syntax on them might be a good investment.

@mdcfrancis

@IainNZ - slightly unfair except for the following

julia> { :a => 1 }

WARNING: deprecated syntax "{a=>b, ...}".
Use "Dict{Any,Any}(a=>b, ...)" instead.
Dict{Any,Any} with 1 entry:
      :a => 1

If you follow the deprecation, it suggests that one should use types when replacing {}.

@IainNZ
Member

IainNZ commented Sep 3, 2015

Subtle point I guess, because { } means Dict{Any,Any}, but it's not clear {Any,Any} was wanted; in fact the example used {Symbol,Any}, which is more like [] in 0.3.

@mdcfrancis

If we were being consistent we would be removing [1,2,3 ] as well and making people type Vector( 1,2,3 ) etc. I see no reason why Vectors are more special than associative collections.

@jakebolewski
Member

I don't see why we are debating this now, 0.4 is close to being finally branched. Discussion about this change is almost a year old at this point.

@mdcfrancis

@jakebolewski because right now we are spending a large amount of time updating packages and code to use 0.4 - this is the first time for many where they are seeing the impact of this change.

@IainNZ
Member

IainNZ commented Sep 3, 2015

@mdcfrancis I don't think that necessarily follows re [] and Vector, but if you'd like to submit a PR implementing a special syntax for Dict I'm sure it'll be assessed on its merits for Julia 0.5.

@JeffBezanson
Sponsor Member

There isn't enough syntax for every data structure, and I would argue that distinguishing data structures by bracket type is not terribly clear anyway. I also think vectors really are more fundamental than dictionaries. How are dictionaries implemented after all?

I might add that { } is well-established notation for sets, so maybe { } should only construct sets. But I don't want to debate whether sets or dicts are more important.

@jrevels
Member

jrevels commented Sep 3, 2015

Just my 2 cents, but I greatly prefer the new syntax. Dict{K,V}(...) reads clearer to me than (K=>V)[...]. It is also more explicit; it's obvious that you're constructing a Dict rather than an array of Pairs.

If you follow the deprecation, it suggests that one should use types when replacing {}.

To back up @IainNZ's point, if you use the old syntax for a type-inferred Dict, the deprecation warning actually shows you the correct new syntax for making the same Dict:

julia> ["a"=>2, "b"=>3]

WARNING: deprecated syntax "[a=>b, ...]".
Use "Dict(a=>b, ...)" instead.
Dict{ASCIIString,Int64} with 2 entries:
  "b" => 3
  "a" => 2

The fact that this was a source of confusion in the first place is an argument in favor of the new syntax, IMO.

I would argue that distinguishing data structures by bracket type is not terribly clear anyway.

I very much agree with this.

I might add that { } is well-established notation for sets, so maybe { } should only construct sets. But I don't want to debate whether sets or dicts are more important.

My vote is strongly in favor of using {} for #8470, instead of using them to construct a new value (I suppose a type is itself a value of type DataType, but you know what I mean).

@mdcfrancis

To clarify on a few points

  • I'm very happy with the 0.4 change that turns (T1,T2) into Tuple{T1,T2}; that is very clear in intent.
  • I'm not convinced that it is obvious that {T1,T2} is the type of a tuple, though I guess I can get my head around it.
  • Removal of the (K=>V)[ ... ] syntax is an improvement, to be honest I didn't know of its existence ( nor would I have used it. )

The feeling around associative collections is simply that they are very prevalent in coding, especially when you are interfacing with other systems. If you look at Escher (for example) you'll see them all over the place.

For my common use cases [ :a => 1, :b => [ :x => 2.3] ] is more than sufficient, i.e. a list of pairs which may be coerced when required into an associative. Perhaps the challenge here is more that this syntax is deprecated where it should really be encouraged?

@nolta
Member

nolta commented Sep 3, 2015

What's so verbose about Dict(3=>4, 5=>6)? It is only four more characters

My worry is that + -> .+ was only one more character, and look how well that turned out (#7226).

@jrevels
Member

jrevels commented Sep 3, 2015

The feeling around associative collections is simply that they are very prevalent in coding, especially when you are interfacing with other systems.

No dispute there!

For my common use cases [ :a => 1, :b => [ :x => 2.3] ] is more than sufficient, i.e. a list of pairs which may be coerced when required into an associative. Perhaps the challenge here is more that this syntax is deprecated where it should really be encouraged?

The reason why running [:a => 1, :b => [:x => 2.3]] currently throws a deprecation warning, but still actually follows through with old behavior, is to give folks time to adjust before removing the old behavior entirely. I'm not sure when the changeover will actually occur (maybe once v0.4 actually releases), but once it does, this will indeed be the right syntax for constructing an array of Pairs.

If the new behavior is sufficient for your case and you want to make the switch now, you can explicitly "opt in" by using the Pair constructor instead of the => operator:

julia> [Pair(:a, 1), Pair(:b, [Pair(:x, 2.3)])]
2-element Array{Pair{Symbol,B},1}:
 :a=>1
 :b=>[:x=>2.3]

Definitely not as pretty as using the => operator, but it will work until the new behavior fully comes into play.

I'm not convinced that it is obvious that {T1,T2} is the type of a tuple, though I guess I can get my head around it.

The idea does require some getting used to if you're used to the current Julia syntax. It's more intuitive when you think about it in relation to the role value tuples play in function application:

f applied to arguments (1,2,3): f(1,2,3)
T applied to parameters {A,B,C}: T{A,B,C}

It could also really clean up syntax that currently uses Val types (or some similar wrapper type), which can come into play when writing generated functions for type-stable transformations over heterogeneous tuples (but I digress, discussion regarding the tuple type change should probably stay in #8470).

@ScottPJones
Contributor

I also think vectors really are more fundamental than dictionaries. How are dictionaries implemented after all?

I think that is the wrong question. We are talking about the syntax and concepts here, not implementation details.
If you think of it from a conceptual viewpoint, associative arrays (aka dictionaries) are more fundamental than integer-subscripted vectors or arrays (Lua is very nice that way, as are Caché ObjectScript and M/MUMPS).
What is a vector, but an associative array with the keys restricted to integers?

Also, why do dictionaries even have to be implemented with vectors? (unless you really want to get down to the nitty gritty, where the entire memory of the computer is a vector of bytes).
In COS, globals (persistent, distributed, atomic associative arrays) were implemented with B+ trees,
and local associative arrays with a variety of structures (p-tries, vectors that stored the base index and span and allowed for missing values [for arrays that so far had only integer subscripts], hash tables), whatever was most efficient, but all invisible to the programmer.

@malmaud
Contributor Author

malmaud commented Sep 3, 2015

If the blessing that is generic programming allows lists of pairs to realistically work in most contexts that dictionaries currently work, that seems like it would be a pretty good solution.

@mdcfrancis

@malmaud and from the experiment I'm doing at the moment, you can trivially implement the associative methods for Vector{Pair{K,V}}. So perhaps the real issue here is that the old syntax should not have been deprecated in favor of Dict(...) at all; it should have switched to the Pair vector syntax, with a thin shim that supports associative-like behavior, which is often cheaper / smaller for small collections.
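
A minimal sketch of such a thin shim, assuming current Julia 1.x (where `Associative` is called `AbstractDict`) rather than 0.4. The wrapper type `PairList` is a hypothetical name, not something from Base or from the experiment mentioned here. It wraps the vector instead of adding methods to `Vector` itself, and defines only enough for key indexing and iteration.

```julia
# Hypothetical wrapper: a vector of pairs that answers key lookups by linear search.
struct PairList{K,V} <: AbstractDict{K,V}
    pairs::Vector{Pair{K,V}}
end

# Size and iteration simply forward to the wrapped vector.
Base.length(pl::PairList) = length(pl.pairs)
Base.iterate(pl::PairList, state...) = iterate(pl.pairs, state...)

# Key lookup: scan the pairs in order and return the first matching value.
function Base.getindex(pl::PairList, key)
    for (k, v) in pl.pairs
        isequal(k, key) && return v
    end
    throw(KeyError(key))
end

pl = PairList([:a => 1, :b => 2])
pl[:b]        # key lookup
pl.pairs[2]   # positional access stays available on the wrapped vector
```

Because the lookup lives on a separate type, a plain `Vector` of pairs keeps its ordinary positional indexing. A fuller version would probably also define `get` and `haskey`.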

@mdcfrancis

@JeffBezanson - how would you feel about a PR for that? e.g. go directly to the pair syntax ?

@mbauman
Sponsor Member

mbauman commented Sep 3, 2015

If the blessing that is generic programming allows lists of pairs to realistically work in most contexts that dictionaries currently work, that seems like it would be a pretty good solution.

That would be cool for it to work as a linear-search "dictionary", but unfortunately there's a clash of meanings with numeric keys.

d = [Pair(2, 3), Pair(3, 4)]
d[2] == (3=>4) # Is it the second element?
d[2] == 3      # Or is it the key lookup?

@mdcfrancis

@mbauman it would be the key lookup, but I agree that is an odd case

@mbauman
Sponsor Member

mbauman commented Sep 3, 2015

Then it's no longer a Vector of Pairs. Here's the trouble:

d = Any[Pair(2, 3), Pair(3, 4)]
d[2] == (3=>4)
d = [Pair(2, 3), Pair(3, 4)]
d[2] == 3

This would be bizarre. If inference at some point fails to concretely type an array comprehension, your data structure now behaves extremely differently.
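
The hazard can be sketched without touching `getindex` itself. Assuming Julia 1.x syntax, the hypothetical function `lookup` below dispatches on the element type the way the proposal would, so the same contents behave differently depending on what type the array ended up with.

```julia
# Hypothetical stand-in for the proposed indexing behavior.
# Generic vectors: plain positional indexing.
lookup(v::AbstractVector, i::Integer) = v[i]

# Vectors whose element type is a Pair: treat the integer as a key instead.
function lookup(v::AbstractVector{<:Pair}, key::Integer)
    for p in v
        isequal(p.first, key) && return p.second
    end
    throw(KeyError(key))
end

typed   = [2 => 3, 3 => 4]      # element type is Pair{Int64,Int64}
untyped = Any[2 => 3, 3 => 4]   # same contents, element type Any

lookup(typed, 2)     # key lookup: 3
lookup(untyped, 2)   # positional: 3 => 4
```

The two calls differ only in the container's element type, which is exactly what a failed inference could change behind the caller's back.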

@ScottPJones
Contributor

In 0.5, will x = [ :a => 1, :c => [ 2, 3] ] give me a Vector{Pair{Symbol,Any}}?
and typeof(x[2]) gives me Pair{Symbol,Vector{Int}}? (really, it gives Array{Int64,1} instead of Vector{Int}, but they are ===).

That is what I'd expect.

@mdcfrancis I don't agree with it doing a key lookup instead of returning the Pair.
Instead, I'd have a Dict constructor that converts a (possibly nested) vector of pairs into a Dict of the right type, and returns it.

I.e. x = Dict([:a => 1, :b => [2,3]]) returns something of type Dict{Symbol, Any}, where
x[:a] returns 1, and x[:b] returns [2,3].

How about that? Syntax is easy, doesn't change any proposed 0.5 syntax, and gives easy-to-read associative array literals.
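
A rough sketch of that conversion, assuming Julia 1.x. The name `todict` is hypothetical; Base's `Dict` constructor accepts a vector of pairs but does not recurse into nested ones, so the recursion is written out here.

```julia
# Hypothetical helper: recursively convert (possibly nested) vectors of pairs to Dicts.
todict(x) = x   # anything that is not a vector of pairs is returned unchanged

# A vector of pairs becomes a Dict; each value is converted in turn.
todict(ps::AbstractVector{<:Pair}) = Dict(k => todict(v) for (k, v) in ps)

x = todict([:a => 1, :b => [2, 3], :c => [:x => 2.3]])
x[:a]       # 1
x[:b]       # [2, 3], an ordinary vector, left as is
x[:c][:x]   # 2.3, the nested vector of pairs became a Dict
```

Only arrays whose element type is a `Pair` are converted, so an `Any[...]` that happens to hold pairs would pass through untouched in this sketch.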

@malmaud
Contributor Author

malmaud commented Sep 3, 2015

@mbauman Ya, all I was thinking of really is that functions that expect a dict, f(d)=something(d), would instead look like f(d)=something(asdict(d)). Define asdict(d::Associative)=d and asdict{T<:Pair}(x::Vector{T})=Dict(x) (or some light-weight dict alternative that has those key-value semantics).
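
In current Julia (1.x) syntax, with `AbstractDict` as the later name for `Associative`, the idea could look roughly like this. `asdict` is the name proposed in this comment; `keycount` is a hypothetical stand-in for any function that expects a dict.

```julia
# Dicts pass through untouched; vectors of pairs are converted on the way in.
asdict(d::AbstractDict) = d
asdict(x::AbstractVector{<:Pair}) = Dict(x)

# Hypothetical consumer: normalizes its argument first, then uses the dict API.
keycount(d) = length(keys(asdict(d)))

keycount(Dict(:a => 1))        # called with a real Dict
keycount([:a => 1, :b => 2])   # called with a vector of pairs
```

The conversion cost is paid only for pair vectors, and callers that already hold a dict get the same object back.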

@ScottPJones
Contributor

@kmsquire, why did you bother with all of those \" to quote the JSON string? Just use """ instead, and you don't have to change them (except watch out for $'s in the text!)

@ScottPJones
Contributor

@malmaud Sounds like we are thinking on exactly the same lines.

@stevengj
Member

stevengj commented Sep 3, 2015

@nolta, I don't think the .+ vs. + transition is comparable. Requiring .+ for array + scalar was problematic because + is extremely well established syntax for this operation in scientific computing. Also, using + required no special support in the Julia parser, only ordinary method overloading. Whereas Julia's old Dict syntax is neither universal nor implementable without special parser support.

@mdcfrancis

The proposal is no worse than what exists in 0.3 today which a lot of people were happy with ( pairs convert to dictionary ). As @mbauman points out the ambiguity is for integer keys, for the rest of the universe of types the behavior would be consistent (key based lookups with linear performance). We could (if required) special case integer so that it does not perform the key lookup (probably a good idea).

This would not solve the case where a function is expecting an associative, which seems like the main reason not to do this. We would have to go through the code and change API points to accept Vector{Pair} or Associative. I suspect this is less work than changing all the usage of { pair } and [ pair ] though, and it would be in line with the future direction.

@jrevels
Member

jrevels commented Sep 3, 2015

I don't think changing the indexing behavior of a Vector based simply on its eltype is a good idea...something with type Vector should behave like a vector, not a dictionary. If you want it to behave like a dictionary...well, that's what Dict is for.

@mdcfrancis

You are probably correct, though this does not change the indexing, it just extends it. A Vector{Pair{String,Any}} would still behave like any other vector: you can push elements onto it, you can reference by integer index, etc. It is just that when you index it by a String, the lookup would be on the contents of the elements.

@kmsquire
Member

kmsquire commented Sep 4, 2015

@kmsquire, why did you bother with all of those \" to quote the JSON string? Just use """ instead, and you don't have to change them (except watch out for $'s in the text!)

True. That example was copy-pasted from the JSON.jl tests, and whoever wrote that originally probably wasn't aware of """ at the time.

@mdcfrancis

@one-more-minute suggested the following macro for supporting direct JSON syntax in Julia.

https://groups.google.com/d/msg/julia-users/1bwx3fjSO5A/V_inIa7eCAAJ

I'm also looking forward to having vectors of pairs as soon as is practical in the next version.

These two items would remove my objections to the removal of the terse syntax (as it would still exist for my purposes :) ) . At what point do we think we will be able to remove the backward compatibility from the [] syntax?

@ScottPJones
Contributor

I do love how this discussion led to a reasonable solution for @mdcfrancis (and certainly others, including myself), within less than 24 hours. It might have seemed like pointless complaining at first to some people, but look at the results.

@mauro3
Contributor

mauro3 commented Sep 4, 2015

Backwards compatibility will be removed in 0.5, after one release cycle of deprecation warnings.

@mdcfrancis

@ScottPJones - agreed. @mauro3 should we close this issue and open a concise description with a 0.5 tag so that it is remembered? We can place the link back to here.

@JeffBezanson
Sponsor Member

That really is a nice macro. Good example of a situation where a macro is a good solution.
