Skip to content

Julep: broadcast with dictionaries #25904

@andyferris

Description

@andyferris

Current behavior

I just learned that apparently broadcast doesn't work as I expected on dictionaries:

julia> d = Dict(:a => "Alice", :b => "Bob", :c => "Charlie")
Dict{Symbol,String} with 3 entries:
  :a => "Alice"
  :b => "Bob"
  :c => "Charlie"

julia> map(kv -> kv[1] => kv[2], d)
Dict{Symbol,String} with 3 entries:
  :a => "Alice"
  :b => "Bob"
  :c => "Charlie"

julia> broadcast(kv -> kv[1] => kv[2], d)
ERROR: KeyError: key 1 not found
Stacktrace:
 [1] getindex at ./dict.jl:474 [inlined]
 [2] (::##17#18)(::Dict{Symbol,String}) at ./REPL[11]:1
 [3] broadcast(::Function, ::Dict{Symbol,String}) at ./broadcast.jl:455

julia> broadcast(length, d)
3

AFAICT it seems broadcast is working on dictionaries like they are scalars or something. Someone correct me if I'm wrong - but maybe we never added any broadcast methods for dictionaries?

Proposal

Make broadcast work with dictionaries. I can see two possibilities, of which I favor option 2:

  1. broadcast for 1D indexable containers generally works like map, so we can extend this to AbstractDict and have broadcast map the key-value pairs
  2. broadcast deals with matching indices and performing some operation on values. So broadcast(f, dict) could be a bit like map(f, values(dict)) except the dictionary structure and it's indices are preserved in the output.

It seems to me that map deals with the iteration protocol and broadcast is more about matching indices and mapping those values. Using option 2 would give:

julia> length.(d)
Dict{Symbol,Int64} with 3 entries:
  :a => 5
  :b => 3
  :c => 7

IMO this kind of syntax could help a lot with manipulating dictionaries (one example is using AbstractDict for output of groupby-style operations, and then acting on the groups) while making no breaking changes to dictionary iteration or map.

To me it also dovetails with the "getindices julep" via this comment #24019 (comment) which basically suggests we can get multi-valued getindex between different indexable container types including dictionaries via the lowering transformation:

a.[inds] --> broadcast(i -> a[i], inds)

EDIT: Addendum (27/07/2019)

I forgot to mention a crucial use case of broadcasting over dictionaries where you have two or more dictionaries with the same keys and you want to match up the values based on the keys. For example dict1 .+ dict2. For our hash-based Dict, the user can't really predict the iteration order and they may actually differ between dict1 and dict2 even if the user can guarantee that they share the same set of keys (for example, dict2 might have been rehashed and dict1 not, but IMO this implementation detail shouldn't affect the semantics of the broadcast operation).

Metadata

Metadata

Assignees

No one assigned

    Labels

    broadcastApplying a function over a collectioncollectionsData structures holding multiple items, e.g. setsjulepJulia Enhancement Proposal

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions