Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mergewith[!](combine, dicts...) #34296

Merged
merged 13 commits into from Jan 28, 2020
Merged

Conversation

tkf
Copy link
Member

@tkf tkf commented Jan 7, 2020

Can we remove the restriction ::Function in merge[!](combine::Function, d::AbstractDict, others::AbstractDict...)? Was this there to disambiguate the case where the first argument is AbstractDict but also is callable? Or was it there to allow merge[!](d::MyDict, ::Any)?

@JeffBezanson
Copy link
Sponsor Member

I would prefer it if the function form were called something like mergewith! instead of conflicting with merge!(dict, dict2) as it does now.

@tkf
Copy link
Member Author

tkf commented Jan 8, 2020

That would be great. Can we also have the curried version mergewith!(f) = (args...) -> mergewith!(f, args...) (which was impossible to implement safely with merge!)? This would be pretty useful as a "reducing function factory"; e.g., reduce(mergewith!(+), dicts).

@tkf
Copy link
Member Author

tkf commented Jan 8, 2020

@andyferris @oxinabox Maybe you are interested in this discussion as this increases dictionary API.

@tkf tkf changed the title Allow non-Function callable as combine argument of merge[!] Add mergewith[!](combine, dicts) Jan 9, 2020
@tkf
Copy link
Member Author

tkf commented Jan 9, 2020

I added mergewith and mergewith!.

@tkf tkf changed the title Add mergewith[!](combine, dicts) Add mergewith[!](combine, dicts...) Jan 13, 2020
@JeffBezanson JeffBezanson added the status:triage This should be discussed on a triage call label Jan 13, 2020
@andyferris
Copy link
Member

andyferris commented Jan 14, 2020

Is the probem with merge(f::Function, dicts...) that the dispatch tables are hard to define without amibuguities? Or that you want some combination function f which is not a Function? I've been using Callable a lot more lately - is that suitable? I'm just wondering if that goes against a pattern we general have like findall(container) vs findall(f, container) - would we chose findallwith(f, container)?

Is the odd thing about this method rather the "implict reduction" over varargs dictionaries? The (long) discussion at #32860 seems to at times suggest such implicit folding/reducing might be superceded by an explicit one - leaving here just merge(dict1, dict2) and merge(f, dict1, dict2), for example. An appropriately curried merge(f) or merge!(f:Callable, dict) might mean reduce(..., dicts) could work out ok.

@nalimilan
Copy link
Member

I agree that it sounds problematic to introduce *with variants of functions. The list is potentially quite long...

AFAICT there's no ambiguity between merge(d::AbstractDict, others::AbstractDict...) and merge(combine, d::AbstractDict, others::AbstractDict...). The corner case where combine::AbstractDict doesn't seem to make sense (and anyway you could always pass x -> combine(x) in that case).

@tkf
Copy link
Member Author

tkf commented Jan 14, 2020

Or that you want some combination function f which is not a Function? I've been using Callable a lot more lately - is that suitable?

Yes, for example, PyCall.PyObject can wrap a Python function but it cannot be a subtype of a Function. As you can make any type callable, I think merge/mergewith should be usable with any object as a combiner.

In principle, we can do this without introducing mergewith[!]. But I think the problem is that it's hard to define methods. Ideally, it should be possible to call merge with any kinds of associative data structures and arbitrary (possibly non-Base.Callable) callable; e.g., merge(py"lambda x, y: x + y", ::Dict, ::NamedTuple, ::ImmutableDict, ::HashDictionary). To do this, we need that all implementation to use a format like this:

right(_, x) = x
merge!(dest::MyType, other, rest...) = _merge_impl(right, dest, other, others...)
merge!(combine, dest::MyType, other, rest...) = _merge_impl(combine, dest, other, others...)

# Disambiguation:
merge!(dest::MyType, other::MyType, rest...) = _merge_impl(right, dest, other, others...)

I think it's OK. But it's ugly; especially the disambiguation merge!(dest::MyType, other::MyType, rest...).

I'm just wondering if that goes against a pattern we general have like findall(container) vs findall(f, container) - would we chose findallwith(f, container)?

I agree that it sounds problematic to introduce *with variants of functions. The list is potentially quite long...

I think we only need this for "monoid-like" functions; i.e., the ones with signatures "(::T, ::T) :: T" and "(op, ::T, ::T) :: T" (note: ::T here is a bit informal; I want to consider merge(::Dict, ::NamedTuple) :: Dict as well). Only this kind of functions has the problem I noted above (especially the disambiguation). Are there other functions like this in Base? Possibly intersectwith, unionwith, and somethingwith? But none of them has corresponding method that take a callable (e.g., intersect(f, sets...)). So, can't we safely treat mergewith as a special case?

@andyferris
Copy link
Member

andyferris commented Jan 15, 2020

I have to admit, the callable amibuguity is a bit annoying. You can't add a type to the Callable union (though you can use Function as a supertype, so long as you don't want any other supertypes) and in cases like PyObject, it's more of a run-time property rather than a type property.

We do have a pattern for dealing with these things - traits. Speculatively, we could define in Base something like:

export iscallable

iscallable(::Any) = false
iscallable(::Function) = true
iscallable(::Type) = true

and query iscallable in merge/merge! (and there are probably other multimethods in Base using Function or Callable in their signature that may benefit too). This trait function could be easily implemetned and used in third-party packages by both new types (including PyObject) and new functions.

@andyferris
Copy link
Member

But none of them has corresponding method that take a callable (e.g., intersect(f, sets...)).

Actually, doesn't intersect(f, collections...) make sense for the same reason that unique(f, collection) does?

@tkf
Copy link
Member Author

tkf commented Jan 15, 2020

I agree defining trait is a possible approach, but why bring such an "advanced" solution when you have a dumb solution? It also makes it harder to know what merge is going to do, especially when iscallable needs to depend on run-time property as in iscallable(::PyObject) (which makes it not really a genuine trait).

I think it's a nice property that you can tell what merge and mergewith do by just looking at the syntax and without knowing the type (ignoring merge(::Function, _...)). This is probably a bit against the praise of multiple dispatch in Julia community but using different functions have its own merits. Note that what I'm saying is not against findall(container) vs findall(f, container) because you can tell what its going to do by its arity which is a syntactic property. But you can't tell much by looking at the arity of merge due to the support for vararg "implicit reduction", as you said in the first comment.

Actually, doesn't intersect(f, collections...) make sense for the same reason that unique(f, collection) does?

I think it does. But I think the list of *with functions is not large, even when including such hypothetical functions.

@nalimilan
Copy link
Member

PyCall are indeed the most problematic example, but it's use with merge! sounds quite hypothetical. What's the concrete example which motivates this change?

@tkf
Copy link
Member Author

tkf commented Jan 15, 2020

I don't understand why PyCall example is hypothetical. Personally, I'd use it if it were usable. Also, that's exactly the reason why the type restriction of retry is removed #30382, for example.

In Julia, everything is potentially callable. I think it'd be nice if Base APIs are designed given this property.

@andyferris
Copy link
Member

What's the concrete example which motivates this change?

I don't understand why PyCall example is hypothetical.

@tkf - something in particular must have motivated you to create this PR - it is often helpful to understand these cases, so I feel the question is fair. Was there something you were trying to do that didn't work for you?

In Julia, everything is potentially callable. I think it'd be nice if Base APIs are designed given this property.

Agreed.

@tkf
Copy link
Member Author

tkf commented Jan 16, 2020

OK. That's a fair point. I just didn't want to bring it up as it's not a clean example.

It came up when I tried something like

julia> using Transducers: Map, reducingfunction

julia> rf = reducingfunction(Map(x -> 2x), +)
Reduction(
    Map(Main.λ❓),
    BottomRF(
        +))

julia> merge(rf, Dict(:a => 1), Dict(:a => 1))
ERROR: MethodError: no method matching merge(::Transducers.Reduction{Map{var"#3#4"},Transducers.BottomRF{typeof(+)}}, ::Dict{Symbol,Int64}, ::Dict{Symbol,Int64})
Closest candidates are:
  merge(::OrderedCollections.OrderedDict, ::AbstractDict...) at /home/takafumi/.julia/packages/OrderedCollections/E21Rb/src/ordered_dict.jl:434
  merge(::DataStructures.SortedDict{K,D,Ord}, ::AbstractDict{K,D}...) where {K, D, Ord<:Base.Order.Ordering} at /home/takafumi/.julia/packages/DataStructures/iymwN/src/sorted_dict.jl:598
  merge(::AbstractDict, ::AbstractDict...) at abstractdict.jl:284
  ...
Stacktrace:
 [1] top-level scope at REPL[9]:1

julia> merge((args...) -> rf(args...), Dict(:a => 1), Dict(:a => 1))
Dict{Symbol,Int64} with 1 entry:
  :a => 3

But I thought that it was not a good example as it is "fixable" and, being the library author, arguably I should. I can make whatever returned by reducingfunction a subtype of Function. It's not a Function just because of some historical reason.

However, I don't think tweaking the type hierarchy is the right approach. PyObject is a good example. Another example is BroadcastableCallable from BroadcastableStructs.jl which is a subtype of non-callable special abstract type called BroadcastableStruct. Also, if you are not the library author, you can't change the type hierarchy.

@JeffBezanson
Copy link
Sponsor Member

With multiple dispatch, just as the meaning of a function is important to get consistent, it's also important to get the meanings of the argument slots consistent when possible. It's not ideal for a function to treat an argument in a fundamentally different way based on its type: e.g. "if this is a Function, then call it on pairs of values, otherwise assume it's a container providing the values to use". Given the meaning of merge, that's not something it would naturally do when passed a function (it would probably be an error, which is a very useful default for catching bugs).

base/abstractdict.jl Outdated Show resolved Hide resolved
Co-Authored-By: Jameson Nash <vtjnash@gmail.com>
@JeffBezanson JeffBezanson removed the status:triage This should be discussed on a triage call label Jan 16, 2020
base/abstractdict.jl Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants