Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move ambiguity warnings to run time #6190

Closed
lindahua opened this issue Mar 17, 2014 · 71 comments · Fixed by #16125
Closed

move ambiguity warnings to run time #6190

lindahua opened this issue Mar 17, 2014 · 71 comments · Fixed by #16125
Labels
needs decision A decision on this change is needed

Comments

@lindahua
Copy link
Contributor

When method extension coupled with multiple dispatch mechanism, we tend to be always find ourselves in the loop of continuing adding disambiguating methods. Seriously, this is unsustainable in the long run.

Consider the following:

# suppose this is defined in base

type A <: BaseT end

+(::A, ::A) = println("A + A")   # this needs to be in front
+(::A, ::BaseT) = println("A + BaseT")
+(::BaseT, ::A) = println("BaseT + A")

If someone want to write a package that extends BaseT, he will need to write

module modB
import Base: BaseT, A, +

type B <: BaseT end

# for the second type, you have to put three methods in front
# in order to reduce ambiguity
+(::A, ::B) = println("A and B")
+(::B, ::A) = println("B and A")
+(::B, ::B) = println("B and B")

+(::B, ::BaseT) = println("B and BaseT") 
+(::BaseT, ::B) = println("BaseT and B")
end

If there are n subtypes of BaseT in the Base, O(n^2) disambiguating methods need to be added.

What's worse, if another one independently creates another package that extends BaseT, as below:

module modC
import Base: BaseT, A, +

type C <: BaseT end

+(::A, ::C) = println("A and C")
+(::C, ::A) = println("C and A")
+(::C, ::C) = println("C and C")

foo(::C, ::BaseT) = println("C and BaseT") 
foo(::BaseT, ::C) = println("BaseT and C")
end

Then when one want to import both packages (even without using, just import), there will be ambiguities warnings about +(B, C) etc.

This issue has caused a lot of headaches for packages that try to define their own customized array types, for example:

Images and some other packages also encounter such problems.

Whereas this might be solved by continuously adding disambiguating methods, problems still exist (or even worse) when you use two or more of these packages together.
Are we going to add disambiguating methods for every pair of array types residing in different packages?

Just try the following, you will see screens full of ambiguities warnings (though just importing either one will be fine):

import DataArrays
import Images

We seriously need a fundamental solution to this.

@lindahua
Copy link
Contributor Author

One way to mitigate this is to delay the ambiguity warning/error to when you actually call a method that the compiler thinks is ambiguous to resolve.

For example, if you don't ever actually add an image with a data array, there's no need to complain about ambiguities.

@timholy
Copy link
Member

timholy commented Mar 17, 2014

I recently started a branch in Images that makes an Image not be an AbstractArray (or any type descended from the Array hierarchy). I haven't finished that work, but the whole motivation was a similar concern about the long-term sustainability.

@JeffBezanson
Copy link
Member

One possible answer is that method ambiguities simply don't matter that much. The important question is which of +(::A, ::BaseT) and +(::BaseT, ::B) you want to get called. From the signatures, they are equally applicable to (A, B), so maybe picking one arbitrarily is fine.

In that case, we could eliminate the warnings, or maybe only print them for definitions in the same module, just to help ensure you've handled all locally-important cases.

Another solution is to resolve left-to-right, which is easy to understand, but also a bit arbitrary.

@timholy
Copy link
Member

timholy commented Mar 17, 2014

I do like @lindahua's idea suggestion to delay the warning. IIUC, the warnings are generated at parsing time. Can we somehow queue them up silently, and issue them only when an ambiguous function gets compiled? To me that seems like it might be the best of all worlds:

  • warnings are issued only for methods that get used in practice (the method will not be compiled unless it is used)
  • there would be no performance hit for execution---the warning is issued at compile time, but it's not part of the compiled function
  • it's just as safe as our current approach, in that any time anything is ambiguous, the user will be warned (no silent random choices)

@timholy
Copy link
Member

timholy commented Mar 17, 2014

And one more: it would seemingly fix #270 in most circumstances, because module-loading would be complete before compilation occurs.

@lindahua
Copy link
Contributor Author

I agree with @timholy.

@JeffBezanson
Copy link
Member

I think it might be more annoying to have the warnings appear at random times during execution. As a library developer it then becomes hard to ensure users won't see warnings. Also the warning will appear in response to a user action (calling a function with a particular combination of arguments), but it's not the user's fault and they aren't in a position to do anything about it.

@lindahua
Copy link
Contributor Author

The users still have some options to eliminate the ambiguities, for example, they can convert either argument to an ordinary array in the case of +(Image, DataArray).

However, under current situation, the users have essentially no way to suppress the several screens of warnings if they do

using DataArrays
using Images

even when there is no code that attempts to add a data array and an image (it is far more often they use these two packages in different parts of their codes)

Even if the maintainers of both packages do all their best to eliminate the warnings within their respective packages, warnings persist when multiple packages are used.

@StefanKarpinski
Copy link
Member

One immediate thing we can do to help is just delay ambiguity warnings until the end of module definition. If the method is no longer ambiguous at that point, don't issue the warning. This is effectively just fixing #270 by itself and seems completely uncontroversial. I agree with @JeffBezanson that delaying ambiguity warnings to method compilation time seems like a bad idea.

@lindahua
Copy link
Contributor Author

C++ issues an ambiguity error only when a function is called in an ambiguous way, and this way doesn't seem to cause a lot of problems.

Also, delaying ambiguity warnings to the end of module definition doesn't seem to help the case where more than one packages extend the arithmetic operators for customized arrays. Such a problem will become more and more common, as we have more packages.

@StefanKarpinski
Copy link
Member

C++ issues an ambiguity error only when a function is called in an ambiguous way, and this way doesn't seem to cause a lot of problems.

C++ can do this statically, so it's a completely different situation. Delaying warnings until the end of module definition definitely doesn't address the core problem, but it is somewhat helpful and uncontroversial, which is a good starting point. I agree that there's a very serious problem here that we need to do something about.

@JeffBezanson
Copy link
Member

Dahua is right that delaying warnings to the end of a file does not help the situation described in this issue. If we ultimately decide to delay the warnings to compilation or calls, that behavior would be subsumed.

@lindahua
Copy link
Contributor Author

Delaying warnings to the end of file, compilation time, or run-time is just kind of a stopgap to mitigate the situation. There's actually a more fundamental problem at the heart of our mechanism of method extension and resolution, which may cause more serious issues as the eco-system grows.

The key issue is that packaging system is about separating concerns and responsibilities. However, method extension (together with the current way of method resolution) on the other hand creates a subtle coupling across packages. Such dependency across package could potentially get out of hand.

I don't think I have a very good solution at this point. I just want to bring this to the attention of the community, and hopefully we can eventually figure out a satisfactory solution.

@timholy
Copy link
Member

timholy commented Mar 17, 2014

While I'm not sure it's better than deferring the warning, another approach would be the following: say I'm the developer of Images (just supposing...). I'm currently sitting in Images/src. I issue the following command:

generate_ambiguity_errors(outputfilename, "Images", "DataArrays")

This automatically generates a file containing a bunch of functions that look like this:

(+)(A::Image, B::DataArray) = error("Ambiguous method. File an issue at Images.jl if you want this fixed.")

@timholy
Copy link
Member

timholy commented Mar 17, 2014

+1 for @lindahua's summary two posts above. It's funny how one's thinking can be on similar trajectories; just yesterday I thought to myself, "if there's any one thing that can get in the way of the ultimate success of Julia, this is it."

@JeffBezanson
Copy link
Member

What I like about that idea is that it makes ambiguity warnings explicitly a tool for library developers, which is kind of what it should be. You wouldn't want to actually include all the (+)(A::Image, B::DataArray) definitions though since we'd have every package explicitly mentioning every other package.

@timholy
Copy link
Member

timholy commented Mar 17, 2014

I was thinking that inclusion would be conditional on if isdefined(Main, :DataArrays). That way it's only cross-referenced if it has been loaded.

@lindahua
Copy link
Contributor Author

This is a lovely idea. But what if @johnmyleswhite also writes a generate_ambiguity_errors on his part when developing DataArrays -- would the same set of methods are duplicated in the other package. Also, what if that involves more than two packages?

@jakebolewski
Copy link
Member

Isn't method ambiguity in the presence of polymorphic typed arguments a well known problem with multiple-dispatch systems? You can find whole discussions / papers on the internet devoted to such topics: http://c2.com/cgi/wiki?MultiMethods. Can you even define such a system to be totally unambiguous in all circumstances? Wouldn't more sophisticated namespace support go a least some way towards alleviating this problem?

@timholy
Copy link
Member

timholy commented Mar 17, 2014

I think you want both package authors to use that script, with the package names reversed in terms of order. Yes, the error methods would be duplicated. In fixing the errors, the protocol should be to CC the other package developer(s). Perhaps a package called DataArrays-Images.jl would be created (which would not define a module) to hold any actual common methods (not error stubs) that rely on both. What I'd like to have happen is that generate_ambiguity_errors would then (after creation of DataArrays-Images.jl) be re-run as

generate_ambiguity_errors(outputfilename, "Images", "DataArrays", "DataArrays-Images")

from within Images, and

generate_ambiguity_errors(outputfilename, "DataArrays", "Images", "DataArrays-Images")

from within DataArrays.

Inside of src/Images.jl I'd have

if isdefined(Main, :DataArrays)
    include("DataArrays_ambiguities.jl")
    include(Base.find_in_path("DataArrays-Images"))
end

and you'd have its converse in src/DataArrays.jl. DataArrays_ambiguities.jl shouldn't (if generate_ambiguity_errors works properly) include any methods that duplicate those in DataArrays-Images, but you could load the latter last just for safety (say, if there are updates to DataArrays-Images but the script hasn't been re-run).

Naturally, all this relies on deferring warnings until module-loading is complete, but I think that's all that's required.

Also, what if that involves more than two packages?

My head hurts already, please show mercy 😄. But in principle, it seems you could just extend the list of package arguments, and any ambiguous method that comes up during the load sequence would result in an error stub being created. You'd probably have to do both

generate_ambiguity_errors(outputfilenameAB, "MyPkg", "PkgA", "PkgB")
generate_ambiguity_errors(outputfilenameBA, "MyPkg", "PkgB", "PkgA")

@jakebolewski
Copy link
Member

I'm sure others are fully aware of this, but this paper was particularly interesting (http://lambda-the-ultimate.org/node/3061). Seems like there is plenty of room for further experimentation.

@toivoh
Copy link
Contributor

toivoh commented Mar 17, 2014

I'm not sure that I want to blame multiple dispatch or ambiguity warnings for this - rather I think that they make the underlying problem more visible. If you as a user have x, y, z of types <: BaseT, and you can calculate x+y and y+z, wouldn't you expect to be able to calculate x+z as well? That seems like a nasty surprise to get once you run your code, or run it in a corner case.

If we want to use a type such as BaseT, where objects of different subtypes can be added together, and especially if those subtypes will be created independently, we should use a pattern that supports this in a scalable way. One is the promotion system used in Base; surely there is others.

Multiple dispatch really put your abstractions to the test, but I don't think that this needs to be a bad thing. If I knew more of the abstraction that BaseT stands for I would know more of what kind of pattern might be suitable. It is my hope that as Julia matures, a body of design patterns for how to use multiple dispatch will emerge.

Multiple dispatch does introduce a coupling between different methods of the same function. Given a function where people should be able to add methods independently, there's a number of patterns for which kind of method signatures might be added without causing ambiguity, but if you use more than one, you will likely get ambiguity. So even if whoever created the function didn't specify an admissible pattern for overloading, the existing body of methods for it will more or less force to keep using the same pattern as they follow (or swim in ambiguity warnings). Perhaps there should eventually be a way to restrict the way that a given function can be overloaded.

@carlobaldassi
Copy link
Member

This isn't a very well thought out proposal (and I still need to read the papers which have been linked), but I had a rough idea and wanted to check if it's feasible and if there's any interest. What about something along these lines:

  1. ambiguity warnings inside methods are raised at parse time (possibly at the end of parsing)
  2. ambiguity warnings between different modules are raised at call time, but
  3. the user has some way to pick a method to resolve the ambiguity (I mean an easy one, i.e. other than resorting to invoke)

This would allow independent packages to evolve separately, without issuing unnecessary warnings. Point 3 is of course problematic. Maybe the user could simply decide which of two (or more) types takes precedence: the user could decide that DataArray is preferred to Image (e.g. resolve_dispatch(::Type{Image}, ::Type{DataArray}) = DataArray), so that +(::DataArray, ::AbstractArray) would take precedence over +(::AbstractArray, ::Image). The way this may work is that once an ambiguous call is encountered, an internal methods would be called (EDITED: there was a totally wrong implementation idea here).

It's possible that this is simultaneously too complicated and not powerful enough though...

@lindahua
Copy link
Contributor Author

One may consider BaseT as AbstractArray. The type hierarchy of all sorts of arrays is the one that is causing all these headaches. Authors of customized array types want their array types to be able to work with AbstractArray, and thus adding a lot of functions like

type MyArray <: AbstractArray
    ....
end

+ (a::MyArray, b::AbstractArray) = ...
+ (a::AbstractArray, b::MyArray) = ...

Then when we have MyArray, YourArray, and HisArray, the ambiguity issues begin to get out of hand.

There's nothing wrong with allowing customized arrays to work with the array types in Base. But the current approach is obviously not scalable. I agree with @toivoh that we will need to work out a pattern to use multiple dispatch more smartly here.

@lindahua
Copy link
Contributor Author

I generally agree with @carlobaldassi that we may want to differentiate between "local" ambiguities (those within the same module) and "external" ambiguities (those across modules).

We may raise warnings for local ambiguities when a module is loaded and encourage package authors to address them, and raise warnings for cross-module ambiguities in compile-time (or even run-time, e.g. when functions are passed as arguments), and provides some easy way for the authors of client codes to make a choice (at least, we have invoke for now).

@JeffBezanson
Copy link
Member

@toivoh has an excellent point --- coming up with good abstractions can eliminate whole classes of ambiguity problems. For example we'd never have the problem of both (+)(x::Float64, y::Real) and (+)(x::Real, y::Float32) being defined. It's worth looking closely at why this happens.

All of the potential controversy is pushed into promote_type. Ambiguities will now arise in one place instead of over many operators. Further, it's clear that one type must be picked as the winner. Disagreements would be obvious, e.g. maybe manifesting as another package overwriting a definition of promote_rule.

The downside is that these abstractions are difficult to design. And different cases arise: it doesn't make sense to define a priority between DataArray and Image. Arguably, the code image + dataarray has no correct answer. This indeed argues for a run time error, saying "it's not clear what you want; explicitly convert one argument to Array". Thinking of it this way, I'm starting to come around to the idea of run-time errors. It would actually be better to tell the user to write clearer code than to force the library writer or language to make arbitrary choices.

Here is a theory: there are 3 kinds of ambiguous definitions:

  1. Two types independently defining the same functionality, and it wouldn't matter which definition is called. Example: convert(::Type{BigFloat}, x::Real) vs. convert{T<:FloatingPoint}(::Type{T}, x::Rational). Both methods give the same answer.
  2. You define f(::BaseT, ::SubT) and f(::SubT, ::BaseT) because you simply forgot to handle the f(::SubT, ::SubT) case, which in fact needs to be handled.
  3. Conflicting extensions between packages, as in this issue.

In case (1) you want to just ignore and allow the ambiguity. In case (2) you want a definition-time warning. In case (3) you want a run-time error.

@carlobaldassi
Copy link
Member

+1 for run-time errors in case of conflicting extensions between packages. In fact, I was coming around to suggest the same.
(side note: I edited out the code sketch above, it was utterly wrong, sorry for the noise.)

@lindahua
Copy link
Contributor Author

+1 for @JeffBezanson's suggestion of differentiated treatment of ambiguities.

@carlobaldassi
Copy link
Member

It should be taken into account that conversion (e.g. from Image to Array) would not be cheap in general though. It still seems better to call invoke than convert, if one knows what method he actually wants to call. So in the end conceiving a "better interface" to invoke may still be worth something (but this is low-priority, and I still agree that the user should be forced to make the choice explicitly).

@timholy
Copy link
Member

timholy commented Mar 17, 2014

@JeffBezanson, do you have concrete plans for distinguishing those three cases automatically? The first one in particular sounds tricky.

@toivoh
Copy link
Contributor

toivoh commented Aug 12, 2014

How about ambiguities between Base and the current module? I'd be a lot
more comfortable if we'd keep catching those at compile time.

Hence the suggestion about checking between dependent modules, which I
really think would exempt all the cases that people have been complaining
about here?

@toivoh
Copy link
Contributor

toivoh commented Aug 19, 2014

Looking at the video, the arguments seem to be

  • At runtime, it's better to get an error than to pick one out of the most specific methods at random. (Agree that this would be an improvement whether we keep ambiguity warnings or not)
  • Don't want to add tons of boilerplate in the form of error-throwing methods just to suppress warnings -- don't want warnings for combinations that don't make sense.

I agree, but I also think that there are many cases when ambiguity warnings carry valuable information; we should consider carefully before we shoot the messenger.

@toivoh
Copy link
Contributor

toivoh commented Aug 19, 2014

Come to think about it, monotonic return types (see the thread and discussion in #1090) could remove quite a lot of those ambiguities that don't make sense. E.g. if we have

+(::DataArray, ::AbstractArray)::DataArray = ...
+(::AbstractArray, ::Image)::Image = ...

then +(::DataArray, ::Image) must have return type None == typeintersect(DataArray,Image), so we already know that there is no definition needed for it. This is not a cure-all though, consider e.g. dot, which would probably have the same return type when applied to DataArrays and Images.

@toivoh
Copy link
Contributor

toivoh commented Aug 19, 2014

What I'm concerned about if we get rid of all compile time ambiguity warnings is that we will gradually lose a lot of what makes Julia great. I think it's fantastic that multiple dispatch combined with carefully considered abstractions allows so many types and functions in Julia to interoperate as a coherent whole. (To compare, I think anyone who has tried to implement arithmetic operations on their own type in Python has quickly given up the hope to let it interoperate with more than a few types that they themselves know and care about; at least I did.)

What if developers stop noticing all those ambiguous cases for things that do make sense? We will be shifting the burden to the users. I'm afraid that there will be a flood of bug reports about runtime ambiguity errors instead of the current discussions about ambiguity warnings.

Some of those ambiguities could probably have been easily fixed up front if the developer saw the warning. Others will probably indicate design flaws that should have been considered earlier rather than later. When I see a flood of ambiguity warnings, I think that I should probably do something differently, not that I should go add a corresponding amount of boilerplate.

If we suppress ambiguity warnings, I really urge that we start by identifying cases where it's very likely that the resolution would have been to define an error throwing method, and initially suppress just those. Once we see how much that impacts the robustness of the Julia code that we produce, we might take another step.

@StefanKarpinski
Copy link
Member

When I see a flood of ambiguity warnings, I think that I should probably do something differently, not that I should go add a corresponding amount of boilerplate.

I think this is a very good point. A better solution might be this:

  1. Only show warnings in "paranoid mode" – which we've talked about before and should be the default mode for running tests. Your package has tests, right?
  2. Raise an error if an ambiguous method is actually called, regardless of mode.

This way users don't see the flood of warnings but developers should be aware of the problem. Maybe whether to show warnings or not can be triggered by something besides paranoid mode, but what we really need to stop is the slew of warnings when a user who doesn't know or care about the ambiguities sees a wall of warnings when loading a standard package.

@StefanKarpinski
Copy link
Member

Come to think about it, monotonic return types (see the thread and discussion in #1090) could remove quite a lot of those ambiguities that don't make sense. E.g. if we have

+(::DataArray, ::AbstractArray)::DataArray = ...
+(::AbstractArray, ::Image)::Image = ...

then +(::DataArray, ::Image) must have return type None == typeintersect(DataArray,Image), so we already know that there is no definition needed for it. This is not a cure-all though, consider e.g. dot, which would probably have the same return type when applied to DataArrays and Images.

This is a really interesting point. If we know that the intersection of two methods cannot possibly return, then the implementation that throws an error is actually the only possible implementation, so it's not even necessary to warn the developer since the default (2 above) is correct.

@StefanKarpinski
Copy link
Member

Also, keep in mind that if users actually encounter an ambiguous case, we can emit an error that asks them to report the error, so we'd be crowdsourcing the discovery of real problems vs. fake problems. Still, having warnings in paranoid mode may still be best.

@jwmerrill
Copy link

Nitpick: reading through this discussion, I keep getting confused by attempts to distinguish between "run time" and "compile time", since Julia is not AOT compiled. Producing ambiguity warnings when I say

using Images
using DataArrays

is producing warnings at run time, and throwing an error when I do

image + dataarray

is throwing an error at run time.

If Julia moved to AOT compilation, both or neither of them could happen at compile time instead.

There is obviously a distinction between the two, but it would make a lot more sense (to me) to talk about producing an ambiguity warning at "module load time," or throwing an error when "encountering an ambiguous call site".

The reason that I think this is worth bringing up is that in general one prefers compile time errors to run time errors---when there is such a distinction---because it means you can fix problems during development instead of during use. But I don't think that principle is really in operation here, because the ambiguity doesn't happen during development of the packages; it happens when a user decides to load two packages that no one may have simultaneously loaded before.

@timholy
Copy link
Member

timholy commented Sep 23, 2014

Yes, that's better terminology.

@StefanKarpinski
Copy link
Member

Compile time is also technically incorrect – it's parse time. Code generation is later.

@amitmurthy
Copy link
Contributor

One "solution" mentioned by Jeff in this thread, which was not discussed further is - "Another solution is to resolve left-to-right, which is easy to understand, but also a bit arbitrary."

That approach can solve one particular case which I am not able to easily correct right now:

If I define a print(io::AsyncStream, xs...) in stream.jl, it clashes with print(io::IO, t::Text) in docs.jl

If left-to-right disambiguation worked, print(io::AsyncStream, xs...) would invoke the abstract IO implementation after locking io. Thought I would give an example where a left-to-right disambiguation provided a workable solution.

@vtjnash
Copy link
Member

vtjnash commented Aug 22, 2015

this is too big a change to deal with in 0.4.x, should i move this to 0.5 or just remove the target?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs decision A decision on this change is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.