min/max reductions are confusing #4235

Closed
JeffBezanson opened this Issue Sep 9, 2013 · 49 comments

Projects

None yet

10 participants

@JeffBezanson
Owner

The generic max(itr) calls max on pairs of values, while max(Array) calls > on pairs of values. This is a problem because max(A::Array,B::Array) is elementwise, while all other definitions of max(A,B) return either A or B depending which is "larger".

At the very least, all 1-argument max definitions should be consistent on whether they do reduce(max, ...) or call >.

Then we are just left with the elementwise behavior, which may be a fight for another day.

Owner

Worth noting that we solve this problem in other cases by giving the reducing function a different name: + => sum, * => prod. It would be nice to do this for min and max too, but it's hard to think of names, other than maybe max => maximum, or maybe maximal.

Contributor

An alternative view would be that rather max and (a > b ? a : b) should agree and the piecewise maximum could receive a new name, e.g. pmax as in R (because .max would be very exotic.)

Owner

I would be ok with a solution like that too.

Owner

I think we should get this into 0.2 since it's a potentially breaking API change. In particular we must get rid of the awful max(A, (), dim) methods.
cc @ViralBShah @StefanKarpinski

Owner
timholy commented Oct 4, 2013

Agreed that this is a noteworthy wart.

Member
lindahua commented Oct 7, 2013

I agree that the choice of using max for both the element-wise operation that takes the larger values and the reduction to get the maximum was unfortunate. It causes all this mess and also forces us to use the ugly syntax like max(A, (), dim).

I am in favor of using maximum and minimum for the reduction purpose, and am more than happy to fix any issues caused by this breakage in the packages maintained by me. That being said, we should be cautious though, as this would be a major API change, given how widely these functions may have been used.

Contributor

We tried to fix this before by renaming the multi-arg versions of max and min to maxof and minof in #2419. I like the name, but as you can see it required changing many uses of max and min.

Member
lindahua commented Oct 9, 2013

I suspect that renaming the reduction versions from max and min to maximum and minimum might be relatively easier than renaming max(a, b) to maxof(a, b), as the latter seems to be used much more frequently.

Contributor

@lindahua I think you are right.

Owner

+1 for Dahua's names.

Owner

I'm fine with that renaming. It solves the () problem, but then there is the elementwise behavior. Would maximum(A) do reduce(max, A), or pick the largest element according to >? If max(a,b) were not elementwise, those two would be the same.

Owner

Here's an idea. The min and max functions without a dim keyword argument do reduction over each argument and then return the min/max across arguments – they're basically maximally greedy. If you give a dim keyword argument with a single input array, then reduce those dimensions of it. If you give multiple array arguments with a dim keyword, then each argument is dimension reduced and then parallel min/max is done across the reduced results. Examples:

min(1,2) => 1
min([1, 2, 3], 10, [0]) => 0
min([1, 2, 3], [4, 5, 6]) => 1
min([1 2 3; 3 2 1], dim=1) => [1 2 1]
min([1 2 3; 3 2 1], dim=2) => [1 1]'
min([1 2 3; 3 2 1], [2 0 2] dim=1) => [1 0 1]
min([1 2 3; 3 2 1], [3 2 1; 1 2 3] dim=()) => [1 2 1; 1 2 1]

I think this is equivalent to having dim default to 1:n where n is the maximum number of dimensions of its arguments, with the convention that reducing along dimensions that something doesn't actually have isn't an error, but rather just works. I think this smoothly dovetails the two meanings of min and max without introducing new function names and it generally Just Works™.

Owner

Not a bad idea, but I'm worried about all that reducing. Would max("ab","bc") still give "bc", or 'c'?

Owner

Ah, the dual nature of strings as both scalers and collections rears its homely head. I would vote to stick to doing this stuff for abstract arrays and leaving it at that. How often do you really want to reduce over the characters in a string?

Owner

This seems to be what Max does in mathematica (flattens all of its arguments). It strikes me as a bit odd, particularly in the case of arrays of arrays. But I think I like it more than the elementwise behavior (which can be done with clamp).

Honestly, at the end of the day I think all of these behaviors are non-obvious, and the one true max function is max(a,b) = a > b ? a : b. I also accept maximum(a) = reduce(max,a) (which also allows for maximum(A,dims)). Anything else is just hard to reason about. This is a pretty painful change but I think it's the only really sane approach.

Owner

I don't really see why the version I proposed is so hard to reason about but I can see the value in having different names. How do you do an element-wise max in your scheme? Is that max(A,B) where the arguments are matrices?

Owner

Absolutely not; as I said the one true max function is max(a,b) = a > b ? a : b. Elementwise max is done with clamp or map.

Owner

Your version is not so bad; it is basically the same as mine except that max also flattens out any AbstractArrays in its arguments.

Owner

I really don't think that asking people to remember max, maximum, clamp and map is great. I don't think i can keep them straight right now. What does clamp do here? Certainly it isn't easier to use than my thing, which mostly just does what you want.

Owner

I don't consider it confusing to give different names to different operations. If max is simple and well understood, and map is too, then combining them is easy. And there are several different operations being discussed here; it's going to be harder to cram them into one function.

True that elementwise max/min is a very useful operation, but your version doesn't provide it either.
clamp(x, lo, hi) limits x to between lo and hi. A typical use is clamp(image, 0, 255).

Owner

Sorry, I forgot you put the elementwise behavior into the dim keyword definition.

Owner

One obstacle is that reducing multiple dimensions still gives an array:

julia> sum(rand(2,2), 1:2)
1x1 Array{Float64,2}:
 1.18867
Owner

Right. That's definitely is a bit of an issue. We could drop reduced dimensions if some of the arguments don't have them.

Owner

FWIW, R does element-wise operations using pmax:

> min(c(1, 2, 3), c(3, 2, 1))
[1] 1
> pmin(c(1, 2, 3), c(3, 2, 1))
[1] 1 2 1
Member

There do exist some ambiguities of the semantics when we talk about element-wise max.

What is the desired behavior of max(a, b) when a and b are array of arrays. Currently, map(max, a, b) and max(a, b) give different results

julia> a = {[1, 3]};
julia> b = {[4, 2]};

julia> max(a, b)
1-element Array{Any,1}:
 [4,2]

julia> map(max, a, b)
1-element Array{Any,1}:
 [4,3]

This kind of inconsistent behavior is partly ascribed to how we define max currently:

max(x, y) = (x > y ? x : y)   # -- (1)
max{T1<:Real,T2<:Real}(x::AbstractArray{T1}, y<:AbstractArray{T2})  # -- (2)

When we call max(a, b), it selects the version (1), and since Julia allows comparing arrays as a whole using lexicographical order, it considers b as the greater one, and returns it.

When we call map(max, a, b), it applies max to each pair of elements. Since max(a[1], b[1]) is [4, 3], it returns {[4, 3]}.

I think that's why Jeff proposes to just keep max as simple as a > b ? a : b and remove the element-wise mapping functionality.

However, I still think it is worth keeping element-wise max as it is, because of several reasons:

  1. Element-wise max is very useful in practice, but map(max, a, b) is just simply too slow at current point. A very simple test on my machine gets the following, over two arrays of length 10^7, max(a, b) takes 0.085 second, while map(max, a, b) takes 1.21 second (warming was done before measuring). The latter is 14x slower than the former -- this is enough to justify a dedicated method with efficient implementation.
  2. clamp is not a proper function to replace max, especially when both a and b are arrays. Also max doesn't do the extra work to compare with the lower bound even when b is a scalar.
  3. If we just keep the max(a, b) = (a > b ? a: b) definition and remove the element-wise functionality. Then people will see max([1, 5], [3, 4]) as [3, 4] instead of what they expect, that is [3, 5]. This vastly contrasts with what all other technical computing platform would do, and it is a really big surprise to people.
  4. The element-wise definition did not actually cause a lot issues in practice -- that's why we notice this just a month ago when Jeff thought about it. If we change this, it would be a major breakage.

I really think that we should keep the element-wise version. However, we have to clean up the interface a little bit. For example, we should remove the T<:Real restriction from the element-wise version, such that the element-wise behavior also applies to Array{Any}.

In the mean time, the reduction functionality should be separated, and we may use maximum and minimum for reduction. This will reduce ambiguity in some cases and we can allow maximum(a, dim).

Member

The more I think about this, the more I feel the same with Jeff. Element-wise max/min does cause a lot of troubles when the elements themselves are arrays.

There are possibly two ways out of this mess:

  1. Restrict element-wise operations to arrays of real numbers, as
max(a, b) = (a > b ? a : b);
max(a::AbstractArray, b::AbstractArray) = error("  ... some useful error message ... ")
max{T1<:Real,T2<:Real}(a::AbstractArray{T1}, b::AbstractArray{T2}) = ... element wise max ...
max{T1<:Real,T2<:Real}(a::AbstractArray{T}, b::T2) = ...
max{T1<:Real,T2<:Real}(a::T1, b::AbstractArray{T2}) = ...
  1. Introduce a different name for element-wise max (say pmax as mentioned by John, which is used by R). Since map(max, a, b) remains too slow, a dedicated element-wise max function is needed. This introduces an additional name, and probably major breakage & surprise, but it is a way that is 100% clean -- zero ambiguity.
Member

A third way to reduce the ambiguity is to stop allowing arrays to be comparable. I don't really see why we should make arrays (as a whole) to be comparable. I never see people compare arrays in this way as [1, 4] < [2, 3]. Lexicographical comparison only makes sense for strings (and possible tuples), which fortunately are not defined as arrays in Julia.

If we disable comparison between arrays, then there won't be ambiguities when we use a single function to express both meaning, since max(a, b) will always have one behavior that makes sense -- when either a or b is an array, it does element-wise operation, otherwise, it picks the larger one. And it is simply not allowed to work on array of arrays, as the elements in such cases are not comparable.

Owner

"Lexicographical comparison only makes sense for strings"

This isn't quite true: there's a substantial literature on lexicographic preferences in psychology and economics.

Member

What about we use lexcompare or lexless for such purposes?

Owner

Works for me. We could also add another Base.Sort.Ordering to be `Lexicographic.

Member

And for getting the maximum over a collection of sequences/arrays using specific order (say lexicographical), we may write:

maximum(sequences, order=Lexicographic()) 
Member

I should have clarified my point. Whereas lexicographical comparison between arrays do has its value in specific context, it is not a general operation as the comparison between real numbers. Therefore, it would be beneficial to encourage users to explicitly express their intent when lexicographical comparison is what they intend to do. This will not only make the code clearer and less ambiguous, but also avoid having other users caught in a big surprise when they see max([1, 4], [2, 3]) produces [2, 3] instead of [2, 4].

Using specific functions for lexicographical comparison in the place of < for arrays also addresses the issue being discussed, that is, the ambiguities between the two meanings of max. Since lexicographical comparison between arrays is only used with in certain context, this change will not be a huge breakage which would otherwise caused by changed semantics of max and min.

Member

Here is a detailed proposal that comes with code skeleton:

Semantics:

  • max(a, b) yields the greater one, when a and b are comparable (and neither of them is an array). When a and b are floating-point values, special cases (e.g. NaN) will be taken care of using specialized methods.
  • max(a, b) performs element-wise operation when either argument is an array. But elements can not be arrays in general.
  • maximum(a) and maximum(a, dims) perform reduction.
  • An additional order argument may be provided for the use case where one want to do element-wise max between collections of arrays (e.g. using lexicographical order).

With this semantics, major of codes that currently use max will still work normally expected (except that codes that do reduction should use maximum instead of max), while ambiguities that might arise when using arrays of arrays would be eliminated.

Here is the codes to achieve this goal:

# the emax function is an internal function that is supposed to be used
# within element-wise and reduction functions. It is not exposed.
emax{T}(a::T, b::T) = (a < b ? b : a)
emax{T<:Real}(a::T, b::T) = (a < b ? b : a)
emax(a::Float32, b::Float32) = ... # special implementation that takes care of NaN
emax(a::Float64, b::Float64) = ... # special implementation that takes care of NaN
emax{T1<:Real, T2<:Real}(a::T1, b::T2) = emax(promote(a, b)...)
emax(a::AbstractArray, b::AbstractArray) = error(" ... useful error message ... ")
emax(a::AbstractArray, b) = error(" ... ")
emax(a, b::AbstractArray) = error(" ... ")

# to support element-wise max when special ordering is supplied
emax(a, b, ord::Ordering) = lt(ord, a, b) ?  b : a

## scalar max
# general case
max(a, b) = emax(a, b)
max(a, b, ord::Ordering) = emax(a, b, ord)

# element-wise for arrays
max(a::AbstractArray, b::AbstractArray) = ... # map(emax, a, b), implemented in an efficient way
max(a::AbstractArray, b) = ... # map(x -> emax(x, b), a) implemented in an efficient way
max(a, b::AbstractArray) = ... # map(x -> emax(a, x), b) implemented in an efficient way

# the following maps (x, y) --> emax(x, y, ord) in an element-wise manner in an efficient way
max(a::AbstractArray, b::AbstractArray, ord::Ordering) = ...
max(a::AbstractArray, b, ord::Ordering) = ...
max(a, b::AbstractArray, ord::Ordering) = ...

# reduction
function maximum(itr)  # reduction also use emax internally
    s = start(itr)
    m = next(itr, s)
    while !done(itr, s)
        (v, s) = next(itr, s)
        m = emax(m, v)
    end
    return m
end

maximum(itr, ord::Ordering) = ... # use ord for reduction
maximum(a::AbstractArray, dims::Dims) = ... # reduction along dimensions
maximum(a::AbstractArray, dims::Dims, ord::Ordering) = ... # reduction along dimensions using ord

# deprecate max(a) for single argument case in favor of maximum(a)
# deprecate max(a, (), dims) in favor of maximum(a, dims)

I think this properly address all the issues. The introduction of emax enables the use of max for scalar max and element-wise max without causing ambiguities (while cases that require special care, e.g. floating point max, are also taken care of).

Also, this would not introduce major breakage. The use of max(a) and max(a, (), dims) is deprecated gracefully.

Owner

I haven't caught up on this whole conversation, but I would note that while order=Lexicographic could be the underlying implementation, a better way to express this would be to write lt=lexless and then have a special case in the ord function that checks if lt === lexless and uses the Lexicographic ordering type. This is how it works currently for lt === isless and by === identity:

function ord(lt::Function, by::Function, rev::Bool, order::Ordering=Forward)
    o = (lt===isless) & (by===identity) ? order  :
        (lt===isless) & (by!==identity) ? By(by) :
        (lt!==isless) & (by===identity) ? Lt(lt) :
                                          Lt((x,y)->lt(by(x),by(y)))
    rev ? ReverseOrdering(o) : o
end

This isn't the most elegant thing in the world, but it's really just a stop-gap until comparison functions can be specialized on and inlined. It keeps the same interface that you'd want if that were the case while allowing specialization. I was actually considering removing the order keyword altogether – it exposes the user to far too much internal machinery, but decided to keep it because otherwise deprecation is a total nightmare.

Owner

@lindahua 's latest proposal looks pretty good. After some thought, I agree that making a < b lexicographic for arrays by default is more trouble than it's worth. It's also very different from what most people are used to.

It's a good idea to make maximum work like the sorting functions (to the extent that they are both about order).

One small change I'd propose is to implement emax as

emax(a,b) = max(a,b)
emax(a::AbstractArray, b::AbstractArray) = error()
emax(a, b::AbstractArray) = error()
emax(a::AbstractArray, b) = error()

so that max can still be extended by defining methods for max.

Member

@JeffBezanson: Yes, your modification makes sense.

Owner

This plan seems good to me although I'm a little concerned that the fancy NaN logic and other emax specializations don't compose with general orderings, but that's a bit of an implementation detail. The overall scheme seems sound.

Aside: I figured out a reasonable way to deprecate the order keyword argument: ad6903e.

Owner

Shame; in this particular case order=Lexicographic is a nice interface.

Owner

Well, I could always undeprecate it.

Owner

Next we have to decide how lexicographic ordering works. Maybe:

lexcmp(a, b) = cmp(a,b)
lexcmp(a::AbstractArray, b::AbstractArray) = # existing implementation of cmp for arrays

lexless(a,b) = lexcmp(a,b)<0
Owner

This may be taking it too far, but lexicographic ordering should maybe be parameterized by another ordering that's used for comparing elements.

Owner

Dahua's proposal with Jeff's modifications seems like a huge step forward.

I think Stefan's idea of having lexicographic ordering depend on another ordering seems sound to me.

We might also want to allow order = Generalized(A) for generalized inequalities on numeric arrays.

Owner

I don't understand the generalized thing. Can you elaborate?

Owner

Here's a brief introduction, where it explains how you can generalize component-wise inequality: http://www.nt.tuwien.ac.at/uploads/media/lecture7.pdf

Might be too obscure. Just one of the other useful orderings I know of for vectors after moving beyond lexicographic orderings.

Member

I think the easiest plan may be implement a subtype of Ordering called Lexicographic, and allow it to be supplied as an optional argument to the keyword param. We should proceed with the change of max and min functions as the plan, as this is blocking the 0.2 release.

The Generalized order mentioned by @johnmyleswhite is mainly used in convex optimization context and has to be defined with respect to a vector space. These things can be implemented in a package.

Member

I will make updates to NumericExtensions, Stats, Distributions, and other downstream packages as soon as these changes are merged.

@JeffBezanson JeffBezanson added a commit that referenced this issue Oct 12, 2013
@JeffBezanson JeffBezanson step 2 for #4235
make minimum and maximum use scalarmin and scalarmax, which are min
  and max but exclude arrays

remove isless for AbstractArray

add LexicographicOrdering and use it in sortrows and sortcols
currently LexicographicOrdering is implemented with cmp(), which does
  the right thing for arrays, but we might not want to keep it that way
7d5f80c
Owner

Just caught up with this thread. It would be great to do a blog post summarizing the design when all the changes are in.

Member
ihnorton commented Oct 2, 2015

Hi @rosangelaoliveira4, welcome to Julia. Please refer to the manual, and ask questions on StackOverflow or the julia-users mailing list. Thanks.

Member
jiahao commented Oct 4, 2015

This is not the right place to ask basic usage questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment