Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define the output order from map #50857

Open
CameronBieganek opened this issue Aug 9, 2023 · 9 comments
Open

Define the output order from map #50857

CameronBieganek opened this issue Aug 9, 2023 · 9 comments
Labels
domain:docs This change adds or pertains to documentation

Comments

@CameronBieganek
Copy link
Contributor

Currently the map docstring makes no assertion about the order of the output collection from map. So, map(x -> 2x, 1:2) == [4, 2] is conformant with the map docstring. Thus, strictly speaking, no one should ever use map unless they are going to perform a reduction (with an associative operator) on the output. Here is a proposal for an updated docstring that defines the output order:

"""
    map(f, x...)

Transform iterable `x` by applying `f` to each element. The order of the output matches the order of the input.  
In other words, the following expression is always true:

all(zip(map(f, xs), xs)) do (fx, x)
    fx == f(x)
end

For multiple collection arguments, apply `f` elementwise, and stop when when any of them is exhausted.
"""

Note that I've also switched the terminology from "collection" to "iterable", since map works on arbitrary iterable objects which are not necessarily collections. For example, length throws a MethodError when called on an iterator like

itr = (sin(x) for x in 1:10 if sin(x) > 0)

and the manual implies that length must be implemented for an object to be considered a collection.

The docstring for zip is also a bit vague, so in order to adopt the above change to the map docstring, we would also need to clarify the zip docstring.

@brenhinkeller brenhinkeller added the domain:docs This change adds or pertains to documentation label Aug 9, 2023
@Seelengrab
Copy link
Contributor

Seelengrab commented Aug 9, 2023

We could be even more precise, by matching the definition here; in lay terms, applying a function while preserving the structure of the object we're mapping over (as well as respecting composition of mapped functions).

The only downside is that we can't use "functor" for that, because we've chosen that to mean a callable struct instead..

@CameronBieganek
Copy link
Contributor Author

CameronBieganek commented Aug 9, 2023

Well, map doesn't actually adhere to the identity axiom:

julia> map(identity, Iterators.take(1:3, 2))
2-element Vector{Int64}:
 1
 2

Also, as far as I can tell, the functor rules don't actually imply anything about the order of the output, except in the identity case.

@bvdmitri
Copy link
Contributor

bvdmitri commented Aug 9, 2023

the following expression is always true:

all(zip(map(f, xs), xs)) do (fx, x)
    fx == f(x)
end

this is a bit misleading as it is not always true in general. For this to hold the collection must be non stateful, f must be pure. And it is actually may be a non-Bool value as well even for pure functions:

julia> f(x) = [ 1, 2, 3, x, missing ]
f (generic function with 1 method)

julia> xs = [1, 2]
2-element Vector{Int64}:
 1
 2

julia> all(zip(map(f, xs), xs)) do (fx, x)
           fx == f(x)
       end
missing

And yeah, the order of zip is actually not guaranteed either.

@CameronBieganek
Copy link
Contributor Author

CameronBieganek commented Aug 10, 2023

I thought about stateful iterators a while ago, but I finally got around to opening the issue today. So I forgot to add a copy to handle stateful iterators. Regarding non-Bool values, I suppose the comparison should be with isequal. So, combining those two changes, we have

all(zip(map(f, copy(xs)), xs)) do (fx, x)
    isequal(fx, f(x))
end

Impure functions seem harder to handle. The following works, but it feels like it works for the wrong reasons:

julia> f(x) = push!(x, 10)
f (generic function with 1 method)

julia> xs = [[1, 2], [3, 4]]
2-element Vector{Vector{Int64}}:
 [1, 2]
 [3, 4]

julia> all(zip(map(f, copy(xs)), xs)) do (fx, x)
           a = f(x)
           @show a
           @show fx
           isequal(fx, a)
       end
a = [1, 2, 10, 10]
fx = [1, 2, 10, 10]
a = [3, 4, 10, 10]
fx = [3, 4, 10, 10]
true

There's probably a counter-example with an impure function where the all expression returns false...

At any rate, the all expression is starting to look a bit too unwieldy to add to the docstring. Perhaps there's a better way to describe the output order in prose. The best I can think of at the moment is something like this:

The order of the output matches the order of the input. For example,
the first element of the output is `f(first(x))` and the last element of the
output is `f(last(x))`.

And yeah, the order of zip is actually not guaranteed either.

I think that comment is referring to the fact that, for zip(itr1, itr2), it is undefined whether the $i$-th tuple is evaluated by first iterating itr1 and then iterating itr2 or by first iterating itr2 and then iterating itr1. In other words, the evaluation of the $i$-th tuple could be done either like this:

a, state1 = iterate(itr1, state1)
b, state2 = iterate(itr2, state2)
return (a, b)

or like this:

b, state2 = iterate(itr2, state2)
a, state1 = iterate(itr1, state1)
return (a, b)

The order of the output tuples, on the other hand, is most definitely intended to correspond to the order of the input tuples, except for weird cases with aliased, stateful iterators (e.g. itr = Iterators.Stateful(1:5); zip(itr, itr)). In other words, the first tuple should normally be (first(itr1), first(itr2)).

@CameronBieganek
Copy link
Contributor Author

I should reiterate the main message of my original post:

Without a guarantee in the docstring on the order of the output, no one should ever use map unless they are going to perform a reduction (with an associative operator) on the output.

@bvdmitri
Copy link
Contributor

I agree with you issue, I just feel it unnecessary to come up with a "code explanation", which has many other implicit assumptions. You've explained it pretty well in one sentence "The order of the output matches the order of the input.". That should be enough?

@jishnub
Copy link
Contributor

jishnub commented Aug 12, 2023

Since you're referring to the order being undefined, do you mean that the reduction operator should be commutative?

@CameronBieganek
Copy link
Contributor Author

Yeah, I guess the operator needs to be both associative and commutative to be able to arbitrarily rearrange the order of the sequence without changing the value of the reduction. But that's not part of the proposed docstring. That was just me complaining about the vagueness of the current map docstring.

@nsajko
Copy link
Contributor

nsajko commented Nov 3, 2023

Related observation: the foreach doc string says:

foreach should be used instead of map when the results of f are not needed, for example in foreach(println, array).

This kind of (but not really, perhaps) implies the expected output order of map.

It was added all the way back in 71fc3b3, together with the NEWS entry for foreach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:docs This change adds or pertains to documentation
Projects
None yet
Development

No branches or pull requests

6 participants