Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subsumes flatmap functionality under flatten #45985

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

nlw0
Copy link
Contributor

@nlw0 nlw0 commented Jul 10, 2022

A proposal following comments to #44792. This patch removes the new Iterators.flatmap and implements the same functionality as a new flatten method that can take a function parameter. flatten is also introduced to Base similar to map, producing concrete arrays.

I personally don't really like the name flatten. I believe many programmers are probably not very familiar with the great importance of flatmap when you're coding in this style that relies more on functional combinators than on procedural for-loops. That means, for those following this style, we'll be writing a lot of code using the term flatten where there's something else happening than just flattening. We are flat-mapping. It would be great to have a more accurate term.

This patch also extends what was done in #44792 by bringing flatten to Base, producing concrete vectors through collect.

@nlw0
Copy link
Contributor Author

nlw0 commented Jul 10, 2022

@tkf @aplavin @thchr I'd love to hear your thoughts.

Although I'm not a fan of sticking to flatten, I'd be pleased if we also had stack as an array-focused alternative, as proposed by @mcabbott in #43334. With that we would have a general-purpose flatten and an array-specialized stack, which I see as a useful interface building on top of hvncat, and both would offer the alternative of being used as "flatmap" if given multiple arguments instead of a single collection-of-collections. The open question then would be: shouldn't they be the same function as well?... I don't know, I feel it's useful to keep a separate array-specialized version, and I'm not sure mixing general-purpose "flattening" would work out fine once we start having specialized implementations for different datatypes.

I would like to further illustrate a possible use of flatmap that was perhaps not clear in the previous PR. In other languages there are Optional and Future objects that can be used the following way to perform a kind of function chaining like the following. Suppose you have these two functions that take a normal value and compute it in a Task:

f(x) = @spawn x^2*3+4
g(x) = @spawn x^3*2+1

Now we want to compose these functions, we can't simply compose them, because we need a regular value and we get a 'Task'. We need to do something like that:

fetch(g(fetch(f(x))))

The idea is that you could "compose" these with flatmap like this:

fetch(flatmap(g, f(x)))

I know this may sound weird, unusual or futile to programmers who have never met the idea before. I'm just trying to raise awareness of this important programming pattern that is out there, and that I believe could be useful for Julia programmers. Now this function isn't really performing what can be easily understood as "flattening" a list of lists. It's doing something different. That's why it's nice to have a different name. flatmap just turns out to be a popular name in other languages.

@mcabbott
Copy link
Contributor

mcabbott commented Jul 10, 2022

Maybe this should be a PR and an issue? My suggestion at #44792 (comment) was only to avoid making a whole new verb for Iterators.flatten ∘ Iterators.map.

Making yet another verb for collect ∘ Iterators.flatten is another question, which you can see if people find an interest in. (I do not think it's as close to the proposed stack as you suggest, and I do not see the relation to hvncat.)

@nlw0
Copy link
Contributor Author

nlw0 commented Jul 10, 2022

@mcabbott there's no new verb, no flatmap anymore, Iterators.flatten returns an iterator, and Base.flatten returns a vector, similar to Base.map and Iterators.map. The only new thing is to offer Base.flatten, and also flatten(f, x) works as flatmap.

hvncat can handle multidimensional arrays, either collecting them into a new dimension, or any other dimension, eg

julia> hvncat(2, 1:2, 3:4)
2×2 Matrix{Int64}:
 1  3
 2  4

julia> hvncat(3, [1 3;2 4], [5 7; 6 8])
2×2×2 Array{Int64, 3}:
[:, :, 1] =
 1  3
 2  4

[:, :, 2] =
 5  7
 6  8

julia> reduce((a,b)->hvncat(3, a,b), [[1 3;2 4], [5 7; 6 8], [9 11; 10 12]])
2×2×3 Array{Int64, 3}:
[:, :, 1] =
 1  3
 2  4

[:, :, 2] =
 5  7
 6  8

[:, :, 3] =
  9  11
 10  12

julia> hvncat(2, [1 3;2 4], [5 7 9; 6 8 10])
2×5 Matrix{Int64}:
 1  3  5  7   9
 2  4  6  8  10

I see this as an array-specific form of flatten, which must handle the concept of dimensions, and can be extended to take a function parameter like flatten and flatmap. What is the difference to the proposed stack, apart from potential performance improvements?

Edit: I wasn't familiar with how cat works, and it's pretty similar but with a dims argument. The difference from stack as I understand is being able to take an array of arrays and use that to guide the concatenations in multiple dimensions.

I don't understand how you don't understand it's related to cat, it's clearly related, but not the same thing. It's also certainly close to flatten, the difference being what I'm calling an "array-specialized" version. Iterators.flatten, and the proposed Base.flatten simply ignores array shape information. It might receive an optional shape argument, and just reshape the output, and that might make it an alternative to either cat or stack. The relationship is pretty clear to me.

@mcabbott
Copy link
Contributor

The only new thing is to offer Base.flatten

Correct, this is a new symbol, and is what I suggest ought to be discussed separately from the rest of the PR. This one is quick, while an eager flatten is something which may need more discussion, and is probably going to want a lot more work on performance -- there is no reason it need be slower than stack on the same data:

julia> xs = (rand(100) for _ in 1:100);

julia> @btime stack($xs);
  min 22.000 μs, mean 37.096 μs (102 allocations, 165.67 KiB)

julia> @btime collect(Iterators.flatten($xs));
  min 88.833 μs, mean 130.521 μs (110 allocations, 414.11 KiB)

Obviously stack and cat are not wholly unrelated; the comment before your first gives examples of some differences (as does the docstring).

@nlw0
Copy link
Contributor Author

nlw0 commented Jul 10, 2022

I see the question of what functions should be available, and what should be their names as very distinct to how they should be implemented to improve its performance.

@mcabbott
Copy link
Contributor

You can make the case for Base.flatten without mentioning performance, if you choose. I don't think it's a crazy idea. There will still be various choices to discuss, e.g. what type should flatten(([1 2], [5.0])) return? Should it be called by comprehensions? These questions might take a while, and aren't really related to Iterators.flatten(f, x), which is why I suggest this PR focus on one thing.

@nlw0
Copy link
Contributor Author

nlw0 commented Jul 10, 2022

I'm only proposing this PR to have "flatmap" functionality in Base. The added name change is just an attempt to entice other developers who often react to my PRs with frowning-face emojis.

I'm very happy with Iterators.flatmap, and I'm making my color-of-the-bikeshed symbol preference very clear. I'll accept whatever string is required for us to be able to enjoy the new functionality. Another good name might actually be bind.

#44792 was reviewed, supported, questioned, discussed, approved and merged. I can't stop anyone from making another PR that reverses it. I would just worry about the future of a project where one might witness something akin to a wikipedia edit war.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants