Subsumes flatmap functionality under flatten #45985

nlw0 · 2022-07-10T09:46:36Z

A proposal following comments to #44792. This patch removes the new Iterators.flatmap and implements the same functionality as a new flatten method that can take a function parameter. flatten is also introduced to Base similar to map, producing concrete arrays.

I personally don't really like the name flatten. I believe many programmers are probably not very familiar with the great importance of flatmap when you're coding in this style that relies more on functional combinators than on procedural for-loops. That means, for those following this style, we'll be writing a lot of code using the term flatten where there's something else happening than just flattening. We are flat-mapping. It would be great to have a more accurate term.

This patch also extends what was done in #44792 by bringing flatten to Base, producing concrete vectors through collect.

nlw0 · 2022-07-10T10:45:59Z

@tkf @aplavin @thchr I'd love to hear your thoughts.

Although I'm not a fan of sticking to flatten, I'd be pleased if we also had stack as an array-focused alternative, as proposed by @mcabbott in #43334. With that we would have a general-purpose flatten and an array-specialized stack, which I see as a useful interface building on top of hvncat, and both would offer the alternative of being used as "flatmap" if given multiple arguments instead of a single collection-of-collections. The open question then would be: shouldn't they be the same function as well?... I don't know, I feel it's useful to keep a separate array-specialized version, and I'm not sure mixing general-purpose "flattening" would work out fine once we start having specialized implementations for different datatypes.

I would like to further illustrate a possible use of flatmap that was perhaps not clear in the previous PR. In other languages there are Optional and Future objects that can be used the following way to perform a kind of function chaining like the following. Suppose you have these two functions that take a normal value and compute it in a Task:

f(x) = @spawn x^2*3+4
g(x) = @spawn x^3*2+1

Now we want to compose these functions, we can't simply compose them, because we need a regular value and we get a 'Task'. We need to do something like that:

fetch(g(fetch(f(x))))

The idea is that you could "compose" these with flatmap like this:

fetch(flatmap(g, f(x)))

I know this may sound weird, unusual or futile to programmers who have never met the idea before. I'm just trying to raise awareness of this important programming pattern that is out there, and that I believe could be useful for Julia programmers. Now this function isn't really performing what can be easily understood as "flattening" a list of lists. It's doing something different. That's why it's nice to have a different name. flatmap just turns out to be a popular name in other languages.

mcabbott · 2022-07-10T10:58:44Z

Maybe this should be a PR and an issue? My suggestion at #44792 (comment) was only to avoid making a whole new verb for Iterators.flatten ∘ Iterators.map.

Making yet another verb for collect ∘ Iterators.flatten is another question, which you can see if people find an interest in. (I do not think it's as close to the proposed stack as you suggest, and I do not see the relation to hvncat.)

nlw0 · 2022-07-10T11:28:50Z

@mcabbott there's no new verb, no flatmap anymore, Iterators.flatten returns an iterator, and Base.flatten returns a vector, similar to Base.map and Iterators.map. The only new thing is to offer Base.flatten, and also flatten(f, x) works as flatmap.

hvncat can handle multidimensional arrays, either collecting them into a new dimension, or any other dimension, eg

julia> hvncat(2, 1:2, 3:4)
2×2 Matrix{Int64}:
 1  3
 2  4

julia> hvncat(3, [1 3;2 4], [5 7; 6 8])
2×2×2 Array{Int64, 3}:
[:, :, 1] =
 1  3
 2  4

[:, :, 2] =
 5  7
 6  8

julia> reduce((a,b)->hvncat(3, a,b), [[1 3;2 4], [5 7; 6 8], [9 11; 10 12]])
2×2×3 Array{Int64, 3}:
[:, :, 1] =
 1  3
 2  4

[:, :, 2] =
 5  7
 6  8

[:, :, 3] =
  9  11
 10  12

julia> hvncat(2, [1 3;2 4], [5 7 9; 6 8 10])
2×5 Matrix{Int64}:
 1  3  5  7   9
 2  4  6  8  10

I see this as an array-specific form of flatten, which must handle the concept of dimensions, and can be extended to take a function parameter like flatten and flatmap. What is the difference to the proposed stack, apart from potential performance improvements?

Edit: I wasn't familiar with how cat works, and it's pretty similar but with a dims argument. The difference from stack as I understand is being able to take an array of arrays and use that to guide the concatenations in multiple dimensions.

I don't understand how you don't understand it's related to cat, it's clearly related, but not the same thing. It's also certainly close to flatten, the difference being what I'm calling an "array-specialized" version. Iterators.flatten, and the proposed Base.flatten simply ignores array shape information. It might receive an optional shape argument, and just reshape the output, and that might make it an alternative to either cat or stack. The relationship is pretty clear to me.

mcabbott · 2022-07-10T12:37:07Z

The only new thing is to offer Base.flatten

Correct, this is a new symbol, and is what I suggest ought to be discussed separately from the rest of the PR. This one is quick, while an eager flatten is something which may need more discussion, and is probably going to want a lot more work on performance -- there is no reason it need be slower than stack on the same data:

julia> xs = (rand(100) for _ in 1:100);

julia> @btime stack($xs);
  min 22.000 μs, mean 37.096 μs (102 allocations, 165.67 KiB)

julia> @btime collect(Iterators.flatten($xs));
  min 88.833 μs, mean 130.521 μs (110 allocations, 414.11 KiB)

Obviously stack and cat are not wholly unrelated; the comment before your first gives examples of some differences (as does the docstring).

nlw0 · 2022-07-10T15:12:25Z

I see the question of what functions should be available, and what should be their names as very distinct to how they should be implemented to improve its performance.

mcabbott · 2022-07-10T16:24:00Z

You can make the case for Base.flatten without mentioning performance, if you choose. I don't think it's a crazy idea. There will still be various choices to discuss, e.g. what type should flatten(([1 2], [5.0])) return? Should it be called by comprehensions? These questions might take a while, and aren't really related to Iterators.flatten(f, x), which is why I suggest this PR focus on one thing.

nlw0 · 2022-07-10T17:07:21Z

I'm only proposing this PR to have "flatmap" functionality in Base. The added name change is just an attempt to entice other developers who often react to my PRs with frowning-face emojis.

I'm very happy with Iterators.flatmap, and I'm making my color-of-the-bikeshed symbol preference very clear. I'll accept whatever string is required for us to be able to enjoy the new functionality. Another good name might actually be bind.

#44792 was reviewed, supported, questioned, discussed, approved and merged. I can't stop anyone from making another PR that reverses it. I would just worry about the future of a project where one might witness something akin to a wikipedia edit war.

nlw0 added 3 commits July 10, 2022 11:37

subsumes flatmap functionality under flatten

b10d8d8

flatmap->flatten on tests

3b854d5

export flatten to Base

8545ac5

nlw0 mentioned this pull request Oct 17, 2022

Add stack(array_of_arrays) #43334

Merged

nlw0 mentioned this pull request Oct 25, 2022

block method to assemble multi-dimensional arrays #46003

Open

LilithHafner mentioned this pull request Oct 26, 2022

Replace flatmap with a flatten method #47327

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Subsumes flatmap functionality under flatten #45985

Subsumes flatmap functionality under flatten #45985

nlw0 commented Jul 10, 2022

nlw0 commented Jul 10, 2022 •

edited

Loading

mcabbott commented Jul 10, 2022 •

edited

Loading

nlw0 commented Jul 10, 2022 •

edited

Loading

mcabbott commented Jul 10, 2022

nlw0 commented Jul 10, 2022

mcabbott commented Jul 10, 2022

nlw0 commented Jul 10, 2022

Subsumes flatmap functionality under flatten #45985

Are you sure you want to change the base?

Subsumes flatmap functionality under flatten #45985

Conversation

nlw0 commented Jul 10, 2022

nlw0 commented Jul 10, 2022 • edited Loading

mcabbott commented Jul 10, 2022 • edited Loading

nlw0 commented Jul 10, 2022 • edited Loading

mcabbott commented Jul 10, 2022

nlw0 commented Jul 10, 2022

mcabbott commented Jul 10, 2022

nlw0 commented Jul 10, 2022

nlw0 commented Jul 10, 2022 •

edited

Loading

mcabbott commented Jul 10, 2022 •

edited

Loading

nlw0 commented Jul 10, 2022 •

edited

Loading