
Conversation

@nsajko
Member

@nsajko nsajko commented May 15, 2024

The new implementation is more elegant and flexible, eliminating the code duplication and heavy hardcoding of constants in the current approach.

FTR, this PR is part of a series of changes which (re)implement many of the operations on tuples using a new recursive technique. The ultimate goal is to hopefully increase the value of the loop-vs-recurse cutoff (Any32, sometimes hardcoded 32) for tuple operations.

Demanding type inference example:

```julia-repl
julia> isconcretetype(Core.Compiler.return_type(foldl, Tuple{typeof((l,r) -> l => r),Tuple{Vararg{Int,30}}}))
true
```

Replace the hack implementation with something more elegant and
flexible.

As-is, this creates a performance regression for tuples with length just above the cutoff, e.g., 32-40. This shouldn't matter once the cutoff value is increased, IMO.
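For readers unfamiliar with the approach, the recursive technique being discussed can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code; the name `_foldl_sketch` is made up for this example.

```julia
# Hypothetical sketch of the recursive technique: peel one element off
# the tuple per call, so the recursion depth (rather than a hardcoded
# argument count, as in the current afoldl) determines the unrolling.
_foldl_sketch(op, acc, ::Tuple{}) = acc
_foldl_sketch(op, acc, t::Tuple) =
    _foldl_sketch(op, op(acc, first(t)), Base.tail(t))

_foldl_sketch(+, 0, (1, 2, 3, 4))  # 10
```

Because each call site sees a tuple type one element shorter, inference can unroll the whole fold for concretely typed tuples.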
@nsajko nsajko added the collections (Data structures holding multiple items, e.g. sets) and fold (sum, maximum, reduce, foldl, etc.) labels May 15, 2024
@KristofferC
Member

I don't really see how the current one is a "hack implementation". It seems very straightforward compared to what replaces it here at least...

@nsajko
Member Author

nsajko commented May 15, 2024

I don't really see how the current one is a "hack implementation".

Edited the PR description to be clearer and less inflammatory.

very straightforward compared to what replaces it here at least...

Perhaps, but note that the first half of the change is not fold-specific, it will hopefully be reused for other implementations. See, e.g., #54479.

@vtjnash
Member

vtjnash commented May 15, 2024

This shouldn't matter once the cutoff value is increased

This is going in the wrong direction. If the change is any good, it should perform better with a lower cutoff. Excessive unrolling, as required by this PR, is often a sign of an unreliable design.

@nsajko
Member Author

nsajko commented May 15, 2024

Excessive unrolling as required by this PR

This replaces an implementation where the unrolling is literally hardcoded. So I don't see how it's fair to say that my implementation is the one that requires unrolling. And I can't find any regressions other than the mentioned one.

If the mentioned regression is really a problem, it'd be easy to just add another method for Any32, which would be similar or equal to the current afoldl implementation. EDIT: in the current state of the PR there should be no regressions.
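The loop-based fallback mentioned here could look something like the following. This is a hypothetical sketch similar in spirit to the existing `afoldl` fallback for `Base.Any32`, not code from the PR; `foldl_loop` is a made-up name.

```julia
# Hypothetical loop fallback for tuples past the unrolling cutoff:
# give up on unrolling and iterate, trading inference precision for
# bounded compile time.
function foldl_loop(op, init, t::Tuple)
    acc = init
    for x in t
        acc = op(acc, x)
    end
    acc
end

foldl_loop(+, 0, ntuple(identity, 100))  # 5050
```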

@nsajko
Member Author

nsajko commented May 15, 2024

If the change is any good, it should perform better with a lower cutoff.

Also, I don't understand this. Surely increasing the cutoff value is a worthy goal of its own.

@nsajko nsajko marked this pull request as draft May 16, 2024 15:35
struct _TupleViewFront end
struct _TupleViewTail end
const _TupleView = Union{_TupleViewFront,_TupleViewTail}
_tupleview_length_representation_impl(n::Int) = ((nothing for _ in OneTo(n))...,)::Tuple{Vararg{Nothing}}
Member


Is this really necessary? The decrement is done with arithmetic anyway, so would an integer just work?

Member Author


I believe it's necessary, but not sure. Will check later.
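For context, the idea behind a `Tuple{Vararg{Nothing}}` length representation can be illustrated with a hypothetical example (made-up names, not the PR's code): each recursive step sees a distinct, concrete tuple type, so the recursion terminates via dispatch on `Tuple{}` rather than via an integer comparison, which tends to be friendlier to inference.

```julia
# Hypothetical illustration of the unary length encoding: a length n is
# represented as an NTuple{n, Nothing} and "decremented" by Base.tail.
unary(n::Int) = ntuple(Returns(nothing), n)

count_down(::Tuple{}) = 0
count_down(t::Tuple{Vararg{Nothing}}) = 1 + count_down(Base.tail(t))

count_down(unary(5))  # 5
```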

@JeffBezanson
Member

We have been through a couple implementations of this function and it has been risky to touch it in the past. This needs more motivation; what is the big benefit we are after?

This improved design is much closer to the current behavior; it shouldn't introduce any regressions.
@nsajko nsajko marked this pull request as ready for review May 18, 2024 14:06
@nsajko
Member Author

nsajko commented May 18, 2024

This needs more motivation; what is the big benefit we are after?

My motivation, not sure if it's enough motivation for you devs, is to make it possible to increase the cutoffs where the tuple operations switch to looping from the current limits at around thirty-ish to a few hundred. The current afoldl implementation approach would require a massive increase in source code to accomplish this.

BTW I think the new commit could make this PR more palatable.

@nsajko nsajko marked this pull request as draft May 18, 2024 15:15
@nsajko
Member Author

nsajko commented May 18, 2024

The test that now fails is caused by a preexisting issue that this PR merely exposes, I think. With this PR, f allocates even though g doesn't:

```julia
function f()
    as = ntuple(_ -> rand(), Val(10))
    hypot(as...)
end

function g(as::Vararg{Float64,10})
    hypot(as...)
end
```

So giving the compiler extra information causes the compiler to generate worse code?
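One way to check such a claim without BenchmarkTools is `@allocated` after a warm-up call. The allocation counts depend on the Julia version and on whether this PR is applied, so no expected numbers are given here.

```julia
# Warm up first so compilation allocations are excluded, then measure.
function f()
    as = ntuple(_ -> rand(), Val(10))
    hypot(as...)
end

f()                     # compile
bytes = @allocated f()  # 0 on a build without the regression
```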

@KristofferC
Member

not sure if it's enough motivation for you devs, is to make it possible to increase the cutoffs where the tuple operations switch to looping from the current limits at around thirty-ish to a few hundred

Okay, but why? This has no inherent value on its own, so there is something missing here. Is it because you think the unrolled code will be faster to compile or execute?

@martinholters
Member

I think there are three objectives when it comes to implementations of functions acting on tuples like afoldl:

  1. low run-time
  2. low compile-time
  3. inference precision.

It should be noted that 2. and 3. should also be examined for non-concrete argument types.

Ideally, any new implementation should improve at least one of those while not doing worse on any of them. Otherwise, it's a matter of discussion to find suitable trade-offs. I'm unclear how the proposed change fares in this regard.

@MasonProtter
Contributor

MasonProtter commented Jul 7, 2024

So, I suspect this is an unpopular suggestion, but I'm going to throw it out there just in case. IMO, if your goal is

My motivation, not sure if it's enough motivation for you devs, is to make it possible to increase the cuttoffs where the tuple operations switch to looping from the current limits at around thirty-ish to a (few?) hundred. The current afoldl implementation approach would require a massive source code increase to accomplish this.

then I would say that a better tool to use is not recursion, but instead a @generated function. Recursion is subject to all sorts of opaque compiler decisions that are not customizable, and trying to evade these limits usually leads to highly opaque code which can unexpectedly blow up when some internal heuristic changes.

If you want a function where you have control over how much unrolling actually happens, and you think you would benefit from unrolling by a factor of a few hundred, then a @generated function gives a cleaner and more controllable way to do that than this recursive technique in my opinion.
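To make the suggestion concrete, a `@generated` fold might look roughly like this. This is a hypothetical sketch (the name `gen_foldl` is made up): the amount of unrolling is fully explicit, because the generator emits exactly one call to `op` per tuple element, regardless of any inference heuristics.

```julia
# At generation time, `t` is the tuple *type*, so fieldcount(t) gives
# the length N, and we build op(op(...op(init, t[1])...), t[N]).
@generated function gen_foldl(op, init, t::Tuple)
    ex = :init
    for i in 1:fieldcount(t)
        ex = :(op($ex, t[$i]))
    end
    return ex
end

gen_foldl(+, 0, ntuple(identity, 100))  # 5050
```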

By the way, trying to benchmark _tuple_foldl directly for large Tuples caused a stack overflow for me:

```julia-repl
julia> for N ∈ (1, 10, 30, 50, 100, 500, 1000)
           tup = Ref(ntuple(identity, N))
           @btime Base._tuple_foldl(+, $tup[])
       end
  1.943 ns (0 allocations: 0 bytes)
  2.584 ns (0 allocations: 0 bytes)
  2.384 ns (0 allocations: 0 bytes)
  2.785 ns (0 allocations: 0 bytes)
  4.729 ns (0 allocations: 0 bytes)
  5.241 ms (128297 allocations: 4.93 MiB)
Internal error: stack overflow in type inference of ##sample#270(Tuple{Base.RefValue{NTuple{1000, Int64}}}, BenchmarkTools.Parameters).
This might be caused by recursion over very long tuples or argument lists.
[...]
```

@nsajko
Member Author

nsajko commented Jul 7, 2024

trying to benchmark _tuple_foldl directly for large Tuples caused a stack overflow on me

Yeah, this is a known Julia issue: demanding type inference may result in a stack overflow. I guess this is the main reason the cutoffs exist; the issue was certainly considered while I was coding up this PR. BTW, I've since started exploring an alternative design, so these PRs will probably be closed; see https://discourse.julialang.org/t/this-month-in-julia-world-2024-06/116704/2

@nsajko
Member Author

nsajko commented Jul 7, 2024

which can unexpectedly blow up when some internal heuristic changes

All the more reason to exercise recursion more in Base?

@nsajko nsajko closed this Jul 14, 2024
@nsajko nsajko deleted the reimplement_afoldl branch July 14, 2024 22:18

6 participants