reimplement afoldl using recursion
#54478
Conversation
Replace the hack implementation with something more elegant and flexible.
FTR, this PR is part of a series of changes which (re)implement many of
the operations on tuples using a new recursive technique. The ultimate
goal is to hopefully increase the value of the loop-vs-recurse cutoff
(`Any32`, sometimes hardcoded `32`) for tuple operations.
As-is, this creates a performance regression for tuples with length
just above the cutoff, e.g., 32-40. This shouldn't matter once the
cutoff value is increased, IMO.
Demanding type inference example:
```julia-repl
julia> isconcretetype(Core.Compiler.return_type(foldl, Tuple{typeof((l,r) -> l => r),Tuple{Vararg{Int,30}}}))
true
```
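For readers unfamiliar with the technique, a left fold by recursion on the tuple's structure can be sketched like this (a minimal illustration with invented names, not the PR's actual code):

```julia
# Base case: folding an empty tuple returns the accumulator.
my_afoldl(op, acc, t::Tuple{}) = acc
# Recursive case: combine the accumulator with the first element, then
# recurse on the tail; the compiler unrolls this for short tuples.
my_afoldl(op, acc, t::Tuple) = my_afoldl(op, op(acc, first(t)), Base.tail(t))

my_afoldl(+, 0, (1, 2, 3))  # == 6
```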
I don't really see how the current one is a "hack implementation". It seems very straightforward compared to what replaces it here at least...
Edited PR description to be more clear and less inflammatory.
Perhaps, but note that the first half of the change is not fold-specific; it will hopefully be reused for other implementations. See, e.g., #54479.
This is going in the wrong direction. If the change is any good, it should perform better with a lower cutoff. Excessive unrolling as required by this PR is often a sign of an unreliable design.
This replaces an implementation where the unrolling is literally hardcoded, so I don't see how it's fair to say that my implementation is the one that requires unrolling. And I can't find any regressions other than the mentioned one. If the mentioned regression is really a problem, it'd be easy to just add another method for
Also, I don't understand this. Surely increasing the cutoff value is a worthy goal of its own.
```julia
struct _TupleViewFront end
struct _TupleViewTail end
const _TupleView = Union{_TupleViewFront,_TupleViewTail}
_tupleview_length_representation_impl(n::Int) = ((nothing for _ ∈ OneTo(n))...,)::Tuple{Vararg{Nothing}}
```
Is this really necessary? The decrement is done with arithmetic anyway, so would an integer just work?
I believe it's necessary, but not sure. Will check later.
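If it helps the discussion, the difference can be sketched as follows (invented names, a hypothetical illustration only): with an `Int`, the recursion bottoms out via a value comparison, while with the `Tuple{Vararg{Nothing}}` representation the base case is selected by dispatch, keeping the count in the type domain:

```julia
# Count encoded as an Int: the base case needs a branch on the value.
count_int(acc, n::Int) = n == 0 ? acc : count_int(acc + 1, n - 1)

# Count encoded as a tuple of `nothing`s: the base case is chosen by
# dispatch, and the "decrement" is structural, via Base.tail.
length_rep(n::Int) = ntuple(_ -> nothing, n)
count_rep(acc, ::Tuple{}) = acc
count_rep(acc, rep::Tuple{Nothing,Vararg{Nothing}}) = count_rep(acc + 1, Base.tail(rep))

count_int(0, 3)              # == 3
count_rep(0, length_rep(3))  # == 3
```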
We have been through a couple of implementations of this function, and it has been risky to touch in the past. This needs more motivation; what is the big benefit we are after?
This improved design is much closer to the current behavior, it shouldn't introduce any regressions.
My motivation, not sure if it's enough motivation for you devs, is to make it possible to increase the cutoffs where the tuple operations switch to looping, from the current limits at around thirty-ish to a (few?) hundred. BTW, I think the new commit could make this PR more palatable.
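For context on the cutoff being discussed, the general loop-vs-recurse pattern can be sketched like this (simplified, with invented names; `Any32` in Base plays the role of the cutoff detector):

```julia
const CUTOFF = 32  # stand-in for the Any32-style cutoff

# Recursive path: fully unrolled and type-stable for short tuples.
_sum_rec(t::Tuple{}) = 0
_sum_rec(t::Tuple) = first(t) + _sum_rec(Base.tail(t))

# Past the cutoff, fall back to a loop to keep compile times and inference
# depth bounded (potentially type-unstable for heterogeneous tuples).
my_sum(t::Tuple) = length(t) <= CUTOFF ? _sum_rec(t) : sum(x for x in t)

my_sum(ntuple(identity, 100))  # == 5050
```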
The test that fails now is caused by some preexisting issue that was only triggered now, I think. With this PR:

```julia
function f()
    as = ntuple(_ -> rand(), Val(10))
    hypot(as...)
end

function g(as::Vararg{Float64,10})
    hypot(as...)
end
```

So giving the compiler extra information causes the compiler to generate worse code?
Okay, but why? This has no inherent value, so there is something missing here. Is it because you think the unrolled code will be faster to compile or execute?
I think there are three objectives when it comes to implementations of functions acting on tuples like
Where it should be noted that 2. and 3. should also be examined for non-concrete argument types. Ideally, any new implementation should improve at least one of those while not doing worse on any of them. Otherwise, it's a matter of discussion to find suitable trade-offs. I'm unclear how the proposed change fares in this regard.
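One way to probe the inference side of this, including for non-concrete argument types, is a check in the spirit of the one in the PR description (hedged; whether these return `true` depends on the implementation):

```julia
# Concrete tuple type: ideally the result type infers concretely.
isconcretetype(Core.Compiler.return_type(foldl, Tuple{typeof(+),NTuple{20,Int}}))

# Non-concrete argument type (unknown length): also worth checking.
isconcretetype(Core.Compiler.return_type(foldl, Tuple{typeof(+),Tuple{Vararg{Int}}}))
```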
So, I suspect this is an unpopular suggestion, but I'm going to throw it out there just in case. IMO, if your goal is

then I would say that a better tool to use is not recursion, but instead a

If you want a function where you have control over how much unrolling actually happens, and you think you would benefit from unrolling by a factor of a few hundred, then a

By the way, trying to benchmark:

```julia-repl
julia> for N ∈ (1, 10, 30, 50, 100, 500, 1000)
           tup = Ref(ntuple(identity, N))
           @btime Base._tuple_foldl(+, $tup[])
       end
  1.943 ns (0 allocations: 0 bytes)
  2.584 ns (0 allocations: 0 bytes)
  2.384 ns (0 allocations: 0 bytes)
  2.785 ns (0 allocations: 0 bytes)
  4.729 ns (0 allocations: 0 bytes)
  5.241 ms (128297 allocations: 4.93 MiB)
Internal error: stack overflow in type inference of ##sample#270(Tuple{Base.RefValue{NTuple{1000, Int64}}}, BenchmarkTools.Parameters).
This might be caused by recursion over very long tuples or argument lists.
Internal error: stack overflow in type inference of ##sample#270(Tuple{Base.RefValue{NTuple{1000, Int64}}}, BenchmarkTools.Parameters).
This might be caused by recursion over very long tuples or argument lists.
Internal error: stack overflow in type inference of ##sample#270(Tuple{Base.RefValue{NTuple{1000, Int64}}}, BenchmarkTools.Parameters).
[...]
```
Yeah, this is a known Julia issue: demanding type inference may result in a stack overflow. I guess this is the main reason why the cutoffs exist. The issue was certainly considered while I was coding up this PR. BTW, I've since started exploring an alternative design, so these PRs will probably be closed; see https://discourse.julialang.org/t/this-month-in-julia-world-2024-06/116704/2
All the more reason to exercise recursion more in