Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Less aggressive recursion limiting #48059

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

RFC: Less aggressive recursion limiting #48059

wants to merge 1 commit into from

Conversation

Keno
Copy link
Member

@Keno Keno commented Dec 31, 2022

Our recusion heuristic works by detection recursion of edges of methods (N.B.: Methods, not specializations). This works reasonably well, but there are some pathological cases that defeat it. One common one is to have a wrapper function that calls an internal function, e.g.

mymap(f, x) = map(f, x)

If a higher order function is written with such a pattern, it is quite easy to run into the recursion limit even in legitimate cases. For example, with the above definition, a fuction like:

f(x) = mymap(x) do t
mymap(sin, t)
end

will fail to get precise inference. There's various other patterns that cause similar issues, e.g. optional arguments and keyword arguments and is one of the more common causes of inference suboptimalities.

This PR attempts to relax this criterion significantly. It is still based on methods, but considers the entire recursion path rather than just a single edge. So for example, in our current heuristic, we would limit:

E -> A -> B -> C -> A -> B

immediately, but with the proposed heuristic we would not limit it until we reach:

E -> A -> B -> C -> A -> B -> C

And in particular, we would not limit

E -> A -> B -> C -> A -> B -> D -> A -> B -> E

even though the A->B edge repeats frequently. This is intentional to allow code that has a central dispatch function (e.g. Diffractor has code patterns like that).

If this turns out to be not aggressive enough, we could consider imposing additional limitations on the intermediate edges, but I think this is worth a try.

Our recusion heuristic works by detection recursion of edges of
methods (N.B.: Methods, not specializations). This works reasonably
well, but there are some pathological cases that defeat it. One
common one is to have a wrapper function that calls an internal
function, e.g.

```
mymap(f, x) = _mymap(f, x)
```

If a higher order function is written with such a pattern, it is
quite easy to run into the recursion limit even in legitimate cases.
For example, with the above definition, a fuction like:

```
f(x) = mymap(x) do t
mymap(sin, t)
end
```

will fail to get precise inference. There's various other patterns
that cause similar issues, e.g. optional arguments and keyword arguments
and is one of the more common causes of inference suboptimalities.

This PR attempts to relax this criterion significantly. It is still
based on methods, but considers the entire recursion path rather
than just a single edge. So for example, in our current heuristic,
we would limit:

    E -> A -> B -> C -> A -> B

immediately, but with the proposed heuristic we would not limit it
until we reach:

    E -> A -> B -> C -> A -> B -> C

And in particular, we would not limit

    E -> A -> B -> C -> A -> B -> D -> A -> B -> E

even though the `A->B` edge repeats frequently. This is intentional
to allow code that has a central dispatch function (e.g. Diffractor
has code patterns like that).

If this turns out to be not aggressive enough, we could consider
imposing additional limitations on the intermediate edges, but I
think this is worth a try.
@Keno
Copy link
Member Author

Keno commented Dec 31, 2022

Guess I should have said, this does explicitly address the example given in the commit message, though that was just an example, not the primary motivation.
Master:

julia> mymap(f, x) = map(f, x)
mymap (generic function with 1 method)

julia> f(x) = mymap(x) do t
       mymap(sin, t)
       end
f (generic function with 1 method)

julia> (@code_typed f(Vector{Float64}[[1.],[2.],[3.]]))[2]
Vector (alias for Array{_A, 1} where _A)

PR:

julia> (@code_typed f(Vector{Float64}[[1.],[2.],[3.]]))[2]
Vector{Vector{Float64}} (alias for Array{Array{Float64, 1}, 1})

@oscardssmith
Copy link
Member

does this notably effect inference speed? It's not obvious to me that this heuristic guarantees termination.

@Keno
Copy link
Member Author

Keno commented Dec 31, 2022

Fixes at least #47694, #40084, #46557, #31315, #29298, but of course the difficulty here is one of line drawing, not just fixing it itself, but hopefully that list will give a few cases to check if we want to restrict this further.

@Keno
Copy link
Member Author

Keno commented Dec 31, 2022

does this notably effect inference speed? It's not obvious to me that this heuristic guarantees termination.

I don't know. It certainly does more inference, in cases that would be limited before, but on the other hand getting limited makes inference much more expensive because it disables caching, so in cases where it actually is able to recover information, it might end up being cheaper.

@Keno
Copy link
Member Author

Keno commented Dec 31, 2022

does this notably effect inference speed? It's not obvious to me that this heuristic guarantees termination.

It guarantees termination at any given recursion depth, but that limit is more of a "technically", because it's strongly exponential, so I didn't bother limiting it. However, the non-terminating case is somewhat pathological, because you need to write code that generates exponential method cycles, which I don't think would happen accidentally.

Keno added a commit to JuliaDiff/Diffractor.jl that referenced this pull request Dec 31, 2022
Tests currently depend on JuliaLang/julia#48045
and JuliaLang/julia#48059, so we should either
get those merged first, or mark them here as broken.
@JeffBezanson JeffBezanson added the compiler:inference Type inference label Jan 3, 2023
Keno added a commit to JuliaDiff/Diffractor.jl that referenced this pull request Jan 5, 2023
* Hookup demand-driven forward mode to the Diffractor runtime

Tests currently depend on JuliaLang/julia#48045
and JuliaLang/julia#48059, so we should either
get those merged first, or mark them here as broken.

* Mark test as broken
@LilithHafner
Copy link
Member

We can probably delete this hack if we merge this

@Keno
Copy link
Member Author

Keno commented Feb 13, 2023

@vtjnash Can you remind me what we decided to do here?

@Keno Keno closed this Feb 13, 2023
@Keno Keno reopened this Feb 13, 2023
@vtjnash
Copy link
Sponsor Member

vtjnash commented Feb 13, 2023

We discussed taking a Set comparison approach, where for each method we add to the call-stack, we maintain an indirect graph of when it most recently appeared on the stack prior to that. At least one method between our current call edge and the previous call edge from the same method must be new to the call stack (to have not appeared on the stack prior to our previous call edge, including in the cycle containing that prior call). Which can be done quickly by walking up the chain of identical method uses on the stack height number of times, and/or by comparing the height counters directly.

We proposed that this is fairly simple to show it should be usually pretty stable and predictable against random changes to the abstract interpretation order within a function, and can easily be shown to be convergent. We will of course need to check our work with PkgEval, to see if further refinement is required however.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:inference Type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants