Conversation

@KristofferC (Contributor) commented Feb 27, 2023

ForwardDiff quite aggressively specializes most of its functions on the
concrete input function type. This gives a slight performance
improvement but it also means that a significant chunk of code has to be
compiled for every call to `ForwardDiff` with a new function.
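
For illustration (the functions `f` and `g` below are made-up examples, not from this PR): every Julia function has its own concrete singleton type, so any ForwardDiff internal that specializes on the function type gets recompiled for each new function it is called with.

```julia
using ForwardDiff

# Two textually identical functions still have distinct concrete types,
# so ForwardDiff compiles its gradient machinery once for each of them.
f(x) = x[1]^2 + x[2]
g(x) = x[1]^2 + x[2]
typeof(f) === typeof(g)             # false: each function is its own type

ForwardDiff.gradient(f, [1.0, 2.0]) # compiles ForwardDiff's internals for typeof(f)
ForwardDiff.gradient(g, [1.0, 2.0]) # compiles them again for typeof(g)
```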

Previously, for every equation in a model we would call
`ForwardDiff.gradient` with the Julia function corresponding to that
equation. This would then compile the ForwardDiff functions separately for
each of these Julia functions.

Looking at the specializations generated by a model, we see:

```julia
GC = ForwardDiff.GradientConfig{FRBUS_VAR.MyTag, Float64, 4, Vector{ForwardDiff.Dual{FRBUS_VAR.MyTag, Float64, 4}}}
MethodInstance for ForwardDiff.vector_mode_dual_eval!(::FRBUS_VAR.EquationEvaluator{:resid_515}, ::GC, ::Vector{Float64})
MethodInstance for ForwardDiff.vector_mode_gradient!(::DiffResults.MutableDiffResult{1, Float64, Tuple{Vector{Float64}}}, ::FRBUS_VAR.EquationEvaluator{:resid_515}, ::Vector{Float64}, ::GC)
MethodInstance for ForwardDiff.vector_mode_dual_eval!(::FRBUS_VAR.EquationEvaluator{:resid_516}, ::GC, ::Vector{Float64})
MethodInstance for ForwardDiff.vector_mode_gradient!(::DiffResults.MutableDiffResult{1, Float64, Tuple{Vector{Float64}}}, ::FRBUS_VAR.EquationEvaluator{:resid_516}, ::Vector{Float64}, ::GC)
```

which are all identical methods compiled for different equations.

In this PR, we instead "hide" the concrete function for every equation
behind a common "wrapper function". This means that only one
specialization of the ForwardDiff functions gets compiled.
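
A minimal sketch of the idea (the `EquationWrapper` name and the toy residual functions are illustrative, not the actual types used in this PR): putting every per-equation function behind one concrete callable type means ForwardDiff only ever specializes on that single type, at the cost of a dynamic call through the abstractly typed field.

```julia
using ForwardDiff

# One concrete wrapper type shared by all equations. The field has the
# abstract type `Function`, so the wrapper's concrete type does not
# depend on which equation it holds.
struct EquationWrapper
    f::Function
end
(w::EquationWrapper)(x) = w.f(x)  # dynamic dispatch here is the small runtime cost

resid_1(x) = x[1]^2 + x[2]
resid_2(x) = sin(x[1]) * x[2]

x = [1.0, 2.0]
# Both calls reuse the same ForwardDiff specialization, compiled once for
# EquationWrapper, instead of once per residual function:
ForwardDiff.gradient(EquationWrapper(resid_1), x)
ForwardDiff.gradient(EquationWrapper(resid_2), x)
```

This trades a little runtime (the dynamic call through `w.f`) for much less compilation, which matches the benchmark numbers below.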

Using the following benchmark script:

```julia
unique!(push!(LOAD_PATH, realpath("./models")))
using ModelBaseEcon
using Random # See JuliaLang/julia#48810

@time using FRBUS_VAR

m = FRBUS_VAR.model
nrows = 1 + m.maxlag + m.maxlead
ncols = length(m.allvars)
pt = zeros(nrows, ncols);
@time @eval eval_RJ(pt, m);

using BenchmarkTools
@btime eval_RJ(pt, m);
```

This PR has the following changes:

- Package load time: 0.078s -> 0.052s
- First call `eval_RJ`: 11.47s -> 4.97s
- Runtime performance of `eval_RJ`: 550μs -> 590μs

So there seems to be about a 10% runtime performance regression in the `eval_RJ`
call, but the latency is drastically reduced.
@jasonjensen jasonjensen changed the base branch from master to juliahub March 14, 2023 22:25
@bbejanov bbejanov changed the base branch from juliahub to dev March 31, 2023 13:08
@bbejanov bbejanov merged commit b6167e2 into bankofcanada:dev Mar 31, 2023