WIP: Cache refactor #105
Conversation
Some benchmarking (for rosenbrock):

```julia
# giant-refactor
julia> @time for i = 1:10^5 g(rand(5)) end
  0.352903 seconds (800.00 k allocations: 35.095 MB, 1.12% gc time)

# this branch
julia> @time for i = 1:10^5 g(rand(5)) end
  0.166368 seconds (900.00 k allocations: 39.673 MB, 5.06% gc time)

# Hardcoding a cache vector of GradientCaches, i.e. no Dict at all:
julia> @time for i = 1:10^5 g(rand(5)) end
  0.114197 seconds (800.00 k allocations: 35.095 MB, 2.73% gc time)
```

So it seems it is still worthwhile to allow a user to completely skip the dict by specifying the input data.
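For context, a minimal setup producing the `g(rand(5))` calls timed above might look like the sketch below. The `rosenbrock` definition is an assumption (the thread does not show it), and `g` stands for the gradient function obtained from ForwardDiff:

```julia
# A standard Rosenbrock test function (assumed; not shown in this thread).
function rosenbrock(x)
    s = zero(eltype(x))
    for i in 1:length(x)-1
        s += 100 * (x[i+1] - x[i]^2)^2 + (1 - x[i])^2
    end
    return s
end

# `g` in the benchmarks would be the ForwardDiff-generated gradient of
# `rosenbrock` (e.g. `g = ForwardDiff.gradient(rosenbrock)` in the API of
# this era), timed with `@time for i = 1:10^5 g(rand(5)) end`.
```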
One idea that might save some memory is to not have the input length as a key to the dict but instead just call … AFAIU, …
Tried it, but it didn't work out so well for various reasons.
Added the possibility to run with a pre-created cache.
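A hedged sketch of what running with a pre-created cache could look like. The constructor call is the one quoted later in this thread; the surrounding usage and the `cache` keyword placement are assumptions, not this PR's exact API:

```julia
x = rand(5)
input_length = length(x)
chunk_size = 5

# Constructor as quoted later in this thread (commented out here since it
# requires this PR's branch of ForwardDiff):
# cache = ForwardDiff.ForwardDiffCache(Val{input_length}, eltype(x), Val{chunk_size})

# Hypothetical use: pass the cache via the new keyword argument so the
# global Dict lookup is skipped entirely:
# ForwardDiff.gradient(f, x; cache = cache)
```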
Some benchmarks at: https://gist.github.com/KristofferC/007e8ced53e2bb0484e6. TL;DR (rosenbrock function used): explicitly giving the cache gave less of a speedup than I thought. Might not be worth it.
Rebased and updated original post with some details.
This is a WIP to refactor the cache to be a bit more efficient.
The design thoughts are given here:
Currently, the overhead of retrieving vectors from the cache is significant for smallish problems because we do two dict lookups for each vector retrieval. This can be reduced to two dict lookups per function call by storing all the vectors the function needs in a single type, and storing a vector of those types (one per thread) in the cache instead. The first commit implements this.
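A minimal sketch of that design, with hypothetical type and field names (the actual `GradientCache` in this PR will differ):

```julia
# Hypothetical sketch: all work vectors for one gradient call live in a
# single type, so fetching them is a field access, not a dict lookup.
struct GradientCacheSketch{T}
    workvec::Vector{T}
    partials::Vector{T}
end

GradientCacheSketch{T}(n::Int) where {T} =
    GradientCacheSketch{T}(Vector{T}(undef, n), Vector{T}(undef, n))

# One dict lookup per call retrieves the per-thread vector; after that,
# every access is a type-stable array or field access.
const CACHE = Dict{Tuple{DataType,Int},Vector{GradientCacheSketch}}()

function get_cache(::Type{T}, n::Int) where {T}
    caches = get!(CACHE, (T, n)) do
        [GradientCacheSketch{T}(n) for _ in 1:Threads.nthreads()]
    end
    return caches[Threads.threadid()]
end
```

The design choice here is that the dict lookup cost is paid once per call (keyed on element type and input length), while the per-vector accesses inside the hot loop stay type-stable.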
The second commit implements an additional keyword argument to the macros, `cache`. A `ForwardDiffCache` is currently created by `cache = ForwardDiff.ForwardDiffCache(Val{input_length}, eltype(X), Val{chunk_size})`. If such an argument is given, we use the passed-in cache; else we create or reuse one from the global cache. This requires a bit of extra code, and the performance gains are so-so, so we have to decide whether this is worth the extra code + documentation.

Changes from the `giant-refactor` branch:

- `ForwardDiffCache`s, each of which contains a vector of size `NTHREADS` of a new type, `GradientCache`. This means that at the start of a gradient call we only do one lookup in the Dict; the rest of the accesses are type-stable array accesses and field accesses.
- A `totalsizeofcache` function that counts the total bytes in the cache.

TODO:

- `jacobian` when this is added
- `partials_remainder` for thread `tid`.
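A hedged sketch of what a `totalsizeofcache`-style function could look like. The cache layout, the demo data, and the function name with the `_sketch` suffix are assumptions for illustration, not this PR's implementation:

```julia
# Hypothetical cache layout: a Dict whose values hold plain numeric vectors.
const DEMO_CACHE = Dict{Int,Vector{Vector{Float64}}}(
    5  => [zeros(5), zeros(5)],
    10 => [zeros(10)],
)

# Sum the bytes of every vector stored in the cache; `sizeof` on a
# Vector{Float64} returns the size of its element data in bytes.
function totalsizeofcache_sketch(cache)
    total = 0
    for vecs in values(cache), v in vecs
        total += sizeof(v)
    end
    return total
end
```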