
WIP: Cache refactor #105

Closed
wants to merge 3 commits into from

Conversation

KristofferC
Collaborator

This is a WIP to refactor the cache to be a bit more efficient.

The design thoughts are given here:

Currently, the overhead from retrieving vectors from the cache is significant for smallish problems because we do two dict lookups for each vector retrieval. This can be reduced to two dict lookups per function call by storing all the vectors we need for the function in a type and keeping a vector of those types (one per thread) in a cache instead. The first commit implements this.

The second commit implements an additional keyword argument to the macros, cache. A ForwardDiffCache is currently created by cache = ForwardDiff.ForwardDiffCache(Val{input_length}, eltype(X), Val{chunk_size}). If such an argument is given, we use the passed-in cache; otherwise we create or reuse one from the global cache. This requires a bit of extra code and the performance gains are so-so, so we have to decide whether it is worth the extra code and documentation.

Changes from giant-refactor branch:

  • There is now only a single Dict, whose values are ForwardDiffCaches, each of which contains a vector of size NTHREADS of a new type GradientCache. This means that at the start of a gradient call we only do one lookup in the Dict; the rest of the accesses are type-stable array and field accesses.
  • Adds a totalsizeofcache function that counts the total number of bytes held by the cache.
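
A minimal sketch of the single-lookup layout described above, written in modern Julia syntax (the type and field names, and the Dict key layout, are illustrative assumptions, not the actual ForwardDiff internals):

```julia
# Hypothetical sketch: one global Dict whose values hold a per-thread
# vector of work caches, so a gradient call does a single hash lookup
# followed by plain array/field accesses.
struct GradientCache{T}
    workvec::Vector{T}   # per-thread scratch storage
end

struct ForwardDiffCache{T}
    caches::Vector{GradientCache{T}}  # one entry per thread
end

# Keyed on (input length, element type, chunk size) -- an assumption.
const CACHE = Dict{Tuple{Int,DataType,Int},ForwardDiffCache}()

function get_gradient_cache(::Type{T}, len::Int, chunk::Int) where {T}
    # The one Dict lookup per gradient call; creates the cache on a miss.
    fdc = get!(CACHE, (len, T, chunk)) do
        ForwardDiffCache{T}([GradientCache{T}(Vector{T}(undef, len))
                             for _ in 1:Threads.nthreads()])
    end
    # From here on: array index + field access for the current thread.
    return fdc.caches[Threads.threadid()]
end
```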

TODO:

  • Benchmark properly
  • Integrate it with jacobian once that is added
  • Reduce memory use a bit by only allocating partials_remainder for thread tid

@KristofferC
Collaborator Author

Some benchmarking (for rosenbrock):

# giant-refactor
julia> @time for i = 1:10^5 g(rand(5)) end
  0.352903 seconds (800.00 k allocations: 35.095 MB, 1.12% gc time)

# this branch
julia> @time for i = 1:10^5 g(rand(5)) end
  0.166368 seconds (900.00 k allocations: 39.673 MB, 5.06% gc time)

# Hardcoding a cache vector of GradientCaches, i.e. no Dict at all:
julia> @time for i = 1:10^5 g(rand(5)) end
  0.114197 seconds (800.00 k allocations: 35.095 MB, 2.73% gc time)

So it seems it is still worthwhile to let a user skip the dict entirely by specifying the input data.
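
The "no Dict at all" variant benchmarked above can be sketched as a hard-coded per-thread cache for one fixed input signature (names and layout are hypothetical; modern Julia syntax):

```julia
# Hypothetical sketch: per-thread scratch vectors for a single fixed
# input length, so retrieval is a plain array index with no hashing.
const INPUT_LEN = 5

struct HardGradientCache
    workvec::Vector{Float64}
end

const HARD_CACHES = [HardGradientCache(Vector{Float64}(undef, INPUT_LEN))
                     for _ in 1:Threads.nthreads()]

# Type-stable, allocation-free retrieval for the current thread.
current_cache() = HARD_CACHES[Threads.threadid()]
```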

@jrevels jrevels mentioned this pull request Feb 11, 2016
@KristofferC KristofferC force-pushed the cache_refactor branch 2 times, most recently from eb90bbc to 07aff58 Compare February 12, 2016 10:22
@KristofferC
Collaborator Author

One idea that might save some memory is to not use the input length as a key in the dict but instead just call resize! on the pertinent arrays.

AFAIU, resize! only allocates memory when the underlying buffer grows and is extremely fast when the size is decreased.
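
The resize! behavior described can be checked with a small standalone snippet (not part of the PR):

```julia
v = zeros(Float64, 1000)
resize!(v, 10)     # shrinking only adjusts the length; the buffer is kept
@assert length(v) == 10
resize!(v, 500)    # growing back is cheap while within the old capacity;
                   # note: elements past the old length are uninitialized
@assert length(v) == 500
```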

@KristofferC
Collaborator Author

Tried it, but it didn't work out so well.

@jrevels jrevels force-pushed the giant-refactor branch 3 times, most recently from c0038b9 to cf1024d Compare February 12, 2016 23:21
@KristofferC KristofferC force-pushed the cache_refactor branch 3 times, most recently from 98b09a0 to affd82a Compare February 13, 2016 17:08
@KristofferC
Collaborator Author

Added the possibility to run with a pre-created cache:

For example:

cache = ForwardDiff.ForwardDiffCache(Val{5}, Float64)
g = ForwardDiff.@gradient(rosenbrock, cache = cache)

Some benchmarks at: https://gist.github.com/KristofferC/007e8ced53e2bb0484e6

TL;DR (rosenbrock function used, giant_refactor is the 1x baseline):

  • input_length = 1000: no difference between anything on this branch and giant_refactor
  • input_length = 50: explicit cache 0.8x, this branch only 0.8x
  • input_length = 20: explicit cache 0.55x, this branch only 0.6x
  • input_length = 5: explicit cache 0.4x, this branch only 0.5x

Explicitly passing the cache gained less than I expected. It might not be worth it.

@KristofferC
Collaborator Author

Rebased and updated original post with some details.

@jrevels jrevels deleted the cache_refactor branch June 27, 2016 00:37