The simple essence of automatic differentiation
Latest commit 136844b Dec 5, 2018



Automatic differentiation (AD) in reverse mode (RAD) is a central component of deep learning and other uses of large-scale optimization. Commonly used RAD algorithms such as backpropagation, however, are complex and stateful, hindering deep understanding, improvement, and parallel execution. This talk develops a simple, generalized AD algorithm calculated from a simple, natural specification. The general algorithm is then specialized by varying the representation of derivatives. In particular, applying well-known constructions to a naive representation yields two RAD algorithms that are far simpler than previously known. In contrast to commonly used RAD implementations, the algorithms defined here involve no graphs, tapes, variables, partial derivatives, or mutation. They are inherently parallel-friendly, correct by construction, and usable directly from an existing programming language with no need for new data types or programming style, thanks to the use of an AD-agnostic compiler plugin.
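To make "varying the representation of derivatives" concrete, here is a minimal sketch loosely in the spirit of the approach described above. It is not the repository's code. A differentiable function is modeled as a map from an input to its value paired with a derivative, and the derivative lives in an arbitrary category `k`. Composition is simply the chain rule. Choosing `k = (->)` (linear maps as plain functions) gives forward mode. Choosing a transposed representation gives reverse mode. The names `GD`, `Dual`, `ScalarLinear`, `mulLin`, `scale`, `runFwd`, `runRev`, and `sinMul` are hypothetical and were chosen for this illustration. The example also covers only scalar `sin` and `(*)`, with no products or forks.

```haskell
import Prelude hiding (id, (.))
import Control.Category

-- Hypothetical "generalized derivative": the value of a function at a point,
-- paired with its derivative at that point, represented in some category k.
newtype GD k a b = GD (a -> (b, k a b))

-- Identity and sequential composition; composition is the chain rule.
instance Category k => Category (GD k) where
  id = GD (\a -> (a, id))
  GD g . GD f = GD $ \a ->
    let (b, f') = f a
        (c, g') = g b
    in (c, g' . f')

-- Hypothetical transposed (reverse-mode) representation of a linear map a -> b.
newtype Dual a b = Dual (b -> a)

-- Composing transposes reverses their order.
instance Category Dual where
  id = Dual id
  Dual g . Dual f = Dual (f . g)

-- Hypothetical class: the few linear maps our two primitives need.
class Category k => ScalarLinear k where
  scale  :: Double -> k Double Double               -- dx |-> s * dx
  mulLin :: Double -> Double -> k (Double, Double) Double  -- derivative of (*) at (u, v)

-- Forward mode: linear maps are ordinary functions.
instance ScalarLinear (->) where
  scale s = \dx -> s * dx
  mulLin u v = \(du, dv) -> v * du + u * dv

-- Reverse mode: each linear map is stored as its transpose.
instance ScalarLinear Dual where
  scale s = Dual (\dy -> s * dy)             -- scaling is its own transpose
  mulLin u v = Dual (\dz -> (v * dz, u * dz))

sinD :: ScalarLinear k => GD k Double Double
sinD = GD (\x -> (sin x, scale (cos x)))

mulD :: ScalarLinear k => GD k (Double, Double) Double
mulD = GD (\(u, v) -> (u * v, mulLin u v))

-- sin (x * y), written once and usable with either representation.
sinMul :: ScalarLinear k => GD k (Double, Double) Double
sinMul = sinD . mulD

-- Forward mode: push a tangent vector through the derivative.
runFwd :: GD (->) a b -> a -> a -> (b, b)
runFwd (GD h) a da = let (b, h') = h a in (b, h' da)

-- Reverse mode: pull a cotangent back through the transposed derivative.
runRev :: GD Dual a b -> a -> b -> (b, a)
runRev (GD h) a db = let (b, Dual h') = h a in (b, h' db)
```

Loaded in GHCi, `runRev sinMul (2, 3) 1` yields the value `sin 6` and the full gradient `(3 * cos 6, 2 * cos 6)` in one pass. `runFwd` recovers one component per tangent, for example `(1, 0)`. The reverse-mode version uses no graph, tape, or mutation. It only composes transposed linear maps in the opposite order.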
