
Make first order gradient graphs more efficient #5959

Merged
1 commit merged into apache:master on Jun 30, 2020

Conversation

@t-vi (Contributor) commented on Jun 29, 2020

Previously, nodes were visited as often as they were used, and a derivative was computed each time; the contributions were only summed up at the leaves. This patch changes this to accumulate the gradient contributions at any node that is used several times.

@t-vi (Contributor, Author) commented on Jun 29, 2020

This pull request is only about the second commit; the first is #5946.
I noticed that my gradient had many more O(n³) operations (matmul etc.) than it should have and tracked this down to how gradients are computed when a value is used several times in the computation.
Graphs become really big and unwieldy if the computation is not purely sequential. Also, the duplication cannot be eliminated by CSE, because it is the "output part" that is duplicated rather than the input (one could, in theory, commute the add with all the gradient ops).
While it doesn't fix anything by itself, it might also mitigate other effects people are seeing when working with first-order gradients (e.g. #4534).
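
To make the difference concrete, below is a toy sketch in plain Python. It is not the actual Relay pass and uses none of its API; Node, topo_order and grad are made-up names for illustration. It shows a reverse-mode sweep that accumulates each node's adjoint before propagating it, so a value used several times is still differentiated only once, whereas the naive scheme recurses along every use and only sums at the leaves, duplicating all downstream gradient work.

# Toy illustration only -- plain Python, not the Relay pass or its API.
class Node:
    """Minimal expression node: op is 'input', 'add' or 'mul'."""
    def __init__(self, op, inputs=(), value=None, name=None):
        self.op, self.inputs = op, list(inputs)
        self.value, self.name = value, name


def topo_order(root):
    """Post-order over the graph, visiting each node exactly once."""
    order, seen = [], set()

    def visit(n):
        if id(n) in seen:
            return
        seen.add(id(n))
        for inp in n.inputs:
            visit(inp)
        order.append(n)

    visit(root)
    return order


def grad(root):
    """Reverse sweep that accumulates each node's adjoint before propagating it.

    A node with several uses receives one contribution per use, but its own
    derivative rule runs only once; the naive scheme would recurse once per
    use and only sum at the leaves, duplicating everything downstream."""
    adjoint = {id(root): 1.0}
    grads = {}
    for node in reversed(topo_order(root)):
        g = adjoint.get(id(node), 0.0)
        if node.op == "input":
            grads[node.name] = g
        elif node.op == "add":
            for inp in node.inputs:
                adjoint[id(inp)] = adjoint.get(id(inp), 0.0) + g
        elif node.op == "mul":
            a, b = node.inputs
            adjoint[id(a)] = adjoint.get(id(a), 0.0) + g * b.value
            adjoint[id(b)] = adjoint.get(id(b), 0.0) + g * a.value
    return grads


# y = (x * x) + (x * x) reuses t = x * x twice, yet t's product rule runs once.
x = Node("input", value=3.0, name="x")
t = Node("mul", [x, x], value=9.0)
y = Node("add", [t, t])
print(grad(y))  # {'x': 12.0}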

@@ -27,6 +29,20 @@
import tvm.relay.op as op


def count_ops(expr):
A contributor commented on the diff:
Can you move this to some common file? This looks like something that would be useful in other places as well.

@t-vi (Contributor, Author) replied:

I'll move it to python/tvm/relay/testing/__init__.py when rebasing after #5946.
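
For reference, here is a minimal sketch of what an op-counting helper along these lines could look like. The actual count_ops added in this PR may differ; the sketch assumes tvm.relay.analysis.post_order_visit and simply counts calls to each primitive operator.

import collections

import tvm
from tvm import relay


def count_ops(expr):
    """Return a dict mapping operator names to how often they are called in expr."""
    counts = collections.defaultdict(int)

    def visit(node):
        # Count only calls to primitive operators such as nn.dense or add.
        if isinstance(node, relay.Call) and isinstance(node.op, tvm.ir.Op):
            counts[node.op.name] += 1

    relay.analysis.post_order_visit(expr, visit)
    return dict(counts)

A gradient test could then assert, for instance, that the number of nn.dense calls in the backward graph stays constant no matter how often a value is reused.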

@MarisaKirisame (Contributor) commented:
We were thinking of using ANF then AD. But this also works.

@MarisaKirisame (Contributor) commented:
@t-vi please rebase.

@t-vi (Contributor, Author) commented on Jun 30, 2020

@MarisaKirisame Thank you! I rebased and now the CI is all happy again.

@tqchen merged commit 7176483 into apache:master on Jun 30, 2020
@tqchen (Member) commented on Jun 30, 2020

Thanks @t-vi @MarisaKirisame

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 30, 2020
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Jul 2, 2020