In [2]:
import mxnet as mx
from mxnet import nd, autograd
mx.random.seed(1)

- Let’s say that we are interested in differentiating the function f(x) = 2 * x * x (that is, 2x²) with respect to x. Its derivative is 4x, which is what the cells below should reproduce.
- Once we compute the gradient of f with respect to x, we’ll need a place to store it. In MXNet, we can tell an NDArray that we plan to store a gradient by invoking its attach_grad() method.
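
- Before asking MXNet for the gradient, it can help to know what answer to expect. The sketch below uses plain NumPy rather than MXNet. It compares the analytic derivative 4x with central finite differences on the same 2x2 input. Because f is applied elementwise, the gradient of the sum of f equals 4x at each entry. This matches what backward() computes for a non-scalar output when no head gradient is given.

```python
import numpy as np


def f(x):
    # f(x) = 2 * x * x, applied elementwise
    return 2 * x * x


x = np.array([[1.0, 2.0], [3.0, 4.0]])
analytic = 4 * x  # d/dx (2 x^2) = 4x

# Central finite differences on sum(f(x)), perturbing one entry at a time
eps = 1e-6
numeric = np.zeros_like(x)
for idx in np.ndindex(x.shape):
    step = np.zeros_like(x)
    step[idx] = eps
    numeric[idx] = (f(x + step).sum() - f(x - step).sum()) / (2 * eps)

print(analytic)
print(np.allclose(numeric, analytic, atol=1e-4))
```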

In [3]:
x = nd.array([[1, 2], [3, 4]])

In [6]:
x.attach_grad()

- Now we define the function f, and MXNet builds the computation graph on the fly as the code runs.
- Recording the graph costs extra computation and memory, so MXNet only builds it when explicitly told to.
- To start recording, place the code inside a with autograd.record(): block.

In [7]:
with autograd.record():
    y = x * 2
    z = y * x 

In [8]:
z.backward()

In [9]:
x.grad


[[ 4.  8.]
 [12. 16.]]
<NDArray 2x2 @cpu(0)>

In [13]:
with autograd.record():
    y = x * 2
    z = y * x

head_gradient = nd.array([[10, 1.], [.1, .01]])
z.backward(head_gradient)
x.grad


[[40.    8.  ]
 [ 1.2   0.16]]
<NDArray 2x2 @cpu(0)>
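
- Passing head_gradient to backward() supplies the gradient of some downstream quantity with respect to z. Equivalently, x.grad becomes the gradient of sum(head_gradient * z). Since z = 2 * x * x is elementwise, its Jacobian is diagonal with entries 4x, and the result reduces to an elementwise product. The NumPy sketch below reproduces the output above by hand. It is not a call into MXNet.

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
head_gradient = np.array([[10, 1.0], [0.1, 0.01]])

# Local derivative dz/dx of z = 2 * x * x, entry by entry
local_grad = 4 * x

# Chain rule with a diagonal Jacobian: multiply elementwise by the head gradient
x_grad = head_gradient * local_grad
print(x_grad)
```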

In [16]:
a = nd.random_normal(shape=3)
a


[-0.8064291  1.220331   2.2323563]
<NDArray 3 @cpu(0)>

In [17]:
a.attach_grad()

In [18]:
with autograd.record():
    b = a * 2
    while (nd.norm(b) < 1000).asscalar():
        b = b * 2

    if (nd.sum(b) > 0).asscalar():
        c = b
    else:
        c = 100 * b

In [19]:
head_gradient = nd.array([0.01, 1.0, .1])
c.backward(head_gradient)

In [20]:
a.grad


[  5.12 512.    51.2 ]
<NDArray 3 @cpu(0)>
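
- Autograd records only the operations that actually ran, so the gradient depends on how many times the while loop executed and which if branch was taken. Every recorded operation multiplies by a constant. That makes dc/da the same scalar for each entry: 2 * 2**k, where k is the number of loop iterations, with an extra factor of 100 if the else branch runs. The NumPy sketch below replays the forward pass on the values of a printed above and recovers that factor. It is an illustration, not MXNet code.

```python
import numpy as np

a = np.array([-0.8064291, 1.220331, 2.2323563])  # values printed in the notebook
head_gradient = np.array([0.01, 1.0, 0.1])

# Replay the forward pass, counting how many times the loop doubles b
b = a * 2
doublings = 0
while np.linalg.norm(b) < 1000:
    b = b * 2
    doublings += 1

# Each step multiplies by a constant, so dc/da is a single scalar
scale = 2 * 2 ** doublings
if b.sum() > 0:
    c_scale = scale        # branch taken: c = b
else:
    c_scale = 100 * scale  # branch taken: c = 100 * b

a_grad = head_gradient * c_scale
print(doublings, c_scale)  # 8 512
print(a_grad)
```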