# Automatic Differentiation

MXNet's `autograd` package computes derivatives automatically for functions written as ordinary code. Let's start by importing it.

In [1]:
from mxnet import nd
from mxnet import autograd

### Autograd for $f(x) = 2 x^2$

Let's start by assigning a value.

In [2]:
x = nd.array([[1, 2], [3, 4]])
print(x)


[[1. 2.]
 [3. 4.]]
<NDArray 2x2 @cpu(0)>


Calculating gradients requires extra computation, and we need a place to store the result. We allocate that storage by calling `attach_grad` on `x`.

In [3]:
x.attach_grad()

#### Record the computational graph

To have MXNet trace the computation and build the graph, we run it inside the `autograd.record()` scope.

In [4]:
with autograd.record():
    y = 2 * x**2
print(y)


[[ 2.  8.]
 [18. 32.]]
<NDArray 2x2 @cpu(0)>


#### Invoke Back Propagation

In [5]:
y.backward()
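
To make the idea of recording a graph and then propagating backwards more concrete, here is a minimal sketch of reverse-mode differentiation in plain Python. It is a toy illustration only and is not how MXNet implements `autograd`; the class `Var` and its attributes are hypothetical names introduced for this example. Each operation stores its inputs together with the local derivative, and `backward` applies the chain rule from the output back to the inputs.

```python
class Var:
    """Scalar value that remembers how it was computed (a toy graph node)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each entry is (parent_node, local derivative d(self)/d(parent)).
        self.parents = parents

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def __pow__(self, n):
        # n is a plain number, so only self receives a derivative.
        return Var(self.value ** n, ((self, n * self.value ** (n - 1)),))

    def backward(self):
        # Order the nodes so that each one comes after everything it depends on.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, applying the chain rule.
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


xs = [Var(v) for v in (1.0, 2.0, 3.0, 4.0)]
ys = [2 * x ** 2 for x in xs]  # building ys records the graph
for y in ys:
    y.backward()
grads = [x.grad for x in xs]
print(grads)  # [4.0, 8.0, 12.0, 16.0]
```

The toy version handles scalars one at a time, whereas `autograd` works on whole arrays, but the two phases are the same: record during the forward computation, then traverse the recorded graph in reverse.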

#### Verify Computed Gradients

Note that $y=2x^2$ and $\frac{dy}{dx} = 4x$, which should be

`[[4, 8], [12, 16]]`

In [6]:
x.grad


[[ 4.  8.]
 [12. 16.]]
<NDArray 2x2 @cpu(0)>
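
A framework-independent way to sanity-check such a result is a central finite difference. The sketch below uses plain Python lists rather than `NDArray`, and the helper names `y_sum` and `numerical_grad` are assumptions made for this example. Since `y` is not a scalar, the check differentiates the sum of its entries, which matches the gradient above because each entry of `y` depends on only one entry of `x`.

```python
def y_sum(x):
    """Sum of 2 * x**2 over all entries of a nested list x."""
    return sum(2 * v ** 2 for row in x for v in row)


def numerical_grad(func, x, h=1e-4):
    """Central-difference estimate of d func / d x[i][j] for every entry."""
    grad = []
    for i, row in enumerate(x):
        grad_row = []
        for j in range(len(row)):
            plus = [list(r) for r in x]
            minus = [list(r) for r in x]
            plus[i][j] += h
            minus[i][j] -= h
            grad_row.append((func(plus) - func(minus)) / (2 * h))
        grad.append(grad_row)
    return grad


x_values = [[1.0, 2.0], [3.0, 4.0]]
approx = numerical_grad(y_sum, x_values)
expected = [[4 * v for v in row] for row in x_values]  # analytic dy/dx = 4x
print(approx)
print(expected)
```

Up to floating-point rounding, the two printed results should agree, since a central difference is exact for a quadratic.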

### Python Control Flow

In [9]:
def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    return b[0] if b.sum() >= 0 else b[1]

Initialize with a random value and record the graph.

In [10]:
a = nd.random.normal(shape=2)
a.attach_grad()
with autograd.record():
    c = f(a)

#### Compute and Verify Gradients

`b` is a linear function of `a`, and `c` is chosen from `b`.
The gradient with respect to `a` will therefore be either `[c/a[0], 0]` or `[0, c/a[1]]`, depending on which element of `b` is returned.

In [11]:
c.backward()
[a.grad, c/a]

[
 [4096.    0.]
 <NDArray 2 @cpu(0)>, 
 [4096.     8018.3516]
 <NDArray 2 @cpu(0)>]
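
The same relationship can be checked without MXNet. The sketch below re-implements `f` on a plain Python list (the names `f_plain` and `numerical_grad` are hypothetical, and the input `[0.5, -0.2]` is an arbitrary illustrative choice, not the random draw above) and compares a central finite difference with `c / a[0]`.

```python
import math


def f_plain(a):
    """Plain-Python analogue of f, where a is a list of two floats."""
    b = [2 * v for v in a]
    # Keep doubling until the Euclidean norm of b reaches 1000.
    while math.sqrt(sum(v * v for v in b)) < 1000:
        b = [2 * v for v in b]
    return b[0] if sum(b) >= 0 else b[1]


def numerical_grad(func, a, h=1e-6):
    """Central-difference estimate of d func / d a[i] for each i."""
    grad = []
    for i in range(len(a)):
        plus, minus = list(a), list(a)
        plus[i] += h
        minus[i] -= h
        grad.append((func(plus) - func(minus)) / (2 * h))
    return grad


a_values = [0.5, -0.2]  # arbitrary illustrative input
c_value = f_plain(a_values)
grad = numerical_grad(f_plain, a_values)
print(c_value, grad, c_value / a_values[0])
```

For this input the sum of `b` is positive, so `b[0]` is returned: the first gradient component should match `c / a[0]` and the second should be zero. The check is only valid while the small perturbation changes neither the number of loop iterations nor the branch taken.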