# Automatic Differentiation with `autograd`

Models are trained to get better with experience. Usually, getting better means minimizing a loss function. To achieve this, training repeatedly computes the gradient of the loss with respect to the model's weights and then updates the weights in the direction that decreases the loss. The chain rule makes each gradient calculation mechanical, but for complex models, working it out by hand is tedious and error-prone.
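
To make this loop concrete, here is a minimal pure-Python sketch (no MXNet) of gradient descent on a single hypothetical weight `w`. The loss, the data values `x` and `t`, the learning rate, and the step count are illustrative assumptions. The gradient is derived by hand with the chain rule, which is exactly the bookkeeping `autograd` automates.

```python
# Sketch only: fit one hypothetical weight w so that w * x approximates t.
# Data, learning rate, and step count are illustrative assumptions.

def loss(w, x, t):
    # Squared error L(w) = (w*x - t)^2
    return (w * x - t) ** 2


def grad_loss(w, x, t):
    # Chain rule: L = e^2 with e = w*x - t, so dL/dw = (dL/de) * (de/dw) = 2e * x
    e = w * x - t
    return 2 * e * x


def train(w=0.0, x=2.0, t=6.0, lr=0.05, steps=100):
    for _ in range(steps):
        w -= lr * grad_loss(w, x, t)  # step against the gradient
    return w


w = train()
print(round(w, 4))  # → 3.0
```

With these values each step shrinks the distance to the optimum `w = 3` by a constant factor, so the weight converges well within 100 steps.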

Before diving into model training, you should understand how MXNet's `autograd` package expedites this work by calculating derivatives automatically.

## Basic usage

First, import the `nd` and `autograd` packages, then work through a case study of a simple function.

```python
from mxnet import nd
from mxnet import autograd
```

### Case study: autograd for $f(x) = 2 x^2$

Start by assigning an initial value to $x$, here a $2 \times 2$ array.

```python
x = nd.array([[1, 2], [3, 4]])
```

In MXNet, to have the gradient with respect to an `NDArray` calculated and stored, call that array's `attach_grad` method.

#### Attach gradient storage

Calculating gradients requires extra computation, as well as memory in which to store the result. Calling `x.attach_grad()` allocates that storage; once gradients have been computed, they can be read from `x.grad`.

```python
x.attach_grad()
```

#### Define and record $y = f(x)$

Now, define the function $y = f(x)$. So that MXNet can compute gradients later, put the definition inside an `autograd.record()` scope; MXNet then records the operations that produce $y$.

```python
with autograd.record():
    y = 2 * x**2
print(y)
```

```
[[ 2.  8.]
 [18. 32.]]
<NDArray 2x2 @cpu(0)>
```
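
Conceptually, `attach_grad`, `record`, and `backward` fit together like a tape-based reverse-mode differentiator. The following is a minimal pure-Python sketch under that assumption. It is not MXNet's implementation, and the names `Var`, `square`, `scale`, `record`, and `backward` are hypothetical. Each operation executed inside `record()` remembers its input together with a function that maps an output gradient to an input gradient, and `backward` walks those links in reverse.

```python
from contextlib import contextmanager

_recording = False  # True only inside a record() block


class Var:
    """Hypothetical stand-in for an array: a flat list of floats with optional gradient storage."""

    def __init__(self, values):
        self.values = [float(v) for v in values]
        self.inputs = []  # (input Var, vjp function) pairs, set only when recorded
        self.grad = None  # no gradient storage until attach_grad() is called

    def attach_grad(self):
        # Allocate a zero-filled buffer with the same length as the data
        self.grad = [0.0] * len(self.values)


@contextmanager
def record():
    # Operations executed inside this block remember how they were computed
    global _recording
    _recording = True
    try:
        yield
    finally:
        _recording = False


def _make(values, inputs):
    out = Var(values)
    if _recording:
        out.inputs = inputs
    return out


def square(x):
    # u_i = x_i ** 2; local derivative 2 * x_i
    return _make([v * v for v in x.values],
                 [(x, lambda g: [gi * 2 * v for gi, v in zip(g, x.values)])])


def scale(c, x):
    # y_i = c * x_i; local derivative c
    return _make([c * v for v in x.values],
                 [(x, lambda g: [c * gi for gi in g])])


def backward(y):
    # Depth-first post-order: every node appears after the nodes it was computed from
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for inp, _ in node.inputs:
            visit(inp)
        order.append(node)

    visit(y)
    # Seed with ones, i.e. differentiate the sum of y's entries
    adj = {id(y): [1.0] * len(y.values)}
    for node in reversed(order):  # consumers are processed before their inputs
        g = adj.get(id(node))
        if g is None:
            continue
        for inp, vjp in node.inputs:
            contrib = vjp(g)
            prev = adj.get(id(inp))
            adj[id(inp)] = contrib if prev is None else [a + b for a, b in zip(prev, contrib)]
    # Write results into every attached gradient buffer (overwriting, not accumulating)
    for node in order:
        if node.grad is not None:
            node.grad = adj.get(id(node), [0.0] * len(node.values))


x = Var([1, 2, 3, 4])  # the 2x2 array from the text, flattened row by row
x.attach_grad()
with record():
    y = scale(2, square(x))  # y = 2 * x**2
backward(y)
print(y.values)  # [2.0, 8.0, 18.0, 32.0]
print(x.grad)    # [4.0, 8.0, 12.0, 16.0]
```

In this sketch, running the same operations outside `record()` leaves no links behind, so `backward` has nothing to propagate and `x.grad` stays at zero. That illustrates why the definition of $y$ must sit inside the recording scope.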


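#### Invoke backpropagation

Invoke backpropagation (backprop) by calling `y.backward()`. This computes the gradient of $y$ with respect to `x` and writes it into the storage allocated by `attach_grad`.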

```python
y.backward()
```
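
What `backward` computes here can be worked out by hand. Treat $y = 2x^2$ as two primitive steps, $u = x^2$ followed by $y = 2u$, and assume the output is seeded with ones, so that the quantity being differentiated is the sum of the entries of $y$. Because $f$ acts elementwise, only $y_{ij}$ depends on $x_{ij}$, so the sum collapses to a single term and the chain rule gives:

$$
\frac{\partial}{\partial x_{ij}} \sum_{k,l} y_{kl}
= \frac{\partial y_{ij}}{\partial u_{ij}} \, \frac{\partial u_{ij}}{\partial x_{ij}}
= 2 \cdot 2x_{ij} = 4x_{ij}.
$$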

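#### Verify computed gradients

Next, verify the computed gradients. Since $y = 2x^2$, we have $\frac{dy}{dx} = 4x$, so the expected result is `[[4, 8], [12, 16]]`. Because $y$ is not a scalar, `y.backward()` computes the gradient of the sum of the elements of $y$; since $f$ acts elementwise, each entry of `x.grad` is $4x$ evaluated at the corresponding entry of `x`.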

```python
x.grad
```

```
[[ 4.  8.]
 [12. 16.]]
<NDArray 2x2 @cpu(0)>
```
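
As an independent sanity check, a central finite difference approximates each derivative numerically. This is a pure-Python sketch that does not use MXNet, and the step size `h` is an assumed value. Because $f$ acts elementwise, perturbing one entry of $x$ changes only the matching entry of $y$, so checking each entry separately is enough.

```python
def f(x):
    # The case-study function, applied to a single entry
    return 2 * x ** 2


def numeric_grad(f, x, h=1e-5):
    # Central difference (f(x+h) - f(x-h)) / (2h); truncation error is O(h^2)
    return (f(x + h) - f(x - h)) / (2 * h)


x = [[1.0, 2.0], [3.0, 4.0]]
approx = [[numeric_grad(f, v) for v in row] for row in x]
print([[round(g, 4) for g in row] for row in approx])  # [[4.0, 8.0], [12.0, 16.0]]
```

For a quadratic, the central difference is exact apart from floating-point rounding, so the rounded values match the gradient returned by `autograd`.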