We can use the `autograd` package from `MXNet` to calculate gradients of arbitrary functions. 

In [1]:
import mxnet as mx
from mxnet import nd
from mxnet import autograd

As a toy example, let's say we're interested in differentiating a function $f(x) = 2x^2$ with respect to parameter $x$.

In [3]:
# Assigning an initial value to x
x = nd.array([[1,2],[3,4]])
x


[[1. 2.]
 [3. 4.]]
<NDArray 2x2 @cpu(0)>

To compute the gradient of $f(x)$ with respect to $x$, you need a place to store and retrieve that gradient. For a particular NDArray, you call `attach_grad` to signal that you are going to compute the gradient of a function with respect to that NDArray.

In this example, we're calling `attach_grad` on `x` because we want to compute the gradient of $f$ with respect to $x$.

In [4]:
x.attach_grad()

`attach_grad` also allocates the space where the gradient will be stored once it has been computed.

Now, let's evaluate the function $y=f(x)$.

First, we need to define the function. 

In [5]:
def f(x):
    return 2 * x**2

As long as the argument `x` is passed in as an NDArray and all intermediate objects inside the function remain NDArrays, `autograd` will be able to compute the gradient.

In [6]:
with autograd.record():
    y = f(x)

In [7]:
x, y

(
 [[1. 2.]
  [3. 4.]]
 <NDArray 2x2 @cpu(0)>,
 
 [[ 2.  8.]
  [18. 32.]]
 <NDArray 2x2 @cpu(0)>)

As you can see, the contents of `y` are the results of evaluating $2x^2$. 

Now, to compute the gradient, call the `backward` function on `y`. This invokes backpropagation on `y` and computes the gradient of `y` with respect to all of its upstream computational dependencies. Because `y` is not a scalar here, `backward` uses a head gradient of ones by default, which is the same as differentiating the sum of the elements of `y`. The result is stored in the space allocated by `attach_grad`.
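
To build some intuition for what recording and then calling `backward` does, here is a minimal pure-Python sketch of reverse-mode differentiation on scalars. The `Var` class is hypothetical and written only for illustration; it is not how MXNet implements `autograd`, and it handles only multiplication and constant powers.

```python
class Var:
    """Hypothetical scalar that remembers how it was computed (not an MXNet class)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each entry is (parent Var, local derivative of this node w.r.t. that parent)
        self.parents = parents

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def __pow__(self, p):
        # d(x**p)/dx = p * x**(p - 1) for a constant exponent p
        return Var(self.value ** p, ((self, p * self.value ** (p - 1)),))

    def backward(self, upstream=1.0):
        # Accumulate the incoming gradient, then push it to the parents (chain rule)
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)


x = Var(3.0)
y = 2 * x ** 2   # building y records the operations, like autograd.record()
y.backward()     # walks the recorded operations in reverse
print(y.value, x.grad)
```

For `x = 3` this gives `y = 18` and a gradient of `12`, which agrees with $4x$.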

In [10]:
y.backward()

Now we can verify that the computed gradient is correct.

The analytical derivative of $f(x)=2x^2$ is $4x$.

In [11]:
x, x.grad

(
 [[1. 2.]
  [3. 4.]]
 <NDArray 2x2 @cpu(0)>,
 
 [[ 4.  8.]
  [12. 16.]]
 <NDArray 2x2 @cpu(0)>)

The gradient with respect to an NDArray on which you called `attach_grad` is stored in that array's `grad` property, here `x.grad`.
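
As an independent sanity check that does not rely on MXNet, the analytical derivative $4x$ can be compared against a numerical estimate. The sketch below uses a central finite difference on plain Python floats; the helper name `central_difference` and the step size `h` are illustrative choices, not part of any library.

```python
def f(x):
    return 2 * x ** 2


def central_difference(func, x, h=1e-5):
    """Numerical estimate of d(func)/dx at x (hypothetical helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)


values = [1.0, 2.0, 3.0, 4.0]                      # same entries as the NDArray x
numeric = [central_difference(f, v) for v in values]
analytic = [4 * v for v in values]                 # derivative of 2x^2 is 4x

for v, n, a in zip(values, numeric, analytic):
    print(f"x={v}: numeric={n:.6f}, analytic={a:.6f}")
```

The numeric and analytic columns should agree up to floating-point error, matching the entries of `x.grad` shown above.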

### Dynamic programs

Sometimes it's necessary to write dynamic programs, where the execution flow depends on values computed at run time.

MXNet records the execution trace of such programs and can compute their gradients as well.

Consider the following function `f`:
- It takes an input vector of size two, with each element drawn randomly from the uniform distribution on $[-1, 1]$.
- `f` doubles the input vector until its norm (length) reaches 1,000.
- It then selects one element depending on the sum of the elements: the first element if the sum is non-negative, and the second element otherwise.

In [15]:
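def f(x):
    # Double x once, then keep doubling until its norm reaches 1000
    x = x*2
    while x.norm().asscalar() < 1000:
        x = x*2
    # If the sum is non-negative, pick the 1st element
    if x.sum() >= 0:
        y = x[0]
    # Otherwise pick the 2nd element
    else:
        y = x[1]
    return y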

In [13]:
# First, we initialize x with two numbers drawn randomly from a uniform distribution between minus one and one.
x = nd.random.uniform(-1, 1, shape=2)
x


[0.09762704 0.18568921]
<NDArray 2 @cpu(0)>

In [16]:
# Attach a gradient buffer to x, record the evaluation of f, then call backward to compute the gradient.
x.attach_grad()
with autograd.record():
    y = f(x)
y.backward()

How can we check that this gradient is correct?

$$y = k \cdot x[0] \text{ or } y = k \cdot x[1], \text{ hence } \dfrac{\mathrm{d}y}{\mathrm{d}x} = \begin{bmatrix} k \\ 0 \end{bmatrix} \text{ or } \begin{bmatrix} 0 \\ k \end{bmatrix}$$

$$\text{with } k = 2^n \text{, where } n \text{ is the number of times } x \text{ was multiplied by 2}$$

Breaking it down, we know that $y$ is a linear function of one of the elements of $x$. If we call the coefficient of that linear function $k$, then the gradient with respect to $x$ is either $[k, 0]$ or $[0, k]$, depending on which element of $x$ is picked. We also know what $k$ is: it is $2^n$, where $n$ is the number of times we doubled $x$, counting the first doubling before the loop.

In [18]:
x


[0.09762704 0.18568921]
<NDArray 2 @cpu(0)>

In [17]:
x.grad


[8192.    0.]
<NDArray 2 @cpu(0)>
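
The value 8192 can be reproduced by hand. The sketch below replays the control flow of `f` on plain Python floats, counts the doublings, and builds the expected gradient from $k = 2^n$. The helper name `doubling_trace` is hypothetical, and the input values are copied from the printed `x` above, so they are rounded to the displayed precision.

```python
import math


def doubling_trace(x):
    """Replay f on a list of floats; return (n, expected gradient)."""
    n = 1
    x = [2 * v for v in x]            # the first doubling, before the loop
    while math.hypot(*x) < 1000:      # Euclidean norm of the 2-vector
        x = [2 * v for v in x]
        n += 1
    k = 2 ** n
    # Same branch as f: first element if the sum is non-negative, else second
    grad = [k, 0] if sum(x) >= 0 else [0, k]
    return n, grad


n, grad = doubling_trace([0.09762704, 0.18568921])
print(n, grad)  # 13 [8192, 0]
```

Here $n = 13$, so $k = 2^{13} = 8192$, and because the sum of the elements is positive the first element is selected, which agrees with the `x.grad` output.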