# Autograd System in MinPy

We designed MinPy with one question in mind: how can a machine learning tool be both flexible and convenient at the same time? Flexibility calls for imperative programming, which lets researchers control every detail of a neural network. However, most machine learning tools require users to implement back propagation by hand for any layer or module they define themselves; only the symbolic part of the network gets its gradients computed automatically. To bring automatic differentiation and imperative programming together, we introduce the key component of MinPy's imperative programming: the Autograd system.

## A Close Look at Autograd System
Autograd computes a gradient function for any user-defined function with a single output. For example, we define a simple function `foo`:

```python
def foo(x):
    return x**2

foo(4)
```

```
16
```

Next, we compute its derivative with MinPy's Autograd. To use Autograd, import `grad` from `minpy.core`.

```python
import minpy.numpy as np  # MinPy's NumPy interface; currently must be imported together with grad
from minpy.core import grad

d_foo = grad(foo)
```

```python
d_foo(4)
```

```
8.0
```
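To give an intuition for what a function like `grad` does, here is a toy sketch of the general reverse-mode idea: each operation records its local derivative during the forward pass, and the chain rule multiplies these backward from the output. This is not MinPy's implementation; `Var` and `toy_grad` are hypothetical names, and only scalar `+`, `*`, and constant-exponent `**` are supported.

```python
class Var:
    """A scalar that remembers the local derivatives of the operation that produced it."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (input Var, d(self)/d(input)) pairs
        self.grad = 0.0         # accumulated d(output)/d(self)

    @staticmethod
    def _wrap(other):
        # Treat plain numbers as constants.
        return other if isinstance(other, Var) else Var(other)

    def __add__(self, other):
        other = Var._wrap(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = Var._wrap(other)
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    def __pow__(self, n):
        # Constant exponent only: d(x**n)/dx = n * x**(n - 1).
        return Var(self.value ** n, ((self, n * self.value ** (n - 1)),))

    __radd__ = __add__
    __rmul__ = __mul__


def _backward(node, upstream):
    # Chain rule: add this path's contribution, then push it on to the inputs.
    # Every path is visited separately, which is fine for tiny toy graphs.
    node.grad += upstream
    for parent, local in node.parents:
        _backward(parent, upstream * local)


def toy_grad(func):
    """Return a function that evaluates d func / d x at a scalar x."""
    def grad_func(x):
        x_var = Var(float(x))
        _backward(func(x_var), 1.0)
        return x_var.grad
    return grad_func


d_foo_toy = toy_grad(lambda x: x ** 2)
print(d_foo_toy(4))  # 8.0
```

The toy version reproduces the `8.0` above; a production system additionally handles arrays, many more operations, and visits each node once in topological order.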

Autograd also handles array inputs. Since `foo` squares each element independently, the result is the elementwise derivative 2x:

```python
x = np.array([1, 2, 3, 4])
d_foo(x)
```

```
[ 2.  4.  6.  8.]
```
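A common way to sanity-check gradients like these is a central finite-difference estimate. The sketch below is plain Python and independent of MinPy; `numerical_grad` and `sum_of_squares` are hypothetical helpers. It differentiates the scalar `sum(x**2)`, whose partial derivatives coincide with the elementwise derivative of `foo` because each term depends on only one element.

```python
def numerical_grad(f, x, eps=1e-5):
    """Central-difference estimate of the gradient of a scalar function f at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad


def sum_of_squares(x):
    # Scalar stand-in for foo: summing x**2 gives the same per-element derivative 2x.
    return sum(v ** 2 for v in x)


approx = numerical_grad(sum_of_squares, [1.0, 2.0, 3.0, 4.0])
print([round(g, 6) for g in approx])  # [2.0, 4.0, 6.0, 8.0]
```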

## Autograd for Loss Functions

Because machine learning models are trained by optimizing a scalar loss, Autograd is particularly useful for obtaining the gradients needed for the next parameter update. For example, we define an affine layer, a ReLU layer, and a softmax loss.

```python
def affine(x, w, b):
    """
    Computes the forward pass for an affine (fully-connected) layer.
    The input x has shape (N, d_1, ..., d_k) and contains a minibatch of N
    examples, where each example x[i] has shape (d_1, ..., d_k). Each input
    is reshaped into a vector of dimension D = d_1 * ... * d_k and then
    transformed into an output vector of dimension M.
    Inputs:
    - x: An array containing input data, of shape (N, d_1, ..., d_k)
    - w: An array of weights, of shape (D, M)
    - b: An array of biases, of shape (M,)
    Returns:
    - out: Output, of shape (N, M)
    """
    # Flatten each example into a row of length D before the matrix product.
    out = np.dot(np.reshape(x, (x.shape[0], -1)), w) + b
    return out

def relu(x):
    """
    Computes the forward pass for a layer of rectified linear units (ReLUs).
    Input:
    - x: Inputs, of any shape
    Returns:
    - out: Output, of the same shape as x
    """
    out = np.maximum(0, x)
    return out

def softmax_loss(x, y):
    """
    Computes the loss for softmax classification; Autograd supplies the gradient.
    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth class
      for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C
    Returns:
    - loss: Scalar giving the loss
    """
    # Subtract the row-wise max for numerical stability, then normalize.
    probs = np.exp(x - np.max(x, axis=1, keepdims=True))
    # Out-of-place division; in-place updates can interfere with gradient recording.
    probs = probs / np.sum(probs, axis=1, keepdims=True)
    N = x.shape[0]
    # Average negative log-probability of the correct class.
    loss = -np.sum(np.log(probs[np.arange(N), y])) / N
    return loss
```
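For reference, the gradient of the softmax loss with respect to the scores `x` has the closed form (softmax(x) − onehot(y)) / N, which is what Autograd should reproduce. The plain-Python sketch below (standard library only; `softmax_loss_py`, `softmax_loss_grad`, and `numerical_grad_matrix` are hypothetical helpers, and scores are lists of lists rather than MinPy arrays) checks that formula against central finite differences.

```python
import math


def softmax_loss_py(scores, labels):
    """Mean softmax cross-entropy over the rows of a list-of-lists score matrix."""
    total = 0.0
    for row, label in zip(scores, labels):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = math.log(sum(math.exp(s - m) for s in row)) + m
        total += log_sum_exp - row[label]  # -log softmax(row)[label]
    return total / len(scores)


def softmax_loss_grad(scores, labels):
    """Closed-form gradient: (softmax(scores) - onehot(labels)) / N."""
    n = len(scores)
    grad = []
    for row, label in zip(scores, labels):
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        g = [e / z / n for e in exps]
        g[label] -= 1.0 / n
        grad.append(g)
    return grad


def numerical_grad_matrix(f, scores, eps=1e-5):
    """Central-difference estimate of df/dscores[i][j] for every entry."""
    grad = []
    for i, row in enumerate(scores):
        g_row = []
        for j in range(len(row)):
            plus = [r[:] for r in scores]
            minus = [r[:] for r in scores]
            plus[i][j] += eps
            minus[i][j] -= eps
            g_row.append((f(plus) - f(minus)) / (2 * eps))
        grad.append(g_row)
    return grad


scores = [[1.0, 2.0, 0.5], [0.3, -1.2, 2.2]]
labels = [1, 2]
analytic = softmax_loss_grad(scores, labels)
numeric = numerical_grad_matrix(lambda s: softmax_loss_py(s, labels), scores)
max_diff = max(abs(a - b) for ra, rb in zip(analytic, numeric) for a, b in zip(ra, rb))
print(max_diff)
```

Each row of the analytic gradient sums to zero, since the softmax probabilities sum to one and exactly one entry has 1/N subtracted.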

Then we use these layers to define a single-layer fully connected network.

```python
class SimpleNet(object):
    def __init__(self, hidden_size=100, num_class=3):
        # Define model parameters: w maps hidden_size input features to num_class scores.
        self.params = {}
        self.params['w'] = np.random.randn(hidden_size, num_class) * 0.01
        self.params['b'] = np.zeros((num_class,))

    def forward(self, X):
        # Affine (fully-connected) layer.
        y1 = affine(X, self.params['w'], self.params['b'])
        # ReLU activation.
        y2 = relu(y1)
        return y2

    def loss(self, X, y):
        # Compute the softmax loss between the network output and the labels.
        return softmax_loss(self.forward(X), y)
```

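```python
batch_size = 100
hidden_size = 100
num_class = 3
net = SimpleNet(hidden_size, num_class)
# Random inputs and integer class labels in [0, num_class).
X = np.random.randn(batch_size, hidden_size)
y = np.random.randint(0, num_class, size=(batch_size,))

gradient = grad(net.loss)
```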

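We can then compute the gradient by simply calling `gradient(X, y)`. By default, `grad` differentiates with respect to the first argument of the function, so this returns the gradient of the loss with respect to `X`; gradients with respect to the weights require a loss function that takes them as arguments.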
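Once gradients are available, the parameter update itself is plain arithmetic. The sketch below shows vanilla gradient descent over a `params` dictionary shaped like `SimpleNet.params`, using hand-written gradients of a toy quadratic loss as a stand-in for what an Autograd-produced function would return. `sgd_update`, `toy_loss_grads`, and the learning rate of 0.1 are illustrative assumptions, not MinPy APIs.

```python
def sgd_update(params, grads, learning_rate=0.1):
    """Move every parameter one step against its gradient (vanilla gradient descent)."""
    return {name: value - learning_rate * grads[name] for name, value in params.items()}


def toy_loss_grads(params):
    # Hand-written gradients of the toy loss (w - 3)**2 + (b + 1)**2.
    return {'w': 2.0 * (params['w'] - 3.0), 'b': 2.0 * (params['b'] + 1.0)}


params = {'w': 0.0, 'b': 0.0}
for step in range(100):
    params = sgd_update(params, toy_loss_grads(params))
print(params)  # both values end very close to the minimizer w = 3, b = -1
```

Returning a new dictionary rather than mutating in place keeps each step easy to inspect; with real arrays, the same update would apply elementwise to each parameter.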