# Getting started with grain

grain is a dynamic autograd library for the D language that runs on CPU and CUDA devices. While grain's autograd mechanism is dynamic, D compilers can statically validate and strongly optimize your scripts. This combination lets you catch many mistakes at compile time while keeping the flexibility of a dynamic graph.

## how to run grain in jupyter

This installation guide has been tested with:

- dmd 2.081.2 (we recommend [the official installer](https://dlang.org/install.html))
- jupyter 4.4.0 (we recommend anaconda)

Install the D kernel and engine as follows:
``` bash
git clone https://github.com/ShigekiKarita/grain
cd grain
jupyter kernelspec install ./example/third/jupyterd --user
dub build --config=jupyterd --compiler=dmd
export PATH=`pwd`:$PATH
jupyter notebook
```

Then you can create a new notebook with the 'd' kernel or open existing 'd' notebooks.

## expected readers

Generally, this tutorial is for D users. You can learn D easily at https://tour.dlang.org/tour/. Optionally, experience with a numerical library in D (mir) or in Python (numpy, pytorch) will help you understand grain.

## reference

- Main topics in this tutorial are derived from https://pytorch.org/tutorials

## TODO

- DUB extension for drepl like `%load_dub grain`

# mir.ndslice.Slice and numir

Before introducing grain, we will first implement the network using mir. Mir provides an n-dimensional slice type and many functions for manipulating these slices. Mir is a generic framework for scientific computing; it does not know anything about computation graphs, deep learning, or gradients. However, we can easily use mir to fit a two-layer network to random data by manually implementing the forward and backward passes through the network using mir operations:

In [1]:
// TODO hand-written 2 layer NN implementation like example/char_rnn_hand.d
import mir.ndslice;
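
Until the mir version above is filled in, the manual forward and backward passes can be sketched with plain D arrays. This is only an illustration under stated assumptions: it uses `double[][]` and Phobos instead of mir slices, and the helpers `zeros`, `randomMatrix`, `matMul`, `transposed`, and `trainStep` are hypothetical names written for this sketch, not part of mir or grain.

``` d
import std.stdio : writefln;
import std.random : Random, uniform;

alias Matrix = double[][];

/// rows x cols matrix filled with zeros
Matrix zeros(size_t rows, size_t cols) {
    auto m = new double[][](rows, cols);
    foreach (row; m)
        row[] = 0.0;
    return m;
}

/// rows x cols matrix with entries drawn uniformly from [-1, 1)
Matrix randomMatrix(ref Random rng, size_t rows, size_t cols) {
    auto m = zeros(rows, cols);
    foreach (row; m)
        foreach (ref v; row)
            v = uniform(-1.0, 1.0, rng);
    return m;
}

/// matrix product of a (n x k) and b (k x m)
Matrix matMul(Matrix a, Matrix b) {
    auto c = zeros(a.length, b[0].length);
    foreach (i; 0 .. a.length)
        foreach (k; 0 .. b.length)
            foreach (j; 0 .. b[0].length)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

/// transpose of a
Matrix transposed(Matrix a) {
    auto t = zeros(a[0].length, a.length);
    foreach (i; 0 .. a.length)
        foreach (j; 0 .. a[0].length)
            t[j][i] = a[i][j];
    return t;
}

/// one SGD step: updates w1 and w2 in place, returns the loss before the update
double trainStep(Matrix x, Matrix y, Matrix w1, Matrix w2, double learningRate) {
    // forward pass: h = x * w1, hRelu = max(h, 0), yPred = hRelu * w2
    auto h = matMul(x, w1);
    auto hRelu = zeros(h.length, h[0].length);
    foreach (i; 0 .. h.length)
        foreach (j; 0 .. h[0].length)
            hRelu[i][j] = h[i][j] > 0 ? h[i][j] : 0.0;
    auto yPred = matMul(hRelu, w2);

    // loss = sum((yPred - y)^2), so d(loss)/d(yPred) = 2 * (yPred - y)
    double loss = 0.0;
    auto gradYPred = zeros(y.length, y[0].length);
    foreach (i; 0 .. y.length)
        foreach (j; 0 .. y[0].length) {
            immutable diff = yPred[i][j] - y[i][j];
            loss += diff * diff;
            gradYPred[i][j] = 2.0 * diff;
        }

    // backward pass: apply the chain rule layer by layer, last layer first
    auto gradW2 = matMul(transposed(hRelu), gradYPred);
    auto gradH = matMul(gradYPred, transposed(w2));
    foreach (i; 0 .. h.length)
        foreach (j; 0 .. h[0].length)
            if (h[i][j] < 0)
                gradH[i][j] = 0.0; // ReLU blocks the gradient for negative inputs
    auto gradW1 = matMul(transposed(x), gradH);

    // plain gradient descent update
    foreach (i; 0 .. w1.length)
        foreach (j; 0 .. w1[0].length)
            w1[i][j] -= learningRate * gradW1[i][j];
    foreach (i; 0 .. w2.length)
        foreach (j; 0 .. w2[0].length)
            w2[i][j] -= learningRate * gradW2[i][j];
    return loss;
}

void main() {
    auto rng = Random(42);
    auto x = randomMatrix(rng, 6, 10);   // batchSize x inputDim
    auto y = randomMatrix(rng, 6, 5);    // batchSize x outputDim
    auto w1 = randomMatrix(rng, 10, 10); // inputDim x hiddenDim
    auto w2 = randomMatrix(rng, 10, 5);  // hiddenDim x outputDim
    foreach (step; 0 .. 500) {
        immutable loss = trainStep(x, y, w1, w2, 1e-4);
        if (step % 100 == 0)
            writefln("step %3d: loss = %g", step, loss);
    }
}
```

Every gradient line in `trainStep` had to be derived by hand from the forward pass; this is exactly the work that autograd removes.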



# grain.autograd.Variable

In the above examples, we had to manually implement both the forward and backward passes of our neural network. Manually implementing the backward pass is not a big deal for a small two-layer network, but can quickly get very hairy for large complex networks.

Thankfully, we can use automatic differentiation to automate the computation of backward passes in neural networks. The autograd package in grain provides exactly this functionality. When using autograd, the forward pass of your network defines a computational graph; nodes in the graph are `Variable`s, and edges are functions that produce output Variables from input Variables. In addition to the features of a Slice, a Variable holds its autograd history (simply the previously forwarded `Function` object, which also implements its backward function) as a `grain.autograd.Backprop` value. Backpropagating through this graph then allows you to easily compute gradients. Moreover, unlike mir.ndslice, which only supports CPU memory slices, grain supports both CPU and CUDA memory slices.

This sounds complicated, but it is pretty simple to use in practice. Each Variable represents a node in a computational graph. If x is a Variable with x.requiresGrad set to true, then x.grad is another block of memory holding the gradient of x with respect to some scalar value (and x.gradSlice holds a slice of that gradient).
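
To make the graph idea concrete, here is a toy scalar reverse-mode autograd in plain D. It is not grain's implementation: the `Node` class and the `add` and `mul` helpers are hypothetical names for this sketch, and real libraries traverse the graph more efficiently than this simple recursion does.

``` d
import std.stdio : writeln;

/// a node in the computational graph: a value plus its autograd history
class Node {
    double value;
    double grad = 0.0;   // d(output)/d(this), accumulated by backward
    Node[] inputs;       // nodes this one was computed from
    double[] localGrads; // d(this)/d(inputs[i]), recorded during forward

    this(double value, Node[] inputs = null, double[] localGrads = null) {
        this.value = value;
        this.inputs = inputs;
        this.localGrads = localGrads;
    }

    /// accumulate the upstream gradient, then push it to the inputs (chain rule)
    void backward(double upstream = 1.0) {
        grad += upstream;
        foreach (i, input; inputs)
            input.backward(upstream * localGrads[i]);
    }
}

/// a + b, remembering d(a+b)/da = 1 and d(a+b)/db = 1
Node add(Node a, Node b) {
    return new Node(a.value + b.value, [a, b], [1.0, 1.0]);
}

/// a * b, remembering d(a*b)/da = b and d(a*b)/db = a
Node mul(Node a, Node b) {
    return new Node(a.value * b.value, [a, b], [b.value, a.value]);
}

void main() {
    auto x = new Node(3.0);
    auto w = new Node(2.0);
    auto b = new Node(1.0);

    // the forward pass builds the graph: y = w * x + b
    auto y = w.mul(x).add(b);
    y.backward();

    writeln(y.value); // 7
    writeln(w.grad);  // 3 (= x)
    writeln(x.grad);  // 2 (= w)
    writeln(b.grad);  // 1
}
```

The forward pass only records which inputs produced each node and the local derivatives; `backward` then replays that history in reverse. grain's `Variable` plays the role of `Node` here, with tensors in place of scalars.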

Here we use `grain.autograd.Variable` and autograd to implement our two-layer network; now we no longer need to manually implement the backward pass through the network:

In [1]:
// TODO fix empty line outputs
import std.stdio;
import grain;
import mir.random;
import mir.ndslice;
{
    // enable autograd mode
    backprop = true;
    alias ftype = float;
    alias device = HostStorage; // DeviceStorage

    auto learningRate = 1e-6;
    auto batchSize = 6;
    auto inputDim = 10;
    auto hiddenDim = 10;
    auto outputDim = 5;
    
    auto genRand(bool requiresGrad = false, size_t dim)(size_t[dim] s...) {
        return iota(s)
            .map!(_ => rand!ftype)
            .slice
            .variable(requiresGrad)
            .to!device;
    }

    // create random tensors to hold input and outputs
    auto x = genRand(batchSize, inputDim);
    auto y = genRand(batchSize, outputDim);

    auto w1 = genRand!true(inputDim, hiddenDim);
    auto w2 = genRand!true(hiddenDim, outputDim);

    // compose NN sequence like torch with UFCS
    auto y_pred = x
        .matMul(w1)
        .relu
        .matMul(w2);

    auto loss = (y - y_pred).pow(2).sum;
    loss.backward();
    // see backprop start to end
    writeln(loss);
    writeln(w1.gradSlice);
}

blasint = int
module `reduce` is in file 'grain/functions/reduce.d' which cannot be read
import path[0] = /tmp/drepl.ewpkUw
import path[1] = /home/skarita/.dub/packages/cblas-2.0.3/cblas/source/
import path[2] = /home/skarita/.dub/packages/derelict-cuda-3.1.1/derelict-cuda/source
import path[3] = /home/skarita/.dub/packages/derelict-util-3.0.0-beta.2/derelict-util/source/
import path[4] = /home/skarita/.dub/packages/lapack-0.0.6/lapack/source/
import path[5] = /home/skarita/.dub/packages/lubeck-0.0.7/lubeck/source/
import path[6] = /home/skarita/.dub/packages/mir-algorithm-1.1.6/mir-algorithm/source/
import path[7] = /home/skarita/.dub/packages/mir-blas-0.1.2-alpha3/mir-blas/source/
import path[8] = /home/skarita/.dub/packages/mir-lapack-0.0.7-alpha/mir-lapack/source/
import path[9] = /home/skarita/.dub/packages/mir-linux-kernel-1.0.0/mir-linux-kernel/source/
import path[10] = /home/skarita/.dub/packages/mir-random-0.4.7/mir-random/source/
imp



In [1]:
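// numpy-like broadcasting is available, but only between operands with the
// same number of dimensions: mixing dim=2 and dim=1 does not compile
static assert(!__traits(compiles, [[0.1f], [0.3f]].variable + [0.2f].variable));
// so give the smaller operand matching dimensions (here 1x1 against 2x2)
[[0.1f, 0.2f], [0.3f, 0.4f]].variable + [[0.2f]].variable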

Variable!(float, dim=2, HostStorage(T))(data=[0.3, 0.4, 0.5, 0.6], shape=[2, 2], strides=[2, 1])
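
The broadcasting rule used in the cell above can be sketched in plain D. This is an illustration of the numpy-style rule for two 2-D operands, not grain's implementation; `broadcastAdd` is a hypothetical helper written for this sketch.

``` d
import std.algorithm : max;
import std.stdio : writeln;

/// add two 2-D arrays, stretching any dimension of length 1 (numpy-style rule)
float[][] broadcastAdd(float[][] a, float[][] b) {
    immutable rows = max(a.length, b.length);
    immutable cols = max(a[0].length, b[0].length);

    // each dimension must either match the result shape or have length 1
    assert(a.length == rows || a.length == 1);
    assert(b.length == rows || b.length == 1);
    assert(a[0].length == cols || a[0].length == 1);
    assert(b[0].length == cols || b[0].length == 1);

    auto result = new float[][](rows, cols);
    foreach (i; 0 .. rows)
        foreach (j; 0 .. cols) {
            // a length-1 dimension always reads index 0
            immutable ai = a.length == 1 ? 0 : i;
            immutable aj = a[0].length == 1 ? 0 : j;
            immutable bi = b.length == 1 ? 0 : i;
            immutable bj = b[0].length == 1 ? 0 : j;
            result[i][j] = a[ai][aj] + b[bi][bj];
        }
    return result;
}

void main() {
    // same operands as the cell above: a 2x2 array plus a 1x1 array
    auto sum = broadcastAdd([[0.1f, 0.2f], [0.3f, 0.4f]], [[0.2f]]);
    writeln(sum);
}
```

Both operands are 2-D here, which mirrors the requirement that the operands have the same number of dimensions before broadcasting applies.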


## Defining new autograd functions

Under the hood, each primitive autograd operator is really two functions that operate on Variables. The forward function computes output Variables from input Variables. The backward function receives the gradient of the output Variables with respect to some scalar value, and computes the gradient of the input Variables with respect to that same scalar value.

In grain, we can easily define our own autograd operator by defining a `Function` with the `grain.functions.common.FunctionCommon` mixin and implementing the forward and backward functions. We can then use our new autograd operator by constructing an instance and calling it like a function, passing Variables containing input data.

- You can find many examples in the `grain.functions` module.

In this example we define our own custom autograd function for performing the ReLU nonlinearity; it is meant to stand in for `relu` in our two-layer network:

In [1]:
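/// rectified linear unit nonlinearity
struct MyReLU(T, size_t dim) {

    mixin FunctionCommon;
    // copy of the input, saved by forward for use in backward
    Variable!(T, dim, HostStorage) hx;

    /// computes y = max(x, 0) elementwise
    auto forward(Variable!(T, dim, HostStorage) x) {
        import mir.ndslice : each;
        this.hx = x.dup;
        auto y = x.dup;
        // zero out the negative elements of the copy in place
        y.sliced.each!((ref a) {
            if (a < 0)
                a = 0;
        });
        return y;
    }

    /// passes gy through where the saved input was non-negative, zero elsewhere
    auto backward(Variable!(T, dim, HostStorage) gy) {
        auto gx = gy.dup;
        foreach (i; 0 .. gx.data.length) {
            if (this.hx.data[i] < 0.0)
                gx.data[i] = 0.0;
        }
        return gx;
    }
}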
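
A common way to validate a hand-written backward function is to compare it against a finite-difference estimate. The sketch below does this for the same ReLU rule in plain D so that it runs without grain; `reluForward`, `reluBackward`, `objective`, and `numericalGrad` are hypothetical helpers for this sketch, with `double[]` standing in for `Variable`.

``` d
import std.math : fabs;
import std.stdio : writeln;

/// forward: y[i] = max(x[i], 0)
double[] reluForward(const double[] x) {
    auto y = x.dup;
    foreach (ref v; y)
        if (v < 0)
            v = 0;
    return y;
}

/// backward: pass gy through where the saved input x was non-negative
double[] reluBackward(const double[] x, const double[] gy) {
    auto gx = gy.dup;
    foreach (i; 0 .. gx.length)
        if (x[i] < 0)
            gx[i] = 0;
    return gx;
}

/// scalar objective whose gradient w.r.t. the ReLU output is exactly gy
double objective(const double[] x, const double[] gy) {
    double total = 0.0;
    foreach (i, v; reluForward(x))
        total += gy[i] * v;
    return total;
}

/// central finite-difference estimate of d(objective)/d(x[i])
double[] numericalGrad(const double[] x, const double[] gy, double eps = 1e-6) {
    auto g = new double[x.length];
    foreach (i; 0 .. x.length) {
        auto plus = x.dup;
        plus[i] += eps;
        auto minus = x.dup;
        minus[i] -= eps;
        g[i] = (objective(plus, gy) - objective(minus, gy)) / (2 * eps);
    }
    return g;
}

void main() {
    // inputs are kept away from 0, where ReLU is not differentiable
    double[] x = [-1.5, -0.2, 0.3, 2.0];
    double[] gy = [0.5, -1.0, 2.0, 3.0];

    auto analytic = reluBackward(x, gy);
    auto numeric = numericalGrad(x, gy);
    foreach (i; 0 .. x.length)
        assert(fabs(analytic[i] - numeric[i]) < 1e-6);
    writeln(analytic); // [0, 0, 2, 3]
}
```

The same comparison can be applied to any new `Function`: if the analytic and numerical gradients disagree away from non-differentiable points, the backward implementation is the first place to look.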

