Walk-through
============

This walk-through guides users through several key concepts for using the nervana graph. The corresponding Jupyter notebook can be found [here](https://github.com/NervanaSystems/ngraph/blob/master/examples/walk_through/Graph_Introduction.ipynb).

Let's begin with a very simple example: computing ``x+1`` for several values of ``x`` using the ``ngraph``
API. We should think of the computation as being invoked from the *host* but possibly taking place
somewhere else, which we will refer to as *the device*.

The nervana graph currently uses a compilation model: users first define computations, which are then compiled and run. In the future, we plan to add an even more compiler-like approach, in which an executable is produced that can later be run on various platforms, alongside an interactive version.

Our first program will provide values for ``x`` and receive ``x+1`` for each ``x`` provided.

The x+1 program
---------------

The complete program, which we will walk through, is:

In [None]:
from __future__ import print_function
import ngraph as ng
import ngraph.transformers as ngt

# Build the graph
x = ng.placeholder(axes=())
x_plus_one = x + 1

# Select a transformer
transformer = ngt.make_transformer()

# Define a computation
plus_one = transformer.computation(x_plus_one, x)

# Run the computation
for i in range(5):
    print(plus_one(i))

We begin by importing ``ngraph``, the Python module for graph construction, and ``ngraph.transformers``, the module for transformer operations.


In [None]:
import ngraph as ng
import ngraph.transformers as ngt

Next, we create an operational graph (op-graph) for the computation. Following TensorFlow terminology, we use ``placeholder`` to define a port for transferring tensors between the host and the device. ``Axes`` tell the graph the shape of the tensor. In this example, ``x`` is a scalar, so its axes are empty.

In [None]:
x = ng.placeholder(axes=())
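To make the link between axes and shape concrete, here is a plain-Python sketch that does not use ngraph. It assumes, for illustration only, that axes can be modeled as ``(name, length)`` pairs; ``PlaceholderSketch`` is a hypothetical class, not part of the ``ngraph`` API. With empty axes, as for ``x`` above, the shape is ``()`` and the tensor still holds exactly one element.

```python
import math


class PlaceholderSketch(object):
    """Hypothetical stand-in for a placeholder (NOT the ngraph API).

    Axes are modeled as (name, length) pairs purely for illustration.
    """

    def __init__(self, axes=()):
        self.axes = tuple(axes)

    @property
    def shape(self):
        # The shape is the axis lengths, in order.
        return tuple(length for _, length in self.axes)

    @property
    def size(self):
        # The product over no axes is 1, so a scalar holds one element.
        return math.prod(self.shape)


x = PlaceholderSketch(axes=())
batch = PlaceholderSketch(axes=[("C", 3), ("N", 32)])
print(x.shape, x.size)          # () 1
print(batch.shape, batch.size)  # (3, 32) 96
```

The one-element size of a scalar matches the generated code later in this walk-through, which allocates ``np.empty(1, ...)`` for each scalar tensor.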

The ``ngraph`` graph construction API uses functions to build a graph of ``Op`` objects. Each function may add operations to the graph and returns an ``Op`` that represents the computation. Here, the ``Op`` returned is a ``TensorOp``, which defines the Python "magic methods" for arithmetic (for example, ``__add__()``).

In [None]:
x_plus_one = x + 1
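The mechanism can be sketched in plain Python. ``OpSketch`` below is a hypothetical stand-in, not ngraph's ``Op`` or ``TensorOp`` class: its ``__add__`` performs no arithmetic and instead returns a new node that records its operands, so ``a + b`` grows a graph rather than producing a number.

```python
class OpSketch(object):
    """Hypothetical graph node illustrating operator overloading."""

    def __init__(self, name, args=()):
        self.name = name
        self.args = tuple(args)

    def __add__(self, other):
        # Build an "add" node that remembers its operands; nothing is computed.
        return OpSketch("add", (self, other))

    def __repr__(self):
        if not self.args:
            return self.name
        return "{}({})".format(self.name, ", ".join(repr(a) for a in self.args))


a = OpSketch("a")
b = OpSketch("b")
c = a + b
d = c + a
print(c)  # add(a, b)
print(d)  # add(add(a, b), a)
```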

Another bit of behind-the-scenes magic occurs with the Python integer ``1``, which is not an ``Op``. When an argument to a graph constructor is not an ``Op``, nervana graph will attempt to convert it to an ``Op`` using ``ng.constant``, the graph function for creating a constant. Thus, what is really happening is:

In [None]:
x_plus_one = ng.add(x, ng.constant(1))
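A minimal sketch of that conversion step, again in plain Python with hypothetical names (``Node``, ``as_node``, ``constant``, ``add``) rather than the real ``ngraph`` functions: before the add node is built, any operand that is not already a node is wrapped in a constant node.

```python
class Node(object):
    """Hypothetical graph node (NOT ngraph's Op)."""

    def __init__(self, kind, args=(), value=None):
        self.kind = kind
        self.args = tuple(args)
        self.value = value

    def __add__(self, other):
        # x + 1 is routed through the same add() used for explicit calls.
        return add(self, other)


def constant(value):
    return Node("constant", value=value)


def as_node(value):
    # Leave graph nodes alone; wrap plain Python values as constants.
    return value if isinstance(value, Node) else constant(value)


def add(a, b):
    return Node("add", (as_node(a), as_node(b)))


x = Node("placeholder")
x_plus_one = x + 1
print(x_plus_one.kind, [arg.kind for arg in x_plus_one.args])  # add ['placeholder', 'constant']
print(x_plus_one.args[1].value)                                 # 1
```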

Once the op-graph is defined, we can compile it with a *transformer*.  Here we use ``make_transformer`` to make a default transformer.  We tell the transformer the function to compute, ``x_plus_one``, and the associated parameter ``x``. The current default transformer uses NumPy for execution.

In [None]:
# Select a transformer
transformer = ngt.make_transformer()

# Define a computation
plus_one = transformer.computation(x_plus_one, x)
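To show what "a computation with parameters" means, here is a toy transformer in plain Python. It is an assumption-laden sketch, not how ngraph transformers work: the hypothetical ``computation`` function simply interprets the graph on each call instead of compiling it, but it does show how the positional arguments of the returned callable are bound, in order, to the listed placeholders.

```python
class Node(object):
    """Hypothetical graph node (NOT ngraph's Op)."""

    def __init__(self, kind, args=(), value=None):
        self.kind, self.args, self.value = kind, tuple(args), value


def placeholder():
    return Node("placeholder")


def constant(value):
    return Node("constant", value=value)


def add(a, b):
    return Node("add", (a, b))


def computation(result, *params):
    """Return a callable whose positional arguments feed `params`, in order."""

    def evaluate(node, env):
        if node.kind == "placeholder":
            return env[node]  # value supplied by the caller
        if node.kind == "constant":
            return node.value
        if node.kind == "add":
            return evaluate(node.args[0], env) + evaluate(node.args[1], env)
        raise ValueError("unknown op: " + node.kind)

    def run(*args):
        if len(args) != len(params):
            raise TypeError("expected {} arguments".format(len(params)))
        # Bind each argument to its placeholder (nodes hash by identity).
        return evaluate(result, dict(zip(params, args)))

    return run


x = placeholder()
plus_one = computation(add(x, constant(1)), x)
print([plus_one(i) for i in range(5)])  # [1, 2, 3, 4, 5]
```

With two placeholders, ``computation(add(x, y), x, y)`` in this sketch returns a callable that takes two arguments in that order.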

The first time the transformer executes a computation, the graph is analyzed and compiled, and storage is allocated and initialized on the device. Once compiled, the computations are callable Python objects.
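The compile-once behavior can be sketched with a small wrapper. ``LazyComputation`` and ``compile_plus_one`` are hypothetical; the latter stands in for graph analysis, compilation, and allocation. The point is only that the expensive step runs on the first call and its result is reused afterwards.

```python
class LazyComputation(object):
    """Hypothetical callable that compiles itself on first use."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn
        self._compiled = None
        self.compile_count = 0

    def __call__(self, *args):
        if self._compiled is None:
            # First call only: analyze, compile, and allocate.
            self._compiled = self._compile_fn()
            self.compile_count += 1
        return self._compiled(*args)


def compile_plus_one():
    # Stand-in for the real compilation work.
    return lambda x: x + 1


plus_one = LazyComputation(compile_plus_one)
results = [plus_one(i) for i in range(5)]
print(results, plus_one.compile_count)  # [1, 2, 3, 4, 5] 1
```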

On each call to ``plus_one``, the value of ``x`` is copied to the device, 1 is added, and the result is copied
back from the device.

In [None]:
# Run the computation
for i in range(5):
    print(plus_one(i))
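The per-call data movement described above can be modeled with NumPy, using arrays as stand-ins for device memory (an assumption made for illustration; a real device would need explicit transfers). The buffers are allocated once; each call copies ``x`` in, adds the constant into a preallocated result, and copies the result back out.

```python
import numpy as np

# Hypothetical "device" buffers, allocated once and reused on every call.
device_x = np.empty((), dtype=np.float32)      # input port for x
device_one = np.full((), 1, dtype=np.float32)  # constant, written at allocation
device_out = np.empty((), dtype=np.float32)    # result storage


def plus_one(value):
    device_x[...] = value                         # copy host -> device
    np.add(device_x, device_one, out=device_out)  # compute on the "device"
    return float(device_out)                      # copy device -> host


print([plus_one(i) for i in range(5)])  # [1.0, 2.0, 3.0, 4.0, 5.0]
```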

### The Compiled x + 1 Program
The compiled code, currently located in the ``/tmp`` folder, can be examined to view the runtime device model. Here we show the code with some clarifying comments.

In [None]:
class Model(object):
    def __init__(self):
        # Each tensor gets raw storage (a_<name>) and a view of it
        # (a_<name>_v_<name>_). The tensors are the two inputs of the add
        # (x and the constant 1) and its result; allocate() fills them in.
        self.a_AssignableTensorOp_0_0 = None
        self.a_AssignableTensorOp_0_0_v_AssignableTensorOp_0_0_ = None
        self.a_AssignableTensorOp_1_0 = None
        self.a_AssignableTensorOp_1_0_v_AssignableTensorOp_1_0_ = None
        self.a_AddZeroDim_0_0 = None
        self.a_AddZeroDim_0_0_v_AddZeroDim_0_0_ = None
        self.be = NervanaObject.be

    def alloc_a_AssignableTensorOp_0_0(self):
        # Allocate a one-element float32 buffer, then build its views.
        self.update_a_AssignableTensorOp_0_0(np.empty(1, dtype=np.dtype('float32')))

    def update_a_AssignableTensorOp_0_0(self, buffer):
        # Keep the storage and view it as a 0-d (scalar) float32 array.
        self.a_AssignableTensorOp_0_0 = buffer
        self.a_AssignableTensorOp_0_0_v_AssignableTensorOp_0_0_ = np.ndarray(shape=(), dtype=np.float32,
            buffer=buffer, offset=0, strides=())

    def alloc_a_AssignableTensorOp_1_0(self):
        self.update_a_AssignableTensorOp_1_0(np.empty(1, dtype=np.dtype('float32')))

    def update_a_AssignableTensorOp_1_0(self, buffer):
        self.a_AssignableTensorOp_1_0 = buffer
        self.a_AssignableTensorOp_1_0_v_AssignableTensorOp_1_0_ = np.ndarray(shape=(), dtype=np.float32,
            buffer=buffer, offset=0, strides=())

    def alloc_a_AddZeroDim_0_0(self):
        self.update_a_AddZeroDim_0_0(np.empty(1, dtype=np.dtype('float32')))

    def update_a_AddZeroDim_0_0(self, buffer):
        self.a_AddZeroDim_0_0 = buffer
        self.a_AddZeroDim_0_0_v_AddZeroDim_0_0_ = np.ndarray(shape=(), dtype=np.float32,
            buffer=buffer, offset=0, strides=())

    def allocate(self):
        # Allocate storage (and views) for every tensor in the graph.
        self.alloc_a_AssignableTensorOp_0_0()
        self.alloc_a_AssignableTensorOp_1_0()
        self.alloc_a_AddZeroDim_0_0()

    def Computation_0(self):
        # plus_one: add the two scalar inputs, writing into the result's view.
        np.add(self.a_AssignableTensorOp_0_0_v_AssignableTensorOp_0_0_,
               self.a_AssignableTensorOp_1_0_v_AssignableTensorOp_1_0_,
               out=self.a_AddZeroDim_0_0_v_AddZeroDim_0_0_)

    def init(self):
        # One-time initialization; this graph has no initializers.
        pass

Tensors have two components:

- storage for their elements (by convention, ``a_`` names the allocated storage of a tensor), and
- views of that storage (named ``a_..._v_...``).

The ``alloc_`` methods allocate storage and pass it to the matching ``update_`` methods, which create the views of that storage that will be needed. View creation is separated from allocation because storage may be allocated in multiple ways.
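One reason this separation helps can be sketched with NumPy. In the hypothetical ``TinyModel`` below, a single ``update`` method builds a 0-d view over whatever buffer it is handed, so the same code works whether each tensor gets its own buffer or a slice of one shared pool. Pooled allocation is an illustrative assumption, not a description of what the generated code does.

```python
import numpy as np


class TinyModel(object):
    """Hypothetical model with storage/view separation (not generated code)."""

    names = ("x", "one", "out")

    def update(self, name, buffer):
        # Keep the raw storage and build a 0-d float32 view onto it.
        setattr(self, "a_" + name, buffer)
        setattr(self, "v_" + name, np.ndarray(shape=(), dtype=np.float32,
                                              buffer=buffer, offset=0, strides=()))

    def allocate_separately(self):
        # One buffer per tensor, as in the generated alloc_ methods.
        for name in self.names:
            self.update(name, np.empty(1, dtype=np.float32))

    def allocate_from_pool(self):
        # One contiguous pool; each tensor gets a one-element slice of it.
        self.pool = np.zeros(len(self.names), dtype=np.float32)
        for i, name in enumerate(self.names):
            self.update(name, self.pool[i:i + 1])


m = TinyModel()
m.allocate_from_pool()
m.v_x[...] = 41
m.v_one[...] = 1
np.add(m.v_x, m.v_one, out=m.v_out)
print(m.pool.tolist())  # [41.0, 1.0, 42.0]
```

Because the views share memory with the pool, the result written through the ``out`` view appears in the pool itself.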

Each allocated storage can also be initialized, for example, to random Gaussian values. In this example there are no initializers, so the ``init`` method, which performs the one-time device
initialization, is empty. Constants, such as 1, are copied to the device as part of the allocation process.
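The split between allocation and one-time initialization can be sketched as follows. The ``weights`` tensor and its Gaussian initializer are hypothetical (the ``x + 1`` graph has no initializers), and NumPy's random generator stands in for whatever initializer a real graph would use.

```python
import numpy as np


class ModelWithInit(object):
    """Hypothetical model: constants at allocation, random values in init()."""

    def allocate(self):
        self.weights = np.empty((2, 3), dtype=np.float32)
        self.one = np.empty((), dtype=np.float32)
        self.one[...] = 1  # constants are written as part of allocation

    def init(self, seed=0):
        # One-time initialization: fill storage with N(0, 1) samples.
        rng = np.random.default_rng(seed)
        self.weights[...] = rng.standard_normal(self.weights.shape)


m = ModelWithInit()
m.allocate()
m.init()
print(m.weights.shape, float(m.one))  # (2, 3) 1.0
```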

The method ``Computation_0`` implements the ``plus_one`` computation. Clearly, this is not the optimal way to add 1 to a scalar;
the Logistic Regression walk-through that follows looks at a more complex example.