Logistic Regression
-------------------

This example performs logistic regression. The corresponding Jupyter notebook is available [here](https://github.com/NervanaSystems/ngraph/blob/master/examples/walk_through/Logistic_Regression_Part_1.ipynb).

We want to classify an observation $x$ into one of two classes, denoted by $y=0$ and $y=1$. Using a simple linear model:
$$\hat{y}=\sigma(Wx)$$

we want to find the optimal values for $W$. Here, we use gradient descent with a learning rate of $\alpha$ and the cross-entropy as the error function.
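
For reference, the quantities named above can be written out explicitly. This is the standard textbook form, assuming $\sigma$ is the logistic sigmoid and the loss is averaged over $N$ samples $(x_i, y_i)$ with $\hat{y}_i=\sigma(Wx_i)$; the symbols $N$ and $i$ are introduced here for illustration:

$$\sigma(z)=\frac{1}{1+e^{-z}}$$

$$L(W)=-\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log\hat{y}_i+(1-y_i)\log(1-\hat{y}_i)\right]$$

$$W\leftarrow W-\alpha\frac{dL}{dW}$$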

### Axes

The nervana graph uses `Axes` to attach shape information to tensors. The identity of each `Axis` object is used to pair and specify dimensions in symbolic expressions. The function ``ng.make_axis`` creates an ``Axis`` object with an optionally supplied `name` argument. For example:

In [None]:
import ngraph as ng
import ngraph.transformers as ngt
    
my_axis = ng.make_axis(length=256, name='my_axis')

Alternatively, we can use a ``NameScope`` to set the names of the various axes. A ``NameScope`` is an object that sets the name of an object to that of its assigned attribute. So when we set ``ax.N`` to an ``Axis`` object, the ``name`` of the object is automatically set to ``ax.N``. This is a convenient way to define axes, so we use this approach for the rest of this example.

In [None]:
ax = ng.make_name_scope("ax")
ax.N = ng.make_axis(length=128, batch=True)
ax.C = ng.make_axis(length=4)

We add ``batch`` as a property to ``ax.N`` to indicate that the axis is a batch axis. A batch axis is held out of the default set of axes reduced in reduction operations such as sums.
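
To make the effect of holding out the batch axis concrete, here is a minimal plain-Python sketch. It does not use ngraph. The nested-list layout and the helper names `sum_holding_out_batch` and `sum_all` are assumptions made purely for illustration. It contrasts a sum that reduces only the feature axis with one that reduces every axis.

```python
# Plain-Python illustration (not ngraph code): a tensor with axes [C, N],
# stored as C rows of N samples each.
C, N = 4, 3
x = [[c * 10 + n for n in range(N)] for c in range(C)]


def sum_holding_out_batch(tensor):
    """Sum over the feature axis C, keeping one value per sample on N."""
    n_samples = len(tensor[0])
    return [sum(row[n] for row in tensor) for n in range(n_samples)]


def sum_all(tensor):
    """Sum over every axis, including the batch axis, giving a scalar."""
    return sum(sum(row) for row in tensor)


per_sample = sum_holding_out_batch(x)  # one entry per sample
total = sum_all(x)                     # a single number
```

With the batch axis held out, the result still has one entry per sample, which is the shape you usually want for per-sample quantities such as a loss.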

### Building the graph
Our model has three placeholders: ``X``, ``Y``, and ``alpha``, each of which needs to have axes defined. ``alpha`` is a scalar, so we pass in empty axes:

In [None]:
alpha = ng.placeholder(axes=())

``X`` and ``Y`` are tensors for the input and output data, respectively. Our convention is to use the last axis for samples.  The placeholders can be specified as:

In [None]:
X = ng.placeholder(axes=[ax.C, ax.N])
Y = ng.placeholder(axes=[ax.N])

We also need to specify the training weights, ``W``.  Unlike a placeholder, ``W`` should retain its value from computation to computation (for example, across mini-batches of training).  Following TensorFlow, we call this a *variable*.  We specify the variable with both ``Axes`` and an initial value:

In [None]:
W = ng.variable(axes=[ax.C - 1], initial_value=0)

The nervana graph axes are agnostic to data layout on the compute device, so the ordering of the axes does not matter. As a consequence, when two tensors are passed to an operation such as `ng.dot()`, we need to indicate which axes of one tensor correspond to which axes of the other. We use "dual offsets" of +/- 1 to mark which axes should be matched during a multi-axis operation, which gives rise to the `ax.C - 1` observed above. For more information, see the `Axes` section of the user guide.

Now we can estimate ``y`` as ``Y_hat`` and compute the average loss ``L``:

In [None]:
Y_hat = ng.sigmoid(ng.dot(W, X))
L = ng.cross_entropy_binary(Y_hat, Y, out_axes=()) / ng.batch_size(Y_hat)

Here we use several ngraph functions, including ``ng.dot`` and ``ng.sigmoid``. Since a tensor can have multiple axes, we need a way to mark which axes in the first argument of ``ng.dot`` are to act on which axes in the second argument.

Every axis is a member of a family of axes we call duals of the axis, and each axis in the family has a position. When you create an axis, its dual position is 0. ``dot`` pairs axes in the first and second arguments that are of the same dual family and have consecutive positions.

We want the variable `W` to act on the `ax.C` axis, so we want the axis for `W` to be in the position before `ax.C`, which we can obtain with `ax.C - 1`. We initialize ``W`` to ``0``.
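
The pairing rule described above can be modelled in a few lines of plain Python. This is a conceptual sketch only, not ngraph's implementation: `DualAxis`, `shift`, and `dot_axes` are hypothetical names, and the sketch assumes the rule exactly as stated (same dual family, with the left axis one position before the right axis).

```python
from collections import namedtuple

# Hypothetical stand-in for an axis: a dual family name, a dual position,
# and a length. This is an illustration, not ngraph's Axis class.
DualAxis = namedtuple("DualAxis", ["family", "position", "length"])


def shift(axis, offset):
    """Return the axis in the same family at a shifted dual position."""
    return axis._replace(position=axis.position + offset)


def dot_axes(left, right):
    """Pair each left axis with the right axis one position after it.

    Returns the matched pairs and the unmatched axes that remain.
    """
    pairs = [(a, b) for a in left for b in right
             if a.family == b.family and a.position + 1 == b.position]
    used_left = {a for a, _ in pairs}
    used_right = {b for _, b in pairs}
    out = [a for a in left if a not in used_left]
    out += [b for b in right if b not in used_right]
    return pairs, out


C = DualAxis("C", 0, 4)
N = DualAxis("N", 0, 128)

w_axes = [shift(C, -1)]  # plays the role of ax.C - 1
x_axes = [C, N]

pairs, out = dot_axes(w_axes, x_axes)
```

In this model, the single axis of `W` is matched with `C` and only `N` remains, which corresponds to `Y_hat` having one value per sample. Under the same rule, an unshifted `C` on the left would not be paired with anything.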

Gradient descent requires computing the gradient, $\frac{dL}{dW}$:


In [None]:
grad = ng.deriv(L, W)

The ``ng.deriv`` function computes this gradient by backpropagation using automatic differentiation. We are almost done.  The update step computes the new weight and assigns it to ``W``:

In [None]:
update = ng.assign(W, W - alpha * grad / ng.tensor_size(Y_hat))
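
For this model the gradient also has a well-known closed form, the mean over samples of $(\hat{y}-y)\,x$, so the gradient and update steps can be checked by hand. The following plain-Python sketch does not use ngraph. The data values and helper names are made up for illustration, and samples are stored as rows rather than on the last axis. It applies the textbook step $W-\alpha\frac{dL}{dW}$ and leaves out the additional division by `ng.tensor_size(Y_hat)` used above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, xs):
    """Return y_hat for each sample; xs is a list of feature vectors."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for x in xs]


def mean_cross_entropy(w, xs, ys):
    """Average binary cross-entropy over the batch."""
    total = 0.0
    for y_hat, y in zip(predict(w, xs), ys):
        total -= y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat)
    return total / len(xs)


def gradient(w, xs, ys):
    """Closed-form dL/dW: mean over samples of (y_hat - y) * x."""
    grad = [0.0] * len(w)
    for y_hat, x, y in zip(predict(w, xs), xs, ys):
        for j, xj in enumerate(x):
            grad[j] += (y_hat - y) * xj / len(xs)
    return grad


def numeric_gradient(w, xs, ys, eps=1e-6):
    """Central finite differences, as an independent check."""
    grads = []
    for j in range(len(w)):
        up = w[:j] + [w[j] + eps] + w[j + 1:]
        down = w[:j] + [w[j] - eps] + w[j + 1:]
        diff = mean_cross_entropy(up, xs, ys) - mean_cross_entropy(down, xs, ys)
        grads.append(diff / (2 * eps))
    return grads


# Tiny made-up batch: 4 samples with 2 features each.
xs = [[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, 0.0]]
ys = [1, 1, 0, 0]
w = [0.0, 0.0]  # weights start at zero, as in the example
alpha = 0.1     # illustrative learning rate

grad = gradient(w, xs, ys)
numeric = numeric_gradient(w, xs, ys)

loss_before = mean_cross_entropy(w, xs, ys)
w = [wi - alpha * gi for wi, gi in zip(w, grad)]  # one descent step
loss_after = mean_cross_entropy(w, xs, ys)
```

With zero initial weights every prediction is 0.5, so the starting loss is $\log 2$; a single small step along the negative gradient lowers it.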

### Computation

Now we create a transformer and define a computation. We pass the ops whose results we want to retrieve, followed by the placeholders:


In [None]:
transformer = ngt.make_transformer()
update_fun = transformer.computation([L, W, update], alpha, X, Y)

Here, the computation returns three values, one each for ``L``, ``W``, and ``update``, given inputs to fill the placeholders.

The input data is synthetically generated as a mixture of two Gaussian distributions in 4-d space.  Our dataset consists of 10 mini-batches of 128 samples each, which we create with a convenience function:

In [None]:
import gendata

g = gendata.MixtureGenerator([.5, .5], (ax.C.length,))
XS, YS = g.gen_data(ax.N.length, 10)

Finally, we train the model across 10 epochs, printing the loss and updated weights:

In [None]:
for i in range(10):
    for xs, ys in zip(XS, YS):
        loss_val, w_val, _ = update_fun(5.0 / (1 + i), xs, ys)
        print("W: %s, loss %s" % (w_val, loss_val))

See also Part 2 of the logistic regression walk-through, which walks users through adding additional variables, computations, and dimensions.