Logistic Regression
-------------------

This example performs logistic regression. The corresponding Jupyter notebook can be found [here](https://github.com/NervanaSystems/ngraph-neon/blob/master/examples/walk_through/Logistic_Regression_Part_1.ipynb).
We want to classify an observation $x$ into one of two classes, denoted by $y=0$ and $y=1$. Using a simple linear model:
$$\hat{y}=\sigma(Wx)$$

where $\sigma$ is the sigmoid function, we want to find the optimal values for the weights $W$. Here, we use gradient descent with a learning rate of $\alpha$ and the cross-entropy as the error function.
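
Spelled out for a batch of $N$ samples $(x_i, y_i)$, the loss and the update rule take the standard form below. The sample index $i$ and the averaging over $N$ are notation assumed here, chosen to match the averaged loss computed later in this walk-through:

$$L(W) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log\hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\right], \qquad \hat{y}_i=\sigma(Wx_i)$$

$$\frac{\partial L}{\partial W} = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i-y_i)\,x_i^{\top}, \qquad W \leftarrow W - \alpha\,\frac{\partial L}{\partial W}$$

The gradient has this compact form because $\sigma'(z)=\sigma(z)(1-\sigma(z))$ cancels the denominators that appear when differentiating the logarithms.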

### Axes

The Nervana graph uses `Axes` to attach shape information to tensors. The identity of `Axis` objects is used to pair and specify dimensions in symbolic expressions. The function ``ng.make_axis`` creates an ``Axis`` object with an optionally supplied `name` argument. For example:

In [None]:
import neon as ng
import neon.transformers as ngt
    
N = ng.make_axis(length=128, name='N')
C = ng.make_axis(length=4)

We add ``batch`` as a property to ``N`` to indicate that the axis is a batch axis. A batch axis is held out of the default set of axes reduced in reduction operations such as sums.
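
As a rough analogy in plain NumPy (not ngraph code), holding the batch axis out of a reduction corresponds to summing over every other axis, which leaves one value per sample. The array shape and variable names below are assumptions made for illustration:

```python
import numpy as np

C_LEN, N_LEN = 4, 128            # lengths of the C (feature) and N (batch) axes
x = np.ones((C_LEN, N_LEN))      # a tensor laid out with axes [C, N]

# Reduce over the feature axis only; the batch axis (last) is kept.
per_sample = x.sum(axis=0)
print(per_sample.shape)          # one entry per sample in the batch
```

NumPy identifies axes by position, so the axis to keep has to be chosen by index; ngraph instead identifies it through the ``Axis`` object itself.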

### Building the graph
Our model has three placeholders: ``X``, ``Y``, and ``alpha``, each of which needs to have axes defined. ``alpha`` is a scalar, so we pass in empty axes:

In [None]:
alpha = ng.placeholder(axes=())

``X`` and ``Y`` are tensors for the input and output data, respectively. Our convention is to use the last axis for samples.  The placeholders can be specified as:

In [None]:
X = ng.placeholder(axes=[C, N])
Y = ng.placeholder(axes=[N])

We also need to specify the trainable weights, ``W``. Unlike a placeholder, ``W`` should retain its value from computation to computation (for example, across mini-batches of training). Following TensorFlow, we call this a *variable*. We specify the variable with both its axes and an initial value:

In [None]:
W = ng.variable(axes=[C], initial_value=0)

Now we can estimate ``y`` as ``Y_hat`` and compute the average loss ``L``:

In [None]:
Y_hat = ng.sigmoid(ng.dot(W, X))
L = ng.cross_entropy_binary(Y_hat, Y, out_axes=()) / ng.batch_size(Y_hat)

Here we use several ngraph functions, including ``ng.dot`` and ``ng.sigmoid``. Since a tensor can have multiple axes, we need a way to mark which axes in the first argument of ``ng.dot`` act on which axes in the second argument. The pairing is done by axis identity: here `C` appears in both `W` and `X`, so it is the axis that is contracted. Note that `W` has been defined with one axis, while `X` has two axes. For each position along `N`, the length-`C` column of `X` is dotted with `W`, and the `N` results are stored in `Y_hat`, which has only one axis, the `N` axis.
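
To make the shapes concrete, here is a minimal NumPy sketch of the same forward pass and averaged loss. It is not the ngraph implementation: the helper names `sigmoid` and `forward_and_loss`, the random test data, and the small `eps` guard inside the logarithms are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_and_loss(w, x, y):
    """w: shape (C,), x: shape (C, N), y: shape (N,)."""
    y_hat = sigmoid(w @ x)       # contracts the shared C axis, result has shape (N,)
    eps = 1e-12                  # keeps log() away from zero
    ce = -(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
    return y_hat, ce.sum() / y.shape[0]   # summed cross-entropy divided by batch size

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 128))
y = rng.integers(0, 2, size=128).astype(float)
w = np.zeros(4)                  # same initial value as W above

y_hat, loss = forward_and_loss(w, x, y)
print(y_hat.shape, loss)
```

With all-zero weights every prediction is 0.5, so the loss starts at $\log 2$ regardless of the data.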

Once `Y_hat` has been computed (the whole batch computation was defined above), we can move on and update the weights in `W`. Gradient descent requires computing the gradient, $\frac{dL}{dW}$:

In [None]:
grad = ng.deriv(L, W)

The ``ng.deriv`` function computes the gradient by backpropagation, using automatic differentiation. We are now ready to update ``W``. The update step (an Op that is carried out when the computation actually runs on the device) computes the new weight and assigns it to ``W``:

In [None]:
update = ng.assign(W, W - alpha * grad / ng.tensor_size(Y_hat))
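
For intuition about what the gradient and update steps compute, the following NumPy sketch writes the gradient of the averaged loss by hand, checks it against central finite differences, and applies one plain gradient-descent step. All function names, the random data, and the step size of 0.5 are assumptions made for illustration. The ngraph line above additionally divides by ``ng.tensor_size(Y_hat)``; the sketch leaves that scaling out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_loss(w, x, y):
    y_hat = sigmoid(w @ x)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def analytic_grad(w, x, y):
    # Gradient of the averaged cross-entropy: x @ (y_hat - y) / N
    return x @ (sigmoid(w @ x) - y) / y.shape[0]

def numeric_grad(w, x, y, h=1e-6):
    # Central finite differences, one weight at a time
    g = np.zeros_like(w)
    for k in range(w.size):
        step = np.zeros_like(w)
        step[k] = h
        g[k] = (mean_loss(w + step, x, y) - mean_loss(w - step, x, y)) / (2 * h)
    return g

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 128))
y = rng.integers(0, 2, size=128).astype(float)
w = 0.1 * rng.normal(size=4)

grad = analytic_grad(w, x, y)
max_diff = np.max(np.abs(grad - numeric_grad(w, x, y)))
w_new = w - 0.5 * grad           # one gradient-descent step with alpha = 0.5
print(max_diff, mean_loss(w, x, y), mean_loss(w_new, x, y))
```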

We will also need a way to generate input data. Below, the input data, ``XS`` and ``YS``, are synthetically generated as a mixture of two Gaussian distributions in 4-d space. We shape our entire dataset as 10 mini-batches of 128 samples each, which we create with a convenience function:

In [None]:
import gendata

g = gendata.MixtureGenerator([.5, .5], (C.length,))
XS, YS = g.gen_data(N.length, 10)
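
The `gendata` module is a helper that ships alongside the example, and its internals are not shown here. If it is not at hand, data of the same general kind can be produced with NumPy. The sketch below is a hypothetical stand-in, not the `gendata` implementation: the function name `make_batches`, the random component means, the unit covariance, and the `(batches, features, samples)` layout are all assumptions.

```python
import numpy as np

def make_batches(n_features=4, batch_size=128, n_batches=10, seed=0):
    rng = np.random.default_rng(seed)
    means = rng.normal(size=(2, n_features))     # one mean per mixture component
    xs, ys = [], []
    for _ in range(n_batches):
        # Pick a component for each sample with probability 0.5 each
        labels = rng.integers(0, 2, size=batch_size)
        samples = means[labels] + rng.normal(size=(batch_size, n_features))
        xs.append(samples.T)                     # (n_features, batch_size): samples on the last axis
        ys.append(labels.astype(float))
    return np.stack(xs), np.stack(ys)

XS_demo, YS_demo = make_batches()
print(XS_demo.shape, YS_demo.shape)
```

Iterating over the first axis of each array then yields one `(features, samples)` input batch and one label vector at a time.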


### Computation

Now we create a transformer and define a computation for learning. To do so, we pass the ops whose results we want to retrieve, followed by the placeholders.

Here, the computation returns three values, one each for ``L``, ``W``, and ``update``, given inputs that fill the placeholders: $\alpha$ (the learning rate), ``X`` (the inputs), and ``Y`` (the expected outputs).

In [None]:
from contextlib import closing

with closing(ngt.make_transformer()) as transformer:
    update_fun = transformer.computation([L, W, update], alpha, X, Y)
    
    for i in range(10):
        for xs, ys in zip(XS, YS):
            loss_val, w_val, _ = update_fun(5.0 / (1 + i), xs, ys)
            print("W: %s, loss %s" % (w_val, loss_val))

The loop above trains the model across 10 epochs, printing the loss and updated weights after each mini-batch. Note that $\alpha$ follows a decreasing policy: it shrinks as the epoch number grows. Also note that there is no need to specify the outputs when invoking ``update_fun``, as they were specified at definition time; we only need to feed the inputs into the ``update_fun`` call.
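
For readers without an ngraph installation, the same training procedure can be sketched end to end in NumPy. This is an illustrative stand-in, not the ngraph computation: the synthetic data (two unit-variance Gaussians centred at -1 and +1), the base learning rate of 0.5, and the clipping of predictions are assumptions chosen for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical stand-in data: 10 mini-batches of 128 samples in 4-d space
rng = np.random.default_rng(0)
n_batches, n_features, batch_size = 10, 4, 128
YS = rng.integers(0, 2, size=(n_batches, batch_size)).astype(float)
centres = (2.0 * YS - 1.0)[:, None, :]           # -1 for class 0, +1 for class 1
XS = centres + rng.normal(size=(n_batches, n_features, batch_size))

w = np.zeros(n_features)
epoch_losses = []
for i in range(10):                              # 10 epochs
    alpha = 0.5 / (1 + i)                        # learning rate decays with the epoch number
    batch_losses = []
    for xs, ys in zip(XS, YS):
        y_hat = np.clip(sigmoid(w @ xs), 1e-12, 1 - 1e-12)
        loss = -np.mean(ys * np.log(y_hat) + (1 - ys) * np.log(1 - y_hat))
        grad = xs @ (y_hat - ys) / batch_size    # gradient of the averaged loss
        w = w - alpha * grad                     # gradient-descent update
        batch_losses.append(loss)
    epoch_losses.append(float(np.mean(batch_losses)))
    print("epoch %d: W: %s, loss %.4f" % (i, w, epoch_losses[-1]))
```

The loss is recorded before each update, so the first mini-batch of the first epoch reports the loss for the all-zero initial weights.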

Also see Part 2 of logistic regression, which walks users through adding additional variables, computations, and dimensions.

