Logistic Regression Part 2
--------------------------

In this example, we extend the code from Part 1 with several important features:
- Instead of just updating the weight matrix ``W``, **we add a bias ``b``** and use the **``.variables()`` method** to compactly update both variables.
- We attach an additional computation to the transformer **to compute the loss on a held-out validation dataset.**
- We switch from a flat ``C``-dimensional feature space **to a ``W x H`` feature space** to demonstrate multi-dimensional logistic regression.

The corresponding Jupyter notebook is available [here](https://github.com/NervanaSystems/ngraph/blob/master/examples/walk_through/Logistic_Regression_Part_2.ipynb).

In [None]:
import ngraph as ng
import ngraph.transformers as ngt
import gendata

The axes creation is the same as before, except **we now add a new axis ``H``** to represent the new feature space.

In [None]:
ax = ng.make_name_scope(name="ax")

ax.W = ng.make_axis(length=4)
ax.H = ng.make_axis(length=1)  # new axis; length 1 is fictitious, just to demonstrate the multi-axis case
ax.N = ng.make_axis(length=128, batch=True)

### Building the graph
Our model, as in the previous example, has three placeholders: ``X``, ``Y``, and ``alpha``. But now the input ``X`` has shape ``(W, H, N)``:

In [None]:
alpha = ng.placeholder(())
X = ng.placeholder([ax.W, ax.H, ax.N])  # now has shape (W, H, N)
Y = ng.placeholder([ax.N])

Similarly, the weight matrix is now conceptually multi-dimensional, with shape ``(W, H)``, and we add a new scalar bias variable. We also want to specify that the axes of the weight matrix ``W`` will be reduced when computing the element-wise product and summation with the inputs, so we write them as ``ax.W - 1`` and ``ax.H - 1``:

In [None]:
W = ng.variable([ax.W - 1, ax.H - 1], initial_value=0).named('W')  # now has shape (W, H)
b = ng.variable((), initial_value=0).named('b')

Our predicted output is now computed including the bias ``b``. Note that here the ``+`` operation implicitly broadcasts ``b`` across the batch axis ``N``, which is the only axis of ``Y_hat``:

In [None]:
Y_hat = ng.sigmoid(ng.dot(W, X) + b)
L = ng.cross_entropy_binary(Y_hat, Y, out_axes=()) / ng.batch_size(Y_hat)
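To make the shapes concrete, here is a minimal pure-Python sketch of the same forward pass, independent of ngraph. It assumes that ``ng.dot(W, X)`` sums over both feature axes ``W`` and ``H`` and that the loss is the standard binary cross-entropy averaged over the batch; the helper names ``predict`` and ``mean_binary_cross_entropy`` and the toy values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(W, X, b):
    """W is indexed [w][h], X is indexed [w][h][n]; returns N predictions."""
    n_w, n_h, n_batch = len(X), len(X[0]), len(X[0][0])
    y_hat = []
    for n in range(n_batch):
        # Contract over both feature axes (W and H), leaving only the batch axis.
        z = sum(W[w][h] * X[w][h][n] for w in range(n_w) for h in range(n_h))
        # The scalar bias is added to every batch element.
        y_hat.append(sigmoid(z + b))
    return y_hat


def mean_binary_cross_entropy(y_hat, y):
    """Standard binary cross-entropy, averaged over the batch."""
    total = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                 for p, t in zip(y_hat, y))
    return total / len(y)


# Toy data (hypothetical): W axis of length 2, H axis of length 1, batch of 3.
W = [[0.0], [0.0]]
X = [[[1.0, 2.0, 3.0]], [[-1.0, 0.5, 2.0]]]
Y = [0.0, 1.0, 1.0]

y_hat = predict(W, X, 0.0)
loss = mean_binary_cross_entropy(y_hat, Y)
print(y_hat, loss)
```

With all-zero weights and bias, every prediction is 0.5 and the mean loss equals ``log(2)``, which is a handy sanity check for an untrained model.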

For the parameter updates, instead of explicitly specifying the variables ``W`` and ``b``, **we can call ``L.variables()`` to retrieve all the variables that the loss function depends on:**

In [None]:
print([var.name for var in L.variables()])

['W', 'b']


For complicated graphs, the ``variables()`` method makes it easy to iterate over all the variables that an op depends on. Our new parameter update is then:

In [None]:
updates = [ng.assign(v, v - alpha * ng.deriv(L, v) / ng.batch_size(Y_hat))
           for v in L.variables()]

Note that this time we embedded the call to the gradient computation inside the definition of the update. As stated in the previous example, the ``ng.deriv`` function computes the backprop using autodiff. Each update computes the new value of a variable and assigns it to that variable (``W`` or ``b``). We then group the updates with ``ng.doall``:

In [None]:
all_updates = ng.doall(updates)
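The update rule above can be sketched without ngraph by writing the gradient out by hand. For a sigmoid output with mean binary cross-entropy, the gradient with respect to each weight is the batch average of ``(y_hat - y) * x``, and the gradient with respect to the bias is the batch average of ``(y_hat - y)``. The sketch below is an illustration under that assumption, not the ngraph implementation: the names ``forward`` and ``sgd_step`` are hypothetical, and it applies the gradient of the mean loss directly, so it does not reproduce the additional ``/ ng.batch_size(Y_hat)`` factor in the cell above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W, b, X, Y):
    """Return predictions and mean binary cross-entropy for X indexed [w][h][n]."""
    n_batch = len(Y)
    y_hat = [
        sigmoid(sum(W[w][h] * X[w][h][n]
                    for w in range(len(W)) for h in range(len(W[0]))) + b)
        for n in range(n_batch)
    ]
    loss = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for p, t in zip(y_hat, Y)) / n_batch
    return y_hat, loss


def sgd_step(W, b, X, Y, alpha):
    """One gradient-descent update of W and b; returns the new values."""
    n_batch = len(Y)
    y_hat, _ = forward(W, b, X, Y)
    err = [p - t for p, t in zip(y_hat, Y)]  # per-example (y_hat - y)
    new_W = [
        [W[w][h] - alpha * sum(e * X[w][h][n] for n, e in enumerate(err)) / n_batch
         for h in range(len(W[0]))]
        for w in range(len(W))
    ]
    new_b = b - alpha * sum(err) / n_batch
    return new_W, new_b


# Toy data (hypothetical): W axis of length 2, H axis of length 1, batch of 3.
X = [[[1.0, 2.0, 3.0]], [[-1.0, 0.5, 2.0]]]
Y = [0.0, 1.0, 1.0]
W, b = [[0.0], [0.0]], 0.0

_, loss_before = forward(W, b, X, Y)
W, b = sgd_step(W, b, X, Y, alpha=0.1)
_, loss_after = forward(W, b, X, Y)
print(loss_before, loss_after)
```

Both parameters are updated from the same forward pass, which mirrors the idea of building one update per variable and then running them together.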

### Computation

We have our update computation as before, but **we also add an evaluation computation** that computes the loss on a separate dataset **without performing the updates**. Because this computation requires only the loss as output (the first parameter), we do not pass the learning rate or any other placeholder that the loss does not depend on.


In [None]:
transformer = ngt.make_transformer()

update_fun = transformer.computation([L, W, b, all_updates], alpha, X, Y)
eval_fun = transformer.computation(L, X, Y)

For convenience, we define a function that computes the average cost across the validation set.

In [None]:
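def avg_loss(xs, ys):
    # Sum the per-batch loss returned by eval_fun over all validation batches.
    total_loss = 0
    for x, y in zip(xs, ys):
        loss_val = eval_fun(x, y)
        total_loss += loss_val
    # Normalize by the size of the last axis of x (the batch axis N).
    return total_loss / x.shape[-1]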
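The function above divides the summed batch losses by ``x.shape[-1]``, the batch size. As a point of comparison, when the per-batch loss is already a per-example mean, another common convention is to divide by the number of batches instead. The sketch below shows that variant. Both ``avg_loss_over_batches`` and ``toy_eval`` are hypothetical names, and ``toy_eval`` is only a stand-in for ``eval_fun`` so that the example runs on its own.

```python
def avg_loss_over_batches(eval_fn, xs, ys):
    """Average a per-batch mean loss over the number of batches."""
    losses = [eval_fn(x, y) for x, y in zip(xs, ys)]
    return sum(losses) / len(losses)


def toy_eval(x, y):
    """Hypothetical stand-in for eval_fun: mean squared difference of two lists."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Two toy batches of two examples each (hypothetical values).
xs = [[1.0, 2.0], [3.0, 4.0]]
ys = [[1.0, 0.0], [3.0, 2.0]]

result = avg_loss_over_batches(toy_eval, xs, ys)
print(result)
```

The two conventions differ only by a constant factor when all batches have the same size, so either one can be used to track progress during training, but the absolute values are not comparable.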

We then generate our training and evaluation sets and perform the updates with the same technique that we used in the previous example. We emit the average loss on the validation set during training. Note that because axis ``H`` has length 1, the number of weights is the same as in the previous example.

In [None]:
g = gendata.MixtureGenerator([.5, .5], (ax.W.length, ax.H.length))
XS, YS = g.gen_data(ax.N.length, 10)
EVAL_XS, EVAL_YS = g.gen_data(ax.N.length, 4)

print("Starting avg loss: {}".format(avg_loss(EVAL_XS, EVAL_YS)))
for i in range(10):
    for xs, ys in zip(XS, YS):
        loss_val, w_val, b_val, _ = update_fun(5.0 / (1 + i), xs, ys)
    print("After epoch %d: W: %s, b: %s, avg loss %s" % (i, w_val.T, b_val, avg_loss(EVAL_XS, EVAL_YS)))

Starting avg loss: 0.0216608531773
After epoch 0: W: [[ 0.02026517  0.1151614   0.00125744 -0.05090945]], b: 0.0011455854401, avg loss 0.0203053080477
After epoch 1: W: [[ 0.03013911  0.16921099  0.00255081 -0.07355194]], b: 0.00417448114604, avg loss 0.0197170646861
After epoch 2: W: [[ 0.03663138  0.20406155  0.0036242  -0.08772986]], b: 0.00696960603818, avg loss 0.0193524714559
After epoch 3: W: [[ 0.0414547   0.22961351  0.00453026 -0.09791613]], b: 0.00943646207452, avg loss 0.0190920559689
After epoch 4: W: [[ 0.04528572  0.24970794  0.00531333 -0.10580339]], b: 0.0116233276203, avg loss 0.0188911929727
After epoch 5: W: [[ 0.04845981  0.26622492  0.00600331 -0.11220558]], b: 0.0135828573257, avg loss 0.018728595227
After epoch 6: W: [[ 0.05116731  0.28022128  0.00662058 -0.11757391]], b: 0.0153572438285, avg loss 0.0185925327241
After epoch 7: W: [[ 0.05352654  0.2923488   0.00717952 -0.12218346]], b: 0.0169788394123, avg loss 0.018475885503
After epoch 8: W: [[ 0.05561603  0.3