# Modeling in Gluon

Key steps in training a deep network:

1. Define network
1. Initialize parameters
1. Iterate over data
   1. Forward pass (propagate input to generate output)
   1. Compute loss (compare output to true labels)
   1. Compute loss gradient via backpropagation
   1. Update parameters by stochastic gradient descent
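
The steps above can be sketched without any framework. The following is a minimal NumPy illustration, not Gluon code: the toy data (`y = 3x + 1`), the learning rate, and the batch size are all assumptions chosen for the example, and the gradient is written out by hand where a framework would use backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: y = 3 * x + 1, no noise
X = rng.normal(size=(256, 1))
y = 3.0 * X[:, 0] + 1.0

# 1. Define network: a single linear unit, y_hat = X @ w + b
# 2. Initialize parameters
w = np.zeros(1)
b = 0.0
lr, batch_size = 0.1, 32

# 3. Iterate over data
for epoch in range(20):
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        y_hat = xb @ w + b                  # forward pass
        err = y_hat - yb
        loss = 0.5 * np.mean(err ** 2)      # compute loss
        grad_w = xb.T @ err / len(idx)      # loss gradient (by hand here)
        grad_b = err.mean()
        w -= lr * grad_w                    # SGD parameter update
        b -= lr * grad_b

print(w, b)
```

Gluon replaces the hand-written gradient and update with `autograd` and a `Trainer`, as the rest of this section shows.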

Import modules and get the device context:

In [1]:
import d2l
import mxnet as mx
from mxnet import nd, autograd, gluon

# device context
data_ctx = d2l.try_gpu()
model_ctx = d2l.try_gpu()

#data_ctx = mx.cpu()
#model_ctx = mx.cpu()
print(data_ctx)

cpu(0)


### Blocks in Gluon

`gluon.Block` is the basic building block for models (e.g. a layer is a block).

```
class Net(gluon.Block):
    [...]  # __init__ allocates resources (more later)

    # One or more NDArrays can be passed to `forward`
    def forward(self, x):
        # Computation
        # Do something with your data x to compute y
        return y
```

* Blocks hold parameters and functions
* Blocks can be composed into larger blocks (e.g. `Dense` is itself a block)
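
To make the two bullet points concrete, here is a toy, pure-Python sketch of the idea. It is not how Gluon implements `Block`: the classes `ToyBlock`, `Scale`, and `Chain` are hypothetical names invented for this illustration. They only mimic the pattern of holding parameters, calling `forward`, and nesting blocks.

```python
class ToyBlock:
    """Hypothetical stand-in for a block: holds parameters and child blocks."""

    def __init__(self):
        self.params = {}
        self.children = []

    def __call__(self, x):
        # Calling a block runs its forward computation
        return self.forward(x)

    def forward(self, x):
        raise NotImplementedError

    def collect_params(self):
        # Gather own parameters plus those of all children, recursively
        collected = dict(self.params)
        for i, child in enumerate(self.children):
            for name, value in child.collect_params().items():
                collected["%d.%s" % (i, name)] = value
        return collected


class Scale(ToyBlock):
    """Multiplies its input by a single parameter."""

    def __init__(self, factor):
        super().__init__()
        self.params["factor"] = factor

    def forward(self, x):
        return self.params["factor"] * x


class Chain(ToyBlock):
    """Composes blocks by applying them one after another."""

    def __init__(self, *blocks):
        super().__init__()
        self.children = list(blocks)

    def forward(self, x):
        for block in self.children:
            x = block(x)
        return x


net_demo = Chain(Scale(2.0), Scale(3.0))
print(net_demo(1.5))
print(net_demo.collect_params())
```

The composed `net_demo` behaves like any other block: it can be called on data, and its parameters can be collected in one place.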

### A simple `Block`

In [None]:
net = gluon.nn.Dense(1, in_units=2)  # 1 output, 2 inputs, no activation function
print(net.weight)
print(net.bias)

The `Dense` block holds all of its parameters (weight and bias). `collect_params()` gathers them in a single call.

In [None]:
net.collect_params()

### Manipulating Parameters

The returned object is a `gluon.parameter.ParameterDict`. 

In [None]:
type(net.collect_params())

Before using a network we need to initialize parameters. We need:

* An initializer, many of which live in the `mx.init` module (an alias of `mx.initializer`).
* A device context where the parameters live (CPU or GPU).

In [None]:
net.collect_params().initialize(mx.initializer.Uniform(0.01), ctx=model_ctx)

Now we can access the actual parameter values:

In [None]:
print(net.weight.data())
print(net.bias.data())
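
For intuition about the printed values, the following NumPy sketch mimics what this initialization and the `Dense` layer amount to, assuming default settings: weights drawn uniformly from `[-0.01, 0.01]` with shape `(units, in_units)`, a bias initialized to zero, and an output of `x @ weight.T + bias`. The variable names are illustrative and the random values will not match MXNet's.

```python
import numpy as np

rng = np.random.default_rng(0)
scale = 0.01

# Weight has shape (units, in_units) = (1, 2); values lie in [-scale, scale]
weight = rng.uniform(-scale, scale, size=(1, 2))
# Assumption: the bias keeps its default zero initialization
bias = np.zeros(1)

x = np.array([[1.0, 2.0], [3.0, 4.0]])   # batch of 2 examples, 2 inputs each
output = x @ weight.T + bias             # dense layer without activation
print(output.shape)
```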

### Optimization

We need an objective (loss) and an optimization algorithm.

In [None]:
square_loss = gluon.loss.L2Loss()
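
As a rough guide to what this loss returns, the sketch below reproduces the computation in NumPy, assuming default arguments: half the squared difference, averaged over all non-batch axes, giving one loss value per example. The function name `l2_loss` is hypothetical.

```python
import numpy as np

def l2_loss(pred, label):
    """Per-example half squared error: 0.5 * (pred - label) ** 2."""
    label = label.reshape(pred.shape)      # e.g. (batch,) -> (batch, 1)
    loss = 0.5 * (pred - label) ** 2
    # Average over every axis except the batch axis
    return loss.reshape(len(pred), -1).mean(axis=1)

pred = np.array([[1.0], [2.0], [4.0]])
label = np.array([1.0, 0.0, 1.0])
print(l2_loss(pred, label))  # [0.  2.  4.5]
```

Note the factor of one half, which makes the reported loss half of the plain squared error.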

In [None]:
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.0001})
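
When the trainer is later stepped with `trainer.step(batch_size)`, plain SGD (no momentum or weight decay, as configured here) amounts to the update below. This is a NumPy sketch under that assumption; `sgd_step` and the gradient values are made up for illustration. The gradient is a sum over the minibatch, so dividing by `batch_size` turns it into an average.

```python
import numpy as np

def sgd_step(param, grad_sum, learning_rate, batch_size):
    """Plain SGD update using a gradient summed over a minibatch."""
    return param - learning_rate * grad_sum / batch_size

w = np.array([0.5, -0.5])
grad_sum = np.array([4.0, -8.0])   # hypothetical gradient summed over 4 examples
w_new = sgd_step(w, grad_sum, learning_rate=0.0001, batch_size=4)
print(w_new)
```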

### Data

Let's generate some artificial data to train on.

In [None]:
num_inputs   = 2
num_outputs  = 1
num_examples = 10000
noise_sigma  = 0.01

In [None]:
X = nd.random.normal(shape=(num_examples, num_inputs))

def real_fn(X):
    # Ground-truth linear function used to generate the labels
    return 2 * X[:, 0] - 3.4 * X[:, 1] + 4.2

y = real_fn(X) + noise_sigma * nd.random.normal(shape=(num_examples,))

In [None]:
print(X)

In [None]:
print(y)

### Batching

In [None]:
batch_size = 4
train_data = gluon.data.DataLoader(
    gluon.data.ArrayDataset(X, y), batch_size=batch_size, shuffle=True)
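
Conceptually, a shuffling data loader does something like the following. This is a NumPy sketch, not the `DataLoader` implementation: `iterate_minibatches` and the demo arrays are hypothetical, and keeping a smaller final batch is an assumption that mirrors the loader's default behavior.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (data, label) minibatches; the last one may be smaller."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)
X_demo = np.arange(20.0).reshape(10, 2)
y_demo = np.arange(10.0)
batches = list(iterate_minibatches(X_demo, y_demo, batch_size=4, rng=rng))
print([data.shape for data, _ in batches])  # [(4, 2), (4, 2), (2, 2)]
```

Each example appears exactly once per pass, and data and labels stay paired because both are indexed with the same shuffled indices.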

### Training loop setup

In [None]:
epochs = 10
num_batches = num_examples // batch_size  # integer division: number of batches per epoch
print(num_batches)

In [None]:
def train_loop(epochs):
    for e in range(epochs):
        cumulative_loss = 0
        for i, (data, label) in enumerate(train_data):
            data = data.as_in_context(model_ctx)
            label = label.as_in_context(model_ctx)
            with autograd.record():
                output = net(data)
                loss = square_loss(output, label)
            loss.backward()
            trainer.step(batch_size)
            cumulative_loss += nd.mean(loss).asscalar()
        # cumulative_loss sums per-batch means, so average over the number of batches
        print("Epoch %s, loss: %.4f" % (e, cumulative_loss / num_batches))

In [None]:
train_loop(epochs)

### Getting the learned model parameters

In [None]:
params = net.collect_params()  # this returns a ParameterDict
for param in params.values():
    print(param.name, param.data())
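
The printed parameters can be compared against the values used to generate the data: weights `2` and `-3.4`, and bias `4.2`. As a reference point, the sketch below fits the same kind of synthetic data by ordinary least squares in NumPy. It regenerates its own data with a separate random generator, so it is an independent check on what an ideal fit looks like, not a readout of the Gluon model.

```python
import numpy as np

rng = np.random.default_rng(0)
X_ref = rng.normal(size=(10000, 2))
noise = 0.01 * rng.normal(size=10000)
y_ref = 2 * X_ref[:, 0] - 3.4 * X_ref[:, 1] + 4.2 + noise

# Append a column of ones so the bias is fitted as a third coefficient
A = np.hstack([X_ref, np.ones((len(X_ref), 1))])
coef, *_ = np.linalg.lstsq(A, y_ref, rcond=None)
print(coef)  # approximately [2, -3.4, 4.2]
```

How close SGD gets to these values depends on the learning rate and the number of epochs.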