# Modeling in Gluon

Key steps in training a deep network:

1. Define the network
1. Initialize parameters
1. Iterate over the data
   1. Forward pass (propagate the input to generate output)
   1. Compute the loss (compare output to true labels)
   1. Compute the gradient of the loss via backpropagation
   1. Update parameters by stochastic gradient descent
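The four steps above can be traced without any framework. The following is a minimal pure-Python sketch, not Gluon code, for a two-input linear model. The gradient of the squared loss is derived by hand in place of backpropagation. The toy dataset, learning rate, and epoch count are illustrative assumptions.

```python
import random

random.seed(0)

# Ground-truth linear function (same form as the data generated later in this notebook)
def real_fn(x0, x1):
    return 2 * x0 - 3.4 * x1 + 4.2

# Illustrative noiseless training set: 1000 examples with 2 standard-normal features
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(1000)]
labels = [real_fn(x0, x1) for x0, x1 in data]

# 1. Define the network: one linear unit, y_hat = w0*x0 + w1*x1 + b
def forward(w, b, x):
    return w[0] * x[0] + w[1] * x[1] + b

# 2. Initialize parameters: small uniform weights, zero bias
w = [random.uniform(-0.01, 0.01), random.uniform(-0.01, 0.01)]
b = 0.0

learning_rate, batch_size = 0.05, 4

# 3. Iterate over the data in mini-batches
for epoch in range(5):
    for start in range(0, len(data), batch_size):
        xs = data[start:start + batch_size]
        ys = labels[start:start + batch_size]
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in zip(xs, ys):
            err = forward(w, b, x) - y   # forward pass; loss = 0.5 * err**2
            grad_w[0] += err * x[0]      # d(loss)/d(w0), derived by hand
            grad_w[1] += err * x[1]      # d(loss)/d(w1)
            grad_b += err                # d(loss)/d(b)
        # 4. SGD update using the gradient averaged over the batch
        n = len(xs)
        w = [w[0] - learning_rate * grad_w[0] / n,
             w[1] - learning_rate * grad_w[1] / n]
        b -= learning_rate * grad_b / n

print(w, b)  # close to [2, -3.4] and 4.2
```

In Gluon, `autograd` replaces the hand-derived gradients and a `Trainer` performs step 4.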

Import the required modules and get the device context:

```python
import d2l
import mxnet as mx
from mxnet import nd, autograd, gluon

# Device contexts: use the first GPU if one is available, otherwise fall back to the CPU
data_ctx = d2l.try_gpu()
model_ctx = d2l.try_gpu()

print(data_ctx)
```

```
gpu(0)
```


### Blocks in Gluon

`gluon.Block` is the basic building block of models (e.g., a layer is a block).

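```python
from mxnet import gluon

class Net(gluon.Block):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        # __init__ allocates resources such as child blocks and parameters (more later)
        with self.name_scope():
            self.dense = gluon.nn.Dense(1)

    # One or more NDArrays can be passed to `forward`
    def forward(self, x):
        # Computation: do something with the data x to compute y
        y = self.dense(x)
        return y
```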

* Blocks hold parameters and the functions (computations) that use them
* Blocks can be composed into larger blocks (e.g., a `Dense` layer is itself a block that can be nested inside another block)
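To make the composition idea concrete, here is a toy pure-Python sketch of the pattern. `MiniBlock`, `MiniDense`, and `MiniSequential` are hypothetical classes written for illustration, not part of Gluon. They only mimic how a composite block can gather its children's parameters, in the spirit of `collect_params()`.

```python
# Hypothetical toy classes mimicking the Block pattern; NOT the Gluon API
class MiniBlock:
    def __init__(self, name):
        self.name = name
        self.params = {}     # this block's own parameters, keyed by name
        self.children = []   # nested sub-blocks

    def collect_params(self):
        # Gather this block's parameters plus those of every nested block
        collected = dict(self.params)
        for child in self.children:
            collected.update(child.collect_params())
        return collected

    def __call__(self, x):
        # Calling a block runs its forward computation
        return self.forward(x)


class MiniDense(MiniBlock):
    # One-output linear layer; weights start at 1.0 and bias at 0.5 (illustrative values)
    def __init__(self, name, in_units):
        super().__init__(name)
        self.params[name + "_weight"] = [1.0] * in_units
        self.params[name + "_bias"] = 0.5

    def forward(self, x):
        w = self.params[self.name + "_weight"]
        b = self.params[self.name + "_bias"]
        return [sum(wi * xi for wi, xi in zip(w, x)) + b]


class MiniSequential(MiniBlock):
    # A larger block built from smaller ones, applied in order
    def __init__(self, name, *blocks):
        super().__init__(name)
        self.children.extend(blocks)

    def forward(self, x):
        for block in self.children:
            x = block(x)
        return x


net = MiniSequential("seq", MiniDense("dense0", in_units=2), MiniDense("dense1", in_units=1))
print(sorted(net.collect_params()))  # ['dense0_bias', 'dense0_weight', 'dense1_bias', 'dense1_weight']
print(net([1.0, 2.0]))               # [4.0]
```

The outer block stores no parameters of its own, yet collecting from it returns all four, because it recurses into its children.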

### A simple `Block`

```python
net = gluon.nn.Dense(1, in_units=2)  # 1 output, 2 inputs, no activation function
print(net.weight)
print(net.bias)
```

```
Parameter dense0_weight (shape=(1, 2), dtype=float32)
Parameter dense0_bias (shape=(1,), dtype=float32)
```


The `Dense` block holds all of its parameters, and `collect_params()` gathers them automatically:

```python
net.collect_params()
```

```
dense0_ (
  Parameter dense0_weight (shape=(1, 2), dtype=float32)
  Parameter dense0_bias (shape=(1,), dtype=float32)
)
```

### Manipulating Parameters

`collect_params()` returns a `gluon.parameter.ParameterDict`, which maps parameter names to `Parameter` objects:

```python
type(net.collect_params())
```

```
mxnet.gluon.parameter.ParameterDict
```

Before using a network, we need to initialize its parameters. This requires:

* An initializer, many of which live in the `mx.init` module.
* A device context where the parameters live (CPU or GPU).
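As a rough sketch of what `Uniform(0.01)` does to a weight matrix, assume it samples each entry from [-scale, scale], as the MXNet documentation describes. The helper `uniform_init` below is hypothetical and stands in for the initializer using only the standard library.

```python
import random

def uniform_init(shape, scale=0.07, seed=None):
    # Hypothetical helper: fill a (rows, cols) matrix with draws from U(-scale, scale),
    # mirroring what mx.init.Uniform(scale) is documented to do
    rng = random.Random(seed)
    rows, cols = shape
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

# Same shape as the Dense(1, in_units=2) weight defined above
weight = uniform_init((1, 2), scale=0.01, seed=0)
print(weight)
```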

```python
# Weights are drawn from U(-0.01, 0.01); all parameters are allocated on model_ctx
net.collect_params().initialize(mx.init.Uniform(0.01), ctx=model_ctx)
```

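Now we can access the actual parameter values. The weights are small random numbers. The bias is zero because `Dense` sets its own bias initializer to zeros by default, and this takes precedence over the initializer passed to `initialize`:

```python
print(net.weight.data())
print(net.bias.data())
```

```
[[0.00097627 0.00185689]]
<NDArray 1x2 @gpu(0)>

[0.]
<NDArray 1 @gpu(0)>
```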


### Optimization

We need an objective (loss) and an optimization algorithm.

```python
square_loss = gluon.loss.L2Loss()  # per-example loss: 0.5 * (output - label)^2
```
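For a single-output model, `L2Loss` returns one value per example: half the squared difference between prediction and label. For wider outputs, Gluon also averages over the non-batch axes. The following plain-Python sketch shows that per-example computation, with `l2_loss` as a hypothetical helper.

```python
def l2_loss(pred, label):
    # Hypothetical helper: per-example 0.5 * (pred - label)**2 for single-output predictions
    return [0.5 * (p - l) ** 2 for p, l in zip(pred, label)]

pred = [1.0, 2.0, 3.0, 4.0]
label = [1.5, 2.0, 2.0, 6.0]
print(l2_loss(pred, label))  # [0.125, 0.0, 0.5, 2.0]
```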

```python
# Plain SGD over all of net's parameters
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.0001})
```
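During training, `trainer.step(batch_size)` is called after `loss.backward()`. The loss is a vector of per-example values, so the gradients are summed over the batch. `step` then rescales them by `1/batch_size` before applying the SGD update. The hypothetical helper below sketches that update in plain Python. It ignores weight decay and momentum, which this `Trainer` does not use.

```python
def sgd_step(params, summed_grads, learning_rate, batch_size):
    # Hypothetical helper: gradients arrive summed over the batch, so dividing by
    # batch_size turns them into a batch mean before the plain SGD update
    return [p - learning_rate * g / batch_size for p, g in zip(params, summed_grads)]

print(sgd_step([1.0, -1.0], [2.0, -4.0], learning_rate=0.5, batch_size=4))  # [0.75, -0.5]
```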

### Data

Let's generate some artificial data to train on.

```python
num_inputs   = 2
num_outputs  = 1
num_examples = 10000
noise_sigma  = 0.01
```

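```python
# Inputs: num_examples samples, each with num_inputs standard-normal features
X = nd.random_normal(shape=(num_examples, num_inputs))

# Ground-truth linear function the model should recover
def real_fn(X):
    return 2 * X[:, 0] - 3.4 * X[:, 1] + 4.2

# Targets: true function plus Gaussian noise with standard deviation noise_sigma
y = real_fn(X) + noise_sigma * nd.random_normal(shape=(num_examples, ))
```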

```python
print(X)
```

```
[[ 0.7740038   1.0434405 ]
 [ 1.1839255   1.8917114 ]
 [-1.2347414  -1.771029  ]
 ...
 [ 0.08873925 -0.45150325]
 [-0.13049959  0.15614532]
 [-0.22753173 -0.19928493]]
<NDArray 10000x2 @cpu(0)>
```


```python
print(y)
```

```
[2.2110515  0.13669638 7.760503   ... 5.9032497  3.4229667  4.4221096 ]
<NDArray 10000 @cpu(0)>
```
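As a quick sanity check on the generated data, you can recompute the noiseless targets by hand for the first three and last three rows of `X` printed above. The differences from the printed `y` values should be within a few multiples of `noise_sigma = 0.01`. The following plain-Python sketch uses only those printed numbers.

```python
# Rows of X and the matching y values, copied from the printed output above
rows = [
    ((0.7740038, 1.0434405), 2.2110515),
    ((1.1839255, 1.8917114), 0.13669638),
    ((-1.2347414, -1.771029), 7.760503),
    ((0.08873925, -0.45150325), 5.9032497),
    ((-0.13049959, 0.15614532), 3.4229667),
    ((-0.22753173, -0.19928493), 4.4221096),
]

def real_fn(x0, x1):
    # Same ground-truth function as in the notebook, for a single example
    return 2 * x0 - 3.4 * x1 + 4.2

# Residual = observed y minus noiseless target; should look like small Gaussian noise
residuals = [y - real_fn(x0, x1) for (x0, x1), y in rows]
print([round(r, 4) for r in residuals])
```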


### Batching

```python
batch_size = 4
# Reshuffle the examples every epoch and serve them in mini-batches of batch_size
train_data = gluon.data.DataLoader(
    gluon.data.ArrayDataset(X, y), batch_size=batch_size, shuffle=True)
```
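Conceptually, a shuffled `DataLoader` over an `ArrayDataset` permutes the example indices and slices them into consecutive batches. The generator below is a hypothetical stand-in for that behavior. The real `DataLoader` also stacks the selected rows into `NDArray` batches. With 10,000 examples and a batch size of 4, the generator yields 2,500 batches.

```python
import random

def iterate_minibatches(num_examples, batch_size, shuffle=True, seed=None):
    # Hypothetical stand-in for a shuffled DataLoader: yields lists of example indices
    indices = list(range(num_examples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, num_examples, batch_size):
        yield indices[start:start + batch_size]

batches = list(iterate_minibatches(10000, 4, seed=0))
print(len(batches), batches[0])
```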

### Training loop setup

```python
epochs = 10
num_batches = num_examples // batch_size  # integer division: batches per epoch
print(num_batches)
```

```
2500
```


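```python
def train_loop(epochs):
    for e in range(epochs):
        cumulative_loss = 0
        for i, (data, label) in enumerate(train_data):
            # Move the batch to the device that holds the model parameters
            data = data.as_in_context(model_ctx)
            label = label.as_in_context(model_ctx)
            # Record the forward pass so autograd can compute gradients
            with autograd.record():
                output = net(data)
                loss = square_loss(output, label)  # one loss value per example
            loss.backward()           # gradients are summed over the batch
            trainer.step(batch_size)  # rescale by 1/batch_size, then apply the SGD update
            # Accumulate the mean loss of this batch
            cumulative_loss += nd.mean(loss).asscalar()
        # Dividing by num_examples (not num_batches) reports the mean loss / batch_size
        print("Epoch %s, loss: %.4f" % (e, cumulative_loss / num_examples))
```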
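`train_loop` adds up the *mean* loss of each batch and then divides by `num_examples`. The value it prints is therefore the average per-example loss divided by `batch_size` (here, one quarter of it). Dividing by `num_batches` instead gives the average itself. The numbers below are made-up illustrative losses, used only to check that relationship in plain Python.

```python
batch_size = 4
# Made-up per-example losses for three batches, for illustration only
batch_losses = [[1.0, 2.0, 3.0, 2.0], [0.5, 0.5, 1.5, 1.5], [4.0, 0.0, 2.0, 2.0]]
num_batches = len(batch_losses)
num_examples = num_batches * batch_size

# What train_loop accumulates: the mean loss of each batch
cumulative_loss = sum(sum(b) / len(b) for b in batch_losses)
printed = cumulative_loss / num_examples  # the quantity train_loop reports
per_example_avg = sum(sum(b) for b in batch_losses) / num_examples

print(printed, per_example_avg, cumulative_loss / num_batches)
```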

```python
train_loop(epochs)
```

```
Epoch 0, loss: 3.2707
Epoch 1, loss: 1.9822
Epoch 2, loss: 1.2014
Epoch 3, loss: 0.7281
Epoch 4, loss: 0.4413
Epoch 5, loss: 0.2675
Epoch 6, loss: 0.1621
Epoch 7, loss: 0.0983
Epoch 8, loss: 0.0596
Epoch 9, loss: 0.0361
```
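The logged losses shrink by an almost constant factor of about 0.606 per epoch. One back-of-the-envelope explanation follows; it is our own rough argument, not a claim made in the original. With zero-mean, unit-variance inputs, the expected SGD update shrinks each parameter's error by about `1 - learning_rate` per step. The squared-error loss therefore shrinks by about `(1 - 1e-4) ** (2 * 2500)`, roughly `exp(-0.5) ≈ 0.607`, per epoch of 2,500 steps. The sketch below compares that prediction with the logged values.

```python
# Per-epoch losses copied from the training log above
losses = [3.2707, 1.9822, 1.2014, 0.7281, 0.4413, 0.2675, 0.1621, 0.0983, 0.0596, 0.0361]
ratios = [later / earlier for earlier, later in zip(losses, losses[1:])]

learning_rate = 1e-4
steps_per_epoch = 2500
# Rough prediction: squared error shrinks by (1 - lr)**2 per step, assuming
# zero-mean unit-variance inputs (so the expected Hessian of the loss is the identity)
predicted_ratio = (1 - learning_rate) ** (2 * steps_per_epoch)

print([round(r, 3) for r in ratios])
print(round(predicted_ratio, 3))
```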


### Getting the learned model parameters

```python
params = net.collect_params()  # this returns a ParameterDict
for param in params.values():
    print(param.name, param.data())
```

```
dense0_weight 
[[ 1.845069  -3.1170497]]
<NDArray 1x2 @gpu(0)>
dense0_bias 
[3.85613]
<NDArray 1 @gpu(0)>
```
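The learned values are close to the true parameters used to generate the data (`w = [2, -3.4]`, `b = 4.2`), but they have not reached them yet. Here is a rough argument; it is our own approximation, not a statement from the original. With zero-mean, unit-variance inputs, SGD shrinks each parameter's error by about `1 - learning_rate` per step. Starting from near-zero parameters, 10 epochs of 2,500 steps should leave about `(1 - 1e-4) ** 25000 ≈ 8%` of each true value still unlearned. The following plain-Python check uses the printed parameters.

```python
# True parameters of real_fn and the learned values printed above
true_w, true_b = [2.0, -3.4], 4.2
learned_w, learned_b = [1.845069, -3.1170497], 3.85613

# Fraction of each true value that has not been learned yet
remaining = [(t - l) / t for t, l in zip(true_w + [true_b], learned_w + [learned_b])]

# Rough prediction: starting from (near) zero, each error shrinks by about (1 - lr)
# per step over 10 epochs x 2500 steps, assuming zero-mean unit-variance inputs
predicted = (1 - 1e-4) ** (10 * 2500)

print([round(r, 3) for r in remaining], round(predicted, 3))
```

Training for more epochs, or with a larger learning rate, should move the parameters closer to the true values.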
