In [1]:
from mxnet import nd, autograd, optimizer, gluon

# The optimizer library contains various MXNet optimizers, which implement popular and state-of-the-art deep learning optimization algorithms.

In [2]:
# Create and initialize a simple model
net = gluon.nn.Dense(1)
net.initialize()

In [4]:
# Before using the trainer to update model parameters, forward and backward passes must be run first

# The dataset has eight samples with four features
batch_size = 8
X = nd.random.uniform(shape=(batch_size, 4))
y = nd.random.uniform(shape=(batch_size,))

loss = gluon.loss.L2Loss()

def forward_backward():
    with autograd.record():
        l = loss(net(X), y)
    l.backward()

forward_backward()

Now, let's create a trainer instance using the model parameters and a simple optimizer, stochastic gradient descent (SGD), with a learning rate of 1.

When creating a Gluon trainer, you must provide a collection of parameters that need to be learned.
- To get all of the trainable parameters from a Gluon network or block, invoke its `collect_params` method, which returns a parameter dictionary. The trainer uses this dictionary to access the parameters that need to be updated.
- You also provide an `optimizer` that will be used to calculate new parameter values at every training iteration.
- You can also specify the hyperparameters for the optimizer using `optimizer_params`.

In [6]:
# Create a trainer from the model parameters, using SGD with a learning rate of 1

trainer = gluon.Trainer(net.collect_params(), optimizer='sgd', optimizer_params={'learning_rate':1})

Before updating, let's check the current network parameters. We can do that by accessing the data in the `weight` field of the block.

In [7]:
curr_weight = net.weight.data().copy()
print(curr_weight)


[[ 0.06700657 -0.00369488  0.0418822   0.0421275 ]]
<NDArray 1x4 @cpu(0)>


### Trainer Step

Now, we will call the `step` method to perform one update.

We pass the batch size as an argument so that the gradient is normalized by it, which makes the size of the update independent of the batch size. Otherwise, larger batches would produce larger gradients.

In [8]:
trainer.step(batch_size)
print(net.weight.data())


[[0.26870278 0.2342061  0.444427   0.33788317]]
<NDArray 1x4 @cpu(0)>


We can see the network parameters have now changed.
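
The effect of normalizing by the batch size can be illustrated without MXNet. The following is a minimal sketch in plain Python, assuming a one-weight linear model with a squared loss whose per-sample gradients are summed. The function `summed_gradient` and the toy data are hypothetical and are not part of the notebook.

```python
def summed_gradient(w, xs, ys):
    """Gradient of sum_i 0.5 * (w * x_i - y_i)**2 with respect to w."""
    return sum((w * x - y) * x for x, y in zip(xs, ys))

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Summed gradient for the original batch and for the same batch repeated twice
small_grad = summed_gradient(w, xs, ys)
large_grad = summed_gradient(w, xs * 2, ys * 2)

# Dividing by the batch size removes the dependence on the batch size
small_norm = small_grad / len(xs)
large_norm = large_grad / (2 * len(xs))

print(small_grad, large_grad)
print(small_norm, large_norm)
```

The summed gradient doubles when the batch is doubled, while the normalized gradient stays the same.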

### Results of SGD update

Since we use plain SGD, the update rule is the old weight minus the learning rate times the gradient, where the gradient is normalized by the batch size.
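
Written as a formula, and assuming plain SGD without momentum or weight decay, this update can be expressed as follows. Here $\eta$ is the learning rate (1 in this notebook), $B$ is the batch size passed to `step` (8 here), and $\ell_i$ is the loss of sample $i$. The symbols are introduced for illustration only, and the formula assumes that the gradient stored after `backward` is the sum of the per-sample gradients.

$$
w_{t+1} = w_t - \frac{\eta}{B} \sum_{i=1}^{B} \nabla_w \ell_i(w_t)
$$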

In [9]:
# Verify the trainer step by performing the SGD update explicitly (learning rate 1, gradient divided by the batch size)
print(curr_weight - net.weight.grad() * 1 / batch_size)


[[0.26870278 0.2342061  0.444427   0.33788317]]
<NDArray 1x4 @cpu(0)>


The values are identical to the result of `trainer.step` earlier.

### Using Optimizer Instance

We can also pass an optimizer instance directly into the trainer constructor.

In this case we'll use the Adam optimizer, a popular adaptive optimizer for deep learning. In MXNet you can simply call `optimizer.Adam` and pass in the learning rate. We can initialize the Gluon trainer as before with the network parameters, but now we'll pass the optimizer object directly as an argument.
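
To give a feel for what an adaptive optimizer does, here is a minimal sketch of one Adam update for a single scalar weight, written in plain Python. It follows the textbook form of the algorithm with its commonly used default coefficients; MXNet's own implementation may differ in details such as where epsilon is applied. The function `adam_step` is hypothetical and is not part of MXNet.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update for a scalar weight (t is the step count, starting at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of the gradient
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of the squared gradient
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# First step from the same weight with gradients of very different scale
for grad in (0.002, 0.5, 300.0):
    w_new, _, _ = adam_step(w=0.25, grad=grad, m=0.0, v=0.0, t=1, lr=1.0)
    print(grad, w_new)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the weight moves by roughly the learning rate regardless of the gradient's scale.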

In [10]:
optim = optimizer.Adam(learning_rate=1)
trainer = gluon.Trainer(net.collect_params(), optim)

In [11]:
forward_backward()
trainer.step(batch_size)
net.weight.data()


[[-0.73130214 -0.7657989  -0.5555788  -0.6621224 ]]
<NDArray 1x4 @cpu(0)>

### Changing the Learning Rate

Sometimes we may need to change the learning rate during training.

In [12]:
# Accessing the current Learning Rate
trainer.learning_rate

1

In [13]:
# Changing the Learning Rate
trainer.set_learning_rate(0.1)
trainer.learning_rate

0.1
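
One common reason to change the learning rate is to decay it as training progresses. The sketch below computes a step-decay schedule in plain Python. The helper `step_decay` and its `factor` and `every` parameters are hypothetical; in this notebook the resulting value would be applied with `trainer.set_learning_rate`.

```python
def step_decay(base_lr, epoch, factor=0.1, every=10):
    """Return base_lr multiplied by `factor` once for each `every` completed epochs."""
    return base_lr * factor ** (epoch // every)

for epoch in (0, 9, 10, 25):
    lr = step_decay(base_lr=0.1, epoch=epoch)
    # In the notebook, this value would be applied with trainer.set_learning_rate(lr)
    print(epoch, lr)
```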

In [14]:
trainer = gluon.Trainer(net.collect_params(),
                        'adam',
                        optimizer_params={'learning_rate':0.001})

In [17]:
with autograd.record():
    # Use a separate name so the `loss` function defined earlier is not overwritten
    l = loss(net(X), y)

In [18]:
autograd.backward(l)

In [20]:
trainer.step(batch_size)