# [Getting Started with the Gluon Interface](https://github.com/gluon-api/gluon-api#getting-started-with-the-gluon-interface)

In this example from the GitHub repository, we will build and train a simple multilayer perceptron (MLP), an artificial neural network (ANN) with two hidden layers.  
First, we need to import `mxnet` and MXNet's implementation of the `gluon` specification.  
We will also need `autograd`, `ndarray`, and `numpy`.  

```python
import mxnet as mx
from mxnet import gluon, autograd, ndarray
import numpy as np
```

Next, we use `gluon.data.DataLoader`, Gluon's data iterator, to hold the training and test data.  
Iterators let us traverse large datasets one mini-batch at a time.  
We pass `DataLoader` a dataset helper, `gluon.data.vision.MNIST`, which loads the MNIST handwritten-digit dataset. The `train` parameter selects the training set (`True`) or the test set (`False`), and the `transform` function converts each image's pixel values to `float32` and scales them from the range 0 to 255 down to 0 to 1. Each `DataLoader` then yields batches of 32 examples, shuffling only the training data.

```python
train_data = mx.gluon.data.DataLoader(
    mx.gluon.data.vision.MNIST(train=True, transform=lambda data, label:
                               (data.astype(np.float32) / 255, label)),
    batch_size=32, shuffle=True)

test_data = mx.gluon.data.DataLoader(
    mx.gluon.data.vision.MNIST(train=False, transform=lambda data, label:
                               (data.astype(np.float32) / 255, label)),
    batch_size=32, shuffle=False)
```
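To make the `transform` and batching concrete, here is a NumPy-only sketch of what this pipeline produces. It is not Gluon's implementation: `scale_pixels` and `iterate_batches` are hypothetical helpers, and random arrays stand in for MNIST images, which we assume arrive as 28x28x1 arrays of `uint8` pixels.

```python
import numpy as np

def scale_pixels(data, label):
    # Same arithmetic as the tutorial's transform: uint8 values 0..255 -> float32 values 0.0..1.0.
    return data.astype(np.float32) / 255, label

def iterate_batches(images, labels, batch_size=32, shuffle=True, seed=0):
    """Hypothetical stand-in for DataLoader: yields (data, label) mini-batches."""
    indices = np.arange(len(images))
    if shuffle:
        np.random.default_rng(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        pairs = [scale_pixels(images[i], labels[i]) for i in indices[start:start + batch_size]]
        data = np.stack([d for d, _ in pairs])
        label = np.array([lab for _, lab in pairs])
        yield data, label

# Random stand-in for MNIST: 100 images of shape 28x28x1 with digit labels 0-9.
rng = np.random.default_rng(42)
images = rng.integers(0, 256, size=(100, 28, 28, 1)).astype(np.uint8)
labels = rng.integers(0, 10, size=100)

batches = list(iterate_batches(images, labels))
print([d.shape[0] for d, _ in batches])  # [32, 32, 32, 4]
```

With 100 fake images and a batch size of 32, this sketch keeps the final, smaller batch of 4 images rather than dropping it.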

Now we are ready to define the neural network itself, which takes five lines of code.  
First, we initialize the network with `net = gluon.nn.Sequential()`, a container that chains layers so that each layer's output becomes the next layer's input.  
Then we add three layers to `net` using `gluon.nn.Dense`.  
The first hidden layer has 128 nodes and the second has 64 nodes.  
Both use the ReLU activation, selected by passing `"relu"` to the `activation` parameter.  
The final layer, `gluon.nn.Dense(10)`, is the output layer, with one node for each possible output.  
For MNIST there are 10 possible outputs, because each image shows one of the 10 digits 0 to 9. We never state the 784-pixel input size; Gluon infers each layer's input size from the first batch of data it sees.

```python
# Initialize the model:
net = gluon.nn.Sequential()
# Define the model architecture:
with net.name_scope():
    # The first layer has 128 nodes:
    net.add(gluon.nn.Dense(128, activation="relu"))
    # The second layer has 64 nodes:
    net.add(gluon.nn.Dense(64, activation="relu"))
    # The output layer has 10 nodes, one per digit:
    net.add(gluon.nn.Dense(10))
```
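The network maps a flattened 784-pixel image to 10 output scores. As a rough NumPy-only sketch of that forward pass (not Gluon's code), the hypothetical `dense` and `forward` helpers below chain three affine layers with ReLU on the hidden layers. Weights are drawn from a normal distribution with standard deviation 0.05, like the initializer used in the next step, and biases start at zero, which we believe matches `Dense`'s default. One detail differs from Gluon: this sketch stores each weight matrix as (inputs, outputs), while Gluon's `Dense` stores it transposed.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(in_units, units, sigma=0.05):
    """Hypothetical helper: weights ~ N(0, sigma^2) stored as (in_units, units), zero biases."""
    weight = rng.normal(0.0, sigma, size=(in_units, units)).astype(np.float32)
    bias = np.zeros(units, dtype=np.float32)
    return weight, bias

def relu(x):
    return np.maximum(x, 0)

# 784 inputs -> 128 -> 64 -> 10 outputs, mirroring the three Dense layers.
layers = [dense(784, 128), dense(128, 64), dense(64, 10)]

def forward(images):
    x = images.reshape(images.shape[0], -1)   # (batch, 28, 28, 1) -> (batch, 784)
    for i, (weight, bias) in enumerate(layers):
        x = x @ weight + bias
        if i < len(layers) - 1:               # ReLU on the hidden layers only
            x = relu(x)
    return x                                  # raw scores; the softmax is applied by the loss

batch = rng.random((32, 28, 28, 1), dtype=np.float32)  # stand-in for one batch of scaled images
logits = forward(batch)
n_params = sum(weight.size + bias.size for weight, bias in layers)
print(logits.shape, n_params)  # (32, 10) 109386
```

Counting weights and biases gives 784*128 + 128 + 128*64 + 64 + 64*10 + 10 = 109,386 trainable parameters for this architecture.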

Before training the model, we need to initialize its parameters, set up the loss function with `gluon.loss.SoftmaxCrossEntropyLoss()`, and set up the optimizer with `gluon.Trainer`.  
As with creating the model, each of these steps takes a single line of code.

```python
# Begin with pseudorandom values for all of the model's parameters from a normal distribution
# with a standard deviation of 0.05:
net.collect_params().initialize(mx.init.Normal(sigma=0.05))

# Use the softmax cross entropy loss function to measure how well the model is able to predict
# the correct answer:
softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()

# Use stochastic gradient descent to train the model and set the learning rate hyperparameter to 0.1:
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
```
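`SoftmaxCrossEntropyLoss` turns the 10 output scores into probabilities with a softmax and returns, for each sample, the negative log-probability assigned to the correct digit. The NumPy sketch below reproduces that formula for integer labels; `softmax_cross_entropy` here is our own helper, not MXNet's code, and it uses the usual max-subtraction trick for numerical stability.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-sample loss -log(softmax(logits)[label]) for integer class labels."""
    # Subtracting each row's max before exponentiating avoids overflow without changing the result.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

# Equal scores for all 10 digits give each class probability 0.1,
# so every sample's loss is log(10), about 2.3026.
uniform = np.zeros((4, 10))
print(softmax_cross_entropy(uniform, np.array([0, 3, 7, 9])))

# A confident, correct prediction gives a loss close to zero.
confident = np.full((1, 10), -10.0)
confident[0, 5] = 10.0
print(softmax_cross_entropy(confident, np.array([5])))
```

Because the loss is one value per sample, the training loop averages it with `ndarray.mean` when reporting progress.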

Running the training loop is fairly standard, and Gluon keeps each step short.  
For each batch there are four steps:  
1) pass a batch of data through the network (the forward pass);  
2) calculate the difference between the network's output and the true labels (i.e., the loss);  
3) use Gluon's `autograd` to compute the gradients of the loss with respect to the model's parameters;  
4) call the `Trainer`'s `step` method to update the parameters in the direction that decreases the loss.  
We set the number of epochs to 10, meaning that we cycle through the entire training dataset 10 times.  
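To show what these four steps do numerically, here is a NumPy-only sketch that trains a much smaller model, a single linear layer with a softmax (softmax regression), on synthetic 2-D data with three classes. The gradients that `autograd` would compute are written out by hand, and the update divides the summed gradient by the batch size, mirroring how `trainer.step(batch_size)` normalizes gradients. The data, model size, and variable names are illustrative assumptions, not part of the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for MNIST (an assumption): three well-separated 2-D clusters.
centers = np.array([[0.0, 4.0], [4.0, 0.0], [-4.0, -4.0]])
labels = rng.integers(0, 3, size=300)
inputs = centers[labels] + rng.normal(0.0, 1.0, size=(300, 2))

# Softmax regression: one linear layer with 3 outputs, initialized with sigma = 0.05.
w = rng.normal(0.0, 0.05, size=(2, 3))
b = np.zeros(3)
learning_rate, batch_size = 0.1, 32

def softmax(logits):
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def mean_loss(x, y):
    """Average softmax cross-entropy over a dataset."""
    probs = softmax(x @ w + b)
    return float(-np.log(probs[np.arange(len(y)), y]).mean())

initial_loss = mean_loss(inputs, labels)
for epoch in range(10):
    order = rng.permutation(len(inputs))                  # shuffle each epoch
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        x, y = inputs[idx], labels[idx]                   # 1) take a batch of data
        probs = softmax(x @ w + b)                        # 2) forward pass; loss = -log(probs[y])
        grad_logits = probs.copy()
        grad_logits[np.arange(len(y)), y] -= 1.0          # 3) d(loss)/d(logits) = probs - one_hot(y)
        grad_w = x.T @ grad_logits                        #    chain rule, summed over the batch
        grad_b = grad_logits.sum(axis=0)
        w -= learning_rate * grad_w / len(y)              # 4) SGD update, normalized by batch size
        b -= learning_rate * grad_b / len(y)
final_loss = mean_loss(inputs, labels)
print(f"mean loss before training: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The starting loss is near log(3), about 1.1, because small random weights give nearly equal class probabilities; since the clusters are easy to separate, the loss should end well below that.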

```python
epochs = 10
for e in range(epochs):
    for i, (data, label) in enumerate(train_data):
        # Flatten each 28x28 image into a 784-element vector and keep it on the CPU:
        data = data.as_in_context(mx.cpu()).reshape((-1, 784))
        label = label.as_in_context(mx.cpu())
        # Record the forward pass so autograd can compute derivatives:
        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label)
        # Backpropagate to compute the gradients of the loss:
        loss.backward()
        # Update the parameters; gradients are normalized by the batch size:
        trainer.step(data.shape[0])
        # Mean loss of the most recent batch:
        curr_loss = ndarray.mean(loss).asscalar()
    # Report the loss of the last batch in this epoch:
    print("Epoch {}. Current Loss: {}.".format(e, curr_loss))
```

```text
Epoch 0. Current Loss: 0.1128312274813652.
Epoch 1. Current Loss: 0.14288707077503204.
Epoch 2. Current Loss: 0.020673630759119987.
Epoch 3. Current Loss: 0.12083777785301208.
Epoch 4. Current Loss: 0.013107175938785076.
Epoch 5. Current Loss: 0.04734328016638756.
Epoch 6. Current Loss: 0.11852305382490158.
Epoch 7. Current Loss: 0.007883351296186447.
Epoch 8. Current Loss: 0.04016139730811119.
Epoch 9. Current Loss: 0.002568621188402176.
```


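We now have a trained neural network model. Note that the loop prints the mean loss of only the last batch in each epoch, so the values fluctuate from epoch to epoch. They also do not measure accuracy, which requires comparing the model's predictions against the labels, for example on `test_data`.  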
A [Jupyter notebook of this code](https://github.com/gluon-api/gluon-api/blob/master/tutorials/mnist-gluon-example.ipynb) has been provided for your convenience.
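Accuracy is the fraction of examples whose highest-scoring output matches the label. A minimal NumPy sketch, using a hypothetical `accuracy` helper and made-up scores for three classes:

```python
import numpy as np

def accuracy(logits, labels):
    # The predicted class is the index of the largest output score.
    predictions = logits.argmax(axis=1)
    return float((predictions == labels).mean())

logits = np.array([[0.1, 2.0, -1.0],
                   [3.0, 0.5, 0.2],
                   [0.0, 0.1, 0.3],
                   [1.0, 0.9, 4.0]])
labels = np.array([1, 0, 1, 2])
print(accuracy(logits, labels))  # 0.75
```

To apply this to the trained model, one would run `net` over each batch of `test_data` (reshaped to 784 columns, as in the training loop), convert outputs and labels with `.asnumpy()`, and average the per-batch results weighted by batch size. MXNet 1.x also provides `mx.metric.Accuracy` for the same bookkeeping.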

To learn more about the Gluon interface and deep learning, you can refer to this [comprehensive set of tutorials](http://gluon.mxnet.io/), which covers everything from an introduction to deep learning to implementing cutting-edge neural network models.