Let's create a simple neural network that will classify the Iris flower dataset. We'll start by loading and preparing the data:

In [None]:
import pandas as pd

dataset = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'])

dataset['species'] = pd.Categorical(dataset['species']).codes

dataset = dataset.sample(frac=1, random_state=1234)

# split the data set into train and test subsets
train_input = dataset.values[:120, :4]
train_target = dataset.values[:120, 4]

test_input = dataset.values[120:, :4]
test_target = dataset.values[120:, 4]

The preceding code is boilerplate that downloads the Iris dataset CSV file and loads it into a pandas DataFrame. We then shuffle the DataFrame rows and split the data into NumPy arrays: train_input/train_target (flower properties/flower class) for the training data and test_input/test_target for the test data. 
We'll use 120 samples for training and 30 for testing. If you are not familiar with pandas, think of it as a library for labeled, tabular data that is built on top of NumPy.
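
To make the label encoding and the split more concrete, here is a minimal sketch on a toy DataFrame. The rows and the 4/1 split are made up for illustration; only the pandas calls mirror the ones used above.

```python
import pandas as pd

# Toy stand-in for the real dataset (hypothetical rows, not the downloaded file).
toy = pd.DataFrame({
    'petal_length': [1.4, 4.7, 6.0, 1.3, 4.5],
    'species': ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica',
                'Iris-setosa', 'Iris-versicolor'],
})

# pd.Categorical sorts the distinct labels, so the integer codes
# follow the alphabetical order of the species names.
categorical = pd.Categorical(toy['species'])
label_names = list(categorical.categories)
toy['species'] = categorical.codes
codes = toy['species'].tolist()

# Shuffle the rows reproducibly, then slice into train/test parts.
shuffled = toy.sample(frac=1, random_state=1234)
train_part = shuffled.values[:4, :1]   # first 4 rows, feature column only
test_part = shuffled.values[4:, :1]    # remaining row

print(label_names, codes, train_part.shape, test_part.shape)
```

Because `sample(frac=1)` returns every row in a new order, slicing afterwards gives disjoint train and test subsets.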

Let's define our first neural network. We'll use a feedforward network with one hidden layer of five units, a ReLU activation function (this is just another type of activation, defined simply as *f(x) = max(0, x)*), and an output layer with three units, where each unit corresponds to one of the three classes of Iris flower. The following is the PyTorch definition of the network:

In [None]:
import torch

torch.manual_seed(1234)

hidden_units = 5

net = torch.nn.Sequential(
    torch.nn.Linear(4, hidden_units), # 4 input features feeding a hidden layer of 5 units
    torch.nn.ReLU(), # ReLU activation
    torch.nn.Linear(hidden_units, 3) # 3 output units for each of the 3 possible classes
)

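Conceptually, we use one-hot encoding for the target data. This means that each class of flower is represented as an array (Iris Setosa = [1, 0, 0], Iris Versicolour = [0, 1, 0], and Iris Virginica = [0, 0, 1]), and each element of the array is the target for one unit of the output layer. In the code, the targets are stored as the class indices 0, 1, and 2, which is the form expected by the cross-entropy loss we'll use; each index gives the position of the 1 in the corresponding one-hot array. When the network classifies a new sample, we determine the class by taking the output unit with the highest activation value. 
`torch.manual_seed(1234)` seeds PyTorch's random number generator, so the network weights are initialized to the same values on every run, which makes the results reproducible. 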
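
As a rough illustration of the encoding and of picking the winning output unit, here is a small pure-Python sketch. The output values and the helper names `one_hot` and `predict_class` are hypothetical, chosen only for this example.

```python
def one_hot(class_index, num_classes=3):
    """Return a list with a 1 at class_index and 0 elsewhere."""
    return [1 if i == class_index else 0 for i in range(num_classes)]

def predict_class(unit_outputs):
    """Return the index of the output unit with the highest value."""
    return max(range(len(unit_outputs)), key=lambda i: unit_outputs[i])

# Hypothetical raw outputs of the three output units for two samples.
outputs = [
    [2.1, -0.3, 0.4],
    [-1.0, 0.2, 3.5],
]

predicted = [predict_class(row) for row in outputs]
print(one_hot(1), predicted)
```

The predicted index can be compared directly with the integer class code of the sample, without building the one-hot array explicitly.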

Next, we'll choose the loss function:

In [None]:
criterion = torch.nn.CrossEntropyLoss()

The `criterion` variable holds the loss function that we'll use, in this case cross-entropy loss. The loss function measures how much the network output differs from the target data.
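
To give a feel for what this loss measures, the following sketch computes cross-entropy for a single sample by hand: a softmax over the three raw outputs, followed by the negative log of the probability assigned to the correct class. The function name `cross_entropy` and the input values are assumptions made for illustration; `torch.nn.CrossEntropyLoss` performs this kind of computation on integer class targets and, by default, averages it over the batch.

```python
import math

def cross_entropy(logits, target_index):
    """Negative log of the softmax probability of the target class."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

# Hypothetical raw network outputs for one sample.
confident_right = cross_entropy([5.0, 0.0, 0.0], 0)  # target is the largest output
confident_wrong = cross_entropy([5.0, 0.0, 0.0], 2)  # target is a small output
uniform = cross_entropy([0.0, 0.0, 0.0], 1)          # no preference: log(3)

print(confident_right, uniform, confident_wrong)
```

The loss is small when the correct unit dominates, equals log(3) when all three units are tied, and grows when the network is confident in the wrong class.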

We then define the stochastic gradient descent (SGD) optimizer (a variation of the gradient descent algorithm) with a learning rate of 0.1 and a momentum of 0.9:

In [None]:
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)
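
The update rule behind SGD with momentum can be sketched in a few lines of plain Python. This is a simplified, single-parameter illustration that assumes the formulation used by PyTorch (the velocity accumulates gradients, and the learning rate scales the velocity); the function `sgd_momentum` and the quadratic objective are hypothetical.

```python
def sgd_momentum(grad_fn, w, lr=0.1, momentum=0.9, steps=300):
    """Minimize a one-parameter function given its gradient grad_fn."""
    velocity = 0.0
    for _ in range(steps):
        grad = grad_fn(w)
        # The velocity keeps a decaying sum of past gradients.
        velocity = momentum * velocity + grad
        # The parameter moves against the accumulated direction.
        w = w - lr * velocity
    return w

# Gradient of f(w) = (w - 3)^2, which has its minimum at w = 3.
grad = lambda w: 2.0 * (w - 3.0)

w_after_one_step = sgd_momentum(grad, w=0.0, steps=1)
w_final = sgd_momentum(grad, w=0.0)
print(w_after_one_step, w_final)
```

With a momentum of 0, the loop reduces to plain gradient descent; a value such as 0.9 lets earlier gradients keep influencing the current step.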

Now, let's train the network: 

In [None]:
epochs = 50

for epoch in range(epochs):
    # convert the NumPy arrays to tensors (float features, integer class labels)
    inputs = torch.tensor(train_input, dtype=torch.float32)
    targets = torch.tensor(train_target, dtype=torch.long)

    optimizer.zero_grad()
    out = net(inputs)
    loss = criterion(out, targets)
    loss.backward()
    optimizer.step()

    if epoch == 0 or (epoch + 1) % 10 == 0:
        print('Epoch %d Loss: %.4f' % (epoch + 1, loss.item()))

Epoch 1 Loss: 1.2181
Epoch 10 Loss: 0.6745
Epoch 20 Loss: 0.2447
Epoch 30 Loss: 0.1397
Epoch 40 Loss: 0.1001
Epoch 50 Loss: 0.0855


We run the training for 50 epochs, which means that we iterate 50 times over the training dataset. In each epoch, we do the following: 


1.   Create the torch tensors `inputs` and `targets` from the NumPy arrays train_input and train_target. 
2.   Zero the gradients stored for the network parameters with `optimizer.zero_grad()` to prevent accumulation from the previous iterations. We then feed the training data to the neural network with `net(inputs)` and compute the loss with `criterion(out, targets)`, which compares the network output with the target data.
3.   Propagate the loss value back through the network with `loss.backward()`. We do this so that we can calculate how each network weight affects the loss function. 
4.   Update the weights of the network with `optimizer.step()` in a way that will reduce the future loss function values.
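
The reason for zeroing the gradients in step 2 can be seen in a small sketch with a single parameter. It assumes only standard PyTorch autograd behavior; here `w.grad.zero_()` stands in for `optimizer.zero_grad()`, and the loss `w * w` is a made-up example.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(w^2)/dw = 2w = 4 at w = 2.
loss = w * w
loss.backward()
first_grad = w.grad.item()

# Second backward pass without zeroing: the new gradient is added to the old one.
loss = w * w
loss.backward()
accumulated_grad = w.grad.item()

# Zero the stored gradient, then repeat: the value is correct again.
w.grad.zero_()
loss = w * w
loss.backward()
fresh_grad = w.grad.item()

print(first_grad, accumulated_grad, fresh_grad)
```

Without the reset, each call to `backward()` would add to the previous gradients, and the optimizer would step along a sum of stale and current gradients.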

Let's see what the final accuracy of our model is: 

In [None]:
inputs = torch.tensor(test_input, dtype=torch.float32)
targets = torch.tensor(test_target, dtype=torch.long)

# gradients are not needed for evaluation
with torch.no_grad():
    out = net(inputs)

# the predicted class is the output unit with the highest value
predicted = torch.argmax(out, dim=1)

correct = (predicted == targets).sum().item()
error_count = len(targets) - correct
print('Errors: %d; Accuracy: %d%%' % (error_count, 100 * correct / len(targets)))

Errors: 0; Accuracy: 100%
