# 5-2 An example

Hyperparameters are settings you choose before training a neural network. You can change them to find the configuration that gives the best performance.

Several hyperparameters of a neural network can be tuned, for example:
- Number of Hidden Layers
- Number of Neurons per Hidden Layer
- Learning Rate
- Batch Size
- Optimizer

For example, you can change the learning rate and observe how training behaves.

In this section, we tune only one hyperparameter at a time. You can also tune several hyperparameters at once over a search space, using techniques such as [Grid Search](https://machinelearningmastery.com/how-to-grid-search-hyperparameters-for-pytorch-models/) and [Random Search](https://machinelearningmastery.com/hyperparameter-optimization-with-random-search-and-grid-search/), or a library such as [Ray Tune](https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html).
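
To make the idea of a search space concrete, here is a minimal grid search sketch in plain Python. The `search_space` values and the `train_and_evaluate` function are hypothetical placeholders, not part of this notebook: the function returns a toy score so the sketch runs without any training, and in practice you would replace its body with a real training run that returns validation accuracy.

```python
import itertools

# Hypothetical search space; the values are illustrative, not recommendations.
search_space = {
    "lr": [1e-2, 1e-3, 1e-4],
    "batch_size": [50, 100],
    "hidden_size": [250, 500],
}

def train_and_evaluate(lr, batch_size, hidden_size):
    """Placeholder objective (assumption): swap in real training + validation."""
    # Toy score that is highest at lr=1e-3, batch_size=100, hidden_size=500.
    return (1.0 - abs(lr - 1e-3)
            - abs(batch_size - 100) / 1000
            - abs(hidden_size - 500) / 10000)

def grid_search(space, objective):
    """Evaluate every combination in the space and return the best one."""
    names = list(space)
    best_config, best_score = None, float("-inf")
    for values in itertools.product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        score = objective(**config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = grid_search(search_space, train_and_evaluate)
print(best_config, best_score)
```

Grid search evaluates every combination (3 x 2 x 2 = 12 runs here), so its cost grows quickly with the number of hyperparameters. Random search samples a fixed number of combinations from the same space instead.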



In [None]:
import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.autograd import Variable

## Define hyperparameters

Tuning these hyperparameters carefully helps you get the best possible performance out of the model.

In [None]:
# each image is 28 x 28 pixels, flattened to 28*28 = 784 input features
input_size = 784

# number of nodes in the hidden layer
hidden_size = 500

# number of output classes (digits 0 to 9)
num_classes = 10

# number of times the entire training set is passed through the model
num_epochs = 30

# number of samples processed in one iteration
batch_size = 100

# learning rate
lr = 1e-3
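
The batch size and the number of epochs together determine how many weight updates the optimizer performs. The short calculation below is a sketch that assumes the standard MNIST training split of 60,000 images; `num_train_samples` is a name introduced here for illustration.

```python
import math

num_train_samples = 60_000  # size of the MNIST training split (assumption stated above)
batch_size = 100
num_epochs = 30

# One optimizer step per batch; the last batch may be smaller, hence ceil.
steps_per_epoch = math.ceil(num_train_samples / batch_size)
total_updates = steps_per_epoch * num_epochs

print(steps_per_epoch, total_updates)
```

Halving the batch size doubles the number of updates per epoch, which is one reason batch size and learning rate are often tuned together.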

## Loading data

In [None]:
train_data = dsets.MNIST(root = './data', train = True,
                        transform = transforms.ToTensor(), download = True)

test_data = dsets.MNIST(root = './data', train = False,
                       transform = transforms.ToTensor())

In [None]:
train_gen = torch.utils.data.DataLoader(dataset = train_data,
                                        batch_size = batch_size,
                                        shuffle = True)

test_gen = torch.utils.data.DataLoader(dataset = test_data,
                                      batch_size = batch_size,
                                      shuffle = False)

## Define model

In [None]:
class Net(nn.Module):
  def __init__(self, input_size, hidden_size, num_classes):
    super(Net,self).__init__()
    self.fc1 = nn.Linear(input_size, hidden_size)
    self.relu = nn.ReLU()  # ReLU activation function; you can also use others such as Tanh or Sigmoid
    self.fc2 = nn.Linear(hidden_size, num_classes)

  def forward(self,x):
    out = self.fc1(x)
    out = self.relu(out)
    out = self.fc2(out)
    return out
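
The hidden layer size directly controls how many trainable parameters the model has. The sketch below counts them by hand for the two fully connected layers, assuming each layer has a weight matrix plus a bias vector (the default for `nn.Linear`). The helper `linear_param_count` is introduced here for illustration only.

```python
def linear_param_count(in_features, out_features):
    # weight matrix (in_features x out_features) plus one bias per output unit
    return in_features * out_features + out_features

input_size, hidden_size, num_classes = 784, 500, 10

fc1_params = linear_param_count(input_size, hidden_size)
fc2_params = linear_param_count(hidden_size, num_classes)
total_params = fc1_params + fc2_params

print(fc1_params, fc2_params, total_params)
```

Once the model has been instantiated, you can compare this figure with `sum(p.numel() for p in net.parameters())`.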

In [None]:
net = Net(input_size, hidden_size, num_classes)
if torch.cuda.is_available():
  net.cuda()

## Define loss-function & optimizer

In [None]:
loss_function = nn.CrossEntropyLoss()

# Adam optimizer -- you can also use SGD, AdaGrad, RMSProp, etc.
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
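
`nn.CrossEntropyLoss` takes the raw outputs (logits) of the network and applies a log-softmax internally, which is why the model above has no softmax layer. The plain-Python sketch below computes the same quantity for a single sample; the function name `cross_entropy` and the example logits are made up for illustration.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Example logits for a 3-class problem where class 0 is the correct label.
print(cross_entropy([2.0, 1.0, 0.1], 0))

# With 10 equally likely classes the loss equals ln(10).
print(cross_entropy([0.0] * 10, 3), math.log(10))
```

A freshly initialized 10-class model typically starts with a loss close to ln(10), about 2.30, which gives a useful reference point when reading the training log. By default, `nn.CrossEntropyLoss` averages this per-sample value over the batch.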

## Training the model

In [None]:
# Use whichever device the model lives on (GPU if net.cuda() was called, otherwise CPU)
device = next(net.parameters()).device

for epoch in range(num_epochs):
  for i, (images, labels) in enumerate(train_gen):
    # flatten each 28x28 image into a 784-element vector and move the batch to the model's device
    images = images.view(-1, 28*28).to(device)
    labels = labels.to(device)

    optimizer.zero_grad()                  # clear gradients from the previous step
    outputs = net(images)                  # forward pass
    loss = loss_function(outputs, labels)  # compare predictions with the true labels
    loss.backward()                        # backpropagation
    optimizer.step()                       # update the weights

    if (i+1) % 100 == 0:
      print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
                 %(epoch+1, num_epochs, i+1, len(train_gen), loss.item()))


## Evaluating the accuracy of the model

In [None]:
correct = 0
total = 0
device = next(net.parameters()).device  # same device as the model (GPU or CPU)

net.eval()  # switch the model to evaluation mode
with torch.no_grad():  # gradients are not needed for evaluation
  for images, labels in test_gen:
    images = images.view(-1, 28*28).to(device)
    labels = labels.to(device)

    output = net(images)
    _, predicted = torch.max(output, 1)  # index of the largest output is the predicted class
    correct += (predicted == labels).sum().item()  # .item() converts the one-element tensor to a Python number
    total += labels.size(0)

print('Accuracy of the model: %.3f %%' %(100 * correct / total))

When **num_epochs** = 30, **learning rate** = 1e-3, and **batch size** = 100, the accuracy is around 98%. Changing **num_epochs** or the other hyperparameters gives different results. You can iterate over a range of values and plot the results to find the best setting.

This is left as an exercise for you, including the plot.