# PyTorch Optimizers

[PyTorch optimizers](https://pytorch.org/docs/stable/optim.html) adjust a model's parameters during training so that the network learns from data. Each optimizer, such as stochastic gradient descent (SGD) with momentum or Adam, applies its own rule for turning gradients into parameter updates, and PyTorch lets us set one up in a few lines.

See also the [PyTorch optimization tutorial](https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html).


**Gradients**: The gradient of a function points in the direction in which the function increases fastest, and its magnitude gives the rate of that increase. Moving the parameters a small step in the direction opposite to the gradient of the loss function therefore reduces the loss.

**Learning Rate**: This hyperparameter controls how large a step the optimizer takes each time it updates the network's parameters during training. Too large, and the updates may overshoot the best values; too small, and training takes a very long time to get there.

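**Momentum**: A technique that accumulates a decaying sum of past gradients and uses it for the update. This accelerates the optimizer along directions where successive gradients agree and dampens oscillations where they keep changing sign.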
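
The three ideas above can be tried out by hand on a toy problem. The sketch below is plain Python, not PyTorch: it minimizes the made-up loss `(w - 3)^2`, whose gradient is `2 * (w - 3)`. The function names and the chosen values are assumptions for illustration; the update rule follows the common "velocity" form of momentum.

```python
def loss(w):
    # Toy loss with its minimum at w = 3 (an assumption for illustration)
    return (w - 3.0) ** 2


def grad(w):
    # Derivative of (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)


def descend(lr, momentum, steps=100, w=0.0):
    """Run gradient descent with momentum and return the final w."""
    velocity = 0.0
    for _ in range(steps):
        # Accumulate a decaying sum of past gradients
        velocity = momentum * velocity + grad(w)
        # Step in the direction opposite to the accumulated gradient
        w = w - lr * velocity
    return w


w_plain = descend(lr=0.01, momentum=0.0)     # small steps, no momentum
w_momentum = descend(lr=0.01, momentum=0.9)  # same steps, with momentum
w_too_big = descend(lr=1.1, momentum=0.0, steps=20)  # learning rate too large

print("plain:    ", w_plain, loss(w_plain))
print("momentum: ", w_momentum, loss(w_momentum))
print("too big:  ", w_too_big, loss(w_too_big))
```

With the same small learning rate, the run with momentum ends up closer to the minimum after 100 steps, while the run with the oversized learning rate overshoots further on every step and moves away from it.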

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim

In [2]:
class MLP(nn.Module):
    def __init__(self, input_size: int):
        super().__init__()
        self.hidden_layer = nn.Linear(in_features=input_size, out_features=64)
        self.output_layer = nn.Linear(64, 2)
        self.activation = nn.ReLU()
        # Apply softmax over the last dimension (the two output scores).
        # For a single 1-D input this is the same as dim=0, and it also
        # works for batched input of shape (batch_size, input_size).
        self.softmax = nn.Softmax(dim=-1)
        # https://discuss.pytorch.org/t/implicit-dimension-choice-for-softmax-warning/12314/17

    # def forward(self, x):
    #     hidden_pass = self.hidden_layer(x)
    #     activation_pass = self.activation(hidden_pass)
    #     output_pass = self.output_layer(activation_pass)
    #     return output_pass

    def forward(self, x):
        x = self.hidden_layer(x)
        x = self.activation(x)
        x = self.output_layer(x)
        x = self.softmax(x)
        
        return x

In [3]:
mlp_net = MLP(input_size=10)
# torch.manual_seed(0)
# mlp_net.forward(torch.rand(10))

In [4]:
# SGD (stochastic gradient descent) with momentum
optmzr = optim.SGD(
    params=mlp_net.parameters(),
    lr=0.01,
    momentum=0.9)

In [5]:
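# Adam (adaptive moment estimation)
# Note: this reassigns optmzr, replacing the SGD optimizer created above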
optmzr = optim.Adam(
    params=mlp_net.parameters(),
    lr=0.01)
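
Creating an optimizer does not change the model by itself: the parameters are only updated when `step()` is called after gradients have been computed. The following is a minimal sketch of that loop, not part of the original notebook. The random data, the `nn.Sequential` stand-in model, the choice of `nn.CrossEntropyLoss`, and the number of steps are all assumptions for illustration. The stand-in omits the final softmax because `nn.CrossEntropyLoss` expects raw scores (logits) rather than probabilities.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical stand-in for the MLP above, without the final softmax:
# nn.CrossEntropyLoss expects raw scores (logits), not probabilities.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))

# Made-up data: 32 samples with 10 features each, and a random label (0 or 1)
inputs = torch.rand(32, 10)
targets = torch.randint(0, 2, (32,))

loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(params=model.parameters(), lr=0.01)

losses = []
for step in range(50):
    optimizer.zero_grad()  # clear gradients left over from the previous step
    loss = loss_fn(model(inputs), targets)
    loss.backward()        # compute gradients of the loss w.r.t. the parameters
    optimizer.step()       # update the parameters using those gradients
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Swapping in `optim.SGD(params=model.parameters(), lr=0.01, momentum=0.9)` leaves the loop unchanged, since only the update rule applied inside `step()` differs.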