# Activation Functions
- An activation function applies a non-linear transformation to a layer's output and determines how strongly each neuron is activated.
- Without activation functions, a stack of linear layers collapses into a single linear transformation, so the model is no more expressive than a linear regression model.
- With non-linear transformations, the network can approximate more complex functions and therefore perform more complex tasks.
- An activation function is typically applied after each layer.
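- Most popular activation functions:
    1. Step: Outputs 1 if the input exceeds a threshold and 0 otherwise. Not used in practice, because its gradient is zero almost everywhere.
    2. Sigmoid: Range of (0, 1). Typically used in the last layer of a binary classification problem.
    3. TanH: Range of (-1, 1). Used in hidden layers.
    4. ReLU: f(x) = max(0, x). If you don't know what to use, ReLU is a good default for hidden layers.
    5. Leaky ReLU: Improved version of ReLU. Keeps a small non-zero slope for negative inputs, which tries to solve the vanishing gradient problem on that side (the "dying ReLU" problem).
    6. Softmax: Turns a vector of scores into probabilities that sum to 1. Typically used in the last layer of a multi-class classification problem.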
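To make the list above concrete, here is a minimal sketch that applies each function to the same small tensor. The sample values and the `negative_slope=0.01` setting are assumptions chosen for illustration, and the step function is written by hand because it is not something you would normally train with.

```python
import torch
import torch.nn.functional as F

# Hypothetical sample inputs: two negative values, zero, two positive values.
x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

step = (x > 0).float()                         # hand-written step: 1 where x > 0, else 0
sigmoid = torch.sigmoid(x)                     # squashes each value into (0, 1)
tanh = torch.tanh(x)                           # squashes each value into (-1, 1)
relu = torch.relu(x)                           # max(0, x)
leaky = F.leaky_relu(x, negative_slope=0.01)   # x if x > 0, else 0.01 * x
softmax = torch.softmax(x, dim=0)              # non-negative values that sum to 1

for name, y in [("step", step), ("sigmoid", sigmoid), ("tanh", tanh),
                ("relu", relu), ("leaky_relu", leaky), ("softmax", softmax)]:
    print(f"{name:>10}: {y}")
```

Note that sigmoid, tanh, ReLU and leaky ReLU act on each element independently, while softmax normalizes across the dimension given by `dim`.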

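The claim that a network without activation functions is just a linear model can be checked numerically. The sketch below, with layer sizes and a random seed picked arbitrarily for illustration, stacks two `nn.Linear` layers without any activation and compares the result with a single affine map built from their weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Hypothetical layer sizes chosen for the demonstration.
lin1 = nn.Linear(4, 8)
lin2 = nn.Linear(8, 3)
x = torch.randn(5, 4)

with torch.no_grad():
    # Two linear layers applied back to back, no activation in between.
    stacked = lin2(lin1(x))

    # nn.Linear computes x @ W.T + b, so the composition is again affine:
    # W = W2 @ W1 and b = W2 @ b1 + b2.
    W = lin2.weight @ lin1.weight
    b = lin2.weight @ lin1.bias + lin2.bias
    collapsed = x @ W.T + b

    # Inserting a ReLU between the layers breaks this equivalence in general.
    with_relu = lin2(torch.relu(lin1(x)))

print(torch.allclose(stacked, collapsed, atol=1e-5))
print(torch.allclose(stacked, with_relu, atol=1e-5))
```

The first comparison should hold up to floating-point error, while the second generally does not, which is the extra expressiveness that the non-linearity provides.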
In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

### Option 1 (Create nn modules)

In [2]:
class NeuralNet(nn.Module):
    """One hidden layer; the activations are created as modules in __init__."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()                     # hidden-layer activation
        self.linear2 = nn.Linear(hidden_size, 1)  # single output unit
        self.sigmoid = nn.Sigmoid()               # maps the output into (0, 1)

    def forward(self, x):
        x = self.linear1(x)
        x = self.relu(x)
        x = self.linear2(x)
        x = self.sigmoid(x)
        return x
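As a quick sanity check of Option 1, the following sketch runs a random batch through the model. The class is repeated in compact form so the snippet runs on its own; the sizes (`input_size=10`, `hidden_size=5`), the batch of 4 samples, and the 0.5 decision threshold are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.linear2(self.relu(self.linear1(x))))

# Hypothetical sizes: 10 input features, 5 hidden units, batch of 4 samples.
model = NeuralNet(input_size=10, hidden_size=5)
batch = torch.randn(4, 10)

with torch.no_grad():           # inference only, no gradients needed
    probs = model(batch)        # shape (4, 1), one sigmoid output per sample

preds = (probs > 0.5).int()     # threshold at 0.5 to get 0/1 class labels
print(probs.shape, preds.squeeze(1))
```

Because the last activation is a sigmoid, each output can be read as the predicted probability of the positive class.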

### Option 2 (Use activation functions directly in the forward pass)

In [3]:
class NeuralNet(nn.Module):
    """Same architecture; the activations are called as functions in forward."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # torch.relu and F.relu are interchangeable here
        x = torch.relu(self.linear1(x))
        x = torch.sigmoid(self.linear2(x))
        return x
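The two options differ only in style, not in the computation. A minimal sketch, using an arbitrary test tensor, that compares the module form with the functional forms:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Arbitrary test inputs spanning negative and positive values.
x = torch.linspace(-3, 3, steps=7)

relu_module = nn.ReLU()
sigmoid_module = nn.Sigmoid()

# Module call vs. torch.* function vs. torch.nn.functional function.
same_relu = (torch.equal(relu_module(x), torch.relu(x))
             and torch.equal(torch.relu(x), F.relu(x)))
same_sigmoid = torch.allclose(sigmoid_module(x), torch.sigmoid(x))

print(same_relu, same_sigmoid)
```

Neither ReLU nor sigmoid has learnable parameters, so the choice mostly comes down to whether you want the activation to show up as a named layer when printing the model or building it with `nn.Sequential`.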