### Why

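Without non-linear activations, a stack of linear layers collapses into a single linear transformation. Applying a non-linear function after each layer lets the network learn better and perform more complex tasks.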
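
A minimal sketch of this point, assuming arbitrary layer sizes (4, 8, 3) and random inputs chosen only for illustration: two stacked `nn.Linear` layers with nothing in between compute exactly the same function as one merged linear map, while adding a ReLU in between generally does not.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stacked linear layers with no activation in between (sizes are illustrative).
linear1 = nn.Linear(4, 8)
linear2 = nn.Linear(8, 3)

x = torch.randn(5, 4)

with torch.no_grad():
    # Equivalent single linear map: W = W2 @ W1, b = W2 @ b1 + b2.
    merged_weight = linear2.weight @ linear1.weight
    merged_bias = linear2.weight @ linear1.bias + linear2.bias

    stacked_out = linear2(linear1(x))
    merged_out = x @ merged_weight.T + merged_bias
    relu_out = linear2(torch.relu(linear1(x)))

# The purely linear stack matches the merged layer up to float rounding.
stack_matches_merged = torch.allclose(stacked_out, merged_out, atol=1e-5)
# With a ReLU in between, the result is generally no longer a single linear map.
relu_matches_merged = torch.allclose(relu_out, merged_out, atol=1e-5)

print(stack_matches_merged, relu_matches_merged)
```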

### Most popular activation functions

#### 1. Step function

$$
f(x)=
\begin{cases}
1 & \text{if } x \ge 0\\
0& \text{otherwise}
\end{cases}
$$

Not used in practice.
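
One likely reason is that the step function is flat everywhere except at the jump, so it gives gradient-based training nothing to work with. The sketch below uses a hypothetical helper `step` (not a PyTorch function) built from a comparison; the result is detached from autograd.

```python
import torch


def step(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: 1.0 where x >= 0, 0.0 elsewhere."""
    return (x >= 0).to(x.dtype)


x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0], requires_grad=True)
y = step(x)

print(y.tolist())       # [0.0, 0.0, 1.0, 1.0, 1.0]
# The comparison is not differentiable, so no gradient can flow back to x.
print(y.requires_grad)  # False
```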

#### 2. Sigmoid

$$f(x) = \frac{1}{1+e^{-x}}$$

Typically used as the last layer in binary classification problems.

``nn.Sigmoid``
``torch.sigmoid``
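
A short sketch of both call styles, checked against the formula above. The logits and the 0.5 decision threshold are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Illustrative raw scores (logits) from a hypothetical final linear layer.
logits = torch.tensor([-2.0, 0.5, 3.0])

sigmoid_module = nn.Sigmoid()              # module form
probs_module = sigmoid_module(logits)
probs_function = torch.sigmoid(logits)     # function form
probs_manual = 1 / (1 + torch.exp(-logits))  # the formula written out

# Assumed decision rule: predict class 1 when the probability is at least 0.5.
predictions = (probs_function >= 0.5).long()
print(predictions.tolist())  # [0, 1, 1]
```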

#### 3. TanH

$$f(x)=\frac{2}{1+e^{-2x}}-1$$

Used in hidden layers.

``nn.Tanh``
``torch.tanh``
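
As a quick check, the sketch below compares the module form (`nn.Tanh`), the function form, and the formula above on a few illustrative inputs. Unlike sigmoid, the outputs lie in (-1, 1) and are centered on zero.

```python
import torch
import torch.nn as nn

# Illustrative inputs: seven evenly spaced points from -3 to 3.
x = torch.linspace(-3.0, 3.0, steps=7)

tanh_module = nn.Tanh()
out_module = tanh_module(x)                    # module form
out_function = torch.tanh(x)                   # function form
out_formula = 2 / (1 + torch.exp(-2 * x)) - 1  # the formula written out

print(out_function.min().item(), out_function.max().item())
```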

#### 4. ReLU

$$f(x)=\max(0, x)$$

If you don't know what to use, just use a ReLU for hidden layers

``nn.ReLU``
``torch.relu``
``F.relu``
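
The three entry points listed above compute the same thing. A minimal sketch on illustrative inputs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])

relu_module = nn.ReLU()
out_module = relu_module(x)   # module form, usually created in __init__
out_torch = torch.relu(x)     # function form
out_functional = F.relu(x)    # functional form

# Negative inputs are clamped to zero, positive inputs pass through unchanged.
print(out_torch.tolist())  # [0.0, 0.0, 0.0, 1.5]
```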

#### 5. Leaky ReLU

$$
f(x)=
\begin{cases}
x & \text{if } x \ge 0\\
a \cdot x & \text{otherwise}
\end{cases}
$$

An improved version of ReLU that tries to address the vanishing gradient problem: negative inputs keep a small slope $a$ instead of being set to zero.

``nn.LeakyReLU``
``F.leaky_relu``
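
The difference shows up in the gradients. In the sketch below, the slope $a$ is passed as `negative_slope`; the value 0.1 is chosen only to make the numbers easy to read (PyTorch's default is 0.01).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Gradient of plain ReLU: zero for the negative input.
x_relu = torch.tensor([-2.0, 3.0], requires_grad=True)
F.relu(x_relu).sum().backward()
relu_grad = x_relu.grad

# Gradient of Leaky ReLU: the negative input keeps a slope of 0.1.
x_leaky = torch.tensor([-2.0, 3.0], requires_grad=True)
F.leaky_relu(x_leaky, negative_slope=0.1).sum().backward()
leaky_grad = x_leaky.grad

# The module form takes the same parameter.
leaky_module = nn.LeakyReLU(negative_slope=0.1)
leaky_out = leaky_module(torch.tensor([-2.0, 3.0]))

print(relu_grad.tolist(), leaky_grad.tolist(), leaky_out.tolist())
```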

#### 6. Softmax

$$S(y_i)=\frac{e^{y_i}}{\sum_j e^{y_j}}$$

Typically used as the last layer in multi-class classification problems.

``nn.Softmax``
``torch.softmax``
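
Both forms need to know which dimension holds the class scores. The sketch below assumes a batch of two samples with three classes each, laid out as `(batch, classes)`, so the softmax runs over `dim=1`; the logits are made up for illustration.

```python
import torch
import torch.nn as nn

# Illustrative logits with shape (batch=2, classes=3).
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.1, 0.2, 3.0]])

softmax_module = nn.Softmax(dim=1)
probs_module = softmax_module(logits)          # module form
probs_function = torch.softmax(logits, dim=1)  # function form

# Each row is a probability distribution over the classes.
row_sums = probs_function.sum(dim=1)
# The predicted class is the index with the highest probability.
predicted = probs_function.argmax(dim=1)

print(predicted.tolist())  # [0, 2]
```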

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [None]:
# Option 1: create the activation functions as nn modules
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()  # hidden-layer activation
        self.linear2 = nn.Linear(hidden_size, 1)
        self.sigmoid = nn.Sigmoid()  # squashes the single output into (0, 1)

    def forward(self, x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        out = self.sigmoid(out)
        return out

In [None]:
# Option 2: call the activation functions directly in the forward pass
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out = torch.relu(self.linear1(x))  # hidden-layer activation
        out = torch.sigmoid(self.linear2(out))  # output in (0, 1)
        return out
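
The two options should be interchangeable. As a rough check, the sketch below defines both styles under hypothetical names (`ModuleStyleNet`, `FunctionalStyleNet`), copies the weights from one to the other, and compares the outputs. The sizes and the random input are assumptions for illustration.

```python
import torch
import torch.nn as nn


class ModuleStyleNet(nn.Module):
    """Option 1 (hypothetical name): activations stored as nn modules."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.linear2(self.relu(self.linear1(x))))


class FunctionalStyleNet(nn.Module):
    """Option 2 (hypothetical name): activations called in forward."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear2(torch.relu(self.linear1(x))))


torch.manual_seed(0)
model_a = ModuleStyleNet(input_size=10, hidden_size=5)
model_b = FunctionalStyleNet(input_size=10, hidden_size=5)

# ReLU and Sigmoid have no parameters, so both state dicts share the same keys.
model_b.load_state_dict(model_a.state_dict())

x = torch.randn(4, 10)  # a batch of 4 samples with 10 features each
with torch.no_grad():
    out_a = model_a(x)
    out_b = model_b(x)

outputs_match = torch.allclose(out_a, out_b)
print(out_a.shape, outputs_match)
```

Which style to use is mostly a matter of taste: the module form shows up when printing the model, while the functional form keeps `__init__` shorter.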