# What are Activation Functions?

Activation functions apply a **non-linear** transformation to a layer's output and decide whether, and how strongly, a neuron is activated.

Without activation functions, a neural network is just a stack of linear layers, which is equivalent to a single linear regression model and is not suited to complex tasks.

An activation function is typically applied after each layer.
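
To see why the non-linearity matters, the sketch below stacks two `nn.Linear` layers with no activation in between and checks that they compute the same thing as one linear map. The layer sizes and the random input are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two linear layers applied back to back, with no activation in between.
linear1 = nn.Linear(4, 3)
linear2 = nn.Linear(3, 2)

x = torch.randn(5, 4)  # batch of 5 samples with 4 features each
stacked = linear2(linear1(x))

# The same computation written as a single linear map:
# W = W2 @ W1 and b = W2 @ b1 + b2
W = linear2.weight @ linear1.weight
b = linear2.weight @ linear1.bias + linear2.bias
collapsed = x @ W.T + b

print(torch.allclose(stacked, collapsed, atol=1e-5))
```

Putting a non-linear function such as `torch.relu` between the two layers breaks this equivalence, which is what lets deeper networks model more than a straight line.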

## Examples

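#### Step Function
1 if x > Y, where Y is a fixed threshold

0 otherwise

Not used in practice: its gradient is zero everywhere except at the threshold, so gradient-based training cannot learn through it.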
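
As a small sketch in plain Python, the step function can be written as follows. The parameter name `threshold` stands for the value Y above, and its default of 0 is an assumption made for the example.

```python
def step(x, threshold=0.0):
    """Return 1 if x is above the threshold, otherwise 0."""
    return 1 if x > threshold else 0

outputs = [step(v) for v in (-2.0, -0.1, 0.0, 0.1, 2.0)]
print(outputs)  # [0, 0, 0, 1, 1]
```

The output is flat on both sides of the threshold, so a small change in the input almost never changes the output.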

#### Sigmoid Function
 1 / (1 + e ^ (-x))

Outputs a value between 0 and 1. Typically used in the last layer of a binary classification model.
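
The formula can be checked with a few lines of plain Python. This is only an illustrative sketch: the 0.5 cutoff used to turn the output into a class label is an assumed, commonly used choice, and the input 1.2 is a made-up score.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Large negative inputs approach 0, large positive inputs approach 1.
for x in (-4.0, 0.0, 4.0):
    print(x, round(sigmoid(x), 4))

# Interpreting the output as the probability of class 1.
predicted_class = 1 if sigmoid(1.2) >= 0.5 else 0
```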

 #### TanH
Scaled and shifted sigmoid function with outputs between -1 and 1

 2 / (1 + e ^ (-2x)) - 1
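
The relation to the sigmoid can be verified numerically. The sketch below compares the formula above, rewritten as `2 * sigmoid(2x) - 1`, against Python's built-in `math.tanh`; the helper name `tanh_from_sigmoid` is made up for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_from_sigmoid(x):
    """tanh written as a rescaled sigmoid: 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

# Both expressions agree up to floating-point error.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert math.isclose(tanh_from_sigmoid(x), math.tanh(x), abs_tol=1e-12)
```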

 #### ReLU

max(0, x): outputs zero for negative inputs and the input itself otherwise

 If you don't know what to use, use ReLU for hidden layers
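
A quick check of this behaviour with `torch.relu`, using made-up input values:

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

# Negative entries are clamped to zero, the rest pass through unchanged.
out = torch.relu(x)
print(out.tolist())  # [0.0, 0.0, 0.0, 0.5, 2.0]
```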

 #### Leaky ReLU

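x if x >= 0

a * x otherwise, where a is a small positive slope (for example 0.01)

Addresses the zero gradient that plain ReLU has for negative inputs (the "dying ReLU" problem): negative inputs keep a small gradient a, so those neurons can still be updated.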
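
The difference from plain ReLU shows up in the gradients for negative inputs. The sketch below compares the two with autograd; the slope 0.01 is the default of `F.leaky_relu` and the input values are made up.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-3.0, -1.0, 2.0], requires_grad=True)

# Plain ReLU: negative inputs get zero output and zero gradient.
torch.relu(x).sum().backward()
relu_grad = x.grad.clone()

# Leaky ReLU: negative inputs are scaled by the slope a (here 0.01),
# so their gradient is a instead of 0.
x.grad.zero_()
leaky_out = F.leaky_relu(x, negative_slope=0.01)
leaky_out.sum().backward()
leaky_grad = x.grad.clone()

print(relu_grad.tolist())
print(leaky_grad.tolist())
```

With plain ReLU the first two entries receive no gradient at all, while leaky ReLU still passes a small one back.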


 #### Softmax function

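e ^ (x_i) / sum over j of e ^ (x_j)

Turns a vector of scores into probabilities between 0 and 1 that sum to 1. A good choice for the last layer in multi-class classification problems.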
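
A short sketch of softmax on the output of a hypothetical 3-class classifier. The logit values are made up, and `dim=1` assumes the usual layout of one row per sample and one column per class.

```python
import torch

# Raw scores (logits) for 2 samples and 3 classes.
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 0.5, 0.5]])

# dim=1 normalizes across the classes of each sample.
probs = torch.softmax(logits, dim=1)

# The predicted class is the one with the highest probability.
predicted = probs.argmax(dim=1)

print(probs.sum(dim=1))  # each row sums to 1
print(predicted[0].item())
```

Note that `nn.CrossEntropyLoss` expects raw logits and applies the log-softmax itself, so an explicit softmax layer is usually left out when training with that loss.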

In [1]:
import torch
import torch.nn as nn

In [5]:
class NeuralNet(nn.Module):
  def __init__(self, input_size, hidden_size):
    super().__init__()
    self.linear1 = nn.Linear(input_size, hidden_size)
    self.linear2 = nn.Linear(hidden_size, 1)
    # Alternative: define the activation functions as layers here
    # and call them in forward(), for example:
    # self.relu = nn.ReLU()
    # self.softmax = nn.Softmax(dim=1)

  def forward(self, x):
    out = torch.relu(self.linear1(x))
    # Feed the hidden activations (not the raw input x) into the second layer
    out = torch.sigmoid(self.linear2(out))
    return out
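
A minimal usage sketch follows. It repeats the class so that it runs on its own, with the second layer applied to the output of the first. The sizes (10 input features, 5 hidden units, a batch of 8), the random inputs and targets, and the pairing with `nn.BCELoss` are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class NeuralNet(nn.Module):
  def __init__(self, input_size, hidden_size):
    super().__init__()
    self.linear1 = nn.Linear(input_size, hidden_size)
    self.linear2 = nn.Linear(hidden_size, 1)

  def forward(self, x):
    out = torch.relu(self.linear1(x))        # hidden layer + ReLU
    out = torch.sigmoid(self.linear2(out))   # output layer + sigmoid
    return out

model = NeuralNet(input_size=10, hidden_size=5)
x = torch.randn(8, 10)   # batch of 8 samples
y_pred = model(x)
print(y_pred.shape)      # torch.Size([8, 1])

# BCELoss expects probabilities in [0, 1], which the sigmoid output provides.
targets = torch.randint(0, 2, (8, 1)).float()
loss = nn.BCELoss()(y_pred, targets)
print(loss.item())
```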