# Architecture

## Feature Learning

In [19]:
import numpy as np
import keras
import torch
import matplotlib.pyplot as plt

### Convolution Layer

**Keras**

In [50]:
from keras.layers import Conv1D, Conv2D, Conv3D

**PyTorch**

In [None]:
from torch.nn import Conv1d, Conv2d, Conv3d

A convolution layer takes an input image $x_\text{in}$ and applies a kernel function $\Omega$ such that

$$x_\text{out} = \Omega(x_\text{in})\tag{1}$$

where $x_\text{out}$ is the output image produced by convolving $x_\text{in}$ with the kernels of $\Omega$. The dimensions of $x_\text{out} \in \mathbb{R}^{W_\text{out} \times H_\text{out} \times D_\text{out}}$ depend on the dimensions of $x_\text{in} \in \mathbb{R}^{W_\text{in} \times H_\text{in} \times D_\text{in}}$ and on the hyperparameters of $\Omega$: the number of kernels $K$, the kernel size $F$, the stride $S$, and the amount of padding $P$.

$$x_\text{in} \in \mathbb{R}^{W_\text{in} \times H_\text{in} \times D_\text{in}} \quad\overset{\Omega}{\longrightarrow}\quad x_\text{out} \in \mathbb{R}^{W_\text{out} \times H_\text{out} \times D_\text{out}}$$

where

$$W_\text{out} = \frac{W_\text{in} - F + 2P}{S} + 1\tag{2}$$

$$H_\text{out} = \frac{H_\text{in} - F + 2P}{S} + 1\tag{3}$$

$$D_\text{out} = K\tag{4}$$

For example, let $x_\text{in} \in \mathbb{R}^{5 \times 5 \times 1}$ and let the layer consist of a single kernel ($K=1$) $\omega \in \mathbb{R}^{3 \times 3}$, so that $F=3$. We compute the dimensions of the $x_\text{out}$ resulting from the convolution of $x_\text{in}$ and $\omega$ using stride $S=1$ and no padding ($P=0$) as follows:

$$W_\text{out} = \frac{W_\text{in} - F + 2P}{S} + 1 = \frac{5 - 3 + 2(0)}{1} + 1 = 3$$

$$H_\text{out} = \frac{H_\text{in} - F + 2P}{S} + 1 = \frac{5 - 3 + 2(0)}{1} + 1 = 3$$

$$D_\text{out} = K = 1$$

Therefore, $x_\text{out} \in \mathbb{R}^{3 \times 3 \times 1}$.
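The worked example above can be checked numerically. The following is a minimal NumPy sketch, not the Keras or PyTorch implementation: `conv_output_shape` and `conv2d_naive` are hypothetical helper names, the floor division is an assumption for strides that do not divide evenly (equations (2) and (3) do not state it), and the sliding window is applied without flipping the kernel, which is how both frameworks define their convolution layers.

```python
import numpy as np


def conv_output_shape(w_in, h_in, f, s=1, p=0, k=1):
    """Output dimensions from equations (2)-(4).

    Floor division is an assumption for the case where the stride
    does not divide the numerator evenly.
    """
    w_out = (w_in - f + 2 * p) // s + 1
    h_out = (h_in - f + 2 * p) // s + 1
    return w_out, h_out, k


def conv2d_naive(x, kernel, stride=1, padding=0):
    """Slide one square kernel over a single-channel image (no kernel flip)."""
    x = np.pad(x, padding)  # zero padding on every side
    f = kernel.shape[0]
    w_out = (x.shape[0] - f) // stride + 1
    h_out = (x.shape[1] - f) // stride + 1
    out = np.empty((w_out, h_out))
    for i in range(w_out):
        for j in range(h_out):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(x[r:r + f, c:c + f] * kernel)
    return out


x_in = np.arange(25, dtype=float).reshape(5, 5)  # 5 x 5 x 1 input
omega = np.ones((3, 3)) / 9.0                    # 3 x 3 mean filter

shape = conv_output_shape(5, 5, f=3, s=1, p=0, k=1)
x_out = conv2d_naive(x_in, omega, stride=1, padding=0)

print(shape)        # (3, 3, 1)
print(x_out.shape)  # (3, 3)
```

Both the formula and the explicit loop give a $3 \times 3$ output for the single kernel, matching the result derived by hand.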

Further information about specific kernels and a more in-depth introduction to convolution can be found in the `image_processing` repository.

### Activation Layer

We use activation functions to introduce non-linearity into the network. Applying an activation function to the output of a convolutional layer is loosely analogous to the thresholding of action potentials that governs the firing of biological neurons.

#### Rectified Linear Unit (ReLU)
$$f(x) = \begin{cases} 0 & \text{for } x \leq 0 \\
x & \text{for } x > 0 \end{cases}$$

In [78]:
# Distinct names so the PyTorch module does not overwrite the Keras layer
relu_keras = keras.layers.Activation('relu')
relu_torch = torch.nn.ReLU()
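To make the piecewise definition of ReLU concrete, here is a minimal NumPy sketch. The name `relu_np` is a hypothetical helper for illustration and is not part of either framework.

```python
import numpy as np


def relu_np(x):
    """Element-wise ReLU: 0 for x <= 0, x for x > 0."""
    return np.maximum(x, 0.0)


x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
y = relu_np(x)
print(y)
```

Negative inputs and zero map to zero, while positive inputs pass through unchanged.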

#### Sigmoid
$$f(x) = \sigma(x) = \frac{1}{1 + e^{-x}}$$

In [79]:
# Distinct names so the PyTorch module does not overwrite the Keras layer
sigmoid_keras = keras.layers.Activation('sigmoid')
sigmoid_torch = torch.nn.Sigmoid()

#### Hyperbolic Tangent (tanh)
$$f(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$

In [80]:
# Distinct names so the PyTorch module does not overwrite the Keras layer
tanh_keras = keras.layers.Activation('tanh')
tanh_torch = torch.nn.Tanh()
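The sigmoid and tanh formulas are closely related: $\tanh(x) = 2\sigma(2x) - 1$, so tanh is a rescaled sigmoid with range $(-1, 1)$ instead of $(0, 1)$. The sketch below checks this numerically with NumPy. `sigmoid_np` and `tanh_np` are hypothetical helper names that transcribe the formulas directly, without attention to numerical stability for large $|x|$.

```python
import numpy as np


def sigmoid_np(x):
    """Logistic sigmoid, transcribed from the formula."""
    return 1.0 / (1.0 + np.exp(-x))


def tanh_np(x):
    """Hyperbolic tangent, transcribed from the formula."""
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))


x = np.linspace(-3.0, 3.0, 7)

# The hand-written tanh agrees with NumPy's built-in version
matches_builtin = np.allclose(tanh_np(x), np.tanh(x))

# tanh is a shifted and rescaled sigmoid
matches_identity = np.allclose(tanh_np(x), 2.0 * sigmoid_np(2.0 * x) - 1.0)

print(matches_builtin, matches_identity)
```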

#### Softmax
$$P(y=j| \mathbf{x}) = \frac{e^{\mathbf{x}^\mathsf{T}\mathbf{w}_j}}{\sum_{k=1}^{K} e^{\mathbf{x}^\mathsf{T}\mathbf{w}_k}}$$

Here $K$ denotes the number of classes, not the number of kernels used in the convolution layer.

In [81]:
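# Distinct names so the PyTorch module does not overwrite the Keras layer
softmax_keras = keras.layers.Activation('softmax')
# Pass dim explicitly; omitting it is deprecated in PyTorch.
# dim=-1 normalizes over the last axis, like the Keras layer.
softmax_torch = torch.nn.Softmax(dim=-1)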
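Writing $z_j = \mathbf{x}^\mathsf{T}\mathbf{w}_j$ for the class scores, softmax can be sketched in NumPy as follows. `softmax_np` is a hypothetical helper name. Subtracting the row maximum before exponentiating is a common stabilization trick that leaves the result unchanged, since the factor $e^{-\max_k z_k}$ cancels between numerator and denominator.

```python
import numpy as np


def softmax_np(z, axis=-1):
    """Softmax over `axis`, shifted by the maximum for numerical stability."""
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


# Two rows of class scores; the second would overflow without the shift
scores = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])
probs = softmax_np(scores)

print(probs.sum(axis=-1))  # each row sums to 1
```

Each row of `probs` is a probability distribution over the classes, and the largest score receives the largest probability.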

#### Exponential Linear Unit (ELU)
$$f(\alpha, x) = \begin{cases} \alpha (e^x - 1) & \text{for } x \leq 0 \\
x & \text{for } x > 0 \end{cases}$$

In [83]:
# Distinct names so the PyTorch module does not overwrite the Keras layer
elu_keras = keras.layers.Activation('elu')
elu_torch = torch.nn.ELU()
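The role of $\alpha$ is easiest to see numerically: for large negative inputs ELU saturates at $-\alpha$ rather than at zero. Below is a minimal NumPy sketch, with `elu_np` as a hypothetical helper name and $\alpha = 1$ as the default, which matches the default in both frameworks.

```python
import numpy as np


def elu_np(x, alpha=1.0):
    """Element-wise ELU: alpha * (exp(x) - 1) for x <= 0, x for x > 0."""
    # Clamp the exponent at 0 so large positive inputs cannot overflow
    # in the branch that np.where discards.
    negative_branch = alpha * (np.exp(np.minimum(x, 0.0)) - 1.0)
    return np.where(x > 0, x, negative_branch)


x = np.array([-50.0, -1.0, 0.0, 2.0, 800.0])
y = elu_np(x, alpha=1.0)
print(y)
```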

#### Softplus
$$f(x) = \ln(1 + e^x)$$

In [84]:
# Distinct names so the PyTorch module does not overwrite the Keras layer
softplus_keras = keras.layers.Activation('softplus')
softplus_torch = torch.nn.Softplus()
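Evaluating $\ln(1 + e^x)$ literally overflows for large $x$, because $e^x$ exceeds the floating-point range long before the logarithm is taken. One way to avoid this, shown here as a sketch rather than as what either framework does internally, is to note that $\ln(1 + e^x) = \ln(e^0 + e^x)$ and use NumPy's `np.logaddexp`. The name `softplus_np` is a hypothetical helper.

```python
import numpy as np


def softplus_np(x):
    """Softplus ln(1 + e^x), computed as logaddexp(0, x) to avoid overflow."""
    return np.logaddexp(0.0, x)


x = np.array([-1000.0, 0.0, 1000.0])
y = softplus_np(x)
print(y)
```

Softplus approaches zero for very negative inputs and approaches $x$ for very positive inputs, which is why it is often described as a smooth approximation of ReLU.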

#### Softsign
$$f(x) = \frac{x}{1 + |x|}$$

In [85]:
# Distinct names so the PyTorch module does not overwrite the Keras layer
softsign_keras = keras.layers.Activation('softsign')
softsign_torch = torch.nn.Softsign()

#### Others

- Leaky ReLU

- PReLU

- RReLU

- SELU

- GELU

### Pooling Layer

#### Max Pooling

In [86]:
from keras.layers import MaxPooling1D, MaxPooling2D, MaxPooling3D
from torch.nn import MaxPool1d, MaxPool2d, MaxPool3d

#### Average Pooling

In [87]:
from keras.layers import AveragePooling1D, AveragePooling2D, AveragePooling3D
from torch.nn import AvgPool1d, AvgPool2d, AvgPool3d
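Both pooling variants slide a window over the input and reduce each window to a single value: the maximum for max pooling, the mean for average pooling. The sketch below illustrates this on a single-channel 2D input with NumPy. `pool2d` and its parameters are hypothetical names chosen for illustration, and the framework layers additionally handle batches, channels, and padding.

```python
import numpy as np


def pool2d(x, size=2, stride=2, reduce=np.max):
    """Reduce each size x size window of a 2D array to one value."""
    rows = (x.shape[0] - size) // stride + 1
    cols = (x.shape[1] - size) // stride + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            r, c = i * stride, j * stride
            out[i, j] = reduce(x[r:r + size, c:c + size])
    return out


x = np.arange(16, dtype=float).reshape(4, 4)

max_pooled = pool2d(x, reduce=np.max)    # max pooling
avg_pooled = pool2d(x, reduce=np.mean)   # average pooling

print(max_pooled)
print(avg_pooled)
```

With a $2 \times 2$ window and stride 2, the $4 \times 4$ input is reduced to $2 \times 2$ in both cases.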

### Loss Layer

### Regularization

### Dropout