
# Convolutional Neural Network (CNN)

### Definition:

A CNN is a specialized type of neural network for processing grid-like data, such as images. Rather than relying only on fully connected layers, a CNN stacks convolutional layers and pooling layers, optionally followed by fully connected layers.

### Convolutional Layer:

The convolution operation for a single filter \( \mathbf{W} \) applied to an input image \( \mathbf{X} \) can be expressed as:
\[\mathbf{Z}(i,j) = \sum_{m=1}^{M} \sum_{n=1}^{N} \mathbf{W}(m,n) \cdot \mathbf{X}(i+m, j+n)\]

Where:
- \( \mathbf{Z}(i,j) \) is the output of the convolution at position \( (i, j) \).
- \( \mathbf{W}(m,n) \) are the learnable weights (filter) of size \( M \times N \).
- \( \mathbf{X}(i+m, j+n) \) is the input data (image or feature map).
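
The sum above can be sketched directly in NumPy. This is a minimal, unoptimized illustration, assuming a single-channel 2-D input, no padding, and a stride of 1; the function name `conv2d_valid` is hypothetical, and the code uses zero-based indices where the formula counts from 1.

```python
import numpy as np

def conv2d_valid(X, W):
    """Apply one filter W (M x N) to a 2-D input X, no padding, stride 1."""
    M, N = W.shape
    H, K = X.shape
    out_h, out_w = H - M + 1, K - N + 1
    Z = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the filter element-wise with the patch at (i, j), then sum.
            Z[i, j] = np.sum(W * X[i:i + M, j:j + N])
    return Z

# Toy 4x4 input and a 2x2 filter that subtracts the lower-right neighbor.
X = np.arange(16, dtype=float).reshape(4, 4)
W = np.array([[1.0, 0.0],
              [0.0, -1.0]])
Z = conv2d_valid(X, W)
print(Z.shape)  # (3, 3)
```

Deep learning libraries typically vectorize this loop and add channels, padding, and stride, but the per-position arithmetic is the same.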

### Activation Function:

After the convolution, an activation function is applied elementwise to the output \( \mathbf{Z} \), typically the ReLU function:
\[ \mathbf{A}(i,j) = \max(0, \mathbf{Z}(i,j)) \]
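
As a small illustration, ReLU is a single element-wise call in NumPy. The input values here are made up for the example.

```python
import numpy as np

def relu(Z):
    """Element-wise ReLU: negative entries become 0, the rest pass through."""
    return np.maximum(0.0, Z)

Z = np.array([[-2.0, 0.5],
              [3.0, -0.1]])
A = relu(Z)
print(A)
```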

### Pooling Layer:

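The pooling operation, most commonly max pooling, reduces the spatial size of a feature map by replacing each local patch with a single value. The reduction comes from moving the patch with a stride greater than 1 (for example, non-overlapping \( 2 \times 2 \) patches). For a patch anchored at \( (i, j) \), max pooling can be expressed as: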
\[\mathbf{P}(i,j) = \max_{m,n} \mathbf{A}(i+m, j+n)\]

Where:
- \( \mathbf{P}(i,j) \) is the pooled output.
- \( \max_{m,n} \) denotes taking the maximum value within a local patch (e.g., \( 2 \times 2 \)).
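
A minimal sketch of max pooling over a single feature map follows. The function name `max_pool2d`, the `size` and `stride` parameter names, and their defaults of 2 are illustrative choices, not something fixed by the formula.

```python
import numpy as np

def max_pool2d(A, size=2, stride=2):
    """Max pooling over a 2-D feature map with square patches."""
    H, K = A.shape
    out_h = (H - size) // stride + 1
    out_w = (K - size) // stride + 1
    P = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Keep only the largest activation in the patch.
            P[i, j] = A[r:r + size, c:c + size].max()
    return P

A = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [3, 2, 1, 8]], dtype=float)
P = max_pool2d(A)
print(P)  # 2x2 output: one maximum per non-overlapping 2x2 patch
```

With these defaults a 4x4 map shrinks to 2x2, which is the dimensionality reduction the pooling layer is used for.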

### Fully Connected Layer:

After several convolution and pooling layers, the data is typically flattened and fed into a fully connected layer, as in an MLP:
\[\mathbf{y} = \sigma(\mathbf{W}^{(L)} \mathbf{h}^{(L-1)} + \mathbf{b}^{(L)})\]
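
The flatten-then-dense step might look like the sketch below. It assumes \( \sigma \) is the logistic sigmoid, which the text does not specify, and uses made-up shapes (a 2x2 pooled map with 3 channels feeding 4 output units); `dense` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(h, W, b, activation=sigmoid):
    """Fully connected layer: y = activation(W h + b)."""
    return activation(W @ h + b)

rng = np.random.default_rng(0)
pooled = rng.standard_normal((2, 2, 3))   # assumed pooled output: 2x2 spatial, 3 channels
h = pooled.reshape(-1)                    # flatten to a vector of length 12
W = rng.standard_normal((4, h.size)) * 0.1
b = np.zeros(4)
y = dense(h, W, b)
print(y.shape)  # (4,)
```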

### Backpropagation:

Similar to MLPs, CNNs are trained using backpropagation. The gradients of the loss function \( \mathcal{L}(\mathbf{y}, \mathbf{t}) \) with respect to the filter weights are computed as:

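\[\frac{\partial \mathcal{L}}{\partial \mathbf{W}(m,n)} = \sum_{i,j} \frac{\partial \mathcal{L}}{\partial \mathbf{Z}(i,j)} \cdot \mathbf{X}(i+m, j+n)\]

Where:
- \( \frac{\partial \mathcal{L}}{\partial \mathbf{Z}(i,j)} \) is the gradient of the loss with respect to the convolution output.
- \( \mathbf{X}(i+m, j+n) \) is the input value that \( \mathbf{W}(m,n) \) multiplies when computing \( \mathbf{Z}(i,j) \), so each weight accumulates a contribution from every output position.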
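
Written out per filter entry \( (m, n) \), the gradient sums the upstream gradient at each output position times the input value that the weight touched there. The sketch below checks that against a finite-difference estimate. It assumes a toy loss \( \mathcal{L} = \tfrac{1}{2} \sum_{i,j} \mathbf{Z}(i,j)^2 \) chosen only for the test, no padding, stride 1, and zero-based indices; `conv2d_valid` and `filter_gradient` are hypothetical helper names.

```python
import numpy as np

def conv2d_valid(X, W):
    """Forward pass: one filter, no padding, stride 1."""
    M, N = W.shape
    out_h, out_w = X.shape[0] - M + 1, X.shape[1] - N + 1
    Z = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            Z[i, j] = np.sum(W * X[i:i + M, j:j + N])
    return Z

def filter_gradient(X, dZ, filter_shape):
    """dL/dW(m, n) = sum over (i, j) of dL/dZ(i, j) * X(i + m, j + n)."""
    M, N = filter_shape
    out_h, out_w = dZ.shape
    dW = np.zeros((M, N))
    for m in range(M):
        for n in range(N):
            # Input window that weight (m, n) touched across all output positions.
            dW[m, n] = np.sum(dZ * X[m:m + out_h, n:n + out_w])
    return dW

def toy_loss(X, W):
    return 0.5 * np.sum(conv2d_valid(X, W) ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
W = rng.standard_normal((3, 3))

dZ = conv2d_valid(X, W)            # for the toy loss, dL/dZ equals Z
dW = filter_gradient(X, dZ, W.shape)

# Central finite difference on one weight as a sanity check.
eps = 1e-5
W_plus, W_minus = W.copy(), W.copy()
W_plus[0, 0] += eps
W_minus[0, 0] -= eps
numeric = (toy_loss(X, W_plus) - toy_loss(X, W_minus)) / (2 * eps)
print(np.isclose(dW[0, 0], numeric, atol=1e-6))
```

In practice an automatic differentiation framework computes this gradient, but a finite-difference check like this is a common way to validate a hand-written backward pass.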
