# Complete CNN Forward Pass Notes

## What Is a CNN?
A **Convolutional Neural Network (CNN)** automatically **learns features from images** using a stack of layers:
- Convolution
- Activation
- Pooling
- Flattening
- Dense layers

---

## 1. Input
- Grayscale image of size $N \times N$.
- Example ($N = 3$):
$$
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9
\end{bmatrix}
$$

---

## 2. Convolution

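### Process:
- Use a **filter (kernel)** of size $F \times F$.
- Slide it over the image with **stride `S`** (typically 1), i.e. move it `S` pixels per step.
- At each position:
  - Multiply the image **region** under the filter **element-wise** by the kernel.
  - Sum all the products to get one scalar: that position's value in the **feature map**.

For output position $(i, j)$:
$$
\text{FeatureMap}[i, j] = \sum_{m=0}^{F-1} \sum_{n=0}^{F-1} \text{Image}[iS + m,\ jS + n] \cdot \text{Kernel}[m, n]
$$
Strictly, this is cross-correlation (the kernel is not flipped); deep learning libraries use the same convention and still call it convolution.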
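The steps above can be sketched as a plain NumPy loop. This is a minimal sketch assuming a square, single-channel image, a square kernel and no padding; the function name `conv2d_valid` is hypothetical, and the all-ones $2 \times 2$ kernel is an illustrative choice that does not appear in these notes.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide `kernel` over a square `image` (no padding) and return the feature map.

    Like CNN libraries, the kernel is not flipped (cross-correlation).
    """
    n, f = image.shape[0], kernel.shape[0]
    o = (n - f) // stride + 1  # output side length
    out = np.zeros((o, o))
    for i in range(o):
        for j in range(o):
            r, c = i * stride, j * stride
            region = image[r:r + f, c:c + f]     # F x F patch under the kernel
            out[i, j] = np.sum(region * kernel)  # element-wise product, then sum
    return out

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
kernel = np.ones((2, 2))  # hypothetical 2x2 kernel that sums each patch

feature_map = conv2d_valid(image, kernel, stride=1)
print(feature_map)  # 2x2 map with values 12, 16 / 24, 28
```

Each output entry is the sum of one $2 \times 2$ patch, e.g. the top-left entry is $1 + 2 + 4 + 5 = 12$.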

### Output Size:
Without padding (a "valid" convolution), the feature map's side length is:
$$
O = \left\lfloor \frac{N - F}{S} \right\rfloor + 1
$$

### Example Calculation:
For a $3 \times 3$ image, a $2 \times 2$ filter, and stride $1$:
$$
O = \left\lfloor \frac{3 - 2}{1} \right\rfloor + 1 = 2
$$
so the feature map is $2 \times 2$.
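The output-size formula translates directly into integer arithmetic. The helper name `conv_output_size` is hypothetical; it assumes no padding, and Python's floor division `//` plays the role of $\lfloor \cdot \rfloor$.

```python
def conv_output_size(n: int, f: int, s: int = 1) -> int:
    """Side length of an unpadded convolution (or pooling) output."""
    if f > n:
        raise ValueError("filter is larger than the input")
    return (n - f) // s + 1  # floor division implements the floor in the formula

print(conv_output_size(3, 2, 1))  # 2, the example above
print(conv_output_size(5, 3, 1))  # 3
print(conv_output_size(5, 3, 2))  # 2
print(conv_output_size(6, 3, 2))  # 2: floor(3 / 2) + 1, so the last column is never covered
```

The last case shows why the floor matters: with stride 2 the windows start at columns 0 and 2, and a third window would run past the edge.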

---

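## 3. Activation (ReLU)

Apply
$$
\text{ReLU}(x) = \max(0, x)
$$
to each element of the feature map to:
- Introduce **non-linearity** (without it, consecutive convolution and dense layers would compose into a single affine map).
- Zero out negative activations while passing positive values through unchanged.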
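A minimal sketch of ReLU on a small feature map; the input values are made up for illustration, and `relu` is a hypothetical helper wrapping `np.maximum`.

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x); works for arrays of any shape
    return np.maximum(0, x)

feature_map = np.array([[-3.0, 1.5],
                        [0.0, -0.5]])
activated = relu(feature_map)
print(activated)  # [[0, 1.5], [0, 0]]: negatives become 0, positives pass through

# Non-linearity: relu(a + b) is not relu(a) + relu(b) in general
a, b = 1.0, -1.0
print(relu(a + b), relu(a) + relu(b))  # 0.0 vs 1.0
```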

---

## 4. Max Pooling

### Purpose:
- Downsamples the feature map while keeping the **strongest activation** in each region.
- Uses a pool window of size $P \times P$ with stride `S`:
  - `S = P` for **non-overlapping pooling** (the usual choice).
  - `S < P` (e.g. `S = 1`) for **overlapping pooling**.

### Operation:
- For each $P \times P$ block, take:
$$
\max \left( \text{block values} \right)
$$

### Output Size:
With $N$ now the side length of the feature map being pooled:
$$
O = \left\lfloor \frac{N - P}{S} \right\rfloor + 1
$$
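To contrast non-overlapping and overlapping pooling, here is a minimal loop-based sketch. It assumes a square, single-channel input and no padding; `max_pool2d` and the $4 \times 4$ input are hypothetical.

```python
import numpy as np

def max_pool2d(x: np.ndarray, p: int, s: int) -> np.ndarray:
    """Max pooling of a square array with a p x p window and stride s (no padding)."""
    o = (x.shape[0] - p) // s + 1  # same output-size formula as above
    out = np.zeros((o, o), dtype=x.dtype)
    for i in range(o):
        for j in range(o):
            r, c = i * s, j * s
            out[i, j] = x[r:r + p, c:c + p].max()  # largest value in the window
    return out

x = np.array([[1, 3, 2, 0],
              [4, 6, 5, 1],
              [7, 2, 9, 8],
              [0, 1, 3, 4]])

print(max_pool2d(x, p=2, s=2))  # non-overlapping: [[6, 5], [7, 9]]
print(max_pool2d(x, p=2, s=1))  # overlapping: [[6, 6, 5], [7, 9, 9], [7, 9, 9]]
```

With `S = P` every input value lands in exactly one window; with `S = 1` a large value such as the 9 is picked up by several neighboring windows.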

---

## 5. Flattening

- Converts a $d \times d$ pooled feature map into a vector in $\mathbb{R}^{d^2}$ by concatenating its rows (row-major order).
- Example:
$$
\begin{bmatrix}
5 & 6 \\
8 & 9
\end{bmatrix}
\rightarrow
[5, 6, 8, 9]
$$
- Prepares for Dense layers.
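The example above corresponds to NumPy's default row-major (C-order) flattening. A minimal sketch:

```python
import numpy as np

pooled = np.array([[5, 6],
                   [8, 9]])

# Row-major (C-order) flattening: row 0, then row 1
flat = pooled.flatten()
print(flat)  # [5 6 8 9]

# reshape(-1) produces the same vector (as a view when it can)
same = pooled.reshape(-1)

# Flattening only rearranges values; reshaping back recovers the map
restored = flat.reshape(pooled.shape)
```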

---

## 6. Dense Layer (Fully Connected)

Maps the **flattened vector to output classes** using:
$$
y = W x + b
$$
where:
- $x$: flattened input (length $d^2$)
- $W$: weight matrix (one row per output)
- $b$: bias vector (one entry per output)
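A minimal sketch of $y = Wx + b$ with NumPy's `@` operator. The weights and biases below are hypothetical values chosen to make the arithmetic easy to follow; in a real network they are learned.

```python
import numpy as np

x = np.array([5.0, 6.0, 8.0, 9.0])  # flattened input, length d^2 = 4

# Hypothetical parameters for 3 output classes
W = np.array([[1.0, 0.0, -1.0,  0.0],
              [0.0, 1.0,  0.0, -1.0],
              [0.5, 0.5,  0.5,  0.5]])  # shape (3, 4): one row per output
b = np.array([0.0, 1.0, -10.0])         # shape (3,): one bias per output

y = W @ x + b  # one score (logit) per class
print(y.shape)  # (3,)
print(y)        # values -3, -2, 4
```

For example, the last row averages nothing in particular: it sums half of every input, $0.5 \cdot 28 = 14$, and the bias $-10$ brings it to $4$.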

### Follow with an **activation**:
- `ReLU` for hidden layers.
- `Softmax` for multi-class output:
$$
\text{Softmax}(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}
$$
- `Sigmoid` for binary output:
$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$
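A minimal sketch of both output activations. The softmax subtracts $\max_j z_j$ before exponentiating, a common numerical-stability trick (not part of the formula above) that leaves the result unchanged because the shift cancels between numerator and denominator. The helper names and input logits are hypothetical.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    # Shifting by max(z) cancels in the ratio but keeps np.exp from overflowing
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
print(p, p.sum())  # probabilities that sum to 1; the largest logit gets the largest share

print(softmax(np.array([1000.0, 1000.0])))  # [0.5 0.5], no overflow
print(sigmoid(0.0))                          # 0.5
```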

---

## Summary Flow:

1. Input Image
2. Convolution (Feature Extraction)
3. ReLU (Non-Linearity)
4. Max Pooling (Downsampling)
5. Flatten (1D Vector)
6. Dense Layers (Classification)
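The whole flow can be sketched as one `forward` function. This is a minimal, untrained sketch: the $6 \times 6$ input, the $3 \times 3$ kernel, the dense weights and all function names are hypothetical, the weights are random, and pooling uses `S = P = 2`, so the resulting probabilities carry no meaning beyond showing the shapes line up.

```python
import numpy as np

def conv2d(x, k):
    # Valid convolution, stride 1 (no kernel flip, as in CNN libraries)
    f = k.shape[0]
    o = x.shape[0] - f + 1
    return np.array([[np.sum(x[i:i + f, j:j + f] * k) for j in range(o)]
                     for i in range(o)])

def max_pool(x, p=2, s=2):
    o = (x.shape[0] - p) // s + 1
    return np.array([[x[i * s:i * s + p, j * s:j * s + p].max() for j in range(o)]
                     for i in range(o)])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(image, kernel, W, b):
    fmap = conv2d(image, kernel)       # Convolution (feature extraction)
    fmap = np.maximum(0, fmap)         # ReLU (non-linearity)
    pooled = max_pool(fmap, p=2, s=2)  # Max pooling (downsampling)
    flat = pooled.flatten()            # Flatten (1D vector)
    return softmax(W @ flat + b)       # Dense + softmax (classification)

rng = np.random.default_rng(0)
image = rng.random((6, 6))            # hypothetical 6x6 grayscale input
kernel = rng.standard_normal((3, 3))  # random (untrained) 3x3 filter
# Shapes: 6x6 -> conv 4x4 -> pool 2x2 -> flatten 4 -> dense 3
W = rng.standard_normal((3, 4))       # untrained weights for 3 classes
b = np.zeros(3)

probs = forward(image, kernel, W, b)
print(probs.shape, probs.sum())  # (3,) and a sum of 1 up to rounding
```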


## Key Points:
- **Convolution:** Local pattern detection.
- **ReLU:** Adds non-linearity.
- **Pooling:** Reduces spatial dimensions.
- **Flatten:** Converts to 1D for Dense layers.
- **Dense:** Maps features to outputs.


```python
import numpy as np

# 5x5 grayscale input image
image = np.array([
    [1, 2, 0, 2, 1],
    [0, 1, 3, 1, 0],
    [2, 2, 1, 0, 1],
    [1, 0, 1, 3, 2],
    [0, 1, 2, 2, 1]
])

print("Original Image:\n", image)

# 3x3 kernel that responds to left-to-right intensity changes,
# i.e. it detects vertical edges (right column minus left column)
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1]
])
```

```text
Original Image:
 [[1 2 0 2 1]
 [0 1 3 1 0]
 [2 2 1 0 1]
 [1 0 1 3 2]
 [0 1 2 2 1]]
```


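```python
# Valid convolution with stride 1: O = N - F + 1 along each axis
output_shape = (image.shape[0] - kernel.shape[0] + 1,
                image.shape[1] - kernel.shape[1] + 1)
feature_map = np.zeros(output_shape)

print(output_shape, feature_map)
```

```text
(3, 3) [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
```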


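```python
# Slide the kernel over every valid position (stride 1)
for i in range(output_shape[0]):
    for j in range(output_shape[1]):
        # 3x3 patch of the image under the kernel
        region = image[i:i + kernel.shape[0], j:j + kernel.shape[1]]
        # Element-wise product, then sum to one scalar
        feature_map[i, j] = np.sum(region * kernel)
print(feature_map)
```

```text
[[ 1. -2. -2.]
 [ 2.  1. -2.]
 [ 1.  2.  0.]]
```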
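The double loop can also be vectorized. This sketch assumes NumPy 1.20 or newer, which provides `numpy.lib.stride_tricks.sliding_window_view`; the name `feature_map_vec` is hypothetical. It should reproduce the feature map above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

image = np.array([
    [1, 2, 0, 2, 1],
    [0, 1, 3, 1, 0],
    [2, 2, 1, 0, 1],
    [1, 0, 1, 3, 2],
    [0, 1, 2, 2, 1]
])
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1]
])

# windows[i, j] is the 3x3 region whose top-left corner is (i, j); shape (3, 3, 3, 3)
windows = sliding_window_view(image, kernel.shape)

# Multiply every window by the kernel and sum over the last two axes
feature_map_vec = (windows * kernel).sum(axis=(-2, -1))
print(feature_map_vec)
```

`sliding_window_view` returns a read-only view, so no window data is copied until the multiplication creates the product array.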


```python
# ReLU: negative responses become 0
feature_map_relu = np.maximum(0, feature_map)
print(feature_map_relu)
```

```text
[[1. 0. 0.]
 [2. 1. 0.]
 [1. 2. 0.]]
```


```python
# Pooling parameters
filter_size = 2  # pool window is filter_size x filter_size
stride = 1       # stride < filter_size, so the windows overlap

# Output size: O = floor((N - P) / S) + 1
pooled_shape = (
    (feature_map_relu.shape[0] - filter_size) // stride + 1,
    (feature_map_relu.shape[1] - filter_size) // stride + 1
)

pooled = np.zeros(pooled_shape)
```

```python
for i in range(pooled_shape[0]):
    for j in range(pooled_shape[1]):
        # Window starts at (i * stride, j * stride), so any stride works
        r, c = i * stride, j * stride
        region = feature_map_relu[r:r + filter_size, c:c + filter_size]
        pooled[i, j] = np.max(region)

print(f"After Max Pooling with stride={stride}:\n", pooled)
```

```text
After Max Pooling with stride=1:
 [[2. 1.]
 [2. 2.]]
```
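The pooling loop can likewise be vectorized: take every stride-1 window with `sliding_window_view` (NumPy 1.20+), keep every `s`-th window along each axis, and reduce with `max`. This is a sketch; `max_pool_vec` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

feature_map_relu = np.array([[1., 0., 0.],
                             [2., 1., 0.],
                             [1., 2., 0.]])

def max_pool_vec(x, p, s):
    # All p x p windows at stride 1, then keep every s-th window along each axis
    windows = sliding_window_view(x, (p, p))[::s, ::s]
    return windows.max(axis=(-2, -1))

print(max_pool_vec(feature_map_relu, p=2, s=1))  # matches the loop: [[2, 1], [2, 2]]
print(max_pool_vec(feature_map_relu, p=2, s=2))  # (3 - 2) // 2 + 1 = 1, so [[2]]
```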


```python
# Row-major flatten: row 0, then row 1
flattened = pooled.flatten()
print(flattened)
```

```text
[2. 1. 2. 2.]
```
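The notebook stops at flattening. To close the loop with the Dense and Softmax sections, here is a sketch of the final step. The weights `W` and bias `b` are hypothetical, untrained values for 3 classes, so the resulting probabilities only illustrate the mechanics.

```python
import numpy as np

flattened = np.array([2., 1., 2., 2.])  # output of the cell above

# Hypothetical, untrained parameters for 3 classes
W = np.array([[0.5, -0.5, 0.0,  0.0],
              [0.0,  0.0, 1.0, -1.0],
              [0.25, 0.25, 0.25, 0.25]])
b = np.zeros(3)

logits = W @ flattened + b           # [0.5, 0.0, 1.75]
exp = np.exp(logits - logits.max())  # numerically stable softmax
probs = exp / exp.sum()

print(logits)
print(probs, probs.argmax())  # class 2 has the largest logit, so the highest probability
```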
