# 🧠 How a Convolutional Neural Network (CNN) Works

A CNN is a deep learning architecture designed to process grid-like data, especially images. Here's a breakdown of how a CNN works from start to finish:

---

## 🔹 1. Input Layer

- The input is typically an image (e.g., grayscale 28×28 pixels).
- Each pixel value \(p\) (an integer from 0 to 255) is scaled to the range [0, 1]:

$$
p_{\text{normalized}} = \frac{p}{255}
$$

---

## 🔹 2. Convolution Layer

- A small matrix called a **kernel** or **filter** slides over the image.
- At each position \((i, j)\), a dot product is calculated:

$$
S(i, j) = \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} I(i+m, j+n) \times K(m, n)
$$


- This operation detects features like edges, textures, etc.
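As a rough illustration of the sum above, the sketch below evaluates a single output value \(S(i, j)\) for two made-up 3×3 patches, using a vertical-edge kernel. The patch values are assumptions chosen for the example.

```python
import numpy as np

kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

# Hypothetical patches: one with uniform brightness, one with a vertical edge.
flat_patch = np.ones((3, 3))
edge_patch = np.array([[1, 1, 0],
                       [1, 1, 0],
                       [1, 1, 0]], dtype=float)  # bright on the left, dark on the right

# Multiply element-wise and sum, exactly as in the double sum for S(i, j).
flat_response = np.sum(flat_patch * kernel)
edge_response = np.sum(edge_patch * kernel)
print(flat_response, edge_response)
```

The uniform patch gives a response of 0 and the edge patch gives 3, which is why a large value in the feature map signals an edge at that position.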

---

## 🔹 3. Activation Function (ReLU)

- Applies a non-linear function:

$$
f(x) = \max(0, x)
$$

- Negative values become zero; positive values remain the same.
- Helps the network learn complex patterns.
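One way to see why the non-linearity matters: without it, two stacked linear layers are equivalent to a single linear layer. The sketch below uses small hand-picked matrices (`W1`, `W2`, and `x` are illustrative values, not taken from any trained model).

```python
import numpy as np

W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])
W2 = np.array([[1.0, 1.0]])
x = np.array([2.0, 1.0])

# Two linear layers with no activation collapse into one matrix product.
stacked_linear = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x

# Inserting ReLU between the layers breaks that equivalence.
with_relu = W2 @ np.maximum(0, W1 @ x)

print(stacked_linear, collapsed, with_relu)
```

Here the purely linear stack and its collapsed form both output 0, while the version with ReLU outputs 1, so the activation lets the network represent functions that no single linear layer can.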

---

## 🔹 4. Pooling Layer (Max Pooling)

- Reduces spatial size and keeps strong signals:

$$
P(i, j) = \max \{
x_{2i, 2j},\
x_{2i+1, 2j},\
x_{2i, 2j+1},\
x_{2i+1, 2j+1}
\}
$$

- Reduces computation and adds some tolerance to small shifts in the input; the smaller feature maps can also help limit overfitting.
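The 2×2 max-pooling formula can be sketched with a reshape, assuming the feature map has even height and width. The 4×4 input below is a made-up example.

```python
import numpy as np

feature_map = np.arange(16, dtype=float).reshape(4, 4)

# Split each axis into (block index, position inside block), then take the
# maximum over the two "position inside block" axes.
h, w = feature_map.shape
pooled = feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
print(pooled)
```

Each output value is the largest entry of one non-overlapping 2×2 block, so the 4×4 input shrinks to 2×2.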

---

## 🔹 5. Flattening

- Converts the 2D pooled feature maps into a 1D vector:

$$
\text{flattened} = [x_1, x_2, ..., x_n]
$$

---

## 🔹 6. Fully Connected Layer (Dense)

- Multiplies flattened input with weights and adds bias:

$$
\text{logits} = W \cdot \text{flattened} + b
$$

- Produces raw class scores (logits).
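A minimal sketch of the dense layer, assuming a hypothetical 4-value feature vector and 2 classes. `W` has one row per class, so the result has one logit per class.

```python
import numpy as np

flattened = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical flattened features
W = np.array([[1.0, 0.0, -1.0, 2.0],         # weights for class 0
              [0.5, 0.5, 0.5, 0.5]])         # weights for class 1
b = np.array([0.1, -0.1])                    # one bias per class

logits = W @ flattened + b
print(logits.shape, logits)
```

The logits are unbounded scores; they only become probabilities after the softmax step.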

---

## 🔹 7. Softmax Layer

- Converts logits into probabilities:

$$
\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}
$$

---

## 🔹 8. Loss Function (Cross-Entropy)

- Measures how well the prediction matches the true label:

$$
L = -\log(p_{\text{target}})
$$
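To get a feel for the scale of this loss, the sketch below evaluates it for a made-up two-class prediction, once when the confident class is the true label and once when it is not.

```python
import numpy as np

probs = np.array([0.1, 0.9])  # hypothetical softmax output for two classes

loss_if_target_is_1 = -np.log(probs[1])  # confident and correct
loss_if_target_is_0 = -np.log(probs[0])  # confident and wrong
print(round(loss_if_target_is_1, 3), round(loss_if_target_is_0, 3))
```

The loss is roughly 0.105 for the correct prediction and roughly 2.303 for the wrong one, so confident mistakes are penalized much more heavily.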

---

## 🔹 9. Backpropagation & Weight Update

- Computes the gradient of the loss with respect to each parameter and takes a gradient-descent step, where \(\alpha\) is the learning rate:
$$
W := W - \alpha \frac{\partial L}{\partial W}, \quad b := b - \alpha \frac{\partial L}{\partial b}
$$
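The update rule can be tried on a one-parameter toy problem. The loss \(L(w) = (w - 3)^2\) below is an assumption chosen because its minimum (at \(w = 3\)) is known; it is not part of the CNN itself.

```python
def grad(w):
    """Derivative of the toy loss L(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)

w = 0.0       # initial parameter value
alpha = 0.1   # learning rate
for _ in range(50):
    w = w - alpha * grad(w)  # the same update rule as for W and b

print(w)
```

Each step moves `w` a fraction of the way toward the minimum, so after 50 steps it is very close to 3.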

---

## ✅ Final Prediction

- The class with the highest probability is chosen as output.

---

# ✅ Why Use CNN?

- **Preserves spatial structure:** Unlike fully connected ANNs, which flatten the image into a vector, CNNs operate on local neighborhoods and so keep track of which pixels are next to each other.
- **Efficient with fewer parameters:** Thanks to local connectivity and weight sharing.
- **Automatic feature extraction:** CNNs learn to detect edges, textures, shapes without manual intervention.
- **Highly accurate in visual tasks:** Used in image classification, object detection, facial recognition, etc.
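The parameter savings from weight sharing can be checked with simple arithmetic. The layer sizes below (100 hidden units, 8 filters) are hypothetical values picked for the comparison.

```python
height, width = 28, 28
hidden_units = 100   # hypothetical dense layer width
num_filters = 8      # hypothetical number of 3x3 filters
kernel_size = 3

# Dense layer: every pixel connects to every hidden unit, plus one bias per unit.
dense_params = height * width * hidden_units + hidden_units

# Convolution layer: each filter reuses the same 3x3 weights at every position.
conv_params = num_filters * kernel_size * kernel_size + num_filters

print(dense_params, conv_params)
```

Under these assumptions the dense layer needs 78,500 parameters while the convolution layer needs 80.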

---

# 🔄 Why (or When) Use ANN Instead?

- **Use ANN when data is flat or tabular**, like:
  - Customer records
  - Stock market data
  - Sensor values
- **ANNs are simpler** and work well when the input doesn't have spatial/temporal structure.
- **Not suitable for images or sequences** — unless combined with CNNs or RNNs.

---



# CNN Working: Simple Example with a Single Convolution Kernel

This notebook demonstrates how a basic Convolutional Neural Network (CNN) works on a simple image using Python and NumPy.
No advanced libraries or math are required—just simple steps to understand the core ideas behind CNNs!

---

### 1. Load and Prepare the Image

We load an image, convert it to grayscale, resize it to 28x28 pixels, and normalize the pixel values to be between 0 and 1.


In [27]:
from PIL import Image
import numpy as np

# Load and prepare the image
image = Image.open("/content/tree.jpg").convert("L")  # Grayscale conversion
image = image.resize((28, 28))  # Resize image to 28x28
input_image = np.array(image, dtype=float) / 255.0  # Normalize pixels to [0, 1]

print(f"Input Image shape: {input_image.shape}")
print(f"Input Image sample data (5x5):\n{input_image[:5, :5]}")


Input Image shape: (28, 28)
Input Image sample data (5x5):
[[0.2627451  0.26666667 0.2627451  0.2627451  0.26666667]
 [0.27058824 0.2745098  0.27843137 0.27843137 0.28627451]
 [0.29019608 0.29019608 0.29019608 0.29411765 0.29803922]
 [0.30588235 0.30588235 0.30196078 0.30588235 0.30980392]
 [0.31764706 0.31764706 0.31372549 0.31764706 0.32156863]]


**Description**:
We start by loading a picture, converting it to grayscale (a single brightness value per pixel), and resizing it to a small 28x28 pixel image. The 8-bit pixel values (0 to 255) are then divided by 255 so they lie between 0 and 1, which keeps the numbers in later steps small. This small image will be the input for our CNN.

- The image is converted to grayscale and resized to 28×28 pixels.

- Each pixel value \( p \) is normalized:

$$
p_{\text{normalized}} = \frac{p}{255}
$$


### 2. Define Convolution Kernel
A fixed 3x3 kernel to detect vertical edges in the image.

In [28]:
kernel = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1]
], dtype=float)

print(f"\nConvolution Kernel:\n{kernel}")



Convolution Kernel:
[[ 1.  0. -1.]
 [ 1.  0. -1.]
 [ 1.  0. -1.]]


**Description**:
This small grid of numbers (called a kernel) acts like a filter. It will scan over the image to detect vertical edges: places where brightness changes sharply from left to right.



- This \(3 \times 3\) matrix is a filter to detect vertical edges.

- The kernel \( K \) looks like:

$$
K = \begin{bmatrix}
1 & 0 & -1 \\
1 & 0 & -1 \\
1 & 0 & -1
\end{bmatrix}
$$


### 3. Define Convolution Operation
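This function slides the kernel over the image and, at each position, sums the element-wise products of the kernel and the image patch beneath it. Strictly speaking this is cross-correlation, since the kernel is not flipped, but it is the operation deep learning libraries call convolution. No padding is used and the stride is 1, so a 28x28 input gives a 26x26 output.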

In [29]:
def convolve2d(img, kernel):
    kh, kw = kernel.shape  # Kernel height and width (3x3 here)
    out_h = img.shape[0] - kh + 1  # Output height (no padding, stride 1)
    out_w = img.shape[1] - kw + 1  # Output width
    out = np.zeros((out_h, out_w))  # Prepare empty space for result
    for i in range(out_h):
        for j in range(out_w):
            region = img[i:i+kh, j:j+kw]  # Grab a small part of the image
            out[i, j] = np.sum(region * kernel)  # Multiply and add to get one number

    print(f"\nAfter Convolution (shape {out.shape}): sample 5x5 values:\n{out[:5, :5]}")
    return out


**Description**:
This function slides the kernel over the image, one small section at a time. It multiplies the kernel values with the image's pixel values and sums them up to create a new, smaller image highlighting edges.



- The kernel slides over the image.

- For each position \((i, j)\), we compute:

$$
S(i, j) = \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} I(i+m, j+n) \times K(m, n)
$$

Where:

- \(I\) is the input image  
- \(K\) is the kernel  
- \(S\) is the resulting feature map (output)
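As a cross-check on the formula, the same sliding-window sum can be written without Python loops. This is a sketch, assuming NumPy 1.20 or newer for `sliding_window_view`; the function name `convolve2d_vectorized` and the random test image are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def convolve2d_vectorized(img, kernel):
    """Sliding-window sum of products (no padding, stride 1), without loops."""
    windows = sliding_window_view(img, kernel.shape)  # shape: (out_h, out_w, kh, kw)
    return np.einsum("ijmn,mn->ij", windows, kernel)

rng = np.random.default_rng(0)
test_image = rng.random((6, 6))  # random stand-in for a real image
test_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]], dtype=float)

fast = convolve2d_vectorized(test_image, test_kernel)

# Reference: evaluate S(i, j) directly from the double sum.
slow = np.array([[np.sum(test_image[i:i + 3, j:j + 3] * test_kernel)
                  for j in range(4)] for i in range(4)])

print(fast.shape, np.allclose(fast, slow))
```

Both versions compute the same feature map; the vectorized one is usually faster on larger images because the loop runs inside NumPy.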


### 4. Define ReLU Activation
ReLU (Rectified Linear Unit) sets all negative values to zero, adding non-linearity.

In [30]:
def relu(x):
    activated = np.maximum(0, x)
    print(f"\nAfter ReLU (shape {activated.shape}): sample 5x5 values:\n{activated[:5, :5]}")
    return activated


**Description**:
ReLU is a simple rule: if a number is negative, change it to zero; if it's positive, keep it. This helps the network focus on important features and ignore less useful information.

- Applies element-wise ReLU activation:

$$
f(x) = \max(0, x)
$$

- Negative values become zero; positive values stay the same.


### 5. Define Max Pooling
This downsamples the feature map by taking the maximum value in non-overlapping 2x2 windows.

In [31]:
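def max_pooling(x, size=2, stride=2):
    h, w = x.shape
    out_h = (h - size) // stride + 1  # Number of windows that fit vertically
    out_w = (w - size) // stride + 1  # Number of windows that fit horizontally
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride  # Top-left corner of the current window
            pooled[i, j] = np.max(x[r:r+size, c:c+size])
    print(f"\nAfter Max Pooling (shape {pooled.shape}): sample 5x5 values:\n{pooled[:5, :5]}")
    return pooled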


**Description:**
Max pooling shrinks the image by taking the largest value in small blocks. This reduces the image size while keeping the most important information, helping the network be faster and less sensitive to small changes.

- Max pooling reduces spatial size by taking the maximum value in each non-overlapping \(2 \times 2\) block:

$$
P(i, j) = \max \{
x_{2i, 2j},\
x_{2i+1, 2j},\
x_{2i, 2j+1},\
x_{2i+1, 2j+1}
\}
$$

- This reduces computation and keeps strong signals.


### 6. Define Softmax Function
Converts logits to probabilities summing to 1, for classification outputs.

In [32]:
def softmax(x):
    e_x = np.exp(x - np.max(x))
    sm = e_x / np.sum(e_x)
    print(f"\nSoftmax probabilities: {sm}")
    return sm


**Description:**
Softmax turns a list of numbers into probabilities that add up to 100%. It helps us decide which class (e.g., "tree" or "not tree") the image most likely belongs to.

- Converts raw scores (logits) into probabilities:

$$
\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}
$$

- Ensures probabilities sum to 1.
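Subtracting the maximum logit before exponentiating does not change the result, because the common factor cancels in the ratio, but it avoids overflow. The sketch below compares the two forms on deliberately large, made-up logits.

```python
import numpy as np

logits = np.array([1000.0, 1001.0, 1002.0])  # deliberately large scores

# Naive form: exp(1000) overflows to inf, and inf / inf is nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(logits) / np.sum(np.exp(logits))

# Shifted form: the largest exponent is exp(0) = 1, so nothing overflows.
shifted = np.exp(logits - np.max(logits))
stable = shifted / np.sum(shifted)

print(naive)
print(stable)
```

The naive version returns `nan` for every class, while the shifted version returns valid probabilities that sum to 1.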


### 7. Define Cross-Entropy Loss
Measures how far the predicted probabilities are from the true label.

In [33]:
def cross_entropy_loss(probs, target):
    loss = -np.log(probs[target] + 1e-8)
    print(f"Cross-Entropy Loss: {loss}")
    return loss


**Description:**
This loss measures how far off our prediction is from the truth. The smaller the loss, the better the prediction.

- Measures how well the predicted probability \( p_{\text{target}} \) matches the true label:

$$
L = -\log(p_{\text{target}})
$$

- Lower loss means better prediction.


### 8. Derivative of Cross-Entropy Loss
Needed for gradient calculation in backpropagation.

In [34]:
def cross_entropy_derivative(probs, target):
    grad = probs.copy()
    grad[target] -= 1
    return grad


**Description:**
This helps us figure out how to change the model’s settings (weights) to make the prediction better next time. Think of it as giving the model hints about what went wrong.



- Computes gradient for adjusting model weights:

$$
\frac{\partial L}{\partial z_i} = p_i - y_i
$$

- Where:
  - \( p_i \) is the predicted probability  
  - \( y_i \) is the true label (1 for correct class, 0 otherwise)
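The formula \(p_i - y_i\) can be sanity-checked against a finite-difference estimate of the gradient. The helper names `softmax_fn` and `loss_fn` and the logits below are assumptions for this sketch; the helpers repeat the softmax and loss definitions without the print statements.

```python
import numpy as np

def softmax_fn(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def loss_fn(z, target):
    return -np.log(softmax_fn(z)[target])

z = np.array([0.5, -1.0, 2.0])  # hypothetical logits
target = 2

# Analytic gradient: p - y, where y is the one-hot true label.
analytic = softmax_fn(z)
analytic[target] -= 1.0

# Numerical gradient: central differences, one logit at a time.
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(len(z)):
    step = np.zeros_like(z)
    step[i] = eps
    numeric[i] = (loss_fn(z + step, target) - loss_fn(z - step, target)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))
```

Agreement between the two gradients is a common check that a hand-written backward pass matches its forward pass.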


### 9. Initialize Training Parameters
Set a dummy label and learning rate.

In [35]:
label = 1  # Example label (e.g., class "tree")
learning_rate = 0.1


**Description:**
Here we set the correct answer (label) for training and how quickly the model should learn (learning rate).

### 10. Initial Forward Pass to Determine Sizes
Run through convolution, ReLU, and pooling once to determine the flattened feature vector size.




In [36]:
conv_out = convolve2d(input_image, kernel)
activated = relu(conv_out)
pooled = max_pooling(activated)
flattened = pooled.flatten()

print(f"\nAfter Flattening: shape {flattened.shape}")
print(f"Flattened vector sample (first 10 values): {flattened[:10]}")

fc_input_size = flattened.shape[0]

# Initialize fully connected layer weights and biases
np.random.seed(42)
fc_weights = np.random.randn(2, fc_input_size) * 0.01  # 2 classes, small random weights
fc_biases = np.zeros(2)



After Convolution (shape (26, 26)): sample 5x5 values:
[[-0.00784314 -0.00392157 -0.01960784 -0.02352941 -0.00392157]
 [-0.00392157 -0.00784314 -0.02352941 -0.01568627  0.00784314]
 [ 0.00784314 -0.00392157 -0.02352941 -0.01176471  0.00784314]
 [ 0.01176471  0.         -0.01568627 -0.00784314  0.00784314]
 [ 0.01176471  0.00784314 -0.00392157 -0.00392157  0.00392157]]

After ReLU (shape (26, 26)): sample 5x5 values:
[[0.         0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.00784314]
 [0.00784314 0.         0.         0.         0.00784314]
 [0.01176471 0.         0.         0.         0.00784314]
 [0.01176471 0.00784314 0.         0.         0.00392157]]

After Max Pooling (shape (13, 13)): sample 5x5 values:
[[0.         0.         0.00784314 0.         0.00392157]
 [0.01176471 0.         0.00784314 0.         0.00784314]
 [0.01176471 0.00392157 0.00392157 0.         0.08235294]
 [0.01960784 0.00784314 0.         0.10980392 0.22352941]


**Description**:
We run the image through the first steps to figure out how big the output will be. Then, we prepare a simple decision-making layer (fully connected layer) with small random starting settings.

- After convolution, activation, and pooling, flatten the 2D data to a 1D vector:

$$
\text{flattened} = [x_1, x_2, \dots, x_n]
$$

- Initialize weights \( W \) and biases \( b \) for a fully connected layer with 2 outputs (classes):

$$
\text{logits} = W \cdot \text{flattened} + b
$$
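The sizes involved can also be worked out by hand. The sketch below repeats that arithmetic for a 28×28 input, a 3×3 kernel with no padding, and 2×2 pooling with stride 2; the variable names are illustrative.

```python
image_size = 28
kernel_size = 3
pool_size, pool_stride = 2, 2
num_classes = 2

conv_size = image_size - kernel_size + 1                  # no padding, stride 1
pooled_size = (conv_size - pool_size) // pool_stride + 1  # non-overlapping 2x2 windows
flattened_size = pooled_size * pooled_size                # length of the 1D vector

num_weights = num_classes * flattened_size  # entries in W
num_biases = num_classes                    # entries in b
print(conv_size, pooled_size, flattened_size, num_weights, num_biases)
```

This gives 26, 13, and 169 for the three sizes, matching the shapes printed by the notebook, so \(W\) has 2 × 169 = 338 entries and \(b\) has 2.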


### 11. Training Loop (5 Epochs)
Forward pass → loss calculation → backward pass → parameter update.

In [37]:
for epoch in range(5):
    print(f"\n===== Epoch {epoch} =====")

    # Forward pass
    conv_out = convolve2d(input_image, kernel)
    activated = relu(conv_out)
    pooled = max_pooling(activated)
    flattened = pooled.flatten()
    print(f"Flattened input size: {flattened.shape}")

    logits = np.dot(fc_weights, flattened) + fc_biases
    print(f"Logits before softmax: {logits}")

    probs = softmax(logits)
    loss = cross_entropy_loss(probs, label)

    # Backward pass (gradient calculation)
    dL_dlogits = cross_entropy_derivative(probs, label)
    dL_dw = dL_dlogits[:, None] * flattened[None, :]  # Gradient for weights
    dL_db = dL_dlogits  # Gradient for biases

    # Update weights and biases using gradient descent
    fc_weights -= learning_rate * dL_dw
    fc_biases -= learning_rate * dL_db

    pred_class = np.argmax(probs)
    print(f"Predicted class: {pred_class} | Correct: {pred_class == label}")



===== Epoch 0 =====

After Convolution (shape (26, 26)): sample 5x5 values:
[[-0.00784314 -0.00392157 -0.01960784 -0.02352941 -0.00392157]
 [-0.00392157 -0.00784314 -0.02352941 -0.01568627  0.00784314]
 [ 0.00784314 -0.00392157 -0.02352941 -0.01176471  0.00784314]
 [ 0.01176471  0.         -0.01568627 -0.00784314  0.00784314]
 [ 0.01176471  0.00784314 -0.00392157 -0.00392157  0.00392157]]

After ReLU (shape (26, 26)): sample 5x5 values:
[[0.         0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.00784314]
 [0.00784314 0.         0.         0.         0.00784314]
 [0.01176471 0.         0.         0.         0.00784314]
 [0.01176471 0.00784314 0.         0.         0.00392157]]

After Max Pooling (shape (13, 13)): sample 5x5 values:
[[0.         0.         0.00784314 0.         0.00392157]
 [0.01176471 0.         0.00784314 0.         0.00784314]
 [0.01176471 0.00392157 0.00392157 0.         0.08235294]
 [0.01960784 0.00784314 0.         0.

**Description**:
This is the main learning loop: the model makes a prediction, measures its error, and adjusts its parameters over 5 rounds (epochs). Only the fully connected layer's weights and biases are updated; the convolution kernel stays fixed, and the same single image is used in every epoch.

- For each epoch (training step):

  - Calculate convolution → ReLU → pooling → flatten.

  - Compute logits:

    
$$
z = W \cdot x + b
$$
    

  - Compute softmax probabilities and loss.

  - Calculate gradients and update weights/biases:

    $$
    W := W - \alpha \frac{\partial L}{\partial W}, \quad b := b - \alpha \frac{\partial L}{\partial b}
    $$

    Where \( \alpha \) is the learning rate.

  - Predict class as the one with the highest probability.
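The steps above can be sketched as a compact, print-free loop that records the loss at each epoch. This is an illustration under stated assumptions: `features` is a random stand-in for the flattened feature vector, and the weights start at zero rather than at small random values.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.random(169) * 0.1  # stand-in for the flattened feature vector
target = 1
alpha = 0.1

W = np.zeros((2, features.size))  # assumption: zero initialization for simplicity
b = np.zeros(2)
losses = []

for epoch in range(5):
    z = W @ features + b                 # logits
    p = np.exp(z - np.max(z))
    p /= np.sum(p)                       # softmax probabilities
    losses.append(-np.log(p[target]))    # cross-entropy loss

    dz = p.copy()
    dz[target] -= 1.0                    # dL/dz = p - y
    W -= alpha * np.outer(dz, features)  # dL/dW is the outer product of dz and x
    b -= alpha * dz                      # dL/db = dz

print(np.round(losses, 4))
```

With zero weights the first loss is \(\ln 2 \approx 0.693\) (both classes equally likely), and it shrinks at every epoch because each update raises the logit of the target class relative to the other.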


### Summary
This notebook shows a simplified version of how a CNN processes an image step-by-step:

- It extracts edges with convolution.

- Highlights important parts with ReLU.

- Shrinks data with pooling.

- Makes a prediction using a simple classifier.

- Learns from mistakes through repeated training.

This helps beginners understand the building blocks of CNNs in a clear and intuitive way.