# What is a Convolutional Neural Network (CNN)?

A CNN is a neural network that learns **spatial features** by sliding small filters (kernels) over its input (images, sequences). Stacked layers capture increasingly abstract local patterns (edges → textures → parts → objects).
Rule of thumb: “Convolve → activate → pool (repeat), then classify.”

Think image recognition: early filters detect edges; deeper filters detect parts and objects.

---

# Variables and notation

- Input image: $$X \in \mathbb{R}^{H\times W\times C}$$ (height, width, channels)
- Convolution kernel: $$K \in \mathbb{R}^{k\times k\times C}$$, number of filters $$F$$
- Stride $$s$$, padding $$p$$
- Output feature map (same padding): height $$H' = \lceil H/s \rceil$$, width $$W' = \lceil W/s \rceil$$, channels $$F$$

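Convolution at location $$(i,j)$$ for filter $$f$$, written for stride 1 (for stride $$s$$, index the input at $$X[s\,i+u,\, s\,j+v,\, c]$$): $$Y[i,j,f] = \sum_{u,v,c} K_f[u,v,c] \cdot X[i+u, j+v, c] + b_f$$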
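
The sum above can be written out directly in NumPy. This is a minimal sketch, assuming stride 1 and no padding; the function name `conv2d_naive` and the filter layout `(F, k, k, C)` are choices made for this illustration, not part of any library.

```python
import numpy as np

def conv2d_naive(X, K, b):
    """Direct translation of Y[i,j,f] = sum_{u,v,c} K_f[u,v,c] * X[i+u,j+v,c] + b_f.

    X: (H, W, C) input, K: (F, k, k, C) filters, b: (F,) biases.
    Assumes stride 1 and no padding, so the output is (H-k+1, W-k+1, F).
    """
    H, W, C = X.shape
    F, k, _, _ = K.shape
    Y = np.zeros((H - k + 1, W - k + 1, F))
    for f in range(F):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                window = X[i:i + k, j:j + k, :]       # k x k x C patch at (i, j)
                Y[i, j, f] = np.sum(K[f] * window) + b[f]
    return Y

X = np.ones((5, 5, 3))       # toy input: all ones
K = np.ones((2, 3, 3, 3))    # two 3x3x3 filters of ones
b = np.array([0.0, 1.0])
Y = conv2d_naive(X, K, b)
print(Y.shape)               # (3, 3, 2)
```

Each output value here is 27 (the 3·3·3 window sum) plus the filter's bias. Real frameworks compute the same quantity with far faster vectorized kernels.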

---

# Typical block

1) Conv2D(filters=F, kernel_size=k)  
2) Activation (ReLU)  
3) (BatchNorm)  
4) Pooling (MaxPool2D) to downsample  
Repeat ×N → Flatten/GlobalAveragePool → Dense → Softmax

Data augmentation (random flips/crops/rotations) acts as a regularizer: the network sees a different variant of each image every epoch, which makes memorizing the training set harder.
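
To make the idea concrete, here is a small NumPy sketch of two of those augmentations (random horizontal flip and random crop). The helper `random_flip_and_crop` is hypothetical and written only for this illustration; in practice a framework utility would do this work.

```python
import numpy as np

def random_flip_and_crop(img, crop, rng):
    """Randomly flip an (H, W, C) image left-right, then take a random crop x crop window."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]                  # horizontal flip
    H, W, _ = img.shape
    top = rng.integers(0, H - crop + 1)        # upper bound is exclusive
    left = rng.integers(0, W - crop + 1)
    return img[top:top + crop, left:left + crop, :]

rng = np.random.default_rng(0)
img = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)
views = [random_flip_and_crop(img, crop=6, rng=rng) for _ in range(4)]
print([v.shape for v in views])   # every view is 6 x 6 x 3
```

The label stays the same for every view, so each call yields a new training example for free.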

---

# Output size formula (general padding and stride)

$$
H' = \left\lfloor \frac{H - k + 2p}{s} \right\rfloor + 1, \quad
W' = \left\lfloor \frac{W - k + 2p}{s} \right\rfloor + 1
$$

For “valid” padding, $$p=0$$. For “same” padding with $$s=1$$ and odd $$k$$, $$p=(k-1)/2$$ gives $$H'=H, W'=W$$.
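
The formula above is easy to check numerically. A minimal sketch, with `conv_output_size` as a hypothetical helper name:

```python
def conv_output_size(n, k, s=1, p=0):
    """Output length along one axis: floor((n - k + 2p) / s) + 1."""
    return (n - k + 2 * p) // s + 1

print(conv_output_size(64, 3))            # no padding (p=0), stride 1 -> 62
print(conv_output_size(64, 3, s=1, p=1))  # p=(k-1)/2 keeps the size -> 64
print(conv_output_size(62, 2, s=2))       # 2x2 pooling window, stride 2 -> 31
```

The same function works for pooling layers, since a pooling window follows the same sliding arithmetic as a kernel.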

---

# Step‑by‑step tiny example

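5×5 grayscale patch, 3×3 edge detector kernel:
- Slide the kernel over every 3×3 window (stride 1, no padding); each multiply‑sum gives one value of a 3×3 feature map, which is large where an edge is present.
- 2×2 max pooling (stride 2) roughly halves both H and W, which gives some invariance to small shifts. On the 3×3 map here it leaves a single value.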
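
The two steps can be reproduced in a few lines of NumPy. The patch values (dark left, bright right) and the vertical-edge kernel below are assumptions chosen for illustration.

```python
import numpy as np

patch = np.tile(np.array([0, 0, 1, 1, 1], dtype=float), (5, 1))  # dark left, bright right
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)                  # vertical-edge detector

# Valid convolution (stride 1): 5x5 input, 3x3 kernel -> 3x3 feature map
fmap = np.array([[np.sum(patch[i:i + 3, j:j + 3] * kernel) for j in range(3)]
                 for i in range(3)])
print(fmap)

# 2x2 max pooling, stride 2: only one full window fits in a 3x3 map
pooled = np.array([[fmap[i:i + 2, j:j + 2].max()
                    for j in range(0, fmap.shape[1] - 1, 2)]
                   for i in range(0, fmap.shape[0] - 1, 2)])
print(pooled)
```

The feature map is 3 wherever the window straddles the dark-to-bright boundary and 0 in the uniform region, and pooling keeps the strongest response.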

---

# Pseudocode

```
x = input_image
for block in conv_blocks:
  x = conv2d(x, filters=F, k=k, stride=s, padding=p)
  x = relu(x)
  if use_bn: x = batch_norm(x)
  x = max_pool(x, size=2)
x = flatten(x)  # or: global_avg_pool(x)
logits = dense(x, units=num_classes)  # one logit per class (not the channel count C)
y_hat = softmax(logits)
loss = cross_entropy(y_hat, y)
opt.step(backprop(loss))
```
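
The last steps of the pseudocode (softmax, then cross-entropy) can be sketched in plain Python for a single example. The logit values are made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[label])

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
loss = cross_entropy(probs, label=0)
print([round(p, 3) for p in probs], round(loss, 3))
```

The loss shrinks toward 0 as the probability of the true class approaches 1, and grows without bound as it approaches 0.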

---

# Minimal Python (ready to paste – Keras)

```python
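from tensorflow.keras import layers as L, models, optimizers, callbacks

# Placeholder values: set these to match your data.
H, W, C = 64, 64, 3      # input height, width, channels
NUM_CLASSES = 10         # number of target classes (distinct from the channel count C)

model = models.Sequential([
    L.Input(shape=(H, W, C)),
    L.Conv2D(32, (3,3), activation="relu", padding="same"),
    L.MaxPool2D((2,2)),
    L.Conv2D(64, (3,3), activation="relu", padding="same"),
    L.MaxPool2D((2,2)),
    L.Flatten(),
    L.Dense(128, activation="relu"),
    L.Dropout(0.3),
    L.Dense(NUM_CLASSES, activation="softmax"),
])

# sparse_categorical_crossentropy expects integer class labels (not one-hot vectors).
model.compile(optimizer=optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Stop once val_loss has not improved for 8 epochs and restore the best weights.
es = callbacks.EarlyStopping(patience=8, restore_best_weights=True)
# train_ds / val_ds are your own datasets yielding (images, labels) batches.
model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=[es])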
```
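
The `patience` argument used above is easier to tune once the stopping rule is spelled out. The following is a simplified sketch of patience-based early stopping, not the Keras implementation; `early_stopping_epoch` is a hypothetical helper and the loss values are invented.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0      # improvement: reset the counter
        else:
            wait += 1                 # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

history = [1.0, 0.8, 0.9, 0.85, 0.95]
print(early_stopping_epoch(history, patience=2))   # 3
```

With `restore_best_weights=True`, the model would then be rolled back to the weights from the best epoch (epoch 1 in this toy history).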

---

# Practical tips, pitfalls, and variants

- Normalize inputs (per‑channel mean/std); verify the channel order (Keras defaults to channels‑last, NHWC).
- Start small; confirm the model can overfit a tiny subset before scaling up.
- Use augmentation (flip/crop/rotate/color jitter) and dropout.
- Watch the learning rate and its schedule: too high → divergence, too low → slow convergence.
- Memory: tune batch size and the number of feature maps; consider depthwise separable convs.
- Variants: ResNet (residual connections), DenseNet, MobileNet.
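
To see why depthwise separable convolutions save memory, compare weight counts for one layer. This is a back-of-the-envelope sketch that ignores biases; the layer sizes are arbitrary examples.

```python
def standard_conv_weights(k, c_in, f):
    """Each of the f filters has k*k*c_in weights."""
    return k * k * c_in * f

def separable_conv_weights(k, c_in, f):
    """One k x k depthwise filter per input channel, then a 1x1 pointwise conv to f channels."""
    return k * k * c_in + c_in * f

k, c_in, f = 3, 64, 128
print(standard_conv_weights(k, c_in, f))    # 73728
print(separable_conv_weights(k, c_in, f))   # 8768
```

For this layer the separable version needs roughly 8× fewer weights, which is the trade MobileNet-style architectures make.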

---

# How this notebook implements it

- Dataset: images loaded from class‑labelled folders with Keras `ImageDataGenerator.flow_from_directory`.
- Steps: conv‑blocks → dense → compile (Adam + binary cross‑entropy) → train with augmentation → evaluate.
- Tip: visualize filters and feature maps to debug.

---

# Quick cheat sheet

- [Conv→ReLU→(BN)→Pool] ×2–3 → Dense → Softmax
- Normalize inputs; use augmentation
- Early stopping; tune LR and batch size


# Convolutional Neural Network

### Importing the libraries

In [0]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [0]:
tf.__version__

## Part 1 - Data Preprocessing

### Preprocessing the Training set

In [0]:
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)
training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

### Preprocessing the Test set

In [0]:
test_datagen = ImageDataGenerator(rescale = 1./255)
test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')

## Part 2 - Building the CNN

### Initialising the CNN

In [0]:
cnn = tf.keras.models.Sequential()

### Step 1 - Convolution

In [0]:
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu', input_shape=[64, 64, 3]))

### Step 2 - Pooling

In [0]:
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

### Adding a second convolutional layer

In [0]:
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu'))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

### Step 3 - Flattening

In [0]:
cnn.add(tf.keras.layers.Flatten())
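
It helps to trace the tensor shape through the layers added so far. The sketch below assumes the Keras defaults used in this notebook (`Conv2D` with no padding and stride 1); `valid_out` is a hypothetical helper.

```python
def valid_out(n, k, s=1):
    """Output length along one axis with no padding: floor((n - k) / s) + 1."""
    return (n - k) // s + 1

size, channels = 64, 3
size, channels = valid_out(size, 3), 32   # Conv2D(32, 3)    -> 62 x 62 x 32
size = valid_out(size, 2, s=2)            # MaxPool2D(2, 2)  -> 31 x 31 x 32
size, channels = valid_out(size, 3), 32   # Conv2D(32, 3)    -> 29 x 29 x 32
size = valid_out(size, 2, s=2)            # MaxPool2D(2, 2)  -> 14 x 14 x 32
flat = size * size * channels
print(size, channels, flat)               # 14 32 6272
```

So `Flatten` hands a 6272-element vector to the first dense layer; `cnn.summary()` should report the same shapes.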

### Step 4 - Full Connection

In [0]:
cnn.add(tf.keras.layers.Dense(units=128, activation='relu'))

### Step 5 - Output Layer

In [0]:
cnn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

## Part 3 - Training the CNN

### Compiling the CNN

In [0]:
cnn.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
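
For a single sigmoid output, `binary_crossentropy` reduces to a short formula. A plain-Python sketch for one example, with an invented logit value:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(p, y):
    """Loss for one example: y is the 0/1 label, p the predicted P(class 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

p = sigmoid(2.0)   # fairly confident prediction of class 1
print(round(p, 3),
      round(binary_cross_entropy(p, 1), 3),   # small loss when the label is 1
      round(binary_cross_entropy(p, 0), 3))   # large loss when the label is 0
```

A confident prediction is cheap when it is right and expensive when it is wrong, which is what pushes the sigmoid output toward the true label during training.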

### Training the CNN on the Training set and evaluating it on the Test set

In [0]:
cnn.fit(x = training_set, validation_data = test_set, epochs = 25)

## Part 4 - Making a single prediction

In [0]:
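import numpy as np
from tensorflow.keras.preprocessing import image
test_image = image.load_img('dataset/single_prediction/cat_or_dog_1.jpg', target_size = (64, 64))
test_image = image.img_to_array(test_image) / 255.0  # same 1/255 rescaling as the training generator
test_image = np.expand_dims(test_image, axis = 0)    # add the batch dimension: (1, 64, 64, 3)
result = cnn.predict(test_image)
print(training_set.class_indices)  # check which class maps to 1 before hardcoding labels
# The sigmoid output is a probability, so threshold it instead of testing == 1.
if result[0][0] > 0.5:
  prediction = 'dog'   # assumes class_indices maps the dog class to 1
else:
  prediction = 'cat'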

In [0]:
print(prediction)
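
Hardcoding the labels is fragile if the folder names change. One alternative is to derive the label from the generator's `class_indices` dictionary. This is a sketch: `label_from_probability` is a hypothetical helper, and the dictionary below is an assumed example of what `training_set.class_indices` might contain.

```python
def label_from_probability(prob, class_indices, threshold=0.5):
    """Map a sigmoid output to a class name using a name -> index dictionary."""
    index_to_name = {idx: name for name, idx in class_indices.items()}
    return index_to_name[int(prob > threshold)]

class_indices = {"cats": 0, "dogs": 1}   # assumed; check training_set.class_indices
print(label_from_probability(0.93, class_indices))   # dogs
print(label_from_probability(0.12, class_indices))   # cats
```

In the notebook this would be called as `label_from_probability(result[0][0], training_set.class_indices)`.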