**Implementing ConvNets with Keras:**
- First, load and preprocess a couple of sample images using scikit-learn's load_sample_images() function and Keras's CenterCrop and Rescaling layers.

In [1]:
from sklearn.datasets import load_sample_images
import tensorflow as tf

In [2]:
images = load_sample_images()["images"]
images = tf.keras.layers.CenterCrop(height=70, width=120)(images)
images = tf.keras.layers.Rescaling(scale=1/255)(images)

In [3]:
images.shape

TensorShape([2, 70, 120, 3])

In [4]:
conv_layer = tf.keras.layers.Conv2D(filters=32, kernel_size=7)

In [5]:
fmaps = conv_layer(images)

In [6]:
fmaps.shape

TensorShape([2, 64, 114, 32])
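
The 64 x 114 spatial size above follows from Conv2D's defaults (strides=1, padding="valid"): with no padding, a 7x7 kernel fits in only 70 - 7 + 1 = 64 vertical positions and 120 - 7 + 1 = 114 horizontal positions. Below is a minimal pure-Python sketch of that arithmetic. The helper name conv_output_size is hypothetical and not part of Keras.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Spatial output size of a conv (or pooling) layer along one axis."""
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (input_size - kernel_size) // stride + 1
    if padding == "same":
        # Zero padding is added so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    raise ValueError(f"unknown padding: {padding!r}")

height = conv_output_size(70, kernel_size=7)
width = conv_output_size(120, kernel_size=7)
print(height, width)  # 64 114

same_height = conv_output_size(70, kernel_size=7, padding="same")
print(same_height)  # 70
```

By the same arithmetic, passing padding="same" to Conv2D (with the default stride of 1) would keep the feature maps at 70 x 120.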

Just like a Dense layer, a Conv2D layer holds all of its weights, including the kernels and the biases:

In [7]:
kernels, biases = conv_layer.get_weights()

In [8]:
kernels.shape

(7, 7, 3, 32)

In [9]:
biases.shape

(32,)
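
These shapes determine the layer's parameter count. The kernels array has shape [kernel_height, kernel_width, input_channels, filters], so each of the 32 filters has 7 x 7 x 3 weights plus one bias. A quick check of the arithmetic in plain Python (the variable names are illustrative):

```python
kernel_height, kernel_width = 7, 7
input_channels, filters = 3, 32

kernel_params = kernel_height * kernel_width * input_channels * filters
bias_params = filters  # one bias per filter
total_params = kernel_params + bias_params
print(kernel_params, bias_params, total_params)  # 4704 32 4736
```

This total should match what conv_layer.count_params() reports for the layer built above.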

Pooling layers:
- The goal is to subsample (i.e., shrink) the input image in order to reduce the computational load, the memory usage, and the number of parameters.
- A max pooling layer also introduces some level of invariance to small translations. By inserting a max pooling layer every few layers in a CNN, it is possible to get some level of translation invariance at a larger scale. Moreover, max pooling offers a small amount of rotational invariance and a slight scale invariance. Such invariance, even if it is limited, can be useful in cases where the prediction should not depend on these details.
- However, max pooling has some downsides too. It is very destructive: even with a tiny 2x2 kernel and a stride of 2, the output is 2 times smaller in both directions, so its area is 4 times smaller and 75% of the input values are dropped.
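
Both the shrinking and the limited translation invariance can be seen on a toy example. The sketch below is a pure-Python stand-in for 2x2 max pooling with stride 2 and no padding. The helper max_pool_2d and the toy images are hypothetical illustrations, not how Keras implements the layer.

```python
def max_pool_2d(image, pool_size=2):
    """Max pooling over a 2D list, with stride equal to pool_size and no padding."""
    out_height = len(image) // pool_size
    out_width = len(image[0]) // pool_size
    return [
        [
            max(
                image[row * pool_size + dr][col * pool_size + dc]
                for dr in range(pool_size)
                for dc in range(pool_size)
            )
            for col in range(out_width)
        ]
        for row in range(out_height)
    ]

original = [[9, 0, 0, 0],
            [0, 0, 0, 0],
            [5, 0, 0, 0],
            [0, 0, 0, 0]]

# Same image shifted one pixel to the right: each value stays in its pooling window.
shifted_by_1 = [[0, 9, 0, 0],
                [0, 0, 0, 0],
                [0, 5, 0, 0],
                [0, 0, 0, 0]]

# Shifted two pixels to the right: each value moves into the next pooling window.
shifted_by_2 = [[0, 0, 9, 0],
                [0, 0, 0, 0],
                [0, 0, 5, 0],
                [0, 0, 0, 0]]

print(max_pool_2d(original))      # [[9, 0], [5, 0]]
print(max_pool_2d(shifted_by_1))  # [[9, 0], [5, 0]]
print(max_pool_2d(shifted_by_2))  # [[0, 9], [0, 5]]
```

The 4x4 input becomes a 2x2 output (12 of the 16 values are discarded), and the one-pixel shift is invisible after pooling while the two-pixel shift is not, which is why the invariance is only partial.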

Implementing Pooling 

In [10]:
# following code creates a MaxPooling2D layer - alias MaxPool2D
# using a 2x2 kernel - strides default to kernel size - so this layer uses a stride of 2
# by default it uses valid padding (i.e. no padding at all)

max_pool = tf.keras.layers.MaxPool2D(pool_size=2)

Max pooling is generally preferred to average pooling because it preserves only the strongest features and gets rid of the meaningless ones, so the next layers get a cleaner signal to work with. Moreover, max pooling offers stronger translation invariance than average pooling, and it requires slightly less compute.

One last type of pooling layer is the global average pooling layer. All it does is compute the mean of each entire feature map (it's like an average pooling layer using a pooling kernel with the same spatial dimensions as the inputs). This means that it outputs just a single number per feature map and per instance. Although this is extremely destructive (most of the information in the feature map is lost), it can be useful just before the output layer, as seen later. To create such a layer:

In [11]:
global_avg_pool = tf.keras.layers.GlobalAvgPool2D()

It's equivalent to the following Lambda layer, which computes the mean over the spatial dimensions (height and width):

In [12]:
global_avg_pool = tf.keras.layers.Lambda(
    lambda X: tf.reduce_mean(X, axis=[1,2])
)

For example, if we apply this layer to the input images - we get the mean intensity of red, green and blue for each image:

In [14]:
images.shape

TensorShape([2, 70, 120, 3])

In [15]:
global_avg_pool(images)

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[0.64338624, 0.5971759 , 0.5824972 ],
       [0.76306933, 0.2601113 , 0.10849128]], dtype=float32)>

**CNN Architectures:**
- Typical CNN architectures stack a few conv layers (each one followed by a ReLU layer), then a pooling layer, and then another few conv layers (+ReLU). The image gets smaller as it progresses through the network, but it also gets deeper (i.e., it has more feature maps).
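- A common pitfall is using conv kernels that are too large. For example, instead of using one conv layer with a 5x5 kernel, stack two layers with 3x3 kernels: this uses fewer parameters and requires less compute.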
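
To put numbers on the kernel-size point: two stacked 3x3 layers see the same 5x5 region of their input as a single 5x5 layer, but with fewer weights. A sketch of the count, assuming (purely for illustration) 64 input channels and 64 filters per layer. The helper conv_params is hypothetical.

```python
def conv_params(kernel_size, in_channels, filters):
    """Number of trainable parameters in a Conv2D layer (kernels + biases)."""
    return kernel_size * kernel_size * in_channels * filters + filters

channels = 64  # illustrative choice, not taken from the text

one_5x5 = conv_params(5, channels, channels)
two_3x3 = 2 * conv_params(3, channels, channels)
print(one_5x5, two_3x3)  # 102464 73856
```

The stacked version also applies an extra activation between the two layers, so it gains a nonlinearity while using roughly 28% fewer parameters in this setting.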

In [17]:
from functools import partial

DefaultConv2D = partial(tf.keras.layers.Conv2D, kernel_size=3, padding="same",
                        activation="relu", kernel_initializer="he_normal")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=[28,28,1]),
    DefaultConv2D(filters=64, kernel_size=7),
    tf.keras.layers.MaxPool2D(),
    DefaultConv2D(filters=128),
    DefaultConv2D(filters=128),
    tf.keras.layers.MaxPool2D(),
    DefaultConv2D(filters=256),
    DefaultConv2D(filters=256),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units=128, activation="relu",kernel_initializer="he_normal"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(units=64, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(units=10, activation="softmax")
])

In [18]:
model.summary()
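
The shapes that model.summary() reports can be worked out by hand. Every conv layer in this model uses padding="same" with the default stride of 1, so it preserves height and width, while each MaxPool2D (default pool_size=2, valid padding) halves them, rounding down. A plain-Python sketch of that trace (the variable names are illustrative):

```python
size = 28      # input height and width
channels = 1   # grayscale input

# (layer kind, number of filters) in the order used by the model above
layers = [("conv", 64), ("pool", None),
          ("conv", 128), ("conv", 128), ("pool", None),
          ("conv", 256), ("conv", 256), ("pool", None)]

for kind, filters in layers:
    if kind == "conv":
        channels = filters   # "same" padding, stride 1: spatial size unchanged
    else:
        size = size // 2     # 2x2 max pooling, stride 2, no padding
    print(f"{kind:4s} -> {size} x {size} x {channels}")

flatten_units = size * size * channels
print(flatten_units)  # 2304
```

So the first Dense layer receives 2,304 inputs. Note that the last pooling layer maps 7x7 to 3x3, which means the final row and column of its input are ignored.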
