<a href="https://colab.research.google.com/github/danielanezd/machine-learning-fundamentals/blob/main/ConvolutionalNeuralNetwork_Keras.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Dependencies
from sklearn.datasets import load_sample_images
import tensorflow as tf

In [None]:
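# Load scikit-learn's two sample images, keep a 70x120 center crop, and scale pixels from [0, 255] to [0, 1]
images = load_sample_images()["images"]
images = tf.keras.layers.CenterCrop(height=70, width=120)(images)
images = tf.keras.layers.Rescaling(scale=1 / 255)(images)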

In [None]:
# Tensor dimensions: (batch size, height, width, channels)
images.shape

TensorShape([2, 70, 120, 3])

In [None]:
# Create a 2D convolutional layer: 32 filters of size 7x7, "same" padding keeps height and width
conv_layer = tf.keras.layers.Conv2D(filters=32, kernel_size=7, padding="same")
fmaps = conv_layer(images)
fmaps.shape

TensorShape([2, 70, 120, 32])
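Because `padding="same"` with the default stride of 1 pads the input so that every position gets a full 7×7 window, the height and width stay at 70×120 and only the depth changes (3 → 32). A minimal sketch of the per-dimension output-size rule, written as a hypothetical helper `conv_output_size` (not part of Keras):

```python
import math

def conv_output_size(n, kernel_size, stride=1, padding="same"):
    """Output length along one spatial dimension (hypothetical helper)."""
    if padding == "same":
        # Zero padding is added so that only the stride shrinks the output
        return math.ceil(n / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input
        return (n - kernel_size) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")

# The 70x120 crops with a 7x7 kernel, as in the cell above
print(conv_output_size(70, 7), conv_output_size(120, 7))
print(conv_output_size(70, 7, padding="valid"), conv_output_size(120, 7, padding="valid"))
print(conv_output_size(70, 7, stride=2), conv_output_size(120, 7, stride=2))
```

With `padding="valid"` the same layer would instead produce feature maps of shape (2, 64, 114, 32), and a stride of 2 with "same" padding would halve each side to 35×60.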

In [None]:
# Layer weights as NumPy arrays via get_weights() (conv_layer.weights holds the same values as tf.Variable objects)
kernels, biases = conv_layer.get_weights()
kernels.shape  # (kernel_height, kernel_width, input_channels, filters)

(7, 7, 3, 32)

In [None]:
biases.shape  # one bias term per filter

(32,)
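The kernel tensor is laid out as (kernel height, kernel width, input channels, filters), plus one bias per filter. A short sketch of the resulting parameter count, which should match what `conv_layer.count_params()` reports for this layer:

```python
import math

# Shapes reported above for the Conv2D layer
kernel_shape = (7, 7, 3, 32)  # (kernel_height, kernel_width, in_channels, filters)
bias_shape = (32,)            # one bias per filter

n_kernel = math.prod(kernel_shape)  # 7 * 7 * 3 * 32 = 4704
n_bias = math.prod(bias_shape)      # 32
total_params = n_kernel + n_bias    # 4736
print(total_params)
```

The count does not depend on the 70×120 image size: the same 4,736 weights are shared across every position of the input.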

In [11]:
# Pooling layers: 2x2 max and average pooling (strides default to pool_size) and global average pooling
max_pool = tf.keras.layers.MaxPool2D(pool_size=2)
avg_pool = tf.keras.layers.AvgPool2D(pool_size=2)
global_avrg_pool = tf.keras.layers.GlobalAvgPool2D()

In [10]:
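# Depthwise pooling: keep the max over each group of `pool_size` consecutive channels
# (the number of channels must be divisible by pool_size)
class DepthPool(tf.keras.layers.Layer):
    def __init__(self, pool_size, **kwargs):
        super().__init__(**kwargs)
        self.pool_size = pool_size

    def call(self, inputs):
        shape = tf.shape(inputs)  # shape[-1] is the number of channels
        groups = shape[-1] // self.pool_size  # number of channel groups
        # Split the channel axis into (groups, pool_size), then reduce each group
        new_shape = tf.concat([shape[:-1], [groups, self.pool_size]], axis=0)
        return tf.reduce_max(tf.reshape(inputs, new_shape), axis=-1)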
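Depthwise pooling works across channels instead of across space: the channel axis is split into groups of `pool_size` consecutive channels and only the maximum of each group is kept, so the depth shrinks by a factor of `pool_size` while height and width are unchanged. The NumPy sketch below mirrors that reshape-then-max logic on a toy tensor; `depth_max_pool` is a hypothetical helper and the array values are arbitrary.

```python
import numpy as np

def depth_max_pool(inputs, pool_size):
    """NumPy sketch of depthwise max pooling over groups of `pool_size` channels."""
    *spatial, channels = inputs.shape
    if channels % pool_size != 0:
        raise ValueError("number of channels must be divisible by pool_size")
    groups = channels // pool_size
    # (..., channels) -> (..., groups, pool_size), then take the max of each group
    return inputs.reshape(*spatial, groups, pool_size).max(axis=-1)

# One 1x1 "image" with 6 channels: groups [1, 9, 3] and [4, 8, 2]
x = np.array([[[[1, 9, 3, 4, 8, 2]]]])
print(depth_max_pool(x, pool_size=3))  # max of each group: 9 and 8
```

On the 32-channel feature maps from the convolution cell, a pool size of 2 would halve the depth: (2, 70, 120, 32) → (2, 70, 120, 16).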

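Pooling layers are destructive: each output value keeps only the result of the aggregation function (the maximum or the mean) over its window, and everything else in that window is discarded. With `pool_size=2`, every 2×2 window of four values is reduced to one, so 75% of the values are dropped.

The loss is even larger with a global pooling layer (the last example), which reduces each entire feature map to a single value per channel.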
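To see how much each layer discards, here is a minimal NumPy sketch of 2×2 max pooling, 2×2 average pooling (stride 2, the Keras default when `strides` is not given) and global average pooling on a single 4×4 channel. The helper `pool2x2` is hypothetical and assumes even height and width; it reproduces the windowing of `MaxPool2D(pool_size=2)` and `AvgPool2D(pool_size=2)` but is not the Keras implementation.

```python
import numpy as np

def pool2x2(x, reduce):
    """2x2 pooling with stride 2 on a (height, width) array with even sides."""
    h, w = x.shape
    # Split each axis into (blocks, 2) so every 2x2 window gets its own two axes
    windows = x.reshape(h // 2, 2, w // 2, 2)
    return reduce(windows, axis=(1, 3))

x = np.arange(16, dtype=np.float32).reshape(4, 4)
max_pooled = pool2x2(x, np.max)   # [[5, 7], [13, 15]]
avg_pooled = pool2x2(x, np.mean)  # [[2.5, 4.5], [10.5, 12.5]]
global_avg = x.mean()             # 7.5: one value for the whole channel
print(max_pooled, avg_pooled, global_avg, sep="\n")
```

Sixteen input values become four after either 2×2 pooling and one after global pooling. The same halving of each spatial axis is why `max_pool(images)` and `avg_pool(images)` return shape (2, 35, 60, 3) for the (2, 70, 120, 3) input, while the channel count is untouched.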

In [15]:
max_pool(images)

<tf.Tensor: shape=(2, 35, 60, 3), dtype=float32, numpy=
array([[[[0.4901961 , 0.54901963, 0.5686275 ],
         [0.3647059 , 0.47450984, 0.4784314 ],
         [0.24705884, 0.39607847, 0.3529412 ],
         ...,
         [0.90196085, 0.8980393 , 0.9176471 ],
         [0.9058824 , 0.90196085, 0.9215687 ],
         [0.9058824 , 0.9058824 , 0.9215687 ]],

        [[0.48627454, 0.4039216 , 0.46274513],
         [0.26666668, 0.27058825, 0.28627452],
         [0.43137258, 0.5137255 , 0.49411768],
         ...,
         [0.90196085, 0.90196085, 0.909804  ],
         [0.90196085, 0.90196085, 0.909804  ],
         [0.9058824 , 0.9058824 , 0.91372555]],

        [[0.32156864, 0.30588236, 0.23137257],
         [0.3254902 , 0.29803923, 0.19607845],
         [0.36078432, 0.30980393, 0.27450982],
         ...,
         [0.90196085, 0.90196085, 0.909804  ],
         [0.90196085, 0.90196085, 0.909804  ],
         [0.9058824 , 0.9058824 , 0.91372555]],

        ...,

        [[0.5137255 , 0.25490198, 0.

In [14]:
avg_pool(images)

<tf.Tensor: shape=(2, 35, 60, 3), dtype=float32, numpy=
array([[[[0.2676471 , 0.34313726, 0.35784316],
         [0.2627451 , 0.37058824, 0.36078435],
         [0.17941177, 0.2764706 , 0.2754902 ],
         ...,
         [0.90196085, 0.8980393 , 0.9176471 ],
         [0.9039216 , 0.9000001 , 0.91960794],
         [0.904902  , 0.90196085, 0.91862756]],

        [[0.3421569 , 0.3019608 , 0.29901963],
         [0.16176471, 0.16960785, 0.15      ],
         [0.28333336, 0.2911765 , 0.28137255],
         ...,
         [0.90196085, 0.90196085, 0.909804  ],
         [0.89901966, 0.89901966, 0.90686285],
         [0.9039216 , 0.9039216 , 0.9117648 ]],

        [[0.27058825, 0.26372552, 0.20686275],
         [0.27156866, 0.20588237, 0.11568628],
         [0.3137255 , 0.2529412 , 0.16960785],
         ...,
         [0.89803934, 0.89803934, 0.9058824 ],
         [0.8990197 , 0.8990197 , 0.9068628 ],
         [0.90196085, 0.90196085, 0.909804  ]],

        ...,

        [[0.46568632, 0.2519608 , 0.

Here we get only two rows, one per input image, because this example has two input images. Each row holds the global average of one image's three color channels, so the output shape is (2, 3): every 70×120 feature map has been collapsed into a single number.
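Global average pooling is just the mean of each feature map over its height and width, so a (batch, height, width, channels) input becomes (batch, channels). A NumPy sketch of that reduction on random data with the same shape as `images` (the values are random, not the sample photos). In recent TensorFlow versions `GlobalAveragePooling2D` also accepts `keepdims=True`, which keeps the two spatial axes with size 1; the sketch mimics both variants.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.random((2, 70, 120, 3), dtype=np.float32)  # stand-in for `images`

# Average each channel over the spatial axes (height=1, width=2)
global_avg = batch.mean(axis=(1, 2))
print(global_avg.shape)  # one row per image, one value per channel

# Same reduction, but keeping size-1 spatial axes (like keepdims=True)
global_avg_kept = batch.mean(axis=(1, 2), keepdims=True)
print(global_avg_kept.shape)
```

In TensorFlow, `tf.reduce_mean(images, axis=[1, 2])` should give the same (2, 3) values as `global_avrg_pool(images)`.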

In [16]:
global_avrg_pool(images)

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[0.64338624, 0.5971759 , 0.5824972 ],
       [0.76306933, 0.2601113 , 0.10849128]], dtype=float32)>