In [None]:
#| echo: false
#| output: false
#| eval: true
#| warning: false
from fastai.vision.all import *
from fastai.vision.models import *
import pandas as pd
import numpy as np

path_imagenette=untar_data(URLs.IMAGENETTE)
path_mnist=untar_data(URLs.MNIST_SAMPLE)
im3 = Image.open(path_mnist/'train'/'3'/'12.png')
show_image(im3)

# Overview

In this blog post, we explore the architecture of Convolutional Neural Networks (CNNs) and how they have been used to achieve state-of-the-art performance in image recognition tasks. We also discuss some of the key components of CNNs, such as convolution layers, pooling layers, and activation functions. Finally, we look at one of the most popular CNN architectures: `ResNet`.

# Overview of Convolutional Neural Networks (CNNs)
In the context of computer vision, **feature engineering** is the process of using domain knowledge to extract distinctive attributes from images that can be used to improve the performance of machine learning algorithms. For instance, in digit classification, the digit 7 is characterized by a horizontal edge near the top and a diagonal stroke that runs from the top right down to the bottom left. These features can be used to distinguish a 7 from other digits.

It turns out that finding the edges in an image is a crucial step in computer vision tasks. To achieve this, we can use a technique called **convolution**. Convolution is a mathematical operation that takes two inputs: an image and a filter (also known as a kernel). The filter is a small matrix that is used to scan the image and extract features. For example, a 3x3 filter can be designed to respond to horizontal edges in an image; we define one later in this post.

## Convolution Layer
A convolution layer applies a set of filters (also called **kernels**) to the input image to extract features. Each kernel is a small matrix that slides across the image. The output of a convolution layer is a set of feature maps, one per kernel, each being the result of applying that kernel to the input image.

![An example of a kernel](./imgs/kernel.png){#fig-kernel fig-align="center"}

As illustrated in @fig-kernel, a 3x3 kernel is applied to the input image, which is a 7x7 grid. The kernel is placed over each 3x3 patch of the image in turn, and the output pixel for that position is the sum of the element-wise products of the kernel and the patch (equivalently, the dot product of the two flattened 3x3 arrays). Repeating this at every position produces a new feature map.
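
The sliding-window computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not how fastai or PyTorch implement convolution: the function name `apply_kernel`, the 7x7 input values, and the averaging kernel are all made up for the example. It assumes no padding and a stride of 1, so a 3x3 kernel over a 7x7 image yields a 5x5 feature map. Like the "convolution" in deep learning libraries, it does not flip the kernel.

```python
import numpy as np

def apply_kernel(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            # Patch of the image currently covered by the kernel.
            patch = image[row:row + kh, col:col + kw]
            # Multiply element-wise, then sum to get one output pixel.
            feature_map[row, col] = (patch * kernel).sum()
    return feature_map

# Hypothetical 7x7 input: the values 0..48 laid out row by row.
image = np.arange(49, dtype=float).reshape(7, 7)
# Hypothetical 3x3 kernel that averages each neighbourhood.
kernel = np.full((3, 3), 1 / 9)

feature_map = apply_kernel(image, kernel)
print(feature_map.shape)  # (5, 5)
```

The output is smaller than the input because the kernel cannot be centred on border pixels without padding.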

Let's take another look at how convolution works in practice. We will use the `im3` image, which is a 28x28 grayscale image of the digit 3 from the MNIST dataset. We will apply a 3x3 kernel to the image to extract features.

In [None]:
#| echo: false
#| output: true
#| eval: true
#| warning: false

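# Convert the PIL image to a tensor of pixel intensities.
im3_t = tensor(im3)
# Show the top 10 rows; the image is 28 pixels wide, so this keeps every column.
df = pd.DataFrame(im3_t[:10, :28])
df.style.set_properties(**{'font-size':'6pt'}).background_gradient('Greys')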

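Let's define a 3x3 kernel that detects horizontal edges in the image. Its top row is negative and its bottom row is positive, so the output is large wherever the pixels just below a location are brighter than the pixels just above it, and zero in regions of uniform intensity.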

In [None]:
kernel = tensor([[-1., -1., -1.],
                 [ 0.,  0.,  0.],
                 [ 1.,  1.,  1.]]).float()
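
To see what this kernel responds to, here is a small sketch that applies the same values to a synthetic image rather than to `im3`. The 6x6 `toy` image (dark top half, bright bottom half) is an assumption made for illustration, and the example uses NumPy's `sliding_window_view` with `einsum` instead of a PyTorch convolution so that it runs without downloading any data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Same values as the horizontal-edge kernel defined above.
kernel = np.array([[-1., -1., -1.],
                   [ 0.,  0.,  0.],
                   [ 1.,  1.,  1.]])

# Hypothetical 6x6 image: dark (0) top half, bright (1) bottom half.
toy = np.zeros((6, 6))
toy[3:, :] = 1.0

# Every 3x3 patch of the image, shape (4, 4, 3, 3).
windows = sliding_window_view(toy, kernel.shape)
# Multiply each patch by the kernel element-wise and sum over the patch.
response = np.einsum('ijkl,kl->ij', windows, kernel)
print(response)
```

The two middle rows of `response` are all 3, because those patches straddle the dark-to-bright boundary, while the first and last rows are all 0, because those patches cover a uniform region. Flipping the image vertically (bright on top) would give -3 instead, so the sign of the response encodes the direction of the edge.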

## Strides and Padding
## Improving Training Stability

# Residual Networks (ResNet)

# Conclusions