# Convolutional Neural Networks (CNNs)

**What are Convolutional Neural Networks (CNNs)?**

A CNN is a kind of computer program that helps computers see and understand pictures. Just as you can spot a cat in a picture, a CNN can learn to spot one too.

**How do Computers See Pictures?**

Imagine a team of tiny workers (called nodes) that work together to understand the picture. They look at the picture one small piece at a time, combine the simple patterns they find into bigger ones, and then decide what's in the picture.

**Three Cool Things About CNNs:**

1. **They're Good at Finding Patterns**: CNNs use a special math operation called convolution to find patterns in pictures.
2. **They Can See Objects**: CNNs can look at a picture and find objects like cats, dogs, or cars.
3. **They're Helping Us in Many Ways**: CNNs are used in self-driving cars, robots, and even in apps that can recognize objects in your pictures.

**What's the Catch?**

CNNs need a lot of computing power, so training them can be slow and expensive. But scientists are working hard to make them faster and better.

**So, Why Do We Care About CNNs?**

Because CNNs are helping us make computers smarter and more helpful in our daily lives. Who knows, maybe one day you'll create a program that can recognize your favorite objects in pictures!


Convolutional neural networks (CNNs) are a type of deep learning model used primarily for image classification and object recognition. They work with three-dimensional data, which means they can process images that have width, height, and color channels (like RGB).

Neural networks, including CNNs, consist of layers of nodes: an input layer, one or more hidden layers, and an output layer. Each node connects to nodes in the next layer, each connection has a weight, and each node has a threshold. When a node's output exceeds its threshold, the node activates and passes data to the next layer; otherwise, it passes nothing on.
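The threshold behavior described above can be sketched as a single node. This is a simplified illustration under stated assumptions: the inputs, weights, and threshold values are hypothetical, and real networks usually add a bias term and use a smooth activation such as ReLU rather than a hard cutoff.

```python
import numpy as np

def threshold_node(inputs, weights, threshold):
    """Simplified node: pass on the weighted sum only if it exceeds the threshold.

    Illustrative only; practical networks add a bias and use activations like ReLU.
    """
    weighted_sum = float(np.dot(inputs, weights))  # combine inputs using the connection weights
    # The node "activates" and passes its value on only above the threshold.
    return weighted_sum if weighted_sum > threshold else 0.0

inputs = np.array([1.0, 0.5, 0.2])      # hypothetical outputs of the previous layer
weights = np.array([0.4, 0.6, -0.5])    # hypothetical connection weights
print(threshold_node(inputs, weights, threshold=0.5))  # weighted sum is about 0.6, so it activates
print(threshold_node(inputs, weights, threshold=0.8))  # below the threshold, so 0.0
```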

This overview focuses on feedforward networks, but there are many other types of neural networks for different applications. For instance, recurrent neural networks (RNNs) are often used in natural language processing and speech recognition, while CNNs excel in computer vision tasks.

Before CNNs, identifying objects in images required manual feature extraction, which was slow and labor-intensive. CNNs automate this process, using techniques from linear algebra, such as matrix multiplication, to find patterns in images. However, training CNNs can be resource-intensive, often requiring powerful graphics processing units (GPUs).

# How do convolutional neural networks work?

Convolutional Neural Networks (CNNs) are a class of deep learning models designed primarily for processing structured grid data, such as images. They work through several key components:

1. **Convolutional Layers**: These layers apply convolution operations using filters (or kernels) that slide across the input data. Each filter detects specific features, like edges or textures. The result is a feature map that highlights these detected features.

2. **Activation Function**: After convolution, an activation function, usually ReLU (Rectified Linear Unit), is applied to add non-linearity to the model. This helps the network learn complex patterns.

3. **Pooling Layers**: Pooling layers down-sample the feature maps, reducing their dimensions and retaining the most important information. This helps reduce computation and increases the network's robustness to variations in the input.

4. **Fully Connected Layers**: After several convolutional and pooling layers, the high-level features are flattened and fed into fully connected layers. These layers process the features to make final predictions, such as classifying an image.

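5. **Training**: CNNs are trained with backpropagation, which computes how much each weight contributes to the error; an optimizer such as gradient descent then adjusts the weights of the filters and fully connected layers to minimize the difference between the predicted outputs and the actual labels.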

Overall, CNNs are effective for tasks like image recognition, object detection, and more, due to their ability to automatically learn hierarchical features from the data.
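To make steps 1 to 4 concrete, here is a minimal NumPy sketch of one forward pass on a single 28x28 grayscale image. It assumes random, untrained weights (training, step 5, is not shown), four hypothetical 3x3 filters, and plain Python loops for readability rather than speed. Like deep learning frameworks, it computes a cross-correlation (the filter is not flipped), which is what CNN libraries call convolution.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_single(image, kernel):
    """Valid (no padding), stride-1 convolution of a 2D image with one 2D kernel."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that don't fill a window are dropped."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

image = rng.random((28, 28))              # stand-in for one grayscale 28x28 image
kernels = rng.standard_normal((4, 3, 3))  # 4 hypothetical, untrained 3x3 filters

# Steps 1-2: convolution followed by ReLU gives one 26x26 feature map per filter.
fmaps = np.stack([np.maximum(0, conv2d_single(image, k)) for k in kernels])
# Step 3: 2x2 pooling halves each spatial dimension: 26x26 -> 13x13.
pooled = np.stack([max_pool(f) for f in fmaps])
# Step 4: flatten, apply a fully connected layer with 10 outputs, then softmax.
features = pooled.reshape(-1)             # 4 * 13 * 13 = 676 values
W = rng.standard_normal((10, features.size)) * 0.01
b = np.zeros(10)
probs = softmax(W @ features + b)

print(fmaps.shape, pooled.shape, features.shape, probs.shape)
```

Each stage only changes the shape of the data in a predictable way; training would adjust `kernels`, `W`, and `b`.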

# Convolutional layer

The convolutional layer is the core building block of a CNN and is where most of the computation happens. Its input is typically a color image represented as a 3D array of pixels: height, width, and depth, where the depth holds the three RGB channels.

Key components include:

- **Feature Detector (Kernel/Filter)**: A small array of weights, often 3x3 in height and width and spanning every channel of the input, that moves across the image to detect features. At each position it computes a dot product between its weights and the input pixels beneath it; the filter then shifts by the stride and repeats until it has swept the whole image. The collected outputs form a feature map.

- **Parameter Sharing**: The same filter weights are reused at every position as the filter moves across the image. The weights change only during training, through backpropagation and gradient descent.
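The sliding dot product and parameter sharing described above can be sketched in NumPy for one filter over an RGB input. This is an illustrative loop, not a library implementation; the toy image, the filter values, and the `conv2d` helper name are hypothetical, and no padding is applied.

```python
import numpy as np

def conv2d(image, kernel, bias=0.0, stride=1):
    """Slide one filter over an (H, W, C) input with no padding.

    The kernel has shape (kh, kw, C): it spans every input channel, and the
    same weights are reused at every position (parameter sharing).
    """
    kh, kw, _ = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    fmap = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw, :]
            fmap[i, j] = np.sum(patch * kernel) + bias  # dot product of patch and filter
    return fmap

rgb = np.ones((5, 5, 3))          # toy 5x5 RGB image, every value 1.0
kernel = np.full((3, 3, 3), 0.1)  # hypothetical 3x3x3 filter
print(conv2d(rgb, kernel).shape)            # (3, 3)
print(conv2d(rgb, kernel, stride=2).shape)  # (2, 2)
print(conv2d(rgb, kernel)[0, 0])            # 27 weights of 0.1 times 1.0, about 2.7
```

Because one `kernel` is reused everywhere, this filter has only 27 weights plus a bias, no matter how large the image is.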

Three essential hyperparameters influence the output size:

1. **Number of Filters**: Determines the depth of the output; more filters yield more feature maps.
2. **Stride**: The number of pixels the filter moves during convolution; a larger stride results in a smaller output.
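3. **Zero-padding**: Adds zeros around the border of the input to control the output size, for example when the filter does not fit the input evenly. Common types:
   - Valid padding: No padding. The output is smaller than the input, and if the filter and stride do not fit the input evenly, the last positions are dropped.
   - Same padding: Pads the input so that the output has the same height and width as the input (at a stride of 1).
   - Full padding: Adds k - 1 zeros on each side for a k x k filter, so the output is larger than the input.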
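The effect of filter size, stride, and padding on one spatial dimension follows the standard formula floor((n + 2p - k) / s) + 1, where p is the number of zeros added per side. The sketch below assumes that "same" behaves as in TensorFlow/Keras, where the output size is ceil(n / s); the depth of the output is simply the number of filters.

```python
import math

def conv_output_size(n, k, stride=1, padding="valid"):
    """Spatial output size of a convolution along one dimension.

    n: input size, k: kernel size, stride: step size.
    padding: "valid" (p = 0), "full" (p = k - 1), or "same" (TensorFlow-style).
    """
    if padding == "same":
        # Padding is chosen so the output is ceil(n / stride); equals n at stride 1.
        return math.ceil(n / stride)
    if padding == "valid":
        p = 0
    elif padding == "full":
        p = k - 1
    else:
        raise ValueError(f"unknown padding: {padding}")
    return (n + 2 * p - k) // stride + 1

for mode in ("valid", "same", "full"):
    print(mode, conv_output_size(28, 3, padding=mode))  # valid 26, same 28, full 30
print(conv_output_size(28, 3, stride=2, padding="same"))  # 14
```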

After convolution, a Rectified Linear Unit (ReLU) activation is applied to the feature map to introduce nonlinearity into the model.
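ReLU itself is a one-line operation, max(0, x), applied element by element. A minimal sketch on a small, made-up feature map:

```python
import numpy as np

feature_map = np.array([[-2.0, 0.5],
                        [3.0, -0.1]])  # hypothetical convolution output
# ReLU keeps positive values unchanged and replaces negative values with zero.
activated = np.maximum(0.0, feature_map)
print(activated)
```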

![image.png](attachment:1804aa40-7654-47e6-bd97-b350cdbbe80d.png)

# Additional convolutional layer

Adding more convolutional layers makes a CNN hierarchical: later layers combine the lower-level features detected by earlier layers into more complex ones. For instance, when detecting a bicycle in an image, parts such as the frame, handlebars, and wheels each form a lower-level pattern, and their combination forms a higher-level pattern that represents the whole bicycle. In this way, the convolutional layers convert the image into numerical values that the network can interpret to extract relevant patterns.
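One way to see why deeper layers can respond to larger structures is to compute how many input pixels a single unit "sees" (its receptive field) after a stack of layers. The sketch below is an illustration, not part of the original text; it assumes square kernels and uses the usual recurrence in which each layer widens the field by (kernel size - 1) times the product of all earlier strides.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one unit after a stack of layers.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    """
    field, jump = 1, 1  # jump = distance in input pixels between adjacent units
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
    return field

print(receptive_field([(3, 1)]))                          # 3
print(receptive_field([(3, 1), (3, 1), (3, 1)]))          # 7
print(receptive_field([(3, 1), (2, 2), (3, 1), (2, 2)]))  # conv, pool, conv, pool: 10
```

A single 3x3 filter sees a 3x3 patch, but three stacked 3x3 layers let one unit cover a 7x7 region, so later layers can assemble parts into larger shapes.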

![image.png](attachment:80bb56d6-b23a-474a-ac6f-d3811e69951c.png)

# Pooling layer

Pooling layers, also called downsampling layers, reduce the spatial size of their input, which in turn reduces the number of parameters and computations in later layers. Like a convolutional layer, a pooling layer sweeps a filter across the input, but that filter has no weights: it applies an aggregation function to the values in its receptive field. The two primary types of pooling are:

- Max pooling: Selects the maximum value from the receptive field as the output.
- Average pooling: Computes the average value from the receptive field as the output.

Pooling discards some information, but it reduces complexity, improves efficiency, and lowers the risk of overfitting.
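Max and average pooling differ only in the aggregation applied to each window. A minimal NumPy sketch, assuming a 2x2 window with stride 2 and no padding (the `pool2d` name and the sample values are illustrative):

```python
import numpy as np

def pool2d(fmap, size=2, stride=2, mode="max"):
    """Slide a size x size window over a 2D feature map; no padding, no weights."""
    if mode not in ("max", "average"):
        raise ValueError(f"unknown mode: {mode}")
    aggregate = np.max if mode == "max" else np.mean
    out_h = (fmap.shape[0] - size) // stride + 1
    out_w = (fmap.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = fmap[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = aggregate(window)
    return out

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 5, 7],
                 [1, 1, 8, 2]], dtype=float)
print(pool2d(fmap, mode="max"))      # values [[6, 2], [2, 8]]
print(pool2d(fmap, mode="average"))  # values [[3.5, 1], [1, 5.5]]
```

With an odd input such as 11x11, the same settings give a 5x5 output, because the last row and column do not fill a complete window.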

# Fully-connected layer

A fully-connected (FC) layer connects every node in the layer to every node in the previous layer. In a CNN, the FC layers perform classification based on the features extracted by the earlier layers. While convolutional layers and hidden FC layers often use the ReLU activation function, the final FC layer typically uses softmax, which produces a probability between 0 and 1 for each class, with all probabilities summing to 1.
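A final FC layer with softmax can be sketched in a few lines. The feature vector and weights below are random, hypothetical stand-ins (800 features, matching a flattened 5x5x32 feature map, and 10 classes); the point is the shape of the computation, not a trained result.

```python
import numpy as np

def dense(x, W, b):
    """Fully connected layer: every output is a weighted sum of every input plus a bias."""
    return W @ x + b

def softmax(z):
    """Turn raw scores into probabilities in (0, 1) that sum to 1."""
    e = np.exp(z - np.max(z))  # shifting by the max avoids overflow; result is unchanged
    return e / e.sum()

rng = np.random.default_rng(42)
features = rng.random(800)                 # e.g. a flattened 5x5x32 feature map
W = rng.standard_normal((10, 800)) * 0.01  # hypothetical weights for 10 classes
b = np.zeros(10)
probs = softmax(dense(features, W, b))
print(probs.round(3), probs.sum())
print("predicted class:", int(np.argmax(probs)))
```

This layer alone has 800 * 10 + 10 = 8,010 parameters, which is why FC layers often dominate a CNN's parameter count.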

# Types of convolutional neural networks

Kunihiko Fukushima and Yann LeCun were pivotal in the development of convolutional neural networks (CNNs), with key contributions in the 1980s and in 1989, respectively. LeCun applied backpropagation to train neural networks to recognize handwritten zip codes, work that led to LeNet-5 in the 1990s, an architecture that proved significant for document recognition. Since then, many CNN architectures have emerged, driven by new datasets such as MNIST and CIFAR-10 and by competitions such as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Notable architectures include AlexNet, VGGNet, GoogLeNet, ResNet, and ZFNet, while LeNet-5 is regarded as the classic, foundational CNN architecture.

# Convolutional neural networks and computer vision

Convolutional neural networks are essential for image recognition and computer vision, a branch of AI that enables computers to interpret visual data from images and videos. Unlike simple image recognition, computer vision lets systems take actions based on what they analyze. Key applications of computer vision include:

- **Marketing**: Social media platforms suggest tags for friends in uploaded photos.
- **Healthcare**: Used in radiology to help doctors identify cancerous tumors more effectively.
- **Retail**: E-commerce sites utilize visual search to recommend items that match existing wardrobe pieces.
- **Automotive**: Although fully driverless cars are not yet common, computer vision technologies enhance safety features in vehicles, such as lane detection.

```python
import tensorflow as tf  # TensorFlow provides the backend that Keras runs on
from keras import Sequential
from keras.datasets import mnist
from keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
```

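```python
# Load MNIST: 60,000 training and 10,000 test images of handwritten digits (28x28, grayscale).
(X_train, y_train), (X_test, y_test) = mnist.load_data()
```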
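The models below declare `input_shape=(28, 28, 1)`, while `mnist.load_data()` returns images of shape `(N, 28, 28)` with `uint8` pixel values from 0 to 255. Before training, a common (but optional) preparation step adds a channel axis and scales pixels to [0, 1]. The sketch uses a small random stand-in array so it runs without downloading MNIST; in the notebook you would apply the same two lines to `X_train` and `X_test`.

```python
import numpy as np

# Stand-in with MNIST's layout: 100 grayscale 28x28 images stored as uint8 (0..255).
X_fake = np.random.default_rng(0).integers(0, 256, size=(100, 28, 28), dtype=np.uint8)

# Conv2D expects (batch, height, width, channels): add a channel axis of size 1,
# then convert to float32 and scale pixel values to the range [0, 1].
X_prepared = X_fake.reshape(-1, 28, 28, 1).astype("float32") / 255.0
print(X_prepared.shape, X_prepared.dtype, X_prepared.min(), X_prepared.max())
```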

# Padding

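```python
# Three 3x3 convolutions with "valid" padding: each one trims 2 pixels from
# the height and width (28 -> 26 -> 24 -> 22).
model = Sequential()

model.add(Conv2D(32, kernel_size=(3, 3), padding="valid", activation="relu", input_shape=(28, 28, 1)))
model.add(Conv2D(32, kernel_size=(3, 3), padding="valid", activation="relu"))
model.add(Conv2D(32, kernel_size=(3, 3), padding="valid", activation="relu"))

# Flatten the 22x22x32 feature maps into a single vector for the dense layers.
model.add(Flatten())

model.add(Dense(128, activation="relu"))
model.add(Dense(10, activation="softmax"))  # one probability per digit class

model.summary()
```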

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_4 (Conv2D)           (None, 26, 26, 32)        320       
                                                                 
 conv2d_5 (Conv2D)           (None, 24, 24, 32)        9248      
                                                                 
 conv2d_6 (Conv2D)           (None, 22, 22, 32)        9248      
                                                                 
 flatten_1 (Flatten)         (None, 15488)             0         
                                                                 
 dense_2 (Dense)             (None, 128)               1982592   
                                                                 
 dense_3 (Dense)             (None, 10)                1290      
                                                                 
Total params: 2002698 (7.64 MB)
Trainable params: 2002
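The parameter counts in this summary can be checked by hand: a Conv2D layer has one weight per kernel position per input channel per filter plus one bias per filter, and a Dense layer has one weight per input-output pair plus one bias per unit. A small sketch of that arithmetic (the helper names are illustrative):

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Weights for every kernel position, input channel, and filter, plus one bias per filter.
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(inputs, units):
    # One weight per input-output pair, plus one bias per unit.
    return inputs * units + units

layers = [
    conv2d_params(3, 3, 1, 32),       # first conv on the 28x28x1 input
    conv2d_params(3, 3, 32, 32),
    conv2d_params(3, 3, 32, 32),
    dense_params(22 * 22 * 32, 128),  # Flatten of the 22x22x32 output gives 15488 inputs
    dense_params(128, 10),
]
print(layers)       # [320, 9248, 9248, 1982592, 1290]
print(sum(layers))  # 2002698
```

Roughly 99% of the parameters sit in the first Dense layer, because "valid" padding leaves a large 22x22x32 volume to flatten.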

# Padding & Strides

```python
# padding="same" with strides=(2, 2) halves the spatial size at each layer,
# rounding up: 28 -> 14 -> 7 -> 4.
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), padding="same", strides=(2, 2), activation="relu", input_shape=(28, 28, 1)))
model.add(Conv2D(32, kernel_size=(3, 3), padding="same", strides=(2, 2), activation="relu"))
model.add(Conv2D(32, kernel_size=(3, 3), padding="same", strides=(2, 2), activation="relu"))

model.add(Flatten())

model.add(Dense(128, activation="relu"))
model.add(Dense(10, activation="softmax"))  # softmax, not relu, so the outputs are class probabilities

model.summary()
```

Model: "sequential_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_11 (Conv2D)          (None, 14, 14, 32)        320       
                                                                 
 conv2d_12 (Conv2D)          (None, 7, 7, 32)          9248      
                                                                 
 conv2d_13 (Conv2D)          (None, 4, 4, 32)          9248      
                                                                 
 flatten_2 (Flatten)         (None, 512)               0         
                                                                 
 dense_4 (Dense)             (None, 128)               65664     
                                                                 
 dense_5 (Dense)             (None, 10)                1290      
                                                                 
Total params: 85770 (335.04 KB)
Trainable params: 8577
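The shapes above follow from the rule that, with `padding="same"`, TensorFlow/Keras produce an output of size ceil(input / stride). A short sketch of that arithmetic shows why the first Dense layer shrinks from 1,982,592 parameters in the previous model to 65,664 here:

```python
import math

size = 28
sizes = []
for _ in range(3):
    # With padding="same", each stride-2 convolution gives ceil(input / 2).
    size = math.ceil(size / 2)
    sizes.append(size)
print(sizes)                   # [14, 7, 4]

flat = sizes[-1] ** 2 * 32     # 4 * 4 * 32 = 512 values after Flatten
print(flat, flat * 128 + 128)  # 512 65664
```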

# Padding & Strides & Max pooling

```python
# Each "valid" 3x3 convolution trims 2 pixels; each 2x2 max pool with stride 2
# halves the size, rounding down: 28 -> 26 -> 13 -> 11 -> 5.
model = Sequential()

model.add(Conv2D(32, kernel_size=(3, 3), padding="valid", activation="relu", input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding="valid"))
model.add(Conv2D(32, kernel_size=(3, 3), padding="valid", activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding="valid"))

model.add(Flatten())

model.add(Dense(128, activation="relu"))
model.add(Dense(10, activation="softmax"))  # softmax, not relu, so the outputs are class probabilities

model.summary()
```

Model: "sequential_9"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_16 (Conv2D)          (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d_2 (MaxPoolin  (None, 13, 13, 32)        0         
 g2D)                                                            
                                                                 
 conv2d_17 (Conv2D)          (None, 11, 11, 32)        9248      
                                                                 
 max_pooling2d_3 (MaxPoolin  (None, 5, 5, 32)          0         
 g2D)                                                            
                                                                 
 flatten_4 (Flatten)         (None, 800)               0         
                                                                 
 dense_8 (Dense)             (None, 128)              