# About CNN

* A **convolutional neural network (CNN)** is a type of **artificial neural network** designed for tasks such as image recognition and processing. It's particularly effective in analyzing visual data, thanks to its ability to automatically and adaptively learn spatial hierarchies of features from the input.



* CNNs use a specialized architecture that includes **convolutional layers**, **pooling layers**, and **fully connected layers.** Convolutional layers apply convolution operations to the input data, capturing local patterns and features. Pooling layers then reduce the spatial dimensions of the representation, focusing on the most important information. Fully connected layers connect every neuron in one layer to every neuron in the next layer, enabling high-level reasoning.



* CNNs have been highly successful in tasks like image classification, object detection, and facial recognition, among others. Their architecture is inspired by the visual processing in the human brain, making them well-suited for tasks involving spatial hierarchies of features.

In [9]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten
from tensorflow.keras.preprocessing.image import ImageDataGenerator

* **Sequential** is the Keras model class for a linear stack of layers: each layer added to the model receives the output of the layer before it.

In [10]:
#Initializing the CNN
classifier = Sequential()

# How the CNN works

* The **convolutional layer** in a **Convolutional Neural Network (CNN)** performs the core operation of convolution. **Convolution** is a mathematical operation that combines two functions to produce a third function. In the context of CNNs, this operation is applied to the input data and a set of learnable filters or kernels.

Here's a simplified explanation of what happens in a convolutional layer:

**1. Filter (Kernel):** A small matrix that slides over the input data. Each element in the filter has a weight.

**2. Convolution Operation:** The filter slides over the input data, and at each position, it performs element-wise multiplication with the local region of the input data. The results are summed up to produce a single value. This process is repeated across the entire input to produce an **output feature map.**

**3. Learnable Weights:** The weights in the filter are learnable parameters that the neural network optimizes during training. These weights capture important patterns or features in the input data.

**4. Activation Function:** After convolution, an **activation function (like ReLU - Rectified Linear Unit)** is often applied element-wise to introduce non-linearity and allow the network to learn complex relationships.

* The convolutional layer's key advantage is its ability to automatically learn spatial hierarchies of features. It can capture local patterns in the input data, such as edges, textures, or shapes, and then combine them in deeper layers to recognize more complex patterns and objects.

* In summary, the convolutional layer plays a crucial role in feature extraction and enables CNNs to effectively learn and recognize patterns in images or other grid-structured data.
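The four steps above can be sketched in plain Python. This is only an illustration on a toy input, not how Keras implements the layer: the helper name `conv2d_single_channel`, the 4x4 image, and the hand-written kernel are all assumptions made for the example (in a real CNN the kernel weights are learned). It assumes a single channel, stride 1, and no padding.

```python
def relu(x):
    """Rectified Linear Unit: keep positive values, replace negatives with 0."""
    return max(0, x)


def conv2d_single_channel(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding), then apply ReLU.

    Both arguments are lists of lists of numbers. The kernel is not flipped,
    which matches what deep learning libraries call "convolution"
    (strictly speaking, cross-correlation).
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the kernel with the local patch and sum.
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k_h)
                for b in range(k_w)
            )
            row.append(relu(total))
        feature_map.append(row)
    return feature_map


# Toy 4x4 image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_single_channel(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The non-zero column in the output feature map marks where the edge sits in the input.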

# Breakdown of parameters in Conv2D

* **Conv2D** - 2D convolution used for processing 2D grid data like images.

* **32** - Number of filters or kernels in the convolutional layer. Each filter detects different features in the input.

* **(3,3)** - Filter size, i.e. 3x3. During convolution each filter slides over the input data in 3x3 patches.

* **input_shape=(64,64,3)** - Specifies the shape of the input data: images of 64x64 pixels with 3 color channels.

* **activation="relu"** - The Rectified Linear Unit activation function is applied element-wise to the layer's output to introduce non-linearity. ReLU is commonly used in hidden layers to allow the network to learn complex patterns.
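To see how these parameters determine the layer's shapes, here is a small sketch. The helper names are made up for illustration. It assumes the Keras defaults that apply when nothing else is passed to Conv2D: a stride of 1 and `padding="valid"` (no padding).

```python
def conv2d_output_shape(input_shape, filters, kernel_size, stride=1):
    """Output shape (height, width, channels) of a 2D convolution without padding."""
    height, width, _channels = input_shape
    k_h, k_w = kernel_size
    out_h = (height - k_h) // stride + 1
    out_w = (width - k_w) // stride + 1
    # Each filter produces one feature map, so channels == number of filters.
    return (out_h, out_w, filters)


def conv2d_kernel_shape(input_shape, filters, kernel_size):
    """Shape of the weight tensor: every filter spans all input channels."""
    _height, _width, channels = input_shape
    k_h, k_w = kernel_size
    return (k_h, k_w, channels, filters)


input_shape = (64, 64, 3)
output_shape = conv2d_output_shape(input_shape, filters=32, kernel_size=(3, 3))
kernel_shape = conv2d_kernel_shape(input_shape, filters=32, kernel_size=(3, 3))

print(output_shape)  # (62, 62, 32)
print(kernel_shape)  # (3, 3, 3, 32)
```

The output is 62x62 rather than 64x64 because a 3x3 filter cannot be centered on the border pixels when no padding is used.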

# Usage of filters

* **Filters (or kernels)** in a convolutional layer play a crucial role in feature extraction. They act as small windows that slide over the input data, performing local operations to detect patterns and features. Here are a few reasons why filters are used in the convolution layer:

**1. Feature Detection:** Filters are designed to detect specific features in the input data, such as edges, textures, or shapes. By sliding these filters over the entire input, the convolutional layer can capture local patterns.

**2. Parameter Sharing:** Filters have learnable weights that are shared across the entire input. This parameter sharing reduces the number of parameters in the model, making it more efficient and reducing the risk of overfitting. The same filter is used at different spatial locations in the input.

**3. Spatial Hierarchies:** Convolutional layers can learn hierarchical representations of features. Lower layers may capture simple features like edges, while deeper layers combine these simple features to recognize more complex patterns or objects. This hierarchical approach mimics how the visual system works in biological organisms.

**4. Translation Invariance:** Because the same filter is applied at every position, the network can detect a feature wherever it appears in the input. If a pattern moves to another part of the image, the filter's response moves with it in the feature map (strictly speaking this is translation equivariance; pooling later adds a degree of invariance).

**5. Local Connectivity:** Filters operate on local regions of the input, allowing the network to focus on local features and spatial relationships. This local connectivity is especially useful for grid-structured data like images.

* In summary, filters in the convolutional layer enable the neural network to automatically learn and extract relevant features from the input data. This process is essential for the success of Convolutional Neural Networks (CNNs) in tasks such as image recognition, where understanding local patterns is crucial for identifying objects and patterns in images.
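To give a feel for how much parameter sharing saves, the sketch below counts the parameters of the convolutional layer used in this notebook (32 filters of size 3x3 on a 64x64x3 input) and compares them with a hypothetical fully connected layer producing the same number of outputs. The helper names are made up for illustration, and the dense layer is only a thought experiment, not part of the model.

```python
def conv2d_param_count(kernel_size, input_channels, filters):
    """Weights of all filters plus one bias per filter."""
    k_h, k_w = kernel_size
    weights_per_filter = k_h * k_w * input_channels
    return (weights_per_filter + 1) * filters


def dense_param_count(num_inputs, num_outputs):
    """One weight per input-output pair plus one bias per output."""
    return num_inputs * num_outputs + num_outputs


conv_params = conv2d_param_count((3, 3), input_channels=3, filters=32)

# Hypothetical dense layer mapping the flattened 64x64x3 image to as many
# outputs as the convolution produces (62 * 62 * 32 values).
num_inputs = 64 * 64 * 3
num_outputs = 62 * 62 * 32
dense_params = dense_param_count(num_inputs, num_outputs)

print(conv_params)   # 896
print(dense_params)  # roughly 1.5 billion
print(dense_params // conv_params)  # how many times larger the dense layer is
```

The convolutional layer needs so few parameters because the same 3x3x3 weights are reused at every spatial position.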

# About ReLU

* **ReLU, or Rectified Linear Unit**, is an **activation function** commonly used in neural networks, including Convolutional Neural Networks (CNNs). It introduces **non-linearity to the network by outputting the input directly if it is positive; otherwise, it outputs zero.**

* Mathematically, the ReLU activation function is defined as:
**f(x)=max(0,x)**

* Here's a simple explanation of what it does:
1. **Linear for Positive Values:** If the input x is positive, the function returns x itself. So, for any positive input, ReLU is a linear function.

2. **Zero for Negative Values:** If the input x is negative, the function returns zero. This introduces non-linearity to the model, which is crucial for enabling the network to learn complex relationships and patterns.

* The main advantages of using ReLU include:
* **Simplicity:** ReLU is computationally efficient and easy to implement.

* **Avoiding Vanishing Gradient Problem:** Unlike some other activation functions (e.g., sigmoid or tanh), ReLU does not saturate for positive inputs, helping to mitigate the vanishing gradient problem during training.

* **Promoting Sparsity:** ReLU sets negative values to zero, which can lead to sparse representations. This sparsity can be beneficial for memory efficiency and generalization.

* However, one drawback of ReLU is the "dying ReLU" problem, where neurons can sometimes become inactive (output zero) and stop learning if they consistently receive negative inputs during training. To address this, variants like Leaky ReLU and Parametric ReLU have been proposed, which allow a small, non-zero gradient for negative inputs, preventing neurons from becoming completely inactive.
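A minimal sketch of ReLU and Leaky ReLU, together with their gradients, may make the "dying ReLU" point more concrete. The function names and the slope value of 0.01 are assumptions chosen for illustration; libraries expose their own implementations and defaults.

```python
def relu(x):
    """f(x) = max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, negative_slope=0.01):
    """Like ReLU, but negative inputs are scaled instead of zeroed."""
    return x if x > 0 else negative_slope * x


def leaky_relu_grad(x, negative_slope=0.01):
    """Gradient of Leaky ReLU: never exactly zero for negative inputs."""
    return 1.0 if x > 0 else negative_slope


inputs = [-2.0, -0.5, 0.0, 1.5]

relu_outputs = [relu(x) for x in inputs]
leaky_outputs = [leaky_relu(x) for x in inputs]
print(relu_outputs)   # [0.0, 0.0, 0.0, 1.5]
print(leaky_outputs)  # negative inputs are shrunk, not zeroed

# For a negative input ReLU passes no gradient back, so the weights feeding
# that neuron receive no update; Leaky ReLU still passes a small gradient.
print(relu_grad(-2.0))        # 0.0
print(leaky_relu_grad(-2.0))  # 0.01
```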

# Why we have used Conv2D instead of Conv3D

* While the input to a convolutional layer is often described as a 3D image, it's more accurate to say it's a 3D tensor. 

* In the context of Convolutional Neural Networks (CNNs), the input data is indeed three-dimensional, representing an image with height, width, and color channels. The dimensions are typically organized as (height, width, channels). For example, a color image with dimensions 64x64 pixels and three color channels (RGB) would have an input shape of (64, 64, 3).

* However, when we talk about passing this data through a 2D convolutional layer, we're referring to the fact that the filter slides along two spatial dimensions only (height and width). **The third dimension (channels) is not slid over: each filter spans all input channels at once.**

* In other words, each filter has a depth that matches the number of input channels. At every spatial position it multiplies its weights with the values of all channels in the local patch and sums the results into a single output value, so the color channels are combined rather than processed independently.

* So, even though we refer to it as a 2D convolution layer, it still handles the depth of the input data. A Conv3D layer, by contrast, also slides its filters along a third axis, which suits volumetric data or video rather than ordinary color images.
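As a rough sketch of what a single 2D filter computes at one spatial position of a multi-channel image: the filter covers every channel of the patch, and all products are summed into one number. The helper name `conv_at_position` and the toy 2x2 RGB patch and weights are assumptions for illustration only.

```python
def conv_at_position(patch, kernel, bias=0):
    """Apply one filter to one patch.

    `patch` and `kernel` are nested lists indexed as [row][col][channel]
    and must have the same shape. Returns a single number.
    """
    total = bias
    for patch_row, kernel_row in zip(patch, kernel):
        for patch_pixel, kernel_pixel in zip(patch_row, kernel_row):
            # The sum runs over the channels too: they are combined,
            # not kept as separate outputs.
            for value, weight in zip(patch_pixel, kernel_pixel):
                total += value * weight
    return total


# A 2x2 patch with 3 channels (R, G, B) per pixel.
patch = [
    [[1, 0, 0], [1, 0, 0]],
    [[0, 1, 0], [0, 0, 1]],
]
# One filter with the same 2x2x3 shape as the patch.
kernel = [
    [[1, 1, 1], [0, 0, 0]],
    [[2, 2, 2], [-1, -1, -1]],
]

response = conv_at_position(patch, kernel)
print(response)  # 2
```

Sliding this computation over every height and width position gives one 2D feature map per filter, which is why the layer is called Conv2D even though its input has three dimensions.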

In [13]:
#Step1 - Convolution
classifier.add(Conv2D(32, (3,3), input_shape = (64,64,3), activation="relu"))

# What is the Pooling layer

* The **pooling layer** is a component commonly used in Convolutional Neural Networks (CNNs) to **downsample the spatial dimensions of the input data,** reducing its size and computational complexity. The pooling operation is applied independently to each depth slice of the input.

* There are two main types of pooling layers: Max Pooling and Average Pooling.

* **Max Pooling:** In max pooling, the output value of a specific region (often a 2x2 or 3x3 window) is the maximum value from that region in the input. It helps retain the most prominent features from the input, focusing on the presence of specific patterns.

* **Average Pooling:** In average pooling, the output value for a specific region is the average of all values in that region in the input. It provides a smoothed version of the input and is less likely to emphasize specific features.

* The pooling layer serves several purposes:
* **Spatial Reduction:** It reduces the spatial dimensions (width and height) of the input, making subsequent layers computationally more efficient.

* **Translation Invariance:** Pooling helps the network become somewhat invariant to small translations in the input, allowing it to recognize features regardless of their precise location.

* **Feature Generalization:** By summarizing information from a local neighborhood, pooling encourages the network to focus on the most relevant and general features.
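The two pooling types can be compared with a short sketch on a single 4x4 feature map. The helper `pool2d` and the input values are made up for illustration. It assumes non-overlapping windows, i.e. a stride equal to the pool size, which is what MaxPooling2D uses by default when no strides are given.

```python
def pool2d(feature_map, pool_size=2, mode="max"):
    """Downsample a 2D feature map with non-overlapping square windows."""
    pooled = []
    for i in range(0, len(feature_map) - pool_size + 1, pool_size):
        row = []
        for j in range(0, len(feature_map[0]) - pool_size + 1, pool_size):
            # Collect the values inside the current window.
            window = [
                feature_map[i + a][j + b]
                for a in range(pool_size)
                for b in range(pool_size)
            ]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

max_pooled = pool2d(feature_map, pool_size=2, mode="max")
avg_pooled = pool2d(feature_map, pool_size=2, mode="average")

print(max_pooled)  # [[6, 2], [2, 9]]
print(avg_pooled)  # [[3.5, 1.0], [1.0, 6.0]]
```

Both variants halve the height and width here. Max pooling keeps the strongest response in each window, while average pooling smooths it out.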

In [14]:
#Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size = (2,2)))

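# What is the Flatten layer

* The **Flatten layer** reshapes the stack of pooled feature maps into a single one-dimensional vector so that it can be fed into fully connected (Dense) layers. It has no learnable parameters; it only changes the shape of the data. With the layers added so far, the 31x31x32 output of the pooling layer becomes a vector of 30,752 values.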

In [15]:
#Step 3 - Add flattening
classifier.add(Flatten())
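As a rough illustration of the flattening step, the sketch below turns a tiny nested (height, width, channels) structure into a flat list and computes the vector length expected for this model. The helper `flatten` and the toy values are assumptions for the example. The length calculation assumes the default settings of the Conv2D and MaxPooling2D layers added earlier (no padding, stride 1 for the convolution, 2x2 non-overlapping pooling).

```python
def flatten(tensor):
    """Recursively flatten nested lists into a 1D list, keeping the order."""
    if not isinstance(tensor, list):
        return [tensor]
    flat = []
    for item in tensor:
        flat.extend(flatten(item))
    return flat


# Toy pooled output of shape (2, 2, 2): height x width x channels.
pooled = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]
flat = flatten(pooled)
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8]

# Expected vector length for this notebook's model:
# 64x64 input -> 62x62 after the 3x3 convolution -> 31x31 after 2x2 pooling,
# with 32 feature maps.
conv_size = 64 - 3 + 1
pooled_size = conv_size // 2
flattened_length = pooled_size * pooled_size * 32
print(flattened_length)  # 30752
```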