<a href="https://colab.research.google.com/github/dishantgupta2004/Deep-Learning-Assignments/blob/main/DishantGupta_CNNFundamentals_Assignments.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Explain the basic components of a digital image and how it is represented in a computer. State the differences between grayscale and color images.**

**Basic Components of a Digital Image**: A digital image is a two-dimensional representation of a visual scene that is stored and processed on a computer. It is made up of the following basic components:
1. **Pixels**: The smallest unit of a digital image. Each pixel represents a single point in the image and holds information about its color or intensity. Images are usually organized in a grid of pixels (rows and columns).
2. **Resolution**: The total number of pixels in the image, typically given as width × height (e.g., 1920×1080). Higher resolution means more detail.
3. **Bit Depth**: The number of bits used to represent the color or intensity of each pixel. Common values are 8-bit grayscale (256 intensity levels) and 24-bit color (8 bits per channel for Red, Green, and Blue).
4. **Color Model**: Defines how colors are represented. The most common model for digital images is RGB (Red, Green, Blue). Each pixel in a color image is represented by three values corresponding to the R, G, and B channels.

**Image Representation in a Computer**: Digital images are stored as arrays (matrices) in memory. A grayscale image is a 2D array of shape height × width, where each element is a pixel intensity (e.g., 0 = black, 255 = white). A color image is a 3D array of shape height × width × 3, with one value per RGB channel. Images can be saved in file formats such as JPEG, PNG, and BMP, which may add compression and metadata.
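
As a rough illustration of the array layout described above, the following NumPy sketch builds a tiny grayscale image and a tiny RGB image by hand. The array names and the 2×3 image size are made up for the example, and `uint8` is assumed as the 8-bit storage type.

```python
import numpy as np

# Hypothetical 2x3 grayscale image: one 8-bit intensity per pixel.
gray = np.array([[0, 128, 255],
                 [64, 192, 32]], dtype=np.uint8)

# Hypothetical 2x3 RGB image: three 8-bit values (R, G, B) per pixel.
color = np.zeros((2, 3, 3), dtype=np.uint8)
color[0, 0] = [255, 0, 0]   # top-left pixel is pure red
color[1, 2] = [0, 0, 255]   # bottom-right pixel is pure blue

print(gray.shape, gray.nbytes)    # (2, 3) 6   -> 1 byte per pixel
print(color.shape, color.nbytes)  # (2, 3, 3) 18 -> 3 bytes per pixel
```

The color array needs three times the memory of the grayscale array for the same height and width.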

**Difference between Grayscale and Color Images**

| Feature              | Grayscale Image                              | Color Image                                  |
|----------------------|----------------------------------------------|----------------------------------------------|
| Pixel Value          | Single intensity value (0–255)               | Three values (R, G, B), each 0–255           |
| Memory Requirement   | Less (1 byte per pixel)                      | More (typically 3 bytes per pixel)           |
| Bit Depth            | Usually 8 bits                               | Usually 24 bits (8 bits per channel)         |
| Visual Content       | Shades of gray (black to white)              | Full range of colors                         |
| Array Dimensions     | 2D array (height × width)                    | 3D array (height × width × 3)                |

### **Define Convolutional Neural Networks (CNNs) and discuss their role in image processing. Describe the key advantages of using CNNs over traditional neural networks for image-related tasks.**


- **Definition**: Convolutional Neural Networks (CNNs) are a specialized type of deep learning model designed primarily for processing grid-like data, such as images. They automatically and adaptively learn spatial hierarchies of features from input images through a series of layers. CNNs use a mathematical operation called convolution instead of relying on full connectivity (as in traditional neural networks). This operation helps to extract local features like edges, textures, and shapes.

- **Role of CNNs in Image Processing**: CNNs are widely used in image-related tasks such as:
 1. Image classification (e.g., identifying objects in photos)
 2. Object detection (e.g., locating faces in images)
 3. Image segmentation (e.g., separating objects from backgrounds)
 4. Facial recognition, medical image analysis, and self-driving car vision systems

- **Core components of CNNs**:
 1. Convolutional Layers: Apply filters (kernels) to extract local features. They learn low-level features (edges) in early layers and high-level features (shapes, objects) in deeper layers.
 2. Activation Functions: Typically ReLU (Rectified Linear Unit), which introduces non-linearity.
 3. Pooling Layers: Reduce the spatial dimensions (e.g., max pooling). They down-sample feature maps, which reduces computation and helps limit overfitting.
 4. Fully Connected Layers: Flatten the extracted features and perform final classification.
 5. Dropout & Batch Normalization: Help improve training efficiency and prevent overfitting.

- **Key Advantages of CNNs Over Traditional Neural Networks**:

| Feature                          | CNNs                                               | Traditional Neural Networks                     |
|----------------------------------|----------------------------------------------------|--------------------------------------------------|
| **Parameter Efficiency**         | Shared weights reduce the number of parameters     | Full connectivity causes a huge number of weights |
| **Translation Invariance**       | Shared filters detect a feature wherever it appears; pooling adds tolerance to small shifts | Must relearn the same feature at each position |
| **Feature Hierarchy Learning**   | Learns local to global features in layers          | No built-in spatial hierarchy; often needs manual feature extraction |
| **Better Performance**           | Superior accuracy in image-related tasks           | Inferior in complex image understanding tasks    |
| **Reduced Overfitting**          | Fewer parameters and pooling reduce overfitting    | More prone to overfitting without regularization |
| **Scalability**                  | Scales well to high-resolution images              | Computationally expensive on large images        |
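
To make the parameter-efficiency row concrete, here is a small back-of-the-envelope sketch. The helper names `dense_params` and `conv2d_params` are invented for this example; they count weights plus biases in the usual way for a fully connected layer and a 2D convolutional layer, applied to a 32×32 RGB input.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit in a fully connected layer."""
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus one bias per filter in a 2D convolutional layer."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


# A 32x32 RGB image flattened and fed to 64 dense units.
dense_count = dense_params(32 * 32 * 3, 64)
# 32 filters of size 3x3 applied to the same 3-channel image.
conv_count = conv2d_params(3, 3, 3, 32)

print(dense_count)  # 196672
print(conv_count)   # 896
```

The convolutional layer's count does not depend on the image size, because the same 3×3 filters are reused at every position.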


```python
# Implementation: a small CNN trained on CIFAR-10

import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt

# Load CIFAR-10: 32x32 RGB images in 10 classes.
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()

# Scale pixel values from [0, 255] to [0, 1].
x_train, x_test = x_train / 255.0, x_test / 255.0

# Three convolutional layers with max pooling, followed by a dense classifier.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10)  # 10 output classes (raw logits, no softmax)
])

# from_logits=True because the last layer outputs unnormalized scores.
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# The test split is used as validation data here.
history = model.fit(x_train, y_train, epochs=20,
                    validation_data=(x_test, y_test))
```


```text
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 5s 0us/step

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)

Epoch 1/20
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 72s 45ms/step - accuracy: 0.3672 - loss: 1.7213 - val_accuracy: 0.5726 - val_loss: 1.1898
Epoch 2/20
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 81s 44ms/step - accuracy: 0.5880 - loss: 1.1603 - val_accuracy: 0.6290 - val_loss: 1.0462
Epoch 3/20
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 82s 45ms/step - accuracy: 0.6509 - loss: 1.0003 - val_accuracy: 0.6384 - val_loss: 1.0126
Epoch 4/20
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 70s 45ms/step - accuracy: 0.6792 - loss: 0.9064 - val_accuracy: 0.6896 - val_loss: 0.9042
Epoch 5/20
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 69s 44ms/step - accuracy: 0.7205 - loss: 0.8030 - val_accuracy: 0.6976 - val_loss: 0.8829
Epoch 6/20
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 84s 45ms/step - accuracy: 0.7414 - loss: 0.7479 - val_accuracy: 0.7012 - val_loss: 0.8908
Epoc
```

### **Define convolutional layers and their purpose in a CNN. Discuss the concept of filters and how they are applied during the convolution operation. Explain the use of padding and strides in convolutional layers and their impact on the output size.**

- **Convolutional Layers and Their Purpose in CNNs**: A convolutional layer is the fundamental building block of a Convolutional Neural Network (CNN). It performs the convolution operation to extract features from the input data (usually images).
- **Purpose**:
 1. Automatically learns and detects local features such as edges, textures, and patterns.
 2. Preserves the spatial relationship between pixels.
 3. Reduces the need for manual feature extraction.
 4. Enables deep networks to learn hierarchical representations, from low-level (edges) to high-level (shapes, objects).

- **Filters (Kernels) and the Convolution Operation**: A filter (or kernel) is a small matrix (e.g., 3×3 or 5×5) of trainable weights used in the convolution operation.
- **How Filters Work**:
 1. A filter slides over the input image, multiplying its weights element-wise with the patch of the input it currently overlaps.
 2. The products are summed to produce a single value in the output feature map.
 3. Each filter learns during training to respond to a specific feature (e.g., vertical edge, corner, color pattern).
 4. Multiple filters are used in each convolutional layer, producing one feature map per filter.
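
The sliding-window steps above can be sketched in NumPy as follows. This is a minimal illustration with stride 1 and no padding, not how a deep learning library implements it. The function name `convolve2d_valid`, the toy image, and the hand-written vertical-edge filter are all assumptions for the example; in a trained CNN the filter weights would be learned. As in most deep learning libraries, the kernel is applied without flipping (strictly speaking, cross-correlation).

```python
import numpy as np


def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # Element-wise multiply, then sum to a single output value.
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map


# Toy 4x5 image: dark on the left, bright on the right.
image = np.array([[0, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1]], dtype=float)

# Hand-written vertical-edge filter (illustrative weights only).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

result = convolve2d_valid(image, kernel)
print(result)
# [[0. 3. 3.]
#  [0. 3. 3.]]
```

The response is zero over the flat dark region and positive wherever the window straddles the dark-to-bright edge.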

- **Padding in Convolutional Layers**: Padding adds extra pixels (typically zeros) around the edges of the input before convolution is applied.
- **Types of Padding**:
 1. Valid Padding: No padding; the output is smaller than the input.
 2. Same Padding: Pads the input so that, with a stride of 1, the output has the same height and width as the input.
- **Why Padding Is Used**:
 1. To preserve spatial size (important in deep networks).
 2. To let filters cover edge pixels as fully as interior pixels.
 3. To control the size of the output feature map.

- **Strides in Convolutional Layers**: Stride defines how far the filter moves across the input at each step.
- **Effect of Stride**:
 1. Stride = 1: The filter slides one pixel at a time, so the output retains more detail.
 2. Stride > 1: The filter jumps two or more pixels at a time, so the output is smaller and coarser.

| Component         | Description                                             | Effect on Output                     |
|------------------|---------------------------------------------------------|--------------------------------------|
| **Convolutional Layer** | Extracts features using filters                    | Learns patterns in input data        |
| **Filter (Kernel)**     | Small matrix of weights for feature detection      | Creates feature maps                 |
| **Padding**             | Adds pixels to border                              | Controls output size, retains edges  |
| **Stride**              | Step size of filter movement                       | Affects spatial resolution of output |
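
The combined effect of padding and stride on output size follows the standard formula floor((n + 2p − k) / s) + 1 along each spatial axis, where n is the input size, k the filter size, p the padding added on each side, and s the stride. The helper below is a small sketch of that formula; the name `conv_output_size` is made up for this example, and it assumes the same padding on both sides.

```python
def conv_output_size(n, kernel, padding=0, stride=1):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


print(conv_output_size(32, 3))                       # 30: valid padding shrinks the map
print(conv_output_size(32, 3, padding=1))            # 32: one pixel of padding keeps the size
print(conv_output_size(32, 3, padding=1, stride=2))  # 16: stride 2 halves it
```

For a 3×3 filter at stride 1, one pixel of zero padding per side is exactly what "same" padding requires.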

### **Describe the purpose of pooling layers in CNNs. Compare max pooling and average pooling operations.**

- **Purpose of Pooling Layers in CNNs**: Pooling layers are used in Convolutional Neural Networks (CNNs) to downsample feature maps, reducing their spatial dimensions (width and height) while retaining the most important features.
- **Main Purposes**:
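 1. Dimensionality Reduction: Shrinks the feature maps, which reduces computation and the number of parameters needed in later layers (pooling itself has no trainable parameters).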
 2. Translation Invariance: Makes the model more robust to small shifts and distortions.
 3. Feature Consolidation: Emphasizes the most dominant features detected by convolutional layers.

- **How Pooling Works**: Pooling operates on small patches (usually 2×2) of the input and replaces each patch with a single value based on the type of pooling.

| Feature             | Max Pooling                                  | Average Pooling                             |
|---------------------|-----------------------------------------------|---------------------------------------------|
| **Definition**      | Takes the **maximum** value in each patch     | Takes the **average** of values in each patch |
| **Purpose**         | Captures the **strongest activation**         | Captures the **average presence** of features |
| **Effect**          | Highlights prominent features (edges, etc.)   | Smooths feature maps, may lose sharp details |
| **Output**          | More **sparse** representation                | More **blended** representation              |
| **Common Use Case** | Common choice between convolutional blocks    | Often used as global average pooling before the classifier |
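
The difference between the two operations is easiest to see on a small example. The sketch below applies non-overlapping 2×2 pooling to a hand-made 4×4 feature map using NumPy; the function name `pool2x2` and the input values are invented for illustration, and the reshape trick assumes the height and width are even.

```python
import numpy as np


def pool2x2(feature_map, mode="max"):
    """Apply non-overlapping 2x2 pooling to a 2D array with even height and width."""
    h, w = feature_map.shape
    # Group pixels into 2x2 patches: shape (h/2, 2, w/2, 2).
    patches = feature_map.reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return patches.max(axis=(1, 3))
    return patches.mean(axis=(1, 3))


feature_map = np.array([[1, 3, 2, 0],
                        [5, 7, 4, 8],
                        [9, 1, 0, 2],
                        [3, 5, 6, 4]], dtype=float)

max_pooled = pool2x2(feature_map, "max")
avg_pooled = pool2x2(feature_map, "average")

print(max_pooled)
# [[7. 8.]
#  [9. 6.]]
print(avg_pooled)
# [[4.  3.5]
#  [4.5 3. ]]
```

Both outputs are 2×2, but max pooling keeps only the strongest activation in each patch while average pooling blends all four values.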