
# Key Concepts of Convolutional Neural Networks (CNNs)

## 1. **Introduction to CNNs**
   - CNNs are a class of deep learning models designed for data with a grid-like spatial structure, such as images and videos.
   - They automatically and adaptively learn spatial hierarchies of features from input data.

## 2. **Key Components of CNNs**

### a. **Convolutional Layer**
   - **Purpose**: Extracts features from input data by applying filters (kernels).
   - **Kernel/Filter**: A small matrix (e.g., 3x3 or 5x5) that slides over the input.
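   - **Stride**: The number of pixels the filter moves at each step; a stride of 2 roughly halves the output's width and height.
   - **Padding**: Adds extra pixels (usually zeros) around the input to control the output size. In Keras, `'same'` padding keeps the output the same size as the input when the stride is 1, while `'valid'` adds no padding, so the output shrinks.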
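
To make the sliding-window arithmetic concrete, here is a minimal NumPy sketch of a single-channel convolution with stride and zero padding. The helper name `conv2d_single` is hypothetical; real layers also sum over input channels and add a bias. Like Keras `Conv2D`, it does not flip the kernel (strictly, this is cross-correlation). Along each axis the output size is floor((W - K + 2P) / S) + 1 for input size W, kernel size K, padding P and stride S.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1, padding=0):
    """Slide one 2D kernel over a single-channel image (no kernel flip, as in CNN layers)."""
    if padding > 0:
        image = np.pad(image, padding, mode="constant")  # zero padding on all four sides
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)  # element-wise multiply, then sum
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0  # 3x3 averaging filter

print(conv2d_single(image, kernel).shape)                       # no padding: (3, 3)
print(conv2d_single(image, kernel, padding=1).shape)            # 'same'-style: (5, 5)
print(conv2d_single(image, kernel, stride=2, padding=1).shape)  # stride 2: (3, 3)
```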

### b. **Pooling Layer**
   - **Purpose**: Reduces the spatial dimensions (width and height) of feature maps while retaining the most salient information.
   - **Types**:
     - **Max Pooling**: Takes the maximum value in a region.
     - **Average Pooling**: Takes the average value in a region.
   - **Benefits**:
     - Reduces computational cost.
     - Can help reduce overfitting by summarizing local regions and shrinking the input to later layers.
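
A minimal NumPy sketch of 2x2 max and average pooling with stride 2 on a toy 4x4 feature map. The function name `pool2d` and the values are illustrative.

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample a 2D feature map with max or average pooling (no padding)."""
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    reduce = np.max if mode == "max" else np.mean
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = reduce(window)  # one summary value per window
    return out

fm = np.array([[1, 3, 2, 0],
               [4, 6, 1, 1],
               [0, 2, 5, 7],
               [1, 1, 8, 2]], dtype=float)

print(pool2d(fm, mode="max"))      # values [[6, 2], [2, 8]]
print(pool2d(fm, mode="average"))  # values [[3.5, 1], [1, 5.5]]
```

Either way the 4x4 map becomes 2x2, which is the "reduces computational cost" benefit: every later layer processes a quarter as many positions.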

### c. **Fully Connected Layer**
   - **Purpose**: Combines extracted features to make predictions.
   - **Structure**: Connects every neuron in the previous layer to every neuron in this layer; convolutional feature maps are flattened into a vector first.
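
As a rough sketch of what `Flatten` followed by a dense layer computes, the NumPy snippet below flattens a feature map and applies y = xW + b. The 4x4x128 shape is an assumption matching what the notebook's model should produce for 32x32 inputs (three 2x2 poolings: 32 to 16 to 8 to 4); the random weights are placeholders. Under that assumption, the parameter count (inputs x units + units) should match the `Dense(128)` row of `model.summary()`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature map with the shape the notebook's last pooling layer should output
feature_map = rng.standard_normal((4, 4, 128))

x = feature_map.reshape(-1)                      # Flatten: (4, 4, 128) -> (2048,)
units = 128
W = rng.standard_normal((x.size, units)) * 0.01  # one weight per (input, output) pair
b = np.zeros(units)                              # one bias per output neuron

y = x @ W + b                                    # every input contributes to every output
n_params = W.size + b.size

print(x.shape, y.shape, n_params)                # (2048,) (128,) 262272
```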

### d. **Activation Functions**
   - **ReLU (Rectified Linear Unit)**: Applies a non-linear transformation, setting negative values to 0.
   - **Softmax**: Used in the output layer for multi-class classification; converts the raw scores (logits) into a probability distribution over all possible classes.
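
Both activation functions are short enough to write directly. A minimal NumPy sketch (function names are illustrative); subtracting the maximum logit inside softmax is a standard numerical-stability trick that does not change the result.

```python
import numpy as np

def relu(z):
    """ReLU: max(0, z) element-wise, so negative values become 0."""
    return np.maximum(0.0, z)

def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1 along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # avoids overflow in exp()
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs.round(3))                    # approx. [0.659 0.242 0.099]
```

The largest logit always receives the largest probability, so taking the argmax of the softmax output gives the predicted class.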

### e. **Dropout Layer**
   - **Purpose**: Reduces overfitting by randomly disabling (zeroing) a fraction of neurons during training; dropout is inactive at inference time.
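
One common way to implement dropout is inverted dropout, which matches the behavior documented for the Keras `Dropout` layer: surviving units are scaled by 1 / (1 - rate) during training, so the layer can simply pass inputs through at inference. A minimal NumPy sketch (the `dropout` helper is hypothetical):

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a random fraction `rate` of units during training.

    Surviving units are scaled by 1 / (1 - rate) so the expected activation is unchanged.
    """
    if not training or rate == 0.0:
        return activations                              # no-op at inference
    keep_mask = rng.random(activations.shape) >= rate  # True with probability 1 - rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
a = np.ones(10_000)

train_out = dropout(a, rate=0.5, training=True, rng=rng)
infer_out = dropout(a, rate=0.5, training=False, rng=rng)

print((train_out == 0).mean())       # roughly 0.5 of the units are dropped
print(train_out.mean())              # roughly 1.0 thanks to the rescaling
print(np.array_equal(infer_out, a))  # True: inputs pass through unchanged
```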

## 3. **Training CNNs**
   - **Forward Propagation**: Computes the output of the network given the input.
   - **Loss Function**: Measures the difference between predicted and actual labels.
   - **Backpropagation**: Computes the gradient of the loss with respect to every weight (the trainable parameters of the model) by applying the chain rule backward through the network. Each gradient indicates how much, and in which direction, that weight contributes to the error.
   - **Optimization**: An optimizer such as SGD (Stochastic Gradient Descent) or Adam uses these gradients to update the weights and reduce the loss.
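
The four training steps can be seen end to end in a model small enough to differentiate by hand. The sketch below trains softmax regression (a single dense layer with softmax, not a CNN) on hypothetical toy data using full-batch gradient descent; a CNN follows the same loop, with the framework computing the gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: three well-separated 2D clusters, one per class
centers = np.array([[0.0, 3.0], [3.0, -2.0], [-3.0, -2.0]])
labels = np.repeat(np.arange(3), 50)
X = centers[labels] + rng.standard_normal((150, 2)) * 0.5
Y = np.eye(3)[labels]  # one-hot targets

W = np.zeros((2, 3))
b = np.zeros(3)
learning_rate = 0.1

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

losses = []
for _ in range(200):
    # Forward propagation: class probabilities for every sample
    probs = softmax(X @ W + b)
    # Loss function: mean categorical cross-entropy against the one-hot labels
    loss = -np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1))
    losses.append(loss)
    # Backpropagation: for softmax + cross-entropy, dLoss/dlogits = (probs - Y) / N
    grad_logits = (probs - Y) / len(X)
    grad_W = X.T @ grad_logits
    grad_b = grad_logits.sum(axis=0)
    # Optimization: plain gradient-descent step (full batch, not mini-batch SGD)
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

accuracy = np.mean(np.argmax(softmax(X @ W + b), axis=1) == labels)
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, accuracy {accuracy:.2f}")
```

The first loss is ln(3), about 1.099, because zero weights give every class probability 1/3; the loss then falls as the updates push the weights toward each cluster.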

## 4. **Hyperparameters in CNNs**
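   - **Number of Filters**: Determines the depth (number of channels) of the output feature maps.
   - **Filter Size**: Sets how large a region each filter sees; larger filters capture broader patterns per layer but add parameters.
   - **Stride and Padding**: Control the output size of convolutional layers.
   - **Learning Rate**: Sets the size of each weight update; too high can make training unstable, too low makes it slow.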
   - **Batch Size**: Determines the number of samples processed at a time during training.
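
To see how these hyperparameters turn into concrete numbers, the two helpers below (names are illustrative) compute a Conv2D layer's parameter count, (kernel_h x kernel_w x in_channels + 1) x filters, and the output size along one axis. The layer list assumes the 3x3 Conv2D layers of the notebook's model below; the counts should match the Conv2D rows of its `model.summary()`, while the batch normalization layers add parameters of their own.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Weights in a Conv2D layer: one kernel_h x kernel_w x in_channels filter per output channel, plus a bias each."""
    return (kernel_h * kernel_w * in_channels + (1 if use_bias else 0)) * filters

def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one axis: floor((size - kernel + 2 * padding) / stride) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

# (in_channels, filters) for the six 3x3 Conv2D layers, starting from RGB input
layer_specs = [(3, 32), (32, 32), (32, 64), (64, 64), (64, 128), (128, 128)]
for in_ch, filters in layer_specs:
    # 896, 9248, 18496, 36928, 73856, 147584
    print(f"{in_ch:>3} -> {filters:>3} channels: {conv2d_params(3, 3, in_ch, filters)} params")

print(conv_output_size(32, 3, stride=1, padding=1))  # 32: 'same'-style padding keeps the size
print(conv_output_size(32, 3, stride=2, padding=1))  # 16: stride 2 halves it
print(conv_output_size(32, 5))                       # 28: a larger filter without padding shrinks it more
```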

## 5. **Key Concepts to Remember**
   - **Feature Extraction**: Convolutional layers identify patterns such as edges (horizontal, vertical, or diagonal boundaries between contrasting regions of an image) and textures (repetitive patterns and surface characteristics such as smoothness, roughness, or graininess).
   - **Hierarchy of Features**: Deeper layers capture more complex patterns.
   - **Translation Invariance**: Because the same filter slides over the whole input, a pattern is detected wherever it appears, and pooling makes the output less sensitive to small shifts of that pattern.
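
Edge detection is the classic illustration of feature extraction. The sketch below slides a hand-crafted, Sobel-style vertical-edge kernel over a toy image whose left half is dark and right half bright. A CNN learns its filters from data instead, although first-layer filters of trained networks often end up resembling such edge detectors. Names and values are illustrative.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """'Valid' sliding-window multiply-and-sum, as a Conv2D layer computes it (no kernel flip)."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6x6 image: dark left half (0), bright right half (1), i.e. one vertical edge
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-crafted vertical-edge kernel: negative weights on the left, positive on the right
vertical_edge = np.array([[-1, 0, 1],
                          [-2, 0, 2],
                          [-1, 0, 1]], dtype=float)

response = correlate2d_valid(image, vertical_edge)
print(response)  # every row is [0, 4, 4, 0]: strong output only where the window straddles the edge
```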

## 6. **Applications of CNNs**
   - Image classification (e.g., cats vs. dogs).
   - Object detection (e.g., detecting faces in an image).
   - Semantic segmentation (e.g., identifying objects pixel-by-pixel in an image).
   - Medical imaging, autonomous vehicles, and more.

## 7. **Common Architectures**
   - **LeNet**: An early, pioneering CNN for handwritten digit recognition.
   - **AlexNet**: Popularized deep CNNs, using more layers than earlier models and ReLU activations.
   - **VGGNet**: Stacks many small (3x3) filters, increasing depth for better feature extraction.
   - **ResNet**: Introduces skip (residual) connections that ease training of very deep networks by mitigating the vanishing gradient problem.
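
The skip connection that defines ResNet can be sketched in a few lines. This is a simplified NumPy version, not the original architecture: the residual branch F uses two dense layers instead of convolutions with batch normalization, and `residual_block` is a hypothetical name. Because the block computes relu(F(x) + x), the input has a direct path forward (and gradients a direct path backward), and a block whose branch outputs zero simply passes its input through.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def residual_block(x, W1, W2):
    """Simplified residual block: relu(F(x) + x), with F = two dense layers (no conv, no batch norm)."""
    f = relu(x @ W1) @ W2  # residual branch F(x)
    return relu(f + x)     # skip connection adds the input back before the activation

rng = np.random.default_rng(0)
x = np.abs(rng.standard_normal((4, 8)))  # non-negative, like the output of a previous ReLU

# With an all-zero branch, the block reduces to the identity on non-negative inputs
zero = np.zeros((8, 8))
print(np.allclose(residual_block(x, zero, zero), x))  # True
```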

## 8. **Practical Tips**
   - Use **data augmentation** to enhance the diversity of the training data.
   - Regularize using **dropout** and **batch normalization**.
   - Monitor for **overfitting** by observing the training and validation loss.
   - Choose an appropriate **learning rate** for smooth convergence.
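
A minimal NumPy sketch of two common augmentations for small images such as CIFAR-10: a random horizontal flip and a random crop after zero padding, which amounts to a random shift of up to `pad` pixels. The `augment` helper and the 4-pixel padding are assumptions for illustration; Keras also provides preprocessing layers such as `layers.RandomFlip` for this purpose.

```python
import numpy as np

def augment(image, rng, pad=4):
    """Random horizontal flip plus random crop after zero padding.

    `image` has shape (height, width, channels); the output has the same shape.
    """
    if rng.random() < 0.5:
        image = image[:, ::-1, :]                   # mirror left-right
    h, w, _ = image.shape
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    top = rng.integers(0, 2 * pad + 1)              # random offset in [0, 2 * pad]
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w, :]

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3)).astype("float32")  # stand-in for one CIFAR-10 image
batch = np.stack([augment(img, rng) for _ in range(8)])
print(batch.shape)  # (8, 32, 32, 3): same size, but each copy is shifted or flipped differently
```

Because the transformations are drawn fresh each time an image is used, the model rarely sees exactly the same pixels twice, which is what makes augmentation act as a regularizer.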


```python
import tensorflow as tf
from tensorflow.keras import layers, models
```

```python
def create_cnn_model():
    # Initialize a Sequential model: layers are stacked one after another
    model = models.Sequential()

    # Declare the input once: 32x32 RGB images (3 channels). Recent Keras versions
    # warn against passing input_shape to Conv2D and recommend an Input as the first layer.
    model.add(tf.keras.Input(shape=(32, 32, 3)))

    # First Convolutional Block
    # 32 filters of size 3x3 with ReLU; 'same' padding keeps the 32x32 spatial size
    model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same'))
    # Batch normalization rescales the previous layer's activations toward zero mean and
    # unit variance (per channel, using batch statistics during training)
    model.add(layers.BatchNormalization())
    # Second conv layer with the same configuration
    model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same'))
    model.add(layers.BatchNormalization())
    # 2x2 max pooling halves the width and height: 32x32 -> 16x16
    model.add(layers.MaxPooling2D((2, 2)))
    # Randomly zero 30% of the activations during training to reduce overfitting
    # (overfitting: learning noise in the training data instead of generalizing to unseen data)
    model.add(layers.Dropout(0.3))

    # Second Convolutional Block
    # Increase to 64 filters to learn more complex features
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D((2, 2)))  # 16x16 -> 8x8
    # Increase dropout to 50% as we go deeper
    model.add(layers.Dropout(0.5))

    # Third Convolutional Block
    # Increase to 128 filters to learn even more complex features
    model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D((2, 2)))  # 8x8 -> 4x4
    model.add(layers.Dropout(0.5))

    # Fully Connected Layers
    # Flatten the 4x4x128 feature map into a 2048-element vector for the dense layers
    model.add(layers.Flatten())
    # Dense layer with 128 neurons
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.BatchNormalization())
    model.add(layers.Dropout(0.5))
    # Output layer: 10 neurons (one per class) with softmax to produce a probability distribution
    model.add(layers.Dense(10, activation='softmax'))

    return model

# Create the model
model = create_cnn_model()
# Display the model architecture and parameter count
model.summary()
```
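
Each `BatchNormalization` layer in the model normalizes activations per channel. A minimal NumPy sketch of the training-mode computation for a (batch, height, width, channels) tensor, assuming the Keras default epsilon of 0.001; at inference, Keras instead uses moving averages of the mean and variance collected during training. The `batch_norm_train` name and the random input are illustrative.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization for a (batch, height, width, channels) tensor.

    Each channel is normalized with the mean and variance over the batch and spatial axes,
    then scaled by gamma and shifted by beta (both learned during training).
    """
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(16, 8, 8, 32))  # hypothetical conv activations
y = batch_norm_train(x, gamma=np.ones(32), beta=np.zeros(32))

print(y.mean(axis=(0, 1, 2))[:3].round(3))  # close to 0 for each channel
print(y.std(axis=(0, 1, 2))[:3].round(3))   # close to 1 for each channel
```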



```python
# Import necessary libraries
from tensorflow.keras.datasets import cifar10      # 60,000 32x32 color images in 10 classes
from tensorflow.keras.utils import to_categorical  # converts integer labels to one-hot vectors

# One-hot encoding turns a categorical label into a binary vector with a single 1,
# a numerical format that machine learning models, including CNNs, can process.

# Load the data. cifar10.load_data() returns:
# - 50,000 32x32 RGB training images and their labels
# - 10,000 32x32 RGB test images and their labels
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()

# Scale pixel values from [0, 255] to [0, 1].
# Casting the uint8 images to float32 first avoids NumPy's default float64 result,
# halving memory use; float32 is also Keras's default compute dtype.
train_images = train_images.astype('float32') / 255
test_images = test_images.astype('float32') / 255

# Convert labels to one-hot vectors
# Example: label 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
# categorical_crossentropy expects one-hot targets; it measures the difference between
# the predicted probability distribution and the true label distribution
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# Configure the model for training
model.compile(
    optimizer='adam',                 # Adam adapts a separate step size for each weight
    loss='categorical_crossentropy',  # standard loss for multi-class classification with one-hot labels
    metrics=['accuracy']              # report accuracy during training
)

# Train the model
model.fit(
    train_images,         # input training images
    train_labels,         # one-hot target labels
    epochs=10,            # number of full passes over the training data
    batch_size=64,        # samples processed before each weight update
    validation_split=0.2  # hold out the last 20% of the training data for validation
)
```

```
Epoch 1/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 349s 549ms/step - accuracy: 0.7743 - loss: 0.6533 - val_accuracy: 0.7732 - val_loss: 0.6574
Epoch 2/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 377s 541ms/step - accuracy: 0.7861 - loss: 0.6284 - val_accuracy: 0.7536 - val_loss: 0.7448
Epoch 3/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 387s 549ms/step - accuracy: 0.7981 - loss: 0.5864 - val_accuracy: 0.8148 - val_loss: 0.5309
Epoch 4/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 375s 537ms/step - accuracy: 0.8053 - loss: 0.5696 - val_accuracy: 0.7723 - val_loss: 0.6929
Epoch 5/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 384s 541ms/step - accuracy: 0.8094 - loss: 0.5523 - val_accuracy: 0.7463 - val_loss: 0.7942
Epoch 6/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 332s 531ms/step - accuracy: 0.8175 - loss: 0.5378 - val_accuracy: 0.8119 - val_loss: 0.5597
```

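
For reference, the label preprocessing and loss used in the training cell can be reproduced in plain NumPy. In this sketch, `one_hot` mimics what `to_categorical` does for integer labels, and `categorical_crossentropy` averages -sum(y_true * log(y_pred)) over samples; the helper names, clipping constant and example predictions are assumptions for illustration.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Integer class ids -> one-hot rows (CIFAR-10 labels arrive with shape (N, 1))."""
    labels = np.asarray(labels).reshape(-1)
    return np.eye(num_classes)[labels]

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred)); rows of y_pred are probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = one_hot([[3], [0]], num_classes=10)
print(y_true[0])  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]

# Confident, correct predictions: 0.91 on the true class, 0.01 on each other class
confident = np.full((2, 10), 0.01)
confident[0, 3] = 0.91
confident[1, 0] = 0.91
uniform = np.full((2, 10), 0.1)  # a model that has learned nothing

print(categorical_crossentropy(y_true, confident))  # about 0.094, i.e. -log(0.91)
print(categorical_crossentropy(y_true, uniform))    # about 2.303, i.e. log(10)
```

The uniform case shows roughly what the loss should be for an untrained 10-class model, which is a useful sanity check for the first epoch of a fresh run.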

```python
# Evaluate the trained model on the held-out test set
test_loss, test_acc = model.evaluate(test_images, test_labels)
print("Test Accuracy", test_acc)
```

```
313/313 ━━━━━━━━━━━━━━━━━━━━ 20s 63ms/step - accuracy: 0.8060 - loss: 0.5830
Test Accuracy 0.8032000064849854
```
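
The accuracy reported by `evaluate` is the fraction of images whose highest-probability class matches the true label. A minimal NumPy sketch with hypothetical predictions (the `accuracy_from_probs` name is illustrative):

```python
import numpy as np

def accuracy_from_probs(probs, one_hot_labels):
    """Fraction of samples whose highest-probability class matches the true class."""
    predicted = np.argmax(probs, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return np.mean(predicted == actual)

# Hypothetical softmax outputs for 4 test images over 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
labels = np.eye(3)[[0, 1, 2, 1]]  # the last image is misclassified

print(accuracy_from_probs(probs, labels))  # 0.75
```

With the trained model, `accuracy_from_probs(model.predict(test_images), test_labels)` should closely match `test_acc`.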
