<a href="https://colab.research.google.com/github/bshakhruz/DAN-templates/blob/main/DNN_video_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Training Deep Neural Networks on Videos Using Google Colab
---


# 1. Introduction to Google Colab

Google Colab is a free cloud service based on Jupyter Notebooks that offers free, usage-limited access to GPUs and TPUs. It's well suited to machine learning, data analysis, and education. Because the computation runs on Google's hardware, you can get started with deep learning without buying an expensive GPU.



# 2. Setting up the Colab Environment

- **Enable GPU/TPU**: Go to `Edit` > `Notebook Settings` or `Runtime` > `Change runtime type` and select GPU or TPU from the dropdown to accelerate your computations.

**Checkpoint:** Remember to save your notebook now to preserve this setting.

## Verifying the Accelerator Status

After enabling the hardware accelerator, run the following cell to confirm that it is active:


In [None]:
# Verify that a GPU is visible to TensorFlow (this check does not detect TPUs)
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    print('No GPU found')
else:
    print('Found GPU at: {}'.format(device_name))

In [None]:
# Install dependencies (TensorFlow, OpenCV, PyTorch)
!pip install --upgrade pip setuptools wheel
!pip install tensorflow opencv-python-headless
!pip3 install torch torchvision torchaudio

In [None]:
# Verify installation
import tensorflow as tf
import cv2
import torch
print(f'TensorFlow version: {tf.__version__}')
print(f'OpenCV version: {cv2.__version__}')
print(f'PyTorch version: {torch.__version__}')

# 3. Accessing and Preparing Video Data

Training deep learning models on video data requires organizing and preprocessing your video files into a format that your model can process. This section guides you through accessing your video data, whether stored locally or on Google Drive, and details the preprocessing steps necessary to prepare the data for model training.

## Preparing the Environment

Before you start preprocessing your video data, you'll need to install some additional dependencies.


In [None]:
# Installing Additional Dependencies for Preprocessing Phase
!apt update && apt install -y ffmpeg
!pip install moviepy

In [None]:
# Mounting Google Drive (Optional)
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Download a zip archive from Google Drive (replace YOUR_FILE_ID with your file's ID)
!wget -O /content/file.zip "https://drive.google.com/uc?export=download&id=YOUR_FILE_ID"

In [None]:
import zipfile

with zipfile.ZipFile('/content/file.zip', 'r') as zip_ref:
    zip_ref.extractall('/content/')

In [None]:
# This snippet loads videos, extracts frames, and preprocesses them.
# NOTE: set the 'video_directory' variable to the folder that contains your videos

# Necessary library imports
import cv2
import numpy as np
import os
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split

# Placeholder for directory where videos are located
video_directory = '<path_to_videos>' # :example: /content/

# List all video files in the directory
video_files = [file for file in os.listdir(video_directory) if file.endswith('.mp4')]

# Placeholder for labeling videos (replace with your own labels)
video_labels = {
    "example.mp4": 0,
    'example1.mp4': 1,
    # Add as many videos as you have...
}
# Function to extract and preprocess frames for a batch of videos
def extract_and_preprocess_batch(video_paths, labels, skip_frames=5, batch_size=32):
    frames_batch = []
    labels_batch = []
    for video_path, label in zip(video_paths, labels):
        count = 0
        frames = []
        cap = cv2.VideoCapture(video_path)
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
            if count % skip_frames == 0:
                # Preprocess steps (e.g., resizing, normalization)
                frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                resized_frame = cv2.resize(frame_rgb, (112, 112))  # Resize frame to model input size
                frames.append(resized_frame / 255.0)  # Normalize pixel values
            count += 1

        cap.release()
        frames_batch.extend(frames)
        labels_batch.extend([label] * len(frames))
        if len(frames_batch) >= batch_size:
            yield np.array(frames_batch), np.array(labels_batch)
            frames_batch = []
            labels_batch = []

    if frames_batch:
        yield np.array(frames_batch), np.array(labels_batch)

# Process videos in batches
batch_size = 32
processed_frames_batches = []
labels_batches = []

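# Pair each listed file with its label and build full paths
# (os.listdir returns bare file names in arbitrary order; files without a label are skipped)
labeled_files = [name for name in video_files if name in video_labels]
video_paths = [os.path.join(video_directory, name) for name in labeled_files]
video_label_list = [video_labels[name] for name in labeled_files]

for batch_frames, batch_labels in extract_and_preprocess_batch(video_paths, video_label_list, batch_size=batch_size):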
    processed_frames_batches.append(batch_frames)
    labels_batches.append(batch_labels)
    print(f"Processed batch with {len(batch_frames)} frames")

# Concatenate batches
processed_frames = np.concatenate(processed_frames_batches, axis=0)
labels = np.concatenate(labels_batches, axis=0)

# After processing all batches
print(f"Total batches processed: {len(processed_frames_batches)}")
print(f"Total frames: {len(processed_frames)}")
print(f"Labels shape: {labels.shape}")

# Optionally, display a frame to verify preprocessing
if len(processed_frames) > 0:
    plt.imshow(processed_frames[0])
    plt.axis('off')
    plt.show()
else:
    print("No frames available to display.")
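
The sampling and normalization steps in the cell above can be checked in isolation. The sketch below is a NumPy-only illustration, not part of the original pipeline: the helper names `sampled_frame_indices` and `normalize_frame` are hypothetical, and the frame is synthetic rather than decoded from a video.

```python
import numpy as np

def sampled_frame_indices(total_frames, skip_frames=5):
    """Indices kept by a `count % skip_frames == 0` filter."""
    return [i for i in range(total_frames) if i % skip_frames == 0]

def normalize_frame(frame_uint8):
    """Scale 8-bit pixel values into the [0.0, 1.0] range as float32."""
    return frame_uint8.astype(np.float32) / 255.0

# Hypothetical 23-frame clip: every 5th frame is kept, starting at frame 0
kept = sampled_frame_indices(23, skip_frames=5)
print(kept)  # [0, 5, 10, 15, 20]

# Synthetic 112x112 RGB frame standing in for a decoded video frame
fake_frame = np.full((112, 112, 3), 255, dtype=np.uint8)
normalized = normalize_frame(fake_frame)
print(normalized.dtype, normalized.shape)
```

Casting to `float32` before dividing is a design choice made here: dividing a `uint8` array by `255.0` directly produces `float64`, which takes twice the memory per frame.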

# 4. Organizing the Dataset

After preprocessing the video data, the next step is to organize the dataset into distinct sets for training, validation, and testing. This ensures that we can train our model, fine-tune hyperparameters, and evaluate performance effectively.

## Splitting the Dataset

We will use `train_test_split` from `sklearn` to partition the data. The `test_size` parameter determines the proportion of the dataset to include in the test split.

In [None]:
from sklearn.model_selection import train_test_split

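# Assuming 'processed_frames' contains all your preprocessed video frames
# and 'labels' is an array with the corresponding label for each frame.
# Note: a random frame-level split can place frames from the same video in both the training and test sets.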


# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    processed_frames, labels, test_size=0.2, random_state=42
)

# Further split the training set to create a validation set
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=42
)  # 0.25 x 0.8 = 0.2 of the original dataset, giving a 60/20/20 train/validation/test split


# Note: Adjust the test_size values to change how much data is allocated to testing and validation.
# X_train, X_val, and X_test, along with y_train, y_val, and y_test, are now ready for model training.


In [None]:
# Save Data Splits
import numpy as np

# Replace '/content/dataset_splits.npz' with your preferred save path
np.savez(
    '/content/dataset_splits.npz',
    X_train=X_train, X_val=X_val, X_test=X_test,
    y_train=y_train, y_val=y_val, y_test=y_test
)
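
The two-stage split arithmetic (hold out 20% for testing, then 25% of the remainder for validation) can be verified with plain index shuffling. This is a minimal sketch under stated assumptions: `three_way_split_indices` is a hypothetical helper, and its rounding is not guaranteed to match `train_test_split` for sample counts that do not divide evenly.

```python
import numpy as np

def three_way_split_indices(n_samples, test_size=0.2, val_size=0.25, seed=42):
    """Shuffle indices, hold out a test set, then carve validation out of the rest."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_size))
    test_idx, remaining = indices[:n_test], indices[n_test:]
    n_val = int(round(len(remaining) * val_size))
    val_idx, train_idx = remaining[:n_val], remaining[n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = three_way_split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 600 200 200
```

With 1000 samples, the result is the 60/20/20 proportion described in the comments of the split cell.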

# 5. Data Augmentation (Optional)

Data augmentation is an essential technique in the machine learning workflow, particularly when dealing with image and video data. By applying random transformations to your training data, you can artificially expand the size and variance of your dataset. This process is key to preventing overfitting and helps the model generalize better to new, unseen data.

In this section, we'll use Keras's `ImageDataGenerator` to implement on-the-fly data augmentation.

## Why Data Augmentation?

- **Improves Generalization:** By simulating a broader set of variations, the model is less likely to memorize specific data points.
- **Addresses Overfitting:** Especially in scenarios with limited data, augmentation can effectively increase the dataset size.
- **Enhances Robustness:** Models trained with augmented data often perform better in real-world scenarios where data imperfections are common.
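
To make the idea of a random transformation concrete, here is a NumPy-only sketch of two simple augmentations: a horizontal flip and a horizontal shift with edge padding. The function `random_flip_and_shift` is a hypothetical illustration and does not reproduce any Keras implementation.

```python
import numpy as np

def random_flip_and_shift(frame, rng, max_shift=0.2):
    """Randomly flip a frame horizontally, then shift it sideways with edge padding."""
    out = frame
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip: reverse the width axis
    width = out.shape[1]
    max_px = int(width * max_shift)
    shift = int(rng.integers(-max_px, max_px + 1))
    if shift > 0:
        # Move content right; fill the gap by repeating the left edge column
        pad = np.repeat(out[:, :1, :], shift, axis=1)
        out = np.concatenate([pad, out[:, :-shift, :]], axis=1)
    elif shift < 0:
        # Move content left; fill the gap by repeating the right edge column
        pad = np.repeat(out[:, -1:, :], -shift, axis=1)
        out = np.concatenate([out[:, -shift:, :], pad], axis=1)
    return out

rng = np.random.default_rng(0)
frame = rng.random((112, 112, 3))  # synthetic frame with values in [0, 1)
augmented = [random_flip_and_shift(frame, rng) for _ in range(4)]
print([a.shape for a in augmented])
```

Each call yields a frame of the same shape with different content placement, so the model rarely sees exactly the same input twice.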

## Implementing Data Augmentation

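The `ImageDataGenerator` class in Keras provides a suite of tools for on-the-fly image augmentation. Recent TensorFlow documentation marks it as deprecated in favor of Keras preprocessing layers, so treat this cell as a template for environments where the class is still available.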

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

# Define your data augmentation pipeline
datagen = ImageDataGenerator(
    rotation_range=40,        # Random rotations between -40 and +40 degrees
    width_shift_range=0.2,    # Random horizontal shifts
    height_shift_range=0.2,   # Random vertical shifts
    shear_range=0.2,          # Shear transformations
    zoom_range=0.2,           # Random zoom
    horizontal_flip=True,     # Random horizontal flips
    fill_mode='nearest'       # Strategy for filling in new pixels
)

# Visualization of Data Augmentation
# Let's visualize some augmented examples to ensure our transformations are correct.
x_sample = X_train[0]
y_sample = y_train[0]

# Generate and plot a batch of augmented images
fig, axes = plt.subplots(1, 5, figsize=(20, 4))
axes = axes.flatten()
for ax in axes:
    # Apply a random transformation
    augmented_image = datagen.random_transform(x_sample)
    ax.imshow(augmented_image)
    ax.axis('off')
plt.tight_layout()
plt.show()

# To use data augmentation during training, pass the datagen.flow(...) as the training data in model.fit
# Example: model.fit(datagen.flow(X_train, y_train, batch_size=32), ...)

# 6. Overview of Different Architectures for Video Analysis

Selecting the right architecture for video analysis is pivotal to the success of your machine learning project. Below, we delve into several popular architectures, highlighting their uses and explaining how they work. Understanding these will help you choose the best fit for your project's needs.

## CNNs (Convolutional Neural Networks)

- **Use:** Primarily for extracting spatial features from video frames.
- **Explanation:** CNNs excel in identifying patterns, shapes, and textures within images, making them suitable for frame-level analysis.
- **Foundational Papers:** [Gradient-based learning applied to document recognition by LeCun et al.](https://ieeexplore.ieee.org/document/726791)
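
As a rough illustration of how a convolutional filter responds to a spatial pattern, the sketch below slides a hand-written vertical-edge kernel over a tiny synthetic image. The helper `conv2d_valid` is hypothetical and written for clarity, not speed; a trained CNN learns its kernels rather than using fixed ones.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over a 2D `image` without padding (cross-correlation, as in CNN layers)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Synthetic 6x6 "frame": dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge kernel
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

response = conv2d_valid(image, kernel)
print(response.shape)  # (4, 4)
print(response[0])     # [0. 3. 3. 0.]
```

The response is nonzero only where the window straddles the dark-to-bright boundary, which is the sense in which a filter "detects" a pattern.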

## RNN/LSTM/GRU

- **Use:** Best for analyzing temporal dependencies in video sequences.
- **Explanation:** RNNs and their variants, LSTM and GRU, are designed to model sequential data, capturing the temporal dynamics crucial for understanding video content over time.
- **Foundational Papers:**
  - RNN: [Finding Structure in Time by Elman.](https://crl.ucsd.edu/~elman/Papers/fsit.pdf)
  - LSTM: [Long Short-Term Memory by Hochreiter & Schmidhuber.](https://www.bioinf.jku.at/publications/older/2604.pdf)
  - GRU: [Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation by Cho et al.](https://arxiv.org/abs/1406.1078)
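
The recurrence behind a simple (Elman-style) RNN can be sketched in a few lines of NumPy: the hidden state is updated once per timestep from the current input and the previous state. The weights below are random placeholders, and the sizes are assumptions chosen to mirror a sequence of per-frame feature vectors.

```python
import numpy as np

def simple_rnn_forward(sequence, w_x, w_h, b):
    """Run h_t = tanh(W_x x_t + W_h h_{t-1} + b) over a sequence and return the final state."""
    h = np.zeros(w_h.shape[0])
    for x_t in sequence:
        h = np.tanh(w_x @ x_t + w_h @ h + b)
    return h

rng = np.random.default_rng(0)
timesteps, features, hidden = 100, 128, 64  # assumed sizes for illustration

sequence = rng.normal(size=(timesteps, features))      # one feature vector per frame
w_x = rng.normal(scale=0.1, size=(hidden, features))   # input-to-hidden weights
w_h = rng.normal(scale=0.1, size=(hidden, hidden))     # hidden-to-hidden weights
b = np.zeros(hidden)

final_state = simple_rnn_forward(sequence, w_x, w_h, b)
print(final_state.shape)  # (64,)
```

LSTM and GRU layers replace this single update with gated updates, but they consume the sequence in the same step-by-step fashion.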

## 3D CNNs

- **Use:** For analyzing videos by extracting both spatial and temporal features.
- **Explanation:** By adding a time dimension to the convolutional layers, 3D CNNs can process sequences of frames, making them adept at recognizing actions and events in videos.
- **Foundational Paper:** [3D Convolutional Neural Networks for Human Action Recognition by Ji et al.](https://ieeexplore.ieee.org/document/6165309)

## Two-Stream Networks

- **Use:** For a comprehensive analysis by considering both spatial and temporal information.
- **Explanation:** This approach uses a dual-stream model, one for spatial features from single frames and another for temporal features from frame sequences, offering a balanced analysis.
- **Foundational Paper:** [Two-stream convolutional networks for action recognition in videos by Simonyan & Zisserman.](https://arxiv.org/abs/1406.2199)


## Transformers (e.g., Vision Transformers - ViT)

- **Use:** For tasks where capturing long-range dependencies within videos is crucial.
- **Explanation:** Adapting the attention mechanism from NLP, Vision Transformers split each frame into patches and relate those patches through self-attention. Video variants extend the attention across frames so that the model captures how parts of the video relate both spatially and temporally.
- **Foundational Papers:**
  - General Transformers: [Attention is All You Need by Vaswani et al.](https://arxiv.org/abs/1706.03762)
  - Vision Transformers: [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Dosovitskiy et al.](https://arxiv.org/abs/2010.11929)

---


## Choosing Your Neural Architecture Template

When selecting an architecture, consider the following:

- **Nature of the Task:** Is your focus on understanding the content of individual frames (CNN), the movement between frames (RNN, 3D CNN), or a combination of both (Two-Stream, Transformers)?
- **Complexity of the Video Data:** More complex data might benefit from architectures that can capture a wide range of dependencies, like Transformers.
- **Computational Resources:** Some models, especially 3D CNNs and Transformers, are more computationally intensive than others.

### Tips for Customization:

- **Adjusting Parameters:** Tailor parameters like `frame_height`, `frame_width`, and `num_classes` according to your dataset.
- **Preprocessing Needs:** Different architectures may require specific forms of input preprocessing. Ensure your data pipeline is compatible with your chosen model.

After considering these aspects, select the architecture template that aligns with your project's goals from the list below. Each choice entails specific considerations for dataset dimensions and task specifications (e.g., classification, detection).

1. **CNN for Spatial Features:** Ideal for projects focusing on frame-level analysis.
2. **RNN/LSTM for Temporal Features:** Suited for understanding sequences and temporal patterns.
3. **3D CNN for Spatio-Temporal Features:** Best for capturing actions and events over time.
4. **Two-Stream Network:** Offers a comprehensive analysis by leveraging both spatial and temporal data.
5. **Transformers:** For advanced projects requiring attention to complex patterns in large datasets.

Remember to customize your model based on the specific needs of your dataset and the computational resources available to you.

### Basic CNN Model for Spatial Feature Extraction

This CNN model is structured to extract spatial features from individual video frames, making it suitable for image-based analysis tasks within videos. Below is the template for creating a basic CNN:

In [None]:
# Basic CNN Model for Spatial Feature Extraction

import tensorflow as tf
from tensorflow.keras import layers, models

# Adjust these parameters to fit your dataset
frame_height = 112  # Height of the video frame
frame_width = 112   # Width of the video frame
num_classes = 7    # Number of output classes

# Basic CNN Model
model_cnn = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(frame_height, frame_width, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(num_classes, activation='softmax'),
])

model_cnn.summary()

# Compile the model
model_cnn.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

### RNN/LSTM for Temporal Feature Extraction

This cell demonstrates how to set up a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) layers. It's designed for extracting temporal features from sequences of video frames, which is crucial for understanding activities, actions, or any phenomena that evolve over time.

In [None]:
# RNN/LSTM for Temporal Feature Extraction
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Adjust these parameters to fit your dataset
timesteps = 100  # Length of your sequences
features = 128   # Features extracted from each frame or timestep
num_classes = 10 # Number of output classes

# RNN Model with LSTM
model_rnn = Sequential([
    LSTM(64, input_shape=(timesteps, features), return_sequences=True),
    LSTM(64),
    Dense(64, activation='relu'),
    Dense(num_classes, activation='softmax'),
])

model_rnn.summary()

# Compile the model
model_rnn.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

### 3D CNN for Spatio-Temporal Feature Extraction

This section sets up a 3D Convolutional Neural Network (3D CNN) that captures both spatial and temporal features from video clips. A 3D CNN extends a traditional CNN by convolving over sequences of frames, so it can model motion and changes across consecutive frames, which is essential for tasks like action recognition. Here's how to implement a 3D CNN model:

In [None]:
# 3D CNN for Spatio-Temporal Feature Extraction

from tensorflow.keras import layers, models
from tensorflow.keras.layers import Conv3D, MaxPooling3D, Flatten, Dense

# Adjust these parameters to fit your dataset
frames_per_clip = 16    # Number of frames per video clip
frame_height = 112      # Height of the video frame
frame_width = 112       # Width of the video frame
num_channels = 3        # Number of color channels (RGB)
num_classes = 10        # Number of output classes

# 3D CNN Model
model_3dcnn = models.Sequential([
    Conv3D(64, (3, 3, 3), activation='relu',
           input_shape=(frames_per_clip, frame_height, frame_width, num_channels)),
    MaxPooling3D((2, 2, 2)),
    Conv3D(128, (3, 3, 3), activation='relu'),
    MaxPooling3D((2, 2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(num_classes, activation='softmax'),
])

model_3dcnn.summary()

# Compile the model
model_3dcnn.compile(optimizer='adam',
                    loss='categorical_crossentropy',
                    metrics=['accuracy'])
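
The shapes that flow through the 3D CNN above can be worked out by hand. The sketch below assumes the Keras defaults used in that cell (no padding for `Conv3D`, pooling stride equal to the pool size); the two helper functions are hypothetical and only do the arithmetic.

```python
def conv3d_valid_shape(shape, kernel=(3, 3, 3)):
    """Output size of an unpadded convolution along (frames, height, width)."""
    return tuple(s - k + 1 for s, k in zip(shape, kernel))

def maxpool3d_shape(shape, pool=(2, 2, 2)):
    """Output size of non-overlapping max pooling (floor division)."""
    return tuple(s // p for s, p in zip(shape, pool))

shape = (16, 112, 112)             # frames_per_clip, frame_height, frame_width
shape = conv3d_valid_shape(shape)  # (14, 110, 110)
shape = maxpool3d_shape(shape)     # (7, 55, 55)
shape = conv3d_valid_shape(shape)  # (5, 53, 53)
shape = maxpool3d_shape(shape)     # (2, 26, 26)

flattened = shape[0] * shape[1] * shape[2] * 128   # 128 filters in the last Conv3D
dense_weights = flattened * 128 + 128              # weights plus biases of Dense(128)
print(shape, flattened)  # (2, 26, 26) 173056
print(dense_weights)     # 22151296
```

Most of the parameters sit in the first dense layer, which is one reason 3D CNNs are memory-hungry; adding pooling or reducing the frame size shrinks that layer quickly.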

### Two-Stream Networks for Video Analysis

Implementing a two-stream network involves creating and training two separate models: one focused on spatial features (using a CNN) and another on temporal features (using either a 3D CNN or an RNN). The predictions from these models are then combined to produce a final output. This combination can be achieved through simple averaging or a more complex learned fusion layer.

**Note:** This approach is conceptual, and specific implementation details will vary based on your project's needs.

### Implementation Overview:

- **For Spatial Features:** Utilize the CNN model outlined earlier.
- **For Temporal Features:** Employ either the 3D CNN model or an RNN model, depending on the nature of your data and the specific temporal dynamics you wish to capture.
- **Combining Models:** A straightforward method to combine these models is to average their predictions:

In [None]:
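# Example of combining predictions (conceptual).
# `model_cnn` is the spatial model defined earlier; `temporal_model`, `spatial_data`, and
# `temporal_data` are placeholders you must define. Both models must output the same classes.
predictions = 0.5 * model_cnn.predict(spatial_data) + 0.5 * temporal_model.predict(temporal_data)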
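
The averaging step can be tried without any trained models. In this sketch the probability arrays are made-up stand-ins for the outputs of the two streams, and `late_fusion` is a hypothetical helper that generalizes the 0.5/0.5 average to an adjustable weight.

```python
import numpy as np

def late_fusion(spatial_probs, temporal_probs, spatial_weight=0.5):
    """Weighted average of two sets of class probabilities with matching shapes."""
    spatial_probs = np.asarray(spatial_probs, dtype=float)
    temporal_probs = np.asarray(temporal_probs, dtype=float)
    if spatial_probs.shape != temporal_probs.shape:
        raise ValueError("both streams must predict the same samples and classes")
    return spatial_weight * spatial_probs + (1.0 - spatial_weight) * temporal_probs

# Made-up softmax outputs for 2 clips and 3 classes
spatial = np.array([[0.7, 0.2, 0.1],
                    [0.4, 0.5, 0.1]])
temporal = np.array([[0.5, 0.4, 0.1],
                     [0.1, 0.2, 0.7]])

fused = late_fusion(spatial, temporal)
print(fused.argmax(axis=1))  # [0 2]
```

For the second clip the streams disagree, and the fused prediction follows the more confident temporal stream.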

### Vision Transformers for Video Processing

Transformers have reshaped natural language processing and are now widely used in computer vision, including video processing. Vision Transformers (ViT) apply the transformer architecture to image patches, treating each patch as a token, much as words are treated in NLP. For video, each frame is decomposed into a sequence of patches that are processed as tokens, which lets the model capture spatial and temporal relationships within the content.
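
The patch-to-token step can be sketched with NumPy reshapes. The function `frame_to_patches` is a hypothetical helper; the 16x16 patch size follows the ViT paper cited earlier, and the learned linear projection and position embeddings that a real ViT applies afterwards are omitted.

```python
import numpy as np

def frame_to_patches(frame, patch_size=16):
    """Split an (H, W, C) frame into flattened, non-overlapping patches."""
    height, width, channels = frame.shape
    if height % patch_size or width % patch_size:
        raise ValueError("frame dimensions must be divisible by patch_size")
    rows, cols = height // patch_size, width // patch_size
    # Separate the patch grid axes from the within-patch axes
    patches = frame.reshape(rows, patch_size, cols, patch_size, channels)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * channels)

frame = np.random.default_rng(0).random((112, 112, 3))  # synthetic frame
tokens = frame_to_patches(frame)
print(tokens.shape)  # (49, 768)
```

A 112x112 RGB frame therefore becomes 49 tokens of 768 values each; for a clip, the token sequences of all frames are combined, which is why the cost of attention grows quickly with clip length.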

### Preliminary Steps:

- **Base Model for Feature Extraction:** Leveraging pre-trained models like EfficientNet as a starting point for extracting features from video frames can be beneficial. These features then serve as inputs to the transformer model.

In [None]:
# Transformers for Video Processing (Vision Transformers - ViT)

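from tensorflow.keras.applications import EfficientNetB0
# `vit_keras` is a third-party package that is not installed by the earlier cells;
# install it first (for example with `!pip install vit-keras`).
from vit_keras import vit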

# Example for loading a pre-trained model
base_model = EfficientNetB0(include_top=False, weights='imagenet')

# 7. Training and Monitoring the Model

Training your machine learning model is a crucial step where the model learns to recognize patterns from the data. To track the model's progress and ensure that we save the best version, we'll employ callbacks like `ModelCheckpoint` and `TensorBoard`.


In [None]:
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
import numpy as np

# Loading the dataset splits
data = np.load('/content/dataset_splits.npz') # Adjust according to your dataset path
X_train = data['X_train']
X_val = data['X_val']
y_train = data['y_train']
y_val = data['y_val']

# Setup callbacks
checkpoint_cb = ModelCheckpoint('best_model.h5', save_best_only=True)
tensorboard_cb = TensorBoard(log_dir='./logs')

# Train the model
history = model_cnn.fit(
    X_train, y_train,
    epochs=12,
    batch_size=32,
    validation_data=(X_val, y_val),
    callbacks=[checkpoint_cb, tensorboard_cb]
)

# Free the memory held by the loaded archive object
del data

# Optionally, delete the saved file from Colab's disk to free storage.
# Keep it if you plan to reload the splits later (e.g., X_test for evaluation).
import os
os.remove('/content/dataset_splits.npz')  # Adjust the path if necessary

In [None]:
# Enabling TensorBoard within the notebook environment
%load_ext tensorboard
%tensorboard --logdir ./logs

# 8. Hyperparameter Tuning

Hyperparameter tuning is a pivotal phase in the machine learning pipeline. This process involves optimizing the model's architecture and the training procedure to maximize its performance. Hyperparameters, unlike the model's internal parameters learned during training, need to be set beforehand and have a substantial impact on the model's learning efficiency and output quality.

Commonly tuned hyperparameters include:
- **Learning Rate**: Controls how much to adjust the model in response to the estimated error each time the model weights are updated.
- **Batch Size**: Number of training examples utilized in one iteration.
- **Number of Epochs**: Total number of times the training dataset is passed forward and backward through the neural network.
- **Architecture-Specific Parameters**: Such as the number of layers or units in a layer, which can vary significantly across different models.

### Approaches to Hyperparameter Tuning:

1. **Manual Tuning**: Relying on experience and intuition to adjust hyperparameters.
2. **Grid Search**: Exhaustively searching through a predefined list of hyperparameter values.
3. **Random Search**: Randomly selecting hyperparameter values from a defined range and evaluating their performance.
4. **Bayesian Optimization**: Using probabilistic models to guide the search for the optimum hyperparameters.
5. **Automated Tools**: Leveraging tools like Keras Tuner or Hyperopt to automate the tuning process.

### Implementing Hyperparameter Tuning:

For practical hyperparameter tuning, consider starting with Grid Search or Random Search: both are straightforward to implement and can yield significant improvements. The subsections that follow give a template for each approach.

---

### Manual Tuning Approach

1. **Start with a baseline model**: Set up your model with a default set of hyperparameters.
2. **Identify key hyperparameters**: Focus on those most likely to impact performance, such as learning rate or the number of layers.
3. **Iteratively adjust values**: Manually change one hyperparameter at a time and monitor the effect on model performance.
4. **Use a systematic approach**: Keep a log of changes and results to guide future adjustments.
5. **Refinement**: Once the model shows improvement, refine your search around the best-performing values.

### Grid Search Template

Grid Search exhaustively tests a predefined range of hyperparameter values, ensuring that you explore all possible combinations within your specified grid. This method is particularly useful when the number of hyperparameters and their potential values are relatively low.

In [None]:
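from sklearn.model_selection import GridSearchCV
# NOTE: `keras.wrappers.scikit_learn` is only available in older Keras/TensorFlow releases;
# newer environments may need the separate SciKeras package instead.
from keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# `input_dim` (number of input features) and `num_classes` must be defined for your data.
# 'build_model' is a function that constructs a Keras model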
def build_model(optimizer='adam'):
    model = Sequential()
    model.add(Dense(12, input_shape=(input_dim,), activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=build_model)

param_grid = {
    'epochs': [10, 20],
    'batch_size': [16, 32],
    'optimizer': ['adam', 'rmsprop']
}

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X_train, y_train)

# Access the best set of parameters
best_params = grid_result.best_params_
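
To see how much work the grid above implies, the combinations can be enumerated with the standard library. This sketch only counts configurations and trains nothing; the grid is copied from the template, and `cv_folds` mirrors its `cv=3` setting.

```python
from itertools import product

param_grid = {
    'epochs': [10, 20],
    'batch_size': [16, 32],
    'optimizer': ['adam', 'rmsprop'],
}

# Every combination of one value per hyperparameter
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

cv_folds = 3
total_fits = len(combinations) * cv_folds

print(len(combinations))  # 8
print(total_fits)         # 24
print(combinations[0])    # {'epochs': 10, 'batch_size': 16, 'optimizer': 'adam'}
```

Eight configurations with 3-fold cross-validation mean 24 training runs, plus one final refit on the full training set when `refit` is left at its default. The count multiplies with every added hyperparameter value, which is why grid search suits small grids.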

### Random Search Template

Random Search optimizes hyperparameters by sampling values from a defined distribution. This method can be more efficient than Grid Search, particularly when the hyperparameter space is large.

In [None]:
from sklearn.model_selection import RandomizedSearchCV
from keras.wrappers.scikit_learn import KerasClassifier

def build_model(optimizer='adam'):
    # Model construction (omitted for brevity)
    return model

model = KerasClassifier(build_fn=build_model, epochs=20, batch_size=32)

param_dist = {
    'batch_size': [16, 32, 64],
    'epochs': [10, 20, 30],
    'optimizer': ['adam', 'rmsprop']
}

random_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=10, n_jobs=-1, cv=3)
random_search_result = random_search.fit(X_train, y_train)

# Best parameters
best_params = random_search_result.best_params_
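
The sampling idea behind random search can be sketched with the standard library alone. The example draws 10 of the 18 possible configurations without replacement, which corresponds to how scikit-learn samples when every parameter is given as a list; the seed and variable names are assumptions for illustration.

```python
import random
from itertools import product

param_dist = {
    'batch_size': [16, 32, 64],
    'epochs': [10, 20, 30],
    'optimizer': ['adam', 'rmsprop'],
}

# Full grid for reference: 3 * 3 * 2 = 18 configurations
all_combinations = [dict(zip(param_dist, values)) for values in product(*param_dist.values())]

# Draw 10 distinct configurations instead of evaluating all 18
rng = random.Random(0)
sampled = rng.sample(all_combinations, k=10)

print(len(all_combinations), len(sampled))  # 18 10
for config in sampled[:3]:
    print(config)
```

With `cv=3`, evaluating 10 sampled configurations costs 30 fits instead of the 54 a full grid would need.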

### Bayesian Optimization Template

Bayesian Optimization leverages a probabilistic model to select hyperparameters that are likely to yield better results. This method is efficient for finding optimal hyperparameters with fewer trials, making it ideal for complex models.

In [None]:
import numpy as np
from hyperopt import hp, fmin, tpe, STATUS_OK, Trials, space_eval
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

def objective(space):
    model = Sequential()
    model.add(Dense(units=int(space['units']), input_dim=input_dim, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(optimizer=Adam(learning_rate=space['learning_rate']),
                  loss='categorical_crossentropy', metrics=['accuracy'])

    history = model.fit(X_train, y_train, epochs=20, batch_size=int(space['batch_size']),
                        validation_data=(X_val, y_val), verbose=0)  # validation data is required for 'val_loss'

    val_loss = np.min(history.history['val_loss'])
    return {'loss': val_loss, 'status': STATUS_OK}

space = {
    'units': hp.quniform('units', 50, 150, 1),
    'batch_size': hp.choice('batch_size', [16, 32, 64]),
    'learning_rate': hp.loguniform('learning_rate', np.log(0.0001), np.log(0.01)),
}

trials = Trials()
best = fmin(fn=objective,
            space=space,
            algo=tpe.suggest,
            max_evals=100,
            trials=trials)

best_params = space_eval(space, best)

### Hyperparameter Tuning with Keras Tuner

Keras Tuner simplifies the task of finding the best hyperparameters for your model. The example below outlines how to use Keras Tuner to optimize a Convolutional Neural Network (CNN) architecture.

In [None]:
from keras_tuner import RandomSearch
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten

input_shape = (112, 112, 3)  # Example input shape; adjust as necessary
num_classes = 10  # Adjust based on your dataset

# Function to build the model (required for Keras Tuner)
def build_model(hp):
    model = Sequential()
    model.add(Conv2D(filters=hp.Int('conv_filters', min_value=32, max_value=128, step=32),
                     kernel_size=hp.Choice('conv_kernel_size', values=[3, 5]),
                     activation='relu',
                     input_shape=input_shape))
    model.add(MaxPooling2D())
    model.add(Flatten())
    model.add(Dense(units=hp.Int('dense_units', min_value=32, max_value=128, step=32),
                    activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))

    hp_learning_rate = hp.Float('learning_rate', min_value=1e-4, max_value=1e-2, sampling='LOG')

    model.compile(optimizer=Adam(learning_rate=hp_learning_rate),
                  loss='sparse_categorical_crossentropy',  # integer class labels; use 'categorical_crossentropy' for one-hot labels
                  metrics=['accuracy'])
    return model

# Initialize the tuner
tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=10,
    executions_per_trial=1,
    directory='my_dir',
    project_name='hparam_tuning'
)

# Perform the search
tuner.search(X_train, y_train, epochs=10, validation_data=(X_val, y_val))

# Retrieve the best hyperparameters
best_hps = tuner.get_best_hyperparameters()[0]
print(f"""
The optimal number of units in the first dense layer is {best_hps.get('dense_units')} and the
optimal learning rate for the optimizer is {best_hps.get('learning_rate')}.
""")

# Rebuild the model with the best hyperparameters and train it
model = tuner.hypermodel.build(best_hps)
history = model.fit(X_train, y_train, epochs=50, validation_data=(X_val, y_val))

# 9. Evaluating the Model

Once your model has been trained and tuned, the next step is to evaluate its performance on the test set. This is crucial for understanding how well the model can generalize to new, unseen data. Here's how you can evaluate your model using TensorFlow/Keras:

In [None]:
import numpy as np

# Assuming y_test is a list or a Pandas series, convert it to a numpy array for compatibility with Keras
y_test = np.array(y_test)

# Evaluate the model on the test set
test_loss, test_acc = model_cnn.evaluate(X_test, y_test)
print(f"Test accuracy: {test_acc:.2f}")
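
Overall accuracy can hide weak classes, so a per-class breakdown is often worth a look. The sketch below uses made-up labels in place of real predictions; with the model above, predicted classes would typically come from taking the argmax of `model_cnn.predict(X_test)`. The helper names are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly (0 for absent classes)."""
    totals = matrix.sum(axis=1)
    return np.divide(np.diag(matrix), totals, out=np.zeros(len(totals)), where=totals > 0)

# Made-up labels for 10 test frames and 3 classes
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 0, 2, 2, 2, 2])

matrix = confusion_matrix(y_true, y_pred, num_classes=3)
recall = per_class_recall(matrix)
accuracy = (y_true == y_pred).mean()

print(matrix)
print(recall)
print(accuracy)  # 0.8
```

Here the overall accuracy is 0.8, yet class 1 is recognized only half the time, a detail the single accuracy number does not reveal.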

# 10. Implementing Regularization Techniques

Regularization techniques are critical in preventing overfitting, especially when you have a high-capacity model or limited data. Overfitting occurs when a model learns the noise in the training data to the extent that it negatively impacts the performance of the model on new data. Regularization methods provide ways to penalize model complexity or introduce noise to the training process to promote the generalizability of the model.

There are several regularization techniques:
- **L1 and L2 Regularization**: Penalizes the weights of the model during training, which can help to prevent overfitting by encouraging simpler models that may generalize better.
- **Dropout**: Randomly sets a fraction of input units to 0 at each update during training time, which helps to prevent overfitting by making the neural network less sensitive to the specific weights of neurons.
- **Batch Normalization**: Primarily used to stabilize and speed up training by normalizing each layer's activations over the mini-batch and then rescaling them with learned parameters; it can also have a mild regularizing effect.
- **Early Stopping**: Monitors the model's performance on a validation set and stops training when performance begins to degrade.
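
Two of these techniques reduce to a few lines of arithmetic. The NumPy sketch below shows an L2 penalty term and inverted dropout (dropping units and rescaling the survivors, the scheme Keras applies during training). The helper names and example values are assumptions for illustration.

```python
import numpy as np

def l2_penalty(weights, factor=0.001):
    """Extra loss term added by L2 regularization: factor * sum of squared weights."""
    return factor * np.sum(np.square(weights))

def inverted_dropout(activations, rate, rng):
    """Zero out roughly `rate` of the units and rescale the rest by 1 / (1 - rate)."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

weights = np.array([[1.0, -2.0],
                    [0.5, 0.0]])
penalty = l2_penalty(weights)  # 0.001 * (1 + 4 + 0.25 + 0)
print(penalty)

rng = np.random.default_rng(0)
dropped = inverted_dropout(np.ones(10000), rate=0.5, rng=rng)
print(np.unique(dropped), dropped.mean())
```

Every surviving activation is doubled at a rate of 0.5, so the expected value of each unit stays the same and no rescaling is needed at inference time.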

In the following code cells, we will add L2 regularization and dropout to our neural network to help with overfitting. We'll also implement early stopping to halt the training process at the optimal moment.




### Template for Building a Regularized Model

Below is a template function `build_regularized_model()` that defines a neural network model incorporating L2 regularization and dropout for improved generalization.


In [None]:
from tensorflow.keras import models, layers, regularizers

frame_height = 112  # Adjust based on your data
frame_width = 112   # Adjust based on your data
num_classes = 10    # Adjust based on your data

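def build_regularized_model():
    """Build a small CNN with an L2 penalty on one conv layer and dropout before the output."""
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(frame_height, frame_width, 3)),
        layers.MaxPooling2D((2, 2)),
        # L2 penalty (factor 0.001) on this layer's kernel weights
        layers.Conv2D(64, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation='relu'),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        # Drop 50% of the dense activations at each training step
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation='softmax'),
    ])

    # 'categorical_crossentropy' expects one-hot labels; use
    # 'sparse_categorical_crossentropy' if the labels are integer class indices
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model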

## Training with Early Stopping

The `EarlyStopping` callback halts training once the validation loss stops improving, which saves compute and limits overfitting to the training data. To use it, pass it in the `callbacks` list when calling `fit`. With `restore_best_weights=True`, a run that stops early is rolled back to the weights from the epoch with the lowest validation loss.

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

# Instantiate the regularized model defined above
model_regularized = build_regularized_model()

# Stop once val_loss has not improved for 10 consecutive epochs,
# then restore the weights from the best epoch
early_stopping_cb = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

# Train the model with the regularization techniques and early stopping
history_regularized = model_regularized.fit(
    X_train, y_train,
    epochs=100,  # High epoch limit, but training may stop early due to early stopping
    batch_size=32,
    validation_data=(X_val, y_val),
    callbacks=[early_stopping_cb]  # Make sure to include other callbacks if needed
)
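The stopping rule itself is easy to trace by hand. The sketch below is a simplified model of the patience logic, assuming the default `min_delta=0` and a loss that should decrease; `simulate_early_stopping` is a hypothetical helper and the validation losses are made up.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Epochs are 0-indexed. Training stops once `patience` consecutive epochs
    pass without a new best validation loss. `best_epoch` is the epoch whose
    weights restore_best_weights=True would bring back.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted
    return len(val_losses) - 1, best_epoch

val_losses = [0.90, 0.70, 0.65, 0.66, 0.68, 0.64, 0.67, 0.69, 0.70]
stop_epoch, best_epoch = simulate_early_stopping(val_losses, patience=3)
print(f"Stopped after epoch {stop_epoch}; best weights from epoch {best_epoch}")
```

Note how the improvement at epoch 5 resets the counter: a larger `patience` tolerates such plateaus at the cost of extra training epochs.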

## Model Evaluation

After training, evaluate the regularized model on the test set to assess how well it generalizes to unseen data.

In [None]:
test_loss_reg, test_acc_reg = model_regularized.evaluate(X_test, y_test)
print(f"Test accuracy with regularization: {test_acc_reg:.2f}")
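For intuition about the two numbers `evaluate` returns, the following sketch computes mean categorical cross-entropy and accuracy by hand for a few made-up predictions. It assumes one-hot labels, matching the `categorical_crossentropy` loss used above; `categorical_metrics` is a hypothetical helper. For a model with L2 regularizers, the loss Keras reports also includes the penalty terms, so it will not match this data-only value exactly.

```python
import math

def categorical_metrics(y_true_onehot, y_prob):
    """Mean categorical cross-entropy and accuracy for one-hot labels."""
    eps = 1e-7  # guards against log(0)
    total_loss, correct = 0.0, 0
    for truth, probs in zip(y_true_onehot, y_prob):
        true_class = truth.index(1)
        # Cross-entropy only looks at the probability assigned to the true class
        total_loss += -math.log(max(probs[true_class], eps))
        predicted_class = max(range(len(probs)), key=probs.__getitem__)
        correct += predicted_class == true_class
    n = len(y_true_onehot)
    return total_loss / n, correct / n

# Made-up labels and softmax outputs for four samples and three classes
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_prob = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3], [0.2, 0.5, 0.3]]
loss, acc = categorical_metrics(y_true, y_prob)
print(f"loss={loss:.4f}, accuracy={acc:.2f}")
```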

# 11. Preparing for Fine-Tuning

Fine-tuning adapts a pre-trained model to a new dataset by updating some of its parameters. The key steps are freezing the earlier layers so their learned features are preserved, leaving the later layers trainable, and lowering the learning rate so that updates stay small.

In [None]:
import tensorflow as tf
from tensorflow.keras.optimizers import Adam

# Assuming 'model' is a pre-trained model and 'layer_to_freeze' is an integer
# index into model.layers: layers before it are frozen, the rest are fine-tuned

# Freeze layers not intended for fine-tuning
for layer in model.layers[:layer_to_freeze]:
    layer.trainable = False

# Unfreeze layers for fine-tuning
for layer in model.layers[layer_to_freeze:]:
    layer.trainable = True

# Recompile so the trainable changes take effect, using a lower learning rate
optimizer = Adam(learning_rate=1e-5)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

# Review the model structure to confirm layer adjustments
model.summary()
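Choosing the split point by layer name is often less error-prone than counting indices. The sketch below shows one way to do that; `freeze_up_to` is a hypothetical helper, and the stand-in layer objects and their names are made up so the example runs without a real pre-trained model.

```python
from types import SimpleNamespace

def freeze_up_to(layers, first_trainable_name):
    """Freeze every layer before `first_trainable_name` and unfreeze the rest.

    Works on any sequence of objects exposing `.name` and `.trainable`,
    which includes `model.layers` in Keras. Returns the split index.
    """
    names = [layer.name for layer in layers]
    split = names.index(first_trainable_name)  # raises ValueError if absent
    for i, layer in enumerate(layers):
        layer.trainable = i >= split
    return split

# Stand-in objects with made-up layer names, for illustration only
layers = [SimpleNamespace(name=n, trainable=True)
          for n in ["conv_1", "conv_2", "conv_3", "dense_1", "predictions"]]

layer_to_freeze = freeze_up_to(layers, "dense_1")
for layer in layers:
    print(f"{layer.name:12s} trainable={layer.trainable}")
```

With a real model, the same call would be `freeze_up_to(model.layers, ...)` using a name taken from `model.summary()`, followed by recompiling the model.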

## Fine-Tuning the Model

After preparing your model for fine-tuning, continue training so the unfrozen layers adapt to your specific dataset. Watch the validation metrics during this phase: if validation loss rises while training loss keeps falling, the model is starting to overfit and training should stop.

In [None]:
# Continue the training process with fine-tuning adjustments
history_fine = model.fit(
    X_train, y_train,
    epochs=10,  # Number of epochs may be adjusted based on observed performance improvements
    batch_size=32,
    validation_data=(X_val, y_val),
    callbacks=[checkpoint_cb, tensorboard_cb]  # Utilize callbacks from initial training for consistency
)
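One simple way to check whether fine-tuning is helping is to inspect the per-epoch values that `fit` records in `history_fine.history`. The sketch below assumes that dictionary contains `loss` and `val_loss` lists, as it does when validation data is supplied; `summarize_history` is a hypothetical helper and the numbers are made up.

```python
def summarize_history(history_dict):
    """Report the best epoch by validation loss and the train/val gap there."""
    val_loss = history_dict["val_loss"]
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        "best_epoch": best_epoch,  # 0-indexed
        "best_val_loss": val_loss[best_epoch],
        # A large positive gap suggests overfitting
        "gap": val_loss[best_epoch] - history_dict["loss"][best_epoch],
        # True if the last epoch was also the best one
        "still_improving": best_epoch == len(val_loss) - 1,
    }

# Made-up numbers in the shape of `history_fine.history`
example_history = {
    "loss":     [0.60, 0.48, 0.41, 0.35, 0.30],
    "val_loss": [0.62, 0.55, 0.52, 0.54, 0.57],
}
print(summarize_history(example_history))
```

In this made-up run the validation loss bottoms out at the third epoch while the training loss keeps falling, which is the pattern to watch for.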

# 12. Evaluating the Model Post Fine-Tuning

Once fine-tuning is complete, it's crucial to evaluate the model's performance again. This allows you to measure the effectiveness of the fine-tuning process and understand how it has impacted the model's ability to generalize to unseen data.


In [None]:
# Evaluate the model that was fine-tuned above (named 'model' in the previous
# cells; substitute your own variable, e.g. model_cnn, if it differs)
test_loss_ft, test_acc_ft = model.evaluate(X_test, y_test)
print(f"Post Fine-Tuning Test Accuracy: {test_acc_ft:.2f}")

# 13. Saving the Trained Model

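After your model has been trained and fine-tuned to your satisfaction, save it so it can be reused and deployed without retraining. The following snippet saves the TensorFlow/Keras model as an HDF5 (`.h5`) file, a single-file format that stores the architecture, weights, and optimizer state. Recent Keras releases treat HDF5 as a legacy format and recommend the native `.keras` format instead, so consider `model_cnn.save('my_model.keras')` if your version supports it.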

In [None]:
import os
from tensorflow.keras.models import load_model

# Assuming 'model_cnn' is your final model after training and fine-tuning
model_cnn.save('my_model.h5')

# Verification step to check if the model has been saved correctly
if os.path.exists('my_model.h5'):
    print('Model saved successfully as my_model.h5')
else:
    print('Model saving failed.')

## Model Loading and Inference