# Training our model

## Download Data

In [None]:
# Download TF Dataset into environment
!wget -O ASLDataset.zip "https://www.dropbox.com/scl/fi/1799mm7ovirghzpx3cnb1/ASLDataset.zip?rlkey=s8sjhss4ofs2r7k742ucxg8mf&dl=1"
!unzip -q ASLDataset.zip

## Defining Constants

Before starting with the tasks, let's get familiar with the constants and their significance:

- **`IMPORTANT_LANDMARKS`**: This list contains specific landmarks (points on the hands and face) that are crucial for recognizing sign language gestures. The numbers (e.g., 0, 9, 11) indicate the indices of these landmarks in the dataset.
- **`DATASET_PATH`**: The location of the dataset. This path is where our dataset is stored and from where it will be loaded.
- **`NUMBER_OF_SIGNS`**: The total number of unique sign language gestures included in the dataset. With our simplified dataset, this number is set to 5.
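
To make the size of the model input concrete, the short sketch below counts the selected landmarks and the number of values each frame contributes once the `x`, `y`, and `z` coordinates are flattened. The names `num_landmarks` and `features_per_frame` are illustrative and are not used elsewhere in the notebook.

```python
# Same list as in the constants cell
IMPORTANT_LANDMARKS = [0, 9, 11, 13, 14, 17, 117, 118, 119, 199, 346, 347, 348] + list(range(468, 543))

# 13 hand-picked indices plus the 75 indices in range(468, 543)
num_landmarks = len(IMPORTANT_LANDMARKS)

# Each landmark contributes an x, y, and z value
features_per_frame = 3 * num_landmarks

print(num_landmarks, features_per_frame)  # 88 264
```

The second number is the feature dimension the input layer in Task 3 expects, `3*len(IMPORTANT_LANDMARKS)`.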

In [None]:
# Constants for data and model configuration

# Key landmarks for gesture recognition
IMPORTANT_LANDMARKS = [0, 9, 11, 13, 14, 17, 117, 118, 119, 199, 346, 347, 348] + list(range(468, 543))

# Dataset storage location
DATASET_PATH = "/content/ASLDataset"

# Number of unique gestures
NUMBER_OF_SIGNS = 5

### Task 1: Load and Preprocess Dataset

In this task, our goal is to prepare the dataset for our sign language recognition model. This involves loading the dataset, selecting specific data points (landmarks), handling missing values, and finally, splitting the dataset into training and validation sets.

#### **Code Explanation**:

1. **Defining the Preprocessing Function**:

   - **`def preprocess_data(data, labels):`**: This function prepares each batch of data for our model. The `data` parameter represents a batch of video frames, each containing several landmarks. The `labels` are the correct answers (signs) for each batch.

2. **Selecting Important Landmarks**:

   - **`processed_data = tf.gather(data, IMPORTANT_LANDMARKS, axis=2)`**: We extract only the landmarks defined in `IMPORTANT_LANDMARKS`, which focuses the model on the most relevant features. Each batch is indexed as (video, frame, landmark, coordinate), so `axis=2` is the landmark axis: the same subset of landmarks is picked from every frame of every video.

3. **Cleaning Missing Values**:

   - **`processed_data = tf.where(tf.math.is_nan(processed_data), tf.zeros_like(processed_data), processed_data)`**: If any landmarks are missing (represented by NaN values), we replace them with zeros. This keeps the data consistent and avoids feeding invalid values to the model.

4. **Reshaping Landmark Data**:

   - The final step in preprocessing is to flatten the landmark data so that each frame becomes a single feature vector. `tf.concat([processed_data[..., i] for i in range(3)], -1)` places all `x` coordinates first, then all `y`, then all `z`, giving `3 * len(IMPORTANT_LANDMARKS)` values per frame.
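
The three preprocessing steps can be traced on a tiny array. The sketch below uses NumPy equivalents of the TensorFlow calls (`np.take` for `tf.gather`, `np.where` with `np.isnan` for the NaN replacement, `np.concatenate` for `tf.concat`) so it runs without the dataset. The toy shapes and the `KEEP` list are made up for illustration and stand in for a real batch and `IMPORTANT_LANDMARKS`.

```python
import numpy as np

# Toy batch: 1 video, 2 frames, 4 landmarks, (x, y, z) per landmark
batch = np.arange(24, dtype=np.float32).reshape(1, 2, 4, 3)
batch[0, 1, 2, :] = np.nan  # pretend landmark 2 was not detected in frame 1

KEEP = [0, 2]  # hypothetical stand-in for IMPORTANT_LANDMARKS

# Step 1: keep only the chosen landmarks (axis 2 is the landmark axis)
selected = np.take(batch, KEEP, axis=2)

# Step 2: replace missing values (NaN) with 0
cleaned = np.where(np.isnan(selected), 0.0, selected)

# Step 3: flatten each frame as [all x, all y, all z]
flat = np.concatenate([cleaned[..., i] for i in range(3)], axis=-1)

print(flat.shape)  # (1, 2, 6)
print(flat[0, 0])  # frame 0: x of both landmarks, then y, then z
print(flat[0, 1])  # frame 1: the missing landmark shows up as zeros
```

Frame 0 becomes `[0, 6, 1, 7, 2, 8]` and frame 1 becomes `[12, 0, 13, 0, 14, 0]`: coordinates are grouped by axis rather than by landmark.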

#### **Loading and Preprocessing the Dataset**:

- **`dataset = tf.data.Dataset.load(DATASET_PATH)`**: Here, we load our dataset from a specified path.

- **`dataset = dataset.map(preprocess_data)`**: We apply our `preprocess_data` function to each element of our dataset. This ensures all data is preprocessed uniformly, making it suitable for training and evaluation.

#### **Splitting into Training and Validation Sets**:

- **Training and Validation Datasets**: We split our dataset into two parts: one for training the model and another for validation. This is crucial for evaluating our model's performance on unseen data.

  - **`val_ds = dataset.take(1).cache().prefetch(tf.data.AUTOTUNE)`**: Creates the validation dataset from the first element of the dataset, which here is a single batch. `.cache()` keeps the preprocessed data in memory after the first pass, and `.prefetch(tf.data.AUTOTUNE)` prepares upcoming batches while the current one is being processed.

  - **`train_ds = dataset.skip(1).cache().shuffle(20).prefetch(tf.data.AUTOTUNE)`**: Forms the training dataset from everything after that first element. `.shuffle(20)` draws elements at random from a 20-element buffer, which keeps the model from learning the order of the data instead of the actual patterns.
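
The effect of `take(1)` and `skip(1)` can be pictured with a plain Python list. This is only an analogy: the six placeholder batches below are hypothetical, and a real `tf.data.Dataset` streams its elements instead of holding them in a list.

```python
# Hypothetical stand-in: a dataset of 6 batches, represented as a plain list
batches = [f"batch_{i}" for i in range(6)]

val_batches = batches[:1]    # mirrors dataset.take(1): only the first element
train_batches = batches[1:]  # mirrors dataset.skip(1): everything after it

# The validation share depends on how many batches the dataset has
val_fraction = len(val_batches) / len(batches)

print(val_batches)             # ['batch_0']
print(len(train_batches))      # 5
print(round(val_fraction, 3))  # 0.167
```

Because exactly one element is held out, the validation share shrinks as the dataset grows. Holding out more data would mean passing a larger number to both `take` and `skip`.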

#### **Summary**:

- Task 1 focuses on the initial steps of loading and preprocessing our dataset. It ensures that the data fed into our model is clean, relevant, and suitably formatted.

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

# Preprocess the data: select landmarks, replace missing values, and prepare for model
def preprocess_data(data, labels):
    # Select important landmarks
    processed_data = tf.gather(data, IMPORTANT_LANDMARKS, axis=2)
    # Replace missing values with 0
    processed_data = tf.where(tf.math.is_nan(processed_data), tf.zeros_like(processed_data), processed_data)
    # Reshape and return data
    return tf.concat([processed_data[..., i] for i in range(3)], -1), labels

# Load the dataset and apply preprocessing
dataset = tf.data.Dataset.load(DATASET_PATH)
dataset = dataset.map(preprocess_data)

# Split the dataset into training and validation sets
val_ds = dataset.take(1).cache().prefetch(tf.data.AUTOTUNE)
train_ds = dataset.skip(1).cache().shuffle(20).prefetch(tf.data.AUTOTUNE)

### Task 2: Setup Training Callbacks

In this task, we introduce an aspect of training neural networks efficiently: callbacks. Callbacks are tools in TensorFlow and Keras that allow us to monitor our model's performance during training and take specific actions based on those observations. They help in preventing overfitting, optimizing training time, and adjusting learning rates to achieve better performance.

#### **Code Explanation**:

1. **EarlyStopping Callback**:

   - **`tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=10, restore_best_weights=True)`**: This callback monitors the model's validation accuracy. If the validation accuracy stops improving for 10 consecutive epochs (`patience=10`), the training process will be halted. By setting `restore_best_weights=True`, the model will revert to the weights from the epoch with the highest validation accuracy. This mechanism helps in preventing overfitting—where the model performs well on the training data but poorly on unseen data.

2. **ReduceLROnPlateau Callback**:

   - **`tf.keras.callbacks.ReduceLROnPlateau(monitor="val_accuracy", factor=0.5, patience=3)`**: This callback also monitors the model's validation accuracy. If the accuracy does not improve for three consecutive epochs (`patience=3`), the learning rate is multiplied by 0.5, that is, halved. Smaller steps let the optimizer settle into a minimum it would otherwise keep overshooting, which can improve performance after a plateau has been reached.
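
To see how the two `patience` values interact, the sketch below replays the bookkeeping on a made-up validation accuracy curve. It is a simplified model of the behaviour described above, not the Keras implementation: it ignores options such as `min_delta` and `cooldown`, and the function name `simulate_callbacks` is hypothetical.

```python
def simulate_callbacks(val_accuracy, stop_patience=10, lr_patience=3,
                       factor=0.5, initial_lr=1e-3):
    """Replay simplified EarlyStopping / ReduceLROnPlateau bookkeeping."""
    best = float("-inf")
    best_epoch = None
    stop_wait = 0  # epochs without improvement, for early stopping
    lr_wait = 0    # epochs without improvement, for the learning rate
    lr = initial_lr

    for epoch, acc in enumerate(val_accuracy):
        if acc > best:
            # Improvement: remember this epoch and reset both counters
            best, best_epoch = acc, epoch
            stop_wait = 0
            lr_wait = 0
            continue

        stop_wait += 1
        lr_wait += 1
        if lr_wait >= lr_patience:
            lr *= factor  # halve the learning rate after a plateau
            lr_wait = 0
        if stop_wait >= stop_patience:
            # Stop; with restore_best_weights=True the weights from
            # best_epoch would be restored at this point
            return {"stopped_epoch": epoch, "best_epoch": best_epoch, "lr": lr}

    return {"stopped_epoch": None, "best_epoch": best_epoch, "lr": lr}


# Made-up curve: three improving epochs, then ten flat ones
curve = [0.5, 0.6, 0.7] + [0.65] * 10
result = simulate_callbacks(curve)
print(result)
```

With this curve the learning rate is halved three times during the plateau before training stops at epoch index 12, and the best epoch is index 2.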

#### **Summary**:

- Task 2 focuses on setting up callbacks to enhance the training process of our sign language recognition model. By integrating these callbacks, we can make the training process more efficient and effective, avoiding unnecessary computations and improving the model's generalization ability on unseen data.

In [None]:
# Callbacks
callbacks = [
    # Stops training when the validation accuracy stops improving
    tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=10, restore_best_weights=True),
    # Reduces the learning rate when the validation accuracy plateaus
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_accuracy", factor=0.5, patience=3)
]

### Task 3: Define the Model Architecture

In this task, we're going to build the neural network architecture for our sign language recognition model. This involves setting up the input, processing layers, and the output layer to classify the different signs.

#### **Code Explanation**:

1. **Input Layer**:

   - **`input_layer = tf.keras.Input(shape=(None,3*len(IMPORTANT_LANDMARKS)), ragged=True, name="input_layer")`**: The input layer is designed to accept a ragged tensor with a dynamic number of frames per video. Each frame contains a flattened list of the `x`, `y`, and `z` coordinates for the selected landmarks. This flexibility allows the model to handle videos of varying lengths.

2. **Dense Layers with Normalization and Dropout**:

   - **For Loop Over Units**: We iterate over a list of units (512 and 256) to create dense layers. Each dense layer is followed by layer normalization, a ReLU activation function, and dropout. This sequence of operations helps the model learn complex patterns while avoiding overfitting through regularization.
   
     - **Layer Normalization**: Normalizes the output of each dense layer, ensuring that the activation distribution remains consistent across different inputs.
   
     - **ReLU Activation**: Introduces non-linearity to the model, allowing it to learn more complex relationships between the input and output.
   
     - **Dropout**: Randomly sets input units to 0 with a frequency of 10% at each step during training, which helps prevent overfitting by making the network's neurons less sensitive to the weights of other neurons.

3. **LSTM Layer for Sequential Data Processing**:

   - **`sequence = layers.LSTM(250, name="lstm_layer")(sequence)`**: Adds an LSTM (Long Short-Term Memory) layer with 250 units. LSTM layers are particularly suited for modeling sequences (like video frames) because they can maintain information in memory for long periods, making them ideal for understanding the temporal dynamics of sign language.

4. **Output Layer for Classification**:

   - **`output_layer = layers.Dense(NUMBER_OF_SIGNS, activation="softmax", name="output_layer")(sequence)`**: The final layer is a dense layer with a number of units equal to the number of signs we want to recognize (`NUMBER_OF_SIGNS`). It uses the softmax activation function to output a probability distribution over the sign classes, helping the model to classify which sign is being performed.

#### **Model Definition**:

- **`model = models.Model(inputs=input_layer, outputs=output_layer, name="sign_language_model")`**: This line brings together the input and output layers to define the model.

- **`model.summary()`**: Displays a summary of the model, including the layers and their shapes, which is helpful for understanding the architecture and ensuring everything is set up correctly.
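
As a cross-check for the numbers `model.summary()` prints, the parameter counts can be worked out by hand. The sketch below assumes the usual formulas (a dense layer has one weight per input-output pair plus one bias per unit, layer normalization has a scale and an offset per unit, and an LSTM has four gates that each see the input and the previous hidden state), 264 input features per frame, and `NUMBER_OF_SIGNS = 5`. The helper names are illustrative.

```python
def dense_params(n_in, n_out):
    # One weight per (input, output) pair plus one bias per output unit
    return n_in * n_out + n_out

def layer_norm_params(n):
    # One scale (gamma) and one offset (beta) per unit
    return 2 * n

def lstm_params(n_in, units):
    # Four gates, each with input weights, recurrent weights, and a bias
    return 4 * (units * (n_in + units) + units)

FEATURES = 264       # 3 coordinates * 88 landmarks
NUMBER_OF_SIGNS = 5  # value used in the constants cell

counts = {
    "dense_512": dense_params(FEATURES, 512),
    "layer_norm_512": layer_norm_params(512),
    "dense_256": dense_params(512, 256),
    "layer_norm_256": layer_norm_params(256),
    "lstm_250": lstm_params(256, 250),
    "output": dense_params(250, NUMBER_OF_SIGNS),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:>15}: {n:,}")
print(f"{'total':>15}: {total:,}")
```

Under these assumptions the total comes to 776,799 parameters, with the LSTM layer alone accounting for 507,000 of them. The activation and dropout layers add none.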

#### **Summary**:

- Task 3 constructs the architecture of the sign language recognition model, layer by layer, from input to output. It strategically combines dense, normalization, dropout, and LSTM layers to process and learn from the sequential landmark data effectively. This structure is designed to capture both the spatial relationships between landmarks in individual frames and the temporal relationships across frames, which is required for recognizing sign language gestures.

In [None]:
# Model architecture
input_layer = tf.keras.Input(shape=(None,3*len(IMPORTANT_LANDMARKS)), ragged=True, name="input_layer")

# Applying Dense layers with normalization and dropout for regularization
sequence = input_layer
for units in [512, 256]:
    sequence = layers.Dense(units)(sequence)
    sequence = layers.LayerNormalization()(sequence)
    sequence = layers.Activation("relu")(sequence)
    sequence = layers.Dropout(0.1)(sequence)

# Adding LSTM layer for sequential data processing
sequence = layers.LSTM(250, name="lstm_layer")(sequence)

# Output layer for classification
output_layer = layers.Dense(NUMBER_OF_SIGNS, activation="softmax", name="output_layer")(sequence)

# Define the model
model = models.Model(inputs=input_layer, outputs=output_layer, name="sign_language_model")
model.summary()

### Task 4: Compile the Model

After defining the architecture of our sign language recognition model, the next step is to prepare it for training. This involves specifying the optimizer, loss function, and metrics to evaluate the model's performance.

#### **Code Explanation**:

1. **Learning Rate Schedule**:

   - **`learning_rate_schedule = optimizers.schedules.PiecewiseConstantDecay([10, 15], [1e-3, 1e-4, 1e-5])`**: Before compiling the model, we define a learning rate schedule that lowers the learning rate as training progresses. The boundaries `[10, 15]` are compared against the optimizer's step counter, which advances once per batch rather than once per epoch: the learning rate is `1e-3` up to step 10, `1e-4` up to step 15, and `1e-5` afterwards. To tie the drops to epochs instead, multiply each boundary by the number of batches per epoch. Taking smaller steps as the model converges helps fine-tune the weights near a minimum of the loss.

2. **Compiling the Model**:

   - **`model.compile()`**: The compile method prepares the model for training. We specify the following parameters:
     - **Optimizer**: `optimizers.Adam(learning_rate=learning_rate_schedule)` uses the Adam optimizer with our predefined learning rate schedule. Adam is a popular choice for deep learning models due to its efficient computation and low memory requirement. It adapts the learning rate for each parameter, helping to find optimal weights faster.
     - **Loss Function**: `loss="sparse_categorical_crossentropy"` is the cross-entropy loss for multi-class classification when the labels are integer class indices rather than one-hot vectors. It penalizes the model by the negative log of the probability it assigns to the true class, so a confident correct prediction gives a loss near zero.
     - **Metrics**: `metrics=["accuracy", "sparse_top_k_categorical_accuracy"]` provides two metrics:
       - **Accuracy**: The percentage of correctly predicted labels.
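       - **Sparse Top K Categorical Accuracy**: This metric checks if the true label is one of the top `k` predicted labels. When the metric is requested by name, as here, Keras uses its default of `k=5`. It's useful for cases where the model's confidence is split among several classes, providing insight into whether the correct class was close to being predicted. Note that with `NUMBER_OF_SIGNS = 5` the output has only five classes, so the true label is always in the top 5 and this metric is trivially 1.0; it becomes informative with more classes or a smaller `k`.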
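
The behaviour of the piecewise constant schedule can be reproduced with a small lookup. This is a plain-Python sketch of the rule `PiecewiseConstantDecay` applies, where `step` is the optimizer's step counter (one step per batch); the function name `piecewise_constant` is illustrative.

```python
import bisect

BOUNDARIES = [10, 15]
VALUES = [1e-3, 1e-4, 1e-5]

def piecewise_constant(step, boundaries=BOUNDARIES, values=VALUES):
    # values[0] while step <= boundaries[0], values[1] while
    # boundaries[0] < step <= boundaries[1], and values[2] afterwards
    return values[bisect.bisect_left(boundaries, step)]

for step in (0, 10, 11, 15, 16):
    print(step, piecewise_constant(step))
```

Steps 0 through 10 use `1e-3`, steps 11 through 15 use `1e-4`, and every later step uses `1e-5`. If an epoch contains more than a handful of batches, the lowest rate is therefore reached within the first epoch or two.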

#### **Summary**:

- Task 4 focuses on compiling the model, a crucial step that sets up the model for efficient training. By defining a learning rate schedule, we ensure the model adapts its learning pace as it learns, optimizing the training process. The selection of the Adam optimizer and appropriate loss function, along with meaningful metrics, prepares our model to learn from the training data accurately and effectively. This setup aims to minimize loss and maximize accuracy, guiding the model toward better understanding and classifying sign language gestures.
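
To make the loss and the two metrics from this task concrete, the sketch below computes them by hand with NumPy for two made-up predictions over five classes. The probabilities and labels are invented for illustration, and `top_k_accuracy` is a hypothetical helper rather than the Keras metric itself.

```python
import numpy as np

# Made-up softmax outputs for two examples over 5 classes
probs = np.array([
    [0.70, 0.12, 0.08, 0.06, 0.04],
    [0.08, 0.25, 0.40, 0.15, 0.12],
])
labels = np.array([0, 1])  # integer class indices ("sparse" labels)

# Sparse categorical cross-entropy: -log of the true class probability
per_example_loss = -np.log(probs[np.arange(len(labels)), labels])

# Accuracy: the most likely class must equal the label
accuracy = np.mean(np.argmax(probs, axis=1) == labels)

def top_k_accuracy(probs, labels, k):
    # Indices of the k largest probabilities in each row
    top_k = np.argsort(probs, axis=1)[:, -k:]
    return np.mean([label in row for label, row in zip(labels, top_k)])

print(per_example_loss.round(4))
print(accuracy)
print(top_k_accuracy(probs, labels, k=2))
print(top_k_accuracy(probs, labels, k=5))
```

The second example is misclassified, so accuracy is 0.5, but its true class is the second most likely, so top-2 accuracy is 1.0. With `k=5` and five classes every label is in the top `k` by construction.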

In [None]:
# Compile the model
learning_rate_schedule = optimizers.schedules.PiecewiseConstantDecay([10, 15], [1e-3, 1e-4, 1e-5])
model.compile(optimizer=optimizers.Adam(learning_rate=learning_rate_schedule),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy", "sparse_top_k_categorical_accuracy"])

### Task 5: Train the Model

The final step in developing our sign language recognition model is to train it with our prepared dataset. This process involves feeding the training data into the model, allowing it to learn and adjust its weights to minimize the loss function and improve its predictions.

#### **Code Explanation**:

1. **Model Training**:

   - **`model.fit(train_ds, validation_data=val_ds, callbacks=callbacks, epochs=100)`**: This line is where the actual training happens. Let's break down the parameters:
     - **`train_ds`**: The training dataset, which includes the features (video frames with landmarks) and labels (signs). The model learns from this data.
     - **`validation_data=val_ds`**: The validation dataset is used to evaluate the model's performance on data it hasn't seen during training. This helps monitor for overfitting and ensure that the model generalizes well.
     - **`callbacks=callbacks`**: Includes the callbacks we defined earlier (`EarlyStopping` and `ReduceLROnPlateau`). These are used to optimize the training process by stopping early if the model isn't improving and adjusting the learning rate based on the model's performance on the validation set.
     - **`epochs=100`**: The number of times the entire training dataset is passed through the model. However, due to our early stopping callback, training may stop before reaching 100 epochs if no improvement is observed in the validation accuracy.
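
`model.fit` returns a `History` object whose `history` attribute is a dictionary mapping each metric name to a list with one value per epoch. The sketch below shows one way to find the best epoch from such a dictionary. The numbers are placeholders invented for illustration, not results from this model, and `history_dict` stands in for `history.history`.

```python
# Placeholder values standing in for history.history after training
history_dict = {
    "accuracy": [0.40, 0.61, 0.75, 0.82, 0.88],
    "val_accuracy": [0.42, 0.58, 0.71, 0.69, 0.70],
}

val_acc = history_dict["val_accuracy"]

# Index of the epoch with the highest validation accuracy
best_epoch = max(range(len(val_acc)), key=val_acc.__getitem__)

print(f"best epoch: {best_epoch + 1} of {len(val_acc)}")
print(f"best val_accuracy: {val_acc[best_epoch]:.2f}")
```

In the notebook, the same lookup would apply after writing `history = model.fit(...)` and reading `history.history["val_accuracy"]`.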

#### **Summary**:

- Task 5 is all about training the model to recognize sign language from video data. The training process is carefully monitored using validation data and optimized with callbacks to prevent overfitting and ensure efficient learning. After training, the model can be evaluated on new, unseen data to test its generalization capabilities and fine-tuned as necessary for improved performance.

In [None]:
model.fit(train_ds, validation_data=val_ds, callbacks=callbacks, epochs=100)

# **SOLUTION**

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

# Constants
IMPORTANT_LANDMARKS = [0, 9, 11, 13, 14, 17, 117, 118, 119, 199, 346, 347, 348] + list(range(468, 543))
DATASET_PATH = "/content/ASLDataset"
NUMBER_OF_SIGNS = 5  # the simplified dataset contains 5 signs

# Preprocess the data: select landmarks, replace missing values, and prepare for model
def preprocess_data(data, labels):
    # Select important landmarks
    processed_data = tf.gather(data, IMPORTANT_LANDMARKS, axis=2)
    # Replace missing values with 0
    processed_data = tf.where(tf.math.is_nan(processed_data), tf.zeros_like(processed_data), processed_data)
    # Reshape and return data
    return tf.concat([processed_data[..., i] for i in range(3)], -1), labels

# Load the dataset and apply preprocessing
dataset = tf.data.Dataset.load(DATASET_PATH)
dataset = dataset.map(preprocess_data)

# Split the dataset into training and validation sets
val_ds = dataset.take(1).cache().prefetch(tf.data.AUTOTUNE)
train_ds = dataset.skip(1).cache().shuffle(20).prefetch(tf.data.AUTOTUNE)

# Callbacks
callbacks = [
    # Stops training when the validation accuracy stops improving
    tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=10, restore_best_weights=True),
    # Reduces the learning rate when the validation accuracy plateaus
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_accuracy", factor=0.5, patience=3)
]

# Model architecture
input_layer = tf.keras.Input(shape=(None,3*len(IMPORTANT_LANDMARKS)), ragged=True, name="input_layer")

# Applying Dense layers with normalization and dropout for regularization
sequence = input_layer
for units in [512, 256]:
    sequence = layers.Dense(units)(sequence)
    sequence = layers.LayerNormalization()(sequence)
    sequence = layers.Activation("relu")(sequence)
    sequence = layers.Dropout(0.1)(sequence)

# Adding LSTM layer for sequential data processing
sequence = layers.LSTM(250, name="lstm_layer")(sequence)

# Output layer for classification
output_layer = layers.Dense(NUMBER_OF_SIGNS, activation="softmax", name="output_layer")(sequence)

# Define the model
model = models.Model(inputs=input_layer, outputs=output_layer, name="sign_language_model")
model.summary()

# Compile model and train
learning_rate_schedule = optimizers.schedules.PiecewiseConstantDecay([10, 15], [1e-3, 1e-4, 1e-5])
model.compile(optimizer=optimizers.Adam(learning_rate=learning_rate_schedule),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy", "sparse_top_k_categorical_accuracy"])

model.fit(train_ds, validation_data=val_ds, callbacks=callbacks, epochs=100)