# Hand Gesture Classification Using Deep Learning

## Problem Description

In ML1, I worked on a multi-class hand gesture classification problem using classical machine learning models such as SVM and Random Forest.

Each sample consists of 21 hand landmarks extracted from MediaPipe.  
Each landmark has (x, y, z) coordinates, so the total number of input features is:

21 × 3 = 63 features.
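
As a minimal sketch of where the 63 features come from, the snippet below flattens one hand's landmarks into a single vector. The `landmarks` array holds random placeholder values, and the row-by-row ordering (x0, y0, z0, x1, ...) is an assumption; the column order in the actual CSV may differ.

```python
import numpy as np

# Placeholder for one hand: 21 landmarks, each with (x, y, z).
# The values are random and only stand in for real MediaPipe output.
rng = np.random.default_rng(0)
landmarks = rng.random((21, 3))

# Flatten row by row: x0, y0, z0, x1, y1, z1, ... (assumed ordering)
features = landmarks.reshape(-1)

print(features.shape)  # (63,)
```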

The goal is to predict the gesture label (e.g., fist, call, dislike).

This is a supervised multi-class classification problem where:

X ∈ R^63  
y ∈ {0, 1, ..., K-1}

In this notebook, I reimplement the same problem using a Deep Neural Network.

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

## Data Preprocessing

Two preprocessing steps are applied before training the neural network.

1) Label Encoding  
The neural network requires numerical class labels, so I encode the gesture names into integer values.

2) Feature Standardization  
I apply StandardScaler to normalize the features.

This is important from an optimization perspective. Neural networks are trained using gradient descent. If features are on different scales, the loss surface becomes elongated, which leads to zig-zag updates and slower convergence.

By standardizing the features, the gradients become more stable and training becomes faster and more reliable.
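
The two steps can be illustrated on toy data. This is a NumPy-only sketch, not the scikit-learn code used in this notebook: `np.unique` stands in for `LabelEncoder`, the manual mean/std computation stands in for `StandardScaler`, and all values are hypothetical.

```python
import numpy as np

# Toy gesture names (hypothetical sample, not the real dataset)
labels = np.array(["fist", "call", "dislike", "call", "fist"])

# np.unique sorts the class names and returns one integer code per sample,
# the same mapping idea as an alphabetical label encoder.
classes, codes = np.unique(labels, return_inverse=True)
print(classes)  # ['call' 'dislike' 'fist']
print(codes)    # [2 0 1 0 2]

# Two toy features on very different scales
X_toy = np.array([[0.1, 100.0],
                  [0.2, 300.0],
                  [0.3, 500.0]])

# Standardize each column: subtract its mean, divide by its standard deviation
mean = X_toy.mean(axis=0)
std = X_toy.std(axis=0)
X_std = (X_toy - mean) / std

print(X_std.mean(axis=0))  # approximately 0 for both columns
print(X_std.std(axis=0))   # approximately 1 for both columns
```

After standardization both columns have comparable ranges, so no single feature dominates the gradient.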

In [None]:
# Load dataset
df = pd.read_csv("/Users/ahmedtarek/Developer/Python/DL/hand_landmarks_data.csv")

# Separate features and target
X = df.drop("label", axis=1).values
y = df["label"].values

# Encode labels
le = LabelEncoder()
y_encoded = le.fit_transform(y)

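# Standardization (VERY important for NN)
# Fit the scaler on the training rows only, so that validation and test
# statistics do not leak into preprocessing. Using the same test_size,
# random_state and stratify values as the train/validation/test split
# selects the same training rows.
train_idx, _ = train_test_split(
    np.arange(len(X)), test_size=0.3, random_state=42, stratify=y_encoded
)
scaler = StandardScaler().fit(X[train_idx])
X_scaled = scaler.transform(X)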

## Train / Validation / Test Split

The dataset is split into:

- Training set → used to update model parameters.
- Validation set → used to monitor generalization and apply early stopping.
- Test set → used only once at the end for final evaluation.

This separation ensures that the test set remains completely unseen during training.  
It allows us to measure the true generalization performance of the model.
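
To make the idea of a stratified 70/15/15 split concrete, here is a NumPy-only sketch. The helper `stratified_indices` and the toy labels are hypothetical and serve only as illustration; the notebook itself relies on `train_test_split` with `stratify`.

```python
import numpy as np

def stratified_indices(y, train_frac=0.70, val_frac=0.15, seed=42):
    """Return (train, val, test) index arrays with similar class proportions in each."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for cls in np.unique(y):
        # Shuffle the indices of this class, then cut them into three parts
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_train = int(round(train_frac * len(idx)))
        n_val = int(round(val_frac * len(idx)))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])
    return np.array(train), np.array(val), np.array(test)

# Toy labels: 3 classes with 100 samples each (hypothetical)
y_toy = np.repeat([0, 1, 2], 100)
train_idx, val_idx, test_idx = stratified_indices(y_toy)

print(len(train_idx), len(val_idx), len(test_idx))  # 210 45 45
```

Because the cut is made per class, every split keeps the same class balance as the full dataset.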

In [None]:
# Train-validation-test split
X_train, X_temp, y_train, y_temp = train_test_split(
    X_scaled, y_encoded, test_size=0.3, random_state=42, stratify=y_encoded
)

X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp
)

# Convert to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)

X_val = torch.tensor(X_val, dtype=torch.float32)
y_val = torch.tensor(y_val, dtype=torch.long)

X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.long)

## Neural Network Architecture

I implement a fully connected neural network (Multi-Layer Perceptron).

Structure:
- Input layer: 63 neurons (one per feature)
- Hidden Layer 1: 128 neurons + ReLU + Dropout (p = 0.3)
- Hidden Layer 2: 64 neurons + ReLU + Dropout (p = 0.3)
- Output layer: K neurons (number of gesture classes), producing raw class scores (logits)
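
A quick way to get a feel for the size of this network is to count its trainable parameters. The sketch below does this in plain Python; `num_classes = 18` is a hypothetical value for K, since the real value comes from the label encoder.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

num_classes = 18  # hypothetical K; the real value is len(le.classes_)

layer_sizes = [63, 128, 64, num_classes]
counts = [linear_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]

print(counts)       # [8192, 8256, 1170]
print(sum(counts))  # 17618
```

ReLU and Dropout add no trainable parameters, so only the three linear layers contribute.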

Why ReLU?

ReLU(x) = max(0, x)

ReLU helps reduce the vanishing gradient problem and allows deeper models to train more efficiently compared to sigmoid or tanh.
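
The difference shows up in the derivatives. The sketch below compares the gradient of ReLU with that of the sigmoid on a few toy inputs; the function names are chosen for illustration only.

```python
import numpy as np

def relu_grad(x):
    """Derivative of max(0, x): 1 for positive inputs, 0 otherwise."""
    return (x > 0).astype(float)

def sigmoid_grad(x):
    """Derivative of the logistic sigmoid: s(x) * (1 - s(x))."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

x = np.array([-5.0, 0.5, 5.0])
print(relu_grad(x))     # [0. 1. 1.]
print(sigmoid_grad(x))  # small at both extremes, never above 0.25
```

Since the sigmoid derivative is at most 0.25, multiplying it across several layers shrinks the gradient, whereas ReLU passes a gradient of 1 for every positive input.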

The hidden layers apply nonlinear transformations to the input features, which lets the model represent decision boundaries that a linear classifier cannot.

In [25]:
class GestureNN(nn.Module):
    def __init__(self, input_size, num_classes):
        super().__init__()

        self.model = nn.Sequential(
            nn.Linear(input_size, 128),
            nn.ReLU(),
            nn.Dropout(0.3),

            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Dropout(0.3),

            nn.Linear(64, num_classes)
        )

    def forward(self, x):
        return self.model(x)

model = GestureNN(input_size=63, num_classes=len(le.classes_))

## Loss Function and Optimization

For multi-class classification, I use CrossEntropyLoss.

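Cross-entropy measures the difference between the predicted class probabilities and the true class label.

Mathematically, for one sample:

\[
L = -\sum_{i=0}^{K-1} y_i \log(\hat{y}_i)
\]

Where:  
- **y_i** is 1 for the true class and 0 otherwise (one-hot label).  
- **ŷ_i** is the predicted probability of class i, obtained by applying softmax to the network outputs.  

`nn.CrossEntropyLoss` applies the softmax (as log-softmax) internally, which is why the network itself ends with a plain linear layer.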

This encourages the model to assign high probability to the correct class.
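
For a single sample, the loss reduces to the negative log-probability of the true class. The NumPy sketch below computes it from raw scores; the function name and the toy logits are illustrative assumptions.

```python
import numpy as np

def cross_entropy(logits, target):
    """Cross-entropy of one sample from raw scores (logits) and an integer class label."""
    shifted = logits - logits.max()                      # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())  # log-softmax
    return -log_probs[target]                            # negative log-probability of the true class

logits = np.array([2.0, 0.5, -1.0])  # toy scores for 3 classes

print(cross_entropy(logits, target=0))  # low loss: class 0 has the highest score
print(cross_entropy(logits, target=2))  # higher loss: class 2 has the lowest score
```

With equal scores for all K classes, the loss is log(K), which is a useful reference value for the first training epoch.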

For optimization, I use Adam.

Adam combines:
- Momentum (to accelerate convergence)
- Adaptive learning rates (to handle different parameter scales)

This typically makes training more stable and faster than plain stochastic gradient descent.
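
The two ingredients can be seen in a single update step. Below is a NumPy sketch of one Adam step using the commonly cited default hyperparameters (lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8); it is an illustration of the update rule, not the PyTorch implementation, and the gradients are toy values.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters theta at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad        # momentum: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step size
    return theta, m, v

theta = np.array([1.0, 1.0])
grad = np.array([100.0, 0.01])   # toy gradients on very different scales
theta, m, v = adam_step(theta, grad, m=np.zeros(2), v=np.zeros(2), t=1)

print(theta)  # both parameters move by roughly lr = 0.001
```

Even though the two gradients differ by four orders of magnitude, both parameters take a step of similar size, which is the adaptive learning rate at work.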

In [26]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

## Training Process with Early Stopping

During training, each batch of data goes through the following steps:

1. **Forward pass** – compute predictions.  
2. **Compute loss** – measure the difference between predictions and true labels.  
3. **Backward pass** – compute gradients using backpropagation.  
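4. **Update weights** – adjust model parameters with the optimizer. The basic gradient descent rule is given here; Adam follows the same idea but rescales each step using running averages of the gradient and of its square:

\[
\theta \leftarrow \theta - \alpha \nabla L(\theta)
\]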

Where:  
- **θ** represents the model parameters.  
- **α** is the learning rate.  
- **∇L(θ)** is the gradient of the loss with respect to the parameters.  

This process is repeated for multiple epochs until the model converges.
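
The four steps can be traced by hand on a much smaller model. The sketch below trains a linear softmax classifier with NumPy on random placeholder data, using full-batch updates and the plain gradient descent rule rather than mini-batches and Adam; all names and values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples, 4 features, 3 classes (random placeholders, not the gesture data)
X = rng.normal(size=(200, 4))
true_W = rng.normal(size=(4, 3))
y = (X @ true_W).argmax(axis=1)

W = np.zeros((4, 3))   # parameters theta of a linear softmax classifier
alpha = 0.5            # learning rate
losses = []

for epoch in range(50):
    # 1. Forward pass: logits -> softmax probabilities
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # 2. Loss: mean cross-entropy of the true classes
    loss = -np.log(probs[np.arange(len(y)), y]).mean()
    losses.append(loss)

    # 3. Backward pass: gradient of the loss with respect to W
    grad_logits = probs.copy()
    grad_logits[np.arange(len(y)), y] -= 1.0
    grad_W = X.T @ grad_logits / len(y)

    # 4. Update: theta <- theta - alpha * gradient
    W -= alpha * grad_W

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

The first loss equals log(3), since zero weights give equal probability to all three classes, and it falls as the updates are repeated.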

### Early Stopping (Regularization)

To prevent overfitting, I implement **early stopping** based on validation loss:

- Training loss usually decreases continuously.  
- Validation loss decreases initially but may start increasing once overfitting begins.  

I monitor the validation loss after each epoch. If it does not improve for a fixed number of consecutive epochs (called **patience**, set to 10 here), training stops, and the best model (with the lowest validation loss) is restored.  

By selecting the parameters that minimize validation loss rather than just training loss, early stopping acts as an **implicit regularization method**, helping the model generalize better to unseen data.
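
The patience logic can be isolated from the training loop. The helper below is a hypothetical pure-Python sketch that replays a list of made-up validation losses and reports when training would stop and which epoch would be kept.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based, for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    counter = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # Improvement: remember this epoch and reset the counter
            best_loss, best_epoch, counter = loss, epoch, 0
        else:
            # No improvement this epoch
            counter += 1
            if counter >= patience:
                return epoch, best_epoch  # patience exhausted: stop here
    return len(val_losses), best_epoch    # ran through all epochs without stopping

# Hypothetical validation losses: improve until epoch 3, then drift upward
val_losses = [0.70, 0.50, 0.40, 0.42, 0.41, 0.45, 0.43]
print(early_stopping_epoch(val_losses, patience=3))  # (6, 3)
```

Training stops at epoch 6, but the parameters that are kept are those from epoch 3, where the validation loss was lowest.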

In [27]:
train_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(X_val, y_val), batch_size=32)

num_epochs = 100
patience = 10  # number of epochs to wait
best_val_loss = float('inf')
counter = 0

for epoch in range(num_epochs):
    model.train()
    train_loss = 0

    for xb, yb in train_loader:
        optimizer.zero_grad()
        outputs = model(xb)
        loss = criterion(outputs, yb)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    train_loss /= len(train_loader)

    # ----- Validation -----
    model.eval()
    val_loss = 0
    correct = 0
    total = 0

    with torch.no_grad():
        for xb, yb in val_loader:
            outputs = model(xb)
            loss = criterion(outputs, yb)
            val_loss += loss.item()

            _, predicted = torch.max(outputs, 1)
            total += yb.size(0)
            correct += (predicted == yb).sum().item()

    val_loss /= len(val_loader)
    val_acc = correct / total

    print(f"Epoch [{epoch+1}/{num_epochs}] "
          f"Train Loss: {train_loss:.4f} "
          f"Val Loss: {val_loss:.4f} "
          f"Val Acc: {val_acc:.4f}")

    # ----- Early Stopping Logic -----
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        counter = 0
        torch.save(model.state_dict(), "best_model.pth")  # Save best model
    else:
        counter += 1
        print(f"Early stopping counter: {counter}/{patience}")

        if counter >= patience:
            print("Early stopping triggered.")
            break

Epoch [1/100] Train Loss: 1.5310 Val Loss: 0.6980 Val Acc: 0.7349
Epoch [2/100] Train Loss: 0.7206 Val Loss: 0.5262 Val Acc: 0.7967
Epoch [3/100] Train Loss: 0.6144 Val Loss: 0.4609 Val Acc: 0.8154
Epoch [4/100] Train Loss: 0.5490 Val Loss: 0.4082 Val Acc: 0.8364
Epoch [5/100] Train Loss: 0.4944 Val Loss: 0.3616 Val Acc: 0.8561
Epoch [6/100] Train Loss: 0.4492 Val Loss: 0.3260 Val Acc: 0.8699
Epoch [7/100] Train Loss: 0.4068 Val Loss: 0.2950 Val Acc: 0.8826
Epoch [8/100] Train Loss: 0.3804 Val Loss: 0.2816 Val Acc: 0.8876
Epoch [9/100] Train Loss: 0.3596 Val Loss: 0.2704 Val Acc: 0.8826
Epoch [10/100] Train Loss: 0.3430 Val Loss: 0.2643 Val Acc: 0.8907
Epoch [11/100] Train Loss: 0.3278 Val Loss: 0.2616 Val Acc: 0.8860
Epoch [12/100] Train Loss: 0.3263 Val Loss: 0.2372 Val Acc: 0.9031
Epoch [13/100] Train Loss: 0.3193 Val Loss: 0.2443 Val Acc: 0.9018
Early stopping counter: 1/10
Epoch [14/100] Train Loss: 0.3037 Val Loss: 0.2303 Val Acc: 0.9078
Epoch [15/100] Train Loss: 0.3003 Val Loss

## Final Test Evaluation

After training finishes, I load the best saved model and evaluate it on the test set.

The test set was never used during training or validation.  
Therefore, it provides an unbiased estimate of the model’s generalization performance.

Evaluation metrics include:
- Accuracy
- Precision
- Recall
- F1-score

Accuracy is calculated as:

Accuracy = Correct Predictions / Total Samples

Together, these metrics show per-class behavior that overall accuracy alone can hide.
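
Precision, recall and F1 can all be derived from a confusion matrix. The NumPy sketch below shows one way to do this on hypothetical toy labels; it illustrates the definitions and is not the scikit-learn implementation behind `classification_report`.

```python
import numpy as np

def per_class_metrics(y_true, y_pred, num_classes):
    """Precision, recall and F1 for each class, computed from a confusion matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                      # rows: true class, columns: predicted class

    tp = np.diag(cm).astype(float)         # correct predictions per class
    predicted = cm.sum(axis=0)             # how often each class was predicted
    actual = cm.sum(axis=1)                # how often each class truly occurs

    # Guard against division by zero for classes that never appear
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

# Toy labels for 3 classes (hypothetical)
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

precision, recall, f1 = per_class_metrics(y_true, y_pred, num_classes=3)
print(precision)  # 0.5, 0.667, 1.0 (rounded)
print(recall)     # 0.5, 1.0, 0.5
print(f1)         # 0.5, 0.8, 0.667 (rounded)
```

Precision answers "of the samples predicted as this class, how many were right?", while recall answers "of the samples that truly belong to this class, how many were found?".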

In [None]:
model.load_state_dict(torch.load("best_model.pth"))
model.eval()

test_loader = DataLoader(TensorDataset(X_test, y_test), batch_size=32)

all_preds = []
all_labels = []

with torch.no_grad():
    for xb, yb in test_loader:
        outputs = model(xb)
        _, predicted = torch.max(outputs, 1)

        all_preds.extend(predicted.numpy())
        all_labels.extend(yb.numpy())

# Convert to numpy arrays
all_preds = np.array(all_preds)
all_labels = np.array(all_labels)

# Accuracy
test_accuracy = (all_preds == all_labels).mean()
print("Test Accuracy:", test_accuracy)

Test Accuracy: 0.9800103842159917


In [30]:
from sklearn.metrics import classification_report

print(classification_report(
    all_labels,
    all_preds,
    target_names=le.classes_
))

                 precision    recall  f1-score   support

           call       0.99      0.98      0.98       226
        dislike       0.99      1.00      0.99       194
           fist       1.00      0.99      0.99       141
           four       0.98      1.00      0.99       245
           like       0.98      0.99      0.98       216
           mute       0.94      0.94      0.94       163
             ok       1.00      1.00      1.00       239
            one       0.97      0.95      0.96       189
           palm       0.96      0.99      0.98       248
          peace       0.99      0.98      0.98       216
 peace_inverted       0.99      0.96      0.98       225
           rock       1.00      0.99      0.99       219
           stop       0.95      0.96      0.95       223
  stop_inverted       0.99      0.98      0.99       235
          three       0.99      0.96      0.98       219
         three2       1.00      1.00      1.00       248
         two_up       0.97    

## Conclusion

In this notebook, I reimplemented a classical supervised classification problem using a Deep Neural Network.

Compared to classical ML models:
- The neural network learns nonlinear feature representations.
- It uses backpropagation for optimization.
- It can model more complex decision boundaries.

The full pipeline included:
- Preprocessing
- Proper data splitting
- Model training
- Early stopping
- Final unbiased evaluation

With about 98% accuracy on the held-out test set, the results show that this landmark-based gesture classification problem can be solved effectively with a Deep Learning approach.