# Volleyball Reception Classification Report

This notebook documents the complete development process of a binary classifier to distinguish between **"upper"** and **"lower"** volleyball receptions using skeleton keypoints extracted via a YOLO pose estimation model. The workflow covers data curation, cleaning, preprocessing, model training, evaluation, data augmentation, and final conclusions.

## 1. Introduction

In recent years, **sports analytics** has grown into a powerful tool for performance optimization, tactical decision-making, and even injury prevention. Within this domain, **computer vision and machine learning** play increasingly important roles, offering insights that were previously unachievable at scale or in real time.

This project is a **case study** in building a lightweight yet effective model for classifying **volleyball reception types** ("upper" vs. "lower") using only **skeleton keypoint data** rather than full images or video. **YOLO11x pose estimation** is used to extract 2D joint coordinates for each player from video frames, offering a **computationally efficient** alternative to processing full-resolution images.

The classification task revolves around analyzing the **relative body posture** of the player performing the reception:
- In an **"upper" reception**, arms are typically raised above or level with the chest.
- In a **"lower" reception**, arms are extended downward or outward near the legs.
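To make this posture distinction concrete, the sketch below encodes it as a crude rule on the keypoints. It is a hypothetical baseline, not part of this project's pipeline. It assumes the 17-keypoint COCO ordering used by YOLO pose models (shoulders 5/6, wrists 9/10, hips 11/12) and pixel coordinates with y increasing downward. It also uses the shoulder-hip midpoint as a rough stand-in for chest height.

```python
import numpy as np

# COCO keypoint indices (assumed 17-keypoint layout from YOLO pose models)
L_SHOULDER, R_SHOULDER = 5, 6
L_WRIST, R_WRIST = 9, 10
L_HIP, R_HIP = 11, 12

def heuristic_reception_type(keypoints):
    """Hypothetical rule-based label for one skeleton of shape (17, 2).

    Image y grows downward, so a smaller y means higher in the frame.
    Returns "upper" if the mean wrist height is at or above the midpoint
    between shoulders and hips (a rough chest proxy), else "lower".
    """
    kp = np.asarray(keypoints, dtype=float)
    shoulder_y = kp[[L_SHOULDER, R_SHOULDER], 1].mean()
    hip_y = kp[[L_HIP, R_HIP], 1].mean()
    wrist_y = kp[[L_WRIST, R_WRIST], 1].mean()
    chest_y = (shoulder_y + hip_y) / 2.0
    return "upper" if wrist_y <= chest_y else "lower"
```

A fixed threshold like this is likely to fail on ambiguous or partially occluded poses. This is one motivation for learning the decision boundary from data instead.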

By analyzing only this **structured pose data**, I demonstrate that it is possible to build a binary classifier using a simple **Multilayer Perceptron (MLP)** architecture.

**Project Objectives:**
- Curate and clean a dataset of YOLO-based skeleton keypoints for volleyball receptions.
- Build and evaluate a classifier to distinguish between "upper" and "lower" receptions.
- Augment the dataset in a geometrically valid way to improve model generalization.
- Document the end-to-end pipeline to demonstrate a reproducible ML workflow.

**Upper Reception:**

![Upper Reception](resource/upper_22.png)

**Lower Reception:**

![Lower Reception](resource/lower_6.png)

## 2. Data Curation and Cleaning

### Data Source and Format

- The dataset is provided as a CSV file where each row is formatted as follows:
  ```
  kp1_x, kp1_y, kp2_x, kp2_y, ..., kpN_x, kpN_y, label
  ```
  where `label` is either "upper" or "lower".
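Before training, a quick structural check of this layout can catch malformed exports. The helper below is a hypothetical sketch, not one of the project's utility scripts. It verifies an even number of numeric coordinate columns followed by a label column containing only "upper"/"lower" (case-insensitive). The two sample rows are made-up values for illustration.

```python
import io
import pandas as pd

VALID_LABELS = {"upper", "lower"}

def validate_keypoint_csv(df):
    """Check a DataFrame against the kp1_x, kp1_y, ..., kpN_x, kpN_y, label layout.

    Returns the number of keypoints N; raises ValueError on a malformed table.
    """
    n_features = df.shape[1] - 1
    if n_features <= 0 or n_features % 2 != 0:
        raise ValueError(f"expected an even number of coordinate columns, got {n_features}")
    # Normalize labels the same way the preprocessing does (case, whitespace)
    labels = df.iloc[:, -1].astype(str).str.strip().str.lower()
    bad = set(labels) - VALID_LABELS
    if bad:
        raise ValueError(f"unexpected labels: {sorted(bad)}")
    if not all(pd.api.types.is_numeric_dtype(t) for t in df.iloc[:, :-1].dtypes):
        raise ValueError("all coordinate columns must be numeric")
    return n_features // 2

# Illustrative two-keypoint CSV (made-up coordinates)
sample = io.StringIO(
    "kp1_x,kp1_y,kp2_x,kp2_y,label\n"
    "310.5,120.2,305.1,118.7,upper\n"
    "298.0,240.9,301.3,238.4,Lower\n"
)
df = pd.read_csv(sample)
print(validate_keypoint_csv(df))  # prints 2
```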

### Cleaning Process

- Keypoint predictions generated by the pose estimation model were manually reviewed using overlay images.
- Samples with clearly incorrect skeletons or where keypoints were extracted from the wrong person were removed.
- Utility scripts were used to organize and curate the images and CSV files.

**Sample of the CSV keypoint dataset:**

![Keypoints sample](resource/csvExample.png)

**Example of incorrect YOLO keypoint predictions:**

![YOLO keypoint sample](resource/upper_25.png)

## 3. Data Preprocessing

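The CSV file is loaded, and the features (keypoints) and labels are separated. The labels are mapped to binary values ("upper" → 1, "lower" → 0), and the features are standardized with scikit-learn's `StandardScaler`, which rescales each coordinate column to zero mean and unit variance across the dataset. This puts all features on a comparable scale, which generally makes MLP training more stable. Note that standardization is applied per feature, not per skeleton. It therefore does not by itself remove where an individual player stands in the frame; the model must still learn relative body posture from the scaled coordinates.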
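As a minimal illustration of what `StandardScaler` computes, the snippet below reproduces its output by hand on a toy matrix of made-up coordinates. It subtracts each column's mean and divides by its population standard deviation (`ddof=0`).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix: 4 samples x 2 coordinate columns (illustrative values)
X = np.array([[300.0, 100.0],
              [320.0, 140.0],
              [280.0, 120.0],
              [340.0, 160.0]])

scaler = StandardScaler().fit(X)
X_sklearn = scaler.transform(X)

# Same result by hand: per-column mean and population std (ddof=0)
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_manual = (X - mu) / sigma

print(np.allclose(X_sklearn, X_manual))  # True
print(scaler.mean_)                      # [310. 130.]
```

Each column is transformed independently using statistics pooled over all samples, which is why the transform is per feature rather than per skeleton.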

Below is the code used for data loading and preprocessing:

```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

def load_and_preprocess_data(csv_path):
    """
    Load dataset from a CSV file and preprocess the data.

    Expected CSV format per row:
        kp1_x, kp1_y, kp2_x, kp2_y, ..., kpN_x, kpN_y, label

    Returns the standardized feature matrix, the binary label vector
    ("upper" -> 1, "lower" -> 0), and the fitted scaler.
    """
    df = pd.read_csv(csv_path)
    # All columns except the last are keypoint coordinates
    X = df.iloc[:, :-1].values.astype(np.float32)
    # The last column holds the string label
    y = df.iloc[:, -1].values
    label_map = {"upper": 1, "lower": 0}
    # Normalize case and surrounding whitespace before mapping
    y = np.array([label_map[str(label).strip().lower()] for label in y])

    # Standardize each coordinate column to zero mean and unit variance
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    return X_scaled, y, scaler

# Example usage:
csv_path = "data/original_dataset.csv"  # Update this path as needed
X_scaled, y, scaler = load_and_preprocess_data(csv_path)
print("Data shape:", X_scaled.shape)
```
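In the preprocessing function above, the scaler is fitted on the full dataset before the train/test split. As a result, the test set's mean and variance leak into the preprocessing. The effect is usually small for standardization, but a stricter protocol fits the scaler on the training portion only. A minimal sketch follows, assuming the features are available unscaled. Random placeholder data stands in for the keypoint matrix, and `stratify=y` is an optional extra that keeps the class balance equal in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Placeholder data: 100 samples x 34 coordinates (17 keypoints), binary labels
X_raw = rng.normal(loc=300.0, scale=50.0, size=(100, 34)).astype(np.float32)
y = rng.integers(0, 2, size=100)

# 1. Split first (optionally stratified by label)
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    X_raw, y, test_size=0.2, random_state=42, shuffle=True, stratify=y)

# 2. Fit the scaler on training data only...
scaler = StandardScaler().fit(X_train_raw)

# 3. ...then apply the same training statistics to both sets
X_train = scaler.transform(X_train_raw)
X_test = scaler.transform(X_test_raw)
```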

## 4. Model Building and Training

We use a simple Multilayer Perceptron (MLP) with two hidden layers and dropout regularization. An MLP suits this task because the input is a flattened vector of keypoint coordinates, which is low-dimensional and structured (34 values for the 17-keypoint COCO layout used by YOLO pose models). With raw images, which are high-dimensional, a CNN architecture would be more appropriate. The network architecture is as follows, with dropout after each hidden layer to reduce the risk of overfitting:

- **Dense(64, ReLU)** → **Dropout(0.3)**
- **Dense(32, ReLU)** → **Dropout(0.3)**
- **Dense(1, Sigmoid)**
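To show how small this network is, its trainable parameter count can be worked out by hand. Each Dense layer has `n_in * n_out` weights plus `n_out` biases, and Dropout adds none. The sketch below assumes 34 input features (17 keypoints with x and y). For that input size, the total should match the "Total params" line printed by `model.summary()`.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def mlp_param_count(input_dim, hidden=(64, 32), output_dim=1):
    """Trainable parameters of the Dense stack above (Dropout layers add none)."""
    sizes = [input_dim, *hidden, output_dim]
    return sum(dense_params(a, b) for a, b in zip(sizes[:-1], sizes[1:]))

# Assuming 17 keypoints x (x, y) = 34 input features:
print(mlp_param_count(34))  # 2240 + 2080 + 33 = 4353
```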

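The dataset is split into training and test sets (80/20, `random_state=42`), and Keras's `validation_split=0.1` reserves the last 10% of the training arrays for validation. `EarlyStopping` (patience of 5 epochs, restoring the best weights) halts training once validation loss stops improving. `ModelCheckpoint` saves the model with the lowest validation loss. Together, they guard against overfitting.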
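The stopping rule configured in the training code (`monitor='val_loss'`, `patience=5`) can be illustrated with a simplified pure-Python simulation. This is a sketch of the idea with the default `min_delta=0`, not Keras's actual implementation, and the loss values are made up.

```python
def simulate_early_stopping(val_losses, patience=5, min_delta=0.0):
    """Return (epoch at which training stops, epoch with the best val loss).

    Simplified patience rule: a counter resets on every improvement and
    training stops once it reaches `patience` epochs without improvement.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0  # new best: reset counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # stop; best weights would be restored
    return len(val_losses) - 1, best_epoch  # ran all epochs

# Made-up validation losses: best at epoch 2, then 5 epochs without improvement
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.73, 0.74, 0.6]
print(simulate_early_stopping(losses))  # (7, 2)
```

With `restore_best_weights=True`, stopping at epoch 7 here would roll the model back to the weights from epoch 2. The later improvement at epoch 8 is never reached.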

Below is the code for model building and training:

```python
import os
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from sklearn.model_selection import train_test_split

def build_model(input_dim):
    """Two-hidden-layer MLP with dropout and a sigmoid output for binary classification."""
    model = Sequential([
        Input(shape=(input_dim,)),
        Dense(64, activation='relu'),
        Dropout(0.3),
        Dense(32, activation='relu'),
        Dropout(0.3),
        Dense(1, activation='sigmoid')  # probability of "upper"
    ])
    model.compile(optimizer=Adam(learning_rate=0.001),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

# Split data into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42, shuffle=True)
print("Training set shape:", X_train.shape, "Test set shape:", X_test.shape)

input_dim = X_train.shape[1]
model = build_model(input_dim)
model.summary()

# Stop after 5 epochs without val_loss improvement and keep the best weights
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
# Save the model with the lowest val_loss seen so far (ensure the folder exists)
os.makedirs("models", exist_ok=True)
checkpoint = ModelCheckpoint("models/volleyball_receive_classifier.h5", monitor='val_loss', save_best_only=True)

# validation_split=0.1 holds out the last 10% of the training arrays for validation
history = model.fit(X_train, y_train, epochs=20, batch_size=24,
                    validation_split=0.1, shuffle=True, callbacks=[early_stop, checkpoint])
```

```python
# Plot training and validation curves
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training vs. Validation Loss")
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Training vs. Validation Accuracy")
plt.legend()

plt.tight_layout()
plt.show()
```

## 5. Model Evaluation

I evaluate the model on the held-out test set, computing the overall loss and accuracy along with a per-class classification report and a confusion matrix.

### Evaluation Visualizations

Below are the key evaluation graphs:

**Training and Validation Curves:**

![Training & Validation Graphs](resource/TrainingValidationGraphs.png)

**Confusion Matrix:**

![Confusion Matrix](resource/ConfusionMatrix.png)

**Classification Report:**

![Classification Report](resource/classificationReport.png)

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix

# Evaluate the model on the held-out test set
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss:.4f}, Test Accuracy: {accuracy:.4f}")

# Threshold the sigmoid outputs at 0.5 and flatten (n, 1) -> (n,)
y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()
print("\nClassification Report:")
# Label 0 = "lower", label 1 = "upper"
print(classification_report(y_test, y_pred, target_names=["lower", "upper"]))

# Confusion matrix: rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=["lower", "upper"], yticklabels=["lower", "upper"])
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()
```
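The per-class figures in the classification report can be derived directly from the confusion matrix. The sketch below assumes scikit-learn's layout (rows = true class, columns = predicted class, index 0 = "lower", 1 = "upper"). The counts are hypothetical and are not this project's results.

```python
import numpy as np

def binary_metrics(cm):
    """Per-class (precision, recall, F1) and overall accuracy from a 2x2 confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    metrics = {}
    for idx, name in enumerate(["lower", "upper"]):
        tp = cm[idx, idx]
        predicted = cm[:, idx].sum()  # all samples predicted as this class
        actual = cm[idx, :].sum()     # all samples truly in this class
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[name] = (precision, recall, f1)
    metrics["accuracy"] = np.trace(cm) / cm.sum()
    return metrics

# Hypothetical counts: 8 lower correct, 2 lower misread as upper,
# 1 upper misread as lower, 9 upper correct
m = binary_metrics([[8, 2],
                    [1, 9]])
print(m["upper"][:2])  # precision 9/11, recall 9/10
```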

## 6. Data Augmentation

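Due to the small size of our initial dataset, we apply data augmentation to quadruple its size. Each original sample is kept, and one augmented copy is generated per technique. The following augmentations are used:

- **Global Noise:** Adds small Gaussian noise (`noise_std=2.0`, in the same units as the keypoint coordinates) to all keypoints.
- **Arm Shift:** Shifts the left-arm keypoints (indices 5, 7, 9: shoulder, elbow, wrist in COCO order) and the right-arm keypoints (indices 6, 8, 10) horizontally. Each side is shifted by its own random offset within ±`shift_range` (default 5.0).
- **Global Scaling:** Scales the entire skeleton about its centroid by a random factor in [0.95, 1.05].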

Below is the code for the augmentation functions and an example of how to augment the dataset.

```python
import numpy as np
import pandas as pd

def augment_global_noise(keypoints, noise_std=2.0):
    """Add independent Gaussian noise to every keypoint coordinate."""
    noise = np.random.normal(0, noise_std, keypoints.shape)
    return keypoints + noise

def augment_arm_shift(keypoints, shift_range=5.0):
    """Shift each arm horizontally by its own random offset in [-shift_range, shift_range]."""
    kp_aug = keypoints.copy()
    # COCO keypoint order: shoulder, elbow, wrist for each side
    left_arm_idx = [5, 7, 9]
    right_arm_idx = [6, 8, 10]
    left_shift = np.random.uniform(-shift_range, shift_range)
    right_shift = np.random.uniform(-shift_range, shift_range)
    for idx in left_arm_idx:
        if idx < kp_aug.shape[0]:  # skip indices missing from shorter skeletons
            kp_aug[idx, 0] += left_shift
    for idx in right_arm_idx:
        if idx < kp_aug.shape[0]:
            kp_aug[idx, 0] += right_shift
    return kp_aug

def augment_global_scaling(keypoints, scale_range=(0.95, 1.05)):
    """Scale the whole skeleton about its centroid by a random factor."""
    kp_aug = keypoints.copy()
    centroid = np.mean(kp_aug, axis=0)
    scale_factor = np.random.uniform(scale_range[0], scale_range[1])
    kp_aug = (kp_aug - centroid) * scale_factor + centroid
    return kp_aug

def augment_sample(flat_sample):
    """Return three augmented copies (noise, arm shift, scaling) of one flat sample."""
    keypoints = flat_sample.reshape(-1, 2)  # (N, 2) array of (x, y) pairs
    augmented_samples = []
    augmented_samples.append(augment_global_noise(keypoints).flatten())
    augmented_samples.append(augment_arm_shift(keypoints).flatten())
    augmented_samples.append(augment_global_scaling(keypoints).flatten())
    return augmented_samples

def augment_dataset(input_csv, output_csv):
    """Write each original row followed by its three augmented copies to output_csv."""
    df = pd.read_csv(input_csv)
    feature_cols = df.columns[:-1]
    label_col = df.columns[-1]
    features, labels = [], []

    for _, row in df.iterrows():
        flat_sample = row[feature_cols].values.astype(np.float32)
        label = row[label_col]
        # Keep the original sample, then add one copy per augmentation
        features.append(flat_sample)
        labels.append(label)
        for aug_sample in augment_sample(flat_sample):
            features.append(aug_sample)
            labels.append(label)

    # Keep coordinates numeric and store labels in their own column (reusing the
    # input column names); concatenating floats with a string label in one NumPy
    # array would coerce every value to text
    df_aug = pd.DataFrame(np.vstack(features), columns=feature_cols)
    df_aug[label_col] = labels
    df_aug.to_csv(output_csv, index=False)
    print(f"Augmented dataset saved to {output_csv}")

# Example usage:
# augment_dataset("data/original_dataset.csv", "data/augmented_dataset.csv")
```
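If the model is trained on the augmented CSV, each original sample and its three augmented copies are near-duplicates. A plain random split can therefore place copies of the same reception in both the training and test sets, which would inflate test scores. One way to avoid this is to give each block of rows a group id and split with scikit-learn's `GroupShuffleSplit`. The sketch below assumes rows are written in consecutive blocks of four (original followed by its three augmentations, as `augment_dataset` does). Random placeholder data stands in for the real features.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

COPIES_PER_SAMPLE = 4  # original + 3 augmentations, in consecutive rows

rng = np.random.default_rng(0)
# Placeholder for the augmented feature matrix: 25 originals x 4 rows each
X_aug = rng.normal(size=(100, 34))
y_aug = np.repeat(rng.integers(0, 2, size=25), COPIES_PER_SAMPLE)

# Rows 0-3 -> group 0, rows 4-7 -> group 1, ...
groups = np.arange(len(X_aug)) // COPIES_PER_SAMPLE

# Hold out 20% of the *original samples*, together with all of their copies
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X_aug, y_aug, groups=groups))

# No original sample contributes rows to both sets
assert set(groups[train_idx]).isdisjoint(groups[test_idx])
```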

## 7. Inference

A separate inference pipeline (provided in `inference_receive_classifier.py`) is used to predict on new keypoint data. This notebook focuses on the training, evaluation, and augmentation process. Refer to the inference script for details on how new data is processed and classified.
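The inference script itself is not reproduced here. As a hedged sketch of what such a pipeline has to do, the helper below (hypothetical name `predict_reception`) applies the same fitted `StandardScaler` used in training before calling the model. `load_and_preprocess_data` returns the scaler, but the notebook does not save it, so it would need to be persisted alongside the model, for example with `joblib.dump` / `joblib.load`. The file name in the comments is an assumption.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def predict_reception(model, scaler, keypoints, threshold=0.5):
    """Hypothetical helper: classify one skeleton as "upper" or "lower".

    keypoints: shape (N, 2) or flat (2N,), in the same coordinate space and
    keypoint order as the training CSV.
    model: any object whose predict() returns sigmoid probabilities, e.g. a
    Keras model restored with tf.keras.models.load_model(...).
    """
    x = np.asarray(keypoints, dtype=np.float32).reshape(1, -1)
    x_scaled = scaler.transform(x)  # reuse the training mean/std
    prob_upper = float(np.ravel(model.predict(x_scaled))[0])
    return ("upper" if prob_upper > threshold else "lower"), prob_upper

# Persisting the scaler (assumed file name):
#   import joblib
#   joblib.dump(scaler, "models/keypoint_scaler.joblib")   # after training
#   scaler = joblib.load("models/keypoint_scaler.joblib")  # at inference

# Usage with a stand-in model that always outputs 0.9
class ConstantModel:
    def predict(self, x):
        return np.full((len(x), 1), 0.9)

demo_scaler = StandardScaler().fit(np.random.default_rng(0).normal(size=(10, 34)))
label, prob = predict_reception(ConstantModel(), demo_scaler, np.zeros((17, 2)))
print(label, prob)  # upper 0.9
```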

## 8. Conclusions and Recommendations

### Model Performance

- The model achieved high overall accuracy, with strong precision and recall for both classes (see the classification report above). However, it may still have overfitted to the limited training data, so further evaluation on a larger or more diverse external dataset is recommended.
- Training and validation curves indicate that early stopping and dropout helped manage overfitting. However, given the small dataset size, additional data would likely improve generalization. We tried to mitigate the limited quantity by including a wide variety of poses for both reception types and by applying diverse augmentation techniques.

### Data Augmentation

- Augmenting the data (via noise, arm shift, and scaling) quadrupled the dataset, with the aim of improving model robustness despite the limited original data.

### Future Work

- Validate the model on an external evaluation dataset to confirm generalization.

### Additional Considerations
- This model is presented as a case study to show that it is relatively simple to perform classification using keypoint data. In practice, one could extend the approach to further distinguish between different types of lower receptions (e.g., whether the ball is received perfectly in front of the body or shifted to the left or right), which could generate further insights.
- The model is highly reliant on the quality of the keypoint data. Our results were affected by the fact that some original images were blurry or not in full HD resolution, which in turn impacted the accuracy of the keypoint predictions.
- While YOLO was used for pose estimation in this project, alternative models such as ViTPose or other state-of-the-art methods might offer improved performance. Moreover, incorporating temporal tracking of keypoints over multiple frames (with interpolation to handle mispredictions) could further enhance real-world applicability.
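The interpolation idea can be sketched for a single keypoint coordinate tracked across frames. Frames are marked as NaN wherever the prediction is missing or rejected as a misprediction; how mispredictions would be detected is an assumption left open here. These frames are then filled by linear interpolation between neighbouring valid frames with `np.interp`.

```python
import numpy as np

def interpolate_track(values):
    """Fill NaN entries in a per-frame coordinate track by linear interpolation.

    values: 1-D sequence, one coordinate (e.g. left-wrist y) per frame, with NaN
    where the keypoint was missing or rejected. Leading/trailing NaNs take the
    nearest valid value (np.interp's behavior outside the known range).
    """
    v = np.asarray(values, dtype=float)
    frames = np.arange(len(v))
    valid = ~np.isnan(v)
    out = v.copy()
    if valid.any():
        out[~valid] = np.interp(frames[~valid], frames[valid], v[valid])
    return out

track = [100.0, np.nan, 120.0, np.nan, np.nan, 150.0]
print(interpolate_track(track))  # [100. 110. 120. 130. 140. 150.]
```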

Overall, this project demonstrates that a relatively simple MLP can effectively classify volleyball receptions using keypoint data, but further validation and exploration of alternative pose estimation techniques are recommended to improve robustness and generalization.