# Kinect XY to Z Prediction Experiment Pipeline
In this experiment, we predict the Kinect sensor's Z-coordinate for 13 body joints using only their X and Y coordinates. We use a sliding window of consecutive frames to add temporal context. Our approach predicts the 13 Z-values (one per joint) for the last frame in each window and compares several neural network architectures (dense, Conv1D, LSTM, a Conv1D+dense hybrid, and CNN+LSTM) to see which one works best.

1. Data Preparation:
Load Kinect CSV files and extract the X and Y coordinates as inputs, with the Z coordinates as targets. Create training samples using a sliding window, making sure the window stays within one sequence (each CSV file represents one sequence).

2. Model Building:
Build TensorFlow/Keras models with hidden layers activated by ReLU and a linear output layer to predict raw Z-values. Use the Adam optimizer with Mean Squared Error as the loss function, and track Mean Absolute Error as an extra performance metric.

3. Training & Evaluation:
Use 10-fold cross-validation by splitting sequences into folds. Apply early stopping to avoid overfitting and use model checkpointing to save the best weights. Perform a grid search over key hyperparameters (window size, learning rate, architecture type, number of layers, and units per layer) and log the performance results to a CSV file for analysis.

### Import Libraries and Configure GPU

We import the necessary libraries for data handling (pandas, NumPy), data splitting and scaling (scikit-learn), and building neural networks (TensorFlow/Keras). We also enable GPU memory growth to avoid allocating all GPU memory at once.

In [3]:
# Import required libraries
import os
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from sklearn.model_selection import KFold, train_test_split
from sklearn.preprocessing import StandardScaler

# Configure TensorFlow to use GPU efficiently (if available)
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        tf.config.experimental.set_memory_growth(gpus[0], True)
        print("GPU found. Enabled memory growth on", gpus[0].name)
    except Exception as e:
        print("Error enabling GPU memory growth:", e)
else:
    print("No GPU found. Using CPU for training.")

# Report the environment regardless of whether a GPU was found
print("TensorFlow version:", tf.__version__)
print("GPUs:", tf.config.list_physical_devices('GPU'))


No GPU found. Using CPU for training.
TensorFlow version: 2.19.0
GPUs: []


### Data Loading and Preprocessing

Each Kinect CSV file holds one sequence of recorded joint coordinates. The code loads each file into a pandas DataFrame, drops the frame index column, and then separates the input features (columns ending in _x or _y) from the target outputs (columns ending in _z). The X inputs (26 features from 13 joints) and the Z targets (13 values) are converted to NumPy arrays and collected into lists. Later, we apply feature scaling to each training fold during cross-validation to standardize the inputs without risking data leakage.

In [None]:
# Directory containing Kinect CSV files
data_dir = "data/kinect_sequences"  # replace with your actual path
csv_files = sorted([f for f in os.listdir(data_dir) if f.endswith(".csv")])

# Lists to hold sequences of features (X) and targets (Z)
sequences_X = []
sequences_Z = []

for file in csv_files:
    file_path = os.path.join(data_dir, file)
    try:
        # Load the CSV into a pandas DataFrame
        df = pd.read_csv(file_path)
        print(f"Successfully loaded {file}")
    except Exception as e:
        print(f"Error loading {file}: {e}")
        continue
    
    # Strip whitespace from column names first, so the name checks below match
    df.columns = df.columns.str.strip()
    
    # Drop the frame-number column if present (either naming variant)
    df = df.drop(columns=['FrameNo', 'frame'], errors='ignore')
    
    # Separate feature and target columns
    feature_cols = [col for col in df.columns if col.endswith('_x') or col.endswith('_y')]
    target_cols = [col for col in df.columns if col.endswith('_z')]
    
    X_values = df[feature_cols].to_numpy()
    Z_values = df[target_cols].to_numpy()
    
    sequences_X.append(X_values)
    sequences_Z.append(Z_values)

print(f"Loaded {len(sequences_X)} sequences. Example sequence shape: {sequences_X[0].shape}")


### Sliding Window Data Generation

We generate training samples by sliding a fixed-length window over each sequence, ensuring that windows do not cross sequence boundaries. Each sample uses the X and Y coordinates from W frames (shape W×26) as input and the Z coordinates from the last frame (shape 13) as the target. The function below takes a sequence of inputs (X_seq) and targets (Z_seq) and returns all possible windowed samples as NumPy arrays. A quick test at the end shows how many samples are generated and their shapes.

In [None]:
def create_windows_from_sequence(X_seq, Z_seq, window_size):
    """
    Given a single sequence of features X_seq (shape: num_frames x 26) and 
    targets Z_seq (shape: num_frames x 13), generate all sliding window samples of length window_size.
    Returns:
      X_windows: array of shape (num_samples, window_size, 26)
      Y_windows: array of shape (num_samples, 13) corresponding to Z of last frame in each window
    """
    X_windows = []
    Y_windows = []
    num_frames = X_seq.shape[0]
    if num_frames < window_size:
        # Not enough frames for even one window
        return np.array(X_windows), np.array(Y_windows)
    for start in range(0, num_frames - window_size + 1):
        end = start + window_size
        # Stack frames [start, ..., end-1] as one window
        X_w = X_seq[start:end]              # shape (window_size, 26)
        Y_w = Z_seq[end-1]                 # shape (13,) - Z of last frame
        X_windows.append(X_w)
        Y_windows.append(Y_w)
    # Convert to numpy arrays
    X_windows = np.array(X_windows)
    Y_windows = np.array(Y_windows)
    return X_windows, Y_windows

# Quick test on the first sequence (using a small window for demonstration)
test_w = 5
X_test_win, Y_test_win = create_windows_from_sequence(sequences_X[0], sequences_Z[0], window_size=test_w)
print(f"Created {X_test_win.shape[0]} window samples from one sequence (window={test_w}).")
print("Sample window input shape:", X_test_win[0].shape, "/ Sample window output shape:", Y_test_win[0].shape)
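
As an alternative to the explicit loop above, the same windows can be produced without a Python-level loop. The sketch below assumes NumPy 1.20 or newer (for `sliding_window_view`); `create_windows_vectorized` is a hypothetical helper name, not part of the original notebook, and the toy arrays stand in for a real sequence.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def create_windows_vectorized(X_seq, Z_seq, window_size):
    """Vectorized sliding windows (hypothetical helper, for illustration only)."""
    num_frames = X_seq.shape[0]
    if num_frames < window_size:
        # No full window fits: return empty arrays with consistent trailing dims
        return (np.empty((0, window_size, X_seq.shape[1])),
                np.empty((0, Z_seq.shape[1])))
    # sliding_window_view appends the window axis last:
    # shape (num_samples, num_features, window_size)
    views = sliding_window_view(X_seq, window_size, axis=0)
    # Reorder to (num_samples, window_size, num_features) and copy into a normal array
    X_windows = np.ascontiguousarray(views.transpose(0, 2, 1))
    # The target of each window is the Z row of its last frame
    Y_windows = Z_seq[window_size - 1:]
    return X_windows, Y_windows

# Toy sequence: 8 frames, 26 input features, 13 targets
X_demo = np.arange(8 * 26, dtype=float).reshape(8, 26)
Z_demo = np.arange(8 * 13, dtype=float).reshape(8, 13)
X_w, Y_w = create_windows_vectorized(X_demo, Z_demo, window_size=5)
print(X_w.shape, Y_w.shape)  # (4, 5, 26) (4, 13)
```

A sequence of N frames yields N - W + 1 windows, so the 8-frame toy sequence with W = 5 gives 4 samples.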


### Model Architecture Definition

We define a function to build a Keras model based on a given architecture type and set of hyperparameters. This function lets us create models dynamically for our grid search. The parameters include:

* arch: The architecture type ('dense', 'conv1d', 'lstm', 'hybrid', or 'cnn+lstm').

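* num_layers: Number of hidden layers (excluding the output layer). How this count is used depends on the architecture:
    * dense: num_layers Dense layers applied to the flattened window.
    * conv1d: num_layers Conv1D layers, followed by a Flatten layer.
    * lstm: num_layers stacked LSTM layers; all but the last return full sequences.
    * hybrid: num_layers // 2 Conv1D layers, followed by the remaining layers as Dense layers.
    * cnn+lstm: max(1, num_layers - 1) Conv1D layers, followed by a single LSTM layer.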

* units: Number of units/filters in each hidden layer. We use the same size for all layers.

* learning_rate: The optimizer’s learning rate.

All models have an input shape of (window_size, 26) and output 13 values. Dense and Conv1D hidden layers use ReLU activation with He uniform initialization, while LSTM layers keep the Keras default activations and initializers. The output layer uses linear activation for regression. The models are compiled with the Adam optimizer and Mean Squared Error loss, and we track Mean Absolute Error as a metric.
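
To make the effect of `num_layers` concrete, the following sketch tabulates how many hidden layers of each kind a configuration produces. It mirrors the branching in `create_model`; `layer_plan` is a hypothetical helper introduced only for illustration and builds no Keras model.

```python
def layer_plan(arch, num_layers):
    """Count hidden layers per type for an architecture (illustrative helper)."""
    plan = {"conv1d": 0, "lstm": 0, "dense": 0}
    if arch == "dense":
        plan["dense"] = num_layers
    elif arch == "conv1d":
        plan["conv1d"] = num_layers
    elif arch == "lstm":
        plan["lstm"] = num_layers
    elif arch == "hybrid":
        # Half (rounded down) are Conv1D, the rest are Dense
        plan["conv1d"] = num_layers // 2
        plan["dense"] = num_layers - plan["conv1d"]
    elif arch == "cnn+lstm":
        # At least one Conv1D layer, then exactly one LSTM layer
        plan["conv1d"] = max(1, num_layers - 1)
        plan["lstm"] = 1
    else:
        raise ValueError(f"Unknown architecture type: {arch}")
    return plan

for arch in ["dense", "conv1d", "lstm", "hybrid", "cnn+lstm"]:
    print(arch, layer_plan(arch, 5))
```

For example, `layer_plan("hybrid", 5)` gives 2 Conv1D and 3 Dense layers, and `layer_plan("cnn+lstm", 5)` gives 4 Conv1D layers plus 1 LSTM layer.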


In [None]:
def create_model(window_size, arch, num_layers, units, learning_rate):
    """Constructs and compiles a Keras model given the architecture and hyperparameters."""
    model = models.Sequential()
    # Define input shape (window_size timesteps, 26 features per timestep)
    model.add(layers.Input(shape=(window_size, 26)))
    
    if arch == 'dense':
        # Flatten time dimension and use Dense layers
        model.add(layers.Flatten())  # shape becomes (window_size*26,)
        for _ in range(num_layers):
            model.add(layers.Dense(units, activation='relu', kernel_initializer='he_uniform'))
        # Output layer
        model.add(layers.Dense(13, activation='linear'))
    
    elif arch == 'conv1d':
        # Conv1D layers across time dimension
        for _ in range(num_layers):
            model.add(layers.Conv1D(filters=units, kernel_size=3, padding='same',
                                     activation='relu', kernel_initializer='he_uniform'))
        model.add(layers.Flatten())
        model.add(layers.Dense(13, activation='linear'))
    
    elif arch == 'lstm':
        # Stacked LSTM layers
        for i in range(num_layers):
            # If not the last LSTM layer, return sequences to feed next LSTM
            return_seq = (i < num_layers - 1)
            model.add(layers.LSTM(units, return_sequences=return_seq))
        model.add(layers.Dense(13, activation='linear'))
    
    elif arch == 'hybrid':
        # Combination of Conv1D and Dense layers
        conv_count = num_layers // 2
        dense_count = num_layers - conv_count
        # Conv layers first
        for _ in range(conv_count):
            model.add(layers.Conv1D(filters=units, kernel_size=3, padding='same',
                                     activation='relu', kernel_initializer='he_uniform'))
        if conv_count > 0:
            model.add(layers.Flatten())
        # Dense layers after conv
        for _ in range(dense_count):
            model.add(layers.Dense(units, activation='relu', kernel_initializer='he_uniform'))
        model.add(layers.Dense(13, activation='linear'))
    
    elif arch == 'cnn+lstm':
        # Conv layers followed by a single LSTM layer
        conv_layers = max(1, num_layers - 1)  # ensure at least 1 conv
        for _ in range(conv_layers):
            model.add(layers.Conv1D(filters=units, kernel_size=3, padding='same',
                                     activation='relu', kernel_initializer='he_uniform'))
        # Follow with one LSTM layer
        model.add(layers.LSTM(units, return_sequences=False))
        model.add(layers.Dense(13, activation='linear'))
    
    else:
        raise ValueError(f"Unknown architecture type: {arch}")
    
    # Compile the model with Adam optimizer and MSE loss
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss='mse', metrics=['mae'])
    return model

# Test the model creation function for one example configuration
test_model = create_model(window_size=5, arch='conv1d', num_layers=3, units=64, learning_rate=0.001)
test_model.summary()  # summary() prints the table itself and returns None


### Training with Cross-Validation and Hyperparameter Search

With data loading, preprocessing, and model definition in place, we now set up a grid search over hyperparameters while performing 10-fold cross-validation for each combination. The hyperparameters we explore are:

* Window sizes: [3, 5, 7, 9, 11, 13, 15, 17, 20]

* Learning rates: [0.5, 0.01, 0.005, 0.001, 0.0005, 0.0001]

* Architectures: ['dense', 'conv1d', 'hybrid', 'lstm', 'cnn+lstm']

* Number of layers: [2, 3, 4, 5, 6, 8, 10, 12] (this refers to the count of hidden layers, as defined per architecture)

* Units per layer: [64, 128, 256, 512]
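
The size of this grid follows directly from the list lengths. As a quick check, the sketch below enumerates every combination with `itertools.product`, using the same value lists as above.

```python
from itertools import product

window_sizes = [3, 5, 7, 9, 11, 13, 15, 17, 20]
learning_rates = [0.5, 0.01, 0.005, 0.001, 0.0005, 0.0001]
architectures = ['dense', 'conv1d', 'hybrid', 'lstm', 'cnn+lstm']
num_layers_list = [2, 3, 4, 5, 6, 8, 10, 12]
units_list = [64, 128, 256, 512]

# Every (arch, window, layers, units, lr) combination in the grid
grid = list(product(architectures, window_sizes, num_layers_list,
                    units_list, learning_rates))
n_folds = 10

print(len(grid))            # 8640 configurations
print(len(grid) * n_folds)  # 86400 individual training runs
```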

We use a 10-fold cross-validation strategy where, for each fold:

1. We split the sequences into training and testing sets using the KFold splitter (ensuring randomness by shuffling the sequence indices).

2. We combine all frames from the training sequences to fit a StandardScaler, which is then used to transform both training and test features. This scaling is done after splitting to prevent data leakage.

3. We generate sliding window samples on the scaled features for training and testing.

4. We further split a portion (10%) of the training windows for validation to monitor early stopping.

5. A new model is created using `create_model` with the current hyperparameters.

6. The model is trained using EarlyStopping (monitoring 'val_loss' with a patience of 5 epochs and restoring the best weights) and ModelCheckpoint (saving the best weights to a file).

7. After training, the model is evaluated on the test data, and the Mean Squared Error (MSE) and Mean Absolute Error (MAE) are recorded.

8. We clear the model and session from memory before proceeding to the next fold.
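
Step 2 is the part that guards against leakage. A minimal sketch of the idea, using plain NumPy in place of `StandardScaler` and synthetic sequences (the data, the fold split, and all variable names here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the Kinect data: 5 sequences of different lengths,
# 26 features each, with a different offset per sequence
sequences = [rng.normal(loc=i, scale=1.0, size=(20 + i, 26)) for i in range(5)]

# One hypothetical fold: sequences 0-3 for training, sequence 4 held out
train_idx, test_idx = [0, 1, 2, 3], [4]

# Fit the scaling statistics on training frames only
train_frames = np.vstack([sequences[i] for i in train_idx])
mean = train_frames.mean(axis=0)
std = train_frames.std(axis=0)   # population std, like StandardScaler
std[std == 0] = 1.0              # avoid division by zero for constant features

# Apply the same training statistics to both splits
scaled_train = [(sequences[i] - mean) / std for i in train_idx]
scaled_test = [(sequences[i] - mean) / std for i in test_idx]

print(np.vstack(scaled_train).mean().round(6))  # approximately 0
print(np.vstack(scaled_test).mean().round(2))   # not 0: test data was never seen
```

The held-out sequence does not end up with zero mean, which is expected: its statistics played no part in fitting the scaler.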

After all folds are complete for a given hyperparameter combination, we calculate the average MSE and MAE, record the training time, and log all the results (including each fold's MSE, the averages, and hyperparameter values) into a CSV file. This grid search is computationally heavy but is manageable with GPU acceleration and early stopping to limit unnecessary training epochs.
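
The patience rule used in step 6 can be illustrated without any training. The sketch below is a simplified model of patience-based early stopping (zero minimum improvement, lower loss is better), not Keras's actual implementation; `early_stopping_epoch` and the loss values are made up for the example.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 0-indexed, for a list of val losses."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` consecutive epochs: stop here
                return epoch, best_epoch
    # Ran through all epochs without triggering the stop
    return len(val_losses) - 1, best_epoch

losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.74, 0.73, 0.9]
print(early_stopping_epoch(losses, patience=5))  # (7, 2)
```

Training stops at epoch 7, but the weights worth keeping are those from epoch 2, which is why the best weights are restored after stopping.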

In [None]:
# Define hyperparameter grid
window_sizes = [3, 5, 7, 9, 11, 13, 15, 17, 20]
learning_rates = [0.5, 0.01, 0.005, 0.001, 0.0005, 0.0001]
architectures = ['dense', 'conv1d', 'hybrid', 'lstm', 'cnn+lstm']
num_layers_list = [2, 3, 4, 5, 6, 8, 10, 12]
units_list = [64, 128, 256, 512]
epochs_per_fold = 50

# Prepare K-fold splitter (we will shuffle the sequence indices for randomness)
kf = KFold(n_splits=10, shuffle=True, random_state=42)

# CSV file to log results
results_file = "experiment_results.csv"
# Write CSV header
results_header = (["architecture", "window_size", "num_layers", "units", "learning_rate"] +
                  [f"fold{i+1}_mse" for i in range(10)] +
                  ["avg_mse", "avg_mae", "training_time_sec"])
with open(results_file, 'w') as f:
    f.write(",".join(results_header) + "\n")

# Begin grid search
import time
experiment_count = 0
total_experiments = (len(window_sizes) * len(learning_rates) * 
                     len(architectures) * len(num_layers_list) * len(units_list))
print(f"Total experiments to run: {total_experiments}")
for arch in architectures:
    for window_size in window_sizes:
        for num_layers in num_layers_list:
            for units in units_list:
                for lr in learning_rates:
                    experiment_count += 1
                    config_description = (f"arch={arch}, window={window_size}, layers={num_layers}, "
                                           f"units={units}, lr={lr}")
                    print(f"\n=== Experiment {experiment_count}/{total_experiments}: {config_description} ===")
                    start_time = time.time()
                    
                    fold_mse_scores = []
                    fold_mae_scores = []
                    
                    # Perform 10-fold cross-validation for this config.
                    # enumerate() keeps the fold number correct even when a fold is skipped.
                    for fold_index, (train_idx, test_idx) in enumerate(kf.split(sequences_X), start=1):
                        # Prepare training and testing data for this fold
                        # Combine training sequences' frames to fit scaler
                        train_frames = []
                        for seq_idx in train_idx:
                            train_frames.append(sequences_X[seq_idx])
                        train_frames = np.vstack(train_frames)
                        # Fit scaler on all training frames (for X features)
                        scaler = StandardScaler().fit(train_frames)
                        
                        # Generate windowed data for training
                        X_train_all = []
                        Y_train_all = []
                        for seq_idx in train_idx:
                            # Scale the entire sequence's features
                            X_seq = scaler.transform(sequences_X[seq_idx])
                            Z_seq = sequences_Z[seq_idx]  # target can remain unscaled
                            X_wins, Y_wins = create_windows_from_sequence(X_seq, Z_seq, window_size)
                            if X_wins.size == 0:
                                continue  # sequence too short for this window size
                            X_train_all.append(X_wins)
                            Y_train_all.append(Y_wins)
                        if len(X_train_all) == 0:
                            # If no training data (should not happen unless window_size > all seq lengths)
                            continue
                        # Concatenate all training windows from all sequences
                        X_train_all = np.vstack(X_train_all)
                        Y_train_all = np.vstack(Y_train_all)
                        
                        # Generate windowed data for testing (evaluation fold)
                        X_test_all = []
                        Y_test_all = []
                        for seq_idx in test_idx:
                            X_seq = scaler.transform(sequences_X[seq_idx])
                            Z_seq = sequences_Z[seq_idx]
                            X_wins, Y_wins = create_windows_from_sequence(X_seq, Z_seq, window_size)
                            if X_wins.size == 0:
                                continue  # If test sequence too short, it contributes no samples
                            X_test_all.append(X_wins)
                            Y_test_all.append(Y_wins)
                        if len(X_test_all) == 0:
                            # If no test data for this fold (all test sequences too short), skip fold
                            print(f"Fold {fold_index}: no test data (sequence too short for window={window_size}). Skipping.")
                            continue
                        X_test_all = np.vstack(X_test_all)
                        Y_test_all = np.vstack(Y_test_all)
                        
                        # Split off a validation set from training data for early stopping
                        X_train, X_val, Y_train, Y_val = train_test_split(
                            X_train_all, Y_train_all, test_size=0.1, random_state=42)
                        
                        # Build model for this configuration
                        model = create_model(window_size, arch, num_layers, units, lr)
                        
                        # Callbacks for early stopping and checkpointing.
                        # Keras 3 (the default in TensorFlow 2.16+) requires weights-only
                        # checkpoint paths to end in ".weights.h5".
                        checkpoint_path = 'best_model.weights.h5'
                        callbacks = [
                            EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
                            ModelCheckpoint(filepath=checkpoint_path, monitor='val_loss',
                                            save_best_only=True, save_weights_only=True)
                        ]
                        
                        # Train the model
                        model.fit(X_train, Y_train, epochs=epochs_per_fold, batch_size=32,
                                  validation_data=(X_val, Y_val), callbacks=callbacks, verbose=0)
                        
                        # Reload the best checkpoint as a safeguard; EarlyStopping with
                        # restore_best_weights=True normally restores these weights already.
                        try:
                            model.load_weights(checkpoint_path)
                        except Exception as e:
                            # Keep the current weights if the checkpoint cannot be loaded
                            print(f"Fold {fold_index}: could not load checkpoint ({e})")
                        
                        # Evaluate on the test set
                        loss, mae = model.evaluate(X_test_all, Y_test_all, verbose=0)
                        fold_mse_scores.append(loss)
                        fold_mae_scores.append(mae)
                        print(f"Fold {fold_index} MSE: {loss:.6f}, MAE: {mae:.6f}")
                        
                        # Clean up model to free memory before next fold
                        tf.keras.backend.clear_session()
                        del model
                        fold_index += 1
                    
                    # Compute average metrics across folds
                    avg_mse = float(np.mean(fold_mse_scores)) if fold_mse_scores else float('nan')
                    avg_mae = float(np.mean(fold_mae_scores)) if fold_mae_scores else float('nan')
                    elapsed = time.time() - start_time
                    print(f"Avg MSE: {avg_mse:.6f}, Avg MAE: {avg_mae:.6f}, Training time: {elapsed:.2f} sec")
                    
                    # Log results to CSV
                    result_data = [arch, window_size, num_layers, units, lr]
                    # Add each fold's MSE
                    for i in range(10):
                        result_data.append(fold_mse_scores[i] if i < len(fold_mse_scores) else "")
                    result_data += [avg_mse, avg_mae, elapsed]
                    result_line = ",".join(map(str, result_data))
                    with open(results_file, 'a') as f:
                        f.write(result_line + "\n")


### Running the Experiment

The grid search covers 8,640 configurations (9 window sizes × 6 learning rates × 5 architectures × 8 layer counts × 4 unit sizes), each evaluated with 10-fold cross-validation. For every hyperparameter combination, training samples are generated, the model is trained with early stopping and checkpointing, and evaluation metrics (MSE and MAE) are recorded for each fold. All configuration details, fold metrics, average errors, and training times are appended to a CSV file after each configuration, so completed results are preserved even if the process is interrupted.
Best model results:



## ✅ Best Model Summary

```
_________________________________________________________________
 Layer (type)                Output Shape              Param #     
=================================================================
 flatten_9 (Flatten)         (None, 260)               0           
 dense_63 (Dense)            (None, 128)               33408       
 dense_64 (Dense)            (None, 128)               16512       
 dense_65 (Dense)            (None, 128)               16512       
 dense_66 (Dense)            (None, 128)               16512       
 dense_67 (Dense)            (None, 128)               16512       
 dense_68 (Dense)            (None, 128)               16512       
 dense_69 (Dense)            (None, 13)                1677        
=================================================================
```

---

### 📊 **Average Metrics Across 10-Fold Validation**
- **RSS**: `96.631887`  
- **MAE**: `0.037999`  
- **MSE**: `0.003328`  
- **Cross-Entropy**: `3.450090`  
- **KL Divergence**: `2.313532`
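
The parameter counts in the best-model summary can be checked by hand: a Dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below reproduces the "Param #" column from the layer sizes shown in the summary; `dense_params` is a helper defined only for this check.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a Dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out

# Layer widths read off the summary: flattened input, six hidden layers, output
widths = [260, 128, 128, 128, 128, 128, 128, 13]

counts = [dense_params(a, b) for a, b in zip(widths[:-1], widths[1:])]
print(counts)       # [33408, 16512, 16512, 16512, 16512, 16512, 1677]
print(sum(counts))  # 117645 trainable parameters in total
```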