# Sprint 10 Report

---

# 3D Skeleton Visualization System

## Overview

This section details the implementation of a 3D skeleton visualization system for comparing ground truth and predicted skeletal poses. 

## System Architecture

The system is composed of several React components working together:

1. **CSVPrediction** - The main container component that handles file uploading, data processing, and state management
2. **PoseNet3DVisualization** - Orchestrates the visualization components and manages display modes
3. **SkeletonRenderer** - Core Three.js component that renders 3D skeletons with rotation capabilities
4. **SkeletonContext & Provider** - Context API implementation for sharing state across components
5. **SkeletonControls** - UI controls for animation playback and view options
6. **AnimationManager** - Handles animation timing and frame synchronization

## Detailed Component Breakdown

### 1. CSVPrediction Component

This is the main entry point component that handles:

- **File Management**: CSV uploads via drag-and-drop or file selection
- **Backend Integration**: Loading sample ground truth data from the server
- **Data Processing**: Parsing CSV files with automatic delimiter detection
- **State Management**: Tracking data state, UI state, and visualization modes
- **UI Rendering**: Tab-based interface for data and visualization views

#### Key Features:

- Adaptive delimiter detection (comma or tab-separated values)
- Progress indication during file processing
- Data preview with Z-value highlighting
- Tab-based interface switching between data view and 3D visualization
- Toggle for switching between side-by-side and overlapping visualization modes

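```typescript
// Example of CSV data processing function
const processCSV = async (file: File) => {
  // Read file content; tolerate Windows line endings and blank lines
  const text = await file.text();
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length < 2) return; // need a header plus at least one frame

  // Detect delimiter (comma or tab)
  const firstLine = lines[0];
  const delimiter = firstLine.includes("\t") ? "\t" : ",";

  // Parse headers and rows (simplified sketch of the parsing step)
  const headers = firstLine.split(delimiter).map((h) => h.trim());
  const objectRows = lines.slice(1).map((line) => {
    const cells = line.split(delimiter);
    const row: Record<string, number | string> = {};
    headers.forEach((header, i) => {
      const raw = (cells[i] ?? "").trim();
      const value = Number(raw);
      // Keep numeric cells as numbers, everything else as text
      row[header] = raw !== "" && !Number.isNaN(value) ? value : raw;
    });
    return row;
  });

  // Set as predicted data
  setPredictedData(objectRows);
};
```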

### 2. PoseNet3DVisualization Component

This component orchestrates the visualization experience:

- Wraps child components in the SkeletonProvider context
- Determines visualization mode (side-by-side or overlapping)
- Manages synchronized rendering of multiple SkeletonRenderer instances
- Provides feedback for invalid skeletal data

```javascript
// Rendering logic for different modes
{showSideBySide ? (
  // Side-by-side view
  <div className="grid grid-cols-2 gap-4">
    <SkeletonRenderer
      poseData={poseData}
      isGroundTruth={true}
      label={groundTruthLabel}
    />
    <SkeletonRenderer
      poseData={predictedData}
      isGroundTruth={false}
      label={predictedLabel}
    />
  </div>
) : (
  // Overlapping view
  <SkeletonRenderer
    poseData={poseData}
    isGroundTruth={true}
    label={`${groundTruthLabel} vs ${predictedLabel}`}
    comparisonPoseData={predictedData}
  />
)}
```

### 3. SkeletonRenderer Component

This is the core visualization component built with Three.js:

- **3D Scene Setup**: Canvas creation, camera positioning, lighting
- **Skeleton Rendering**: Converting joint data into 3D geometry
- **Animation Support**: Handling frame updates with position preservation
- **Interaction**: Camera controls for panning, rotation, and zooming
- **Visual Feedback**: Color coding and labels for different data types

The component follows this initialization sequence:
1. Create Three.js scene, camera, renderer, and lights
2. Set up OrbitControls for user interaction
3. Add visual elements like labels and legends
4. Create the animation loop
5. Set up event listeners and cleanups

```javascript
// Core part of the rendering logic
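const updateSkeleton = (frameIndex) => {
  // Assumed: validFrameIndex is the requested index clamped to the available frames
  const validFrameIndex = Math.min(Math.max(frameIndex, 0), poseData.length - 1);

  // Get frame data and create skeleton group
  const frameData = poseData[validFrameIndex];
  const skeletonGroup = new THREE.Group();

  // Extract joints from frame data, e.g. "nose_x" -> joints.nose.x
  const joints = {};
  for (const key in frameData) {
    if (key.endsWith("_x") || key.endsWith("_y") || key.endsWith("_z")) {
      const jointName = key.slice(0, -2);
      const axis = key.slice(-1);
      joints[jointName] = joints[jointName] || { x: 0, y: 0, z: 0 }; // z is optional
      joints[jointName][axis] = Number(frameData[key]);
    }
  }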
  
  // Create joint spheres
  Object.keys(joints).forEach((jointName) => {
    // Create sphere for each joint...
    skeletonGroup.add(sphere);
  });
  
  // Create bone connections
  POSE_CONNECTIONS.forEach((connection) => {
    // Create cylinder between connected joints...
    skeletonGroup.add(bone);
  });
  
  // Add skeleton to scene
  sceneRef.current.add(skeletonGroup);
}
```

#### Key Technical Features:

1. **Rotation Management**:
   - Manual rotation via OrbitControls
   - Auto-rotation toggled by the user
   - Synchronized rotation between primary and comparison skeletons
   - Rotation persistence across frame changes

2. **Skeleton Visualization**:
   - Color-coded joints and bones (blue for ground truth, red for prediction)
   - Thicker bones and joints for better visibility
   - Synchronized positioning between comparison skeletons
   - Proper scaling and Y-flipping based on data characteristics

3. **Visual Aids**:
   - In-scene legend indicating ground truth vs prediction
   - Dynamic labeling based on view type
   - Label coloring to match the corresponding skeleton color

### 4. SkeletonContext and Provider

Implements React Context API for managing shared state:

```typescript
import { createContext } from "react";

const SkeletonContext = createContext({
  currentFrame: 0,
  setCurrentFrame: (frame: number) => {},
  isPlaying: false,
  setIsPlaying: (isPlaying: boolean) => {},
  playbackSpeed: 1,
  setPlaybackSpeed: (speed: number) => {},
  autoRotate: false,
  setAutoRotate: (autoRotate: boolean) => {},
  hasSkeletonData: false,
  setHasSkeletonData: (hasData: boolean) => {},
});
```

Key shared state includes:
- Current animation frame
- Playback state (playing/paused)
- Playback speed
- Auto-rotation toggle
- Skeleton data availability flag

### 5. SkeletonControls Component

Provides user interface for controlling the visualization:

- **Playback Controls**: Play, pause, and reset buttons
- **Frame Navigation**: Seek bar and frame counter
- **Speed Controls**: Playback speed adjustment
- **View Options**: Auto-rotation toggle

```javascript
<div className="flex items-center justify-between">
  <div className="flex items-center space-x-2">
    <Button onClick={togglePlayback} variant="outline" size="sm">
      {isPlaying ? <Pause className="h-4 w-4" /> : <Play className="h-4 w-4" />}
    </Button>
    <Button onClick={resetAnimation} variant="outline" size="sm">
      <RotateCcw className="h-4 w-4" />
    </Button>
  </div>
  
  <Slider
    value={[currentFrame]}
    max={totalFrames - 1}
    step={1}
    onValueChange={handleSliderChange}
  />
  
  <div className="flex items-center space-x-2">
    <Label htmlFor="auto-rotate" className="text-sm">
      Auto Rotate
    </Label>
    <Switch
      id="auto-rotate"
      checked={autoRotate}
      onCheckedChange={setAutoRotate}
    />
  </div>
</div>
```

### 6. AnimationManager Component

Handles animation timing and synchronization:

- **requestAnimationFrame** loop management
- Frame rate control and timing
- Synchronization between multiple skeleton views
- Animation state persistence

```javascript
useEffect(() => {
  if (!isPlaying) return;
  
  let lastFrameTime = 0;
  const frameInterval = 1000 / (fps * playbackSpeed);
  
  const animate = (timestamp) => {
    if (timestamp - lastFrameTime >= frameInterval) {
      setCurrentFrame((prev) => {
        if (prev >= totalFrames - 1) {
          return 0; // Loop back to start
        }
        return prev + 1;
      });
      lastFrameTime = timestamp;
    }
    
    animationRef.current = requestAnimationFrame(animate);
  };
  
  animationRef.current = requestAnimationFrame(animate);
  
  return () => {
    if (animationRef.current) {
      cancelAnimationFrame(animationRef.current);
    }
  };
}, [isPlaying, playbackSpeed, totalFrames, fps]); // fps feeds frameInterval, so it belongs in the deps
```
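The timing gate in this effect can be checked outside the browser. The following Python sketch (an illustration only, not part of the React code) replays the same rule against a list of simulated `requestAnimationFrame` timestamps. The helper names `frame_interval_ms` and `advance_frames` and the sample timestamps are assumptions made for the example.

```python
def frame_interval_ms(fps: float, playback_speed: float) -> float:
    """Milliseconds that must elapse before the next frame is shown."""
    return 1000.0 / (fps * playback_speed)


def advance_frames(timestamps_ms, fps, playback_speed, total_frames, start=0):
    """Replay the gate: step one frame whenever enough time has passed."""
    interval = frame_interval_ms(fps, playback_speed)
    last_frame_time = 0.0
    frame = start
    for ts in timestamps_ms:
        if ts - last_frame_time >= interval:
            # Wrap to frame 0 after the last frame, as the effect does
            frame = 0 if frame >= total_frames - 1 else frame + 1
            last_frame_time = ts
    return frame


# At 30 fps and 1x speed a frame lasts about 33.3 ms; at 2x speed about 16.7 ms.
print(round(frame_interval_ms(30, 1), 1), round(frame_interval_ms(30, 2), 1))
# Three of these five callbacks pass the gate, so a 3-frame clip wraps back to 0.
print(advance_frames([100, 110, 140, 150, 180], 30, 1, total_frames=3))
```

Because the frame only advances inside a callback, the effective playback rate is bounded by how often the browser invokes that callback.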

## Technical Implementation Details

### CSV Data Processing

The system processes CSV data with the following approach:

1. **Reading & Parsing**: 
   - Detects and adapts to different delimiters (comma or tab)
   - Handles header detection and column mapping
   - Converts numeric values automatically

2. **Data Structure**:
   - Each frame becomes an object with properties for each joint coordinate
   - Properties follow the naming convention: `[joint_name]_[x|y|z]`
   - Z-values are optional and handled appropriately when present

3. **Validation**:
   - Checks for required coordinates and proper formatting
   - Validates joint structure consistency across frames
   - Provides feedback for missing or invalid data
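
The data structure described above is independent of the UI language. As a minimal sketch in Python (for illustration; the component itself is written for React), the functions below turn delimited text into one dictionary per frame and then group the `[joint_name]_[x|y|z]` properties by joint. The names `parse_pose_csv` and `joints_of` are hypothetical, and the validation rule (every joint needs x and y, z defaults to 0.0) is an assumption based on the list above.

```python
import csv
import io


def parse_pose_csv(text: str) -> list[dict]:
    """Parse comma- or tab-delimited pose data into one dict per frame."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("need a header row and at least one frame")
    delimiter = "\t" if "\t" in lines[0] else ","
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter=delimiter)
    frames = []
    for row in reader:
        frame = {}
        for key, raw in row.items():
            if key is None:  # cells beyond the header width are ignored
                continue
            value = (raw or "").strip()
            try:
                frame[key.strip()] = float(value)
            except ValueError:
                frame[key.strip()] = value  # keep non-numeric cells as text
        frames.append(frame)
    return frames


def joints_of(frame: dict) -> dict:
    """Group `<joint>_x/_y/_z` properties by joint; z defaults to 0.0."""
    joints = {}
    for key, value in frame.items():
        name, _, axis = key.rpartition("_")
        if name and axis in ("x", "y", "z") and isinstance(value, float):
            joints.setdefault(name, {"x": None, "y": None, "z": 0.0})[axis] = value
    missing = [n for n, j in joints.items() if j["x"] is None or j["y"] is None]
    if missing:
        raise ValueError(f"joints missing x or y: {missing}")
    return joints


frames = parse_pose_csv("FrameNo,nose_x,nose_y\n0,0.5,0.25\n1,0.6,0.35\n")
print(joints_of(frames[0]))
```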

### Three.js Implementation

The 3D rendering utilizes Three.js with these key techniques:

1. **Scene Setup**:
   - Dark-themed background with appropriate lighting
   - Perspective camera with controlled viewing parameters
   - OrbitControls for intuitive user interaction

2. **Skeleton Construction**:
   - Joints represented as spheres with appropriate colors
   - Bones implemented as oriented cylinders connecting joints
   - Dynamic scaling based on data characteristics
   - Y-flipping when coordinate systems differ

3. **Animation & Interaction**:
   - Smooth transitions between animation frames
   - Position and rotation preservation during updates
   - Synchronized rotation between compared skeletons
   - Responsive resizing with window dimensions

### UI/UX Design

The user interface is designed for clarity and ease of use:

1. **Tab-Based Navigation**:
   - Data view for uploading and previewing CSV files
   - 3D view for visualization and interaction
   - Seamless transition between views

2. **Data Upload Experience**:
   - Drag-and-drop support with visual feedback
   - File selection alternative via dialog
   - Sample data loading option for quick testing

3. **Visualization Controls**:
   - Toggle between side-by-side and overlapping views
   - Playback controls with speed adjustment
   - Frame-by-frame navigation via slider
   - Auto-rotation for better spatial perception

4. **Visual Feedback**:
   - Color-coded skeletons (blue for ground truth, red for prediction)
   - In-scene legend explaining the color scheme
   - Labels for clear identification
   - Error alerts for invalid data

## Example Usage Flow

1. The user loads the application and sees the data tab
2. The user uploads a CSV file or loads sample data
3. The system processes the file and displays a preview
4. The user switches to the 3D view tab to see the visualization
5. The user can toggle between side-by-side and overlapping views
6. Animation controls let the user play through the frames
7. Auto-rotation can be enabled for better spatial understanding
8. Frame-by-frame examination is possible via the slider


## Conclusion

The 3D Skeleton Visualization System provides a way to compare ground truth and predicted skeletal poses, either side by side or overlaid in a single scene. It uses Three.js for rendering and React for UI and state management.

---

# PoseNet to Kinect

For the deep learning part, this sprint focused on training a neural network to convert 2D PoseNet coordinates into 2D Kinect coordinates, enabling us to feed its output into last week’s 2D-to-3D Kinect transformer and thus fully translate PoseNet data into Kinect format.

We loaded and aligned the x/y coordinates from both sources, trained and tuned a neural network via grid search, evaluated it on held-out data, and saved the final model.

## Loading the data

The data consists of CSV files representing video sequences, with each row corresponding to a single frame. We have two directories:

* `output_poses` for PoseNet data

* `kinect_good_preprocessed` for Kinect data

The loading code iterates over each PoseNet CSV, locates its corresponding Kinect file (`<name>_kinect.csv`), and skips any pair whose Kinect file is missing. It loads both files into pandas DataFrames, intersects their `FrameNo` values to find shared frames, filters and sorts both DataFrames to those frames, and extracts the x- and y-coordinate columns. Pairs with no overlapping frames, or whose Kinect file lacks any of the PoseNet x/y columns, are skipped as well.

This process repeats for every valid file pair. At the end, the per-file results are stacked vertically into two large NumPy arrays, one holding PoseNet data (features) and the other holding Kinect data (targets), with their rows aligned frame by frame.

```python
from pathlib import Path
import numpy as np
import pandas as pd
import tensorflow as tf
import keras
from sklearn.model_selection import train_test_split, GridSearchCV, ParameterGrid
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import make_scorer, mean_squared_error
from joblib import dump
import os
import warnings

warnings.filterwarnings("ignore", category=UserWarning)
tf.random.set_seed(42)
np.random.seed(42)


DIR_POSE   = Path("ML/data/output_poses")
DIR_KINECT = Path("ML/data/kinect_good_preprocessed")

def xy_columns(df: pd.DataFrame) -> list[str]:
    """Return columns that end with '_x' or '_y' (order preserved)."""
    return [c for c in df.columns if c.endswith(("_x", "_y"))]

X_chunks, y_chunks = [], []

for pose_path in sorted(DIR_POSE.glob("*.csv")):
    key      = pose_path.stem
    kin_path = DIR_KINECT / f"{key}_kinect.csv"
    if not kin_path.exists():
        print(f"⚠️  {key}: missing sister file – skipped.")
        continue

    # ---------------- read & strip column whitespace ----------------
    df_pose = pd.read_csv(pose_path)
    df_pose.columns = df_pose.columns.str.strip()

    df_kin = pd.read_csv(kin_path)
    df_kin.columns = df_kin.columns.str.strip()

    # ---------------- align on FrameNo ----------------
    shared_frames = np.intersect1d(df_pose["FrameNo"], df_kin["FrameNo"])
    if shared_frames.size == 0:
        print(f"⚠️  {key}: no overlapping frames – skipped.")
        continue

    df_pose = df_pose[df_pose["FrameNo"].isin(shared_frames)].sort_values("FrameNo")
    df_kin  = df_kin[df_kin["FrameNo"].isin(shared_frames)].sort_values("FrameNo")

    if not np.array_equal(df_pose["FrameNo"].values, df_kin["FrameNo"].values):
        print(f"⚠️  {key}: frame mismatch after alignment – skipped.")
        continue

    # ---------------- collect xy columns ----------------
    pose_xy_cols = xy_columns(df_pose)

    missing = [c for c in pose_xy_cols if c not in df_kin.columns]
    if missing:
        print(f"⚠️  {key}: Kinect file missing {len(missing)} XY columns – skipped.")
        continue

    X_chunks.append(df_pose[pose_xy_cols].to_numpy(dtype=float))
    y_chunks.append(df_kin[pose_xy_cols].to_numpy(dtype=float))

    print(f"✅  {key}: kept {len(df_pose)} frames.")

# ---------------- stack everything ----------------
if not X_chunks:
    raise RuntimeError("No valid file pairs were found – nothing to train on.")

features = np.vstack(X_chunks)
targets  = np.vstack(y_chunks)

print("\n🎯  Finished:")
print("    features :", features.shape)
print("    targets  :", targets.shape)
```

```python
#!/usr/bin/env python3
"""
train_xy_to_xy_grid.py
----------------------
PoseNet XY  ➜  Kinect  XY
Adds:
    • tqdm progress‑bar for GridSearchCV
    • prints best parameters neatly
    • saves model as .keras (Keras v3 format)
"""
```

## Deep Learning Steps

### 1. Load the generated arrays

We copy the PoseNet data (features) and Kinect data (targets) into `X` and `y`:

```python
X = features.copy()
y = targets.copy()
```

### 2. Configuration

We set up our training options and define the hyperparameter search space:

* `USE_SCALER`: whether to apply scaling to the data

* `PATIENCE`: number of epochs with no improvement before early stopping

* `MAX_EPOCHS`: absolute cap on training epochs

We then define a hyperparameter grid for grid-searching over:

* `units`: number of neurons per hidden layer

* `n_hidden`: number of hidden layers

* `batch_size`: training batch size

* `learning_rate`: optimizer learning rate

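```python
USE_SCALER = True           # flip to False to disable StandardScaler
PATIENCE   = 10             # EarlyStopping patience
MAX_EPOCHS = 200            # hard cap; EarlyStopping usually ends sooner

# The model__ prefix routes a value to build_model() through scikeras;
# batch_size is a parameter of the KerasRegressor wrapper itself.
param_grid = {
    "model__units":         [32, 64, 128],
    "model__n_hidden":      [2, 3, 4],
    "batch_size":           [128, 256, 512],
    "model__learning_rate": [0.001, 0.0005],
}
```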
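
To get a feel for the cost of this search, the grid can be expanded by hand. The sketch below uses only the standard library and counts the configurations and the resulting model fits, assuming 3-fold cross-validation as used for the search in this report. The helper `expand_grid` is hypothetical, and the dictionary keys simply mirror the hyperparameter names listed above; only the value lists matter for the count.

```python
from itertools import product

param_grid = {
    "units": [32, 64, 128],
    "n_hidden": [2, 3, 4],
    "batch_size": [128, 256, 512],
    "learning_rate": [0.001, 0.0005],
}
CV_FOLDS = 3  # assumption: matches the 3-fold CV used for the search


def expand_grid(grid: dict) -> list[dict]:
    """Return every combination of the grid values as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


configs = expand_grid(param_grid)
n_fits = len(configs) * CV_FOLDS

print("configurations:", len(configs))  # 3 * 3 * 3 * 2 = 54
print("cross-validation fits:", n_fits)  # 54 * 3 = 162
```

Each of those fits can run for up to `MAX_EPOCHS` epochs, which is why early stopping matters for the total runtime.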

### 3. Data Scaling and Split

We optionally scale the data if `USE_SCALER` is enabled. Otherwise, we leave it unscaled. After that, we split the data into training and test sets, using 10% for testing.

```python
if USE_SCALER:
    X_scaler = StandardScaler().fit(X)
    y_scaler = StandardScaler().fit(y)
    X = X_scaler.transform(X)
    y = y_scaler.transform(y)
else:
    X_scaler = y_scaler = None

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42
)
```
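
Standardization rescales each column to zero mean and unit variance, and the stored mean and standard deviation are what make the later inverse transform back to original units possible. The sketch below reproduces that arithmetic for a single column in plain Python. It also shows a common variant in which the statistics are computed from the training rows only, so the test rows do not influence them; that variant is an assumption for illustration and differs from the cell above, which fits the scalers before splitting. The helpers `fit_standardizer`, `transform`, and `inverse_transform` are hypothetical names.

```python
from statistics import fmean, pstdev


def fit_standardizer(column):
    """Return (mean, std) using the population std, as StandardScaler does."""
    mean = fmean(column)
    std = pstdev(column)
    if std == 0:
        std = 1.0  # constant column: leave values centred but unscaled
    return mean, std


def transform(column, mean, std):
    return [(v - mean) / std for v in column]


def inverse_transform(column, mean, std):
    return [v * std + mean for v in column]


train = [1.0, 2.0, 3.0, 4.0]  # toy values, not project data
test = [5.0]

mean, std = fit_standardizer(train)          # statistics from training rows only
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)     # reuse the training statistics
restored = inverse_transform(test_scaled, mean, std)

print(mean, round(std, 4))
print([round(v, 4) for v in test_scaled], restored)
```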

### 4. Model Builder and GridSearch Setup

We define a `build_model` function that constructs a Keras `Sequential` network compiled with the Adam optimizer and MSE loss. The network takes 26 inputs (the PoseNet x- and y-coordinates), passes them through `n_hidden` dense layers of `units` neurons with ReLU activation, and ends in a 26-unit linear output layer (the Kinect x- and y-coordinates). We wrap the builder in a scikeras `KerasRegressor`, create an `EarlyStopping` callback that monitors validation loss with our `PATIENCE` and restores the best weights, and define a negative-MSE scorer, since scikit-learn treats higher scores as better. Finally, we initialize `GridSearchCV` to explore the parameter grid with 3-fold cross-validation on all CPU cores; with `refit=True`, it retrains the best configuration on the full training set at the end.

```python
from scikeras.wrappers import KerasRegressor


def build_model(units=128, n_hidden=2, learning_rate=0.001):
    model = keras.Sequential([keras.layers.Input(shape=(26,))])
    for _ in range(n_hidden):
        model.add(keras.layers.Dense(units, activation='relu'))
    model.add(keras.layers.Dense(26))         # linear output
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss='mse',
        metrics=['mae']
    )
    return model

reg = KerasRegressor(model=build_model, epochs=MAX_EPOCHS, verbose=0)

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=PATIENCE, restore_best_weights=True
)

neg_mse = make_scorer(mean_squared_error, greater_is_better=False)

grid = GridSearchCV(
    estimator=reg,
    param_grid=param_grid,
    scoring=neg_mse,
    cv=3,
    n_jobs=-1,
    refit=True,
    verbose=2,                 # we’ll drive output via tqdm instead
)
```
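
The patience rule behind early stopping is easy to trace by hand. The following is a simplified sketch of that rule in plain Python, not the Keras implementation: it takes a list of per-epoch validation losses and reports where training would stop and which epoch held the best loss (the one whose weights would be restored). The function name `early_stopping_epoch` and the loss values are made up for the example.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once `patience` consecutive epochs pass without a new
    minimum. If that never happens, the last epoch is the stop epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


losses = [1.0, 0.8, 0.9, 0.85, 0.95]  # illustrative values only
print(early_stopping_epoch(losses, patience=2))   # stops at epoch 3, best was epoch 1
print(early_stopping_epoch(losses, patience=10))  # never triggers within 5 epochs
```

With a patience of 10 and a cap of 200 epochs, a run that stops improving early ends long before the cap.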

### 5. Run Grid-search

We kick off the hyperparameter search by printing how many total configurations we’ll try across our 3-fold CV. Then we call `grid.fit()`, passing in our training data along with a 10% internal validation split and the `early_stop` callback. Thanks to our `tqdm` integration, you’ll see a live progress bar as the grid search runs.

```python
print(f"⏳  Running GridSearchCV with {len(ParameterGrid(param_grid))} configs × {grid.cv}‑fold CV\n")
grid_result = grid.fit(
    X_train, y_train,
    validation_split=0.1,
    callbacks=[early_stop],
)
```

### 6. Report Best Hyperparameters

We print the best hyperparameter combination found by the grid search along with its corresponding cross-validation MSE, then extract the underlying Keras model for further evaluation and saving.

```python
print("\n🏆  Best hyper‑parameters:")
for k, v in grid_result.best_params_.items():
    print(f"   • {k:12s}: {v}")
print("Best CV MSE :", -grid_result.best_score_)

best_model = grid_result.best_estimator_.model_   # Keras model object
```
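
The minus sign in front of `best_score_` follows from how the scorer is defined: the search always keeps the configuration with the highest score, so an error metric is negated before comparison and negated again for reporting. A minimal sketch of that selection, with made-up configuration names and MSE values and a hypothetical helper `pick_best`:

```python
def pick_best(cv_mse_by_config: dict) -> tuple:
    """Select the config with the highest score, where score = -MSE."""
    scores = {name: -mse for name, mse in cv_mse_by_config.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up mean CV errors for three hypothetical configurations
cv_mse = {"config_a": 0.09, "config_b": 0.07, "config_c": 0.12}

best_name, best_score = pick_best(cv_mse)
best_mse = -best_score  # flip the sign back to report an ordinary MSE

print(best_name, best_score, best_mse)
```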

### 7. Evaluate on Test Set

We evaluate the best model on the held-out test data and report MSE and MAE twice: once on the scaled values, and once after inverse-transforming predictions and targets back to original units.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

test_mse, test_mae = best_model.evaluate(X_test, y_test, verbose=0)
y_pred_scaled = best_model.predict(X_test)

# Inverse-transform and recompute in original units.
# Without a scaler, the data is already in original units.
if y_scaler is not None:
    y_test_orig = y_scaler.inverse_transform(y_test)
    y_pred_orig = y_scaler.inverse_transform(y_pred_scaled)
else:
    y_test_orig, y_pred_orig = y_test, y_pred_scaled

mse_orig = mean_squared_error(y_test_orig, y_pred_orig)
mae_orig = mean_absolute_error(y_test_orig, y_pred_orig)

print("\n📊 Scaled Test MSE :", test_mse)
print("📊 Scaled Test MAE :", test_mae)

print("\n📊 Original-scale Test MSE :", mse_orig)
print("📊 Original-scale Test MAE :", mae_orig)
```

### 8. Save Artifacts

We persist our results by saving the best Keras model in .keras format and dumping the best hyperparameters to a pickle file. If we applied scaling, we also save the StandardScaler instances so that any new data can be transformed in exactly the same way. If no scaler was used, we create a simple flag file to note that. Finally, we print confirmation that everything has been saved.

```python
best_model.save("xy_to_xy_best.keras")     # v3 format
dump(grid_result.best_params_, "best_params.pkl")

if X_scaler is not None:
    dump(X_scaler, "X_scaler.pkl")
    dump(y_scaler, "y_scaler.pkl")
else:
    open("NO_SCALER_USED.txt", "w").close()

print("\n💾  Model saved to xy_to_xy_best.keras")
```

## Results

```text
🏆  Best hyper‑parameters:
   • batch_size  : 128
   • model__learning_rate: 0.001
   • model__n_hidden: 10
   • model__units: 128
Best CV MSE : 0.06771587692012478 std 0.005
75/75 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step

📊 Scaled Test MSE : 0.06131890416145325
📊 Scaled Test MAE : 0.14242953062057495

📊 Original-scale Test MSE : 0.0005780895675723184 meter²
📊 Original-scale Test MAE : 0.013890674046940655 meter
```
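
Since MSE is expressed in squared units, its square root (RMSE) is easier to compare against the MAE. The short calculation below converts the two original-scale figures reported above into centimetres, assuming the coordinates are in metres as stated in the results. It is plain arithmetic on the reported numbers, not a new measurement.

```python
import math

mse_m2 = 0.0005780895675723184  # original-scale test MSE reported above
mae_m = 0.013890674046940655    # original-scale test MAE reported above

rmse_m = math.sqrt(mse_m2)

print(f"RMSE : {rmse_m * 100:.2f} cm")  # about 2.40 cm
print(f"MAE  : {mae_m * 100:.2f} cm")   # about 1.39 cm
```

Both values are averaged over all test frames and all 26 output coordinates, so they describe a typical per-coordinate error rather than a per-joint distance.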