# Pose Preprocessing with MediaPipe Tasks API (v2 - Enhanced Features)

This notebook extracts pose landmarks from exercise videos using the **MediaPipe Tasks API**.

## Version 2 Enhancements
- **Raw landmark storage**: Saves normalized (33, 3) landmarks per frame for future feature experiments
- **Enhanced feature set**: 13 angles + 6 distance-based features = 19 total features
- **Distance features for problem exercises**:
  - **Shrugs**: Ear-to-shoulder vertical distance (captures shoulder elevation)
  - **Curl variants**: Wrist-shoulder distance, elbow-hip distance (differentiates arm positions)

In [7]:
import sys
import os
import logging
import numpy as np
from pathlib import Path

# Setup paths
project_root = Path.cwd().parent.parent

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

print(f"Project root: {project_root}")

Project root: /mnt/d/Graduation Project/ai-virtual-coach


## Download MediaPipe Model (First Time Only)

The new MediaPipe Tasks API requires a model file. We're using the **lite** model for faster inference.

In [2]:
import urllib.request
from pathlib import Path

# Model download URL - Using LITE model for faster inference
MODEL_URL = 'https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/latest/pose_landmarker_lite.task'
MODEL_PATH = project_root / 'datasets' / 'pose_landmarker_lite.task'

# Download if not exists
if not MODEL_PATH.exists():
    print(f"Downloading MediaPipe pose model (LITE) to {MODEL_PATH}...")
    MODEL_PATH.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(MODEL_URL, MODEL_PATH)
    print(f"‚úÖ Model downloaded successfully ({MODEL_PATH.stat().st_size / 1024 / 1024:.1f} MB)")
else:
    print(f"‚úÖ Model already exists at {MODEL_PATH} ({MODEL_PATH.stat().st_size / 1024 / 1024:.1f} MB)")

# NOW import the preprocessing module (after model is available)
print("\nImporting preprocessing module...")
sys.path.insert(0, str(project_root / 'src'))
from preprocessing.preprocess_pose_RGB import (
    extract_pose_estimates,
    extract_raw_pose_landmarks,
    ANGLE_NAMES,
    DISTANCE_NAMES,
    ALL_FEATURE_NAMES
)
print("‚úÖ Module imported successfully!")
print(f"\nFeature sets available:")
print(f"  - Angles ({len(ANGLE_NAMES)}): {ANGLE_NAMES}")
print(f"  - Distances ({len(DISTANCE_NAMES)}): {DISTANCE_NAMES}")
print(f"  - All features ({len(ALL_FEATURE_NAMES)}): {len(ALL_FEATURE_NAMES)} total")

‚úÖ Model already exists at /mnt/d/Graduation Project/ai-virtual-coach/datasets/pose_landmarker_lite.task (5.5 MB)

Importing preprocessing module...
‚úÖ Module imported successfully!

Feature sets available:
  - Angles (13): ['left_elbow', 'right_elbow', 'left_shoulder', 'right_shoulder', 'left_hip', 'right_hip', 'left_knee', 'right_knee', 'torso_lean', 'left_ankle', 'right_ankle', 'left_wrist', 'right_wrist']
  - Distances (6): ['left_ear_shoulder_vert', 'right_ear_shoulder_vert', 'left_wrist_shoulder_dist', 'right_wrist_shoulder_dist', 'left_elbow_hip_dist', 'right_elbow_hip_dist']
  - All features (19): 19 total


## Configuration

In [3]:
# Paths
CLIPS_PATH = project_root / 'datasets' / 'Clips'
OUTPUT_DIR = project_root / 'datasets' / 'Mediapipe pose estimates'

# Create output directory if it doesn't exist
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# Parameters
T_FIXED = 50  # Fixed length for temporal sequences
VIEWS = ['front', 'side']  # Views to process
VERSION = "19features"  # v2: includes raw landmarks + 19 features (13 angles + 6 distances)

print(f"Clips directory: {CLIPS_PATH}")
print(f"Output directory: {OUTPUT_DIR}")
print(f"Fixed temporal length: {T_FIXED} frames")
print(f"Version: {VERSION}")
print(f"\nüìä Feature breakdown:")
print(f"  - 13 joint angles (elbow, shoulder, hip, knee, ankle, wrist, torso)")
print(f"  - 6 distance features (ear-shoulder, wrist-shoulder, elbow-hip)")
print(f"  - Total: 19 features per frame √ó 50 frames = 950 features (flattened)")

Clips directory: /mnt/d/Graduation Project/ai-virtual-coach/datasets/Clips
Output directory: /mnt/d/Graduation Project/ai-virtual-coach/datasets/Mediapipe pose estimates
Fixed temporal length: 50 frames
Version: 19features

üìä Feature breakdown:
  - 13 joint angles (elbow, shoulder, hip, knee, ankle, wrist, torso)
  - 6 distance features (ear-shoulder, wrist-shoulder, elbow-hip)
  - Total: 19 features per frame √ó 50 frames = 950 features (flattened)


## Process Front View (Enhanced - Raw Landmarks + Features)

Extract raw normalized landmarks and compute all feature types (angles, distances, combined).

In [4]:
print("="*60)
print("PROCESSING FRONT VIEW (Enhanced v2)")
print("="*60)

front_output_path = OUTPUT_DIR / 'pose_data_front.npz'

# Use the new extraction function that saves raw landmarks + all features
front_dataset, front_stats, front_failed = extract_raw_pose_landmarks(
    clips_path=str(CLIPS_PATH),
    view='front',
    T_fixed=T_FIXED,
    output_path=str(front_output_path),
    version_tag=VERSION
)

print("\n" + "="*60)
print("FRONT VIEW SUMMARY")
print("="*60)
print(f"Total reps extracted: {front_stats['total_reps']}")
print(f"Unique subjects: {front_stats['unique_subjects']}")
print(f"Unique exercises: {front_stats['unique_exercises']}")
print(f"Videos processed: {front_stats['total_videos_processed']}")
print(f"Total frames extracted: {front_stats['total_frames_extracted']}")
print(f"Failed videos: {front_stats['failed_videos']}")
print(f"\nüìê Feature shapes:")
print(f"  Raw landmarks: {front_stats['landmarks_shape']} (N, T, 33 landmarks, xyz)")
print(f"  Angles only:   {front_stats['angles_shape']} (N, T, 13 angles)")
print(f"  Distances:     {front_stats['distances_shape']} (N, T, 6 distances)")
print(f"  All features:  {front_stats['all_features_shape']} (N, T, 19 features)")
print(f"\n‚è±Ô∏è Tempo Statistics:")
print(f"  Duration (mean/median): {front_stats['tempo_stats']['duration_mean']:.2f}s / {front_stats['tempo_stats']['duration_median']:.2f}s")
print(f"  Frame count (mean/median): {front_stats['tempo_stats']['frame_count_mean']:.1f} / {front_stats['tempo_stats']['frame_count_median']:.0f}")
print(f"  FPS values: {front_stats['tempo_stats']['fps_unique']}")

INFO - Scanning clips directory for front view...


PROCESSING FRONT VIEW (Enhanced v2)


Scanning front videos: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 15/15 [00:04<00:00,  3.56it/s]
INFO - [scan_clips_directory] Scanned front view: 147 samples, 1574 videos, 49 subjects, 15 exercises
INFO - Found 147 sample(s) to process
Extracting front landmarks:   0%|          | 0/147 [00:00<?, ?it/s]INFO - 
Processing: Dumbbell shoulder press / volunteer_001
W0000 00:00:1769242329.677263   15964 landmark_projection_calculator.cc:78] Using NORM_RECT without IMAGE_DIMENSIONS is only supported for the square ROI. Provide IMAGE_DIMENSIONS or use PROJECTION_MATRIX.
INFO -   ‚úì Extracted 11 rep(s)
Extracting front landmarks:   1%|          | 1/147 [00:13<32:36, 13.40s/it]INFO - 
Processing: Dumbbell shoulder press / volunteer_010
INFO -   ‚úì Extracted 10 rep(s)
Extracting front landmarks:   1%|‚ñè         | 2/147 [00:24<29:13, 12.09s/it]INFO - 
Processing: Dumbbell shoulder press / volunteer_002
INFO -   ‚úì Extracted 10 rep(s)
Extracting front landmarks:   2%|‚ñè         | 3/147 [00:39<32:26


FRONT VIEW SUMMARY
Total reps extracted: 1574
Unique subjects: 49
Unique exercises: 15
Videos processed: 1574
Total frames extracted: 136796
Failed videos: 0

üìê Feature shapes:
  Raw landmarks: (1574, 50, 33, 3) (N, T, 33 landmarks, xyz)
  Angles only:   (1574, 50, 13) (N, T, 13 angles)
  Distances:     (1574, 50, 6) (N, T, 6 distances)
  All features:  (1574, 50, 19) (N, T, 19 features)

‚è±Ô∏è Tempo Statistics:
  Duration (mean/median): 2.53s / 2.37s
  Frame count (mean/median): 87.3 / 73
  FPS values: [23.148148, 23.809525, 24.0, 24.019608, 24.038462, 24.074074, 24.09091, 24.107143, 24.122807, 24.152542, 24.180328, 24.404762, 24.468084, 24.479166, 24.489796, 27.894737, 28.891376, 28.947369, 29.085873, 29.111841, 29.186604, 29.290617, 29.355078, 29.401089, 29.425837, 29.438002, 29.473684, 29.496819, 29.504131, 29.513035, 29.519451, 29.550941, 29.56327, 29.570747, 29.587482, 29.605263, 29.645542, 29.650461, 29.653402, 29.65675, 29.665071, 29.67742, 29.70297, 29.707602, 29.721363, 

## Process Side View (Enhanced - Raw Landmarks + Features)

In [9]:
print("="*60)
print("PROCESSING SIDE VIEW (Enhanced v2)")
print("="*60)

side_output_path = OUTPUT_DIR / 'pose_data_side_19_features.npz'

# Use the new extraction function
side_dataset, side_stats, side_failed = extract_raw_pose_landmarks(
    clips_path=str(CLIPS_PATH),
    view='side',
    T_fixed=T_FIXED,
    output_path=str(side_output_path),
    version_tag=VERSION
)

print("\n" + "="*60)
print("SIDE VIEW SUMMARY")
print("="*60)
print(f"Total reps extracted: {side_stats['total_reps']}")
print(f"Unique subjects: {side_stats['unique_subjects']}")
print(f"Unique exercises: {side_stats['unique_exercises']}")
print(f"Videos processed: {side_stats['total_videos_processed']}")
print(f"Total frames extracted: {side_stats['total_frames_extracted']}")
print(f"Failed videos: {side_stats['failed_videos']}")
print(f"\nüìê Feature shapes:")
print(f"  Raw landmarks: {side_stats['landmarks_shape']} (N, T, 33 landmarks, xyz)")
print(f"  Angles only:   {side_stats['angles_shape']} (N, T, 13 angles)")
print(f"  Distances:     {side_stats['distances_shape']} (N, T, 6 distances)")
print(f"  All features:  {side_stats['all_features_shape']} (N, T, 19 features)")
print(f"\n‚è±Ô∏è Tempo Statistics:")
print(f"  Duration (mean/median): {side_stats['tempo_stats']['duration_mean']:.2f}s / {side_stats['tempo_stats']['duration_median']:.2f}s")
print(f"  Frame count (mean/median): {side_stats['tempo_stats']['frame_count_mean']:.1f} / {side_stats['tempo_stats']['frame_count_median']:.0f}")
print(f"  FPS values: {side_stats['tempo_stats']['fps_unique']}")
print(f"\nüìÅ Output File: {side_stats.get('output_file', 'N/A')}")

INFO - Scanning clips directory for side view...


PROCESSING SIDE VIEW (Enhanced v2)


Scanning side videos: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 15/15 [00:04<00:00,  3.71it/s]
INFO - [scan_clips_directory] Scanned side view: 149 samples, 1571 videos, 49 subjects, 15 exercises
INFO - Found 149 sample(s) to process
Extracting side landmarks:   0%|          | 0/149 [00:00<?, ?it/s]INFO - 
Processing: Dumbbell shoulder press / volunteer_001
INFO -   ‚úì Extracted 11 rep(s)
Extracting side landmarks:   1%|          | 1/149 [00:19<48:16, 19.57s/it]INFO - 
Processing: Dumbbell shoulder press / volunteer_010
INFO -   ‚úì Extracted 10 rep(s)
Extracting side landmarks:   1%|‚ñè         | 2/149 [00:40<50:17, 20.53s/it]INFO - 
Processing: Dumbbell shoulder press / volunteer_002
INFO -   ‚úì Extracted 10 rep(s)
Extracting side landmarks:   2%|‚ñè         | 3/149 [00:54<42:41, 17.55s/it]INFO - 
Processing: Dumbbell shoulder press / volunteer_003
INFO -   ‚úì Extracted 11 rep(s)
Extracting side landmarks:   3%|‚ñé         | 4/149 [01:10<40:45, 16.87s/it]INFO - 
Processing: Dumbbell sh


SIDE VIEW SUMMARY
Total reps extracted: 1571
Unique subjects: 49
Unique exercises: 15
Videos processed: 1571
Total frames extracted: 126361
Failed videos: 0

üìê Feature shapes:
  Raw landmarks: (1571, 50, 33, 3) (N, T, 33 landmarks, xyz)
  Angles only:   (1571, 50, 13) (N, T, 13 angles)
  Distances:     (1571, 50, 6) (N, T, 6 distances)
  All features:  (1571, 50, 19) (N, T, 19 features)

‚è±Ô∏è Tempo Statistics:
  Duration (mean/median): 2.53s / 2.37s
  Frame count (mean/median): 82.2 / 73
  FPS values: [28.921568, 29.23077, 29.282297, 29.285715, 29.387754, 29.425837, 29.519451, 29.545454, 29.571428, 29.605263, 29.61039, 29.620564, 29.637096, 29.642857, 29.651163, 29.65909, 29.662077, 29.68421, 29.6875, 29.719625, 29.727272, 29.732143, 29.73913, 29.741379, 29.76704, 29.767443, 29.774437, 29.77612, 29.777779, 29.8, 29.850746, 29.864254, 29.885057, 29.91453, 29.916897, 29.918034, 29.938482, 29.962547, 29.96942, 29.97003, 29.971182, 29.972248, 29.972752, 29.973238, 29.973707, 29.97459

### Failed Videos (Side)

In [10]:
if side_failed:
    print(f"\n‚ö†Ô∏è {len(side_failed)} samples failed to process:")
    for item in side_failed[:10]:  # Show first 10
        print(f"  - {item['exercise']} / {item['subject']}: {item['error']}")
    if len(side_failed) > 10:
        print(f"  ... and {len(side_failed) - 10} more")
else:
    print("‚úÖ All side view samples processed successfully!")

‚úÖ All side view samples processed successfully!


## Combined Summary

In [11]:
print("="*60)
print("COMBINED SUMMARY (FRONT + SIDE)")
print("="*60)

total_reps = front_stats['total_reps'] + side_stats['total_reps']
total_videos = front_stats['total_videos_processed'] + side_stats['total_videos_processed']
total_frames = front_stats['total_frames_extracted'] + side_stats['total_frames_extracted']
total_failed = front_stats['failed_videos'] + side_stats['failed_videos']

# Combined unique counts
all_subjects = set(front_dataset['subject_ids']) | set(side_dataset['subject_ids'])
all_exercises = set(front_dataset['exercise_names']) | set(side_dataset['exercise_names'])

print(f"\nüìÑ Dataset Statistics:")
print(f"  Total reps: {total_reps}")
print(f"  Total videos: {total_videos}")
print(f"  Total frames: {total_frames}")
print(f"  Unique volunteers: {len(all_subjects)}")
print(f"  Unique exercises: {len(all_exercises)}")
print(f"  Failed samples: {total_failed}")

print(f"\nüìÅ Output Files:")
print(f"  Front Temporal: {front_stats.get('temporal_file', 'N/A')}")
print(f"  Side Temporal:  {side_stats.get('temporal_file', 'N/A')}")

print(f"\n‚úÖ Preprocessing complete!")

COMBINED SUMMARY (FRONT + SIDE)

üìÑ Dataset Statistics:
  Total reps: 3145
  Total videos: 3145
  Total frames: 263157
  Unique volunteers: 49
  Unique exercises: 15
  Failed samples: 0

üìÅ Output Files:
  Front Temporal: N/A
  Side Temporal:  N/A

‚úÖ Preprocessing complete!


## Verify Output Files

In [12]:
# Load and inspect saved enhanced NPZ files
print("Verifying saved enhanced NPZ files...\n")

for view in ['front', 'side']:
    stats = front_stats if view == 'front' else side_stats
    
    output_file = stats.get('output_file')
    if output_file and os.path.exists(output_file):
        print(f"\n{'='*60}")
        print(f"{view.upper()} VIEW - ENHANCED FEATURES (v2)")
        print(f"{'='*60}")
        data = np.load(output_file, allow_pickle=True)
        print(f"  File: {os.path.basename(output_file)}")
        print(f"  Keys: {list(data.keys())}")
        print(f"\n  üìä Feature Arrays:")
        print(f"    X_landmarks:    {data['X_landmarks'].shape} - {data['X_landmarks'].dtype}")
        print(f"    X_angles:       {data['X_angles'].shape} - {data['X_angles'].dtype}")
        print(f"    X_distances:    {data['X_distances'].shape} - {data['X_distances'].dtype}")
        print(f"    X_all_features: {data['X_all_features'].shape} - {data['X_all_features'].dtype}")
        print(f"\n  üìù Metadata:")
        print(f"    exercise_names: {data['exercise_names'].shape}")
        print(f"    subject_ids:    {data['subject_ids'].shape}")
        print(f"    view:           {data['view']}")
        print(f"    T_fixed:        {data['T_fixed']}")
        print(f"\n  üìê Feature Names:")
        print(f"    angle_names ({len(data['angle_names'])}): {list(data['angle_names'])}")
        print(f"    distance_names ({len(data['distance_names'])}): {list(data['distance_names'])}")

print("\n‚úÖ All files verified successfully!")

Verifying saved enhanced NPZ files...


FRONT VIEW - ENHANCED FEATURES (v2)
  File: pose_data_front_19features.npz
  Keys: ['X_landmarks', 'X_angles', 'X_distances', 'X_all_features', 'exercise_names', 'subject_ids', 'tempo_duration_sec', 'tempo_frame_count', 'tempo_fps', 'view', 'T_fixed', 'angle_names', 'distance_names', 'all_feature_names']

  üìä Feature Arrays:
    X_landmarks:    (1574, 50, 33, 3) - float32
    X_angles:       (1574, 50, 13) - float32
    X_distances:    (1574, 50, 6) - float32
    X_all_features: (1574, 50, 19) - float32

  üìù Metadata:
    exercise_names: (1574,)
    subject_ids:    (1574,)
    view:           front
    T_fixed:        50

  üìê Feature Names:
    angle_names (13): ['left_elbow', 'right_elbow', 'left_shoulder', 'right_shoulder', 'left_hip', 'right_hip', 'left_knee', 'right_knee', 'torso_lean', 'left_ankle', 'right_ankle', 'left_wrist', 'right_wrist']
    distance_names (6): ['left_ear_shoulder_vert', 'right_ear_shoulder_vert', 'left_wrist_s

## Sample Data Inspection

In [None]:
# Inspect first few samples and new distance features
print("Sample data from front view:\n")
print("First 5 exercises:")
for i in range(min(5, len(front_dataset['exercise_names']))):
    print(f"  {i+1}. {front_dataset['exercise_names'][i]} (Subject {front_dataset['subject_ids'][i]})")

print("\n" + "="*60)
print("SAMPLE: All 19 Features (first rep, first 3 frames)")
print("="*60)
sample_all = front_dataset['X_all_features'][0, :3, :]
print(f"Shape: {sample_all.shape}")
print("\nFeature values:")
for t in range(3):
    print(f"\n  Frame {t}:")
    # Angles (0-12)
    print(f"    Angles: L_elbow={sample_all[t,0]:.1f}¬∞ R_elbow={sample_all[t,1]:.1f}¬∞")
    print(f"            L_shoulder={sample_all[t,2]:.1f}¬∞ R_shoulder={sample_all[t,3]:.1f}¬∞")
    print(f"            torso_lean={sample_all[t,8]:.1f}¬∞")
    # Distances (13-18) - these are normalized by torso length
    print(f"    Distances: L_ear_shoulder={sample_all[t,13]:.3f} R_ear_shoulder={sample_all[t,14]:.3f}")
    print(f"               L_wrist_shoulder={sample_all[t,15]:.3f} R_wrist_shoulder={sample_all[t,16]:.3f}")
    print(f"               L_elbow_hip={sample_all[t,17]:.3f} R_elbow_hip={sample_all[t,18]:.3f}")

print("\n" + "="*60)
print("COMPARISON: Distance Features vs Angles")
print("="*60)
print("\nDistance features are normalized by torso length (scale-invariant):")
print("  - ear_shoulder_vert: Vertical distance from ear to shoulder")
print("    (smaller value = shoulder raised, useful for SHRUGS)")
print("  - wrist_shoulder_dist: 3D distance from wrist to shoulder")
print("    (captures arm extension in curls)")
print("  - elbow_hip_dist: 3D distance from elbow to hip")
print("    (differentiates arm position in CURL variants)")

Sample data from front view:

First 5 exercises:
  1. Dumbbell shoulder press (Subject 1)
  2. Dumbbell shoulder press (Subject 1)
  3. Dumbbell shoulder press (Subject 1)
  4. Dumbbell shoulder press (Subject 1)
  5. Dumbbell shoulder press (Subject 1)

Static features (first rep, first 10 values):
[128.39922   15.510805 101.697075 148.72682   47.029747 131.1648
  19.645264  85.70126  161.67555   75.97429 ]

Temporal features (first rep, first 5 frames):
[[125.94949  131.30444  105.06773   94.97147  160.69958  130.76443
  143.02394  166.71664   16.87413 ]
 [116.06957  129.82344  101.49968   97.248924 164.11037  133.51707
  134.34184  167.17041   16.920328]
 [114.22898  130.19913  100.61363   98.5807   164.7034   135.01562
  132.62128  167.30406   17.278519]
 [114.594    130.73834  100.55692   99.13356  164.43755  135.65314
  132.97652  167.35484   17.50489 ]
 [115.18252  131.93718  100.53263   99.74887  164.18378  136.31961
  133.1552   167.35368   17.760881]]


## Phase 2: Regenerate with Specialized Features

Regenerate NPZ files with Phase 2 specialized features (confusion cluster discrimination).

In [None]:
# Phase 2 Configuration
VERSION_EXTENDED = "37features"  # v3: includes 19 base + 18 specialized features

# Import specialized feature names from main preprocessing module
from preprocessing.preprocess_pose_RGB import (
    SPECIALIZED_FEATURE_NAMES,
    ALL_EXTENDED_FEATURE_NAMES,
)

print(f"Phase 2 Feature Sets:")
print(f"  - Specialized ({len(SPECIALIZED_FEATURE_NAMES)}): {list(SPECIALIZED_FEATURE_NAMES)}")
print(f"\nüìä Full feature breakdown (37 total):")
print(f"  - 13 joint angles")
print(f"  - 6 distance features")
print(f"  - 18 specialized discrimination features")
print(f"  - Total: 37 features per frame √ó 50 frames = 1850 features (flattened)")

In [None]:
# Process Front View with Extended Features (Phase 2)
print("="*60)
print("PROCESSING FRONT VIEW (Phase 2 - Extended Features)")
print("="*60)

front_output_path_v3 = OUTPUT_DIR / 'pose_data_front.npz'

# Use the updated extraction function (now includes extended features)
front_dataset_v3, front_stats_v3, front_failed_v3 = extract_raw_pose_landmarks(
    clips_path=str(CLIPS_PATH),
    view='front',
    T_fixed=T_FIXED,
    output_path=str(front_output_path_v3),
    version_tag=VERSION_EXTENDED
)

print("\n" + "="*60)
print("FRONT VIEW SUMMARY (Phase 2)")
print("="*60)
print(f"Total reps extracted: {front_stats_v3['total_reps']}")
print(f"Unique subjects: {front_stats_v3['unique_subjects']}")
print(f"Unique exercises: {front_stats_v3['unique_exercises']}")
print(f"\nüìê Base Feature shapes:")
print(f"  Raw landmarks: {front_stats_v3['landmarks_shape']}")
print(f"  Angles only:   {front_stats_v3['angles_shape']}")
print(f"  Distances:     {front_stats_v3['distances_shape']}")
print(f"  All features:  {front_stats_v3['all_features_shape']}")

if 'specialized_shape' in front_stats_v3:
    print(f"\nüìê Extended Feature shapes (Phase 2):")
    print(f"  Specialized:  {front_stats_v3['specialized_shape']} (14 features)")
    print(f"  Velocity:     {front_stats_v3['velocity_shape']} (7 features)")
    print(f"  Extended:     {front_stats_v3['extended_shape']} (21 features)")
    print(f"\n‚úÖ Phase 2 extended features successfully computed!")
else:
    print(f"\n‚ö†Ô∏è Extended features not available in output")

In [None]:
# Process Side View with Specialized Features (Phase 2)
print("="*60)
print("PROCESSING SIDE VIEW (Phase 2 - Specialized Features)")
print("="*60)

side_output_path_v3 = OUTPUT_DIR / 'pose_data_side.npz'

side_dataset_v3, side_stats_v3, side_failed_v3 = extract_raw_pose_landmarks(
    clips_path=str(CLIPS_PATH),
    view='side',
    T_fixed=T_FIXED,
    output_path=str(side_output_path_v3),
    version_tag=VERSION_EXTENDED
)

print("\n" + "="*60)
print("SIDE VIEW SUMMARY (Phase 2)")
print("="*60)
print(f"Total reps extracted: {side_stats_v3['total_reps']}")
print(f"Unique subjects: {side_stats_v3['unique_subjects']}")
print(f"Unique exercises: {side_stats_v3['unique_exercises']}")
print(f"\nüìê Base Feature shapes:")
print(f"  Raw landmarks: {side_stats_v3['landmarks_shape']}")
print(f"  Angles:        {side_stats_v3['angles_shape']}")
print(f"  Distances:     {side_stats_v3['distances_shape']}")
print(f"  All features:  {side_stats_v3['all_features_shape']}")

if 'specialized_shape' in side_stats_v3:
    print(f"\nüìê Specialized Feature shapes (Phase 2):")
    print(f"  Specialized:  {side_stats_v3['specialized_shape']}")
    print(f"\n‚úÖ Phase 2 specialized features successfully computed!")
else:
    print(f"\n‚ö†Ô∏è Specialized features not available")

In [None]:
# Verify Phase 2 NPZ files
print("Verifying Phase 2 NPZ files with specialized features...\n")

for view in ['front', 'side']:
    stats = front_stats_v3 if view == 'front' else side_stats_v3
    
    output_file = stats.get('output_file')
    if output_file and os.path.exists(output_file):
        print(f"\n{'='*60}")
        print(f"{view.upper()} VIEW - PHASE 2 SPECIALIZED FEATURES")
        print(f"{'='*60}")
        data = np.load(output_file, allow_pickle=True)
        print(f"  File: {os.path.basename(output_file)}")
        print(f"  Keys: {list(data.keys())}")
        
        print(f"\n  üìä Base Feature Arrays:")
        print(f"    X_landmarks:    {data['X_landmarks'].shape}")
        print(f"    X_all_features: {data['X_all_features'].shape}")
        
        if 'X_specialized' in data:
            print(f"\n  üìä Specialized Feature Arrays (Phase 2):")
            print(f"    X_specialized:  {data['X_specialized'].shape}")
            
            print(f"\n  üìê Feature Names:")
            print(f"    specialized ({len(data['specialized_feature_names'])}): {list(data['specialized_feature_names'])[:5]}...")
        else:
            print(f"\n  ‚ö†Ô∏è Specialized features not found in file")

print("\n‚úÖ Phase 2 NPZ verification complete!")