# Pose Preprocessing with MediaPipe Tasks API

This notebook extracts pose landmarks from exercise videos using the **new MediaPipe Tasks API**.


In [1]:
import sys
import os
import logging
import numpy as np
from pathlib import Path

# Setup paths
project_root = Path.cwd().parent.parent

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

print(f"Project root: {project_root}")

Project root: /mnt/d/Graduation Project/ai-virtual-coach


## Download MediaPipe Model (First Time Only)

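The MediaPipe Tasks API loads the pose model from a `.task` file on disk, so the file must be downloaded once before the first run. This notebook uses the **lite** variant, which favors inference speed over accuracy.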
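
An interrupted download can leave a truncated `.task` file that still passes a plain existence check. The following is a minimal sketch of one way to guard against that, assuming a hypothetical helper named `download_atomic` (it is not part of this project or of MediaPipe): it downloads to a temporary file next to the destination and renames it only after a basic size check.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def download_atomic(url, dest, fetch=urllib.request.urlretrieve, min_bytes=1):
    """Download url to dest via a temporary file so dest never holds a partial file.

    `fetch` is any callable taking (url, path); it defaults to urlretrieve.
    `min_bytes` is an illustrative sanity threshold, not an official model size.
    """
    dest = Path(dest)
    if dest.exists() and dest.stat().st_size >= min_bytes:
        return dest  # already present and large enough

    dest.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file beside the destination so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    os.close(fd)
    tmp_path = Path(tmp_name)
    try:
        fetch(url, tmp_path)
        size = tmp_path.stat().st_size
        if size < min_bytes:
            raise OSError(f"Downloaded file is too small: {size} bytes")
        os.replace(tmp_path, dest)  # replaces dest in a single step
    finally:
        # Clean up the temp file if the download or the size check failed.
        if tmp_path.exists():
            tmp_path.unlink()
    return dest
```

In this notebook the call would look like `download_atomic(MODEL_URL, MODEL_PATH, min_bytes=1_000_000)`, where the threshold is a guess based on the roughly 5.5 MB size reported in the cell output.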

In [2]:
import urllib.request
from pathlib import Path

# Model download URL - Using LITE model for faster inference
MODEL_URL = 'https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/latest/pose_landmarker_lite.task'
MODEL_PATH = project_root / 'datasets' / 'pose_landmarker_lite.task'

# Download if not exists
if not MODEL_PATH.exists():
    print(f"Downloading MediaPipe pose model (LITE) to {MODEL_PATH}...")
    MODEL_PATH.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(MODEL_URL, MODEL_PATH)
    print(f"✅ Model downloaded successfully ({MODEL_PATH.stat().st_size / 1024 / 1024:.1f} MB)")
else:
    print(f"✅ Model already exists at {MODEL_PATH} ({MODEL_PATH.stat().st_size / 1024 / 1024:.1f} MB)")

# NOW import the preprocessing module (after model is available)
print("\nImporting preprocessing module...")
sys.path.insert(0, str(project_root / 'src'))
from preprocessing.preprocess_pose_RGB import extract_pose_estimates
print("✅ Module imported successfully!")

✅ Model already exists at /mnt/d/Graduation Project/ai-virtual-coach/datasets/pose_landmarker_lite.task (5.5 MB)

Importing preprocessing module...
✅ Module imported successfully!


## Configuration

In [3]:
# Paths
CLIPS_PATH = project_root / 'datasets' / 'Clips'
OUTPUT_DIR = project_root / 'datasets' / 'Mediapipe pose estimates'

# Create output directory if it doesn't exist
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# Parameters
T_FIXED = 50  # Fixed length for temporal sequences
VIEWS = ['front', 'side']  # Views to process
VERSION = "lite_13_angles"  # Version tag passed to extract_pose_estimates; change it per run (e.g. v2, v3)

print(f"Clips directory: {CLIPS_PATH}")
print(f"Output directory: {OUTPUT_DIR}")

print(f"Fixed temporal length: {T_FIXED} frames")
print(f"Version: {VERSION}")

Clips directory: /mnt/d/Graduation Project/ai-virtual-coach/datasets/Clips
Output directory: /mnt/d/Graduation Project/ai-virtual-coach/datasets/Mediapipe pose estimates
Fixed temporal length: 50 frames
Version: lite_13_angles
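
`T_FIXED` sets the number of frames every repetition is brought to, so that reps of different durations share one array shape. The resampling inside `extract_pose_estimates` is not shown in this notebook; the sketch below is only one plausible approach (per-feature linear interpolation), and the function name `resample_sequence` is hypothetical.

```python
import numpy as np


def resample_sequence(angles, t_fixed):
    """Linearly resample a (T, F) sequence to (t_fixed, F) along the time axis."""
    angles = np.asarray(angles, dtype=np.float32)
    if angles.ndim != 2 or angles.shape[0] == 0:
        raise ValueError("expected a non-empty (T, F) array")

    t_src, n_features = angles.shape
    if t_src == 1:
        # A single frame cannot be interpolated; repeat it instead.
        return np.repeat(angles, t_fixed, axis=0)

    # Map both timelines onto [0, 1] so the first and last frames are preserved.
    src_pos = np.linspace(0.0, 1.0, t_src)
    dst_pos = np.linspace(0.0, 1.0, t_fixed)
    columns = [np.interp(dst_pos, src_pos, angles[:, f]) for f in range(n_features)]
    return np.stack(columns, axis=1).astype(np.float32)


# Synthetic rep: 73 frames of 13 angle features, resampled to 50 frames.
rep = np.random.default_rng(0).uniform(0, 180, size=(73, 13))
fixed = resample_sequence(rep, 50)
print(fixed.shape)  # (50, 13)
```

Linear interpolation keeps the endpoints of each rep and is cheap, but it smooths over fast movements when a long rep is compressed; the original duration and frame count are therefore worth keeping alongside the resampled array.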


## Process Front View

In [4]:
print("="*60)
print("PROCESSING FRONT VIEW")
print("="*60)

front_output_path = OUTPUT_DIR / 'pose_data_front.npz'

front_dataset, front_stats, front_failed = extract_pose_estimates(
    clips_path=str(CLIPS_PATH),
    view='front',
    T_fixed=T_FIXED,
    output_path=str(front_output_path),
    version_tag=VERSION
)

print("\n" + "="*60)
print("FRONT VIEW SUMMARY")
print("="*60)
print(f"Total reps extracted: {front_stats['total_reps']}")
print(f"Unique subjects: {front_stats['unique_subjects']}")
print(f"Unique exercises: {front_stats['unique_exercises']}")
print(f"Videos processed: {front_stats['total_videos_processed']}")
print(f"Total frames extracted: {front_stats['total_frames_extracted']}")
print(f"Failed videos: {front_stats['failed_videos']}")
print(f"\nTemporal features shape: {front_dataset['X_temporal'].shape}")
print(f"\n⏱️ Tempo Statistics:")
print(f"  Duration (mean/median): {front_stats['tempo_stats']['duration_mean']:.2f}s / {front_stats['tempo_stats']['duration_median']:.2f}s")
print(f"  Frame count (mean/median): {front_stats['tempo_stats']['frame_count_mean']:.1f} / {front_stats['tempo_stats']['frame_count_median']:.0f}")
print(f"  FPS values: {front_stats['tempo_stats']['fps_unique']}")

INFO - Scanning clips directory for front view...


PROCESSING FRONT VIEW


Scanning front videos: 100%|██████████| 15/15 [00:04<00:00,  3.55it/s]
INFO - [scan_clips_directory] Scanned front view: 147 samples, 1574 videos, 49 subjects, 15 exercises
INFO - Found 147 sample(s) to process
Extracting front poses:   0%|          | 0/147 [00:00<?, ?it/s]INFO - 
Processing: Dumbbell shoulder press / volunteer_001
W0000 00:00:1769174751.090988   18832 landmark_projection_calculator.cc:78] Using NORM_RECT without IMAGE_DIMENSIONS is only supported for the square ROI. Provide IMAGE_DIMENSIONS or use PROJECTION_MATRIX.
INFO -   ✓ Extracted 11 rep(s)
Extracting front poses:   1%|          | 1/147 [00:10<24:47, 10.19s/it]INFO - 
Processing: Dumbbell shoulder press / volunteer_010
INFO -   ✓ Extracted 10 rep(s)
Extracting front poses:   1%|▏         | 2/147 [00:24<30:16, 12.53s/it]INFO - 
Processing: Dumbbell shoulder press / volunteer_002
INFO -   ✓ Extracted 10 rep(s)
Extracting front poses:   2%|▏         | 3/147 [00:37<30:58, 12.91s/it]INFO


FRONT VIEW SUMMARY
Total reps extracted: 1574
Unique subjects: 49
Unique exercises: 15
Videos processed: 1574
Total frames extracted: 136796
Failed videos: 0

Temporal features shape: (1574, 50, 13)

⏱️ Tempo Statistics:
  Duration (mean/median): 2.53s / 2.37s
  Frame count (mean/median): 87.3 / 73
  FPS values: [23.148148, 23.809525, 24.0, 24.019608, 24.038462, 24.074074, 24.09091, 24.107143, 24.122807, 24.152542, 24.180328, 24.404762, 24.468084, 24.479166, 24.489796, 27.894737, 28.891376, 28.947369, 29.085873, 29.111841, 29.186604, 29.290617, 29.355078, 29.401089, 29.425837, 29.438002, 29.473684, 29.496819, 29.504131, 29.513035, 29.519451, 29.550941, 29.56327, 29.570747, 29.587482, 29.605263, 29.645542, 29.650461, 29.653402, 29.65675, 29.665071, 29.67742, 29.70297, 29.707602, 29.721363, 29.736841, 29.739397, 29.748283, 29.757086, 29.764065, 29.776674, 29.779112, 29.78177, 29.794827, 29.799852, 29.807692, 29.813665, 29.824562, 29.834253, 29.838709, 29.840656, 29.848595, 29.856459

## Process Side View

In [5]:
print("="*60)
print("PROCESSING SIDE VIEW")
print("="*60)

side_output_path = OUTPUT_DIR / 'pose_data_side.npz'

side_dataset, side_stats, side_failed = extract_pose_estimates(
    clips_path=str(CLIPS_PATH),
    view='side',
    T_fixed=T_FIXED,
    output_path=str(side_output_path),
    version_tag=VERSION
)

print("\n" + "="*60)
print("SIDE VIEW SUMMARY")
print("="*60)
print(f"Total reps extracted: {side_stats['total_reps']}")
print(f"Unique subjects: {side_stats['unique_subjects']}")
print(f"Unique exercises: {side_stats['unique_exercises']}")
print(f"Videos processed: {side_stats['total_videos_processed']}")
print(f"Total frames extracted: {side_stats['total_frames_extracted']}")
print(f"Failed videos: {side_stats['failed_videos']}")
print(f"\nTemporal features shape: {side_dataset['X_temporal'].shape}")
print(f"\n⏱️ Tempo Statistics:")
print(f"  Duration (mean/median): {side_stats['tempo_stats']['duration_mean']:.2f}s / {side_stats['tempo_stats']['duration_median']:.2f}s")
print(f"  Frame count (mean/median): {side_stats['tempo_stats']['frame_count_mean']:.1f} / {side_stats['tempo_stats']['frame_count_median']:.0f}")
print(f"  FPS values: {side_stats['tempo_stats']['fps_unique']}")
print(f"\n📁 Output Files:")
print(f"  Temporal: {side_stats.get('temporal_file', 'N/A')}")

INFO - Scanning clips directory for side view...


PROCESSING SIDE VIEW


Scanning side videos: 100%|██████████| 15/15 [00:03<00:00,  3.87it/s]
INFO - [scan_clips_directory] Scanned side view: 149 samples, 1571 videos, 49 subjects, 15 exercises
INFO - Found 149 sample(s) to process
Extracting side poses:   0%|          | 0/149 [00:00<?, ?it/s]INFO - 
Processing: Dumbbell shoulder press / volunteer_001
INFO -   ✓ Extracted 11 rep(s)
Extracting side poses:   1%|          | 1/149 [00:15<38:51, 15.75s/it]INFO - 
Processing: Dumbbell shoulder press / volunteer_010
INFO -   ✓ Extracted 10 rep(s)
Extracting side poses:   1%|▏         | 2/149 [00:38<48:54, 19.96s/it]INFO - 
Processing: Dumbbell shoulder press / volunteer_002
INFO -   ✓ Extracted 10 rep(s)
Extracting side poses:   2%|▏         | 3/149 [00:52<41:35, 17.09s/it]INFO - 
Processing: Dumbbell shoulder press / volunteer_003
INFO -   ✓ Extracted 11 rep(s)
Extracting side poses:   3%|▎         | 4/149 [01:10<42:05, 17.42s/it]INFO - 
Processing: Dumbbell shoulder press / volun


SIDE VIEW SUMMARY
Total reps extracted: 1571
Unique subjects: 49
Unique exercises: 15
Videos processed: 1571
Total frames extracted: 126361
Failed videos: 0

Temporal features shape: (1571, 50, 13)

⏱️ Tempo Statistics:
  Duration (mean/median): 2.53s / 2.37s
  Frame count (mean/median): 82.2 / 73
  FPS values: [28.921568, 29.23077, 29.282297, 29.285715, 29.387754, 29.425837, 29.519451, 29.545454, 29.571428, 29.605263, 29.61039, 29.620564, 29.637096, 29.642857, 29.651163, 29.65909, 29.662077, 29.68421, 29.6875, 29.719625, 29.727272, 29.732143, 29.73913, 29.741379, 29.76704, 29.767443, 29.774437, 29.77612, 29.777779, 29.8, 29.850746, 29.864254, 29.885057, 29.91453, 29.916897, 29.918034, 29.938482, 29.962547, 29.96942, 29.97003, 29.971182, 29.972248, 29.972752, 29.973238, 29.973707, 29.974598, 29.97543, 29.97729, 29.979181, 29.983343, 29.98405, 29.984545, 29.996, 29.997, 30.0, 30.003, 30.01305, 30.017, 30.026, 30.027298, 30.031948, 30.033, 30.034, 30.038511, 30.069124, 30.11583, 30.
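
The tempo statistics combine each rep's frame count with the frame rate of its source video, and the long list of nearly identical FPS values shows that the clips were not recorded at one exact rate. The sketch below shows how such a summary could be derived, assuming duration is simply `frame_count / fps`; the function name `summarize_tempo` and the rounding of FPS to two decimals are illustrative choices, not taken from the preprocessing module.

```python
import numpy as np


def summarize_tempo(frame_counts, fps, fps_decimals=2):
    """Summarize per-rep tempo from frame counts and per-rep frame rates."""
    frame_counts = np.asarray(frame_counts, dtype=np.float64)
    fps = np.asarray(fps, dtype=np.float64)
    if frame_counts.shape != fps.shape:
        raise ValueError("frame_counts and fps must have the same shape")
    if np.any(fps <= 0):
        raise ValueError("fps values must be positive")

    durations = frame_counts / fps  # seconds per rep
    return {
        "duration_mean": float(durations.mean()),
        "duration_median": float(np.median(durations)),
        "frame_count_mean": float(frame_counts.mean()),
        "frame_count_median": float(np.median(frame_counts)),
        # Rounding merges rates that differ only by measurement noise.
        "fps_unique": sorted(set(np.round(fps, fps_decimals).tolist())),
    }


stats = summarize_tempo([60, 90, 75], [30.0, 30.0, 25.0])
print(stats)
```

Because the frame rate varies between clips, the duration in seconds is a more comparable tempo measure across subjects than the raw frame count.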

### Failed Videos (Side)

In [7]:
if side_failed:
    print(f"\n⚠️ {len(side_failed)} samples failed to process:")
    for item in side_failed[:10]:  # Show first 10
        print(f"  - {item['exercise']} / {item['subject']}: {item['error']}")
    if len(side_failed) > 10:
        print(f"  ... and {len(side_failed) - 10} more")
else:
    print("✅ All side view samples processed successfully!")

✅ All side view samples processed successfully!


## Combined Summary

In [None]:
print("="*60)
print("COMBINED SUMMARY (FRONT + SIDE)")
print("="*60)

total_reps = front_stats['total_reps'] + side_stats['total_reps']
total_videos = front_stats['total_videos_processed'] + side_stats['total_videos_processed']
total_frames = front_stats['total_frames_extracted'] + side_stats['total_frames_extracted']
total_failed = front_stats['failed_videos'] + side_stats['failed_videos']

# Combined unique counts
all_subjects = set(front_dataset['subject_ids']) | set(side_dataset['subject_ids'])
all_exercises = set(front_dataset['exercise_names']) | set(side_dataset['exercise_names'])

print(f"\n📄 Dataset Statistics:")
print(f"  Total reps: {total_reps}")
print(f"  Total videos: {total_videos}")
print(f"  Total frames: {total_frames}")
print(f"  Unique volunteers: {len(all_subjects)}")
print(f"  Unique exercises: {len(all_exercises)}")
print(f"  Failed samples: {total_failed}")

print(f"\n📁 Output Files:")
print(f"  Front Temporal: {front_stats.get('temporal_file', 'N/A')}")
print(f"  Side Temporal:  {side_stats.get('temporal_file', 'N/A')}")

print(f"\n✅ Preprocessing complete!")

COMBINED SUMMARY (FRONT + SIDE)

📊 Dataset Statistics:
  Total reps: 3145
  Total videos: 3145
  Total frames: 263157
  Unique volunteers: 49
  Unique exercises: 15
  Failed samples: 0

📁 Output Files:
  Front Static:   /mnt/d/Graduation_Project/ai-virtual-coach/datasets/Mediapipe pose estimates/pose_data_front_static_v2.npz
  Front Temporal: /mnt/d/Graduation_Project/ai-virtual-coach/datasets/Mediapipe pose estimates/pose_data_front_temporal_v2.npz
  Side Static:    /mnt/d/Graduation_Project/ai-virtual-coach/datasets/Mediapipe pose estimates/pose_data_side_static_v2.npz
  Side Temporal:  /mnt/d/Graduation_Project/ai-virtual-coach/datasets/Mediapipe pose estimates/pose_data_side_temporal_v2.npz

✅ Preprocessing complete!


## Verify Output Files

In [None]:
# Load and inspect saved temporal files
print("Verifying saved temporal NPZ files...\n")

stats_by_view = {'front': front_stats, 'side': side_stats}
missing_views = []  # Views whose temporal file could not be found

for view, stats in stats_by_view.items():
    # Path of the temporal NPZ as reported by extract_pose_estimates
    temporal_file = stats.get('temporal_file')
    if not temporal_file or not os.path.exists(temporal_file):
        missing_views.append(view)
        print(f"\n⚠️ {view.upper()} VIEW - temporal file not found: {temporal_file}")
        continue

    print(f"\n{view.upper()} VIEW - TEMPORAL:")
    # The context manager closes the archive once inspection is done
    with np.load(temporal_file, allow_pickle=True) as data:
        print(f"  File: {os.path.basename(temporal_file)}")
        print(f"  Keys: {list(data.keys())}")
        print(f"  X_temporal: {data['X_temporal'].shape} - {data['X_temporal'].dtype}")
        print(f"  exercise_names: {data['exercise_names'].shape}")
        print(f"  subject_ids: {data['subject_ids'].shape}")
        print(f"  tempo_duration_sec: {data['tempo_duration_sec'].shape}")
        print(f"  tempo_frame_count: {data['tempo_frame_count'].shape}")
        print(f"  tempo_fps: {data['tempo_fps'].shape}")
        print(f"  view: {data['view']}")
        print(f"  T_fixed: {data['T_fixed']}")
        print(f"  angle_names: {list(data['angle_names'])}")

# Report success only if every view was actually inspected
if missing_views:
    print(f"\n⚠️ Temporal file missing for: {', '.join(missing_views)}")
else:
    print("\n✅ All files verified successfully!")

Verifying saved NPZ files...


FRONT VIEW - STATIC:
  File: pose_data_front_static_v2.npz
  Keys: ['X_static', 'exercise_names', 'subject_ids', 'tempo_duration_sec', 'tempo_frame_count', 'tempo_fps', 'view', 'angle_names']
  X_static: (1574, 45) - float32
  exercise_names: (1574,)
  subject_ids: (1574,)
  tempo_duration_sec: (1574,)
  tempo_frame_count: (1574,)
  tempo_fps: (1574,)
  view: front
  angle_names: ['left_elbow', 'right_elbow', 'left_shoulder', 'right_shoulder', 'left_hip', 'right_hip', 'left_knee', 'right_knee', 'torso_lean']

FRONT VIEW - TEMPORAL:
  File: pose_data_front_temporal_v2.npz
  Keys: ['X_temporal', 'exercise_names', 'subject_ids', 'tempo_duration_sec', 'tempo_frame_count', 'tempo_fps', 'view', 'T_fixed', 'angle_names']
  X_temporal: (1574, 80, 9) - float32
  exercise_names: (1574,)
  subject_ids: (1574,)
  tempo_duration_sec: (1574,)
  tempo_frame_count: (1574,)
  tempo_fps: (1574,)
  view: front
  T_fixed: 80
  angle_names: ['left_elbow', 'right_elbow', 'left
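
Printing shapes makes a mismatch visible but does not fail on one. As a rough sketch of a stricter check, the hypothetical helper `check_temporal_npz` below returns a list of problems for a temporal NPZ file, using key names seen in the cell above. The demo writes a small synthetic file, so `allow_pickle=False` is enough here; the real files may need `allow_pickle=True`, as in the notebook cell.

```python
import os
import tempfile

import numpy as np

REQUIRED_KEYS = ("X_temporal", "exercise_names", "subject_ids")


def check_temporal_npz(path, t_fixed, n_angles, allow_pickle=False):
    """Return a list of problems found in a temporal NPZ file (empty list means OK)."""
    problems = []
    with np.load(path, allow_pickle=allow_pickle) as data:
        missing = [key for key in REQUIRED_KEYS if key not in data.files]
        if missing:
            return [f"missing keys: {missing}"]

        x = data["X_temporal"]
        if x.ndim != 3 or x.shape[1:] != (t_fixed, n_angles):
            problems.append(f"X_temporal has shape {x.shape}, expected (N, {t_fixed}, {n_angles})")
        if not np.isfinite(x).all():
            problems.append("X_temporal contains NaN or inf")

        # Every per-rep array should have one entry per row of X_temporal.
        n_reps = x.shape[0]
        for key in ("exercise_names", "subject_ids"):
            if data[key].shape[:1] != (n_reps,):
                problems.append(f"{key} has shape {data[key].shape}, expected ({n_reps},)")
    return problems


# Synthetic file standing in for a real output of the preprocessing step.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_temporal.npz")
np.savez(
    demo_path,
    X_temporal=np.zeros((4, 50, 13), dtype=np.float32),
    exercise_names=np.array(["squat"] * 4),
    subject_ids=np.arange(4),
)
print(check_temporal_npz(demo_path, t_fixed=50, n_angles=13))
print(check_temporal_npz(demo_path, t_fixed=80, n_angles=9))
```

Passing the notebook's own `T_FIXED` and the expected number of angles would flag a file left over from an earlier run with a different configuration.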

## Sample Data Inspection

In [None]:
# Inspect first few samples
print("Sample data from front view:\n")
print("First 5 exercises:")
for i in range(min(5, len(front_dataset['exercise_names']))):
    print(f"  {i+1}. {front_dataset['exercise_names'][i]} (Subject {front_dataset['subject_ids'][i]})")

print("\nTemporal features (first rep, first 5 frames):")
print(front_dataset['X_temporal'][0, :5, :])

Sample data from front view:

First 5 exercises:
  1. Dumbbell shoulder press (Subject 1)
  2. Dumbbell shoulder press (Subject 1)
  3. Dumbbell shoulder press (Subject 1)
  4. Dumbbell shoulder press (Subject 1)
  5. Dumbbell shoulder press (Subject 1)

Static features (first rep, first 10 values):
[128.39922   15.510805 101.697075 148.72682   47.029747 131.1648
  19.645264  85.70126  161.67555   75.97429 ]

Temporal features (first rep, first 5 frames):
[[125.94949  131.30444  105.06773   94.97147  160.69958  130.76443
  143.02394  166.71664   16.87413 ]
 [116.06957  129.82344  101.49968   97.248924 164.11037  133.51707
  134.34184  167.17041   16.920328]
 [114.22898  130.19913  100.61363   98.5807   164.7034   135.01562
  132.62128  167.30406   17.278519]
 [114.594    130.73834  100.55692   99.13356  164.43755  135.65314
  132.97652  167.35484   17.50489 ]
 [115.18252  131.93718  100.53263   99.74887  164.18378  136.31961
  133.1552   167.35368   17.760881]]
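
Each column of the temporal array is a joint angle, and the printed values fall in the 0 to 180 range, which suggests angles in degrees. How the preprocessing module computes them is not shown here; as an assumption-laden sketch, the angle at a joint can be obtained from three landmark positions with a dot product. The function name `joint_angle_deg` and the example coordinates are hypothetical.

```python
import numpy as np


def joint_angle_deg(a, b, c):
    """Angle in degrees at vertex b, formed by the segments b->a and b->c."""
    a, b, c = (np.asarray(p, dtype=np.float64) for p in (a, b, c))
    v1, v2 = a - b, c - b
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    if denom == 0:
        return float("nan")  # two landmarks coincide, so the angle is undefined
    # Clip guards against rounding pushing the cosine slightly outside [-1, 1].
    cos_theta = np.clip(np.dot(v1, v2) / denom, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))


# Made-up 2D coordinates for shoulder, elbow, and wrist forming a right angle.
shoulder, elbow, wrist = [0.0, 1.0], [0.0, 0.0], [1.0, 0.0]
print(joint_angle_deg(shoulder, elbow, wrist))
```

The same function works for 3D landmarks, since it relies only on vector differences and norms.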
