# 🏋️ Fitness-AQA Vision Pipeline (Google Colab)

This notebook extracts **2D pose keypoints** from exercise videos using **MMPose**.

## 📋 What This Does:
1. Checks Python version compatibility
2. Installs MMPose and dependencies (with version pinning)
3. Uploads your video
4. Extracts 17 COCO keypoints per frame
5. Applies Savitzky-Golay smoothing
6. Normalizes coordinates by torso length
7. Saves output as `.json` for the modeling team

---

## ⚙️ Setup Instructions:
1. **Runtime → Change runtime type → GPU (T4)**
2. Run all cells in order
3. Upload your video when prompted
4. Download the output JSON

---

## 🐍 Step 1: Check Python Version

MMPose works best with Python 3.8-3.10. Colab's default Python version changes over time and may be newer, so run the check below before installing.

In [None]:
import sys
print(f"Current Python version: {sys.version}")

python_version = sys.version_info
if python_version.major == 3 and 8 <= python_version.minor <= 10:
    print("✅ Python version is compatible with MMPose!")
elif python_version.major == 3 and python_version.minor > 10:
    print("⚠️  Python version may be too new for MMPose; installation in Step 2 could fail.")
else:
    print("❌ Python version incompatible. Please use Python 3.8-3.10.")

## 📦 Step 2: Install Dependencies

Installing MMPose with version pinning for maximum compatibility.

In [None]:
# Upgrade pip and setuptools
!pip install --upgrade pip setuptools wheel -q

# Install OpenMIM
!pip install -U openmim -q

# Install MMPose stack with version constraints
!mim install mmengine -q
!mim install "mmcv>=2.0.0,<2.2.0" -q
!mim install "mmdet>=3.0.0" -q
!mim install "mmpose>=1.0.0" -q

# Install signal processing libraries
!pip install scipy opencv-python matplotlib -q

# Pin numpy to avoid binary incompatibility
!pip install "numpy<2.0.0" -q

print("✅ Install commands finished. Check the output above for errors.")
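
Because the install cell prints its final message whether or not every command succeeded, it can help to confirm what actually got installed. The following is a minimal sketch, not part of the original notebook: `installed_versions` is a hypothetical helper, and the distribution names passed to it are assumed to match the packages installed above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in this environment
            found[name] = None
    return found


# Distribution names assumed from the install commands in this step
for name, ver in installed_versions(["mmengine", "mmcv", "mmdet", "mmpose", "numpy"]).items():
    print(f"{name:10s} {ver or 'NOT INSTALLED'}")
```

If any package shows `NOT INSTALLED`, or `numpy` reports a 2.x version, rerun the install cell and read its output before continuing.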

## 📤 Step 3: Upload Your Video

Click "Choose Files" and upload your `.mp4` video.

In [None]:
from google.colab import files
import os

uploaded = files.upload()
video_path = list(uploaded.keys())[0]
print(f"✅ Uploaded: {video_path}")

## 🔧 Step 4: Define the Vision Pipeline

This cell defines the same `PoseExtractor` class as your local `video_processor.py`.

In [None]:
import json
import os
import logging
import numpy as np
import cv2
from scipy.signal import savgol_filter
from mmpose.apis import MMPoseInferencer

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(message)s')
logger = logging.getLogger(__name__)

class PoseExtractor:
    def __init__(self, mode='human', device='cuda'):
        logger.info(f"Initializing MMPose (device={device})...")
        self.inferencer = MMPoseInferencer(mode, device=device)

    def smooth_signal(self, keypoints, window_length=5, polyorder=2):
        logger.info("Applying Savitzky-Golay smoothing...")
        if len(keypoints) < window_length:
            logger.warning(f"Not enough frames ({len(keypoints)}) for smoothing. Returning raw.")
            return keypoints
            
        smoothed = np.zeros_like(keypoints)
        for i in range(keypoints.shape[1]):
            smoothed[:, i, 0] = savgol_filter(keypoints[:, i, 0], window_length, polyorder)
            smoothed[:, i, 1] = savgol_filter(keypoints[:, i, 1], window_length, polyorder)
        return smoothed

    def normalize_signal(self, keypoints):
        logger.info("Normalizing based on torso length...")
        normalized = np.zeros_like(keypoints)
        
        for f in range(len(keypoints)):
            frame_kps = keypoints[f]
            mid_shoulder = (frame_kps[5] + frame_kps[6]) / 2
            mid_hip = (frame_kps[11] + frame_kps[12]) / 2
            torso_len = np.linalg.norm(mid_shoulder - mid_hip)
            
            scale = 1.0 if torso_len < 1e-3 else 1.0 / torso_len
            normalized[f] = (frame_kps - mid_hip) * scale
        return normalized

    def process_video(self, video_path, output_path=None):
        if not os.path.exists(video_path):
            raise FileNotFoundError(f"Video {video_path} not found")
            
        logger.info(f"Processing: {video_path}")
        result_generator = self.inferencer(video_path, return_vis=False)
        
        raw_keypoints, scores = [], []
        for result in result_generator:
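            preds = result['predictions']
            # The inferencer may nest predictions as [image][person]; unwrap the image level
            if preds and isinstance(preds[0], (list, tuple)):
                preds = preds[0]
            if preds:
                # Keep the first detected person only
                raw_keypoints.append(preds[0]['keypoints'])
                scores.append(preds[0]['keypoint_scores'])
            else:
                # No detection in this frame: zero-fill so frame indices stay aligned
                raw_keypoints.append(np.zeros((17, 2)))
                scores.append(np.zeros(17))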

        raw_keypoints = np.array(raw_keypoints)
        scores = np.array(scores)
        
        logger.info(f"Extracted {len(raw_keypoints)} frames")
        smoothed = self.smooth_signal(raw_keypoints)
        normalized = self.normalize_signal(smoothed)
        
        data = {
            "video_id": os.path.basename(video_path),
            "frame_count": len(raw_keypoints),
            "raw_keypoints": raw_keypoints.tolist(),
            "smoothed_keypoints": smoothed.tolist(),
            "normalized_keypoints": normalized.tolist(),
            "scores": scores.tolist()
        }
        
        if output_path:
            with open(output_path, 'w') as f:
                json.dump(data, f)
            logger.info(f"Saved to {output_path}")
        return data

print("✅ PoseExtractor loaded!")
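
To make the torso normalization in `normalize_signal` concrete, here is a small worked sketch. It is an illustration rather than the notebook's own code: `normalize_by_torso` is a hypothetical vectorized equivalent of the per-frame loop, and the toy frame below is made up so the arithmetic is easy to follow.

```python
import numpy as np

# COCO indices used by normalize_signal in the class above
L_SHOULDER, R_SHOULDER, L_HIP, R_HIP = 5, 6, 11, 12


def normalize_by_torso(keypoints, eps=1e-3):
    """Center on the mid-hip and scale by torso length; input shape (frames, 17, 2)."""
    keypoints = np.asarray(keypoints, dtype=float)
    mid_shoulder = (keypoints[:, L_SHOULDER] + keypoints[:, R_SHOULDER]) / 2
    mid_hip = (keypoints[:, L_HIP] + keypoints[:, R_HIP]) / 2
    torso_len = np.linalg.norm(mid_shoulder - mid_hip, axis=1)
    # Same guard as the loop version: keep scale at 1.0 for degenerate frames
    scale = np.where(torso_len < eps, 1.0, 1.0 / np.maximum(torso_len, eps))
    return (keypoints - mid_hip[:, None, :]) * scale[:, None, None]


# Toy frame: shoulders at y=0, hips at y=4, so the torso length is 4 pixels
frame = np.zeros((1, 17, 2))
frame[0, L_SHOULDER] = (0, 0)
frame[0, R_SHOULDER] = (2, 0)
frame[0, L_HIP] = (0, 4)
frame[0, R_HIP] = (2, 4)

out = normalize_by_torso(frame)
print(out[0, L_SHOULDER])  # left shoulder relative to mid-hip, in torso lengths
print(out[0, L_HIP])       # left hip relative to mid-hip, in torso lengths
```

After normalization the mid-hip sits at the origin and every distance is expressed in torso lengths, which makes clips shot at different camera distances comparable.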

## 🚀 Step 5: Run the Pipeline

In [None]:
extractor = PoseExtractor(mode='human', device='cuda')
output_file = 'analysis.json'
result = extractor.process_video(video_path, output_path=output_file)

print(f"\n✅ Processing complete!")
print(f"📊 Frames: {result['frame_count']}")
print(f"💾 Saved to: {output_file}")

## 📊 Step 6: Visualize Results

In [None]:
import matplotlib.pyplot as plt

cap = cv2.VideoCapture(video_path)
ret, frame = cap.read()
cap.release()

if ret:
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    keypoints = np.array(result['smoothed_keypoints'][0])
    
    plt.figure(figsize=(12, 8))
    plt.imshow(frame_rgb)
    plt.scatter(keypoints[:, 0], keypoints[:, 1], c='red', s=100, marker='o', edgecolors='white', linewidths=2)
    
    for i, (x, y) in enumerate(keypoints):
        plt.text(x, y, str(i), color='yellow', fontsize=10, ha='center', va='center', weight='bold')
    
    plt.title("Frame 0 - Detected Keypoints (Smoothed)", fontsize=16)
    plt.axis('off')
    plt.tight_layout()
    plt.show()
    
    print("\nKeypoint Index Reference:")
    for i, name in enumerate(["Nose", "L-Eye", "R-Eye", "L-Ear", "R-Ear", "L-Shoulder", "R-Shoulder",
                               "L-Elbow", "R-Elbow", "L-Wrist", "R-Wrist", "L-Hip", "R-Hip",
                               "L-Knee", "R-Knee", "L-Ankle", "R-Ankle"]):
        print(f"  {i}: {name}")

## 💾 Step 7: Download Output

In [None]:
files.download('analysis.json')
print("✅ Download started! Check your browser downloads.")

## 📈 (Optional) Plot Trajectory

In [None]:
# Plot left wrist Y-coordinate over time
left_wrist_idx = 9
raw_y = [kp[left_wrist_idx][1] for kp in result['raw_keypoints']]
smoothed_y = [kp[left_wrist_idx][1] for kp in result['smoothed_keypoints']]

plt.figure(figsize=(14, 6))
plt.plot(raw_y, 'r-', alpha=0.4, linewidth=1, label='Raw (Jittery)')
plt.plot(smoothed_y, 'b-', linewidth=2.5, label='Smoothed')
plt.xlabel('Frame', fontsize=12)
plt.ylabel('Y Coordinate (pixels)', fontsize=12)
plt.title('Left Wrist Movement - Smoothing Effect', fontsize=14)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\n📊 Smoothing reduces frame-to-frame keypoint jitter while preserving the overall motion.")
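
The plot shows the smoothing effect visually. One rough way to put a number on it is to compare the mean absolute frame-to-frame change of the raw and smoothed series. This is only a sketch: `frame_jitter` is a hypothetical helper, and the metric also includes real motion, so it is meaningful only when comparing raw and smoothed versions of the same clip.

```python
import numpy as np


def frame_jitter(series):
    """Mean absolute change between consecutive frames of a 1-D coordinate series."""
    series = np.asarray(series, dtype=float)
    if series.size < 2:
        # A single frame has no frame-to-frame change
        return 0.0
    return float(np.abs(np.diff(series)).mean())


# Usage with the lists built in the cell above (uncomment inside the notebook):
# print(f"Raw jitter:      {frame_jitter(raw_y):.2f} px/frame")
# print(f"Smoothed jitter: {frame_jitter(smoothed_y):.2f} px/frame")
```

A smoothed value that is noticeably lower than the raw one, with the two curves still tracking each other in the plot, suggests the filter is removing noise rather than motion.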

---

## ✅ Complete!

**What you have:**
- ✅ `analysis.json` ready for Vishal
- ✅ Visualization confirming extraction quality
- ✅ Smoothing comparison showing signal processing

**Next steps:**
1. Send `analysis.json` to Vishal
2. Point him to `HANDOFF_TO_VISHAL.md` on GitHub
3. He can now build his model data loader!
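
A data loader for `analysis.json` might start along these lines. This is a minimal sketch rather than the project's actual loader: `load_analysis` is a hypothetical name, and the only assumption is the key layout written by `process_video` above.

```python
import json

# Per-frame arrays written by PoseExtractor.process_video
EXPECTED_KEYS = ("raw_keypoints", "smoothed_keypoints", "normalized_keypoints", "scores")


def load_analysis(path):
    """Load the pipeline output and check that every array has frame_count entries."""
    with open(path) as f:
        data = json.load(f)
    n_frames = data["frame_count"]
    for key in EXPECTED_KEYS:
        if len(data[key]) != n_frames:
            raise ValueError(f"{key} has {len(data[key])} frames, expected {n_frames}")
    return data


# Usage (requires the file produced in Step 5):
# data = load_analysis("analysis.json")
# print(data["video_id"], data["frame_count"])
```

Failing early on a frame-count mismatch keeps a truncated or hand-edited file from silently misaligning keypoints and scores downstream.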

**GitHub:** https://github.com/JCHETAN26/Form-Analyser
