# 🏋️ Fitness-AQA Vision Pipeline (Google Colab)

This notebook extracts **2D pose keypoints** from exercise videos using **MMPose**.

## 📋 What This Does:
1. Installs MMPose and dependencies
2. Uploads your video
3. Extracts 17 COCO keypoints per frame
4. Applies Savitzky-Golay smoothing
5. Normalizes coordinates (centers each frame on the mid-hip and scales by torso length)
6. Saves output as `.json` for the modeling team

---

## ⚙️ Setup Instructions:
1. **Runtime → Change runtime type → GPU (T4)**
2. Run all cells in order
3. Upload your video when prompted
4. Download the output JSON

---

## 📦 Step 1: Install Dependencies

This cell installs MMPose, MMDetection, and required libraries.

In [None]:
!pip install -U openmim
!mim install mmengine "mmcv>=2.0.0" "mmdet>=3.0.0" "mmpose>=1.0.0"
!pip install scipy opencv-python matplotlib
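
After the install cell finishes, it can help to confirm that every package actually landed in the runtime before moving on. The sketch below uses only the standard library; the helper name `installed_versions` and the `REQUIRED` list are illustrative and not part of the original notebook.

```python
from importlib import metadata

# Distribution names installed by the cell above (as passed to pip / mim)
REQUIRED = ["mmengine", "mmcv", "mmdet", "mmpose", "scipy", "opencv-python", "matplotlib"]

def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(REQUIRED).items():
    print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

If any line reports `NOT INSTALLED`, re-run the install cell (and restart the runtime if Colab asks for it) before continuing.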

## 📤 Step 2: Upload Your Video

Click the "Choose Files" button and upload your `.mp4` video.

In [None]:
from google.colab import files
import os

uploaded = files.upload()
video_path = list(uploaded.keys())[0]
print(f"✅ Uploaded: {video_path}")
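
The upload cell takes whatever file comes first, so uploading several files, or a non-video file, would pass the wrong path to the pipeline. A minimal sketch of a guard follows, assuming a small set of common video extensions; `pick_video` and `VIDEO_EXTENSIONS` are hypothetical names, not part of the original notebook.

```python
import os

# Assumed list of acceptable extensions; adjust to whatever your videos use
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv"}

def pick_video(filenames):
    """Return the first filename with a known video extension, or raise ValueError."""
    for name in filenames:
        if os.path.splitext(name)[1].lower() in VIDEO_EXTENSIONS:
            return name
    raise ValueError(f"No video file found among: {list(filenames)}")

# In the notebook this would be: video_path = pick_video(uploaded.keys())
print(pick_video(["notes.txt", "squat_01.MP4"]))  # squat_01.MP4
```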

## 🔧 Step 3: Define the Vision Pipeline

This is the same `PoseExtractor` class from your local `video_processor.py`.

In [None]:
import os
import json
import logging
import numpy as np
import cv2
from scipy.signal import savgol_filter
from mmpose.apis import MMPoseInferencer

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

class PoseExtractor:
    def __init__(self, mode='human', device='cuda'):
        logger.info(f"Initializing MMPoseInferencer (mode={mode}, device={device})...")
        self.inferencer = MMPoseInferencer(mode, device=device)

    def smooth_signal(self, keypoints, window_length=5, polyorder=2):
        logger.info("Applying Savitzky-Golay smoothing...")
        if len(keypoints) < window_length:
            logger.warning(f"Not enough frames to smooth (got {len(keypoints)}, need {window_length}). Returning raw.")
            return keypoints
            
        smoothed_keypoints = np.zeros_like(keypoints)
        num_points = keypoints.shape[1]
        
        for i in range(num_points):
            smoothed_keypoints[:, i, 0] = savgol_filter(keypoints[:, i, 0], window_length, polyorder)
            smoothed_keypoints[:, i, 1] = savgol_filter(keypoints[:, i, 1], window_length, polyorder)
            
        return smoothed_keypoints

    def normalize_signal(self, keypoints):
        logger.info("Normalizing signal based on torso length...")
        normalized_keypoints = np.zeros_like(keypoints)
        
        for f in range(len(keypoints)):
            frame_kps = keypoints[f]
            l_shoulder = frame_kps[5]
            r_shoulder = frame_kps[6]
            l_hip = frame_kps[11]
            r_hip = frame_kps[12]
            
            mid_shoulder = (l_shoulder + r_shoulder) / 2
            mid_hip = (l_hip + r_hip) / 2
            torso_len = np.linalg.norm(mid_shoulder - mid_hip)
            
            if torso_len < 1e-3:
                scale = 1.0
            else:
                scale = 1.0 / torso_len
            
            centered = frame_kps - mid_hip
            normalized_keypoints[f] = centered * scale
            
        return normalized_keypoints

    def process_video(self, video_path, output_path=None, visualize=False):
        if not os.path.exists(video_path):
            raise FileNotFoundError(f"Video {video_path} not found.")
            
        logger.info(f"Processing video: {video_path}")
        result_generator = self.inferencer(video_path, return_vis=visualize)
        
        raw_keypoints = []
        scores = []
        
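        for result in result_generator:
            # 'predictions' holds one list of detected people per image in the batch;
            # each result covers a single frame, so take the first (only) entry.
            preds = result['predictions']
            people = preds[0] if preds else []
            if people:
                # Keep the first detected person in the frame
                raw_keypoints.append(people[0]['keypoints'])
                scores.append(people[0]['keypoint_scores'])
            else:
                # No detection: pad with zeros so frame indices stay aligned
                raw_keypoints.append(np.zeros((17, 2)))
                scores.append(np.zeros(17))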

        raw_keypoints = np.array(raw_keypoints)
        scores = np.array(scores)
        
        logger.info(f"Raw data shape: {raw_keypoints.shape}")
        smoothed_keypoints = self.smooth_signal(raw_keypoints)
        normalized_keypoints = self.normalize_signal(smoothed_keypoints)
        
        data_packet = {
            "video_id": os.path.basename(video_path),
            "frame_count": len(raw_keypoints),
            "raw_keypoints": raw_keypoints.tolist(),
            "smoothed_keypoints": smoothed_keypoints.tolist(),
            "normalized_keypoints": normalized_keypoints.tolist(),
            "scores": scores.tolist()
        }
        
        if output_path:
            with open(output_path, 'w') as f:
                json.dump(data_packet, f)
            logger.info(f"Saved processed data to {output_path}")
            
        return data_packet

print("✅ PoseExtractor class loaded!")
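
The `normalize_signal` method loops over frames one at a time. For long videos, the same centering and scaling can be written with NumPy broadcasting. The sketch below is intended to mirror the class logic (mid-hip origin, torso-length scale, scale of 1.0 for degenerate frames); `normalize_vectorized` is an illustrative name, not part of `video_processor.py`.

```python
import numpy as np

# COCO indices used by normalize_signal in the class above
L_SHOULDER, R_SHOULDER, L_HIP, R_HIP = 5, 6, 11, 12

def normalize_vectorized(keypoints, eps=1e-3):
    """Center each frame on the mid-hip and divide by torso length.

    keypoints: array of shape (frames, 17, 2).
    """
    keypoints = np.asarray(keypoints, dtype=float)
    mid_shoulder = (keypoints[:, L_SHOULDER] + keypoints[:, R_SHOULDER]) / 2  # (F, 2)
    mid_hip = (keypoints[:, L_HIP] + keypoints[:, R_HIP]) / 2                 # (F, 2)
    torso_len = np.linalg.norm(mid_shoulder - mid_hip, axis=1)                # (F,)
    # Frames with a degenerate torso (e.g. zero-padded, no detection) keep scale 1.0
    scale = np.where(torso_len < eps, 1.0, 1.0 / np.maximum(torso_len, eps))
    return (keypoints - mid_hip[:, None, :]) * scale[:, None, None]

# Worked example: shoulders 100 px above the hips, so torso length is 100
demo = np.zeros((1, 17, 2))
demo[0, L_SHOULDER], demo[0, R_SHOULDER] = [100, 50], [140, 50]
demo[0, L_HIP], demo[0, R_HIP] = [110, 150], [130, 150]
out = normalize_vectorized(demo)
print(out[0, L_HIP], out[0, L_SHOULDER])  # [-0.1  0. ] [-0.2 -1. ]
```

After normalization the hip midpoint sits at the origin and the shoulder midpoint is one unit above it, regardless of how large the person appears in the frame.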

## 🚀 Step 4: Run the Pipeline

This cell processes your video and saves the output as `analysis.json`.

In [None]:
# Initialize the extractor on the GPU (requires the GPU runtime selected in Setup;
# pass device='cpu' instead if no GPU is available)
extractor = PoseExtractor(mode='human', device='cuda')

# Process the video
output_file = 'analysis.json'
result = extractor.process_video(video_path, output_path=output_file, visualize=False)

print("\n✅ Processing complete!")
print(f"📊 Frames processed: {result['frame_count']}")
print(f"💾 Output saved to: {output_file}")
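
Before handing the file off, a quick structural check can catch truncated or mismatched output. The sketch below checks the keys and shapes that `process_video` writes (one entry per frame, 17 keypoints with x/y each); `validate_packet` and `validate_file` are hypothetical helpers, not part of the original pipeline.

```python
import json

EXPECTED_KEYS = {"video_id", "frame_count", "raw_keypoints",
                 "smoothed_keypoints", "normalized_keypoints", "scores"}

def validate_packet(packet, num_keypoints=17):
    """Raise ValueError if the packet does not match the layout written by process_video."""
    missing = EXPECTED_KEYS - set(packet)
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    n = packet["frame_count"]
    for key in ("raw_keypoints", "smoothed_keypoints", "normalized_keypoints"):
        frames = packet[key]
        if len(frames) != n:
            raise ValueError(f"{key}: expected {n} frames, got {len(frames)}")
        for i, frame in enumerate(frames):
            if len(frame) != num_keypoints or any(len(pt) != 2 for pt in frame):
                raise ValueError(f"{key}: frame {i} is not {num_keypoints} x 2")
    scores = packet["scores"]
    if len(scores) != n or any(len(s) != num_keypoints for s in scores):
        raise ValueError(f"scores: expected {n} x {num_keypoints} values")
    return True

def validate_file(path):
    """Load a JSON file from disk and validate its structure."""
    with open(path) as f:
        return validate_packet(json.load(f))
```

In this notebook, `validate_packet(result)` checks the in-memory output and `validate_file('analysis.json')` checks the saved file.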

## 📊 Step 5: Preview the Results

Let's visualize a single frame to verify the extraction worked.

In [None]:
import matplotlib.pyplot as plt

# Load the video and extract frame 0
cap = cv2.VideoCapture(video_path)
ret, frame = cap.read()
cap.release()

if ret:
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    
    # Get keypoints for frame 0
    keypoints = np.array(result['smoothed_keypoints'][0])
    
    # Plot
    plt.figure(figsize=(12, 8))
    plt.imshow(frame_rgb)
    plt.scatter(keypoints[:, 0], keypoints[:, 1], c='red', s=50, marker='o')
    
    # Annotate keypoints
    keypoint_names = [
        "Nose", "L-Eye", "R-Eye", "L-Ear", "R-Ear",
        "L-Shoulder", "R-Shoulder", "L-Elbow", "R-Elbow",
        "L-Wrist", "R-Wrist", "L-Hip", "R-Hip",
        "L-Knee", "R-Knee", "L-Ankle", "R-Ankle"
    ]
    
    for i, (x, y) in enumerate(keypoints):
        plt.text(x, y, str(i), color='yellow', fontsize=8, ha='center', va='center')
    
    plt.title("Frame 0 - Detected Keypoints (Smoothed)")
    plt.axis('off')
    plt.tight_layout()
    plt.show()
    
    print("\n📌 Keypoint Reference:")
    for i, name in enumerate(keypoint_names):
        print(f"  {i}: {name}")
else:
    print("❌ Failed to read video frame")
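
The preview plots every keypoint, including ones the model was unsure about. Since the pipeline also stores per-keypoint `scores` and pads undetected frames with zeros, a small check can flag where the plot should not be trusted. This is a sketch only: the 0.3 threshold is an assumed value, and both helper names are illustrative.

```python
import numpy as np

def low_confidence_joints(frame_scores, threshold=0.3):
    """Return indices of keypoints in one frame whose score is below threshold."""
    frame_scores = np.asarray(frame_scores, dtype=float)
    return np.flatnonzero(frame_scores < threshold).tolist()

def undetected_frames(scores):
    """Return indices of frames whose scores are all zero (padded, no person found)."""
    scores = np.asarray(scores, dtype=float)
    return np.flatnonzero(~scores.any(axis=1)).tolist()

# Synthetic example: 3 frames, frame 1 has no detection, frame 2 has a weak left wrist (index 9)
demo_scores = np.full((3, 17), 0.9)
demo_scores[1] = 0.0
demo_scores[2, 9] = 0.1

print(undetected_frames(demo_scores))          # [1]
print(low_confidence_joints(demo_scores[2]))   # [9]
```

With real output, the equivalent calls would be `undetected_frames(result['scores'])` and `low_confidence_joints(result['scores'][0])`.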

## 💾 Step 6: Download the Output

Run this cell to download `analysis.json` for Vishal.

In [None]:
from google.colab import files

files.download('analysis.json')
print("✅ Download started! Check your browser's download folder.")

## 📈 (Optional) Step 7: Plot Trajectory

Visualize how a single keypoint moves over time (useful for debugging).

In [None]:
# Plot the Y-coordinate of the left wrist over time
left_wrist_idx = 9
raw_y = [kp[left_wrist_idx][1] for kp in result['raw_keypoints']]
smoothed_y = [kp[left_wrist_idx][1] for kp in result['smoothed_keypoints']]

plt.figure(figsize=(12, 5))
plt.plot(raw_y, 'r-', alpha=0.3, label='Raw (Jittery)')
plt.plot(smoothed_y, 'b-', linewidth=2, label='Smoothed')
plt.xlabel('Frame')
plt.ylabel('Y Coordinate (pixels)')
plt.title('Left Wrist Movement - Raw vs Smoothed')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\n📊 This shows how smoothing reduces frame-to-frame keypoint jitter while preserving the underlying motion.")
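
The plot gives a visual impression of the smoothing; a single number can make the comparison easier to report. One rough proxy, chosen here as an assumption rather than taken from the original pipeline, is the mean absolute second difference of the trajectory. The sketch below runs on a synthetic signal so it works without a video, using the same `window_length=5, polyorder=2` defaults as `smooth_signal`.

```python
import numpy as np
from scipy.signal import savgol_filter

def jitter(signal):
    """Mean absolute second difference: a rough proxy for frame-to-frame jitter."""
    return float(np.mean(np.abs(np.diff(signal, n=2))))

# Synthetic stand-in for a wrist trajectory: slow sine motion plus Gaussian noise
rng = np.random.default_rng(0)
frames = np.arange(120)
clean = 300 + 80 * np.sin(2 * np.pi * frames / 60)
noisy = clean + rng.normal(scale=3.0, size=frames.size)

# Same filter settings as PoseExtractor.smooth_signal
smoothed = savgol_filter(noisy, window_length=5, polyorder=2)

print(f"jitter raw:      {jitter(noisy):.2f}")
print(f"jitter smoothed: {jitter(smoothed):.2f}")
```

To apply it to the notebook's data, pass `np.array(raw_y)` and `np.array(smoothed_y)` to `jitter` instead of the synthetic arrays.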

---

## ✅ Done!

You now have:
1. ✅ **`analysis.json`** - Ready to send to Vishal
2. ✅ **Visualization** - Confirming the pipeline works
3. ✅ **Smoothing comparison** - Showing signal quality

### 📬 Next Steps:
- Share `analysis.json` with Vishal
- Point him to `HANDOFF_TO_VISHAL.md` in the GitHub repo
- He can now start building his model data loader!
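
As a starting point for the data loader, the sketch below reads an `analysis.json`-style file into NumPy arrays and flattens each frame into a feature row. It assumes the modeling side wants the normalized keypoints; `load_analysis` and `to_feature_matrix` are hypothetical names, and the demo writes a tiny synthetic file rather than using real output.

```python
import json
import os
import tempfile

import numpy as np

def load_analysis(path):
    """Load an analysis JSON file into NumPy arrays."""
    with open(path) as f:
        packet = json.load(f)
    return {
        "video_id": packet["video_id"],
        "keypoints": np.asarray(packet["normalized_keypoints"], dtype=np.float32),  # (F, 17, 2)
        "scores": np.asarray(packet["scores"], dtype=np.float32),                    # (F, 17)
    }

def to_feature_matrix(sample):
    """Flatten (F, 17, 2) keypoints to (F, 34): one row of x/y values per frame."""
    kps = sample["keypoints"]
    return kps.reshape(kps.shape[0], -1)

# Demo with a tiny synthetic packet (2 frames) standing in for analysis.json
demo_packet = {
    "video_id": "demo.mp4",
    "frame_count": 2,
    "normalized_keypoints": np.zeros((2, 17, 2)).tolist(),
    "scores": np.ones((2, 17)).tolist(),
}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "analysis.json")
    with open(demo_path, "w") as f:
        json.dump(demo_packet, f)
    sample = load_analysis(demo_path)

features = to_feature_matrix(sample)
print(features.shape)  # (2, 34)
```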

**GitHub Repo:** https://github.com/JCHETAN26/Form-Analyser
