# <font style="color:blue"> Project Description: Automatic Analysis and Evaluation of Push-Ups in a Video using Pose Estimation</font>

## Project Goal

The purpose of this notebook is to develop a program that leverages a pre-trained pose estimation model (Keypoint R-CNN from the Detectron2 framework) to **automatically count push-ups** performed in a video and **evaluate their correctness**.

The model detects key body joints (e.g., shoulders, elbows) and allows us to track body movement frame-by-frame. Based on this, the system can:
- Count each full push-up repetition

## Application Scope

This project is designed to run **exclusively on a pre-recorded video file** (e.g., `.mp4`). It does not require live input from a webcam. This setup makes the project ideal for:
- Post-session training analysis
- Sports science demonstrations
- Educational use in computer vision and human activity recognition

## Workflow Overview

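1. **Setup**: Install and configure Detectron2 in Google Colab.
2. **Import Libraries**: Import Detectron2 utilities, OpenCV, NumPy, and other common libraries.
3. **Model Configuration**: Load a pre-trained Keypoint R-CNN model (trained on COCO).
4. **Helper functions**: Define functions that select confidently detected persons, extract the arm keypoints, compute joint angles, and draw annotations on each frame.
5. **Inference on Video**: Run the model on every second frame, count push-ups from the elbow angle, and write an annotated output video.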

## Requirements

- A suitable push-up video with clear, full-body visibility
- Google Colab (with GPU support recommended for faster inference)
- Interest in combining fitness and AI for movement analysis

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## <font style="color:green">**Setup Code**</font>

To use Detectron2's pre-trained Keypoint R-CNN model, we first install its dependencies and set up the Detectron2 code.

In [2]:
# install dependencies
!pip install -U torch torchvision cython
!pip install -U 'git+https://github.com/facebookresearch/fvcore.git' 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
import torch, torchvision
torch.__version__

Collecting git+https://github.com/facebookresearch/fvcore.git
  Cloning https://github.com/facebookresearch/fvcore.git to /tmp/pip-req-build-otgvc5je
  Running command git clone --filter=blob:none --quiet https://github.com/facebookresearch/fvcore.git /tmp/pip-req-build-otgvc5je
  Resolved https://github.com/facebookresearch/fvcore.git to commit 3b2d62f06b22ef743ac394e568e1e87ae12b30a8
  Preparing metadata (setup.py) ... done
Collecting git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI
  Cloning https://github.com/cocodataset/cocoapi.git to /tmp/pip-req-build-clz3ld4u
  Running command git clone --filter=blob:none --quiet https://github.com/cocodataset/cocoapi.git /tmp/pip-req-build-clz3ld4u
  Resolved https://github.com/cocodataset/cocoapi.git to commit 8c9bcc3cf640524c4c20a9c40e89cb6a2f2fa0e9
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: fvcore, pycocotools
  Building wheel for fvcore (setup.py) ...

'2.8.0+cu126'

In [3]:
!git clone https://github.com/facebookresearch/detectron2 detectron2
!pip install -e detectron2

fatal: destination path 'detectron2' already exists and is not an empty directory.
Obtaining file:///content/detectron2
  Preparing metadata (setup.py) ... done
Collecting pycocotools>=2.0.2 (from detectron2==0.6)
  Using cached pycocotools-2.0.10-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.3 kB)
Collecting fvcore<0.1.6,>=0.1.5 (from detectron2==0.6)
  Using cached fvcore-0.1.5.post20221221-py3-none-any.whl
Using cached pycocotools-2.0.10-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (397 kB)
Installing collected packages: pycocotools, fvcore, detectron2
  Attempting uninstall: pycocotools
    Found existing installation: pycocotools 2.0
    Uninstalling pycocotools-2.0:
      Successfully uninstalled pycocotools-2.0
  Attempting uninstall: fvcore
    Found existing installation: fvcore 0.1.6
    Uninstalling fvcore-0.1.6:
      Successfully uninstalled fvcore-0.1.6
  Attempting uninstall: detectron2
    Found existing installation: det

## <font style="color:green">**Import Libraries**</font>

In [4]:
# You may need to restart your runtime prior to this, to let your installation take effect
# Some basic setup:
# Setup detectron2 logger
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import numpy as np
import cv2
import random
import matplotlib.pyplot as plt
import os
import time

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog

## <font style="color:green">**Model Configuration**: Load a pre-trained Keypoint R-CNN model (trained on COCO).</font>

Here, we load Detectron2's Keypoint R-CNN model for keypoint detection:

- Load the default config
- Merge in the model's config file from the model zoo and point the weights to its pre-trained checkpoint
- Set the detection score threshold to 0.5
- Create a `DefaultPredictor` from this config

In [5]:
start = time.time()
cfg = get_cfg()
# add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
cfg.merge_from_file(model_zoo.get_config_file("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
# Find a model from detectron2's model zoo. You can use the https://dl.fbaipublicfiles... url as well
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml")
predictor = DefaultPredictor(cfg)
model_load_done = time.time()
print("model_load", model_load_done - start)

[08/22 06:40:07 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from https://dl.fbaipublicfiles.com/detectron2/COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x/137849621/model_final_a6e10b.pkl ...


model_final_a6e10b.pkl: 237MB [00:00, 255MB/s]                           


model_load 2.2648873329162598


## <font style="color:green">**Helper functions**:</font>

## COCO Keypoint Index Mapping (Used in Detectron2's Keypoint R-CNN)

| Index | Keypoint Name     | Description            |
|-------|-------------------|------------------------|
| 0     | `nose`            | Nose                   |
| 1     | `left_eye`        | Left eye               |
| 2     | `right_eye`       | Right eye              |
| 3     | `left_ear`        | Left ear               |
| 4     | `right_ear`       | Right ear              |
| 5     | `left_shoulder`   | Left shoulder          |
| 6     | `right_shoulder`  | Right shoulder         |
| 7     | `left_elbow`      | Left elbow             |
| 8     | `right_elbow`     | Right elbow            |
| 9     | `left_wrist`      | Left wrist             |
| 10    | `right_wrist`     | Right wrist            |
| 11    | `left_hip`        | Left hip               |
| 12    | `right_hip`       | Right hip              |
| 13    | `left_knee`       | Left knee              |
| 14    | `right_knee`      | Right knee             |
| 15    | `left_ankle`      | Left ankle             |
| 16    | `right_ankle`     | Right ankle            |

### Format:
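Each keypoint is represented as a triplet `(x, y, score)`:
- `x`, `y`: pixel coordinates in the input frame
- `score`: keypoint confidence (Detectron2 documents this only as larger than 0; higher values mean a more confident detection)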
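A minimal sketch of reading one person's keypoints by name, assuming they have already been converted to a plain list of 17 `(x, y, score)` triplets (for example via `pred_keypoints[i].tolist()`). `COCO_KEYPOINTS`, `KP_INDEX`, and `get_keypoint` are hypothetical helpers for illustration, not part of Detectron2; the name order follows the index table above.

```python
# Hypothetical helper: COCO keypoint names in Detectron2's index order (0-16)
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]
KP_INDEX = {name: i for i, name in enumerate(COCO_KEYPOINTS)}


def get_keypoint(keypoints, name):
    """Return (x, y, score) for a named keypoint from a list of 17 triplets."""
    x, y, score = keypoints[KP_INDEX[name]]
    return x, y, score


# Usage with made-up data: keypoint i sits at (i, 2*i) with score 1.0
dummy = [(float(i), float(2 * i), 1.0) for i in range(17)]
print(get_keypoint(dummy, "left_elbow"))  # (7.0, 14.0, 1.0)
```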

In [6]:
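def findPersonIndicies(scores):
    """Return the indices of detections whose score exceeds 0.9 (stricter than the model's 0.5 threshold)."""
    return [i for i, s in enumerate(scores) if s > 0.9]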

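For each selected person, `filterPersons` keeps the keypoints from COCO index 5 onward (`pred_keypoints[x][5:]`), which shifts their indices down by 5. The arm keypoints used for push-up analysis are:

| COCO Index | Keypoint Name     | Local Index |
|------------|-------------------|-------------|
| 5          | `left_shoulder`   | 0           |
| 6          | `right_shoulder`  | 1           |
| 7          | `left_elbow`      | 2           |
| 8          | `right_elbow`     | 3           |
| 9          | `left_wrist`      | 4           |
| 10         | `right_wrist`     | 5           |

The hips, knees, and ankles (local indices 6 to 11) are also kept but not used.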
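The re-indexing done by the `[5:]` slice can be checked with plain Python. This sketch uses a list of names in place of one person's keypoint tensor (an assumption for illustration); the local index is simply the COCO index minus 5.

```python
# COCO keypoint names in index order (0-16), standing in for one person's keypoint rows
coco_names = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

OFFSET = 5                      # same start index as pred_keypoints[x][5:] in filterPersons
selected = coco_names[OFFSET:]  # 12 entries: shoulders through ankles

# Print the six arm keypoints with their original and local indices
for local_idx, name in enumerate(selected[:6]):
    print(f"COCO {local_idx + OFFSET:2d} -> local {local_idx}: {name}")
```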

In [7]:
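def filterPersons(outputs):
    """Return ({instance index: keypoints from COCO index 5 onward}, list of selected indices)."""
    persons = {}
    pIndicies = findPersonIndicies(outputs["instances"].scores)

    for x in pIndicies:
        # Keep keypoints 5-16 (shoulders to ankles) as a (12, 3) tensor on the CPU
        desired_kp = outputs["instances"].pred_keypoints[x][5:].to("cpu")
        persons[x] = desired_kp

    return (persons, pIndicies)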

In [8]:
def drawLine(image, P1, P2, color):
    # Convert P1 and P2 to integer tuples
    P1 = (int(P1[0]), int(P1[1]))
    P2 = (int(P2[0]), int(P2[1]))
    cv2.line(image, P1, P2, color, thickness=3, lineType=8)

In [21]:
def putTextOnImage(image, text, X, Y, color):
    """Draw `text` at pixel position (X, Y) on `image` in place."""
    font = cv2.FONT_HERSHEY_SIMPLEX
    font_scale = 2
    font_thickness = 2

    cv2.putText(image, text, (X, Y), font, font_scale, color, font_thickness, cv2.LINE_AA)

In [10]:
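def findSlope(x1, y1, x2, y2):
    """Slope of the line through (x1, y1) and (x2, y2); NaN for a vertical line."""
    dx = float(x2) - float(x1)
    return (float(y2) - float(y1)) / dx if dx != 0 else float("nan")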

In [11]:
import math

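def findAngle(x1, y1, x2, y2):
    """Orientation (radians) of the vector from (x2, y2) to (x1, y1); not used in the pipeline below."""
    return math.atan2(y1 - y2, x1 - x2)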

In [12]:
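def findAngleBtLines(m1, m2):
    """Signed acute angle (degrees) between two lines with slopes m1 and m2."""
    denom = 1 + m1 * m2
    if denom == 0:
        return 90.0  # perpendicular lines
    return math.degrees(math.atan((m2 - m1) / denom))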
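`findAngleBtLines` applies the standard formula for the angle between two lines with slopes $m_1$ and $m_2$, where each slope comes from `findSlope`. A short worked example:

```latex
m = \frac{y_2 - y_1}{x_2 - x_1}, \qquad
\theta = \arctan\!\left(\frac{m_2 - m_1}{1 + m_1 m_2}\right)\cdot\frac{180}{\pi}

\text{Example: } m_1 = 0,\; m_2 = 1 \;\Rightarrow\; \theta = \arctan(1)\cdot\frac{180}{\pi} = 45^\circ
```

The formula is undefined for a vertical line ($x_1 = x_2$) and for perpendicular lines ($1 + m_1 m_2 = 0$, where $\theta = 90^\circ$). Because $\arctan$ only returns values in $(-90^\circ, 90^\circ)$, it cannot tell an angle from its supplement, which is one reason the vector-based `calculate_angle` is better suited to measuring the elbow angle.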

In [13]:
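def calculate_angle(a, b, c):
    """Calculate the angle (degrees) at point b formed by points a-b-c, using only x and y."""
    # Keypoints are (x, y, score) triplets; drop the score so it does not distort the angle
    a = np.asarray(a, dtype=float)[:2]
    b = np.asarray(b, dtype=float)[:2]
    c = np.asarray(c, dtype=float)[:2]

    ba = a - b
    bc = c - b

    cosine_angle = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-7)
    angle = np.arccos(np.clip(cosine_angle, -1.0, 1.0))
    return np.degrees(angle)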
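For intuition, the same shoulder-elbow-wrist angle can be computed with the standard library alone. This is a sketch assuming plain `(x, y)` tuples; `elbow_angle` is a hypothetical name. A straight arm gives about 180°, which is why the push-up counter treats large elbow angles as the "up" position. Unlike `calculate_angle`, which adds `1e-7` to the denominator, this version returns NaN when two points coincide.

```python
import math


def elbow_angle(shoulder, elbow, wrist):
    """Angle at the elbow (degrees) formed by shoulder-elbow-wrist, using only x and y."""
    bax, bay = shoulder[0] - elbow[0], shoulder[1] - elbow[1]
    bcx, bcy = wrist[0] - elbow[0], wrist[1] - elbow[1]
    norm = math.hypot(bax, bay) * math.hypot(bcx, bcy)
    if norm == 0:
        return float("nan")  # two keypoints coincide, so the angle is undefined
    cosine = (bax * bcx + bay * bcy) / norm
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cosine))


print(round(elbow_angle((0, 0), (1, 0), (2, 0))))  # straight arm: 180
print(round(elbow_angle((0, 0), (1, 0), (1, 1))))  # bent arm: 90
```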

In [14]:
# COCO index - keypoint - local index after slicing pred_keypoints[x][5:]
# 5  - Left Shoulder  - 0
# 6  - Right Shoulder - 1
# 7  - Left Elbow     - 2
# 8  - Right Elbow    - 3
# 9  - Left Wrist     - 4
# 10 - Right Wrist    - 5
kp_mapping = {"Left Shoulder": 0, "Right Shoulder": 1, "Left Elbow": 2, "Right Elbow": 3, "Left Wrist": 4, "Right Wrist": 5}

def drawKeypoints(outputs, im):
    """Compute arm angles for each confident person and annotate person 0 on a copy of the frame."""
    persons, pIndicies = filterPersons(outputs)
    img = im.copy()

    angles_output = {}

    for i in pIndicies:
        # Local keypoint indices after the [5:] slice (see kp_mapping); each entry is (x, y, score)
        l_shoulder, l_elbow, l_wrist = persons[i][0], persons[i][2], persons[i][4]
        r_shoulder, r_elbow, r_wrist = persons[i][1], persons[i][3], persons[i][5]

        # Elbow angle (shoulder-elbow-wrist): about 180 deg with straight arms, smaller when bent
        angle_left = calculate_angle(l_shoulder, l_elbow, l_wrist)
        angle_right = calculate_angle(r_shoulder, r_elbow, r_wrist)

        # Slopes of the forearms, the upper arms, and the line connecting both elbows
        left_forearm_slope = findSlope(l_elbow[0], l_elbow[1], l_wrist[0], l_wrist[1])
        left_upper_arm_slope = findSlope(l_shoulder[0], l_shoulder[1], l_elbow[0], l_elbow[1])
        right_forearm_slope = findSlope(r_elbow[0], r_elbow[1], r_wrist[0], r_wrist[1])
        right_upper_arm_slope = findSlope(r_shoulder[0], r_shoulder[1], r_elbow[0], r_elbow[1])
        elbows_slope = findSlope(r_elbow[0], r_elbow[1], l_elbow[0], l_elbow[1])

        angle_btw_forearms = findAngleBtLines(right_forearm_slope, left_forearm_slope)
        left_arm_angle = findAngleBtLines(elbows_slope, left_upper_arm_slope)
        right_arm_angle = findAngleBtLines(right_upper_arm_slope, elbows_slope)

        angles_output[i] = [angle_right, angle_left, angle_btw_forearms]

        # Annotate only one person (instance 0)
        if i == 0:
            if not math.isnan(angle_btw_forearms):
                wrists_ctr_pt = (np.array(l_wrist) + np.array(r_wrist)) / 2
                putTextOnImage(img, str(int(angle_btw_forearms)), int(wrists_ctr_pt[0]) - 10, int(wrists_ctr_pt[1]),
                               (0, 255, 0))

            if not math.isnan(left_arm_angle):
                putTextOnImage(img, str(int(left_arm_angle)), int(l_elbow[0]) + 10, int(l_elbow[1]),
                               (255, 255, 0))

            if not math.isnan(right_arm_angle):
                putTextOnImage(img, str(int(right_arm_angle)), int(r_elbow[0]) - 40, int(r_elbow[1]),
                               (255, 255, 0))

            # Forearms (elbow-wrist) in green
            drawLine(img, l_elbow, l_wrist, (0, 255, 0))
            drawLine(img, r_elbow, r_wrist, (0, 255, 0))

            # Upper arms (shoulder-elbow) in cyan (BGR)
            drawLine(img, l_shoulder, l_elbow, (255, 255, 0))
            drawLine(img, r_shoulder, r_elbow, (255, 255, 0))

            # Elbow-to-elbow line in cyan, shoulder-to-shoulder line in blue (BGR)
            drawLine(img, r_elbow, l_elbow, (255, 255, 0))
            drawLine(img, r_shoulder, l_shoulder, (255, 0, 0))

    return img, angles_output

In [15]:
def predict(im):
    model_start = time.time()
    outputs = predictor(im)
    model_out = time.time()
    # print("model output time", model_out - model_start)
    out, angles_out = drawKeypoints(outputs, im)
    # print("process and draw output", time.time() - model_out)

    return out, angles_out

## <font style="color:green">Inference on Video</font>



In [19]:
#videoPath = "/content/drive/MyDrive/DLPT/Project5/PushUp2.mp4"
#videoPath = "/content/drive/MyDrive/DLPT/Project5/IMG_8838.MOV"
videoPath = "/content/drive/MyDrive/DLPT/Project5/IMG_8835.MOV"

In [22]:
def inferenceOnVideo(videoPath):
    cap = cv2.VideoCapture(videoPath)
    cnt = 0
    n_frame = 2     # run the model on every n-th frame only
    state = "up"    # push-up phase: "up" (arms extended) or "down" (arms bent)

    # Left-elbow angle thresholds (degrees); the gap between them avoids double counting
    DOWN_THRESH = 100
    UP_THRESH = 160

    output_frames = []
    push_up_cnt = 0
    process_start = time.time()

    while True:
        ret, im = cap.read()
        if not ret:
            break

        if cnt % n_frame == 0:
            output, angles_output = predict(im)

            # angles_output[0] = [right elbow angle, left elbow angle, angle between forearms]
            person_out = angles_output.get(0)

            # Update the state only if person 0 was detected with a valid left-elbow angle;
            # otherwise keep the previous state instead of reading a stale or missing value
            if person_out is not None and not math.isnan(person_out[1]):
                if person_out[1] < DOWN_THRESH and state == "up":
                    state = "down"
                elif person_out[1] > UP_THRESH and state == "down":
                    state = "up"
                    push_up_cnt += 1

            # Draw the count and state after the update so the overlay matches this frame
            putTextOnImage(output, "Push Up Count: " + str(push_up_cnt), 50, 50, (0, 255, 0))
            putTextOnImage(output, "Push Up State: " + state, 50, 100, (0, 255, 0))

            output_frames.append(output)

        cnt += 1

    cap.release()

    vid_write_start = time.time()
    print("total processing time", vid_write_start - process_start)

    # Check that output_frames is not empty before accessing its elements
    if output_frames:
        height, width, _ = output_frames[0].shape
        size = (width, height)
        # Written at a fixed 10 fps, regardless of the source frame rate
        out = cv2.VideoWriter("push_ups_out.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 10, size)

        for frame in output_frames:
            out.write(frame)

        out.release()
        print("video writing time", time.time() - vid_write_start)
    else:
        print("No frames were processed or added to output_frames.")
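The counting logic above is a two-state machine with hysteresis: because the "down" threshold (100°) and the "up" threshold (160°) are far apart, small jitter in the elbow angle cannot trigger extra counts. A minimal, model-free sketch, assuming the per-frame left-elbow angles have already been extracted into a list; `count_push_ups` is a hypothetical name and the angle trace is made up for illustration:

```python
import math


def count_push_ups(angles, down_thresh=100.0, up_thresh=160.0):
    """Count repetitions from a sequence of elbow angles (degrees).

    A repetition is counted on the "down" -> "up" transition, i.e. when the
    angle rises above up_thresh after having dropped below down_thresh.
    NaN angles (no reliable detection) leave the state unchanged.
    """
    state = "up"
    count = 0
    for angle in angles:
        if math.isnan(angle):
            continue
        if state == "up" and angle < down_thresh:
            state = "down"
        elif state == "down" and angle > up_thresh:
            state = "up"
            count += 1
    return count, state


# Made-up trace: two full repetitions, jitter between the thresholds, then a half repetition
trace = [170, 150, 95, 80, 120, 165, 170, 155, 145, 158, 90, 85, 170, 172, 90]
print(count_push_ups(trace))  # (2, 'down')
```

Values between the two thresholds (such as 145 to 158 in the trace) never change the state, so only complete bend-and-extend cycles are counted.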

In [23]:
start= time.time()
inferenceOnVideo(videoPath)
print(time.time() - start)

total processing time 46.960577726364136
video writing time 3.9445924758911133
50.98910617828369
