# MediaPipe for Bicep Curls

This notebook illustrates how MediaPipe works as our pose estimation tool. At the end of the day, the goal is two things:

1. Have a pose estimation model that is ready to go, so we can track the poses we want.
2. Train our **own** model that will help detect how good our form is.
    1. To do that, we'll probably add some points along someone's back to correctly detect whether the back is straight.


Steps for deadlift form checker:

1. Load video --
2. Instantiate Pose Model --
3. Label Pictures: Pose values & lifting stage.
    * Use www.cvat.ai in order to label each 
4. Normalize video resolutions
    * This will prevent videos in different resolutions from giving different points in our point plane.
5. Extract pose model joint positions.
    * For each image in the video sequence, match joint locations to image
    * Classify as **DOWN**, **LIFTING**, and **UP**
    * Classify as **GOOD_FORM** or **BAD_FORM**
6. Feed this to model. 
    * If we have lots of data: `PyTorch`
    * If not: `scikit-learn`
7. Evaluate performance with/without weighted back features. 
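
Steps 5 and 6 above can be sketched as a small data structure: one labeled row per frame, ready to feed to a classifier. This is only a rough sketch, not the final pipeline. `Stage`, `Form`, `LabeledFrame`, and `landmarks_to_features` are hypothetical names, and the landmarks here are stand-in objects with the same `x`, `y`, `z`, and `visibility` fields that MediaPipe returns.

```python
from dataclasses import dataclass
from enum import Enum
from types import SimpleNamespace
from typing import List


class Stage(Enum):
    DOWN = "DOWN"
    LIFTING = "LIFTING"
    UP = "UP"


class Form(Enum):
    GOOD_FORM = "GOOD_FORM"
    BAD_FORM = "BAD_FORM"


@dataclass
class LabeledFrame:
    features: List[float]  # flattened joint values for one frame
    stage: Stage
    form: Form


def landmarks_to_features(landmarks) -> List[float]:
    """Flatten landmarks into [x0, y0, z0, v0, x1, y1, z1, v1, ...]."""
    features: List[float] = []
    for lm in landmarks:
        features.extend([lm.x, lm.y, lm.z, lm.visibility])
    return features


# Stand-in for one frame of 33 pose landmarks (values are made up).
fake_landmarks = [SimpleNamespace(x=0.5, y=0.5, z=0.0, visibility=1.0) for _ in range(33)]
row = LabeledFrame(landmarks_to_features(fake_landmarks), Stage.DOWN, Form.GOOD_FORM)
print(len(row.features))  # 33 landmarks * 4 values = 132
```

A list of such rows could then be split into a feature matrix and label vectors for whichever library we end up using.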

## Evaluation

1. Do we need more data?
2. Do we want to weight certain joints higher than others?
3. Do we need any angle calculations?



Problems:
1. **How to add additional joints to back?**
    * Do we really NEED additional joints?
2. How can we make a model from this? Or should it just be AI?


# 0. Install and Import Dependencies

In [None]:
!pip install mediapipe opencv-python

In [1]:
import cv2
import mediapipe as mp
import numpy as np

* `mp_drawing` will help us visualize our poses. It contains all the drawing utilities needed for this.
* `mp_pose` is the actual **pose estimation model**

In [2]:
mp_drawing = mp.solutions.drawing_utils
mp_pose = mp.solutions.pose

Now, let's build a video feed using the webcam to test out the pose model.

In [5]:
# Open video feed
capture = cv2.VideoCapture(1)
while capture.isOpened():
    ok, frame = capture.read()
    if not ok: # Stop when the camera returns no frame
        break
    cv2.imshow('Mediapipe Feed', frame)

    # For exiting the live feed
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

capture.release()
cv2.destroyAllWindows()

Cool. Our code works, and we know our Logitech C390 camera is at `VideoCapture` index `1`. Index `0` is actually our EVGA capture for the Wii U.

# 1. Make Detections

We're going to build on our cv2 code above and use the MediaPipe model to make the detections we need.

In [6]:
# Setup Video capture device
capture = cv2.VideoCapture(1)

## Setup Mediapipe instance
with mp_pose.Pose(min_detection_confidence=0.5,
                  min_tracking_confidence=0.5) as pose:
    while capture.isOpened():
        ok, frame = capture.read()
        if not ok: # Stop when the camera returns no frame
            break

        # Recolor our image from BGR to RGB. MediaPipe takes in RGB
        # and, by default, cv2 outputs BGR.
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image.flags.writeable = False # Mark the frame read-only so MediaPipe can pass it by reference

        ## Make Detection. Get detections and store in `results`
        results = pose.process(image)

        ## Recolor back to BGR to show on the feed
        image.flags.writeable = True # Allow the current frame to be written over, now that we have detections
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

        ## Render our detections onto the image
        mp_drawing.draw_landmarks(image,
                                  results.pose_landmarks, # Pass the detected landmark coordinates
                                  mp_pose.POSE_CONNECTIONS, # Pass connections (e.g. right shoulder --> right elbow)
                                  mp_drawing.DrawingSpec(color=(245,117,66), thickness=2, circle_radius=2), # Landmark style
                                  mp_drawing.DrawingSpec(color=(245,66,230), thickness=2, circle_radius=2)) # Connection style

        cv2.imshow('Mediapipe Feed', image)

        # For exiting the live feed
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break

    capture.release()
    cv2.destroyAllWindows()

# 2. Determining Joints

<img src="images/MediaPipe-pose-BlazePose-Topology.jpg" width=80%>

Looking at this landmark map, let's now select only the joints we want to work with. For our planned gym tracker, we'll probably want landmarks 11-25 or so, maybe fewer. For this tutorial, we'll extract only a few.

We'll keep building on the code we already have, step by step.
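
Picking out a joint range like 11-25 is just indexing into the landmark list. Here is a quick sketch with a stand-in list in place of real landmarks. `GYM_JOINTS` and `select_joints` are hypothetical names, and the exact range is still an open question for the gym tracker.

```python
# Hypothetical joint range for the gym tracker: landmark indices 11 through 25.
GYM_JOINTS = range(11, 26)  # range end is exclusive, so this covers 11..25


def select_joints(landmarks, indices=GYM_JOINTS):
    """Return only the landmarks at the requested indices, in order."""
    return [landmarks[i] for i in indices]


# Stand-in for the 33 landmarks: each entry is just its own index here.
all_landmarks = list(range(33))
subset = select_joints(all_landmarks)
print(len(subset), subset[0], subset[-1])  # 15 11 25
```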

In [21]:
# Setup Video capture device
capture = cv2.VideoCapture(1)

## Setup Mediapipe instance
with mp_pose.Pose(min_detection_confidence=0.5,
                  min_tracking_confidence=0.5) as pose:
    while capture.isOpened():
        ok, frame = capture.read()
        if not ok: # Stop when the camera returns no frame
            break

        # Recolor our image from BGR to RGB. MediaPipe takes in RGB
        # and, by default, cv2 outputs BGR.
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image.flags.writeable = False # Mark the frame read-only so MediaPipe can pass it by reference

        ## Make Detection. Get detections and store in `results`
        results = pose.process(image)

        ## Recolor back to BGR to show on the feed
        image.flags.writeable = True # Allow the current frame to be written over, now that we have detections
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

        ### Extract landmarks if present
        # `pose_landmarks` is None when no pose is detected in the frame
        if results.pose_landmarks:
            landmarks = results.pose_landmarks.landmark


        ## Render our detections onto the image
        mp_drawing.draw_landmarks(image,
                                  results.pose_landmarks, # Pass the detected landmark coordinates
                                  mp_pose.POSE_CONNECTIONS, # Pass connections (e.g. right shoulder --> right elbow)
                                  mp_drawing.DrawingSpec(color=(245,117,66), thickness=2, circle_radius=2), # Landmark style
                                  mp_drawing.DrawingSpec(color=(245,66,230), thickness=2, circle_radius=2)) # Connection style

        cv2.imshow('Mediapipe Feed', image)

        # For exiting the live feed
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break

    capture.release()
    cv2.destroyAllWindows()

[x: 0.3004933297634125
y: 0.6538435220718384
z: -1.7436082363128662
visibility: 0.9998043179512024
, x: 0.33097073435783386
y: 0.5685827732086182
z: -1.6803086996078491
visibility: 0.9997573494911194
, x: 0.35128259658813477
y: 0.5668536424636841
z: -1.6802794933319092
visibility: 0.9997733235359192
, x: 0.3712272346019745
y: 0.565565824508667
z: -1.6803932189941406
visibility: 0.9997946619987488
, x: 0.259085476398468
y: 0.5628026723861694
z: -1.680360198020935
visibility: 0.9996547698974609
, x: 0.23475810885429382
y: 0.5583418607711792
z: -1.6793134212493896
visibility: 0.999568521976471
, x: 0.21109294891357422
y: 0.5563806891441345
z: -1.6792024374008179
visibility: 0.9994754195213318
, x: 0.40008267760276794
y: 0.5880357027053833
z: -1.127402663230896
visibility: 0.9998425245285034
, x: 0.17777195572853088
y: 0.5798267126083374
z: -1.1023958921432495
visibility: 0.9995558857917786
, x: 0.34136074781417847
y: 0.7356288433074951
z: -1.5239176750183105
visibility: 0.999862551689148


Check it: the model attempts to find all 33 landmarks.

In [8]:
len(landmarks)

33

Here, we want to extract `left_shoulder`, `left_elbow`, and `left_wrist` to help track our bicep curl. Using the landmark map values, let's grab all three.

In [22]:
landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER.value]

x: 0.4351707100868225
y: 0.8366254568099976
z: -0.5627273917198181
visibility: 0.998735249042511

In [23]:
landmarks[mp_pose.PoseLandmark.LEFT_ELBOW.value]

x: 0.8439351320266724
y: 0.8652452826499939
z: -1.2163678407669067
visibility: 0.9650174379348755

In [24]:
landmarks[mp_pose.PoseLandmark.LEFT_WRIST.value]

x: 0.7855621576309204
y: 0.27017301321029663
z: -1.888045072555542
visibility: 0.9947842955589294
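
The `x` and `y` values above are normalized by the frame width and height, so they need to be scaled before they can be used as pixel positions. Here is a rough sketch of that conversion, assuming a 640x480 frame. The frame size and the `to_pixels` helper are assumptions, not values from this notebook.

```python
def to_pixels(x, y, frame_width, frame_height):
    """Scale normalized landmark coordinates to integer pixel coordinates."""
    return int(x * frame_width), int(y * frame_height)


# LEFT_ELBOW values from the output above, on an assumed 640x480 frame.
elbow_px = to_pixels(0.8439351320266724, 0.8652452826499939, 640, 480)
print(elbow_px)  # (540, 415)
```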

# 3. Calculate Angles

This is based on iOS code that does the same thing, rewritten in Python. We'll make a function that takes the extracted points and calculates the angle between them.

In [16]:
def calculate_angle(a, b, c):
    """
    Calculate the angle (in degrees) at joint b, formed by joints a and c.
    a: first joint as [x, y], e.g. landmark 11 (left shoulder)
    b: middle joint as [x, y], e.g. landmark 13 (left elbow)
    c: third joint as [x, y], e.g. landmark 15 (left wrist)
    """
    a = np.array(a)
    b = np.array(b)
    c = np.array(c)

    # Direction of each segment (b->c and b->a) relative to the x-axis, in radians
    radians = np.arctan2(c[1]-b[1], c[0]-b[0]) - np.arctan2(a[1]-b[1], a[0]-b[0])
    # Convert the difference to degrees
    angle = np.abs(radians*(180.0/np.pi))

    # Prevent angle above 180, because we are only human
    if angle > 180:
        angle = 360 - angle

    return angle
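
To see why the check at 180 matters, here is a standalone sketch of the same `arctan2` approach using only the standard library. It assumes the intent is to fold an angle above 180 degrees back to `360 - angle`. `joint_angle` is a hypothetical name.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b between segments b->a and b->c, in [0, 180]."""
    # Direction of each segment relative to the x-axis, in radians.
    to_c = math.atan2(c[1] - b[1], c[0] - b[0])
    to_a = math.atan2(a[1] - b[1], a[0] - b[0])
    angle = abs(math.degrees(to_c - to_a))
    # The raw difference can be as large as 360; fold it back to the inner angle.
    if angle > 180:
        angle = 360 - angle
    return angle


# Right angle at the origin.
print(round(joint_angle((1, 0), (0, 0), (0, 1)), 6))  # 90.0

# Segments at +170 and -170 degrees: raw difference is 340, inner angle is 20.
a = (math.cos(math.radians(170)), math.sin(math.radians(170)))
c = (math.cos(math.radians(-170)), math.sin(math.radians(-170)))
print(round(joint_angle(a, (0, 0), c), 6))  # 20.0
```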

Let's test it now!

In [25]:
left_shoulder = [landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER.value].x, landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER.value].y]
left_elbow = [landmarks[mp_pose.PoseLandmark.LEFT_ELBOW.value].x, landmarks[mp_pose.PoseLandmark.LEFT_ELBOW.value].y]
left_wrist = [landmarks[mp_pose.PoseLandmark.LEFT_WRIST.value].x, landmarks[mp_pose.PoseLandmark.LEFT_WRIST.value].y]

calculate_angle(left_shoulder,
                left_elbow,
                left_wrist)

80.39250195790153
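
As a sanity check, the same angle can be computed with the dot-product formula, cos θ = (u · v) / (|u| |v|). This sketch uses the shoulder, elbow, and wrist values printed earlier. `angle_from_dot` is a hypothetical helper, not part of the notebook.

```python
import math


def angle_from_dot(a, b, c):
    """Angle in degrees at b, from the dot product of vectors b->a and b->c."""
    u = (a[0] - b[0], a[1] - b[1])
    v = (c[0] - b[0], c[1] - b[1])
    dot = u[0] * v[0] + u[1] * v[1]
    cos_theta = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to guard against tiny floating-point overshoot outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


left_shoulder = (0.4351707100868225, 0.8366254568099976)
left_elbow = (0.8439351320266724, 0.8652452826499939)
left_wrist = (0.7855621576309204, 0.27017301321029663)

print(round(angle_from_dot(left_shoulder, left_elbow, left_wrist), 2))  # 80.39
```

The dot-product form never exceeds 180 degrees, so it needs no extra folding step, but it does divide by the segment lengths and would fail if two joints coincide.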

In [31]:
# Setup Video capture device
capture = cv2.VideoCapture(1)

## Setup Mediapipe instance
with mp_pose.Pose(min_detection_confidence=0.5,
                  min_tracking_confidence=0.5) as pose:
    while capture.isOpened():
        ok, frame = capture.read()
        if not ok: # Stop when the camera returns no frame
            break

        # Recolor our image from BGR to RGB. MediaPipe takes in RGB
        # and, by default, cv2 outputs BGR.
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image.flags.writeable = False # Mark the frame read-only so MediaPipe can pass it by reference

        ## Make Detection. Get detections and store in `results`
        results = pose.process(image)

        ## Recolor back to BGR to show on the feed
        image.flags.writeable = True # Allow the current frame to be written over, now that we have detections
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

        ### Extract landmarks if present
        # `pose_landmarks` is None when no pose is detected in the frame
        if results.pose_landmarks:
            landmarks = results.pose_landmarks.landmark

            #### Get Coordinates
            left_shoulder = [landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER.value].x, landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER.value].y]
            left_elbow = [landmarks[mp_pose.PoseLandmark.LEFT_ELBOW.value].x, landmarks[mp_pose.PoseLandmark.LEFT_ELBOW.value].y]
            left_wrist = [landmarks[mp_pose.PoseLandmark.LEFT_WRIST.value].x, landmarks[mp_pose.PoseLandmark.LEFT_WRIST.value].y]

            #### Calculate angle
            angle = calculate_angle(left_shoulder, left_elbow, left_wrist)

            #### Visualize our value on the screen, next to the elbow
            cv2.putText(image, str(angle),
                        tuple(np.multiply(left_elbow, [capture.get(cv2.CAP_PROP_FRAME_WIDTH),
                                                       capture.get(cv2.CAP_PROP_FRAME_HEIGHT)]).astype(int)),
                        cv2.FONT_HERSHEY_COMPLEX, 0.5, (255,255,255), 2, cv2.LINE_AA)


        ## Render our detections onto the image
        mp_drawing.draw_landmarks(image,
                                  results.pose_landmarks, # Pass the detected landmark coordinates
                                  mp_pose.POSE_CONNECTIONS, # Pass connections (e.g. right shoulder --> right elbow)
                                  mp_drawing.DrawingSpec(color=(245,117,66), thickness=2, circle_radius=2), # Landmark style
                                  mp_drawing.DrawingSpec(color=(245,66,230), thickness=2, circle_radius=2)) # Connection style

        cv2.imshow('Mediapipe Feed', image)

        # For exiting the live feed
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break

    capture.release()
    cv2.destroyAllWindows()

# 4. Curl Counter

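Now, some logic to know when we are curling: the arm is **down** once the elbow angle goes above 130 degrees, and a rep is counted when the angle then drops below 45 degrees.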

In [38]:
# Setup Video capture device
capture = cv2.VideoCapture(1)

##### Curl counter variables
counter = 0
stage = None

## Setup Mediapipe instance
with mp_pose.Pose(min_detection_confidence=0.5,
                  min_tracking_confidence=0.5) as pose:
    while capture.isOpened():
        ok, frame = capture.read()
        if not ok: # Stop when the camera returns no frame
            break

        # Recolor our image from BGR to RGB. MediaPipe takes in RGB
        # and, by default, cv2 outputs BGR.
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image.flags.writeable = False # Mark the frame read-only so MediaPipe can pass it by reference

        ## Make Detection. Get detections and store in `results`
        results = pose.process(image)

        ## Recolor back to BGR to show on the feed
        image.flags.writeable = True # Allow the current frame to be written over, now that we have detections
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

        ### Extract landmarks if present
        # `pose_landmarks` is None when no pose is detected in the frame
        if results.pose_landmarks:
            landmarks = results.pose_landmarks.landmark

            #### Get Coordinates
            left_shoulder = [landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER.value].x, landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER.value].y]
            left_elbow = [landmarks[mp_pose.PoseLandmark.LEFT_ELBOW.value].x, landmarks[mp_pose.PoseLandmark.LEFT_ELBOW.value].y]
            left_wrist = [landmarks[mp_pose.PoseLandmark.LEFT_WRIST.value].x, landmarks[mp_pose.PoseLandmark.LEFT_WRIST.value].y]

            #### Calculate angle
            angle = calculate_angle(left_shoulder, left_elbow, left_wrist)

            #### Visualize our value on the screen, next to the elbow
            cv2.putText(image, str(angle),
                        tuple(np.multiply(left_elbow, [capture.get(cv2.CAP_PROP_FRAME_WIDTH),
                                                       capture.get(cv2.CAP_PROP_FRAME_HEIGHT)]).astype(int)),
                        cv2.FONT_HERSHEY_COMPLEX, 0.5, (255,255,255), 2, cv2.LINE_AA)

            #### Curl counter logic (only runs when a pose was detected,
            #### so `angle` is always defined here)
            if angle > 130:
                stage = "down"
            elif angle < 45 and stage == 'down': # Check if we are coming back from a down position
                stage = "up"
                counter += 1
                print(counter)


        ## Render our detections onto the image
        mp_drawing.draw_landmarks(image,
                                  results.pose_landmarks, # Pass the detected landmark coordinates
                                  mp_pose.POSE_CONNECTIONS, # Pass connections (e.g. right shoulder --> right elbow)
                                  mp_drawing.DrawingSpec(color=(245,117,66), thickness=2, circle_radius=2), # Landmark style
                                  mp_drawing.DrawingSpec(color=(245,66,230), thickness=2, circle_radius=2)) # Connection style

        ##### Render our text to the screen
        # Create setup box for the counter text
        cv2.rectangle(image,
                      (0,0), (255, 73), # Coordinates
                      (245,117,16), #color
                      -1) #Fill box with color
        # Render the Counter
        cv2.putText(image, "REPS:",
                    (15,12),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0,0,0), 1, cv2.LINE_AA)
        cv2.putText(image, str(counter),
                    (10,60),
                    cv2.FONT_HERSHEY_SIMPLEX, 2, (255,255,255), 2, cv2.LINE_AA)

        # Render the Stage
        cv2.putText(image, "STAGE:",
                    (65,12),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0,0,0), 1, cv2.LINE_AA)
        cv2.putText(image, str(stage),
                    (60,60),
                    cv2.FONT_HERSHEY_SIMPLEX, 2, (255,255,255), 2, cv2.LINE_AA)

        cv2.imshow('Mediapipe Feed', image)

        # For exiting the live feed
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break

    capture.release()
    cv2.destroyAllWindows()

1
2
3
4
5
6
7
8
9
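
The counting logic in the loop above can be pulled out of the video code and tried on a plain list of angles. This is a rough sketch using the same 130 and 45 degree thresholds. `count_reps` and the angle trace are made up for illustration.

```python
def count_reps(angles, down_threshold=130, up_threshold=45):
    """Count curls in a sequence of elbow angles (degrees)."""
    counter = 0
    stage = None
    for angle in angles:
        if angle > down_threshold:
            stage = "down"
        elif angle < up_threshold and stage == "down":
            # Only count a rep when coming up from a full down position.
            stage = "up"
            counter += 1
    return counter, stage


# Made-up angle trace: two full curls, then a partial one that never reaches 45.
trace = [160, 120, 60, 30, 90, 150, 100, 40, 35, 140, 80]
print(count_reps(trace))  # (2, 'down')
```

Because a rep only counts after a "down" stage, staying curled (35 right after 40 in the trace) does not count twice, and a half rep that stops at 80 does not count at all.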
