# [Click here for workshop instructions](https://msu-ai.notion.site/Workshop-Instructions-06d5c272f263455a97ec9795ff1c7704)

# Step 1: Load an image
Before we can do any pose estimation, we need an image of a person. Run the following command to download a photo of MSU's own Tom Izzo:

In [None]:
!curl -o tom-izzo.webp 'https://cdn.vox-cdn.com/thumbor/-5PVeiyaxPciGcZFfS21PkNxbwQ=/0x0:4152x2880/1820x1213/filters:focal(1762x144:2426x808):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/63298666/usa_today_12318130.0.jpg'

After running the above command, the file `tom-izzo.webp` should exist in the files tab on the left. Let's use some Python code to display it. You'll need to update this code so that it reads the file:

In [None]:
import cv2
import matplotlib.pyplot as plt

# Load the image:
# (Do you remember how to read an image file?
# Consult last week's instructions if you need help.)
image = # ???

# Convert from OpenCV's BGR channel order to RGB, which matplotlib and mediapipe expect
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Display the image using matplotlib
plt.imshow(image)
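
To see what the BGR-to-RGB conversion does, here is a minimal sketch that assumes only NumPy. The one-pixel `bgr` array is a made-up stand-in for an image loaded by OpenCV, and reversing the last axis is shown only to illustrate the channel swap, not as a replacement for `cv2.cvtColor`:

```python
import numpy as np

# A hypothetical 1x1 "image" holding a single pure-blue pixel.
# OpenCV stores channels in blue, green, red (BGR) order.
bgr = np.zeros((1, 1, 3), dtype=np.uint8)
bgr[0, 0] = (255, 0, 0)

# Reversing the channel axis turns BGR into RGB.
rgb = bgr[..., ::-1]

print("BGR pixel:", bgr[0, 0].tolist())
print("RGB pixel:", rgb[0, 0].tolist())
```

If the conversion is skipped, matplotlib reads the blue channel as red, so the colors in the displayed photo appear swapped.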

# Step 2: Perform pose estimation
Amazing! Now we want to look at the image of Tom and determine where all his limbs are. This is called pose estimation, and luckily for us, there's a Python library called [mediapipe](https://google.github.io/mediapipe/solutions/pose.html) that does exactly this. Mediapipe is made by Google, and it contains many pre-trained neural networks for a variety of computer vision tasks.

Run the following command to install the mediapipe package:

In [None]:
%pip install mediapipe

Great! Now that mediapipe is installed, let's use it to compute Tom's pose and display the results. **The following code to compute Tom's pose has a bug. Can you fix it?**

In [None]:
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt

# Load the image and convert to RGB format
image = cv2.imread("tom-izzo.webp")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Compute the pose
pose = mp.solutions.pose.Pose()
results = pose.process(img)

# Draw the pose on the image
mp.solutions.drawing_utils.draw_landmarks(
    image,
    results.pose_landmarks,
    mp.solutions.pose.POSE_CONNECTIONS
)

# Display the image
plt.imshow(image)

Amazing! If all went well, you should see lines drawn on Tom Izzo's body indicating his pose.

As you can see, mediapipe has done a lot of heavy lifting for us. We created a `pose` object and asked it to process our image, and it gave us some `results` (in the form of cryptic numbers). Then, mediapipe even helped us draw the results onto the image so we can see what the computer is thinking!

# Step 3: Understand the results
Just drawing the results isn't enough. If we really want to do anything useful, we need cold, hard numbers. So it's time to dive into the `results` ourselves.

Run this line of code to investigate the `results` variable that we computed above:

In [None]:
print(results)

**Hmm... 🤔** This isn't helpful at all. The print statement just tells us that we have an instance of the class `SolutionOutputs`. We need to dive deeper.

In the earlier code, we used `results.pose_landmarks`. Try editing the following to print out `results.pose_landmarks` so we can take a look inside:

In [None]:
print() # Edit me

**Aha!** This is much better. `pose_landmarks` holds a long list of landmark points on the image. As you can see, there are many landmarks, and each landmark has an x, y, z, and visibility.

To understand what these mean, take a look at the following diagram, provided by mediapipe:

![Pose landmark map](https://google.github.io/mediapipe/images/mobile/pose_tracking_full_body_landmarks.png)

As you can see, each "landmark" (special point) on the body is assigned a number. The nose is number 0, the right thumb is number 22, and so on. `results.pose_landmarks` contains these landmarks in order.

Strangely, `results.pose_landmarks` is an object, and to get the actual list, you have to write `results.pose_landmarks.landmark`. So if you want to get the location of the left eye (which is landmark number 2), you would write `results.pose_landmarks.landmark[2]`.
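
Bare numbers like `2` are easy to mix up, so it can help to give them names. The sketch below uses a hand-written `Landmark` enum and a placeholder list instead of real mediapipe output, so both are assumptions for illustration; only the index values come from the landmark map above. (Mediapipe also provides its own enum, `mp.solutions.pose.PoseLandmark`, which can be used in a similar way.)

```python
from enum import IntEnum


class Landmark(IntEnum):
    """Hypothetical names for a few landmark numbers from the map."""
    NOSE = 0
    LEFT_EYE = 2
    RIGHT_THUMB = 22


# Placeholder standing in for results.pose_landmarks.landmark,
# which holds 33 entries (numbered 0 through 32).
fake_landmarks = [f"landmark {i}" for i in range(33)]

# An IntEnum member works anywhere a plain integer index does.
left_eye = fake_landmarks[Landmark.LEFT_EYE]
print(left_eye)
```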

Try printing out the location of Tom Izzo's left wrist landmark:

In [None]:
print() # Edit me

You should get `x: 0.66499` and `y: 0.78431`. Do these numbers make sense? Let's check back in on the image of Tom we generated earlier:

In [None]:
# Display the image we generated earlier of Tom and his pose
plt.imshow(image)

Hmmm... Since we're facing Tom, his left wrist is actually the one on the right side of the picture. Based on the coordinates shown on the plot axes, his wrist appears to be at about `x: 1250` and `y: 950` (give or take). But that's not what was computed!

**Question:** Why `x: 0.66499` and `y: 0.78431`?

# Step 4: Convert the results to useful coordinates

Hopefully you've taken a moment to think about the previous question. What you might have noticed is that these decimal values are sort of like percentages across the width and height of the image. Tom's left wrist is indeed about 66.5% of the way from left to right (so a bit right of center) and about 78.4% of the way from top to bottom (so pretty close to the bottom of the image).

This percentage format is cute, but it isn't very helpful to us. To do anything useful, we need to convert it to the standard pixel coordinates.

Your job is to write a function to do just that. (Hint: You'll need to use `image.shape`.)
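
As a reminder of what `image.shape` holds, here is a small sketch that uses a blank NumPy array as a stand-in for a loaded image (the 200-by-100 size is made up). Note the order: height comes first, then width, then the number of color channels.

```python
import numpy as np

# A hypothetical blank image: 100 pixels tall, 200 pixels wide, 3 color channels.
blank = np.zeros((100, 200, 3), dtype=np.uint8)

height, width, channels = blank.shape
print("height:", height)
print("width:", width)
print("channels:", channels)
```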

In [None]:
# Your job is to complete the toPixelCoordinates function
# to convert x and y from decimals to pixel coordinates

def toPixelCoordinates(x, y, image):
  # Compute the pixel coordinates:
  new_x = # ???
  new_y = # ???

  # Convert to whole numbers (int() drops the decimal part; important for later)
  new_x = int(new_x)
  new_y = int(new_y)

  return (new_x, new_y)


# Try using our new function to make sure it works:

left_wrist = results.pose_landmarks.landmark[15]
print(toPixelCoordinates(left_wrist.x, left_wrist.y, image))

# This *should* print (1210, 951)

Make sure you get `(1210, 951)`. Once your function is working, you can use OpenCV to draw a circle on a particular landmark you're interested in. Edit the following code to draw the circle at the location of the left wrist. (Right now it just draws a circle at `(100, 100)`, which is not what we want.)

In [None]:
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt

# Load the image and convert to RGB format
image = cv2.imread("tom-izzo.webp")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Compute the pose
pose = mp.solutions.pose.Pose()
results = pose.process(image)

# Draw a circle on one particular landmark
left_wrist = results.pose_landmarks.landmark[15]
pixel_coords = toPixelCoordinates(left_wrist.x, left_wrist.y, image)
print("Left wrist coordinates:", pixel_coords)

# THIS IS WRONG
# Change it to draw the circle in the correct location...
cv2.circle(image, center=(100, 100), radius=50, color=(255, 0, 0), thickness=-1)

# Display the image
plt.imshow(image)

Once you have the above code drawing a circle on the left wrist, try changing it to plot other landmarks. Can you give him a clown nose? (You can consult [the landmark map](https://google.github.io/mediapipe/images/mobile/pose_tracking_full_body_landmarks.png) to figure out which number to use.)

Hopefully you can successfully plot any landmark!

# Step 5: Begin building a game
So far we've been hanging out in this Python notebook, and it has been a good time. But, like a bird learning to fly, it's time to leave the nest and make something real.

<center>
<img src="http://blogs.bu.edu/bioaerial2012/files/2012/10/Baby-Bird-Learning-to-Fly1.jpg" alt="Mother bird aggressively yeets baby out of the nest" width="200" />
</center>

This is necessary because we want to use the webcam at a high framerate, and that can only happen by running directly on your own machine, not in Google Colab.

So your job is to create a Python file on your computer and copy + paste in the following code. (If you need help with this, consult the FAQ page.)

```python
import cv2
import mediapipe as mp

def toPixelCoordinates(x, y, image):
  # FILL THIS IN!
  # ??????
  return (new_x, new_y)

cap = cv2.VideoCapture(0)

with mp.solutions.pose.Pose() as pose:
    while cap.isOpened():
        success, image = cap.read()
        if not success:
            print("Ignoring empty camera frame.")
            continue

        # Convert the image from BGR to RGB format:
        image = # ???

        # Flip the image horizontally:
        # (Without this, the game is very confusing to play.)
        image = cv2.flip(image, 1)

        # Compute the pose results:
        # (Note that the `pose` variable already exists.)
        results = # ???

        # Convert the image back to BGR format:
        image = # ???

        # Draw the pose annotation on the image.
        mp.solutions.drawing_utils.draw_landmarks(
            # ?????
        )

        # Highlight the wrists
        # Left wrist
        left_wrist = results.pose_landmarks.landmark[15]
        left_point = toPixelCoordinates(left_wrist.x, left_wrist.y, image)
        cv2.circle(image, center=left_point, radius=50, color=(255, 0, 0), thickness=-1)

        # Right wrist
        right_wrist = # ???
        right_point = # ???
        cv2.circle() # ???

        # Display the image:
        cv2.imshow("Game", image)

        if cv2.waitKey(1) & 0xFF == ord("q"):  # Press "q" to quit.
            break

cap.release()
```

Notice that the code contains a few missing pieces, indicated by question marks (`???`). Using what you've learned from the notebook, you should be able to fill them in.

Once you fill in all the gaps, you should have a program that displays your webcam on screen and highlights your wrists with circles.
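
One thing to keep in mind when testing: if nobody is in front of the camera, mediapipe sets `results.pose_landmarks` to `None`, and indexing into it raises an error. A minimal sketch of one way to guard against that is below. The `get_landmark` helper and the `SimpleNamespace` stand-ins are hypothetical, used here so the example runs without a webcam:

```python
from types import SimpleNamespace


def get_landmark(results, index):
    """Return the landmark at `index`, or None if no pose was detected."""
    if results.pose_landmarks is None:
        return None
    return results.pose_landmarks.landmark[index]


# Stand-in for a frame where no person was found.
empty_results = SimpleNamespace(pose_landmarks=None)

# Stand-in for a frame with a detected pose (only one landmark, for brevity).
nose = SimpleNamespace(x=0.5, y=0.25)
found_results = SimpleNamespace(pose_landmarks=SimpleNamespace(landmark=[nose]))

print(get_landmark(empty_results, 0))
print(get_landmark(found_results, 0).x)
```

In the game loop, you could skip the wrist-highlighting code for any frame where the helper returns `None`.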

# Step 6: Add targets to hit

Just detecting your hand motion is nice, but it isn't really a *game*. Let's add a goal by creating targets for the player to hit.

First, copy + paste the following code into your python file near the top, just below all the imports:

```python
from random import random


class Target:
    def __init__(self, image):
        # Give this target a randomized x, y, and radius
        self.radius = random() * 50 + 50
        self.x = random() * image.shape[1]
        self.y = random() * image.shape[0]

    def draw(self, image):
        # Draw this target on the screen
        cv2.circle(image, (int(self.x), int(self.y)),
                   int(self.radius), (255, 0, 0), -1)

    def point_is_within_me(self, x, y):
        # Use the distance formula to determine if the point is within the circle
        return (x - self.x) ** 2 + (y - self.y) ** 2 < self.radius ** 2

    def hit_by_points(self, points):
        for point in points:
            if self.point_is_within_me(point[0], point[1]):
                return True
        return False
```

This code creates a new `Target` class that allows us to add new targets any time we want.
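
The hit test in `point_is_within_me` compares squared distances, which avoids taking a square root. Here is a standalone sketch of the same check; the `point_in_circle` function and the sample numbers are made up for illustration:

```python
def point_in_circle(px, py, cx, cy, radius):
    """True if (px, py) lies strictly inside the circle centered at (cx, cy)."""
    # Squared distance from the point to the center.
    dist_sq = (px - cx) ** 2 + (py - cy) ** 2
    return dist_sq < radius ** 2


# The point (3, 4) is exactly 5 units from the origin.
print(point_in_circle(3, 4, 0, 0, 6))  # inside a circle of radius 6
print(point_in_circle(3, 4, 0, 0, 5))  # on the edge, so not strictly inside
```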

Then, add `targets = []` before the main program loop to create an empty list of targets to hit.

Next, immediately after the code that converts the image back from RGB to BGR, add the following loops to display the targets and make sure there are always at least 5 of them on screen:

```python
# Make sure there are at least 5 targets:
while len(targets) < 5:
    targets.append(Target(image))

# Display the targets:
for target in targets:
    target.draw(image)
```

Now when you run the code, you should see 5 targets on screen. But how can we allow you to hit them?

This is where the pose detection comes in. In the code that highlights the wrists with circles, we already have `left_point` and `right_point`, the coordinates of the wrists in the image. After that wrist-highlighting code, add this loop, which checks each target to see if it is currently being hit by a wrist:

```python
# Delete targets that were hit:
# (Loop over a copy of the list so that removing items doesn't skip any targets.)
for target in targets[:]:
    if target.hit_by_points([left_point, right_point]):
        targets.remove(target)
```

Once that's done, you'll be able to play the game!
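
One caveat about loops that delete from a list: removing items from a list while looping over that same list can cause Python to skip the element right after each removal. Looping over a copy avoids this. The sketch below shows the difference using plain numbers in place of targets:

```python
# Remove even numbers while looping over the same list.
numbers = [2, 4, 6, 1]
for n in numbers:
    if n % 2 == 0:
        numbers.remove(n)
print(numbers)  # 4 was skipped, so it is still in the list

# Loop over a copy (numbers_safe[:]) and remove from the original instead.
numbers_safe = [2, 4, 6, 1]
for n in numbers_safe[:]:
    if n % 2 == 0:
        numbers_safe.remove(n)
print(numbers_safe)
```

In the game, a skipped target would usually be caught on the next frame, but looping over a copy keeps the behavior predictable.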

# Step 7: Upgrade the game

Now we have a working game! From here on, we just want to make upgrades that make it more fun.

## Upgrade 1: Keep track of points
Try creating a "points" variable to track how many targets you've hit. Display your score on-screen using OpenCV. (Refer to [last week's OpenCV workshop](https://msu-ai.notion.site/Workshop-Instructions-71ae82f6d8d9452586e6626ffb48e1b9) if needed, or use Google for help with drawing text on screen.)

## Upgrade 2: Play with your feet
Right now the game is played using your hands, by tracking the location of your wrists. Try changing the game to be played with your feet instead!

## Upgrade 3 (challenge): Set a time limit
Try adding a time limit to the game! When time runs out, the game should stop.
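
One possible approach, sketched under the assumption that a fixed number of seconds is enough: record the start time with `time.monotonic()` and compare against it each frame. The `seconds_left` helper and the 30-second limit are hypothetical choices:

```python
import time

GAME_SECONDS = 30  # hypothetical time limit


def seconds_left(start, now, limit=GAME_SECONDS):
    """Return the time remaining, never going below zero."""
    return max(0.0, limit - (now - start))


start = time.monotonic()

# Inside the game loop, you would call this once per frame:
remaining = seconds_left(start, time.monotonic())
print(f"Time left: {remaining:.1f}s")
```

When `seconds_left` returns 0, the loop can `break` just as it does when "q" is pressed.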

## Upgrade 4 (challenge): Create moving targets
Right now, the targets appear but they never move. Consider making moving targets instead!

You might find it helpful to add `self.vel_x` and `self.vel_y` variables to the target so that you can keep track of its velocity in the x and y directions. Then create a method on the target that updates the x and y positions accordingly, and call it once per frame:
```python
self.x += self.vel_x
self.y += self.vel_y
```
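
Putting that together, here is a minimal sketch of the idea. The `MovingPoint` class, its `update` method, and the bounce-off-the-edges behavior are assumptions for illustration; the workshop's `Target` class could gain a similar method:

```python
class MovingPoint:
    """A hypothetical stand-in for a target that has a position and velocity."""

    def __init__(self, x, y, vel_x, vel_y):
        self.x = x
        self.y = y
        self.vel_x = vel_x
        self.vel_y = vel_y

    def update(self, width, height):
        # Move by one frame's worth of velocity.
        self.x += self.vel_x
        self.y += self.vel_y

        # Reverse direction when the point leaves the screen.
        if self.x < 0 or self.x > width:
            self.vel_x = -self.vel_x
        if self.y < 0 or self.y > height:
            self.vel_y = -self.vel_y


point = MovingPoint(x=95, y=50, vel_x=10, vel_y=-5)
point.update(width=100, height=100)
print(point.x, point.y, point.vel_x, point.vel_y)
```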

## Upgrade 5 (challenge): Add sound effects
Every game is better with sound.