# How to Run Inference & Extract Data from a Pre-Trained YOLO Detection Model - Glove Framing Tracking
---
If you have any questions, please contact the authors of the repository.

## Pre-work

Let's make sure we have access to a GPU by running the `nvidia-smi` command. If it fails, navigate to `Edit` -> `Notebook settings` -> `Hardware accelerator`, set it to `GPU`, and click `Save`.

In [1]:
!nvidia-smi

Thu Oct 17 03:50:12 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA L4                      Off | 00000000:00:03.0 Off |                    0 |
| N/A   40C    P8              12W /  72W |      1MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
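
The same check can also be made from Python. The following is a minimal sketch rather than part of the original notebook: the helper names `pick_device` and `has_nvidia_smi` are hypothetical, and the only PyTorch call used is `torch.cuda.is_available()`, which the tracking function later in this notebook also relies on.

```python
import shutil


def pick_device() -> str:
    """Return 'cuda' when PyTorch can see a GPU, otherwise 'cpu'."""
    try:
        import torch  # imported lazily so the check also runs where PyTorch is missing
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def has_nvidia_smi() -> bool:
    """Report whether the nvidia-smi executable is on PATH."""
    return shutil.which("nvidia-smi") is not None


device = pick_device()
print(f"Inference device: {device}; nvidia-smi found: {has_nvidia_smi()}")
```

If this reports `cpu` on Colab, the hardware accelerator setting described above is the first thing to check.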
                                                                    

## Clone BaseballCV Repo, set as Current Directory and Install Requirements

In [2]:
!git clone https://github.com/dylandru/BaseballCV.git
%cd BaseballCV
!pip install -r requirements.txt

Cloning into 'BaseballCV'...
remote: Enumerating objects: 773, done.
remote: Counting objects: 100% (206/206), done.
remote: Compressing objects: 100% (182/182), done.
remote: Total 773 (delta 65), reused 68 (delta 20), pack-reused 567 (from 1)
Receiving objects: 100% (773/773), 335.65 MiB | 21.04 MiB/s, done.
Resolving deltas: 100% (294/294), done.
/content/BaseballCV
Collecting bs4==0.0.2 (from -r requirements.txt (line 1))
  Downloading bs4-0.0.2-py2.py3-none-any.whl.metadata (411 bytes)
Collecting pip==24.0 (from -r requirements.txt (line 4))
  Downloading pip-24.0-py3-none-any.whl.metadata (3.6 kB)
Collecting pybaseball==2.2.7 (from -r requirements.txt (line 5))
  Downloading pybaseball-2.2.7-py3-none-any.whl.metadata (11 kB)
Collecting pytest==8.3.2 (from -r requirements.txt (line 6))
  Downloading pytest-8.3.2-py3-none-any.whl.metadata (7.5 kB)
Collecting ultralytics>=8.2.90 (from -r requirements.txt (line 7))
  Downloading ultralytics-8.3.15-py3-none-any.whl.metadat

## Import required libraries

In [3]:
from ultralytics import YOLO
import cv2
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
import torch
import moviepy.editor as mpy
from baseballcv.functions import LoadTools

# Initialize LoadTools class
load_tools = LoadTools()

Creating new Ultralytics Settings v0.0.6 file ✅ 
View Ultralytics Settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.json'
Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.





## Define the glove movement tracking function

In [4]:
def track_glove_movement(model, video_path, output_path='glove_tracking.mp4'):
    cap = cv2.VideoCapture(video_path)

    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS)  # keep fractional rates (e.g. 29.97) instead of truncating

    # Perspective transformation points

    # Use basic source points based on broadcast - will update to be more dynamic
    src_points = np.array([
        [frame_width * 0.3, frame_height * 0.8],
        [frame_width * 0.7, frame_height * 0.8],
        [frame_width * 0.7, frame_height * 0.2],
        [frame_width * 0.3, frame_height * 0.2]
    ], dtype=np.float32)

    dst_height = frame_height
    dst_width = int(dst_height * 0.4)
    dst_points = np.array([
        [0, dst_height - 1],
        [dst_width - 1, dst_height - 1],
        [dst_width - 1, 0],
        [0, 0]
    ], dtype=np.float32)

    M = cv2.getPerspectiveTransform(src_points, dst_points)

    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (frame_width, frame_height))

    glove_positions = []
    glove_boxes = []

    play_id = ''

    # Set up plot figure
    fig = plt.figure(figsize=(16, 6))
    gs = fig.add_gridspec(1, 4, width_ratios=[1, 0.01, 0.4, 0.020])

    ax1 = fig.add_subplot(gs[0])
    ax2 = fig.add_subplot(gs[2])
    fig.add_subplot(gs[3]).set_visible(False)
    fig.add_subplot(gs[1]).set_visible(False)

    plt.ion()
    fig.suptitle(
        f'Glove Movement throughout Play {play_id}',
        fontsize=16,
        fontweight='bold'
    )

    glove_img = plt.imread('/content/BaseballCV/assets/baseball_glove.png')

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        # YOLO inference
        results = model(frame, device='cuda' if torch.cuda.is_available() else 'cpu')

        for r in results:
            boxes = r.boxes
            for box in boxes:
                if int(box.cls[0]) == 0:  # Assumes class ID 0 is the glove; verify against model.names
                    x1, y1, x2, y2 = box.xyxy[0].tolist()
                    glove_center = ((x1 + x2) / 2, (y1 + y2) / 2)

                    flat_glove_point = cv2.perspectiveTransform(np.array([[glove_center]]), M)[0][0]

                    glove_positions.append(flat_glove_point)
                    glove_boxes.append((x1, y1, x2, y2))

                    centroid_x = int((x1 + x2) / 2)
                    centroid_y = int((y1 + y2) / 2)

                    cv2.circle(frame, (centroid_x, centroid_y), 5, (0, 0, 255), -1)  # Red dot
                    cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)

        ax1.clear()
        ax1.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        ax1.axis('off')

        ax2.clear()
        ax2.set_facecolor('blue')
        ax2.add_patch(plt.Rectangle((0, 0), dst_width, dst_height, fill=True, color='blue'))
        if glove_positions:
            x = [pos[0] for pos in glove_positions]
            y = [pos[1] for pos in glove_positions]
            ax2.plot(x, y, 'r-')

            im = OffsetImage(glove_img, zoom=0.04)
            ab = AnnotationBbox(im, (x[-1], y[-1]), xycoords='data', frameon=False)
            ax2.add_artist(ab)

        ax2.set_xlim(0, dst_width)
        ax2.set_ylim(dst_height, 0)  # Invert y-axis
        ax2.set_xticks([])
        ax2.set_yticks([])

        plt.tight_layout()

        # Convert plot to image
        canvas = FigureCanvasAgg(fig)
        canvas.draw()
        # buffer_rgba() is used because tostring_rgb() is deprecated in recent Matplotlib releases
        plot_image = np.ascontiguousarray(np.asarray(canvas.buffer_rgba())[..., :3])  # drop alpha

        plot_image = cv2.resize(plot_image, (frame_width, frame_height))
        plot_image = cv2.cvtColor(plot_image, cv2.COLOR_RGB2BGR)

        out.write(plot_image)

    cap.release()
    out.release()
    plt.ioff()
    plt.close(fig)

    return glove_positions, glove_boxes
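
The function maps each glove centre from broadcast pixels into a flat `dst_width` x `dst_height` view using the matrix returned by `cv2.getPerspectiveTransform`. As an illustration of what that mapping does, here is a NumPy-only sketch that solves for the same kind of 3x3 matrix from four point pairs and applies it to one point. The helper names `homography_from_4_points` and `warp_point` are hypothetical, and the 1280x720 frame size is an assumption chosen for the example; the source and destination corners follow the ratios used in the function above.

```python
import numpy as np


def homography_from_4_points(src, dst):
    """Solve for the 3x3 matrix H (with H[2, 2] = 1) mapping each src point to its dst point."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, eight unknowns in total.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def warp_point(H, point):
    """Apply H to a single (x, y) point using homogeneous coordinates."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return np.array([x / w, y / w])


frame_width, frame_height = 1280, 720  # assumed resolution, for illustration only
src = [
    (frame_width * 0.3, frame_height * 0.8),
    (frame_width * 0.7, frame_height * 0.8),
    (frame_width * 0.7, frame_height * 0.2),
    (frame_width * 0.3, frame_height * 0.2),
]
dst_height = frame_height
dst_width = int(dst_height * 0.4)
dst = [(0, dst_height - 1), (dst_width - 1, dst_height - 1), (dst_width - 1, 0), (0, 0)]

H = homography_from_4_points(src, dst)
flat_center = warp_point(H, (frame_width / 2, frame_height / 2))
print(flat_center)
```

Because both quadrilaterals here are axis-aligned rectangles, the mapping reduces to a shift and a scale, so the frame centre lands at the centre of the flat view.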

## Define the input video, the model, and the variables that receive the glove positions and boxes

In [5]:
SOURCE_VIDEO_PATH = '/content/BaseballCV/assets/example_broadcast_video.mp4'

# Load the model
model = YOLO(load_tools.load_model('glove_tracking'))

# Run the glove tracking function
glove_positions, glove_boxes = track_glove_movement(model, SOURCE_VIDEO_PATH)

Downloading glove_tracking.pt: 100%|██████████| 114M/114M [00:03<00:00, 33.6MiB/s]


Model downloaded to models/glove_tracking/model_weights/glove_tracking.pt

0: 384x640 (no detections), 92.1ms
Speed: 14.8ms preprocess, 92.1ms inference, 124.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 (no detections), 19.3ms
Speed: 1.9ms preprocess, 19.3ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 (no detections), 19.6ms
Speed: 2.0ms preprocess, 19.6ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 (no detections), 21.8ms
Speed: 1.7ms preprocess, 21.8ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 (no detections), 20.2ms
Speed: 2.1ms preprocess, 20.2ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 (no detections), 19.8ms
Speed: 2.2ms preprocess, 19.8ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 (no detections), 19.8ms
Speed: 2.0ms preprocess, 19.8ms inference, 0.6ms postprocess per image at shape (1, 3, 38





0: 384x640 1 glove, 2 homeplates, 1 baseball, 20.0ms
Speed: 1.8ms preprocess, 20.0ms inference, 1.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 glove, 2 homeplates, 1 baseball, 1 rubber, 19.0ms
Speed: 1.7ms preprocess, 19.0ms inference, 1.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 glove, 2 homeplates, 1 baseball, 1 rubber, 19.0ms
Speed: 1.8ms preprocess, 19.0ms inference, 1.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 glove, 2 homeplates, 1 baseball, 1 rubber, 19.5ms
Speed: 1.7ms preprocess, 19.5ms inference, 1.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 glove, 2 homeplates, 1 baseball, 1 rubber, 19.2ms
Speed: 1.7ms preprocess, 19.2ms inference, 1.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 glove, 2 homeplates, 1 baseball, 1 rubber, 19.3ms
Speed: 1.8ms preprocess, 19.3ms inference, 1.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 glove, 2 homeplates, 1 basebal
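
The log shows that the model detects several classes (glove, homeplate, baseball, rubber), while the tracking function assumes the glove is class `0`. A hedged way to avoid hard-coding that ID is to look it up by name. The sketch below uses a hypothetical helper, `find_class_id`, and an example mapping whose ID order is made up for illustration; with a loaded Ultralytics model, the real mapping is available as `model.names`.

```python
def find_class_id(names: dict, label: str) -> int:
    """Return the ID whose class name equals `label` (case-insensitive)."""
    for class_id, name in names.items():
        if name.lower() == label.lower():
            return class_id
    raise KeyError(f"class {label!r} not found in {sorted(names.values())}")


# Hypothetical mapping for illustration only; read the real one from `model.names`.
example_names = {0: "glove", 1: "homeplate", 2: "baseball", 3: "rubber"}
glove_id = find_class_id(example_names, "glove")
print(glove_id)
```

In the notebook this would be called as `find_class_id(model.names, "glove")`, and the result compared against each box's class ID.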

**NOTE:** To run inference on your own file, upload the video to Google Colab and set `SOURCE_VIDEO_PATH` to the path of the uploaded file.
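
Before pointing `SOURCE_VIDEO_PATH` at an uploaded file, it can help to confirm that the path is valid, since `cv2.VideoCapture` does not raise an error for a missing file and the loop would simply process zero frames. This is a small sketch using only the standard library; `check_video_path`, the extension list, and the example path are all assumptions made for illustration.

```python
from pathlib import Path

VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv"}  # assumed list, adjust as needed


def check_video_path(path: str) -> Path:
    """Return the path as a Path object, or raise if it is not a usable video file."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No video found at {p}")
    if p.suffix.lower() not in VIDEO_EXTENSIONS:
        raise ValueError(f"Unexpected video extension: {p.suffix!r}")
    return p


try:
    check_video_path("/content/my_video.mp4")  # hypothetical upload location
except FileNotFoundError as err:
    print(err)
```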

## Visualize the created video file

In [6]:
# Display the video using moviepy after processing
video = mpy.VideoFileClip("glove_tracking.mp4")
resized_video = video.resize((640, 360))  # Resize as needed
mpy.ipython_display(resized_video)

Moviepy - Building video __temp__.mp4.
Moviepy - Writing video __temp__.mp4




                                                               

Moviepy - Done !
Moviepy - video ready __temp__.mp4




## Print the glove coordinates returned by the function for your own use

In [7]:
print(glove_positions)

[array([     148.19,      296.05]), array([     148.15,      295.37]), array([     146.89,      294.75]), array([     147.39,      292.52]), array([     146.42,      292.52]), array([     145.49,      291.04]), array([     145.96,      292.39]), array([     145.96,      289.83]), array([     145.45,      291.39]), array([     144.99,      292.48]), array([     144.37,      292.33]), array([     143.75,      294.51]), array([     143.49,      294.36]), array([     143.58,      294.15]), array([     143.74,      294.18]), array([     143.81,      293.53]), array([     143.83,      293.57]), array([     143.81,      293.23]), array([     143.75,       292.8]), array([     143.52,      293.48]), array([     143.77,      294.77]), array([     143.38,      294.62]), array([     143.87,      296.26]), array([     143.81,       296.5]), array([     143.82,      296.27]), array([     144.04,      295.75]), array([     144.34,       294.9]), array([     144.59,      296.55]), array([     144.69,
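
As one example of a downstream use, the flattened positions can be reduced to simple movement metrics. The sketch below is an illustration only: `path_length` and `net_displacement` are hypothetical helpers, and the sample points are the first three positions printed above. Distances are in pixels of the flattened view.

```python
import math


def path_length(positions):
    """Sum of straight-line distances between consecutive positions."""
    points = [(float(x), float(y)) for x, y in positions]
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def net_displacement(positions):
    """Straight-line distance from the first position to the last."""
    points = [(float(x), float(y)) for x, y in positions]
    if len(points) < 2:
        return 0.0
    return math.dist(points[0], points[-1])


sample_positions = [(148.19, 296.05), (148.15, 295.37), (146.89, 294.75)]
print(f"path length: {path_length(sample_positions):.2f} px")
print(f"net displacement: {net_displacement(sample_positions):.2f} px")
```

Both helpers accept the list of NumPy arrays returned by `track_glove_movement`, since each entry unpacks into an `x` and a `y` value.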

In [8]:
print(glove_boxes)

[(627.7081298828125, 305.0785827636719, 669.0428466796875, 338.6786804199219), (628.3784790039062, 303.881103515625, 668.2197875976562, 339.056640625), (626.6162719726562, 303.99658203125, 665.4815063476562, 338.19580078125), (627.3460083007812, 300.97906494140625, 666.5180053710938, 338.53521728515625), (625.0267944335938, 301.61724853515625, 665.3932495117188, 337.89239501953125), (622.7520141601562, 300.3912353515625, 664.3495483398438, 337.343994140625), (623.8116455078125, 301.7233581542969, 664.9798583984375, 337.6318664550781), (624.090576171875, 299.39141845703125, 664.6806640625, 336.89117431640625), (622.2506103515625, 299.78912353515625, 664.6990966796875, 338.37042236328125), (621.5975341796875, 299.81494140625, 663.72607421875, 339.64990234375), (619.966552734375, 298.9136962890625, 663.142333984375, 340.37109375), (618.9852294921875, 301.3663330078125, 661.8973388671875, 340.54095458984375), (618.4698486328125, 300.95977783203125, 661.4981689453125, 340.76025390625), (618
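
To keep these results beyond the notebook session, both lists can be written to a single CSV file. This is a minimal sketch with a hypothetical helper, `save_tracking_csv`, and assumed column names; the sample rows reuse the first two positions and boxes printed above. Each row corresponds to one glove detection, not one video frame, because frames without a detected glove add nothing to either list.

```python
import csv
import os
import tempfile


def save_tracking_csv(positions, boxes, csv_path):
    """Write one row per detection: flattened centre plus the original box corners."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["detection", "flat_x", "flat_y", "x1", "y1", "x2", "y2"])
        for i, ((fx, fy), (x1, y1, x2, y2)) in enumerate(zip(positions, boxes)):
            writer.writerow([i, float(fx), float(fy), x1, y1, x2, y2])
    return csv_path


sample_positions = [(148.19, 296.05), (148.15, 295.37)]
sample_boxes = [
    (627.7081298828125, 305.0785827636719, 669.0428466796875, 338.6786804199219),
    (628.3784790039062, 303.881103515625, 668.2197875976562, 339.056640625),
]

# Written to a temporary directory here; pass any path you prefer in the notebook.
out_path = os.path.join(tempfile.mkdtemp(), "glove_tracking.csv")
save_tracking_csv(sample_positions, sample_boxes, out_path)
print(f"Saved {len(sample_positions)} detections to {out_path}")
```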