# OKD-NOKD Dataset and Pose Detection Use-Case
---
### Exploring the classification of catcher positioning (one-knee down vs. both knees down) by feeding the catcher's pose keypoints into a classification model.



## Pre-work

Let's make sure we have access to a GPU by running the `nvidia-smi` command. If it reports a problem, navigate to `Edit` -> `Notebook settings` -> `Hardware accelerator`, set it to `GPU`, and then click `Save`.

In [1]:
!nvidia-smi

Sat Sep 28 03:54:24 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
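
The same check can be scripted so that later cells can branch on the result. The sketch below is only a rough proxy: it reports whether `nvidia-smi` is on the PATH and exits cleanly. The helper name `gpu_visible` is hypothetical and not part of BaseballCV.

```python
import shutil
import subprocess

def gpu_visible() -> bool:
    """Return True if `nvidia-smi` exists and exits with status 0 (hypothetical helper)."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # driver tools are not installed, so no NVIDIA GPU is visible
    try:
        result = subprocess.run([exe], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0

print("GPU visible:", gpu_visible())
```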
                                                                    

## Clone BaseballCV Repo, set as Current Directory and Install Requirements

In [2]:
!git clone https://github.com/dylandru/BaseballCV.git
%cd BaseballCV
!pip install -r requirements.txt

Cloning into 'BaseballCV'...
remote: Enumerating objects: 617, done.
remote: Counting objects: 100% (50/50), done.
remote: Compressing objects: 100% (42/42), done.
remote: Total 617 (delta 19), reused 23 (delta 8), pack-reused 567 (from 1)
Receiving objects: 100% (617/617), 306.75 MiB | 25.43 MiB/s, done.
Resolving deltas: 100% (248/248), done.
/content/BaseballCV
Collecting bs4==0.0.2 (from -r requirements.txt (line 1))
  Downloading bs4-0.0.2-py2.py3-none-any.whl.metadata (411 bytes)
Collecting pip==24.0 (from -r requirements.txt (line 4))
  Downloading pip-24.0-py3-none-any.whl.metadata (3.6 kB)
Collecting pybaseball==2.2.7 (from -r requirements.txt (line 5))
  Downloading pybaseball-2.2.7-py3-none-any.whl.metadata (11 kB)
Collecting pytest==8.3.2 (from -r requirements.txt (line 6))
  Downloading pytest-8.3.2-py3-none-any.whl.metadata (7.5 kB)
Collecting ultralytics>=8.2.90 (from -r requirements.txt (line 7))
  Downloading ultralytics-8.2.102-py3-none-any.whl.metadata (3


## Data Prep from Pose Points for OKD/NOKD Classification

- Import required libraries


In [3]:
import cv2
import os
import pandas as pd
from ultralytics import YOLO
from scripts.load_tools import load_dataset


Creating new Ultralytics Settings v0.0.6 file ✅ 
View Ultralytics Settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.json'
Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.


- Load pose model and dataset OKD_NOKD



In [6]:
pose_model = YOLO("yolov8l-pose.pt")

load_dataset("okd_nokd")

# input folder with OKD and NOKD classification folders
input_folder = "OKD_NOKD/data/"

Downloading OKD_NOKD: 100%|██████████| 113M/113M [00:05<00:00, 21.4MiB/s]


Dataset downloaded and extracted to OKD_NOKD.


- Apply YOLO pose detection (using large v8 model) for keypoints


In [9]:

# Create empty pose data list
pose_data = []

# columns for df
columns = ['filename', 'OKD', 'NOKD']
for i in range(17):  # Assuming 17 keypoints
    columns.extend([f'pose_x_{i}', f'pose_y_{i}'])

# Process each subfolder
for subfolder in ['OKD', 'NOKD']:
    subfolder_path = os.path.join(input_folder, subfolder)

    for filename in os.listdir(subfolder_path):
        if filename.lower().endswith(('.jpg', '.jpeg')):
            input_path = os.path.join(subfolder_path, filename)

            img = cv2.imread(input_path)
            if img is None:
                print(f"{input_path} failed to load.")
                continue

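            # run pose with 10% confidence min threshold
            pose_results = pose_model(img, device='cuda', verbose=True, conf=0.1)[0]
            keypoints = pose_results.keypoints
            if keypoints is not None and len(keypoints) > 0:
                # keep only the first detected person, as normalized (x, y) pairs
                pose_points = [(float(x), float(y)) for x, y in keypoints[0].xyn[0].cpu().numpy().tolist()]
            else:
                pose_points = []  # no person detected; the row is padded with None below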

            # Determine OKD or NOKD based on classification folder
            okd = 1 if subfolder == 'OKD' else 0
            nokd = 1 - okd

            row = [filename, okd, nokd]

            for i in range(17):
                if i < len(pose_points):
                    row.extend(pose_points[i])
                else:
                    row.extend([None, None])  # None if missing

            pose_data.append(row)

Streaming output truncated to the last 5000 lines.
0: 640x640 3 persons, 37.0ms
Speed: 2.4ms preprocess, 37.0ms inference, 1.5ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 6 persons, 37.9ms
Speed: 2.9ms preprocess, 37.9ms inference, 1.6ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 5 persons, 36.1ms
Speed: 3.1ms preprocess, 36.1ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 13 persons, 37.7ms
Speed: 2.3ms preprocess, 37.7ms inference, 1.8ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 4 persons, 37.6ms
Speed: 2.5ms preprocess, 37.6ms inference, 1.6ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 3 persons, 36.3ms
Speed: 2.4ms preprocess, 36.3ms inference, 1.5ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 14 persons, 37.7ms
Speed: 2.4ms preprocess, 37.7ms inference, 1.9ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 8 persons, 36.9ms
Speed: 2.4ms pr
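
When reading the feature columns later, it helps to know which joint each index refers to. The pretrained YOLOv8 pose weights follow the 17-keypoint COCO layout. The mapping below is a reference sketch, and the helper name `column_to_joint` is an assumption introduced here for illustration.

```python
# COCO keypoint order used by the pretrained YOLOv8 pose models
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def column_to_joint(column: str) -> str:
    """Map a column such as 'pose_y_13' to a readable name such as 'left_knee_y'."""
    _, axis, index = column.split("_")
    return f"{COCO_KEYPOINTS[int(index)]}_{axis}"

print(column_to_joint("pose_x_13"))  # left_knee_x
print(column_to_joint("pose_y_16"))  # right_ankle_y
```

Under this layout the knees are indices 13 and 14 and the ankles are 15 and 16, which are presumably among the more informative joints for separating OKD from NOKD.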

- Create dataframe for analysis

In [10]:

df = pd.DataFrame(pose_data, columns=columns)
print(f"Rows: {df.shape[0]} | Columns: {df.shape[1]}")
df.head()


Rows: 2816 | Columns: 37


| | filename | OKD | NOKD | pose_x_0 | pose_y_0 | pose_x_1 | pose_y_1 | pose_x_2 | pose_y_2 | pose_x_3 | ... | pose_x_12 | pose_y_12 | pose_x_13 | pose_y_13 | pose_x_14 | pose_y_14 | pose_x_15 | pose_y_15 | pose_x_16 | pose_y_16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 002460.jpg | 1 | 0 | 0.65148 | 0.088128 | 0.666378 | 0.072404 | 0.0 | 0.0 | 0.711939 | ... | 0.788483 | 0.39163 | 0.72863 | 0.515787 | 0.725813 | 0.550276 | 0.771564 | 0.733959 | 0.809213 | 0.687199 |
| 1 | 005002.jpg | 1 | 0 | 0.402279 | 0.342783 | 0.418348 | 0.332589 | 0.397017 | 0.330485 | 0.444214 | ... | 0.409693 | 0.593397 | 0.627751 | 0.542191 | 0.29084 | 0.633584 | 0.713278 | 0.650721 | 0.440319 | 0.643927 |
| 2 | 004809.jpg | 1 | 0 | 0.465831 | 0.33206 | 0.483737 | 0.315631 | 0.453661 | 0.314799 | 0.512542 | ... | 0.457235 | 0.586898 | 0.622098 | 0.538328 | 0.384958 | 0.663989 | 0.594824 | 0.702214 | 0.50147 | 0.702765 |
| 3 | 005628.jpg | 1 | 0 | 0.464616 | 0.303069 | 0.482255 | 0.28356 | 0.449289 | 0.288816 | 0.516029 | ... | 0.457281 | 0.622881 | 0.654365 | 0.601267 | 0.352847 | 0.715918 | 0.628763 | 0.721192 | 0.494106 | 0.723305 |
| 4 | 000293.jpg | 1 | 0 | 0.285973 | 0.038199 | 0.302692 | 0.017082 | 0.264254 | 0.018667 | 0.329398 | ... | 0.181959 | 0.290155 | 0.405181 | 0.411296 | 0.097271 | 0.436836 | 0.420163 | 0.594281 | 0.04198 | 0.64631 |


### Fill NaN values with 0 and print value counts for OKD (should be an even split, with 1408 of each class)

In [11]:
df.fillna(0, inplace=True)
df[['OKD']].value_counts()


| OKD | count |
| --- | --- |
| 0 | 1408 |
| 1 | 1408 |
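
Keypoints the pose model could not locate show up as (0.0, 0.0), as for keypoint 2 in row 0 of the preview, so filling NaN with 0 keeps a single "missing" convention. To see how many joints are missing in a row, one option is to count those zero pairs. This is a minimal sketch on a plain dictionary; the function name `count_missing_keypoints` and the sample row are illustrative assumptions.

```python
def count_missing_keypoints(row: dict, n_keypoints: int = 17) -> int:
    """Count keypoints whose x and y are both 0 (or absent) in one feature row."""
    missing = 0
    for i in range(n_keypoints):
        x = row.get(f"pose_x_{i}") or 0.0  # None and absent keys count as 0
        y = row.get(f"pose_y_{i}") or 0.0
        if x == 0.0 and y == 0.0:
            missing += 1
    return missing

# Illustrative row: only keypoints 0 and 1 were located
sample = {"pose_x_0": 0.65, "pose_y_0": 0.09, "pose_x_1": 0.67, "pose_y_1": 0.07}
print(count_missing_keypoints(sample))  # 15
```

After the `fillna` step, the same function could be applied row-wise to the real data, for example with `df.apply(lambda r: count_missing_keypoints(r.to_dict()), axis=1)`.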


## Train AutoML Classification Instance

### Explore different types of scikit-learn models to find the best model for this specific use-case

- Install and Import libraries


In [13]:
!pip install flaml

Collecting flaml
  Downloading FLAML-2.3.1-py3-none-any.whl.metadata (16 kB)
Downloading FLAML-2.3.1-py3-none-any.whl (313 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 313.3/313.3 kB 8.6 MB/s eta 0:00:00
Installing collected packages: flaml
Successfully installed flaml-2.3.1


In [14]:

from flaml import AutoML
from sklearn.model_selection import train_test_split
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import accuracy_score, classification_report
import warnings

Dask dataframe query planning is disabled because dask-expr is not installed.

You can install it with `pip install dask[dataframe]` or `conda install dask`.
This will raise in a future version.



- Create an 80/20 train/test split based on feature points and target OKD


In [15]:
warnings.filterwarnings('ignore', category=ConvergenceWarning) #ignore warnings about iterations of non-converging models

features = df.drop(columns=['filename', 'OKD', 'NOKD']) #keypoint data
target = df['OKD'] #train to predict OKD

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=22, stratify=target, shuffle=True) #80/20 train/test split
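
As a sanity check on the split sizes: with a fractional `test_size`, scikit-learn rounds the test set up, so 2816 rows yield 564 test rows, and stratifying on a balanced target gives 282 per class. The arithmetic can be sketched as follows (the function name `split_sizes` is hypothetical):

```python
import math

def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a fractional test_size, rounding the test set up."""
    n_test = math.ceil(n_samples * test_size)
    return n_samples - n_test, n_test

n_train, n_test = split_sizes(2816, 0.2)
print(n_train, n_test)  # 2252 564
print(n_test // 2)      # 282 rows per class in a balanced, stratified test set
```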

- Train AutoML classifier for 6 minutes optimizing for accuracy


In [16]:
model = AutoML()

model.fit(X_train, y_train, task='classification', metric='accuracy', time_budget=360) #train classifier for 6 minutes optimizing for accuracy


print(f"Estimator: {model.best_estimator}")
print(f"Config: {model.best_config}")



[flaml.automl.logger: 09-28 04:04:01] {1728} INFO - task = classification
[flaml.automl.logger: 09-28 04:04:01] {1739} INFO - Evaluation method: cv
[flaml.automl.logger: 09-28 04:04:01] {1838} INFO - Minimizing error metric: 1-accuracy
[flaml.automl.logger: 09-28 04:04:01] {1955} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'xgboost', 'extra_tree', 'xgb_limitdepth', 'sgd', 'lrl1']
[flaml.automl.logger: 09-28 04:04:01] {2258} INFO - iteration 0, current learner lgbm
[flaml.automl.logger: 09-28 04:04:01] {2393} INFO - Estimated sufficient time budget=1180s. Estimated necessary time budget=27s.
[flaml.automl.logger: 09-28 04:04:01] {2442} INFO -  at 0.2s,	estimator lgbm's best error=0.3117,	best estimator lgbm's best error=0.3117
[flaml.automl.logger: 09-28 04:04:01] {2258} INFO - iteration 1, current learner lgbm
[flaml.automl.logger: 09-28 04:04:01] {2442} INFO -  at 0.3s,	estimator lgbm's best error=0.3095,	best estimator lgbm's best error=0.3095
[flaml.automl.logger: 09-28

INFO:flaml.tune.searcher.blendsearch:No low-cost partial config given to the search algorithm. For cost-frugal search, consider providing low-cost values for cost-related hps via 'low_cost_partial_config'. More info can be found at https://microsoft.github.io/FLAML/docs/FAQ#about-low_cost_partial_config-in-tune


[flaml.automl.logger: 09-28 04:04:01] {2442} INFO -  at 0.6s,	estimator sgd's best error=0.4840,	best estimator lgbm's best error=0.2949
[flaml.automl.logger: 09-28 04:04:01] {2258} INFO - iteration 5, current learner lgbm
[flaml.automl.logger: 09-28 04:04:01] {2442} INFO -  at 0.8s,	estimator lgbm's best error=0.2949,	best estimator lgbm's best error=0.2949
[flaml.automl.logger: 09-28 04:04:01] {2258} INFO - iteration 6, current learner lgbm
[flaml.automl.logger: 09-28 04:04:02] {2442} INFO -  at 0.9s,	estimator lgbm's best error=0.2673,	best estimator lgbm's best error=0.2673
[flaml.automl.logger: 09-28 04:04:02] {2258} INFO - iteration 7, current learner lgbm
[flaml.automl.logger: 09-28 04:04:02] {2442} INFO -  at 1.0s,	estimator lgbm's best error=0.2673,	best estimator lgbm's best error=0.2673
[flaml.automl.logger: 09-28 04:04:02] {2258} INFO - iteration 8, current learner lgbm
[flaml.automl.logger: 09-28 04:04:02] {2442} INFO -  at 1.0s,	estimator lgbm's best error=0.2611,	best es

INFO:flaml.tune.searcher.blendsearch:No low-cost partial config given to the search algorithm. For cost-frugal search, consider providing low-cost values for cost-related hps via 'low_cost_partial_config'. More info can be found at https://microsoft.github.io/FLAML/docs/FAQ#about-low_cost_partial_config-in-tune


[flaml.automl.logger: 09-28 04:08:17] {2442} INFO -  at 256.2s,	estimator lrl1's best error=0.3632,	best estimator lgbm's best error=0.1976
[flaml.automl.logger: 09-28 04:08:17] {2258} INFO - iteration 160, current learner lrl1
[flaml.automl.logger: 09-28 04:08:17] {2442} INFO -  at 256.7s,	estimator lrl1's best error=0.3632,	best estimator lgbm's best error=0.1976
[flaml.automl.logger: 09-28 04:08:17] {2258} INFO - iteration 161, current learner lrl1
[flaml.automl.logger: 09-28 04:08:19] {2442} INFO -  at 257.9s,	estimator lrl1's best error=0.3579,	best estimator lgbm's best error=0.1976
[flaml.automl.logger: 09-28 04:08:19] {2258} INFO - iteration 162, current learner sgd
[flaml.automl.logger: 09-28 04:08:19] {2442} INFO -  at 258.0s,	estimator sgd's best error=0.4187,	best estimator lgbm's best error=0.1976
[flaml.automl.logger: 09-28 04:08:19] {2258} INFO - iteration 163, current learner xgboost
[flaml.automl.logger: 09-28 04:08:20] {2442} INFO -  at 258.8s,	estimator xgboost's bes
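
The run minimizes `1-accuracy`, so the errors in the log can be converted back to cross-validated accuracy. The sketch below parses one progress line with a regular expression; the pattern is an assumption fitted to the log format printed here, not a FLAML API.

```python
import re

# Pattern fitted to the progress lines printed above (an assumption, not a FLAML API)
LINE_RE = re.compile(
    r"at (?P<elapsed>[\d.]+)s,\s*estimator (?P<current>\w+)'s best error=(?P<cur_err>[\d.]+),"
    r"\s*best estimator (?P<best>\w+)'s best error=(?P<best_err>[\d.]+)"
)

def parse_progress(line: str):
    """Return (elapsed_s, best_estimator, best_cv_accuracy), or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    best_accuracy = 1.0 - float(match["best_err"])
    return float(match["elapsed"]), match["best"], round(best_accuracy, 4)

line = ("[flaml.automl.logger: 09-28 04:08:17] {2442} INFO -  at 256.2s,\t"
        "estimator lrl1's best error=0.3632,\tbest estimator lgbm's best error=0.1976")
print(parse_progress(line))  # (256.2, 'lgbm', 0.8024)
```

Here the best cross-validated error of 0.1976 corresponds to roughly 80.2% accuracy.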

- Print Accuracy and Classification Report

In [17]:
y_test_predict = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_test_predict)
print(f"Test Accuracy: {test_accuracy:.3f}")
print(f"Classification: {classification_report(y_test, y_test_predict)}")


Test Accuracy: 0.794
Classification:               precision    recall  f1-score   support

           0       0.81      0.77      0.79       282
           1       0.78      0.82      0.80       282

    accuracy                           0.79       564
   macro avg       0.79      0.79      0.79       564
weighted avg       0.79      0.79      0.79       564
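
The precision, recall, and f1-score columns in the report all derive from per-class counts of true positives, false positives, and false negatives. A minimal pure-Python sketch on made-up labels shows the calculation (the data below is illustrative and not taken from this notebook; `binary_report` is a hypothetical name):

```python
def binary_report(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one class from paired label lists."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Illustrative labels only: 1 = OKD, 0 = NOKD
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(binary_report(y_true, y_pred))  # (0.75, 0.75, 0.75)
```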



## Process Example Video to test for OKD Predictions

### Create an overall function for processing and predicting

- Import libraries


In [18]:
import numpy as np
from tqdm import tqdm
from scripts.load_tools import load_model
import matplotlib.pyplot as plt

warnings.filterwarnings("ignore", message=r".*X does not have valid feature names.*")

- Define processing function

In [19]:
# Process a single video and predict OKD for each frame, given the three models
def process_okd_video(video_path, pose_model, phc_model, model, output_path=None, batch_size=4, device='cuda') -> list[int]:
    # device: 'cuda' matches the GPU runtime checked in Pre-work; pass 'mps' or 'cpu' on other machines
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"Could not open video: {video_path}")

    fps = int(cap.get(cv2.CAP_PROP_FPS))
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    out = None
    if output_path: # save video with predictions if output path is specified
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

    okd_predictions = []

    for _ in tqdm(range(0, total_frames, batch_size), desc="Processing batches"): # process video in batches of frames for quicker processing
        batch_frames = []
        for _ in range(batch_size):
            ret, frame = cap.read()
            if not ret:
                break
            batch_frames.append(frame)

        if not batch_frames:
            break

        # perform pitcher/hitter/catcher detections on the batch
        phc_results = phc_model(batch_frames, device=device, verbose=False)

        for i, frame in enumerate(batch_frames):
            catcher_box = None
            for box in phc_results[i].boxes:
                if int(box.cls) == 2: # class 2 is treated as the catcher
                    catcher_box = box.xyxy[0].cpu().numpy() # extract catcher box coordinates
                    break

            if catcher_box is None:
                okd_predictions.append(0) # no catcher found: record the frame as not OKD
                if out is not None:
                    out.write(frame)
                continue

            # predict pose within catcher's box
            x1, y1, x2, y2 = map(int, catcher_box)
            catcher_frame = frame[y1:y2, x1:x2]
            pose_results = pose_model(catcher_frame, device=device, verbose=False, conf=0.5)[0]

            # flatten keypoints to [x0, y0, x1, y1, ...]
            pose_points = []
            for keypoints in pose_results.keypoints:
                for point in keypoints.xyn[0].cpu().numpy():
                    pose_points.extend(point)

            # truncate or zero-pad pose points to the 34 features the classifier expects
            pose_points = pose_points[:34] + [0] * (34 - len(pose_points))

            okd_pred = int(model.predict(np.array(pose_points).reshape(1, -1))[0]) # predict with classifier model
            okd_predictions.append(okd_pred)

            if out is not None:
                cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
                cv2.putText(frame, "Catcher", (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1)
                cv2.putText(frame, f"OKD: {'Yes' if okd_pred == 1 else 'No'}", (10, 30),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0) if okd_pred == 1 else (0, 0, 255), 2)
                out.write(frame) # write each frame as it is processed instead of holding the whole video in memory

    cap.release()
    if out is not None:
        out.release()

    return okd_predictions
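
The classifier was trained on 34 features (17 keypoints × 2 coordinates), so `process_okd_video` truncates or zero-pads whatever the pose model returns to that length. The same step in isolation can be sketched as follows (the helper name `to_feature_vector` is hypothetical):

```python
def to_feature_vector(pose_points, n_features=34):
    """Truncate or zero-pad a flat [x0, y0, x1, y1, ...] list to n_features values."""
    trimmed = list(pose_points)[:n_features]
    return trimmed + [0.0] * (n_features - len(trimmed))

print(len(to_feature_vector([])))                   # 34 (no pose detected, all zeros)
print(len(to_feature_vector([0.5] * 40)))           # 34 (extra values are cut off)
print(to_feature_vector([0.1, 0.2], n_features=4))  # [0.1, 0.2, 0.0, 0.0]
```

An all-zero vector still produces a prediction, so frames where the catcher is boxed but no pose is found are classified from padding alone.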

- Load Pose, PHC, and Classifier models



In [20]:
pose_model = YOLO("yolov8l-pose.pt")
phc_model = YOLO(load_model("phc_detector"))


Downloading models/pitcher_hitter_catcher_detector/model_weights/pitcher_hitter_catcher_detector_v3.pt: 100%|██████████| 87.6M/87.6M [00:04<00:00, 19.5MiB/s]


Model downloaded successfully: models/pitcher_hitter_catcher_detector/model_weights/pitcher_hitter_catcher_detector_v3.pt


- Process video in frame batches for quicker processing
- Predict OKD for a given frame (future iterations need to identify where the pitch starts)
- Save video with predictions (if given an output path)

In [21]:
video_path = "assets/example_broadcast_video.mp4"
output_path = "test_okd.mp4"

okd_predictions = process_okd_video(video_path, pose_model, phc_model, model, output_path, batch_size=4)

okd_count = sum(okd_predictions)
total_frames = len(okd_predictions)

Processing batches:   0%|          | 0/98 [00:00<?, ?it/s]



Processing batches: 100%|██████████| 98/98 [07:05<00:00,  4.35s/it]


- Print percentage of frames for video predicted as OKD

In [23]:
print(f"Predicted OKD in {okd_count/total_frames:.1%} of {total_frames} frames.")

Predicted OKD in 69.4% of 392 frames.
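
The printed percentage is the mean of the 0/1 predictions; 69.4% of 392 frames corresponds to 272 frames flagged as OKD. The same summary on a toy list, as a small sketch (the function name `okd_share` is an assumption for illustration):

```python
def okd_share(predictions):
    """Return (okd_frames, total_frames, fraction) for a list of 0/1 predictions."""
    total = len(predictions)
    okd = sum(int(p) for p in predictions)
    fraction = okd / total if total else 0.0  # avoid dividing by zero on an empty video
    return okd, total, fraction

okd, total, fraction = okd_share([1, 1, 0, 1])
print(f"Predicted OKD in {fraction:.1%} of {total} frames.")  # Predicted OKD in 75.0% of 4 frames.
print(f"{272 / 392:.1%}")                                      # 69.4%
```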


## **CONGRATS!** You used the OKD/NOKD dataset and pose estimation to train a classifier that predicts whether a catcher is in a one-knee-down position!

### The classifier model and its relevant information can be found in the models/okd_nokd_classifier folder.