# OKD-NOKD Dataset and Pose Detection Use-Case
---
### Exploring the classification of catcher positioning (one knee down vs. both knees down) using the catcher's pose keypoints as input features for a classification model.



## Pre-work

Let's make sure that we have access to a GPU by running the `nvidia-smi` command. If the command fails, navigate to `Edit` -> `Notebook settings` -> `Hardware accelerator`, set it to `GPU`, and then click `Save`.

In [1]:
!nvidia-smi

Thu Oct 17 04:12:31 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   48C    P8              10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
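
The same check can be scripted so that later cells can branch on the result. The sketch below is an illustration, not part of the original notebook: `gpu_visible` is a hypothetical helper name, and it only reports whether the `nvidia-smi` executable exists and exits cleanly, not whether a given framework can actually use the GPU.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` is on PATH and exits with status 0."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # no NVIDIA driver tools installed
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False  # executable could not be run or hung
    return result.returncode == 0


print("GPU visible:", gpu_visible())
```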
                                                                    

## Clone BaseballCV Repo, set as Current Directory and Install Requirements

In [2]:
!git clone https://github.com/dylandru/BaseballCV.git
%cd BaseballCV
!pip install -r requirements.txt

Cloning into 'BaseballCV'...
remote: Enumerating objects: 781, done.
remote: Counting objects: 100% (214/214), done.
remote: Compressing objects: 100% (190/190), done.
remote: Total 781 (delta 69), reused 68 (delta 20), pack-reused 567 (from 1)
Receiving objects: 100% (781/781), 349.30 MiB | 35.37 MiB/s, done.
Resolving deltas: 100% (297/297), done.
/content/BaseballCV
Collecting bs4==0.0.2 (from -r requirements.txt (line 1))
  Downloading bs4-0.0.2-py2.py3-none-any.whl.metadata (411 bytes)
Collecting pip==24.0 (from -r requirements.txt (line 4))
  Downloading pip-24.0-py3-none-any.whl.metadata (3.6 kB)
Collecting pybaseball==2.2.7 (from -r requirements.txt (line 5))
  Downloading pybaseball-2.2.7-py3-none-any.whl.metadata (11 kB)
Collecting pytest==8.3.2 (from -r requirements.txt (line 6))
  Downloading pytest-8.3.2-py3-none-any.whl.metadata (7.5 kB)
Collecting ultralytics>=8.2.90 (from -r requirements.txt (line 7))
  Downloading ultralytics-8.3.15-py3-none-any.whl.metadat


## Data Prep from Pose Points for OKD/NOKD Classification

- Import required libraries


In [3]:
import cv2
import os
import pandas as pd
from ultralytics import YOLO
from baseballcv.functions import LoadTools

# Initialize LoadTools class
load_tools = LoadTools()


Creating new Ultralytics Settings v0.0.6 file ✅ 
View Ultralytics Settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.json'
Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.


- Load pose model and dataset OKD_NOKD



In [4]:
pose_model = YOLO("yolov8l-pose.pt")

load_tools.load_dataset("okd_nokd")

# input folder with OKD and NOKD classification folders
input_folder = "OKD_NOKD/data/"

Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolov8l-pose.pt to 'yolov8l-pose.pt'...


100%|██████████| 85.3M/85.3M [00:00<00:00, 113MB/s]
Downloading OKD_NOKD: 100%|██████████| 113M/113M [00:02<00:00, 43.9MiB/s]


Dataset downloaded and extracted to OKD_NOKD
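
Before running pose detection it can help to confirm that the extracted folder has the expected layout, with one subfolder per class. The following is a minimal sketch under that assumption; `count_images` is a hypothetical helper, and the demo builds a throwaway directory so that it runs without the real dataset.

```python
import os
import tempfile

IMAGE_EXTENSIONS = ('.jpg', '.jpeg')  # same extensions the notebook filters on


def count_images(data_dir, classes=('OKD', 'NOKD')):
    """Return {class_name: number of JPEG files} for each class subfolder."""
    counts = {}
    for name in classes:
        folder = os.path.join(data_dir, name)
        if not os.path.isdir(folder):
            counts[name] = 0  # a missing folder is reported as zero images
            continue
        counts[name] = sum(
            1 for f in os.listdir(folder) if f.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts


# Demo on a temporary directory (hypothetical files, not the real dataset)
with tempfile.TemporaryDirectory() as tmp:
    for name, n in (('OKD', 3), ('NOKD', 2)):
        os.makedirs(os.path.join(tmp, name))
        for i in range(n):
            open(os.path.join(tmp, name, f"{i:06d}.jpg"), 'w').close()
    demo_counts = count_images(tmp)

print(demo_counts)
```

On the real data the call would be `count_images("OKD_NOKD/data/")`, and a large difference between the two counts would be a sign that the download or extraction went wrong.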


- Apply YOLO pose detection (using large v8 model) for keypoints


In [5]:

# Create empty pose data list
pose_data = []

# columns for df
columns = ['filename', 'OKD', 'NOKD']
for i in range(17):  # Assuming 17 keypoints
    columns.extend([f'pose_x_{i}', f'pose_y_{i}'])

# Process each subfolder
for subfolder in ['OKD', 'NOKD']:
    subfolder_path = os.path.join(input_folder, subfolder)

    for filename in os.listdir(subfolder_path):
        if filename.lower().endswith(('.jpg', '.jpeg')):
            input_path = os.path.join(subfolder_path, filename)

            img = cv2.imread(input_path)
            if img is None:
                print(f"{input_path} failed to load.")
                continue

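            # run pose with 10% confidence min threshold
            pose_results = pose_model(img, device='cuda', verbose=True, conf=0.1)[0]
            keypoints = pose_results.keypoints
            if keypoints is not None and len(keypoints) > 0:
                # use the first detected person; xyn coordinates are normalized to [0, 1]
                pose_points = [(float(x), float(y)) for x, y in keypoints[0].xyn[0].cpu().numpy()]
            else:
                pose_points = []  # no person detected; the row is padded with None below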

            # Determine OKD or NOKD based on classification folder
            okd = 1 if subfolder == 'OKD' else 0
            nokd = 1 - okd

            row = [filename, okd, nokd]

            for i in range(17):
                if i < len(pose_points):
                    row.extend(pose_points[i])
                else:
                    row.extend([None, None])  # None if missing

            pose_data.append(row)

Streaming output truncated to the last 5000 lines.
0: 640x640 4 persons, 39.5ms
Speed: 2.3ms preprocess, 39.5ms inference, 1.6ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 10 persons, 37.1ms
Speed: 2.3ms preprocess, 37.1ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 6 persons, 37.3ms
Speed: 2.9ms preprocess, 37.3ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 5 persons, 39.1ms
Speed: 2.6ms preprocess, 39.1ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 4 persons, 36.7ms
Speed: 2.8ms preprocess, 36.7ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 3 persons, 38.1ms
Speed: 2.3ms preprocess, 38.1ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 6 persons, 38.6ms
Speed: 2.1ms preprocess, 38.6ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 3 persons, 37.1ms
Speed: 2.8ms pre
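
The row layout built in the loop above (filename, two label columns, then 17 `(x, y)` pairs) can be isolated in a small function, which makes the padding behaviour easy to check without running the pose model. This is a sketch: `build_row` is a hypothetical name, and the example coordinates are made up.

```python
N_KEYPOINTS = 17  # COCO keypoint layout used by the YOLOv8 pose models


def build_row(filename, subfolder, pose_points):
    """Flatten one image's keypoints into [filename, OKD, NOKD, x0, y0, ..., x16, y16]."""
    okd = 1 if subfolder == 'OKD' else 0
    row = [filename, okd, 1 - okd]
    for i in range(N_KEYPOINTS):
        if i < len(pose_points):
            row.extend(pose_points[i])
        else:
            row.extend([None, None])  # missing keypoints stay None until fillna
    return row


# Hypothetical image with only two detected keypoints
row = build_row("example.jpg", "NOKD", [(0.25, 0.5), (0.3, 0.55)])
print(len(row), row[:7], row[7])
```

Every row has 3 + 2 × 17 = 37 entries, which is why the dataframe in the next cell reports 37 columns.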

- Create dataframe for analysis

In [6]:

df = pd.DataFrame(pose_data, columns=columns)
print(f"Rows: {df.shape[0]} | Columns: {df.shape[1]}")
df.head()


Rows: 2816 | Columns: 37


| | filename | OKD | NOKD | pose_x_0 | pose_y_0 | pose_x_1 | pose_y_1 | pose_x_2 | pose_y_2 | pose_x_3 | ... | pose_x_12 | pose_y_12 | pose_x_13 | pose_y_13 | pose_x_14 | pose_y_14 | pose_x_15 | pose_y_15 | pose_x_16 | pose_y_16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 000081.jpg | 1 | 0 | 0.117915 | 0.113611 | 0.127501 | 0.099366 | 0.101761 | 0.098568 | 0.0 | ... | 0.072171 | 0.403215 | 0.186927 | 0.543835 | 0.08575 | 0.579695 | 0.119016 | 0.708583 | 0.0 | 0.713879 |
| 1 | 002852.jpg | 1 | 0 | 0.466619 | 0.331017 | 0.483839 | 0.317607 | 0.445003 | 0.315951 | 0.509351 | ... | 0.436511 | 0.622762 | 0.687442 | 0.678112 | 0.30495 | 0.538863 | 0.630653 | 0.726693 | 0.345687 | 0.714473 |
| 2 | 005277.jpg | 1 | 0 | 0.489538 | 0.323467 | 0.509563 | 0.309117 | 0.468071 | 0.307666 | 0.539809 | ... | 0.441234 | 0.616522 | 0.709722 | 0.678872 | 0.312557 | 0.562543 | 0.556522 | 0.732833 | 0.411327 | 0.71587 |
| 3 | 004041.jpg | 1 | 0 | 0.209192 | 0.071663 | 0.0 | 0.0 | 0.196286 | 0.060132 | 0.0 | ... | 0.181014 | 0.367919 | 0.23509 | 0.54865 | 0.166215 | 0.572018 | 0.194912 | 0.710301 | 0.102112 | 0.743998 |
| 4 | 000697.jpg | 1 | 0 | 0.497751 | 0.359748 | 0.51193 | 0.346465 | 0.481172 | 0.347219 | 0.535631 | ... | 0.46259 | 0.588794 | 0.613503 | 0.655703 | 0.367108 | 0.532407 | 0.617004 | 0.677573 | 0.429643 | 0.672072 |


### Fill NaN values with 0 and print value counts for OKD (should be an even split with 1408 of each class)

In [7]:
df.fillna(0, inplace=True)
df[['OKD']].value_counts()


| OKD | count |
|---|---|
| 0 | 1408 |
| 1 | 1408 |
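
After `fillna(0)`, a keypoint that was never produced is indistinguishable from one reported at `(0.0, 0.0)`, and several rows in the preview above already contain such zero coordinates. A quick way to gauge how much of a row is effectively missing is to count the zero pairs. The sketch below assumes a flat `[x0, y0, x1, y1, ...]` list; `count_missing_keypoints` is a hypothetical helper and the sample values are invented.

```python
def count_missing_keypoints(flat_values):
    """Count (x, y) pairs where both coordinates are 0 or None."""
    pairs = zip(flat_values[0::2], flat_values[1::2])
    return sum(1 for x, y in pairs if not x and not y)


# Invented example: 4 keypoints, two of them missing
sample = [0.21, 0.07, 0.0, 0.0, 0.19, 0.06, None, None]
print(count_missing_keypoints(sample))
```

A pair with only one zero coordinate is not counted, since a single 0.0 can be a legitimate position on the image border.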


## Train AutoML Classification Instance

### Explore different types of scikit-learn-compatible models to find the best model for this specific use-case

- Install and Import libraries


In [8]:
!pip install flaml

Collecting flaml
  Downloading FLAML-2.3.1-py3-none-any.whl.metadata (16 kB)
Downloading FLAML-2.3.1-py3-none-any.whl (313 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m313.3/313.3 kB[0m [31m9.3 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: flaml
Successfully installed flaml-2.3.1


In [9]:

from flaml import AutoML
from sklearn.model_selection import train_test_split
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import accuracy_score, classification_report
import warnings

Dask dataframe query planning is disabled because dask-expr is not installed.

You can install it with `pip install dask[dataframe]` or `conda install dask`.
This will raise in a future version.



- Create an 80/20 train/test split based on feature points and target OKD


In [10]:
warnings.filterwarnings('ignore', category=ConvergenceWarning) #ignore warnings about iterations of non-converging models

features = df.drop(columns=['filename', 'OKD', 'NOKD']) #keypoint data
target = df['OKD'] #train to predict OKD

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=22, stratify=target, shuffle=True) #80/20 train/test split
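
The resulting split sizes can be worked out by hand. The sketch below assumes that the test share is rounded up to a whole row, which agrees with the 564 test rows shown in the classification report further down, and that stratification divides the two equally sized classes evenly.

```python
import math

n_samples = 2816   # rows in df
per_class = 1408   # rows per class (OKD = 0 and OKD = 1)
test_size = 0.2

n_test = math.ceil(test_size * n_samples)       # 563.2 rounds up to 564
n_train = n_samples - n_test                    # 2252
test_per_class = n_test // 2                    # 282, matches the report's support
train_per_class = per_class - test_per_class    # 1126

print(n_train, n_test, train_per_class, test_per_class)
```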

- Train AutoML classifier for 6 minutes optimizing for accuracy


In [11]:
model = AutoML()

model.fit(X_train, y_train, task='classification', metric='accuracy', time_budget=360) #train classifier for 6 minutes optimizing for accuracy


print(f"Estimator: {model.best_estimator}")
print(f"Config: {model.best_config}")



[flaml.automl.logger: 10-17 04:18:37] {1728} INFO - task = classification
[flaml.automl.logger: 10-17 04:18:37] {1739} INFO - Evaluation method: cv
[flaml.automl.logger: 10-17 04:18:37] {1838} INFO - Minimizing error metric: 1-accuracy
[flaml.automl.logger: 10-17 04:18:37] {1955} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'xgboost', 'extra_tree', 'xgb_limitdepth', 'sgd', 'lrl1']
[flaml.automl.logger: 10-17 04:18:37] {2258} INFO - iteration 0, current learner lgbm
[flaml.automl.logger: 10-17 04:18:37] {2393} INFO - Estimated sufficient time budget=1185s. Estimated necessary time budget=27s.
[flaml.automl.logger: 10-17 04:18:37] {2442} INFO -  at 0.2s,	estimator lgbm's best error=0.3104,	best estimator lgbm's best error=0.3104
[flaml.automl.logger: 10-17 04:18:37] {2258} INFO - iteration 1, current learner lgbm
[flaml.automl.logger: 10-17 04:18:37] {2442} INFO -  at 0.2s,	estimator lgbm's best error=0.3104,	best estimator lgbm's best error=0.3104
[flaml.automl.logger: 10-17

INFO:flaml.tune.searcher.blendsearch:No low-cost partial config given to the search algorithm. For cost-frugal search, consider providing low-cost values for cost-related hps via 'low_cost_partial_config'. More info can be found at https://microsoft.github.io/FLAML/docs/FAQ#about-low_cost_partial_config-in-tune


[flaml.automl.logger: 10-17 04:18:37] {2442} INFO -  at 0.4s,	estimator sgd's best error=0.4667,	best estimator lgbm's best error=0.2922
[flaml.automl.logger: 10-17 04:18:37] {2258} INFO - iteration 4, current learner lgbm
[flaml.automl.logger: 10-17 04:18:37] {2442} INFO -  at 0.6s,	estimator lgbm's best error=0.2669,	best estimator lgbm's best error=0.2669
[flaml.automl.logger: 10-17 04:18:37] {2258} INFO - iteration 5, current learner lgbm
[flaml.automl.logger: 10-17 04:18:37] {2442} INFO -  at 0.7s,	estimator lgbm's best error=0.2669,	best estimator lgbm's best error=0.2669
[flaml.automl.logger: 10-17 04:18:37] {2258} INFO - iteration 6, current learner lgbm
[flaml.automl.logger: 10-17 04:18:37] {2442} INFO -  at 0.8s,	estimator lgbm's best error=0.2651,	best estimator lgbm's best error=0.2651
[flaml.automl.logger: 10-17 04:18:37] {2258} INFO - iteration 7, current learner lgbm
[flaml.automl.logger: 10-17 04:18:37] {2442} INFO -  at 0.8s,	estimator lgbm's best error=0.2651,	best es

INFO:flaml.tune.searcher.blendsearch:No low-cost partial config given to the search algorithm. For cost-frugal search, consider providing low-cost values for cost-related hps via 'low_cost_partial_config'. More info can be found at https://microsoft.github.io/FLAML/docs/FAQ#about-low_cost_partial_config-in-tune


[flaml.automl.logger: 10-17 04:20:29] {2442} INFO -  at 112.0s,	estimator lrl1's best error=0.3543,	best estimator xgboost's best error=0.2047
[flaml.automl.logger: 10-17 04:20:29] {2258} INFO - iteration 188, current learner lrl1
[flaml.automl.logger: 10-17 04:20:29] {2442} INFO -  at 112.4s,	estimator lrl1's best error=0.3543,	best estimator xgboost's best error=0.2047
[flaml.automl.logger: 10-17 04:20:29] {2258} INFO - iteration 189, current learner lrl1
[flaml.automl.logger: 10-17 04:20:30] {2442} INFO -  at 113.4s,	estimator lrl1's best error=0.3543,	best estimator xgboost's best error=0.2047
[flaml.automl.logger: 10-17 04:20:30] {2258} INFO - iteration 190, current learner xgboost
[flaml.automl.logger: 10-17 04:20:31] {2442} INFO -  at 114.6s,	estimator xgboost's best error=0.2047,	best estimator xgboost's best error=0.2047
[flaml.automl.logger: 10-17 04:20:31] {2258} INFO - iteration 191, current learner lrl1
[flaml.automl.logger: 10-17 04:20:32] {2442} INFO -  at 115.5s,	estima
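
Because FLAML minimizes `1-accuracy`, each logged "best error" converts directly into a cross-validated accuracy. The sketch below parses one log line copied from the output above with a regular expression written for this illustration; it is not a FLAML API, and the pattern may need adjusting for other FLAML versions.

```python
import re

# Pattern written for the log format shown in this notebook (an assumption, not a FLAML API)
LOG_PATTERN = re.compile(
    r"at (?P<elapsed>[\d.]+)s,\s+estimator (?P<learner>\w+)'s best error=(?P<error>[\d.]+),"
    r"\s+best estimator (?P<best>\w+)'s best error=(?P<best_error>[\d.]+)"
)

sample_line = (
    "[flaml.automl.logger: 10-17 04:20:31] {2442} INFO -  at 114.6s,\t"
    "estimator xgboost's best error=0.2047,\tbest estimator xgboost's best error=0.2047"
)

match = LOG_PATTERN.search(sample_line)
best_learner = match["best"]
cv_accuracy = 1.0 - float(match["best_error"])  # error metric is 1 - accuracy

print(f"{best_learner}: cross-validated accuracy of about {cv_accuracy:.4f}")
```

This figure comes from cross-validation on the training split, so it is expected to differ somewhat from the held-out test accuracy.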

- Print Accuracy and Classification Report

In [12]:
y_test_predict = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_test_predict)
print(f"Test Accuracy: {test_accuracy:.3f}")
print(f"Classification: {classification_report(y_test, y_test_predict)}")


Test Accuracy: 0.817
Classification:               precision    recall  f1-score   support

           0       0.85      0.77      0.81       282
           1       0.79      0.86      0.83       282

    accuracy                           0.82       564
   macro avg       0.82      0.82      0.82       564
weighted avg       0.82      0.82      0.82       564
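
In the report above, class 1 (OKD) has higher recall (0.86) than precision (0.79): the model finds most OKD images but also labels some NOKD images as OKD. To make the definitions concrete, the following sketch computes the same metrics by hand. The labels are a toy example, not the notebook's test set, and `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1 (for `positive`) and overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy labels: 4 OKD (1) and 4 NOKD (0) images
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 1, 1]
metrics = binary_metrics(toy_true, toy_pred)
print({k: round(v, 3) for k, v in metrics.items()})
```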



## Process Example Video to test for OKD Predictions

### Create a single function for processing and predicting

- Import libraries


In [13]:
import numpy as np
from tqdm import tqdm

import matplotlib.pyplot as plt

warnings.filterwarnings("ignore", message=r".*X does not have valid feature names.*")

- Define processing function

In [14]:
# create function to process individual video for OKD given models
def process_okd_video(video_path, pose_model, phc_model, model, output_path=None, batch_size=4) -> list[int]:
    cap = cv2.VideoCapture(video_path)
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    if output_path: # save video with predictions if output path is specified
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

    okd_predictions = []
    frames = []

    for _ in tqdm(range(0, total_frames, batch_size), desc="Processing batches"): # process video in batches of frames for quicker processing
        batch_frames = []
        for _ in range(batch_size):
            ret, frame = cap.read()
            if not ret:
                break
            batch_frames.append(frame)

        if not batch_frames:
            break

        # perform detections on batch; with no device argument, Ultralytics selects the device automatically (CUDA when available)
        phc_results = phc_model(batch_frames, verbose=False)

        for i, frame in enumerate(batch_frames):
            catcher_box = None
            for box in phc_results[i].boxes:
                cls = int(box.cls)
                if cls == 2:
                    catcher_box = box.xyxy[0].cpu().numpy() # extract catcher box coordinates
                    break

            if catcher_box is None:
                okd_predictions.append(0)
                frames.append(frame)
                continue

            # predict pose within catcher's box (device selected automatically by Ultralytics)
            x1, y1, x2, y2 = map(int, catcher_box)
            catcher_frame = frame[y1:y2, x1:x2]
            pose_results = pose_model(catcher_frame, verbose=False, conf=0.5)[0]

            pose_points = []
            for keypoints in pose_results.keypoints:
                for point in keypoints.xyn[0].cpu().numpy():
                    pose_points.extend(point)


            # pad pose points for expected length
            pose_points = pose_points[:34] + [0] * (34 - len(pose_points))

            okd_pred = model.predict(np.array(pose_points).reshape(1, -1))[0] # predict with classifier model
            okd_predictions.append(okd_pred)

            if output_path:
                cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
                cv2.putText(frame, "Catcher", (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1)
                cv2.putText(frame, f"OKD: {'Yes' if okd_pred == 1 else 'No'}", (10, 30),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0) if okd_pred == 1 else (0, 0, 255), 2)

            frames.append(frame)

    if output_path:
        for frame in frames:
            out.write(frame)

    cap.release()
    if output_path:
        out.release()

    return okd_predictions
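
The classifier was trained on exactly 34 features (17 keypoints × 2 coordinates), so the padding step inside `process_okd_video` has to produce a vector of that length whatever the pose model returns. The sketch below isolates that step so it can be checked on its own; `pad_features` is a hypothetical helper name, and filling with zeros mirrors the `fillna(0)` used when preparing the training data.

```python
FEATURE_LENGTH = 34  # 17 keypoints x (x, y)


def pad_features(values, length=FEATURE_LENGTH, fill=0.0):
    """Truncate or zero-pad a flat coordinate list to exactly `length` entries."""
    values = list(values)[:length]
    return values + [fill] * (length - len(values))


print(len(pad_features([])), len(pad_features([0.5, 0.25])), len(pad_features(range(40))))
```

If the pose model finds no person in the catcher crop, the classifier receives an all-zero vector, so the prediction for that frame reflects how the model treats missing data rather than an observed pose.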

- Load Pose, PHC, and Classifier models



In [15]:
pose_model = YOLO("yolov8l-pose.pt")
phc_model = YOLO(load_tools.load_model("phc_detector"))


Downloading pitcher_hitter_catcher_detector_v4.pt: 100%|██████████| 87.6M/87.6M [00:02<00:00, 42.2MiB/s]


Model downloaded to models/pitcher_hitter_catcher_detector/model_weights/pitcher_hitter_catcher_detector_v4.pt
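
`process_okd_video` treats class id 2 as the catcher. Ultralytics models expose an id-to-name mapping as `model.names`, so the id can be looked up instead of hard-coded. The mapping in the sketch below is a made-up stand-in: the actual class names of the PHC detector are not shown in this notebook and should be checked with `print(phc_model.names)`.

```python
def class_id_for(names, label):
    """Return the integer class id whose name matches `label` (case-insensitive)."""
    for class_id, name in names.items():
        if name.lower() == label.lower():
            return int(class_id)
    raise KeyError(f"{label!r} not found in {sorted(names.values())}")


# Hypothetical mapping for illustration only; the real one comes from the loaded model
example_names = {0: "hitter", 1: "pitcher", 2: "catcher"}
print(class_id_for(example_names, "catcher"))
```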


- Process video in frame batches for quicker processing
- Predict OKD for a given frame (future iterations need to identify where the pitch starts)
- Save video with predictions (if given an output path)

In [16]:
video_path = "assets/example_broadcast_video.mp4"
output_path = "test_okd.mp4"

okd_predictions = process_okd_video(video_path, pose_model, phc_model, model, output_path, batch_size=4)

okd_count = sum(okd_predictions)
total_frames = len(okd_predictions)

Processing batches:   0%|          | 0/98 [00:00<?, ?it/s]



Processing batches: 100%|██████████| 98/98 [07:55<00:00,  4.85s/it]


- Print percentage of frames for video predicted as OKD

In [17]:
print(f"Predicted OKD in {okd_count/total_frames:.1%} of {total_frames} frames.")

Predicted OKD in 75.3% of 392 frames.
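
The numbers in the progress bar and the summary line follow from simple arithmetic: 392 frames in batches of 4 give 98 batches, and the OKD share is the mean of the per-frame predictions. Note that frames with no detected catcher are stored as 0, so they lower the share in the same way as NOKD frames. The sketch below uses a made-up prediction list, and `okd_share` is a hypothetical helper.

```python
import math

total_frames = 392
batch_size = 4
n_batches = math.ceil(total_frames / batch_size)  # matches the 98 batches in the progress bar


def okd_share(predictions):
    """Fraction of frames predicted as OKD; frames without a catcher count as 0."""
    total = len(predictions)
    return sum(int(p) for p in predictions) / total if total else 0.0


# Made-up predictions: three OKD frames and one frame stored as 0
demo_predictions = [1, 1, 0, 1]
print(n_batches, f"{okd_share(demo_predictions):.1%}")
```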


## **CONGRATS!** You used the OKD/NOKD dataset and pose estimation to train a classifier that predicts whether a catcher is in a one-knee-down position!

### The classifier model and its relevant information can be found in the models/okd_nokd_classifier folder.