<a href="https://colab.research.google.com/github/codingcat101/TeamAnM_GameSense_AdobeGenSolve_GFG/blob/main/GameSense_AnM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**GameSense**




Final Project for the AdobeGenSolve Hackathon



Created by: Aarchi Kothari & Mukul J (IIT Roorkee)

Crafted with ❤️ for the game of Tennis

**Abstract**

Our project provides automated sports insights using computer vision and AI/ML algorithms. Focused on the two-player sport of tennis, it pursues several key objectives: tracking player movements, detecting pivotal game events, automating score-keeping, and providing detailed metrics such as rally lengths and player activity.




**How to run**

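Prepare an input video with a resolution of 1280x720 (the ball detector runs at 640x360 and scales detections back up by a fixed factor of 2).

Running on a GPU is recommended.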
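Because `BallDetector` resizes each frame to 640x360 and multiplies detections by a fixed `scale=2`, coordinates map back correctly only for 1280x720 input. A minimal pre-flight check, assuming the frame size is read with OpenCV beforehand (`check_resolution` and the file name `input.mp4` are hypothetical, not part of the notebook):

```python
def check_resolution(frame_width, frame_height, infer_width=640, infer_height=360):
    """Return the (x, y) factors that map inference-resolution pixels back to the frame.

    Hypothetical helper: BallDetector.postprocess multiplies by a single
    hard-coded scale of 2, which is only correct when both factors equal 2.
    """
    scale_x = frame_width / infer_width
    scale_y = frame_height / infer_height
    if (scale_x, scale_y) != (2.0, 2.0):
        print(f"Warning: got {frame_width}x{frame_height}, expected 1280x720 "
              f"(scale factors {scale_x:.2f}, {scale_y:.2f}).")
    return scale_x, scale_y


# Usage (assumption: the size is read with OpenCV before running the pipeline):
# cap = cv2.VideoCapture("input.mp4")
# w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
# h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
# check_resolution(w, h)
print(check_resolution(1280, 720))  # (2.0, 2.0)
```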



---



1. **Importing Requirements**

In [2]:
!pip install catboost




In [3]:
import cv2
import numpy as np
import torch
import torchvision
import pandas as pd
import catboost as ctb
import matplotlib.pyplot as plt
import torch.nn as nn # provides tools to build and train neural networks
from scipy.spatial import distance
from tqdm import tqdm
from scipy.interpolate import CubicSpline
from sympy import Line
from sympy.geometry.point import Point2D






---



2. **Ball Tracking**: Accurately track the location, speed, and trajectory of the ball (or shuttlecock) throughout the game.

a. **TrackNet Implementation for Tennis Ball Tracking**

TrackNet: A Deep Learning Network for Tracking High-speed and Tiny Objects in Sports Applications

Reference: https://arxiv.org/abs/1907.03698

In [4]:

# Define a Convolutional Block that consists of a Conv2D layer, ReLU activation, and Batch Normalization
class ConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, pad=1, stride=1, bias=True):
        """
        Initialize a convolutional block with Conv2D, ReLU, and BatchNorm.

        Parameters:
        - in_channels: Number of input channels (e.g., 3 for RGB images)
        - out_channels: Number of output channels (filters in the Conv2D layer)
        - kernel_size: Size of the convolution kernel (default is 3)
        - pad: Padding applied to the input (default is 1 to preserve spatial dimensions)
        - stride: Stride of the convolution (default is 1 for regular convolution)
        - bias: Whether to include a bias term in the Conv2D layer (default is True)
        """
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, stride=stride, padding=pad, bias=bias),
            nn.ReLU(),
            nn.BatchNorm2d(out_channels)  # Batch normalization to stabilize training
        )

    def forward(self, x):
        """
        Forward pass of the ConvBlock.

        Parameters:
        - x: Input tensor

        Returns:
        - Output tensor after passing through the Conv2D, ReLU, and BatchNorm layers
        """
        return self.block(x)

# Define a deep neural network for ball tracking
class BallTrackerNet(nn.Module):
    def __init__(self, input_channels=3, out_channels=14):
        """
        Initialize the ball tracking network with a series of ConvBlocks, MaxPooling, and Upsampling layers.

        Parameters:
        - input_channels: Number of input channels (default is 3, e.g., RGB image)
        - out_channels: Number of output channels (default is 14, e.g. one heatmap per court key point; BallDetector uses 256)
        """
        super().__init__()
        self.out_channels = out_channels
        self.input_channels = input_channels

        # Define the convolutional layers of the network
        self.conv1 = ConvBlock(in_channels=self.input_channels, out_channels=64)
        self.conv2 = ConvBlock(in_channels=64, out_channels=64)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)  # Downsample the spatial dimensions by 2
        self.conv3 = ConvBlock(in_channels=64, out_channels=128)
        self.conv4 = ConvBlock(in_channels=128, out_channels=128)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv5 = ConvBlock(in_channels=128, out_channels=256)
        self.conv6 = ConvBlock(in_channels=256, out_channels=256)
        self.conv7 = ConvBlock(in_channels=256, out_channels=256)
        self.pool3 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv8 = ConvBlock(in_channels=256, out_channels=512)
        self.conv9 = ConvBlock(in_channels=512, out_channels=512)
        self.conv10 = ConvBlock(in_channels=512, out_channels=512)

        # Define the upsampling layers for increasing spatial resolution
        self.ups1 = nn.Upsample(scale_factor=2)  # Upsample by a factor of 2
        self.conv11 = ConvBlock(in_channels=512, out_channels=256)
        self.conv12 = ConvBlock(in_channels=256, out_channels=256)
        self.conv13 = ConvBlock(in_channels=256, out_channels=256)
        self.ups2 = nn.Upsample(scale_factor=2)
        self.conv14 = ConvBlock(in_channels=256, out_channels=128)
        self.conv15 = ConvBlock(in_channels=128, out_channels=128)
        self.ups3 = nn.Upsample(scale_factor=2)
        self.conv16 = ConvBlock(in_channels=128, out_channels=64)
        self.conv17 = ConvBlock(in_channels=64, out_channels=64)
        self.conv18 = ConvBlock(in_channels=64, out_channels=self.out_channels)  # Final layer

        self._init_weights()  # Initialize the weights of the network

    def forward(self, x):
        """
        Forward pass of the BallTrackerNet.

        Parameters:
        - x: Input tensor (e.g., image or video frame)

        Returns:
        - Output tensor with the same spatial resolution as the input (after upsampling)
        """
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.pool1(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.pool2(x)
        x = self.conv5(x)
        x = self.conv6(x)
        x = self.conv7(x)
        x = self.pool3(x)
        x = self.conv8(x)
        x = self.conv9(x)
        x = self.conv10(x)
        x = self.ups1(x)
        x = self.conv11(x)
        x = self.conv12(x)
        x = self.conv13(x)
        x = self.ups2(x)
        x = self.conv14(x)
        x = self.conv15(x)
        x = self.ups3(x)
        x = self.conv16(x)
        x = self.conv17(x)
        x = self.conv18(x)
        return x

    def _init_weights(self):
        """
        Initialize the network weights: Conv2D weights are drawn uniformly from [-0.05, 0.05] with zero bias,
        and BatchNorm layers start with weight 1 and bias 0.
        """
        for module in self.modules():
            if isinstance(module, nn.Conv2d):
                nn.init.uniform_(module.weight, -0.05, 0.05)  # Uniform initialization
                if module.bias is not None:
                    nn.init.constant_(module.bias, 0)

            elif isinstance(module, nn.BatchNorm2d):
                nn.init.constant_(module.weight, 1)  # Initialize BatchNorm weight to 1
                nn.init.constant_(module.bias, 0)   # Initialize BatchNorm bias to 0
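BallTrackerNet halves the spatial resolution three times (`pool1` to `pool3`) and doubles it three times (`ups1` to `ups3`), while every ConvBlock uses kernel 3, padding 1 and stride 1 and therefore keeps the size unchanged. The output thus matches the input only when height and width are divisible by 8, as 640x360 is. A small sketch tracing the sizes with the standard PyTorch output-size formula `floor((n + 2p - k) / s) + 1` (dilation 1; the helper names are hypothetical):

```python
def conv_out(size, kernel_size=3, pad=1, stride=1):
    """Output length along one axis for nn.Conv2d / nn.MaxPool2d (dilation 1)."""
    return (size + 2 * pad - kernel_size) // stride + 1


def tracknet_shape(height, width, num_stages=3):
    """Trace (bottleneck, output) spatial sizes through BallTrackerNet."""
    h, w = conv_out(height), conv_out(width)  # any ConvBlock (k=3, p=1, s=1) keeps the size
    for _ in range(num_stages):
        # MaxPool2d(kernel_size=2, stride=2) floors odd sizes
        h, w = conv_out(h, kernel_size=2, pad=0, stride=2), conv_out(w, kernel_size=2, pad=0, stride=2)
    bottleneck = (h, w)
    for _ in range(num_stages):
        h, w = h * 2, w * 2  # nn.Upsample(scale_factor=2)
    return bottleneck, (h, w)


print(tracknet_shape(360, 640))  # ((45, 80), (360, 640))
print(tracknet_shape(362, 640))  # ((45, 80), (360, 640)): 362 rows come back as 360
```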


b. **Ball Detection Using the Trained TrackNet Model**
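The detector's input is three consecutive frames resized to 640x360 and stacked along the channel axis, which is why the model is built with `input_channels=9`. A numpy-only sketch of that step as performed in `BallDetector.infer_model`, assuming the frames are already resized (the helper `stack_frames` is hypothetical):

```python
import numpy as np


def stack_frames(frame, prev_frame, preprev_frame):
    """Build a (1, 9, H, W) float32 array from three HxWx3 uint8 frames.

    Same order as infer_model: current frame, previous frame, frame two steps back.
    """
    imgs = np.concatenate((frame, prev_frame, preprev_frame), axis=2)  # H x W x 9
    imgs = imgs.astype(np.float32) / 255.0   # scale pixels to [0, 1]
    imgs = np.transpose(imgs, (2, 0, 1))     # 9 x H x W, same as np.rollaxis(imgs, 2, 0)
    return np.expand_dims(imgs, axis=0)      # add the batch dimension


h, w = 360, 640
frames = [np.full((h, w, 3), v, dtype=np.uint8) for v in (0, 128, 255)]
batch = stack_frames(*frames)
print(batch.shape)  # (1, 9, 360, 640)
```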

In [5]:


class BallDetector:
    """
    A class to perform ball detection using a pretrained model.

    Attributes:
        model : BallTrackerNet
            A deep learning model for ball tracking.
        device : str
            The device to run the model on ('cuda' or 'cpu').
        width : int
            Width of the resized frame for inference.
        height : int
            Height of the resized frame for inference.

    Methods:
        infer_model(frames):
            Run the pretrained model on consecutive frames to detect ball positions.
        postprocess(feature_map, prev_pred, scale=2, max_dist=80):
            Process the model output to extract ball coordinates from feature maps.
    """

    def __init__(self, path_model=None, device='cuda'):
        """
        Initializes the BallDetector class by loading a pretrained model if a path is provided.

        Parameters:
            path_model : str, optional
                Path to the pretrained model weights.
            device : str, optional
                The device for running the model ('cuda' or 'cpu').
        """
        self.model = BallTrackerNet(input_channels=9, out_channels=256)  # Initialize the ball tracking model
        self.device = device  # Set the device (cuda or cpu)

        # Load model weights if a path is provided
        if path_model:
            self.model.load_state_dict(torch.load(path_model, map_location=device, weights_only=True))

        # Always move the model to the target device and switch to inference mode,
        # so infer_model() also works when no weights file is given
        self.model = self.model.to(device)
        self.model.eval()

        # Set default frame dimensions for resizing
        self.width = 640
        self.height = 360

    def infer_model(self, frames):
        """
        Run the pretrained model on a consecutive list of frames.

        Parameters:
            frames : list
                List of consecutive video frames.

        Returns:
            ball_track : list
                List of detected ball positions as (x, y) coordinates for each frame.
        """
        ball_track = [(None, None)]*2  # Initialize tracking result with placeholders for first two frames
        prev_pred = [None, None]  # Store previous prediction

        # Loop through the list of frames starting from the 3rd frame
        for num in tqdm(range(2, len(frames))):
            # Resize the current frame and the two frames before it
            img = cv2.resize(frames[num], (self.width, self.height))
            img_prev = cv2.resize(frames[num-1], (self.width, self.height))
            img_preprev = cv2.resize(frames[num-2], (self.width, self.height))

            # Concatenate frames to form a single input
            imgs = np.concatenate((img, img_prev, img_preprev), axis=2)
            imgs = imgs.astype(np.float32)/255.0  # Normalize pixel values to [0,1]
            imgs = np.rollaxis(imgs, 2, 0)  # Change the axes order for the model input format

            # Prepare the input tensor
            inp = np.expand_dims(imgs, axis=0)

            # Pass the input through the model without tracking gradients (inference only)
            with torch.no_grad():
                out = self.model(torch.from_numpy(inp).float().to(self.device))
            output = out.argmax(dim=1).cpu().numpy()  # Per-pixel argmax over the 256 output channels -> (1, 360, 640)

            # Postprocess the output to obtain the predicted ball position
            x_pred, y_pred = self.postprocess(output, prev_pred)
            prev_pred = [x_pred, y_pred]  # Update previous prediction
            ball_track.append((x_pred, y_pred))  # Store the result

        return ball_track  # Return list of detected ball positions

    def postprocess(self, feature_map, prev_pred, scale=2, max_dist=80):
        """
        Postprocess the output feature map from the model to get ball coordinates.

        Parameters:
            feature_map : np.array
                Output feature map from the model with shape (1,360,640).
            prev_pred : list
                Previous ball prediction [x, y].
            scale : int, optional
                Scale factor to convert to original frame size (default is 2).
            max_dist : int, optional
                Maximum allowable distance from previous detection to remove outliers (default is 80).

        Returns:
            x, y : float
                The coordinates of the detected ball position in the frame.
        """
        feature_map *= 255  # Scale the argmax class map (values 0-255) before thresholding
        feature_map = feature_map.reshape((self.height, self.width))  # Reshape to the inference resolution (360x640)
        feature_map = feature_map.astype(np.uint8)  # 8-bit image for OpenCV (integer values wrap modulo 256)

        # Apply binary thresholding to isolate potential ball regions
        ret, heatmap = cv2.threshold(feature_map, 127, 255, cv2.THRESH_BINARY)

        # Detect circles using Hough Transform
        circles = cv2.HoughCircles(heatmap, cv2.HOUGH_GRADIENT, dp=1, minDist=1, param1=50, param2=2, minRadius=2, maxRadius=7)
        x, y = None, None

        # If any circles are detected
        if circles is not None:
            # If there is a previous prediction
            if prev_pred[0] is not None:
                # Iterate over detected circles and select one close to the previous prediction
                for i in range(len(circles[0])):
                    x_temp = circles[0][i][0] * scale  # Scale detected x coordinate
                    y_temp = circles[0][i][1] * scale  # Scale detected y coordinate
                    dist = distance.euclidean((x_temp, y_temp), prev_pred)  # Calculate distance from previous prediction

                    # If the distance is within the allowed range, update the prediction
                    if dist < max_dist:
                        x, y = x_temp, y_temp
                        break
            else:
                # If no previous prediction, take the first detected circle
                x = circles[0][0][0] * scale
                y = circles[0][0][1] * scale

        return x, y  # Return the detected coordinates
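`postprocess` treats every Hough circle as a candidate and rejects any candidate more than `max_dist` pixels from the previous detection. The same gating logic, stripped of OpenCV so it can be tested on plain tuples (the function name `pick_candidate` is hypothetical; it checks `is None` so a coordinate of 0 is not mistaken for a missing one):

```python
import math


def pick_candidate(candidates, prev_pred, scale=2, max_dist=80):
    """Choose a ball position from circle candidates, mirroring postprocess's gating.

    candidates: (x, y, radius) tuples at inference resolution (640x360).
    prev_pred:  (x, y) of the previous frame at full resolution, or (None, None).
    """
    if not candidates:
        return None, None
    if prev_pred[0] is None:
        x, y, _ = candidates[0]
        return x * scale, y * scale  # no history: take the first circle
    for x, y, _ in candidates:
        xs, ys = x * scale, y * scale
        if math.dist((xs, ys), prev_pred) < max_dist:
            return xs, ys  # first candidate close enough to the last known position
    return None, None  # every candidate is treated as an outlier


print(pick_candidate([(100, 100, 3), (300, 200, 3)], (598, 402)))  # (600, 400)
```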




---



3. **Bounce Detection with CatBoost**
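`BounceDetector.prepare_features` in the cell below describes each frame by its differences to the two previous and two following ball positions plus their ratios; with `num = 3` the loop produces lags 1 and 2, i.e. 12 feature columns for the CatBoost regressor. A pure-Python sketch of the same quantities for a single frame (the function name is hypothetical) shows how the sign of `y_div_1` differs between a path that reverses vertical direction, as happens around a bounce, and a straight one:

```python
def bounce_features(x, y, t, num=3, eps=1e-15):
    """Lag features for frame t, following BounceDetector.prepare_features.

    x, y: sequences of ball coordinates with no missing values in [t-num+1, t+num-1].
    Returns a dict keyed like the notebook's feature columns (12 entries for num=3).
    """
    feats = {}
    for i in range(1, num):
        x_diff = abs(x[t - i] - x[t])       # horizontal distance to the past point
        y_diff = y[t - i] - y[t]            # signed vertical offset to the past point
        x_diff_inv = abs(x[t + i] - x[t])   # same, towards the future point
        y_diff_inv = y[t + i] - y[t]
        feats[f'x_diff_{i}'] = x_diff
        feats[f'y_diff_{i}'] = y_diff
        feats[f'x_diff_inv_{i}'] = x_diff_inv
        feats[f'y_diff_inv_{i}'] = y_diff_inv
        feats[f'x_div_{i}'] = abs(x_diff / (x_diff_inv + eps))
        feats[f'y_div_{i}'] = y_diff / (y_diff_inv + eps)
    return feats


xs = [0, 10, 20, 30, 40]
v_shape = bounce_features(xs, [100, 110, 120, 110, 100], t=2)   # moves down the image, then back up
straight = bounce_features(xs, [100, 110, 120, 130, 140], t=2)  # keeps moving down
print(round(v_shape['y_div_1'], 6), round(straight['y_div_1'], 6))  # 1.0 -1.0
```

At a reversal the past and future vertical offsets share a sign, so the ratio is positive; on a straight path they have opposite signs.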









In [6]:


class BounceDetector:
    """
    A class to detect ball bounces in a sequence of frames using a CatBoost regression model.

    Attributes:
        model : CatBoostRegressor
            A CatBoost regression model for detecting bounces.
        threshold : float
            A threshold value for classifying a bounce.

    Methods:
        load_model(path_model):
            Load a pretrained CatBoost model from a file.
        prepare_features(x_ball, y_ball):
            Prepare the input features from the ball's x and y coordinates.
        predict(x_ball, y_ball, smooth=True):
            Predict bounces using the model, with an optional smoothing step.
        smooth_predictions(x_ball, y_ball):
            Apply smoothing to the ball's position coordinates to fill missing values.
        extrapolate(x_coords, y_coords):
            Extrapolate missing ball coordinates using cubic spline interpolation.
        postprocess(ind_bounce, preds):
            Post-process the bounce predictions to filter consecutive bounce detections.
    """

    def __init__(self, path_model=None):
        """
        Initializes the BounceDetector class and loads a model if a path is provided.

        Parameters:
            path_model : str, optional
                Path to the pretrained CatBoost model file.
        """
        self.model = ctb.CatBoostRegressor()  # Initialize CatBoost regressor
        self.threshold = 0.45  # Set bounce prediction threshold

        if path_model:
            self.load_model(path_model)  # Load model if the path is provided

    def load_model(self, path_model):
        """
        Load a pretrained CatBoost model from a file.

        Parameters:
            path_model : str
                Path to the pretrained model file.
        """
        self.model.load_model(path_model)

    def prepare_features(self, x_ball, y_ball):
        """
        Prepare features for the model using the ball's x and y coordinates.

        Parameters:
            x_ball : list
                List of x coordinates of the ball.
            y_ball : list
                List of y coordinates of the ball.

        Returns:
            features : DataFrame
                A DataFrame containing the prepared features for prediction.
            list of frames : list
                List of frame numbers corresponding to the coordinates.
        """
        # Create a DataFrame to store frame number, x, and y coordinates
        labels = pd.DataFrame({'frame': range(len(x_ball)), 'x-coordinate': x_ball, 'y-coordinate': y_ball})

        # Generate lagged features to capture previous and next coordinates
        num = 3  # Generates lags 1..num-1, i.e. 2 past and 2 future frames
        eps = 1e-15  # Small constant to avoid division by zero
        for i in range(1, num):
            # Create lag features for x and y coordinates
            labels['x_lag_{}'.format(i)] = labels['x-coordinate'].shift(i)
            labels['x_lag_inv_{}'.format(i)] = labels['x-coordinate'].shift(-i)
            labels['y_lag_{}'.format(i)] = labels['y-coordinate'].shift(i)
            labels['y_lag_inv_{}'.format(i)] = labels['y-coordinate'].shift(-i)

            # Calculate differences and ratios for lagged features
            labels['x_diff_{}'.format(i)] = abs(labels['x_lag_{}'.format(i)] - labels['x-coordinate'])
            labels['y_diff_{}'.format(i)] = labels['y_lag_{}'.format(i)] - labels['y-coordinate']
            labels['x_diff_inv_{}'.format(i)] = abs(labels['x_lag_inv_{}'.format(i)] - labels['x-coordinate'])
            labels['y_diff_inv_{}'.format(i)] = labels['y_lag_inv_{}'.format(i)] - labels['y-coordinate']
            labels['x_div_{}'.format(i)] = abs(labels['x_diff_{}'.format(i)]/(labels['x_diff_inv_{}'.format(i)] + eps))
            labels['y_div_{}'.format(i)] = labels['y_diff_{}'.format(i)]/(labels['y_diff_inv_{}'.format(i)] + eps)

        # Remove rows with missing values due to shifting
        for i in range(1, num):
            labels = labels[labels['x_lag_{}'.format(i)].notna()]
            labels = labels[labels['x_lag_inv_{}'.format(i)].notna()]
        labels = labels[labels['x-coordinate'].notna()]

        # Define feature column names for x and y coordinates
        colnames_x = ['x_diff_{}'.format(i) for i in range(1, num)] + \
                     ['x_diff_inv_{}'.format(i) for i in range(1, num)] + \
                     ['x_div_{}'.format(i) for i in range(1, num)]
        colnames_y = ['y_diff_{}'.format(i) for i in range(1, num)] + \
                     ['y_diff_inv_{}'.format(i) for i in range(1, num)] + \
                     ['y_div_{}'.format(i) for i in range(1, num)]
        colnames = colnames_x + colnames_y

        # Extract features for prediction
        features = labels[colnames]
        return features, list(labels['frame'])

    def predict(self, x_ball, y_ball, smooth=True):
        """
        Predict the frames where a bounce occurs using the model.

        Parameters:
            x_ball : list
                List of x coordinates of the ball.
            y_ball : list
                List of y coordinates of the ball.
            smooth : bool, optional
                Whether to smooth the ball coordinates before prediction (default is True).

        Returns:
            set
                A set of frame numbers where a bounce is detected.
        """
        # Apply smoothing to the ball's coordinates if required
        if smooth:
            x_ball, y_ball = self.smooth_predictions(x_ball, y_ball)

        # Prepare features for prediction
        features, num_frames = self.prepare_features(x_ball, y_ball)

        # Predict bounce probabilities using the model
        preds = self.model.predict(features)

        # Find frames where bounce probability exceeds the threshold
        ind_bounce = np.where(preds > self.threshold)[0]

        # Post-process bounce predictions to remove consecutive detections
        if len(ind_bounce) > 0:
            ind_bounce = self.postprocess(ind_bounce, preds)

        # Get the frame numbers where bounces are predicted
        frames_bounce = [num_frames[x] for x in ind_bounce]
        return set(frames_bounce)

    def smooth_predictions(self, x_ball, y_ball):
        """
        Smooth ball coordinates by filling missing values using extrapolation.

        Parameters:
            x_ball : list
                List of x coordinates of the ball.
            y_ball : list
                List of y coordinates of the ball.

        Returns:
            x_ball : list
                Smoothed x coordinates.
            y_ball : list
                Smoothed y coordinates.
        """
        is_none = [int(x is None) for x in x_ball]  # Flag missing values
        interp = 5  # Number of previous frames to use for extrapolation
        counter = 0

        # Fill short gaps: a missing frame is extrapolated only if the previous `interp`
        # frames all have positions, and at most 3 consecutive frames are filled
        for num in range(interp, len(x_ball)-1):
            if x_ball[num] is None and sum(is_none[num-interp:num]) == 0 and counter < 3:
                # Extrapolate missing values using previous frames
                x_ext, y_ext = self.extrapolate(x_ball[num-interp:num], y_ball[num-interp:num])
                x_ball[num], y_ball[num] = x_ext, y_ext  # Update with extrapolated values
                is_none[num] = 0

                # If the next frame has a detection, drop it when it lies too far
                # from the extrapolated point (likely an outlier)
                if x_ball[num+1] is not None:
                    dist = distance.euclidean((x_ext, y_ext), (x_ball[num+1], y_ball[num+1]))
                    if dist > 80:
                        # Mark next frame as missing if distance is too large
                        x_ball[num+1], y_ball[num+1], is_none[num+1] = None, None, 1
                counter += 1
            else:
                counter = 0
        return x_ball, y_ball

    def extrapolate(self, x_coords, y_coords):
        """
        Extrapolate the next ball position with natural cubic splines fitted to the given coordinates.

        Parameters:
            x_coords : list
                List of x coordinates for interpolation.
            y_coords : list
                List of y coordinates for interpolation.

        Returns:
            x_ext : float
                Extrapolated x coordinate.
            y_ext : float
                Extrapolated y coordinate.
        """
        xs = list(range(len(x_coords)))  # Frame indices for the coordinates

        # Apply cubic spline interpolation to x and y coordinates
        func_x = CubicSpline(xs, x_coords, bc_type='natural')
        func_y = CubicSpline(xs, y_coords, bc_type='natural')

        # Extrapolate to the next frame
        x_ext = func_x(len(x_coords))
        y_ext = func_y(len(x_coords))

        return float(x_ext), float(y_ext)

    def postprocess(self, ind_bounce, preds):
        """
        Post-process bounce predictions to filter out consecutive detections.

        Parameters:
            ind_bounce : list
                List of indices where bounces are predicted.
            preds : list
                List of prediction probabilities for each frame.

        Returns:
            ind_bounce_filtered : list
                List of filtered bounce prediction indices.
        """
        ind_bounce_filtered = [ind_bounce[0]]  # Keep the first bounce

        # Iterate through bounce predictions
        for i in range(1, len(ind_bounce)):
            if (ind_bounce[i] - ind_bounce[i-1]) != 1:
                # Keep non-consecutive bounce predictions
                ind_bounce_filtered.append(ind_bounce[i])
            elif preds[ind_bounce[i]] > preds[ind_bounce_filtered[-1]]:
                # Within a run of consecutive frames, keep the frame with the highest score
                ind_bounce_filtered[-1] = ind_bounce[i]

        return ind_bounce_filtered
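The filtering step is meant to keep a single frame per run of adjacent above-threshold frames, namely the one with the highest score. A standalone sketch of that rule (hypothetical function name), comparing each frame with the frame currently kept for the run:

```python
def suppress_consecutive(ind_bounce, preds):
    """Keep one index per run of adjacent frame indices: the one with the highest score."""
    if len(ind_bounce) == 0:
        return []
    kept = [ind_bounce[0]]
    for prev, cur in zip(ind_bounce[:-1], ind_bounce[1:]):
        if cur - prev != 1:
            kept.append(cur)  # gap in the indices: start a new run
        elif preds[cur] > preds[kept[-1]]:
            kept[-1] = cur  # same run: compare against the frame currently kept
    return kept


preds = [0.1, 0.2, 0.9, 0.5, 0.6, 0.1, 0.1, 0.7, 0.1]
ind = [i for i, p in enumerate(preds) if p > 0.45]  # [2, 3, 4, 7]
print(suppress_consecutive(ind, preds))  # [2, 7]
```

Comparing against the currently kept frame, rather than only the immediately preceding one, ensures that frame 2 (score 0.9) survives even though the run later rises from 0.5 to 0.6.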




---



4. **Play Area Marking**: Accurately identify and mark the play-area boundaries, including the net, the court lines, and other critical game areas.

a. **Line Detection and Refinement in Images**

In [7]:

# This script provides functions for detecting lines in an image, merging close lines,
# and refining key points by finding the intersection of the detected lines.


def line_intersection(line1, line2):
    """
    Find the intersection point of two lines.

    Parameters:
    - line1: List or tuple containing four elements representing the coordinates (x1, y1, x2, y2) of the first line.
    - line2: List or tuple containing four elements representing the coordinates (x1, y1, x2, y2) of the second line.

    Returns:
    - point: The intersection point as a tuple (x, y) if the lines meet in a single point; None if they are parallel or coincident.
    """
    l1 = Line((line1[0], line1[1]), (line1[2], line1[3]))
    l2 = Line((line2[0], line2[1]), (line2[2], line2[3]))

    intersection = l1.intersection(l2)
    point = None
    if len(intersection) > 0:
        if isinstance(intersection[0], Point2D):
            point = intersection[0].coordinates
    return point

def refine_kps(img, x_ct, y_ct, crop_size=40):
    """
    Refine a court key point by intersecting the lines detected around it.

    Parameters:
    - img: The input (BGR) image in which the key point is refined.
    - x_ct, y_ct: Integer row and column indices of the initial key point
      (x_ct indexes the image height, y_ct the image width).
    - crop_size: Half-size of the crop around the key point used for line detection (default is 40).

    Returns:
    - refined_y_ct, refined_x_ct: The refined key point as (column, row), i.e. in (x, y) image order.
    """
    refined_x_ct, refined_y_ct = x_ct, y_ct

    img_height, img_width = img.shape[:2]
    x_min = max(x_ct - crop_size, 0)
    x_max = min(img_height, x_ct + crop_size)
    y_min = max(y_ct - crop_size, 0)
    y_max = min(img_width, y_ct + crop_size)

    img_crop = img[x_min:x_max, y_min:y_max]
    lines = detect_lines(img_crop)

    if len(lines) > 1:
        lines = merge_lines(lines)
        if len(lines) == 2:
            inters = line_intersection(lines[0], lines[1])
            if inters:
                new_x_ct = int(inters[1])
                new_y_ct = int(inters[0])
                if new_x_ct > 0 and new_x_ct < img_crop.shape[0] and new_y_ct > 0 and new_y_ct < img_crop.shape[1]:
                    refined_x_ct = x_min + new_x_ct
                    refined_y_ct = y_min + new_y_ct
    return refined_y_ct, refined_x_ct

def detect_lines(image):
    """
    Detects lines in an image using the Hough Line Transform.

    Parameters:
    - image: The input image for line detection.

    Returns:
    - lines: A list of detected lines in the format [[x1, y1, x2, y2], ...].
    """
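    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.threshold(gray, 155, 255, cv2.THRESH_BINARY)[1]  # Keep bright pixels (white court lines)
    # Probabilistic Hough transform: 1 px / 1 degree resolution, at least 30 votes per line
    lines = cv2.HoughLinesP(gray, 1, np.pi / 180, 30, minLineLength=10, maxLineGap=30)
    lines = np.squeeze(lines)  # (N, 1, 4) -> (N, 4); a None result becomes a 0-d array
    if len(lines.shape) > 0:
        # A single detection squeezes to shape (4,); wrap it so callers always get a list of lines
        if len(lines) == 4 and not isinstance(lines[0], np.ndarray):
            lines = [lines]
    else:
        lines = []  # No lines detected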
    return lines

def merge_lines(lines):
    """
    Merges lines that are close to each other based on their end points.

    Parameters:
    - lines: A list of lines in the format [[x1, y1, x2, y2], ...].

    Returns:
    - new_lines: A list of merged lines.
    """
    # Process lines in order of the x coordinate of their first end point
    lines = sorted(lines, key=lambda item: item[0])
    mask = [True] * len(lines)  # False once a line has been merged into an earlier one
    new_lines = []

    for i, line in enumerate(lines):
        if mask[i]:
            for j, s_line in enumerate(lines[i + 1:]):
                if mask[i + j + 1]:
                    x1, y1, x2, y2 = line
                    x3, y3, x4, y4 = s_line
                    dist1 = distance.euclidean((x1, y1), (x3, y3))
                    dist2 = distance.euclidean((x2, y2), (x4, y4))
                    # Both end points within 20 px: treat as the same line and average them
                    if dist1 < 20 and dist2 < 20:
                        line = np.array([int((x1 + x3) / 2), int((y1 + y3) / 2), int((x2 + x4) / 2), int((y2 + y4) / 2)])
                        mask[i + j + 1] = False
            new_lines.append(line)
    return new_lines
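`line_intersection` relies on SymPy's exact symbolic geometry. For two segments given as `(x1, y1, x2, y2)`, the intersection of the infinite lines through them also has a closed form based on 2x2 determinants. A plain-Python sketch of that swapped-in technique (hypothetical function name), which, like the SymPy version, returns `None` when the lines are parallel or coincident:

```python
def segment_line_intersection(line1, line2, eps=1e-9):
    """Intersect the infinite lines through two segments given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = line1
    x3, y3, x4, y4 = line2
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < eps:
        return None  # parallel or coincident lines: no single intersection point
    det1 = x1 * y2 - y1 * x2  # determinant of the first segment's end points
    det2 = x3 * y4 - y3 * x4  # determinant of the second segment's end points
    px = (det1 * (x3 - x4) - (x1 - x2) * det2) / denom
    py = (det1 * (y3 - y4) - (y1 - y2) * det2) / denom
    return px, py


print(segment_line_intersection((0, 0, 10, 10), (0, 10, 10, 0)))  # (5.0, 5.0)
```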



b. **Court Reference Model**
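Each entry of `court_conf` in the class below lists the four corners of a rectangle formed by court lines. Four point correspondences are exactly what a planar homography needs. Assuming these configurations are meant for mapping the reference court onto a video frame (that step is not part of the code shown here), the mapping can be sketched with a normalized direct linear transform in numpy; OpenCV's `cv2.getPerspectiveTransform` computes the same 4-point mapping. All function names and the frame coordinates in the usage are hypothetical:

```python
import numpy as np


def _normalization(pts):
    """Hartley normalization: similarity T moving the centroid to 0, mean distance sqrt(2)."""
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2) / np.linalg.norm(pts - centroid, axis=1).mean()
    return np.array([[scale, 0.0, -scale * centroid[0]],
                     [0.0, scale, -scale * centroid[1]],
                     [0.0, 0.0, 1.0]])


def apply_homography(h, pts):
    """Map (N, 2) points through a 3x3 matrix h and de-homogenize."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]


def homography_from_4_points(src, dst):
    """Direct linear transform: H with dst ~ H @ [x, y, 1] from 4 point pairs."""
    t_src, t_dst = _normalization(src), _normalization(dst)
    src_n, dst_n = apply_homography(t_src, src), apply_homography(t_dst, dst)
    rows = []
    for (x, y), (u, v) in zip(src_n, dst_n):
        # Each correspondence contributes two linear equations in the 9 entries of H
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h_n = vt[-1].reshape(3, 3)               # null vector of the 8x9 system
    h = np.linalg.inv(t_dst) @ h_n @ t_src   # undo the normalizations
    return h / h[2, 2]                       # fix the arbitrary overall scale


# Usage: court_conf[1] holds the four baseline corners of the reference court.
ref_corners = [(286, 561), (1379, 561), (286, 2935), (1379, 2935)]
# Made-up corner positions in a 1280x720 frame, for illustration only
frame_corners = [(400, 250), (880, 250), (250, 650), (1030, 650)]
h = homography_from_4_points(ref_corners, frame_corners)
print(apply_homography(h, [(832, 1748)]))  # reference net centre projected into the frame
```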

In [8]:
# Court Reference Model

# This script defines a `CourtReference` class that models a court layout,
# providing functions to build a court reference image, get important lines,
# save court configurations, and generate court masks.



class CourtReference:
    """
    A class to represent a tennis or similar sports court and provide methods to manipulate its layout.
    """
    def __init__(self):
        """
        Initialize the court reference model with court lines, key points, and configurations.
        """
        self.baseline_top = ((286, 561), (1379, 561))
        self.baseline_bottom = ((286, 2935), (1379, 2935))
        self.net = ((286, 1748), (1379, 1748))
        self.left_court_line = ((286, 561), (286, 2935))
        self.right_court_line = ((1379, 561), (1379, 2935))
        self.left_inner_line = ((423, 561), (423, 2935))
        self.right_inner_line = ((1242, 561), (1242, 2935))
        self.middle_line = ((832, 1110), (832, 2386))
        self.top_inner_line = ((423, 1110), (1242, 1110))
        self.bottom_inner_line = ((423, 2386), (1242, 2386))
        self.top_extra_part = (832.5, 580)
        self.bottom_extra_part = (832.5, 2910)

        # Define key points and configurations
        self.key_points = [*self.baseline_top, *self.baseline_bottom,
                          *self.left_inner_line, *self.right_inner_line,
                          *self.top_inner_line, *self.bottom_inner_line,
                          *self.middle_line]

        self.border_points = [*self.baseline_top, *self.baseline_bottom[::-1]]

        # Configurations for different court layouts
        self.court_conf = {1: [*self.baseline_top, *self.baseline_bottom],
                           2: [self.left_inner_line[0], self.right_inner_line[0], self.left_inner_line[1],
                               self.right_inner_line[1]],
                           3: [self.left_inner_line[0], self.right_court_line[0], self.left_inner_line[1],
                               self.right_court_line[1]],
                           4: [self.left_court_line[0], self.right_inner_line[0], self.left_court_line[1],
                               self.right_inner_line[1]],
                           5: [*self.top_inner_line, *self.bottom_inner_line],
                           6: [*self.top_inner_line, self.left_inner_line[1], self.right_inner_line[1]],
                           7: [self.left_inner_line[0], self.right_inner_line[0], *self.bottom_inner_line],
                           8: [self.right_inner_line[0], self.right_court_line[0], self.right_inner_line[1],
                               self.right_court_line[1]],
                           9: [self.left_court_line[0], self.left_inner_line[0], self.left_court_line[1],
                               self.left_inner_line[1]],
                           10: [self.top_inner_line[0], self.middle_line[0], self.bottom_inner_line[0],
                                self.middle_line[1]],
                           11: [self.middle_line[0], self.top_inner_line[1], self.middle_line[1],
                                self.bottom_inner_line[1]],
                           12: [*self.bottom_inner_line, self.left_inner_line[1], self.right_inner_line[1]]}

        # Court dimensions
        self.line_width = 1
        self.court_width = 1117
        self.court_height = 2408
        self.top_bottom_border = 549
        self.right_left_border = 274
        self.court_total_width = self.court_width + self.right_left_border * 2
        self.court_total_height = self.court_height + self.top_bottom_border * 2
        self.court = self.build_court_reference()

    def build_court_reference(self):
        """
        Create an image of the court reference using the defined line positions.

        Returns:
        - court: A binary image representing the court with lines drawn.
        """
        court = np.zeros((self.court_height + 2 * self.top_bottom_border, self.court_width + 2 * self.right_left_border), dtype=np.uint8)
        cv2.line(court, *self.baseline_top, 1, self.line_width)
        cv2.line(court, *self.baseline_bottom, 1, self.line_width)
        cv2.line(court, *self.net, 1, self.line_width)
        cv2.line(court, *self.top_inner_line, 1, self.line_width)
        cv2.line(court, *self.bottom_inner_line, 1, self.line_width)
        cv2.line(court, *self.left_court_line, 1, self.line_width)
        cv2.line(court, *self.right_court_line, 1, self.line_width)
        cv2.line(court, *self.left_inner_line, 1, self.line_width)
        cv2.line(court, *self.right_inner_line, 1, self.line_width)
        cv2.line(court, *self.middle_line, 1, self.line_width)
        court = cv2.dilate(court, np.ones((5, 5), dtype=np.uint8))
        # court = cv2.dilate(court, np.ones((7, 7), dtype=np.uint8))
        # plt.imsave('court_configurations/court_reference.png', court, cmap='gray')
        # self.court = court
        return court

    def get_important_lines(self):
        """
        Returns all the important lines that define the court layout.

        Returns:
        - lines: A list of all court line coordinates.
        """
        lines = [*self.baseline_top, *self.baseline_bottom, *self.net, *self.left_court_line, *self.right_court_line,
                 *self.left_inner_line, *self.right_inner_line, *self.middle_line,
                 *self.top_inner_line, *self.bottom_inner_line]
        return lines

    def get_extra_parts(self):
        """
        Returns the extra parts of the court that are outside the main playing area.

        Returns:
        - parts: A list of coordinates representing extra parts of the court.
        """
        parts = [self.top_extra_part, self.bottom_extra_part]
        return parts

    def save_all_court_configurations(self):
        """
        Save all configurations of 4 points on the court reference image.

        This function generates and saves images for different court configurations defined in `self.court_conf`.
        """
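        import os
        os.makedirs('court_configurations', exist_ok=True)  # cv2.imwrite may fail silently if the folder is missing
        for i, conf in self.court_conf.items():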
            c = cv2.cvtColor(255 - self.court, cv2.COLOR_GRAY2BGR)
            for p in conf:
                c = cv2.circle(c, p, 15, (0, 0, 255), 30)  # Mark points with circles
            cv2.imwrite(f'court_configurations/court_conf_{i}.png', c)

    def get_court_mask(self, mask_type=0):
        """
        Generate a mask of the court based on the specified type.

        Parameters:
        - mask_type: An integer indicating the type of mask to create:
                     0 - Full court
                     1 - Bottom half court
                     2 - Top half court
                     3 - Court without margins

        Returns:
        - mask: A binary mask of the court.
        """
        mask = np.ones_like(self.court)
        if mask_type == 1:  # Bottom half court
            mask[:self.net[0][1], :] = 0
        elif mask_type == 2:  # Top half court
            mask[self.net[0][1]:, :] = 0
        elif mask_type == 3:  # Court without margins
            mask[:self.baseline_top[0][1], :] = 0
            mask[self.baseline_bottom[0][1]:, :] = 0
            mask[:, :self.left_court_line[0][0]] = 0
            mask[:, self.right_court_line[0][0]:] = 0
        return mask

# Usage example
if __name__ == '__main__':
    c = CourtReference()
    court_img = c.build_court_reference()  # the constructor already stores the same image in `c.court`
    # `c` now exposes the line coordinates, key points and masks (e.g. c.get_court_mask(1)).
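The half-court masks from `get_court_mask` are the full court with every row on the far side of the net zeroed out. `PersonDetector` (below) assigns a player to a half by looking up the mask at the bottom-centre of the player's bounding box. Below is a minimal numpy sketch of that idea on a toy grid. `half_court_masks` and `foot_point` are hypothetical helpers. In the real pipeline, the masks are first warped into the image with the homography before the lookup.

```python
import numpy as np

def half_court_masks(height, width, net_y):
    """Toy analogue of CourtReference.get_court_mask(2) (top) and get_court_mask(1) (bottom)."""
    top = np.ones((height, width), dtype=np.uint8)
    top[net_y:, :] = 0           # top half: zero every row from the net downwards
    bottom = np.ones((height, width), dtype=np.uint8)
    bottom[:net_y, :] = 0        # bottom half: zero every row above the net
    return top, bottom

def foot_point(bbox):
    """Bottom-centre of an (x1, y1, x2, y2) box, the point PersonDetector tests against the masks."""
    x1, _, x2, y2 = bbox
    return int((x1 + x2) / 2), int(y2)

top, bottom = half_court_masks(height=100, width=60, net_y=50)
x, y = foot_point((10, 20, 30, 40))      # feet at (20, 40), above the net row
print(top[y - 1, x], bottom[y - 1, x])   # 1 0 -> this player belongs to the top half
```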




---



5. **Player Detection and Tracking on Court**

In [9]:
# Define the PersonDetector class
class PersonDetector:
    """
    Detects and tracks people on the sports court using a pre-trained Faster R-CNN model.

    This class detects persons in images and tracks their movements over time, focusing on the top
    and bottom halves of the court.
    """
    def __init__(self, dtype='cpu'):
        # `dtype` is used as the target device for the model and input tensors (e.g. 'cuda' or 'cpu')
        # Load a Faster R-CNN model pre-trained on COCO (torchvision >= 0.13 `weights` API)
        self.detection_model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
        self.detection_model = self.detection_model.to(dtype)
        self.detection_model.eval()
        self.dtype = dtype
        self.court_ref = CourtReference()
        # Obtain court masks for top and bottom halves
        self.ref_top_court = self.court_ref.get_court_mask(2)
        self.ref_bottom_court = self.court_ref.get_court_mask(1)
        self.point_person_top = None
        self.point_person_bottom = None
        self.counter_top = 0
        self.counter_bottom = 0
        self.top_distances = []
        self.bottom_distances = []
        self.active_time_top = 0
        self.active_time_bottom = 0

    def detect(self, image, person_min_score=0.85):
        """
        Detect persons in a given image using the pre-trained Faster R-CNN model.

        Parameters:
        image : np.array
            The input image where persons need to be detected.
        person_min_score : float
            The minimum confidence score for considering a detection as a person.

        Returns:
        persons_boxes : list
            List of detected bounding boxes for persons.
        probs : list
            List of corresponding probabilities for each detection.
        """
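        PERSON_LABEL = 1  # COCO category id for "person" in torchvision detection models
        # OpenCV frames are BGR; the torchvision weights expect RGB values scaled to [0, 1]
        frame_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        frame_tensor = frame_rgb.transpose((2, 0, 1)) / 255
        frame_tensor = torch.from_numpy(frame_tensor).unsqueeze(0).float().to(self.dtype)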

        with torch.no_grad():
            preds = self.detection_model(frame_tensor)

        persons_boxes = []
        probs = []
        for box, label, score in zip(preds[0]['boxes'], preds[0]['labels'], preds[0]['scores']):
            if label == PERSON_LABEL and score > person_min_score:
                persons_boxes.append(box.detach().cpu().numpy())
                probs.append(score.detach().cpu().numpy())
        return persons_boxes, probs

    def detect_top_and_bottom_players(self, image, inv_matrix, filter_players=False):
        """
        Detect players on the top and bottom halves of the court.

        Parameters:
        image : np.array
            The input image to detect players.
        inv_matrix : np.array
            Inverse transformation matrix for perspective correction.
        filter_players : bool
            Whether to filter out multiple detected players to find the key player.

        Returns:
        person_bboxes_top : list
            Detected bounding boxes of players in the top half.
        person_bboxes_bottom : list
            Detected bounding boxes of players in the bottom half.
        """
        matrix = cv2.invert(inv_matrix)[1]
        mask_top_court = cv2.warpPerspective(self.ref_top_court, matrix, image.shape[1::-1])
        mask_bottom_court = cv2.warpPerspective(self.ref_bottom_court, matrix, image.shape[1::-1])
        person_bboxes_top, person_bboxes_bottom = [], []

        bboxes, probs = self.detect(image, person_min_score=0.85)
        if len(bboxes) > 0:
            person_points = [[int((bbox[2] + bbox[0]) / 2), int(bbox[3])] for bbox in bboxes]
            person_bboxes = list(zip(bboxes, person_points))

            person_bboxes_top = [pt for pt in person_bboxes if mask_top_court[pt[1][1]-1, pt[1][0]] == 1]
            person_bboxes_bottom = [pt for pt in person_bboxes if mask_bottom_court[pt[1][1] - 1, pt[1][0]] == 1]

            if filter_players:
                person_bboxes_top, person_bboxes_bottom = self.filter_players(person_bboxes_top, person_bboxes_bottom, matrix)

        return person_bboxes_top, person_bboxes_bottom

    def filter_players(self, person_bboxes_top, person_bboxes_bottom, matrix):
        """
        Keep a single player per half: the detection whose foot point, mapped into court-reference
        coordinates, lies closest to `top_extra_part` (top half) or `bottom_extra_part` (bottom half).

        `matrix` maps court-reference coordinates to image coordinates, so it is inverted here.
        """
        img_to_court = cv2.invert(matrix)[1]
        players_top_pts = [player[1] for player in person_bboxes_top]
        players_bottom_pts = [player[1] for player in person_bboxes_bottom]

        if players_top_pts:
            players_top_pts_court = cv2.perspectiveTransform(np.array([players_top_pts], dtype=np.float32), img_to_court)[0]
            dist_top = distance.cdist(players_top_pts_court, [self.court_ref.top_extra_part])  # shape (N, 1)
            person_bboxes_top = [person_bboxes_top[np.argmin(dist_top)]]

        if players_bottom_pts:
            players_bottom_pts_court = cv2.perspectiveTransform(np.array([players_bottom_pts], dtype=np.float32), img_to_court)[0]
            dist_bottom = distance.cdist(players_bottom_pts_court, [self.court_ref.bottom_extra_part])  # shape (N, 1)
            person_bboxes_bottom = [person_bboxes_bottom[np.argmin(dist_bottom)]]

        return person_bboxes_top, person_bboxes_bottom

    def track_players(self, frames, matrix_all, filter_players=False):
        """
        Run player detection on every frame that has a homography and return, per frame,
        the lists of (bbox, foot_point) pairs for the top and bottom halves.
        """
        persons_top = []
        persons_bottom = []
        min_len = min(len(frames), len(matrix_all))
        for num_frame in tqdm(range(min_len)):
            img = frames[num_frame]
            if matrix_all[num_frame] is not None:
                inv_matrix = matrix_all[num_frame]
                person_top, person_bottom = self.detect_top_and_bottom_players(img, inv_matrix, filter_players)
            else:
                person_top, person_bottom = [], []


            persons_top.append(person_top)
            persons_bottom.append(person_bottom)

        return persons_top, persons_bottom
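When `filter_players=True`, only one detection per half is kept. The foot points are mapped into court-reference coordinates, and the detection nearest a reference point (`top_extra_part` or `bottom_extra_part` in `CourtReference`) wins. The sketch below reproduces that selection with plain numpy, assuming an image-to-court homography is available. `apply_homography` is a hypothetical stand-in for `cv2.perspectiveTransform`, and `pick_key_player` is an illustrative helper that is not part of the project.

```python
import numpy as np

def apply_homography(H, pts):
    """Map N x 2 points through a 3 x 3 homography (a numpy stand-in for cv2.perspectiveTransform)."""
    pts = np.asarray(pts, dtype=np.float64)
    homog = np.hstack([pts, np.ones((len(pts), 1))])   # (x, y) -> (x, y, 1)
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]               # divide by w to return to Cartesian

def pick_key_player(foot_points_img, img_to_court, ref_point):
    """Index of the detection whose court-space foot point is closest to ref_point."""
    court_pts = apply_homography(img_to_court, foot_points_img)
    dists = np.linalg.norm(court_pts - np.asarray(ref_point, dtype=np.float64), axis=1)
    return int(np.argmin(dists))

# Toy homography: image pixels are scaled by 2 to reach court-reference units
H_img_to_court = np.diag([2.0, 2.0, 1.0])
feet = [(10, 10), (100, 100)]                       # two detections in the same half
print(pick_key_player(feet, H_img_to_court, ref_point=(20, 20)))  # 0
```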




---



6. **Homography Matrix**
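A homography $H$ maps a court-reference point $r = (x, y)$ to image coordinates in homogeneous form. `get_trans_matrix` below scores each candidate by how well it predicts the detected key points it was not fitted on. The notation is as follows:

- $p_i$ are the detected key points.
- $r_i$ are the reference key points.
- $H_c$ is the matrix fitted on configuration $c$.
- $S_c$ is the set of indices $i < 12$ outside configuration $c$ that have a detected $p_i$.

Only configurations whose four points were all detected are candidates.

$$
\begin{pmatrix} u \\ v \\ w \end{pmatrix} = H \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \qquad \pi(H, r) = \left( \frac{u}{w}, \frac{v}{w} \right)
$$

$$
H^{*} = \mathop{\arg\min}_{c \in \{1, \dots, 12\}} \; \frac{1}{|S_c|} \sum_{i \in S_c} \left\lVert p_i - \pi(H_c, r_i) \right\rVert_2
$$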

In [10]:

# Initialize the court reference
court_ref = CourtReference()

# Reshape court reference key points for perspective transformation
refer_kps = np.array(court_ref.key_points, dtype=np.float32).reshape((-1, 1, 2))
# Create a mapping of court configuration indices to corresponding key point indices
court_conf_ind = {}
for i in range(len(court_ref.court_conf)):
    conf = court_ref.court_conf[i+1]
    inds = []
    for j in range(4):
        inds.append(court_ref.key_points.index(conf[j]))
    court_conf_ind[i+1] = inds


def get_trans_matrix(points):
    """
    Select the best homography (court reference -> image) from the detected key points.

    For each of the 12 four-point court configurations whose points were all detected, a homography
    is fitted from the reference points to the detected points. All reference key points are then
    projected with it, and the candidate with the lowest mean distance to the remaining detected
    points (among the first 12 key points not used for fitting) is kept.

    Parameters:
    points : list
        Detected key points (x, y) from the input image, with None for points that were not found.

    Returns:
    matrix_trans : np.array or None
        The best 3x3 homography matrix, or None if no configuration could be evaluated.
    """
    matrix_trans = None
    dist_max = np.inf  # Lowest mean reprojection error found so far

    # Iterate through court configurations to find the best transformation
    for conf_ind in range(1, 13):  # Loop through 12 court configurations
        conf = court_ref.court_conf[conf_ind]  # Retrieve the court configuration for this index
        inds = court_conf_ind[conf_ind]  # Get corresponding key point indices

        # Gather intersection points based on the indices
        inters = [points[inds[0]], points[inds[1]], points[inds[2]], points[inds[3]]]

        if None not in inters:  # Ensure that all intersection points are available
            # Compute homography matrix using key points and court reference points
            matrix, _ = cv2.findHomography(np.float32(conf), np.float32(inters), method=0)

            # Apply the transformation to court reference key points
            trans_kps = cv2.perspectiveTransform(refer_kps, matrix).squeeze(1)

            # Reprojection error for detected key points that were not used to fit this matrix
            dists = []
            for i in range(12):
                if i not in inds and points[i] is not None:
                    dists.append(distance.euclidean(points[i], trans_kps[i]))

            # Mean error for this configuration (inf if no other points are available, so it is never selected)
            dist_mean = np.mean(dists) if dists else np.inf

            # Keep the matrix with the smallest mean error so far
            if dist_mean < dist_max:
                matrix_trans = matrix
                dist_max = dist_mean

    return matrix_trans  # Return the best transformation matrix
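With exactly four correspondences, the result of `cv2.findHomography(..., method=0)` should agree, up to numerical precision, with the classic direct linear transform (DLT). Fixing $h_{33} = 1$ leaves eight unknowns, and each point pair contributes two linear equations. The sketch below solves that system with numpy and computes the mean reprojection error used for ranking. `homography_from_4`, `project` and `mean_reprojection_error` are hypothetical helpers. The four points must be in general position (no three collinear) for the system to be solvable.

```python
import numpy as np

def homography_from_4(src, dst):
    """Solve the 8 x 8 DLT system for H (with H[2, 2] = 1) mapping four src points onto four dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h6*x + h7*y + 1) = h0*x + h1*y + h2, and likewise for v with h3, h4, h5
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.asarray(A, dtype=np.float64), np.asarray(b, dtype=np.float64))
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pts):
    """Apply H to N x 2 points and dehomogenise."""
    pts = np.asarray(pts, dtype=np.float64)
    m = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return m[:, :2] / m[:, 2:3]

def mean_reprojection_error(H, ref_pts, detected):
    """Mean distance between detected points and projected reference points; None entries are skipped."""
    projected = project(H, ref_pts)
    d = [np.linalg.norm(np.subtract(p, q)) for p, q in zip(detected, projected) if p is not None]
    return float(np.mean(d)) if d else float("inf")

# Synthetic check: recover a known homography from four corners of a reference rectangle
H_true = np.array([[1.2, 0.1, 5.0], [0.05, 0.9, -3.0], [0.001, 0.0005, 1.0]])
corners = [(0, 0), (100, 0), (100, 200), (0, 200)]
H_fit = homography_from_4(corners, project(H_true, corners))
others = [(50, 50), (20, 150)]
print(mean_reprojection_error(H_fit, others, [None, project(H_true, others)[1]]))  # close to 0 (floating-point noise)
```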


Court Detection using pre-trained model

In [11]:


import torch.nn.functional as F

class CourtDetectorNet:
    """
    A class for detecting and tracking key points on a sports court using a neural network model.

    This class uses a pre-trained neural network model to infer the positions of key points (e.g., court lines and net)
    from a sequence of frames and estimates the transformation matrix for perspective correction.
    """

    def __init__(self, path_model=None, device='cuda'):
        """
        Initialize the CourtDetectorNet class with the provided model and device.

        Parameters:
        path_model : str or None
            Path to the pre-trained model weights. If None, the model keeps its randomly initialised weights.
        device : str
            The device to run the model on ('cuda' for GPU or 'cpu' for CPU).
        """
        # BallTrackerNet architecture reused with 15 output heatmaps; infer_model reads the first 14 as court key points
        self.model = BallTrackerNet(out_channels=15)
        self.device = device

        # Load pre-trained model weights if provided
        if path_model:
            self.model.load_state_dict(torch.load(path_model, map_location=device, weights_only=True))

        # Move the model to the target device and switch to inference mode
        self.model = self.model.to(device)
        self.model.eval()

    def infer_model(self, frames):
        """
        Perform inference on a sequence of frames to detect court key points and transformation matrices.

        Parameters:
        frames : list of np.array
            A list of video frames (images) to process.

        Returns:
        matrixes_res : list
            A list of transformation matrices for each frame.
        kps_res : list
            A list of key points detected in each frame.
        """
        output_width = 640  # Width of the resized frame for model input
        output_height = 360  # Height of the resized frame for model input
        scale = 2  # Scale factor to adjust the key points to the original image size

        kps_res = []  # List to store the detected key points for each frame
        matrixes_res = []  # List to store the transformation matrices for each frame

        # Process each frame in the input sequence
        for num_frame, image in enumerate(tqdm(frames)):
            # Resize the frame to the specified input size
            img = cv2.resize(image, (output_width, output_height))

            # Normalize the image for the model input
            inp = (img.astype(np.float32) / 255.)
            inp = torch.tensor(np.rollaxis(inp, 2, 0))  # Change image axis order to (channels, height, width)
            inp = inp.unsqueeze(0)  # Add a batch dimension

            # Perform inference with the neural network (no gradients are needed)
            with torch.no_grad():
                out = self.model(inp.float().to(self.device))[0]
            pred = torch.sigmoid(out).cpu().numpy()  # Apply sigmoid activation and convert to numpy

            points = []  # List to store the detected key points for the current frame

            # Detect key points from heatmaps
            for kps_num in range(14):
                heatmap = (pred[kps_num] * 255).astype(np.uint8)  # Convert the heatmap to an 8-bit image
                ret, heatmap = cv2.threshold(heatmap, 170, 255, cv2.THRESH_BINARY)  # Apply threshold to binarize

                # Use Hough Circles to detect circular key points in the heatmap
                circles = cv2.HoughCircles(heatmap, cv2.HOUGH_GRADIENT, dp=1, minDist=20, param1=50, param2=2,
                                           minRadius=10, maxRadius=25)
                if circles is not None:
                    # Calculate the predicted key point location in the original image scale
                    x_pred = circles[0][0][0] * scale
                    y_pred = circles[0][0][1] * scale

                    # Refine the location with refine_kps for every key point except 8, 9 and 12
                    if kps_num not in [8, 12, 9]:
                        x_pred, y_pred = refine_kps(image, int(y_pred), int(x_pred), crop_size=40)

                    points.append((x_pred, y_pred))  # Append the detected key point
                else:
                    points.append(None)  # Append None if no key point is detected

            # Estimate the transformation matrix using the detected key points
            matrix_trans = get_trans_matrix(points)
            points = None

            # If a valid transformation matrix is found, apply perspective transformation to key points
            if matrix_trans is not None:
                points = cv2.perspectiveTransform(refer_kps, matrix_trans)
                matrix_trans = cv2.invert(matrix_trans)[1]

            # Append the results (key points and transformation matrix) for the current frame
            kps_res.append(points)
            matrixes_res.append(matrix_trans)

        return matrixes_res, kps_res  # Return the transformation matrices and key points for all frames
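Each heatmap channel is turned into one point by binarising at 170/255 and locating the blob. The notebook uses `cv2.HoughCircles` for that step. The sketch below swaps in a plainer technique, taking the centroid of the surviving pixels, to show the heatmap-to-pixel bookkeeping, including the ×2 scale from 640×360 back to 1280×720. `heatmap_to_keypoint` is hypothetical and skips both the circle-radius checks and the `refine_kps` step.

```python
import numpy as np

def heatmap_to_keypoint(prob, threshold=170, scale=2):
    """Turn one sigmoid heatmap (values in [0, 1]) into an (x, y) point in the full-size frame.

    Centroid of the pixels that survive the same binarisation as infer_model; a simpler
    stand-in for the cv2.HoughCircles step. Returns None when no pixel passes."""
    heat = (prob * 255).astype(np.uint8)        # same 8-bit conversion as infer_model
    ys, xs = np.nonzero(heat > threshold)       # THRESH_BINARY keeps pixels strictly above the threshold
    if xs.size == 0:
        return None
    return float(xs.mean()) * scale, float(ys.mean()) * scale   # 640x360 -> 1280x720

# Synthetic 360 x 640 heatmap with a Gaussian blob centred at (x=100, y=50)
yy, xx = np.mgrid[0:360, 0:640]
blob = np.exp(-((xx - 100) ** 2 + (yy - 50) ** 2) / (2 * 5.0 ** 2))
print(heatmap_to_keypoint(blob))   # (200.0, 100.0)
```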




---



Utilities

In [12]:

# Install the PySceneDetect library for scene detection in videos
!pip install scenedetect



In [13]:
from scenedetect.video_manager import VideoManager
from scenedetect.scene_manager import SceneManager
from scenedetect.stats_manager import StatsManager
from scenedetect.detectors import ContentDetector

def scene_detect(path_video):
    """
    Split the video into disjoint scenes (camera cuts) with PySceneDetect's ContentDetector,
    which marks a cut when the change in HSV colour content between consecutive frames
    exceeds the detector threshold. The start and end frame numbers of each scene are returned.

    Parameters:
    path_video : str
        Path to the input video file.

    Returns:
    scenes : list of lists
        A list where each element is a list containing the start and end frame numbers for each detected scene.
    """
    # Create a VideoManager object to handle the video file
    video_manager = VideoManager([path_video])

    # Initialize StatsManager to store statistics of the video (e.g., scene list, content values)
    stats_manager = StatsManager()

    # Initialize SceneManager to manage scene detection logic
    scene_manager = SceneManager(stats_manager)

    # Add a content-based scene detector with a custom threshold
    scene_manager.add_detector(ContentDetector(threshold=30.0))  # Adjust threshold as needed

    # Get the base timecode of the video (used to track frame positions)
    base_timecode = video_manager.get_base_timecode()

    # Downscale the video for faster processing if needed
    video_manager.set_downscale_factor(2)  # Adjust downscale factor as needed

    # Start video manager and load video frames for analysis
    video_manager.start()

    # Perform scene detection by analyzing video frames
    scene_manager.detect_scenes(frame_source=video_manager)

    # Retrieve the list of detected scenes, with each scene defined by start and end timecodes
    scene_list = scene_manager.get_scene_list(base_timecode)

    # If no scenes were detected, return the whole video as a single scene
    if not scene_list:
        scene_list = [(video_manager.get_base_timecode(), video_manager.get_current_timecode())]

    # Convert the scene list to frame numbers (start and end frames) and return it
    scenes = [[x[0].frame_num, x[1].frame_num] for x in scene_list]

    # Release resources
    video_manager.release()

    return scenes
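The scene list is simply the frame ranges between detected cuts, which `main` iterates over with `range(start, end)`. The toy version below replaces ContentDetector's HSV-based score with a plain mean absolute grayscale difference. It only shows how cut indices become `[start, end]` pairs. `simple_scene_cuts` is hypothetical, and its threshold is not calibrated against ContentDetector's.

```python
import numpy as np

def simple_scene_cuts(frames, threshold=30.0):
    """Split frames into [start, end) scenes wherever the mean absolute grayscale difference
    between consecutive frames exceeds `threshold` (on a 0-255 scale)."""
    boundaries = [0]
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(np.float32) - frames[i - 1].astype(np.float32)).mean()
        if diff > threshold:
            boundaries.append(i)          # a new scene starts at frame i
    boundaries.append(len(frames))
    return [[boundaries[k], boundaries[k + 1]] for k in range(len(boundaries) - 1)]

# Three dark frames followed by two bright ones: one cut at frame 3
frames = [np.zeros((4, 4), np.uint8)] * 3 + [np.full((4, 4), 200, np.uint8)] * 2
print(simple_scene_cuts(frames))   # [[0, 3], [3, 5]]
```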


7. **Score Detection**

In [14]:
class ScoreTracker:
    def __init__(self):
        self.player_top_points = 0
        self.player_bottom_points = 0
        self.top_side = "top"
        self.bottom_side = "bottom"

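    def update_score(self, last_bounce_side, current_bounce_side, ball_out_of_bounds):
        """Award a point after a bounce; sides are "top" or "bottom" (last_bounce_side is None before the first bounce)."""
        if ball_out_of_bounds:
            # Ball landed out: the player on the side of the previous bounce hit it, so the opponent wins the point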
            if last_bounce_side == self.top_side:
                self.player_bottom_points += 1
            else:
                self.player_top_points += 1
        elif last_bounce_side == current_bounce_side:
            # If the ball bounces twice on the same side, the opponent wins the point
            if current_bounce_side == self.top_side:
                self.player_bottom_points += 1
            else:
                self.player_top_points += 1

    def get_score(self):
        return f"Player 1 Points: {self.player_top_points}, Player 2 Points: {self.player_bottom_points}"

def check_if_ball_is_out(ball_point, court_boundaries):
    """
    Check whether a bounce point lies outside the axis-aligned bounding box of the court key points.

    ball_point: (x, y) position of the bounce in court-reference coordinates.
    court_boundaries: court key points in the same court-reference coordinates.
    """
    x, y = ball_point
    left_bound = min(court_boundaries, key=lambda p: p[0])[0]
    right_bound = max(court_boundaries, key=lambda p: p[0])[0]
    top_bound = min(court_boundaries, key=lambda p: p[1])[1]
    bottom_bound = max(court_boundaries, key=lambda p: p[1])[1]

    # Check if the ball is inside or outside court bounds
    return not (left_bound <= x <= right_bound and top_bound <= y <= bottom_bound)

def get_bounce_side(ball_point, court_midline):
    """
    Determine which side of the court the ball bounced on based on the y-coordinate.
    """

    # Compare y-coordinate with court midline
    if ball_point[1] < court_midline:
        return "top"
    else:
        return "bottom"
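The scoring rules are easiest to follow on a short sequence of bounces. `replay_bounces` below is a hypothetical, self-contained restatement of `ScoreTracker.update_score` together with the rally bookkeeping in `main`. It is fed (side, is_out) pairs instead of video frames.

```python
def replay_bounces(events):
    """Replay (side, is_out) bounce events with the same rules as ScoreTracker.update_score
    and the rally counting in main(). Returns (top_points, bottom_points, rally_lengths)."""
    top = bottom = 0
    last, rally, rallies = None, 0, []
    for side, out in events:
        point_over = out or last == side
        if out:
            # the player on the side of the previous bounce hit the ball out
            if last == "top":
                bottom += 1
            else:
                top += 1
        elif last == side:
            # second consecutive bounce on one side: that side's player loses the point
            if side == "top":
                bottom += 1
            else:
                top += 1
        rally += 1                    # every bounce extends the current rally
        if point_over:
            rallies.append(rally)
            rally = 0
        last = side
    return top, bottom, rallies

events = [("bottom", False), ("top", False), ("bottom", False), ("bottom", False), ("top", True)]
print(replay_bounces(events))   # (2, 0, [4, 1])
```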



---



In [15]:
#!ffmpeg -i input_videooo.mp4 -vf scale=480:-1 output_video_resized.mp4


Final: **Main Function** (also contains the **speed** and **distance** calculations used to generate player metrics)
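The player metrics in `main` work as follows. Each player's foot point is mapped into court-reference coordinates, and the frame-to-frame displacement is accumulated and divided by 101.27 to convert pixels to metres. A single step multiplied by `fps` gives an instantaneous speed, and active time adds `1/fps` per counted step. Below is a compact numpy version of that arithmetic. It assumes the player was detected in every consecutive frame, whereas `main` only counts a step when both frames contain a detection. `distance_and_speed` is a hypothetical helper.

```python
import numpy as np

PX_PER_METRE = 101.27  # scale used in main(): court-reference pixels per metre

def distance_and_speed(court_points, fps, px_per_metre=PX_PER_METRE):
    """Total distance (m), per-step speeds (m/s) and active time (s) for foot positions
    already mapped into court-reference coordinates, one point per consecutive frame."""
    pts = np.asarray(court_points, dtype=np.float64)
    steps_m = np.linalg.norm(np.diff(pts, axis=0), axis=1) / px_per_metre  # metres moved each frame
    speeds = steps_m * fps                                                  # m/frame * frames/s = m/s
    active_time = len(steps_m) / fps                                        # seconds with a counted step
    return float(steps_m.sum()), speeds, active_time

total, speeds, active = distance_and_speed([(0, 0), (101.27, 0), (101.27, 202.54)], fps=25)
print(round(total, 6), speeds.round(6).tolist(), active)   # 3.0 [25.0, 50.0] 0.08
```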

In [16]:
#from court_detection_net import CourtDetectorNet

#from court_reference import CourtReference
#from bounce_detector import BounceDetector
#from person_detector import PersonDetector
#from ball_detector import BallDetector
#from utils import scene_detect
import argparse

def read_video(path_video):
    cap = cv2.VideoCapture(path_video)
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    frames = []
    while cap.isOpened():
        ret, frame = cap.read()
        if ret:
            frames.append(frame)
        else:
            break
    cap.release()
    return frames, fps



def get_court_img():
    # Build the reference court, thicken its lines and turn the 0/1 mask into a 3-channel 0/255 image for the minimap
    court_reference = CourtReference()
    court = court_reference.build_court_reference()
    court = cv2.dilate(court, np.ones((10, 10), dtype=np.uint8))
    court_img = (np.stack((court, court, court), axis=2) * 255).astype(np.uint8)
    return court_img


def main(frames, scenes, bounces, ball_track, homography_matrices, kps_court, persons_top, persons_bottom, fps,
         draw_trace=False, trace=7):
    """
    :params
        frames: list of original images
        scenes: list of beginning and ending of video fragment
        bounces: list of image numbers where the ball touches the ground
        ball_track: list of (x,y) ball coordinates
        homography_matrices: list of homography matrices
        kps_court: list of 14 key points of tennis court
        persons_top: list of person bboxes located in the top of tennis court
        persons_bottom: list of person bboxes located in the bottom of tennis court
        fps: frames per second
        draw_trace: whether to draw ball trace
        trace: the length of ball trace
    :return
        imgs_res: list of resulting images
    """
    imgs_res = []
    width_minimap = 166
    height_minimap = 350
    is_track = [x is not None for x in homography_matrices]

    # Initialize score tracker
    score_tracker = ScoreTracker()
    court_ref = CourtReference()

    # The net's y-coordinate in court-reference space splits the court into top and bottom halves
    court_midline = court_ref.net[0][1]

    # Initialize variables
    last_bounce_side = None
    rally_lengths = []  # To store the lengths of each rally
    current_rally_length = 0  # To track the current rally length
    rally_count = 0  # To count the number of rallies

    # Loop over scenes
    for num_scene in range(len(scenes)):
        sum_track = sum(is_track[scenes[num_scene][0]:scenes[num_scene][1]])
        len_track = scenes[num_scene][1] - scenes[num_scene][0]

        eps = 1e-15
        scene_rate = sum_track / (len_track + eps)

        if scene_rate > 0.5:
            court_img = get_court_img()

            # Per-scene player metrics: distance (m), active time (s) and latest speed (m/s)
            distance_top, distance_bottom = 0, 0
            active_time_top, active_time_bottom = 0, 0
            speed_top, speed_bottom = 0.0, 0.0

            # Loop over frames in the current scene
            for i in range(scenes[num_scene][0], scenes[num_scene][1]):
                img_res = frames[i].copy()  # Copy the current frame for processing
                inv_mat = homography_matrices[i]

                # Draw ball trajectory
                if ball_track[i][0]:
                    if draw_trace:
                        for j in range(trace):
                            if i - j >= 0 and ball_track[i - j][0]:
                                draw_x = int(ball_track[i - j][0])
                                draw_y = int(ball_track[i - j][1])
                                img_res = cv2.circle(img_res, (draw_x, draw_y), radius=3, color=(0, 255, 0), thickness=2)
                    else:
                        img_res = cv2.circle(img_res, (int(ball_track[i][0]), int(ball_track[i][1])), radius=5, color=(0, 255, 0), thickness=2)
                        img_res = cv2.putText(img_res, 'ball', org=(int(ball_track[i][0]) + 8, int(ball_track[i][1]) + 8),
                                              fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, thickness=2, color=(0, 255, 0))

                # Draw court keypoints
                if kps_court[i] is not None:
                    for j in range(len(kps_court[i])):
                        img_res = cv2.circle(img_res, (int(kps_court[i][j][0, 0]), int(kps_court[i][j][0, 1])), radius=0, color=(0, 0, 255), thickness=10)

                height, width, _ = img_res.shape

                # Handle ball bounces: map the bounce into court coordinates, draw it on the minimap, update score and rally length
                if i in bounces and inv_mat is not None:
                    ball_point = ball_track[i]
                    ball_point = np.array(ball_point, dtype=np.float32).reshape(1, 1, 2)
                    ball_point = cv2.perspectiveTransform(ball_point, inv_mat)

                    # Draw bounce on the minimap
                    court_img = cv2.circle(court_img, (int(ball_point[0, 0, 0]), int(ball_point[0, 0, 1])),
                                           radius=0, color=(0, 255, 255), thickness=50)

                    # Check if ball is out of bounds
                    court_boundaries = np.array(court_ref.key_points, dtype=np.float32).squeeze()
                    ball_out_of_bounds = check_if_ball_is_out(ball_point.squeeze(), court_boundaries)

                    # Determine which side the ball bounced on
                    current_bounce_side = get_bounce_side(ball_point.squeeze(), court_midline)

                    # Update the score based on bounce information
                    score_tracker.update_score(last_bounce_side, current_bounce_side, ball_out_of_bounds)

                    # Update rally length
                    current_rally_length += 1

                    # If the ball is out, or a new rally starts, store and reset the rally length
                    if ball_out_of_bounds or (last_bounce_side==current_bounce_side):
                        rally_lengths.append(current_rally_length)
                        rally_count += 1
                        current_rally_length = 0  # Reset for the new rally

                    # Update last bounce side
                    last_bounce_side = current_bounce_side

                # Display current rally length
                img_res = cv2.putText(img_res, f'Rally Length: {current_rally_length}', (50, 100),
                                      cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)

                minimap = court_img.copy()

                # Draw persons and calculate distances
                persons = persons_top[i] + persons_bottom[i]
                for j,person in enumerate(persons):
                    if len(person[0]) > 0:
                        person_bbox = list(person[0])
                        img_res = cv2.rectangle(img_res, (int(person_bbox[0]), int(person_bbox[1])),
                                                (int(person_bbox[2]), int(person_bbox[3])), [255, 0, 0], 2)
                        img_res = cv2.putText(img_res, f"Player ID: {j+1}",
                                              (int(person_bbox[0]), int(person_bbox[1] - 10)),
                                              cv2.FONT_HERSHEY_SIMPLEX, 0.9, (255, 0, 0), 2)

                        # Transform the foot position to the top-down view using perspectiveTransform
                        person_point = list(person[1])
                        person_point = np.array(person_point, dtype=np.float32).reshape(1, 1, 2)
                        person_point = cv2.perspectiveTransform(person_point, inv_mat)
                        minimap = cv2.circle(minimap, (int(person_point[0, 0, 0]), int(person_point[0, 0, 1])),
                                             radius=0, color=(255, 0, 0), thickness=80)

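                # Calculate distances and speeds for both players; dividing court-reference
                # pixel distances by 101.27 converts them to metres (about 101 px per metre)
                if i > 0:  # Avoid calculation on the first frame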
                    # Top player distance and speed
                    if len(persons_top[i]) > 0 and len(persons_top[i - 1]) > 0:
                        prev_top = np.array(persons_top[i - 1][0][1], dtype=np.float32).reshape(1, 1, 2)
                        curr_top = np.array(persons_top[i][0][1], dtype=np.float32).reshape(1, 1, 2)
                        prev_position_top = cv2.perspectiveTransform(prev_top, inv_mat)[0][0]
                        curr_position_top = cv2.perspectiveTransform(curr_top, inv_mat)[0][0]
                        distance_top += (np.linalg.norm(np.array(prev_position_top) - np.array(curr_position_top)))/101.27
                        speed_top= ((np.linalg.norm(np.array(prev_position_top) - np.array(curr_position_top)))/101.27)*fps
                        active_time_top += 1/fps

                    # Bottom player distance and speed
                    if len(persons_bottom[i]) > 0 and len(persons_bottom[i - 1]) > 0:
                        prev_bottom = np.array(persons_bottom[i - 1][0][1], dtype=np.float32).reshape(1, 1, 2)
                        curr_bottom = np.array(persons_bottom[i][0][1], dtype=np.float32).reshape(1, 1, 2)
                        prev_position_bottom = cv2.perspectiveTransform(prev_bottom, inv_mat)[0][0]
                        curr_position_bottom = cv2.perspectiveTransform(curr_bottom, inv_mat)[0][0]
                        distance_bottom += (np.linalg.norm(np.array(prev_position_bottom) - np.array(curr_position_bottom)))/101.27
                        speed_bottom= ((np.linalg.norm(np.array(prev_position_bottom) - np.array(curr_position_bottom)))/101.27)*fps
                        active_time_bottom += 1/fps

                # Display player metrics
                if active_time_top*fps > 1 and active_time_bottom*fps > 1:
                    img_res = cv2.putText(img_res, f'Player 1: Distance: {distance_top:.2f} m, Active Time: {round(active_time_top,2)} s, Speed: {speed_top:.2f} m/s',
                                          org=(690, 600), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.4, color=(255, 255, 255), thickness=2)
                    img_res = cv2.putText(img_res, f'Player 2: Distance: {distance_bottom:.2f} m, Active Time: {round(active_time_bottom,2)} s, Speed: {speed_bottom:.2f} m/s',
                                          org=(690, 620), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.4, color=(255, 255, 255), thickness=2)


                # Add minimap to result image
                minimap = cv2.resize(minimap, (width_minimap, height_minimap))
                img_res[30:(30 + height_minimap), (width - 30 - width_minimap):(width - 30), :] = minimap

                # Draw the running score, then store the resulting frame
                score_text = score_tracker.get_score()
                img_res = cv2.putText(img_res, score_text, (50, 50),
                                      cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
                imgs_res.append(img_res)

        else:
            imgs_res.extend(frames[scenes[num_scene][0]:scenes[num_scene][1]])
    # Final score output
    final_score = score_tracker.get_score()
    print(f"Final Score: {final_score}")

    # Print rally information (a rally's length is the number of bounces detected in it)
    print(f"Total number of rallies: {rally_count}")
    for idx, rally_length in enumerate(rally_lengths, 1):
        print(f"Rally {idx}: {rally_length} bounces")

    return imgs_res





def write(imgs_res, fps, path_output_video):
    """Write a list of BGR frames to a video file at the given frame rate."""
    height, width = imgs_res[0].shape[:2]
    # 'mp4v' is a FourCC that suits the .mp4 extension used for the output path
    out = cv2.VideoWriter(path_output_video, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))
    for frame in imgs_res:
        out.write(frame)
    out.release()


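import argparse


if __name__ == '__main__':
    # Parse command-line flags when run as a script. Inside Jupyter/Colab the kernel adds its own
    # '-f <kernel.json>' argument, argparse exits with SystemExit, and the hard-coded paths below are used.
    try: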
        parser = argparse.ArgumentParser()
        parser.add_argument('--path_ball_track_model', type=str, help='path to pretrained model for ball detection')
        parser.add_argument('--path_court_model', type=str, help='path to pretrained model for court detection')
        parser.add_argument('--path_bounce_model', type=str, help='path to pretrained model for bounce detection')
        parser.add_argument('--path_input_video', type=str, help='path to input video')
        parser.add_argument('--path_output_video', type=str, help='path to output video')
        args = parser.parse_args()

    except SystemExit:
        # If run in a Jupyter notebook, set the arguments manually
        args = argparse.Namespace(
            path_ball_track_model='/content/model_best.pt',  # replace with the actual path
            path_court_model='/content/model_tennis_court_det.pt',  # replace with the actual path
            path_bounce_model='/content/ctb_regr_bounce.cbm',  # replace with the actual path
            path_input_video='/content/input_video.mp4',  # replace with the actual video path
            path_output_video='path_to_output_video.mp4'  # replace with the desired output path
        )

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    frames, fps = read_video(args.path_input_video)
    scenes = scene_detect(args.path_input_video)

    print('ball detection')
    ball_detector = BallDetector(args.path_ball_track_model, device)
    ball_track = ball_detector.infer_model(frames)

    print('court detection')
    court_detector = CourtDetectorNet(args.path_court_model, device)
    homography_matrices, kps_court = court_detector.infer_model(frames)

    print('person detection')
    person_detector = PersonDetector(device)
    persons_top, persons_bottom = person_detector.track_players(frames, homography_matrices, filter_players=False)

    # bounce detection
    bounce_detector = BounceDetector(args.path_bounce_model)
    x_ball = [x[0] for x in ball_track]
    y_ball = [x[1] for x in ball_track]
    bounces = bounce_detector.predict(x_ball, y_ball)

    imgs_res = main(frames, scenes, bounces, ball_track, homography_matrices, kps_court, persons_top, persons_bottom, fps,
                    draw_trace=True)

    write(imgs_res, fps, args.path_output_video)







usage: colab_kernel_launcher.py [-h] [--path_ball_track_model PATH_BALL_TRACK_MODEL]
                                [--path_court_model PATH_COURT_MODEL]
                                [--path_bounce_model PATH_BOUNCE_MODEL]
                                [--path_input_video PATH_INPUT_VIDEO]
                                [--path_output_video PATH_OUTPUT_VIDEO]
colab_kernel_launcher.py: error: unrecognized arguments: -f /root/.local/share/jupyter/runtime/kernel-1a6d0663-abfe-4676-99fb-8216f95edb7a.json
ERROR:pyscenedetect:VideoManager is deprecated and will be removed.
INFO:pyscenedetect:Loaded 1 video, framerate: 30.000 FPS, resolution: 1280 x 720
INFO:pyscenedetect:Downscale factor set to 5, effective resolution: 256 x 144
INFO:pyscenedetect:Detecting scenes...
ERROR:pyscenedetect:`base_timecode` argument is deprecated and has no effect.


ball detection


100%|██████████| 212/212 [00:24<00:00,  8.62it/s]


court detection


100%|██████████| 214/214 [00:28<00:00,  7.63it/s]




person detection


100%|██████████| 214/214 [00:34<00:00,  6.14it/s]


Final Score: Player 1 Points: 0, Player 2 Points: 0
Total number of rallies: 0
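The player metrics in the main loop map each player's position through the inverse court homography, divide the displacement by 101.27 to get metres, add it to a running distance, and add 1/fps of active time per frame. The sketch below reproduces that arithmetic with plain NumPy so it can be checked without a video. It is an illustration, not the notebook's code: the names `to_court`, `accumulate_metrics` and `PIXELS_PER_METRE` are hypothetical, 101.27 is assumed to mean reference-court pixels per metre, and one fixed homography is used for every step.

```python
import numpy as np

# Assumption: the notebook divides court-space displacements by 101.27 and
# labels the result in metres, so we treat it as reference-court pixels per metre.
PIXELS_PER_METRE = 101.27


def to_court(point_xy, inv_mat):
    """Map one image point onto the reference court with a 3x3 homography.

    Same maths as cv2.perspectiveTransform for a single point: lift to
    homogeneous coordinates, multiply, then divide by the last component.
    """
    x, y = point_xy
    u, v, w = inv_mat @ np.array([x, y, 1.0])
    return np.array([u / w, v / w])


def accumulate_metrics(positions, inv_mat, fps):
    """Return total distance (m), active time (s) and last-step speed (m/s).

    `positions` holds one image-space (x, y) per consecutive frame in which the
    player was detected. Unlike the notebook, which uses the current frame's
    matrix, this sketch applies the same homography to every step.
    """
    distance_m, active_s, speed_ms = 0.0, 0.0, 0.0
    for prev_xy, curr_xy in zip(positions, positions[1:]):
        step_px = np.linalg.norm(to_court(curr_xy, inv_mat) - to_court(prev_xy, inv_mat))
        step_m = step_px / PIXELS_PER_METRE
        distance_m += step_m
        speed_ms = step_m * fps  # metres per frame times frames per second
        active_s += 1 / fps
    return distance_m, active_s, speed_ms
```

For example, with an identity homography at 30 fps, a player who moves 10.127 px per frame for two frames is credited with about 0.2 m, about 0.067 s of active time and a speed of 3 m/s. Because the speed is recomputed from a single frame step, it is an instantaneous value and will be noisy whenever the player detections jitter.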




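The `usage:` and `error: unrecognized arguments: -f ...` lines at the top of the log come from argparse rejecting the `-f <kernel.json>` argument that the Colab kernel passes; the notebook recovers by catching `SystemExit`. An alternative, sketched here as an assumption rather than what the notebook does, is to give every flag a default and call `parse_known_args`, which returns unrecognised arguments instead of exiting. The `DEFAULTS` dictionary and `parse_args` helper are hypothetical; the default paths are the Colab placeholders from the notebook's fallback.

```python
import argparse

# Hypothetical defaults: the placeholder Colab paths from the notebook's fallback branch.
DEFAULTS = {
    'path_ball_track_model': '/content/model_best.pt',
    'path_court_model': '/content/model_tennis_court_det.pt',
    'path_bounce_model': '/content/ctb_regr_bounce.cbm',
    'path_input_video': '/content/input_video.mp4',
    'path_output_video': 'path_to_output_video.mp4',
}


def parse_args(argv=None):
    """Parse CLI flags, ignoring unknown ones such as Jupyter's '-f kernel.json'."""
    parser = argparse.ArgumentParser()
    for name, default in DEFAULTS.items():
        parser.add_argument(f'--{name}', type=str, default=default)
    # parse_known_args returns (namespace, leftover_args) instead of exiting
    # on arguments the parser does not recognise.
    args, _unknown = parser.parse_known_args(argv)
    return args
```

Called with no arguments, `parse_args()` reads `sys.argv[1:]`, so the same cell works unchanged as a script with explicit flags and inside a notebook, without printing an argparse error.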
---



Thank you <3