**Coverage percentage**

In this notebook, we compute the **Coverage %** of a specific penalty kill, frame by frame.
Before defining **Coverage %**, we introduce an intuitive set of ideas about what constitutes good in-zone defensive coverage.

1.   Good defensive coverage reduces the overall amount of space available to the "attacking team".
2.   The closer a defender is to a player on the attacking team, the less space that player has.

Thus, by staying close to players in the offensive zone and taking away space from each player both on and off the puck, good zonal coverage should lead to favourable defensive situations.

We define **Coverage %** of a given frame as:

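$$\text{Coverage\%} = \frac{\text{Area}\big(\text{ConvHull}(K) \cap \text{ConvHull}(P)\big)}{\text{Area}\big(\text{ConvHull}(P)\big)}$$

where $P$ and $K$ are the sets of power-play and penalty-kill player positions respectively, and $\text{ConvHull}(X)$ is the convex hull of a set $X$. In the code below, $P$ is taken to be whichever team has more tracked players in the frame, defaulting to the home team on ties.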
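To make the definition concrete, here is a minimal, dependency-free sketch of Coverage % for a single frame. It is not the notebook's implementation (which relies on Shapely's `convex_hull` and `intersection`); it swaps in Andrew's monotone-chain convex hull and Sutherland-Hodgman clipping, which is valid here because both hulls are convex. The helper names (`convex_hull`, `clip`, `area`, `coverage_percent`) and the toy coordinates are hypothetical.

```python
# Dependency-free sketch of Coverage% for one frame (illustrative only).
# Assumption: positions are (x, y) tuples in feet.
Point = tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); > 0 means b lies left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: list[Point]) -> list[Point]:
    """Counter-clockwise hull vertices via Andrew's monotone chain."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: list[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: list[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def area(poly: list[Point]) -> float:
    """Shoelace formula; 0.0 for fewer than 3 vertices."""
    if len(poly) < 3:
        return 0.0
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def _intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
    """Point where segment p->q crosses the infinite line through a->b."""
    d1, d2 = cross(a, b, p), cross(a, b, q)
    t = d1 / (d1 - d2)
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def clip(subject: list[Point], clipper: list[Point]) -> list[Point]:
    """Sutherland-Hodgman: part of `subject` inside the convex CCW `clipper`."""
    output = subject
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        if not inp:
            break
        for j in range(len(inp)):
            cur, prev = inp[j], inp[j - 1]
            cur_in = cross(a, b, cur) >= 0
            prev_in = cross(a, b, prev) >= 0
            if cur_in:
                if not prev_in:
                    output.append(_intersect(prev, cur, a, b))
                output.append(cur)
            elif prev_in:
                output.append(_intersect(prev, cur, a, b))
    return output


def coverage_percent(pp_points: list[Point], pk_points: list[Point]) -> float:
    """Area(hull(K) ∩ hull(P)) / Area(hull(P)); 0.0 for degenerate hulls."""
    hull_p, hull_k = convex_hull(pp_points), convex_hull(pk_points)
    if len(hull_p) < 3 or len(hull_k) < 3 or area(hull_p) == 0:
        return 0.0
    return area(clip(hull_p, hull_k)) / area(hull_p)


# Toy frame: the power play spans a 10 x 10 square (one player in the middle);
# the penalty kill spans the same square shifted 5 ft to the right.
pp = [(0, 0), (10, 0), (10, 10), (0, 10), (5, 5)]
pk = [(5, 0), (15, 0), (15, 10), (5, 10)]
print(coverage_percent(pp, pk))  # 0.5
```

Half of the power-play hull lies inside the penalty-kill hull, so the sketch reports 0.5. Collinear players produce a hull with fewer than three vertices and therefore a coverage of 0.0, mirroring the zero-area guard in the notebook's function.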

By averaging this Coverage % over all frames of a given penalty kill, we measure **how well the area spanned by the power-play players (the convex hull whose boundary is formed by the outermost power-play players) is covered by the area spanned by the penalty-kill players**. We hope this gives further insight into how well a penalty kill performs throughout its duration, and supports further inference on how much of penalty-kill performance comes from player quality versus the set-up of the penalty kill.
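The averaging step can be sketched with pandas as below. This assumes a per-frame table shaped like the CSV the notebook writes (`Image Id`, `Coverage_Percent`), plus a hypothetical `PK_Id` column identifying which penalty kill each frame belongs to; the source data does not include such a label, so it would have to be derived separately. If frames are sampled at a constant rate, a per-frame mean is also a time average.

```python
import pandas as pd

# Hypothetical per-frame coverage values; "PK_Id" is an assumed label.
frames = pd.DataFrame({
    "Image Id": [1, 2, 3, 4, 5],
    "PK_Id": ["PK1", "PK1", "PK1", "PK2", "PK2"],
    "Coverage_Percent": [0.40, 0.50, 0.60, 0.20, 0.30],
})

# Average Coverage % over all frames of each penalty kill,
# keeping the number of frames used for each average
pk_summary = (
    frames.groupby("PK_Id")["Coverage_Percent"]
    .agg(mean_coverage="mean", n_frames="size")
    .reset_index()
)
print(pk_summary)
```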



In [16]:
!pip install shapely
import pandas as pd
from shapely.geometry import MultiPoint      # builds each team's convex hull
from shapely.errors import EmptyPartError    # raised for geometries with empty parts
from google.colab import drive

# Mount Google Drive to access the tracking CSVs
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [17]:
def compute_dynamic_coverage(teamA_points, teamB_points, teamA_count, teamB_count):
    """
    coverage = area(ConvexHull(A) ∩ ConvexHull(B)) / area(ConvexHull(team with more players))
    If a hull is invalid or empty, return 0.
    """
    # If either side has < 3 points, it can't form a polygon hull
    if len(teamA_points) < 3 or len(teamB_points) < 3:
        return 0.0

    try:
        hullA = MultiPoint(teamA_points).convex_hull
        hullB = MultiPoint(teamB_points).convex_hull
    except EmptyPartError:
        # This happens if Shapely fails to create a geometry for some reason
        return 0.0

    intersection_area = hullA.intersection(hullB).area

    if teamA_count > teamB_count:
        denom_area = hullA.area
    elif teamB_count > teamA_count:
        denom_area = hullB.area
    else:
        # Tie -> default to hullA
        denom_area = hullA.area

    if denom_area == 0:
        return 0.0

    return intersection_area / denom_area


def calculate_coverage_dynamic(df,
                              home_label="Home",
                              away_label="Away",
                              frame_col="Image Id",
                              team_col="Team",
                              x_col="Rink Location X (Feet)",
                              y_col="Rink Location Y (Feet)",
                              period_col="Period",
                              clock_col="Game Clock",
                              output_csv="coverage_dynamic.csv"):
    """
    Computes coverage per frame, safely handling empty or invalid coordinates.
    """
    coverage_results = []

    # Work on a copy so the caller's DataFrame is not modified, then
    # ensure x,y are numeric (invalid values are coerced to NaN)
    df = df.copy()
    df[x_col] = pd.to_numeric(df[x_col], errors="coerce")
    df[y_col] = pd.to_numeric(df[y_col], errors="coerce")

    # Group by each frame
    for frame_id, frame_data in df.groupby(frame_col):

        # Separate Home vs Away
        home_data = frame_data[frame_data[team_col] == home_label].copy()
        away_data = frame_data[frame_data[team_col] == away_label].copy()

        # Drop rows with NaN coords
        home_data.dropna(subset=[x_col, y_col], inplace=True)
        away_data.dropna(subset=[x_col, y_col], inplace=True)

        # Build point lists
        home_points = list(zip(home_data[x_col], home_data[y_col]))
        away_points = list(zip(away_data[x_col], away_data[y_col]))

        home_count = len(home_data)
        away_count = len(away_data)

        coverage_value = compute_dynamic_coverage(
            home_points,
            away_points,
            home_count,
            away_count
        )

        # Grab clock & period from the first row (if any)
        if len(frame_data) > 0:
            game_clock = frame_data.iloc[0][clock_col]
            period = frame_data.iloc[0][period_col]
        else:
            game_clock = None
            period = None

        coverage_results.append({
            frame_col: frame_id,
            period_col: period,
            clock_col: game_clock,
            "Home_Player_Count": home_count,
            "Away_Player_Count": away_count,
            "Coverage_Percent": coverage_value
        })

    coverage_df = pd.DataFrame(coverage_results)
    coverage_df.to_csv(output_csv, index=False)
    print(f"Coverage calculation complete. Results saved to '{output_csv}'.")

In [19]:
DATA_DIR = '/content/drive/MyDrive/Stathletes Big Data Cup 2025'

# Read the rink X/Y coordinates as floats
coord_dtypes = {'Rink Location X (Feet)': float,
                'Rink Location Y (Feet)': float}

HG_tracking = pd.read_csv(f'{DATA_DIR}/2024-10-25.Team.H.@.Team.G.-.Tracking.csv',
                          dtype=coord_dtypes, low_memory=False)
CD_tracking = pd.read_csv(f'{DATA_DIR}/2024-11-15.Team.D.@.Team.C.-.Tracking.csv',
                          dtype=coord_dtypes, low_memory=False)
EF_tracking = pd.read_csv(f'{DATA_DIR}/2024-11-16.Team.F.@.Team.E.-.Tracking.csv',
                          dtype=coord_dtypes, low_memory=False)

file_path_HG = '/content/drive/MyDrive/coverage_HG.csv'
file_path_CD = '/content/drive/MyDrive/coverage_CD.csv'
file_path_EF = '/content/drive/MyDrive/coverage_EF.csv'

calculate_coverage_dynamic(
    HG_tracking,
    home_label="Home",
    away_label="Away",
    frame_col="Image Id",
    team_col="Team",
    x_col="Rink Location X (Feet)",
    y_col="Rink Location Y (Feet)",
    period_col="Period",
    clock_col="Game Clock",
    output_csv=file_path_HG
)

# After running, the file at file_path_HG (coverage_HG.csv) will appear, containing columns:
# [Image Id, Period, Game Clock, Home_Player_Count, Away_Player_Count, Coverage_Percent].

Coverage calculation complete. Results saved to '/content/drive/MyDrive/coverage_HG.csv'.
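Since the function stores the tracked player counts, the per-frame output can be narrowed to frames where one side has more players, which is also the side whose hull is used as the denominator. The sketch below uses a few made-up rows in the output format; with the real results, `coverage_df` would come from `pd.read_csv(file_path_HG)`. Treating unequal tracked counts as a proxy for manpower-advantage play is an assumption: players missed by the tracking system would produce the same pattern, and the count columns include every tracked player of each team.

```python
import pandas as pd

# Hypothetical rows in the format written by calculate_coverage_dynamic
coverage_df = pd.DataFrame({
    "Image Id": [10, 11, 12, 13],
    "Period": [1, 1, 2, 2],
    "Game Clock": ["12:00", "11:59", "05:00", "04:59"],
    "Home_Player_Count": [5, 5, 5, 4],
    "Away_Player_Count": [5, 4, 4, 5],
    "Coverage_Percent": [0.90, 0.60, 0.70, 0.55],
})

# Assumption: unequal tracked counts act as a rough man-advantage proxy
uneven = coverage_df[
    coverage_df["Home_Player_Count"] != coverage_df["Away_Player_Count"]
].copy()

# Side with more tracked players, i.e. the side whose hull is the denominator
uneven["Advantage_Team"] = (
    uneven["Home_Player_Count"].gt(uneven["Away_Player_Count"])
    .map({True: "Home", False: "Away"})
)

# Mean coverage per period and advantaged side
summary = uneven.groupby(["Period", "Advantage_Team"])["Coverage_Percent"].mean()
print(summary)
```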


In [None]:
print(HG_tracking.columns)

Index(['Image Id', 'Period', 'Game Clock', 'Player or Puck', 'Team',
       'Player Id', 'Rink Location X (Feet)', 'Rink Location Y (Feet)',
       'Rink Location Z (Feet)', 'Goal Score'],
      dtype='object')
