# ESTIMATING "RELATIVE PHYSICAL FATIGUE INDEX" OF PLAYERS USING ICE HOCKEY METRICS

#### **Objective:** To develop an application that enables coaches to input match data and obtain instant feedback on potential fatigue levels, based on the dataset's metrics.

#### **Functionality:** Upload data files and obtain instant feedback on potential fatigue levels.

#### **Applications:**

1. **Player Management and Rotation:**
   Use the Fatigue Index (FI) to guide real-time player rotation decisions, optimizing performance and minimizing injury risks.

2. **Injury Prevention in Training:**
   Incorporate FI to tailor training intensity and recovery, reducing the likelihood of player injuries.

3. **Performance Analysis and Strategy:**
   Integrate FI with performance metrics to refine game strategies, making informed in-game decisions for optimal outcomes.


### DATA LOADING AND PREPROCESSING


In [None]:
# Import required Libraries
import pandas as pd

In [None]:
# # We change the delimiter from ; to , and save the files for ease of use
# df_lugano.to_csv("lugano.csv", sep=",", index=False)
# df_zsc.to_csv("zsc.csv", sep=",", index=False)

In [None]:
# Load the datasets into Pandas DataFrame
df_lugano = pd.read_csv("lugano.csv", encoding="ISO-8859-1")
# df_zsc = pd.read_csv("zsc.csv", encoding="ISO-8859-1")

# Print the first 5 rows of the datasets
df_lugano.head()

In [None]:
# List of columns to keep in the dataset
columns_to_keep = [
    "Group name",
    "Position",
    "Description",
    "Duration (s)",
    "Distance (m)",
    "Time on Ice (s)",
    "High Metabolic Power Distance (m)",
    "Mechanical Intensity",
]

In [None]:
# Create a new DataFrame with only the recommended columns to keep
# .copy() avoids pandas' SettingWithCopyWarning when columns are added or changed later
df_imp_lugano = df_lugano[columns_to_keep].copy()
# print(df_imp_lugano)

In [None]:
# Add a "Player ID" column as the first column, numbering the rows from 1

# CHANGE PLAYER NOTATION TO ALPHABETS

df_imp_lugano.insert(loc=0, column="Player ID", value=range(1, 1 + len(df_imp_lugano)))

df_imp_lugano.head()

In [None]:
# Check the data types of the columns
df_imp_lugano.dtypes

In [None]:
# Convert the data type of the column mechanical intensity from string to float

df_imp_lugano["Mechanical Intensity"] = (
    df_imp_lugano["Mechanical Intensity"].str.replace(",", ".").astype(float)
)

# Check the data types of the columns
df_imp_lugano.dtypes

### CONCEPTUAL FRAMEWORK AND METHODOLOGY

Given the anonymized nature and small sample size of our dataset, combined with the numerous variables at play, we've made certain assumptions to ensure it is suitable for our use case. Here’s our approach to understanding player fatigue:

1. **Picking Player Data:**
   Due to the lack of player identification data, we infer that the first 21 rows represent individual player data. This is based on the understanding that a hockey team typically consists of 20 players. This selection ensures we capture data for each unique player.

2. **Identifying Player Roles:**
   We assess the 'Position' column to differentiate between forwards (FW), defensemen (D), and goalkeepers. The roles they play on the ice may result in different fatigue levels.

3. **Data Segmentation:**
   With hundreds of variables present, we narrow down and segment the data using the metrics recommended by the ice hockey club. This targeted analysis aids in the estimation of the Fatigue Index (FI).

4. **Developing the Fatigue Formula:**
   Using a combination of the dataset metrics, we create a formula that captures the aspects of player fatigue. This formula is designed to be sensitive to player roles, game dynamics, and other key factors.

5. **Fatigue Estimation:**
   Using the curated data from the previous steps, we derive the Fatigue Index (FI).

6. **Visualization:**
   We design a clear and easy-to-understand graphic to show the fatigue levels. This helps coaches quickly see and understand player tiredness.


In [None]:
# As we want only unique player data, we select only the first 21 rows of the dataframe
# We infer this because an ice hockey team typically consists of 20 players, including 2 goaltenders and 18 skaters,
# and from the proportion of forwards to defensemen
# .copy() gives an independent DataFrame, so later column assignments do not raise SettingWithCopyWarning
df_imp_lugano_21 = df_imp_lugano.iloc[0:21, :].copy()
df_imp_lugano_21

In [None]:
# Check if any NA in Each Row for the first 21 rows
df_imp_lugano_21.isnull().sum(axis=1)
# df_imp_lugano.isna().any(axis=1)

# We see there are no NA values in the first 21 rows of the dataframe

In [None]:
# Calculate the ratio of forwards to defensemen using position column

df_imp_lugano_21["Position"].value_counts()

In [None]:
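# Calculate ratio of F to D
# value_counts() sorts by frequency, so this assumes forwards are the most
# common position and defensemen the second most common
position_counts = df_imp_lugano_21["Position"].value_counts()
F = position_counts.iloc[0]
D = position_counts.iloc[1]
ratio = F / D

# Print the ratio as a decimal

print(ratio)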

### CREATING A RELATIVE FATIGUE LEVEL FORMULA

- Building a fatigue formula typically involves empirical research, biomechanics, and physiological considerations.
- However, we propose a simple, hypothetical formula to estimate fatigue.
- DISCLAIMER: This is a rough estimation and not based on scientific research.
- Using the important metrics recommended by the Ice Hockey Club & Professor Martin Rumo, we attempt to create a formula that can be used to estimate fatigue levels.


#### Relative Fatigue Index (RFI) = A * (Time on Ice / Match Duration) + B * (High Metabolic Power Distance / Total Distance) + C * (Mechanical Intensity)

- TIME FACTOR: The share of the match duration that a player spends on the ice. The idea is simple: the more time a player spends on the ice, the more fatigued they are likely to be.

- DISTANCE OR INTENSITY FACTOR: The share of the total distance covered at high intensity, computed as the distance covered at high metabolic power divided by the total distance covered. It is a straightforward indicator of how hard a player pushed themselves relative to their overall activity.

- MECHANICAL OR LOAD INTENSITY: How intense a player's session was in terms of accelerations and decelerations. It models the intensity of a session or phase and provides important information about its impact on the lower limbs. It is the Mechanical Load divided by the total time in minutes.

- A, B, and C: Weights assigned to each of the factors. These weights are determined by the coach based on the importance of each factor in the context of the team's strategy and the coach's philosophy.
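
As a worked illustration of the formula above, the sketch below computes the RFI for a single player. The function name `weighted_rfi`, its parameters, and the player values are hypothetical and chosen only to make the arithmetic easy to follow; it also assumes the mechanical intensity has already been scaled to the 0 to 1 range so that it is comparable with the two ratios.

```python
def weighted_rfi(
    time_on_ice_s,
    duration_s,
    hmp_distance_m,
    distance_m,
    norm_mech_intensity,
    a=1.0,
    b=1.0,
    c=1.0,
):
    """Weighted Relative Fatigue Index for one player (illustrative sketch)."""
    time_factor = time_on_ice_s / duration_s  # share of the match spent on ice
    distance_factor = hmp_distance_m / distance_m  # share of distance at high metabolic power
    return a * time_factor + b * distance_factor + c * norm_mech_intensity


# Hypothetical player: 900 s on ice in a 3600 s match, 1200 m of 4000 m at
# high metabolic power, normalized mechanical intensity of 0.5
equal_weights = weighted_rfi(900, 3600, 1200, 4000, 0.5)
print(round(equal_weights, 2))  # 0.25 + 0.30 + 0.50 = 1.05

# Doubling the weight on the time factor (A = 2) raises the index
time_heavy = weighted_rfi(900, 3600, 1200, 4000, 0.5, a=2.0)
print(round(time_heavy, 2))  # 0.50 + 0.30 + 0.50 = 1.3
```

With equal weights, each factor contributes at most 1, so the unweighted index lies between 0 and 3.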


In [None]:
# First we need to normalize the Mechanical Intensity column, so that the values are between 0 and 1
# This is to make sure that it can be added to TIME FACTOR, and DISTANCE FACTOR which are ratios in the next step
# With historical data, from the clubs, min and max intensities can be adjusted further in the future to normalize the intensity

min_val = df_imp_lugano_21["Mechanical Intensity"].min()
max_val = df_imp_lugano_21["Mechanical Intensity"].max()

df_imp_lugano_21["Normalized Mechanical Intensity"] = (
    df_imp_lugano_21["Mechanical Intensity"] - min_val
) / (max_val - min_val)

df_imp_lugano_21
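
The comment in the cell above notes that, with historical club data, fixed minimum and maximum intensities could replace the per-match values. A minimal sketch of that idea follows; the helper `normalize_with_bounds`, the bounds, and the intensity values are all hypothetical, and clipping out-of-range values to the 0 to 1 range is an assumption rather than something the notebook prescribes.

```python
import pandas as pd


def normalize_with_bounds(values: pd.Series, lower: float, upper: float) -> pd.Series:
    """Min-max scale to [0, 1] using fixed bounds; out-of-range values are clipped."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    return ((values - lower) / (upper - lower)).clip(0.0, 1.0)


# Hypothetical intensities and historical bounds, for illustration only
intensity = pd.Series([2.0, 3.5, 5.0, 6.0])
scaled = normalize_with_bounds(intensity, lower=2.0, upper=5.0)
print(scaled.tolist())  # [0.0, 0.5, 1.0, 1.0]
```

Fixed bounds would keep the normalized intensity comparable across matches, whereas the per-match minimum and maximum always map the least and most intense player of that match to 0 and 1.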

In [None]:
# Create a Python function to calculate the relative fatigue index


def relative_fatigue_index(df, a=1.0, b=1.0, c=1.0):
    """Calculate the relative fatigue index for each player.

    a, b and c are the weights A, B and C from the formula above; the
    defaults of 1.0 weight the three factors equally.
    """
    return (
        a * (df["Time on Ice (s)"] / df["Duration (s)"])
        + b * (df["High Metabolic Power Distance (m)"] / df["Distance (m)"])
        + c * df["Normalized Mechanical Intensity"]
    )

In [None]:
# test the function
relative_fatigue_index(df_imp_lugano_21)

In [None]:
# 1. Compute the intermediate fatigue index for each row
# The function works on whole columns, so it can be called on the DataFrame directly
df_imp_lugano_21["Relative Fatigue Index"] = relative_fatigue_index(df_imp_lugano_21)

# 2. Normalize this intermediate fatigue index
min_fatigue = df_imp_lugano_21["Relative Fatigue Index"].min()
max_fatigue = df_imp_lugano_21["Relative Fatigue Index"].max()

df_imp_lugano_21["Normalized Fatigue Index"] = (
    df_imp_lugano_21["Relative Fatigue Index"] - min_fatigue
) / (
    max_fatigue - min_fatigue
) + 1  # +1 to make sure the values are between 1 and 2

df_imp_lugano_21

In [None]:
# Sort the dataframe by the final fatigue index in descending order
# This will give us the most fatigued player at the top and the least fatigued player at the bottom

df_imp_lugano_21_sorted = df_imp_lugano_21.sort_values(
    by="Normalized Fatigue Index", ascending=False
)

df_imp_lugano_21_sorted

# REMOVE INTERMEDIATE FATIGUE INDEX, IF NOT REQUIRED.

In [142]:
# MANUAL THRESHOLD BASED CATEGORIZATION OF FATIGUE INDEX

# Fixed thresholds
high_threshold = 1.93
low_threshold = 1.6


# Categorize values based on manual thresholds
df_imp_lugano_21_sorted["Fatigue Category"] = pd.cut(
    df_imp_lugano_21_sorted["Relative Fatigue Index"],
    [0, low_threshold, high_threshold, 2],  # Upper edge fixed at 2; values above 2 get NaN
    labels=["Low", "Medium", "High"],
    include_lowest=True,
)

# Display the results
print(
    df_imp_lugano_21_sorted[["Player ID", "Relative Fatigue Index", "Fatigue Category"]]
)


# # Categorize values
# df_imp_lugano_21_sorted["Fatigue Category"] = pd.cut(
#     df_imp_lugano_21_sorted["Relative Fatigue Index"],
#     [
#         0,
#         low_threshold,
#         high_threshold,
#         df_imp_lugano_21_sorted["Relative Fatigue Index"].max(),
#     ],
#     labels=["Low", "Medium", "High"],
#     include_lowest=True,
# )

# print(
#     df_imp_lugano_21_sorted[["Player ID", "Relative Fatigue Index", "Fatigue Category"]]
# )

    Player ID  Relative Fatigue Index Fatigue Category
10         11                1.669477           Medium
2           3                1.654899           Medium
7           8                1.641736           Medium
20         21                1.616113           Medium
15         16                1.548994              Low
1           2                1.548003              Low
3           4                1.538095              Low
12         13                1.532141              Low
9          10                1.525707              Low
17         18                1.519416              Low
16         17                1.517963              Low
0           1                1.485737              Low
5           6                1.471295              Low
18         19                1.453339              Low
4           5                1.448679              Low
19         20                1.443519              Low
8           9                1.404328              Low
14        

In [None]:
# QUANTILE BASED CATEGORIZATION OF FATIGUE INDEX


# Calculate quantiles
q33 = df_imp_lugano_21_sorted["Relative Fatigue Index"].quantile(0.33)
q66 = df_imp_lugano_21_sorted["Relative Fatigue Index"].quantile(0.66)

# Categorize values
df_imp_lugano_21_sorted["Fatigue Category"] = pd.cut(
    df_imp_lugano_21_sorted["Relative Fatigue Index"],
    [0, q33, q66, df_imp_lugano_21_sorted["Relative Fatigue Index"].max()],
    labels=["Low", "Medium", "High"],
    include_lowest=True,
)

print(
    df_imp_lugano_21_sorted[["Player ID", "Relative Fatigue Index", "Fatigue Category"]]
)
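
The quantile split above could also be written with `pd.qcut`, which computes the bin edges from the data itself. This is a different call from the `pd.cut` approach used in the notebook, shown only as a sketch: it splits at exact thirds rather than at the 0.33 and 0.66 quantiles, and the scores below are hypothetical.

```python
import pandas as pd

# Hypothetical fatigue scores, for illustration only
scores = pd.Series([1.40, 1.45, 1.48, 1.52, 1.55, 1.62, 1.64, 1.65, 1.67])

# q=3 splits the values into three equally sized groups (terciles)
categories = pd.qcut(scores, q=3, labels=["Low", "Medium", "High"])
print(categories.tolist())
# ['Low', 'Low', 'Low', 'Medium', 'Medium', 'Medium', 'High', 'High', 'High']
```

Because the categories are relative to the players in the sample, roughly a third of the team is always labelled "High", regardless of how demanding the match was.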

In [None]:
# Map Fatigue Category to pastel colors
color_mapping = {
    "High": "#FF9999",  # Pastel Red
    "Medium": "#FFFF99",  # Pastel Yellow
    "Low": "#99FF99",  # Pastel Green
}
df_imp_lugano_21_sorted["color"] = df_imp_lugano_21_sorted["Fatigue Category"].map(
    color_mapping
)

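# Plot (matplotlib is imported here because it is not imported earlier in the notebook)
import matplotlib.pyplot as plt

df_imp_lugano_21_sorted.plot(
    kind="bar",
    x="Player ID",
    y="Relative Fatigue Index",
    color=df_imp_lugano_21_sorted["color"],
    legend=False,
    figsize=(12, 8),  # set here so the size applies to the figure pandas creates
    ylim=[
        0.75,
        df_imp_lugano_21_sorted["Relative Fatigue Index"].max() + 0.25,
    ],  # Setting y-axis limit
)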
plt.title("Player Fatigue Index")
plt.ylabel("Fatigue Index")
plt.xlabel("Player ID")
plt.yticks(
    [
        x / 4
        for x in range(
            3,
            int(4 * df_imp_lugano_21_sorted["Relative Fatigue Index"].max()) + 1,
        )
    ]
)  # Setting y-axis ticks
plt.tight_layout()
plt.show()