# ESTIMATING "RELATIVE PHYSICAL FATIGUE INDEX" OF PLAYERS USING ICE HOCKEY METRICS

#### **Objective:** To develop an application that enables coaches to input game data and obtain instant feedback on potential fatigue levels, based on the dataset's metrics.

#### **Functionality:** Upload data files and obtain instant feedback on potential fatigue levels.

#### **Potential Applications:**

1. **Player Management and Rotation:**
   Use the Fatigue Index (FI) to guide real-time player rotation decisions, optimizing performance and minimizing injury risks.

2. **Injury Prevention in Training:**
   Incorporate FI to tailor training intensity and recovery, reducing the likelihood of player injuries.

3. **Performance Analysis and Strategy:**
   Integrate FI with performance metrics to refine game strategies, making informed in-game decisions for optimal outcomes.


## 1.  DATA LOADING AND PREPROCESSING


This code block imports the necessary libraries for data analysis and visualization in Python.

In [55]:
# All libraries used in the project.
import numpy as np  # Numerical helpers for the cumulative line chart (periods)
import pandas as pd  # Data loading and manipulation
import plotly.express as px  # Fatigue Index visualization for each player
import plotly.graph_objects as go  # Line chart and cumulative line chart (periods)

In this code block, we import the two Original Datasets.

In [56]:
# Load the Original Datasets
lugano = pd.read_csv("LUGANO-ORIG.csv", encoding="ISO-8859-1", sep=";")
zsc = pd.read_csv("ZHC-ORIG.csv", encoding="ISO-8859-1", sep=";")

In this block of code, the two DataFrames (lugano and zsc) are written back to disk with a comma (',') as the delimiter, under the names "lhc-lugano.csv" and "lhc-zsc.csv". This step is taken for ease of use and to make the data more accessible for further analysis.


In [57]:
# We change the delimiter to ',' rename and save the file for ease of use
lugano.to_csv("lhc-lugano.csv", sep=",", index=False)
zsc.to_csv("lhc-zsc.csv", sep=",", index=False)

In this code block, the previously saved CSV files ("lhc-lugano.csv" and "lhc-zsc.csv") are loaded into Pandas DataFrames (df_lugano and df_zsc) using the pd.read_csv function. These DataFrames now contain the data from the CSV files, and df_lugano.head() is used to display the first few rows of the "df_lugano" DataFrame for inspection.


In [58]:
# Load the updated datasets into Pandas DataFrame
df_lugano = pd.read_csv("lhc-lugano.csv", encoding="ISO-8859-1", index_col=False)
df_zsc = pd.read_csv("lhc-zsc.csv", encoding="ISO-8859-1", index_col=False)

df_lugano.head()

Unnamed: 0,Group name,League ID,Description,Session ID,Types,Session end (Local timezone),Session end date (Local timezone),Session end date (UTC),Session end time (Local timezone),Session end time (UTC),...,Sprints / min (medium),Sprints / min (high),Sprints / min (very high),Sprints / min (near max),Turns,Sharp Turns,Wide Turns,Turns / min,Sharp Turns / min,Wide Turns / min
0,Defense,,vs Lugano,194,Match,16.11.2022 22:09:47,16.11.2022,16.11.2022,22:09:47,21:09:47,...,35,6,0,0,33,10,23,191,58,133
1,Offense,,vs Lugano,194,Match,16.11.2022 22:09:47,16.11.2022,16.11.2022,22:09:47,21:09:47,...,104,7,17,0,56,9,47,488,78,409
2,Offense,,vs Lugano,194,Match,16.11.2022 22:09:47,16.11.2022,16.11.2022,22:09:47,21:09:47,...,58,32,26,0,48,6,42,309,39,27
3,Offense,,vs Lugano,194,Match,16.11.2022 22:09:47,16.11.2022,16.11.2022,22:09:47,21:09:47,...,86,4,12,0,58,12,46,334,69,265
4,Defense,,vs Lugano,194,Match,16.11.2022 22:09:47,16.11.2022,16.11.2022,22:09:47,21:09:47,...,196,0,0,0,2,0,2,131,0,131


### 1.1 FILTERING THE DATASET TO INCLUDE ONLY THE RELEVANT COLUMNS

This code selects specific columns of interest which are required for analysis (defined in columns_to_keep) and creates two new dataframes, df_imp_lugano and df_imp_zsc. It then displays the first 5 rows of both datasets for examination.

In [59]:
# List of columns to keep in the dataset
columns_to_keep = [
    "Group name",
    "Position",
    "Description",
    "Duration (s)",
    "Distance (m)",
    "Time on Ice (s)",
    "High Metabolic Power Distance (m)",
    "Mechanical Intensity",
]


# Function to filter the columns
def filter_columns(df, columns_to_keep):
    """
    Filters a DataFrame to only include specified columns.

    Parameters:
    - df (pd.DataFrame): The original DataFrame.
    - columns_to_keep (list of str): List of column names to keep.

    Returns:
    - pd.DataFrame: A new DataFrame with only the specified columns.
    """
    return df[columns_to_keep]


# Example usage:
df_imp_lugano = filter_columns(df_lugano, columns_to_keep)
df_imp_zsc = filter_columns(df_zsc, columns_to_keep)

# Print the first 5 rows of the datasets
print(df_imp_lugano.head())

print(
    "===================================================================================================="
)

print(df_imp_zsc.head())

  Group name Position Description  Duration (s)  Distance (m)  \
0    Defense        D   vs Lugano          8836          3163   
1    Offense        F   vs Lugano          8836          2881   
2    Offense        F   vs Lugano          8836          3586   
3    Offense        F   vs Lugano          8836          3881   
4    Defense        D   vs Lugano          8836           354   

   Time on Ice (s)  High Metabolic Power Distance (m) Mechanical Intensity  
0             1038                               1287                45,94  
1              689                               1724                42,85  
2              932                               2090                46,12  
3             1043                               1956                44,38  
4               92                                179                44,95  
     Group name Position Description  Duration (s)  Distance (m)  \
0       Defense        D   vs Zurich          7868          3140   
1       Off

### 1.2 HANDLING MISSING VALUES

This code checks for missing values in each column of df_imp_lugano and df_imp_zsc and prints the counts. It identifies 14 and 7 missing values in the "Mechanical Intensity" column for the two dataframes, respectively.

In [60]:
# Check if any NA in Each Row and column of the dataframes
print(df_imp_lugano.isnull().sum(axis=0))
# df_imp_lugano.isna().any(axis=1)


print(
    "============================================================================================================="
)

print(df_imp_zsc.isnull().sum(axis=0))

# We see 14 and 7, NA values in the column "Mechanical Intensity" in the two dataframes respectively

Group name                            0
Position                              0
Description                           0
Duration (s)                          0
Distance (m)                          0
Time on Ice (s)                       0
High Metabolic Power Distance (m)     0
Mechanical Intensity                 14
dtype: int64
Group name                           0
Position                             0
Description                          0
Duration (s)                         0
Distance (m)                         0
Time on Ice (s)                      0
High Metabolic Power Distance (m)    0
Mechanical Intensity                 7
dtype: int64


This code removes rows with missing values in the "Mechanical Intensity" column for both df_imp_lugano and df_imp_zsc DataFrames. It then checks if there are any remaining missing values in each column of the filtered DataFrames and prints the counts of missing values along with the updated shape of each DataFrame.

In [61]:
# Remove the rows with NA values in the column "Mechanical Intensity"
df_imp_lugano = df_imp_lugano.dropna(subset=["Mechanical Intensity"])

df_imp_zsc = df_imp_zsc.dropna(subset=["Mechanical Intensity"])

# Check if any NA in Each Column of the dataframe
print(df_imp_lugano.isnull().sum(axis=0))
print(df_imp_lugano.shape)

print(
    "============================================================================================================="
)

print(df_imp_zsc.isnull().sum(axis=0))
print(df_imp_zsc.shape)

Group name                           0
Position                             0
Description                          0
Duration (s)                         0
Distance (m)                         0
Time on Ice (s)                      0
High Metabolic Power Distance (m)    0
Mechanical Intensity                 0
dtype: int64
(89, 8)
Group name                           0
Position                             0
Description                          0
Duration (s)                         0
Distance (m)                         0
Time on Ice (s)                      0
High Metabolic Power Distance (m)    0
Mechanical Intensity                 0
dtype: int64
(161, 8)


### 1.3 CREATING A FUNCTION TO REMOVE IRRELEVANT ROWS OR 'PLAYERS'

This code defines a function called filter_group_name that filters a DataFrame to retain rows where the value in the "Group name" column is either 'Offense', 'Defense', or 'Goalies'. This is used to remove rows related to the guest team in the ZSC dataset.

In [62]:
# Function to keep only rows where "Group name" is 'Offense', 'Defense' or 'Goalies'.
# This is done to remove the rows for the guest team present in the ZSC dataset.


def filter_group_name(df):
    """
    Filters the DataFrame to retain rows where 'Group name' is 'Offense', 'Defense' or 'Goalies'.

    Parameters:
    - df (pd.DataFrame): The input DataFrame.

    Returns:
    - pd.DataFrame: The filtered DataFrame.
    """
    return df[df["Group name"].isin(["Offense", "Defense", "Goalies"])]


# Example usage:
df_filtered_zsc = filter_group_name(df_imp_zsc)

df_filtered_lugano = filter_group_name(df_imp_lugano)

# Display the filtered data:
# df_filtered_zsc

# Display the filtered data:
df_filtered_lugano

Unnamed: 0,Group name,Position,Description,Duration (s),Distance (m),Time on Ice (s),High Metabolic Power Distance (m),Mechanical Intensity
0,Defense,D,vs Lugano,8836,3163,1038,1287,4594
1,Offense,F,vs Lugano,8836,2881,689,1724,4285
2,Offense,F,vs Lugano,8836,3586,932,2090,4612
3,Offense,F,vs Lugano,8836,3881,1043,1956,4438
4,Defense,D,vs Lugano,8836,354,92,179,4495
...,...,...,...,...,...,...,...,...
93,Offense,F,Overtime,129,254,51,180,3832
94,Defense,D,Overtime,129,92,27,12,2567
97,Defense,D,Overtime,129,235,46,148,3233
99,Offense,F,Overtime,129,228,48,131,436


### 1.4 ADDING PLAYER ID ONLY TO THE ROWS WITH COLUMN 'DESCRIPTION' = 'VS LUGANO' OR 'VS ZURICH' (GAME DATA ONLY, NOT PERIODS)

This code defines a function "add_player_id" that adds a 'Player ID' column, since the dataset contains no player identifiers. IDs start from 1 and are assigned only to rows whose 'Description' is 'vs Lugano' or 'vs Zurich', i.e. the overall game rows, not the period rows. The first 7 rows of each updated DataFrame are then displayed.

In [63]:
def add_player_id(df):
    """
    Adds a new column 'Player ID' to the DataFrame starting from 1, but only to the rows
    with 'Description' as 'vs Lugano' or 'vs Zurich'.

    Parameters:
    - df (pd.DataFrame): The input DataFrame.

    Returns:
    - pd.DataFrame: The DataFrame with the new 'Player ID' column added to specific rows.
    """
    mask = df["Description"].isin(["vs Lugano", "vs Zurich"])
    df.loc[mask, "Player ID"] = list(range(1, sum(mask) + 1))
    df["Player ID"] = df["Player ID"].astype(
        "Int64"
    )  # Using Int64 to handle potential NaNs
    return df


# Example usage:
df_imp_lugano = add_player_id(df_filtered_lugano)
df_imp_zsc = add_player_id(df_filtered_zsc)

# Displaying the first 7 rows:
print(df_imp_lugano.head(7))

print(
    "============================================================================================================="
)

print(df_imp_zsc.head(7))

  Group name    Position Description  Duration (s)  Distance (m)  \
0    Defense           D   vs Lugano          8836          3163   
1    Offense           F   vs Lugano          8836          2881   
2    Offense           F   vs Lugano          8836          3586   
3    Offense           F   vs Lugano          8836          3881   
4    Defense           D   vs Lugano          8836           354   
5    Defense           D   vs Lugano          8836          5087   
6    Goalies  Goalkeeper   vs Lugano          8836          1066   

   Time on Ice (s)  High Metabolic Power Distance (m) Mechanical Intensity  \
0             1038                               1287                45,94   
1              689                               1724                42,85   
2              932                               2090                46,12   
3             1043                               1956                44,38   
4               92                                179            



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
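
The warning above is raised because `add_player_id` assigns into a DataFrame that was itself produced by filtering another DataFrame. The following is a minimal sketch of one common way to avoid it, assuming an explicit copy is acceptable for data of this size. The toy data and the name `add_player_id_safe` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy data standing in for the match export (values are illustrative).
raw = pd.DataFrame(
    {
        "Group name": ["Defense", "Offense", "Guest", "Offense"],
        "Description": ["vs Lugano", "vs Lugano", "vs Lugano", "1st Period"],
    }
)

# Taking an explicit copy after filtering makes the result an independent
# DataFrame, so later column assignments do not trigger the warning.
filtered = raw[raw["Group name"].isin(["Offense", "Defense", "Goalies"])].copy()


def add_player_id_safe(df):
    """Hypothetical variant of add_player_id that leaves its input untouched."""
    out = df.copy()
    mask = out["Description"].isin(["vs Lugano", "vs Zurich"])
    # Start with an all-missing nullable integer column, then fill the match rows.
    out["Player ID"] = pd.array([pd.NA] * len(out), dtype="Int64")
    out.loc[mask, "Player ID"] = list(range(1, int(mask.sum()) + 1))
    return out


with_ids = add_player_id_safe(filtered)
print(with_ids)
```

Because the function works on its own copy, the caller's DataFrame keeps its original columns, which makes re-running a cell safe.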






### 1.5 CHECKING COLUMN DATA TYPES

This code checks and prints the data types of the columns in the "df_imp_lugano" and "df_imp_zsc" DataFrames. It helps in understanding the data types of each column, which can be important for data analysis and processing.

In [64]:
# Check the data types of the columns
print(df_imp_lugano.dtypes)

print(
    "============================================================================================================="
)

print(df_imp_zsc.dtypes)

Group name                           object
Position                             object
Description                          object
Duration (s)                          int64
Distance (m)                          int64
Time on Ice (s)                       int64
High Metabolic Power Distance (m)     int64
Mechanical Intensity                 object
Player ID                             Int64
dtype: object
Group name                           object
Position                             object
Description                          object
Duration (s)                          int64
Distance (m)                          int64
Time on Ice (s)                       int64
High Metabolic Power Distance (m)     int64
Mechanical Intensity                 object
Player ID                             Int64
dtype: object


### 1.6 CONVERTING MECHANICAL INTENSITY COLUMN TO FLOAT

This code defines a function that converts the 'Mechanical Intensity' column in the DataFrame from a string to a float data type which is required for further analysis.

In [65]:
def convert_mechanical_intensity_to_float(df):
    """
    Converts the data type of the 'Mechanical Intensity' column from string to float.

    Parameters:
    - df (pd.DataFrame): The input DataFrame.

    Returns:
    - pd.DataFrame: The DataFrame with the 'Mechanical Intensity' column converted to float.
    """
    df["Mechanical Intensity"] = (
        df["Mechanical Intensity"].str.replace(",", ".").astype(float)
    )
    return df


# Example usage:
df_imp_lugano = convert_mechanical_intensity_to_float(df_imp_lugano)
df_imp_zsc = convert_mechanical_intensity_to_float(df_imp_zsc)

# Checking the data types:
print(df_imp_lugano.dtypes)

print(
    "============================================================================================================="
)

print(df_imp_zsc.dtypes)

Group name                            object
Position                              object
Description                           object
Duration (s)                           int64
Distance (m)                           int64
Time on Ice (s)                        int64
High Metabolic Power Distance (m)      int64
Mechanical Intensity                 float64
Player ID                              Int64
dtype: object
Group name                            object
Position                              object
Description                           object
Duration (s)                           int64
Distance (m)                           int64
Time on Ice (s)                        int64
High Metabolic Power Distance (m)      int64
Mechanical Intensity                 float64
Player ID                              Int64
dtype: object
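
As an alternative to converting the column after loading, the decimal comma can be handled while reading the file. The sketch below assumes a semicolon-delimited export like the original files and uses the `decimal` parameter of `pd.read_csv`. The inline CSV text is illustrative toy data, not the real dataset.

```python
import io

import pandas as pd

# Toy semicolon-delimited export with decimal commas (values are illustrative).
csv_text = (
    "Position;Time on Ice (s);Mechanical Intensity\n"
    "D;1038;45,94\n"
    "F;689;42,85\n"
)

# decimal="," tells pandas to treat the comma as the decimal separator,
# so the column is parsed as float64 without a separate conversion step.
df = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

print(df.dtypes)
print(df["Mechanical Intensity"].tolist())
```

This only applies when reading the semicolon-delimited source files directly. Once the data has been re-saved with a comma delimiter, the string replacement shown above is still needed.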


## 2. CONCEPTUAL FRAMEWORK AND METHODOLOGY

Given the anonymized nature and the small sample size of our dataset, combined with the numerous variables at play, we've made certain assumptions to ensure it is suitable for our use case. Here’s our approach to understanding player fatigue:

1. **Picking Player Data:**
   Due to the lack of player identification data, we had to infer that the first 21 rows represent individual player data. This is based on the understanding that a hockey team typically consists of about 20 players. This selection ensures we capture data for each unique player.

2. **Identifying Player Roles:**
   We assess the 'Position' column to differentiate between a forward (F), a defenseman (D) and a goalkeeper. The roles they play on the ice rink might result in different fatigue levels.

3. **Data Segmentation:**
   With hundreds of variables present, we narrow down and segment the data by the metrics recommended by the ice hockey club. This targeted analysis aids in the estimation of the Fatigue Index (FI).

4. **Developing the Fatigue Formula:**
   Using a combination of the dataset metrics, we create a formula that captures the aspects of player fatigue. This formula is designed to be sensitive to player roles, game dynamics, and other key factors.

5. **Fatigue Estimation:**
   Using the curated data from the previous steps, we derive the Fatigue Index (FI).

6. **Visualization:**
   We design a clear and easy-to-understand graphic to show the fatigue levels. This helps coaches quickly see and understand player tiredness.


### 2.1 FILTERING THE DATASET BASED ON PERIOD OR FULL GAME

This code defines a function filter_period_data that filters a DataFrame based on the provided description. In this case, it is used to select rows with the description "vs Lugano" or "vs Zurich", so that only full-match data, not period data, is kept from the DataFrame. The code then displays the filtered match data for both datasets.

In [66]:
# As we want only unique player data, we select the game data, i.e. rows with Description = "vs Lugano" or "vs Zurich".
# These rows hold data for the full game, NOT for individual periods.
# The inference is supported by the row count, since an ice hockey team typically consists of 20 players
# (2 goaltenders and 18 skaters), and by the proportion of forwards to defensemen.


def filter_period_data(df, description):
    """
    Filters the DataFrame based on the given description.

    Parameters:
    - df (pd.DataFrame): The input DataFrame.
    - description (str): The description to filter by. Can be '1st Period', '2nd Period', '3rd Period' or 'vs Lugano' or 'vs Zurich'.

    Returns:
    - pd.DataFrame: The filtered DataFrame.
    """
    return df[df["Description"] == description]


# Example usage:
df_imp_lugano_game = filter_period_data(df_imp_lugano, "vs Lugano")
df_imp_zsc_game = filter_period_data(df_imp_zsc, "vs Zurich")

# Display the filtered data:

print(df_imp_zsc_game)

print(
    "============================================================================================================="
)

print(df_imp_lugano_game)

   Group name    Position Description  Duration (s)  Distance (m)  \
0     Defense           D   vs Zurich          7868          3140   
1     Offense           F   vs Zurich          7868          3037   
2     Offense           F   vs Zurich          7868          4872   
10    Offense           F   vs Zurich          7868          3648   
11    Offense           F   vs Zurich          7868          2747   
12    Defense           D   vs Zurich          7868          1324   
28    Defense           D   vs Zurich          7868          3048   
29    Goalies  Goalkeeper   vs Zurich          7868          1124   
30    Offense           F   vs Zurich          7868          4376   
31    Offense           F   vs Zurich          7868          3824   
32    Defense           D   vs Zurich          7868          4242   
33    Defense           D   vs Zurich          7868          4327   
34    Defense           F   vs Zurich          7868          1574   
35    Offense           F   vs Zur

This code calculates and displays the counts of players in different positions in the match data for both Lugano and Zurich teams.

In [67]:
# Count the players per position using the Position column

print(df_imp_lugano_game["Position"].value_counts())

print(
    "============================================================================================================="
)

print(df_imp_zsc_game["Position"].value_counts())

Position
F             14
D              6
Goalkeeper     1
Name: count, dtype: int64
Position
F             13
D              6
Goalkeeper     1
Name: count, dtype: int64


This code calculates the ratio of forwards to defensemen for both Lugano (FL/DL) and ZSC (FZ/DZ) teams in the match data. It then prints these ratios as decimal values.


In [68]:
# Calculate ratio of F to D for Lugano
# Counts are looked up by label ("F", "D") instead of by position,
# which avoids the deprecated integer-key access on a Series.
lugano_counts = df_imp_lugano_game["Position"].value_counts()
FL = lugano_counts["F"]
DL = lugano_counts["D"]
ratio_L = FL / DL

# Print the ratio as a decimal value

print(ratio_L)

print(
    "============================================================================================================="
)

# Calculate ratio of F to D for ZSC
zsc_counts = df_imp_zsc_game["Position"].value_counts()
FZ = zsc_counts["F"]
DZ = zsc_counts["D"]
ratio_Z = FZ / DZ

# Print the ratio as a decimal value

print(ratio_Z)

2.3333333333333335
2.1666666666666665



## 3. CREATING A RELATIVE FATIGUE LEVEL FORMULA

- Building such a formula properly typically involves empirical research, biomechanics, and physiological considerations.
- However, we propose a simple, hypothetical formula to estimate fatigue.
- DISCLAIMER: This is a rough estimation and not based on scientific research.
- Using the important metrics recommended by the Ice Hockey Club & Professor Martin Rumo, we attempt to create a formula that can be used to estimate fatigue levels.


#### Relative Fatigue Index (RFI) = A * (Time on Ice / Match Duration) + B * (High Metabolic Power Distance / Total Distance) + C * (Normalized Mechanical Intensity)

- TIME FACTOR : Percentage of Time spent on Ice. Here, the idea is simple: the more time a player spends on the ice, the more fatigued they're likely to be.

- DISTANCE OR INTENSITY FACTOR : Percentage of the distance covered at high intensities. It's a straightforward indicator of how hard a player pushed themselves relative to their overall activity. It is the distance covered at high metabolic power divided by the total distance covered.

- MECHANICAL OR LOAD INTENSITY : How intense a player's session was in terms of accelerations and decelerations. It models the intensity of a session or phase and provides important information about its impact on the lower limbs. It is the Mechanical Load divided by the total time in minutes. In the calculations below it is rescaled to the range 0 to 1 before being used.

- A, B, and C: Weights assigned to each of the factors. These weights are determined by the coach based on the importance of each factor in the context of the team's strategy and the coach's philosophy.
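
As a worked example of the formula, the sketch below plugs in the values reported for Lugano Player ID 1 in the match-data tables of this notebook, with equal weights A = B = C = 1. The variable names are illustrative. The result agrees with the Fatigue Index of about 1.4857 listed for that player in the results table.

```python
# Values for Lugano Player ID 1, taken from the match-data tables in this notebook.
time_on_ice_s = 1038
duration_s = 8836
hmp_distance_m = 1287
distance_m = 3163
normalized_mechanical_intensity = 0.961371

# Equal weights, as in the test call used in the notebook.
A, B, C = 1, 1, 1

time_factor = time_on_ice_s / duration_s          # share of match time spent on ice
intensity_factor = hmp_distance_m / distance_m    # share of distance at high metabolic power
load_factor = normalized_mechanical_intensity     # mechanical intensity scaled to [0, 1]

rfi = A * time_factor + B * intensity_factor + C * load_factor

print(f"time factor:      {time_factor:.6f}")
print(f"intensity factor: {intensity_factor:.6f}")
print(f"load factor:      {load_factor:.6f}")
print(f"RFI:              {rfi:.6f}")
```

With equal weights, the load term dominates for this player, which illustrates why the choice of A, B, and C matters when comparing players.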


### 3.1 CALCULATIONS FOR GAME DATA


#### NORMALIZE MECHANICAL INTENSITY


This code defines a function "normalize_column" that normalizes a specified column of a DataFrame between 0 and 1. It takes as input a DataFrame and the name of the column to be normalized, calculates the minimum and maximum values in that column, and then creates a new column with the normalized values.

In [69]:
def normalize_column(df: pd.DataFrame, column_name: str) -> pd.DataFrame:
    """
    Normalize the specified column of a DataFrame between 0 and 1.

    Parameters:
        df (pd.DataFrame): The input DataFrame.
        column_name (str): The column name to be normalized.

    Returns:
        pd.DataFrame: DataFrame with the specified column normalized.
    """

    min_val = df[column_name].min()
    max_val = df[column_name].max()

    df[f"Normalized {column_name}"] = (df[column_name] - min_val) / (max_val - min_val)

    return df
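
The scaling applied by `normalize_column` is min-max normalization: each value has the column minimum subtracted and is then divided by the column range. The sketch below is a hypothetical variant, `normalize_column_safe`, which works on a copy and guards against a column whose values are all identical. Mapping such a column to 0.0 is an assumption made here, not something the original notebook specifies.

```python
import pandas as pd


def normalize_column_safe(df: pd.DataFrame, column_name: str) -> pd.DataFrame:
    """Hypothetical variant: min-max scale a column into [0, 1] on a copy."""
    out = df.copy()
    min_val = out[column_name].min()
    max_val = out[column_name].max()
    value_range = max_val - min_val

    if value_range == 0:
        # All values are identical: map them to 0.0 (an assumption)
        # instead of dividing by zero.
        out[f"Normalized {column_name}"] = 0.0
    else:
        out[f"Normalized {column_name}"] = (out[column_name] - min_val) / value_range
    return out


# Toy data (illustrative values).
toy = pd.DataFrame({"Mechanical Intensity": [10.0, 20.0, 30.0]})
scaled = normalize_column_safe(toy, "Mechanical Intensity")
print(scaled)
```

Note that min-max scaling is relative to the players in the DataFrame being normalized, so values from different matches or teams are not directly comparable.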

This code normalizes the "Mechanical Intensity" column in the game data DataFrames for both ZSC and Lugano. It applies the normalize_column function to each DataFrame, creating new columns with normalized values. The resulting normalized data for both teams is displayed.

In [70]:
# NORMALIZE MECHANICAL INTENSITY COLUMN FOR LUGANO AND ZSC GAME DATA
df_imp_zsc_game = normalize_column(df_imp_zsc_game, "Mechanical Intensity")

df_imp_zsc_game

print(
    "============================================================================================================="
)

df_imp_lugano_game = normalize_column(df_imp_lugano_game, "Mechanical Intensity")

df_imp_lugano_game





A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy






Unnamed: 0,Group name,Position,Description,Duration (s),Distance (m),Time on Ice (s),High Metabolic Power Distance (m),Mechanical Intensity,Player ID,Normalized Mechanical Intensity
0,Defense,D,vs Lugano,8836,3163,1038,1287,45.94,1,0.961371
1,Offense,F,vs Lugano,8836,2881,689,1724,42.85,2,0.871624
2,Offense,F,vs Lugano,8836,3586,932,2090,46.12,3,0.966599
3,Offense,F,vs Lugano,8836,3881,1043,1956,44.38,4,0.916062
4,Defense,D,vs Lugano,8836,354,92,179,44.95,5,0.932617
5,Defense,D,vs Lugano,8836,5087,1360,2501,41.27,6,0.825733
6,Goalies,Goalkeeper,vs Lugano,8836,1066,4007,7,19.29,7,0.187337
7,Offense,F,vs Lugano,8836,4683,1162,2758,44.56,8,0.92129
8,Offense,F,vs Lugano,8836,4009,1239,2066,38.62,9,0.748766
9,Defense,D,vs Lugano,8836,4383,1307,1913,45.25,10,0.94133


#### CREATE RELATIVE FATIGUE INDEX (RFI) FUNCTION


This code defines a function fatigue_index that calculates the fatigue index for each player based on input parameters A, B, and C. It uses various columns from the DataFrame to perform the calculations.

The function is then tested with values A=1, B=1, and C=1 using the Lugano game data DataFrame (df_imp_lugano_game) as input. It calculates and returns the fatigue index for each player in the dataset.

In [71]:
def fatigue_index(df, A=1, B=1, C=1):
    """
    Calculates the Relative Fatigue Index for each player.

    Works on a whole DataFrame (returns a Series) or on a single row.
    """
    return (
        A * (df["Time on Ice (s)"] / df["Duration (s)"])
        + B * (df["High Metabolic Power Distance (m)"] / df["Distance (m)"])
        + C * df["Normalized Mechanical Intensity"]
    )


# test the function
A = 1
B = 1
C = 1

fatigue_index(df_imp_lugano_game, A, B, C)
fatigue_index(df_imp_zsc_game, A, B, C)  # trying now for zsc

0     1.143796
1     0.873329
2     1.192197
10    1.170528
11    1.218053
12    1.613342
28    1.171522
29    0.519065
30    1.386653
31    1.010780
32    1.191856
33    0.820656
34    1.190677
35    1.231283
36    1.041868
37    0.752794
38    1.212085
39    0.867318
40    1.155663
41    0.706194
dtype: float64

### 3.2 FATIGUE INDEX CALCULATION (MATCH DATA)


#### CREATE FUNCTION TO CALCULATE THE FATIGUE INDEX FOR EACH PLAYER


This code defines a function "compute_normalized_fatigue" that calculates and normalizes the fatigue index for a DataFrame based on a provided function. It sorts the DataFrame by the normalized fatigue index in ascending order.


In [72]:
def compute_normalized_fatigue(df: pd.DataFrame, fatigue_function) -> pd.DataFrame:
    """
    Compute the normalized fatigue index for the input DataFrame based on the provided fatigue function.

    Parameters:
        df (pd.DataFrame): The input DataFrame.
        fatigue_function (function): The function to compute the fatigue index for each row of df.

    Returns:
        pd.DataFrame: DataFrame with normalized fatigue index and sorted by it.
    """

    # 1. Compute the fatigue index for each row
    df["Fatigue Index"] = df.apply(fatigue_function, axis=1)

    # 2. Normalize the fatigue index
    min_fatigue = df["Fatigue Index"].min()
    max_fatigue = df["Fatigue Index"].max()

    # +1 to make sure the values are between 1 and 2
    df["Normalized Fatigue Index"] = (df["Fatigue Index"] - min_fatigue) / (
        max_fatigue - min_fatigue
    ) + 1

    # Sort the DataFrame by the normalized fatigue index in ascending order
    df_sorted = df.sort_values(by="Normalized Fatigue Index", ascending=True)

    return df_sorted
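
Because the fatigue formula only uses column arithmetic, it can also be evaluated on whole columns at once instead of row by row with `apply`. The sketch below checks on illustrative toy data that both approaches give the same values, then applies the same rescaling into the range 1 to 2. The `fatigue_index` definition is repeated only to keep the example self-contained.

```python
import pandas as pd


def fatigue_index(df, A=1, B=1, C=1):
    """Same formula as the notebook's fatigue_index, repeated to keep this runnable."""
    return (
        A * df["Time on Ice (s)"] / df["Duration (s)"]
        + B * df["High Metabolic Power Distance (m)"] / df["Distance (m)"]
        + C * df["Normalized Mechanical Intensity"]
    )


# Toy rows (illustrative values, not taken from the dataset).
toy = pd.DataFrame(
    {
        "Time on Ice (s)": [600, 1200, 300],
        "Duration (s)": [3600, 3600, 3600],
        "High Metabolic Power Distance (m)": [500, 1500, 100],
        "Distance (m)": [2000, 3000, 1000],
        "Normalized Mechanical Intensity": [0.5, 1.0, 0.0],
    }
)

row_wise = toy.apply(fatigue_index, axis=1)  # one Python call per row
vectorized = fatigue_index(toy)              # one call on whole columns

# Rescale into [1, 2] the same way compute_normalized_fatigue does.
spread = vectorized.max() - vectorized.min()
normalized = (vectorized - vectorized.min()) / spread + 1
print(normalized.sort_values())
```

For a roster of about 20 players the speed difference is negligible, so this is mainly a readability choice.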

This code calculates the normalized fatigue index for both Lugano and ZSC game data using the compute_normalized_fatigue function with the fatigue_index function as input. It creates new DataFrames df_imp_zsc_game_FI and df_imp_lugano_game_FI containing the normalized fatigue index values for each player.

In [73]:
# Calculate the normalized fatigue index for Lugano and ZSC game data

df_imp_zsc_game_FI = compute_normalized_fatigue(df_imp_zsc_game, fatigue_index)

# df_imp_zsc_game_FI

print(
    "============================================================================================================="
)

df_imp_lugano_game_FI = compute_normalized_fatigue(df_imp_lugano_game, fatigue_index)
df_imp_lugano_game_FI





A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy




| Unnamed: 0 | Group name | Position | Description | Duration (s) | Distance (m) | Time on Ice (s) | High Metabolic Power Distance (m) | Mechanical Intensity | Player ID | Normalized Mechanical Intensity | Fatigue Index | Normalized Fatigue Index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | Offense | F | vs Lugano | 8836 | 21 | 9 | 0 | 12.84 | 14 | 0.0 | 0.001019 | 1.0 |
| 6 | Goalies | Goalkeeper | vs Lugano | 8836 | 1066 | 4007 | 7 | 19.29 | 7 | 0.187337 | 0.647389 | 1.387406 |
| 11 | Defense | D | vs Lugano | 8836 | 4771 | 1396 | 1499 | 29.33 | 12 | 0.478943 | 0.951123 | 1.56945 |
| 14 | Defense | D | vs Lugano | 8836 | 5223 | 1485 | 2555 | 38.53 | 15 | 0.746152 | 1.403397 | 1.840523 |
| 8 | Offense | F | vs Lugano | 8836 | 4009 | 1239 | 2066 | 38.62 | 9 | 0.748766 | 1.404328 | 1.841081 |
| 19 | Offense | F | vs Lugano | 8836 | 2595 | 609 | 1465 | 40.73 | 20 | 0.810049 | 1.443519 | 1.864571 |
| 4 | Defense | D | vs Lugano | 8836 | 354 | 92 | 179 | 44.95 | 5 | 0.932617 | 1.448679 | 1.867663 |
| 18 | Offense | F | vs Lugano | 8836 | 4039 | 1034 | 2047 | 41.4 | 19 | 0.829509 | 1.453339 | 1.870456 |
| 5 | Defense | D | vs Lugano | 8836 | 5087 | 1360 | 2501 | 41.27 | 6 | 0.825733 | 1.471295 | 1.881218 |
| 0 | Defense | D | vs Lugano | 8836 | 3163 | 1038 | 1287 | 45.94 | 1 | 0.961371 | 1.485737 | 1.889874 |
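
The normalized values printed above appear consistent with a min-max rescaling of the Fatigue Index onto the range [1, 2] (for example, the lowest Fatigue Index maps to exactly 1.0). The following is a minimal sketch of that kind of rescaling, not the notebook's compute_normalized_fatigue itself; the function name rescale_min_max and the toy values are assumptions made for illustration.

```python
import pandas as pd


def rescale_min_max(values: pd.Series, new_min: float = 1.0, new_max: float = 2.0) -> pd.Series:
    """Linearly rescale a numeric Series onto [new_min, new_max] (hypothetical helper)."""
    span = values.max() - values.min()
    if span == 0:
        # All values identical: map everything to the lower bound
        return pd.Series(new_min, index=values.index, dtype=float)
    return new_min + (values - values.min()) / span * (new_max - new_min)


# Toy fatigue index values (not taken from the dataset)
toy_fatigue = pd.Series([0.0, 0.5, 1.0, 2.0])
toy_normalized = rescale_min_max(toy_fatigue)
print(toy_normalized.tolist())
```

Because the scaling is relative to the minimum and maximum of the input, the resulting values are only comparable within the same group of players.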


#### COLOR CATEGORIZATION OF FATIGUE LEVELS


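This code assigns each player a fatigue category ("Low," "Medium," or "High") using fixed thresholds on the Fatigue Index column: values up to 1 are Low, values above 1 and up to 1.57 are Medium, and values above 1.57 and up to 2 are High. It returns a copy of the input DataFrame with two new columns, "Fatigue Category" and "order_fatigue" (each player's dense rank by fatigue index, stored as a string), and is applied to both df_imp_lugano_game_FI and df_imp_zsc_game_FI.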

In [74]:
# MANUAL THRESHOLD BASED CATEGORIZATION OF FATIGUE INDEX

# Fixed thresholds
high_threshold = 1.57
low_threshold = 1


def categorize_fatigue(df, fatigue_col, low_threshold, high_threshold):
    """
    Categorize fatigue based on given thresholds and assign an order.

    Parameters:
    - df (pd.DataFrame): The input DataFrame
    - fatigue_col (str): The column name of the fatigue index in the dataframe
    - low_threshold (float): The threshold value for Low fatigue
    - high_threshold (float): The threshold value for High fatigue

    Returns:
    - pd.DataFrame: A DataFrame with categorized fatigue and order
    """
    # Make a copy of the input dataframe to avoid modifying the original
    df_result = df.copy()

    # Categorize values based on manual thresholds
    df_result["Fatigue Category"] = pd.cut(
        df_result[fatigue_col],
        [0, low_threshold, high_threshold, 2],  # Explicitly using 2 as max value
        labels=["Low", "Medium", "High"],
        include_lowest=True,
    )

    # Rank players by the fatigue column (1 = lowest; ties share a rank) and
    # store the rank as a string in "order_fatigue"
    df_result["order_fatigue"] = (
        df_result[fatigue_col]
        .rank(method="dense", ascending=True)
        .astype(int)
        .astype(str)
    )

    return df_result


# Usage example
df_imp_lugano_game_FI = categorize_fatigue(
    df_imp_lugano_game_FI, "Fatigue Index", low_threshold, high_threshold
)
print(df_imp_lugano_game_FI.head())


# Apply the same categorization to the ZSC game data
df_imp_zsc_game_FI = categorize_fatigue(
    df_imp_zsc_game_FI, "Fatigue Index", low_threshold, high_threshold
)
print(df_imp_zsc_game_FI.head())

   Group name    Position Description  Duration (s)  Distance (m)  \
13    Offense           F   vs Lugano          8836            21   
6     Goalies  Goalkeeper   vs Lugano          8836          1066   
11    Defense           D   vs Lugano          8836          4771   
14    Defense           D   vs Lugano          8836          5223   
8     Offense           F   vs Lugano          8836          4009   

    Time on Ice (s)  High Metabolic Power Distance (m)  Mechanical Intensity  \
13                9                                  0                 12.84   
6              4007                                  7                 19.29   
11             1396                               1499                 29.33   
14             1485                               2555                 38.53   
8              1239                               2066                 38.62   

    Player ID  Normalized Mechanical Intensity  Fatigue Index  \
13         14                         0
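
To make the bin edges concrete, the sketch below applies the same pd.cut call to a few toy values. With the default right-closed intervals and include_lowest=True, the bins are [0, 1], (1, 1.57], and (1.57, 2], so a value exactly at a threshold falls into the lower category and anything above 2 is left uncategorized (NaN). The toy values are assumptions chosen to hit each edge.

```python
import pandas as pd

low_threshold = 1
high_threshold = 1.57

# Toy values chosen to sit on and around the bin edges (not from the dataset)
toy_values = pd.Series([0.0, 1.0, 1.2, 1.57, 1.8, 2.3])

toy_categories = pd.cut(
    toy_values,
    [0, low_threshold, high_threshold, 2],
    labels=["Low", "Medium", "High"],
    include_lowest=True,
)

# Pair each value with its category; 2.3 lies outside the bins and stays NaN
for value, category in zip(toy_values, toy_categories):
    print(value, category)
```

If a fatigue column can exceed 2, the upper bin edge would need to be raised for those players to receive a category.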

## 4. VISUALIZATION OF FATIGUE INDEX FOR EACH PLAYER


This code defines plot_fatigue_index, which draws an interactive Plotly bar chart of the normalized fatigue index for every player in a game DataFrame. Bars are colored by the "Low," "Medium," or "High" category assigned in the previous step and labeled with the player's position. The x-axis uses the fatigue rank (order_fatigue) as the bar position and shows Player ID as the tick label, the y-axis shows the normalized fatigue index, and the legend lists the fatigue categories.

In [75]:
import plotly.express as px


def plot_fatigue_index(df):
    """
    Generates an interactive bar plot displaying player fatigue levels using Plotly.

    Parameters:
    - df (pd.DataFrame): The input dataframe containing the required data. The dataframe
                         should have columns named "order_fatigue", "Normalized Fatigue Index",
                         "Fatigue Category", "Player ID", and "Position".

    Returns:
    None. Displays the generated plot.

    Example:
    plot_fatigue_index(df_imp_lugano_game_FI)
    """

    # Define a color map for the fatigue categories
    color_map = {
        "Low": "#66FF66",  # Light green
        "Medium": "#FFFF66",  # Light yellow
        "High": "#FF6666",  # Light red
    }

    df["Color"] = df["Fatigue Category"].map(color_map)

    fig = px.bar(
        df,
        x="order_fatigue",
        y="Normalized Fatigue Index",
        color="Fatigue Category",
        color_discrete_map=color_map,
        # Dynamic title: take the game description from the first row by position,
        # since the index labels are not guaranteed to include 0
        title=f"Normalized Fatigue Index of Players ({df['Description'].iloc[0]})",
        labels={"Normalized Fatigue Index": "Normalized Fatigue Index"},
        text="Position",
    )

    # Customize the layout
    fig.update_layout(
        xaxis_title="Player ID",
        xaxis_tickvals=df["order_fatigue"].tolist(),
        xaxis_ticktext=df["Player ID"].tolist(),
        yaxis_title="Normalized Fatigue Index",
        font=dict(family="Arial", size=14),
        legend_title="Fatigue Category",
        legend=dict(
            title_font=dict(size=14),
            font=dict(size=12),
            x=0.30,
            y=1.0,
            bgcolor="rgba(255,255,255,0.5)",
            bordercolor="black",
            borderwidth=1,
            orientation="h",
        ),
        xaxis=dict(tickfont=dict(size=12)),
        yaxis=dict(
            tickvals=[i * 0.25 + 0.75 for i in range(6)], tickfont=dict(size=12)
        ),
        yaxis_range=[0.75, 2.25],
    )

    # Show the interactive plot
    fig.show()
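
The x-axis of this chart is positioned by order_fatigue but labeled with Player ID. As a rough illustration of how those two lists line up, the sketch below builds them from a toy DataFrame (the player IDs and fatigue values are made up) using the same dense-rank expression as categorize_fatigue.

```python
import pandas as pd

# Toy data: three players with made-up fatigue values
toy_df = pd.DataFrame(
    {"Player ID": [7, 12, 15], "Fatigue Index": [1.40, 0.65, 0.95]}
)

# Same ranking expression as in categorize_fatigue: 1 = lowest fatigue
toy_df["order_fatigue"] = (
    toy_df["Fatigue Index"].rank(method="dense", ascending=True).astype(int).astype(str)
)

# These lists are what plot_fatigue_index passes as xaxis_tickvals / xaxis_ticktext
tick_values = toy_df["order_fatigue"].tolist()
tick_labels = toy_df["Player ID"].tolist()

for position, player in zip(tick_values, tick_labels):
    print(f"bar position {position} is labeled with player {player}")
```

Because the dense rank gives tied players the same position, two players with identical fatigue values would share one tick.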

### 4.1 VISUALIZATION OF FATIGUE INDEX FOR EACH PLAYER (LUGANO)

In [76]:
# Plot the fatigue index for Lugano
plot_fatigue_index(df_imp_lugano_game_FI)





### 4.2 VISUALIZATION OF FATIGUE INDEX FOR EACH PLAYER (ZURICH)

In [77]:
# Plot the fatigue index for ZSC
plot_fatigue_index(df_imp_zsc_game_FI)





## 5. VISUALIZATION OF AVERAGE FATIGUE INDEX OF TEAM BY PERIODS


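This code defines process_period_data, which selects the rows for one segment of the game (a "Period" or "Overtime") by matching the Description column, normalizes Mechanical Intensity within that segment, computes the fatigue index and its normalized form, and returns the processed DataFrame together with the segment's average fatigue index.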

In [78]:
def process_period_data(df, period_name):
    """
    Process the dataframe for a specific period by normalizing mechanical intensity and computing fatigue index.

    Relies on the normalize_column, compute_normalized_fatigue, and fatigue_index
    functions defined earlier in this notebook.

    Parameters:
    - df (pd.DataFrame): The dataframe containing the raw data.
    - period_name (str): The name of the period to filter for (e.g., "Period 1").

    Returns:
    pd.DataFrame: The processed dataframe for the specific period.
    float: The average fatigue index for the period.
    """

    # Filter rows by Description column for the specified period; take a copy so
    # later column assignments act on an independent DataFrame, not a slice of df
    df_period = df[df["Description"].str.contains(period_name, regex=False)].copy()

    # Normalize the Mechanical Intensity column
    df_period = normalize_column(df_period, "Mechanical Intensity")

    # Compute the normalized fatigue index
    df_period = compute_normalized_fatigue(df_period, fatigue_index)

    # Calculate the average fatigue index for the period
    avg_FI_period = df_period["Fatigue Index"].mean()

    return df_period, avg_FI_period


# Example usage:
df_period1_processed, avg_FI_period1 = process_period_data(df_imp_lugano, "Period 1")
print(df_period1_processed)
print(avg_FI_period1)

   Group name    Position Description  Duration (s)  Distance (m)  \
27    Goalies  Goalkeeper    Period 1          1736           278   
32    Defense           D    Period 1          1736          1144   
39    Offense           F    Period 1          1736          1154   
35    Defense           D    Period 1          1736          1570   
30    Defense           D    Period 1          1736          1426   
37    Offense           F    Period 1          1736          1372   
21    Defense           D    Period 1          1736          1286   
29    Offense           F    Period 1          1736          1405   
22    Offense           F    Period 1          1736          1022   
26    Defense           D    Period 1          1736          1706   
41    Offense           F    Period 1          1736          1061   
24    Offense           F    Period 1          1736          1274   
31    Offense           F    Period 1          1736          1598   
33    Defense           F    Perio
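
pandas emits SettingWithCopyWarning when a column is assigned on a DataFrame that was obtained by filtering another one. One common way to avoid it is to take an explicit copy of the filtered rows before adding columns. A minimal sketch with toy data follows; the column values are made up, the min-max scaling is used only for illustration, and whether this removes every warning in this notebook depends on how normalize_column and compute_normalized_fatigue are written.

```python
import pandas as pd

toy_df = pd.DataFrame(
    {
        "Description": ["Period 1", "Period 1", "Period 2"],
        "Mechanical Intensity": [20.0, 40.0, 30.0],
    }
)

# Filter one period and copy, so the result is independent of toy_df
period_df = toy_df[toy_df["Description"].str.contains("Period 1", regex=False)].copy()

# Min-max scale within the period (illustrative), assigning via .loc on the copy
intensity = period_df["Mechanical Intensity"]
period_df.loc[:, "Normalized Mechanical Intensity"] = (intensity - intensity.min()) / (
    intensity.max() - intensity.min()
)

print(period_df)
print("Original columns unchanged:", list(toy_df.columns))
```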





### 5.1 VS LUGANO: FATIGUE INDEX CALCULATION FOR DIFFERENT "PERIODS" AND "OVERTIME" 


This code applies the function above to the vs Lugano data, producing the processed DataFrame and the average fatigue index for each of the three periods and for overtime.

In [79]:
df_period1_processed_lug, avg_FI_period1_lug = process_period_data(
    df_imp_lugano, "Period 1"
)
df_period2_processed_lug, avg_FI_period2_lug = process_period_data(
    df_imp_lugano, "Period 2"
)
df_period3_processed_lug, avg_FI_period3_lug = process_period_data(
    df_imp_lugano, "Period 3"
)
df_overtime_processed_lug, avg_FI_overtime_lug = process_period_data(
    df_imp_lugano, "Overtime"
)

print(avg_FI_period1_lug, avg_FI_period2_lug, avg_FI_period3_lug, avg_FI_overtime_lug)

1.4638269712144625 1.4623009219845824 1.3671997994428162 1.2528037988659964
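
To quantify the trend in the averages printed above, one option is to compute the relative change from one segment to the next. The sketch below does this with pandas' pct_change, using the four printed values rounded to four decimals; the variable names are illustrative.

```python
import pandas as pd

# Average fatigue index per segment (vs Lugano), rounded from the output above
avg_fi_lugano = pd.Series(
    [1.4638, 1.4623, 1.3672, 1.2528],
    index=["Period 1", "Period 2", "Period 3", "Overtime"],
)

# Relative change versus the previous segment, in percent (first entry is NaN)
change_pct = avg_fi_lugano.pct_change() * 100

print(change_pct.round(2))
```

Every entry after the first is negative, since the printed averages decrease from one segment to the next.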





### 5.2 VS ZURICH: FATIGUE INDEX CALCULATION FOR PERIODS 1, 2 AND 3

Similarly, we apply the function to the vs ZSC data to produce the processed DataFrame and the average fatigue index for each of the three periods. No overtime segment is processed for this game.

In [80]:
df_period1_processed_zsc, avg_FI_period1_zsc = process_period_data(
    df_imp_zsc, "Period 1"
)
df_period2_processed_zsc, avg_FI_period2_zsc = process_period_data(
    df_imp_zsc, "Period 2"
)
df_period3_processed_zsc, avg_FI_period3_zsc = process_period_data(
    df_imp_zsc, "Period 3"
)


print(avg_FI_period1_zsc, avg_FI_period2_zsc, avg_FI_period3_zsc)

1.3756061413324456 1.2114185897134728 1.0643050689291411





### 5.3 LINE CHART OF AVERAGE FATIGUE INDEX OF TEAM (CHANGE IN FATIGUE INDEX OVER TIME)

### Understanding Fatigue Index: Average vs Cumulative

- **Average Fatigue Index**:
  
  - Represents the **typical fatigue** in a game period.
  
  - Indicates how intense a specific period was.

  
- **Cumulative Fatigue Index**:
  
  - Reflects the **running total** of fatigue across the game.
  
  - Demonstrates the **building toll** on players as the game progresses.

  
#### Key Takeaway:

- Using the **Average** tells us: *How hard was a specific period?*

- The **Cumulative** illustrates: *How much fatigue have players gathered over the course of the game?*
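
As a small worked example of the distinction, the sketch below takes the three vs ZSC period averages reported earlier (rounded to four decimals) and computes their running total with np.cumsum. The per-period list answers the first question; the running total answers the second.

```python
import numpy as np

# Per-period average fatigue index (vs ZSC), rounded from the earlier output
avg_per_period = [1.3756, 1.2114, 1.0643]

# Running total: each entry adds the current period to everything before it
cumulative = np.cumsum(avg_per_period)

for label, avg, total in zip(["Period 1", "Period 2", "Period 3"], avg_per_period, cumulative):
    print(f"{label}: average = {avg:.4f}, cumulative = {total:.4f}")
```

Note that the averages fall from period to period while the cumulative value can only rise, which is why the two charts tell different stories.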




This code creates an interactive line chart using Plotly to display the average fatigue index for each period of the vs ZSC and vs Lugano games, with overtime included for the vs Lugano game.

In [81]:
# Data
periods = ["Period 1", "Period 2", "Period 3", "Overtime"]
avg_fatigues_zsc = [avg_FI_period1_zsc, avg_FI_period2_zsc, avg_FI_period3_zsc]
avg_fatigues_lugano = [
    avg_FI_period1_lug,
    avg_FI_period2_lug,
    avg_FI_period3_lug,
    avg_FI_overtime_lug,
]

# Create a plot
fig = go.Figure()

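# Add data (the ZSC series has no overtime value, so use only the matching period labels)
fig.add_trace(
    go.Scatter(
        x=periods[: len(avg_fatigues_zsc)],
        y=avg_fatigues_zsc,
        mode="lines+markers",
        name="ZSC",
    )
)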
fig.add_trace(
    go.Scatter(x=periods, y=avg_fatigues_lugano, mode="lines+markers", name="Lugano")
)

# Customize layout
fig.update_layout(
    title="Average Fatigue Index across Periods",
    xaxis_title="Periods",
    yaxis_title="Average Fatigue Index",
    font=dict(family="Arial", size=14),
)

# Show the plot
fig.show()
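
The two series have different lengths because only the vs Lugano game includes overtime. One way to keep each value tied to its period label is to index the values by period and reindex onto the full label list, which marks the missing segment as NaN. A minimal pandas sketch, using rounded values from the outputs above and illustrative variable names:

```python
import pandas as pd

periods = ["Period 1", "Period 2", "Period 3", "Overtime"]

# Rounded averages from the earlier outputs; ZSC has no overtime value
zsc_values = [1.3756, 1.2114, 1.0643]
lugano_values = [1.4638, 1.4623, 1.3672, 1.2528]

# Attach the matching labels, then align both series on the full period list
aligned = pd.DataFrame(
    {
        "ZSC": pd.Series(zsc_values, index=periods[: len(zsc_values)]),
        "Lugano": pd.Series(lugano_values, index=periods),
    }
).reindex(periods)

print(aligned)
```

The explicit reindex also restores the game order of the labels, which is not guaranteed when pandas combines Series with different indexes.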

### 5.4 LINE CHART OF CUMULATIVE FATIGUE INDEX OF TEAM (BUILD-UP OF FATIGUE OVER TIME)

This code plots the running total (cumulative sum) of the per-period average fatigue index for the vs ZSC and vs Lugano games, showing how fatigue builds up as each game progresses.

In [82]:
import numpy as np

# Calculate cumulative sums (not average, but to represent the trend)
cumulative_sum_zsc = np.cumsum(avg_fatigues_zsc)
cumulative_sum_lugano = np.cumsum(avg_fatigues_lugano)

# Create a new plot for cumulative sums
fig_cumulative = go.Figure()

# Add data for ZSC (three periods only, so slice the labels to match)
fig_cumulative.add_trace(
    go.Scatter(
        x=periods[: len(cumulative_sum_zsc)],
        y=cumulative_sum_zsc,
        mode="lines+markers",
        name="ZSC Cumulative Sum",
    )
)

# Add data for Lugano
fig_cumulative.add_trace(
    go.Scatter(
        x=periods,
        y=cumulative_sum_lugano,
        mode="lines+markers",
        name="Lugano Cumulative Sum",
    )
)

# Customize the layout
fig_cumulative.update_layout(
    title="Cumulative Sum of Average Fatigue Index across Periods",
    xaxis_title="Periods",
    yaxis_title="Cumulative Sum of Average Fatigue Index",
    font=dict(family="Arial", size=14),
)

# Show the plot
fig_cumulative.show()

## 6. FUTURE SCOPE AND NEXT STEPS

Looking ahead, we have some ideas to make our tool even better:

- **Frontend Application**: The project can be extended with a Frontend Application to show Fatigue Index and other helpful information.

- **Integrating AI**: We plan to use AI to give coaches suggestions based on the Fatigue Index. This will help make sure players are safe and playing at their best.

- **Feedback from Users**: We want to know what coaches and players think. Their feedback will help us improve.

- **Use in Other Sports**: Right now, we're focusing on ice hockey. But we think our tool can be helpful in other sports too.

We're excited about the next steps and making our tool even better for everyone.

