# 01 - Calculation of Zermelo Strengths

**Objectives**: This notebook calculates the Zermelo strength of each player for each year from _1991_ to _2024_. The strengths from year _Y_ serve as the starting strengths for year _Y+1_, so the model carries a player's level over from one year to the next (provided the player was active in the previous year).
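
The internals of `compute_zermelo_strengths` are not shown in this notebook, so the following is only a minimal sketch of the warm-start idea. It assumes the classic Zermelo (Bradley-Terry) fixed-point update, a hypothetical helper `zermelo_sketch` that takes `(winner, loser)` pairs, a default starting strength of 1.0 for new players, and normalisation to a mean of 1. The real function's signature, defaults and normalisation may differ.

```python
from collections import defaultdict


def zermelo_sketch(matches, initial_strengths=None, n_iter=500, tol=1e-10):
    """Toy Zermelo fit on (winner, loser) pairs; hypothetical helper, not the notebook's function."""
    wins = defaultdict(int)
    games = defaultdict(lambda: defaultdict(int))  # games[i][j]: matches played between i and j
    for winner, loser in matches:
        wins[winner] += 1
        games[winner][loser] += 1
        games[loser][winner] += 1

    players = sorted(games)
    init = initial_strengths or {}
    # Assumption: players absent from the previous year (or carried over at 0) start at 1.0.
    strengths = {p: init.get(p) or 1.0 for p in players}

    for _ in range(n_iter):
        updated = {}
        for p in players:
            # Zermelo update: wins divided by the sum of n_pq / (s_p + s_q) over opponents q
            denom = sum(n / (strengths[p] + strengths[q]) for q, n in games[p].items())
            updated[p] = wins[p] / denom
        # Rescale so that the mean strength is 1 (one possible convention)
        mean = sum(updated.values()) / len(updated)
        updated = {p: s / mean for p, s in updated.items()}
        delta = max(abs(updated[p] - strengths[p]) for p in players)
        strengths = updated
        if delta < tol:
            break
    return strengths


# Made-up matches for two consecutive "years"; player D only appears in the second one.
year_1 = [("A", "B"), ("A", "B"), ("B", "A"), ("B", "C"), ("C", "A"), ("A", "C")]
year_2 = [("B", "A"), ("A", "B"), ("C", "B"), ("B", "C"),
          ("A", "C"), ("C", "A"), ("D", "A"), ("B", "D")]

strengths_y1 = zermelo_sketch(year_1)
# Year 2 starts from the year-1 strengths; D is unknown and starts at the default.
strengths_y2 = zermelo_sketch(year_2, initial_strengths=strengths_y1)
```

In this sketch, when the matches form a connected set and the iteration runs to convergence, the warm start mostly changes how many iterations are needed rather than the final values. Whether the same holds for `compute_zermelo_strengths` depends on details that are not visible here, such as an iteration cap or the handling of players without wins.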

In [114]:
import os
import pandas as pd
from tqdm import tqdm

# importing the function calculating Zermelo strengths from list of matches 
from zermelo import compute_zermelo_strengths

In [115]:
# --- Loading of the full dataset from the .csv file --- 

data_1991_2024_path = "../data/processed/all_matches_1991-2024.csv"

if os.path.exists(data_1991_2024_path):
    print("Loading the data of all matches (ATP, QC and F) between 1991 and 2024...")
    all_matches_1991_2024_data = pd.read_csv(data_1991_2024_path)
    print(f"Done! The content is now available ({len(all_matches_1991_2024_data)} matches).")
else:
    # stop here: every later cell needs the dataset
    raise FileNotFoundError(f"Dataset not found: {data_1991_2024_path}")

Loading the data of all matches (ATP, QC and F) between 1991 and 2024...
Done! The content is now available (808627 matches).


In [116]:
output_path = "../data/processed/zermelo_strengths_1991-2024.csv"

## Iterative Calculation

We iterate over the years: for each one, we isolate that year's matches, apply Zermelo's algorithm to them and store the results. The `last_year_strengths` variable is then updated for the next iteration.

The results are first stored in a dictionary keyed by year, where each value maps a player ID to that player's Zermelo strength. This dictionary is then converted to a DataFrame with one row per year and player.
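
As a small illustration of this storage step, the sketch below flattens a year-keyed dictionary into a long-format DataFrame. The player IDs and strength values are made up for the example and do not come from the dataset.

```python
import pandas as pd

# Hypothetical toy values: {year: {player_id: strength}}
toy_strengths_all_years = {
    1991: {"p1": 1.2, "p2": 0.8},
    1992: {"p1": 1.1, "p2": 0.7, "p3": 1.2},
}

# One row per (year, player) pair
rows = [
    {"year": year, "player_id": player_id, "zermelo_strength": strength}
    for year, players in toy_strengths_all_years.items()
    for player_id, strength in players.items()
]
toy_long_df = pd.DataFrame(rows)
```

The long format keeps years with different sets of players in a single table without introducing missing values.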


In [117]:
# check whether the output file already exists
if os.path.exists(output_path):
    print("No calculation needed...Data loading...")
    zermelo_strenghts_1991_2024_data = pd.read_csv(output_path)
    print("The data has been successfully loaded!")

else: 
    # variable to store the strengths from the previous year
    last_year_strengths = None

    # dictionary to store the strengths for all years
    strengths_all_years = {} 

    for year in range(1991,2025):
        print(f"--- {year} ---")

        # get the matches from the current year
        all_matches_year = all_matches_1991_2024_data[all_matches_1991_2024_data["year"] == year]

        # apply Zermelo's algorithm
        strengths_year = compute_zermelo_strengths(matches_data=all_matches_year, initial_strengths=last_year_strengths)

        # store the results
        strengths_all_years[year] = strengths_year
        # update the last year strengths
        last_year_strengths = strengths_year

    # convert the dictionary to a DataFrame for storing
    print("\n Creation of the DataFrame to store the results...")

    data = []
    for year, players_dict in strengths_all_years.items():
        for player_id, strength in players_dict.items():
            data.append({
                "year": year,
                "player_id": player_id,
                "zermelo_strength": strength
            })

    zermelo_strenghts_1991_2024_data = pd.DataFrame(data)
    zermelo_strenghts_1991_2024_data.to_csv(output_path, index=False)
    print(f"Data successfully saved to {output_path}!")


No calculation needed...Data loading...
The data has been successfully loaded!


In [118]:
display(zermelo_strenghts_1991_2024_data.head(5))

|   | year | player_id | zermelo_strength |
|---|------|-----------|------------------|
| 0 | 1991 | 100113 | 0.768773 |
| 1 | 1991 | 100273 | 0.957198 |
| 2 | 1991 | 100282 | 0.327418 |
| 3 | 1991 | 100284 | 5.093587 |
| 4 | 1991 | 100286 | 0.261372 |
