This repository contains the R scripts and dataset required to reproduce the analyses presented in our manuscript evaluating the Women's Tennis Association (WTA) Protected Ranking (PR) mechanism.
The Protected Ranking mechanism is a regulatory tool facilitating the reintegration of elite athletes following prolonged absences, such as injury or maternity leave. This project analyzes 21,779 Grand Slam matches (2000–2024) to evaluate the epidemiological incidence of PR usage and assess its competitive validity using Bradley-Terry models and natural spline-based logistic regression.
Data/: Contains the reconstructed and cross-validated dataset of Grand Slam matches (Protected Ranking Entries.xlsx, datos_combinados.rds and datos_combinados.RData). The first file is used to correct missing PR entry data; the second is the database used for the analysis.
Study/: Contains the R code used for the whole analysis:
WTA_Analysis.Rmd, which contains all the code used for the analysis together with some comments on it. These comments were only a draft for the preliminary writing of the final paper and may contain mistakes. The main code covers:
- Reconstruction of the database due to missing data.
- Descriptive statistics and epidemiological calculations (Incidence Proportion, CIR, RD).
- Bradley-Terry modeling for entry category comparisons.
- Linear and non-linear (cubic spline) logistic regression models.
- Bootstrapped stepwise selection for predictors of PR victories.
- Analysis of withdrawals among female athletes.
WTA_Analysis.html, which is the HTML version of WTA_Analysis.Rmd.
matrix_sel.RData: Pre-calculated bootstrap selection matrix to reduce compilation time.
libraries.R: All the libraries used.
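As a rough illustration of the epidemiological calculations listed above, the sketch below computes an incidence proportion per group, their ratio and their difference in base R. It reads CIR as cumulative incidence ratio and RD as risk difference, which is an assumption because this README only gives the abbreviations. The counts and the group labels `a` and `b` are hypothetical and are not taken from the dataset or the manuscript.

```r
# Hypothetical counts, for illustration only (not results from the study).
events_a <- 12    # events observed in group a
n_a      <- 400   # individuals at risk in group a
events_b <- 10    # events observed in group b
n_b      <- 1000  # individuals at risk in group b

# Incidence proportion: events divided by the number at risk.
ip_a <- events_a / n_a
ip_b <- events_b / n_b

# Cumulative incidence ratio (assumed meaning of CIR) and
# risk difference (assumed meaning of RD).
cir <- ip_a / ip_b
rd  <- ip_a - ip_b

# Approximate 95% confidence interval for the ratio on the log scale,
# using the usual large-sample standard error of a log risk ratio.
se_log_cir <- sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
cir_ci <- exp(log(cir) + c(-1, 1) * qnorm(0.975) * se_log_cir)

print(c(IP_a = ip_a, IP_b = ip_b, CIR = cir, RD = rd))
print(cir_ci)
```

The actual definitions of the groups and the interval methods used in the paper are in WTA_Analysis.Rmd; this block only shows the arithmetic behind the three measures.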
The analysis was conducted in R (version 4.4.2). To run the scripts, ensure you have the following packages installed:
binom, BradleyTerry2, broom, car, caret, compareGroups, corrplot, data.table, DHARMa, dplyr, effects, emmeans, fmsb, forcats, ggplot2, knitr, lme4, MASS, mgcv, mgcViz, party, pheatmap, readxl, reshape2, splines, stringr, tidyr.
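One possible way to check which of these packages are missing before running the analysis is sketched below. The vector name `required_pkgs` is hypothetical, and the commented-out installation line assumes the packages can be installed from CRAN. `splines` ships with R itself, so it should never appear as missing.

```r
# Packages listed in this README.
required_pkgs <- c(
  "binom", "BradleyTerry2", "broom", "car", "caret", "compareGroups",
  "corrplot", "data.table", "DHARMa", "dplyr", "effects", "emmeans",
  "fmsb", "forcats", "ggplot2", "knitr", "lme4", "MASS", "mgcv",
  "mgcViz", "party", "pheatmap", "readxl", "reshape2", "splines",
  "stringr", "tidyr"
)

# Compare against the packages already present in the local library.
missing_pkgs <- setdiff(required_pkgs, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install them (assumes availability on CRAN):
  # install.packages(missing_pkgs)
} else {
  message("All required packages are installed.")
}
```

Once everything is installed, libraries.R can be sourced to load the packages in one step.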
To reproduce the analysis, clone this repository and open the primary script, WTA_Analysis.Rmd in Study/, in RStudio. Ensure that your working directory is set to the root of the cloned repository so that relative file paths (e.g., loading the Excel file) resolve correctly.
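A minimal sketch of how the data files might be loaded with paths relative to the repository root follows. The folder and file names come from this README; the object names `datos` and `pr_entries` are hypothetical, and the exact loading code used in WTA_Analysis.Rmd may differ.

```r
# Paths relative to the repository root (names taken from this README).
rds_path  <- file.path("Data", "datos_combinados.rds")
xlsx_path <- file.path("Data", "Protected Ranking Entries.xlsx")

# Load the analysis database if the working directory is set correctly.
if (file.exists(rds_path)) {
  datos <- readRDS(rds_path)  # 'datos' is a hypothetical object name
  str(datos, max.level = 1)
} else {
  message("File not found: ", rds_path,
          " (current working directory: ", getwd(), ")")
}

# Load the PR entries spreadsheet, only if readxl is available.
if (file.exists(xlsx_path) && requireNamespace("readxl", quietly = TRUE)) {
  pr_entries <- readxl::read_excel(xlsx_path)  # hypothetical object name
}
```

If the "File not found" message appears, the working directory is most likely not the repository root; `getwd()` shows where R is currently looking.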
(Note: Citation details will be updated upon publication).