## Notebook Overview: Scraping the Top 16 Sprinters and Their Career Performances

This notebook collects detailed performance data on elite male 100m sprinters in two phases, starting from the 2024 season top list and extending to each athlete's career results. It is part of a broader project that analyzes and simulates sprint duel outcomes using real-world performance data.

### Phase 1 – Identifying the Top 16 Sprinters of 2024:
- The script first scrapes the World Athletics season top-list page for the men's 100m and keeps the **top 16 performers of the 2024 season**.
- It extracts their **names**, **performance times**, and, most importantly, the **URLs of their athlete profiles**, which the second phase needs.
- This list serves as the input for the second phase.
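
The mapping from table cells to fields can be sketched on its own, without a browser. The following is a minimal sketch that assumes each row has already been reduced to a list of cell strings; `TOPLIST_COLUMNS` and `parse_toplist_row` are hypothetical names, and the indices simply repeat the column positions that the Phase 1 code reads, so they would need adjusting if the page layout changes. The profile URL is not covered here because it comes from a link attribute rather than from cell text.

```python
# Column index -> field name, repeating the positions read by the Phase 1 code.
# Index 7 is not read by the notebook, so it is left out here as well.
TOPLIST_COLUMNS = {
    0: "Rank", 1: "Mark", 2: "Wind", 3: "Name", 4: "DOB",
    5: "Nation", 6: "Position", 8: "Venue", 9: "Date", 10: "Result Score",
}


def parse_toplist_row(cells):
    """Map the cell strings of one top-list row to a record dict (hypothetical helper)."""
    needed = max(TOPLIST_COLUMNS) + 1
    if len(cells) < needed:
        raise ValueError(f"expected at least {needed} cells, got {len(cells)}")
    return {name: cells[idx].strip() for idx, name in TOPLIST_COLUMNS.items()}


# Values taken from the first row of the top-list output shown in this notebook.
sample_cells = [
    "1", "9.77", "0.9", "Kishane THOMPSON", "17 JUL 2001", "JAM", "1", "",
    "National Stadium, Kingston (JAM)", "28 JUN 2024", "1287",
]
record = parse_toplist_row(sample_cells)
```

Keeping the positions in one dictionary makes a layout change a one-line fix, and the length check turns a silently shifted column into an explicit error.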

### Phase 2 – Scraping Career Race Data:
- Using the list of sprinters gathered in Phase 1, the notebook navigates to each athlete's World Athletics profile.
- It scrapes the **100m race results** listed on the profile, year by year, including:
  - Date and competition (with host country)
  - Performance time
  - Wind reading
  - Competition category and race round (final, semi-final, heat), when available
  - Place and result score, when available
- The scraper drives the website with **Selenium** and handles dynamic content by clicking each result row to expand its detail row.
- The results for each athlete are concatenated into a single pandas DataFrame.

### Output:
- The complete dataset contains one structured row per 100m race that the scraper retrieved from the profiles of the top 16 athletes identified in 2024.
- This dataset is intended for use in downstream tasks like predictive modeling, duel simulations, and performance progression analysis.

In [12]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
import time
import pandas as pd
import re
from datetime import datetime
import os

In [2]:
# Initialize Selenium and open the 2024 men's 100m top list
driver = webdriver.Chrome()
driver.get('https://worldathletics.org/records/toplists/sprints/100-metres/all/men/senior/2024?regionType=world&timing=electronic&windReading=regular&page=1&bestResultsOnly=true&maxResultsByCountry=all&eventId=10229630&ageCategory=senior')

# Wait for the table to load
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, 'records-table'))
)

# Grab the table rows (first 16)
rows = driver.find_elements(By.CSS_SELECTOR, "table.records-table tbody tr")[:16]

data = []

for row in rows:
    cols = row.find_elements(By.TAG_NAME, "td")
    
    rank = cols[0].text.strip()
    mark = cols[1].text.strip()
    wind = cols[2].text.strip()
    name_element = cols[3].find_element(By.TAG_NAME, 'a')
    name = name_element.text.strip()
    profile_url = name_element.get_attribute('href')
    dob = cols[4].text.strip()
    nation = cols[5].text.strip()
    position = cols[6].text.strip()
    # cols[7] is skipped: it is not read by this notebook
    venue = cols[8].text.strip()
    date = cols[9].text.strip()
    result_score = cols[10].text.strip()

    data.append({
        "Rank": rank,
        "Mark": mark,
        "Wind": wind,
        "Name": name,
        "Profile URL": profile_url,
        "DOB": dob,
        "Nation": nation,
        "Position": position,
        "Venue": venue,
        "Date": date,
        "Result Score": result_score
    })

driver.quit()

df = pd.DataFrame(data)

In [5]:
df.head(16)

Unnamed: 0,Rank,Mark,Wind,Name,Profile URL,DOB,Nation,Position,Venue,Date,Result Score
0,1,9.77,0.9,Kishane THOMPSON,https://worldathletics.org/athletes/athlete=14...,17 JUL 2001,JAM,1,"National Stadium, Kingston (JAM)",28 JUN 2024,1287
1,2,9.79,1.5,Ferdinand OMANYALA,https://worldathletics.org/athletes/athlete=14...,02 JAN 1996,KEN,1,"Nyayo National Stadium, Nairobi (KEN)",15 JUN 2024,1280
2,2,9.79,1.0,Noah LYLES,https://worldathletics.org/athletes/athlete=14...,18 JUL 1997,USA,1,"Stade de France, Paris (FRA)",04 AUG 2024,1280
3,4,9.81,1.0,Fred KERLEY,https://worldathletics.org/athletes/athlete=14...,07 MAY 1995,USA,3,"Stade de France, Paris (FRA)",04 AUG 2024,1273
4,4,9.81,0.7,Oblique SEVILLE,https://worldathletics.org/athletes/athlete=14...,16 MAR 2001,JAM,1sf1,"Stade de France, Paris (FRA)",04 AUG 2024,1273
5,6,9.82,1.0,Akani SIMBINE,https://worldathletics.org/athletes/athlete=14...,21 SEP 1993,RSA,4,"Stade de France, Paris (FRA)",04 AUG 2024,1269
6,7,9.85,1.0,Lamont Marcell JACOBS,https://worldathletics.org/athletes/athlete=14...,26 SEP 1994,ITA,5,"Stade de France, Paris (FRA)",04 AUG 2024,1259
7,8,9.86,1.5,Christian COLEMAN,https://worldathletics.org/athletes/athlete=14...,06 MAR 1996,USA,1sf3,"Hayward Field, Eugene, OR (USA)",23 JUN 2024,1255
8,8,9.86,1.9,Benjamin RICHARDSON,https://worldathletics.org/athletes/athlete=14...,19 DEC 2003,RSA,1f1,"Stade de La Charrière, La Chaux-de-Fonds (SUI)",14 JUL 2024,1255
9,8,9.86,1.0,Letsile TEBOGO,https://worldathletics.org/athletes/athlete=14...,07 JUN 2003,BOT,6,"Stade de France, Paris (FRA)",04 AUG 2024,1255


In [9]:
athletes_url = df["Profile URL"].to_list()
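
Profile links do not all share one shape: this run contains both `.../athletes/athlete=14738009` and `.../athletes/united-states/christian-miller-15033045`. A minimal sketch for pulling out the trailing numeric ID is shown here, in case a shorter key than the full URL is wanted. `athlete_id_from_url` is a hypothetical helper, and treating the trailing digits as a stable athlete identifier is an assumption, not something the notebook verifies.

```python
import re

# Trailing run of digits, optionally followed by a slash. This covers both
# URL shapes seen in this run; other shapes are not handled (assumption).
_TRAILING_ID = re.compile(r"(\d+)/?$")


def athlete_id_from_url(profile_url):
    """Return the trailing numeric ID of a profile URL as a string, or None."""
    match = _TRAILING_ID.search(profile_url.strip())
    return match.group(1) if match else None


example_ids = [
    athlete_id_from_url("https://worldathletics.org/athletes/athlete=14738009"),
    athlete_id_from_url(
        "https://worldathletics.org/athletes/united-states/christian-miller-15033045"
    ),
]
```

The ID is kept as a string so that it can be used as a join key without any numeric conversion.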

In [13]:
def scrape_100m_by_year(profile_url):
    """Scrape the 100m results on one athlete profile, year by year, into a DataFrame."""
    options = Options()
    options.add_argument("--headless")
    options.add_argument("--start-maximized")
    driver = webdriver.Chrome(options=options)
    wait = WebDriverWait(driver, 20)

    driver.get(profile_url)

    # Open the STATISTICS tab, then the Results view inside it
    wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(text(), 'STATISTICS')]"))).click()
    time.sleep(1)
    wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(text(), 'Results')]"))).click()
    time.sleep(2)

    all_data = []

    # Text shown in the second dropdown (the year selector) before any year is chosen
    default_year = driver.find_elements(By.CSS_SELECTOR, "div.athletesSelectInput__control")[1].text.strip()
    print(f"📍 Year displayed by default: {default_year}")

    def extract_rows(current_year):
        try:
            main_rows = driver.find_elements(By.CSS_SELECTOR, "tbody.profileStatistics_tableBody__1w5O9 tr[role='button']")
            print(f"🔁 Scraping ({current_year}): {len(main_rows)} main rows")

            for row in main_rows:
                try:
                    cols = row.find_elements(By.TAG_NAME, "td")
                    if len(cols) >= 4:
                        discipline = cols[0].text.strip()
                        if discipline.lower() == "100 metres":
                            mark = cols[1].text.strip()
                            date = cols[2].text.strip()
                            competition = cols[3].text.strip()

                            try:
                                # Clicking the row expands its details into the next sibling <tr>
                                driver.execute_script("arguments[0].scrollIntoView(true);", row)
                                row.click()
                                time.sleep(0.6)

                                detail_row = row.find_element(By.XPATH, "./following-sibling::tr[1]")
                                detail_cell = detail_row.find_element(By.TAG_NAME, "td")
                                # The detail text alternates label and value lines, so pair them up
                                lines = detail_cell.text.strip().splitlines()
                                info_dict = dict(zip(lines[::2], lines[1::2]))

                                country = info_dict.get("Country")
                                resultscore = info_dict.get("Resultscore")
                                wind = info_dict.get("Wind")
                                category = info_dict.get("Category")
                                race = info_dict.get("Race")
                                place = info_dict.get("Place")

                                all_data.append([
                                    discipline, mark, date, competition, current_year,
                                    country, resultscore, wind, category, race, place
                                ])
                            except Exception as e:
                                # Keep the summary row even when the detail row cannot be read
                                print(f"⚠️ Details not found: {e}")
                                all_data.append([
                                    discipline, mark, date, competition, current_year,
                                    None, None, None, None, None, None
                                ])
                except Exception as row_error:
                    print(f"⚠️ Row skipped ({current_year}): {row_error}")
        except Exception as outer_e:
            print(f"❌ Failed to read rows for {current_year}: {outer_e}")

    extract_rows(default_year)

    try:
        all_menus = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div.athletesSelectInput__control")))
        year_menu = all_menus[1]
        year_menu.click()
        time.sleep(1)

        year_options = wait.until(EC.presence_of_all_elements_located(
            (By.CSS_SELECTOR, "div.athletesSelectInput__menu div.athletesSelectInput__option")))
        years = [opt.text.strip() for opt in year_options if opt.text.strip().isdigit() and opt.text.strip() != default_year]
        print(f"📅 Available years: {years}")
    except Exception as e:
        print(f"⚠️ Error loading years: {e}")
        years = []

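    for year in years:
        print(f"\n➡️ Processing year: {year}")
        try:
            # Open the year menu only if it is not already open: it may still be open
            # from listing the years above, and clicking an open menu would close it.
            if not driver.find_elements(By.CSS_SELECTOR, "div.athletesSelectInput__menu"):
                all_menus = driver.find_elements(By.CSS_SELECTOR, "div.athletesSelectInput__control")
                year_menu = all_menus[1]
                year_menu.click()
                time.sleep(1)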

            year_option = wait.until(EC.element_to_be_clickable(
                (By.XPATH, f"//div[contains(@class, 'athletesSelectInput__option') and text()='{year}']")))
            driver.execute_script("arguments[0].click();", year_option)
            time.sleep(2.5)

            extract_rows(year)

        except Exception as e:
            print(f"⚠️ Error for year {year}: {e}")
            continue

    driver.quit()

    df = pd.DataFrame(all_data, columns=[
        "Discipline", "Mark", "Date", "Competition", "Year",
        "Country", "ResultScore", "Wind", "Category", "Race", "Place"
    ])

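    # Parse the date strings; values that cannot be parsed become NaT
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
    # Prefer the year of the parsed date and fall back to the dropdown label otherwise
    df["Year"] = df["Date"].dt.year.fillna(df["Year"])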
    return df
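
The function reads the expanded detail cell as alternating label and value lines. The pairing step can be tried in isolation with the minimal sketch below. `parse_detail_text` is a hypothetical helper, and the sample text is only an illustration assembled from the labels the function looks up and the values of the first row of `df_all.head()`, not text captured from the site.

```python
def parse_detail_text(text):
    """Pair alternating label/value lines into a dict (hypothetical helper).

    Blank lines are dropped first; a trailing label without a value maps to None.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    labels, values = lines[::2], lines[1::2]
    # Pad the values so an unpaired trailing label is kept rather than lost.
    values += [None] * (len(labels) - len(values))
    return dict(zip(labels, values))


# Illustrative text only: labels as looked up by the scraper, values from df_all.
sample_text = (
    "Country\nFRA\nResultscore\n1276\nWind\n0.5\n"
    "Category\nOW\nRace\nSF3\nPlace\n1"
)
details = parse_detail_text(sample_text)
```

This pairing relies on every label having a non-empty value line. If the page rendered an empty value as a blank line, the labels and values would shift by one, so matching on known label names would be the safer choice in that case.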

In [16]:
# 🔁 Scrape every athlete in the list
frames = []

for i, athlete in enumerate(athletes_url):
    print(f"\n🏃‍♂️ [{i+1}/{len(athletes_url)}] Scraping : {athlete}")

    try:
        df_athlete = scrape_100m_by_year(athlete)
        df_athlete["AthleteURL"] = athlete
        frames.append(df_athlete)
    except Exception as e:
        print(f"❌ Error for {athlete}: {e}")
        continue

# Concatenate once at the end instead of growing df_all inside the loop
df_all = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

# ✅ Final summary
print(f"\n✅ Scraping finished for {len(athletes_url)} athletes, {len(df_all)} rows collected.")


🏃‍♂️ [1/16] Scraping : https://worldathletics.org/athletes/athlete=14738009
📍 Year displayed by default: Select...
🔁 Scraping (Select...): 13 main rows
📅 Available years: ['2025', '2024', '2023', '2022', '2021', '2020', '2019', '2018', '2017', '2016']

➡️ Processing year: 2025
⚠️ Error for year 2025: Message:
Stacktrace:
	GetHandleVerifier [0x00007FF7A9725355+78597]
	GetHandleVerifier [0x00007FF7A97253B0+78688]
	(No symbol) [0x00007FF7A94D91AA]
	(No symbol) [0x00007FF7A952F149]
	(No symbol) [0x00007FF7A952F3FC]
	(No symbol) [0x00007FF7A9582467]
	(No symbol) [0x00007FF7A955712F]
	(No symbol) [0x00007FF7A957F2BB]
	(No symbol) [0x00007FF7A9556EC3]
	(No symbol) [0x00007FF7A95203F8]
	(No symbol) [0x00007FF7A9521163]
	GetHandleVerifier [0x00007FF7A99CEF0D+2870973]
	GetHandleVerifier [0x00007FF7A99C96B8+2848360]
	GetHandleVerifier [0x00007FF7A99E6993+2967875]
	GetHandleVerifier [0x00007FF7A974019A+188746]
	GetHandleVerifier [0x00007FF7A974847F+222255]
	GetHandleV

  df["Date"] = pd.to_datetime(df["Date"], errors="coerce")



🏃‍♂️ [2/16] Scraping : https://worldathletics.org/athletes/athlete=14747153
📍 Year displayed by default: Select...
🔁 Scraping (Select...): 15 main rows
📅 Available years: ['2025', '2024', '2023', '2022', '2021', '2020', '2019', '2017', '2016']

➡️ Processing year: 2025
⚠️ Error for year 2025: Message:



🏃‍♂️ [3/16] Scraping : https://worldathletics.org/athletes/athlete=14536762
📍 Year displayed by default: Select...
🔁 Scraping (Select...): 14 main rows
📅 Available years: ['2025', '2024', '2023', '2022', '2021', '2020', '2019', '2018', '2017', '2016', '2015', '2014', '2013', '2012']

➡️ Processing year: 2025
⚠️ Error for year 2025: Message:



🏃‍♂️ [4/16] Scraping : https://worldathletics.org/athletes/athlete=14504382
📍 Year displayed by default: Select...
🔁 Scraping (Select...): 14 main rows
📅 Available years: ['2025', '2024', '2023', '2022', '2021', '2020', '2019', '2018', '2017', '2016', '2015', '2014', '2013', '2012', '2011']

➡️ Processing year: 2025
⚠️ Error for year 2025: Message:



🏃‍♂️ [5/16] Scraping : https://worldathletics.org/athletes/athlete=14737998
📍 Year displayed by default: Select...
🔁 Scraping (Select...): 14 main rows
📅 Available years: ['2025', '2024', '2023', '2022', '2021', '2020', '2019', '2018', '2017', '2016']

➡️ Processing year: 2025
⚠️ Error for year 2025: Message:



🏃‍♂️ [6/16] Scraping : https://worldathletics.org/athletes/athlete=14417763
📍 Year displayed by default: Select...
🔁 Scraping (Select...): 17 main rows
📅 Available years: ['2025', '2024', '2023', '2022', '2021', '2020', '2019', '2018', '2017', '2016', '2015', '2014', '2013', '2012', '2011', '2010']

➡️ Processing year: 2025
⚠️ Error for year 2025: Message:



🏃‍♂️ [7/16] Scraping : https://worldathletics.org/athletes/athlete=14453864
📍 Year displayed by default: Select...
🔁 Scraping (Select...): 12 main rows
📅 Available years: ['2025', '2024', '2023', '2022', '2021', '2020', '2019', '2018', '2017', '2016', '2015', '2014', '2013', '2012', '2011']

➡️ Processing year: 2025
⚠️ Error for year 2025: Message:



🏃‍♂️ [8/16] Scraping : https://worldathletics.org/athletes/athlete=14541956
📍 Year displayed by default: Select...
🔁 Scraping (Select...): 11 main rows
📅 Available years: ['2025', '2024', '2023', '2022', '2020', '2019', '2018', '2017', '2016', '2015', '2014', '2013', '2012']

➡️ Processing year: 2025
⚠️ Error for year 2025: Message:



🏃‍♂️ [9/16] Scraping : https://worldathletics.org/athletes/athlete=14888283
📍 Year displayed by default: Select...
🔁 Scraping (Select...): 15 main rows
📅 Available years: ['2025', '2024', '2023', '2022', '2021', '2020', '2019']

➡️ Processing year: 2025
⚠️ Error for year 2025: Message:



🏃‍♂️ [10/16] Scraping : https://worldathletics.org/athletes/athlete=14883897
📍 Year displayed by default: Select...
🔁 Scraping (Select...): 16 main rows
📅 Available years: ['2025', '2024', '2023', '2022', '2021', '2020', '2019']

➡️ Processing year: 2025
⚠️ Error for year 2025: Message:



🏃‍♂️ [11/16] Scraping : https://worldathletics.org/athletes/athlete=14715244
📍 Year displayed by default: Select...
🔁 Scraping (Select...): 12 main rows
📅 Available years: ['2025', '2024', '2023', '2022', '2021', '2020', '2019', '2018', '2017', '2016', '2015']

➡️ Processing year: 2025
⚠️ Error for year 2025: Message:



🏃‍♂️ [12/16] Scraping : https://worldathletics.org/athletes/athlete=14638971
📍 Year displayed by default: Select...
🔁 Scraping (Select...): 18 main rows
📅 Available years: ['2025', '2024', '2023', '2022', '2021', '2020', '2019', '2018', '2017', '2015', '2014']

➡️ Processing year: 2025
⚠️ Error for year 2025: Message:



🏃‍♂️ [13/16] Scraping : https://worldathletics.org/athletes/athlete=14835237
📍 Year displayed by default: Select...
🔁 Scraping (Select...): 15 main rows
📅 Available years: ['2025', '2024', '2023', '2022', '2021', '2020', '2019', '2018']

➡️ Processing year: 2025
⚠️ Error for year 2025: Message:



🏃‍♂️ [14/16] Scraping : https://worldathletics.org/athletes/athlete=14375111
📍 Year displayed by default: Select...
🔁 Scraping (Select...): 41 main rows
📅 Available years: ['2024', '2023', '2022', '2021', '2019', '2018', '2017', '2016', '2015', '2014', '2013', '2012', '2011', '2010', '2009', '2008']

➡️ Processing year: 2024
⚠️ Error for year 2024: Message:



🏃‍♂️ [15/16] Scraping : https://worldathletics.org/athletes/united-states/christian-miller-15033045
📍 Year displayed by default: Select...
🔁 Scraping (Select...): 22 main rows
📅 Available years: ['2024', '2023', '2022']

➡️ Processing year: 2024
⚠️ Error for year 2024: Message:



🏃‍♂️ [16/16] Scraping : https://worldathletics.org/athletes/athlete=14465376
📍 Year displayed by default: Select...
🔁 Scraping (Select...): 12 main rows
📅 Available years: ['2025', '2024', '2023', '2022', '2021', '2020', '2019', '2018', '2017', '2016', '2015', '2014', '2013', '2012', '2011', '2010']

➡️ Processing year: 2025
⚠️ Error for year 2025: Message:


In [17]:
df_all.head()

Unnamed: 0,Discipline,Mark,Date,Competition,Year,Country,ResultScore,Wind,Category,Race,Place,AthleteURL
0,100 Metres,9.8,2024-08-04,"The XXXIII Olympic Games, Stade de France, Paris",2024,FRA,1276,0.5,OW,SF3,1,https://worldathletics.org/athletes/athlete=14...
1,100 Metres,9.79,2024-08-04,"The XXXIII Olympic Games, Stade de France, Paris",2024,FRA,1280,1.0,OW,F,2,https://worldathletics.org/athletes/athlete=14...
2,100 Metres,10.0,2024-08-03,"The XXXIII Olympic Games, Stade de France, Paris",2024,FRA,1206,0.6,OW,H1,1,https://worldathletics.org/athletes/athlete=14...
3,100 Metres,9.91,2024-07-09,Gyulai István Memorial - Hungarian Athletics G...,2024,HUN,1241,-0.6,A,F,1,https://worldathletics.org/athletes/athlete=14...
4,100 Metres,9.84,2024-06-28,"Jamaican Championships, National Stadium, King...",2024,JAM,1262,0.6,B,SF1,1,https://worldathletics.org/athletes/athlete=14...


In [18]:
# Prepare the athlete info DataFrame (df)
df = df.rename(columns={"Profile URL": "AthleteURL"})
df_infos = df[["AthleteURL", "Name", "DOB", "Nation"]]

# Merge with df_all
df_merged = df_all.merge(df_infos, on="AthleteURL", how="left")

df_merged.head()

Unnamed: 0,Discipline,Mark,Date,Competition,Year,Country,ResultScore,Wind,Category,Race,Place,AthleteURL,Name,DOB,Nation
0,100 Metres,9.8,2024-08-04,"The XXXIII Olympic Games, Stade de France, Paris",2024,FRA,1276,0.5,OW,SF3,1,https://worldathletics.org/athletes/athlete=14...,Kishane THOMPSON,17 JUL 2001,JAM
1,100 Metres,9.79,2024-08-04,"The XXXIII Olympic Games, Stade de France, Paris",2024,FRA,1280,1.0,OW,F,2,https://worldathletics.org/athletes/athlete=14...,Kishane THOMPSON,17 JUL 2001,JAM
2,100 Metres,10.0,2024-08-03,"The XXXIII Olympic Games, Stade de France, Paris",2024,FRA,1206,0.6,OW,H1,1,https://worldathletics.org/athletes/athlete=14...,Kishane THOMPSON,17 JUL 2001,JAM
3,100 Metres,9.91,2024-07-09,Gyulai István Memorial - Hungarian Athletics G...,2024,HUN,1241,-0.6,A,F,1,https://worldathletics.org/athletes/athlete=14...,Kishane THOMPSON,17 JUL 2001,JAM
4,100 Metres,9.84,2024-06-28,"Jamaican Championships, National Stadium, King...",2024,JAM,1262,0.6,B,SF1,1,https://worldathletics.org/athletes/athlete=14...,Kishane THOMPSON,17 JUL 2001,JAM
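
The merged table carries `DOB` as text such as `17 JUL 2001` next to ISO-style race dates, which is enough to derive an athlete's age on race day for progression analysis. The following is a minimal sketch using only the standard library. `parse_dob` and `age_on` are hypothetical helpers, and parsing the month abbreviation assumes an English (or default C) locale.

```python
from datetime import date, datetime


def parse_dob(dob_text):
    """Parse a date of birth such as '17 JUL 2001' (day, month abbreviation, year)."""
    return datetime.strptime(dob_text.strip(), "%d %b %Y").date()


def age_on(dob, race_day):
    """Return the number of completed years of age on race_day."""
    had_birthday = (race_day.month, race_day.day) >= (dob.month, dob.day)
    return race_day.year - dob.year - (0 if had_birthday else 1)


# Values taken from the first athlete in the merged output above.
dob = parse_dob("17 JUL 2001")
age_at_olympic_final = age_on(dob, date.fromisoformat("2024-08-04"))
age_at_national_semi = age_on(dob, date.fromisoformat("2024-06-28"))
```

Comparing `(month, day)` tuples avoids the off-by-one error that a plain year subtraction gives for races run before the athlete's birthday in that year.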


In [19]:
print("Current working directory:", os.getcwd())

Current working directory: C:\Users\antoi\Sprinters_War\Test_Prevision_2025\notebooks


In [20]:
df_merged.to_csv("../data/top16_race.csv", index=False)
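
`DataFrame.to_csv` does not create missing directories, so the export fails if `../data` does not exist yet. A minimal sketch of a guard is shown below. `ensure_parent_dir` is a hypothetical helper, and the demonstration runs in a temporary directory so that it writes nothing into the project.

```python
import tempfile
from pathlib import Path


def ensure_parent_dir(csv_path):
    """Create the parent directory of csv_path if needed and return it as a Path."""
    path = Path(csv_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


# Demonstration in a throwaway directory; no CSV file is written here.
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_parent_dir(Path(tmp) / "data" / "top16_race.csv")
    parent_created = target.parent.is_dir()
```

In the notebook this could be used as `df_merged.to_csv(ensure_parent_dir("../data/top16_race.csv"), index=False)`, since `to_csv` accepts a path object.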