# F1 Circuits Information

This notebook focuses on extracting the available information about F1 circuits, both current and historical.<br>
The starting point is the Wikipedia list of Formula One circuits. The work also relies partly on the data available through the `fastf1` package, and on the `geopy` library to extract location-specific information (latitude and longitude).

## Importing packages

In [63]:
import pandas as pd
import datetime as dt
import string
import numpy as np
import hashlib
import os
from geopy.geocoders import Nominatim
from geopy.exc import GeocoderTimedOut
import fastf1

## Retrieving historical F1 circuit information

Retrieving circuit information from the Wikipedia page *List of Formula One circuits*.<br>
Selecting only the table that lists the circuits.
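Selecting `tables[2]` relies on the position of the table in the page, which can shift whenever the Wikipedia article is edited. A minimal sketch, assuming the circuits table is the first one whose header contains both `Circuit` and `Season(s)`, selects it by its columns instead; the helper `find_table` is hypothetical and the toy tables only stand in for the output of `pd.read_html`:

```python
import pandas as pd


def find_table(tables, required_columns):
    """Return the first DataFrame whose columns include all `required_columns`.

    Hypothetical helper: it assumes the target table can be recognised by
    its header, which is more robust than a fixed position in the list.
    """
    for table in tables:
        if set(required_columns).issubset(map(str, table.columns)):
            return table
    raise ValueError(f"No table contains the columns {required_columns}")


# Toy tables standing in for the list returned by pd.read_html(url)
tables = [
    pd.DataFrame({"Formula One": ["Current season"]}),
    pd.DataFrame({0: ["*"], 1: ["Current circuits"]}),
    pd.DataFrame({"Circuit": ["Albert Park Circuit *"], "Season(s)": ["2024"]}),
]
circuits = find_table(tables, ["Circuit", "Season(s)"])
print(circuits.shape)  # (1, 2)
```

If the page layout changes so that no table matches, the helper fails loudly with a `ValueError` instead of silently picking the wrong table.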

In [64]:
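url_circuits="https://en.wikipedia.org/wiki/List_of_Formula_One_circuits"
# Local working directory where the Excel outputs are written; adjust it to your machine
os.chdir(r'C:\Users\Lenovo\Documents\PERSONALI\F1_CHALLENGE')

def read_tables(url):
    """Return every HTML table found at `url` as a list of DataFrames."""
    tables=pd.read_html(url)
    return tables

tables=read_tables(url_circuits)
for i, table in enumerate(tables):
    print(f'The table: {i}, contains:\n',table.head(10))

# Table 2 is the list of circuits (see the output below); the index may change if the page is edited
f1_circuits_table=tables[2]
# print(f1_circuits_table)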

The table: 0, contains:
                                          Formula One
0  Current season 2024 Formula One World Champion...
1  Related articles History of Formula One Formul...
2  Lists Drivers (GP winnersSprint winnersPolesit...
3   Records Drivers Constructors Engines Tyres Races
4  Organisations FIA FIA World Motor Sport Counci...
5                                                vte
The table: 1, contains:
    0                                       1
0  *  Current circuits (for the 2024 season)
The table: 2, contains:
                                           Circuit  Map            Type  \
0                         Adelaide Street Circuit  NaN  Street circuit   
1                                Ain-Diab Circuit  NaN    Road circuit   
2                    Aintree Motor Racing Circuit  NaN    Road circuit   
3                           Albert Park Circuit *  NaN  Street circuit   
4                   Algarve International Circuit  NaN    Race circuit   
5                   

## Cleaning circuits table

Adding the columns:
* `InCalendar`: flags the circuits that are in the calendar of the current year.
* `CleanCircuit`: a cleaned version of the circuit's name (for example, without the trailing `*`).
<br>

The column `Map` is dropped.

In [65]:
#Dropping column 'Map'
f1_circuits_table.drop(columns=['Map'], inplace=True)

# print(f1_circuits_table.dtypes)
# print(f1_circuits_table['Season(s)'][1][-4:]=='1958')

current_year=dt.datetime.now().year
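# All punctuation except the hyphen, which is part of names such as 'Circuit de Spa-Francorchamps'
punctuations=string.punctuation.replace('-', '')

f1_circuits_table['InCalendar']= None
f1_circuits_table['CleanCircuit']= None

# Circuits in this year's calendar are marked by a final '*';
# otherwise the year at the end of the 'Season(s)' string is compared with the current year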
for index, row in f1_circuits_table.iterrows():
    clean_name = ''.join(char for char in row['Circuit'] if char not in punctuations).strip()

    if row['Circuit'][-1] == '*':
        f1_circuits_table.at[index, 'InCalendar'] = True
    else:
        f1_circuits_table.at[index, 'InCalendar'] = False
        if not row['Season(s)'][-3:].isdigit() and row['Season(s)'][-7:-3] == str(current_year):
            f1_circuits_table.at[index, 'InCalendar'] = True
        elif row['Season(s)'][-4:] == str(current_year):
            f1_circuits_table.at[index, 'InCalendar'] = True

    f1_circuits_table.at[index, 'CleanCircuit'] = clean_name
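Note that `string.punctuation` contains the hyphen, so stripping all of it would turn `Circuit de Barcelona-Catalunya *` into `Circuit de BarcelonaCatalunya`, which would no longer match the hyphenated names used later in `full_throttle_data`. A minimal sketch of the cleaning and calendar checks as standalone functions, keeping hyphens; the names `clean_circuit_name` and `in_current_calendar` are hypothetical:

```python
import string

# Every punctuation character except the hyphen, which belongs to names
# such as "Circuit de Spa-Francorchamps"
PUNCTUATION_NO_HYPHEN = string.punctuation.replace("-", "")


def clean_circuit_name(name):
    """Hypothetical helper: drop punctuation (e.g. the trailing '*') but keep hyphens."""
    return "".join(ch for ch in name if ch not in PUNCTUATION_NO_HYPHEN).strip()


def in_current_calendar(name):
    """Current-season circuits are marked with a final '*' in the Wikipedia table."""
    return name.rstrip().endswith("*")


print(clean_circuit_name("Circuit de Barcelona-Catalunya *"))  # Circuit de Barcelona-Catalunya
print(in_current_calendar("Albert Park Circuit *"))  # True
```

Accented letters such as `ó` are not in `string.punctuation`, so they are left untouched.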

The correctness of the `InCalendar` flags can be checked later against the `fastf1` event schedule.

In [5]:
# straights_table=tables[0]
# straights_table.columns=straights_table.iloc[0]
# straights_table=straights_table.drop(0).reset_index(drop=True)

# # print(straights_table)

Cleaning the values in the column `Last length used`, so that the circuit length is stored as a float (in km) in the new column `Length(Km)`.

In [66]:
f1_circuits_table['Length(Km)'] = f1_circuits_table['Last length used'].str.extract(r'(\d+\.\d+)\s*km')[0]

f1_circuits_table['Length(Km)']=f1_circuits_table['Length(Km)'].astype(float)
# print(f1_circuits_table[['Length(Km)']].head())
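The regular expression captures the first decimal number followed by `km`. A quick check on sample strings, assuming the column holds values formatted like `5.412 km (3.363 mi)`; the samples below are illustrative, not taken from the table:

```python
import pandas as pd

# Illustrative values in the assumed "<km> km (<mi> mi)" format
last_length = pd.Series(["5.412 km (3.363 mi)", "3.337 km (2.074 mi)", "unknown"])

# expand=False returns a Series directly; rows without a match become NaN
length_km = last_length.str.extract(r"(\d+\.\d+)\s*km", expand=False).astype(float)
print(length_km.tolist())  # [5.412, 3.337, nan]
```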

## Enriching the Circuit Table

We then hash the column `CleanCircuit` to build a unique `circuit_id`.

In [67]:
f1_circuits_table['circuit_id'] = f1_circuits_table['CleanCircuit'].apply(lambda x: hashlib.md5(x.encode()).hexdigest())
# print(f1_circuits_table['circuit_id'])
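MD5 serves here as a stable identifier, not as a security measure: unlike Python's built-in `hash()`, whose value for strings changes between interpreter sessions unless `PYTHONHASHSEED` is fixed, an MD5 digest is identical on every run. A small check of this property, using a hypothetical `circuit_id` helper that mirrors the lambda above:

```python
import hashlib


def circuit_id(clean_name):
    """Deterministic 32-character hex id derived from the cleaned circuit name."""
    return hashlib.md5(clean_name.encode("utf-8")).hexdigest()


a = circuit_id("Circuit de Monaco")
b = circuit_id("Circuit de Monaco")
print(a == b, len(a))  # True 32
```

Because the id is derived from `CleanCircuit`, any change to the name-cleaning step also changes the ids.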

Adding the latitude and longitude of each circuit by geocoding its location and country with the `Nominatim` geocoder from `geopy`.

In [68]:
import time

geolocator = Nominatim(user_agent="circuit_location_extractor")

# Function to get approximate latitude and longitude from city and country
def get_lat_lon(city, country, retries=3):
    """
    Get the latitude and longitude by geocoding the city and the country
    of each circuit. On a timeout, retry at most `retries` more times.
    """
    query = f"{city}, {country}"
    try:
        location = geolocator.geocode(query, timeout=10)
        if location:
            return location.latitude, location.longitude
        else:
            return None, None
    except GeocoderTimedOut:
        if retries <= 0:
            return None, None
        time.sleep(1)
        return get_lat_lon(city, country, retries - 1)


f1_circuits_table["Latitude"] = None
f1_circuits_table["Longitude"] = None

for index, row in f1_circuits_table.iterrows():
    # pd.notna also rejects NaN, which a plain truthiness test would accept
    if pd.notna(row["Location"]) and pd.notna(row["Country"]):
        lat, lon = get_lat_lon(row["Location"], row["Country"])
        f1_circuits_table.at[index, "Latitude"] = lat
        f1_circuits_table.at[index, "Longitude"] = lon
        # Nominatim's usage policy allows at most one request per second
        time.sleep(1)
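The retry on `GeocoderTimedOut` can also be written as a bounded loop. A minimal sketch, assuming a hypothetical helper `geocode_with_retries` that receives the geocoding function as an argument, so it can be tested without network access; `geopy` also provides `geopy.extra.rate_limiter.RateLimiter`, which wraps a geocoding function with a minimum delay between calls and retries on errors:

```python
import time
from types import SimpleNamespace


def geocode_with_retries(geocode, query, retries=3, delay=1.0, exceptions=(TimeoutError,)):
    """Call `geocode(query)`, retrying up to `retries` times on the given exceptions.

    Hypothetical helper: returns (latitude, longitude), or (None, None) when
    nothing is found or every attempt fails.
    """
    for attempt in range(retries + 1):
        try:
            location = geocode(query)
        except exceptions:
            if attempt == retries:
                return None, None
            time.sleep(delay)  # wait before the next attempt
            continue
        if location is None:
            return None, None
        return location.latitude, location.longitude
    return None, None


calls = {"n": 0}


def flaky_geocode(query):
    """Fake stand-in for Nominatim.geocode: times out twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError
    return SimpleNamespace(latitude=1.0, longitude=2.0)


result = geocode_with_retries(flaky_geocode, "Monza, Italy", delay=0)
print(result)  # (1.0, 2.0)
```

With the real geocoder one would pass something like `lambda q: geolocator.geocode(q, timeout=10)` and `exceptions=(GeocoderTimedOut,)`, since `GeocoderTimedOut` is a `geopy` exception rather than a subclass of the built-in `TimeoutError`.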

We add the percentage of each lap driven at _full throttle_, i.e. under full acceleration. These parts of the track roughly correspond to the _straights_.

In [69]:
# Define the data for full-throttle percentages
full_throttle_data = {
    'CleanCircuit': [
        'Albert Park Circuit', 'Autódromo Hermanos Rodríguez', 'Autodromo Internazionale Enzo e Dino Ferrari',
        # 'Autodromo' without the accent, as spelled in the Wikipedia table
        'Autodromo José Carlos Pace', 'Autodromo Nazionale di Monza', 'Bahrain International Circuit',
        'Baku City Circuit', 'Circuit de Barcelona-Catalunya', 'Circuit de Monaco', 'Circuit de Spa-Francorchamps',
        'Circuit Gilles-Villeneuve', 'Circuit of the Americas', 'Circuit Zandvoort', 'Hungaroring',
        'Jeddah Corniche Circuit', 'Las Vegas Strip Circuit', 'Lusail International Circuit',
        'Marina Bay Street Circuit', 'Miami International Autodrome', 'Red Bull Ring',
        'Shanghai International Circuit', 'Silverstone Circuit', 'Suzuka International Racing Course',
        'Yas Marina Circuit'
    ],
    'Full_throttle_percentage': [
        0.60, 0.65, 0.70, 0.70, 0.80, 0.65, 0.50, 0.65, 0.40, 0.65,
        0.60, 0.65, 0.60, 0.60, 0.60, 0.60, 0.60, 0.50, 0.60, 0.65,
        0.65, 0.65, 0.60, 0.60
    ]
}

full_throttle_df = pd.DataFrame(full_throttle_data)

f1_circuits_table = f1_circuits_table.merge(full_throttle_df, on='CleanCircuit', how='left')

# print(f1_circuits_table[['CleanCircuit', 'Full_throttle_percentage']])
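A left merge silently leaves `NaN` wherever a name in `full_throttle_df` has no exact match in `CleanCircuit`; a different accent or a missing hyphen is enough. A sketch of how the `indicator=True` option of `pandas.merge` can list the unmatched names, on toy data:

```python
import pandas as pd

circuits = pd.DataFrame({"CleanCircuit": ["Circuit de Monaco", "Autodromo José Carlos Pace"]})
full_throttle = pd.DataFrame({
    "CleanCircuit": ["Circuit de Monaco", "Autódromo José Carlos Pace"],
    "Full_throttle_percentage": [0.40, 0.70],
})

# indicator=True adds a `_merge` column: 'both', 'left_only' or 'right_only'
check = full_throttle.merge(circuits, on="CleanCircuit", how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "CleanCircuit"].tolist()
print(unmatched)  # ['Autódromo José Carlos Pace']
```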

We save a temporary copy of the table.

In [70]:
# index=False avoids an extra 'Unnamed: 0' column when the file is read back
f1_circuits_table.to_excel('temp_f1_circuits_table.xlsx', index=False)
# f1_circuits_table=pd.read_excel("temp_f1_circuits_table.xlsx")

## Retrieving more track-specific information from fastf1

We retrieve the number of fast and slow corners of each track in the current calendar.<br>
The following block retrieves the event schedules of the current year and of the two previous years, and removes the pre-season testing events.

In [71]:
# current_year is already defined above; it is recomputed so this cell can run on its own
current_year=dt.datetime.now().year

schedule_current_year=fastf1.get_event_schedule(current_year)
schedule_past_year=fastf1.get_event_schedule(current_year-1)
schedule_past_two_year=fastf1.get_event_schedule(current_year-2)

# Drop testing events; .copy() avoids SettingWithCopyWarning when columns are added later
schedule_current_year=schedule_current_year[schedule_current_year['EventFormat']!='testing'].copy()
schedule_past_year=schedule_past_year[schedule_past_year['EventFormat']!='testing']
schedule_past_two_year=schedule_past_two_year[schedule_past_two_year['EventFormat']!='testing']

race_schedule=pd.concat([schedule_current_year,schedule_past_year,schedule_past_two_year],ignore_index=True)
race_schedule['Year']=race_schedule['EventDate'].dt.year
# print(race_schedule)

Given the calendar, we can now determine which circuits are located in a country that hosts more than one race during the current year (for example, Italy with Imola and Monza). For this purpose we define the column `MultipleRaces`.

In [72]:
multiple_circuits = f1_circuits_table[f1_circuits_table['InCalendar'] == 1]

country_counts = multiple_circuits['Country'].value_counts()

f1_circuits_table['MultipleRaces'] = f1_circuits_table.apply(
    lambda row: 1 if country_counts.get(row['Country'], 0) > 1 and row['InCalendar'] == 1 else 0,
    axis=1
)
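A vectorised sketch of the same flag on toy data, without a row-wise `apply`: `value_counts` counts the in-calendar circuits per country, and `map` brings that count back to every row. The rows are illustrative; the last one is a circuit outside the calendar.

```python
import pandas as pd

circuits = pd.DataFrame({
    "Circuit": ["Autodromo Nazionale di Monza", "Autodromo Internazionale Enzo e Dino Ferrari",
                "Circuit de Monaco", "Mugello Circuit"],
    "Country": ["Italy", "Italy", "Monaco", "Italy"],
    "InCalendar": [True, True, True, False],
})

in_cal = circuits["InCalendar"].astype(bool)
# Number of in-calendar circuits per country
counts = circuits.loc[in_cal, "Country"].value_counts()
# 1 only for in-calendar circuits whose country hosts more than one race
circuits["MultipleRaces"] = (in_cal & circuits["Country"].map(counts).gt(1)).astype(int)
print(circuits["MultipleRaces"].tolist())  # [1, 1, 0, 0]
```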

Before retrieving and classifying the corners of each circuit, we save the data.

In [73]:
f1_circuits_table.to_excel('temp_f1_circuits_table.xlsx',index=False)

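We count the corners of each circuit and classify them by the absolute value of their angle: Sharp (above 90°), Mid (between 45° and 90°) and Open (below 45°). We also add the average Q1 lap time and the fastest lap of the most recent qualifying session. For events that have not taken place yet, the qualifying session of the same event in one of the two previous seasons is used.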
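A vectorised sketch of the same corner thresholds using `numpy.select`, followed by the mean Q1 time computed after dropping the first and last rows of the Q1 column, as the cell below does. The angles and lap times are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy corner angles in degrees (the sign gives the turn direction)
angles = pd.Series([-120.0, 95.0, 60.0, -45.0, 30.0, 10.0])

abs_angle = angles.abs()
# 3 = Sharp (> 90), 2 = Mid (45 to 90), 1 = Open (< 45), as in classify_corner
corner_type = np.select([abs_angle > 90, abs_angle >= 45], [3, 2], default=1)
counts = {label: int((corner_type == code).sum())
          for label, code in [("Sharp", 3), ("Mid", 2), ("Open", 1)]}
print(counts)  # {'Sharp': 2, 'Mid': 2, 'Open': 2}

# Toy Q1 times in seconds (None becomes NaT); iloc[1:-1] drops the first and last rows
q1 = pd.to_timedelta(pd.Series([89.5, 90.0, 90.5, 91.0, None]), unit="s")
avg_q1_s = q1.iloc[1:-1].mean().total_seconds()
print(avg_q1_s)  # 90.5
```

`Series.mean()` skips `NaT` values, so drivers without a Q1 time do not affect the average.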

In [74]:
current_day=dt.datetime.today()

def classify_corner(angle):
    """Classify a corner from its angle in degrees: 3 = Sharp, 2 = Mid, 1 = Open."""
    if abs(angle) > 90:
        return 3  # Sharp
    elif abs(angle) >= 45:
        return 2  # Mid
    else:
        return 1  # Open

schedule_current_year['NumSharpCorner']=None
schedule_current_year['NumMidCorner']=None
schedule_current_year['NumOpenCorner']=None
schedule_current_year['AverageQ1Time']=None
schedule_current_year['RecentFastestLap']=None

for index,row in schedule_current_year.iterrows():
    if row['EventDate']>current_day:
        # Future event: use the qualifying of the same event in a previous season.
        # `.values` is needed because `in` on a Series checks the index, not the values.
        if row['EventName'] in schedule_past_year['EventName'].values:
            session=fastf1.get_session(current_year-1,row['EventName'],'Q')
        elif row['EventName'] in schedule_past_two_year['EventName'].values:
            session=fastf1.get_session(current_year-2,row['EventName'],'Q')
        else:
            # No previous session available: flag with -1 and move to the next event
            schedule_current_year.at[index,'NumSharpCorner']=-1
            schedule_current_year.at[index,'NumMidCorner']=-1
            schedule_current_year.at[index,'NumOpenCorner']=-1
            continue
    else:
        session=fastf1.get_session(current_year,row['EventName'], 'Q')

    session.load()
    circuit=session.get_circuit_info()
    corners=circuit.corners
    corners['Type']=corners['Angle'].apply(classify_corner)

    schedule_current_year.at[index,'NumSharpCorner']=len(corners[corners['Type']==3])
    schedule_current_year.at[index,'NumMidCorner']=len(corners[corners['Type']==2])
    schedule_current_year.at[index,'NumOpenCorner']=len(corners[corners['Type']==1])

    # Mean Q1 time, excluding the first and last rows of the classification
    session_results=session.results
    schedule_current_year.at[index,'AverageQ1Time']=session_results['Q1'].iloc[1:-1].mean().total_seconds()

    fastest_lap=session.laps.pick_fastest()
    schedule_current_year.at[index,'RecentFastestLap']=fastest_lap['LapTime'].total_seconds()

core           INFO 	Loading data for Bahrain Grand Prix - Qualifying [v3.4.3]
INFO:fastf1.fastf1.core:Loading data for Bahrain Grand Prix - Qualifying [v3.4.3]
req            INFO 	Using cached data for session_info
INFO:fastf1.fastf1.req:Using cached data for session_info
req            INFO 	Using cached data for driver_info
INFO:fastf1.fastf1.req:Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
INFO:fastf1.fastf1.req:Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
INFO:fastf1.fastf1.req:Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
INFO:fastf1.fastf1.req:Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
INFO:fastf1.fastf1.req:Using cached data for timing_app_data
core           INFO 	Processing timing data...
INFO:fastf1.fastf1.core:Processing timing data...
req         

Converting some columns and selecting those that are exported to Excel.

In [42]:
# print(schedule_current_year.dtypes)
# print(schedule_current_year['EventDate'])

In [75]:
schedule_current_year['EventDate'] = pd.to_datetime(schedule_current_year['EventDate'])
#schedule_current_year['EventDate']=schedule_current_year['EventDate'].dt.date
# schedule_current_year['AverageQ1Time']=pd.to_timedelta(schedule_current_year['AverageQ1Time'])
# schedule_current_year['RecentFastestLap']=pd.to_timedelta(schedule_current_year['RecentFastestLap'])

red_schedule_current_year=schedule_current_year[['RoundNumber','Country','Location','OfficialEventName','EventDate','EventName','EventFormat','NumSharpCorner','NumMidCorner','NumOpenCorner','AverageQ1Time','RecentFastestLap']]
red_schedule_current_year.to_excel('schedule_current_year.xlsx',sheet_name='Schedule',index=False)

We now read back the data needed for the analysis.

In [76]:
#schedule_current_year.to_excel('schedule_current_year.xlsx')
schedule_current_year=pd.read_excel('schedule_current_year.xlsx')

## Merging the two tables

We now merge `f1_circuits_table` with `schedule_current_year`, combining the track-specific information retrieved with `fastf1` with the information collected earlier.

The merge treats circuits with `MultipleRaces` separately from the others. For a country hosting a single race, `Country` is enough to match the circuit to its event; for a country hosting several races, `Country` alone is ambiguous, so `Location` is used as an additional key.

In [77]:
multiple_races = f1_circuits_table[f1_circuits_table['MultipleRaces'] == 1]
multiple_races=multiple_races[multiple_races['InCalendar']==1]
single_races = f1_circuits_table[f1_circuits_table['MultipleRaces'] == 0]
single_races=single_races[single_races['InCalendar']==1]

merged_multiple = pd.merge(multiple_races, schedule_current_year, on=['Country', 'Location'], how='left')

merged_single = pd.merge(single_races, schedule_current_year, on='Country', how='left')

merged_result = pd.concat([merged_multiple, merged_single], ignore_index=True)
# print(merged_result)
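A toy illustration of the two-key strategy: in a country with several races, merging on `Country` alone pairs every circuit with every event of that country, while adding `Location` keeps one match per circuit. The rows below are illustrative.

```python
import pandas as pd

circuits = pd.DataFrame({
    "Circuit": ["Autodromo Nazionale di Monza *", "Autodromo Internazionale Enzo e Dino Ferrari *"],
    "Country": ["Italy", "Italy"],
    "Location": ["Monza", "Imola"],
})
schedule = pd.DataFrame({
    "EventName": ["Italian Grand Prix", "Emilia Romagna Grand Prix"],
    "Country": ["Italy", "Italy"],
    "Location": ["Monza", "Imola"],
})

# Country only: 2 circuits x 2 events = 4 rows
by_country = circuits.merge(schedule, on="Country", how="left")
# Country and Location: one row per circuit
by_country_and_location = circuits.merge(schedule, on=["Country", "Location"], how="left")
print(len(by_country), len(by_country_and_location))  # 4 2
```

With the two-key merge, `Location` must be spelled identically in both tables; otherwise the event columns stay empty for that circuit.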

We compute the circuit length in metres and the approximate length of its straights, obtained by multiplying the length by `Full_throttle_percentage`.

In [78]:
merged_result['Length(m)']=merged_result['Length(Km)']*1000
merged_result['Straight_length(m)']=merged_result['Length(m)']*merged_result['Full_throttle_percentage']
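In formula form, with $L_{\text{km}}$ the circuit length and $p$ the full-throttle fraction, and using Monza's values from this notebook (5.793 km, 0.80) as a worked example. If the full-throttle figures are shares of lap time rather than of distance, the result is only an approximation of the straights' length.

$$
L_{\text{m}} = 1000\,L_{\text{km}}, \qquad L_{\text{straight}} = p\,L_{\text{m}}
$$

$$
\text{Monza: } L_{\text{straight}} = 0.80 \times 5793\ \text{m} = 4634.4\ \text{m}
$$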

In [79]:
# print(merged_result.head)

<bound method NDFrame.head of                                            Circuit            Type  \
0   Autodromo Internazionale Enzo e Dino Ferrari *    Race circuit   
1                   Autodromo Nazionale di Monza *    Race circuit   
2                        Circuit of the Americas *    Race circuit   
3                        Las Vegas Strip Circuit *  Street circuit   
4                  Miami International Autodrome *  Street circuit   
5                            Albert Park Circuit *  Street circuit   
6                   Autódromo Hermanos Rodríguez *    Race circuit   
7                     Autodromo José Carlos Pace *    Race circuit   
8                  Bahrain International Circuit *    Race circuit   
9                              Baku City Circuit *  Street circuit   
10                Circuit de Barcelona-Catalunya *    Race circuit   
11                             Circuit de Monaco *  Street circuit   
12                  Circuit de Spa-Francorchamps *    Race c

## Speed Analysis

To understand the speeds reached on different circuits and in different types of corner, we compute the average lap speed of each circuit from its length and fastest lap, and estimate the speed in open, mid and sharp corners by scaling the average speed with fixed coefficients (0.85, 0.6 and 0.35 respectively).
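The computation in the cell below can be summarised as follows, with $L_{\text{km}}$ the circuit length, $t$ the fastest lap in seconds and $c$ the fixed corner coefficient; Monza's values from the output serve as a check:

$$
\bar v = 3.6\,\frac{1000\,L_{\text{km}}}{t}\ \text{km/h}, \qquad v_{\text{corner}} = c\,\bar v, \qquad c \in \{0.35,\ 0.6,\ 0.85\}\ \text{(sharp, mid, open)}
$$

$$
\text{Monza: } \bar v = 3.6 \times \frac{5793}{79.327} \approx 262.90\ \text{km/h}, \qquad v_{\text{sharp}} = 0.35 \times 262.90 \approx 92.01\ \text{km/h}
$$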

In [80]:
def analyze_circuit_speeds_detailed(df):
    """
    Compute, for each circuit (row), the average lap speed and the estimated
    speeds in sharp, mid and open corners, keeping a text record of each calculation.
    
    Returns:
    pandas.DataFrame: Detailed analysis results for each circuit
    """
    # Create a copy to avoid modifying original data
    analysis = df.copy()
    
    # Ensure lap times are numeric (invalid values become NaN)
    analysis['RecentFastestLap'] = pd.to_numeric(analysis['RecentFastestLap'], errors='coerce')
    analysis['AverageQ1Time'] = pd.to_numeric(analysis['AverageQ1Time'], errors='coerce')
    
    def calculate_detailed_speeds(circuit_data):
        """Calculate detailed speeds for a single circuit"""
        # Extract relevant data
        circuit_name = circuit_data['Circuit']
        length_km = circuit_data['Length(Km)']
        fastest_lap = circuit_data['RecentFastestLap']
        sharp_corners = circuit_data['NumSharpCorner']
        mid_corners = circuit_data['NumMidCorner']
        open_corners = circuit_data['NumOpenCorner']
        
        # Skip if missing crucial data
        if pd.isna(fastest_lap) or fastest_lap == 0:
            return pd.Series({
                'Circuit': circuit_name,
                'avg_speed_kmh': np.nan,
                'Sharp_Corner_Speed_kmh': np.nan,
                'Mid_Corner_Speed_kmh': np.nan,
                'Open_Corner_Speed_kmh': np.nan,
                'Calculation_Details': 'Missing or invalid lap time data'
            })
        
        # Calculate base speed
        length_meters = length_km * 1000
        avg_speed_ms = length_meters / fastest_lap
        avg_speed_kmh = avg_speed_ms * 3.6
        
        # Corner coefficients
        sharp_coeff = 0.35
        mid_coeff = 0.6
        open_coeff = 0.85
        
        # Calculate corner speeds
        sharp_speed_kmh = avg_speed_kmh * sharp_coeff
        mid_speed_kmh = avg_speed_kmh * mid_coeff
        open_speed_kmh = avg_speed_kmh * open_coeff
        
        # Create printing calculation string
        calculation_details = (
            f"Circuit: {circuit_name}\n"
            f"Length: {length_km:.3f} km\n"
            f"Fastest Lap: {fastest_lap:.3f} s\n"
            f"Average Speed: {length_meters:.0f}m ÷ {fastest_lap:.3f}s = {avg_speed_ms:.2f} m/s = {avg_speed_kmh:.2f} km/h\n"
            f"Corner Distribution: {sharp_corners} sharp, {mid_corners} mid, {open_corners} open\n"
            f"Sharp Corners: {avg_speed_kmh:.2f} × {sharp_coeff} = {sharp_speed_kmh:.2f} km/h\n"
            f"Mid Corners: {avg_speed_kmh:.2f} × {mid_coeff} = {mid_speed_kmh:.2f} km/h\n"
            f"Open Corners: {avg_speed_kmh:.2f} × {open_coeff} = {open_speed_kmh:.2f} km/h"
        )
        
        return pd.Series({
            'Circuit': circuit_name,
            'avg_speed_kmh': avg_speed_kmh,
            'Sharp_Corner_Speed_kmh': sharp_speed_kmh,
            'Mid_Corner_Speed_kmh': mid_speed_kmh,
            'Open_Corner_Speed_kmh': open_speed_kmh,
            'Calculation_Details': calculation_details
        })
    
    # Apply calculations to each row
    detailed_results = analysis.apply(calculate_detailed_speeds, axis=1)
    
    return detailed_results

# Example usage:
detailed_analysis = analyze_circuit_speeds_detailed(merged_result)

# Print detailed calculations for each circuit
for idx, row in detailed_analysis.iterrows():
    print("\n" + "="*80)
    print(row['Calculation_Details'])
    print("="*80)


Circuit: Autodromo Internazionale Enzo e Dino Ferrari *
Length: 4.909 km
Fastest Lap: 74.746 s
Average Speed: 4909m ÷ 74.746s = 65.68 m/s = 236.43 km/h
Corner Distribution: 11.0 sharp, 4.0 mid, 4.0 open
Sharp Corners: 236.43 × 0.35 = 82.75 km/h
Mid Corners: 236.43 × 0.6 = 141.86 km/h
Open Corners: 236.43 × 0.85 = 200.97 km/h

Circuit: Autodromo Nazionale di Monza *
Length: 5.793 km
Fastest Lap: 79.327 s
Average Speed: 5793m ÷ 79.327s = 73.03 m/s = 262.90 km/h
Corner Distribution: 6.0 sharp, 2.0 mid, 3.0 open
Sharp Corners: 262.90 × 0.35 = 92.01 km/h
Mid Corners: 262.90 × 0.6 = 157.74 km/h
Open Corners: 262.90 × 0.85 = 223.46 km/h

Circuit: Circuit of the Americas *
Length: 5.513 km
Fastest Lap: 92.330 s
Average Speed: 5513m ÷ 92.330s = 59.71 m/s = 214.96 km/h
Corner Distribution: 11.0 sharp, 4.0 mid, 5.0 open
Sharp Corners: 214.96 × 0.35 = 75.23 km/h
Mid Corners: 214.96 × 0.6 = 128.97 km/h
Open Corners: 214.96 × 0.85 = 182.71 km/h

Missing or invalid lap time data

Missing or invalid 

Calling the function.

In [81]:
detailed_analysis = analyze_circuit_speeds_detailed(merged_result)
# print(detailed_analysis)

                                           Circuit  avg_speed_kmh  \
0   Autodromo Internazionale Enzo e Dino Ferrari *     236.432719   
1                   Autodromo Nazionale di Monza *     262.896618   
2                        Circuit of the Americas *     214.955053   
3                        Las Vegas Strip Circuit *            NaN   
4                  Miami International Autodrome *            NaN   
5                            Albert Park Circuit *     250.290456   
6                   Autódromo Hermanos Rodríguez *     204.018645   
7                     Autodromo José Carlos Pace *     185.988850   
8                  Bahrain International Circuit *     218.507262   
9                              Baku City Circuit *     213.197849   
10                Circuit de Barcelona-Catalunya *     234.862642   
11                             Circuit de Monaco *     170.957734   
12                  Circuit de Spa-Francorchamps *     222.822754   
13                     Circuit Gil

Merging these results into `merged_result` on the `Circuit` column.

In [82]:
merged_result = pd.merge(merged_result, detailed_analysis, on='Circuit', how='left')

Saving `merged_result` and `f1_circuits_table` in their final state.

In [83]:
merged_result.to_excel('Merged_result.xlsx',sheet_name='Circuit',index=False)

In [84]:
new_column_order = [
    'circuit_id', 'Circuit', 'CleanCircuit', 'Type', 'Direction', 'Location', 
    'Country','Latitude','Longitude', 'Last length used','Length(Km)','Full_throttle_percentage', 'Turns', 'Grands Prix', 'Season(s)', 
    'Grands Prix held', 'InCalendar','MultipleRaces']

f1_circuits_table = f1_circuits_table[new_column_order]

f1_circuits_table.to_excel("f1_circuits_table.xlsx",sheet_name='Circuit_Information',index=False)

Saving the subset of columns needed for the subsequent clustering analysis.

In [85]:
columns_to_keep = [
    'Circuit', 'CleanCircuit', 'Type', 'Latitude', 'Longitude', 
    'Full_throttle_percentage', 'Turns', 'RoundNumber', 'NumSharpCorner', 
    'NumMidCorner', 'NumOpenCorner', 'AverageQ1Time', 'RecentFastestLap', 
    'Length(m)', 'avg_speed_kmh', 'Sharp_Corner_Speed_kmh', 
    'Mid_Corner_Speed_kmh', 'Open_Corner_Speed_kmh'
]

cluster_data = merged_result[columns_to_keep]
cluster_data.to_excel('cluster_data.xlsx')