# **Race Pace Analysis for F1 FP2 Sessions**

## **Objective**
This project aims to develop a machine learning pipeline to identify **race pace laps** from FP2 sessions of Formula 1 events. Race pace laps are those that represent consistent lap times a driver can maintain over a stint, excluding qualifying-style laps or outliers.

The project involves:
- Loading and preprocessing FP2 session data for multiple F1 events.
- Engineering relevant features (e.g., lap time differences and consistency).
- Training a machine learning model to classify race pace laps.
- Applying the trained model to unseen data and filtering results for meaningful insights.


In [1]:
import matplotlib.pyplot as plt
import numpy as np
import fastf1.plotting
import seaborn as sns
import pandas as pd
import matplotlib.ticker as ticker
from fastf1 import utils
import random
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score
import joblib
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
#from sklearn.cluster import KMeans

# FastF1's default color scheme
fastf1.plotting.setup_mpl(misc_mpl_mods=False, color_scheme='fastf1' )

## **Loading FP2 Data**

### **Description**
- **Dataset**: FP2 sessions of non-sprint F1 events from the 2024 calendar.
- **Process**:
  1. **Filter Out Box and Inaccurate Laps**: Only valid laps are retained.
  2. **Transform Lap Times**: Convert lap times to seconds for numerical analysis.

In [2]:
# List of non-sprint events in the 2024 F1 calendar
non_sprint_events = [
    'Bahrain Grand Prix',
    'Saudi Arabian Grand Prix',
    'Australian Grand Prix',
    'Japanese Grand Prix',
    'Emilia Romagna Grand Prix',
    'Canadian Grand Prix',
    'Spanish Grand Prix',
    'British Grand Prix',
    'Hungarian Grand Prix',
    'Belgian Grand Prix',
    'Dutch Grand Prix',
    'Italian Grand Prix',
    'Singapore Grand Prix',
    'Mexican Grand Prix',
    'Las Vegas Grand Prix',
    'Abu Dhabi Grand Prix'
]

transformed_laps_dataframes = {}


for event in non_sprint_events:
    try:
        # Load the FP2 session
        session = fastf1.get_session(2024, event, 'FP2')
        session.load()
        
        # Filter out box laps and keep only accurate laps
        laps = session.laps.pick_wo_box()
        laps = laps[laps['IsAccurate'] == True]
        
        # Transform laps: add a new column for lap times in seconds
        transformed_laps = laps.copy()
        transformed_laps.loc[:, "LapTime (s)"] = laps["LapTime"].dt.total_seconds()
        
        # Store the transformed dataframe in the dictionary
        transformed_laps_dataframes[event] = transformed_laps
        print(f"Transformed laps for {event} FP2 loaded successfully.")
    except Exception as e:
        print(f"Failed to load data for {event}: {e}")

core           INFO 	Loading data for Bahrain Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...
req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '2', '3', '4', '10', '11', '14', '16', '18', '20', '22', '23', '24', '27', '31', '44', '55', '63', '77', '81']
core           INFO 	Loading data for Saudi Arabian Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Usi

Transformed laps for Bahrain Grand Prix FP2 loaded successfully.


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '2', '3', '4', '10', '11', '14', '16', '18', '20', '22', '23', '24', '27', '31', '44', '55', '63', '77', '81']
core           INFO 	Loading data for Australian Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...


Transformed laps for Saudi Arabian Grand Prix FP2 loaded successfully.


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '2', '3', '4', '10', '11', '14', '16', '18', '20', '22', '23', '24', '27', '31', '44', '55', '63', '77', '81']
core           INFO 	Loading data for Japanese Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...


Transformed laps for Australian Grand Prix FP2 loaded successfully.


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '2', '3', '4', '10', '11', '14', '16', '18', '20', '22', '23', '24', '27', '31', '44', '55', '63', '77', '81']
core           INFO 	Loading data for Emilia Romagna Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...


Transformed laps for Japanese Grand Prix FP2 loaded successfully.


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '2', '3', '4', '10', '11', '14', '16', '18', '20', '22', '23', '24', '27', '31', '44', '55', '63', '77', '81']
core           INFO 	Loading data for Canadian Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...


Transformed laps for Emilia Romagna Grand Prix FP2 loaded successfully.


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '2', '3', '4', '10', '11', '14', '16', '18', '20', '22', '23', '24', '27', '31', '44', '55', '63', '77', '81']
core           INFO 	Loading data for Spanish Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...


Transformed laps for Canadian Grand Prix FP2 loaded successfully.


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '2', '3', '4', '10', '11', '14', '16', '18', '20', '22', '23', '24', '27', '31', '44', '55', '63', '77', '81']
core           INFO 	Loading data for British Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...


Transformed laps for Spanish Grand Prix FP2 loaded successfully.


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '2', '3', '4', '10', '11', '14', '16', '18', '20', '22', '23', '24', '27', '31', '44', '55', '63', '77', '81']
core           INFO 	Loading data for Hungarian Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...


Transformed laps for British Grand Prix FP2 loaded successfully.


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '2', '3', '4', '10', '11', '14', '16', '18', '20', '22', '23', '24', '27', '31', '44', '55', '63', '77', '81']
core           INFO 	Loading data for Belgian Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...


Transformed laps for Hungarian Grand Prix FP2 loaded successfully.


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '2', '3', '4', '10', '11', '14', '16', '18', '20', '22', '23', '24', '27', '31', '44', '55', '63', '77', '81']
core           INFO 	Loading data for Dutch Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...


Transformed laps for Belgian Grand Prix FP2 loaded successfully.


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '2', '3', '4', '10', '11', '14', '16', '18', '20', '22', '23', '24', '27', '31', '44', '55', '63', '77', '81']
core           INFO 	Loading data for Italian Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...


Transformed laps for Dutch Grand Prix FP2 loaded successfully.


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '3', '4', '10', '11', '14', '16', '18', '20', '22', '23', '24', '27', '31', '43', '44', '55', '63', '77', '81']
core           INFO 	Loading data for Singapore Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...


Transformed laps for Italian Grand Prix FP2 loaded successfully.


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '3', '4', '10', '11', '14', '16', '18', '20', '22', '23', '24', '27', '31', '43', '44', '55', '63', '77', '81']
core           INFO 	Loading data for Mexico City Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...


Transformed laps for Singapore Grand Prix FP2 loaded successfully.


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '4', '10', '11', '14', '16', '18', '20', '22', '23', '24', '27', '30', '31', '43', '44', '55', '63', '77', '81']
core           INFO 	Loading data for Las Vegas Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...


Transformed laps for Mexican Grand Prix FP2 loaded successfully.


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '4', '10', '11', '14', '16', '18', '20', '22', '23', '24', '27', '30', '31', '43', '44', '55', '63', '77', '81']
core           INFO 	Loading data for Abu Dhabi Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...


Transformed laps for Las Vegas Grand Prix FP2 loaded successfully.


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '4', '10', '11', '14', '16', '18', '20', '22', '23', '24', '27', '30', '43', '44', '55', '61', '63', '77', '81']


Transformed laps for Abu Dhabi Grand Prix FP2 loaded successfully.


## **Feature Engineering and Data Labeling**

### **Overview**
The `label_race_pace` function preprocesses the lap times for individual drivers and adds the following features:
1. **Outlier Removal**:
   - The fastest and slowest laps are removed to focus on laps representative of the driver's pace.
2. **Race Pace Labeling**:
   - Laps within a specified time range (`min_time` and `max_time`) are labeled as race pace laps (`is_race_pace = 1`).
3. **Lap Time Difference**:
   - The absolute difference between consecutive lap times is calculated.
4. **Consistency Feature**:
   - Consistency is determined based on whether the rolling range of lap times (grouped by windows) falls within a threshold (`max_diff`).

### **Key Features**
- **LapTime (s)**: Lap time in seconds.
- **LapTimeDifference**: Difference between consecutive laps.
- **Consistency**: Binary flag indicating if laps within a rolling window are consistent.

In [3]:
def label_race_pace(dataframe, driver, min_time, max_time):
    # Keep LapTime, Stint, LapNumber (and any ID you need like Driver if not implicit)
    driver_data = dataframe.pick_drivers(driver)[[
        'LapTime (s)', 'Stint', 'LapNumber'
    ]].copy()

    if driver_data.empty:
        print(f"No data for driver: {driver}")
        return None

    # Remove fastest and slowest laps
    n_to_remove = 1
    driver_data = driver_data.sort_values('LapTime (s)').iloc[n_to_remove:-n_to_remove]
    driver_data = driver_data.reset_index(drop=True)

    # ... existing logic to compute LapTimeDifference, Consistency, is_race_pace, etc.
    # Make sure when you create those new columns you do it on driver_data and do not drop 'Stint'.

    # Reset the index after filtering
    driver_data = driver_data.reset_index(drop=True)

    # **2. Assign is_race_pace Based on Range**
    driver_data['is_race_pace'] = 0  # Default to 0
    driver_data.loc[
        (driver_data['LapTime (s)'] >= min_time) & 
        (driver_data['LapTime (s)'] <= max_time), 
        'is_race_pace'
    ] = 1

    # **3. Calculate LapTimeDifference**
    driver_data['LapTimeDifference'] = driver_data['LapTime (s)'].diff(periods = -1 ).abs()

    driver_data = driver_data.dropna(subset=['LapTimeDifference']).reset_index(drop=True)

    # **4. Add Consistency Feature with Non-Overlapping Rolling**
    window_size = 5  # Non-overlapping group size
    max_diff = 3   # Maximum allowed difference for consistency

    # Assign groups for non-overlapping rolling
    driver_data['Group'] = (driver_data.index // window_size)

    # Calculate max, min, and range for each group
    grouped = driver_data.groupby('Group')['LapTime (s)'].agg(['max', 'min']).reset_index()
    grouped['RollingRange'] = grouped['max'] - grouped['min']

    # Map RollingRange back to the original dataframe
    driver_data = driver_data.merge(grouped[['Group', 'RollingRange']], on='Group', how='left')

    # Add a Consistency flag (1 if within range, 0 otherwise)
    driver_data['Consistency'] = (driver_data['RollingRange'] <= max_diff).astype(int)

    # Drop intermediate columns to keep the dataframe clean
    driver_data.drop(columns=['Group', 'RollingRange'], inplace=True)

    return driver_data


# Bahrain
bahrain_pia = label_race_pace(transformed_laps_dataframes[non_sprint_events[0]], 'PIA', 95, 98).reset_index()
bahrain_oco = label_race_pace(transformed_laps_dataframes[non_sprint_events[0]], 'OCO', 97, 100).reset_index()
bahrain_bot = label_race_pace(transformed_laps_dataframes[non_sprint_events[0]], 'BOT', 97, 99).reset_index()
bahrain_ver = label_race_pace(transformed_laps_dataframes[non_sprint_events[0]], 'VER', 96, 98).reset_index()

# Saudi Arabian
saudi_lec = label_race_pace(transformed_laps_dataframes[non_sprint_events[1]], 'LEC', 93, 96).reset_index()
saudi_hul = label_race_pace(transformed_laps_dataframes[non_sprint_events[1]], 'HUL', 94.5, 96.5).reset_index()
saudi_alb = label_race_pace(transformed_laps_dataframes[non_sprint_events[1]], 'ALB', 94, 96).reset_index()


# Australia
australia_nor = label_race_pace(transformed_laps_dataframes[non_sprint_events[2]], 'NOR', 82.5, 84).reset_index()
australia_lec = label_race_pace(transformed_laps_dataframes[non_sprint_events[2]], 'LEC', 82, 84).reset_index()
australia_str = label_race_pace(transformed_laps_dataframes[non_sprint_events[2]], 'STR', 82, 84).reset_index()
australia_rus = label_race_pace(transformed_laps_dataframes[non_sprint_events[2]], 'RUS', 83, 85).reset_index()


# Imola
imola_mag = label_race_pace(transformed_laps_dataframes[non_sprint_events[4]], 'MAG', 81, 83).reset_index()
imola_alo = label_race_pace(transformed_laps_dataframes[non_sprint_events[4]], 'ALO', 81, 83).reset_index()
imola_zho = label_race_pace(transformed_laps_dataframes[non_sprint_events[4]], 'ZHO', 81, 83).reset_index()

#Spain

spain_ver = label_race_pace(transformed_laps_dataframes[non_sprint_events[4]], 'VER', 79, 82).reset_index()
spain_nor = label_race_pace(transformed_laps_dataframes[non_sprint_events[4]], 'NOR', 78, 81.5).reset_index()
spain_ham = label_race_pace(transformed_laps_dataframes[non_sprint_events[4]], 'HAM', 79, 82).reset_index()

#Belgium

belg_lec = label_race_pace(transformed_laps_dataframes[non_sprint_events[9]], 'LEC', 109, 112).reset_index()
belg_alb = label_race_pace(transformed_laps_dataframes[non_sprint_events[9]], 'ALB', 108, 111).reset_index()
belg_bot = label_race_pace(transformed_laps_dataframes[non_sprint_events[9]], 'BOT', 109, 112).reset_index()

# Step 2: Combine all labeled data
all_training_data = pd.concat([
    bahrain_pia, bahrain_oco, bahrain_bot, bahrain_ver,
    saudi_lec, saudi_hul, saudi_alb,
    australia_nor, australia_lec, australia_str, australia_rus,
    imola_mag, imola_alo, imola_zho, spain_ver, spain_nor, spain_ham,
    belg_alb, belg_alb, bahrain_bot
], ignore_index=True)

## **Training the Model**

### **Model Choice**
- **Random Forest Classifier**:
  - Handles non-linearity in lap time features effectively.
  - Provides flexibility and robustness for classification.
- **Features Used**:
  - `LapTime (s)`: Lap time in seconds.
  - `LapTimeDifference`: Time difference between consecutive laps.
  - `Consistency`: Indicates stable pace across laps.
  - `Stint`: Numeric indicator of the current stint; helps the model capture that race-pace attempts tend to cluster within the same stint.
- **Target Variable**:
  - `is_race_pace`: Indicates whether a lap is part of a race pace stint.

### **Training Process**
1. **Standardization**:
   - Lap time features are standardized for consistent scaling.
2. **Train-Test Split**:
   - Data is split into training and testing sets to evaluate model performance.
3. **Evaluation**:
   - Metrics such as accuracy and classification reports are used to evaluate the model.

In [4]:
class RacePaceAnalyzer:
    def __init__(self):
        self.pipeline = Pipeline([
            ('scaler', StandardScaler()),
            ('classifier', RandomForestClassifier(n_estimators=100, random_state=42))
        ])

    def train_model(self, data=None, test_size=0.2, random_state=42):
        if data is None:
            data = all_training_data

        data = data.copy()

        feature_cols = [
            'LapTime (s)',
            'LapTimeDifference',
            'Consistency',
            'Stint'
        ]
        X = data[feature_cols]
        y = data['is_race_pace']

        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=random_state, stratify=y
        )

        self.pipeline.fit(X_train, y_train)

        y_pred = self.pipeline.predict(X_test)
        print("Accuracy:", accuracy_score(y_test, y_pred))
        print("Classification Report:\n", classification_report(y_test, y_pred))

    def predict(self, data):
        #Preduct using the trained model
        return self.pipeline.predict(data)
    
    def save_model(self, path='race_pace_pipeline.pkl'):
        joblib.dump(self.pipeline, path)
        print(f"Pipeline saved to {path}")


    def load_model(self, path='race_pace_pipeline.pkl'):
        self.pipeline = joblib.load(path)
        print("Pipeline loaded successfully")
               
    @staticmethod
    def add_consistency_feature(dataframe, window_size, max_diff, n_to_remove=2):
        """
        Add consistency feature to the given dataframe.
        """
        dataframe = dataframe.sort_values('LapTime (s)').iloc[n_to_remove:-n_to_remove].copy()
        dataframe = dataframe.sort_values('LapNumber').reset_index(drop=True)
        dataframe['LapTimeDifference'] = dataframe['LapTime (s)'].diff(periods=-1).abs()
        dataframe = dataframe.dropna(subset=['LapTimeDifference']).reset_index(drop=True)
        dataframe['Group'] = dataframe.index // window_size
        grouped = dataframe.groupby('Group')['LapTime (s)'].agg(['max', 'min']).reset_index()
        grouped['RollingRange'] = grouped['max'] - grouped['min']
        dataframe = dataframe.merge(grouped[['Group', 'RollingRange']], on='Group', how='left')
        dataframe['Consistency'] = (dataframe['RollingRange'] <= max_diff).astype(int)
        dataframe.drop(columns=['Group', 'RollingRange'], inplace=True)
        return dataframe

    def get_race_pace_laps(self, event_data, window_size=3, max_diff=3, n_to_remove=2, lap_time_tolerance=2.0):
        """
        Identify race pace laps for all drivers in a given event.
        """
        drivers_event = event_data['Driver'].unique()
        race_pace_laps_all_drivers = []

        for driver in drivers_event:
            driver_data = event_data.pick_drivers(driver).copy()
            driver_data = driver_data.pick_wo_box()
            driver_data = driver_data[driver_data['IsAccurate'] == True]

            if driver_data.empty:
                continue

            driver_data = self.add_consistency_feature(driver_data, window_size, max_diff, n_to_remove)
            unknown_features = driver_data[['LapTime (s)', 'LapTimeDifference', 'Consistency', 'Stint']]

            if unknown_features.empty:
                continue

            driver_data['is_race_pace'] = self.pipeline.predict(unknown_features)

            race_pace_driver = driver_data[driver_data['is_race_pace'] == 1]
            median_lap_time = race_pace_driver['LapTime (s)'].median()
            lower_bound = median_lap_time - lap_time_tolerance
            upper_bound = median_lap_time + lap_time_tolerance

            race_pace_driver = race_pace_driver[
                (race_pace_driver['LapTime (s)'] >= lower_bound) &
                (race_pace_driver['LapTime (s)'] <= upper_bound)
            ]

            race_pace_laps_all_drivers.append(race_pace_driver)

        if race_pace_laps_all_drivers:
            race_pace_laps_all_drivers_df = pd.concat(race_pace_laps_all_drivers, ignore_index=True)

            # Filter rows where 'LapTime (s)' is within ±3 of the median for the DataFrame
            filtered_race_pace_laps_all_drivers = race_pace_laps_all_drivers_df[
                (race_pace_laps_all_drivers_df['LapTime (s)'] <
                 race_pace_laps_all_drivers_df['LapTime (s)'].median() + 3) &
                (race_pace_laps_all_drivers_df['LapTime (s)'] >
                 race_pace_laps_all_drivers_df['LapTime (s)'].median() - 3)
            ]

            return filtered_race_pace_laps_all_drivers



## **Applying the Model to Unknown Data**

### **Steps**
1. **Filter Laps**:
   - Exclude box laps and inaccurate laps from the dataset.
2. **Add Consistency Feature**:
   - Calculate consistency for grouped laps within a rolling window.
3. **Feature Standardization**:
   - Standardize the lap time features using the same scaler from training.
4. **Prediction**:
   - Use the trained Random Forest model to classify laps as race pace or not.

### **Median-Based Filtering**
- After prediction, laps are filtered further based on proximity to the median lap time for each driver to ensure outliers are excluded.

---

In [5]:
analyzer = RacePaceAnalyzer()
analyzer.train_model()
analyzer.save_model(path='race_pace_pipeline.pkl')

Accuracy: 0.9166666666666666
Classification Report:
               precision    recall  f1-score   support

           0       0.85      0.96      0.90        23
           1       0.97      0.89      0.93        37

    accuracy                           0.92        60
   macro avg       0.91      0.92      0.91        60
weighted avg       0.92      0.92      0.92        60

Pipeline saved to race_pace_pipeline.pkl


## **Potential for Improvement**

This model serves as an initial implementation and can be improved in several ways:
1. **More Data**:
   - Incorporate additional races and practice sessions to enhance generalization.
2. **Feature Engineering**:
   - Add more features
3. **Model Selection**:
   - Explore other models like Gradient Boosting or Neural Networks( Need more data for data, I could use the whole 2023 and 2024 seasons as training for the 2025 season) for potentially better performance.
4. **Dynamic Parameters**:
   - Adjust thresholds (e.g., `max_diff`, `window_size`) dynamically based on track characteristics.

---

## **Conclusion**

The project demonstrates an end-to-end pipeline for identifying race pace laps in FP2 sessions using machine learning. While the results are promising, this is just the beginning, and further iterations can improve the model's accuracy and adaptability across different tracks and conditions.