# N05: Gap-Based Strategy Rules for F1

## 1. Introduction and Objective


Gap analysis is a critical component of Formula 1 race strategy. The time difference between cars (measured in seconds) directly influences strategic decisions like undercuts, overcuts, and defensive pit stops. This notebook implements rules that use these gaps to make strategic recommendations.


### Why Gap Analysis Matters

* **Undercut Opportunities**: When a car is close behind another (~1-2s), pitting earlier might allow overtaking through fresher tires
* **Overcut Potential**: Sometimes staying out longer works better, especially when the leading car has clean air
* **Defensive Strategy**: Teams must react when competitors attempt undercuts
* **Traffic Management**: Gaps determine whether a car will exit the pits into free air or traffic

### Our Objectives

* Extract and process gap data from FastF1 (replacing our previous computer vision approach)
* Analyze gap distributions to establish meaningful thresholds for strategic decisions
* Implement four key rules:

1. **Undercut Opportunity Rule**: Identify when an undercut might succeed
2. **Defensive Pit Stop Rule**: Protect position against undercut attempts
3. **Overcut Strategy Rule**: Recognize when staying out longer is advantageous
4. **Traffic Management Rule**: Avoid exiting pit stops into traffic



These rules will complement our existing components (tire degradation, lap time analysis, and radio communication) to create a comprehensive F1 strategy system.

---

## What is an undercut or an overcut?

This section explains both concepts, supported by two posters made with ChatGPT (they contain some spelling mistakes but still illustrate the ideas well).

<p float="left">
  <img src="images/undercut_poster.png" alt="Undercut Image" width="40%" style="margin-right: 10px;">
  <img src="images/overcut_poster.png" alt="Overcut Image" width="40%">
</p>


### Undercut

The F1 undercut is a strategic maneuver that can change the course of a race. When the chasing car is within about three seconds of the car ahead, the chaser may choose to enter the pit lane early to switch to fresh tires.

Although this pit stop costs roughly 20–23 seconds, the significant performance boost from the fresh tires allows the chaser to complete laps much faster than the rival still running on worn tires. If the time gained exceeds the original gap by the time the rival makes its own pit stop, the chaser rejoins ahead without having to overtake on track.

This short-term benefit, extra pace from better tire grip, is most valuable on circuits with high tire degradation, such as Barcelona.
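
The arithmetic behind that reasoning can be sketched in a few lines. This is a simplified model, not the rule implemented later in the notebook: it assumes both cars lose the same time in the pit lane (so the pit loss cancels out), and the function name `undercut_margin` and all numbers are hypothetical.

```python
def undercut_margin(gap_ahead_s, fresh_tire_gain_s, laps_until_rival_pits=1):
    """Estimate the chaser's margin after both cars have pitted.

    Simplified model: both cars lose the same time in the pit lane, so the
    pit loss cancels out and only the fresh-tire gain counts.
    Returns seconds; a positive value means the chaser rejoins ahead.
    """
    time_gained = fresh_tire_gain_s * laps_until_rival_pits
    return time_gained - gap_ahead_s


# Hypothetical numbers: fresh tires are assumed to be worth 2.0 s per lap
close_gap = undercut_margin(gap_ahead_s=1.5, fresh_tire_gain_s=2.0)
wide_gap = undercut_margin(gap_ahead_s=3.0, fresh_tire_gain_s=2.0)
print(f"1.5 s gap: margin {close_gap:+.1f} s")
print(f"3.0 s gap: margin {wide_gap:+.1f} s")
```

Under these assumed numbers the undercut works from 1.5 s back but not from 3.0 s, which is why the gap to the car ahead is the first thing a gap-based rule has to check.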


### Overcut

In contrast, the overcut takes the opposite approach. Rather than pitting early, a driver using the overcut strategy stays out on worn tires for longer while the competitors pit.

If the driver can maintain competitive lap times despite the older rubber, or if the competitors' new tires take extra time to warm up, the longer stint can yield a time advantage. The overcutting driver then pits later and, having run a consistent pace in clean air in the meantime, can rejoin ahead of the rival.
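
As a rough illustration, the overcut can be checked lap by lap: the driver who stays out gains whenever their lap is faster than the rival's lap on new tires. The sketch below assumes equal pit loss for both cars and uses invented lap times; `overcut_margin` is a hypothetical helper, not part of the notebook's modules.

```python
def overcut_margin(gap_ahead_s, stay_out_laps_s, rival_laps_s):
    """Estimate the margin of the driver who stays out, after both stops.

    stay_out_laps_s: lap times of the driver staying out on worn tires.
    rival_laps_s: the rival's lap times over the same laps, after pitting
                  (pit-lane time excluded, assumed equal for both cars).
    Returns seconds; a positive value means the overcut puts the driver ahead.
    """
    gained = sum(rival - own for own, rival in zip(stay_out_laps_s, rival_laps_s))
    return gained - gap_ahead_s


# Invented example: the rival loses 1.5 s warming up new tires on the first lap
margin = overcut_margin(
    gap_ahead_s=1.0,
    stay_out_laps_s=[80.0, 80.2],
    rival_laps_s=[81.5, 80.4],
)
print(f"Overcut margin: {margin:+.1f} s")
```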




In [1]:
__author__ = "Víctor Vega Sobral"

---

In [2]:
# Standard data processing libraries
import pandas as pd              # For data manipulation and analysis
import numpy as np               # For numerical operations
import matplotlib.pyplot as plt  # For creating visualizations
import seaborn as sns            # For enhanced visualizations
from datetime import datetime, timedelta    # For timestamp handling
import os                        # For operating system interactions
import sys                       # For system-specific parameters and functions
import requests
# Add parent directory to system path to make custom modules accessible
sys.path.append(os.path.abspath('../'))
import fastf1


# Import Experta components for building the rule engine
from experta import Rule, NOT, OR, AND, AS, MATCH, TEST, EXISTS  # Rule definition components
from experta import DefFacts, Fact, Field, KnowledgeEngine      # Core Experta classes
fastf1.Cache.enable_cache('../../f1-strategy/f1_cache')

import utils.N01_agent_setup as agent_setup
from utils.N01_agent_setup import (
        TelemetryFact,              # For storing car performance data
        DegradationFact,            # For storing tire degradation information
        GapFact,                    # For storing gap information
        RaceStatusFact,             # For storing current race conditions
        StrategyRecommendation,     # For storing strategy recommendations
        F1StrategyEngine,           # Base engine class
        transform_gap_data,         # Function to convert gap data to facts
        load_gap_data               # Function to load gap data from FastF1
)


# Configure plotting style for better visualization
plt.style.use('seaborn-v0_8-darkgrid')  # Set default plot style
sns.set_context("notebook", font_scale=1.2)  # Increase font size slightly

print("Libraries and fact classes loaded successfully.")

Engine initialized with 2 facts
Initial facts: [InitialFact(), RaceStatusFact(lap=1, total_laps=60, race_phase='start', track_status='clear')]

=== TIRE DEGRADATION ANALYSIS ===
Using first predicted rate as current degradation: 0.07
Tire facts declared: {'degradation': DegradationFact(degradation_rate=0.07, predicted_rates=frozenlist([0.07, 0.09, 0.12])), 'telemetry': TelemetryFact(tire_age=4, compound_id=2, driver_number=44, position=1)}
Engine now has 4 facts

=== LAP TIME PREDICTION ===
Lap time facts declared: {'telemetry': TelemetryFact(driver_number=44, lap_time=80.3, predicted_lap_time=79.9, compound_id=2, tire_age=4, position=1)}
Engine now has 5 facts

=== RADIO ANALYSIS ===
Radio fact declared: <f-5>
Engine now has 6 facts

=== ALL ENGINE FACTS ===
Fact 1: InitialFact - <f-0>
Fact 2: RaceStatusFact - <f-1>
Fact 3: DegradationFact - <f-2>
Fact 4: TelemetryFact - <f-3>
Fact 5: TelemetryFact - <f-4>
Fact 6: RadioFact - <f-5>
Libraries and fact classes loaded successfully.


---

## 2. Data Extraction and Processing

### 2.1 Loading session and lap data

In [3]:
# ------------------------------------------------------------------------------------
# LOAD SESSION DATA
# ------------------------------------------------------------------------------------
# Loading the session data for the Spanish Grand Prix 2023 (Race)
print("Loading the Spanish Grand Prix 2023 session data...")
race = fastf1.get_session(2023, 'Spain', 'R')
# The .load() method fetches basic session details, including lap times and more
race.load()

# ------------------------------------------------------------------------------------
# LOAD LAP DATA
# ------------------------------------------------------------------------------------
# Retrieving lap data from the session. This is automatically loaded when calling race.load()
print("Loading lap data...")
laps_data = race.laps
# Print the number of lap entries loaded
print(f"Loaded {len(laps_data)} lap entries")

# Note:
# We no longer use race.api.timing_data because all necessary timing data is now available directly from laps_data.

# ------------------------------------------------------------------------------------
# DEFINE FUNCTION: GET GAP AT LAP COMPLETION
# ------------------------------------------------------------------------------------
def get_gap_at_lap_completion(driver, lap_number, laps_data):
    """
    Calculates the gap to the leader for a given driver at the moment they complete a lap.
    It is computed by comparing the lap completion time ('Time') of the driver with the leader's
    completion time (minimum 'Time' of that lap).

    Args:
        driver (str): Abbreviation for the driver.
        lap_number (int): The lap number to analyze.
        laps_data (DataFrame): The lap data from FastF1.

    Returns:
        float or None: The gap to the leader in seconds, or None if the information is missing.
    """
    # Filter the lap data for the specific driver and lap number
    driver_lap = laps_data[(laps_data['Driver'] == driver) & (laps_data['LapNumber'] == lap_number)]
    if driver_lap.empty:
        # If no data is found for the given driver and lap, return None
        return None

    # Get the completion time of this lap for the driver
    driver_finish_time = driver_lap.iloc[0]['Time']
    if pd.isna(driver_finish_time):
        # A missing timestamp (NaT) would otherwise propagate as NaN instead of None
        return None

    # For the specified lap, the leader is the driver with the earliest completion time
    lap_group = laps_data[laps_data['LapNumber'] == lap_number]
    leader_time = lap_group['Time'].min()

    # Compute the gap by calculating the time difference between the driver's finish time and the leader's finish time
    gap_to_leader = driver_finish_time - leader_time
    # Convert the gap to seconds if gap_to_leader is a timedelta object
    if hasattr(gap_to_leader, 'total_seconds'):
        gap_to_leader = gap_to_leader.total_seconds()
    return gap_to_leader



core           INFO 	Loading data for Spanish Grand Prix - Race [v3.1.6]
INFO:fastf1.fastf1.core:Loading data for Spanish Grand Prix - Race [v3.1.6]
req            INFO 	Using cached data for session_info
INFO:fastf1.fastf1.req:Using cached data for session_info
req            INFO 	Using cached data for driver_info
INFO:fastf1.fastf1.req:Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
INFO:fastf1.fastf1.req:Using cached data for session_status_data
req            INFO 	Using cached data for lap_count
INFO:fastf1.fastf1.req:Using cached data for lap_count
req            INFO 	Using cached data for track_status_data
INFO:fastf1.fastf1.req:Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
INFO:fastf1.fastf1.req:Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
INFO:fastf1.fastf1.req:Using cached data for timing_app_data
core         

Loading the Spanish Grand Prix 2023 session data...


req            INFO 	Using cached data for car_data
INFO:fastf1.fastf1.req:Using cached data for car_data
req            INFO 	Using cached data for position_data
INFO:fastf1.fastf1.req:Using cached data for position_data
req            INFO 	Using cached data for weather_data
INFO:fastf1.fastf1.req:Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
INFO:fastf1.fastf1.req:Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '44', '63', '11', '55', '18', '14', '31', '24', '10', '16', '22', '81', '21', '27', '23', '4', '20', '77', '2']
INFO:fastf1.fastf1.core:Finished loading data for 20 drivers: ['1', '44', '63', '11', '55', '18', '14', '31', '24', '10', '16', '22', '81', '21', '27', '23', '4', '20', '77', '2']


Loading lap data...
Loaded 1312 lap entries
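
To make the calculation in `get_gap_at_lap_completion` concrete, here is a small standalone sketch with invented lap-completion timestamps. It mirrors the same idea (subtract the earliest completion time of the lap from each driver's completion time) using only the standard library; the driver codes and times are made up for illustration.

```python
from datetime import timedelta

# Invented session timestamps at which each driver completed the same lap
lap_completion = {
    "VER": timedelta(minutes=5, seconds=10.000),
    "HAM": timedelta(minutes=5, seconds=12.350),
    "SAI": timedelta(minutes=5, seconds=11.200),
}

# The leader is whoever completed the lap first
leader_time = min(lap_completion.values())

# Gap to leader in seconds for every driver
gaps = {
    driver: (finish - leader_time).total_seconds()
    for driver, finish in lap_completion.items()
}

for driver, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{driver}: +{gap:.3f} s")
```

One caveat of this approach: a lapped car completes lap N much later than the leader did, so its gap reflects elapsed time for the same lap number rather than its distance to the leader on track at that moment.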


---

### 2.2 Calculate all gaps

In [4]:
# ------------------------------------------------------------------------------------
# DEFINE FUNCTION: CALCULATE ALL GAPS
# ------------------------------------------------------------------------------------
def calculate_all_gaps(laps_data):
    """
    Calculates the gaps for all drivers at the completion of each lap. It includes:
    - The gap to the leader (difference in 'Time' at lap completion)
    - The gap relative to the car directly ahead (difference between consecutive gaps)
    - The gap relative to the car directly behind
    - Flags for undercut window (<2.5 seconds) and DRS window (<1.0 second)

    Args:
        laps_data (DataFrame): Lap data provided by FastF1.

    Returns:
        DataFrame: Processed gap data with all calculated metrics.
    """
    gap_results = []  # This list will store the results for each lap and driver

    # Get the unique drivers and sorted lap numbers from the lap data
    drivers = laps_data['Driver'].unique()
    lap_numbers = sorted(laps_data['LapNumber'].unique())

    print(f"Processing gaps for {len(drivers)} drivers across {len(lap_numbers)} laps...")

    # Process each lap one by one
    for lap_num in lap_numbers:
        print(f"Processing lap {lap_num}...", end='\r')
        
        # Create a dictionary to store the computed gap to the leader for each driver on this lap
        lap_positions = {}
        for driver in drivers:
            gap_to_leader = get_gap_at_lap_completion(driver, lap_num, laps_data)
            if gap_to_leader is not None:
                # Store the gap only if data exists for this driver on this lap
                lap_positions[driver] = gap_to_leader

        # Order the drivers by their gap to the leader. The leader will have a gap of 0.
        sorted_drivers = sorted(lap_positions.items(), key=lambda x: x[1])

        # For each driver, calculate additional gap metrics relative to the car ahead and behind
        for i, (driver, gap_to_leader) in enumerate(sorted_drivers):
            # Retrieve the driver's car number from the lap data
            driver_info = laps_data[laps_data['Driver'] == driver].iloc[0]
            driver_number = driver_info['DriverNumber']

            # Initialize default values for gaps and adjacent car information
            gap_ahead = None
            car_ahead = None
            gap_behind = None
            car_behind = None

            # If this is not the leader, calculate the gap relative to the car ahead
            if i > 0:
                car_ahead = sorted_drivers[i-1][0]
                # The gap to the car ahead is the difference between the current driver's gap and the one immediately ahead
                gap_ahead = gap_to_leader - sorted_drivers[i-1][1]
                car_ahead_info = laps_data[laps_data['Driver'] == car_ahead].iloc[0]
                car_ahead_number = car_ahead_info['DriverNumber']
            else:
                car_ahead_number = None  # No car ahead for the leader

            # If this is not the last driver, calculate the gap to the car behind
            if i < len(sorted_drivers) - 1:
                car_behind = sorted_drivers[i+1][0]
                gap_behind = sorted_drivers[i+1][1] - gap_to_leader
                car_behind_info = laps_data[laps_data['Driver'] == car_behind].iloc[0]
                car_behind_number = car_behind_info['DriverNumber']
            else:
                car_behind_number = None  # No car behind for the last driver

            # Append the computed data for this driver and lap into the results list
            gap_results.append({
                'LapNumber': lap_num,
                'Driver': driver,
                'DriverNumber': driver_number,
                'Position': i + 1,  # Position is 1-indexed (first is 1, not 0)
                'GapToLeader': gap_to_leader,
                'CarAhead': car_ahead,
                'CarAheadNumber': car_ahead_number,
                'GapToCarAhead': gap_ahead,
                'CarBehind': car_behind,
                'CarBehindNumber': car_behind_number,
                'GapToCarBehind': gap_behind,
                'InUndercutWindow': gap_ahead is not None and gap_ahead < 2.5,  # True if gap ahead is less than 2.5 seconds
                'InDRSWindow': gap_ahead is not None and gap_ahead < 1.0       # True if gap ahead is less than 1.0 seconds
            })
            
    print("\nProcessing complete!")
    # Convert the list of gap results into a Pandas DataFrame and return it
    return pd.DataFrame(gap_results)
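
The nested loops above filter the full DataFrame once per driver and lap, which is easy to follow but does a lot of repeated work. A possible vectorized alternative is sketched below; it is not the implementation used in this notebook, it only covers the gap and position columns, and `gaps_vectorized` is a hypothetical name. It assumes the input has `LapNumber`, `Driver`, and a timedelta `Time` column, as FastF1 lap data does.

```python
import pandas as pd


def gaps_vectorized(laps: pd.DataFrame) -> pd.DataFrame:
    """Compute per-lap gaps with groupby operations instead of Python loops."""
    df = laps[["LapNumber", "Driver", "Time"]].dropna(subset=["Time"]).copy()

    # Gap to leader: own completion time minus the earliest one on that lap
    leader_time = df.groupby("LapNumber")["Time"].transform("min")
    df["GapToLeader"] = (df["Time"] - leader_time).dt.total_seconds()

    # Order each lap by gap so that neighbouring rows are neighbouring cars
    df = df.sort_values(["LapNumber", "GapToLeader"]).reset_index(drop=True)
    df["Position"] = df.groupby("LapNumber").cumcount() + 1

    # Difference to the previous row (car ahead) and to the next row (car behind)
    df["GapToCarAhead"] = df.groupby("LapNumber")["GapToLeader"].diff()
    df["GapToCarBehind"] = -df.groupby("LapNumber")["GapToLeader"].diff(-1)
    return df


# Toy data with invented times, in seconds of session time
toy_laps = pd.DataFrame({
    "LapNumber": [1, 1, 1, 2, 2],
    "Driver": ["VER", "HAM", "SAI", "VER", "HAM"],
    "Time": pd.to_timedelta([90.0, 92.5, 91.0, 180.0, 181.5], unit="s"),
})
toy_gaps = gaps_vectorized(toy_laps)
print(toy_gaps[["LapNumber", "Driver", "Position", "GapToCarAhead", "GapToCarBehind"]])
```

The leader's `GapToCarAhead` and the last car's `GapToCarBehind` come out as `NaN` here, matching the `None` values produced by the loop version.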




---

### 2.3 Calculate gaps using only lap data

In [5]:
# ------------------------------------------------------------------------------------
# CALCULATE GAPS USING ONLY LAP DATA
# ------------------------------------------------------------------------------------
print("Calculating gaps at lap completion points...")
all_gaps_df = calculate_all_gaps(laps_data)

# ------------------------------------------------------------------------------------
# ADD ADDITIONAL DRIVER INFORMATION
# ------------------------------------------------------------------------------------
# Create a mapping for each driver to include additional information (e.g., car number, team),
# taken from the first lap entry of each driver
driver_info = (
    laps_data.drop_duplicates('Driver')
    .set_index('Driver')[['DriverNumber', 'Team']]
    .to_dict('index')
)

# Map the 'Team' information to the gap DataFrame using the driver mapping
all_gaps_df['Team'] = all_gaps_df['Driver'].map(lambda x: driver_info.get(x, {}).get('Team', 'Unknown'))

Calculating gaps at lap completion points...
Processing gaps for 20 drivers across 66 laps...
Processing lap 66.0...
Processing complete!


---

### 2.4 Basic statistics and csv output

In [6]:
# ------------------------------------------------------------------------------------
# BASIC STATISTICS AND CSV OUTPUT FOR GAP DATA
# ------------------------------------------------------------------------------------

# Print basic statistics regarding the gap data
print("\nBasic statistics on gap data:")

# Total number of gap records in the DataFrame
print(f"Total gap records: {len(all_gaps_df)}")
# The number of unique laps covered in the DataFrame
print(f"Number of laps covered: {all_gaps_df['LapNumber'].nunique()}")
# The number of unique drivers in the DataFrame
print(f"Number of drivers: {all_gaps_df['Driver'].nunique()}")

# ------------------------------------------------------------------------------------
# CALCULATE PERCENTAGE STATISTICS FOR STRATEGIC WINDOWS
# ------------------------------------------------------------------------------------
# Calculate the percentage of records where the gap ahead is less than 2.5 seconds,
# which could indicate an undercut opportunity
undercut_pct = all_gaps_df['InUndercutWindow'].mean() * 100

# Calculate the percentage of records where the gap ahead is less than 1.0 second,
# which may indicate eligibility for DRS
drs_pct = all_gaps_df['InDRSWindow'].mean() * 100

# Print the calculated percentages with one decimal place
print(f"\nPercentage of gaps in undercut window (<2.5s): {undercut_pct:.1f}%")
print(f"Percentage of gaps in DRS window (<1.0s): {drs_pct:.1f}%")

# ------------------------------------------------------------------------------------
# SUMMARY STATISTICS FOR THE GAP TO THE CAR AHEAD
# ------------------------------------------------------------------------------------
# Print summary statistics (e.g., count, mean, std, etc.) for the 'GapToCarAhead' column
print("\nSummary statistics for Gap To Car Ahead (seconds):")
print(all_gaps_df['GapToCarAhead'].describe())

# Count missing values per column (expected for the leader's car-ahead fields
# and the last car's car-behind fields on every lap)
missing_counts = all_gaps_df.isnull().sum()
print(missing_counts)



Basic statistics on gap data:
Total gap records: 1312
Number of laps covered: 66
Number of drivers: 20

Percentage of gaps in undercut window (<2.5s): 44.8%
Percentage of gaps in DRS window (<1.0s): 16.5%

Summary statistics for Gap To Car Ahead (seconds):
count    1246.000000
mean        4.420835
std         4.707452
min         0.093000
25%         1.331250
50%         2.751000
75%         5.667250
max        39.657000
Name: GapToCarAhead, dtype: float64
LapNumber            0
Driver               0
DriverNumber         0
Position             0
GapToLeader          0
CarAhead            66
CarAheadNumber      66
GapToCarAhead       66
CarBehind           66
CarBehindNumber     66
GapToCarBehind      66
InUndercutWindow     0
InDRSWindow          0
Team                 0
dtype: int64
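
The output above reports a median gap to the car ahead of 2.751 s, with 44.8% of records inside the 2.5 s window, so the 2.5 s threshold sits close to the middle of the distribution for this race. One way to sanity-check a threshold against a gap distribution is sketched below with the standard library. The sample values are invented, not taken from the Spanish GP data, and the variable names are hypothetical.

```python
from statistics import quantiles

# Invented sample of gaps to the car ahead, in seconds
gaps_ahead_s = [0.4, 0.8, 1.1, 1.6, 2.3, 2.9, 3.5, 5.0, 7.2, 12.0]
candidate_threshold_s = 2.5

# Quartiles of the sample (the "inclusive" method treats the data as the full population)
q1, median, q3 = quantiles(gaps_ahead_s, n=4, method="inclusive")

# Share of records that a rule using this threshold would consider
share_in_window = sum(g < candidate_threshold_s for g in gaps_ahead_s) / len(gaps_ahead_s)

print(f"Q1={q1:.3f} s, median={median:.3f} s, Q3={q3:.3f} s")
print(f"Share below {candidate_threshold_s} s: {share_in_window:.0%}")
```

A threshold that captures nearly every record would make a rule fire almost constantly, while one far below the first quartile would rarely fire; comparing it with the quartiles gives a quick feel for how selective it is.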


---

### 2.5 Filling missing values

In [7]:
# ------------------------------------------------------------------------------------
# FILLING MISSING VALUES FOR LEADER AND LAST POSITION
# ------------------------------------------------------------------------------------
# For the leader, there is no car ahead; therefore, fill the missing values with 'Leader'
all_gaps_df['CarAhead'] = all_gaps_df['CarAhead'].fillna('Leader')
all_gaps_df['CarAheadNumber'] = all_gaps_df['CarAheadNumber'].fillna('Leader')
# Fill the missing gap value with 0 seconds as a placeholder. This is not a real gap:
# the window flags were computed before this fill, so the leader stays flagged False,
# but any later check on GapToCarAhead should also test CarAhead != 'Leader'
all_gaps_df['GapToCarAhead'] = all_gaps_df['GapToCarAhead'].fillna(0)

# For the last position, there is no car behind; fill the missing values with 'Tail'
all_gaps_df['CarBehind'] = all_gaps_df['CarBehind'].fillna('Tail')
all_gaps_df['CarBehindNumber'] = all_gaps_df['CarBehindNumber'].fillna('Tail')
# Similarly, fill the missing gap for the car behind with a 0-second placeholder
# (check CarBehind != 'Tail' before treating it as a real gap)
all_gaps_df['GapToCarBehind'] = all_gaps_df['GapToCarBehind'].fillna(0)

# ------------------------------------------------------------------------------------
# EXPORT THE PROCESSED DATA TO A CSV FILE
# ------------------------------------------------------------------------------------
# Save the processed gap data to a CSV file
# The 'float_format' argument ensures all floats are formatted to 3 decimal places
all_gaps_df.to_csv("../../f1-strategy/data/processed/gaps_spain_data.csv", float_format="%.3f")

---

## Gap Data Processing: Summary and Next Steps

### What We've Accomplished

- **Data Extraction**
  - Loaded 2023 Spanish Grand Prix lap data with FastF1.
  - Implemented precise gap calculation at each lap completion.
  - Computed key metrics: gap to leader, gap to car ahead, and gap to car behind.

- **Strategic Metrics**
  - Added strategic flags: undercut window (gap < 2.5s) and DRS window (gap < 1.0s).
  - Calculated percentages for strategic opportunities.
  - Generated comprehensive statistics on gap distributions.

- **Data Cleaning**
  - Identified and filled missing values with appropriate placeholders.
  - Marked the leader (no car ahead) and the last position (no car behind).
  - Created a clean dataset ready for rule implementation.

- **Data Persistence**
  - Saved the processed gap data to CSV with consistent formatting.
  - Ensured the data is structured for integration with our expert system.

### Next Steps

- **Gap-Based Rule Implementation**
  - Define an `F1GapRules` class inheriting from `F1StrategyEngine` (similar to the approach in N04 for radio communication rules).
  - Implement four key strategic rules:
    - **Undercut Opportunity Rule:** Recommend early pit stops when following closely (e.g., `GapToCarAhead < 2.5s` along with favorable conditions like tire age and degradation).
    - **Defensive Pit Stop Rule:** Recommend defensive actions when being closely followed (e.g., when `GapToCarBehind < 2.5s` and the gap is decreasing).
    - **Overcut Strategy Rule:** Suggest staying out longer in clean air when leading cars pit.
    - **Traffic Management Rule:** Use gap data to predict post-pit positions and avoid pitting into traffic.

- **Testing and Integration**
  - Create a testing framework (similar to `test_radio_rules` in N04) to validate each rule using processed gap data.
  - Combine these rules with tire degradation (N02), lap time (N03), and radio rules (N04).
  - Implement conflict resolution and adjust priorities based on the reliability and importance of different data sources.

These gap-based rules will complete our strategic decision support system by addressing the critical aspect of relative positioning and the strategic opportunities it creates during races.
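
Before wiring these conditions into Experta rules, it can help to write them as plain Python predicates and try them on a few rows. The sketch below does that for three of the four rules. It is not the `F1GapRules` class: `GapRow`, the function names, the 22 s pit loss, and the 2 s traffic window are all assumptions made for illustration, and only the 2.5 s threshold and the `'Leader'`/`'Tail'` placeholders come from this notebook.

```python
from dataclasses import dataclass

UNDERCUT_WINDOW_S = 2.5  # threshold used for InUndercutWindow in this notebook
PIT_LOSS_S = 22.0        # assumed value inside the 20-23 s range quoted earlier


@dataclass
class GapRow:
    """One row of the processed gap data (hypothetical container)."""
    driver: str
    gap_to_leader: float
    car_ahead: str
    gap_ahead: float
    car_behind: str
    gap_behind: float


def undercut_candidate(row):
    # The 'Leader' placeholder carries a filled gap of 0, so exclude it explicitly
    return row.car_ahead != "Leader" and row.gap_ahead < UNDERCUT_WINDOW_S


def defensive_candidate(row, previous_gap_behind):
    # A close pursuer whose gap shrank since the previous lap
    return (
        row.car_behind != "Tail"
        and row.gap_behind < UNDERCUT_WINDOW_S
        and row.gap_behind < previous_gap_behind
    )


def cars_near_pit_exit(row, field_gaps, pit_loss_s=PIT_LOSS_S, window_s=2.0):
    # Estimated gap to the leader right after the stop
    rejoin_gap = row.gap_to_leader + pit_loss_s
    return [
        driver
        for driver, gap in field_gaps.items()
        if driver != row.driver and abs(gap - rejoin_gap) <= window_s
    ]


# Invented snapshot of one lap
leader = GapRow("VER", 0.0, "Leader", 0.0, "HAM", 5.0)
second = GapRow("HAM", 5.0, "VER", 5.0, "SAI", 1.8)
third = GapRow("SAI", 6.8, "HAM", 1.8, "ALO", 19.7)
field_gaps = {"VER": 0.0, "HAM": 5.0, "SAI": 6.8, "ALO": 26.5, "STR": 40.0}

print("SAI undercut candidate:", undercut_candidate(third))
print("HAM should defend:", defensive_candidate(second, previous_gap_behind=2.4))
print("Traffic if HAM pits now:", cars_near_pit_exit(second, field_gaps))
```

In the rule engine, the same conditions would be combined with tire age, degradation, and race status facts rather than evaluated in isolation.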


---

## 3. Defining the gap rules