# N05: Gap-Based Strategy Rules for F1

## 1. Introduction and Objective


Gap analysis is a critical component of Formula 1 race strategy. The time difference between cars (measured in seconds) directly influences strategic decisions like undercuts, overcuts, and defensive pit stops. This notebook implements rules that use these gaps to make strategic recommendations.


### Why Gap Analysis Matters

* **Undercut Opportunities**: When a car is close behind another (~1-2s), pitting earlier might allow overtaking through fresher tires
* **Overcut Potential**: Sometimes staying out longer works better, especially when the leading car has clean air
* **Defensive Strategy**: Teams must react when competitors attempt undercuts
* **Traffic Management**: Gaps determine whether a car will exit the pits into free air or traffic

### Our Objectives

* Extract and process gap data from the OpenF1 API (replacing our previous computer vision approach)
* Analyze gap distributions to establish meaningful thresholds for strategic decisions
* Implement four key rules:

1. **Undercut Opportunity Rule**: Identify when an undercut might succeed
2. **Defensive Pit Stop Rule**: Protect position against undercut attempts
3. **Overcut Strategy Rule**: Recognize when staying out longer is advantageous
4. **Traffic Management Rule**: Avoid exiting pit stops into traffic



These rules will complement our existing components (tire degradation, lap time analysis, and radio communication) to create a comprehensive F1 strategy system.

---

## What is an undercut or an overcut?

This section explains both concepts, supported by two posters generated with ChatGPT (they contain some spelling mistakes but still illustrate the ideas well).

<p float="left">
  <img src="images/undercut_poster.png" alt="Undercut Image" width="40%" style="margin-right: 10px;">
  <img src="images/overcut_poster.png" alt="Overcut Image" width="40%">
</p>


### Undercut

The F1 undercut is a strategic maneuver that can change the course of a race. When the chasing car is within about three seconds of the car ahead, the chaser may choose to enter the pit lane early to switch to fresh tires.

Although this pit stop costs roughly 20–23 seconds, the significant performance boost from the fresh tires allows the chaser to lap much faster than the car ahead, which is still running on worn tires. By the time the car ahead makes its own pit stop, the chaser has gained enough time to come out in front without having to overtake on track.

This short-term benefit, the extra pace from better tire grip, is critical on circuits with high tire degradation, such as Barcelona.
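
The timing argument can be made concrete with a minimal sketch. It assumes that both cars lose the same time in the pit lane (so the pit loss cancels out) and that the fresh-tire advantage is constant per lap. The function name and the numbers are hypothetical and purely illustrative; they are not part of the rule engine built in this notebook.

```python
def undercut_margin(gap_s, fresh_tire_gain_s_per_lap, laps_until_rival_pits):
    """Return the chaser's margin (s) over the rival once both have pitted.

    Positive: the chaser comes out ahead. Negative: the undercut falls short.
    Assumes equal pit-lane time loss for both cars, so it cancels out.
    """
    time_gained = fresh_tire_gain_s_per_lap * laps_until_rival_pits
    return time_gained - gap_s


# Illustrative numbers only: 1.5 s behind, 1.0 s/lap faster on new tires,
# rival responds two laps later.
margin = undercut_margin(gap_s=1.5, fresh_tire_gain_s_per_lap=1.0, laps_until_rival_pits=2)
print(f"Margin after both stops: {margin:+.1f} s")
```

Under these assumptions, the smaller the starting gap and the slower the rival's reaction, the more likely the margin is positive.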


### Overcut

The overcut pursues the same goal with the opposite approach. Rather than pitting early, a driver using the overcut strategy remains on track with worn tires for longer while the competitors pit.

If the driver can maintain competitive lap times despite the older rubber, or if the competitors’ new tires take extra time to warm up, the longer stint can yield a time advantage. The overcutting driver then pits later and, thanks to the pace achieved in clean air, can rejoin ahead of the rival.
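
As a rough sketch of the same reasoning, the overcut can be evaluated by comparing the lap times of both cars over the laps in which one stays out and the other is on its new tires. The function and the lap times below are hypothetical; equal pit-lane loss for both cars is assumed.

```python
def overcut_margin(gap_ahead_s, stay_out_lap_times, rival_lap_times):
    """Margin (s) of the car that stays out, once both cars have pitted.

    gap_ahead_s: how far the stay-out car was behind the rival before the rival pitted.
    stay_out_lap_times: lap times on worn tires in clean air while staying out.
    rival_lap_times: rival's lap times over the same laps (warm-up included,
        pit-lane loss excluded because it is assumed equal for both cars).
    """
    if len(stay_out_lap_times) != len(rival_lap_times):
        raise ValueError("Both cars must be compared over the same laps")
    time_gained = sum(rival_lap_times) - sum(stay_out_lap_times)
    return time_gained - gap_ahead_s


# Illustrative numbers only: the rival loses time warming up its tires on the first lap.
margin = overcut_margin(1.0, stay_out_lap_times=[80.5, 80.6], rival_lap_times=[82.0, 80.4])
print(f"Overcut margin: {margin:+.1f} s")
```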




In [1]:
__author__ = "Víctor Vega Sobral"

---

In [2]:
# Standard data processing libraries
import pandas as pd              # For data manipulation and analysis
import numpy as np               # For numerical operations
import matplotlib.pyplot as plt  # For creating visualizations
import seaborn as sns            # For enhanced visualizations
from datetime import datetime    # For timestamp handling
import os                        # For operating system interactions
import sys                       # For system-specific parameters and functions
import requests                  # For HTTP requests to the OpenF1 API
# Add parent directory to system path to make custom modules accessible
sys.path.append(os.path.abspath('../'))



# Import Experta components for building the rule engine
from experta import Rule, NOT, OR, AND, AS, MATCH, TEST, EXISTS  # Rule definition components
from experta import DefFacts, Fact, Field, KnowledgeEngine      # Core Experta classes


import utils.N01_agent_setup as agent_setup
from utils.N01_agent_setup import (
        TelemetryFact,              # For storing car performance data
        DegradationFact,            # For storing tire degradation information
        GapFact,                    # For storing gap information
        RaceStatusFact,             # For storing current race conditions
        StrategyRecommendation,     # For storing strategy recommendations
        F1StrategyEngine,           # Base engine class
        transform_gap_data,         # Function to convert gap data to facts
        load_gap_data               # Function to load gap data from FastF1
)


# Configure plotting style for better visualization
plt.style.use('seaborn-v0_8-darkgrid')  # Set default plot style
sns.set_context("notebook", font_scale=1.2)  # Increase font size slightly

print("Libraries and fact classes loaded successfully.")

Engine initialized with 2 facts
Initial facts: [InitialFact(), RaceStatusFact(lap=1, total_laps=60, race_phase='start', track_status='clear')]

=== TIRE DEGRADATION ANALYSIS ===
Using first predicted rate as current degradation: 0.07
Tire facts declared: {'degradation': DegradationFact(degradation_rate=0.07, predicted_rates=frozenlist([0.07, 0.09, 0.12])), 'telemetry': TelemetryFact(tire_age=4, compound_id=2, driver_number=44, position=1)}
Engine now has 4 facts

=== LAP TIME PREDICTION ===
Lap time facts declared: {'telemetry': TelemetryFact(driver_number=44, lap_time=80.3, predicted_lap_time=79.9, compound_id=2, tire_age=4, position=1)}
Engine now has 5 facts

=== RADIO ANALYSIS ===
Radio fact declared: <f-5>
Engine now has 6 facts

=== ALL ENGINE FACTS ===
Fact 1: InitialFact - <f-0>
Fact 2: RaceStatusFact - <f-1>
Fact 3: DegradationFact - <f-2>
Fact 4: TelemetryFact - <f-3>
Fact 5: TelemetryFact - <f-4>
Fact 6: RadioFact - <f-5>
Libraries and fact classes loaded successfully.


---

## 2. Data Extraction and Processing

### 2.1 Extracting Basic Gaps

In [3]:
def get_session_key(year, gp_name):
    """
    Get the session_key for the specified race to use with OpenF1 API.
    
    Args:
        year (int): Year of the race
        gp_name (str): Name of the Grand Prix
        
    Returns:
        int: session_key for OpenF1 API
    """
    # For Spain 2023 GP, the correct session_key is 9102
    if year == 2023 and gp_name.lower() == "spain":
        return 9102
    
    # For other races, we would need to implement a lookup or API call
    # to find the correct session_key
    print("⚠️ Only Spain 2023 session_key is implemented")
    print("⚠️ For other races, you need to find the correct session_key")
    return None

def extract_basic_gaps(year, gp_name, max_interval=None):
    """
    Extract gap data between cars from OpenF1 API.
    
    Args:
        year (int): Year of the race
        gp_name (str): Name of the Grand Prix
        max_interval (float, optional): Filter for intervals less than this value
        
    Returns:
        DataFrame: Basic gap data between cars
    """
    base_url = "https://api.openf1.org/v1/intervals"
    
    # Get the session_key for this race
    session_key = get_session_key(year, gp_name)
    if session_key is None:
        print("❌ Could not get session_key for this race")
        return pd.DataFrame()
    
    print(f"Fetching data from OpenF1 for {gp_name} {year} (session_key: {session_key})")
    
    # Build the URL with optional filtering for interval size
    if max_interval is not None:
        url = f"{base_url}?session_key={session_key}&interval<{max_interval}"
    else:
        url = f"{base_url}?session_key={session_key}"
    
    try:
        print("Making request to OpenF1 API...")
        response = requests.get(url, timeout=30)  # Avoid hanging indefinitely on a stalled connection
        response.raise_for_status()  # Raise exception for HTTP errors
        
        # Check if we received valid data
        if response.text and response.text.strip():
            intervals_data = response.json()
            print(f"✓ Found {len(intervals_data)} records for session_key={session_key}")
            
            # Convert to DataFrame
            gaps_df = pd.DataFrame(intervals_data)
            
            # Basic data processing
            if 'date' in gaps_df.columns:
                # Handle date conversion
                gaps_df['date'] = pd.to_datetime(gaps_df['date'], errors='coerce')
            
            # Ensure consistent column naming
            if 'interval' in gaps_df.columns:
                gaps_df.rename(columns={'interval': 'GapToCarAhead'}, inplace=True)
            
            # Extract driver numbers for consistency with our existing code
            if 'driver_number' in gaps_df.columns:
                gaps_df.rename(columns={'driver_number': 'DriverNumber'}, inplace=True)
            
            if 'driver_ahead' in gaps_df.columns:
                gaps_df.rename(columns={'driver_ahead': 'CarAheadNumber'}, inplace=True)
            
            return gaps_df
        else:
            print("Empty response from API")
            return pd.DataFrame()
    
    except Exception as e:
        print(f"Error querying OpenF1: {e}")
        return pd.DataFrame()
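
One detail worth guarding against: to my understanding, the OpenF1 interval fields are not always numeric (lapped cars may be reported as text such as `+1 LAP`, and the leader as null). The sketch below is a hypothetical helper, not part of the original pipeline, that coerces the gap columns to float seconds so the later comparisons against thresholds behave predictably. The sample rows are made up.

```python
import pandas as pd


def coerce_gap_columns(gaps_df, columns=("GapToCarAhead", "gap_to_leader")):
    """Return a copy with the gap columns converted to float seconds.

    Non-numeric entries (for example lapped-car markers) become NaN.
    """
    result = gaps_df.copy()
    for col in columns:
        if col in result.columns:
            result[col] = pd.to_numeric(result[col], errors="coerce")
    return result


# Made-up rows mixing a null, a number and a text marker
sample = pd.DataFrame({
    "DriverNumber": [1, 44, 2],
    "GapToCarAhead": [None, 1.25, "+1 LAP"],
    "gap_to_leader": [None, 1.25, "+1 LAP"],
})
clean = coerce_gap_columns(sample)
print(clean.dtypes)
```

Applying such a step right after building the DataFrame would keep text markers from leaking into the trend and window calculations.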

---

### 2.2 Calculating Gap Trends

In [4]:
def calculate_gap_trends(gaps_df, window_size=3):
    """
    Calculate how gaps are evolving over time (opening or closing).
    
    Args:
        gaps_df: DataFrame with basic gap data
        window_size: Number of data points (samples, not laps) to calculate trend over
        
    Returns:
        DataFrame: Gap data with trend information added
    """
    # Make a copy to avoid modifying the original
    result_df = gaps_df.copy()
    
    # Add columns for trend data
    result_df['GapToCarAheadTrend'] = np.nan
    
    # Group by driver and sort by date to calculate trends over time
    for driver in result_df['DriverNumber'].unique():
        # Get all data for this driver
        driver_data = result_df[result_df['DriverNumber'] == driver].sort_values('date')
        
        if len(driver_data) <= window_size:
            continue  # Not enough data points
        
        # For each data point, calculate trend (change in gap over time window)
        for i in range(window_size, len(driver_data)):
            current_idx = driver_data.iloc[i].name
            previous_idx = driver_data.iloc[i - window_size].name
            
            # Gap trend (positive = gap increasing, negative = gap decreasing)
            current_gap = driver_data.iloc[i]['GapToCarAhead']
            previous_gap = driver_data.iloc[i - window_size]['GapToCarAhead']
            
            if not pd.isna(current_gap) and not pd.isna(previous_gap):
                # Calculate average change per data point in the window
                result_df.loc[current_idx, 'GapToCarAheadTrend'] = (current_gap - previous_gap) / window_size
    
    # Note: Here we're not calculating GapToCarBehindTrend since OpenF1 data 
    # only provides gaps to car ahead
    
    return result_df
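
With roughly 18,000 records, the row-by-row loop can be slow. A possible vectorized alternative, sketched here under the assumption that the DataFrame has a unique index, uses a per-driver `diff` over `window_size` samples. The function name and demo data are hypothetical.

```python
import pandas as pd


def gap_trend_vectorized(gaps_df, window_size=3):
    """Per-driver change in gap per sample over `window_size` samples.

    Assumes a unique index so the result aligns back onto the input rows.
    """
    result = gaps_df.copy()
    ordered = result.sort_values("date", kind="stable")
    # diff(window_size) compares each sample with the one window_size rows earlier
    result["GapToCarAheadTrend"] = (
        ordered.groupby("DriverNumber")["GapToCarAhead"].diff(window_size) / window_size
    )
    return result


# Made-up samples: driver 44 closes from 2.0 s to 1.4 s over three steps
start = pd.Timestamp("2023-06-04 13:00:00")
demo = pd.DataFrame({
    "DriverNumber": [44, 44, 44, 44, 1, 1],
    "date": start + pd.to_timedelta([0, 4, 8, 12, 0, 4], unit="s"),
    "GapToCarAhead": [2.0, 1.8, 1.7, 1.4, 0.5, 0.6],
})
trend = gap_trend_vectorized(demo, window_size=3)
print(trend[["DriverNumber", "GapToCarAhead", "GapToCarAheadTrend"]])
```

As in the loop version, a negative value means the gap is closing, and drivers with too few samples keep `NaN`.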

---

### 2.3 Identifying Strategic Windows

In [5]:
def identify_strategic_windows(gaps_df):
    """
    Identify undercut and DRS windows based on gap sizes.
    
    Args:
        gaps_df: DataFrame with gap data
        
    Returns:
        DataFrame: Gap data with strategic window flags
    """
    # Make a copy to avoid modifying the original
    result_df = gaps_df.copy()
    
    # Add columns for strategic windows
    result_df['InUndercutWindow'] = False
    result_df['InDRSWindow'] = False
    
    # Define thresholds (based on F1 strategy knowledge)
    UNDERCUT_THRESHOLD = 3.0  # Gap less than 3.0s indicates undercut possibility
    DRS_THRESHOLD = 1.0       # Gap less than 1.0s enables DRS
    
    # Mark rows where car is within undercut window of car ahead
    mask = (result_df['GapToCarAhead'] < UNDERCUT_THRESHOLD)
    result_df.loc[mask, 'InUndercutWindow'] = True
    
    # Mark rows where car is within DRS window of car ahead
    mask = (result_df['GapToCarAhead'] < DRS_THRESHOLD)
    result_df.loc[mask, 'InDRSWindow'] = True
    
    # Add a sequential sample index per driver if no lap number is present.
    # Note: the data contains several measurements per lap, so this value counts
    # samples in time order; it is not a true race lap number.
    if 'LapNumber' not in result_df.columns and 'date' in result_df.columns:
        # Group by driver and number the samples chronologically
        for driver in result_df['DriverNumber'].unique():
            driver_data = result_df[result_df['DriverNumber'] == driver].sort_values('date')
            # Assign sequential numbers starting from 1
            result_df.loc[driver_data.index, 'LapNumber'] = range(1, len(driver_data) + 1)
    
    return result_df
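
The sequential numbering above counts samples per driver rather than race laps. If per-driver lap start timestamps are available, `pd.merge_asof` could label each sample with the lap in progress. The sketch below assumes a hypothetical `laps_df` with columns `DriverNumber`, `LapNumber` and `lap_start`; where that table comes from is left open, and the rows are made up.

```python
import pandas as pd


def attach_lap_numbers(gaps_df, laps_df):
    """Label each gap sample with the lap in progress at its timestamp."""
    # merge_asof requires both frames to be sorted on their time keys
    gaps_sorted = gaps_df.sort_values("date")
    laps_sorted = laps_df.sort_values("lap_start")
    return pd.merge_asof(
        gaps_sorted,
        laps_sorted,
        left_on="date",
        right_on="lap_start",
        by="DriverNumber",       # match only within the same driver
        direction="backward",    # most recent lap start at or before the sample
    )


t0 = pd.Timestamp("2023-06-04 13:03:00")
laps_df = pd.DataFrame({
    "DriverNumber": [44, 44, 1, 1],
    "LapNumber": [1, 2, 1, 2],
    "lap_start": t0 + pd.to_timedelta([0, 80, 0, 79], unit="s"),
})
gaps_df = pd.DataFrame({
    "DriverNumber": [44, 44, 1, 44],
    "date": t0 + pd.to_timedelta([10, 85, 78, 79], unit="s"),
    "GapToCarAhead": [1.2, 1.1, 0.8, 1.3],
})
labelled = attach_lap_numbers(gaps_df, laps_df)
print(labelled[["DriverNumber", "date", "LapNumber"]])
```

The result is ordered by timestamp, and samples recorded before a driver's first lap start would get a missing lap number.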

---

### 2.4 Load and Process Race Gaps

In [6]:
def load_process_all_gaps(year, gp_name, max_interval=None):
    """
    Load all gap data from OpenF1 without filtering by end of lap.
    """
    # Keep the current approach, but do not attempt to filter by end of lap.
    # We use all records and correlate them with laps later as needed.
    
    # Extract basic gaps data from OpenF1
    print(f"Loading {gp_name} {year} gap data from OpenF1...")
    gaps_df = extract_basic_gaps(year, gp_name, max_interval)
    
    if gaps_df.empty:
        print("❌ Failed to load gap data")
        return gaps_df
    
    # Calculate trends
    print("\nCalculating gap trends...")
    gaps_df = calculate_gap_trends(gaps_df)
    
    # Identify strategic windows
    print("\nIdentifying strategic windows...")
    gaps_df = identify_strategic_windows(gaps_df)
    
    print(f"Processing complete. Dataset has {len(gaps_df)} records with full gap analysis.")
    print("Note: This includes multiple measurements per lap that will need to be correlated")
    
    return gaps_df

---

### 2.5 Executing the Pipeline

In [7]:
# Execute the full gap data processing pipeline
# Using the 2023 Spanish Grand Prix as our example
# We filter for gaps under 5 seconds which are most strategically relevant
processed_gaps = load_process_all_gaps(2023, "Spain", max_interval=5.0)

# Display a sample of the processed data
print("\nSample of processed gap data:")
print(processed_gaps.head())



Loading Spain 2023 gap data from OpenF1...
Fetching data from OpenF1 for Spain 2023 (session_key: 9102)
Making request to OpenF1 API...
✓ Found 18281 records for session_key=9102

Calculating gap trends...

Identifying strategic windows...
Processing complete. Dataset has 18281 records with full gap analysis.
Note: This includes multiple measurements per lap that will need to be correlated

Sample of processed gap data:
  gap_to_leader  GapToCarAhead  DriverNumber                             date  \
0         0.131          0.131            55 2023-06-04 13:03:22.688000+00:00   
1         0.276          0.145             4 2023-06-04 13:03:22.797000+00:00   
2         0.347          0.071            44 2023-06-04 13:03:22.797000+00:00   
3         0.469          0.122            18 2023-06-04 13:03:22.922000+00:00   
4         0.713          0.244            31 2023-06-04 13:03:23.359000+00:00   

   session_key  meeting_key  GapToCarAheadTrend  InUndercutWindow  \
0         9102      

In [8]:
# Save the processed data to CSV for later use
output_dir = "../../f1-strategy/data/processed"
os.makedirs(output_dir, exist_ok=True)
output_path = f"{output_dir}/spain_2023_gaps_openf1.csv"
processed_gaps.to_csv(output_path, index=False, float_format='%.3f')
print(f"\nSaved processed gap data to {output_path}")




Saved processed gap data to ../../f1-strategy/data/processed/spain_2023_gaps_openf1.csv


In [9]:
# Basic statistics about the gaps
print("\nGap statistics:")
gap_stats = processed_gaps['GapToCarAhead'].describe()
print(gap_stats)

# Count strategic windows
print("\nStrategic window summary:")
print(f"Undercut windows: {processed_gaps['InUndercutWindow'].sum()} ({processed_gaps['InUndercutWindow'].mean()*100:.1f}%)")
print(f"DRS windows: {processed_gaps['InDRSWindow'].sum()} ({processed_gaps['InDRSWindow'].mean()*100:.1f}%)")


Gap statistics:
count    18281.000000
mean         2.054774
std          1.277370
min          0.001000
25%          1.006000
50%          1.788000
75%          2.976000
max          4.999000
Name: GapToCarAhead, dtype: float64

Strategic window summary:
Undercut windows: 13789 (75.4%)
DRS windows: 4543 (24.9%)
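
Because the request kept only intervals under 5 s, the percentages above are shares of that filtered subset, not of all measurements in the race. To compare candidate thresholds side by side, a small helper like the hypothetical one below could tabulate the share of samples under each value. The demo gaps are made up.

```python
import pandas as pd


def share_below_thresholds(gaps, thresholds=(1.0, 2.0, 3.0)):
    """Return the fraction of gap samples strictly below each threshold."""
    gaps = pd.Series(gaps, dtype="float64").dropna()
    return pd.Series({t: (gaps < t).mean() for t in thresholds}, name="share_below")


# Made-up gaps in seconds, including one missing value
demo_gaps = [0.4, 0.9, 1.5, 2.2, 2.9, 3.5, 4.8, None]
shares = share_below_thresholds(demo_gaps)
print(shares)
```

On the real data this would be called with `processed_gaps['GapToCarAhead']`.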
