# F1 Report System

Two-agent system powered by Gemini 2.5 Flash:
- Agent 1: Data Collection (validates input, retrieves race data via FastF1)
- Agent 2: Report Generation (creates social media content from collected data)

## 1. Install Dependencies

In [7]:
# Install required packages
%pip install -q google-cloud-aiplatform
%pip install -q fastf1
%pip install -q pandas
%pip install -q python-dotenv

Note: you may need to restart the kernel to use updated packages.


## 2. Import Libraries

In [1]:
# Standard library imports
import os
import json
from datetime import datetime
from typing import Dict, List, Optional, Any

# Third-party imports
import pandas as pd
from dotenv import load_dotenv

# FastF1 for F1 data retrieval
import fastf1

# Google Cloud Vertex AI for Gemini
import vertexai
from vertexai.generative_models import GenerativeModel, Content, Part

print("Libraries imported successfully")

Libraries imported successfully


## 3. Configuration

In [2]:
# Load environment variables
load_dotenv()

# Google Cloud configuration
PROJECT_ID = os.getenv('GCP_PROJECT_ID', 'gen-lang-client-0467867580')
LOCATION = os.getenv('GCP_LOCATION', 'us-central1')
MODEL_NAME = 'gemini-2.5-flash'

# Initialize Vertex AI
vertexai.init(project=PROJECT_ID, location=LOCATION)
print(f"Vertex AI initialized for project: {PROJECT_ID}")

# Enable FastF1 cache
cache_dir = 'f1_cache'
os.makedirs(cache_dir, exist_ok=True)
fastf1.Cache.enable_cache(cache_dir)
print("FastF1 cache enabled")

Vertex AI initialized for project: gen-lang-client-0467867580
FastF1 cache enabled


## 4. Session Memory

In [3]:
# In-memory session storage
session_memory = {
    "reports": {},
    "query_history": []
}

def store_report(race_id: str, report_data: Dict[str, Any]) -> None:
    """Store a generated report in session memory."""
    session_memory["reports"][race_id] = {
        "data": report_data,
        "timestamp": datetime.now().isoformat()
    }
    session_memory["query_history"].append({
        "race_id": race_id,
        "timestamp": datetime.now().isoformat()
    })

def get_report(race_id: str) -> Optional[Dict[str, Any]]:
    """Retrieve a report from session memory."""
    return session_memory["reports"].get(race_id)

def get_history() -> List[Dict[str, Any]]:
    """Get the query history."""
    return session_memory["query_history"]

print("Session memory initialized")

Session memory initialized
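As a minimal sketch of how these helpers might be used, the snippet below re-creates the store in a self-contained form, saves one report, and looks it up again. The `2025_R6` key and the placeholder post text are illustrative assumptions, not output from the system.

```python
from datetime import datetime
from typing import Any, Dict, Optional

# Self-contained copy of the session store so the sketch runs on its own.
session_memory: Dict[str, Any] = {"reports": {}, "query_history": []}

def store_report(race_id: str, report_data: Dict[str, Any]) -> None:
    """Store a report and log the query, mirroring the helper above."""
    now = datetime.now().isoformat()
    session_memory["reports"][race_id] = {"data": report_data, "timestamp": now}
    session_memory["query_history"].append({"race_id": race_id, "timestamp": now})

def get_report(race_id: str) -> Optional[Dict[str, Any]]:
    """Return the stored wrapper ({'data': ..., 'timestamp': ...}) or None."""
    return session_memory["reports"].get(race_id)

# Hypothetical payload; a real one would come from the report agent.
store_report("2025_R6", {"social_media_post": "placeholder text"})

cached = get_report("2025_R6")
missing = get_report("2025_R99")

print(cached["data"]["social_media_post"])   # the payload sits under the "data" key
print(missing)                               # unknown IDs yield None
print(len(session_memory["query_history"]))
```

Note that `get_report` returns the wrapper dictionary rather than the report itself, so callers read the payload from its `"data"` key. Storing the same `race_id` twice overwrites the report but still appends a new history entry.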


## 5. F1 2025 Calendar

In [4]:
# F1 2025 Calendar for validation
F1_2025_CALENDAR = {
    1: {"name": "Australian Grand Prix", "circuit": "Albert Park Circuit"},
    2: {"name": "Chinese Grand Prix", "circuit": "Shanghai International Circuit"},
    3: {"name": "Japanese Grand Prix", "circuit": "Suzuka International Racing Course"},
    4: {"name": "Bahrain Grand Prix", "circuit": "Bahrain International Circuit"},
    5: {"name": "Saudi Arabian Grand Prix", "circuit": "Jeddah Corniche Circuit"},
    6: {"name": "Miami Grand Prix", "circuit": "Miami International Autodrome"},
    7: {"name": "Emilia Romagna Grand Prix", "circuit": "Autodromo Enzo e Dino Ferrari"},
    8: {"name": "Monaco Grand Prix", "circuit": "Circuit de Monaco"},
    9: {"name": "Spanish Grand Prix", "circuit": "Circuit de Barcelona-Catalunya"},
    10: {"name": "Canadian Grand Prix", "circuit": "Circuit Gilles Villeneuve"},
    11: {"name": "Austrian Grand Prix", "circuit": "Red Bull Ring"},
    12: {"name": "British Grand Prix", "circuit": "Silverstone Circuit"},
    13: {"name": "Belgian Grand Prix", "circuit": "Circuit de Spa-Francorchamps"},
    14: {"name": "Hungarian Grand Prix", "circuit": "Hungaroring"},
    15: {"name": "Dutch Grand Prix", "circuit": "Circuit Zandvoort"},
    16: {"name": "Italian Grand Prix", "circuit": "Autodromo Nazionale di Monza"},
    17: {"name": "Azerbaijan Grand Prix", "circuit": "Baku City Circuit"},
    18: {"name": "Singapore Grand Prix", "circuit": "Marina Bay Street Circuit"},
    19: {"name": "United States Grand Prix", "circuit": "Circuit of the Americas"},
    20: {"name": "Mexico City Grand Prix", "circuit": "Autódromo Hermanos Rodríguez"},
    21: {"name": "São Paulo Grand Prix", "circuit": "Autódromo José Carlos Pace"},
    22: {"name": "Las Vegas Grand Prix", "circuit": "Las Vegas Street Circuit"},
    23: {"name": "Qatar Grand Prix", "circuit": "Lusail International Circuit"},
    24: {"name": "Abu Dhabi Grand Prix", "circuit": "Yas Marina Circuit"}
}

print(f"F1 2025 calendar loaded with {len(F1_2025_CALENDAR)} races")

F1 2025 calendar loaded with 24 races


## 6. F1 Data Retrieval Tools

In [5]:
class F1DataTools:
    """F1 data retrieval using FastF1."""
    
    def get_event_info(self, year: int, round: int) -> Dict[str, Any]:
        """Get event information for a specific race."""
        try:
            event = fastf1.get_event(year, round)
            
            result = {
                "year": year,
                "round": round,
                "event_name": event.EventName,
                "country": event.Country,
                "location": event.Location,
                "official_event_name": event.OfficialEventName,
                "event_date": event.EventDate.isoformat() if hasattr(event.EventDate, 'isoformat') else str(event.EventDate),
                "event_format": event.EventFormat,
            }
            
            print(f"Event info retrieved: {event.EventName}")
            return result
            
        except Exception as e:
            print(f"Error getting event info: {e}")
            return {"error": str(e)}
    
    def get_session_results(self, year: int, round: int, session_type: str = "R") -> Dict[str, Any]:
        """Get session results for a specific race."""
        try:
            session = fastf1.get_session(year, round, session_type)
            session.load()
            
            results = session.results
            
            drivers_results = []
            for idx, row in results.iterrows():
                driver_result = {
                    "position": int(row['Position']) if pd.notna(row['Position']) else None,
                    "driver_number": str(row['DriverNumber']) if pd.notna(row['DriverNumber']) else None,
                    "abbreviation": str(row['Abbreviation']) if pd.notna(row['Abbreviation']) else None,
                    "full_name": str(row['FullName']) if pd.notna(row['FullName']) else None,
                    "team": str(row['TeamName']) if pd.notna(row['TeamName']) else None,
                    "grid_position": int(row['GridPosition']) if pd.notna(row['GridPosition']) else None,
                    "time": str(row['Time']) if pd.notna(row['Time']) else None,
                    "status": str(row['Status']) if pd.notna(row['Status']) else None,
                    "points": float(row['Points']) if pd.notna(row['Points']) else 0.0,
                }
                drivers_results.append(driver_result)
            
            result = {
                "year": year,
                "round": round,
                "session_type": session_type,
                "session_name": session.name,
                "results": drivers_results
            }
            
            print(f"Session results retrieved: {session.name} ({len(drivers_results)} drivers)")
            return result
            
        except Exception as e:
            print(f"Error getting session results: {e}")
            return {"error": str(e)}
    
    def get_driver_info(self, driver: str, year: int) -> Dict[str, Any]:
        """Get driver information."""
        try:
            session = fastf1.get_session(year, 1, 'R')
            session.load()
            
            driver_upper = driver.upper()
            driver_info = None
            
            for idx, row in session.results.iterrows():
                if (str(row['Abbreviation']).upper() == driver_upper or 
                    driver_upper in str(row['FullName']).upper()):
                    driver_info = {
                        "abbreviation": str(row['Abbreviation']),
                        "full_name": str(row['FullName']),
                        "driver_number": str(row['DriverNumber']),
                        "team": str(row['TeamName']),
                        "year": year
                    }
                    break
            
            if driver_info:
                print(f"Driver info retrieved: {driver_info['full_name']}")
                return driver_info
            else:
                return {"error": f"Driver '{driver}' not found in {year} season"}
                
        except Exception as e:
            print(f"Error getting driver info: {e}")
            return {"error": str(e)}

# Initialize tools
f1_tools = F1DataTools()
print("F1DataTools initialized")

F1DataTools initialized
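`get_session_results` stores the `Time` column via `str(...)`, which for timedelta values typically produces a verbose string. The sketch below shows one way such a value could be formatted for display instead. `format_race_time` is a hypothetical helper, not part of FastF1 or of the class above; it relies only on the standard library and accepts any `datetime.timedelta` (pandas `Timedelta` objects are subclasses of it).

```python
from datetime import timedelta
from typing import Optional

ONE_MS = timedelta(milliseconds=1)

def format_race_time(value: object) -> Optional[str]:
    """Format a timedelta as H:MM:SS.mmm, or M:SS.mmm when under one hour.

    Anything that is not a non-negative timedelta yields None.
    """
    if not isinstance(value, timedelta) or value < timedelta(0):
        return None
    total_ms = value // ONE_MS                      # whole milliseconds as an int
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}.{millis:03d}"
    return f"{minutes}:{seconds:02d}.{millis:03d}"

# Illustrative values, not real race data.
race_time = format_race_time(timedelta(hours=1, minutes=28, seconds=51, milliseconds=336))
lap_time = format_race_time(timedelta(minutes=1, seconds=29, milliseconds=746))
no_time = format_race_time(None)

print(race_time)  # 1:28:51.336
print(lap_time)   # 1:29.746
print(no_time)    # None
```

Integer division by a one-millisecond timedelta keeps the arithmetic exact and avoids floating-point rounding of `total_seconds()`.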


## 7. Agent 1: Data Collection

In [19]:
class DataCollectionAgent:
    """Agent 1: Validates user input and collects comprehensive F1 race data."""
    
    def __init__(self, f1_tools: F1DataTools, calendar: Dict[int, Dict[str, str]]):
        self.f1_tools = f1_tools
        self.calendar = calendar
        self.year = 2025  # Default to 2025 season
    
    def validate_input(self, user_input: str) -> Optional[int]:
        """
        Validate user input and return the round number.
        Accepts either:
        - Round number (e.g., "1", "5", "24")
        - GP name (e.g., "Bahrain", "Monaco Grand Prix")
        
        Returns round number if valid, None otherwise.
        """
        user_input = user_input.strip()
        
        # Try to parse as round number
        try:
            round_num = int(user_input)
            if round_num in self.calendar:
                return round_num
            else:
                print(f"❌ Round {round_num} is not valid. Must be between 1 and {len(self.calendar)}.")
                return None
        except ValueError:
            pass
        
        # Try to match GP name (an empty string is a substring of every name, so reject it first)
        user_input_lower = user_input.lower()
        if not user_input_lower:
            print("❌ Please enter a round number or GP name.")
            return None
        for round_num, info in self.calendar.items():
            gp_name_lower = info['name'].lower()
            circuit_name_lower = info['circuit'].lower()
            
            # Match if user input is contained in GP name or circuit name
            if user_input_lower in gp_name_lower or user_input_lower in circuit_name_lower:
                return round_num
        
        print(f"❌ '{user_input}' doesn't match any GP in the {self.year} calendar.")
        return None
    
    def collect_race_data(self, round_num: int) -> Optional[Dict[str, Any]]:
        """
        Collect comprehensive race data for a given round.
        
        Returns dictionary with:
        - GP info (name, circuit, length, laps, fastest lap)
        - Starting grid
        - Final race positions
        - Podium finishers with details
        - Key race events (DNFs, penalties)
        """
        print(f"\n🔍 Collecting data for Round {round_num}: {self.calendar[round_num]['name']}")
        
        try:
            # Get event information
            event_info = self.f1_tools.get_event_info(self.year, round_num)
            if "error" in event_info:
                # Try with 2024 data as fallback
                print(f"   ⚠️ {self.year} data not available, using 2024...")
                event_info = self.f1_tools.get_event_info(2024, round_num)
                if "error" in event_info:
                    print(f"   ❌ Failed to retrieve event info: {event_info['error']}")
                    return None
                self.year = 2024  # Update year for subsequent calls
            
            # Get race session data
            race_results = self.f1_tools.get_session_results(self.year, round_num, "R")
            if "error" in race_results:
                print(f"   ❌ Failed to retrieve race results: {race_results['error']}")
                return None
            
            # Load full session for additional data
            session = fastf1.get_session(self.year, round_num, "R")
            session.load()
            
            # Extract circuit info (fall back to "N/A" if the length is unavailable)
            try:
                circuit_info = session.get_circuit_info()
                circuit_length = circuit_info.length  # Length in meters
            except Exception:
                circuit_length = "N/A"
            
            total_laps = session.total_laps if hasattr(session, 'total_laps') else "N/A"
            
            # Get fastest lap
            laps = session.laps
            if not laps.empty:
                fastest_lap = laps.pick_fastest()
                fastest_lap_time = str(fastest_lap['LapTime']) if pd.notna(fastest_lap['LapTime']) else "N/A"
                fastest_lap_driver = fastest_lap['Driver'] if pd.notna(fastest_lap['Driver']) else "N/A"
            else:
                fastest_lap_time = "N/A"
                fastest_lap_driver = "N/A"
            
            # Process results
            results = race_results['results']
            
            # Extract starting grid (sorted by grid position)
            starting_grid = sorted(
                [r for r in results if r['grid_position'] is not None],
                key=lambda x: x['grid_position']
            )
            
            # Extract podium finishers (top 3)
            podium = [r for r in results if r['position'] in [1, 2, 3]]
            podium = sorted(podium, key=lambda x: x['position'])
            
            # Extract DNFs: any known status other than a finish or a lapped finish ('+N Lap(s)')
            dnfs = [
                r for r in results
                if r['status'] is not None
                and r['status'] != 'Finished'
                and not r['status'].startswith('+')
            ]
            
            # Compute net position changes (grid position minus finishing position; positive = places gained)
            position_changes = []
            for r in results:
                if r['grid_position'] is not None and r['position'] is not None:
                    change = r['grid_position'] - r['position']
                    if change != 0:
                        position_changes.append({
                            'driver': r['full_name'],
                            'team': r['team'],
                            'grid': r['grid_position'],
                            'finish': r['position'],
                            'change': change
                        })
            
            # Sort by biggest gainers
            position_changes = sorted(position_changes, key=lambda x: x['change'], reverse=True)
            
            # Compile comprehensive data
            race_data = {
                "race_id": f"{self.year}_R{round_num}",
                "year": self.year,
                "round": round_num,
                "gp_info": {
                    "name": event_info['event_name'],
                    "official_name": event_info.get('official_event_name', event_info['event_name']),
                    "country": event_info['country'],
                    "location": event_info['location'],
                    "circuit": self.calendar[round_num]['circuit'],
                    "date": event_info['event_date'],
                    "circuit_length_km": circuit_length / 1000 if isinstance(circuit_length, (int, float)) else circuit_length,
                    "total_laps": total_laps,
                    "fastest_lap_time": fastest_lap_time,
                    "fastest_lap_driver": fastest_lap_driver
                },
                "starting_grid": [
                    {
                        "position": r['grid_position'],
                        "driver": r['full_name'],
                        "team": r['team'],
                        "driver_number": r['driver_number']
                    }
                    for r in starting_grid[:10]  # Top 10 starters
                ],
                "final_results": [
                    {
                        "position": r['position'],
                        "driver": r['full_name'],
                        "team": r['team'],
                        "time": r['time'],
                        "points": r['points'],
                        "grid_position": r['grid_position']
                    }
                    for r in results if r['position'] is not None
                ],
                "podium": [
                    {
                        "position": r['position'],
                        "driver": r['full_name'],
                        "team": r['team'],
                        "time": r['time'],
                        "points": r['points'],
                        "grid_position": r['grid_position']
                    }
                    for r in podium
                ],
                "key_events": {
                    "dnfs": [
                        {
                            "driver": r['full_name'],
                            "team": r['team'],
                            "status": r['status']
                        }
                        for r in dnfs
                    ],
                    "biggest_gainers": [p for p in position_changes if p['change'] > 0][:5],  # Top 5 position gainers
                    "biggest_losers": [p for p in reversed(position_changes) if p['change'] < 0][:3]  # Top 3 position losers
                }
            }
            
            print("   ✅ Data collected successfully!")
            
            return race_data
            
        except Exception as e:
            print(f"   ❌ Error collecting race data: {e}")
            return None
    
    def run(self, user_input: str) -> Optional[Dict[str, Any]]:
        """
        Main execution method for Agent 1.
        Validates input and collects race data.
        """
        # Validate input
        round_num = self.validate_input(user_input)
        if round_num is None:
            return None
        
        # Collect data
        race_data = self.collect_race_data(round_num)
        return race_data

# Initialize Agent 1
agent1 = DataCollectionAgent(f1_tools, F1_2025_CALENDAR)
print("✅ Agent 1 (Data Collection) initialized")

✅ Agent 1 (Data Collection) initialized
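The input validation above accepts either a round number or a substring of a GP or circuit name, and returns the first calendar entry that matches. As a hedged sketch of a stricter variant, the hypothetical `resolve_round` function below returns a round only when exactly one entry matches, so an ambiguous input such as "grand prix" is rejected rather than silently resolved to the first race. The three-entry calendar is a shortened copy used only for illustration.

```python
from typing import Dict, Optional

def resolve_round(user_input: str, calendar: Dict[int, Dict[str, str]]) -> Optional[int]:
    """Resolve a round number or an unambiguous name fragment to a round."""
    text = user_input.strip().lower()
    if not text:
        return None  # an empty string would otherwise match every entry

    # Numeric input: accept it only if the round exists in the calendar.
    try:
        round_num = int(text)
    except ValueError:
        round_num = None
    if round_num is not None:
        return round_num if round_num in calendar else None

    # Name input: collect every entry whose GP or circuit name contains it.
    matches = [
        number
        for number, info in calendar.items()
        if text in info["name"].lower() or text in info["circuit"].lower()
    ]
    return matches[0] if len(matches) == 1 else None

# Shortened calendar for the example only.
sample_calendar = {
    6: {"name": "Miami Grand Prix", "circuit": "Miami International Autodrome"},
    8: {"name": "Monaco Grand Prix", "circuit": "Circuit de Monaco"},
    16: {"name": "Italian Grand Prix", "circuit": "Autodromo Nazionale di Monza"},
}

print(resolve_round("6", sample_calendar))           # 6
print(resolve_round(" monaco ", sample_calendar))    # 8
print(resolve_round("Monza", sample_calendar))       # 16
print(resolve_round("grand prix", sample_calendar))  # None (ambiguous)
```

Whether ambiguity should be an error or resolved to the first match is a design choice; the stricter behaviour trades convenience for predictability.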


## 8. Agent 2: Report Generation


In [14]:
class ReportGenerationAgent:
    """Agent 2: Generates complete, engaging F1 race reports for social media."""
    
    def __init__(self, model_name: str = 'gemini-2.5-flash'):
        self.model = GenerativeModel(model_name)
    
    def generate_report(self, race_data: Dict[str, Any]) -> Optional[str]:
        """Generate a complete social media post from race data."""
        try:
            # Extract key information
            gp_info = race_data['gp_info']
            podium = race_data['podium']
            winner = podium[0]
            dnfs = race_data['key_events']['dnfs']
            gainers = race_data['key_events']['biggest_gainers'][:3] if race_data['key_events']['biggest_gainers'] else []
            
            print(f"\n✍️  Generating report for {gp_info['name']}...")
            
            # Create simple, direct prompt
            prompt = f"""Create an engaging Instagram post about this Formula 1 race:

RACE: {gp_info['name']} ({race_data['year']})
CIRCUIT: {gp_info['circuit']}
WINNER: {winner['driver']} ({winner['team']})

PODIUM:
1st: {podium[0]['driver']} ({podium[0]['team']}) - Started P{podium[0]['grid_position']}
2nd: {podium[1]['driver']} ({podium[1]['team']}) - Started P{podium[1]['grid_position']}
3rd: {podium[2]['driver']} ({podium[2]['team']}) - Started P{podium[2]['grid_position']}

{'TOP MOVERS:' if gainers else ''}
{chr(10).join([f"- {g['driver']}: P{g['grid']} → P{g['finish']} (+{g['change']} places)" for g in gainers])}

{'RETIREMENTS: ' + str(len(dnfs)) if dnfs else ''}

Write a complete, engaging 200-250 word Instagram post that:
- Starts with an exciting hook with emojis (🏎️ 🏁 🏆)
- Tells the story of the race
- Highlights the podium finishers and key moments
- Ends with an engaging question for fans
- Uses line breaks for readability
- Is COMPLETE - no ellipsis or cutting off mid-sentence

Write ONLY the post content, no labels or sections."""

            # Generate with proper settings
            response = self.model.generate_content(
                prompt,
                generation_config={
                    "max_output_tokens": 2048,
                    "temperature": 0.8,
                    "top_p": 0.95,
                }
            )
            
            report = response.text.strip()
            print(f"   ✅ Generated {len(report)} characters")
            
            return report
            
        except Exception as e:
            print(f"   ❌ Error: {e}")
            return None
    
    def run(self, race_data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        """Main execution - generates report and packages it."""
        if not race_data:
            print("❌ No race data provided")
            return None
        
        # Generate the social media report
        social_media_post = self.generate_report(race_data)
        
        if not social_media_post:
            return None
        
        # Package the result
        full_report = {
            "race_id": race_data['race_id'],
            "race_data": race_data,
            "social_media_post": social_media_post,
            "timestamp": datetime.now().isoformat()
        }
        
        return full_report


# Initialize Agent 2
agent2 = ReportGenerationAgent(MODEL_NAME)
print("✅ Agent 2 (Report Generation) initialized")


✅ Agent 2 (Report Generation) initialized
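The prompt in `generate_report` reads `podium[0]`, `podium[1]`, and `podium[2]` directly. As a minimal sketch of a more defensive alternative, the hypothetical helper `build_podium_lines` below builds the PODIUM section from however many entries are present. The entry keys mirror the `podium` dictionaries produced by Agent 1; the sample values are taken from the example run in this notebook and are used for illustration only.

```python
from typing import Any, Dict, List

ORDINALS = {1: "1st", 2: "2nd", 3: "3rd"}

def build_podium_lines(podium: List[Dict[str, Any]]) -> str:
    """Return one prompt line per podium entry, ordered by finishing position."""
    lines = []
    for entry in sorted(podium, key=lambda e: e["position"]):
        label = ORDINALS.get(entry["position"], f"P{entry['position']}")
        grid = entry.get("grid_position")
        started = f"Started P{grid}" if grid is not None else "grid position unknown"
        lines.append(f"{label}: {entry['driver']} ({entry['team']}) - {started}")
    return "\n".join(lines)

# Two entries only, to show that a short podium does not raise.
sample_podium = [
    {"position": 2, "driver": "Lando Norris", "team": "McLaren", "grid_position": 2},
    {"position": 1, "driver": "Oscar Piastri", "team": "McLaren", "grid_position": 4},
]

podium_text = build_podium_lines(sample_podium)
print(podium_text)
```

Iterating over the entries that exist avoids the `IndexError` that fixed-index lookups raise when fewer than three drivers are classified.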




## 9. Complete Workflow: F1 Report System


In [20]:
def generate_f1_report(race_input: str) -> Optional[Dict[str, Any]]:
    """Generate F1 race report from user input."""
    race_data = agent1.run(race_input)
    if not race_data:
        return None
    
    full_report = agent2.run(race_data)
    if not full_report:
        return None
    
    store_report(full_report['race_id'], full_report)
    
    print("\n" + "=" * 70)
    print("📱 SOCIAL MEDIA POST")
    print("=" * 70)
    print(f"\n{full_report['social_media_post']}\n")
    print("=" * 70)
    
    return full_report

print("✅ System ready")


✅ System ready
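The workflow above always calls the report agent, even when a report for the same race is already stored. The sketch below shows one possible variation, under the assumption that reusing a stored report is acceptable: it checks a cache after data collection and skips generation on a hit. `run_pipeline`, `fake_collect`, and `fake_write` are hypothetical stand-ins so the example runs without FastF1 or Vertex AI.

```python
from typing import Any, Callable, Dict, Optional

def run_pipeline(
    race_input: str,
    collect: Callable[[str], Optional[Dict[str, Any]]],
    write: Callable[[Dict[str, Any]], Optional[str]],
    cache: Dict[str, Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Collect data, then reuse a cached report or generate and store a new one."""
    race_data = collect(race_input)
    if not race_data:
        return None                      # validation or data collection failed
    race_id = race_data["race_id"]
    if race_id in cache:
        return cache[race_id]            # cache hit: skip the generation step
    post = write(race_data)
    if not post:
        return None                      # generation failed
    report = {"race_id": race_id, "race_data": race_data, "social_media_post": post}
    cache[race_id] = report
    return report

write_calls = 0

def fake_collect(race_input: str) -> Optional[Dict[str, Any]]:
    """Stub for Agent 1: only round '6' is known."""
    return {"race_id": "2025_R6"} if race_input.strip() == "6" else None

def fake_write(race_data: Dict[str, Any]) -> Optional[str]:
    """Stub for Agent 2: counts how often generation is requested."""
    global write_calls
    write_calls += 1
    return f"Post for {race_data['race_id']}"

cache: Dict[str, Dict[str, Any]] = {}
first = run_pipeline("6", fake_collect, fake_write, cache)
second = run_pipeline("6", fake_collect, fake_write, cache)
unknown = run_pipeline("not a race", fake_collect, fake_write, cache)

print(first["social_media_post"])
print(write_calls)   # generation ran once; the second call was served from the cache
print(unknown)
```

Passing the two stages in as callables keeps the control flow testable on its own; in the notebook they would correspond to `agent1.run` and `agent2.generate_report`.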


In [23]:
# Interactive prompt
race_input = input("Which 2025 race do you want to analyze? (Enter round number or GP name): ")
report = generate_f1_report(race_input)


core           INFO 	Loading data for Miami Grand Prix - Race [v3.6.1]
req            INFO 	No cached data found for session_info. Loading data...
_api           INFO 	Fetching session info data...



🔍 Collecting data for Round 6: Miami Grand Prix
Event info retrieved: Miami Grand Prix


req            INFO 	Data has been written to cache!
req            INFO 	No cached data found for driver_info. Loading data...
_api           INFO 	Fetching driver list...
req            INFO 	Data has been written to cache!
req            INFO 	No cached data found for session_status_data. Loading data...
_api           INFO 	Fetching session status data...
req            INFO 	Data has been written to cache!
req            INFO 	No cached data found for lap_count. Loading data...
_api           INFO 	Fetching lap count data...
req            INFO 	Data has been written to cache!
req            INFO 	No cached data found for track_status_data. Loading data...
_api           INFO 	Fetching track status data...
req            INFO 	Data has been written to cache!
req            INFO 	No cached data found for _extended_timing_data. Loading data...
_api           INFO 	Fetching timing data...
_api           INFO 	Parsing timing data...
req            INFO 	Data has been written to cache!

Session results retrieved: Race (20 drivers)


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['81', '4', '63', '1', '23', '12', '16', '44', '55', '22', '6', '31', '10', '27', '14', '18', '30', '5', '87', '7']


   ✅ Data collected successfully!

✍️  Generating report for Miami Grand Prix...
   ✅ Generated 1396 characters

📱 SOCIAL MEDIA POST

🏎️ 🏁 🏆 UNBELIEVABLE! The 2025 Miami Grand Prix at the Miami International Autodrome just served up an absolute thriller, culminating in a historic day for McLaren!

Starting from P4, Oscar Piastri delivered a drive of a lifetime, meticulously climbing through the field with surgical precision to claim a magnificent victory – his first of the season! The young Aussie was simply unstoppable, showcasing incredible pace and racecraft. But the papaya party was far from over! Teammate Lando Norris, starting from P2, held his ground brilliantly, ensuring a sensational McLaren 1-2 finish! A truly dominant performance that will be talked about for years.

The battle for the final podium spot was fierce, but George Russell, starting P5, navigated the chaos masterfully to secure a hard-fought 3rd place for Mercedes, adding another trophy to 