# Scores & Fixtures Scraper - Initial Inference (2024-2025 Season)

This notebook scrapes match fixtures and results for the **2024-2025 season** for use in inference.
It replicates the logic of `scores_fixtures.ipynb`, adapted for:

- **Teams**: Inference teams from `inference/raw/all_teams.json` (20 Premier League teams)
- **Season**: 2024-2025 (completed/ongoing matches with actual results)
- **Purpose**: Get fixture data with `match_report_href` links for match stats scraping

## Imports and Setup

In [1]:
import json
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
import random
import re
from urllib.parse import urljoin, urlparse
from datetime import datetime
import os

# Headers to appear more like a regular browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

def get_page(url):
    """Fetch page with error handling and rate limiting"""
    time.sleep(random.uniform(3, 6))  # Be respectful - random delay
    
    try:
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        return BeautifulSoup(response.content, 'html.parser')
    except requests.RequestException as e:
        print(f"Error fetching {url}: {e}")
        return None

print("✅ Libraries imported and basic functions defined")

✅ Libraries imported and basic functions defined
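
`get_page` gives up after a single failed request and returns `None`. One possible extension is to retry with exponential backoff. The sketch below assumes a hypothetical wrapper, `fetch_with_retries`, that is not part of the notebook; it accepts any fetch callable that returns `None` on failure. The `sleep` parameter is injectable only so the demonstration runs without waiting.

```python
import time

def fetch_with_retries(fetch, url, max_attempts=3, base_delay=5.0, sleep=time.sleep):
    """Call fetch(url) until it returns a non-None result or attempts run out.

    Hypothetical helper, not part of the original notebook. `fetch` is any
    callable that returns None on failure, as get_page does.
    """
    for attempt in range(1, max_attempts + 1):
        result = fetch(url)
        if result is not None:
            return result
        if attempt < max_attempts:
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))
    return None

# Demonstration with a fake fetcher that fails twice, then succeeds
calls = []
def flaky_fetch(url):
    calls.append(url)
    return "page" if len(calls) >= 3 else None

delays = []
page = fetch_with_retries(flaky_fetch, "https://example.com", sleep=delays.append)
print(page, delays)
```

In the notebook this could be called as `fetch_with_retries(get_page, url)`. Since `get_page` already sleeps 3-6 seconds per call, the backoff would add to that delay.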


## Load Inference Teams Data

Load the teams from the inference directory and re-key them by `team_id`, the structure the extraction logic expects.

In [2]:
def load_inference_teams(json_filename):
    """
    Load inference teams data from JSON file
    
    Args:
        json_filename (str): Path to inference teams JSON file
    
    Returns:
        dict: Teams data dictionary with structure:
        {
            "team_name": {
                "team_name": "Arsenal",
                "team_id": "18bb7c10", 
                "reference": "/en/squads/18bb7c10/Arsenal-Stats"
            }
        }
    """
    try:
        with open(json_filename, 'r', encoding='utf-8') as f:
            teams_data = json.load(f)
        print(f"✅ Loaded {len(teams_data)} inference teams from {json_filename}")
        return teams_data
    except FileNotFoundError:
        print(f"❌ File not found: {json_filename}")
        return {}
    except json.JSONDecodeError as e:
        print(f"❌ Error parsing JSON: {e}")
        return {}

def adapt_teams_for_fixtures(inference_teams, target_season="2024-2025"):
    """
    Adapt inference teams structure for fixtures extraction
    
    Args:
        inference_teams (dict): Teams from inference/raw/all_teams.json
        target_season (str): Season to extract (default: "2024-2025")
    
    Returns:
        dict: Adapted structure compatible with original extraction logic
        {
            "team_id": {
                "team_name": "Arsenal",
                "team_id": "18bb7c10",
                "seasons": ["2024-2025"]
            }
        }
    """
    adapted_teams = {}
    
    for team_name, team_info in inference_teams.items():
        team_id = team_info['team_id']
        
        adapted_teams[team_id] = {
            'team_name': team_info['team_name'],
            'team_id': team_id,
            'seasons': [target_season]  # Only target the specific season we need
        }
    
    print(f"✅ Adapted {len(adapted_teams)} teams for {target_season} season extraction")
    return adapted_teams

# Load the inference teams data
inference_teams_raw = load_inference_teams('../../../data/prod/inference/raw/all_teams.json')

if inference_teams_raw:
    # Adapt structure for extraction
    inference_teams = adapt_teams_for_fixtures(inference_teams_raw, "2024-2025")
    
    print(f"\n📊 Teams ready for fixtures extraction:")
    for i, (team_id, team_info) in enumerate(list(inference_teams.items())[:5], 1):
        print(f"  {i}. {team_info['team_name']} (ID: {team_id}) - Season: {team_info['seasons'][0]}")
    
    if len(inference_teams) > 5:
        print(f"  ... and {len(inference_teams) - 5} more teams")
else:
    print("❌ No teams data loaded")

✅ Loaded 20 inference teams from ../../../data/prod/inference/raw/all_teams.json
✅ Adapted 20 teams for 2024-2025 season extraction

📊 Teams ready for fixtures extraction:
  1. Arsenal (ID: 18bb7c10) - Season: 2024-2025
  2. Aston Villa (ID: 8602292d) - Season: 2024-2025
  3. Bournemouth (ID: 4ba7cbea) - Season: 2024-2025
  4. Brentford (ID: cd051869) - Season: 2024-2025
  5. Brighton (ID: d07537b9) - Season: 2024-2025
  ... and 15 more teams
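
`adapt_teams_for_fixtures` reads `team_info['team_id']` directly, so an entry missing that key would raise `KeyError` partway through the loop. A minimal sketch of a pre-check follows. It assumes a hypothetical helper, `find_incomplete_teams`, that is not part of the notebook, and uses a made-up `Broken` entry for illustration.

```python
REQUIRED_KEYS = ("team_name", "team_id")

def find_incomplete_teams(teams):
    """Return {team_key: [missing keys]} for entries lacking required fields."""
    problems = {}
    for key, info in teams.items():
        missing = [k for k in REQUIRED_KEYS if not info.get(k)]
        if missing:
            problems[key] = missing
    return problems

sample_teams = {
    "Arsenal": {"team_name": "Arsenal", "team_id": "18bb7c10",
                "reference": "/en/squads/18bb7c10/Arsenal-Stats"},
    "Broken": {"team_name": "Broken"},  # made-up entry with no team_id
}
print(find_incomplete_teams(sample_teams))
```

An empty result means every entry can be adapted safely.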


## Core Fixtures Extraction Functions

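This cell replicates the core extraction logic from the original notebook: fetch a team's all-competitions page, locate the `matchlogs_for` table, and collect every `data-stat` cell in each row.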
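
The extraction function builds each team's page URL from its ID, the season, and a slug derived from the team name. The sketch below shows that step in isolation. `build_fixtures_url` is a hypothetical name used for illustration; like the notebook, it handles only spaces and apostrophes in the team name.

```python
def build_fixtures_url(team_id, season, team_name):
    """Build the all-competitions squad URL the same way the notebook does.

    Hypothetical helper for illustration; only spaces and apostrophes in the
    team name are handled, mirroring the replace() calls in the notebook.
    """
    slug = team_name.replace(' ', '-').replace("'", "")
    return (
        f"https://fbref.com/en/squads/{team_id}/{season}"
        f"/all_comps/{slug}-Stats-All-Competitions"
    )

print(build_fixtures_url("18bb7c10", "2024-2025", "Arsenal"))
print(build_fixtures_url("8602292d", "2024-2025", "Aston Villa"))
```

Team names containing other punctuation or accented characters may need extra handling, which this sketch does not attempt.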

In [4]:
def extract_scores_fixtures(team_id, season, team_name):
    """
    Extract scores and fixtures information for a team in a specific season
    
    Args:
        team_id (str): FBRef team ID (e.g., 'b8fd03ef')
        season (str): Season in format '2024-2025'
        team_name (str): Team name for URL construction
    
    Returns:
        dict: Dictionary containing match data
    """
    # Construct the team fixtures URL
    team_name_url = team_name.replace(' ', '-').replace("'", "")
    url = f"https://fbref.com/en/squads/{team_id}/{season}/all_comps/{team_name_url}-Stats-All-Competitions"

    print(f"🔍 Fetching fixtures for {team_name} ({season})...")
    print(f"   URL: {url}")
    
    soup = get_page(url)
    if not soup:
        return {}
    
    # Look for fixtures table - uses 'matchlogs_for' table ID
    fixtures_table = soup.find('table', {'id': 'matchlogs_for'})
    
    if not fixtures_table:
        print(f"   ❌ No fixtures table found for {team_name} ({season})")
        # Debug: show available tables
        all_tables = soup.find_all('table')
        table_ids = [table.get('id') for table in all_tables if table.get('id')]
        print(f"   Available tables: {table_ids}")
        return {}
    
    print(f"   ✅ Found fixtures table")
    
    # Initialize fixtures data structure
    fixtures_data = {
        'team_id': team_id,
        'team_name': team_name,
        'season': season,
        'matches': []
    }
    
    # Process fixtures table
    tbody = fixtures_table.find('tbody')
    if tbody:
        rows = tbody.find_all('tr')
    else:
        rows = fixtures_table.find_all('tr')
        # Filter out header rows
        rows = [row for row in rows if row.find('td')]
    
    print(f"   📊 Found {len(rows)} fixture rows")
    
    for row in rows:
        match_data = {}
        
        # Extract every cell that carries a data-stat attribute
        cells = row.find_all(['td', 'th'])
        for cell in cells:
            data_stat = cell.get('data-stat')
            if not data_stat:
                continue

            cell_text = cell.text.strip()
            if cell_text:  # skip empty cells so missing values stay absent
                match_data[data_stat] = cell_text

            # Keep link targets too (opponent, competition, match report, etc.)
            cell_link = cell.find('a')
            if cell_link and cell_link.get('href'):
                match_data[f"{data_stat}_href"] = cell_link['href']
        
        # Only add match if we have meaningful data
        if match_data.get('date') or match_data.get('opponent'):
            fixtures_data['matches'].append(match_data)
    
    print(f"   ✅ Total extracted: {len(fixtures_data['matches'])} matches")
    
    # Show sample match data for debugging
    if fixtures_data['matches']:
        sample_match = fixtures_data['matches'][0]
        print(f"   📋 Sample match data keys: {list(sample_match.keys())[:10]}...")
        if 'match_report_href' in sample_match:
            print(f"   🔗 Sample match report link: {sample_match['match_report_href']}")
    
    return fixtures_data

print("✅ Core extraction function defined")

✅ Core extraction function defined


## Batch Extraction Function

Extract fixtures for all inference teams in the target season.

In [5]:
def extract_all_team_fixtures_inference(teams_dict, target_season="2024-2025"):
    """
    Extract fixtures data for all inference teams in the target season
    
    Args:
        teams_dict (dict): Teams dictionary (adapted structure)
        target_season (str): Season to extract
    
    Returns:
        dict: Complete fixtures dataset organized by team_id and season
    """
    all_fixtures_data = {}
    total_extractions = len(teams_dict)  # Only one season per team
    current_extraction = 0
    successful_extractions = 0
    failed_teams = []
    
    print(f"🚀 Starting fixtures extraction for {len(teams_dict)} inference teams")
    print(f"🎯 Target season: {target_season}")
    print("=" * 80)
    
    for team_id, team_info in teams_dict.items():
        team_name = team_info['team_name']
        seasons = team_info['seasons']  # Should only contain target_season
        
        current_extraction += 1
        print(f"\n🏟️  [{current_extraction}/{total_extractions}] Processing {team_name} (ID: {team_id})")
        
        # Initialize team entry in results
        if team_id not in all_fixtures_data:
            all_fixtures_data[team_id] = {
                'team_name': team_name,
                'team_id': team_id,
                'seasons_data': {}
            }
        
        # Extract fixtures for the target season
        try:
            season_fixtures = extract_scores_fixtures(team_id, target_season, team_name)
            
            if season_fixtures and season_fixtures.get('matches'):
                all_fixtures_data[team_id]['seasons_data'][target_season] = season_fixtures
                match_count = len(season_fixtures['matches'])
                successful_extractions += 1
                print(f"   ✅ Success: {match_count} matches")
                
                # Count matches with match_report_href (completed matches)
                completed_matches = sum(1 for match in season_fixtures['matches'] 
                                      if match.get('match_report_href'))
                print(f"   🔗 Matches with report links: {completed_matches}/{match_count}")
            else:
                failed_teams.append(team_name)
                print(f"   ⚠️  No fixture data found for {team_name} in {target_season}")
                all_fixtures_data[team_id]['seasons_data'][target_season] = None
                
        except Exception as e:
            failed_teams.append(team_name)
            print(f"   ❌ Error extracting {team_name} {target_season}: {str(e)}")
            all_fixtures_data[team_id]['seasons_data'][target_season] = None
        
        # Extra fixed pause between teams, on top of the random 3-6 s delay
        # that get_page() already applies before every request
        if current_extraction < total_extractions:
            print(f"   ⏳ Waiting before next request...")
            time.sleep(2)
    
    # Summary statistics
    total_matches = 0
    total_completed_matches = 0
    
    for team_data in all_fixtures_data.values():
        for season_data in team_data['seasons_data'].values():
            if season_data and season_data.get('matches'):
                matches = season_data['matches']
                total_matches += len(matches)
                total_completed_matches += sum(1 for match in matches 
                                             if match.get('match_report_href'))
    
    print("\n" + "=" * 80)
    print("📈 EXTRACTION SUMMARY:")
    print(f"   Total teams processed: {len(all_fixtures_data)}")
    print(f"   Successful extractions: {successful_extractions}/{total_extractions}")
    print(f"   Failed extractions: {len(failed_teams)}")
    print(f"   Total match records: {total_matches}")
    print(f"   Matches with report links: {total_completed_matches}")
    print(f"   Season: {target_season}")
    
    if failed_teams:
        print(f"\n❌ Teams that failed: {', '.join(failed_teams)}")
    
    print("=" * 80)
    
    return all_fixtures_data

print("✅ Batch extraction function defined")

✅ Batch extraction function defined
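
The batch result nests matches under `team_id`, then `seasons_data`, then the season, and stores `None` for a season whose extraction failed. The sketch below tallies matches per team from that structure. `count_report_links` is a hypothetical helper and the miniature dataset is made up.

```python
def count_report_links(all_fixtures_data):
    """Return {team_name: (matches, matches_with_report_link)} per team.

    Hypothetical helper; expects the nested structure produced by
    extract_all_team_fixtures_inference(), where a failed season is None.
    """
    summary = {}
    for team in all_fixtures_data.values():
        total = with_link = 0
        for season_data in team["seasons_data"].values():
            if not season_data:  # failed extraction stored as None
                continue
            matches = season_data.get("matches", [])
            total += len(matches)
            with_link += sum(1 for m in matches if m.get("match_report_href"))
        summary[team["team_name"]] = (total, with_link)
    return summary

# Made-up miniature dataset in the same shape
demo = {
    "aaa": {"team_name": "Team A", "team_id": "aaa", "seasons_data": {
        "2024-2025": {"matches": [
            {"date": "2024-08-17", "match_report_href": "/en/matches/x1"},
            {"date": "2025-05-25"},
        ]}}},
    "bbb": {"team_name": "Team B", "team_id": "bbb",
            "seasons_data": {"2024-2025": None}},
}
print(count_report_links(demo))
```

A team whose extraction failed shows up as `(0, 0)` rather than being dropped, which makes gaps easy to spot.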


## Data Processing and Save Functions

Functions to process and save the extracted fixtures data.

In [6]:
def fixtures_data_to_dataframe(fixtures_data):
    """
    Convert fixtures data dictionary to a pandas DataFrame
    
    Args:
        fixtures_data (dict): Fixtures data from extract_all_team_fixtures_inference()
    
    Returns:
        pd.DataFrame: Flattened DataFrame with one row per match
    """
    all_records = []
    
    for team_id, team_data in fixtures_data.items():
        team_name = team_data['team_name']
        
        for season, season_data in team_data['seasons_data'].items():
            if season_data and season_data.get('matches'):
                
                for match in season_data['matches']:
                    # Create a record for each match
                    record = {
                        'team_id': team_id,
                        'team_name': team_name,
                        'season': season
                    }
                    
                    # Add all match data
                    record.update(match)
                    all_records.append(record)
    
    # Create DataFrame
    df = pd.DataFrame(all_records)
    
    # Reorder columns to match FBRef table structure
    if len(df) > 0:
        # First the team identification columns
        team_columns = ['team_name', 'season', 'team_id']
        
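        # Then the FBRef fixtures table columns in order. Names must match the
        # scraped data-stat keys exactly; 'start_time' and 'dayofweek' are the
        # keys the scraper actually returns for the time and day columns.
        fixtures_columns = [
            'date', 'start_time', 'comp', 'round', 'dayofweek', 'venue', 'result',
            'gf', 'ga', 'opponent', 'xg', 'xga', 'poss',
            'attendance', 'captain', 'formation', 'formation_opp',
            'referee', 'match_report', 'notes'
        ]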
        
        # Include any additional columns (like href links)
        available_team_cols = [col for col in team_columns if col in df.columns]
        available_fixtures_cols = [col for col in fixtures_columns if col in df.columns]
        other_columns = [col for col in df.columns if col not in team_columns + fixtures_columns]
        
        # Final column order
        final_columns = available_team_cols + available_fixtures_cols + other_columns
        df = df[final_columns]
    
    return df

def save_inference_fixtures_data(fixtures_data, filename):
    """
    Save fixtures data to JSON file with proper serialization
    
    Args:
        fixtures_data (dict): Fixtures data to save
        filename (str): Output filename
    """
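    # Create the parent directory if needed (dirname is '' for a bare filename,
    # and os.makedirs('') would raise)
    out_dir = os.path.dirname(filename)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)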
    
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(fixtures_data, f, indent=2, ensure_ascii=False)
    
    print(f"💾 Fixtures data saved to {filename}")

def save_inference_fixtures_multiple_formats(fixtures_df, base_filename):
    """
    Save fixtures DataFrame in multiple formats for inference
    
    Args:
        fixtures_df (pd.DataFrame): DataFrame to save
        base_filename (str): Base filename without extension
    """
    if fixtures_df.empty:
        print("⚠️  No data to save")
        return
    
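    # Create the parent directory if needed (dirname is '' for a bare filename,
    # and os.makedirs('') would raise)
    out_dir = os.path.dirname(base_filename)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)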
    
    # Save as CSV
    csv_filename = f"{base_filename}.csv"
    fixtures_df.to_csv(csv_filename, index=False)
    print(f"💾 CSV saved: {csv_filename}")
    
    # Save as Parquet (efficient for large datasets)
    parquet_filename = f"{base_filename}.parquet"
    fixtures_df.to_parquet(parquet_filename, index=False)
    print(f"💾 Parquet saved: {parquet_filename}")
    
    print(f"✅ Fixtures data saved in multiple formats")

def load_fixtures_json_to_dataframe(json_filename):
    """
    Load fixtures data from JSON file and convert to DataFrame
    
    Args:
        json_filename (str): Path to JSON file
    
    Returns:
        pd.DataFrame: Fixtures data as DataFrame
    """
    with open(json_filename, 'r', encoding='utf-8') as f:
        fixtures_data = json.load(f)
    
    return fixtures_data_to_dataframe(fixtures_data)

print("✅ Data processing and save functions defined")

✅ Data processing and save functions defined
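
The column reordering in `fixtures_data_to_dataframe` puts preferred columns first and appends everything else. A preferred name that doesn't exactly match a scraped `data-stat` key (for instance `time` versus `start_time`) is silently skipped, and the real column lands in the trailing group. The sketch below isolates that behaviour without pandas. `order_columns` is a hypothetical helper, and the keys are a subset of those printed by the single-team test.

```python
def order_columns(available, preferred):
    """Put preferred columns first (in preferred order), then everything else."""
    present = [c for c in preferred if c in available]
    rest = [c for c in available if c not in preferred]
    return present + rest

# Keys as printed in the sample match data of the single-team test
available = ['date', 'date_href', 'start_time', 'comp', 'comp_href', 'dayofweek']
preferred = ['date', 'time', 'comp', 'day']  # 'time' and 'day' do not match

print(order_columns(available, preferred))
```

Comparing the preferred list against `df.columns` after a first run is a quick way to catch such mismatches.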


## Test Extraction with Single Team

Test the extraction logic with one team before running the full batch.

In [7]:
# Test with Arsenal first
if 'inference_teams' in locals() and inference_teams:
    print("🧪 TESTING EXTRACTION WITH SINGLE TEAM")
    print("=" * 50)
    
    # Get Arsenal's data for testing
    arsenal_id = None
    arsenal_info = None
    
    for team_id, team_info in inference_teams.items():
        if team_info['team_name'] == 'Arsenal':
            arsenal_id = team_id
            arsenal_info = team_info
            break
    
    if arsenal_info:
        print(f"🎯 Testing with {arsenal_info['team_name']} (ID: {arsenal_id})")
        
        # Test extraction
        test_fixtures = extract_scores_fixtures(
            arsenal_id, 
            "2024-2025", 
            arsenal_info['team_name']
        )
        
        if test_fixtures and test_fixtures.get('matches'):
            matches = test_fixtures['matches']
            completed_matches = [m for m in matches if m.get('match_report_href')]
            
            print(f"\n✅ TEST SUCCESSFUL!")
            print(f"   Total matches: {len(matches)}")
            print(f"   Completed matches (with report links): {len(completed_matches)}")
            
            # Show sample data
            if completed_matches:
                sample = completed_matches[0]
                print(f"\n📋 Sample completed match:")
                print(f"   Date: {sample.get('date', 'N/A')}")
                print(f"   Opponent: {sample.get('opponent', 'N/A')}")
                print(f"   Result: {sample.get('result', 'N/A')}")
                print(f"   Competition: {sample.get('comp', 'N/A')}")
                print(f"   Report link: {sample.get('match_report_href', 'N/A')}")
            
            # Show upcoming fixtures
            upcoming = [m for m in matches if not m.get('match_report_href')]
            if upcoming:
                print(f"\n🔮 Upcoming fixtures: {len(upcoming)} matches")
                sample_upcoming = upcoming[0]
                print(f"   Next: {sample_upcoming.get('date', 'N/A')} vs {sample_upcoming.get('opponent', 'N/A')}")
        else:
            print(f"\n❌ TEST FAILED - No fixtures data extracted")
    else:
        print(f"❌ Arsenal not found in teams data")
else:
    print("❌ No teams data available for testing")

🧪 TESTING EXTRACTION WITH SINGLE TEAM
🎯 Testing with Arsenal (ID: 18bb7c10)
🔍 Fetching fixtures for Arsenal (2024-2025)...
   URL: https://fbref.com/en/squads/18bb7c10/2024-2025/all_comps/Arsenal-Stats-All-Competitions
   ✅ Found fixtures table
   📊 Found 58 fixture rows
   ✅ Total extracted: 58 matches
   📋 Sample match data keys: ['date', 'date_href', 'start_time', 'comp', 'comp_href', 'round', 'round_href', 'dayofweek', 'venue', 'result']...
   🔗 Sample match report link: /en/matches/c0e3342a/Arsenal-Wolverhampton-Wanderers-August-17-2024-Premier-League

✅ TEST SUCCESSFUL!
   Total matches: 58
   Completed matches (with report links): 58

📋 Sample completed match:
   Date: 2024-08-17
   Opponent: Wolves
   Result: W
   Competition: Premier League
   Report link: /en/matches/c0e3342a/Arsenal-Wolverhampton-Wanderers-August-17-2024-Premier-League
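
The test treats any row with a `match_report_href` as completed. A stricter check would also require a `result` value, which the extractor only stores when the cell is non-empty. The sketch below relies on an assumption the notebook does not confirm: that an unplayed fixture has no result, and may or may not carry a link in the match report cell. `split_played_upcoming` is a hypothetical helper, and the second row is made up.

```python
def split_played_upcoming(matches):
    """Split match dicts into (played, upcoming).

    Assumption (not confirmed by the notebook): a played match has both a
    non-empty 'result' and a 'match_report_href'; anything else is upcoming.
    """
    played, upcoming = [], []
    for m in matches:
        if m.get("result") and m.get("match_report_href"):
            played.append(m)
        else:
            upcoming.append(m)
    return played, upcoming

# Only the first row mirrors values from the Arsenal output; the second is made up
rows = [
    {"date": "2024-08-17", "opponent": "Wolves", "result": "W",
     "match_report_href": "/en/matches/c0e3342a/Arsenal-Wolverhampton-Wanderers-August-17-2024-Premier-League"},
    {"date": "2099-01-01", "opponent": "Example FC"},
]
played, upcoming = split_played_upcoming(rows)
print(len(played), len(upcoming))
```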


## Execute Full Fixtures Extraction

Run the complete extraction for all inference teams.

In [9]:
# Execute full extraction for all inference teams in 2024-2025 season
if 'inference_teams' in locals() and inference_teams:
    print("🚀 STARTING FULL EXTRACTION FOR ALL INFERENCE TEAMS")
    print("=" * 60)
    
    # Run the extraction
    complete_fixtures_dataset = extract_all_team_fixtures_inference(
        inference_teams, 
        target_season="2024-2025"
    )
    
    if complete_fixtures_dataset:
        # Save raw JSON data
        json_filename = '../../../data/prod/inference/raw/fixtures_2024_2025.json'
        save_inference_fixtures_data(complete_fixtures_dataset, json_filename)
        
        # Convert to DataFrame
        fixtures_df = fixtures_data_to_dataframe(complete_fixtures_dataset)
        
        if not fixtures_df.empty:
            print(f"\n✅ SUCCESS! DataFrame created")
            print(f"   Shape: {fixtures_df.shape}")
            print(f"   Columns: {list(fixtures_df.columns)}")
            
            # Save DataFrame in multiple formats
            base_filename = '../../../data/prod/inference/raw/fixtures_2024_2025'
            save_inference_fixtures_multiple_formats(fixtures_df, base_filename)
            
            # Analysis
            print(f"\n📊 DATASET ANALYSIS:")
            print(f"   Total matches: {len(fixtures_df)}")
            print(f"   Teams covered: {fixtures_df['team_name'].nunique()}")
            print(f"   Season: {fixtures_df['season'].iloc[0]}")
            
            # Count matches with report links (completed matches)
            if 'match_report_href' in fixtures_df.columns:
                completed = fixtures_df['match_report_href'].notna().sum()
                print(f"   Completed matches (with report links): {completed}")
                print(f"   Upcoming fixtures: {len(fixtures_df) - completed}")
            
            # Show competitions
            if 'comp' in fixtures_df.columns:
                competitions = fixtures_df['comp'].value_counts()
                print(f"\n🏆 Competitions:")
                for comp, count in competitions.items():
                    print(f"   {comp}: {count} matches")
            
            # Show sample data
            print(f"\n📋 Sample data:")
            display_cols = ['team_name', 'date', 'opponent', 'venue', 'result', 'comp']
            available_cols = [col for col in display_cols if col in fixtures_df.columns]
            print(fixtures_df[available_cols].head())
            
        else:
            print(f"\n❌ No data in DataFrame")
    else:
        print(f"\n❌ No fixtures data extracted")
else:
    print("❌ No teams data available. Please load teams first.")

🚀 STARTING FULL EXTRACTION FOR ALL INFERENCE TEAMS
🚀 Starting fixtures extraction for 20 inference teams
🎯 Target season: 2024-2025

🏟️  [1/20] Processing Arsenal (ID: 18bb7c10)
🔍 Fetching fixtures for Arsenal (2024-2025)...
   URL: https://fbref.com/en/squads/18bb7c10/2024-2025/all_comps/Arsenal-Stats-All-Competitions
   ✅ Found fixtures table
   📊 Found 58 fixture rows
   ✅ Total extracted: 58 matches
   📋 Sample match data keys: ['date', 'date_href', 'start_time', 'comp', 'comp_href', 'round', 'round_href', 'dayofweek', 'venue', 'result']...
   🔗 Sample match report link: /en/matches/c0e3342a/Arsenal-Wolverhampton-Wanderers-August-17-2024-Premier-League
   ✅ Success: 58 matches
   🔗 Matches with report links: 58/58
   ⏳ Waiting before next request...

🏟️  [2/20] Processing Aston Villa (ID: 8602292d)
🔍 Fetching fixtures for Aston Villa (2024-2025)...
   URL: https://fbref.com/en/squads/8602292d/2024-2025/all_comps/Aston-Villa-Stats-All-Competitions
   ✅ Found fixtures table
   📊 Foun

## Results Summary and Next Steps

Summary of extraction results and preparation for match stats scraping.
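
A match between two of the scraped teams appears once in each team's log, so the number of unique match reports is lower than the number of records. The sketch below deduplicates hrefs while keeping first-seen order and makes them absolute with `urljoin`. `unique_match_urls` is a hypothetical helper; the hrefs are taken from the notebook's own output.

```python
from urllib.parse import urljoin

BASE_URL = "https://fbref.com"

def unique_match_urls(hrefs):
    """Drop None/empty and duplicate hrefs (keeping first-seen order), then make them absolute."""
    seen = dict.fromkeys(h for h in hrefs if h)
    return [urljoin(BASE_URL, h) for h in seen]

hrefs = [
    "/en/matches/4692171a/Aston-Villa-Arsenal-August-24-2024-Premier-League",  # from one team's log
    "/en/matches/4692171a/Aston-Villa-Arsenal-August-24-2024-Premier-League",  # same match, other team's log
    None,
    "/en/matches/c0e3342a/Arsenal-Wolverhampton-Wanderers-August-17-2024-Premier-League",
]
for url in unique_match_urls(hrefs):
    print(url)
```

When the hrefs come from a DataFrame column, missing values are `NaN` rather than `None`, so they should be filtered with `.notna()` or `.dropna()` first, as the notebook's summary cell does.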

In [12]:
# Final summary and next steps
print("📋 EXTRACTION COMPLETE - SUMMARY & NEXT STEPS")
print("=" * 60)

# Check if we have the data
results_dir = '../../../data/prod/inference/raw/'
json_file = f'{results_dir}fixtures_2024_2025.json'
csv_file = f'{results_dir}fixtures_2024_2025.csv'

if os.path.exists(json_file) and os.path.exists(csv_file):
    # Load and analyze final results
    final_df = pd.read_csv(csv_file)
    
    print(f"✅ EXTRACTION SUCCESSFUL!")
    print(f"\n📊 Final Dataset Summary:")
    print(f"   • Total records: {len(final_df):,}")
    print(f"   • Teams: {final_df['team_name'].nunique()}")
    print(f"   • Season: 2024-2025")
    
    # Count matches ready for stats scraping
    if 'match_report_href' in final_df.columns:
        completed_matches = final_df['match_report_href'].notna().sum()
        unique_completed = final_df[final_df['match_report_href'].notna()]['match_report_href'].nunique()
        
        print(f"\n🎯 Ready for Match Stats Scraping:")
        print(f"   • Completed match records: {completed_matches:,}")
        print(f"   • Unique completed matches: {unique_completed:,}")
        
        # Create match URLs for stats scraping
        match_urls = final_df[final_df['match_report_href'].notna()]['match_report_href'].unique()
        full_match_urls = ['https://fbref.com' + url for url in match_urls]
        
        print(f"\n🔗 Sample match URLs for stats scraping:")
        for i, url in enumerate(full_match_urls[:3], 1):
            print(f"   {i}. {url}")
        
        print(f"\n💾 Data saved to:")
        print(f"   • JSON: {json_file}")
        print(f"   • CSV: {csv_file}")
        print(f"   • Parquet: {results_dir}fixtures_2024_2025.parquet")
        
        print(f"\n🚀 NEXT STEPS:")
        print(f"   1. ✅ Fixtures data extracted for 2024-2025 season")
        print(f"   2. 🎯 Ready to use match_stats_scraper_inference.ipynb")
        print(f"   3. 🔧 Update match_stats_scraper_inference.ipynb to load this fixtures data")
        print(f"   4. 🏃 Run match stats scraping for {unique_completed} unique matches")
    else:
        print(f"\n⚠️  No match_report_href column found")
else:
    print(f"❌ No results files found. Please run the extraction first.")

print(f"\n" + "=" * 60)

📋 EXTRACTION COMPLETE - SUMMARY & NEXT STEPS
✅ EXTRACTION SUCCESSFUL!

📊 Final Dataset Summary:
   • Total records: 996
   • Teams: 20
   • Season: 2024-2025

🎯 Ready for Match Stats Scraping:
   • Completed match records: 996
   • Unique completed matches: 685

🔗 Sample match URLs for stats scraping:
   1. https://fbref.com/en/matches/c0e3342a/Arsenal-Wolverhampton-Wanderers-August-17-2024-Premier-League
   2. https://fbref.com/en/matches/4692171a/Aston-Villa-Arsenal-August-24-2024-Premier-League
   3. https://fbref.com/en/matches/a843d023/Arsenal-Brighton-and-Hove-Albion-August-31-2024-Premier-League

💾 Data saved to:
   • JSON: ../../../data/prod/inference/raw/fixtures_2024_2025.json
   • CSV: ../../../data/prod/inference/raw/fixtures_2024_2025.csv
   • Parquet: ../../../data/prod/inference/raw/fixtures_2024_2025.parquet

🚀 NEXT STEPS:
   1. ✅ Fixtures data extracted for 2024-2025 season
   2. 🎯 Ready to use match_stats_scraper_inference.ipynb
   3. 🔧 Update match_stats_scraper_infe