# F1DB Data Tutorial

This notebook demonstrates how to use the official F1DB repository data (https://github.com/f1db/f1db) in your F1 analysis projects.

F1DB provides comprehensive Formula 1 data from 1950 to the present in multiple formats:
- CSV (recommended for data science)
- JSON
- SQL dumps
- SQLite database

## 1. Setup and Installation

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Import the F1DB data loader
from f1db_data_loader import load_f1db_data, F1DBDataLoader

## 2. Load F1DB Data

The data loader will automatically:
1. Check for the latest F1DB release
2. Download the data if not already present
3. Extract and load all CSV files
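
The three steps above could be sketched roughly as follows. This is not the actual `f1db_data_loader` implementation: the helper names, the `.f1db_version` marker file, and the `csv.zip` asset suffix are assumptions made for illustration. Only the GitHub "latest release" endpoint and its `assets` / `browser_download_url` fields come from the public GitHub REST API.

```python
import json
import urllib.request
from pathlib import Path

# GitHub REST endpoint for the most recent release of the F1DB repository
RELEASES_URL = "https://api.github.com/repos/f1db/f1db/releases/latest"


def fetch_latest_release(url=RELEASES_URL, timeout=30):
    """Return the 'latest release' JSON as a dict (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def pick_asset_url(release, suffix="csv.zip"):
    """Return the download URL of the first asset whose name ends with `suffix`.

    The suffix is an assumption about how release assets are named.
    """
    for asset in release.get("assets", []):
        if asset.get("name", "").endswith(suffix):
            return asset.get("browser_download_url")
    return None


def needs_download(data_dir, tag):
    """True if `data_dir` has no version marker or the marker differs from `tag`.

    The marker file name is hypothetical; a real loader may track versions differently.
    """
    marker = Path(data_dir) / ".f1db_version"
    return not marker.exists() or marker.read_text().strip() != tag
```

A loader built this way would call `fetch_latest_release()`, skip the download when `needs_download(...)` returns `False`, and otherwise fetch the URL returned by `pick_asset_url(...)` before extracting the CSV files.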

In [None]:
# Load the latest F1 data
# This will download ~50MB on first run
f1_data = load_f1db_data(data_dir='../../data/f1db')

# Display available datasets
print(f"\nLoaded {len(f1_data)} datasets:")
for name, df in f1_data.items():
    print(f"  • {name}: {len(df):,} rows")

## 3. Explore F1DB Data Structure

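F1DB uses a well-structured schema with consistent naming conventions. The cell below pulls out four core tables and prints the first rows and column names of two of them, so you can confirm the schema before merging.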

In [None]:
# Key datasets
races = f1_data.get('races', pd.DataFrame())
drivers = f1_data.get('drivers', pd.DataFrame())
constructors = f1_data.get('constructors', pd.DataFrame())
results = f1_data.get('results', pd.DataFrame())

# Display sample data
print("Sample Races Data:")
print(races.head())
print(f"\nColumns: {list(races.columns)}")

print("\n" + "="*50 + "\n")

print("Sample Drivers Data:")
print(drivers.head())
print(f"\nColumns: {list(drivers.columns)}")
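
To get an overview of every table at once instead of printing them one by one, a small summary helper can be useful. The sketch below assumes only that the loader returns a dict mapping dataset names to pandas DataFrames, as the cells above suggest; `summarize_datasets` is a hypothetical helper, not part of the loader.

```python
import pandas as pd


def summarize_datasets(datasets):
    """Return one row per dataset: name, row count, column count, column names."""
    fields = ["dataset", "rows", "n_columns", "columns"]
    rows = [
        {
            "dataset": name,
            "rows": len(df),
            "n_columns": df.shape[1],
            "columns": ", ".join(map(str, df.columns)),
        }
        for name, df in datasets.items()
    ]
    summary = pd.DataFrame(rows, columns=fields)
    # Largest tables first, with a clean 0..n-1 index
    return summary.sort_values("rows", ascending=False).reset_index(drop=True)
```

In the notebook this would be called as `summarize_datasets(f1_data)`.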

## 4. Working with F1DB Data

Example: plot total points per season since 2020 for the ten highest-scoring drivers.

In [None]:
# Merge datasets for analysis
if not results.empty and not races.empty and not drivers.empty:
    # Create a comprehensive race results dataset
    race_data = results.merge(races, on='raceId', how='left')
    race_data = race_data.merge(drivers[['driverId', 'driverRef', 'surname', 'forename']], on='driverId', how='left')
    
    # Filter recent years
    recent_data = race_data[race_data['year'] >= 2020]
    
    # Total points per driver per season
    driver_points = recent_data.groupby(['driverRef', 'year'])['points'].sum().reset_index()

    # Ten drivers with the most points across the whole period
    top_drivers = driver_points.groupby('driverRef')['points'].sum().nlargest(10).index

    # Visualize
    plt.figure(figsize=(12, 6))
    
    for driver in top_drivers:
        driver_data = driver_points[driver_points['driverRef'] == driver]
        plt.plot(driver_data['year'], driver_data['points'], marker='o', label=driver)
    
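    plt.xticks(sorted(driver_points['year'].unique()))  # one tick per season, no fractional years
    plt.xlabel('Year')
    plt.ylabel('Total Points')
    plt.title('Points per Season for the Top 10 Drivers (2020-present)')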
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
else:
    print("Data not loaded properly. Please check the data loader.")
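
The merge above depends on specific column names (`raceId`, `driverId`, `driverRef`, `year`, `points`). If the tables returned by your loader use different names, the failure can be hard to read, so it may help to check the columns first. The sketch below uses invented toy data and a hypothetical `require_columns` helper. The `validate="many_to_one"` argument is a standard pandas `merge` option that raises an error if the right-hand keys are not unique.

```python
import pandas as pd


def require_columns(df, required, name):
    """Raise a KeyError listing any required columns missing from `df`."""
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise KeyError(f"{name} is missing columns: {missing}")


# Toy stand-ins for the real tables (all values are invented for illustration)
results = pd.DataFrame(
    {"raceId": [1, 1, 2], "driverId": [10, 11, 10], "points": [25.0, 18.0, 25.0]}
)
races = pd.DataFrame({"raceId": [1, 2], "year": [2020, 2021]})
drivers = pd.DataFrame({"driverId": [10, 11], "driverRef": ["driver_a", "driver_b"]})

# Fail early with a readable message if a column is absent
require_columns(results, ["raceId", "driverId", "points"], "results")
require_columns(races, ["raceId", "year"], "races")
require_columns(drivers, ["driverId", "driverRef"], "drivers")

# Each result row should match exactly one race and one driver
race_data = (
    results
    .merge(races, on="raceId", how="left", validate="many_to_one")
    .merge(drivers, on="driverId", how="left", validate="many_to_one")
)

# Total points per driver per season
season_points = race_data.groupby(["driverRef", "year"])["points"].sum().reset_index()
print(season_points)
```

Because both merges are left joins validated as many-to-one, `race_data` keeps exactly one row per original result row.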

## 5. Advanced Usage

### Force data refresh

In [None]:
# Force download of latest data (useful after race weekends)
# f1_data = load_f1db_data(data_dir='../../data/f1db', force_download=True)

### Use specific data format

In [None]:
# Load JSON format instead of CSV
# json_loader = F1DBDataLoader(data_dir='../../data/f1db_json', format='json')
# json_data = json_loader.load_json_data()

### Direct API usage

In [None]:
# Check latest release information
loader = F1DBDataLoader()
release_info = loader.get_latest_release_info()
print(f"Latest F1DB Release: {release_info['tag_name']}")
print(f"Published: {release_info['published_at']}")
print(f"Release Name: {release_info['name']}")
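
The `published_at` value can also be used to decide whether a refresh is worthwhile. The sketch below assumes the field is a GitHub-style UTC timestamp such as `2024-03-10T18:00:00Z` (an invented example value); `release_age_days` is a hypothetical helper.

```python
from datetime import datetime, timezone


def release_age_days(published_at, now=None):
    """Days elapsed since a UTC timestamp formatted like '2024-03-10T18:00:00Z'."""
    published = datetime.strptime(published_at, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    now = now or datetime.now(timezone.utc)
    return (now - published).total_seconds() / 86400


# Fixed reference time so the example is reproducible
reference = datetime(2024, 3, 12, 18, 0, tzinfo=timezone.utc)
print(release_age_days("2024-03-10T18:00:00Z", now=reference))  # 2.0
```

In the notebook, `release_age_days(release_info['published_at'])` would give the age of the latest release in days.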

## 6. Integration with Other Notebooks

To use F1DB data in the other advanced notebooks:

1. Copy `f1db_data_loader.py` to your notebook directory
2. Replace the data loading section with:

```python
from f1db_data_loader import load_f1db_data

# Load F1DB data
f1_data = load_f1db_data()

# Map to expected variable names
results = f1_data['results']
races = f1_data['races']
drivers = f1_data['drivers']
# ... etc
```
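
Indexing with `f1_data['results']` raises a bare `KeyError` if a table is missing. When adapting several notebooks, a small unpacking helper with a clearer message may be convenient. The following is a minimal sketch, assuming the loader returns a dict keyed by dataset name; `unpack_tables` is a hypothetical helper.

```python
def unpack_tables(data, names):
    """Return the requested tables in order, or raise one error naming all that are missing."""
    missing = [n for n in names if n not in data]
    if missing:
        raise KeyError(f"Missing datasets: {missing}; available: {sorted(data)}")
    return tuple(data[n] for n in names)


# Toy dict standing in for the loader output (the values would be DataFrames)
toy_data = {"results": "results-table", "races": "races-table", "drivers": "drivers-table"}
results, races, drivers = unpack_tables(toy_data, ["results", "races", "drivers"])
```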

Unlike a static CSV snapshot, F1DB is updated with each new release, so re-running the loader picks up the most recent races.

## Key Advantages of F1DB

1. **Always Up-to-Date**: New data after every race weekend
2. **Comprehensive**: Includes practice sessions, sprint races, and other detailed data
3. **Multiple Formats**: Choose the best format for your use case
4. **Version Controlled**: Track changes and updates
5. **Open Source**: Free to use with clear licensing (CC BY 4.0)

For more information, visit: https://github.com/f1db/f1db