# F1 Cross-Era Driver Performance Analysis Plan

## 1. Data Source and Collection

### Lap Time Data
- The Ergast API provides lap time data from the 1996 season onward
- Example endpoint (2011 season, round 5, lap 1):
```bash
# Request the lap-1 timings for round 5 of the 2011 season as JSON
curl -s "http://ergast.com/api/f1/2011/5/laps/1.json"
```
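A response from this endpoint has to be flattened into one row per driver per lap before any statistics can be computed. The sketch below assumes the JSON follows the `MRData` → `RaceTable` → `Races` → `Laps` → `Timings` nesting, with lap times given as strings such as `1:34.494`. The function names, the output column names, and the sample payload are illustrative assumptions, not real race data.

```python
import json
import urllib.request


def lap_time_to_seconds(text):
    """Convert a lap time string such as '1:34.494' to seconds."""
    parts = text.split(":")
    seconds = float(parts[-1])
    # Walk the remaining fields right to left: minutes, then hours.
    for i, part in enumerate(reversed(parts[:-1]), start=1):
        seconds += int(part) * 60 ** i
    return seconds


def fetch_json(url):
    """Download and decode one API response (not called in this sketch)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def parse_laps(payload):
    """Flatten one laps response into a list of row dictionaries."""
    rows = []
    for race in payload["MRData"]["RaceTable"]["Races"]:
        for lap in race["Laps"]:
            for timing in lap["Timings"]:
                rows.append({
                    "season": int(race["season"]),
                    "round": int(race["round"]),
                    "lap": int(lap["number"]),
                    "driver_id": timing["driverId"],
                    "lap_time_s": lap_time_to_seconds(timing["time"]),
                })
    return rows


# Hand-written payload in the assumed response shape (values are made up).
sample = {"MRData": {"RaceTable": {"Races": [{
    "season": "2011", "round": "5",
    "Laps": [{"number": "1", "Timings": [
        {"driverId": "driver_a", "position": "1", "time": "1:34.494"},
        {"driverId": "driver_b", "position": "2", "time": "1:35.274"},
    ]}],
}]}}}

rows = parse_laps(sample)
```

The resulting rows can be passed directly to `pandas.DataFrame(rows)` for the later steps.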

### Track Selection
- Filter races by `circuitId=monza` to obtain the historical record of Grands Prix held at Monza

## 2. Data Preprocessing and Standardization

### 2.1 Missing Value Handling
- Handle missing lap times either by linear interpolation or by removing the affected records
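One way to combine both options is to interpolate short gaps inside a driver's stint and drop whatever remains. The sketch below assumes a pandas DataFrame with hypothetical columns `driver_id`, `lap`, and `lap_time_s`; the `max_gap` threshold is an illustrative choice, not something the plan specifies.

```python
import numpy as np
import pandas as pd


def fill_or_drop(laps, max_gap=1):
    """Interpolate missing lap times per driver, then drop the rest."""
    laps = laps.sort_values(["driver_id", "lap"]).copy()
    # Fill at most `max_gap` consecutive NaNs, and only where valid laps
    # exist on both sides; leading and trailing gaps are left untouched.
    laps["lap_time_s"] = laps.groupby("driver_id")["lap_time_s"].transform(
        lambda s: s.interpolate(method="linear", limit=max_gap, limit_area="inside")
    )
    return laps.dropna(subset=["lap_time_s"])


# Made-up example: driver "a" misses one lap, driver "b" retires after lap 2.
laps = pd.DataFrame({
    "driver_id": ["a"] * 5 + ["b"] * 5,
    "lap": [1, 2, 3, 4, 5] * 2,
    "lap_time_s": [85.0, np.nan, 86.0, 86.5, 87.0,
                   84.0, 84.5, np.nan, np.nan, np.nan],
})
cleaned = fill_or_drop(laps)
```

Here the single missing lap of driver "a" is interpolated, while the laps after driver "b" stops are removed rather than extrapolated.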

### 2.2 Lap-wise Standardization (Z-score)
For each lap $N$, calculate the mean $\mu_N$ and standard deviation $\sigma_N$ of all drivers' lap times on that lap, then convert each driver's time $t_{iN}$ to:

$$z_{iN} = \frac{t_{iN} - \mu_N}{\sigma_N}$$
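The lap-wise z-score maps naturally onto a grouped transform. The sketch below assumes a DataFrame with hypothetical columns `season`, `lap`, `driver_id`, and `lap_time_s`. It uses the population standard deviation (`ddof=0`), which is an assumption since the plan does not say which estimator is intended.

```python
import pandas as pd


def add_lap_zscores(laps):
    """Add a column `z` with each lap time standardized within its lap."""
    out = laps.copy()
    grouped = out.groupby(["season", "lap"])["lap_time_s"]
    mu = grouped.transform("mean")
    sigma = grouped.transform(lambda s: s.std(ddof=0))
    # Where every driver set the same time (sigma == 0), z is left as NaN.
    out["z"] = (out["lap_time_s"] - mu) / sigma.where(sigma > 0)
    return out


# Made-up lap times for three drivers over two laps of one race.
laps = pd.DataFrame({
    "season": [2011] * 6,
    "lap": [1, 1, 1, 2, 2, 2],
    "driver_id": ["a", "b", "c"] * 2,
    "lap_time_s": [80.0, 82.0, 84.0, 81.0, 81.0, 81.0],
})
scored = add_lap_zscores(laps)
```

A negative `z` means the driver was faster than the field average on that lap.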

### 2.3 Season/Era Baseline Normalization

#### Relative Delta
For each lap $N$, calculate each driver's relative difference to the fastest time any driver set on that lap:

$$\delta_{iN} = \frac{t_{iN} - \min_j t_{jN}}{\min_j t_{jN}}$$
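Following the formula above, the reference for each lap is the minimum time over all drivers on that lap. A minimal sketch, again assuming hypothetical columns `season`, `lap`, `driver_id`, and `lap_time_s`:

```python
import pandas as pd


def add_relative_delta(laps):
    """Add a column `delta`: fractional gap to the fastest time on each lap."""
    out = laps.copy()
    fastest = out.groupby(["season", "lap"])["lap_time_s"].transform("min")
    out["delta"] = (out["lap_time_s"] - fastest) / fastest
    return out


# Made-up lap times for three drivers on one lap.
laps = pd.DataFrame({
    "season": [2011, 2011, 2011],
    "lap": [1, 1, 1],
    "driver_id": ["a", "b", "c"],
    "lap_time_s": [80.0, 82.0, 84.0],
})
with_delta = add_relative_delta(laps)
```

Because `delta` is a ratio, a value of 0.025 reads as "2.5% slower than the fastest driver on that lap", regardless of how long a lap took in that era.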

## 3. Feature Engineering

Based on the standardized $z$ or the relative delta $\delta$, construct the following statistical features:

### Mean Z-score or δ value
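$$\bar{z}_i = \frac{1}{L}\sum_{N=1}^{L} z_{iN}$$

where $L$ is the number of laps included for driver $i$; the mean $\bar{\delta}_i$ is defined analogously.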

### Variance
- $\mathrm{Var}(z_i)$ or $\mathrm{Var}(\delta_i)$ as a measure of performance stability (lower variance means more consistent lap times)

### Additional Features
- Best/worst lap performance
- Median values
- Performance decay slope (linear trend of the normalized lap time over the race)
- Pit stop/safety car detection
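The summary features listed in this section can be computed from one driver's sequence of normalized values. The sketch below is one possible reading: the decay slope is taken as the slope of a least-squares line through the values against lap number, and the variance uses `ddof=0`; both are assumptions. Pit stop and safety car detection is not covered here.

```python
import numpy as np


def driver_features(values, laps):
    """Summarize one driver's per-lap normalized values (z or delta)."""
    values = np.asarray(values, dtype=float)
    laps = np.asarray(laps, dtype=float)
    # Slope of a straight-line fit; needs at least two laps.
    slope = np.polyfit(laps, values, deg=1)[0] if len(values) >= 2 else np.nan
    return {
        "mean": values.mean(),
        "var": values.var(ddof=0),
        "best": values.min(),    # smallest value = fastest relative lap
        "worst": values.max(),   # largest value = slowest relative lap
        "median": float(np.median(values)),
        "decay_slope": slope,
    }


# Made-up delta values that grow steadily over four laps.
features = driver_features([0.0, 0.1, 0.2, 0.3], [1, 2, 3, 4])
```

A positive `decay_slope` indicates that the driver lost pace relative to the field as the race went on.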

## 4. Regression Model Construction

### 4.1 Target Variable
- Total time difference or transformed race position

### 4.2 Model Selection
- Multiple Linear Regression (OLS)
- Regularized Regression (Ridge, Lasso, Elastic Net)
- Linear Mixed Model:
$$y_i = \beta_0 + \sum_{j} \beta_j x_{ij} + u_{\text{season}(i)} + \varepsilon_i$$
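As an illustration of the regularized option, ridge regression has a closed-form solution that fits in a few lines of NumPy. This is a sketch of that single model only, not of the mixed model above; the function names and the example data are hypothetical. The intercept is left unpenalized by centering the data first.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge fit; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Solve (Xc^T Xc + alpha * I) beta = Xc^T (y - y_mean).
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ coef
    return intercept, coef


def predict(X, intercept, coef):
    """Apply the fitted linear model to new feature rows."""
    return intercept + np.asarray(X, dtype=float) @ coef


# Made-up features with an exact linear target y = 1 + 2*x1 - 3*x2.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]

b0_ols, coef_ols = fit_ridge(X, y, alpha=0.0)      # alpha = 0 reduces to OLS
b0_ridge, coef_ridge = fit_ridge(X, y, alpha=10.0)  # shrunk coefficients
```

With `alpha=0` the fit recovers the generating coefficients exactly; increasing `alpha` shrinks them toward zero, which is the trade-off the hyperparameter search would tune.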

### 4.3 Model Training and Validation
- Time-based split (e.g., 1996-2018 for training, 2019-2024 for testing)
- Grid search for hyperparameter tuning
- Evaluation metrics: MSE, MAE, Spearman ρ
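The time-based split and the three metrics can be sketched as follows. The column name `season` and the default cutoff are assumptions taken from the example split above. Spearman's ρ is computed here as the Pearson correlation of the ranks, which avoids an extra dependency.

```python
import pandas as pd


def time_split(df, last_train_season=2018):
    """Split rows by season so that the test set lies strictly later in time."""
    train = df[df["season"] <= last_train_season]
    test = df[df["season"] > last_train_season]
    return train, test


def evaluate(y_true, y_pred):
    """Return MSE, MAE, and Spearman rank correlation."""
    y_true = pd.Series(y_true, dtype=float).reset_index(drop=True)
    y_pred = pd.Series(y_pred, dtype=float).reset_index(drop=True)
    err = y_pred - y_true
    return {
        "mse": float((err ** 2).mean()),
        "mae": float(err.abs().mean()),
        # Pearson correlation of average ranks equals Spearman's rho.
        "spearman": float(y_true.rank().corr(y_pred.rank())),
    }


# Made-up targets and predictions.
metrics = evaluate([1, 2, 3, 4], [1.5, 2.5, 2.0, 5.0])
train, test = time_split(pd.DataFrame({"season": [2017, 2018, 2019, 2024]}))
```

Spearman ρ is the most relevant of the three when the target is a finishing order, because it only rewards getting the ranking right.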

## 5. Cross-Era Prediction Process
1. Input new driver lap time data
2. Season baseline normalization and standardization
3. Feature extraction
4. Model prediction
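The four steps can be chained into one function. The sketch below assumes a single race's laps in a DataFrame with hypothetical columns `lap`, `driver_id`, and `lap_time_s`, uses only the mean and variance of the relative delta as features, and takes an already fitted linear model as `intercept` and `coef`. All of these are simplifying assumptions for illustration.

```python
import numpy as np
import pandas as pd


def predict_driver(race_laps, driver_id, intercept, coef):
    """Normalize one race, extract two features for a driver, and predict."""
    # Step 2: relative delta to the fastest time on each lap.
    fastest = race_laps.groupby("lap")["lap_time_s"].transform("min")
    delta = (race_laps["lap_time_s"] - fastest) / fastest
    # Step 3: features for the requested driver.
    own = delta[race_laps["driver_id"] == driver_id]
    features = np.array([own.mean(), own.var(ddof=0)])
    # Step 4: linear model prediction.
    return float(intercept + features @ np.asarray(coef, dtype=float))


# Step 1: made-up input laps for two drivers.
race_laps = pd.DataFrame({
    "lap": [1, 2, 1, 2],
    "driver_id": ["a", "a", "b", "b"],
    "lap_time_s": [80.0, 80.0, 82.0, 84.0],
})
score_b = predict_driver(race_laps, "b", intercept=0.0, coef=[100.0, 0.0])
```

The coefficients used here are placeholders; in practice they would come from the model trained in Section 4.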

## 6. Feasibility and Limitations

### Feasibility
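- Lap-wise z-scores and relative deltas put lap times from different seasons on a comparable scale
- Statistical features such as mean, variance, and decay slope summarize a driver's pace and consistency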

### Limitations
- Strategy factors are not modelled
- Sudden events (e.g., safety cars, accidents) distort lap times
- Linear models may miss nonlinear effects

### Future Improvements
- Include tactical data and weather information
- Consider nonlinear regression methods

## Summary
This approach uses lap time data to predict cross-era driver performance at Monza through:
1. Data collection from Ergast API
2. Standardization and normalization
3. Statistical feature construction
4. Regression modeling
5. New data prediction