<a href="https://colab.research.google.com/github/HannesKock/RaceTeam2_CHP/blob/main/Race_Team_Project_Notebook_12-06_V2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Race Team Project Notebook**
Names: Paula Kussauer, Cedric Schwandt, Hannes Kock

Matr.: (to be entered), 7410658, 7421447

Please write down the name of the group member who worked on each section of code. This is necessary for grading by the Studienbüro.

# **Preface**

The following texts and code snippets relate to our work for the seminar: "Driven by Data: Leveraging Artificial Intelligence in NOVA Business School’s Racing Simulation."

At the beginning of each Analytics section, we provided an overview of our motivations, goals, and procedures for that particular week. The corresponding code used to implement the described activities can be found within the respective structure.

Throughout the texts and within comments in the code, we distinguish parameters into two categories: Car Parameters and Track/Weather Parameters.  
Car Parameters include: Rear Wing, Front Wing, Engine, Differential, Brake Balance, and Suspension.  
Track/Weather Parameters include: Cornering, Inclines, Camber, Grip, Wind (Average Speed), Temperature, Humidity, Air Density, Air Pressure, Wind (Gusts), Altitude, Roughness, and Width.

Early in the analysis for Race 1, we observed that it is more effective to use the average speed as a target variable rather than lap time. We compute the average speed in km/h by dividing the lap distance by the lap time, then multiplying the result by 3,600. Although this approach is not explicitly mentioned in the texts, it is consistently applied throughout the analysis.
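
As a quick illustration of this conversion, here is a minimal sketch. It assumes the lap distance is given in km and the lap time in seconds, which is what the factor 3,600 implies. The function name `avg_speed_kmh` and the example lap are hypothetical and not taken from the simulation data.

```python
def avg_speed_kmh(lap_distance_km: float, lap_time_s: float) -> float:
    """Average speed in km/h from a lap distance in km and a lap time in seconds."""
    return lap_distance_km / lap_time_s * 3600


# Hypothetical lap: 3.7 km driven in 90 seconds
speed = avg_speed_kmh(3.7, 90.0)
print(f"{speed:.1f} km/h")
```

Because the result is normalized by distance, laps on tracks of different lengths become directly comparable, which raw lap times are not.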

Our optimization approaches can be categorized into two types: "All-In-One Optimization" and "Sequential Optimization."

- "All-In-One Optimization" refers to an approach in which all six car-parameters are optimized simultaneously using a single machine learning (ML) model.
- "Sequential Optimization" refers to an approach where the car-parameters are optimized one after another, potentially using a single or multiple ML models.

These two approaches are discussed in more detail within the respective sections. After their initial explanation, they are not reiterated each time they are referenced.

Within the code, various types of datasets are used. Files that include the term "newnames" in their filenames indicate that the column names have been modified to ensure compatibility with our code. Specifically, the column names were changed to be consistent throughout the files and to eliminate blank spaces within the car-parameter columns.

Additionally, the Practice_Data files are labeled with identifiers such as "V3_2." The "V3" indicates that the data pertains to Race 3, and the "2" signifies that it is the second practice data file exported for that race.

Sources are cited at the beginning of code cells whenever new and relevant components, such as libraries, code logic, or concepts are introduced. After that, the contents and logic are assumed to be understood.

Whenever code utilizes Optuna or Selenium, the corresponding library needs to be installed once per Google Colab runtime. This can be done by running the following commands in a code cell:

!pip install optuna

!pip install selenium


For code cleanliness, these installation cells are included only once, immediately before the section “Analytics for Race 1.”

At certain points in our code, it is required to execute previous code cells to train and save a model, or to generate a merged dataset.


In [None]:
!pip install optuna
!pip install selenium

# **Analytics for Race 1**

The activities from week one can be divided into four main categories:

1. Data Visualization and Understanding the Simulation  


2. Initial Machine Learning Experiments  
Next, we made our first attempts at training and applying machine learning (ML) models. These models helped us visualize the relationships in the data more clearly and served as preliminary steps toward optimizing car parameters.

3. Exploring Sequential Optimization  


4. Developing a Better Strategy Optimization Tool  
Finally, we realized that the existing Excel-based "Race Simulator" was insufficient for strategic optimization. Therefore, we wrote our own code that optimizes strategies based on tire endurance, total laps, and a linear relationship between lap time and fuel load. The code evaluates all possible strategies, calculates their resulting total race times, and identifies the best one.

**Data Visualization**

We began by analyzing the dataset to gain insights into the simulation environment. First, we read the CSV file "simulator_data.csv" into a pandas DataFrame. We displayed the first few rows, listed the column names, provided summary statistics, and gave a concise overview of the dataset's structure.  

To better understand the data, we created scatter plots to visualize the relationship between each parameter and lap time. During this process, we noticed that the lap distances in "simulator_data.csv" varied. To account for this, we calculated the average speed by dividing "Lap Distance" by "Lap Time" and multiplying by 3600 to convert it to km/h.

Next, we generated scatter plots to examine the relationship between each parameter (excluding "Lap Time" and "Avg. Speed") and the calculated average speed.  

Since these scatter plots did not provide many clear insights, we decided to explore the correlations among the parameters. We computed the correlation matrix and visualized it with a heatmap, which allowed us to easily identify the strength of linear relationships between variables. The analysis suggested that the simulation is not based on real physics: for example, air density and air pressure showed little to no correlation with each other, although the two are physically linked in reality.


**Sequential Optimization Process**

After reading the Gordy interview on the Team-Analytics website, we were inspired by his statement: **"We always optimized each decision separately, ensuring that each part of the car was at its best without unnecessary compromises."** This motivated us to try optimizing each car parameter individually, focusing only on the parameters relevant to each specific aspect. This marked the beginning of our pursuit of sequential optimization.

1. Feature Importance Analysis for Overall Influence  
An XGBoost Regressor is trained to predict the average speed from the car parameters and the track/weather parameters. After training, we extract the feature importances and plot a bar chart illustrating the relative impact of each feature. This helps us understand which variables most strongly influence the vehicle's average speed.

2. Feature Importance Analysis for Car Setup Parameters  
Next, we perform a similar analysis for each individual car setup parameter. Using an XGBoost Regressor, we model the relationship between the track/weather parameters and each car parameter. The feature importances are extracted and visualized as bar charts. These insights help identify which track and weather variables most significantly affect each setup parameter.

3. Sequential Optimization of Car Parameters  

- Order of Optimization:  
  Based on the feature importance analyses, we predefined an order for optimizing the car parameters, from most to least influential:
  
  - Engine: Grip, Altitude, Humidity, Air Density, Temperature, Air Pressure, Inclines
  - Differential: Cornering, Width, Inclines, Grip, Temperature, Air Density
  - Rear Wing: Air Pressure, Air Density, Cornering, Inclines, Wind (Average Speed), Humidity, Roughness
  - Brake Balance: Width, Cornering, Roughness, Temperature
  - Front Wing: Air Pressure, Cornering, Air Density, Inclines, Wind (Average Speed), Humidity, Wind (Gusts)
  - Suspension: Grip, Inclines, Cornering, Camber, Width, Roughness  

- Inclusion of Previously Optimized Parameters:  
  In each step of the optimization, we include all parameters that have already been optimized in previous steps. This ensures that the models account for their influence when predicting the impact of the current parameter.

- Optimization Procedure:  
  For each car parameter:
  1. We train a predictive model (using an XGBoost Regressor) to estimate the average speed, utilizing relevant track/weather variables and previously optimized parameters.
  2. Using Optuna, we search for the optimal value of the current parameter to maximize the predicted average speed.

- Testing Different Optimization Orders:  
  We also experimented with reversing the order, starting with the least important car parameter and ending with the most important one, but this did not improve the results. The original highest-to-lowest order proved to be more effective.



**Race Strategy**  
After working with the Racing_Simulation Excel for a while, we realized that there must be a better way to optimize the strategy, as the Excel approach seemed somewhat imprecise. We thought about this for some time and decided that the most straightforward solution would be to brute-force simulate or calculate every possible strategy and then select the best one.  

To do this, we needed the following data:  
- Total number of race laps  
- Tire durability (number of laps each tire can last)  
- Pit stop time (duration of each pit stop)  
- The relationship between the amount of fuel in the car and lap times (which we assumed to be linear)  

We also assumed that tire condition has no effect on lap times. We made this assumption because there was no way to test or accurately model this before the first race. After the race, we confirmed this assumption by comparing race data with practice data.

In summary, the strategy optimizer systematically explores every feasible combination of tire, fuel, and pit strategies by modeling how fuel load impacts lap times, simulating each scenario, and selecting the most efficient plan. This comprehensive brute-force approach ensures that the optimal strategy is identified based on the parameters and assumptions we have established.

The following steps describe the process in more detail:


1. Modeling Fuel Load and Its Effect on Lap Times:
  - The relationship between fuel in the car and lap times was established through practice laps.  
  - During these practice runs, data was collected with both minimal and maximal fuel loads:  
    - The minimal fuel load (corresponding to the end of a stint) produces the fastest lap times.  
    - The maximal fuel load (the highest amount of fuel needed to complete the entire stint or race) results in the slowest lap times.  
  - The lap times from these runs were used to fit linear functions for each tire compound, where:  
    - The constant term (intercept) represents the lap time when the car has minimal fuel.  
    - The variable coefficient reflects how lap times increase proportionally with the amount of fuel carried.  
    - If `x = 1`, the function estimates the lap time when the car has one lap of fuel remaining; if `x = 30`, it corresponds to the lap time when the car has 30 laps of fuel remaining.  
    - By summing the estimated lap times for `x` values from 1 to the total number of laps in a stint, the total time for that stint can be approximated.  

2. Generating Feasible Strategies:
   - The algorithm considers all possible ways to split the total race laps into multiple stints, each with a specific fuel load and tire choice.
   - Each strategy consists of a sequence of stints, with each stint:
     - Using a specific tire compound (soft, medium, or hard).
     - Running for a number of laps determined based on tire lifespan and strategic considerations.
   - Constraints ensure no stint exceeds the maximum number of laps the tire can sustain (its durability), and the total combined laps cover the entire race.

3. Simulating Each Strategy:
   - For every candidate strategy, the code simulates the race by:
     - Calculating lap times for each stint using the fuel load linear functions, reflecting the impact of fuel on speed.
     - Summing all laps within each stint to get the total time for that segment.
     - Adding pit stop times (penalties) between stints, but not after the last stint, to reflect real race conditions.
   - This process estimates the total race duration associated with each specific combination of tires and fuel loads.

4. Evaluating and Selecting the Optimal Strategy:
   - After simulating all possible strategies, the code compares their total race times.
   - It identifies the strategy with the shortest overall race time as the optimal plan.
   - The output includes:
     - The sequence of tire choices and stint lengths.
     - The planned pit stops.
     - The expected total race time.
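
A minimal sketch of these four steps is shown below. It is an illustration under stated assumptions, not the optimizer used for the race: every number (race length, pit stop time, tire durabilities, practice lap times) is a made-up placeholder, the helper names `fit_fuel_line`, `stint_time`, `lap_splits`, and `best_strategy` are hypothetical, and the search is capped at three stints to keep it small.

```python
from itertools import product

# All values below are made-up placeholders, not data from the simulation.
TOTAL_LAPS = 20
PIT_STOP_TIME = 25.0  # seconds lost per pit stop


def fit_fuel_line(x_low, t_low, x_high, t_high):
    """Fit lap_time(x) = intercept + slope * x through two practice laps,
    where x is the number of laps of fuel on board."""
    slope = (t_high - t_low) / (x_high - x_low)
    return t_low - slope * x_low, slope


# compound -> (durability in laps, (intercept, slope) of the fuel line)
TIRES = {
    "soft": (8, fit_fuel_line(1, 88.1, 20, 90.0)),
    "medium": (12, fit_fuel_line(1, 89.1, 20, 91.0)),
    "hard": (20, fit_fuel_line(1, 90.1, 20, 92.0)),
}


def stint_time(compound, laps):
    """Sum the estimated lap times for x = 1..laps laps of fuel remaining."""
    intercept, slope = TIRES[compound][1]
    return sum(intercept + slope * x for x in range(1, laps + 1))


def lap_splits(total, parts):
    """Yield every way to split `total` laps into `parts` stints of >= 1 lap."""
    if parts == 1:
        yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in lap_splits(total - first, parts - 1):
            yield (first,) + rest


def best_strategy(total_laps=TOTAL_LAPS, max_stints=3):
    """Brute-force every feasible strategy and return (race_time, stints)."""
    best = None
    for n_stints in range(1, min(max_stints, total_laps) + 1):
        for lengths in lap_splits(total_laps, n_stints):
            for compounds in product(TIRES, repeat=n_stints):
                # Skip strategies where a stint exceeds the tire's durability
                if any(n > TIRES[c][0] for c, n in zip(compounds, lengths)):
                    continue
                race_time = sum(stint_time(c, n) for c, n in zip(compounds, lengths))
                race_time += PIT_STOP_TIME * (n_stints - 1)  # no stop after the last stint
                if best is None or race_time < best[0]:
                    best = (race_time, list(zip(compounds, lengths)))
    return best


race_time, stints = best_strategy()
print(f"Best strategy: {stints}, total time: {race_time:.1f} s")
```

The number of candidate strategies grows quickly with the number of laps and allowed stints, so capping the stint count is one simple way to keep an exhaustive search tractable.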


In [None]:
# Team members working on this code: Paula Kussauer, Cedric Schwandt, Hannes Kock

## Visualizing the Data

First, we read the CSV file "simulator_data.csv" into a pandas DataFrame for analysis. We display the first few rows, list the column names, provide summary statistics, and show a concise overview of the dataset's structure.


In [None]:
# Reading the data and summarizing its contents
#https://www.datacamp.com/tutorial/pandas-read-csv
import pandas as pd

df = pd.read_csv("simulator_data.csv")
print(df.head())
print(df.columns)
print(df.describe())
df.info()  # info() prints its summary directly and returns None


To see what the data looks like, we create scatter plots showing the
relationship between each parameter in the dataset and lap time.

In [None]:
# Scatterplots (Parameter vs. Lap Time)
#https://www.w3schools.com/python/python_ml_scatterplot.asp

import matplotlib.pyplot as plt
import seaborn as sns

param_cols = df.columns.drop("Lap Time")

for col in param_cols:
    plt.figure()
    sns.scatterplot(x=df[col], y=df["Lap Time"], alpha=0.3)
    plt.title(f"{col} vs. Lap Time")
    plt.show()


We noticed that the laps within "simulator_data.csv" have different lengths. Therefore, we divide "Lap Distance" by "Lap Time" and multiply by 3600 to get "Avg. Speed" in km/h.

Then we again generate scatter plots to visualize the relationship between each parameter (excluding "Lap Time" and "Avg. Speed") and the calculated average speed.

In [None]:
# Scatterplots (Parameter vs. Avg. Speed)

import matplotlib.pyplot as plt
import seaborn as sns

# Calculate avg. speed in km/h (if not already present)
if "Avg. Speed" not in df.columns:
    df["Avg. Speed"] = df["Lap Distance"] / (df["Lap Time"] / 3600)

# Plot every column except "Lap Time" and "Avg. Speed"
param_cols = df.columns.drop(["Lap Time", "Avg. Speed"])

# Create one scatter plot per parameter
for col in param_cols:
    plt.figure()
    sns.scatterplot(x=df[col], y=df["Avg. Speed"], alpha=0.3)
    plt.title(f"{col} vs. Avg. Speed")
    plt.xlabel(col)
    plt.ylabel("Avg. Speed (km/h)")
    plt.grid(True)
    plt.tight_layout()
    plt.show()

Because the scatter plots did not yield many insights, we wanted to see how the different parameters correlate with each other. We hoped to be able to use logic to figure out how to set the car parameters.

We calculated the correlation matrix to explore how the variables relate to each other. The correlations are visualized using a heatmap to easily identify weak or strong linear relationships between parameters.

In [None]:
#https://www.geeksforgeeks.org/how-to-create-a-seaborn-correlation-heatmap-in-python/
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Step 1: Load your CSV file into a pandas DataFrame
df = pd.read_csv("simulator_data.csv")  # replace with your file path

# Step 2: Calculate the correlation matrix
corr_matrix = df.corr()

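# Step 3: Plot the correlation matrix twice with different color ranges.
# The narrow range (-0.05 to 0.05) makes very weak correlations visible.
plt.figure(figsize=(20, 20))  # adjust size as needed
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", fmt=".2f", square=True, vmin=-0.05, vmax=0.05)
plt.title("Correlation Matrix (color range -0.05 to 0.05)")
plt.show()

# The wider range (-0.5 to 0.5) highlights only the stronger correlations.
plt.figure(figsize=(20, 20))  # adjust size as needed
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", fmt=".2f", square=True, vmin=-0.5, vmax=0.5)
plt.title("Correlation Matrix (color range -0.5 to 0.5)")
plt.show()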

## First Attempts at Using ML Models

The next four code blocks are our first attempts at using ML models to predict and further understand the parameters.

These attempts did not lead to much. They are still included here because they were an important step on the way to our final solution for this week's race analytics.

The four code blocks are:
1. A first attempt at using a Random Forest model and Optuna to find the best car parameters.
2. To better understand how the parameters influence the model, we used an ML model to create a SHAP diagram.
3. Using the model from the SHAP diagram, we try to find optimal car parameters with Optuna.
4. To improve the ML model's hyperparameters, we ran a grid search, which took a very long time and did not lead to noticeably better results.

In [None]:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import optuna

# === 1. Load data ===
df = pd.read_csv("simulator_data.csv")

# Target and feature columns
target_col = "Lap Time"
feature_cols = df.columns.drop(target_col)

# === 2. Train model ===
X = df[feature_cols]
y = df[target_col]

# Optional: scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train/test split (e.g., for validation)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Random Forest Regressor
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# === 3. Bayesian Optimization Setup (Optuna) ===

# Given: track/weather conditions
fixed_conditions = {
    'Lap Distance': 3.7,
    'Cornering': 6,
    'Inclines': 20,
    'Camber': 44,
    'Grip': 1,
    'Wind (Avg. Speed)': 97,
    'Temperature': 29,
    'Humidity': 23,
    'Air Density': 70,
    'Air Pressure': 98,
    'Wind (Gusts)': 49,
    'Altitude': 31,
    'Roughness': 49,
    'Width': 29
}

def objective(trial):
    # Car parameters to optimize
    params = {
        'Rear Wing': trial.suggest_float('Rear Wing', 0.0, 500),
        'Engine': trial.suggest_float('Engine', 0.0, 500),
        'Front Wing': trial.suggest_float('Front Wing', 0.0, 500),
        'Brake Balance': trial.suggest_float('Brake Balance', 0.0, 500),
        'Differential': trial.suggest_float('Differential', 0.0, 500),
        'Suspension': trial.suggest_float('Suspension', 0.0, 500),
    }

    # Combine with the fixed conditions
    full_input = {**fixed_conditions, **params}
    # Ensure the column order matches the training data before scaling
    X_input = pd.DataFrame([full_input])[feature_cols]
    X_input_scaled = scaler.transform(X_input)

    # Model prediction
    lap_time = model.predict(X_input_scaled)[0]
    return lap_time

# Start the Optuna study
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)

# Results
print("Best parameter combination:")
for key, value in study.best_params.items():
    print(f"  {key}: {value:.4f}")
print(f"Expected lap time: {study.best_value:.4f} seconds")

In [None]:
import shap
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# 1. Compute the average speed (km/h) if not done already
if "Avg. Speed" not in df.columns:
    df["Avg. Speed"] = df["Lap Distance"] / (df["Lap Time"] / 3600)

# 2. Define features and target
feature_cols = [
    'Lap Distance', 'Cornering', 'Inclines', 'Camber', 'Grip',
    'Wind (Avg. Speed)', 'Temperature', 'Humidity', 'Air Density',
    'Air Pressure', 'Wind (Gusts)', 'Altitude', 'Roughness', 'Width',
    'Rear Wing', 'Engine', 'Front Wing', 'Brake Balance', 'Differential',
    'Suspension'
]

X = df[feature_cols]
y = df["Avg. Speed"]

# 3. Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# 4. Train model (XGBoost)
#model = xgb.XGBRegressor(n_estimators=100, max_depth=5, random_state=42)
model = xgb.XGBRegressor(colsample_bytree=1.0, learning_rate = 0.05, max_depth = 6, min_child_weight = 1, n_estimators = 300, subsample = 0.8)
model.fit(X_train, y_train)
y_hat = model.predict(X_test)
# Mean absolute error on the test set (km/h)
error = np.mean(np.abs(y_test - y_hat))
print(f"MAE: {error:.4f}")

# 5. Compute SHAP values
explainer = shap.Explainer(model)
shap_values = explainer(X_test)

# 6. SHAP summary plot
shap.summary_plot(shap_values, X_test)


In [None]:

import optuna
import numpy as np
import pandas as pd

# Start from the column means; the fixed track/weather conditions are written in below
base_input = df[feature_cols].mean().values

for fc in fixed_conditions:
    base_input[feature_cols.index(fc)] = fixed_conditions[fc]

# Indices of the features we want to optimize
optim_features = ['Rear Wing', 'Engine', 'Front Wing', 'Brake Balance', 'Differential', 'Suspension']
optim_indices = [feature_cols.index(f) for f in optim_features]

# Search bounds for each car parameter (hard-coded here to the range 1 to 500)
feature_bounds = {
    'Rear Wing': (1, 500),
    'Engine': (1, 500),
    'Front Wing': (1, 500),
    'Brake Balance': (1, 500),
    'Differential': (1, 500),
    'Suspension': (1, 500),
}

# Objective function for Optuna
def objective(trial):
    x = base_input.copy()

    for f in optim_features:
        val = trial.suggest_int(f, int(feature_bounds[f][0]), int(feature_bounds[f][1]))
        x[feature_cols.index(f)] = val

    pred = model.predict(np.array([x]))[0]
    return pred  # Optuna will maximize if we tell it to

# Run Optuna study
optuna.logging.disable_default_handler()
study = optuna.create_study(direction='maximize')  # use 'minimize' if lower lap time is better
study.optimize(objective, n_trials=255)


# Show results
print("Best params:")
for k, v in study.best_params.items():
    print(f"{k}: {v}")

print(f"Max predicted avg speed: {study.best_value}")


In [None]:
#Grid Search for best Model Parameters

import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor
import numpy as np

# Load the dataset
df = pd.read_csv("simulator_data.csv")

# Create the target variable: average speed (in km/s here, since the factor 3600 is omitted)
df['Avg Speed'] = df['Lap Distance'] / df['Lap Time']

# Drop unused columns
X = df.drop(columns=['Lap Distance', 'Lap Time', 'Avg Speed'])
y = df['Avg Speed']

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Define XGBoost Regressor
model = XGBRegressor(objective='reg:squarederror', random_state=42)

# Define grid search parameters
param_grid = {
    'max_depth': [4, 6, 8],
    'learning_rate': [0.01, 0.05, 0.1],
    'n_estimators': [100, 300],
    'subsample': [0.8, 1.0],
    'colsample_bytree': [0.8, 1.0],
    'min_child_weight': [1, 5]
}

# Set up GridSearchCV
grid_search = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    scoring='neg_root_mean_squared_error',
    cv=3,
    verbose=2,
    n_jobs=-1
)

# Fit model
grid_search.fit(X_train, y_train)

# Best model
best_model = grid_search.best_estimator_
print("Best Parameters:", grid_search.best_params_)

# Evaluate
y_pred = best_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(f"Test RMSE: {rmse:.4f}")


## Starting the work on the sequential Optimizer

### Determining which parameters have the biggest impact on lap times / speeds



This code trains an XGBoost Regressor to predict the average speed of a car from the car parameters and the track/weather parameters. First, it loads the data and calculates the average speed by dividing the lap distance by the lap time. It then removes the columns that are not used as features ("Lap Distance", "Lap Time", and the target itself) and splits the data into training and testing sets. After training the model, it extracts the feature importances to determine which variables most strongly influence the predicted average speed. Finally, it plots a bar chart of the feature importances, providing insight into the relative impact of each feature.

In [None]:
#https://stackabuse.com/bytes/get-feature-importance-from-xgbregressor-with-xgboost/
import pandas as pd
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Load and preprocess (average speed in km/s here, since the factor 3600 is omitted)
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = df["Lap Distance"] / df["Lap Time"]

X = df.drop(columns=["Lap Distance", "Lap Time", "Avg Speed"])
y = df["Avg Speed"]

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = XGBRegressor()
model.fit(X_train, y_train)

# Plot feature importance
importances = model.feature_importances_
feat_importance = pd.Series(importances, index=X.columns).sort_values(ascending=False)

feat_importance.plot(kind="bar", figsize=(12, 6), title="Feature Importance")
plt.tight_layout()
plt.show()



### Determining which track/weather parameters have the biggest impact on each car parameter

This code performs feature importance analysis for various car setup parameters based on environmental and track/weather conditions. It uses the XGBoost Regressor to model the relationship between the track/weather variables and each car setup parameter. The data is split into training and test sets, and the model is evaluated using the R² score to measure prediction accuracy. After training, the code extracts and plots the feature importances for each car parameter, helping to identify which track/weather variables have the most significant impact on each car setup.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load your data
df = pd.read_csv("simulator_data.csv")

# Define inputs (track/weather features) and targets (car parameters)
track_weather_features = [
    'Lap Distance', 'Cornering', 'Inclines', 'Camber', 'Grip',
    'Wind (Avg. Speed)', 'Temperature', 'Humidity', 'Air Density',
    'Air Pressure', 'Wind (Gusts)', 'Altitude', 'Roughness', 'Width'
]

car_setup_params = ['Rear Wing', 'Engine', 'Front Wing', 'Brake Balance', 'Differential', 'Suspension']

# Store feature importances for each car parameter
feature_importance_dict = {}

# Loop through car parameters
for param in car_setup_params:
    X = df[track_weather_features]
    y = df[param]

    # Train/test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train model
    model = XGBRegressor(random_state=42)
    model.fit(X_train, y_train)

    # Predict and calculate R² score
    y_pred = model.predict(X_test)
    score = r2_score(y_test, y_pred)
    print(f"\n{param} - R²: {score:.3f}")

    # Get feature importances, sorted from highest to lowest
    importances = model.feature_importances_
    importance_series = pd.Series(importances, index=X.columns).sort_values(ascending=False)

    # Store importances
    feature_importance_dict[param] = importance_series

    # Display the features with importance > 0.075
    print(f"\nFor {param}, track/weather parameters with importance > 0.075:")
    for feature, importance in importance_series.items():
        if importance > 0.075:
            print(f" - {feature}: {importance:.3f}")

    # Optional: Plot feature importances
    importance_series.plot(kind='barh', title=f"Feature Importance for {param}", figsize=(8, 5))
    plt.gca().invert_yaxis()
    plt.tight_layout()
    plt.show()

In [None]:
# Basically the same as the code before, but the other car parameters are also included as model inputs when determining the parameter importance
# This is done to see how different car parameters might influence each other

import pandas as pd
import matplotlib.pyplot as plt
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load your data
df = pd.read_csv("simulator_data.csv")

# Define inputs (track/weather features) and targets (car parameters)
track_weather_features = [
    'Lap Distance', 'Cornering', 'Inclines', 'Camber', 'Grip',
    'Wind (Avg. Speed)', 'Temperature', 'Humidity', 'Air Density',
    'Air Pressure', 'Wind (Gusts)', 'Altitude', 'Roughness', 'Width'
]

car_setup_params = ['Rear Wing', 'Engine', 'Front Wing', 'Brake Balance', 'Differential', 'Suspension']

# Store feature importances for each car parameter
feature_importance_dict = {}

# Loop through car parameters
for param in car_setup_params:
    # Include track/weather features and other car parameters (excluding the target parameter itself)
    other_car_params = [p for p in car_setup_params if p != param]
    X = df[track_weather_features + other_car_params]
    y = df[param]

    # Train/test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train model
    model = XGBRegressor(random_state=42)
    model.fit(X_train, y_train)

    # Predict and calculate R² score
    y_pred = model.predict(X_test)
    score = r2_score(y_test, y_pred)
    print(f"\n{param} - R²: {score:.3f}")

    # Get feature importances and sort them
    importances = model.feature_importances_
    importance_series = pd.Series(importances, index=X.columns).sort_values(ascending=False)

    # Store importances
    feature_importance_dict[param] = importance_series

    # Display the features with importance > 0.075
    print(f"\nFor {param}, all parameters with importance > 0.075:")
    for feature, importance in importance_series.items():
        if importance > 0.075:
            print(f" - {feature}: {importance:.3f}")

    # Optional: Plot feature importances
    importance_series.plot(kind='barh', title=f"Feature Importance for {param}", figsize=(8, 5))
    plt.gca().invert_yaxis()  # Invert y-axis for better readability
    plt.tight_layout()
    plt.show()


### Sequential optimization of car-parameters.

We optimize each car parameter in a predefined order, ensuring that each optimization builds upon the previous ones. (Order: feature importance, highest to lowest; see above.)

For each car parameter, we first train a predictive model for average speed using relevant track/weather variables and already optimized parameters. Then, using Optuna, we search for the optimal value of the current car parameter that maximizes the predicted average speed.

Which track and weather parameters are important for each car parameter was determined beforehand (see above).

Order of optimization and relevant track/weather parameters:

1. Engine: Grip, Altitude, Humidity, Air Density, Temperature, Air Pressure, Inclines
2. Differential: Cornering, Width, Inclines, Grip, Temperature, Air Density
3. Rear Wing: Air Pressure, Air Density, Cornering, Inclines, Wind (Avg. Speed), Humidity, Roughness
4. Brake Balance: Width, Cornering, Roughness, Temperature
5. Front Wing: Air Pressure, Cornering, Air Density, Inclines, Wind (Avg. Speed), Humidity, Wind (Gusts)
6. Suspension: Grip, Inclines, Cornering, Camber, Width, Roughness


In [None]:
import pandas as pd
import numpy as np
import optuna
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split

# Load data
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"])*3600

# Fixed track & weather settings — user-defined!
fixed_conditions = {
    "Cornering": 6,
    "Inclines": 20,
    "Camber": 44,
    "Grip": 1,
    "Altitude": 31,
    "Roughness": 49,
    "Width": 29,
    "Temperature": 29,
    "Humidity": 23,
    "Wind (Avg. Speed)": 97,
    "Wind (Gusts)": 49,
    "Air Density": 70,
    "Air Pressure": 98,
}

# Order of optimization and relevant features
optimization_order = [
    ("Engine", ["Grip", "Altitude", "Humidity", "Air Density", "Temperature", "Air Pressure", "Inclines"]),
    ("Differential", ["Cornering", "Width", "Inclines", "Grip", "Temperature", "Air Density"]),
    ("Rear Wing", ["Air Pressure", "Air Density", "Cornering", "Inclines", "Wind (Avg. Speed)", "Humidity", "Roughness"]),
    ("Brake Balance", ["Width", "Cornering", "Roughness", "Temperature"]),
    ("Front Wing", ["Air Pressure", "Cornering", "Air Density", "Inclines", "Wind (Avg. Speed)", "Humidity", "Wind (Gusts)"]),
    ("Suspension", ["Grip", "Inclines", "Cornering", "Camber", "Width", "Roughness"]),
]

# Storage for optimized parameters
optimized_params = {}


# Optimization loop
for param, relevant_features in optimization_order:
    print(f"\n Optimizing {param}...")

    # Features for model = track/weather + already optimized + current param
    model_features = relevant_features + list(optimized_params.keys()) + [param]
    model_df = df[model_features + ["Avg Speed"]].dropna()

    X = model_df[model_features]
    y = model_df["Avg Speed"]

    # Train model
    model = XGBRegressor(random_state=42)
    model.fit(X, y)

    # Prepare fixed input for this stage
    input_row = {f: fixed_conditions[f] for f in relevant_features}
    input_row.update(optimized_params)  # Include already optimized parameters

    def objective(trial):
        trial_value = trial.suggest_int(param, 1, 500)
        row = input_row.copy()
        row[param] = trial_value
        df_input = pd.DataFrame([row])
        pred = model.predict(df_input)[0]
        return pred  # Maximizing Avg Speed

    optuna.logging.disable_default_handler()
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=10)
    print(f"Max predicted avg speed: {study.best_value}")

    best_val = study.best_params[param]
    optimized_params[param] = best_val

    print(f" Best {param}: {best_val}")

# Final output
print("\n All optimized parameters:")
for k, v in optimized_params.items():
    print(f"{k}: {v}")


In [None]:
All optimized parameters:
Engine: 30
Differential: 1
Rear Wing: 48
Brake Balance: 3
Front Wing: 37
Suspension: 124

In [None]:
# Same code as above, but with the optimization order of the car parameters reversed
# (this run also uses n_trials=1000 and leaves Optuna logging enabled).

import pandas as pd
import numpy as np
import optuna
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split

# Load data
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"])*3600

# Fixed track & weather settings — user-defined!
fixed_conditions = {
    "Cornering": 6,
    "Inclines": 20,
    "Camber": 44,
    "Grip": 1,
    "Altitude": 31,
    "Roughness": 49,
    "Width": 29,
    "Temperature": 29,
    "Humidity": 23,
    "Wind (Avg. Speed)": 97,
    "Wind (Gusts)": 49,
    "Air Density": 70,
    "Air Pressure": 98,
}

# Order of optimization and relevant features
optimization_order = [
    ("Suspension", ["Grip", "Inclines", "Cornering", "Camber", "Width", "Roughness"]),
    ("Front Wing", ["Air Pressure", "Cornering", "Air Density", "Inclines", "Wind (Avg. Speed)", "Humidity", "Wind (Gusts)"]),
    ("Brake Balance", ["Width", "Cornering", "Roughness", "Temperature"]),
    ("Rear Wing", ["Air Pressure", "Air Density", "Cornering", "Inclines", "Wind (Avg. Speed)", "Humidity", "Roughness"]),
    ("Differential", ["Cornering", "Width", "Inclines", "Grip", "Temperature", "Air Density"]),
    ("Engine", ["Grip", "Altitude", "Humidity", "Air Density", "Temperature", "Air Pressure", "Inclines"])
]

# Storage for optimized parameters
optimized_params = {}


# Optimization loop
for param, relevant_features in optimization_order:
    print(f"\n Optimizing {param}...")

    # Features for model = track/weather + already optimized + current param
    model_features = relevant_features + list(optimized_params.keys()) + [param]
    model_df = df[model_features + ["Avg Speed"]].dropna()

    X = model_df[model_features]
    y = model_df["Avg Speed"]

    # Train model
    model = XGBRegressor(random_state=42)
    model.fit(X, y)

    # Prepare fixed input for this stage
    input_row = {f: fixed_conditions[f] for f in relevant_features}
    input_row.update(optimized_params)  # Include already optimized parameters

    def objective(trial):
        trial_value = trial.suggest_int(param, 1, 500)
        row = input_row.copy()
        row[param] = trial_value
        df_input = pd.DataFrame([row])
        pred = model.predict(df_input)[0]
        return pred  # Maximizing Avg Speed

    #optuna.logging.disable_default_handler()
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=1000)
    print(f"Max predicted avg speed: {study.best_value}")

    best_val = study.best_params[param]
    optimized_params[param] = best_val

    print(f" Best {param}: {best_val}")

# Final output
print("\n All optimized parameters:")
for k, v in optimized_params.items():
    print(f"{k}: {v}")


## Race Strategy

In [None]:
import math

# Define the tire parameters and their lap time formulas
def lap_time_super_soft(X):
    return (69.269-0.141812499999999) + 0.141812499999999 * X

def lap_time_soft(X):
    return (70.53-0.0667741935483869) + 0.0667741935483869 * X

def lap_time_medium(X):
    return (70.927-0.0670857142857143) + 0.0670857142857143 * X

def lap_time_hard(X):
    return (70.257-0.109609756097561) + 0.109609756097561 * X

# Tire data with their lifespan
tire_lifespan = {
    "super_soft": 16,
    "soft": 30,
    "medium": 35,
    "hard": 41
}

# Pit stop penalty
pit_stop_time = 30  # seconds

# Function to calculate the total race time for a given strategy
def calculate_race_time(laps, strategy):
    total_time = 0
    total_pit_stops = 0
    lap_index = 0
    lap_counter = 0

    while lap_counter < laps:
        tire, stint_laps = strategy[lap_index]

        # Ensure we don't exceed the total laps
        if lap_counter + stint_laps > laps:
            stint_laps = laps - lap_counter

        # Calculate the lap times for this stint
        lap_times = []
        for i in range(stint_laps):
            if tire == "super_soft":
                lap_times.append(lap_time_super_soft(i + 1))
            elif tire == "soft":
                lap_times.append(lap_time_soft(i + 1))
            elif tire == "medium":
                lap_times.append(lap_time_medium(i + 1))
            elif tire == "hard":
                lap_times.append(lap_time_hard(i + 1))

        total_time += sum(lap_times)  # Add the lap times of this stint
        lap_counter += stint_laps

        # If we are not at the last stint, account for a pit stop
        if lap_counter < laps:
            total_time += pit_stop_time  # Pit stop penalty
            total_pit_stops += 1

        lap_index += 1
        if lap_index >= len(strategy):
            break

    return total_time, total_pit_stops

# Function to generate possible strategies dynamically
def generate_strategies(laps):
    strategies = []
    tire_choices = ["super_soft", "soft", "medium", "hard"]

    # Generate strategies by breaking the laps into multiple stints
    for tire1 in tire_choices:
        for tire2 in tire_choices:
            for tire3 in tire_choices:
                #for tire4 in tire_choices:
                  #for tire5 in tire_choices:
                    strategy = []
                    remaining_laps = laps

                    # Create dynamic stints for each tire
                    #for tire in [tire1, tire2, tire3, tire4, tire5]:
                    #for tire in [tire1, tire2, tire3, tire4]:
                    for tire in [tire1, tire2, tire3]:
                    #for tire in [tire1, tire2]:
                        stint_laps = tire_lifespan[tire]

                        if remaining_laps > stint_laps:
                            strategy.append((tire, stint_laps))
                            remaining_laps -= stint_laps
                        else:
                            strategy.append((tire, remaining_laps))
                            break

                    if sum([stint[1] for stint in strategy]) == laps:
                        strategies.append(strategy)

    return strategies

# Function to find the best strategy
def optimize_strategy(laps):
    best_time = math.inf
    best_strategy = None
    best_pit_stops = 0  # Initialized so the return below cannot fail if no strategy is found

    strategies = generate_strategies(laps)

    for strategy in strategies:
        total_time, pit_stops = calculate_race_time(laps, strategy)
        if total_time < best_time:
            best_time = total_time
            best_strategy = strategy
            best_pit_stops = pit_stops

    return best_strategy, best_time, best_pit_stops


# Main function
if __name__ == "__main__":
    race_laps = 83
    best_strategy, best_time, total_pit_stops = optimize_strategy(race_laps)
    print(f"Best Strategy: {best_strategy}")
    print(f"Best Total Time: {best_time} seconds")
    print(f"Total Pit Stops: {total_pit_stops}")


In [None]:
Best Strategy: [('super_soft', 16), ('super_soft', 16), ('super_soft', 16), ('super_soft', 16), ('soft', 19)]
Best Total Time: 5972.774387096774 seconds

Best Strategy: [('super_soft', 16), ('super_soft', 16), ('soft', 30), ('soft', 21)]
Best Total Time: 5980.742354838709 seconds

Best Strategy: [('soft', 30), ('soft', 30), ('soft', 23)]
Best Total Time: 5988.977419354838 seconds

Random car parameters, used to check whether the optimized values are actually better than arbitrary ones.

In [None]:
from random import randint

print(f'Rear Wing {randint(1, 500)}')
print(f'Engine {randint(1, 500)}')
print(f'Front Wing {randint(1, 500)}')
print(f'Brake Balance {randint(1, 500)}')
print(f'Differential {randint(1, 500)}')
print(f'Suspension {randint(1, 500)}')



# **Analytics for Race 2**

**Overview of Our Approach for Race Two**

Our work for Race Two can be divided into three main parts:

1. We attempted to build an "All-In-One" optimization method again.
2. We further developed our sequential optimization approach.
3. We applied our race-strategy code from the first race to decide on an optimal strategy.


One significant change was switching from XGBoost to LightGBM, based on discussions in class. This switch affected both our "All-In-One" and sequential optimization methods. Additionally, instead of training models within the code during runtime, we now train them beforehand and load the pre-trained models as needed.


**1: "All-In-One"-Optimization:**

We learned online that, for LightGBM, hyperparameter tuning plays a crucial role. Therefore, we began by optimizing hyperparameters specifically for our use case and dataset.

After tuning the hyperparameters, we evaluated the model's performance using the R² metric, both with and without hyperparameter tuning. Once satisfied with the hyperparameters, we trained a LightGBM model on the full dataset and stored it for later use.
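
For reference, R² compares the model's squared error with that of a baseline that always predicts the mean. The following is a minimal sketch of that definition; the speed values are made up for illustration and are not results from our data.

```python
def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model error
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # mean-baseline error
    return 1.0 - ss_res / ss_tot


# Illustrative values only (km/h)
actual = [180.0, 185.0, 190.0]
perfect = r2_manual(actual, [180.0, 185.0, 190.0])
baseline = r2_manual(actual, [185.0, 185.0, 185.0])
print(perfect, baseline)
```

A value of 1 means perfect predictions, while 0 means the model is no better than predicting the mean speed.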

We then used this trained model in conjunction with Optuna to optimize car parameters, fixing track and weather conditions. Finally, we developed code to import the trained model and further train it using practice data for improved performance.




**2: Sequential optimization:**

This week, we reflected on how to improve our sequential optimization approach. We devised a plan that not only involved switching to LightGBM but also included six detailed steps:

1. Decide which track/weather parameters each car parameter should use.

2. Decide the order in which the car parameters are optimized.

3. Write code that optimizes each model's hyperparameters using Optuna and K-fold cross-validation.

4. Train and save each model on the full dataset (no train/test split).

5. Further train the models using the practice data.

6. Write code that sequentially optimizes the car parameters with the existing models.

The first and second steps of this plan involve using feature importance to determine the order and parameters relevant for optimizing the car parameters.

To select which Track and Weather parameters should be used for optimizing each car parameter, we plotted the feature importance for each car parameter. All resulting graphs revealed a clear cutoff point, where the importance of subsequent Track and Weather parameters significantly decreased. We chose to use the parameters above each cutoff point for the optimization.

Regarding the order of optimization, we decided to proceed from the most important to the least important parameters. This approach ensures that the parameters with the greatest influence are optimized first, with the lowest possible bias from other already fixed parameters.

In addition to the selected Track and Weather parameters for each optimization step, we also include each previously optimized car parameter as part of the parameter set for subsequent optimization steps.
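
The way the feature set grows from step to step can be sketched as follows. The order and feature lists are taken from the Race 1 code above; `build_stage_features` is a helper name introduced here for illustration only.

```python
optimization_order = [
    ("Engine", ["Grip", "Altitude", "Humidity", "Air Density", "Temperature", "Air Pressure", "Inclines"]),
    ("Differential", ["Cornering", "Width", "Inclines", "Grip", "Temperature", "Air Density"]),
    ("Rear Wing", ["Air Pressure", "Air Density", "Cornering", "Inclines", "Wind (Avg. Speed)", "Humidity", "Roughness"]),
    ("Brake Balance", ["Width", "Cornering", "Roughness", "Temperature"]),
    ("Front Wing", ["Air Pressure", "Cornering", "Air Density", "Inclines", "Wind (Avg. Speed)", "Humidity", "Wind (Gusts)"]),
    ("Suspension", ["Grip", "Inclines", "Cornering", "Camber", "Width", "Roughness"]),
]


def build_stage_features(order):
    """Return, per stage, the model features: track/weather + earlier car params + current param."""
    stages = []
    already_optimized = []
    for car_param, track_weather in order:
        stages.append((car_param, track_weather + already_optimized + [car_param]))
        already_optimized.append(car_param)
    return stages


stages = build_stage_features(optimization_order)
for car_param, features in stages:
    print(f"{car_param}: {len(features)} features")
```

Each later model therefore sees more car parameters than the one before it, so the last model (Suspension) uses its six track parameters plus all six car parameters.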

As a third step, we developed code to optimize the hyperparameters for each of the six parameter combinations. After completing this process, we trained and saved each model.

With the trained models in hand, we used a modified version of our sequential optimizer from week one to initiate the first round of optimization. Following several practice rounds, we employed the code from Step 5 to further train the six models using the practice data.



**3: Race Strategy:**

We used the same Race Strategy Calculator from week one with updated race data.
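
Since the calculator models lap time as a linear function of tire age, the time for a stint of n laps has the closed form n · first_lap + slope · n(n − 1)/2. The sketch below uses the Race 1 coefficients and the 30-second pit stop from the code above; the function names are chosen for illustration.

```python
# (first-lap time in seconds, added seconds per lap of tire age), from the Race 1 calculator
TIRES = {
    "super_soft": (69.269, 0.141812499999999),
    "soft": (70.53, 0.0667741935483869),
    "medium": (70.927, 0.0670857142857143),
    "hard": (70.257, 0.109609756097561),
}
PIT_STOP_TIME = 30.0  # seconds


def stint_time(tire, laps):
    """Sum of first_lap + slope * (lap - 1) for lap = 1..laps, in closed form."""
    first_lap, slope = TIRES[tire]
    return laps * first_lap + slope * laps * (laps - 1) / 2


def race_time(strategy):
    """Total driving time plus one pit stop between consecutive stints."""
    driving = sum(stint_time(tire, laps) for tire, laps in strategy)
    return driving + PIT_STOP_TIME * (len(strategy) - 1)


total = race_time([("soft", 30), ("soft", 30), ("soft", 23)])
print(f"Total time: {total:.3f} seconds")
```

For the three-stint soft strategy, this agrees with the total time recorded in the Race 1 output.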

**Hyperparameter Choice, Parameter Variation, Track/Weather as .csv and optimization procedure:**

**Hyperparameter Choice:**

We chose these hyperparameters because they have a significant impact on the model’s performance and ability to generalize to unseen data. Controlling model complexity through `num_leaves`, `max_depth`, and `min_data_in_leaf` helps prevent overfitting and underfitting, ensuring the model is sufficiently expressive for the problem at hand. The `learning_rate` was selected to balance training speed and convergence, allowing the model to learn effectively without overshooting optimal solutions. Sampling parameters such as `feature_fraction`, `bagging_fraction`, and `bagging_freq` were included to promote diversity during training, which enhances robustness and reduces overfitting. Finally, `lambda_l1` and `lambda_l2` introduce regularization, which is crucial when working with high-dimensional data, as it helps prevent the model from becoming overly complex. Together, these parameters provide a comprehensive strategy to improve model performance, stability, and generalization capabilities.
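
One interaction worth keeping in mind is that a tree of depth d can have at most 2^d leaves, so `max_depth` caps the useful range of `num_leaves`. The helper below is a small sketch of that relationship (the function name is ours); it treats `max_depth <= 0` as "no limit", which is LightGBM's convention.

```python
def effective_num_leaves(num_leaves, max_depth):
    """Upper bound on leaves per tree given both settings."""
    if max_depth <= 0:  # LightGBM treats non-positive max_depth as unlimited
        return num_leaves
    return min(num_leaves, 2 ** max_depth)


# Tuned values from the notebook: the depth limit is not the binding constraint
print(effective_num_leaves(242, 9))
# Corner of the search space: depth 3 allows at most 8 leaves
print(effective_num_leaves(20, 3))
```

In the search space used here (`num_leaves` from 20, `max_depth` from 3), trials with a small `max_depth` cannot actually use the sampled `num_leaves`.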


**Parameter Variation:**
We encountered a situation where our sequential optimizer produced the same set of parameters every time it was run. While this consistency indicates that the approach yields low variability in the results— which is not necessarily a problem—it posed a challenge when trying to create practice data for further training of the LightGBM models. Fortunately, our all-in-one optimizer exhibited considerable variability in the parameter sets it generated. To leverage this advantage, we limited the parameter ranges within the all-in-one optimizer to focus around the parameters suggested by the sequential optimizer. This approach aimed to generate parameter combinations that the all-in-one optimizer considered optimal. These combinations were then used to create practice data for further model training. The size of the parameter ranges in the all-in-one optimizer was determined by the sensitivity of the individual car parameters, with higher sensitivity leading to narrower ranges.

This method was based on the assumption that our sequential optimizer identified values that were reasonably close to optimal, meaning the parameters it suggested were not entirely incorrect. This assumption allowed us to focus the optimization process and refine the models effectively.
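
Restricting the search to a window around a suggested value can be sketched as follows. The centers and half-widths are taken from the notes in the code further down; `search_range` is an illustrative helper, and the clipping to 1 to 500 assumes the same bounds as the unrestricted optimizer.

```python
def search_range(center, half_width, lower=1, upper=500):
    """Window of +/- half_width around center, clipped to the valid parameter range."""
    return max(lower, center - half_width), min(upper, center + half_width)


# (center from the sequential optimizer, half-width reflecting sensitivity)
windows = {
    "Engine": search_range(91, 30),
    "Rear Wing": search_range(489, 125),
    "Front Wing": search_range(274, 125),
}
for name, (low, high) in windows.items():
    print(f"{name}: {low} - {high}")
```

The resulting bounds can then be passed to `trial.suggest_int` in place of the full 1 to 500 range.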

**Track/Weather as .csv:**
Another improvement we made this week was to import the track and weather parameters from a .csv file instead of setting them directly in the code. This not only cleans up the code but also makes it easier to adapt for future races.
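
A minimal sketch of reading such a single-row file is shown below. It uses the standard `csv` module and an in-memory string so that it runs on its own; the column subset and values are illustrative (copied from the Race 1 settings), and the layout of the real file is an assumption.

```python
import csv
import io

# Illustrative content; the real file would be opened with open("...csv", newline="")
CSV_TEXT = "Cornering,Inclines,Camber,Grip\n6,20,44,1\n"


def load_track_weather(handle):
    """Read a one-row CSV of track/weather values into a {name: float} dict."""
    rows = list(csv.DictReader(handle))
    if len(rows) != 1:
        raise ValueError(f"expected exactly one data row, found {len(rows)}")
    return {name: float(value) for name, value in rows[0].items()}


conditions = load_track_weather(io.StringIO(CSV_TEXT))
print(conditions)
```

Checking for exactly one row guards against accidentally loading a file that holds several tracks.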


**Optimization Procedure:**
We performed several iterations of retraining our models, during which we varied the parameters using the all-in-one optimizer. The resulting data from each iteration was fed back into the models to further refine and improve their performance.

In [None]:
# Team members working on this code: Paula Kussauer, Cedric Schwandt, Hannes Kock

## AllInOne Optimization

In [None]:
# Team members working on this code: Names..

### Hyperparameter Tuning for Full LGBM

In [None]:
#https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html

import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"])*3600

# Example: Regression task
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    X = df.drop(columns=['Lap Distance', 'Lap Time', 'Avg Speed'])
    y = df["Avg Speed"]
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train)
        dvalid = lgb.Dataset(X_val, label=y_val)

        model = lgb.train(param, dtrain, num_boost_round=100,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        #rmse = mean_squared_error(y_val, preds, squared=False)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=500)

print("Best trial:")
print(study.best_trial.params)


In [None]:
#Hyperparameter Tuning for two datasets

import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load both datasets
sim_df = pd.read_csv("simulator_data.csv")
prac_df = pd.read_csv("practice_dataV2_6.csv")

# Calculate target: Avg Speed
sim_df["Avg Speed"] = (sim_df["Lap Distance"] / sim_df["Lap Time"]) * 3600
prac_df["Avg Speed"] = (prac_df["Lap Distance"] / prac_df["Lap Time"]) * 3600

# Assign weights
sim_df["weight"] = 1.0
prac_df["weight"] = 250.0  # Adjust this ratio to control importance (10000/40 = 250)

# Merge datasets
df = pd.concat([sim_df, prac_df], ignore_index=True)

# Prepare features and target
# "weight" is dropped as well so the sample weights are not used as a model feature
X = df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time", "weight", "Round", "Track", "Qualifying", "Stint", "Lap", "Fuel", "Tyre Remaining", "Tyre Choice"])
y = df["Avg Speed"]
weights = df["weight"]

# Objective function for Optuna
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True
    }

    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        w_train, w_val = weights.iloc[train_idx], weights.iloc[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train, weight=w_train)
        dvalid = lgb.Dataset(X_val, label=y_val, weight=w_val)

        model = lgb.train(param, dtrain, num_boost_round=1000,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)

print("Best trial:")
print(study.best_trial.params)
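
The weight of 250 in the cell above follows from the ratio noted in its comment (10000/40). As a small sketch, the per-row weight that gives a small dataset the same total weight as a large one can be computed like this; the helper name is ours, and the row counts are the ones from that comment.

```python
def balancing_weight(n_large, n_small):
    """Per-row weight for the small dataset so both datasets carry equal total weight."""
    if n_small <= 0:
        raise ValueError("the small dataset must contain at least one row")
    return n_large / n_small


practice_weight = balancing_weight(10000, 40)
print(practice_weight)
```

Values above this ratio make the practice data dominate, and values below it favour the simulator data.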


### R^2 Test LightGBM

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import r2_score
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time"])
y = df["Avg Speed"]


params = {
    'objective': 'regression',
    'metric': 'l2',  # duplicate 'objective'/'metric' keys removed; 'l2' was the value in effect
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 242,
    'max_depth': 9,
    'learning_rate': 0.0645474493988489,
    'min_data_in_leaf': 96,
    'feature_fraction': 0.9427185636727835,
    'bagging_fraction': 0.785711983453578,
    'bagging_freq': 2,
    'lambda_l1': 0.6006835353397709,
    'lambda_l2': 1.4905819957883624,
}


# K-Fold Cross Validation
kf = KFold(n_splits=5, shuffle=True, random_state=42)
r2_scores = []

for train_index, val_index in kf.split(X):
    X_train, X_val = X.iloc[train_index], X.iloc[val_index]
    y_train, y_val = y.iloc[train_index], y.iloc[val_index]

    # LightGBM Dataset
    lgb_train = lgb.Dataset(X_train, y_train)

    # Train model
    model = lgb.train(
        params,
        train_set=lgb_train,
        num_boost_round=300  )

    # Predict and evaluate
    y_pred = model.predict(X_val)
    r2 = r2_score(y_val, y_pred)
    r2_scores.append(r2)
    print(f"Fold R^2 Score: {r2:.4f}")

# Overall R^2
print(f"\nAverage R^2 Score: {np.mean(r2_scores):.4f}")


### Training LGBM-Model with Simulator_Data and optimized Hyperparameters

In [None]:
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time"])
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 242,
    'max_depth': 9,
    'learning_rate': 0.0645474493988489,
    'min_data_in_leaf': 96,
    'feature_fraction': 0.9427185636727835,
    'bagging_fraction': 0.785711983453578,
    'bagging_freq': 2,
    'lambda_l1': 0.6006835353397709,
    'lambda_l2': 1.4905819957883624,
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("best_lgbm_model.txt")

print("Model training complete and saved to 'best_lgbm_model.txt'")


In [None]:
# LGBM model trained directly on Simulator_Data and Practice_Data using optimized hyperparameters
# (hyperparameters optimized for the combined simulator and practice data)

import pandas as pd
import numpy as np
import lightgbm as lgb

# Load both datasets
sim_df = pd.read_csv("simulator_data.csv")
prac_df = pd.read_csv("practice_dataV2_6.csv")

# Compute Avg Speed
sim_df["Avg Speed"] = (sim_df["Lap Distance"] / sim_df["Lap Time"]) * 3600
prac_df["Avg Speed"] = (prac_df["Lap Distance"] / prac_df["Lap Time"]) * 3600

# Assign weights
sim_df["weight"] = 1.0
prac_df["weight"] = 250.0

# Make sure both have the same columns in the same order
base_cols = ["Lap Distance", "Lap Time", "Avg Speed", "weight", "Rear Wing", "Front Wing", "Engine", "Brake Balance", "Differential", "Suspension", "Cornering", "Inclines", "Camber", "Grip", "Altitude", "Roughness", "Width", "Temperature", "Humidity", "Wind (Avg. Speed)", "Wind (Gusts)", "Air Density", "Air Pressure"]
# Append any remaining simulator columns after the known ones
required_cols = base_cols + [col for col in sim_df.columns if col not in base_cols]

sim_df = sim_df[required_cols]
prac_df = prac_df[required_cols]

# Combine datasets
df = pd.concat([sim_df, prac_df], ignore_index=True)

# Features, target, and weights
X = df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time", "weight"])
y = df["Avg Speed"]
weights = df["weight"]

# Define LightGBM dataset with weights
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (already tuned)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 74,
    'max_depth': 11,
    'learning_rate': 0.028870630893942515,
    'min_data_in_leaf': 13,
    'feature_fraction': 0.7399238619191133,
    'bagging_fraction': 0.9203471068533383,
    'bagging_freq': 6,
    'lambda_l1': 0.025758523534949052,
    'lambda_l2': 0.9073066525703107}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("sim_prac_lgbm_model.txt")

print("Model training complete and saved to 'sim_prac_lgbm_model.txt'")


### Improving previously trained LGBM-Model using Practice_Data

In [None]:
import pandas as pd
import lightgbm as lgb

# Load the original model
model = lgb.Booster(model_file="best_lgbm_model.txt")

# Load the new data
new_df = pd.read_csv("practice_dataV2_5.csv")

# Compute Avg Speed for new data
new_df["Avg Speed"] = (new_df["Lap Distance"] / new_df["Lap Time"]) * 3600

# Prepare features and target
X_new = new_df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time","Round", "Track", "Qualifying", "Stint", "Lap", "Fuel", "Tyre Remaining", "Tyre Choice"])
y_new = new_df["Avg Speed"]

# Assign a higher weight to the practice samples (here 1000 per row)
sample_weight = [1000] * len(y_new)

# Create new LightGBM Dataset
new_data = lgb.Dataset(X_new, label=y_new, weight=sample_weight)

# Continue training from previous model
model = lgb.train(
    params={},  # Note: with an empty dict, the additional trees use LightGBM's default parameters, not the tuned ones
    train_set=new_data,
    init_model=model,
    num_boost_round=1000,  # You can increase if needed
)

# Save updated model
model.save_model("best_lgbm_model_updated.txt")

print("Model updated with new data and saved to 'best_lgbm_model_updated.txt'")


### Used to test prediction performance against practice_data

In [None]:
import pandas as pd
import lightgbm as lgb
import numpy as np

# Load the trained model
model = lgb.Booster(model_file="best_lgbm_model.txt")

# Load the test data
test_df = pd.read_csv("practice_data_snd_half.csv")

# Calculate actual average speed
test_df["Actual Avg Speed"] = (test_df["Lap Distance"] / test_df["Lap Time"]) * 3600

# Prepare the test features (same feature engineering as training)
X_test = test_df.drop(columns=["Actual Avg Speed", "Lap Distance", "Lap Time","Round", "Track", "Qualifying", "Stint", "Lap", "Fuel", "Tyre Remaining", "Tyre Choice"])

# Predict using the model
test_df["Predicted Avg Speed"] = model.predict(X_test)

# Calculate difference
test_df["Difference"] = test_df["Predicted Avg Speed"] - test_df["Actual Avg Speed"]

# Output the results
print(test_df[["Predicted Avg Speed", "Actual Avg Speed", "Difference"]])

# Summary statistics
mae = np.mean(np.abs(test_df["Difference"]))
rmse = np.sqrt(np.mean(test_df["Difference"] ** 2))
max_diff = np.max(np.abs(test_df["Difference"]))

print("\nSummary:")
print(f"Mean Absolute Error: {mae:.2f}")
print(f"Root Mean Squared Error: {rmse:.2f}")
print(f"Maximum Absolute Difference: {max_diff:.2f}")


### Parameter Optimization

In [None]:
import pandas as pd
import lightgbm as lgb
import optuna


# Load trained model
#model = lgb.Booster(model_file="best_lgbm_model_updated.txt")
model = lgb.Booster(model_file="sim_prac_lgbm_model.txt")

# Load track/weather data (single row)
track_weather = pd.read_csv("track_weather_germany.csv")

# Define the objective function
def objective(trial):
    # Suggest car parameters
    params = {
        "Engine": trial.suggest_int("Engine", 1, 500),
        "Rear Wing": trial.suggest_int("Rear Wing", 1, 500),
        "Front Wing": trial.suggest_int("Front Wing", 1, 500),
        "Brake Balance": trial.suggest_int("Brake Balance", 1, 500),
        "Suspension": trial.suggest_int("Suspension", 1, 500),
        "Differential": trial.suggest_int("Differential", 1, 500),
    }
#Engine: 91
#Differential: 101
#RearWing: 489
#FrontWing: 274
#Suspension: 45
#BrakeBalance: 101

    # Combine with static track/weather parameters
    input_data = pd.concat([track_weather, pd.DataFrame([params])], axis=1)

    # Predict average speed
    predicted_avg_speed = model.predict(input_data)[0]

    # We want to maximize speed
    return -predicted_avg_speed

# Create study
study = optuna.create_study(direction="minimize")

# Optimize
study.optimize(objective, n_trials=100)

# Output best results
best_trial = study.best_trial
print("\nBest Parameters:")
for key, value in best_trial.params.items():
    print(f"{key}: {value}")
print(f"Predicted Avg Speed: {-best_trial.value:.2f}")

In [None]:

Best Parameters:
Engine: 133
Rear Wing: 39
Front Wing: 67
Brake: 217
Suspension: 95
Differential: 88
Predicted Avg Speed: --187.46

Engine: 91
Differential: 101
RearWing: 489
FrontWing: 274
Suspension: 45
BrakeBalance: 101

In [None]:
#Set the intervals around the supposedly optimal parameters


In [None]:
import pandas as pd
import lightgbm as lgb
import optuna
import threading

# Load trained model
#model = lgb.Booster(model_file="best_lgbm_model_updated.txt")
model = lgb.Booster(model_file="best_lgbm_model.txt")

# Load track/weather data (single row)
track_weather = pd.read_csv("track_weather_germany.csv")

# Store all best results
all_best_results = []

# Optional: Start dashboard for one of the runs
def run_dashboard(study):
    run_server(study)

# Run 10 optimization loops
for run in range(1, 11):
    print(f"\n🔁 Optimization Run {run}/10")

    # Define objective function (wrapped inside loop to keep it clean)
    def objective(trial):
        params = {
            "Engine": trial.suggest_int("Engine", 85, 115),
            "Rear Wing": trial.suggest_int("Rear Wing", 499, 500),
            "Front Wing": trial.suggest_int("Front Wing", 70, 90),
            "Brake": trial.suggest_int("Brake", 85, 115),
            "Suspension": trial.suggest_int("Suspension", 20, 50),
            "Differential": trial.suggest_int("Differential", 68, 98),
        }

#Engine:         91  +-30    => 85 - 115
#Differential:  101  +-30    => 68 - 98
#RearWing:      489  +-125   => 499 - 500
#FrontWing:     274  +-125   => 70 - 90
#Suspension:     45  +-125  => 20 - 50
#BrakeBalance:  101  +-125   => 85 - 115

        input_data = pd.concat([track_weather, pd.DataFrame([params])], axis=1)
        predicted_avg_speed = model.predict(input_data)[0]
        return predicted_avg_speed  # maximize speed

    # Create a new study for each run
    study = optuna.create_study(direction="maximize")


    # Run optimization (adjust trials as needed)
    study.optimize(objective, n_trials=100)

    # Collect results
    best_trial = study.best_trial
    result = {
        "Run": run,
        "Predicted Avg Speed": best_trial.value,
        **best_trial.params
    }
    all_best_results.append(result)

# Convert to DataFrame
results_df = pd.DataFrame(all_best_results)
print("\n📊 Summary of All Runs:")
print(results_df)

# Optionally save to CSV
results_df.to_csv("optimization_results.csv", index=False)


In [None]:
#Engine:         91  +-30    => 61 - 121
#Differential:  101  +-30    => 51 - 151
#RearWing:      489  +-125   => 364 - 500
#FrontWing:     274  +-125   => 149 - 399
#Suspension:     45  +-125  => 5 - 170
#BrakeBalance:  101  +-125   => 0 - 226
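
The intervals in the comments above appear to follow a "center plus/minus margin, clipped to the allowed range" rule. The following is a minimal sketch of that rule, assuming setting bounds of 0 and 500 (inferred from the clipped Rear Wing and Brake Balance rows, not stated in the notebook); `search_interval` is an illustrative helper name. It reproduces the Engine, Rear Wing, Front Wing and Brake Balance rows; the Differential and Suspension rows deviate from it.

```python
def search_interval(center, margin, lower=0, upper=500):
    """Return (low, high) around `center`, clipped to the assumed bounds."""
    return max(lower, center - margin), min(upper, center + margin)


# Centers and margins taken from the comment cell above
for name, center, margin in [
    ("Engine", 91, 30),
    ("RearWing", 489, 125),
    ("FrontWing", 274, 125),
    ("BrakeBalance", 101, 125),
]:
    low, high = search_interval(center, margin)
    print(f"{name}: {low} - {high}")
```

The returned pair can be passed directly as the lower and upper bounds of `trial.suggest_int`.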

In [None]:
📊 Summary of All Runs: (without update)
   Run  Predicted Avg Speed  Engine  Rear Wing  Front Wing  Brake  Suspension  Differential
0    1           186.038712      89        411         170    100          91            86
1    2           185.970222     111        369         156    210          80            82
2    3           186.081557     111        478         163    101          82            83
3    4           185.960439     105        496         154    176          97            94
4    5           185.998318      93        445         198    121          99            82
5    6           186.093750     109        427         152    107          98            82
6    7           186.047609      89        463         198    116          92            83
7    8           186.013795     110        459         149     99          88           100
8    9           186.048086      94        437         155    101         106            91
9   10           186.066058      90        388         158    114          94            85

## Sequential Optimization LightGBM

Steps:

1. Which car parameters should use which track/weather parameters?

2. In what order should the car parameters be optimized?

3. Write code that optimizes each model's hyperparameters using Optuna and K-fold cross-validation.

4. Train and save each model on the full data set (no train/test split).

5. Optimize the models using Practice_Data.

6. Write code that sequentially optimizes the car parameters with the existing models.
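
The idea behind step 6 can be sketched roughly as follows. This is a simplified illustration, not the notebook's implementation: the `predictors` are hypothetical stand-in callables for the saved per-parameter models, the toy functions and ranges are made up, and an exhaustive integer scan is used in place of any particular search strategy.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a predictor receives the car parameters fixed so far
# (plus the candidate being tested) and returns a predicted average speed.
Predictor = Callable[[Dict[str, int]], float]


def sequential_optimize(
    order: List[str],
    predictors: Dict[str, Predictor],
    ranges: Dict[str, Tuple[int, int]],
) -> Dict[str, int]:
    """Optimize one car parameter at a time, keeping earlier choices fixed."""
    chosen: Dict[str, int] = {}
    for name in order:
        low, high = ranges[name]
        best_value, best_speed = low, float("-inf")
        for candidate in range(low, high + 1):
            speed = predictors[name]({**chosen, name: candidate})
            if speed > best_speed:
                best_value, best_speed = candidate, speed
        chosen[name] = best_value
    return chosen


# Toy stand-ins (assumption): simple concave functions instead of LightGBM models.
toy_predictors = {
    "Engine": lambda p: -(p["Engine"] - 90) ** 2,
    "Differential": lambda p: -(p["Differential"] - p["Engine"] - 5) ** 2,
}
toy_ranges = {"Engine": (0, 200), "Differential": (0, 200)}

best_setup = sequential_optimize(["Engine", "Differential"], toy_predictors, toy_ranges)
print(best_setup)
```

Because each parameter is fixed before the next one is searched, the cost grows with the sum of the range sizes rather than their product, at the price of possibly missing interactions between parameters that are optimized later.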







### First Step: Which Car-Parameters should use which track/weather-parameters?



Rear Wing
 - Air Density
 - Cornering
 - Air Pressure
 - Inclines
 - Wind (Avg. Speed)
 - Humidity
 - Roughness

Engine
 - Grip
 - Humidity
 - Air Density
 - Altitude
 - Temperature
 - Inclines
 - Air Pressure
 - Cornering

Front Wing
 - Cornering
 - Air Pressure
 - Air Density
 - Inclines
 - Wind (Avg. Speed)
 - Humidity
 - Wind (Gusts)

Brake Balance
 - Cornering
 - Width
 - Roughness
 - Temperature

Differential
 - Cornering
 - Width
 - Inclines
 - Grip
 - Temperature
 - Air Density

Suspension
 - Grip
 - Inclines
 - Cornering
 - Camber
 - Roughness
 - Width
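
The assignment above can be kept in one place and turned into the column list for each model. The sketch below assumes the optimization order from the second step and mirrors the column lists used in the hyperparameter cells further down (own car parameter first, then the previously optimized car parameters, then the assigned track/weather parameters); the names `TRACK_FEATURES`, `OPTIMIZATION_ORDER` and `model_features` are illustrative.

```python
# Track/weather parameters assigned to each car parameter (copied from the lists above)
TRACK_FEATURES = {
    "Rear Wing": ["Air Density", "Cornering", "Air Pressure", "Inclines",
                  "Wind (Avg. Speed)", "Humidity", "Roughness"],
    "Engine": ["Grip", "Humidity", "Air Density", "Altitude", "Temperature",
               "Inclines", "Air Pressure", "Cornering"],
    "Front Wing": ["Cornering", "Air Pressure", "Air Density", "Inclines",
                   "Wind (Avg. Speed)", "Humidity", "Wind (Gusts)"],
    "Brake Balance": ["Cornering", "Width", "Roughness", "Temperature"],
    "Differential": ["Cornering", "Width", "Inclines", "Grip", "Temperature",
                     "Air Density"],
    "Suspension": ["Grip", "Inclines", "Cornering", "Camber", "Roughness", "Width"],
}

# Order in which the car parameters are optimized (see the second step)
OPTIMIZATION_ORDER = ["Engine", "Differential", "Rear Wing", "Front Wing",
                      "Suspension", "Brake Balance"]


def model_features(param, order=OPTIMIZATION_ORDER):
    """Columns for the model of `param`: itself, earlier car parameters, track features."""
    earlier = order[:order.index(param)][::-1]  # most recently optimized first
    return [param] + earlier + TRACK_FEATURES[param]


print(model_features("Differential"))
```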

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load your data
df = pd.read_csv("simulator_data.csv")

# Define inputs (track/weather features) and targets (car parameters)
track_weather_features = [
    'Cornering', 'Inclines', 'Camber', 'Grip',
    'Wind (Avg. Speed)', 'Temperature', 'Humidity', 'Air Density',
    'Air Pressure', 'Wind (Gusts)', 'Altitude', 'Roughness', 'Width'
]

car_setup_params = ['Rear Wing', 'Engine', 'Front Wing', 'Brake Balance', 'Differential', 'Suspension']

# Store feature importances for each car parameter
feature_importance_dict = {}

# Loop through car parameters
for param in car_setup_params:
    X = df[track_weather_features]
    y = df[param]

    # Train/test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train model
    model = LGBMRegressor(random_state=42)
    model.fit(X_train, y_train)

    # Predict and calculate R² score
    y_pred = model.predict(X_test)
    score = r2_score(y_test, y_pred)
    print(f"\n{param} - R²: {score:.3f}")

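    # Get gain-based feature importances and normalize them to shares that sum to 1,
    # so that the 0.075 threshold below refers to a relative importance
    importances = model.booster_.feature_importance(importance_type='gain')
    importances = importances / importances.sum()
    importance_series = pd.Series(importances, index=X.columns).sort_values(ascending=False)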

    # Store importances
    feature_importance_dict[param] = importance_series

    # Display the features with importance > 0.075
    print(f"\nFor {param}, track/weather parameters with importance > 0.075:")
    for feature, importance in importance_series.items():
        if importance > 0.075:
            print(f" - {feature}: {importance:.3f}")

    # Optional: Plot feature importances
    importance_series.plot(kind='barh', title=f"Feature Importance for {param}", figsize=(8, 5))
    plt.gca().invert_yaxis()
    plt.tight_layout()
    plt.show()


### Second Step: In what order should the Car-Parameters be optimized?

1. Engine
2. Differential
3. Rear Wing
4. Front Wing
5. Suspension
6. Brake Balance
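
One way such an order could be derived from feature importances is to keep only the car parameters and sort them by descending importance. The notebook does not state this rule explicitly, so the sketch below is an assumption; the numbers are made-up placeholders chosen only so that the example reproduces the order listed above, not results from the analysis.

```python
CAR_PARAMS = ["Rear Wing", "Engine", "Front Wing", "Brake Balance",
              "Differential", "Suspension"]


def optimization_order(importance_by_feature, car_params=CAR_PARAMS):
    """Sort the car parameters by descending importance, ignoring all other features."""
    car_only = {name: importance_by_feature[name] for name in car_params}
    return sorted(car_only, key=car_only.get, reverse=True)


# Placeholder importances (NOT measured values)
placeholder_importance = {
    "Cornering": 9.0,  # track feature, ignored for the ordering
    "Engine": 6.0,
    "Differential": 5.0,
    "Rear Wing": 4.0,
    "Front Wing": 3.0,
    "Suspension": 2.0,
    "Brake Balance": 1.0,
}

order = optimization_order(placeholder_importance)
print(order)
```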

In [None]:
import pandas as pd
import lightgbm as lgb
import matplotlib.pyplot as plt

# Load the data
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Define X and y
X = df.drop(columns=['Lap Distance', 'Lap Time', 'Avg Speed'])
y = df["Avg Speed"]

# Set the hyperparameters
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 242,
    'max_depth': 9,
    'learning_rate': 0.0645474493988489,
    'min_data_in_leaf': 96,
    'feature_fraction': 0.9427185636727835,
    'bagging_fraction': 0.785711983453578,
    'bagging_freq': 2,
    'lambda_l1': 0.6006835353397709,
    'lambda_l2': 1.4905819957883624,
}

# Train the model
train_data = lgb.Dataset(X, label=y)
model = lgb.train(params,
                  train_data,
                  num_boost_round=100)

# Get feature importance by 'split' and 'gain'
importance_split = model.feature_importance(importance_type='split')
importance_gain = model.feature_importance(importance_type='gain')

# Sort feature importance by 'split' (ascending, so the largest bar is drawn at the top)
sorted_split_idx = importance_split.argsort()
sorted_split_importance = importance_split[sorted_split_idx]
sorted_split_features = X.columns[sorted_split_idx]

# Sort feature importance by 'gain' (ascending, so the largest bar is drawn at the top)
sorted_gain_idx = importance_gain.argsort()
sorted_gain_importance = importance_gain[sorted_gain_idx]
sorted_gain_features = X.columns[sorted_gain_idx]

# Plot feature importance by 'split'
plt.figure(figsize=(12, 6))
plt.barh(sorted_split_features, sorted_split_importance , color='red')
plt.title('Feature Importance by Split')
plt.xlabel('Number of Splits')
plt.ylabel('Features')
plt.show()

# Plot feature importance by 'gain'
plt.figure(figsize=(12, 6))
plt.barh(sorted_gain_features, sorted_gain_importance, color='blue')
plt.title('Feature Importance by Gain')
plt.xlabel('Gain')
plt.ylabel('Features')
plt.show()


### Third Step: Write code that optimizes each model's hyperparameters using Optuna and K-fold cross-validation.
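
The value that each Optuna trial returns in the cells below is the mean validation RMSE over the folds. The following dependency-free sketch shows how that score is assembled. It is only an illustration: the hand-rolled split is not scikit-learn's `KFold`, a mean predictor stands in for LightGBM, and all helper names and the toy speeds are hypothetical.

```python
import math
import random


def k_fold_indices(n_samples, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for i, val_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, val_idx


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mean_predictor(y, train_idx, val_idx):
    """Stand-in model: predicts the training mean for every validation sample."""
    mean = sum(y[i] for i in train_idx) / len(train_idx)
    return [mean] * len(val_idx)


def cross_validated_rmse(y, n_splits=5):
    """Average the per-fold RMSE, as the Optuna objective does."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), n_splits):
        preds = mean_predictor(y, train_idx, val_idx)
        scores.append(rmse([y[i] for i in val_idx], preds))
    return sum(scores) / len(scores)


toy_speeds = [180.0 + i for i in range(10)]  # made-up target values
print(cross_validated_rmse(toy_speeds))
```

Optuna then minimizes this number across trials, so a hyperparameter set is preferred when it generalizes well on average across all folds rather than on a single split.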

In [None]:
#Optimized Hyperparameters:
Engine:
{'num_leaves': 281, 'max_depth': 11, 'learning_rate': 0.04083240077095524, 'min_data_in_leaf': 96, 'feature_fraction': 0.9534759457760267, 'bagging_fraction': 0.6637656869307713, 'bagging_freq': 7, 'lambda_l1': 0.48486124811843856, 'lambda_l2': 1.0469435993282974}
Differential:
{'num_leaves': 126, 'max_depth': 8, 'learning_rate': 0.05725939263903659, 'min_data_in_leaf': 96, 'feature_fraction': 0.9048274481436259, 'bagging_fraction': 0.6871315558035545, 'bagging_freq': 1, 'lambda_l1': 2.9809700145880162, 'lambda_l2': 2.893530172381254}
Rear Wing:
{'num_leaves': 41, 'max_depth': 14, 'learning_rate': 0.07379289439034513, 'min_data_in_leaf': 14, 'feature_fraction': 0.9805583874353474, 'bagging_fraction': 0.757494844528942, 'bagging_freq': 1, 'lambda_l1': 3.3280185770912905, 'lambda_l2': 4.668868493827121}
Front Wing:
{'num_leaves': 228, 'max_depth': 6, 'learning_rate': 0.10991909073754202, 'min_data_in_leaf': 94, 'feature_fraction': 0.954670793846081, 'bagging_fraction': 0.7659843231590522, 'bagging_freq': 1, 'lambda_l1': 3.8616883301361553, 'lambda_l2': 4.896984209036385}
Suspension:
{'num_leaves': 31, 'max_depth': 10, 'learning_rate': 0.14051851191638579, 'min_data_in_leaf': 75, 'feature_fraction': 0.999888882811791, 'bagging_fraction': 0.7009625088481276, 'bagging_freq': 1, 'lambda_l1': 4.8851994495913145, 'lambda_l2': 3.6824821872767783}
Brake Balance:
{'num_leaves': 27, 'max_depth': 11, 'learning_rate': 0.18696777761444966, 'min_data_in_leaf': 73, 'feature_fraction': 0.964098272670725, 'bagging_fraction': 0.8852821315081366, 'bagging_freq': 10, 'lambda_l1': 3.486399193949678, 'lambda_l2': 3.7053586457221366}

#### HP-optimization code for each Car-Param.

In [None]:
#Hyperparameter Optimization Engine
#https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html

import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"])*3600

# Example: Regression task
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    X = df[['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure' , 'Cornering']]
    y = df["Avg Speed"]
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train)
        dvalid = lgb.Dataset(X_val, label=y_val)

        model = lgb.train(param, dtrain, num_boost_round=100,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        #rmse = mean_squared_error(y_val, preds, squared=False)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=500)

print("Best trial:")
print(study.best_trial.params)


In [None]:
#Hyperparameter Optimization Differential
#https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html

import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"])*3600

# Example: Regression task
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    X = df[['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density']]
    y = df["Avg Speed"]
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train)
        dvalid = lgb.Dataset(X_val, label=y_val)

        model = lgb.train(param, dtrain, num_boost_round=100,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        #rmse = mean_squared_error(y_val, preds, squared=False)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=500)

print("Best trial:")
print(study.best_trial.params)


In [None]:
#Hyperparameter Optimization Rear Wing
#https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html

import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"])*3600

# Example: Regression task
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    X = df[['Rear Wing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure' , 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness']]
    y = df["Avg Speed"]
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train)
        dvalid = lgb.Dataset(X_val, label=y_val)

        model = lgb.train(param, dtrain, num_boost_round=100,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        #rmse = mean_squared_error(y_val, preds, squared=False)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=500)

print("Best trial:")
print(study.best_trial.params)


In [None]:
#Hyperparameter Optimization Front Wing
#https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html

import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"])*3600

# Example: Regression task
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    X = df[['Front Wing', 'Rear Wing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)']]
    y = df["Avg Speed"]
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train)
        dvalid = lgb.Dataset(X_val, label=y_val)

        model = lgb.train(param, dtrain, num_boost_round=100,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        #rmse = mean_squared_error(y_val, preds, squared=False)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=500)

print("Best trial:")
print(study.best_trial.params)


In [None]:
#Hyperparameter Optimization Suspension
#https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html

import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"])*3600

# Example: Regression task
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    X = df[['Suspension', 'Front Wing', 'Rear Wing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width']]
    y = df["Avg Speed"]
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train)
        dvalid = lgb.Dataset(X_val, label=y_val)

        model = lgb.train(param, dtrain, num_boost_round=100,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        #rmse = mean_squared_error(y_val, preds, squared=False)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=500)

print("Best trial:")
print(study.best_trial.params)


In [None]:
#Hyperparameter Optimization Brake Balance
#https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html

import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"])*3600

# Example: Regression task
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    X = df[['Brake Balance', 'Suspension', 'Front Wing', 'Rear Wing', 'Differential', 'Engine','Cornering', 'Width', 'Roughness', 'Temperature']]
    y = df["Avg Speed"]
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train)
        dvalid = lgb.Dataset(X_val, label=y_val)

        model = lgb.train(param, dtrain, num_boost_round=100,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        #rmse = mean_squared_error(y_val, preds, squared=False)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=500)

print("Best trial:")
print(study.best_trial.params)


### Fourth Step: Train and Save each Model with full data set (no Train/Test-Split)

In [None]:
Brake Balance: 	X = df[['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine','Cornering', 'Width', 'Roughness', 'Temperature']]
Suspension: 	X = df[['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width']]
Front Wing: 	X = df[['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)']]
Rear Wing: 	X = df[['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure' , 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness']]
Differential: 	X = df[['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density']]
Engine: 	X = df[['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure' , 'Cornering']]

In [None]:
#Optimized Hyperparameters:
Engine:
{'num_leaves': 281, 'max_depth': 11, 'learning_rate': 0.04083240077095524, 'min_data_in_leaf': 96, 'feature_fraction': 0.9534759457760267, 'bagging_fraction': 0.6637656869307713, 'bagging_freq': 7, 'lambda_l1': 0.48486124811843856, 'lambda_l2': 1.0469435993282974}
Differential:
{'num_leaves': 126, 'max_depth': 8, 'learning_rate': 0.05725939263903659, 'min_data_in_leaf': 96, 'feature_fraction': 0.9048274481436259, 'bagging_fraction': 0.6871315558035545, 'bagging_freq': 1, 'lambda_l1': 2.9809700145880162, 'lambda_l2': 2.893530172381254}
Rear Wing:
{'num_leaves': 41, 'max_depth': 14, 'learning_rate': 0.07379289439034513, 'min_data_in_leaf': 14, 'feature_fraction': 0.9805583874353474, 'bagging_fraction': 0.757494844528942, 'bagging_freq': 1, 'lambda_l1': 3.3280185770912905, 'lambda_l2': 4.668868493827121}
Front Wing:
{'num_leaves': 228, 'max_depth': 6, 'learning_rate': 0.10991909073754202, 'min_data_in_leaf': 94, 'feature_fraction': 0.954670793846081, 'bagging_fraction': 0.7659843231590522, 'bagging_freq': 1, 'lambda_l1': 3.8616883301361553, 'lambda_l2': 4.896984209036385}
Suspension:
{'num_leaves': 31, 'max_depth': 10, 'learning_rate': 0.14051851191638579, 'min_data_in_leaf': 75, 'feature_fraction': 0.999888882811791, 'bagging_fraction': 0.7009625088481276, 'bagging_freq': 1, 'lambda_l1': 4.8851994495913145, 'lambda_l2': 3.6824821872767783}
Brake Balance:
{'num_leaves': 27, 'max_depth': 11, 'learning_rate': 0.18696777761444966, 'min_data_in_leaf': 73, 'feature_fraction': 0.964098272670725, 'bagging_fraction': 0.8852821315081366, 'bagging_freq': 10, 'lambda_l1': 3.486399193949678, 'lambda_l2': 3.7053586457221366}
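
The training cells below all share the same fixed settings and differ only in the tuned values. One possible way to avoid retyping them is to merge a shared base dictionary with the tuned values per car parameter. This is a sketch with illustrative names (`BASE_PARAMS`, `TUNED_PARAMS`, `training_params`); only the Engine and Differential entries are shown, copied from the list above.

```python
# Settings shared by all six training cells
BASE_PARAMS = {
    "objective": "regression",
    "metric": "l2",
    "verbosity": -1,
    "boosting_type": "gbdt",
}

# Tuned values per car parameter (copied from the list above).
# The remaining four entries can be added in the same way.
TUNED_PARAMS = {
    "Engine": {
        "num_leaves": 281, "max_depth": 11,
        "learning_rate": 0.04083240077095524, "min_data_in_leaf": 96,
        "feature_fraction": 0.9534759457760267,
        "bagging_fraction": 0.6637656869307713, "bagging_freq": 7,
        "lambda_l1": 0.48486124811843856, "lambda_l2": 1.0469435993282974,
    },
    "Differential": {
        "num_leaves": 126, "max_depth": 8,
        "learning_rate": 0.05725939263903659, "min_data_in_leaf": 96,
        "feature_fraction": 0.9048274481436259,
        "bagging_fraction": 0.6871315558035545, "bagging_freq": 1,
        "lambda_l1": 2.9809700145880162, "lambda_l2": 2.893530172381254,
    },
}


def training_params(car_param):
    """Return the full LightGBM parameter dict for one car parameter's model."""
    return {**BASE_PARAMS, **TUNED_PARAMS[car_param]}


print(training_params("Engine"))
```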

#### Code to build Model for each Car-Param.

**"_newnames" explained:**
Because of how the following code works, the car-parameter column names in the data must follow a fixed scheme: they must not contain blank spaces between words, and the spaces must not be replaced with underscores (_) either.
Each trained model must also have exactly the same name as its corresponding car parameter.

Rear Wing: RearWing

Front Wing: FrontWing

Brake Balance: BrakeBalance

This renaming must be applied to both the Simulator data and Practice_Data.
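
The renaming could be done once with a small script instead of by hand. The sketch below uses only the standard library; the helper names are illustrative, and mapping both `Brake` and `Brake Balance` to `BrakeBalance` is an assumption that covers the two spellings appearing in this notebook.

```python
import csv

# Old column name -> new column name (see the list above)
RENAMES = {
    "Rear Wing": "RearWing",
    "Front Wing": "FrontWing",
    "Brake Balance": "BrakeBalance",
    "Brake": "BrakeBalance",  # assumption: alternative spelling of the same column
}


def rename_header(columns):
    """Rename the car-parameter columns and leave all other names untouched."""
    return [RENAMES.get(name, name) for name in columns]


def write_renamed_copy(source_path, target_path):
    """Copy a CSV file, replacing only its header row."""
    with open(source_path, newline="") as src, open(target_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(rename_header(next(reader)))
        writer.writerows(reader)


print(rename_header(["Engine", "Rear Wing", "Air Density", "Brake Balance"]))

# Example call (file names as used in this notebook):
# write_renamed_copy("simulator_data.csv", "simulator_data_newnames.csv")
```

Track/weather columns such as `Air Density` keep their spaces, which matches the column lists used in the training cells.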

In [None]:
#Engine (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure' , 'Cornering']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 281,
    'max_depth': 11,
    'learning_rate': 0.04083240077095524,
    'min_data_in_leaf': 96,
    'feature_fraction': 0.9534759457760267,
    'bagging_fraction': 0.6637656869307713,
    'bagging_freq': 7,
    'lambda_l1': 0.48486124811843856,
    'lambda_l2': 1.0469435993282974
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("Engine_lgbm_model.txt")

print("Model training complete and saved to 'Engine_lgbm_model.txt'")


In [None]:
#Differential (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 126,
    'max_depth': 8,
    'learning_rate': 0.05725939263903659,
    'min_data_in_leaf': 96,
    'feature_fraction': 0.9048274481436259,
    'bagging_fraction': 0.6871315558035545,
    'bagging_freq': 1,
    'lambda_l1': 2.9809700145880162,
    'lambda_l2': 2.893530172381254
    }


# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("Differential_lgbm_model.txt")

print("Model training complete and saved to 'Differential_lgbm_model.txt'")


In [None]:
#Rear Wing (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure' , 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 41,
    'max_depth': 14,
    'learning_rate': 0.07379289439034513,
    'min_data_in_leaf': 14,
    'feature_fraction': 0.9805583874353474,
    'bagging_fraction': 0.757494844528942,
    'bagging_freq': 1,
    'lambda_l1': 3.3280185770912905,
    'lambda_l2': 4.668868493827121
    }

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("RearWing_lgbm_model.txt")

print("Model training complete and saved to 'RearWing_lgbm_model.txt'")


In [None]:
#Front Wing (hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 228,
    'max_depth': 6,
    'learning_rate': 0.10991909073754202,
    'min_data_in_leaf': 94,
    'feature_fraction': 0.954670793846081,
    'bagging_fraction': 0.7659843231590522,
    'bagging_freq': 1,
    'lambda_l1': 3.8616883301361553,
    'lambda_l2': 4.896984209036385
    }


# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("FrontWing_lgbm_model.txt")

print("Model training complete and saved to 'FrontWing_lgbm_model.txt'")


In [None]:
#Suspension
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 31,
    'max_depth': 10,
    'learning_rate': 0.14051851191638579,
    'min_data_in_leaf': 75,
    'feature_fraction': 0.999888882811791,
    'bagging_fraction': 0.7009625088481276,
    'bagging_freq': 1,
    'lambda_l1': 4.8851994495913145,
    'lambda_l2': 3.6824821872767783
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("Suspension_lgbm_model.txt")

print("Model training complete and saved to 'Suspension_lgbm_model.txt'")


In [None]:
#Brake Balance (hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine','Cornering', 'Width', 'Roughness', 'Temperature']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 27,
    'max_depth': 11,
    'learning_rate': 0.18696777761444966,
    'min_data_in_leaf': 73,
    'feature_fraction': 0.964098272670725,
    'bagging_fraction': 0.8852821315081366,
    'bagging_freq': 10,
    'lambda_l1': 3.486399193949678,
    'lambda_l2': 3.7053586457221366
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("BrakeBalance_lgbm_model.txt")

print("Model training complete and saved to 'BrakeBalance_lgbm_model.txt'")


### Fifth Step: Optimize Models using Practice_Data

In [None]:
import pandas as pd
import lightgbm as lgb

# Model names
model_names = [
    "Engine_lgbm_model.txt",
    "Differential_lgbm_model.txt",
    "RearWing_lgbm_model.txt",
    "FrontWing_lgbm_model.txt",
    "Suspension_lgbm_model.txt",
    "BrakeBalance_lgbm_model.txt"
]

# Corresponding feature sets
feature_sets = {
    "Engine": ['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure', 'Cornering'],
    "Differential": ['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density'],
    "RearWing": ['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness'],
    "FrontWing": ['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)'],
    "Suspension": ['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width'],
    "BrakeBalance": ['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Width', 'Roughness', 'Temperature']
}

# Load new data
df = pd.read_csv("practice_data_newnamesV2_5.csv")

# Compute Avg Speed for new data
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Target variable
y_new = df["Avg Speed"]

# Sample weight
sample_weight = [1000] * len(y_new)

# Training each model with its feature set
for model_file in model_names:
    model_key = model_file.split("_")[0]  # Extract the prefix e.g., "Engine"
    features = feature_sets[model_key]

    print(f"\nUpdating {model_file} with features: {features}")

    # Prepare feature matrix
    X_new = df[features]

    # Load existing model
    model = lgb.Booster(model_file=model_file)

    # Create dataset
    new_data = lgb.Dataset(X_new, label=y_new, weight=sample_weight)

    # Continue training
    updated_model = lgb.train(
        params={},
        train_set=new_data,
        init_model=model,
        num_boost_round=1000
    )

    # Save updated model
    updated_file = model_file.replace(".txt", "_updated.txt")
    updated_model.save_model(updated_file)

    print(f"Model saved to {updated_file}")

print("\nAll models updated successfully.")
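
The files written above are loaded by name in the next step, which expects the pattern `{car_part}_lgbm_model_updated.txt`. A small, dependency-free check like the sketch below can confirm that every model file maps to a feature-set key and that the renamed files line up. The helper name `updated_name` and the set `feature_set_keys` are introduced here for illustration only.

```python
# Sketch: verify the naming convention that links step five to step six.
model_names = [
    "Engine_lgbm_model.txt",
    "Differential_lgbm_model.txt",
    "RearWing_lgbm_model.txt",
    "FrontWing_lgbm_model.txt",
    "Suspension_lgbm_model.txt",
    "BrakeBalance_lgbm_model.txt",
]

# Keys of the feature_sets dictionary used in the notebook.
feature_set_keys = {
    "Engine", "Differential", "RearWing",
    "FrontWing", "Suspension", "BrakeBalance",
}


def updated_name(model_file: str) -> str:
    """Mirror the renaming applied when the retrained model is saved."""
    return model_file.replace(".txt", "_updated.txt")


for model_file in model_names:
    key = model_file.split("_")[0]  # e.g. "Engine"
    assert key in feature_set_keys, f"no feature set for {key}"
    assert updated_name(model_file) == f"{key}_lgbm_model_updated.txt"
    print(f"{key}: {updated_name(model_file)}")
```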


### Sixth Step: Write Code that sequentially optimizes Car-Parameters with existing Models

In [None]:
import pandas as pd
import lightgbm as lgb
import optuna

# Ordered list of parameters to optimize
params_to_optimize = ["Engine", "Differential", "RearWing", "FrontWing", "Suspension", "BrakeBalance"]

# Corresponding feature sets for each model
feature_sets = {
    "Engine": ['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure', 'Cornering'],
    "Differential": ['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density'],
    "RearWing": ['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness'],
    "FrontWing": ['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)'],
    "Suspension": ['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width'],
    "BrakeBalance": ['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Width', 'Roughness', 'Temperature']
}

# Load static track & weather data
base_data = pd.read_csv("track_weather_germany.csv")

# This will hold the best parameter values as we optimize them
optimized_params = {}

# Begin sequential optimization
for car_part in params_to_optimize:
    print(f"\n🔧 Optimizing {car_part}...")

    model_path = f"{car_part}_lgbm_model_updated.txt"
    model = lgb.Booster(model_file=model_path)
    features = feature_sets[car_part]

    def objective(trial):
        # Sample a value for every car parameter. Only the entry for `car_part`
        # is used in this study; the other suggestions are ignored.
        params = {
            "Engine": trial.suggest_int("Engine", 85, 115),
            "RearWing": trial.suggest_int("RearWing", 499, 500),
            "FrontWing": trial.suggest_int("FrontWing", 70, 90),
            "BrakeBalance": trial.suggest_int("BrakeBalance", 85, 115),
            "Suspension": trial.suggest_int("Suspension", 20, 50),
            "Differential": trial.suggest_int("Differential", 68, 98),
        }

        current_value = params[car_part]

        # Create a single row input with all needed features
        input_data = base_data.copy()

        # Add previously optimized params
        for param, value in optimized_params.items():
            input_data[param] = value

        # Add current trial value
        input_data[car_part] = current_value

        # If any missing feature, fill with 0 or a safe default
        for col in features:
            if col not in input_data.columns:
                input_data[col] = 0

        # Ensure correct order of features
        X = input_data[features]

        # Predict Avg Speed
        avg_speed = model.predict(X)[0]
        return avg_speed  # Optuna maximizes this

    # Run Optuna
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=300, show_progress_bar=True)

    best_value = study.best_params[car_part]
    optimized_params[car_part] = best_value
    print(f"✅ Best {car_part}: {best_value}")

# Final results
print("\n🎯 Final Optimized Parameters:")
for k, v in optimized_params.items():
    print(f"{k}: {v}")
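
The loop above fixes each car parameter at its best value before moving on to the next one. The idea can be illustrated without LightGBM or Optuna. In the sketch below, `toy_speed` is a made-up stand-in for a trained model, the search ranges are invented, and an exhaustive scan replaces Optuna's sampler.

```python
# Toy illustration of sequential (one-parameter-at-a-time) optimization.
# toy_speed and SEARCH_RANGES are hypothetical stand-ins, not project values.

SEARCH_RANGES = {
    "Engine": range(0, 11),
    "Differential": range(0, 11),
    "RearWing": range(0, 11),
}


def toy_speed(setup: dict) -> float:
    """Pretend model with its peak at Engine=7, Differential=3, RearWing=5."""
    return (
        200.0
        - (setup.get("Engine", 0) - 7) ** 2
        - (setup.get("Differential", 0) - 3) ** 2
        - (setup.get("RearWing", 0) - 5) ** 2
    )


def optimize_sequentially(order, ranges, predict):
    """Optimize one parameter at a time, keeping earlier results fixed."""
    optimized = {}
    for part in order:
        # Scan all candidate values for the current part only.
        best_value = max(
            ranges[part],
            key=lambda value: predict({**optimized, part: value}),
        )
        optimized[part] = best_value
    return optimized


best_setup = optimize_sequentially(
    ["Engine", "Differential", "RearWing"], SEARCH_RANGES, toy_speed
)
print(best_setup)
```

Because this toy objective is separable, the sequential result coincides with the joint optimum. When parameters interact, the outcome can depend on the order in which they are optimized.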


In [None]:
477	166	89	97	83	35

🎯 Final Optimized Parameters: (model without update)
Engine: 91
Differential: 101
RearWing: 489
FrontWing: 274
Suspension: 45
BrakeBalance: 101

🎯 Final Optimized Parameters: (model with update)
Engine: 32
Differential: 1
RearWing: 43
FrontWing: 68
Suspension: 29
BrakeBalance: 268

## Race Strategy

In [None]:
import math

# Define the tire parameters and their lap time formulas
def lap_time_super_soft(X):
    return (67.1830587725926) + 0.18352607407407 * X

def lap_time_soft(X):
    return (67.5640573104347) + 0.140194202898552 * X

def lap_time_medium(X):
    return (68.2222646857471) + 0.142527494252874 * X

def lap_time_hard(X):
    return (68.9335449948148) + 0.108381851851852 * X

# Tire data with their lifespan
tire_lifespan = {
    "super_soft": 9,
    "soft": 23,
    "medium": 29,
    "hard": 36
}

# Pit stop penalty
pit_stop_time = 30  # seconds

# Function to calculate the total race time for a given strategy
def calculate_race_time(laps, strategy):
    total_time = 0
    total_pit_stops = 0
    lap_index = 0
    lap_counter = 0

    while lap_counter < laps:
        tire, stint_laps = strategy[lap_index]

        # Ensure we don't exceed the total laps
        if lap_counter + stint_laps > laps:
            stint_laps = laps - lap_counter

        # Calculate the lap times for this stint
        lap_times = []
        for i in range(stint_laps):
            if tire == "super_soft":
                lap_times.append(lap_time_super_soft(i + 1))
            elif tire == "soft":
                lap_times.append(lap_time_soft(i + 1))
            elif tire == "medium":
                lap_times.append(lap_time_medium(i + 1))
            elif tire == "hard":
                lap_times.append(lap_time_hard(i + 1))

        total_time += sum(lap_times)  # Add the lap times of this stint
        lap_counter += stint_laps

        # If we are not at the last stint, account for a pit stop
        if lap_counter < laps:
            total_time += pit_stop_time  # Pit stop penalty
            total_pit_stops += 1

        lap_index += 1
        if lap_index >= len(strategy):
            break

    return total_time, total_pit_stops

from itertools import product

# Function to generate possible strategies dynamically
def generate_strategies(laps, max_stints=5):
    strategies = []
    tire_choices = ["super_soft", "soft", "medium", "hard"]

    # Enumerate every tire sequence of length max_stints (use 2-4 for fewer stints)
    for tires in product(tire_choices, repeat=max_stints):
        strategy = []
        remaining_laps = laps

        # Run each tire for its full lifespan until the race distance is covered
        for tire in tires:
            stint_laps = tire_lifespan[tire]

            if remaining_laps > stint_laps:
                strategy.append((tire, stint_laps))
                remaining_laps -= stint_laps
            else:
                strategy.append((tire, remaining_laps))
                break

        # Keep only sequences that cover the full race distance
        if sum(stint[1] for stint in strategy) == laps:
            strategies.append(strategy)

    return strategies

# Function to find the best strategy
def optimize_strategy(laps):
    best_time = math.inf
    best_strategy = None
    best_pit_stops = 0  # defined up front so the return also works without candidates

    strategies = generate_strategies(laps)

    for strategy in strategies:
        total_time, pit_stops = calculate_race_time(laps, strategy)
        if total_time < best_time:
            best_time = total_time
            best_strategy = strategy
            best_pit_stops = pit_stops

    return best_strategy, best_time, best_pit_stops


# Main function
if __name__ == "__main__":
    race_laps = 85
    best_strategy, best_time, total_pit_stops = optimize_strategy(race_laps)
    print(f"Best Strategy: {best_strategy}")
    print(f"Best Total Time: {best_time} seconds")
    print(f"Total Pit Stops: {total_pit_stops}")

In [None]:
Best Strategy: [('soft', 23), ('soft', 23), ('soft', 23), ('soft', 14)]
Best Total Time: 5828.617948070432 seconds
Total Pit Stops: 3
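
Because every tire formula is linear in the lap number within a stint, the time for a stint of `n` laps has the closed form `n * a + b * n * (n + 1) / 2`, where `a` is the intercept and `b` the slope. The sketch below uses it to re-compute the reported result from the soft-tire coefficients and the 30-second pit stop defined above. Note that the four reported stints add up to 83 laps rather than 85, so the printed output presumably stems from a run with a different `race_laps` value. The helper name `stint_time` is introduced here for illustration.

```python
# Closed-form stint time for a linear lap-time model: lap_time(X) = a + b * X.
SOFT_A = 67.5640573104347      # intercept of lap_time_soft
SOFT_B = 0.140194202898552     # slope of lap_time_soft
PIT_STOP_TIME = 30             # seconds, as defined in the strategy cell


def stint_time(a: float, b: float, n_laps: int) -> float:
    """Sum of a + b * X for X = 1..n_laps."""
    return n_laps * a + b * n_laps * (n_laps + 1) / 2


stints = [23, 23, 23, 14]  # the reported all-soft strategy
driving = sum(stint_time(SOFT_A, SOFT_B, n) for n in stints)
total = driving + PIT_STOP_TIME * (len(stints) - 1)

print(f"Laps covered: {sum(stints)}")
print(f"Total time:   {total:.3f} seconds")
```

The result agrees with the reported best total time up to floating-point rounding.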

# **Analytics for Race 3**

For Race 3, we aimed to build upon our results from Race 2.  
In the analytics for Race 2, we retrained the LightGBM models using data from our practice laps.  

We identified a potential issue: retraining these models on a different type of data could change the optimal hyperparameters. Therefore, we decided to re-optimize the hyperparameters of the models with the practice data included.


We applied this approach to both our Sequential Optimization Models and the All-In-One Model.


To do this, we combined the datasets using `pd.concat` and assigned a weight to each dataset.  
The simulation data was given a weight of 1.  
To ensure the practice data was sufficiently influential in model training, we calculated its weight by dividing the number of rows in the simulation data by the number of rows in the practice data (for example, 10000 / 40 = 250).
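
The weighting rule can be written out explicitly. The sketch below uses the example sizes quoted above (10000 simulator rows, 40 practice rows) and shows that, under this rule, both datasets carry the same total weight. The function names are illustrative.

```python
# Weight rule from the text: practice weight = simulator rows / practice rows.
def practice_weight(n_sim: int, n_prac: int) -> float:
    return n_sim / n_prac


def practice_share(n_sim: int, n_prac: int, weight: float) -> float:
    """Fraction of the total sample weight carried by the practice rows
    (simulator rows have weight 1)."""
    prac_total = weight * n_prac
    return prac_total / (prac_total + n_sim)


n_sim, n_prac = 10000, 40  # example sizes quoted in the text
w = practice_weight(n_sim, n_prac)

print(w)                                 # 250.0
print(practice_share(n_sim, n_prac, w))  # 0.5
```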

Aside from this, we followed the same process as in Race 2 to optimize the car parameters and race strategy.




**Foreshadowing:**
In the feedback session after Race 3, we learned that other teams used much lower dataset weights and achieved better results. Upon further research, we found that assigning very high weights to a small dataset, such as the practice data, may not be the most effective strategy. While increasing the weight of the practice data raises its influence during model training, overly high weights can lead to overfitting: the model becomes too tailored to those few observations and generalizes less well.

As a result, we decided to prioritize a more balanced approach to dataset weighting for Race 4, aiming to improve the robustness and generalization of our models.
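
One rough way to see why very high weights can hurt is Kish's effective sample size, `(sum of w)^2 / (sum of w^2)`, which estimates how many equally weighted observations a weighted sample is worth. This diagnostic was not part of our analysis. The sketch below applies it to the example sizes mentioned earlier (10000 simulator rows at weight 1, 40 practice rows at varying weights).

```python
# Kish's effective sample size for a two-group weighting scheme.
def effective_sample_size(n_sim: int, n_prac: int, prac_weight: float) -> float:
    sum_w = n_sim * 1.0 + n_prac * prac_weight          # simulator weight is 1
    sum_w_sq = n_sim * 1.0 + n_prac * prac_weight ** 2
    return sum_w ** 2 / sum_w_sq


for weight in (1, 10, 50, 250):
    n_eff = effective_sample_size(10000, 40, weight)
    print(f"weight={weight:>3}: effective sample size ~ {n_eff:.0f}")
```

With a weight of 250, the combined data behaves roughly like 159 equally weighted rows, which is consistent with the overfitting concern described above.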


In [None]:
# Team members working on this code: Paula Kussauer, Cedric Schwandt, Hannes Kock

## Sequential Optimization

### Hyperparameter Optimization (SimPrac Combo)

In [None]:
#Engine SimPrac Hyperparameter Optimization
import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load both datasets
sim_df = pd.read_csv("simulator_data.csv")
prac_df = pd.read_csv("practice_dataV3_3.csv")

# Calculate target: Avg Speed
sim_df["Avg Speed"] = (sim_df["Lap Distance"] / sim_df["Lap Time"]) * 3600
prac_df["Avg Speed"] = (prac_df["Lap Distance"] / prac_df["Lap Time"]) * 3600

# Assign weights
sim_df["weight"] = 1.0
prac_df["weight"] = 250.0  # Adjust this ratio to control importance (10000/40 = 250)

# Merge datasets
df = pd.concat([sim_df, prac_df], ignore_index=True)

# Optuna objective: mean RMSE over 5-fold cross-validation for one hyperparameter set
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    X = df[['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure' , 'Cornering']]
    y = df["Avg Speed"]
    weights = df["weight"]

    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        w_train, w_val = weights.iloc[train_idx], weights.iloc[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train, weight=w_train)
        dvalid = lgb.Dataset(X_val, label=y_val, weight=w_val)

        model = lgb.train(param, dtrain, num_boost_round=1000,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=250)

print("Best trial:")
print(study.best_trial.params)
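
In the objective above, the validation `Dataset` carries the sample weights, while the RMSE returned to Optuna is computed without them, so the comparatively few practice rows have little influence on the score. If the score should reflect the same weighting as training, a weighted RMSE is one option. The sketch below is a pure-Python illustration with made-up numbers, not the metric we used.

```python
import math


def rmse(y_true, y_pred, weights=None):
    """Root mean squared error, optionally weighted per observation."""
    if weights is None:
        weights = [1.0] * len(y_true)
    weighted_sq_err = sum(
        w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights)
    )
    return math.sqrt(weighted_sq_err / sum(weights))


# Made-up example: three simulator rows (weight 1) and one practice row (weight 250).
y_true = [200.0, 210.0, 190.0, 205.0]
y_pred = [200.0, 210.0, 190.0, 195.0]  # only the practice row is off, by 10
weights = [1.0, 1.0, 1.0, 250.0]

print(rmse(y_true, y_pred))           # unweighted: 5.0
print(rmse(y_true, y_pred, weights))  # weighted: close to 10
```

With NumPy arrays, the equivalent expression inside the fold loop would be `np.sqrt(np.average((y_val - preds) ** 2, weights=w_val))`.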


In [None]:
#Differential SimPrac Hyperparameter Optimization
import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load both datasets
sim_df = pd.read_csv("simulator_data.csv")
prac_df = pd.read_csv("practice_dataV3_3.csv")

# Calculate target: Avg Speed
sim_df["Avg Speed"] = (sim_df["Lap Distance"] / sim_df["Lap Time"]) * 3600
prac_df["Avg Speed"] = (prac_df["Lap Distance"] / prac_df["Lap Time"]) * 3600

# Assign weights
sim_df["weight"] = 1.0
prac_df["weight"] = 250.0  # Adjust this ratio to control importance (10000/40 = 250)

# Merge datasets
df = pd.concat([sim_df, prac_df], ignore_index=True)

# Optuna objective: mean RMSE over 5-fold cross-validation for one hyperparameter set
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    X = df[['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density']]
    y = df["Avg Speed"]
    weights = df["weight"]

    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        w_train, w_val = weights.iloc[train_idx], weights.iloc[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train, weight=w_train)
        dvalid = lgb.Dataset(X_val, label=y_val, weight=w_val)

        model = lgb.train(param, dtrain, num_boost_round=1000,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=250)

print("Best trial:")
print(study.best_trial.params)


In [None]:
#Rear Wing SimPrac Hyperparameter Optimization
import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load both datasets
sim_df = pd.read_csv("simulator_data.csv")
prac_df = pd.read_csv("practice_dataV3_3.csv")

# Calculate target: Avg Speed
sim_df["Avg Speed"] = (sim_df["Lap Distance"] / sim_df["Lap Time"]) * 3600
prac_df["Avg Speed"] = (prac_df["Lap Distance"] / prac_df["Lap Time"]) * 3600

# Assign weights
sim_df["weight"] = 1.0
prac_df["weight"] = 250.0  # Adjust this ratio to control importance (10000/40 = 250)

# Merge datasets
df = pd.concat([sim_df, prac_df], ignore_index=True)

# Optuna objective: mean RMSE over 5-fold cross-validation for one hyperparameter set
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    X = df[['Rear Wing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure' , 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness']]
    y = df["Avg Speed"]
    weights = df["weight"]

    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        w_train, w_val = weights.iloc[train_idx], weights.iloc[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train, weight=w_train)
        dvalid = lgb.Dataset(X_val, label=y_val, weight=w_val)

        model = lgb.train(param, dtrain, num_boost_round=1000,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=250)

print("Best trial:")
print(study.best_trial.params)


In [None]:
#Front Wing SimPrac Hyperparameter Optimization
import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load both datasets
sim_df = pd.read_csv("simulator_data.csv")
prac_df = pd.read_csv("practice_dataV3_3.csv")

# Calculate target: Avg Speed
sim_df["Avg Speed"] = (sim_df["Lap Distance"] / sim_df["Lap Time"]) * 3600
prac_df["Avg Speed"] = (prac_df["Lap Distance"] / prac_df["Lap Time"]) * 3600

# Assign weights
sim_df["weight"] = 1.0
prac_df["weight"] = 250.0  # Adjust this ratio to control importance (10000/40 = 250)

# Merge datasets
df = pd.concat([sim_df, prac_df], ignore_index=True)

# Optuna objective: mean RMSE over 5-fold cross-validation for one hyperparameter set
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    X = df[['Front Wing', 'Rear Wing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)']]
    y = df["Avg Speed"]
    weights = df["weight"]

    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        w_train, w_val = weights.iloc[train_idx], weights.iloc[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train, weight=w_train)
        dvalid = lgb.Dataset(X_val, label=y_val, weight=w_val)

        model = lgb.train(param, dtrain, num_boost_round=1000,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=250)

print("Best trial:")
print(study.best_trial.params)


In [None]:
#Suspension SimPrac Hyperparameter Optimization
import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load both datasets
sim_df = pd.read_csv("simulator_data.csv")
prac_df = pd.read_csv("practice_dataV3_3.csv")

# Calculate target: Avg Speed
sim_df["Avg Speed"] = (sim_df["Lap Distance"] / sim_df["Lap Time"]) * 3600
prac_df["Avg Speed"] = (prac_df["Lap Distance"] / prac_df["Lap Time"]) * 3600

# Assign weights
sim_df["weight"] = 1.0
prac_df["weight"] = 250.0  # Adjust this ratio to control importance (10000/40 = 250)

# Merge datasets
df = pd.concat([sim_df, prac_df], ignore_index=True)

# Optuna objective: mean RMSE over 5-fold cross-validation for one hyperparameter set
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    X = df[['Suspension', 'Front Wing', 'Rear Wing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width']]
    y = df["Avg Speed"]
    weights = df["weight"]

    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        w_train, w_val = weights.iloc[train_idx], weights.iloc[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train, weight=w_train)
        dvalid = lgb.Dataset(X_val, label=y_val, weight=w_val)

        model = lgb.train(param, dtrain, num_boost_round=1000,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=250)

print("Best trial:")
print(study.best_trial.params)


In [None]:
#Brake Balance SimPrac Hyperparameter Optimization
import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load both datasets
sim_df = pd.read_csv("simulator_data.csv")
prac_df = pd.read_csv("practice_dataV3_3.csv")

# Calculate target: Avg Speed
sim_df["Avg Speed"] = (sim_df["Lap Distance"] / sim_df["Lap Time"]) * 3600
prac_df["Avg Speed"] = (prac_df["Lap Distance"] / prac_df["Lap Time"]) * 3600

# Assign weights
sim_df["weight"] = 1.0
prac_df["weight"] = 250.0  # Adjust this ratio to control importance (10000/40 = 250)

# Merge datasets
df = pd.concat([sim_df, prac_df], ignore_index=True)

# Optuna objective: mean RMSE over 5-fold cross-validation for one hyperparameter set
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    X = df[['Brake Balance', 'Suspension', 'Front Wing', 'Rear Wing', 'Differential', 'Engine','Cornering', 'Width', 'Roughness', 'Temperature']]
    y = df["Avg Speed"]
    weights = df["weight"]

    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        w_train, w_val = weights.iloc[train_idx], weights.iloc[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train, weight=w_train)
        dvalid = lgb.Dataset(X_val, label=y_val, weight=w_val)

        model = lgb.train(param, dtrain, num_boost_round=1000,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=300)

print("Best trial:")
print(study.best_trial.params)


In [None]:
Brake Balance: 	X = df[['Brake_Balance', 'Suspension', 'Front_Wing', 'Rear_Wing', 'Differential', 'Engine','Cornering', 'Width', 'Roughness', 'Temperature']]
Suspension: 	X = df[['Suspension', 'Front_Wing', 'Rear_Wing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width']]
Front Wing: 	X = df[['Front_Wing', 'Rear_Wing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)']]
Rear Wing: 	X = df[['Rear_Wing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure' , 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness']]
Differential: 	X = df[['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density']]
Engine: 	X = df[['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure' , 'Cornering']]
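
The car-parameter columns appear under three spellings in this notebook (`'RearWing'`, `'Rear Wing'`, and `'Rear_Wing'`), depending on which CSV version is loaded. A small normalization helper, sketched below with hypothetical names, can reduce the risk of a `KeyError` when feature lists are reused across files. It only touches the three multi-word car parameters and leaves track and weather columns such as `'Air Density'` unchanged.

```python
# Hypothetical helper: map spelling variants of car-parameter columns
# onto one canonical form (here the compact 'RearWing' style).
CANONICAL = ("RearWing", "FrontWing", "BrakeBalance")


def normalize_column(name: str) -> str:
    """Return the compact spelling for car parameters, else the name unchanged."""
    compact = name.replace(" ", "").replace("_", "")
    return compact if compact in CANONICAL else name


columns = ["Rear Wing", "Front_Wing", "BrakeBalance", "Air Density", "Engine"]
print([normalize_column(c) for c in columns])
```

With pandas, the helper could be applied to a whole frame via `df.rename(columns=normalize_column)`.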

### Training and saving LGBM-Models for each Car-Parameter (SimPrac Combo)

In [None]:
Best Engine trial:
{'num_leaves': 46, 'max_depth': 5, 'learning_rate': 0.033632492994555285, 'min_data_in_leaf': 10, 'feature_fraction': 0.9292816238478901, 'bagging_fraction': 0.6029213489987132, 'bagging_freq': 1, 'lambda_l1': 4.9435895890172095, 'lambda_l2': 1.5835241201372334}

Best Differential trial:
{'num_leaves': 120, 'max_depth': 4, 'learning_rate': 0.06188740190651351, 'min_data_in_leaf': 10, 'feature_fraction': 0.8516027132719757, 'bagging_fraction': 0.7665333052378347, 'bagging_freq': 2, 'lambda_l1': 4.842140375323366, 'lambda_l2': 0.8572408057266877}

Best Rear Wing trial:
{'num_leaves': 234, 'max_depth': 8, 'learning_rate': 0.009001605992864, 'min_data_in_leaf': 12, 'feature_fraction': 0.7155500136774263, 'bagging_fraction': 0.9782633351035283, 'bagging_freq': 3, 'lambda_l1': 4.0691300429761625, 'lambda_l2': 0.4270601685583596}

Best Front Wing trial:
{'num_leaves': 75, 'max_depth': 15, 'learning_rate': 0.008393206404444, 'min_data_in_leaf': 10, 'feature_fraction': 0.636593633990795, 'bagging_fraction': 0.8763178958731057, 'bagging_freq': 1, 'lambda_l1': 3.37372395608384, 'lambda_l2': 3.5508540427910322}

Best Suspension trial:
{'num_leaves': 94, 'max_depth': 7, 'learning_rate': 0.053319349444258306, 'min_data_in_leaf': 11, 'feature_fraction': 0.7588234798454238, 'bagging_fraction': 0.9663253274432543, 'bagging_freq': 1, 'lambda_l1': 4.443326692620447, 'lambda_l2': 2.017979694136418}

Best Brake Balance trial:
{'num_leaves': 52, 'max_depth': 14, 'learning_rate': 0.02166755029862027, 'min_data_in_leaf': 10, 'feature_fraction': 0.5221020637935397, 'bagging_fraction': 0.938010385446846, 'bagging_freq': 1, 'lambda_l1': 2.531857885425292, 'lambda_l2': 4.788757729441045}

In [None]:
#Engine (Hyperparameters and X set) Check
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load both datasets
sim_df = pd.read_csv("simulator_data_newnames.csv")
prac_df = pd.read_csv("practice_data_newnamesV3_2.csv")

# Compute Avg Speed
sim_df["Avg Speed"] = (sim_df["Lap Distance"] / sim_df["Lap Time"]) * 3600
prac_df["Avg Speed"] = (prac_df["Lap Distance"] / prac_df["Lap Time"]) * 3600

# Assign weights
sim_df["weight"] = 1.0
prac_df["weight"] = 250.0


# Combine datasets
df = pd.concat([sim_df, prac_df], ignore_index=True)

# Features, target, and weights
X = df[['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure' , 'Cornering']]
y = df["Avg Speed"]
weights = df["weight"]

# Define LightGBM dataset with weights
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 46,
    'max_depth': 5,
    'learning_rate': 0.033632492994555285,
    'min_data_in_leaf': 10,
    'feature_fraction': 0.9292816238478901,
    'bagging_fraction': 0.6029213489987132,
    'bagging_freq': 1,
    'lambda_l1': 4.9435895890172095,
    'lambda_l2': 1.5835241201372334
    }

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("Engine_lgbm_model_R3.txt")

print("Model training complete and saved to 'Engine_lgbm_model_R3.txt'")


In [None]:
#Differential (Hyperparameters and X set) check
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load both datasets
sim_df = pd.read_csv("simulator_data_newnames.csv")
prac_df = pd.read_csv("practice_data_newnamesV3_2.csv")

# Compute Avg Speed
sim_df["Avg Speed"] = (sim_df["Lap Distance"] / sim_df["Lap Time"]) * 3600
prac_df["Avg Speed"] = (prac_df["Lap Distance"] / prac_df["Lap Time"]) * 3600

# Assign weights
sim_df["weight"] = 1.0
prac_df["weight"] = 250.0


# Combine datasets
df = pd.concat([sim_df, prac_df], ignore_index=True)

# Features, target, and weights
X = df[['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density']]
y = df["Avg Speed"]
weights = df["weight"]

# Define LightGBM dataset with weights
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 120,
    'max_depth': 4,
    'learning_rate': 0.06188740190651351,
    'min_data_in_leaf': 10,
    'feature_fraction': 0.8516027132719757,
    'bagging_fraction': 0.7665333052378347,
    'bagging_freq': 2,
    'lambda_l1': 4.842140375323366,
    'lambda_l2': 0.8572408057266877
    }


# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("Differential_lgbm_model_R3.txt")

print("Model training complete and saved to 'Differential_lgbm_model_R3.txt'")


In [None]:
#Rear Wing (Hyperparameters and X set) check
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load both datasets
sim_df = pd.read_csv("simulator_data_newnames.csv")
prac_df = pd.read_csv("practice_data_newnamesV3_2.csv")

# Compute Avg Speed
sim_df["Avg Speed"] = (sim_df["Lap Distance"] / sim_df["Lap Time"]) * 3600
prac_df["Avg Speed"] = (prac_df["Lap Distance"] / prac_df["Lap Time"]) * 3600

# Assign weights
sim_df["weight"] = 1.0
prac_df["weight"] = 250.0


# Combine datasets
df = pd.concat([sim_df, prac_df], ignore_index=True)

# Features, target, and weights
X = df[['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure' , 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness']]
y = df["Avg Speed"]
weights = df["weight"]

# Define LightGBM dataset with weights
lgb_data = lgb.Dataset(X, label=y, weight=weights)


# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 234,
    'max_depth': 8,
    'learning_rate': 0.009001605992864,
    'min_data_in_leaf': 12,
    'feature_fraction': 0.7155500136774263,
    'bagging_fraction': 0.9782633351035283,
    'bagging_freq': 3,
    'lambda_l1': 4.0691300429761625,
    'lambda_l2': 0.4270601685583596}


# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("RearWing_lgbm_model_R3.txt")

print("Model training complete and saved to 'RearWing_lgbm_model_R3.txt'")


In [None]:
#Front Wing (hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load both datasets
sim_df = pd.read_csv("simulator_data_newnames.csv")
prac_df = pd.read_csv("practice_data_newnamesV3_2.csv")

# Compute Avg Speed
sim_df["Avg Speed"] = (sim_df["Lap Distance"] / sim_df["Lap Time"]) * 3600
prac_df["Avg Speed"] = (prac_df["Lap Distance"] / prac_df["Lap Time"]) * 3600

# Assign weights
sim_df["weight"] = 1.0
prac_df["weight"] = 250.0


# Combine datasets
df = pd.concat([sim_df, prac_df], ignore_index=True)

# Features, target, and weights
X = df[['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)']]
y = df["Avg Speed"]
weights = df["weight"]

# Define LightGBM dataset with weights
lgb_data = lgb.Dataset(X, label=y, weight=weights)


# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 75,
    'max_depth': 15,
    'learning_rate': 0.008393206404444,
    'min_data_in_leaf': 10,
    'feature_fraction': 0.636593633990795,
    'bagging_fraction': 0.8763178958731057,
    'bagging_freq': 1,
    'lambda_l1': 3.37372395608384,
    'lambda_l2': 3.5508540427910322
    }



# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("FrontWing_lgbm_model_R3.txt")

print("Model training complete and saved to 'FrontWing_lgbm_model_R3.txt'")


In [None]:
#Suspension
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load both datasets
sim_df = pd.read_csv("simulator_data_newnames.csv")
prac_df = pd.read_csv("practice_data_newnamesV3_2.csv")

# Compute Avg Speed
sim_df["Avg Speed"] = (sim_df["Lap Distance"] / sim_df["Lap Time"]) * 3600
prac_df["Avg Speed"] = (prac_df["Lap Distance"] / prac_df["Lap Time"]) * 3600

# Assign weights
sim_df["weight"] = 1.0
prac_df["weight"] = 250.0


# Combine datasets
df = pd.concat([sim_df, prac_df], ignore_index=True)

# Features, target, and weights
X = df[['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width']]
y = df["Avg Speed"]
weights = df["weight"]

# Define LightGBM dataset with weights
lgb_data = lgb.Dataset(X, label=y, weight=weights)


# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 94,
    'max_depth': 7,
    'learning_rate': 0.053319349444258306,
    'min_data_in_leaf': 11,
    'feature_fraction': 0.7588234798454238,
    'bagging_fraction': 0.9663253274432543,
    'bagging_freq': 1,
    'lambda_l1': 4.443326692620447,
    'lambda_l2': 2.017979694136418
    }

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("Suspension_lgbm_model_R3.txt")

print("Model training complete and saved to 'Suspension_lgbm_model_R3.txt'")


In [None]:
#Brake Balance (hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load both datasets
sim_df = pd.read_csv("simulator_data_newnames.csv")
prac_df = pd.read_csv("practice_data_newnamesV3_2.csv")

# Compute Avg Speed
sim_df["Avg Speed"] = (sim_df["Lap Distance"] / sim_df["Lap Time"]) * 3600
prac_df["Avg Speed"] = (prac_df["Lap Distance"] / prac_df["Lap Time"]) * 3600

# Assign weights
sim_df["weight"] = 1.0
prac_df["weight"] = 250.0


# Combine datasets
df = pd.concat([sim_df, prac_df], ignore_index=True)

# Features, target, and weights
X = df[['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine','Cornering', 'Width', 'Roughness', 'Temperature']]
y = df["Avg Speed"]
weights = df["weight"]

# Define LightGBM dataset with weights
lgb_data = lgb.Dataset(X, label=y, weight=weights)


# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 52,
    'max_depth': 14,
    'learning_rate': 0.02166755029862027,
    'min_data_in_leaf': 10,
    'feature_fraction': 0.5221020637935397,
    'bagging_fraction': 0.938010385446846,
    'bagging_freq': 1,
    'lambda_l1': 2.531857885425292,
    'lambda_l2': 4.788757729441045
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("BrakeBalance_lgbm_model_R3.txt")

print("Model training complete and saved to 'BrakeBalance_lgbm_model_R3.txt'")
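
The six cells above give every simulator row a weight of 1.0 and every practice row a weight of 250.0. How strongly this tilts training toward the practice laps depends on the row counts of the two files, which are not stated in this section. The following sketch computes the share of the total sample weight carried by the practice rows. The function name `practice_weight_share` and the row counts in the usage lines are hypothetical and serve only as an illustration.

```python
def practice_weight_share(n_sim, n_prac, w_sim=1.0, w_prac=250.0):
    """Fraction of the total sample weight that comes from practice rows."""
    total = n_sim * w_sim + n_prac * w_prac
    if total <= 0:
        raise ValueError("total weight must be positive")
    return (n_prac * w_prac) / total


# Hypothetical row counts, chosen only for illustration (not the real file sizes)
share = practice_weight_share(n_sim=100_000, n_prac=400)
print(f"Practice data carries {share:.1%} of the total weight")
```

Plugging in the real row counts of the simulator and practice files shows whether a weight of 250 lets the practice data dominate, balance, or barely influence the fit.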


### Training and saving LGBM-Models for each Car-Parameter (Not SimPrac Combo)

In [None]:
#Engine (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure' , 'Cornering']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 281,
    'max_depth': 11,
    'learning_rate': 0.04083240077095524,
    'min_data_in_leaf': 96,
    'feature_fraction': 0.9534759457760267,
    'bagging_fraction': 0.6637656869307713,
    'bagging_freq': 7,
    'lambda_l1': 0.48486124811843856,
    'lambda_l2': 1.0469435993282974
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("Engine_lgbm_model_R3.txt")

print("Model training complete and saved to 'Engine_lgbm_model_R3.txt'")


In [None]:
#Differential (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 126,
    'max_depth': 8,
    'learning_rate': 0.05725939263903659,
    'min_data_in_leaf': 96,
    'feature_fraction': 0.9048274481436259,
    'bagging_fraction': 0.6871315558035545,
    'bagging_freq': 1,
    'lambda_l1': 2.9809700145880162,
    'lambda_l2': 2.893530172381254
    }


# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("Differential_lgbm_model_R3.txt")

print("Model training complete and saved to 'Differential_lgbm_model_R3.txt'")


In [None]:
#Rear Wing (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure' , 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 41,
    'max_depth': 14,
    'learning_rate': 0.07379289439034513,
    'min_data_in_leaf': 14,
    'feature_fraction': 0.9805583874353474,
    'bagging_fraction': 0.757494844528942,
    'bagging_freq': 1,
    'lambda_l1': 3.3280185770912905,
    'lambda_l2': 4.668868493827121
    }

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("RearWing_lgbm_model_R3.txt")

print("Model training complete and saved to 'RearWing_lgbm_model_R3.txt'")


In [None]:
#Front Wing (hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 228,
    'max_depth': 6,
    'learning_rate': 0.10991909073754202,
    'min_data_in_leaf': 94,
    'feature_fraction': 0.954670793846081,
    'bagging_fraction': 0.7659843231590522,
    'bagging_freq': 1,
    'lambda_l1': 3.8616883301361553,
    'lambda_l2': 4.896984209036385
    }


# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("FrontWing_lgbm_model_R3.txt")

print("Model training complete and saved to 'FrontWing_lgbm_model_R3.txt'")


In [None]:
#Suspension
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 31,
    'max_depth': 10,
    'learning_rate': 0.14051851191638579,
    'min_data_in_leaf': 75,
    'feature_fraction': 0.999888882811791,
    'bagging_fraction': 0.7009625088481276,
    'bagging_freq': 1,
    'lambda_l1': 4.8851994495913145,
    'lambda_l2': 3.6824821872767783
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("Suspension_lgbm_model_R3.txt")

print("Model training complete and saved to 'Suspension_lgbm_model_R3.txt'")


In [None]:
#Brake Balance (hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine','Cornering', 'Width', 'Roughness', 'Temperature']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 27,
    'max_depth': 11,
    'learning_rate': 0.18696777761444966,
    'min_data_in_leaf': 73,
    'feature_fraction': 0.964098272670725,
    'bagging_fraction': 0.8852821315081366,
    'bagging_freq': 10,
    'lambda_l1': 3.486399193949678,
    'lambda_l2': 3.7053586457221366
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("BrakeBalance_lgbm_model_R3.txt")

print("Model training complete and saved to 'BrakeBalance_lgbm_model_R3.txt'")
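
The training cells in this section differ only in their feature list, their tuned hyperparameters, and the output file name. The four settings `objective`, `metric`, `verbosity`, and `boosting_type` are identical everywhere. A minimal sketch of how the shared part could be factored out is shown below. The helper names `build_params` and `model_filename` are our own and do not appear in the cells above.

```python
# Settings shared by every per-parameter model in the cells above
BASE_PARAMS = {
    "objective": "regression",
    "metric": "l2",
    "verbosity": -1,
    "boosting_type": "gbdt",
}


def build_params(tuned):
    """Merge the shared settings with one model's tuned hyperparameters."""
    return {**BASE_PARAMS, **tuned}


def model_filename(car_part):
    """File name pattern used by the training cells."""
    return f"{car_part}_lgbm_model_R3.txt"


# Illustrative subset of the tuned values of the simulator-only Engine model
engine_params = build_params({"num_leaves": 281, "max_depth": 11})
print(model_filename("Engine"), engine_params["num_leaves"])
```

With helpers like these, the six cells could be replaced by one loop over the car parameters, which would also make copy-and-paste mistakes between cells less likely.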


### Optimizing Models using Practice Data

In [None]:
import pandas as pd
import lightgbm as lgb

# Model names
model_names = [
    "Engine_lgbm_model_R3.txt",
    "Differential_lgbm_model_R3.txt",
    "RearWing_lgbm_model_R3.txt",
    "FrontWing_lgbm_model_R3.txt",
    "Suspension_lgbm_model_R3.txt",
    "BrakeBalance_lgbm_model_R3.txt"
]

# Corresponding feature sets
feature_sets = {
    "Engine": ['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure', 'Cornering'],
    "Differential": ['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density'],
    "RearWing": ['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness'],
    "FrontWing": ['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)'],
    "Suspension": ['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width'],
    "BrakeBalance": ['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Width', 'Roughness', 'Temperature']
}

# Load new data
df = pd.read_csv("practice_data_newnamesV3_1.csv")

# Compute Avg Speed for new data
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Target variable
y_new = df["Avg Speed"]

# Sample weight
sample_weight = [1000] * len(y_new)

# Training each model with its feature set
for model_file in model_names:
    model_key = model_file.split("_")[0]  # Extract the prefix e.g., "Engine"
    features = feature_sets[model_key]

    print(f"\nUpdating {model_file} with features: {features}")

    # Prepare feature matrix
    X_new = df[features]

    # Load existing model
    model = lgb.Booster(model_file=model_file)

    # Create dataset
    new_data = lgb.Dataset(X_new, label=y_new, weight=sample_weight)

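    # Continue training. Note: with params={} the additional trees are built
    # with LightGBM's default hyperparameters, not with the tuned values
    # that were used to train the loaded model.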
    updated_model = lgb.train(
        params={},
        train_set=new_data,
        init_model=model,
        num_boost_round=1000
    )

    # Save updated model
    updated_file = model_file.replace(".txt", "_updated.txt")
    updated_model.save_model(updated_file)

    print(f"Model saved to {updated_file}")

print("\nAll models updated successfully.")


### Sequentially optimizing Car Parameters.

In [None]:
import pandas as pd
import lightgbm as lgb
import optuna

# Ordered list of parameters to optimize
params_to_optimize = ["Engine", "Differential", "RearWing", "FrontWing", "Suspension", "BrakeBalance"]

# Corresponding feature sets for each model
feature_sets = {
    "Engine": ['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure', 'Cornering'],
    "Differential": ['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density'],
    "RearWing": ['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness'],
    "FrontWing": ['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)'],
    "Suspension": ['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width'],
    "BrakeBalance": ['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Width', 'Roughness', 'Temperature']
}

# Load static track & weather data
base_data = pd.read_csv("track_weather_netherlands.csv")

# This will hold the best parameter values as we optimize them
optimized_params = {}

# Begin sequential optimization
for car_part in params_to_optimize:
    print(f"\n🔧 Optimizing {car_part}...")

    model_path = f"{car_part}_lgbm_model_R3.txt"
    model = lgb.Booster(model_file=model_path)
    features = feature_sets[car_part]

    def objective(trial):
        # Suggest a value for every car parameter; only the value for
        # car_part is used below, the other five suggestions are ignored
        params = {
            "Engine": trial.suggest_int("Engine", 1, 500),
            "RearWing": trial.suggest_int("RearWing", 1, 500),
            "FrontWing": trial.suggest_int("FrontWing", 1, 500),
            "BrakeBalance": trial.suggest_int("BrakeBalance", 1, 500),
            "Suspension": trial.suggest_int("Suspension", 1, 500),
            "Differential": trial.suggest_int("Differential", 1, 500),
        }

        current_value = params[car_part]

        # Create a single row input with all needed features
        input_data = base_data.copy()

        # Add previously optimized params
        for param, value in optimized_params.items():
            input_data[param] = value

        # Add current trial value
        input_data[car_part] = current_value

        # If any missing feature, fill with 0 or a safe default
        for col in features:
            if col not in input_data.columns:
                input_data[col] = 0

        # Ensure correct order of features
        X = input_data[features]

        # Predict Avg Speed
        avg_speed = model.predict(X)[0]
        return avg_speed  # Optuna maximizes this

    # Run Optuna
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=500, show_progress_bar=True)

    best_value = study.best_params[car_part]
    optimized_params[car_part] = best_value
    print(f"✅ Best {car_part}: {best_value}")

# Final results
print("\n🎯 Final Optimized Parameters:")
for k, v in optimized_params.items():
    print(f"{k}: {v}")


In [None]:
🎯 Final Optimized Parameters:
Engine: 77
Differential: 101
RearWing: 161
FrontWing: 80
Suspension: 41
BrakeBalance: 68

🎯 Final Optimized Parameters:
Engine: 45
Differential: 35
RearWing: 233
FrontWing: 72
Suspension: 160
BrakeBalance: 53

In [None]:
import pandas as pd
import lightgbm as lgb
import optuna

# Ordered list of car parameters to optimize
params_to_optimize = [
    "Engine", "Differential", "RearWing",
    "FrontWing", "Suspension", "BrakeBalance"
]

# Feature sets used by each model for prediction
feature_sets = {
    "Engine": ['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure', 'Cornering'],
    "Differential": ['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density'],
    "RearWing": ['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness'],
    "FrontWing": ['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)'],
    "Suspension": ['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width'],
    "BrakeBalance": ['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Width', 'Roughness', 'Temperature']
}

# Load static base data (track & weather)
base_data = pd.read_csv("track_weather_netherlands.csv")

# Dictionary to store optimized values
optimized_params = {}

# Sequential optimization loop
for car_part in params_to_optimize:
    print(f"\n🔧 Optimizing {car_part}...")

    # Load the pretrained LightGBM model
    model = lgb.Booster(model_file=f"{car_part}_lgbm_model_R3.txt")
    features = feature_sets[car_part]

    def objective(trial):
        # Clone base data for this trial
        input_data = base_data.copy()

        # Add fixed parameters (already optimized)
        for param, value in optimized_params.items():
            input_data[param] = value

        # Suggest value only for the current parameter
        trial_value = trial.suggest_int(car_part, 1, 500)
        input_data[car_part] = trial_value

        # Fill in missing feature columns with default (0)
        for feature in features:
            if feature not in input_data.columns:
                input_data[feature] = 0

        # Predict average speed
        X = input_data[features]
        avg_speed = model.predict(X)[0]
        return avg_speed

    # Run Optuna optimization for this parameter
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=100, show_progress_bar=True)

    # Save best value
    best_value = study.best_params[car_part]
    optimized_params[car_part] = best_value
    print(f"✅ Best {car_part}: {best_value}")

# Final results
print("\n🎯 Final Optimized Parameters:")
for k, v in optimized_params.items():
    print(f"{k}: {v}")
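
Both cells above follow a coordinate-wise scheme: the car parameters are optimized one at a time in a fixed order, and each result is held fixed for the models that come later. Since every parameter is an integer between 1 and 500, each step has only 500 candidates, so one alternative to sampling with Optuna would be to evaluate all of them. The sketch below illustrates this scheme under stated assumptions: `sequential_search`, `toy_engine`, and `toy_differential` are hypothetical names, and the toy functions are simple stand-ins for the LightGBM models, not the real predictors.

```python
def sequential_search(order, predictors, low=1, high=500):
    """Optimize one integer setting at a time, keeping earlier results fixed.

    predictors maps each setting name to a callable that takes a dict of
    settings and returns a predicted average speed (higher is better).
    """
    best = {}
    for part in order:
        predict = predictors[part]
        # Score every allowed value for this part with earlier parts fixed
        scores = {v: predict({**best, part: v}) for v in range(low, high + 1)}
        best[part] = max(scores, key=scores.get)
    return best


# Hypothetical stand-ins for the trained models (not the real predictors)
def toy_engine(setup):
    return 200.0 - 0.001 * (setup["Engine"] - 77) ** 2


def toy_differential(setup):
    # Depends on the Engine value that was fixed in the previous step
    target = 100 + setup["Engine"] // 77
    return 200.0 - 0.001 * (setup["Differential"] - target) ** 2


result = sequential_search(
    ["Engine", "Differential"],
    {"Engine": toy_engine, "Differential": toy_differential},
)
print(result)
```

With real models, the 500 candidate rows for one step could be stacked into a single DataFrame and scored with one `model.predict` call. Like the Optuna version, this scheme finds the best value per step only for the chosen order; it does not guarantee the best joint setup.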


## AllInOne Optimization

In [None]:
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time"])
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 242,
    'max_depth': 9,
    'learning_rate': 0.0645474493988489,
    'min_data_in_leaf': 96,
    'feature_fraction': 0.9427185636727835,
    'bagging_fraction': 0.785711983453578,
    'bagging_freq': 2,
    'lambda_l1': 0.6006835353397709,
    'lambda_l2': 1.4905819957883624,
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("best_lgbm_model_R3.txt")

print("Model training complete and saved to 'best_lgbm_model_R3.txt'")


In [None]:
# LightGBM model trained directly on the combined simulator and practice data,
# using hyperparameters that were optimized for this combined data set

import pandas as pd
import numpy as np
import lightgbm as lgb

# Load both datasets
sim_df = pd.read_csv("simulator_data.csv")
prac_df = pd.read_csv("practice_dataV3_2.csv")

# Compute Avg Speed
sim_df["Avg Speed"] = (sim_df["Lap Distance"] / sim_df["Lap Time"]) * 3600
prac_df["Avg Speed"] = (prac_df["Lap Distance"] / prac_df["Lap Time"]) * 3600

# Assign weights
sim_df["weight"] = 1.0
prac_df["weight"] = 250.0

# Make sure both have the same columns in the same order
core_cols = ["Lap Distance", "Lap Time", "Avg Speed", "weight", "Rear Wing", "Front Wing", "Engine", "Brake Balance", "Differential", "Suspension", "Cornering", "Inclines", "Camber", "Grip", "Altitude", "Roughness", "Width", "Temperature", "Humidity", "Wind (Avg. Speed)", "Wind (Gusts)", "Air Density", "Air Pressure"]
required_cols = core_cols + [col for col in sim_df.columns if col not in core_cols]

sim_df = sim_df[required_cols]
prac_df = prac_df[required_cols]

# Combine datasets
df = pd.concat([sim_df, prac_df], ignore_index=True)

# Features, target, and weights
X = df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time", "weight"])
y = df["Avg Speed"]
weights = df["weight"]

# Define LightGBM dataset with weights
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (already tuned)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 74,
    'max_depth': 11,
    'learning_rate': 0.028870630893942515,
    'min_data_in_leaf': 13,
    'feature_fraction': 0.7399238619191133,
    'bagging_fraction': 0.9203471068533383,
    'bagging_freq': 6,
    'lambda_l1': 0.025758523534949052,
    'lambda_l2': 0.9073066525703107}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("sim_prac_lgbm_model_R3.txt")

print("Model training complete and saved to 'sim_prac_lgbm_model_R3.txt'")

In [None]:
import pandas as pd
import lightgbm as lgb

# Load the original model
model = lgb.Booster(model_file="best_lgbm_model_R3.txt")
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 242,
    'max_depth': 9,
    'learning_rate': 0.0645474493988489,
    'min_data_in_leaf': 96,
    'feature_fraction': 0.9427185636727835,
    'bagging_fraction': 0.785711983453578,
    'bagging_freq': 2,
    'lambda_l1': 0.6006835353397709,
    'lambda_l2': 1.4905819957883624,
}
# Load the new data
new_df = pd.read_csv("practice_dataV3_1.csv")

# Compute Avg Speed for new data
new_df["Avg Speed"] = (new_df["Lap Distance"] / new_df["Lap Time"]) * 3600

# Prepare features and target
X_new = new_df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time","Round", "Track", "Qualifying", "Stint", "Lap", "Fuel", "Tyre Remaining", "Tyre Choice"])
y_new = new_df["Avg Speed"]

# Weight the practice laps (a constant weight of 5 for every row)
sample_weight = [5] * len(y_new)

# Create new LightGBM Dataset
new_data = lgb.Dataset(X_new, label=y_new, weight=sample_weight)

# Continue training from the previous model. The tuned hyperparameters are
# passed explicitly, because an empty dict would fall back to LightGBM's defaults.
model = lgb.train(
    params=params,
    train_set=new_data,
    init_model=model,
    num_boost_round=1000,  # number of additional boosting rounds
)

# Save updated model
model.save_model("best_lgbm_model_R3_updated.txt")

print("Model updated with new data and saved to 'best_lgbm_model_R3_updated.txt'")


In [None]:
import pandas as pd
import lightgbm as lgb
import optuna


# Load trained model
#model = lgb.Booster(model_file="best_lgbm_model_R3_updated.txt")
model = lgb.Booster(model_file="sim_prac_lgbm_model_R3.txt")

# Load track/weather data (single row)
track_weather = pd.read_csv("track_weather_netherlands.csv")

# Define the objective function
def objective(trial):
    # Suggest car parameters
    params = {
        "Engine": trial.suggest_int("Engine", 20, 60),
        "Rear Wing": trial.suggest_int("Rear Wing", 200, 300),
        "Front Wing": trial.suggest_int("Front Wing", 48, 88),
        "Brake Balance": trial.suggest_int("Brake Balance", 18, 118),
        "Suspension": trial.suggest_int("Suspension", 10, 110),
        "Differential": trial.suggest_int("Differential", 10, 100),
    }

    # Engine: 45
    # RearWing: 250
    # FrontWing: 68
    # BrakeBalance: 68
    # Suspension: 60
    # Differential: 35

    # Combine with static track/weather parameters
    input_data = pd.concat([track_weather, pd.DataFrame([params])], axis=1)

    # Predict average speed
    predicted_avg_speed = model.predict(input_data)[0]

    # The study minimizes its objective, so return the negative speed to maximize it
    return -predicted_avg_speed

# Create study
study = optuna.create_study(direction="minimize")

# Optimize
study.optimize(objective, n_trials=100)

# Output best results
best_trial = study.best_trial
print("\nBest Parameters:")
for key, value in best_trial.params.items():
    print(f"{key}: {value}")
print(f"Predicted Avg Speed: {-best_trial.value:.2f}")

## Race Strategy
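
The strategy search in the following cell models each compound's lap time as a linear function of tyre age, t(X) = a + b·X, and adds the laps up one by one. Under that assumption, a stint of n laps has the closed form a·n + b·n(n+1)/2, which is useful for sanity-checking the loop. The sketch below is only an illustration: `stint_time_loop` and `stint_time_closed_form` are hypothetical helper names, and the coefficients are the super-soft values from the cell below.

```python
def stint_time_loop(a, b, n):
    """Sum the linear lap-time model a + b*X for tyre ages X = 1..n."""
    return sum(a + b * x for x in range(1, n + 1))


def stint_time_closed_form(a, b, n):
    """Same total via the arithmetic-series identity 1 + 2 + ... + n = n(n+1)/2."""
    return a * n + b * n * (n + 1) / 2


# Super-soft coefficients as used in the strategy cell, full 12-lap stint
a, b = 78.0565192575555, 0.114973111111111
loop_total = stint_time_loop(a, b, 12)
closed_total = stint_time_closed_form(a, b, 12)
print(round(loop_total, 3), round(closed_total, 3))
```

Both numbers should agree up to floating-point rounding. A mismatch would point to an off-by-one error in the tyre-age index.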

In [None]:
import math

# Define the tire parameters and their lap time formulas
def lap_time_super_soft(X):
    return (78.0565192575555) + 0.114973111111111 * X

def lap_time_soft(X):
    return (78.8737020908889) + 0.136290277777778 * X

def lap_time_medium(X):
    return (79.1300868242222) + 0.131553711111111 * X

def lap_time_hard(X):
    return (79.2600146591429) + 0.137075542857143 * X

# Tire data with their lifespan
tire_lifespan = {
    "super_soft": 12,
    "soft": 24,
    "medium": 30,
    "hard": 35
}

# Pit stop penalty
pit_stop_time = 30  # seconds

# Function to calculate the total race time for a given strategy
def calculate_race_time(laps, strategy):
    total_time = 0
    total_pit_stops = 0
    lap_index = 0
    lap_counter = 0

    while lap_counter < laps:
        tire, stint_laps = strategy[lap_index]

        # Ensure we don't exceed the total laps
        if lap_counter + stint_laps > laps:
            stint_laps = laps - lap_counter

        # Calculate the lap times for this stint
        lap_times = []
        for i in range(stint_laps):
            if tire == "super_soft":
                lap_times.append(lap_time_super_soft(i + 1))
            elif tire == "soft":
                lap_times.append(lap_time_soft(i + 1))
            elif tire == "medium":
                lap_times.append(lap_time_medium(i + 1))
            elif tire == "hard":
                lap_times.append(lap_time_hard(i + 1))

        total_time += sum(lap_times)  # Add the lap times of this stint
        lap_counter += stint_laps

        # If we are not at the last stint, account for a pit stop
        if lap_counter < laps:
            total_time += pit_stop_time  # Pit stop penalty
            total_pit_stops += 1

        lap_index += 1
        if lap_index >= len(strategy):
            break

    return total_time, total_pit_stops

# Function to generate possible strategies dynamically
def generate_strategies(laps):
    strategies = []
    tire_choices = ["super_soft", "soft", "medium", "hard"]

    # Generate strategies by breaking the laps into multiple stints
    for tire1 in tire_choices:
        for tire2 in tire_choices:
            for tire3 in tire_choices:
                for tire4 in tire_choices:
                  for tire5 in tire_choices:
                    strategy = []
                    remaining_laps = laps

                    # Create dynamic stints for each tire
                    for tire in [tire1, tire2, tire3, tire4, tire5]:
                    #for tire in [tire1, tire2, tire3, tire4]:
                    #for tire in [tire1, tire2, tire3]:
                    #for tire in [tire1, tire2]:
                        stint_laps = tire_lifespan[tire]

                        if remaining_laps > stint_laps:
                            strategy.append((tire, stint_laps))
                            remaining_laps -= stint_laps
                        else:
                            strategy.append((tire, remaining_laps))
                            break

                    if sum([stint[1] for stint in strategy]) == laps:
                        strategies.append(strategy)

    return strategies

# Function to find the best strategy
def optimize_strategy(laps):
    best_time = math.inf
    best_strategy = None
    best_pit_stops = None  # Stays None if no valid strategy is generated

    strategies = generate_strategies(laps)

    for strategy in strategies:
        total_time, pit_stops = calculate_race_time(laps, strategy)
        if total_time < best_time:
            best_time = total_time
            best_strategy = strategy
            best_pit_stops = pit_stops

    return best_strategy, best_time, best_pit_stops


# Main function
if __name__ == "__main__":
    race_laps = 75
    best_strategy, best_time, total_pit_stops = optimize_strategy(race_laps)
    print(f"Best Strategy: {best_strategy}")
    print(f"Best Total Time: {best_time} seconds")
    print(f"Total Pit Stops: {total_pit_stops}")

# **Analytics for Race 4**

This week's focus can be split into three sections:
1.   Tuning weights of Practice_Data
2.   Trying and Tuning Lasso Regression
3.   Direct optimization with Selenium

**Tuning Weights of Practice_Data**
During the feedback session after Race 3, we noticed that the other teams weighted their practice data more lightly in their models and obtained better results that way. This led us to research how the practice_data should be weighted, and we realised that the weights we had chosen in the past were too heavy. **(Source)**

Insert here the passage stating the maximum permissible weighting, etc.

To decide which weights to use with the specific datasets, we wrote code that uses Optuna to optimize model quality as a function of the weight assigned to the practice_data.

Once the weights were tuned, we proceeded with the optimization process that we had already used in the earlier races.

**Trying and Tuning Lasso Regression**
Another topic discussed during the feedback session was Lasso Regression: Prof. Heitmann had once mentioned that it could potentially yield good results.  
Following this suggestion, we attempted to build a Lasso Regression model and to tune the alpha parameter.  
The tuned Lasso Regression model achieved an R² of less than 0.1. Despite this low performance, we proceeded to implement the model into our All-In-One optimizer. Technically, it was possible to integrate, but the suggested parameters did not produce meaningful results.  
We did not try to incorporate Lasso Regression into our Sequential Optimizer, as we had low expectations for good outcomes and believed it would be too time-consuming.
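
A plausible explanation for the unhelpful suggestions from the All-In-One optimizer is that a Lasso model is linear: its prediction is monotonic in every car parameter, so any optimizer will push each parameter to one end of its allowed range. The sketch below illustrates this with made-up coefficients (an assumption, not the fitted values from our model); `best_setup_for_linear_model` is a hypothetical helper.

```python
def best_setup_for_linear_model(coefs, bounds):
    """For a linear prediction sum(c_i * x_i), the maximizer over a box takes
    the upper bound where c_i > 0 and the lower bound otherwise."""
    best = {}
    for name, c in coefs.items():
        low, high = bounds[name]
        # A zero coefficient (feature dropped by Lasso) makes every value
        # equally good; the lower bound is returned arbitrarily in that case.
        best[name] = high if c > 0 else low
    return best


# Illustrative coefficients only (assumption), not taken from the fitted model
coefs = {"Engine": 0.8, "Rear Wing": -0.3, "Suspension": 0.0}
bounds = {name: (1, 500) for name in coefs}
setup = best_setup_for_linear_model(coefs, bounds)
print(setup)
```

Under these assumptions the "optimum" always sits on the edge of the search range, regardless of track and weather, which matches the kind of result that is hard to interpret as a real car setup.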

**Direct Optimization with Selenium**
During the analytics for Race 1, we unexpectedly discovered that the practice laps are not limited to 80, as the website suggests. The counter for remaining practice laps continues to count into the negative.  
Following this observation, it appears that the practice laps are not actually limited, making it feasible to replace the LightGBM model with direct interaction with the website for optimization.

Building on this idea, we discovered Selenium, which allows interaction with the website through a browser window. It then became a matter of writing code that:

- Opens the team-analytics.com website,  
- Logs in and navigates to the practice page,  
- Pastes the car parameters and stint information,  
- Clicks the submit practice button,  
- Reads the resulting average lap time.  

Once these steps were operational, it was straightforward to create callable functions encapsulating this process and integrate them with an optimization routine.

This resulted in the code found under "Direct Optimization with Selenium." Please note that this code does not run in Google Colab; it must be executed on a local Python environment.
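
The overall shape of such a routine can be sketched without the browser automation. In the sketch below, `run_practice_stint` is a hypothetical stand-in for the Selenium-driven function (it evaluates a toy formula so that the example runs offline), and plain random search replaces Optuna purely to keep the example free of dependencies. The range 1 to 500 follows the parameter ranges used elsewhere in this notebook.

```python
import random

CAR_PARTS = ["Engine", "Differential", "Rear Wing",
             "Front Wing", "Suspension", "Brake Balance"]


def run_practice_stint(setup):
    """Hypothetical placeholder for the website call: returns an average lap time.
    A toy formula is used so that the sketch runs without a browser."""
    return 80.0 + sum(abs(setup[p] - 250) for p in CAR_PARTS) / 1000.0


def random_search(evaluate, n_trials, seed=0):
    """Try random setups and keep the one with the lowest average lap time."""
    rng = random.Random(seed)
    best_setup, best_time = None, float("inf")
    for _ in range(n_trials):
        setup = {p: rng.randint(1, 500) for p in CAR_PARTS}
        lap_time = evaluate(setup)
        if lap_time < best_time:
            best_setup, best_time = setup, lap_time
    return best_setup, best_time


best_setup, best_time = random_search(run_practice_stint, n_trials=50)
print(best_setup, round(best_time, 3))
```

Because every evaluation in the real setting costs a full browser round trip, the number of trials is the main budget to watch. The optimizer only ever sees the function `evaluate`, so swapping the stub for the real website call does not change the loop.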







In [None]:
# Team members working on this code: Paula Kussauer, Cedric Schwandt, Hannes Kock

## Sequential Optimization

### Once again: Train and Save each Model with full Dataset (exactly the same as R2&R3)

In [None]:
#Engine (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure' , 'Cornering']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 281,
    'max_depth': 11,
    'learning_rate': 0.04083240077095524,
    'min_data_in_leaf': 96,
    'feature_fraction': 0.9534759457760267,
    'bagging_fraction': 0.6637656869307713,
    'bagging_freq': 7,
    'lambda_l1': 0.48486124811843856,
    'lambda_l2': 1.0469435993282974
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("Engine_lgbm_model_R4.txt")

print("Model training complete and saved to 'Engine_lgbm_model_R4.txt'")


In [None]:
#Differential (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 126,
    'max_depth': 8,
    'learning_rate': 0.05725939263903659,
    'min_data_in_leaf': 96,
    'feature_fraction': 0.9048274481436259,
    'bagging_fraction': 0.6871315558035545,
    'bagging_freq': 1,
    'lambda_l1': 2.9809700145880162,
    'lambda_l2': 2.893530172381254
    }


# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("Differential_lgbm_model_R4.txt")

print("Model training complete and saved to 'Differential_lgbm_model_R4.txt'")


In [None]:
#Rear Wing (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure' , 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 41,
    'max_depth': 14,
    'learning_rate': 0.07379289439034513,
    'min_data_in_leaf': 14,
    'feature_fraction': 0.9805583874353474,
    'bagging_fraction': 0.757494844528942,
    'bagging_freq': 1,
    'lambda_l1': 3.3280185770912905,
    'lambda_l2': 4.668868493827121
    }

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("RearWing_lgbm_model_R4.txt")

print("Model training complete and saved to 'RearWing_lgbm_model_R4.txt'")


In [None]:
#Front Wing (hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 228,
    'max_depth': 6,
    'learning_rate': 0.10991909073754202,
    'min_data_in_leaf': 94,
    'feature_fraction': 0.954670793846081,
    'bagging_fraction': 0.7659843231590522,
    'bagging_freq': 1,
    'lambda_l1': 3.8616883301361553,
    'lambda_l2': 4.896984209036385
    }


# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("FrontWing_lgbm_model_R4.txt")

print("Model training complete and saved to 'FrontWing_lgbm_model_R4.txt'")


In [None]:
#Suspension
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 31,
    'max_depth': 10,
    'learning_rate': 0.14051851191638579,
    'min_data_in_leaf': 75,
    'feature_fraction': 0.999888882811791,
    'bagging_fraction': 0.7009625088481276,
    'bagging_freq': 1,
    'lambda_l1': 4.8851994495913145,
    'lambda_l2': 3.6824821872767783
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("Suspension_lgbm_model_R4.txt")

print("Model training complete and saved to 'Suspension_lgbm_model_R4.txt'")


In [None]:
#Brake Balance (hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df[['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine','Cornering', 'Width', 'Roughness', 'Temperature']]
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 27,
    'max_depth': 11,
    'learning_rate': 0.18696777761444966,
    'min_data_in_leaf': 73,
    'feature_fraction': 0.964098272670725,
    'bagging_fraction': 0.8852821315081366,
    'bagging_freq': 10,
    'lambda_l1': 3.486399193949678,
    'lambda_l2': 3.7053586457221366
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("BrakeBalance_lgbm_model_R4.txt")

print("Model training complete and saved to 'BrakeBalance_lgbm_model_R4.txt'")


### Optimize Models using Practice_Data

In [None]:
import pandas as pd
import lightgbm as lgb
import os

# Model names
model_names = [
    "Engine_lgbm_model_R4.txt",
    "Differential_lgbm_model_R4.txt",
    "RearWing_lgbm_model_R4.txt",
    "FrontWing_lgbm_model_R4.txt",
    "Suspension_lgbm_model_R4.txt",
    "BrakeBalance_lgbm_model_R4.txt"
]

# Corresponding feature sets
feature_sets = {
    "Engine": ['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure', 'Cornering'],
    "Differential": ['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density'],
    "RearWing": ['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness'],
    "FrontWing": ['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)'],
    "Suspension": ['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width'],
    "BrakeBalance": ['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Width', 'Roughness', 'Temperature']
}

# Load new data
df = pd.read_csv("practice_data_newnamesV4_3.csv")

# Compute Avg Speed for new data
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Target variable
y_new = df["Avg Speed"]

# Sample weight
sample_weight = [22] * len(y_new)

# Training each model with its feature set
for model_file in model_names:
    model_key = model_file.split("_")[0]  # Extract the prefix e.g., "Engine"
    features = feature_sets[model_key]

    print(f"\nUpdating {model_file} with features: {features}")

    # Prepare feature matrix
    X_new = df[features]

    # Load existing model
    model = lgb.Booster(model_file=model_file)

    # Create dataset
    new_data = lgb.Dataset(X_new, label=y_new, weight=sample_weight)

    # Continue training
    updated_model = lgb.train(
        params={},
        train_set=new_data,
        init_model=model,
        num_boost_round=1000
    )

    # Save updated model to folder (created if it does not exist)
    folder_name = 'carpart_lgbm_model_R4'
    os.makedirs(folder_name, exist_ok=True)

    # Updated filename and complete file path
    updated_file = model_file.replace(".txt", "_updated.txt")
    updated_file_path = os.path.join(folder_name, updated_file)

    # Save model to path
    updated_model.save_model(updated_file_path)

    print(f"Model saved to: {updated_file_path}")

print("\nAll models updated successfully.")

### Sequentially optimizing Car-Parameters
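
The sequential scheme used in the cells below can be illustrated on a toy problem: parameters are optimized one at a time in a fixed order, and each optimized value is frozen before the next parameter is searched. The sketch assumes a made-up `toy_speed` function in place of the LightGBM models and an exhaustive scan over 1 to 500 in place of Optuna. As in our models, each step only sees the parameters that have already been fixed.

```python
ORDER = ["Engine", "Differential", "RearWing"]


def toy_speed(setup):
    """Made-up stand-in for a model prediction (higher is better).
    Terms for parameters that are not set yet are simply left out."""
    score = -(setup["Engine"] - 120) ** 2
    if "Differential" in setup:
        score -= (setup["Differential"] - 0.5 * setup["Engine"]) ** 2
    if "RearWing" in setup:
        score -= (setup["RearWing"] - 300) ** 2
    return score


def sequential_optimize(score, order, low=1, high=500):
    """Optimize one parameter at a time, keeping earlier results fixed."""
    fixed = {}
    for name in order:
        best = max(range(low, high + 1),
                   key=lambda v: score({**fixed, name: v}))
        fixed[name] = best
    return fixed


optimized = sequential_optimize(toy_speed, ORDER)
print(optimized)  # {'Engine': 120, 'Differential': 60, 'RearWing': 300}
```

The best Differential value depends on the Engine value chosen one step earlier, which is why the order of the parameters matters in this approach.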

In [None]:
import os
import pandas as pd
import lightgbm as lgb
import optuna

# Ordered list of car parameters to optimize
params_to_optimize = [
    "Engine", "Differential", "RearWing",
    "FrontWing", "Suspension", "BrakeBalance"
]

# Feature sets used by each model for prediction
feature_sets = {
    "Engine": ['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure', 'Cornering'],
    "Differential": ['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density'],
    "RearWing": ['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness'],
    "FrontWing": ['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)'],
    "Suspension": ['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width'],
    "BrakeBalance": ['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Width', 'Roughness', 'Temperature']
}

# Load static base data (track & weather)
base_data = pd.read_csv("track_weather_japan.csv")

# Dictionary to store optimized values
optimized_params = {}

# Sequential optimization loop
for car_part in params_to_optimize:
    print(f"\n🔧 Optimizing {car_part}...")

    # Load the pretrained LightGBM model from folder
    folder_name = 'carpart_lgbm_model_R4'
    model_file_name = f"{car_part}_lgbm_model_R4_updated.txt"
    model_file_path = os.path.join(folder_name, model_file_name)
    model = lgb.Booster(model_file=model_file_path)
    features = feature_sets[car_part]

    def objective(trial):
        # Clone base data for this trial
        input_data = base_data.copy()

        # Add fixed parameters (already optimized)
        for param, value in optimized_params.items():
            input_data[param] = value

        # Suggest value only for the current parameter
        trial_value = trial.suggest_int(car_part, 1, 500)
        input_data[car_part] = trial_value

        # Fill in missing feature columns with default (0)
        for feature in features:
            if feature not in input_data.columns:
                input_data[feature] = 0

        # Predict average speed
        X = input_data[features]
        avg_speed = model.predict(X)[0]
        return avg_speed

    # Run Optuna optimization for this parameter
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=100, show_progress_bar=True)

    # Save best value
    best_value = study.best_params[car_part]
    optimized_params[car_part] = best_value
    print(f"✅ Best {car_part}: {best_value}")

# Final results
print("\n🎯 Final Optimized Parameters:")
for k, v in optimized_params.items():
    print(f"{k}: {v}")


In [None]:
import pandas as pd
import lightgbm as lgb
import optuna

# Ordered list of parameters to optimize
params_to_optimize = ["Engine", "Differential", "RearWing", "FrontWing", "Suspension", "BrakeBalance"]

# Corresponding feature sets for each model
feature_sets = {
    "Engine": ['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure', 'Cornering'],
    "Differential": ['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density'],
    "RearWing": ['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness'],
    "FrontWing": ['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)'],
    "Suspension": ['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width'],
    "BrakeBalance": ['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Width', 'Roughness', 'Temperature']
}

# Load static track & weather data
base_data = pd.read_csv("track_weather_japan.csv")

# This will hold the best parameter values as we optimize them
optimized_params = {}

# Begin sequential optimization
for car_part in params_to_optimize:
    print(f"\n🔧 Optimizing {car_part}...")

    model_path = f"{car_part}_lgbm_model_R4.txt"
    model = lgb.Booster(model_file=model_path)
    features = feature_sets[car_part]

    def objective(trial):
        # Suggest a value only for the parameter currently being optimized
        current_value = trial.suggest_int(car_part, 1, 500)

        # Create a single row input with all needed features
        input_data = base_data.copy()

        # Add previously optimized params
        for param, value in optimized_params.items():
            input_data[param] = value

        # Add current trial value
        input_data[car_part] = current_value

        # If any missing feature, fill with 0 or a safe default
        for col in features:
            if col not in input_data.columns:
                input_data[col] = 0

        # Ensure correct order of features
        X = input_data[features]

        # Predict Avg Speed
        avg_speed = model.predict(X)[0]
        return avg_speed  # Optuna maximizes this

    # Run Optuna
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=250, show_progress_bar=True)

    best_value = study.best_params[car_part]
    optimized_params[car_part] = best_value
    print(f"✅ Best {car_part}: {best_value}")

# Final results
print("\n🎯 Final Optimized Parameters:")
for k, v in optimized_params.items():
    print(f"{k}: {v}")


## Tuning weights

The following code first builds and saves a LightGBM regression model for predicting average speed based on simulator driving data.
In the second part, it loads this trained model and applies it to the practice_data, using cross-validation and Optuna hyperparameter optimization to find the optimal sample weight scaling factor that minimizes prediction error on the new data.
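
One point worth keeping in mind when reading the results: the same weight is applied to every practice row, so it does not change the importance of the rows relative to each other. In a simplified view of gradient boosting with squared-error loss, a uniform weight instead reduces the relative influence of the L2 penalty on each leaf. The sketch below uses the textbook second-order leaf-value formula (sum of gradients divided by sum of Hessians plus lambda) as an assumption. It ignores L1 regularization, the learning rate, and LightGBM's other constraints, and `leaf_value` is a hypothetical helper.

```python
def leaf_value(residuals, weight, lambda_l2):
    """Simplified optimal leaf output for squared-error loss:
    sum(w * r) / (sum(w) + lambda_l2), with one uniform weight w per row."""
    grad_sum = weight * sum(residuals)
    hess_sum = weight * len(residuals)
    return grad_sum / (hess_sum + lambda_l2)


residuals = [2.0, 1.0, 3.0]  # toy residuals (target minus current prediction)
unregularized = sum(residuals) / len(residuals)  # plain mean of the residuals

for w in (1, 5, 22, 100):
    print(w, round(leaf_value(residuals, w, lambda_l2=1.5), 4))
```

With these toy numbers, the leaf output moves toward the plain mean of the residuals (2.0) as the weight grows, and with `lambda_l2=0` the weight cancels out entirely.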

In [None]:
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time"])
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 242,
    'max_depth': 9,
    'learning_rate': 0.0645474493988489,
    'min_data_in_leaf': 96,
    'feature_fraction': 0.9427185636727835,
    'bagging_fraction': 0.785711983453578,
    'bagging_freq': 2,
    'lambda_l1': 0.6006835353397709,
    'lambda_l2': 1.4905819957883624,
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("optm_weights_lgbm_model_R4.txt")

print("Model training complete and saved to 'optm_weights_lgbm_model_R4.txt'")

In [None]:
import pandas as pd
import lightgbm as lgb
import optuna
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import numpy as np

# Load the original model
model = lgb.Booster(model_file="optm_weights_lgbm_model_R4.txt")

# Load the new data
new_df = pd.read_csv("practice_dataV4_3.csv")

# Compute Avg Speed for new data
new_df["Avg Speed"] = (new_df["Lap Distance"] / new_df["Lap Time"]) * 3600

# Prepare features and target
X_new = new_df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time", "Round", "Track", "Qualifying",
                             "Stint", "Lap", "Fuel", "Tyre Remaining", "Tyre Choice"])
y_new = new_df["Avg Speed"]

def objective(trial):
    # Suggest a scaling factor for sample weights
    weight_scaling_factor = trial.suggest_int('weight_scaling_factor', 1, 100, log=True)
    sample_weight = [weight_scaling_factor] * len(y_new)

    # Use k-fold cross-validation
    kf = KFold(n_splits=3, shuffle=True, random_state=42)
    rmse_list = []

    # Train and evaluate using cross-validation
    for train_index, valid_index in kf.split(X_new):
        X_train, X_valid = X_new.iloc[train_index], X_new.iloc[valid_index]
        y_train, y_valid = y_new.iloc[train_index], y_new.iloc[valid_index]
        train_weight = np.array(sample_weight)[train_index]

        train_data = lgb.Dataset(X_train, label=y_train, weight=train_weight)

        # Continue training from the original model on this fold
        model_tmp = lgb.train(
            params={},  # Empty dict: LightGBM defaults are used for the added trees
            train_set=train_data,
            init_model=model,
            num_boost_round=1000,  # You can increase if needed
        )

        # Predict and calculate RMSE for this fold
        y_pred = model_tmp.predict(X_valid, num_iteration=model_tmp.best_iteration)
        #rmse = mean_squared_error(y_valid, y_pred, squared=False)
        rmse = np.sqrt(((y_valid - y_pred)**2).mean())
        rmse_list.append(rmse)

    # Return the average RMSE over all folds
    mean_rmse = np.mean(rmse_list)
    return mean_rmse

# Create Optuna study to minimize RMSE
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)  # You can adjust the number of trials

best_weight_factor = study.best_params['weight_scaling_factor']
print(f"Best weight scaling factor: {best_weight_factor}")


## Lasso Regression

In [None]:
import pandas as pd
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, make_scorer
import optuna

# Load data
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600
X = df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time"])
y = df["Avg Speed"]

# Define the Optuna objective
def objective(trial):
    alpha = trial.suggest_float("alpha", 1e-4, 10.0, log=True)

    # Create pipeline
    model = Pipeline([
        ("scaler", StandardScaler()),
        ("lasso", Lasso(alpha=alpha, max_iter=10000))
    ])

    # K-Fold CV
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    scores = cross_val_score(model, X, y, cv=kf, scoring="neg_root_mean_squared_error")

    return -np.mean(scores)

# Run Optuna study
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=150)

# Print best params and score
print("Best RMSE:", study.best_value)
print("Best parameters:", study.best_params)

# Train final model on full data
best_alpha = study.best_params["alpha"]
final_model = Pipeline([
    ("scaler", StandardScaler()),
    ("lasso", Lasso(alpha=best_alpha, max_iter=10000))
])
final_model.fit(X, y)


In [None]:
import pandas as pd
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import make_scorer, r2_score
import optuna

# Load data
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600


X = df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time"])
y = df["Avg Speed"]

# Define Optuna objective function
def objective(trial):
    alpha = trial.suggest_float("alpha", 1e-4, 10.0, log=True)

    pipeline = Pipeline([
        ("scaler", StandardScaler()),
        ("lasso", Lasso(alpha=alpha, max_iter=10000))
    ])

    kf = KFold(n_splits=5, shuffle=True, random_state=42)

    # Negative RMSE
    rmse_scores = cross_val_score(pipeline, X, y, cv=kf, scoring="neg_root_mean_squared_error")
    mean_rmse = -np.mean(rmse_scores)

    # R^2 Score (logged for info)
    r2_scores = cross_val_score(pipeline, X, y, cv=kf, scoring="r2")
    mean_r2 = np.mean(r2_scores)

    # Log R²
    trial.set_user_attr("mean_r2", mean_r2)

    return mean_rmse

# Run Optuna tuning
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)

# Best result
print("Best RMSE:", study.best_value)
print("Best alpha:", study.best_params["alpha"])
print("R² for best trial:", study.best_trial.user_attrs["mean_r2"])


In [None]:
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
import joblib

# Load data
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600
X = df[["Cornering", "Inclines", "Camber", "Grip", "Wind (Avg. Speed)",
          "Temperature", "Humidity", "Air Density", "Air Pressure", "Wind (Gusts)",
          "Altitude", "Roughness", "Width", "Rear Wing", "Engine", "Front Wing",
          "Brake Balance", "Differential", "Suspension"]]
# Alternative: X = df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time"])
y = df["Avg Speed"]

# Manually set best alpha
best_alpha = 0.009431544737855992

# Build pipeline
model_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("lasso", Lasso(alpha=best_alpha, max_iter=10000))
])

# Train model
model_pipeline.fit(X, y)

# Save model to file
joblib.dump(model_pipeline, "lasso_model.pkl")

print("Model saved as 'lasso_model.pkl'")


In [None]:
import pandas as pd
import numpy as np
import optuna
import joblib

feature_order = [
    "Cornering", "Inclines", "Camber", "Grip", "Wind (Avg. Speed)",
    "Temperature", "Humidity", "Air Density", "Air Pressure", "Wind (Gusts)",
    "Altitude", "Roughness", "Width", "Rear Wing", "Engine", "Front Wing",
    "Brake Balance", "Differential", "Suspension"]


# Load the saved model
model = joblib.load("lasso_model.pkl")

# Load constant track and weather data
# NOTE: `track_weather` must already be defined by an earlier cell
# (e.g. via pd.read_csv("track_weather_japan.csv")).
track_weather_data = track_weather.iloc[0]  # Use first row (assumed to be constant)

# Define Optuna optimization function
def objective(trial):
    # Car setup parameters (suggested by Optuna)
    params = {
        "Rear Wing": trial.suggest_int("Rear Wing", 1, 500),
        "Engine": trial.suggest_int("Engine", 1, 500),
        "Front Wing": trial.suggest_int("Front Wing", 1, 500),
        "Brake Balance": trial.suggest_int("Brake Balance", 1, 500),
        "Differential": trial.suggest_int("Differential", 1, 500),
        "Suspension": trial.suggest_int("Suspension", 1, 500),

    }
    # Merge the suggested car parameters with the constant track/weather data
    input_dict = {**track_weather_data.to_dict(), **params}

    # Build a single-row DataFrame in the exact column order used for training
    input_data = pd.DataFrame([[input_dict[col] for col in feature_order]], columns=feature_order)

    # Predict Avg Speed (the pipeline applies the fitted StandardScaler before the Lasso step)
    predicted_speed = model.predict(input_data)[0]

    # Since Optuna minimizes, return negative speed to maximize it
    return -predicted_speed

# Run the Optuna study
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=250)

# Show best result
print("Best Avg Speed:", -study.best_value)
print("Best Parameters:", study.best_params)
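Because the saved pipeline is a StandardScaler followed by a Lasso, the predicted speed is a linear function of every input. A minimal sketch under that assumption: the maximum over a box of bounds lies at a corner, so each car parameter ends up at its lower or upper limit depending on the sign of its coefficient. The coefficient values below are made up for illustration; in the notebook they could be read from `model.named_steps["lasso"].coef_`. The helper name `corner_optimum` is hypothetical.

```python
import numpy as np

def corner_optimum(coefs, low=1, high=500):
    """Return, per parameter, the bound that maximizes a linear prediction."""
    coefs = np.asarray(coefs, dtype=float)
    # Positive coefficient: the upper bound is best; otherwise use the lower bound.
    return np.where(coefs > 0, high, low)

# Hypothetical coefficients for three car parameters (illustration only)
example_coefs = [0.8, -0.3, 0.0]
print(corner_optimum(example_coefs))
```

Parameters with a coefficient of exactly zero have no influence on the prediction, so any value within the bounds is equally good for them; the sketch simply returns the lower bound in that case.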


## Direct Optimization with Selenium

This code does not run in Google Colab; it must be run in a local Python installation.

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time
import optuna

USERNAME = 'xxx@studium.uni-hamburg.de'
PASSWORD = 'xxx'
STINT_LENGTH = 5
FUEL_LOAD = 16
COUNTER = 11 #set to last xpath tr value +1 or last stint +2
N_TRAILS = 1

# Function to set up the browser
def setup_browser():

    chrome_options = Options()

    # Open the browser
    driver = webdriver.Chrome(options=chrome_options)

    return driver


# Function to login and navigate to the page with the form
def login_and_navigate(driver, username, password):
    # Open the login page
    driver.get("https://team-analytics.com/f1/")

    # Fill in the login credentials (form fields are located by their `name` attribute)
    driver.find_element(By.NAME, 'email').send_keys(username)
    driver.find_element(By.NAME, 'password').send_keys(password)

    # Click the login button
    driver.find_element(By.NAME, 'login_user_btn').click()

    # Wait for the login to complete (adjust sleep time as necessary)
    time.sleep(3)

    # Click the button that leads to the practice round page
    next_page_button = driver.find_element(By.NAME, 'practice_round')
    next_page_button.click()

    # Wait for the page to load (adjust as necessary)
    time.sleep(3)

def parse_time_string(time_str):
    """Parses a time string in the format 'MM:SS:MS' into total seconds."""
    minutes, seconds, milliseconds = map(int, time_str.split(':'))
    total_seconds = minutes * 60 + seconds + milliseconds / 1000.0
    return total_seconds

# Function to fill in the form and get the result (reuse the open driver session)
def get_avg_time(driver, params):


    # Set a form field via JavaScript and fire an 'input' event so the page registers the change
    set_field_js = """
    const input = document.querySelector('[name="' + arguments[0] + '"]');
    input.value = arguments[1];
    input.dispatchEvent(new Event('input', { bubbles: true }));
    """

    # Order of params: Rear Wing, Engine, Front Wing, Brake Balance, Differential, Suspension
    field_names = ["rearwing", "engine", "frontwing", "brake", "differential", "suspension"]
    for field_name, value in zip(field_names, params):
        driver.execute_script(set_field_js, field_name, str(value))

    # Stint length and fuel load are taken from the constants at the top of this cell
    driver.execute_script(set_field_js, "stint_length", str(STINT_LENGTH))
    driver.execute_script(set_field_js, "fuel_load", str(FUEL_LOAD))



    # Submit the form
    driver.find_element(By.NAME, 'submit_practice_stint').click()

    # Wait for the result to load
    time.sleep(3)

    if not hasattr(get_avg_time, "counter"):
        get_avg_time.counter = COUNTER #set to last xpath tr value +1 / last stint +2

    # Extract the avg. time result
    xpath = f'//*[@id="submit_practice"]/table[2]/tbody/tr[{get_avg_time.counter}]/td[12]'
    time_str_element = driver.find_element(By.XPATH, xpath)
    time_str = time_str_element.text
    avg_time = parse_time_string(time_str)

    # Increment the counter
    get_avg_time.counter += 1

    return avg_time


# Function to close the browser
def close_browser(driver):
    driver.quit()

# Define the objective function for Optuna
def objective(trial, driver):
    # Sample values for the 6 parameters
    param1 = trial.suggest_int('param1', low=350, high=500) #RearWing
    param2 = trial.suggest_int('param2', low=20, high=150)    #Engine
    param3 = trial.suggest_int('param3', low=300, high=450)  #FrontWing
    param4 = trial.suggest_int('param4', low=100, high=250)   #Brake
    param5 = trial.suggest_int('param5', low=35, high=250)  #Differential
    param6 = trial.suggest_int('param6', low=50, high=200)  #Suspension

    # Bundle the parameters into a list
    params = [param1, param2, param3, param4, param5, param6]

    # Get the avg time from the website (reuse the same driver)
    avg_time = get_avg_time(driver, params)

    return avg_time

# Set up the browser and login
driver = setup_browser()
login_and_navigate(driver, USERNAME, PASSWORD)

# Create an Optuna study to optimize the objective function
storage = "sqlite:///DirectOptStudyOneR4.db"
study = optuna.create_study(direction='minimize', study_name="DirectOptStudyOneR5", storage=storage, load_if_exists=True)
study.optimize(lambda trial: objective(trial, driver), n_trials=N_TRAILS)

# Print the best parameters and corresponding result
print("Best parameters found:", study.best_params)
print("Best avg. time:", study.best_value)

# Close the browser after optimization
close_browser(driver)
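The direct optimization minimizes the average lap time read from the results table, whereas the models in this notebook predict the average speed in km/h. A minimal sketch of the conversion between the two, assuming the time format 'MM:SS:MS' used above and a made-up lap distance of 5.0 km (the function names are hypothetical):

```python
def parse_lap_time(time_str):
    """Convert 'MM:SS:MS' (minutes, seconds, milliseconds) to seconds."""
    minutes, seconds, milliseconds = (int(part) for part in time_str.split(":"))
    return minutes * 60 + seconds + milliseconds / 1000.0

def avg_speed_kmh(lap_distance_km, lap_time_s):
    """Average speed in km/h: lap distance divided by lap time, times 3600."""
    return lap_distance_km / lap_time_s * 3600

# Illustration only: a lap time of 1 min 23.456 s on a hypothetical 5.0 km lap
lap_time = parse_lap_time("01:23:456")
print(lap_time, round(avg_speed_kmh(5.0, lap_time), 2))
```

Since the lap distance is constant for a given track, minimizing the lap time and maximizing the average speed lead to the same ranking of setups.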




# **Analytics for Race 5**

The goals of this week's Analytics were twofold.

1. We wanted to look further into weighting our data: not only weighting the whole set of Practice_Data, but also weighting each line of Simulator_Data based on its distance to the race's track/weather conditions.
2. We wanted to integrate the "Direct Optimization with Selenium" into our optimization workflow and use it to vary the parameters within a defined space to create more data.

**Weighing Simulator and Practice Data**

This week, we focused on further refining the weighting of our Simulator_Data and Practice_Data. Our goal is to assign weights to each line of Simulator_Data based on its proximity to the actual Track_Data, giving more importance to data that closely matches the real track conditions. For Practice_Data, we plan to use the same optimization approach as last week to determine appropriate weights.

Additionally, we want the flexibility to assign different weights to Practice_Data from previous races compared to the Practice_Data from the current race, for example, setting a weight of 2 for prior Practice_Data and 5 for Practice_Data from the current race. The code in "Weighing Each Line of Simulator_Data" handles the weighting of each Simulator_Data line and then combines it with the two Practice_Data sets. The weights assigned to the practice datasets can be manually set within the code and are optimized beforehand to find the best combination.

The weighing of the Simulator_Data lines works as follows:

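1. From the track/weather file of the upcoming race (a single row), we take all track/weather columns that also exist in Simulator_Data.
2. We compute the covariance matrix of these columns over all Simulator_Data laps and invert it.
3. For every simulator lap, we compute the Mahalanobis distance d between its track/weather values and the race conditions. Because this distance uses the covariance matrix, differently scaled and correlated features are put on a comparable footing.
4. Each simulator lap receives the weight exp(-d / τ) with bandwidth τ = 2.0. A lap whose conditions match the race exactly gets weight 1, and the weight decays towards 0 as the distance grows.
5. The laps of the two Practice_Data sets receive constant weights (in the code: 1 for the prior Practice_Data and 6 for the current Practice_Data).
6. All three data sets are concatenated, the average speed is computed, and the weights are stored in the column `sample_weight` of "all_data_with_weights.csv".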
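To illustrate the weighting of individual simulator laps, the following minimal sketch computes weights of the form exp(-d / τ), where d is the Mahalanobis distance to a target point. It uses four made-up laps with two features each and plain NumPy instead of `scipy.spatial.distance.mahalanobis`; the function name `kernel_weights` and the toy values are assumptions for illustration only.

```python
import numpy as np

def kernel_weights(sim_features, target, tau=2.0):
    """Weight each row by exp(-d / tau), d = Mahalanobis distance to target."""
    sim_features = np.asarray(sim_features, dtype=float)
    diff = sim_features - np.asarray(target, dtype=float)
    # The covariance of the simulator features defines the shape of the distance
    cov_inv = np.linalg.inv(np.cov(sim_features, rowvar=False))
    # Squared Mahalanobis distance per row: diff_i^T * cov_inv * diff_i
    d_squared = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    d = np.sqrt(np.maximum(d_squared, 0.0))  # guard against tiny negative rounding
    return np.exp(-d / tau)

# Four made-up simulator laps with two track/weather features each
toy_laps = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
weights = kernel_weights(toy_laps, target=[0.0, 0.0])
print(weights.round(3))
```

The first lap coincides with the target and therefore receives weight 1; the lap furthest away receives the smallest weight. A smaller τ makes the weights fall off faster with distance.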

Once the code for weighting the data was developed, we modified our hyperparameter optimization, model training, and sequential optimization routines to incorporate the new, weighted datasets. This ensured that all processes could utilize the adjusted data effectively. We then proceeded as described, leveraging the weighted Simulator_Data and Practice_Data to refine our models and optimization strategies, ultimately aiming for improved predictive accuracy and more realistic simulation results.

**Implementing the direct Optimization**

We aimed to combine our direct optimization approach with the sequential optimization method. To achieve this, we first optimized the car parameters using the weighted data and our sequential model. Subsequently, we configured the direct optimization to vary the parameters within an interval around the values obtained from the sequential model. This approach is similar to the parameter variation we performed with our all-in-one optimizer, as described in "Analytics for Race 2." The generated practice laps from this process could then be fed back into the sequential model for further refinement. We repeated this process for multiple iterations, and after each iteration, the parameter ranges within the direct optimization were redefined based on the results from the sequential optimizer.

We attempted this with 15 trials per iteration, using fuel=10 and stint_length=1, i.e. 15 laps per iteration. Due to some hiccups and user error, we could not carry this out exactly as planned and had to limit ourselves to three iterations.
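The redefinition of the parameter ranges after each iteration can be sketched as follows. This is a minimal illustration, assuming the global bounds 1 to 500 used for the car parameters elsewhere in this notebook; the helper `range_around`, the half-width of 75, and the example optimum are hypothetical.

```python
def range_around(center, half_width, low=1, high=500):
    """Return (lower, upper) search bounds centred on `center`, clipped to [low, high]."""
    return max(low, center - half_width), min(high, center + half_width)

# Hypothetical optimum from the sequential model (illustration only)
sequential_optimum = {"Rear Wing": 489, "Engine": 91, "Suspension": 45}

# New search interval per parameter for the next direct-optimization run
bounds = {name: range_around(value, half_width=75)
          for name, value in sequential_optimum.items()}
print(bounds)
```

The resulting tuples could then be passed as `low` and `high` to `trial.suggest_int` in the direct optimization.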




In [None]:
# Team members working on this code: Paula Kussauer, Cedric Schwandt, Hannes Kock

## Weighing Simulator_Data and Practice_Data




### Weighing Practice_Data vs Simulator_Data

The section following this one weighs each lap from the Simulator_Data and then merges the Simulator_Data and Practice_Data.

When it merges the two datasets, it also assigns weights to the Practice_Data. We use the two code cells below to determine what that weight should be.

The code is the same as last week's; it just uses different data.
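To get a feeling for what a given practice weight means, it can help to look at the share of the total sample weight that the practice laps receive. A minimal sketch, assuming 10,000 simulator laps with weight 1 each and 80 practice laps (in the actual pipeline the simulator laps carry kernel weights below 1, so the practice share would be larger); the function name is hypothetical.

```python
import numpy as np

def practice_weight_share(sim_weights, n_practice, practice_weight):
    """Fraction of the total sample weight that falls on the practice laps."""
    sim_total = float(np.sum(sim_weights))
    practice_total = n_practice * practice_weight
    return practice_total / (sim_total + practice_total)

# Assumed sizes: 10,000 simulator laps (weight 1 each) and 80 practice laps
for factor in (1, 6, 15):
    share = practice_weight_share(np.ones(10_000), 80, factor)
    print(f"factor {factor:>2}: practice share = {share:.1%}")
```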

In [None]:
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time"])
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 242,
    'max_depth': 9,
    'learning_rate': 0.0645474493988489,
    'min_data_in_leaf': 96,
    'feature_fraction': 0.9427185636727835,
    'bagging_fraction': 0.785711983453578,
    'bagging_freq': 2,
    'lambda_l1': 0.6006835353397709,
    'lambda_l2': 1.4905819957883624,
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("optm_weights_lgbm_model_R5.txt")

print("Model training complete and saved to 'optm_weights_lgbm_model_R5.txt'")


In [None]:
import pandas as pd
import lightgbm as lgb
import optuna
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import numpy as np

# Load the original model
model = lgb.Booster(model_file="optm_weights_lgbm_model_R5.txt")

# Load the new data
new_df = pd.read_csv("practice_data_newnamesV5_1.csv")

# Compute Avg Speed for new data
new_df["Avg Speed"] = (new_df["Lap Distance"] / new_df["Lap Time"]) * 3600

# Prepare features and target
X_new = new_df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time", "Round", "Track", "Qualifying",
                             "Stint", "Lap", "Fuel", "Tyre Remaining", "Tyre Choice"])
y_new = new_df["Avg Speed"]

def objective(trial):
    # Suggest a scaling factor for sample weights
    weight_scaling_factor = trial.suggest_int('weight_scaling_factor', 1, 15, log = True )
    sample_weight = [weight_scaling_factor] * len(y_new)

    # Use k-fold cross-validation
    kf = KFold(n_splits=3, shuffle=True, random_state=42)
    rmse_list = []

    # Train and evaluate using cross-validation
    for train_index, valid_index in kf.split(X_new):
        X_train, X_valid = X_new.iloc[train_index], X_new.iloc[valid_index]
        y_train, y_valid = y_new.iloc[train_index], y_new.iloc[valid_index]
        train_weight = np.array(sample_weight)[train_index]

        train_data = lgb.Dataset(X_train, label=y_train, weight=train_weight)
        valid_data = lgb.Dataset(X_valid, label=y_valid, weight=np.array(sample_weight)[valid_index], reference=train_data)

        # Continue training from the simulator-trained model on the weighted practice data
        model_tmp = lgb.train(
            params={},  # Use default params unless specified
            train_set=train_data,
            init_model=model,
            num_boost_round=1000,  # You can increase if needed
        )

        # Predict and calculate RMSE for this fold
        y_pred = model_tmp.predict(X_valid, num_iteration=model_tmp.best_iteration)
        #rmse = mean_squared_error(y_valid, y_pred, squared=False)
        rmse = np.sqrt(((y_valid - y_pred)**2).mean())
        rmse_list.append(rmse)

    # Return the average RMSE over all folds
    mean_rmse = np.mean(rmse_list)
    return mean_rmse

# Create Optuna study to minimize RMSE
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=30)  # You can adjust the number of trials

best_weight_factor = study.best_params['weight_scaling_factor']
print(f"Best weight scaling factor: {best_weight_factor}")


### Weighing Each Line of Simulator_Data

This script merges simulated and real racing data, calculates weights (higher if the simulated data resembles the real conditions), and outputs the consolidated data for further machine learning, giving real practice laps greater emphasis.

The weights of the Simulator_Data laps are kernel weights of the form exp(-d / τ), where d is the Mahalanobis distance between each simulated lap's track/weather features and the target track/weather conditions of the race (read from the track/weather file), and τ is the bandwidth.

In [None]:
# rbf_weights_auto.py | run only ONCE per race
import numpy as np
import pandas as pd
from scipy.spatial.distance import mahalanobis

# -----------------------------------------------------------
# A) Adjust file paths
# -----------------------------------------------------------
SIM_CSV         = "simulator_data_newnames.csv"      # 10,000 simulator laps
PRACTICEOLD_CSV = "practice_data_newnamesV5_1.csv"   # 80 real laps
PRACTICENEW_CSV = "practice_data_newnamesV5_3.csv"
ENV_CSV         = "track_weather_USA.csv"            # 1 row of track & weather data

# -----------------------------------------------------------
# B) Load data
# -----------------------------------------------------------
df_sim = pd.read_csv(SIM_CSV)
df_pro = pd.read_csv(PRACTICEOLD_CSV)
df_prn = pd.read_csv(PRACTICENEW_CSV)
env_df = pd.read_csv(ENV_CSV)                  # a single row

# -----------------------------------------------------------
# C) Derive meta columns automatically
# -----------------------------------------------------------
# 1. Collect all car parameters (= optimization candidates);
#    kept for reference, not used further in this cell
car_param_cols = {
    "Engine", "Differential", "RearWing", "FrontWing",
    "Suspension", "BrakeBalance"
}

# 2. Everything in the ENV file is a track/weather feature
env_cols = env_df.columns.tolist()             # e.g. Air Pressure, Grip, ...

# 3. To be safe, keep only the columns that actually
#    exist in the simulator/practice CSVs
meta_cols = [c for c in env_cols if c in df_sim.columns]

# -----------------------------------------------------------
# D) Compute kernel weights
# -----------------------------------------------------------
tau = 2.0  # bandwidth
X_sim_meta = df_sim[meta_cols].values

# Use target track/weather parameters from ENV_CSV instead of practice centroid
env_point = env_df[meta_cols].values.flatten()  # 1D array of target conditions

# Covariance matrix (use simulator data for shape/spread)
Sigma_inv = np.linalg.inv(np.cov(X_sim_meta, rowvar=False))

k_sim = np.exp([
    -mahalanobis(x, env_point, Sigma_inv) / tau
    for x in X_sim_meta
])  # k_i = exp(-d_i/τ)

w_pro = np.full(len(df_pro), 1)  # constant weight for the prior Practice_Data
w_prn = np.full(len(df_prn), 6)  # constant weight for the current Practice_Data
weights = np.concatenate([k_sim, w_pro, w_prn])

# -----------------------------------------------------------
# E) Combined DataFrame for all further models
# -----------------------------------------------------------
df_all = pd.concat([df_sim, df_pro, df_prn], ignore_index=True)

df_all["Avg Speed"] = (df_all["Lap Distance"] / df_all["Lap Time"]) * 3600


df_all["sample_weight"] = weights     # Add weights as a new column

# List of columns to drop
drop_cols = [
    "Lap Distance", "Lap Time", "Round", "Track", "Qualifying", "Stint", "Lap",
    "Fuel", "Tyre Remaining", "Tyre Choice"
]

# Drop the columns (ignore errors if column missing just in case)
df_all_dropped = df_all.drop(columns=drop_cols, errors="ignore")

OUTPUT_CSV = "all_data_with_weights.csv"
df_all_dropped.to_csv(OUTPUT_CSV, index=False)
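A quick sanity check on the resulting weights is the effective sample size, which indicates how many equally weighted laps the weighted data set roughly corresponds to. The sketch below uses Kish's formula, (Σw)² / Σw²; this check is not part of the original workflow, and the function name is an assumption. In the notebook it could be applied to `df_all["sample_weight"]`.

```python
import numpy as np

def effective_sample_size(weights):
    """Kish's effective sample size: (sum of w)^2 / sum of w^2."""
    weights = np.asarray(weights, dtype=float)
    return weights.sum() ** 2 / np.sum(weights ** 2)

print(effective_sample_size([1, 1, 1, 1]))  # equal weights: all four laps count
print(effective_sample_size([1, 0, 0, 0]))  # one dominant lap: effectively one lap
```

A very small effective sample size would suggest that the bandwidth τ is too narrow and that only a handful of simulator laps influence the model.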


### Trying Weighed Data on AllInOne Optimization code

In [None]:
import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("all_data_with_weights.csv")  # <--- new combined data with weights included!

# Example: Regression task
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }

    # K-Fold Cross-Validation
    kf = KFold(n_splits=3, shuffle=True, random_state=42)

    # Drop unwanted columns
    X = df.drop(columns=['Avg Speed', 'sample_weight'])  # ensure these are not in X
    y = df["Avg Speed"]
    weights = df["sample_weight"].values
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        train_weights, val_weights = weights[train_idx], weights[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train, weight=train_weights)
        dvalid = lgb.Dataset(X_val, label=y_val, weight=val_weights)

        model = lgb.train(param, dtrain, num_boost_round=100,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        rmse = np.sqrt(((y_val - preds) ** 2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=500)

print("Best trial:")
print(study.best_trial.params)
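The cross-validation above scores each fold with an unweighted RMSE, so every validation lap counts equally regardless of its sample weight. If the score should reflect the sample weights as well, a weighted RMSE could be used instead. The following is a minimal sketch with made-up numbers; the function name `weighted_rmse` is hypothetical and not part of the original code.

```python
import numpy as np

def weighted_rmse(y_true, y_pred, sample_weight):
    """Root of the weighted mean of the squared errors."""
    squared_error = (np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2
    return float(np.sqrt(np.average(squared_error, weights=sample_weight)))

# Made-up speeds: only the third lap is predicted incorrectly (by 10 km/h)
y_true = [200.0, 210.0, 220.0]
y_pred = [200.0, 210.0, 230.0]
print(weighted_rmse(y_true, y_pred, [1, 1, 1]))    # every lap counts equally
print(weighted_rmse(y_true, y_pred, [1, 1, 0.1]))  # the badly predicted lap is down-weighted
```

Inside the fold loop this would correspond to `weighted_rmse(y_val, preds, val_weights)`.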

In [None]:
{'num_leaves': 78, 'max_depth': 15, 'learning_rate': 0.11424671210900976, 'min_data_in_leaf': 48, 'feature_fraction': 0.7696371152792187, 'bagging_fraction': 0.9622430850853232, 'bagging_freq': 10, 'lambda_l1': 0.4732893195494387, 'lambda_l2': 2.047352465618619}

In [None]:
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the weighed data
df = pd.read_csv("all_data_with_weights.csv")

# Features and target
X = df.drop(columns=["Avg Speed", "sample_weight"])
y = df["Avg Speed"]
weights = df["sample_weight"].values

# Define LightGBM dataset with weights
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 78,
    'max_depth': 15,
    'learning_rate': 0.11424671210900976,
    'min_data_in_leaf': 48,
    'feature_fraction': 0.7696371152792187,
    'bagging_fraction': 0.9622430850853232,
    'bagging_freq': 10,
    'lambda_l1': 0.4732893195494387,
    'lambda_l2': 2.047352465618619}

# Train on full dataset using sample weights
model = lgb.train(params, lgb_data, num_boost_round=100)

# Save the model to a file
model.save_model("best_lgbm_model_R5.txt")

print("Model training complete and saved to 'best_lgbm_model_R5.txt'")

In [None]:
import pandas as pd
import lightgbm as lgb
import optuna


# Load trained model
#model = lgb.Booster(model_file="best_lgbm_model_updated.txt")
model = lgb.Booster(model_file="best_lgbm_model_R5.txt")

# Load track/weather data (single row)
track_weather = pd.read_csv("track_weather_USA.csv")

# Define the objective function
def objective(trial):
    # Suggest car parameters
    params = {
        "Engine": trial.suggest_int("Engine", 1, 500),
        "Rear Wing": trial.suggest_int("Rear Wing", 1, 500),
        "Front Wing": trial.suggest_int("Front Wing", 1, 500),
        "Brake Balance": trial.suggest_int("Brake Balance", 1, 500),
        "Suspension": trial.suggest_int("Suspension", 1, 500),
        "Differential": trial.suggest_int("Differential", 1, 500),
    }
#Engine: 91
#Differential: 101
#RearWing: 489
#FrontWing: 274
#Suspension: 45
#BrakeBalance: 101

    # Combine with static track/weather parameters
    input_data = pd.concat([track_weather, pd.DataFrame([params])], axis=1)

    # Predict average speed
    predicted_avg_speed = model.predict(input_data)[0]

    # We want to maximize speed
    return -predicted_avg_speed

# Create study
study = optuna.create_study(direction="minimize")

# Optimize
study.optimize(objective, n_trials=1000)

# Output best results
best_trial = study.best_trial
print("\nBest Parameters:")
for key, value in best_trial.params.items():
    print(f"{key}: {value}")
print(f"Predicted Avg Speed: {-best_trial.value:.2f}")
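When the input row is assembled from two sources, it is worth making sure that the features arrive in the same order as during training. A minimal sketch, assuming the model expects its inputs in training order (for a LightGBM `Booster`, the training feature names can be obtained with `model.feature_name()`); the helper `build_model_row` and the shortened feature list are hypothetical.

```python
def build_model_row(track_weather, car_params, feature_order):
    """Merge both dicts and return the values in the order the model expects."""
    merged = {**track_weather, **car_params}
    missing = [name for name in feature_order if name not in merged]
    if missing:
        raise KeyError(f"Missing features: {missing}")
    return [merged[name] for name in feature_order]

# Hypothetical, shortened feature list and values (illustration only)
feature_order = ["Grip", "Temperature", "Engine", "RearWing"]
row = build_model_row(
    track_weather={"Temperature": 24, "Grip": 70},
    car_params={"RearWing": 489, "Engine": 91},
    feature_order=feature_order,
)
print(row)
```

Raising an error for missing names also catches spelling differences such as "Rear Wing" versus "RearWing" between data files.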

## Weighed Data and Sequential Optimization

### Hyperparameter Optimization
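The cells below repeat the same tuning routine once per car parameter; only the feature subset changes. A minimal sketch of how these subsets could be kept in one place is shown here. The feature lists are copied from the Engine, Differential, and Rear Wing cells; the dictionary and the helper function are hypothetical and not part of the original code.

```python
# Feature subset used for each sequential model (copied from the cells below)
FEATURES_BY_PARAMETER = {
    "Engine": ["Engine", "Grip", "Humidity", "Air Density", "Altitude",
               "Temperature", "Inclines", "Air Pressure", "Cornering"],
    "Differential": ["Differential", "Engine", "Cornering", "Width",
                     "Inclines", "Grip", "Temperature", "Air Density"],
    "RearWing": ["RearWing", "Differential", "Engine", "Air Density", "Cornering",
                 "Air Pressure", "Inclines", "Wind (Avg. Speed)", "Humidity", "Roughness"],
}

def feature_columns(car_parameter):
    """Return a copy of the feature list for the given car parameter."""
    return list(FEATURES_BY_PARAMETER[car_parameter])

for name in FEATURES_BY_PARAMETER:
    print(name, "->", len(feature_columns(name)), "features")
```

With such a mapping, `X = df[feature_columns("Engine")]` would replace the hard-coded column list in each tuning cell.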

In [None]:
#Engine
import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("all_data_with_weights.csv")

# Example: Regression task
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }

    # K-Fold Cross-Validation
    kf = KFold(n_splits=3, shuffle=True, random_state=42)
    X = df[['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure', 'Cornering']]
    y = df["Avg Speed"]
    weights = df["sample_weight"].values
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        w_train, w_val = weights[train_idx], weights[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train, weight=w_train)
        dvalid = lgb.Dataset(X_val, label=y_val, weight=w_val)

        model = lgb.train(param, dtrain, num_boost_round=100,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=500)

print("Best trial:")
print(study.best_trial.params)

In [None]:
# Best trial Engine:
{'num_leaves': 228, 'max_depth': 9, 'learning_rate': 0.019795034976830574, 'min_data_in_leaf': 45, 'feature_fraction': 0.6769051793180653, 'bagging_fraction': 0.5679615916585826, 'bagging_freq': 5, 'lambda_l1': 0.2140082876917294, 'lambda_l2': 0.778813758195442}

In [None]:
#Hyperparameter Optimization Differential
#https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html

import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("all_data_with_weights.csv")

# Example: Regression task
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    kf = KFold(n_splits=3, shuffle=True, random_state=42)
    X = df[['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density']]
    y = df["Avg Speed"]
    weights = df["sample_weight"].values
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        w_train, w_val = weights[train_idx], weights[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train, weight=w_train)
        dvalid = lgb.Dataset(X_val, label=y_val, weight=w_val)

        model = lgb.train(param, dtrain, num_boost_round=100,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=500)

print("Best trial:")
print(study.best_trial.params)

In [None]:
# Best trial Differential:
{'num_leaves': 95, 'max_depth': 14, 'learning_rate': 0.020430708599380454, 'min_data_in_leaf': 11, 'feature_fraction': 0.8289782961700536, 'bagging_fraction': 0.6054739109884507, 'bagging_freq': 6, 'lambda_l1': 0.164978854742107, 'lambda_l2': 0.09387958914404965}

In [None]:
#Hyperparameter Optimization Rear Wing
#https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html

import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("all_data_with_weights.csv")

# Example: Regression task
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    kf = KFold(n_splits=3, shuffle=True, random_state=42)
    X = df[['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure' , 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness']]
    y = df["Avg Speed"]
    weights = df["sample_weight"].values
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        w_train, w_val = weights[train_idx], weights[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train, weight=w_train)
        dvalid = lgb.Dataset(X_val, label=y_val, weight=w_val)

        model = lgb.train(param, dtrain, num_boost_round=100,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=500)

print("Best trial:")
print(study.best_trial.params)
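
The fold score in the cell above is an unweighted RMSE, although validation weights (`w_val`) are available in each fold. If the score should reflect the same sample weights that are used for training, a weighted RMSE is one option. The following is a minimal sketch in plain Python; `weighted_rmse` is a hypothetical helper and not part of LightGBM or scikit-learn.

```python
import math

def weighted_rmse(y_true, y_pred, weights):
    """Root of the weight-averaged squared error (hypothetical helper)."""
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("all inputs must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sq_err = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return math.sqrt(weighted_sq_err / total_weight)

# With equal weights this reduces to the ordinary RMSE
print(weighted_rmse([200.0, 210.0], [202.0, 210.0], [1.0, 1.0]))
```

Inside the cross-validation loop, the call would take `y_val`, `preds` and `w_val` in place of the toy lists.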

In [None]:
Best trial Rear Wing:
{'num_leaves': 263, 'max_depth': 8, 'learning_rate': 0.033545204540268374, 'min_data_in_leaf': 10, 'feature_fraction': 0.67904602486583, 'bagging_fraction': 0.9010766438515055, 'bagging_freq': 3, 'lambda_l1': 0.00361547478377304, 'lambda_l2': 0.00947809994264175}

In [None]:
#Hyperparameter Optimization Front Wing
#https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html

import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("all_data_with_weights.csv")

# Example: Regression task
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    kf = KFold(n_splits=3, shuffle=True, random_state=42)
    X = df[['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)']]
    y = df["Avg Speed"]
    weights = df["sample_weight"].values
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        w_train, w_val = weights[train_idx], weights[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train, weight=w_train)
        dvalid = lgb.Dataset(X_val, label=y_val, weight=w_val)

        model = lgb.train(param, dtrain, num_boost_round=100,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=500)

print("Best trial:")
print(study.best_trial.params)

In [None]:
Best trial Front Wing:
{'num_leaves': 192, 'max_depth': 12, 'learning_rate': 0.062393011943707284, 'min_data_in_leaf': 10, 'feature_fraction': 0.8418876970992247, 'bagging_fraction': 0.8723398679038107, 'bagging_freq': 10, 'lambda_l1': 0.3441273621643392, 'lambda_l2': 0.5476995416360311}

In [None]:
#Hyperparameter Optimization Suspension
#https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html

import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("all_data_with_weights.csv")

# Example: Regression task
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    kf = KFold(n_splits=3, shuffle=True, random_state=42)
    X = df[['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width']]
    y = df["Avg Speed"]
    weights = df["sample_weight"].values
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        w_train, w_val = weights[train_idx], weights[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train, weight=w_train)
        dvalid = lgb.Dataset(X_val, label=y_val, weight=w_val)

        model = lgb.train(param, dtrain, num_boost_round=100,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=500)

print("Best trial:")
print(study.best_trial.params)

In [None]:
#Hyperparameter Optimization Brake Balance
#https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html

import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("all_data_with_weights.csv")

# Example: Regression task
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    kf = KFold(n_splits=3, shuffle=True, random_state=42)
    X = df[['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine','Cornering', 'Width', 'Roughness', 'Temperature']]
    y = df["Avg Speed"]
    weights = df["sample_weight"].values
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
        w_train, w_val = weights[train_idx], weights[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train, weight=w_train)
        dvalid = lgb.Dataset(X_val, label=y_val, weight=w_val)

        model = lgb.train(param, dtrain, num_boost_round=100,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=500)

print("Best trial:")
print(study.best_trial.params)

In [None]:
Best trial Engine:
{'num_leaves': 228, 'max_depth': 9, 'learning_rate': 0.019795034976830574, 'min_data_in_leaf': 45, 'feature_fraction': 0.6769051793180653, 'bagging_fraction': 0.5679615916585826, 'bagging_freq': 5, 'lambda_l1': 0.2140082876917294, 'lambda_l2': 0.778813758195442}

In [None]:
Best trial Differential:
{'num_leaves': 95, 'max_depth': 14, 'learning_rate': 0.020430708599380454, 'min_data_in_leaf': 11, 'feature_fraction': 0.8289782961700536, 'bagging_fraction': 0.6054739109884507, 'bagging_freq': 6, 'lambda_l1': 0.164978854742107, 'lambda_l2': 0.09387958914404965}

In [None]:
Best trial Suspension:
{'num_leaves': 273, 'max_depth': 11, 'learning_rate': 0.04451045225106865, 'min_data_in_leaf': 12, 'feature_fraction': 0.7765092601043706, 'bagging_fraction': 0.6301841215843957, 'bagging_freq': 1, 'lambda_l1': 0.01670808736808642, 'lambda_l2': 0.12825953049230684}

In [None]:
Best trial:
{'num_leaves': 191, 'max_depth': 14, 'learning_rate': 0.024681809936626296, 'min_data_in_leaf': 10, 'feature_fraction': 0.8101056533251277, 'bagging_fraction': 0.5448991038428275, 'bagging_freq': 2, 'lambda_l1': 0.09124351826015425, 'lambda_l2': 0.0008864299246597318}

### Building Models for Seq. Optimization

Essentially the same as before, adjusted to use the weighted data and to save the models to a folder.

In [None]:
#Engine (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb
import os

# Load the weighted data
df = pd.read_csv("all_data_with_weights.csv")

# Features and target
X = df[['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure' , 'Cornering']]
y = df["Avg Speed"]
weights = df["sample_weight"].values

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 228,
    'max_depth': 9,
    'learning_rate': 0.019795034976830574,
    'min_data_in_leaf': 45,
    'feature_fraction': 0.6769051793180653,
    'bagging_fraction': 0.5679615916585826,
    'bagging_freq': 5,
    'lambda_l1': 0.2140082876917294,
    'lambda_l2': 0.778813758195442
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=100)

# Save the model to a file
folder = "SeqOptmModels"
os.makedirs(folder, exist_ok=True)

model_path = os.path.join(folder, "Engine_lgbm_model_R5.txt")
model.save_model(model_path)

print(f"Model training complete and saved to '{model_path}'")


In [None]:
#Differential (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb
import os

# Load the data
df = pd.read_csv("all_data_with_weights.csv")

# Features and target
X = df[['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density']]
y = df["Avg Speed"]
weights = df["sample_weight"].values

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 95,
    'max_depth': 14,
    'learning_rate': 0.020430708599380454,
    'min_data_in_leaf': 11,
    'feature_fraction': 0.8289782961700536,
    'bagging_fraction': 0.6054739109884507,
    'bagging_freq': 6,
    'lambda_l1': 0.164978854742107,
    'lambda_l2': 0.09387958914404965
    }


# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=100)

# Save the model to a file
folder = "SeqOptmModels"
os.makedirs(folder, exist_ok=True)

model_path = os.path.join(folder, "Differential_lgbm_model_R5.txt")
model.save_model(model_path)

print(f"Model training complete and saved to '{model_path}'")


In [None]:
#Rear Wing (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb
import os

# Load the data
df = pd.read_csv("all_data_with_weights.csv")


# Features and target
X = df[['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure' , 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness']]
y = df["Avg Speed"]
weights = df["sample_weight"].values

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 263,
    'max_depth': 8,
    'learning_rate': 0.033545204540268374,
    'min_data_in_leaf': 10,
    'feature_fraction': 0.67904602486583,
    'bagging_fraction': 0.9010766438515055,
    'bagging_freq': 3,
    'lambda_l1': 0.00361547478377304,
    'lambda_l2': 0.00947809994264175
    }

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=100)

# Save the model to a file
folder = "SeqOptmModels"
os.makedirs(folder, exist_ok=True)

model_path = os.path.join(folder, "RearWing_lgbm_model_R5.txt")
model.save_model(model_path)

print(f"Model training complete and saved to '{model_path}'")


In [None]:
#Front Wing (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb
import os

# Load the data
df = pd.read_csv("all_data_with_weights.csv")


# Features and target
X = df[['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)']]
y = df["Avg Speed"]
weights = df["sample_weight"].values

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 192,
    'max_depth': 12,
    'learning_rate': 0.062393011943707284,
    'min_data_in_leaf': 10,
    'feature_fraction': 0.8418876970992247,
    'bagging_fraction': 0.8723398679038107,
    'bagging_freq': 10,
    'lambda_l1': 0.3441273621643392,
    'lambda_l2': 0.5476995416360311
    }


# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=100)

# Save the model to a file
folder = "SeqOptmModels"
os.makedirs(folder, exist_ok=True)

model_path = os.path.join(folder, "FrontWing_lgbm_model_R5.txt")
model.save_model(model_path)

print(f"Model training complete and saved to '{model_path}'")


In [None]:
#Suspension (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb
import os

# Load the data
df = pd.read_csv("all_data_with_weights.csv")


# Features and target
X = df[['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width']]
y = df["Avg Speed"]
weights = df["sample_weight"].values

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 191,
    'max_depth': 14,
    'learning_rate': 0.024681809936626296,
    'min_data_in_leaf': 10,
    'feature_fraction': 0.8101056533251277,
    'bagging_fraction': 0.5448991038428275,
    'bagging_freq': 2,
    'lambda_l1': 0.09124351826015425,
    'lambda_l2': 0.0008864299246597318
    }

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=100)

# Save the model to a file
folder = "SeqOptmModels"
os.makedirs(folder, exist_ok=True)

model_path = os.path.join(folder, "Suspension_lgbm_model_R5.txt")
model.save_model(model_path)

print(f"Model training complete and saved to '{model_path}'")


In [None]:
#Brake Balance (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb
import os

# Load the data
df = pd.read_csv("all_data_with_weights.csv")


# Features and target
X = df[['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine','Cornering', 'Width', 'Roughness', 'Temperature']]
y = df["Avg Speed"]
weights = df["sample_weight"].values

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 27,
    'max_depth': 11,
    'learning_rate': 0.18696777761444966,
    'min_data_in_leaf': 73,
    'feature_fraction': 0.964098272670725,
    'bagging_fraction': 0.8852821315081366,
    'bagging_freq': 10,
    'lambda_l1': 3.486399193949678,
    'lambda_l2': 3.7053586457221366
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=100)

# Save the model to a file
folder = "SeqOptmModels"
os.makedirs(folder, exist_ok=True)

model_path = os.path.join(folder, "BrakeBalance_lgbm_model_R5.txt")
model.save_model(model_path)

print(f"Model training complete and saved to '{model_path}'")


### Seq. optimizing Car-Parameters

In [None]:
import os
import pandas as pd
import lightgbm as lgb
import optuna

# Ordered list of car parameters to optimize
params_to_optimize = [
    "Engine", "Differential", "RearWing",
    "FrontWing", "Suspension", "BrakeBalance"
]

# Feature sets used by each model for prediction
feature_sets = {
    "Engine": ['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure', 'Cornering'],
    "Differential": ['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density'],
    "RearWing": ['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness'],
    "FrontWing": ['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)'],
    "Suspension": ['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width'],
    "BrakeBalance": ['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Width', 'Roughness', 'Temperature']
}

# Load static base data (track & weather)
base_data = pd.read_csv("track_weather_USA.csv")

# Dictionary to store optimized values
optimized_params = {}

# Sequential optimization loop
for car_part in params_to_optimize:
    print(f"\n🔧 Optimizing {car_part}...")

    # Load the pretrained LightGBM model from folder
    folder_name = 'SeqOptmModels'
    model_file_name = f"{car_part}_lgbm_model_R5.txt"
    model_file_path = os.path.join(folder_name, model_file_name)
    model = lgb.Booster(model_file=model_file_path)
    features = feature_sets[car_part]

    def objective(trial):
        # Clone base data for this trial
        input_data = base_data.copy()

        # Add fixed parameters (already optimized)
        for param, value in optimized_params.items():
            input_data[param] = value

        # Suggest value only for the current parameter
        trial_value = trial.suggest_int(car_part, 1, 500)
        input_data[car_part] = trial_value

        # Fill in missing feature columns with default (0)
        for feature in features:
            if feature not in input_data.columns:
                input_data[feature] = 0

        # Predict average speed
        X = input_data[features]
        avg_speed = model.predict(X)[0]
        return avg_speed

    # Run Optuna optimization for this parameter
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=500, show_progress_bar=True)

    # Save best value
    best_value = study.best_params[car_part]
    optimized_params[car_part] = best_value
    print(f"✅ Best {car_part}: {best_value}")

# Final results
print("\n🎯 Final Optimized Parameters:")
for k, v in optimized_params.items():
    print(f"{k}: {v}")
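
Since each car parameter is a single integer between 1 and 500 and a model prediction is cheap, the search space of one sequential step could also be evaluated exhaustively instead of being sampled with Optuna. The sketch below illustrates the idea with a stand-in scoring function. `best_setting` and `toy_speed` are hypothetical names; in the notebook, the score would come from the loaded LightGBM model.

```python
def best_setting(score_fn, low=1, high=500):
    """Return the integer in [low, high] with the highest score, and that score."""
    best_value = max(range(low, high + 1), key=score_fn)
    return best_value, score_fn(best_value)

# Stand-in for model.predict: a made-up speed curve peaking at 320 (assumption)
def toy_speed(setting):
    return 200.0 - 0.001 * (setting - 320) ** 2

value, speed = best_setting(toy_speed)
print(value, speed)
```

An exhaustive pass removes the randomness of the sampler, so repeated runs give the same setting for the same model and track data.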


## Direct Optimization using Selenium

This code does not run in Google Colab; it must be run in a local Python installation.

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time
import optuna

USERNAME = 'xxx@studium.uni-hamburg.de'
PASSWORD = 'xxx'
STINT_LENGTH = 1
FUEL_LOAD = 10
COUNTER = 25 #set to last xpath tr value +1 or last stint +2
N_TRAILS = 15

# Function to set up the browser
def setup_browser():

    chrome_options = Options()

    # Open the browser
    driver = webdriver.Chrome(options=chrome_options)

    return driver


# Function to login and navigate to the page with the form
def login_and_navigate(driver, username, password):
    # Open the login page
    driver.get("https://team-analytics.com/f1/")

    # Fill in the login credentials (fields are located by their name attribute)
    driver.find_element(By.NAME, 'email').send_keys(username)
    driver.find_element(By.NAME, 'password').send_keys(password)

    # Click the login button
    driver.find_element(By.NAME, 'login_user_btn').click()

    # Wait for the login to complete (adjust sleep time as necessary)
    time.sleep(3)

    # Click the button that opens the practice round page
    next_page_button = driver.find_element(By.NAME, 'practice_round')
    next_page_button.click()

    # Wait for the page to load (adjust as necessary)
    time.sleep(3)

def parse_time_string(time_str):
    """Parses a time string in the format 'MM:SS:MS' into total seconds."""
    minutes, seconds, milliseconds = map(int, time_str.split(':'))
    total_seconds = minutes * 60 + seconds + milliseconds / 1000.0
    return total_seconds

# Function to fill in the form and get the result (reuse the open driver session)
def get_avg_time(driver, params):


    # Form field names, in the order of the values in `params`:
    # RearWing, Engine, FrontWing, Brake, Differential, Suspension
    field_names = ['rearwing', 'engine', 'frontwing', 'brake', 'differential', 'suspension']
    field_values = list(params)

    # Stint length and fuel load come from the module-level constants
    field_names += ['stint_length', 'fuel_load']
    field_values += [STINT_LENGTH, FUEL_LOAD]

    # Set each input via JavaScript and fire an 'input' event so the page registers the change
    for name, value in zip(field_names, field_values):
        driver.execute_script("""
        const input = document.querySelector('[name="' + arguments[0] + '"]');
        input.value = arguments[1];
        input.dispatchEvent(new Event('input', { bubbles: true }));
        """, name, str(value))



    # Submit the form
    driver.find_element(By.NAME, 'submit_practice_stint').click()

    # Wait for the result to load
    time.sleep(3)

    if not hasattr(get_avg_time, "counter"):
        get_avg_time.counter = COUNTER #set to last xpath tr value +1 / last stint +2

    # Extract the avg. time result
    xpath = f'//*[@id="submit_practice"]/table[2]/tbody/tr[{get_avg_time.counter}]/td[12]'
    time_str_element = driver.find_element(By.XPATH, xpath)
    time_str = time_str_element.text
    avg_time = parse_time_string(time_str)

    # Increment the counter
    get_avg_time.counter += 1

    return avg_time


# Function to close the browser
def close_browser(driver):
    driver.quit()

# Define the objective function for Optuna
def objective(trial, driver):
    # Sample values for the 6 parameters
    param1 = trial.suggest_int('param1', low=350, high=500) #RearWing
    param2 = trial.suggest_int('param2', low=20, high=150)    #Engine
    param3 = trial.suggest_int('param3', low=300, high=450)  #FrontWing
    param4 = trial.suggest_int('param4', low=100, high=250)   #Brake
    param5 = trial.suggest_int('param5', low=35, high=250)  #Differential
    param6 = trial.suggest_int('param6', low=50, high=200)  #Suspension

    # Bundle the parameters into a list
    params = [param1, param2, param3, param4, param5, param6]

    # Get the avg time from the website (reuse the same driver)
    avg_time = get_avg_time(driver, params)

    return avg_time

# Set up the browser and login
driver = setup_browser()
login_and_navigate(driver, USERNAME, PASSWORD)

# Create an Optuna study to optimize the objective function
storage = "sqlite:///DirectOptStudyOneR5.db"
study = optuna.create_study(direction='minimize', study_name="DirectOptStudyOneR5", storage=storage, load_if_exists=True)
study.optimize(lambda trial: objective(trial, driver), n_trials=N_TRAILS)

# Print the best parameters and corresponding result
print("Best parameters found:", study.best_params)
print("Best avg. time:", study.best_value)

# Close the browser after optimization
close_browser(driver)
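
The result cell may be empty or not yet updated when it is read after the fixed three-second wait, in which case `parse_time_string` fails with a rather uninformative conversion error. A slightly more defensive variant could look like the sketch below. It assumes the same 'MM:SS:MS' format stated in the docstring above; `parse_lap_time` is a hypothetical name.

```python
def parse_lap_time(time_str):
    """Convert 'MM:SS:MS' (minutes, seconds, milliseconds) to seconds."""
    parts = time_str.strip().split(':')
    # Reject anything that is not exactly three non-negative integers
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected lap time format: {time_str!r}")
    minutes, seconds, milliseconds = (int(p) for p in parts)
    return minutes * 60 + seconds + milliseconds / 1000.0

print(parse_lap_time("01:32:300"))
```

The explicit `ValueError` makes it easier to tell a page that has not finished loading from a genuine parsing problem.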



## Race Strategy


In [None]:
import math

# Define the tire parameters and their lap time formulas
def lap_time_super_soft(X):
    return (92.3006197773077) + 0.282958307692307 * X

def lap_time_soft(X):
    return (93.4651746647101) + 0.246570086956521 * X

def lap_time_medium(X):
    return (94.2806098566049) + 0.211755061728395 * X

def lap_time_hard(X):
    return (93.4813606710215) + 0.259557913978495 * X

# Tire data with their lifespan
tire_lifespan = {
    "super_soft": 13,
    "soft": 23,
    "medium": 27,
    "hard": 31
}

# Pit stop penalty
pit_stop_time = 30  # seconds

# Function to calculate the total race time for a given strategy
def calculate_race_time(laps, strategy):
    total_time = 0
    total_pit_stops = 0
    lap_index = 0
    lap_counter = 0

    while lap_counter < laps:
        tire, stint_laps = strategy[lap_index]

        # Ensure we don't exceed the total laps
        if lap_counter + stint_laps > laps:
            stint_laps = laps - lap_counter

        # Calculate the lap times for this stint
        lap_times = []
        for i in range(stint_laps):
            if tire == "super_soft":
                lap_times.append(lap_time_super_soft(i + 1))
            elif tire == "soft":
                lap_times.append(lap_time_soft(i + 1))
            elif tire == "medium":
                lap_times.append(lap_time_medium(i + 1))
            elif tire == "hard":
                lap_times.append(lap_time_hard(i + 1))

        total_time += sum(lap_times)  # Add the lap times of this stint
        lap_counter += stint_laps

        # If we are not at the last stint, account for a pit stop
        if lap_counter < laps:
            total_time += pit_stop_time  # Pit stop penalty
            total_pit_stops += 1

        lap_index += 1
        if lap_index >= len(strategy):
            break

    return total_time, total_pit_stops

# Function to generate possible strategies dynamically
def generate_strategies(laps):
    strategies = []
    tire_choices = ["super_soft", "soft", "medium", "hard"]

    # Generate strategies by breaking the laps into multiple stints
    for tire1 in tire_choices:
        for tire2 in tire_choices:
            for tire3 in tire_choices:
                for tire4 in tire_choices:
                    for tire5 in tire_choices:
                        strategy = []
                        remaining_laps = laps

                        # Create dynamic stints for each tire
                        for tire in [tire1, tire2, tire3, tire4, tire5]:
                        #for tire in [tire1, tire2, tire3, tire4]:
                        #for tire in [tire1, tire2, tire3]:
                        #for tire in [tire1, tire2]:
                            stint_laps = tire_lifespan[tire]

                            if remaining_laps > stint_laps:
                                strategy.append((tire, stint_laps))
                                remaining_laps -= stint_laps
                            else:
                                strategy.append((tire, remaining_laps))
                                break

                        if sum([stint[1] for stint in strategy]) == laps:
                            strategies.append(strategy)

    return strategies

# Function to find the best strategy
def optimize_strategy(laps):
    best_time = math.inf
    best_strategy = None
    best_pit_stops = None  # stays None if no valid strategy is found

    strategies = generate_strategies(laps)

    for strategy in strategies:
        total_time, pit_stops = calculate_race_time(laps, strategy)
        if total_time < best_time:
            best_time = total_time
            best_strategy = strategy
            best_pit_stops = pit_stops

    return best_strategy, best_time, best_pit_stops


# Main function
if __name__ == "__main__":
    race_laps = 63
    best_strategy, best_time, total_pit_stops = optimize_strategy(race_laps)
    print(f"Best Strategy: {best_strategy}")
    print(f"Best Total Time: {best_time} seconds")
    print(f"Total Pit Stops: {total_pit_stops}")
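
Because each tire's lap time is linear in the lap number within a stint, the time for a stint of n laps also has a closed form: n·a + b·n(n+1)/2, with intercept a and slope b. The sketch below checks this against the lap-by-lap sum for the super-soft coefficients from the cell above; `stint_time` is a hypothetical helper, not part of the notebook.

```python
def stint_time(intercept, slope, n_laps):
    """Total time of a stint whose i-th lap takes intercept + slope * i seconds."""
    return n_laps * intercept + slope * n_laps * (n_laps + 1) / 2

# Super-soft coefficients copied from lap_time_super_soft
A, B = 92.3006197773077, 0.282958307692307

lap_by_lap = sum(A + B * i for i in range(1, 14))  # 13 laps, the super-soft lifespan
closed_form = stint_time(A, B, 13)
print(abs(lap_by_lap - closed_form) < 1e-9)
```

The closed form makes the trade-off visible: the quadratic term grows with stint length, which is what a pit stop of 30 seconds has to be weighed against.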

# **Debrief Race Calendar before Final Race**

*Write a longer text (200-500 words) reflecting on what were the main ideas you started the seminar with, how you improved your models to achieve better performance and what strategy and analytics you want to use for your final race during seminar day*

# **Analytics for Final Race**

For this week's race, we plan to fine-tune what we did for Race 5.
To do so, we want to:
1. Weight and combine the datasets
2. Train LGBM models for sequential optimization
3. Run sequential optimization and validate the parameters using 1 practice lap
4. Set up the direct optimization with correct parameter ranges and run it for 15 trials, using fuel=10 and stint_length=1
5. Include the new data in the training data

Repeat steps 1-4 four times.
This leaves 16 laps for strategy optimization, which we will do afterwards.
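
The lap budget behind this plan can be checked with a few lines. The sketch assumes that each of the four iterations uses one validation lap plus 15 direct-optimization trials of one lap each, as listed in steps 3 and 4, out of the regular 80 practice laps. The variable names are chosen for illustration only.

```python
PRACTICE_LAPS = 80        # regular practice laps available
ITERATIONS = 4            # number of times steps 1-4 are repeated
VALIDATION_LAPS = 1       # step 3: one practice lap per iteration
DIRECT_OPT_TRIALS = 15    # step 4: 15 trials with a stint length of 1

laps_per_iteration = VALIDATION_LAPS + DIRECT_OPT_TRIALS
laps_used = ITERATIONS * laps_per_iteration
laps_left = PRACTICE_LAPS - laps_used
print(laps_used, laps_left)
```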

The weighting step can be split into three parts:
- 1.1 is where we determine which weights to use for our old practice data (1.1.1) and our new practice data (1.1.2) in parts 1.2 and 1.3
- 1.2 is where we weight the simulator data and combine it with the old practice data
- 1.3 is where we weight the simulator data and combine it with the old practice data and the new practice data


After using our regular 80 practice laps, we plan to run 225 trials with our direct optimizer, without any ML models in between, in the hope that this leads to even better car parameters. This leaves 25 laps to validate possible results, as the counter stops once it hits -251 laps.

In [None]:
# Team members working on this code: Paula Kussauer, Cedric Schwandt, Hannes Kock

## Part 1: Weigh and combine the Datasets

### Part 1.1

#### Part 1.1.1

In [None]:
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("simulator_data_newnames.csv")

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time"])
y = df["Avg Speed"]

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 242,
    'max_depth': 9,
    'learning_rate': 0.0645474493988489,
    'min_data_in_leaf': 96,
    'feature_fraction': 0.9427185636727835,
    'bagging_fraction': 0.785711983453578,
    'bagging_freq': 2,
    'lambda_l1': 0.6006835353397709,
    'lambda_l2': 1.4905819957883624,
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("optm_weights_lgbm_model_R6.txt")

print("Model training complete and saved to 'optm_weights_lgbm_model_R6.txt'")


In [None]:
import pandas as pd
import lightgbm as lgb
import optuna
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import numpy as np

# Load the original model
model = lgb.Booster(model_file="optm_weights_lgbm_model_R6.txt")

# Load the new data
new_df = pd.read_csv("practice_data_newnamesV6_1.csv")

# Compute Avg Speed for new data
new_df["Avg Speed"] = (new_df["Lap Distance"] / new_df["Lap Time"]) * 3600

# Prepare features and target
X_new = new_df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time", "Round", "Track", "Qualifying",
                             "Stint", "Lap", "Fuel", "Tyre Remaining", "Tyre Choice"])
y_new = new_df["Avg Speed"]

def objective(trial):
    # Suggest a scaling factor for sample weights
    weight_scaling_factor = trial.suggest_int('weight_scaling_factor', 1, 10, log = True )
    sample_weight = [weight_scaling_factor] * len(y_new)

    # Use k-fold cross-validation
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    rmse_list = []

    # Train and evaluate using cross-validation
    for train_index, valid_index in kf.split(X_new):
        X_train, X_valid = X_new.iloc[train_index], X_new.iloc[valid_index]
        y_train, y_valid = y_new.iloc[train_index], y_new.iloc[valid_index]
        train_weight = np.array(sample_weight)[train_index]

        train_data = lgb.Dataset(X_train, label=y_train, weight=train_weight)
        valid_data = lgb.Dataset(X_valid, label=y_valid, weight=np.array(sample_weight)[valid_index], reference=train_data)

        # Continue training from the pretrained model on this fold
        model_tmp = lgb.train(
            params={},  # Use default params unless specified
            train_set=train_data,
            init_model=model,
            num_boost_round=1000,  # You can increase if needed
        )

        # Predict and calculate RMSE for this fold
        y_pred = model_tmp.predict(X_valid, num_iteration=model_tmp.best_iteration)
        #rmse = mean_squared_error(y_valid, y_pred, squared=False)
        rmse = np.sqrt(((y_valid - y_pred)**2).mean())
        rmse_list.append(rmse)

    # Return the average RMSE over all folds
    mean_rmse = np.mean(rmse_list)
    return mean_rmse

# Create Optuna study to minimize RMSE
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=30)  # You can adjust the number of trials

best_weight_factor = study.best_params['weight_scaling_factor']
print(f"Best weight scaling factor: {best_weight_factor}")


In [None]:
Best weight scaling factor: 9

#### Part 1.1.2

To determine the weight for the new practice data, we first merge the simulator data and the old practice data using the weights determined in 1.1.1 (simulator data: 1, old practice data: 9). We then repeat the weight optimization procedure from 1.1.1, this time starting from a model trained on the merged, weighted data.
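
To get a feeling for what a dataset weight means in practice, the following sketch computes each dataset's share of the total sample weight. The simulator size of 10,000 laps matches our simulator file, while the count of 80 old practice laps is only a hypothetical placeholder. The helper `weight_shares` is likewise made up for this illustration.

```python
def weight_shares(groups):
    """Return each group's share of the total sample weight.

    groups maps a dataset name to a (row_count, weight_per_row) tuple.
    """
    totals = {name: count * weight for name, (count, weight) in groups.items()}
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}

shares = weight_shares({
    "simulator": (10_000, 1),    # weight 1 per simulated lap
    "old_practice": (80, 9),     # hypothetical lap count, weight 9 per lap
})

for name, share in shares.items():
    print(f"{name}: {share:.1%} of the total weight")
```

Even with a weight of 9, a small practice dataset contributes only a modest share of the total weight, which is why the weight is tuned rather than fixed by hand.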

In [None]:
import pandas as pd

# Read in both CSV files
simulator_df = pd.read_csv('simulator_data_newnames.csv')
practice_df = pd.read_csv('practice_data_newnamesV6_1.csv')

# Define your weights
SIMULATOR_WEIGHT = 1
PRACTICE_WEIGHT = 9

# Assign weights as a new column
simulator_df['dataset_weight'] = SIMULATOR_WEIGHT
practice_df['dataset_weight'] = PRACTICE_WEIGHT

# Keep only the columns that practice_df shares with simulator_df
# (this includes the 'dataset_weight' column assigned above)
cols_to_match = simulator_df.columns.intersection(practice_df.columns)
practice_df_matched = practice_df[cols_to_match]

# Concatenate both datasets; pandas aligns the columns by name
merged_df = pd.concat([simulator_df, practice_df_matched], ignore_index=True)

# Write the merged DataFrame to a new CSV file
merged_df.to_csv('merged_data_R6.csv', index=False)

print("Merge completed, saved as 'merged_data_R6.csv'")

In [None]:
import pandas as pd
import numpy as np
import lightgbm as lgb

# Load the data
df = pd.read_csv("merged_data_R6.csv")  # Use merged file

# Compute Avg Speed
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Features and target
X = df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time"])
y = df["Avg Speed"]

# Use weights from the merged file
weights = df["dataset_weight"]

# Remove 'dataset_weight' from features (as it's not a feature but a weight)
X = X.drop(columns=["dataset_weight"])

# Define LightGBM dataset using weights
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 242,
    'max_depth': 9,
    'learning_rate': 0.0645474493988489,
    'min_data_in_leaf': 96,
    'feature_fraction': 0.9427185636727835,
    'bagging_fraction': 0.785711983453578,
    'bagging_freq': 2,
    'lambda_l1': 0.6006835353397709,
    'lambda_l2': 1.4905819957883624,
}

# Train on full dataset using weights
model = lgb.train(params, lgb_data, num_boost_round=1000)

# Save the model to a file
model.save_model("optm_weights_lgbm_model_R6_merged.txt")

print("Model training complete and saved to 'optm_weights_lgbm_model_R6_merged.txt'")

In [None]:
import pandas as pd
import lightgbm as lgb
import optuna
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import numpy as np

# Load the original model
model = lgb.Booster(model_file="optm_weights_lgbm_model_R6_merged.txt")

# Load the new data
new_df = pd.read_csv("practice_data_newnamesV6_2.csv")

# Compute Avg Speed for new data
new_df["Avg Speed"] = (new_df["Lap Distance"] / new_df["Lap Time"]) * 3600

# Prepare features and target
X_new = new_df.drop(columns=["Avg Speed", "Lap Distance", "Lap Time", "Round", "Track", "Qualifying",
                             "Stint", "Lap", "Fuel", "Tyre Remaining", "Tyre Choice"])
y_new = new_df["Avg Speed"]

def objective(trial):
    # Suggest a scaling factor for sample weights
    weight_scaling_factor = trial.suggest_int('weight_scaling_factor', 1, 100, log = True )
    sample_weight = [weight_scaling_factor] * len(y_new)

    # Use k-fold cross-validation
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    rmse_list = []

    # Train and evaluate using cross-validation
    for train_index, valid_index in kf.split(X_new):
        X_train, X_valid = X_new.iloc[train_index], X_new.iloc[valid_index]
        y_train, y_valid = y_new.iloc[train_index], y_new.iloc[valid_index]
        train_weight = np.array(sample_weight)[train_index]

        train_data = lgb.Dataset(X_train, label=y_train, weight=train_weight)
        valid_data = lgb.Dataset(X_valid, label=y_valid, weight=np.array(sample_weight)[valid_index], reference=train_data)

        # Continue training from the pretrained model on this fold
        model_tmp = lgb.train(
            params={},  # Use default params unless specified
            train_set=train_data,
            init_model=model,
            num_boost_round=1000,  # You can increase if needed
        )

        # Predict and calculate RMSE for this fold
        y_pred = model_tmp.predict(X_valid, num_iteration=model_tmp.best_iteration)
        #rmse = mean_squared_error(y_valid, y_pred, squared=False)
        rmse = np.sqrt(((y_valid - y_pred)**2).mean())
        rmse_list.append(rmse)

    # Return the average RMSE over all folds
    mean_rmse = np.mean(rmse_list)
    return mean_rmse

# Create Optuna study to minimize RMSE
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=30)  # You can adjust the number of trials

best_weight_factor = study.best_params['weight_scaling_factor']
print(f"Best weight scaling factor: {best_weight_factor}")


### Part 1.2

In [None]:

import numpy as np
import pandas as pd
from scipy.spatial.distance import mahalanobis

# Adjust file paths to your current setup

SIM_CSV         = "simulator_data_newnames.csv"     # 10,000 simulated laps
PRACTICEOLD_CSV = "practice_data_newnamesV6_1.csv"
ENV_CSV         = "track_weather_italy.csv"           # 1 row: track & weather conditions


# Load all relevant data

df_sim = pd.read_csv(SIM_CSV)
df_pro = pd.read_csv(PRACTICEOLD_CSV)
env_df = pd.read_csv(ENV_CSV)  # track/weather parameters (single row)


# Automatically infer meta feature columns

# 1. Collect all car setup parameters relevant for optimization
car_param_cols = {
    "Engine", "Differential", "RearWing", "FrontWing",
    "Suspension", "BrakeBalance"
}

# 2. All columns in the ENV file are considered track/weather features
env_cols = env_df.columns.tolist()

# 3. Only keep those features that are present in both ENV and Sim/Practice datasets (for consistency)
meta_cols = [c for c in env_cols if c in df_sim.columns]


# Compute kernel weights for simulator data
tau = 2.0  # Bandwidth for kernel function
X_sim_meta = df_sim[meta_cols].values

# Use the specific target track/weather conditions from ENV_CSV
env_point = env_df[meta_cols].values.flatten()

# Invert the covariance matrix of the simulator track/weather features
# (captures their variances and correlation structure)
Sigma_inv = np.linalg.inv(np.cov(X_sim_meta, rowvar=False))

# Kernel weight exp(-d / tau) for each simulator sample, where d is the
# Mahalanobis distance to the actual track/weather conditions
k_sim = np.exp([
    -mahalanobis(x, env_point, Sigma_inv) / tau
    for x in X_sim_meta
])

# Assign the constant weight determined in Part 1.1.1 to the old practice laps
w_pro = np.full(len(df_pro), 9)  # All old practice laps

# Combine all weights into a single array (new practice data not included here)
weights = np.concatenate([k_sim, w_pro])



# Create a unified DataFrame for downstream modeling
# Combine simulator and old practice datasets (new practice data not included here)
df_all = pd.concat([df_sim, df_pro], ignore_index=True)



# Add a column for average speed in km/h (distance in km divided by time in hours)
df_all["Avg Speed"] = (df_all["Lap Distance"] / df_all["Lap Time"]) * 3600

# Append the computed weights as a new column named 'sample_weight'
df_all["sample_weight"] = weights

# Specify columns that should be removed from the final dataset
drop_cols = [
    "Lap Distance", "Lap Time", "Round", "Track", "Qualifying", "Stint", "Lap",
    "Fuel", "Tyre Remaining", "Tyre Choice"
]

# Safely drop unwanted columns, ignoring any missing columns just in case
df_all_dropped = df_all.drop(columns=drop_cols, errors="ignore")

# Export the cleaned, combined, and weighted dataset to CSV
OUTPUT_CSV = "all_data_with_weights_R6.csv"
df_all_dropped.to_csv(OUTPUT_CSV, index=False)

### Part 1.3

In [None]:

import numpy as np
import pandas as pd
from scipy.spatial.distance import mahalanobis

# Adjust file paths to your current setup

SIM_CSV         = "simulator_data_newnames.csv"     # 10,000 simulated laps
PRACTICEOLD_CSV = "practice_data_newnamesV6_1.csv"
PRACTICENEW_CSV = "practice_data_newnamesV6_2.csv"
ENV_CSV         = "track_weather_italy.csv"           # 1 row: track & weather conditions


# Load all relevant data

df_sim = pd.read_csv(SIM_CSV)
df_pro = pd.read_csv(PRACTICEOLD_CSV)
df_prn = pd.read_csv(PRACTICENEW_CSV)
env_df = pd.read_csv(ENV_CSV)  # track/weather parameters (single row)


# Automatically infer meta feature columns

# 1. Collect all car setup parameters relevant for optimization
car_param_cols = {
    "Engine", "Differential", "RearWing", "FrontWing",
    "Suspension", "BrakeBalance"
}

# 2. All columns in the ENV file are considered track/weather features
env_cols = env_df.columns.tolist()

# 3. Only keep those features that are present in both ENV and Sim/Practice datasets (for consistency)
meta_cols = [c for c in env_cols if c in df_sim.columns]


# Compute kernel weights for simulator data
tau = 2.0  # Bandwidth for kernel function
X_sim_meta = df_sim[meta_cols].values

# Use the specific target track/weather conditions from ENV_CSV
env_point = env_df[meta_cols].values.flatten()

# Invert the covariance matrix of the simulator track/weather features
# (captures their variances and correlation structure)
Sigma_inv = np.linalg.inv(np.cov(X_sim_meta, rowvar=False))

# Kernel weight exp(-d / tau) for each simulator sample, where d is the
# Mahalanobis distance to the actual track/weather conditions
k_sim = np.exp([
    -mahalanobis(x, env_point, Sigma_inv) / tau
    for x in X_sim_meta
])

# Assign constant weights for old and new practice laps (as determined in Part 1.1)
w_pro = np.full(len(df_pro), 9)   # All old practice laps
w_prn = np.full(len(df_prn), 15)  # All new practice laps

# Combine all weights into a single array (including the new practice data)
weights = np.concatenate([k_sim, w_pro, w_prn])


# Create a unified DataFrame for downstream modeling
# Combine simulator, old practice, and new practice datasets
df_all = pd.concat([df_sim, df_pro, df_prn], ignore_index=True) #with new practice data data


# Add a column for average speed in km/h (distance in km divided by time in hours)
df_all["Avg Speed"] = (df_all["Lap Distance"] / df_all["Lap Time"]) * 3600

# Append the computed weights as a new column named 'sample_weight'
df_all["sample_weight"] = weights

# Specify columns that should be removed from the final dataset
drop_cols = [
    "Lap Distance", "Lap Time", "Round", "Track", "Qualifying", "Stint", "Lap",
    "Fuel", "Tyre Remaining", "Tyre Choice"
]

# Safely drop unwanted columns, ignoring any missing columns just in case
df_all_dropped = df_all.drop(columns=drop_cols, errors="ignore")

# Export the cleaned, combined, and weighted dataset to CSV
OUTPUT_CSV = "all_data_with_weights_R6.csv"
df_all_dropped.to_csv(OUTPUT_CSV, index=False)

## Part 2: Train LightGBM Models for sequential optimization

In [None]:
#Engine (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb
import os

# Load the weighted data
df = pd.read_csv("all_data_with_weights_R6.csv")

# Features and target
X = df[['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure' , 'Cornering']]
y = df["Avg Speed"]
weights = df["sample_weight"].values

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 228,
    'max_depth': 9,
    'learning_rate': 0.019795034976830574,
    'min_data_in_leaf': 45,
    'feature_fraction': 0.6769051793180653,
    'bagging_fraction': 0.5679615916585826,
    'bagging_freq': 5,
    'lambda_l1': 0.2140082876917294,
    'lambda_l2': 0.778813758195442
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=100)

# Save the model to a file
folder = "SeqOptmModels_R6"
os.makedirs(folder, exist_ok=True)

model_path = os.path.join(folder, "Engine_lgbm_model_R6.txt")
model.save_model(model_path)

print(f"Model training complete and saved to '{model_path}'")


In [None]:
#Differential (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb
import os

# Load the data
df = pd.read_csv("all_data_with_weights_R6.csv")

# Features and target
X = df[['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density']]
y = df["Avg Speed"]
weights = df["sample_weight"].values

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 95,
    'max_depth': 14,
    'learning_rate': 0.020430708599380454,
    'min_data_in_leaf': 11,
    'feature_fraction': 0.8289782961700536,
    'bagging_fraction': 0.6054739109884507,
    'bagging_freq': 6,
    'lambda_l1': 0.164978854742107,
    'lambda_l2': 0.09387958914404965
    }


# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=100)

# Save the model to a file
folder = "SeqOptmModels_R6"
os.makedirs(folder, exist_ok=True)

model_path = os.path.join(folder, "Differential_lgbm_model_R6.txt")
model.save_model(model_path)

print(f"Model training complete and saved to '{model_path}'")


In [None]:
#Rear Wing (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb
import os

# Load the data
df = pd.read_csv("all_data_with_weights_R6.csv")


# Features and target
X = df[['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure' , 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness']]
y = df["Avg Speed"]
weights = df["sample_weight"].values

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 263,
    'max_depth': 8,
    'learning_rate': 0.033545204540268374,
    'min_data_in_leaf': 10,
    'feature_fraction': 0.67904602486583,
    'bagging_fraction': 0.9010766438515055,
    'bagging_freq': 3,
    'lambda_l1': 0.00361547478377304,
    'lambda_l2': 0.00947809994264175
    }

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=100)

# Save the model to a file
folder = "SeqOptmModels_R6"
os.makedirs(folder, exist_ok=True)

model_path = os.path.join(folder, "RearWing_lgbm_model_R6.txt")
model.save_model(model_path)

print(f"Model training complete and saved to '{model_path}'")


In [None]:
#Front Wing (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb
import os

# Load the data
df = pd.read_csv("all_data_with_weights_R6.csv")


# Features and target
X = df[['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)']]
y = df["Avg Speed"]
weights = df["sample_weight"].values

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 192,
    'max_depth': 12,
    'learning_rate': 0.062393011943707284,
    'min_data_in_leaf': 10,
    'feature_fraction': 0.8418876970992247,
    'bagging_fraction': 0.8723398679038107,
    'bagging_freq': 10,
    'lambda_l1': 0.3441273621643392,
    'lambda_l2': 0.5476995416360311
    }


# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=100)

# Save the model to a file
folder = "SeqOptmModels_R6"
os.makedirs(folder, exist_ok=True)

model_path = os.path.join(folder, "FrontWing_lgbm_model_R6.txt")
model.save_model(model_path)

print(f"Model training complete and saved to '{model_path}'")


In [None]:
#Suspension (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb
import os

# Load the data
df = pd.read_csv("all_data_with_weights_R6.csv")


# Features and target
X = df[['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width']]
y = df["Avg Speed"]
weights = df["sample_weight"].values

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 191,
    'max_depth': 14,
    'learning_rate': 0.024681809936626296,
    'min_data_in_leaf': 10,
    'feature_fraction': 0.8101056533251277,
    'bagging_fraction': 0.5448991038428275,
    'bagging_freq': 2,
    'lambda_l1': 0.09124351826015425,
    'lambda_l2': 0.0008864299246597318
    }

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=100)

# Save the model to a file
folder = "SeqOptmModels_R6"
os.makedirs(folder, exist_ok=True)

model_path = os.path.join(folder, "Suspension_lgbm_model_R6.txt")
model.save_model(model_path)

print(f"Model training complete and saved to '{model_path}'")


In [None]:
#Brake Balance (Hyperparameters and X set)
import pandas as pd
import numpy as np
import lightgbm as lgb
import os

# Load the data
df = pd.read_csv("all_data_with_weights_R6.csv")


# Features and target
X = df[['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine','Cornering', 'Width', 'Roughness', 'Temperature']]
y = df["Avg Speed"]
weights = df["sample_weight"].values

# Define LightGBM dataset
lgb_data = lgb.Dataset(X, label=y, weight=weights)

# Parameters (tuned already)
params = {
    'objective': 'regression',
    'metric': 'l2',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'num_leaves': 27,
    'max_depth': 11,
    'learning_rate': 0.18696777761444966,
    'min_data_in_leaf': 73,
    'feature_fraction': 0.964098272670725,
    'bagging_fraction': 0.8852821315081366,
    'bagging_freq': 10,
    'lambda_l1': 3.486399193949678,
    'lambda_l2': 3.7053586457221366
}

# Train on full dataset
model = lgb.train(params, lgb_data, num_boost_round=100)

# Save the model to a file
folder = "SeqOptmModels_R6"
os.makedirs(folder, exist_ok=True)

model_path = os.path.join(folder, "BrakeBalance_lgbm_model_R6.txt")
model.save_model(model_path)

print(f"Model training complete and saved to '{model_path}'")


## Part 3: Run sequential optimization
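
Before the full implementation, here is a toy sketch of the idea behind sequential optimization: each parameter is optimized in a fixed order while all previously optimized parameters stay fixed. The sketch uses an exhaustive search over the integer range and a made-up scoring function instead of Optuna and our LightGBM models, so `sequential_optimize` and `toy_score` are purely illustrative.

```python
def sequential_optimize(order, score, low=1, high=500):
    """Optimize one parameter at a time, keeping earlier results fixed."""
    fixed = {}
    for name in order:
        # Try every integer value for the current parameter only
        best_value = max(
            range(low, high + 1),
            key=lambda value: score({**fixed, name: value}),
        )
        fixed[name] = best_value
    return fixed

def toy_score(setup):
    """Made-up stand-in for a predicted average speed (higher is better)."""
    engine = setup.get("Engine", 0)
    result = -(engine - 30) ** 2
    if "Differential" in setup:
        # The best differential depends on the engine value chosen earlier
        result -= (setup["Differential"] - (engine + 10)) ** 2
    return result

toy_result = sequential_optimize(["Engine", "Differential"], toy_score)
print(toy_result)
```

Because later parameters are tuned with the earlier ones held fixed, the order of the list matters. This is the reason the feature set of each model in our pipeline contains the car parameters optimized before it.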

In [None]:
import os
import pandas as pd
import lightgbm as lgb
import optuna

# Ordered list of car parameters to optimize
params_to_optimize = [
    "Engine", "Differential", "RearWing",
    "FrontWing", "Suspension", "BrakeBalance"
]

# Feature sets used by each model for prediction
feature_sets = {
    "Engine": ['Engine', 'Grip', 'Humidity', 'Air Density', 'Altitude', 'Temperature', 'Inclines', 'Air Pressure', 'Cornering'],
    "Differential": ['Differential', 'Engine', 'Cornering', 'Width', 'Inclines', 'Grip', 'Temperature', 'Air Density'],
    "RearWing": ['RearWing', 'Differential', 'Engine', 'Air Density', 'Cornering', 'Air Pressure', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Roughness'],
    "FrontWing": ['FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Air Pressure', 'Air Density', 'Inclines', 'Wind (Avg. Speed)', 'Humidity', 'Wind (Gusts)'],
    "Suspension": ['Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Grip', 'Inclines', 'Cornering', 'Camber', 'Roughness', 'Width'],
    "BrakeBalance": ['BrakeBalance', 'Suspension', 'FrontWing', 'RearWing', 'Differential', 'Engine', 'Cornering', 'Width', 'Roughness', 'Temperature']
}

# Load static base data (track & weather)
base_data = pd.read_csv("track_weather_italy.csv")

# Dictionary to store optimized values
optimized_params = {}

# Sequential optimization loop
for car_part in params_to_optimize:
    print(f"\n Optimizing {car_part}...")

    # Load the pretrained LightGBM model from folder
    folder_name = 'SeqOptmModels_R6'
    model_file_name = f"{car_part}_lgbm_model_R6.txt"
    model_file_path = os.path.join(folder_name, model_file_name)
    model = lgb.Booster(model_file=model_file_path)
    features = feature_sets[car_part]

    def objective(trial):
        # Clone base data for this trial
        input_data = base_data.copy()

        # Add fixed parameters (already optimized)
        for param, value in optimized_params.items():
            input_data[param] = value

        # Suggest value only for the current parameter
        trial_value = trial.suggest_int(car_part, 1, 500)
        input_data[car_part] = trial_value

        # Fill in missing feature columns with default (0)
        for feature in features:
            if feature not in input_data.columns:
                input_data[feature] = 0

        # Predict average speed
        X = input_data[features]
        avg_speed = model.predict(X)[0]
        return avg_speed

    # Run Optuna optimization for this parameter
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=500, show_progress_bar=True)

    # Save best value
    best_value = study.best_params[car_part]
    optimized_params[car_part] = best_value
    print(f" Best {car_part}: {best_value}")

# Final results
print("\n Final Optimized Parameters:")
for k, v in optimized_params.items():
    print(f"{k}: {v}")


In [None]:
Ranges for direct optimization run 1:

Engine: 18 - 38 (+-10)
Differential: 21 - 51 (+-15)
RearWing: 325 - 365 (+-20)
FrontWing: 55 - 105 (+-25)
Suspension: 106 - 166 (+-30)
BrakeBalance: 40 - 110 (+-35)

In [None]:
Ranges for direct optimization run 2:

Engine: 18 - 38
Differential: 28 - 58
RearWing: 325 - 365
FrontWing: 55 - 105
Suspension: 106 - 166
BrakeBalance: 40 - 110
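
The run 1 ranges follow a "centre plus/minus half-width" pattern. The sketch below reproduces them under two assumptions: the centres are simply the midpoints of the listed run 1 ranges (they are not taken from a logged optimizer output), and values are clipped to the 1-500 interval used during sequential optimization. The helper `search_range` is hypothetical.

```python
# Half-widths as noted next to the run 1 ranges
HALF_WIDTHS = {
    "Engine": 10, "Differential": 15, "RearWing": 20,
    "FrontWing": 25, "Suspension": 30, "BrakeBalance": 35,
}

# Assumed centres: midpoints of the listed run 1 ranges
CENTRES = {
    "Engine": 28, "Differential": 36, "RearWing": 345,
    "FrontWing": 80, "Suspension": 136, "BrakeBalance": 75,
}

def search_range(centre, half_width, low=1, high=500):
    """Return (lower, upper) bounds around centre, clipped to [low, high]."""
    return max(low, centre - half_width), min(high, centre + half_width)

ranges = {
    name: search_range(CENTRES[name], HALF_WIDTHS[name]) for name in CENTRES
}

for name, (lower, upper) in ranges.items():
    print(f"{name}: {lower} - {upper}")
```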

## Part 4: Direct Optimization using Selenium

This code does not run in Google Colab; it must be run in a local Python installation.

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time
import optuna

USERNAME = 'xxx@studium.uni-hamburg.de'
PASSWORD = 'xxx'
STINT_LENGTH = 1
FUEL_LOAD = 10
COUNTER = 3 #set to last xpath tr value +1 or last stint +2
N_TRAILS = 15

# Function to set up the browser
def setup_browser():

    chrome_options = Options()

    # Open the browser
    driver = webdriver.Chrome(options=chrome_options)

    return driver


# Function to login and navigate to the page with the form
def login_and_navigate(driver, username, password):
    # Open the login page
    driver.get("https://team-analytics.com/f1/")

    # Fill in the login credentials (fields are located by their name attribute)
    driver.find_element(By.NAME, 'email').send_keys(username)
    driver.find_element(By.NAME, 'password').send_keys(password)

    # Click the login button
    driver.find_element(By.NAME, 'login_user_btn').click()

    # Wait for the login to complete (adjust sleep time as necessary)
    time.sleep(3)

    # Click the button that takes you to the practice page
    next_page_button = driver.find_element(By.NAME, 'practice_round')
    next_page_button.click()

    # Wait for the page to load (adjust as necessary)
    time.sleep(3)

def parse_time_string(time_str):
    """Parses a time string in the format 'MM:SS:MS' into total seconds."""
    minutes, seconds, milliseconds = map(int, time_str.split(':'))
    total_seconds = minutes * 60 + seconds + milliseconds / 1000.0
    return total_seconds

# Function to fill in the form and get the result (reuse the open driver session)
def get_avg_time(driver, params):


    # Locate the input fields by name and fill them; params is ordered as
    # [rearwing, engine, frontwing, brake, differential, suspension]
    driver.execute_script("""
    const input = document.querySelector('[name="rearwing"]');
    input.value = arguments[0];
    input.dispatchEvent(new Event('input', { bubbles: true }));
    """, str(params[0]))

    driver.execute_script("""
    const input = document.querySelector('[name="engine"]');
    input.value = arguments[0];
    input.dispatchEvent(new Event('input', { bubbles: true }));
    """, str(params[1]))

    driver.execute_script("""
    const input = document.querySelector('[name="frontwing"]');
    input.value = arguments[0];
    input.dispatchEvent(new Event('input', { bubbles: true }));
    """, str(params[2]))

    driver.execute_script("""
    const input = document.querySelector('[name="brake"]');
    input.value = arguments[0];
    input.dispatchEvent(new Event('input', { bubbles: true }));
    """, str(params[3]))

    driver.execute_script("""
    const input = document.querySelector('[name="differential"]');
    input.value = arguments[0];
    input.dispatchEvent(new Event('input', { bubbles: true }));
    """, str(params[4]))

    driver.execute_script("""
    const input = document.querySelector('[name="suspension"]');
    input.value = arguments[0];
    input.dispatchEvent(new Event('input', { bubbles: true }));
    """, str(params[5]))

    # Set the stint length from the STINT_LENGTH constant
    driver.execute_script("""
    const input = document.querySelector('[name="stint_length"]');
    input.value = arguments[0];
    input.dispatchEvent(new Event('input', { bubbles: true }));
    """, str(STINT_LENGTH))

    # Set the fuel load from the FUEL_LOAD constant
    driver.execute_script("""
    const input = document.querySelector('[name="fuel_load"]');
    input.value = arguments[0];
    input.dispatchEvent(new Event('input', { bubbles: true }));
    """, str(FUEL_LOAD))



    # Submit the form
    driver.find_element(By.NAME, 'submit_practice_stint').click()

    # Wait for the result to load
    time.sleep(3)

    if not hasattr(get_avg_time, "counter"):
        get_avg_time.counter = COUNTER #set to last xpath tr value +1 / last stint +2

    # Extract the avg. time result
    xpath = f'//*[@id="submit_practice"]/table[2]/tbody/tr[{get_avg_time.counter}]/td[12]'
    time_str_element = driver.find_element(By.XPATH, xpath)
    time_str = time_str_element.text
    avg_time = parse_time_string(time_str)

    # Increment the counter
    get_avg_time.counter += 1

    return avg_time


# Function to close the browser
def close_browser(driver):
    driver.quit()

# Define the objective function for Optuna
def objective(trial, driver):
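    # Sample values for the 6 parameters.
    # Note: adjust the low/high bounds to the ranges chosen for the current
    # direct optimization run before starting the study.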
    param1 = trial.suggest_int('param1', low=350, high=500) #RearWing
    param2 = trial.suggest_int('param2', low=20, high=150)    #Engine
    param3 = trial.suggest_int('param3', low=300, high=450)  #FrontWing
    param4 = trial.suggest_int('param4', low=100, high=250)   #Brake
    param5 = trial.suggest_int('param5', low=35, high=250)  #Differential
    param6 = trial.suggest_int('param6', low=50, high=200)  #Suspension

    # Bundle the parameters into a list
    params = [param1, param2, param3, param4, param5, param6]

    # Get the avg time from the website (reuse the same driver)
    avg_time = get_avg_time(driver, params)

    return avg_time

# Set up the browser and login
driver = setup_browser()
login_and_navigate(driver, USERNAME, PASSWORD)

# Create an Optuna study to optimize the objective function
storage = "sqlite:///DirectOptStudyOneR6.db"
study = optuna.create_study(direction='minimize', study_name="DirectOptStudyOneR6", storage=storage, load_if_exists=True)
study.optimize(lambda trial: objective(trial, driver), n_trials=N_TRAILS)

# Print the best parameters and corresponding result
print("Best parameters found:", study.best_params)
print("Best avg. time:", study.best_value)

# Close the browser after optimization
close_browser(driver)



## Race Strategy
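
The strategy code in the next cell models the lap time on a tyre as a linear function of the lap number within the stint. Under that assumption, the total time of a stint has a simple closed form, which the following sketch compares with an explicit sum. The helper `stint_time` is hypothetical and only uses the super-soft coefficients from the cell below as an example.

```python
def stint_time(base, degradation, laps):
    """Sum of (base + degradation * i) for i = 1..laps, in closed form."""
    return base * laps + degradation * laps * (laps + 1) / 2

# Super-soft coefficients: base lap time and degradation per lap (seconds)
SUPER_SOFT = (92.3006197773077, 0.282958307692307)

# Explicit lap-by-lap sum over a 13-lap stint
loop_total = sum(SUPER_SOFT[0] + SUPER_SOFT[1] * lap for lap in range(1, 14))

# Same stint using the closed form
closed_total = stint_time(*SUPER_SOFT, laps=13)

print(f"Lap-by-lap sum: {loop_total:.3f} s")
print(f"Closed form:    {closed_total:.3f} s")
```

The closed form makes the trade-off visible: the degradation term grows quadratically with stint length, while each additional pit stop adds a fixed time penalty.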

In [None]:
import math

# Define the tire parameters and their lap time formulas
def lap_time_super_soft(X):
    return (92.3006197773077) + 0.282958307692307 * X

def lap_time_soft(X):
    return (93.4651746647101) + 0.246570086956521 * X

def lap_time_medium(X):
    return (94.2806098566049) + 0.211755061728395 * X

def lap_time_hard(X):
    return (93.4813606710215) + 0.259557913978495 * X

# Tire data with their lifespan
tire_lifespan = {
    "super_soft": 13,
    "soft": 23,
    "medium": 27,
    "hard": 31
}

# Pit stop penalty
pit_stop_time = 30  # seconds

# Function to calculate the total race time for a given strategy
def calculate_race_time(laps, strategy):
    total_time = 0
    total_pit_stops = 0
    lap_index = 0
    lap_counter = 0

    while lap_counter < laps:
        tire, stint_laps = strategy[lap_index]

        # Ensure we don't exceed the total laps
        if lap_counter + stint_laps > laps:
            stint_laps = laps - lap_counter

        # Look up the lap-time model for this tire (raises KeyError for an unknown tire)
        lap_time_fn = {
            "super_soft": lap_time_super_soft,
            "soft": lap_time_soft,
            "medium": lap_time_medium,
            "hard": lap_time_hard,
        }[tire]

        # Lap times for this stint; X is the tire age in laps, starting at 1
        lap_times = [lap_time_fn(i + 1) for i in range(stint_laps)]

        total_time += sum(lap_times)  # Add the lap times of this stint
        lap_counter += stint_laps

        # If we are not at the last stint, account for a pit stop
        if lap_counter < laps:
            total_time += pit_stop_time  # Pit stop penalty
            total_pit_stops += 1

        lap_index += 1
        if lap_index >= len(strategy):
            break

    return total_time, total_pit_stops

# Function to generate possible strategies dynamically
def generate_strategies(laps):
    strategies = []
    tire_choices = ["super_soft", "soft", "medium", "hard"]

    # Generate strategies with up to five stints; every stint except the last
    # runs for the full lifespan of its tire
    for tire1 in tire_choices:
        for tire2 in tire_choices:
            for tire3 in tire_choices:
                for tire4 in tire_choices:
                    for tire5 in tire_choices:
                        strategy = []
                        remaining_laps = laps

                        # Fill the race stint by stint
                        # (shorten this list to cap the number of stints)
                        for tire in [tire1, tire2, tire3, tire4, tire5]:
                            stint_laps = tire_lifespan[tire]

                            if remaining_laps > stint_laps:
                                strategy.append((tire, stint_laps))
                                remaining_laps -= stint_laps
                            else:
                                strategy.append((tire, remaining_laps))
                                break

                        # Keep only strategies that cover the full distance;
                        # skip duplicates created by unused trailing tire choices
                        if sum(stint[1] for stint in strategy) == laps and strategy not in strategies:
                            strategies.append(strategy)

    return strategies

# Function to find the best strategy
def optimize_strategy(laps):
    best_time = math.inf
    best_strategy = None
    best_pit_stops = None  # stays None if no valid strategy exists

    strategies = generate_strategies(laps)

    for strategy in strategies:
        total_time, pit_stops = calculate_race_time(laps, strategy)
        if total_time < best_time:
            best_time = total_time
            best_strategy = strategy
            best_pit_stops = pit_stops

    return best_strategy, best_time, best_pit_stops


# Main function
if __name__ == "__main__":
    race_laps = 79
    best_strategy, best_time, total_pit_stops = optimize_strategy(race_laps)
    print(f"Best Strategy: {best_strategy}")
    print(f"Best Total Time: {best_time} seconds")
    print(f"Total Pit Stops: {total_pit_stops}")
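
Because each lap-time model above is linear in the tire age X, the time of a whole stint has a closed form: for lap time `base + slope * X` with X running from 1 to n, the stint total is `n * base + slope * n * (n + 1) / 2`. The sketch below checks this against the explicit lap-by-lap sum. The function name `stint_time` is a hypothetical helper introduced for illustration, and the coefficients are copied from `lap_time_super_soft`.

```python
import math


def stint_time(base, slope, n_laps):
    """Total time of a stint whose lap time is base + slope * X for X = 1..n_laps."""
    return n_laps * base + slope * n_laps * (n_laps + 1) / 2


# Coefficients copied from lap_time_super_soft
base, slope = 92.3006197773077, 0.282958307692307

# Explicit lap-by-lap sum for a full 13-lap super-soft stint
explicit = sum(base + slope * x for x in range(1, 14))

# The closed form and the explicit sum agree up to floating-point rounding
assert math.isclose(stint_time(base, slope, 13), explicit)
```

The quadratic term is what makes long stints expensive: doubling the stint length more than doubles the time lost to tire wear, which is the effect that has to be weighed against the 30-second pit stop penalty.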

## **References**

Liu, Xuan; Shi, Savannah Wei; Teixeira, Thales; Wedel, Michel (2018): Video Content Marketing: The Making of Clips, Journal of Marketing, Vol. 82, 86-101.