
# **Race Team Project Notebook**
Names:

Matr.:

Please write down the name of the group member who worked on each section of code. This is necessary for grading by the Studienbüro.

## **Analytics for Race 1**

- For each race day, build one section with the code that you built for this race
- If you keep the same code during the next races, copy the code over to the next section
- Describe why you built your analytics the way you did, and interpret after each race what went well and what you want to improve
- In your motivation and interpretation, try to cite online sources that you read to make better decisions (e.g., scientific articles on machine learning, blog posts, GitHub pages)

In [None]:
# Team members working on this code: Paula, Cedric, Hannes

First we read the CSV file "simulator_data.csv" into a pandas DataFrame for analysis. We display the first few rows, list the column names, print summary statistics, and show a concise overview of the dataset's structure.


In [None]:
# Reading data and summarizing contents
#https://www.datacamp.com/tutorial/pandas-read-csv
import pandas as pd

df = pd.read_csv("simulator_data.csv")
print(df.head())
print(df.columns)
print(df.describe())
df.info()  # info() prints its summary directly and returns None


To see what the data looks like, we create scatter plots showing the relationship between each parameter in the dataset and lap time.

In [None]:
# Scatterplots (Parameter vs. Lap Time)


import matplotlib.pyplot as plt
import seaborn as sns

param_cols = df.columns.drop("Lap Time")

for col in param_cols:
    plt.figure()
    sns.scatterplot(x=df[col], y=df["Lap Time"], alpha=0.3)
    plt.title(f"{col} vs. Lap Time")
    plt.show()


We noticed that the laps in "simulator_data.csv" have different lengths. We therefore divide "Lap Distance" by "Lap Time" and multiply by 3600 to get "Avg. Speed" in km/h (this assumes the distance is given in km and the lap time in seconds).

Then we again generate scatter plots to visualize the relationship between each parameter (excluding "Lap Time" and "Avg. Speed") and the calculated average speed.

In [None]:
# Scatterplots (Parameter vs. Avg. Speed)

import matplotlib.pyplot as plt
import seaborn as sns

# Compute average speed (if not present yet)
if "Avg. Speed" not in df.columns:
    df["Avg. Speed"] = df["Lap Distance"] / (df["Lap Time"] / 3600)

# Use all columns except "Lap Time" and "Avg. Speed"
param_cols = df.columns.drop(["Lap Time", "Avg. Speed"])

# Create scatter plots
for col in param_cols:
    plt.figure()
    sns.scatterplot(x=df[col], y=df["Avg. Speed"], alpha=0.3)
    plt.title(f"{col} vs. Avg. Speed")
    plt.xlabel(col)
    plt.ylabel("Avg. Speed (km/h)")
    plt.grid(True)
    plt.tight_layout()
    plt.show()
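
The unit conversion used above can be checked on a single lap. This is only a minimal sketch: the helper name `avg_speed_kmh` and the example lap are hypothetical, and it assumes the distance is in km and the lap time in seconds.

```python
def avg_speed_kmh(lap_distance_km: float, lap_time_s: float) -> float:
    """Average speed in km/h from a distance in km and a time in seconds."""
    if lap_time_s <= 0:
        raise ValueError("lap_time_s must be positive")
    # Seconds -> hours is a division by 3600, same as in the DataFrame code.
    return lap_distance_km / (lap_time_s / 3600)


# Hypothetical lap: 3.7 km driven in 70 s.
example_speed = avg_speed_kmh(3.7, 70.0)
print(f"{example_speed:.1f} km/h")
```

Working with speed instead of lap time makes laps of different length comparable, which is the reason for the conversion.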

Because the scatter plots did not yield much insight, we wanted to see how the different parameters correlate with each other. We hoped to be able to use logic to figure out how to set the car parameters.

We calculated the correlation matrix to explore how the variables relate to each other. The correlations are visualized using a heatmap to easily identify weak or strong linear relationships between parameters.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Step 1: Load your CSV file into a pandas DataFrame
df = pd.read_csv("simulator_data.csv")  # replace with your file path

# Step 2: Calculate the correlation matrix
corr_matrix = df.corr()

# Step 3: Plot the correlation matrix
plt.figure(figsize=(20, 20))  # adjust size as needed
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", fmt=".2f", square=True, vmin=-0.05, vmax=0.05)
plt.title("Correlation Matrix")
plt.show()
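
One possible reason for weak correlations is that `df.corr()` computes the Pearson coefficient by default, which only measures linear association. The sketch below uses made-up data (not the simulator data) and a hand-written `pearson` helper to illustrate that a parameter with an optimum in the middle of its range can have a correlation of about zero even though it fully determines the outcome.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical setting values, symmetric around zero.
setting = [float(v) for v in range(-5, 6)]
linear_effect = [2.0 * v + 1.0 for v in setting]   # straight-line relationship
u_shaped_effect = [v ** 2 for v in setting]        # best value in the middle

r_linear = pearson(setting, linear_effect)
r_u_shaped = pearson(setting, u_shaped_effect)
print(f"linear: {r_linear:.2f}, u-shaped: {r_u_shaped:.2f}")
```

If the car parameters behave like the U-shaped case, a flat heatmap would not mean that they are irrelevant; this is an assumption about the simulator, not something the data above proves.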


The next four code blocks are first attempts at using ML models to predict lap time or speed and to better understand the parameters.

These attempts did not lead to much. They are still included because they were an important step on the way to our final solution for this week's race analytics.

The four code blocks are:
1. A first attempt to use a random forest model and Optuna to find the best car parameters
2. To further understand the model parameters, we used an ML model to create a SHAP diagram
3. Using the model from the SHAP diagram, we try to find optimal car parameters using Optuna
4. To improve the ML model's hyperparameters, we ran a grid search, which took very long and did not lead to noticeably better results

In [None]:
!pip install optuna
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import optuna

# === 1. Load data ===
df = pd.read_csv("simulator_data.csv")

# Target and feature columns
target_col = "Lap Time"
feature_cols = df.columns.drop(target_col)

# === 2. Train model ===
X = df[feature_cols]
y = df[target_col]

# Optional: scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train/test split (e.g., for validation)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Random Forest Regressor
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# === 3. Bayesian Optimization Setup (Optuna) ===

# Given: environmental conditions
fixed_conditions = {
    'Lap Distance': 3.7,
    'Cornering': 6,
    'Inclines': 20,
    'Camber': 44,
    'Grip': 1,
    'Wind (Avg. Speed)': 97,
    'Temperature': 29,
    'Humidity': 23,
    'Air Density': 70,
    'Air Pressure': 98,
    'Wind (Gusts)': 49,
    'Altitude': 31,
    'Roughness': 49,
    'Width': 29
}

def objective(trial):
    # Car parameters to optimize
    params = {
        'Rear Wing': trial.suggest_float('Rear Wing', 0.0, 500),
        'Engine': trial.suggest_float('Engine', 0.0, 500),
        'Front Wing': trial.suggest_float('Front Wing', 0.0, 500),
        'Brake Balance': trial.suggest_float('Brake Balance', 0.0, 500),
        'Differential': trial.suggest_float('Differential', 0.0, 500),
        'Suspension': trial.suggest_float('Suspension', 0.0, 500),
    }

    # Combine with the fixed conditions
    full_input = {**fixed_conditions, **params}
    # Reorder the columns to match the training data; the scaler and the
    # model both expect the features in the order they were fitted on
    X_input = pd.DataFrame([full_input])[feature_cols]
    X_input_scaled = scaler.transform(X_input)

    # Model prediction
    lap_time = model.predict(X_input_scaled)[0]
    return lap_time

# Start the Optuna study
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)

# Results
print("Best parameter combination:")
for key, value in study.best_params.items():
    print(f"  {key}: {value:.4f}")
print(f"Expected lap time: {study.best_value:.4f} seconds")

In [None]:
import shap
import numpy  as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# 1. Compute average speed if it is not present yet
if "Avg. Speed" not in df.columns:
    df["Avg. Speed"] = df["Lap Distance"] / (df["Lap Time"] / 3600)

# 2. Define features and target
feature_cols = [
    'Lap Distance', 'Cornering', 'Inclines', 'Camber', 'Grip',
    'Wind (Avg. Speed)', 'Temperature', 'Humidity', 'Air Density',
    'Air Pressure', 'Wind (Gusts)', 'Altitude', 'Roughness', 'Width',
    'Rear Wing', 'Engine', 'Front Wing', 'Brake Balance', 'Differential',
    'Suspension'
]

X = df[feature_cols]
y = df["Avg. Speed"]

# 3. Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# 4. Train model (XGBoost)
#model = xgb.XGBRegressor(n_estimators=100, max_depth=5, random_state=42)
model = xgb.XGBRegressor(colsample_bytree=1.0, learning_rate=0.05, max_depth=6, min_child_weight=1, n_estimators=300, subsample=0.8)
model.fit(X_train, y_train)
y_hat = model.predict(X_test)
error = np.mean(np.abs(y_test - y_hat))  # mean absolute error on the test set
print(error)

# 5. Compute SHAP values
explainer = shap.Explainer(model)
shap_values = explainer(X_test)

# 6. SHAP summary plot
shap.summary_plot(shap_values, X_test)


In [None]:

import optuna
import numpy as np
import pandas as pd

# You can get this from df.mean().values or pick a sample row
base_input = df[feature_cols].mean().values

for fc in fixed_conditions:
    base_input[feature_cols.index(fc)] = fixed_conditions[fc]

# Indices of the features we want to optimize
optim_features = ['Rear Wing', 'Engine', 'Front Wing', 'Brake Balance', 'Differential', 'Suspension']
optim_indices = [feature_cols.index(f) for f in optim_features]

# Search bounds for each car parameter (hard-coded here; assumed to match the dataset's min/max)
feature_bounds = {
    'Rear Wing': (1, 500),
    'Engine': (1, 500),
    'Front Wing': (1, 500),
    'Brake Balance': (1, 500),
    'Differential': (1, 500),
    'Suspension': (1, 500),
}

# Objective function for Optuna
def objective(trial):
    x = base_input.copy()

    for f in optim_features:
        val = trial.suggest_int(f, int(feature_bounds[f][0]), int(feature_bounds[f][1]))
        x[feature_cols.index(f)] = val

    pred = model.predict(np.array([x]))[0]
    return pred  # Optuna will maximize if we tell it to

# Run Optuna study
optuna.logging.disable_default_handler()
study = optuna.create_study(direction='maximize')  # use 'minimize' if lower lap time is better
study.optimize(objective, n_trials=255)


# Show results
print("Best params:")
for k, v in study.best_params.items():
    print(f"{k}: {v}")

print(f"Max predicted avg speed: {study.best_value}")


In [None]:
#Grid Search for best Model Parameters

import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor
import numpy as np

# Load the dataset
df = pd.read_csv("simulator_data.csv")

# Create the target variable: average speed (distance per second; not scaled by 3600 as in the earlier cells)
df['Avg Speed'] = df['Lap Distance'] / df['Lap Time']

# Drop unused columns
X = df.drop(columns=['Lap Distance', 'Lap Time', 'Avg Speed'])
y = df['Avg Speed']

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Define XGBoost Regressor
model = XGBRegressor(objective='reg:squarederror', random_state=42)

# Define grid search parameters
param_grid = {
    'max_depth': [4, 6, 8],
    'learning_rate': [0.01, 0.05, 0.1],
    'n_estimators': [100, 300],
    'subsample': [0.8, 1.0],
    'colsample_bytree': [0.8, 1.0],
    'min_child_weight': [1, 5]
}

# Set up GridSearchCV
grid_search = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    scoring='neg_root_mean_squared_error',
    cv=3,
    verbose=2,
    n_jobs=-1
)

# Fit model
grid_search.fit(X_train, y_train)

# Best model
best_model = grid_search.best_estimator_
print("Best Parameters:", grid_search.best_params_)

# Evaluate
y_pred = best_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(f"Test RMSE: {rmse:.4f}")
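
The long runtime of the grid search follows from the size of the grid. The sketch below only counts model fits; it copies `param_grid` and `cv=3` from the cell above, and the variable names `n_combinations` and `n_cv_fits` are ours.

```python
import math

# Same grid as in the GridSearchCV cell above.
param_grid = {
    'max_depth': [4, 6, 8],
    'learning_rate': [0.01, 0.05, 0.1],
    'n_estimators': [100, 300],
    'subsample': [0.8, 1.0],
    'colsample_bytree': [0.8, 1.0],
    'min_child_weight': [1, 5],
}
cv_folds = 3

# A full grid search tries every combination of the listed values.
n_combinations = math.prod(len(values) for values in param_grid.values())

# Each combination is trained once per cross-validation fold.
n_cv_fits = n_combinations * cv_folds

print(f"{n_combinations} combinations, {n_cv_fits} cross-validation fits")
```

Every extra value in any list multiplies the total, so a randomized search or an Optuna study with a fixed trial budget may be a cheaper alternative; we did not test this here.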


# Determining which parameters have the biggest impact on lap times / speeds



This code trains an XGBoost regressor to predict the car's average speed from the track, weather, and car setup features. First, it loads the data and calculates the average speed by dividing the lap distance by the lap time. It then drops the columns that are not used as features ("Lap Distance", "Lap Time", "Avg Speed") and splits the data into training and test sets. After training, it reads the model's feature importances to see which variables most strongly influence the predicted average speed. **Finally, it plots a bar chart of the feature importances to show the relative impact of each feature.**

In [None]:
import pandas as pd
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Load and preprocess
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = df["Lap Distance"] / df["Lap Time"]

X = df.drop(columns=["Lap Distance", "Lap Time", "Avg Speed"])
y = df["Avg Speed"]

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = XGBRegressor()
model.fit(X_train, y_train)

# Plot feature importance
importances = model.feature_importances_
feat_importance = pd.Series(importances, index=X.columns).sort_values(ascending=False)

feat_importance.plot(kind="bar", figsize=(12, 6), title="Feature Importance")
plt.tight_layout()
plt.show()



# Determining which track/weather parameters have the biggest impact on each car parameter

This code performs feature importance analysis for various car setup parameters based on environmental and track/weather conditions. It uses the XGBoost Regressor to model the relationship between the track/weather variables and each car setup parameter. The data is split into training and test sets, and the model is evaluated using the R² score to measure prediction accuracy. After training, the code extracts and plots the feature importances for each car parameter, helping to identify which track/weather variables have the most significant impact on each car setup.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load your data
df = pd.read_csv("simulator_data.csv")

# Define inputs (track/weather features) and targets (car parameters)
track_weather_features = [
    'Lap Distance', 'Cornering', 'Inclines', 'Camber', 'Grip',
    'Wind (Avg. Speed)', 'Temperature', 'Humidity', 'Air Density',
    'Air Pressure', 'Wind (Gusts)', 'Altitude', 'Roughness', 'Width'
]

car_setup_params = ['Rear Wing', 'Engine', 'Front Wing', 'Brake Balance', 'Differential', 'Suspension']

# Store feature importances for each car parameter
feature_importance_dict = {}

# Loop through car parameters
for param in car_setup_params:
    X = df[track_weather_features]
    y = df[param]

    # Train/test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train model
    model = XGBRegressor(random_state=42)
    model.fit(X_train, y_train)

    # Predict and calculate R² score
    y_pred = model.predict(X_test)
    score = r2_score(y_test, y_pred)
    print(f"\n{param} - R²: {score:.3f}")

    # Get feature importances and sort them in descending order
    importances = model.feature_importances_
    importance_series = pd.Series(importances, index=X.columns).sort_values(ascending=False)

    # Store importances
    feature_importance_dict[param] = importance_series

    # Display the features with importance > 0.075
    print(f"\nFor {param}, track/weather parameters with importance > 0.075:")
    for feature, importance in importance_series.items():
        if importance > 0.075:
            print(f" - {feature}: {importance:.3f}")

    # Optional: Plot feature importances
    importance_series.plot(kind='barh', title=f"Feature Importance for {param}", figsize=(8, 5))
    plt.gca().invert_yaxis()
    plt.tight_layout()
    plt.show()
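
The R² score printed for each car parameter indicates how much weight the importances deserve. The sketch below computes R² by hand on made-up numbers; the helper `r2` is our own and only mirrors the usual definition 1 - SS_res / SS_tot.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]  # hypothetical target values

r2_perfect = r2(y_true, [1.0, 2.0, 3.0, 4.0])  # exact predictions
r2_mean = r2(y_true, [2.5, 2.5, 2.5, 2.5])     # always predict the mean
r2_worse = r2(y_true, [4.0, 3.0, 2.0, 1.0])    # worse than the mean

print(r2_perfect, r2_mean, r2_worse)
```

A score near zero or below means the model explains little of the variation in the target, so its feature importances should be read with caution.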

In [None]:
# Basically the same as the code before, but the other car parameters are also included in the models to determine the parameter importance
# This is done to see how different car parameters might influence each other

import pandas as pd
import matplotlib.pyplot as plt
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load your data
df = pd.read_csv("simulator_data.csv")

# Define inputs (track/weather features) and targets (car parameters)
track_weather_features = [
    'Lap Distance', 'Cornering', 'Inclines', 'Camber', 'Grip',
    'Wind (Avg. Speed)', 'Temperature', 'Humidity', 'Air Density',
    'Air Pressure', 'Wind (Gusts)', 'Altitude', 'Roughness', 'Width'
]

car_setup_params = ['Rear Wing', 'Engine', 'Front Wing', 'Brake Balance', 'Differential', 'Suspension']

# Store feature importances for each car parameter
feature_importance_dict = {}

# Loop through car parameters
for param in car_setup_params:
    # Include track/weather features and other car parameters (excluding the target parameter itself)
    other_car_params = [p for p in car_setup_params if p != param]
    X = df[track_weather_features + other_car_params]
    y = df[param]

    # Train/test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train model
    model = XGBRegressor(random_state=42)
    model.fit(X_train, y_train)

    # Predict and calculate R² score
    y_pred = model.predict(X_test)
    score = r2_score(y_test, y_pred)
    print(f"\n{param} - R²: {score:.3f}")

    # Get feature importances and sort them
    importances = model.feature_importances_
    importance_series = pd.Series(importances, index=X.columns).sort_values(ascending=False)

    # Store importances
    feature_importance_dict[param] = importance_series

    # Display the features with importance > 0.075
    print(f"\nFor {param}, all parameters with importance > 0.075:")
    for feature, importance in importance_series.items():
        if importance > 0.075:
            print(f" - {feature}: {importance:.3f}")

    # Optional: Plot feature importances
    importance_series.plot(kind='barh', title=f"Feature Importance for {param}", figsize=(8, 5))
    plt.gca().invert_yaxis()  # Invert y-axis for better readability
    plt.tight_layout()
    plt.show()


In [None]:
!pip install optuna

## **Sequential optimization of car parameters**

We optimize each car parameter in a predefined order, so that each optimization builds on the previous ones. The order follows feature importance, from highest to lowest (see above).

For each car parameter, we first train a predictive model for average speed using the relevant track/weather variables and the already optimized parameters. Then, using Optuna, we search for the value of the current car parameter that maximizes the predicted average speed.

Which track and weather parameters are relevant for each car parameter was determined before (see above).

Order of optimization and relevant track/weather parameters:

1. Engine: Grip, Altitude, Humidity, Air Density, Temperature, Air Pressure, Inclines
2. Differential: Cornering, Width, Inclines, Grip, Temperature, Air Density
3. Rear Wing: Air Pressure, Air Density, Cornering, Inclines, Wind (Avg. Speed), Humidity, Roughness
4. Brake Balance: Width, Cornering, Roughness, Temperature
5. Front Wing: Air Pressure, Cornering, Air Density, Inclines, Wind (Avg. Speed), Humidity, Wind (Gusts)
6. Suspension: Grip, Inclines, Cornering, Camber, Width, Roughness
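
Optimizing one parameter at a time is a form of coordinate-wise search, and its result can depend on the order when parameters interact. The sketch below is a toy illustration, not our actual pipeline: `toy_score`, the parameter names `a` and `b`, and the candidate range are all hypothetical, and no model is retrained between steps.

```python
from typing import Callable, Dict, List

Setup = Dict[str, int]


def toy_score(setup: Setup) -> float:
    """Hypothetical objective with an interaction term; best at a=3, b=3."""
    return -(setup["a"] - 3) ** 2 - 2 * (setup["b"] - setup["a"]) ** 2


def sequential_search(score: Callable[[Setup], float], order: List[str],
                      candidates: List[int], start: Setup) -> Setup:
    """Optimize one parameter at a time, keeping earlier choices fixed."""
    current = dict(start)
    for name in order:
        # Try every candidate for this parameter with the others held fixed.
        current[name] = max(candidates, key=lambda v: score({**current, name: v}))
    return current


candidates = list(range(0, 6))
start = {"a": 0, "b": 0}

a_first = sequential_search(toy_score, ["a", "b"], candidates, start)
b_first = sequential_search(toy_score, ["b", "a"], candidates, start)

print("a first:", a_first, toy_score(a_first))
print("b first:", b_first, toy_score(b_first))
```

In this toy case the two orders end at different setups and neither reaches the best one in a single pass, which is why comparing the forward and the reversed order, or repeating the pass, seems worthwhile.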


In [None]:
import pandas as pd
import numpy as np
import optuna
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split

# Load data
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"])*3600

# Fixed track & weather settings — user-defined!
fixed_conditions = {
    "Cornering": 6,
    "Inclines": 20,
    "Camber": 44,
    "Grip": 1,
    "Altitude": 31,
    "Roughness": 49,
    "Width": 29,
    "Temperature": 29,
    "Humidity": 23,
    "Wind (Avg. Speed)": 97,
    "Wind (Gusts)": 49,
    "Air Density": 70,
    "Air Pressure": 98,
}

# Order of optimization and relevant features
optimization_order = [
    ("Engine", ["Grip", "Altitude", "Humidity", "Air Density", "Temperature", "Air Pressure", "Inclines"]),
    ("Differential", ["Cornering", "Width", "Inclines", "Grip", "Temperature", "Air Density"]),
    ("Rear Wing", ["Air Pressure", "Air Density", "Cornering", "Inclines", "Wind (Avg. Speed)", "Humidity", "Roughness"]),
    ("Brake Balance", ["Width", "Cornering", "Roughness", "Temperature"]),
    ("Front Wing", ["Air Pressure", "Cornering", "Air Density", "Inclines", "Wind (Avg. Speed)", "Humidity", "Wind (Gusts)"]),
    ("Suspension", ["Grip", "Inclines", "Cornering", "Camber", "Width", "Roughness"]),
]

# Storage for optimized parameters
optimized_params = {}


# Optimization loop
for param, relevant_features in optimization_order:
    print(f"\n Optimizing {param}...")

    # Features for model = track/weather + already optimized + current param
    model_features = relevant_features + list(optimized_params.keys()) + [param]
    model_df = df[model_features + ["Avg Speed"]].dropna()

    X = model_df[model_features]
    y = model_df["Avg Speed"]

    # Train model
    model = XGBRegressor(random_state=42)
    model.fit(X, y)

    # Prepare fixed input for this stage
    input_row = {f: fixed_conditions[f] for f in relevant_features}
    input_row.update(optimized_params)  # Include already optimized parameters

    def objective(trial):
        trial_value = trial.suggest_int(param, 1, 500)
        row = input_row.copy()
        row[param] = trial_value
        df_input = pd.DataFrame([row])
        pred = model.predict(df_input)[0]
        return pred  # Maximizing Avg Speed

    optuna.logging.disable_default_handler()
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=10)
    print(f"Max predicted avg speed: {study.best_value}")

    best_val = study.best_params[param]
    optimized_params[param] = best_val

    print(f" Best {param}: {best_val}")

# Final output
print("\n All optimized parameters:")
for k, v in optimized_params.items():
    print(f"{k}: {v}")


In [None]:
All optimized parameters:
Engine: 30
Differential: 1
Rear Wing: 48
Brake Balance: 3
Front Wing: 37
Suspension: 124

In [None]:
# Same code as above, but the optimization order of the car parameters is reversed, n_trials is raised from 10 to 1000, and Optuna logging is left enabled.

import pandas as pd
import numpy as np
import optuna
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split

# Load data
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"])*3600

# Fixed track & weather settings — user-defined!
fixed_conditions = {
    "Cornering": 6,
    "Inclines": 20,
    "Camber": 44,
    "Grip": 1,
    "Altitude": 31,
    "Roughness": 49,
    "Width": 29,
    "Temperature": 29,
    "Humidity": 23,
    "Wind (Avg. Speed)": 97,
    "Wind (Gusts)": 49,
    "Air Density": 70,
    "Air Pressure": 98,
}

# Order of optimization and relevant features
optimization_order = [
    ("Suspension", ["Grip", "Inclines", "Cornering", "Camber", "Width", "Roughness"]),
    ("Front Wing", ["Air Pressure", "Cornering", "Air Density", "Inclines", "Wind (Avg. Speed)", "Humidity", "Wind (Gusts)"]),
    ("Brake Balance", ["Width", "Cornering", "Roughness", "Temperature"]),
    ("Rear Wing", ["Air Pressure", "Air Density", "Cornering", "Inclines", "Wind (Avg. Speed)", "Humidity", "Roughness"]),
    ("Differential", ["Cornering", "Width", "Inclines", "Grip", "Temperature", "Air Density"]),
    ("Engine", ["Grip", "Altitude", "Humidity", "Air Density", "Temperature", "Air Pressure", "Inclines"])
]

# Storage for optimized parameters
optimized_params = {}


# Optimization loop
for param, relevant_features in optimization_order:
    print(f"\n Optimizing {param}...")

    # Features for model = track/weather + already optimized + current param
    model_features = relevant_features + list(optimized_params.keys()) + [param]
    model_df = df[model_features + ["Avg Speed"]].dropna()

    X = model_df[model_features]
    y = model_df["Avg Speed"]

    # Train model
    model = XGBRegressor(random_state=42)
    model.fit(X, y)

    # Prepare fixed input for this stage
    input_row = {f: fixed_conditions[f] for f in relevant_features}
    input_row.update(optimized_params)  # Include already optimized parameters

    def objective(trial):
        trial_value = trial.suggest_int(param, 1, 500)
        row = input_row.copy()
        row[param] = trial_value
        df_input = pd.DataFrame([row])
        pred = model.predict(df_input)[0]
        return pred  # Maximizing Avg Speed

    #optuna.logging.disable_default_handler()
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=1000)
    print(f"Max predicted avg speed: {study.best_value}")

    best_val = study.best_params[param]
    optimized_params[param] = best_val

    print(f" Best {param}: {best_val}")

# Final output
print("\n All optimized parameters:")
for k, v in optimized_params.items():
    print(f"{k}: {v}")


# Race Strategy

In [None]:
import math

# Define the tire parameters and their lap time formulas
def lap_time_super_soft(X):
    return (69.269-0.141812499999999) + 0.141812499999999 * X

def lap_time_soft(X):
    return (70.53-0.0667741935483869) + 0.0667741935483869 * X

def lap_time_medium(X):
    return (70.927-0.0670857142857143) + 0.0670857142857143 * X

def lap_time_hard(X):
    return (70.257-0.109609756097561) + 0.109609756097561 * X

# Tire data with their lifespan
tire_lifespan = {
    "super_soft": 16,
    "soft": 30,
    "medium": 35,
    "hard": 41
}

# Pit stop penalty
pit_stop_time = 30  # seconds

# Function to calculate the total race time for a given strategy
def calculate_race_time(laps, strategy):
    total_time = 0
    total_pit_stops = 0
    lap_index = 0
    lap_counter = 0

    while lap_counter < laps:
        tire, stint_laps = strategy[lap_index]

        # Ensure we don't exceed the total laps
        if lap_counter + stint_laps > laps:
            stint_laps = laps - lap_counter

        # Calculate the lap times for this stint
        lap_times = []
        for i in range(stint_laps):
            if tire == "super_soft":
                lap_times.append(lap_time_super_soft(i + 1))
            elif tire == "soft":
                lap_times.append(lap_time_soft(i + 1))
            elif tire == "medium":
                lap_times.append(lap_time_medium(i + 1))
            elif tire == "hard":
                lap_times.append(lap_time_hard(i + 1))

        total_time += sum(lap_times)  # Add the lap times of this stint
        lap_counter += stint_laps

        # If we are not at the last stint, account for a pit stop
        if lap_counter < laps:
            total_time += pit_stop_time  # Pit stop penalty
            total_pit_stops += 1

        lap_index += 1
        if lap_index >= len(strategy):
            break

    return total_time, total_pit_stops

# Function to generate possible strategies dynamically
def generate_strategies(laps):
    strategies = []
    tire_choices = ["super_soft", "soft", "medium", "hard"]

    # Generate strategies by breaking the laps into multiple stints
    for tire1 in tire_choices:
        for tire2 in tire_choices:
            for tire3 in tire_choices:
                #for tire4 in tire_choices:
                  #for tire5 in tire_choices:
                    strategy = []
                    remaining_laps = laps

                    # Create dynamic stints for each tire
                    #for tire in [tire1, tire2, tire3, tire4, tire5]:
                    #for tire in [tire1, tire2, tire3, tire4]:
                    for tire in [tire1, tire2, tire3]:
                    #for tire in [tire1, tire2]:
                        stint_laps = tire_lifespan[tire]

                        if remaining_laps > stint_laps:
                            strategy.append((tire, stint_laps))
                            remaining_laps -= stint_laps
                        else:
                            strategy.append((tire, remaining_laps))
                            break

                    if sum([stint[1] for stint in strategy]) == laps:
                        strategies.append(strategy)

    return strategies

# Function to find the best strategy
def optimize_strategy(laps):
    best_time = math.inf
    best_strategy = None
    best_pit_stops = None  # stays None if no valid strategy is generated

    strategies = generate_strategies(laps)

    for strategy in strategies:
        total_time, pit_stops = calculate_race_time(laps, strategy)
        if total_time < best_time:
            best_time = total_time
            best_strategy = strategy
            best_pit_stops = pit_stops

    return best_strategy, best_time, best_pit_stops


# Main function
if __name__ == "__main__":
    race_laps = 83
    best_strategy, best_time, total_pit_stops = optimize_strategy(race_laps)
    print(f"Best Strategy: {best_strategy}")
    print(f"Best Total Time: {best_time} seconds")
    print(f"Total Pit Stops: {total_pit_stops}")


In [None]:
Best Strategy: [('super_soft', 16), ('super_soft', 16), ('super_soft', 16), ('super_soft', 16), ('soft', 19)]
Best Total Time: 5972.774387096774 seconds

Best Strategy: [('super_soft', 16), ('super_soft', 16), ('soft', 30), ('soft', 21)]
Best Total Time: 5980.742354838709 seconds

Best Strategy: [('soft', 30), ('soft', 30), ('soft', 23)]
Best Total Time: 5988.977419354838 seconds
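
Because each lap time formula above is linear in the stint lap number, the total time of a stint has a closed form: n laps starting at a first-lap time t1 with slope s take n * t1 + s * n * (n - 1) / 2. The sketch below checks this against a lap-by-lap sum and against the three-stint soft result listed above; the helper name `stint_time` is ours, and the constants are copied from `lap_time_soft` and `pit_stop_time`.

```python
def stint_time(first_lap_s: float, slope_s: float, laps: int) -> float:
    """Total time of a stint whose k-th lap takes first_lap_s + slope_s * (k - 1)."""
    return laps * first_lap_s + slope_s * laps * (laps - 1) / 2


# Constants copied from lap_time_soft and pit_stop_time above.
SOFT_FIRST_LAP = 70.53
SOFT_SLOPE = 0.0667741935483869
PIT_STOP_S = 30

# Cross-check the closed form against an explicit lap-by-lap sum.
looped = sum(SOFT_FIRST_LAP + SOFT_SLOPE * (k - 1) for k in range(1, 31))
closed = stint_time(SOFT_FIRST_LAP, SOFT_SLOPE, 30)

# Three soft stints (30, 30, 23 laps) with two pit stops in between.
total = sum(stint_time(SOFT_FIRST_LAP, SOFT_SLOPE, n) for n in (30, 30, 23))
total += 2 * PIT_STOP_S

print(f"30-lap soft stint: {closed:.3f} s, full race: {total:.3f} s")
```

The quadratic term shows why short stints on fast-degrading tires can pay off despite the extra 30 s per pit stop.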

Random car parameters, to check whether the optimized values are actually better than arbitrary ones.

In [None]:
from random import randint

print(f'Rear Wing {randint(1, 500)}')
print(f'Engine {randint(1, 500)}')
print(f'Front Wing {randint(1, 500)}')
print(f'Brake Balance {randint(1, 500)}')
print(f'Differential {randint(1, 500)}')
print(f'Suspension {randint(1, 500)}')
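
To make such a random baseline repeatable, the draw can be tied to a seed. This is a minimal sketch: the helper `random_setup` and the seed value are hypothetical, and the 1 to 500 range is taken from the cell above.

```python
import random

CAR_PARAMS = ["Rear Wing", "Engine", "Front Wing",
              "Brake Balance", "Differential", "Suspension"]


def random_setup(seed: int, low: int = 1, high: int = 500) -> dict:
    """Draw one random car setup; the same seed always gives the same setup."""
    rng = random.Random(seed)  # local generator, leaves the global state alone
    return {name: rng.randint(low, high) for name in CAR_PARAMS}


baseline = random_setup(seed=0)
for name, value in baseline.items():
    print(f"{name} {value}")
```

With a fixed seed, the same baseline setup can be reported in the notebook and compared against the optimized values later.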



## **Analytics for Race 2**

In [None]:
# Team members working on this code: Names..

In [None]:
!pip install optuna

In [None]:
import optuna
import lightgbm as lgb
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"])*3600

# Example: Regression task
def objective(trial):
    param = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.2, log=True),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
        'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
        'force_col_wise': True  # Optional but often speeds things up
    }


    # K-Fold Cross-Validation
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    X = df.drop(columns=['Lap Distance', 'Lap Time', 'Avg Speed'])
    y = df["Avg Speed"]
    scores = []

    for train_idx, val_idx in kf.split(X):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]

        dtrain = lgb.Dataset(X_train, label=y_train)
        dvalid = lgb.Dataset(X_val, label=y_val)

        model = lgb.train(param, dtrain, num_boost_round=1000,
                          valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(10)])

        preds = model.predict(X_val)
        #rmse = mean_squared_error(y_val, preds, squared=False)
        rmse = np.sqrt(((y_val - preds)**2).mean())
        scores.append(rmse)

    return np.mean(scores)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)

print("Best trial:")
print(study.best_trial.params)


In [None]:
import pandas as pd
import lightgbm as lgb
import matplotlib.pyplot as plt

# Load the data
df = pd.read_csv("simulator_data.csv")
df["Avg Speed"] = (df["Lap Distance"] / df["Lap Time"]) * 3600

# Define X and y
X = df.drop(columns=['Lap Distance', 'Lap Time', 'Avg Speed'])
y = df["Avg Speed"]

# Set the hyperparameters
params = {
    'num_leaves': 242,
    'max_depth': 9,
    'learning_rate': 0.0645474493988489,
    'min_data_in_leaf': 96,
    'feature_fraction': 0.9427185636727835,
    'bagging_fraction': 0.785711983453578,
    'bagging_freq': 2,
    'lambda_l1': 0.6006835353397709,
    'lambda_l2': 1.4905819957883624,
    'objective': 'regression',
    'metric': 'l2',
}

# Train the model
train_data = lgb.Dataset(X, label=y)
model = lgb.train(params,
                  train_data,
                  num_boost_round=100)

# Get feature importance by 'split' and 'gain'
importance_split = model.feature_importance(importance_type='split')
importance_gain = model.feature_importance(importance_type='gain')

# Sort feature importance by 'split' (ascending, so barh draws the largest bar at the top)
sorted_split_idx = importance_split.argsort()
sorted_split_importance = importance_split[sorted_split_idx]
sorted_split_features = X.columns[sorted_split_idx]

# Sort feature importance by 'gain' (ascending, so barh draws the largest bar at the top)
sorted_gain_idx = importance_gain.argsort()
sorted_gain_importance = importance_gain[sorted_gain_idx]
sorted_gain_features = X.columns[sorted_gain_idx]

# Plot feature importance by 'split'
plt.figure(figsize=(12, 6))
plt.barh(sorted_split_features, sorted_split_importance , color='red')
plt.title('Feature Importance by Split')
plt.xlabel('Number of Splits')
plt.ylabel('Features')
plt.show()

# Plot feature importance by 'gain'
plt.figure(figsize=(12, 6))
plt.barh(sorted_gain_features, sorted_gain_importance, color='blue')
plt.title('Feature Importance by Gain')
plt.xlabel('Gain')
plt.ylabel('Features')
plt.show()


## **Analytics for Race 3**

In [None]:
# Team members working on this code: Names..

## **Analytics for Race 4**

In [None]:
# Team members working on this code: Names..

## **Analytics for Race 5**

In [None]:
# Team members working on this code: Names..

## **Debrief Race Calendar before Final Race**

*Write a longer text (200-500 words) reflecting on what were the main ideas you started the seminar with, how you improved your models to achieve better performance and what strategy and analytics you want to use for your final race during seminar day*

## **Analytics for Final Race**

In [None]:
# Team members working on this code: Names..

## **References**
- Cite all references you need according to chair guidelines

Liu, Xuan; Shi, Savannah Wei; Teixeira, Thales; Wedel, Michel (2018): Video Content Marketing: The Making of Clips, Journal of Marketing, Vol. 82, 86-101.