## Lap Time Prediction Analysis Using Multiple Linear Regression

`Interlagos` (Autódromo José Carlos Pace) is a circuit located in São Paulo, Brazil. At 4.309 km long, it is one of the few tracks run counterclockwise, combining fast straights with a technical, winding section in sector 2.

The weather is unpredictable, with sudden rain showers that affect strategy. Teams generally opt for two-stop strategies because of tyre degradation. It is therefore key for drivers to get good traction out of the final corner to maximize speed on the straight, and to manage the tyres efficiently in the middle sector, which has tight-radius corners and high aerodynamic load.

![Interlagos](../img/interlagos.jpg)

Let us carry out a study tailored to the conditions of this track. We start from all of our variables and discard them one at a time through an iterative method:


In [61]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import seaborn as sns
import io
from urllib.request import urlopen
import statsmodels.api as sm

data = pd.read_csv("formula1_interlagos_data_final.csv")

# Print column names
print("Available columns:")
print(data.columns.tolist())


def summary_circuit(circuit_name):
    # Keep only the rows for the requested circuit
    circuit_data = data[data['Circuit'] == circuit_name].copy()

    if circuit_data.empty:
        print(f"No data found for circuit {circuit_name}")
        print("Available circuits:")
        print(data['Circuit'].unique())
        return  # nothing to fit without data
    else:
        print(f"Data for circuit {circuit_name}:")
        # print(circuit_data.head())


    print(f"\nAnalyzing data for circuit: {circuit_name}")
    print(f"Number of races: {len(circuit_data)}")

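    # Select relevant features.
    # Note: 'DownForceLevel' does not match the dataset column 'DownforceLevel',
    # so the existence check below drops it and only 11 features are used.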
    features = ['MaxSpeed','Overtakes','TrackTemperature','DriverSkill',
            'FuelConsumption','ReactionTime','PitStopTime','Experience',
            'CarPerformance','TrackFamiliarity','DownForceLevel','TyreWear']

    target = 'FinalRaceTime'

    # Check if columns exist and remove those that don't
    features = [f for f in features if f in data.columns]
    if target not in data.columns:
        raise ValueError(f"Target column '{target}' not found in the dataset")

    print(f"\nUsing features: {features}")
    print(f"Number of features : {len(features)}")
    print(f"Target: {target}")

    for feature in features:
        circuit_data[feature] = pd.to_numeric(circuit_data[feature], errors='coerce')
    circuit_data = circuit_data.dropna(subset=features + [target])


    # Build the feature matrix and target vector for this circuit
    X = circuit_data[features]
    y = circuit_data[target]


    # Split into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Scale the features
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Create and train the model
    model = LinearRegression()
    model.fit(X_train_scaled, y_train)

    # Print the equation of the hyperplane
    coefficients = model.coef_
    intercept = model.intercept_

    print("\nEquation of the hyperplane:")
    equation = f"FinalRaceTime = {intercept:.2f}"
    for feature, coef in zip(features, coefficients):
        equation += f" + ({coef:.2f} * {feature})"
    print(equation)

    # Calculate R-squared on the held-out test set
    r_squared = model.score(X_test_scaled, y_test)
    print(f"\nR-squared: {r_squared:.4f}")
    
   
    # Add a constant column for the intercept
    X_train_const = sm.add_constant(X_train_scaled)

    # Fit the regression model with statsmodels
    model_sm = sm.OLS(y_train, X_train_const).fit()

    # Show the summary, including the p-value of each coefficient
    print(model_sm.summary())

summary_circuit('Interlagos')

Available columns:
['Date', 'Driver', 'Age', 'Team', 'Circuit', 'PitStopTime', 'ReactionTime', 'FinalPosition', 'DNF', 'Points', 'MaxSpeed', 'Overtakes', 'Experience', 'DriverSkill', 'CarPerformance', 'TrackFamiliarity', 'WeatherCondition', 'TyreCompound', 'EngineMode', 'QualifyingPosition', 'TyreWear', 'FuelConsumption', 'DownforceLevel', 'FinalRaceTime', 'TrackTemperature', 'TrackGrip']
Data for circuit Interlagos:

Analyzing data for circuit: Interlagos
Number of races: 42

Using features: ['MaxSpeed', 'Overtakes', 'TrackTemperature', 'DriverSkill', 'FuelConsumption', 'ReactionTime', 'PitStopTime', 'Experience', 'CarPerformance', 'TrackFamiliarity', 'TyreWear']
Number of features : 11
Target: FinalRaceTime

Equation of the hyperplane:
FinalRaceTime = 270.36 + (-2.79 * MaxSpeed) + (1.26 * Overtakes) + (-0.98 * TrackTemperature) + (4.20 * DriverSkill) + (-0.04 * FuelConsumption) + (-0.72 * ReactionTime) + (-0.68 * PitStopTime) + (-8.09 * Experience) + (-1.27 * CarPerformance) + (0.19 

                            OLS Regression Results                            
Dep. Variable:          FinalRaceTime   R-squared:                       0.669
Model:                            OLS   Adj. R-squared:                  0.495
Method:                 Least Squares   F-statistic:                     3.857
Date:                Sun, 16 Feb 2025   Prob (F-statistic):            0.00383
Time:                        01:17:42   Log-Likelihood:                -94.490
No. Observations:                  33   AIC:                             213.0
Df Residuals:                      21   BIC:                             230.9
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        270.3552      0.925    292.246      0.0
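The gap between R-squared (0.669) and adjusted R-squared (0.495) in the summary above comes from fitting 11 predictors on only 33 training observations. As a rough check, the standard adjustment formula can be applied to the reported values; the function name `adjusted_r2` is an illustrative choice, and the small difference from 0.495 is expected because the printed R-squared is itself rounded.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values read from the OLS summary: R-squared, No. Observations, Df Model.
adj = adjusted_r2(0.669, 33, 11)
print(f"Adjusted R-squared: {adj:.4f}")
```

The denominator `n_obs - n_predictors - 1` is the 21 residual degrees of freedom shown in the summary, which is why each extra predictor is costly at this sample size.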

In [32]:
# Convert categorical variables into dummies
categorical_features = ['TyreCompound', 'WeatherCondition', 'EngineMode', 'TrackGrip']
circuit_data = pd.get_dummies(circuit_data, columns=categorical_features, drop_first=True)

# Select the features again after creating the dummies
features = [col for col in circuit_data.columns if col != target]

# Define X and y
X = circuit_data[features]
y = circuit_data[target]

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create and train the model
model = LinearRegression()
model.fit(X_train_scaled, y_train)

# Show the regression equation
coefficients = model.coef_
intercept = model.intercept_

print("\nEquation of the hyperplane:")
equation = f"FinalRaceTime = {intercept:.2f}"
for feature, coef in zip(features, coefficients):
    equation += f" + ({coef:.2f} * {feature})"
print(equation)

# Calculate R-squared
r_squared = model.score(X_test_scaled, y_test)
print(f"\nR-squared: {r_squared:.4f}")

KeyError: "None of [Index(['TyreCompound', 'WeatherCondition', 'EngineMode', 'TrackGrip'], dtype='object')] are in the [columns]"
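The `KeyError` says that none of the four categorical columns exist in `circuit_data` when `pd.get_dummies` runs. One plausible cause (an assumption, since the notebook state is not shown) is that the frame had already been encoded by an earlier run of the same cell, or that it was built without those columns. A minimal sketch of a more defensive version follows: the helper `encode_categoricals` and the toy `demo` frame are hypothetical, and the sketch also keeps only numeric or boolean columns as features so that text columns such as `Driver` or `Team` would not reach `StandardScaler`.

```python
import pandas as pd


def encode_categoricals(df, categorical_features):
    """Hypothetical helper: one-hot encode only the columns that exist,
    so re-running the cell does not raise a KeyError."""
    present = [c for c in categorical_features if c in df.columns]
    missing = [c for c in categorical_features if c not in df.columns]
    if missing:
        print(f"Skipping columns not found: {missing}")
    if not present:
        return df.copy()
    return pd.get_dummies(df, columns=present, drop_first=True)


# Toy data (assumption): a few rows with the same column names as the dataset.
demo = pd.DataFrame({
    "TyreCompound": ["Soft", "Medium", "Hard", "Soft"],
    "WeatherCondition": ["Dry", "Wet", "Dry", "Dry"],
    "MaxSpeed": [310.0, 305.5, 298.2, 312.4],
    "FinalRaceTime": [270.1, 275.3, 280.9, 268.7],
})
categorical = ["TyreCompound", "WeatherCondition", "EngineMode", "TrackGrip"]
target = "FinalRaceTime"

encoded = encode_categoricals(demo, categorical)
encoded_again = encode_categoricals(encoded, categorical)  # second run is a no-op

# Keep only numeric/boolean columns as features, excluding the target.
numeric_cols = encoded.select_dtypes(include=["number", "bool"]).columns
feature_cols = [c for c in numeric_cols if c != target]
print(feature_cols)
```

In the notebook itself, `circuit_data` and `target` are local to `summary_circuit`, so this cell would also need them defined at the top level before the encoding step.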