# Multiple Linear Regression - Predicting ImpactPlayerScore in CS:GO
This notebook follows the CRISP-DM methodology through phase 4 (Modeling) to fit and evaluate one regression model: **Multiple Linear Regression**.

In [None]:
# Load libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from google.colab import files

In [None]:
# Load data and create variables

url = "https://raw.githubusercontent.com/Seba-RiveraC/Crisp_DM_CSGO/refs/heads/master/Anexo%20ET_demo_round_traces_2022%20(1).csv"
df = pd.read_csv(url, sep=';')
# Create the derived variables
df['ImpactPlayerScore'] = df['RoundKills'] + 0.5 * df['RoundAssists'] + df['RoundHeadshots']
df['KAST'] = ((df['RoundKills'] > 0) | (df['RoundAssists'] > 0)).astype(int)
# To keep the Colab session from crashing, only a 50% sample of the dataset is used
df = df.sample(frac=0.5, random_state=42)
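
To make the two derived columns concrete, the sketch below applies the same formulas to a three-row toy frame. The row values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy rounds (hypothetical values, not taken from the real dataset)
toy = pd.DataFrame({
    "RoundKills": [2, 0, 0],
    "RoundAssists": [1, 1, 0],
    "RoundHeadshots": [1, 0, 0],
})

# Same formulas as in the cell above
toy["ImpactPlayerScore"] = (
    toy["RoundKills"] + 0.5 * toy["RoundAssists"] + toy["RoundHeadshots"]
)
# 1 if the player got at least one kill or assist in the round, else 0
toy["KAST"] = ((toy["RoundKills"] > 0) | (toy["RoundAssists"] > 0)).astype(int)

print(toy)
```

Note that `KAST` as defined here only captures kills and assists, and that both derived columns are built from the same source columns, so `KAST` is related to `ImpactPlayerScore` by construction.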

In [None]:
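# Phase 3 - Data preparation and cleaning
# Note: Phase 4 rebuilds df_model from df, so the model below does not use this encoded frame.
df_model = df.copy()
# Turn booleans into strings so that real booleans and boolean-like text share one representation.
# DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map.
df_model = df_model.applymap(lambda x: str(x) if isinstance(x, str) or isinstance(x, bool) else x)
# Map the boolean-like strings back to real booleans
df_model = df_model.replace({'True': True, 'False': False, 'TRUE': True, 'FALSE': False})
# Encode boolean columns as 0/1 integers
for col in df_model.columns:
    if df_model[col].dtype == bool:
        df_model[col] = df_model[col].astype(int)
# One-hot encode the remaining text columns, dropping the first level of each
categorical_cols = df_model.select_dtypes(include='object').columns.tolist()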
df_model = pd.get_dummies(df_model, columns=categorical_cols, drop_first=True)
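
As an illustration of what this preparation step produces, the following minimal sketch encodes a toy frame with one boolean and one categorical column. The column names `Team` and `Survived` and their values are assumptions chosen for the example, not a statement about the dataset's schema.

```python
import pandas as pd

# Toy frame with hypothetical columns and values
toy = pd.DataFrame({
    "Team": ["Terrorist", "CounterTerrorist", "Terrorist"],
    "Survived": [True, False, True],
})

# Encode boolean columns as 0/1 integers
bool_cols = toy.select_dtypes(include="bool").columns
toy = toy.astype({col: int for col in bool_cols})

# One-hot encode the categorical column; drop_first=True drops the first
# category in sorted order ("CounterTerrorist"), which becomes the reference level
encoded = pd.get_dummies(toy, columns=["Team"], drop_first=True)

print(encoded.columns.tolist())
```

Dropping one level per categorical variable avoids perfectly collinear dummy columns, which matters for a linear model fitted with an intercept.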




In [None]:
# Phase 4 - Multiple Linear Regression
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

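# Work on a fresh copy of the sampled data
df_model = df.copy()

def convertir_a_float(valor):
    """Parse text that uses '.' as thousands separator and ',' as decimal mark."""
    if isinstance(valor, str):
        # For example, '1.234,5' becomes '1234.5'
        valor = valor.replace('.', '').replace(',', '.')
        try:
            return float(valor)
        except ValueError:
            # Unparseable strings become NaN and are dropped below
            return np.nan
    return valor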

# Columns to clean
columnas = ['TimeAlive', 'TravelledDistance', 'KAST', 'ImpactPlayerScore']
for col in columnas:
    df_model[col] = df_model[col].apply(convertir_a_float)

# Drop rows with missing values in those columns
df_model.dropna(subset=columnas, inplace=True)

# Predictors and target
X = df_model[['KAST', 'TimeAlive', 'TravelledDistance']]
y = df_model['ImpactPlayerScore']

# Train/test split (scikit-learn's default holds out 25% for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scale features: fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Linear regression model
modelo_lr = LinearRegression()
modelo_lr.fit(X_train_scaled, y_train)
y_pred_lr = modelo_lr.predict(X_test_scaled)

# Evaluation on the held-out test set
print("\nModel: Multiple Linear Regression (with scaling)")
print("MAE:", mean_absolute_error(y_test, y_pred_lr))
print("MSE:", mean_squared_error(y_test, y_pred_lr))
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred_lr)))
print("R²:", r2_score(y_test, y_pred_lr))


Model: Multiple Linear Regression (with scaling)
MAE: 0.49498045000085056
MSE: 0.8435163666442211
RMSE: 0.9184314708481092
R²: 0.5383292801095629
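
To clarify how the four figures above are defined, here is a minimal NumPy sketch that computes the same metrics by hand on a small toy example. The helper name `regression_metrics` and the toy values are hypothetical and only serve the illustration; the notebook itself relies on scikit-learn's metric functions.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R² from their definitions (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mae = np.mean(np.abs(residuals))   # mean absolute error
    mse = np.mean(residuals ** 2)      # mean squared error
    rmse = np.sqrt(mse)                # same units as the target

    ss_res = np.sum(residuals ** 2)                   # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Toy targets and predictions (invented values)
y_true = [0.0, 1.0, 2.0, 3.0]
y_pred = [0.5, 1.0, 1.5, 3.0]
metrics = regression_metrics(y_true, y_pred)

# Baseline that always predicts the mean of the targets: R² is 0 by definition
baseline = regression_metrics(y_true, [np.mean(y_true)] * len(y_true))

print(metrics)
print(baseline)
```

Read this way, the reported R² of about 0.54 means the model explains roughly 54% of the variance of `ImpactPlayerScore` on the test split, compared with 0 for a model that always predicts the mean, and the reported RMSE is the square root of the reported MSE.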
