# LAB | Hyperparameter Tuning

**Load the data**

This is the final step to maximize the performance of your Spaceship Titanic model.

The data can be found here:

https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv

Metadata:

https://github.com/data-bootcamp-v4/data/blob/main/spaceship_titanic.md

So far we've been training and evaluating models with default values for hyperparameters.

Today we will perform the same feature engineering as before, and then compare the best-performing models you have obtained so far, this time fine-tuning their hyperparameters.
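To make "default values for hyperparameters" concrete, the sketch below prints a few of the defaults of scikit-learn's `GradientBoostingClassifier`. The choice of which four parameters to print is ours; it simply mirrors parameters that are commonly tuned.

```python
from sklearn.ensemble import GradientBoostingClassifier

# An estimator built with no arguments uses scikit-learn's default hyperparameters.
default_model = GradientBoostingClassifier()
default_params = default_model.get_params()

# Print a few defaults that are commonly tuned.
for key in ("n_estimators", "learning_rate", "max_depth", "subsample"):
    print(f"{key}: {default_params[key]}")
```

Any value passed explicitly to the constructor overrides the corresponding default, which is what a hyperparameter search does for each candidate.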

In [1]:
#Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

In [2]:
spaceship = pd.read_csv("https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv")
spaceship.head()

| | PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Name | Transported |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0001_01 | Europa | False | B/0/P | TRAPPIST-1e | 39.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Maham Ofracculy | False |
| 1 | 0002_01 | Earth | False | F/0/S | TRAPPIST-1e | 24.0 | False | 109.0 | 9.0 | 25.0 | 549.0 | 44.0 | Juanna Vines | True |
| 2 | 0003_01 | Europa | False | A/0/S | TRAPPIST-1e | 58.0 | True | 43.0 | 3576.0 | 0.0 | 6715.0 | 49.0 | Altark Susent | False |
| 3 | 0003_02 | Europa | False | A/0/S | TRAPPIST-1e | 33.0 | False | 0.0 | 1283.0 | 371.0 | 3329.0 | 193.0 | Solam Susent | False |
| 4 | 0004_01 | Earth | False | F/1/S | TRAPPIST-1e | 16.0 | False | 303.0 | 70.0 | 151.0 | 565.0 | 2.0 | Willy Santantines | True |


Now perform the same as before:
- Feature Scaling
- Feature Selection


In [3]:
from sklearn.preprocessing import StandardScaler, LabelEncoder

# 1. Feature Selection: drop columns that carry no predictive signal here (names, IDs, cabin)
spaceship.drop(['PassengerId', 'Name', 'Cabin'], axis=1, inplace=True)

# 2. Imputation (required before scaling): mode for categorical/boolean columns, median for numeric ones
for col in spaceship.columns:
    if spaceship[col].dtype == 'object' or spaceship[col].dtype == 'bool':
        spaceship[col] = spaceship[col].fillna(spaceship[col].mode()[0])
    else:
        spaceship[col] = spaceship[col].fillna(spaceship[col].median())

# 3. Encoding (required before scaling): map each category to an integer code
le = LabelEncoder()
for col in spaceship.select_dtypes(include=['object', 'bool']).columns:
    spaceship[col] = le.fit_transform(spaceship[col])

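# 4. Feature Scaling: standardize the features
X = spaceship.drop('Transported', axis=1)
y = spaceship['Transported']

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 5. Train/test split used by the cells below.
# Assumption: the original split is not shown in this notebook;
# test_size=0.2 and random_state=42 are illustrative values.
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)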

print("Feature Scaling and Selection: Done!")

Feature Scaling and Selection: Done!
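The cell above fits the scaler on the full dataset. A common alternative is to fit the scaler on the training rows only, so that no statistics from the test rows reach the model. The following is a minimal sketch of that idea using a scikit-learn pipeline. It runs on synthetic data from `make_classification` as a stand-in for the encoded Spaceship Titanic features, and the names `X_demo`, `y_demo`, and `leak_free_model` as well as the split parameters are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded features and target (assumption).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)

# Split first, so the test rows stay unseen during preprocessing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The pipeline fits StandardScaler on the training rows only,
# then reuses those statistics to transform the test rows.
leak_free_model = make_pipeline(
    StandardScaler(),
    GradientBoostingClassifier(n_estimators=50, random_state=42),
)
leak_free_model.fit(X_tr, y_tr)

demo_score = leak_free_model.score(X_te, y_te)
print(f"Held-out accuracy: {demo_score:.4f}")
```

Tree-based models are largely insensitive to feature scaling, so the practical effect may be small here; the pattern matters more for models such as logistic regression or k-nearest neighbors.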




In [4]:
spaceship.head()

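| | HomePlanet | CryoSleep | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Transported |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 2 | 39.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 |
| 1 | 0 | 0 | 2 | 24.0 | 0 | 109.0 | 9.0 | 25.0 | 549.0 | 44.0 | 1 |
| 2 | 1 | 0 | 2 | 58.0 | 1 | 43.0 | 3576.0 | 0.0 | 6715.0 | 49.0 | 0 |
| 3 | 1 | 0 | 2 | 33.0 | 0 | 0.0 | 1283.0 | 371.0 | 3329.0 | 193.0 | 0 |
| 4 | 0 | 0 | 2 | 16.0 | 0 | 303.0 | 70.0 | 151.0 | 565.0 | 2.0 | 1 |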


- Evaluate your model

In [27]:
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier

# 1. Define the candidate models
models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42)
}

# 2. Train and evaluate each one
for name, model in models.items():
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    print(f"Model: {name} | Success: {score:.4f}")

from sklearn.metrics import accuracy_score

# 1. Define the winning model with its default settings
gb_model = GradientBoostingClassifier(random_state=42)

# 2. Fit the model on the training data
gb_model.fit(X_train, y_train)

# 3. Predict on the held-out test set
y_pred = gb_model.predict(X_test)

# 4. Compare the predictions with the true labels to get the accuracy
baseline_accuracy = accuracy_score(y_test, y_pred)

print(f"Success (Accuracy) initial of the model is: {baseline_accuracy:.4f}")

Model: Random Forest | Success: 0.7786
Model: Gradient Boosting | Success: 0.7855
Model: AdaBoost | Success: 0.7614
Success (Accuracy) initial of the model is: 0.7855


**Grid/Random Search**

For this lab we will use Grid Search.
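For comparison, the other option named in the heading, random search, samples a fixed number of candidates from distributions instead of trying every combination. The sketch below uses scikit-learn's `RandomizedSearchCV` on synthetic data; the data, the distribution ranges, and the values of `n_iter` and `cv` are assumptions chosen to keep the example fast, not settings from this lab.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# Distributions to sample from; the ranges are illustrative.
param_distributions = {
    "n_estimators": randint(20, 60),       # integers in [20, 60)
    "learning_rate": uniform(0.05, 0.10),  # floats in [0.05, 0.15]
    "max_depth": randint(2, 5),            # integers in [2, 5)
}

random_search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,          # number of sampled candidates
    cv=3,
    random_state=42,   # makes the sampling reproducible
)
random_search.fit(X_demo, y_demo)

print("Best sampled parameters:", random_search.best_params_)
```

With `n_iter` fixed, the cost of random search does not grow with the number of hyperparameters, whereas a grid grows multiplicatively with every value added.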

- Define hyperparameters to fine tune.

In [29]:
# Hyperparameter grid for GridSearchCV
param_grid = {
    'n_estimators': [100, 150, 200],
    'learning_rate': [0.05, 0.1, 0.15],
    'max_depth': [3, 4, 5],
    'subsample': [0.8, 1.0]
}
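Before launching a search it can help to know how many models the grid implies. The sketch below counts the combinations of the grid above with scikit-learn's `ParameterGrid`; the 5 folds match the `cv=5` setting used in this lab, and the variable names `n_candidates` and `cv_folds` are ours.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'n_estimators': [100, 150, 200],
    'learning_rate': [0.05, 0.1, 0.15],
    'max_depth': [3, 4, 5],
    'subsample': [0.8, 1.0]
}

# Every combination of values is one candidate: 3 * 3 * 3 * 2.
n_candidates = len(ParameterGrid(param_grid))

# Each candidate is fitted once per cross-validation fold.
cv_folds = 5
print(f"{n_candidates} candidates, {n_candidates * cv_folds} cross-validation fits")
```

Adding one more value to any list multiplies the candidate count, so the grid is best kept small around values that are plausible for the model.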

- Run Grid Search

In [24]:
from sklearn.model_selection import GridSearchCV

# 1. Create the search object
# cv=5 scores every candidate with 5-fold cross-validation, which gives a more robust estimate
grid_search = GridSearchCV(estimator=gb_model, param_grid=param_grid, cv=5, n_jobs=-1, verbose=2)

# 2. Run the search (this can take a while)
grid_search.fit(X_train, y_train)

# 3. Keep the best model found
best_model = grid_search.best_estimator_

print("Final search complete!")
print("The best parameters are:", grid_search.best_params_)

Fitting 5 folds for each of 54 candidates, totalling 270 fits
Final search complete!
The best parameters are: {'learning_rate': 0.05, 'max_depth': 5, 'n_estimators': 150, 'subsample': 0.8}
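Beyond `best_params_`, a fitted search also exposes the cross-validated score of every candidate through `cv_results_`. The sketch below shows one way to rank candidates with pandas. It uses synthetic data and a deliberately tiny grid so it runs quickly; `X_demo`, `y_demo`, `small_grid`, and `demo_search` are hypothetical names, not objects from this notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data and a tiny grid (assumptions, for speed).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
small_grid = {"n_estimators": [20, 40], "max_depth": [2, 3]}

demo_search = GridSearchCV(
    GradientBoostingClassifier(random_state=42), small_grid, cv=3
)
demo_search.fit(X_demo, y_demo)

# cv_results_ is a dict of arrays with one entry per candidate.
results = pd.DataFrame(demo_search.cv_results_)
ranking = results[
    ["params", "mean_test_score", "std_test_score", "rank_test_score"]
].sort_values("rank_test_score")

print(ranking.to_string(index=False))
print(f"Best mean CV accuracy: {demo_search.best_score_:.4f}")
```

Looking at `std_test_score` next to `mean_test_score` gives a rough sense of whether the top candidates differ by more than fold-to-fold variation.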


- Evaluate your model

In [23]:
# 1. Use the best model found to predict on the test set
final_predictions = best_model.predict(X_test)

# 2. Compute the new accuracy
final_accuracy = accuracy_score(y_test, final_predictions)

# 3. Compare the results (baseline_accuracy was computed in the baseline evaluation cell)
print(f"Initial success (Baseline): {baseline_accuracy:.4f}")
print(f"Final success (Tuned): {final_accuracy:.4f}")
print(f"Total improvement: {(final_accuracy - baseline_accuracy)*100:.2f}%")

Initial success (Baseline): 0.7855
Final success (Tuned): 0.7890
Total improvement: 0.35%
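The "Total improvement" figure is the difference between the two accuracies multiplied by 100, which makes it a gain in percentage points rather than a relative percentage. The short sketch below computes both readings from the two accuracies reported above; the variable names for the two gains are ours.

```python
# Accuracies reported in the output above.
baseline_accuracy = 0.7855
final_accuracy = 0.7890

# Absolute gain, in percentage points.
absolute_gain_points = (final_accuracy - baseline_accuracy) * 100

# Relative gain, as a percentage of the baseline accuracy.
relative_gain_percent = (final_accuracy - baseline_accuracy) / baseline_accuracy * 100

print(f"Absolute gain: {absolute_gain_points:.2f} percentage points")
print(f"Relative gain: {relative_gain_percent:.2f}% over the baseline")
```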
