# LAB | Hyperparameter Tuning

**Load the data**

This is the final step: maximizing the performance of your Spaceship Titanic model.

The data can be found here:

https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv

Metadata

https://github.com/data-bootcamp-v4/data/blob/main/spaceship_titanic.md

So far we've been training and evaluating models with default values for hyperparameters.

Today we will perform the same feature engineering as before and then compare the best-performing models you have so far, this time fine-tuning their hyperparameters.
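
To make "default values" concrete, every scikit-learn estimator reports its current hyperparameters through `get_params()`. The sketch below prints a few defaults for `SVC`; the choice of `SVC` and of the three keys shown is only an illustration, not a statement about which model the lab will end up tuning.

```python
from sklearn.svm import SVC

# An estimator created without arguments uses the library's default hyperparameters.
default_params = SVC().get_params()

# Print three hyperparameters that are commonly tuned for an SVM.
for name in ("C", "kernel", "gamma"):
    print(f"{name}: {default_params[name]!r}")
```

Any of these values can be overridden when the estimator is constructed, which is what a hyperparameter search does systematically.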

In [48]:
#Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

from sklearn.model_selection import GridSearchCV

In [24]:
spaceship = pd.read_csv("https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv")
spaceship.head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported
0,0001_01,Europa,False,B/0/P,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,False
1,0002_01,Earth,False,F/0/S,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,549.0,44.0,Juanna Vines,True
2,0003_01,Europa,False,A/0/S,TRAPPIST-1e,58.0,True,43.0,3576.0,0.0,6715.0,49.0,Altark Susent,False
3,0003_02,Europa,False,A/0/S,TRAPPIST-1e,33.0,False,0.0,1283.0,371.0,3329.0,193.0,Solam Susent,False
4,0004_01,Earth,False,F/1/S,TRAPPIST-1e,16.0,False,303.0,70.0,151.0,565.0,2.0,Willy Santantines,True


In [25]:
spaceship

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported
0,0001_01,Europa,False,B/0/P,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,False
1,0002_01,Earth,False,F/0/S,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,549.0,44.0,Juanna Vines,True
2,0003_01,Europa,False,A/0/S,TRAPPIST-1e,58.0,True,43.0,3576.0,0.0,6715.0,49.0,Altark Susent,False
3,0003_02,Europa,False,A/0/S,TRAPPIST-1e,33.0,False,0.0,1283.0,371.0,3329.0,193.0,Solam Susent,False
4,0004_01,Earth,False,F/1/S,TRAPPIST-1e,16.0,False,303.0,70.0,151.0,565.0,2.0,Willy Santantines,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8688,9276_01,Europa,False,A/98/P,55 Cancri e,41.0,True,0.0,6819.0,0.0,1643.0,74.0,Gravior Noxnuther,False
8689,9278_01,Earth,True,G/1499/S,PSO J318.5-22,18.0,False,0.0,0.0,0.0,0.0,0.0,Kurta Mondalley,False
8690,9279_01,Earth,False,G/1500/S,TRAPPIST-1e,26.0,False,0.0,0.0,1872.0,1.0,0.0,Fayey Connon,True
8691,9280_01,Europa,False,E/608/S,55 Cancri e,32.0,False,0.0,1049.0,0.0,353.0,3235.0,Celeon Hontichre,False


Now perform the same steps as before:
- Feature Scaling
- Feature Selection

In [26]:
spaceship.isnull().sum()

PassengerId       0
HomePlanet      201
CryoSleep       217
Cabin           199
Destination     182
Age             179
VIP             203
RoomService     181
FoodCourt       183
ShoppingMall    208
Spa             183
VRDeck          188
Name            200
Transported       0
dtype: int64

In [27]:
spaceship = spaceship.dropna()

In [28]:
spaceship = spaceship.drop(columns=['PassengerId', 'Name', 'Destination'])

In [29]:
# Keep only the deck letter (first character) of each cabin code, e.g. 'B/0/P' -> 'B'
spaceship['Cabin'] = spaceship['Cabin'].astype(str)
spaceship['Cabin'] = spaceship['Cabin'].apply(lambda x: x[0] if len(x) > 0 else '')

In [30]:
spaceship.dtypes

HomePlanet       object
CryoSleep        object
Cabin            object
Age             float64
VIP              object
RoomService     float64
FoodCourt       float64
ShoppingMall    float64
Spa             float64
VRDeck          float64
Transported        bool
dtype: object

In [31]:
spaceship['HomePlanet'].unique()

array(['Europa', 'Earth', 'Mars'], dtype=object)

In [32]:
spaceship['CryoSleep'].unique()

array([False, True], dtype=object)

In [33]:
spaceship['Cabin'].unique()

array(['B', 'F', 'A', 'G', 'E', 'C', 'D', 'T'], dtype=object)

In [34]:
spaceship['VIP'].unique()

array([False, True], dtype=object)

In [35]:
spaceship['CryoSleep'] = spaceship['CryoSleep'].astype(bool)
spaceship['VIP'] = spaceship['VIP'].astype(bool)

In [36]:
spaceship.dtypes

HomePlanet       object
CryoSleep          bool
Cabin            object
Age             float64
VIP                bool
RoomService     float64
FoodCourt       float64
ShoppingMall    float64
Spa             float64
VRDeck          float64
Transported        bool
dtype: object

In [37]:
spaceship = pd.get_dummies(spaceship, columns=['HomePlanet', 'Cabin'], drop_first=True)

In [38]:
X = spaceship.drop(columns=['Transported'])
y = spaceship['Transported']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [39]:
scaler = MinMaxScaler()

In [40]:
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [46]:
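# Standardize the features. Standardization is unaffected by the earlier min-max
# rescaling, so this yields the same values as standardizing the raw features.
scaler = StandardScaler()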

In [47]:
X_train_scaled = scaler.fit_transform(X_train_scaled)
X_test_scaled = scaler.transform(X_test_scaled)
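
The cells above chain two scalers: `MinMaxScaler` first, then `StandardScaler` on its output. Min-max scaling is a positive linear rescaling of each feature, so standardizing afterwards should give the same result as standardizing the raw features directly. The sketch below checks this on synthetic data; `X_demo` is a made-up stand-in, not the Spaceship Titanic features.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in for a numeric feature matrix (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(200, 3))

# Path A: min-max scaling followed by standardization, as in the cells above.
chained = StandardScaler().fit_transform(MinMaxScaler().fit_transform(X_demo))

# Path B: standardization applied directly to the raw features.
direct = StandardScaler().fit_transform(X_demo)

# Compare the two results up to floating-point error.
print(np.allclose(chained, direct))
```

If the comparison holds, the `MinMaxScaler` step can be dropped without changing what the models see.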

- Now let's take the best model we have so far and see how much it improves when we fine-tune its hyperparameters.

In [49]:
models = {
    'Logistic Regression': LogisticRegression(),
    'Random Forest': RandomForestClassifier(),
    'Support Vector Machine': SVC(),
    'K-Nearest Neighbors': KNeighborsClassifier(),
    'Decision Tree': DecisionTreeClassifier(),
    'Naive Bayes': GaussianNB()
}


In [50]:
results = []

for model_name, model in models.items():

    model.fit(X_train_scaled, y_train)
    
    y_pred = model.predict(X_test_scaled)
    
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    
    results.append({
        'Model': model_name,
        'Accuracy': accuracy,
        'Precision': precision,
        'Recall': recall,
        'F1 Score': f1
    })

results_df = pd.DataFrame(results)


- Evaluate your model

In [51]:
best_model_by_accuracy = results_df.sort_values(by='Accuracy', ascending=False).head(1)
best_model_by_f1 = results_df.sort_values(by='F1 Score', ascending=False).head(1)

print("Best model by Accuracy:")
print(best_model_by_accuracy)

print("\nBest model by F1 Score:")
print(best_model_by_f1)


Best model by Accuracy:
                    Model  Accuracy  Precision   Recall  F1 Score
2  Support Vector Machine  0.811649   0.819149  0.80568  0.812359

Best model by F1 Score:
                    Model  Accuracy  Precision   Recall  F1 Score
2  Support Vector Machine  0.811649   0.819149  0.80568  0.812359


**Grid/Random Search**

For this lab we will use Grid Search.

- Define hyperparameters to fine tune.

In [52]:
# Define the base model (SVM)
model = SVC()

# Define the hyperparameter grid
param_grid = {
    'C': [0.1, 1, 10, 100],          # Regularization parameter
    'kernel': ['linear', 'rbf', 'poly'],  # Kernels to try
    'gamma': [1, 0.1, 0.01, 0.001]   # Kernel coefficient (ignored by the linear kernel)
}
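
Before running the search it helps to know how many fits the grid implies. The sketch below counts the combinations with `ParameterGrid` and multiplies by the number of folds. The second, split layout is an assumption and not part of the lab: it varies `gamma` only for the kernels that use it, which avoids refitting identical linear models.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as above: every combination of C, kernel and gamma.
full_grid = {
    "C": [0.1, 1, 10, 100],
    "kernel": ["linear", "rbf", "poly"],
    "gamma": [1, 0.1, 0.01, 0.001],
}

# Alternative layout (assumption): gamma is only searched for rbf and poly.
split_grid = [
    {"C": [0.1, 1, 10, 100], "kernel": ["linear"]},
    {"C": [0.1, 1, 10, 100], "kernel": ["rbf", "poly"], "gamma": [1, 0.1, 0.01, 0.001]},
]

cv_folds = 5  # matches cv=5 in the grid search
n_full = len(ParameterGrid(full_grid))
n_split = len(ParameterGrid(split_grid))

print(f"Full grid:  {n_full} combinations, {n_full * cv_folds} cross-validation fits")
print(f"Split grid: {n_split} combinations, {n_split * cv_folds} cross-validation fits")
```

`GridSearchCV` accepts either layout as `param_grid`, and by default it adds one final refit of the best combination on the whole training set.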

- Run Grid Search

In [None]:
# Set up the search with GridSearchCV
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, scoring='accuracy')

# Run the hyperparameter search on the scaled training data
grid_search.fit(X_train_scaled, y_train)
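
The section title also mentions random search. As a rough sketch of that alternative, `RandomizedSearchCV` samples a fixed number of combinations from distributions instead of trying every grid point. Everything below is illustrative: the synthetic data from `make_classification`, the log-uniform ranges, and `n_iter=10` are assumptions chosen to keep the example small, not settings taken from the lab.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the scaled training data (illustrative only).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)

# Continuous ranges are sampled on a log scale; lists are sampled uniformly.
param_distributions = {
    "C": loguniform(1e-1, 1e2),
    "gamma": loguniform(1e-3, 1e0),
    "kernel": ["rbf", "poly"],
}

random_search = RandomizedSearchCV(
    estimator=SVC(),
    param_distributions=param_distributions,
    n_iter=10,          # number of sampled combinations
    cv=5,
    scoring="accuracy",
    random_state=42,    # makes the sampled combinations reproducible
)
random_search.fit(X_demo, y_demo)

print(random_search.best_params_)
print(round(random_search.best_score_, 3))
```

The cost is set by `n_iter` rather than by the size of the grid, which makes random search easier to budget when many hyperparameters are involved.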

- Evaluate your model

In [None]:
# Get the best hyperparameters found by the search
best_params = grid_search.best_params_
print(f"Best hyperparameters: {best_params}")

# Retrieve the best model (refit on the full training set)
best_model = grid_search.best_estimator_

# Predict on the test data
y_pred = best_model.predict(X_test_scaled)

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1 Score: {f1}")
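
Beyond the single best combination, a fitted search keeps the cross-validation scores of every candidate in `cv_results_`, which helps judge how sensitive the model is to each hyperparameter. The sketch below builds a small ranking table; the synthetic data and the reduced grid are assumptions used only to keep the example fast.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the scaled training data (illustrative only).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)

# A deliberately small grid so the example runs quickly.
demo_search = GridSearchCV(
    estimator=SVC(),
    param_grid={"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
    cv=5,
    scoring="accuracy",
)
demo_search.fit(X_demo, y_demo)

# One row per candidate, best-ranked first.
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
cv_table = (
    pd.DataFrame(demo_search.cv_results_)[columns]
    .sort_values("rank_test_score")
    .reset_index(drop=True)
)
print(cv_table)
```

Applied to the lab's own `grid_search`, the same table would show whether the winning combination is clearly ahead or only marginally better than its neighbours.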