# LAB | Hyperparameter Tuning

**Load the data**

Final step: maximize the performance of your Spaceship Titanic model.

The data can be found here:

https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv

Metadata

https://github.com/data-bootcamp-v4/data/blob/main/spaceship_titanic.md

So far we've been training and evaluating models with default values for hyperparameters.

Today we will apply the same feature engineering as before, then compare the best models you have built so far, this time fine-tuning their hyperparameters.

In [1]:
#Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

In [2]:
spaceship = pd.read_csv("https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv")
spaceship.head()

| | PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Name | Transported |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0001_01 | Europa | False | B/0/P | TRAPPIST-1e | 39.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Maham Ofracculy | False |
| 1 | 0002_01 | Earth | False | F/0/S | TRAPPIST-1e | 24.0 | False | 109.0 | 9.0 | 25.0 | 549.0 | 44.0 | Juanna Vines | True |
| 2 | 0003_01 | Europa | False | A/0/S | TRAPPIST-1e | 58.0 | True | 43.0 | 3576.0 | 0.0 | 6715.0 | 49.0 | Altark Susent | False |
| 3 | 0003_02 | Europa | False | A/0/S | TRAPPIST-1e | 33.0 | False | 0.0 | 1283.0 | 371.0 | 3329.0 | 193.0 | Solam Susent | False |
| 4 | 0004_01 | Earth | False | F/1/S | TRAPPIST-1e | 16.0 | False | 303.0 | 70.0 | 151.0 | 565.0 | 2.0 | Willy Santantines | True |


Now perform the same preprocessing as before:
- Feature Scaling
- Feature Selection


In [3]:
#your code here

# Clean: drop rows with missing values, remove identifier columns,
# and keep only the deck letter of Cabin (e.g. "B/0/P" -> "B")
spaceship_cleaned = spaceship.dropna()
spaceship_cleaned = spaceship_cleaned.drop(columns=['PassengerId', 'Name'])
spaceship_cleaned['Cabin'] = spaceship_cleaned['Cabin'].str[0]
spaceship_cleaned.info()

# Scaling

from sklearn.preprocessing import StandardScaler

# Select the numerical columns to scale
numerical_columns = ['Age', 'RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck']

# Create an instance of StandardScaler
scaler = StandardScaler()

# Apply scaling only to the numerical columns.
# Caveat: the scaler is fit on all rows before train_test_split, so test-set
# statistics leak into the training data.
spaceship_scaled = spaceship_cleaned.copy()
spaceship_scaled[numerical_columns] = scaler.fit_transform(spaceship_cleaned[numerical_columns])

# Selection
# The non-predictive identifier columns (PassengerId, Name) were already dropped above.


# One-hot encode the categorical columns
spaceship_encoded = pd.get_dummies(spaceship_scaled, columns=['HomePlanet', 'CryoSleep', 'Cabin', 'Destination', 'VIP'], drop_first=True)
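Because the cell above fits `StandardScaler` before the train/test split, the test rows influence the scaling statistics. A minimal sketch of a leakage-free alternative, assuming the same column names, wraps scaling and one-hot encoding in a `Pipeline` with a `ColumnTransformer`, so both are refit on the training part of every cross-validation fold. The small DataFrame `toy` is synthetic and hypothetical, created only so the example runs offline; on the real data you would pass `spaceship_cleaned` instead.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numerical_columns = ['Age', 'RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck']
categorical_columns = ['HomePlanet', 'CryoSleep', 'Cabin', 'Destination', 'VIP']

# Hypothetical synthetic stand-in for spaceship_cleaned (NOT the real CSV)
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({col: rng.exponential(100, n) for col in numerical_columns})
toy['HomePlanet'] = rng.choice(['Earth', 'Europa', 'Mars'], n)
toy['CryoSleep'] = rng.choice([True, False], n)
toy['Cabin'] = rng.choice(list('ABCDEFG'), n)
toy['Destination'] = rng.choice(['TRAPPIST-1e', '55 Cancri e', 'PSO J318.5-22'], n)
toy['VIP'] = rng.choice([True, False], n)
toy['Transported'] = rng.choice([True, False], n)

X = toy.drop(columns='Transported')
y = toy['Transported']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scaling and encoding are fit only on the rows each CV fold trains on
preprocess = ColumnTransformer([
    ('num', StandardScaler(), numerical_columns),
    ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_columns),
])
pipe = Pipeline([('prep', preprocess), ('knn', KNeighborsClassifier())])

# The "knn__" prefix routes each grid entry to the pipeline step named "knn"
param_grid = {'knn__n_neighbors': [3, 5, 7], 'knn__p': [1, 2]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
search.fit(X_train, y_train)
print(search.best_params_)
print(f"Test accuracy: {search.score(X_test, y_test):.4f}")
```

Since the labels in `toy` are random, the printed accuracy is meaningless; the point is the structure, where no preprocessing step ever sees the held-out data.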


- Now let's take the best model we have so far and see how much it improves when we fine-tune its hyperparameters.

In [4]:
#your code here
# Define the features (X) and the target variable (y)
X = spaceship_encoded.drop('Transported', axis=1)
y = spaceship_encoded['Transported']
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)


# Define the parameter grid
param_grid = {
    'n_neighbors': [3, 5, 7, 9, 11],    # Test different values of neighbors
    'weights': ['uniform', 'distance'], # Try both uniform and distance-based weights
    'p': [1, 2]                         # Manhattan (1) and Euclidean (2) distances
}

# Initialize the model
knn = KNeighborsClassifier()

# Set up GridSearchCV
grid_search = GridSearchCV(knn, param_grid, cv=5, scoring='accuracy', verbose=1)

# Fit GridSearchCV
grid_search.fit(X_train, y_train)  # Use your training data here

# Get the best hyperparameters and model
best_params = grid_search.best_params_
best_model = grid_search.best_estimator_

print(f"Best Hyperparameters: {best_params}")


Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Hyperparameters: {'n_neighbors': 9, 'p': 1, 'weights': 'uniform'}
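Conceptually, `GridSearchCV` scores every combination in `param_grid` with k-fold cross-validation and keeps the one with the best mean score, which is where the 5 × 2 × 2 = 20 candidates and 100 fits above come from. The sketch below reproduces that loop by hand with `ParameterGrid` and `cross_val_score`, on synthetic data from `make_classification` (an assumption, so it runs without the CSV). It illustrates the technique only; `GridSearchCV` additionally refits the best model and can run in parallel.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary data standing in for X_train, y_train
X_train, y_train = make_classification(n_samples=300, n_features=8, random_state=0)

param_grid = {
    'n_neighbors': [3, 5, 7, 9, 11],
    'weights': ['uniform', 'distance'],
    'p': [1, 2],
}

# Exhaustive search: mean 5-fold CV accuracy for every combination
results = []
for params in ParameterGrid(param_grid):
    scores = cross_val_score(KNeighborsClassifier(**params), X_train, y_train,
                             cv=5, scoring='accuracy')
    results.append((scores.mean(), params))

best_score, best_params = max(results, key=lambda r: r[0])
print(f"{len(results)} candidates, best CV accuracy {best_score:.4f} with {best_params}")

# GridSearchCV uses the same folds, so its best mean CV score should match
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring='accuracy').fit(X_train, y_train)
print(grid.best_params_, f"{grid.best_score_:.4f}")
```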


- Evaluate your model

In [5]:
#your code here
from sklearn.metrics import accuracy_score
# Evaluate the best model on the test set
y_pred = best_model.predict(X_test)

# Measure performance
accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy of the tuned model: {accuracy:.4f}")


Accuracy of the tuned model: 0.7906

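The cell above reports only the tuned model's accuracy, so the improvement over default hyperparameters is not actually shown. A minimal sketch for making that comparison explicit fits an untuned `KNeighborsClassifier()` and the grid search's best estimator on the same split. The data is synthetic (`make_classification`), so the numbers say nothing about the Spaceship Titanic results.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Baseline: scikit-learn defaults (n_neighbors=5, weights='uniform', p=2)
baseline = KNeighborsClassifier().fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

# Tuned: best estimator from a cross-validated grid search on the training set only
param_grid = {'n_neighbors': [3, 5, 7, 9, 11], 'weights': ['uniform', 'distance'], 'p': [1, 2]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring='accuracy').fit(X_train, y_train)
tuned_acc = accuracy_score(y_test, grid.best_estimator_.predict(X_test))

print(f"Baseline accuracy: {baseline_acc:.4f}")
print(f"Tuned accuracy:    {tuned_acc:.4f}")
print(f"Difference:        {tuned_acc - baseline_acc:+.4f}")
```

The tuned model is not guaranteed to win on the test set: the search maximizes cross-validated accuracy on the training data, and small differences on a single split can go either way.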

**Grid/Random Search**

For this lab we will use Grid Search.
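Random search, the alternative named in the heading, samples a fixed number of candidates (`n_iter`) instead of trying every combination, which keeps the cost fixed as the search space grows. A minimal sketch with `RandomizedSearchCV`, assuming synthetic data and drawing `n_neighbors` from `scipy.stats.randint`:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for X_train, y_train
X_train, y_train = make_classification(n_samples=300, n_features=8, random_state=0)

# Lists are sampled uniformly; randint(3, 31) draws integers from 3 to 30 inclusive
param_distributions = {
    'n_neighbors': randint(3, 31),
    'weights': ['uniform', 'distance'],
    'p': [1, 2],
}

# Only n_iter=10 sampled candidates are evaluated, each with 5-fold CV (50 fits)
random_search = RandomizedSearchCV(
    KNeighborsClassifier(), param_distributions, n_iter=10, cv=5,
    scoring='accuracy', random_state=42,
)
random_search.fit(X_train, y_train)
print(random_search.best_params_, f"{random_search.best_score_:.4f}")
```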

- Define the hyperparameters to fine-tune.

In [6]:
#your code here
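# Define the hyperparameter grid.
# 'algorithm' and 'leaf_size' mainly affect the speed and memory use of the
# neighbour search rather than the predictions, but they multiply the grid size.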
param_grid = {
    'n_neighbors': [3, 5, 7, 9, 11, 15],
    'weights': ['uniform', 'distance'],
    'p': [1, 2],
    'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
    'leaf_size': [10, 30, 50]
}
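Grid search evaluates every combination, so the number of candidates is the product of the option-list lengths: 6 × 2 × 2 × 4 × 3 = 288, and with `cv=5` that is 1440 fits, the figures reported when the search runs. `ParameterGrid` can confirm the count before launching an expensive search:

```python
from math import prod
from sklearn.model_selection import ParameterGrid

param_grid = {
    'n_neighbors': [3, 5, 7, 9, 11, 15],
    'weights': ['uniform', 'distance'],
    'p': [1, 2],
    'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
    'leaf_size': [10, 30, 50],
}

n_candidates = len(ParameterGrid(param_grid))
n_folds = 5
print(n_candidates)            # → 288
print(n_candidates * n_folds)  # → 1440

# Same count computed directly as the product of the option-list lengths
assert n_candidates == prod(len(v) for v in param_grid.values())
```

Dropping `algorithm` and `leaf_size` from the grid would cut it to 6 × 2 × 2 = 24 candidates (120 fits).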


- Run Grid Search

In [8]:
# Initialize the KNN model
knn = KNeighborsClassifier()

# Set up GridSearchCV
grid_search = GridSearchCV(knn, param_grid, cv=5, scoring='accuracy', verbose=1)

# Fit GridSearchCV on the training data
grid_search.fit(X_train, y_train)

# Get the best hyperparameters and model
best_params = grid_search.best_params_
best_model = grid_search.best_estimator_

print(f"Best hyperparameters: {best_params}")


Fitting 5 folds for each of 288 candidates, totalling 1440 fits


Best hyperparameters: {'algorithm': 'kd_tree', 'leaf_size': 30, 'n_neighbors': 15, 'p': 2, 'weights': 'uniform'}



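To see how close the runners-up were to the selected combination, `cv_results_` can be loaded into a DataFrame and sorted by `rank_test_score`. Because `algorithm` and `leaf_size` only change how neighbours are looked up, candidates that differ only in those two parameters typically end up with the same mean score, which the per-group spread below makes visible. A sketch on synthetic data with a reduced grid (both assumptions):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for X_train, y_train
X_train, y_train = make_classification(n_samples=300, n_features=8, random_state=0)

param_grid = {
    'n_neighbors': [5, 15],
    'weights': ['uniform', 'distance'],
    'algorithm': ['ball_tree', 'brute'],
    'leaf_size': [10, 30],
}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# One row per candidate; keep the parameters and the CV summary columns
results = pd.DataFrame(grid_search.cv_results_)
cols = ['param_n_neighbors', 'param_weights', 'param_algorithm', 'param_leaf_size',
        'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score').head(8).to_string(index=False))

# Max minus min mean CV accuracy across algorithm/leaf_size, per (n_neighbors, weights) pair
spread = (results.groupby(['param_n_neighbors', 'param_weights'])['mean_test_score']
          .agg(lambda s: s.max() - s.min()))
print(spread)
```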

- Evaluate your model

In [12]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score

# Use the best model from the grid search
best_knn = grid_search.best_estimator_

# Predict on the test set
y_pred = best_knn.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}')

# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

# Classification Report (Precision, Recall, F1-Score)
print("Classification Report:")
print(classification_report(y_test, y_pred))

# ROC-AUC Score
y_proba = best_knn.predict_proba(X_test)[:, 1]  # Probabilities for the positive class
roc_auc = roc_auc_score(y_test, y_proba)
print(f'ROC-AUC Score: {roc_auc:.4f}')


Accuracy: 0.7926
Confusion Matrix:
[[772 203]
 [208 799]]
Classification Report:
              precision    recall  f1-score   support

       False       0.79      0.79      0.79       975
        True       0.80      0.79      0.80      1007

    accuracy                           0.79      1982
   macro avg       0.79      0.79      0.79      1982
weighted avg       0.79      0.79      0.79      1982

ROC-AUC Score: 0.8834
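The summary metrics can be recomputed by hand from the confusion matrix, which is a useful check on how to read it. scikit-learn orders rows (true labels) and columns (predicted labels) by sorted class, here `[False, True]`, so for the positive class `True` the counts are TP = 799, FP = 203, FN = 208 and TN = 772:

```python
import numpy as np

# Confusion matrix printed above: rows = true label, columns = predicted label, order [False, True]
cm = np.array([[772, 203],
               [208, 799]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()          # 1571 / 1982
precision_true = tp / (tp + fp)          # 799 / 1002
recall_true = tp / (tp + fn)             # 799 / 1007
precision_false = tn / (tn + fn)         # 772 / 980
recall_false = tn / (tn + fp)            # 772 / 975
f1_true = 2 * precision_true * recall_true / (precision_true + recall_true)

print(f"accuracy        {accuracy:.4f}")         # → 0.7926
print(f"precision True  {precision_true:.2f}")   # → 0.80
print(f"recall True     {recall_true:.2f}")      # → 0.79
print(f"precision False {precision_false:.2f}")  # → 0.79
print(f"recall False    {recall_false:.2f}")     # → 0.79
print(f"f1 True         {f1_true:.2f}")          # → 0.80
```

The ROC-AUC score cannot be derived this way: it is computed from the predicted probabilities across all thresholds, not from the single 0.5-threshold predictions summarized in the matrix.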
