# LAB | Hyperparameter Tuning

**Load the data**

This is the final step to maximize the performance of your Spaceship Titanic model.

The data can be found here:

https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv

Metadata

https://github.com/data-bootcamp-v4/data/blob/main/spaceship_titanic.md

So far we've been training and evaluating models with default values for hyperparameters.

Today we will perform the same feature engineering as before, and then take the best-performing models you have so far and fine-tune their hyperparameters.
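Before tuning, it can help to see which default values the model has been using. The snippet below is a minimal sketch that lists a few defaults through `get_params()`; the choice of which hyperparameters to print is only an illustration.

```python
from sklearn.ensemble import RandomForestClassifier

# Instantiate with no arguments so every hyperparameter keeps its default value
default_params = RandomForestClassifier().get_params()

# Print a few of the hyperparameters that are commonly tuned
for name in ["n_estimators", "max_depth", "max_leaf_nodes", "min_samples_leaf"]:
    print(f"{name}: {default_params[name]}")
```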

In [1]:
#Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

In [2]:
spaceship = pd.read_csv("https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv")
spaceship.head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported
0,0001_01,Europa,False,B/0/P,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,False
1,0002_01,Earth,False,F/0/S,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,549.0,44.0,Juanna Vines,True
2,0003_01,Europa,False,A/0/S,TRAPPIST-1e,58.0,True,43.0,3576.0,0.0,6715.0,49.0,Altark Susent,False
3,0003_02,Europa,False,A/0/S,TRAPPIST-1e,33.0,False,0.0,1283.0,371.0,3329.0,193.0,Solam Susent,False
4,0004_01,Earth,False,F/1/S,TRAPPIST-1e,16.0,False,303.0,70.0,151.0,565.0,2.0,Willy Santantines,True


Now perform the same as before:
- Feature Scaling
- Feature Selection


In [3]:
#your code here

features = spaceship[["RoomService", "FoodCourt", "ShoppingMall", "Spa", "VRDeck"]]

target = spaceship["Transported"]       # Categorical -> classification

In [4]:
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)

In [5]:
from sklearn.preprocessing import MinMaxScaler

normalizer = MinMaxScaler()

X_train_norm = normalizer.fit_transform(X_train)
X_test_norm = normalizer.transform(X_test)

- Now let's use the best model we got so far and see how much it improves when we fine-tune its hyperparameters.

### The best model chosen in the previous exercise was Random Forest.

In [6]:
#your code here
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score

# Create and train the Random Forest classifier

rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train_norm, y_train)

# Make predictions
y_pred_norm = rf_classifier.predict(X_test_norm)

- Evaluate your model

In [7]:
#your code here
report = classification_report(y_test, y_pred_norm)              
print(report)

              precision    recall  f1-score   support

       False       0.81      0.72      0.77       861
        True       0.75      0.84      0.79       878

    accuracy                           0.78      1739
   macro avg       0.78      0.78      0.78      1739
weighted avg       0.78      0.78      0.78      1739



**Grid/Random Search**

For this lab we will use Grid Search.
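For comparison, Random Search samples a fixed number of combinations instead of trying all of them. The sketch below uses scikit-learn's `RandomizedSearchCV` on synthetic stand-in data; the data, the candidate values, `n_iter=5`, and `cv=3` are all assumptions chosen to keep the example fast, not settings from this lab.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption): in the lab, use X_train_norm / y_train
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=42)

# Candidate values to sample from (illustrative, not tuned for this dataset)
param_distributions = {
    "n_estimators": [20, 50, 100],
    "max_depth": [5, 10, None],
    "max_leaf_nodes": [50, 250, None],
}

random_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,          # evaluate only 5 of the 27 possible combinations
    cv=3,
    random_state=42,
)
random_search.fit(X_demo, y_demo)

print("Best parameters found:", random_search.best_params_)
print("Best CV accuracy:", round(random_search.best_score_, 3))
```

Random Search trades exhaustiveness for speed, which tends to matter more as the grid grows.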

- Define hyperparameters to fine tune.

In [8]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Define the parameter grid
param_grid = {
    "n_estimators": [50, 100, 200, 500],
    "max_leaf_nodes": [250, 500, 1000, None],
    "max_depth": [10, 30, 50, None]
}
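Grid Search trains one model per combination and per cross-validation fold, so it is worth counting the work before running it. A minimal sketch using `ParameterGrid` with the grid above and the 5 folds used in this lab:

```python
from sklearn.model_selection import ParameterGrid

# Same grid as defined in the cell above
param_grid = {
    "n_estimators": [50, 100, 200, 500],
    "max_leaf_nodes": [250, 500, 1000, None],
    "max_depth": [10, 30, 50, None],
}

n_candidates = len(ParameterGrid(param_grid))  # 4 * 4 * 4 combinations
cv_folds = 5                                   # number of cross-validation folds
total_fits = n_candidates * cv_folds

print(f"{n_candidates} candidates x {cv_folds} folds = {total_fits} fits")
```

With this grid that is 64 candidates and 320 fits, plus one final refit on the full training set, which explains why the search takes a few minutes.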

- Run Grid Search

In [9]:

# Commented out to avoid re-running the search (about 3 minutes);
# the output below comes from the original run.

# Create the RandomForestClassifier model
# forest_classi = RandomForestClassifier(random_state=42)

# Configure GridSearchCV
# grid_search = GridSearchCV(estimator=forest_classi, param_grid=param_grid, cv=5)

# Train the model
# grid_search.fit(X_train_norm, y_train)

# Print the best parameters found by GridSearchCV
# print("Best parameters found: ", grid_search.best_params_)


Best parameters found:  {'max_depth': 10, 'max_leaf_nodes': 250, 'n_estimators': 100}



In [10]:
import joblib

# joblib.dump(grid_search, 'rf_classifier_grid_search.joblib')


['rf_classifier_grid_search.joblib']

In [11]:
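# Reload the saved search to see the result again without waiting another 3 minutes.
# Note: this is the whole fitted GridSearchCV object; the parameters are in best_params_rf.best_params_

best_params_rf = joblib.load('rf_classifier_grid_search.joblib')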
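A fitted `GridSearchCV` object already holds a model retrained with the best parameters (`refit=True` is the default), so it can predict directly without rebuilding the classifier by hand. The sketch below shows this on synthetic stand-in data with a deliberately small grid; `X_demo`, `small_grid`, and `demo_search` are hypothetical names used only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption): in the lab, use the normalized splits
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Small illustrative grid so the example runs in seconds
small_grid = {"n_estimators": [20, 50], "max_depth": [5, None]}
demo_search = GridSearchCV(RandomForestClassifier(random_state=42), small_grid, cv=3)
demo_search.fit(X_tr, y_tr)

# refit=True (default): best_estimator_ is already trained on all of X_tr
best_model = demo_search.best_estimator_
print("Best parameters:", demo_search.best_params_)
print("Mean CV accuracy of best candidate:", round(demo_search.best_score_, 3))
print("Test accuracy:", round(best_model.score(X_te, y_te), 3))
```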

- Evaluate your model

In [12]:
rf_classifier = RandomForestClassifier(n_estimators=100, max_depth=10, max_leaf_nodes=250, random_state=42)

rf_classifier.fit(X_train_norm, y_train)

y_pred_norm = rf_classifier.predict(X_test_norm)

report = classification_report(y_test, y_pred_norm)              
print(report)

              precision    recall  f1-score   support

       False       0.82      0.72      0.77       861
        True       0.75      0.85      0.80       878

    accuracy                           0.78      1739
   macro avg       0.79      0.78      0.78      1739
weighted avg       0.79      0.78      0.78      1739
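To compare the baseline and the tuned model side by side, a single number such as accuracy is easier to read than two full reports. The sketch below is one way to do that; it runs on synthetic stand-in data (an assumption), so its scores say nothing about the Spaceship Titanic results shown above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): in the lab, use
# X_train_norm, X_test_norm, y_train and y_test instead
X_demo, y_demo = make_classification(n_samples=400, n_features=5, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

models = {
    "baseline": RandomForestClassifier(n_estimators=100, random_state=42),
    "tuned": RandomForestClassifier(
        n_estimators=100, max_depth=10, max_leaf_nodes=250, random_state=42
    ),
}

# Fit each model on the same split and record its test accuracy
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```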

