# LAB | Hyperparameter Tuning

**Load the data**

Final step: maximize the performance of your Spaceship Titanic model.

The data can be found here:

https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv

Metadata

https://github.com/data-bootcamp-v4/data/blob/main/spaceship_titanic.md

So far we've been training and evaluating models with default values for hyperparameters.

Today we will perform the same feature engineering as before, then revisit the best-performing models you have so far, this time fine-tuning their hyperparameters.

In [41]:
# Libraries
import pandas as pd
import numpy as np

# Classification metrics (the regression metrics r2/MAE/MSE do not apply to this task)
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

In [42]:
spaceship = pd.read_csv("https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv")
spaceship.head()

| | PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Name | Transported |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0001_01 | Europa | False | B/0/P | TRAPPIST-1e | 39.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Maham Ofracculy | False |
| 1 | 0002_01 | Earth | False | F/0/S | TRAPPIST-1e | 24.0 | False | 109.0 | 9.0 | 25.0 | 549.0 | 44.0 | Juanna Vines | True |
| 2 | 0003_01 | Europa | False | A/0/S | TRAPPIST-1e | 58.0 | True | 43.0 | 3576.0 | 0.0 | 6715.0 | 49.0 | Altark Susent | False |
| 3 | 0003_02 | Europa | False | A/0/S | TRAPPIST-1e | 33.0 | False | 0.0 | 1283.0 | 371.0 | 3329.0 | 193.0 | Solam Susent | False |
| 4 | 0004_01 | Earth | False | F/1/S | TRAPPIST-1e | 16.0 | False | 303.0 | 70.0 | 151.0 | 565.0 | 2.0 | Willy Santantines | True |


Now perform the same as before:
- Feature Scaling
- Feature Selection


In [43]:
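# Drop rows with any missing value (simple approach; imputation is an alternative)
spaceship.dropna(inplace=True)
# Keep only the deck letter of the cabin, e.g. "B/0/P" -> "B"
spaceship["Cabin"] = spaceship["Cabin"].apply(lambda x: x[0])
# Identifier-like columns carry no predictive signal
spaceship.drop(columns=["PassengerId", "Name"], inplace=True)
# One-hot encode the remaining categorical columns
spaceship = pd.get_dummies(spaceship)
features = spaceship.drop(columns="Transported")
target = spaceship["Transported"]
# random_state=None gives a different split on every run; use a fixed integer for reproducible scores
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.20, random_state=None)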
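The cell above cleans and encodes the data but does not actually scale or select features. Bagged decision trees are insensitive to feature scaling, so skipping it is harmless for this model; if you do want both steps, one option is to chain them in a scikit-learn `Pipeline` so they are fit only on training data. This is a minimal sketch: the synthetic `make_classification` data stands in for the encoded Spaceship Titanic features, and `k=5` is an arbitrary illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded Spaceship Titanic features (assumption)
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),              # zero mean, unit variance per column
    ("select", SelectKBest(f_classif, k=5)),  # keep the 5 columns with the highest ANOVA F-score
    ("model", BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)),
])
pipe.fit(X_train, y_train)

n_selected = pipe.named_steps["select"].get_support().sum()
test_accuracy = pipe.score(X_test, y_test)
print("Selected features:", n_selected, "Test accuracy:", round(test_accuracy, 3))
```

Because the scaler and selector live inside the pipeline, passing `pipe` to `GridSearchCV` would let you tune a step parameter such as `select__k` alongside the model parameters without leaking statistics from the validation folds.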

- Now let's take the best model we have so far and see how much it improves when we fine-tune its hyperparameters.

In [44]:
# Bagging ensemble of deep decision trees: the best model from the previous lab
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

bagging_class = BaggingClassifier(DecisionTreeClassifier(max_depth=100),
                                  n_estimators=100,  # number of trees in the ensemble
                                  max_samples=300)   # rows drawn (with replacement) for each tree
bagging_class.fit(X_train, y_train)

# "Overfeed" = accuracy on the training data; a large gap to test accuracy signals overfitting
bagging_class_overfeed = bagging_class.score(X_train, y_train)
bagging_class_acc = bagging_class.score(X_test, y_test)
bagging_class_pred = bagging_class.predict(X_test)
bagging_class_rec_prec = classification_report(y_test, bagging_class_pred)
print("Overfeed:", bagging_class_overfeed, "Accuracy:", bagging_class_acc,
      "Difference: ", bagging_class_overfeed - bagging_class_acc, "\n", bagging_class_rec_prec)

Overfeed: 0.8247539742619228 Accuracy: 0.791981845688351 Difference:  0.03277212857357181 
               precision    recall  f1-score   support

       False       0.82      0.76      0.79       683
        True       0.76      0.83      0.79       639

    accuracy                           0.79      1322
   macro avg       0.79      0.79      0.79      1322
weighted avg       0.79      0.79      0.79      1322
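With bootstrap sampling, each tree is trained on a random subset of rows, so the rows a tree never drew ("out-of-bag" rows) can act as a built-in validation set. Setting `oob_score=True` on `BaggingClassifier` scores every row using only the trees that left it out, giving a validation estimate without touching the test set. A minimal sketch: the synthetic data is an assumption standing in for `X_train, y_train`, and the other settings mirror the cell above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); use X_train, y_train from the notebook instead
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

bagging_oob = BaggingClassifier(
    DecisionTreeClassifier(max_depth=100),
    n_estimators=100,
    max_samples=300,
    bootstrap=True,   # required for out-of-bag scoring (also the default)
    oob_score=True,   # score each row using only the trees that did not sample it
    random_state=0,
)
bagging_oob.fit(X, y)
print("OOB accuracy:", round(bagging_oob.oob_score_, 3))
```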



- Evaluate your model

In [45]:
#your code here
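Beyond the single train/test split used above (whose scores change on every run because `random_state=None`), two common checks are the confusion matrix, which shows which class the errors fall on, and k-fold cross-validated accuracy, which averages over several splits. A minimal sketch, assuming synthetic data in place of the Spaceship Titanic features and reusing the bagging settings from the previous cell:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

model = BaggingClassifier(DecisionTreeClassifier(max_depth=100),
                          n_estimators=100, max_samples=300, random_state=0)
model.fit(X_train, y_train)

# Rows = true class, columns = predicted class: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)

# 5-fold cross-validated accuracy on the training data
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print("CV accuracy: %.3f +/- %.3f" % (cv_scores.mean(), cv_scores.std()))
```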

**Grid/Random Search**

For this lab we will use Grid Search.

- Define the hyperparameters to fine-tune.

In [51]:
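# Hyperparameter grid; the "estimator__" prefix routes a parameter to the inner DecisionTreeClassifier
# (scikit-learn >= 1.2; older versions use "base_estimator__")
# With max_leaf_nodes capped at 100-1000, the trees may stop growing before depth 50,
# so these max_depth values may have little effect
grid = {"n_estimators": [50, 100, 250, 500],
        "estimator__max_depth": [50, 100, 250, 500],
        "estimator__max_leaf_nodes": [100, 500, 700, 1000]}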
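Grid search fits every combination, so it is worth checking the cost before running it. `ParameterGrid` enumerates the same combinations that `GridSearchCV` will try; with `cv=5` and the default `refit=True`, the grid above means 64 combinations times 5 folds plus one final refit, several of them with 500 trees.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the cell above
grid = {"n_estimators": [50, 100, 250, 500],
        "estimator__max_depth": [50, 100, 250, 500],
        "estimator__max_leaf_nodes": [100, 500, 700, 1000]}

n_combinations = len(ParameterGrid(grid))
cv_folds = 5
n_fits = n_combinations * cv_folds + 1  # +1 for the refit on the full training set (refit=True)
print(n_combinations, n_fits)  # 64 321
```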

- Run Grid Search

In [52]:
from sklearn.model_selection import GridSearchCV

bagging_class = BaggingClassifier(DecisionTreeClassifier())
# 5-fold cross-validated search over every combination in the grid
model = GridSearchCV(estimator=bagging_class, param_grid=grid, cv=5)
model.fit(X_train, y_train)
best_params = model.best_params_
# refit=True (the default) already retrains the best combination on all of X_train,
# so no extra fit is needed
bagging_class = model.best_estimator_

bagging_class_overfeed = bagging_class.score(X_train, y_train)
bagging_class_acc = bagging_class.score(X_test, y_test)
bagging_class_pred = bagging_class.predict(X_test)
bagging_class_rec_prec = classification_report(y_test, bagging_class_pred)

- Evaluate your model

In [56]:
print("Overfeed:", bagging_class_overfeed, "Accuracy:", bagging_class_acc,
      "Difference: ", bagging_class_overfeed - bagging_class_acc, "\n",
      "Best Parameters: ", best_params, "\n", bagging_class_rec_prec)

Overfeed: 0.8546555639666918 Accuracy: 0.7844175491679274 Difference:  0.07023801479876446 
 Best Parameters:  {'estimator__max_depth': 500, 'estimator__max_leaf_nodes': 100, 'n_estimators': 100} 
               precision    recall  f1-score   support

       False       0.83      0.74      0.78       683
        True       0.75      0.83      0.79       639

    accuracy                           0.78      1322
   macro avg       0.79      0.79      0.78      1322
weighted avg       0.79      0.78      0.78      1322

