# LAB | Hyperparameter Tuning

**Load the data**

This is the final step in maximizing the performance of your Spaceship Titanic model.

The data can be found here:

https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv

Metadata

https://github.com/data-bootcamp-v4/data/blob/main/spaceship_titanic.md

So far we've been training and evaluating models with default values for hyperparameters.

Today we will perform the same feature engineering as before, and then take the best-performing models you have so far and fine-tune their hyperparameters.

In [53]:
#Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

In [16]:
spaceship = pd.read_csv("https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv")
spaceship.head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported
0,0001_01,Europa,False,B/0/P,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,False
1,0002_01,Earth,False,F/0/S,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,549.0,44.0,Juanna Vines,True
2,0003_01,Europa,False,A/0/S,TRAPPIST-1e,58.0,True,43.0,3576.0,0.0,6715.0,49.0,Altark Susent,False
3,0003_02,Europa,False,A/0/S,TRAPPIST-1e,33.0,False,0.0,1283.0,371.0,3329.0,193.0,Solam Susent,False
4,0004_01,Earth,False,F/1/S,TRAPPIST-1e,16.0,False,303.0,70.0,151.0,565.0,2.0,Willy Santantines,True


Now perform the same as before:
- Feature Scaling
- Feature Selection


In [17]:
missing_values_per_column = spaceship.isna().sum()
print(missing_values_per_column)

PassengerId       0
HomePlanet      201
CryoSleep       217
Cabin           199
Destination     182
Age             179
VIP             203
RoomService     181
FoodCourt       183
ShoppingMall    208
Spa             183
VRDeck          188
Name            200
Transported       0
dtype: int64


In [18]:
spaceship = spaceship.dropna()

In [19]:
spaceship['Cabin'] = spaceship['Cabin'].str.split('/').str[0]

In [20]:
features = spaceship.select_dtypes(include='number')
target = spaceship['Transported']

In [21]:
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size = 0.20, random_state=0)

In [22]:
# Initialize the scaler
normalizer = MinMaxScaler()

In [23]:
# Fit the scaler on the training data and transform the training data
X_train_norm = normalizer.fit_transform(X_train)

In [24]:
# Transform the testing data using the same scaler
X_test_norm = normalizer.transform(X_test)
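
The fit-on-train, transform-on-test pattern used in the cells above can be checked on a toy array. This is a minimal sketch with made-up numbers (the `*_toy` arrays are hypothetical, not the lab data): the scaler only learns the training minimum and maximum, so a test value outside that range lands outside [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data (hypothetical values, not from the Spaceship Titanic dataset)
X_train_toy = np.array([[10.0], [20.0], [30.0]])
X_test_toy = np.array([[15.0], [40.0]])

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(X_train_toy)  # learns min=10 and max=30 from the training rows
test_scaled = scaler.transform(X_test_toy)        # reuses the training min/max, learns nothing new

print(train_scaled.ravel())  # values 0.0, 0.5, 1.0
print(test_scaled.ravel())   # values 0.25, 1.5 (40 is above the training maximum)
```

Fitting the scaler on the training split only keeps information about the test rows out of preprocessing.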

In [25]:
X_train_norm = pd.DataFrame(X_train_norm, columns = X_train.columns)
X_train_norm

Unnamed: 0,Age,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck
0,0.405063,0.000000,0.000000,0.000000,0.000000,0.000000
1,0.050633,0.000000,0.000000,0.000000,0.000000,0.000000
2,0.379747,0.000000,0.007916,0.000000,0.051276,0.000000
3,0.215190,0.001310,0.000000,0.046111,0.016378,0.000049
4,0.329114,0.000000,0.000000,0.000000,0.000000,0.000000
...,...,...,...,...,...,...
5279,0.670886,0.000000,0.000000,0.000000,0.000000,0.000000
5280,0.455696,0.000000,0.000000,0.000000,0.032355,0.000098
5281,0.455696,0.000000,0.159528,0.000000,0.348893,0.004721
5282,0.430380,0.000000,0.000134,0.000000,0.030569,0.087480


In [26]:
X_test_norm = pd.DataFrame(X_test_norm, columns = X_test.columns)
X_test_norm

Unnamed: 0,Age,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck
0,0.632911,0.000000,0.000000,0.00000,0.000000,0.000000
1,0.227848,0.000000,0.000000,0.00000,0.000000,0.000000
2,0.189873,0.000000,0.000000,0.00000,0.000000,0.000000
3,0.658228,0.000000,0.000000,0.00000,0.000000,0.000000
4,0.784810,0.000000,0.054775,0.00000,0.077740,0.000000
...,...,...,...,...,...,...
1317,0.240506,0.000000,0.000000,0.05468,0.000045,0.001672
1318,0.468354,0.030242,0.115185,0.00000,0.000045,0.008409
1319,0.544304,0.000202,0.178748,0.00000,0.000312,0.000000
1320,0.177215,0.000000,0.000000,0.00000,0.000000,0.000000


In [43]:
bagging_reg = BaggingRegressor(DecisionTreeRegressor(max_depth=20),
                               n_estimators=100,
                               max_samples = 3000)

In [44]:
bagging_reg.fit(X_train_norm, y_train)

In [45]:
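pred = bagging_reg.predict(X_test_norm)

# scikit-learn metrics take (y_true, y_pred) in that order
print("MAE", mean_absolute_error(y_test, pred))
# np.sqrt avoids the `squared=False` argument, which is deprecated in recent scikit-learn releases
print("RMSE", np.sqrt(mean_squared_error(y_test, pred)))
print("R2 score", bagging_reg.score(X_test_norm, y_test))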

MAE 0.311689787832488
RMSE 0.4028830246270449
R2 score 0.35074107386945574


- Now let's take the best model we have so far and see how much it improves when we fine-tune its hyperparameters.

In [57]:
bagging_reg = BaggingRegressor(DecisionTreeRegressor(max_depth=40),
                               n_estimators=400,
                               max_samples = 5000)

In [58]:
bagging_reg.fit(X_train_norm, y_train)

In [59]:
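pred = bagging_reg.predict(X_test_norm)

# scikit-learn metrics take (y_true, y_pred) in that order
print("MAE", mean_absolute_error(y_test, pred))
# np.sqrt avoids the `squared=False` argument, which is deprecated in recent scikit-learn releases
print("RMSE", np.sqrt(mean_squared_error(y_test, pred)))
print("R2 score", bagging_reg.score(X_test_norm, y_test))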

MAE 0.313408849140679
RMSE 0.4068910595424253
R2 score 0.33775866265777


- Evaluate your model

Evaluation:

Here Set 1 is the first model (`max_depth=20`, `n_estimators=100`, `max_samples=3000`) and Set 2 is the second (`max_depth=40`, `n_estimators=400`, `max_samples=5000`).

- **Mean Absolute Error (MAE):** Set 1 scores 0.3117, slightly lower than Set 2's 0.3134. A lower MAE means the predictions are, on average, closer to the actual values.
- **Root Mean Square Error (RMSE):** Set 1 scores 0.4029, lower than Set 2's 0.4069. Because RMSE penalizes large errors more heavily than MAE, Set 1 is also slightly better at avoiding large errors.
- **R² Score:** Set 1 scores 0.3507, higher than Set 2's 0.3378, so it explains slightly more of the variance in the target.

Conclusion:

Set 1 is slightly better on all three metrics, but the gaps are small. Neither model sets `random_state`, so differences of this size can shift from run to run. Set 1 is also the cheaper model to train (fewer trees with a lower depth limit), so on both accuracy and computational cost it is the preferred choice.
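
To make the three metrics concrete, here is a small worked example computed by hand with NumPy. The `y_true` and `y_pred` arrays are hypothetical values chosen for easy arithmetic, not predictions from the lab models.

```python
import numpy as np

# Hypothetical 0/1 targets and regression outputs, for illustration only
y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_pred = np.array([0.8, 0.2, 0.6, 0.4])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))                    # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))             # root of the mean squared error
ss_res = np.sum(errors ** 2)                     # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
r2 = 1 - ss_res / ss_tot                         # share of variance explained

print(f"MAE {mae:.3f}  RMSE {rmse:.3f}  R2 {r2:.3f}")
# MAE 0.300  RMSE 0.316  R2 0.600
```

RMSE comes out above MAE here because squaring gives the two larger errors (0.4) more weight than the two smaller ones (0.2).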

**Grid/Random Search**

For this lab we will use Grid Search.
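
For comparison, `RandomizedSearchCV` (imported at the top but not used in this lab) samples a fixed number of combinations instead of trying all of them. The sketch below runs on synthetic stand-in data, and the parameter values and the `X_demo`/`y_demo` names are assumptions chosen to keep it fast, not recommendations for the Spaceship Titanic model.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption: not the lab features)
rng = np.random.default_rng(0)
X_demo = rng.random((200, 4))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 1.0).astype(float)

# 3 * 3 * 2 = 18 possible combinations
param_distributions = {
    "n_estimators": [10, 20, 40],
    "max_samples": [0.3, 0.6, 0.9],   # fractions of the training fold
    "max_features": [0.5, 1.0],
}

random_search = RandomizedSearchCV(
    estimator=BaggingRegressor(DecisionTreeRegressor(max_depth=5), random_state=0),
    param_distributions=param_distributions,
    n_iter=5,          # only 5 of the 18 combinations are sampled
    scoring="r2",
    cv=3,
    random_state=0,    # makes the sampled combinations reproducible
)
random_search.fit(X_demo, y_demo)
print(random_search.best_params_)
print(random_search.best_score_)
```

Random search trades exhaustiveness for speed, which matters once a grid grows to hundreds of candidates.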

- Define hyperparameters to fine tune.

In [63]:
print("Number of samples in training set:", X_train.shape[0])

Number of samples in training set: 5284


In [65]:
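# Note: the "base_estimator__" prefix matches the scikit-learn version used for this run.
# Newer releases renamed the BaggingRegressor argument to `estimator`, so the prefix becomes "estimator__".
grid = {
    "n_estimators": [50, 100, 200, 500],
    "base_estimator__max_leaf_nodes": [250, 500, 1000, None],
    "base_estimator__max_depth": [10, 30, 50],
    "max_samples": [1000, 2000, 3000, 4000]
}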
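
The prefix for nested tree parameters depends on the installed scikit-learn version, so it can help to ask the estimator which names it accepts before building the grid. This is a minimal sketch using `get_params()`; the `bagging_demo` and `nested_names` names are hypothetical.

```python
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Pass the tree positionally so the same call works whichever keyword name the version uses
bagging_demo = BaggingRegressor(DecisionTreeRegressor(max_depth=20), n_estimators=10)

# Every key returned by get_params() is a valid key for a GridSearchCV param_grid
nested_names = sorted(name for name in bagging_demo.get_params() if name.endswith("__max_depth"))
print(nested_names)
```

Whatever prefix is printed ("base_estimator__" or "estimator__") is the one to use for the tree parameters in `grid`.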

In [66]:
# Initialize the GridSearchCV
grid_search = GridSearchCV(
    estimator=bagging_reg,
    param_grid=grid,
    scoring='r2',  # Alternatives include 'neg_mean_absolute_error' or 'neg_root_mean_squared_error'
    cv=5,  # Number of folds in cross-validation
    verbose=1,  # To see the progress of the search
    n_jobs=-1  # Use all available cores to speed up the search
)

- Run Grid Search

In [67]:
# Fit the GridSearchCV to the training data.
# Note: this uses the unscaled X_train; min-max scaling does not change how decision trees split,
# so X_train_norm should give essentially the same results.
grid_search.fit(X_train, y_train)

Fitting 5 folds for each of 192 candidates, totalling 960 fits
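
The candidate count in the log follows directly from the grid: 4 × 4 × 3 × 4 = 192 combinations, each fitted once per fold. The arithmetic can be checked in a few lines (the `cv_folds`, `candidates`, and `total_fits` names are introduced here for illustration):

```python
from itertools import product

grid = {
    "n_estimators": [50, 100, 200, 500],
    "base_estimator__max_leaf_nodes": [250, 500, 1000, None],
    "base_estimator__max_depth": [10, 30, 50],
    "max_samples": [1000, 2000, 3000, 4000],
}
cv_folds = 5

# Every combination of one value per hyperparameter is one candidate
candidates = list(product(*grid.values()))
total_fits = len(candidates) * cv_folds

print(len(candidates), total_fits)  # 192 960
```

Adding one more value to any list multiplies the cost, which is why grid search gets expensive quickly.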




- Evaluate your model

In [68]:
# Print the best parameters and best score
print("Best parameters found: ", grid_search.best_params_)
print("Best score found: ", grid_search.best_score_)

# Optionally, retrieve the best model
best_bagging_reg = grid_search.best_estimator_

Best parameters found:  {'base_estimator__max_depth': 10, 'base_estimator__max_leaf_nodes': 1000, 'max_samples': 2000, 'n_estimators': 500}
Best score found:  0.386652663860111
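
`best_score_` is an average over validation folds taken from the training data, so the held-out test set still needs its own check. The sketch below shows that step on synthetic stand-in data, with hypothetical names (`X_demo`, `search`, `best_model`). In the lab the equivalent would be calling `best_bagging_reg.predict(X_test)` and scoring against `y_test`.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption: not the lab features)
rng = np.random.default_rng(0)
X_demo = rng.random((300, 3))
y_demo = (X_demo[:, 0] > 0.5).astype(float)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

search = GridSearchCV(
    BaggingRegressor(DecisionTreeRegressor(max_depth=3), random_state=0),
    param_grid={"n_estimators": [10, 30]},
    scoring="r2",
    cv=3,
)
search.fit(X_tr, y_tr)

best_model = search.best_estimator_    # refit on all of X_tr because refit=True by default
test_pred = best_model.predict(X_te)   # the test rows were never seen during the search

print("CV R2   ", search.best_score_)
print("Test R2 ", r2_score(y_te, test_pred))
print("Test MAE", mean_absolute_error(y_te, test_pred))
```

Comparing the cross-validated score with the test score shows whether the tuned model generalizes or whether the search overfit the validation folds.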


The `GridSearchCV` results give the best hyperparameter combination and the best cross-validated score for the `BaggingRegressor` model. Let’s interpret these results:

### Best Parameters:
- **`base_estimator__max_depth`:** 10
  - The maximum depth of the base decision tree estimators is set to 10. This means each decision tree in the bagging ensemble will be limited to 10 levels deep, which helps control overfitting.
  
- **`base_estimator__max_leaf_nodes`:** 1000
  - The maximum number of leaf nodes in the base decision tree estimators is set to 1000. This caps the size of each tree. Note that a binary tree of depth 10 has at most 2^10 = 1024 leaves, so with `max_depth=10` this cap adds very little constraint.

- **`max_samples`:** 2000
  - Each base estimator in the bagging ensemble is trained on a random sample of 2000 rows from the training data, drawn with replacement by default (`bootstrap=True`). This makes the trees more diverse and reduces overfitting.

- **`n_estimators`:** 500
  - The number of base estimators (decision trees) in the ensemble is set to 500. A larger number of estimators typically improves performance but increases computation time.

### Best Score:
- **Best score found:** `0.3867` (the mean cross-validated R² over the 5 folds, since the search used `scoring='r2'`)

### Evaluation:

#### R² Score Interpretation:
- **R² Score (0.3867):** The R² score measures how well the model explains the variance in the target variable. A score of 0.3867 means the model explains approximately 38.67% of that variance, averaged over the validation folds. Because this is a cross-validated score on the training data, it is not directly comparable to the test-set R² values reported earlier. The model has some predictive power, but there is clear room for improvement.

#### Model Performance:
- **Interpretation of the R² Score:** While an R² score of around 0.39 suggests that the model is capturing some relationship between the features and the target variable, it is not capturing all the variability. Depending on the problem and the data, this may or may not be satisfactory.
  - **Low R²:** A low R² score can be common in complex problems or noisy datasets. It may indicate that either the model is not capturing the underlying patterns well, or the problem itself is very challenging.
  - **Context Matters:** The performance should be evaluated in the context of your specific problem. Compare this score with baseline models or other approaches to determine if the performance is acceptable.

#### Next Steps:
1. **Feature Engineering:** Improve feature engineering to provide better input to the model.
2. **Model Complexity:** Experiment with other model types or more complex models if appropriate.
3. **Hyperparameter Tuning:** Further tune hyperparameters or consider more sophisticated techniques.
4. **Cross-Validation:** The grid search already uses 5-fold cross-validation. Check the per-fold scores in `grid_search.cv_results_` to confirm that performance is consistent across the different subsets of data.

If you have specific goals or benchmarks for your model's performance, comparing the R² score with those benchmarks will help determine if further work is needed.
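
One way to act on the cross-validation point is to look at the spread of scores across folds rather than only their mean. This is a minimal sketch with `cross_val_score` on synthetic stand-in data; the data, the model settings, and the `fold_scores` name are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data with some label noise (assumption: not the lab features)
rng = np.random.default_rng(1)
X_demo = rng.random((250, 3))
y_demo = (X_demo[:, 0] + 0.1 * rng.standard_normal(250) > 0.5).astype(float)

model = BaggingRegressor(DecisionTreeRegressor(max_depth=4), n_estimators=20, random_state=0)

# One R2 value per fold; a large spread suggests the score depends heavily on the split
fold_scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="r2")

print("per-fold R2:", np.round(fold_scores, 3))
print("mean:", fold_scores.mean(), "std:", fold_scores.std())
```

A small standard deviation relative to the mean suggests the score is stable across splits.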