# LAB | Hyperparameter Tuning

**Load the data**

The final step is to maximize the performance of your Spaceship Titanic model.

The data can be found here:

https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv

Metadata

https://github.com/data-bootcamp-v4/data/blob/main/spaceship_titanic.md

So far we've been training and evaluating models with default values for hyperparameters.

Today we will perform the same feature engineering as before, then take the best-performing model you have found so far and fine-tune its hyperparameters.
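To make "default values" concrete, the short sketch below prints the defaults that scikit-learn's `KNeighborsClassifier` uses when it is created without arguments. The variable names `default_knn` and `defaults` are illustrative only.

```python
from sklearn.neighbors import KNeighborsClassifier

# Instantiate the classifier without arguments to see what "default" means.
default_knn = KNeighborsClassifier()
defaults = default_knn.get_params()

# Print only the hyperparameters that are commonly tuned for KNN.
for name in ("n_neighbors", "weights", "metric", "p"):
    print(f"{name}: {defaults[name]!r}")
```

These are the values any tuning result should be compared against.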

In [5]:
#Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler, StandardScaler

In [6]:
spaceship = pd.read_csv("https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv")
spaceship.head()

,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported
0,0001_01,Europa,False,B/0/P,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,False
1,0002_01,Earth,False,F/0/S,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,549.0,44.0,Juanna Vines,True
2,0003_01,Europa,False,A/0/S,TRAPPIST-1e,58.0,True,43.0,3576.0,0.0,6715.0,49.0,Altark Susent,False
3,0003_02,Europa,False,A/0/S,TRAPPIST-1e,33.0,False,0.0,1283.0,371.0,3329.0,193.0,Solam Susent,False
4,0004_01,Earth,False,F/1/S,TRAPPIST-1e,16.0,False,303.0,70.0,151.0,565.0,2.0,Willy Santantines,True


Now perform the same steps as before:
- Feature Scaling
- Feature Selection
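As a reminder of what these two steps can look like, here is a minimal sketch on a hypothetical toy DataFrame (not the Spaceship Titanic data). It assumes `SelectKBest` with the ANOVA F-score as the selection method, which is only one of several options and may differ from what you used in the previous lab.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: one informative column and two pure-noise columns.
rng = np.random.default_rng(0)
n_rows = 200
demo = pd.DataFrame({
    "signal": rng.normal(size=n_rows),
    "noise_a": rng.normal(size=n_rows),
    "noise_b": rng.normal(size=n_rows),
})
labels = (demo["signal"] > 0).astype(int)  # the label depends on "signal" only

# Feature scaling: map every column to the [0, 1] range.
scaled = MinMaxScaler().fit_transform(demo)

# Feature selection: keep the single column with the highest F-score.
selector = SelectKBest(score_func=f_classif, k=1).fit(scaled, labels)
kept = demo.columns[selector.get_support()].tolist()
print(kept)
```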


In [7]:
#your code here
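# Drop rows with missing values; .copy() avoids a SettingWithCopyWarning below
spaceship_cl = spaceship.dropna().copy()
# Keep only the deck letter (first character) of the Cabin string
spaceship_cl['Cabin'] = spaceship_cl['Cabin'].str[0].str.upper()
# Drop identifier columns that are not used as features
spaceship_cl = spaceship_cl.drop(columns=["PassengerId", "Name"])
# One-hot encode the remaining categorical columns
spaceship_cl2 = pd.get_dummies(spaceship_cl, drop_first=False)
features2 = spaceship_cl2.drop(columns=["Transported"])
target2 = spaceship_cl2["Transported"]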

In [8]:
X_train2, X_test2, y_train2, y_test2 = train_test_split(features2, target2, test_size = 0.20, random_state=0)
normalizer = MinMaxScaler()
normalizer.fit(X_train2)
X_train_norm = normalizer.transform(X_train2)
X_test_norm = normalizer.transform(X_test2)
X_train_norm = pd.DataFrame(X_train_norm, columns=X_train2.columns, index=X_train2.index )
X_test_norm = pd.DataFrame(X_test_norm, columns=X_test2.columns, index=X_test2.index)
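The scaler above is fitted on the training split only, which keeps test rows out of the preprocessing. When cross-validation is involved, the same idea can be applied per fold by wrapping the scaler and the model in a `Pipeline`. The sketch below assumes synthetic data from `make_classification` and hypothetical names (`scaled_knn`, `X_tr`, and so on); it is an alternative arrangement, not what the cells in this lab do.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the lab data.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=0
)

# The pipeline refits the scaler inside every training fold,
# so validation rows never influence the min/max used for scaling.
scaled_knn = Pipeline([
    ("scaler", MinMaxScaler()),
    ("knn", KNeighborsClassifier()),
])

cv_scores = cross_val_score(scaled_knn, X_tr, y_tr, cv=5, scoring="accuracy")
print(f"Mean CV accuracy: {cv_scores.mean():.2f}")

# Fit on the full training split and score once on the held-out test split.
scaled_knn.fit(X_tr, y_tr)
print(f"Test accuracy: {scaled_knn.score(X_te, y_te):.2f}")
```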

- Now let's use the best model we got so far to see how much it improves when we fine-tune its hyperparameters.

In [9]:
#your code here
KNN= KNeighborsClassifier()
KNN.fit(X_train_norm, y_train2)


- Evaluate your model

In [11]:
#your code here
y_pred_test_KNN = KNN.predict(X_test_norm)

# Step 1: Evaluate the model using accuracy
accuracy = accuracy_score(y_test2, y_pred_test_KNN)
print(f"Accuracy: {accuracy:.2f}")

# Step 2: Evaluate the model using precision, recall, and F1-score
# Adjust the average parameter according to your classification needs (e.g., 'micro', 'macro', 'weighted')
precision = precision_score(y_test2, y_pred_test_KNN, average='weighted')
recall = recall_score(y_test2, y_pred_test_KNN, average='weighted')
f1 = f1_score(y_test2, y_pred_test_KNN, average='weighted')

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-Score: {f1:.2f}")

# Step 3: Generate a classification report
print("\nClassification Report:")
print(classification_report(y_test2, y_pred_test_KNN))

# Step 4: Generate a confusion matrix
print("\nConfusion Matrix:")
print(confusion_matrix(y_test2, y_pred_test_KNN))



Accuracy: 0.76
Precision: 0.76
Recall: 0.76
F1-Score: 0.76

Classification Report:
              precision    recall  f1-score   support

       False       0.77      0.76      0.76       661
        True       0.76      0.77      0.76       661

    accuracy                           0.76      1322
   macro avg       0.76      0.76      0.76      1322
weighted avg       0.76      0.76      0.76      1322


Confusion Matrix:
[[500 161]
 [153 508]]
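The reported metrics can be recomputed by hand from the confusion matrix above, which is a useful sanity check. The sketch copies the four counts from the output; scikit-learn orders rows by true label and columns by predicted label, with `False` before `True`. The variable names are illustrative.

```python
import numpy as np

# Counts copied from the confusion matrix printed above.
cm = np.array([[500, 161],
               [153, 508]])

# For sorted binary labels [False, True], ravel() yields TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()

acc_from_cm = (tp + tn) / cm.sum()   # share of correct predictions
precision_true = tp / (tp + fp)      # of predicted True, how many were True
recall_true = tp / (tp + fn)         # of actual True, how many were found

print(f"Accuracy:         {acc_from_cm:.2f}")
print(f"Precision (True): {precision_true:.2f}")
print(f"Recall (True):    {recall_true:.2f}")
```

The three values round to 0.76, 0.76 and 0.77, matching the accuracy line and the `True` row of the classification report.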


**Grid/Random Search**

For this lab we will use Grid Search.
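For comparison, the other option in the heading, random search, samples a fixed number of parameter combinations instead of trying all of them. The sketch below uses `RandomizedSearchCV` on synthetic data; the dataset, the parameter ranges, and the names `param_distributions` and `random_search` are assumptions chosen for illustration.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the lab data.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

# Lists are sampled uniformly; randint(3, 12) draws integers from 3 to 11.
param_distributions = {
    "n_neighbors": randint(3, 12),
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan"],
}

random_search = RandomizedSearchCV(
    estimator=KNeighborsClassifier(),
    param_distributions=param_distributions,
    n_iter=10,          # number of sampled combinations
    scoring="accuracy",
    cv=5,
    random_state=0,     # makes the sampling reproducible
)
random_search.fit(X_demo, y_demo)
print(random_search.best_params_)
```

Random search tends to pay off when the grid is too large to enumerate; with a grid as small as the one in this lab, an exhaustive search is affordable.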

- Define the hyperparameters to fine-tune.

In [12]:
#your code here
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

param_grid = {
    'n_neighbors': [3, 5, 7, 9, 11],  # Number of neighbors to use
    'weights': ['uniform', 'distance'],  # Weight function used in prediction
    'metric': ['euclidean', 'manhattan', 'minkowski']  # Distance metric; 'minkowski' with the default p=2 is the same as 'euclidean'
}
knn = KNeighborsClassifier()
grid_search = GridSearchCV(estimator=knn, param_grid=param_grid, scoring='accuracy', cv=10, verbose=10)
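Before running the search it helps to know how much work it implies. The sketch below counts the combinations in the same grid with `ParameterGrid` and multiplies by the number of folds; `n_candidates` and `n_folds` are illustrative names.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the cell above.
param_grid = {
    "n_neighbors": [3, 5, 7, 9, 11],
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan", "minkowski"],
}

n_candidates = len(ParameterGrid(param_grid))  # 5 * 2 * 3 combinations
n_folds = 10                                   # matches cv=10

print(f"{n_candidates} candidates, {n_candidates * n_folds} fits")
```

Every value added to any list multiplies the total, so grids grow quickly.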


- Run Grid Search

In [13]:
grid_search.fit(X_train_norm, y_train2)
best_params = grid_search.best_params_
best_score = grid_search.best_score_

print(f"Best Parameters: {best_params}")
print(f"Best Cross-Validation Accuracy: {best_score:.2f}")

Fitting 10 folds for each of 30 candidates, totalling 300 fits
[CV 1/10; 1/30] START metric=euclidean, n_neighbors=3, weights=uniform..........
[CV 1/10; 1/30] END metric=euclidean, n_neighbors=3, weights=uniform;, score=0.726 total time=   0.0s
[CV 2/10; 1/30] START metric=euclidean, n_neighbors=3, weights=uniform..........
[CV 2/10; 1/30] END metric=euclidean, n_neighbors=3, weights=uniform;, score=0.783 total time=   0.0s
[CV 3/10; 1/30] START metric=euclidean, n_neighbors=3, weights=uniform..........
[CV 3/10; 1/30] END metric=euclidean, n_neighbors=3, weights=uniform;, score=0.750 total time=   0.0s
[CV 4/10; 1/30] START metric=euclidean, n_neighbors=3, weights=uniform..........
[CV 4/10; 1/30] END metric=euclidean, n_neighbors=3, weights=uniform;, score=0.752 total time=   0.0s
[CV 5/10; 1/30] START metric=euclidean, n_neighbors=3, weights=uniform..........
[CV 5/10; 1/30] END metric=euclidean, n_neighbors=3, weights=uniform;, score=0.748 total time=   0.0s
[CV 6/10; 1/30] START 
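Once a search has finished, the fitted object exposes more than `best_params_`. With the default `refit=True`, `best_estimator_` is already retrained on the full training data, and `cv_results_` holds the score of every candidate. The sketch below shows both on synthetic data with a deliberately small grid; the data and all variable names are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the lab data.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=0
)

demo_search = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]},
    scoring="accuracy",
    cv=5,
)
demo_search.fit(X_tr, y_tr)

# One row per candidate; rank 1 is the best mean cross-validation score.
results = pd.DataFrame(demo_search.cv_results_)
top = results.sort_values("rank_test_score")[
    ["params", "mean_test_score", "std_test_score"]
].head(3)
print(top)

# The refitted best model can be used directly, without rebuilding it by hand.
best_model = demo_search.best_estimator_
print(f"Test accuracy: {best_model.score(X_te, y_te):.2f}")
```

Looking at `std_test_score` next to `mean_test_score` shows whether the top candidates differ by more than the fold-to-fold noise.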

- Evaluate your model

In [15]:
# The best model keeps the default weights='uniform', increases n_neighbors from 5 to 9, and uses the Manhattan distance, which is generally less sensitive to outliers than the Euclidean distance
KNN_best= KNeighborsClassifier(metric='manhattan', n_neighbors=9, weights='uniform')
KNN_best.fit(X_train_norm, y_train2)
y_pred_test_KNN_best = KNN_best.predict(X_test_norm)

# Step 1: Evaluate the model using accuracy
accuracy = accuracy_score(y_test2, y_pred_test_KNN_best)
print(f"Accuracy: {accuracy:.2f}")

# Step 2: Evaluate the model using precision, recall, and F1-score
# Adjust the average parameter according to your classification needs (e.g., 'micro', 'macro', 'weighted')
precision = precision_score(y_test2, y_pred_test_KNN_best, average='weighted')
recall = recall_score(y_test2, y_pred_test_KNN_best, average='weighted')
f1 = f1_score(y_test2, y_pred_test_KNN_best, average='weighted')

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-Score: {f1:.2f}")

# Step 3: Generate a classification report
print("\nClassification Report:")
print(classification_report(y_test2, y_pred_test_KNN_best))

# Step 4: Generate a confusion matrix
print("\nConfusion Matrix:")
print(confusion_matrix(y_test2, y_pred_test_KNN_best))

Accuracy: 0.77
Precision: 0.77
Recall: 0.77
F1-Score: 0.77

Classification Report:
              precision    recall  f1-score   support

       False       0.77      0.77      0.77       661
        True       0.77      0.77      0.77       661

    accuracy                           0.77      1322
   macro avg       0.77      0.77      0.77      1322
weighted avg       0.77      0.77      0.77      1322


Confusion Matrix:
[[509 152]
 [155 506]]
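To put the improvement in perspective, the two confusion matrices can be compared directly. The sketch below copies the counts from the default and tuned outputs and computes how many additional test rows were classified correctly; the variable names are illustrative.

```python
import numpy as np

# Counts copied from the two confusion matrices printed in this notebook.
cm_default = np.array([[500, 161],
                       [153, 508]])
cm_tuned = np.array([[509, 152],
                     [155, 506]])

n_test = cm_default.sum()               # size of the test set
correct_default = np.trace(cm_default)  # diagonal = correct predictions
correct_tuned = np.trace(cm_tuned)

extra_correct = correct_tuned - correct_default
gain_points = 100 * extra_correct / n_test

print(f"Default: {correct_default}/{n_test} correct")
print(f"Tuned:   {correct_tuned}/{n_test} correct")
print(f"Gain:    {extra_correct} rows ({gain_points:.2f} percentage points)")
```

The tuned model gets 7 more of the 1322 test rows right, about half a percentage point, so the step from 0.76 to 0.77 in the rounded metrics is a small gain.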
