# LAB | Ensemble Methods

**Load the data**

In this challenge, we will be working with the same Spaceship Titanic data as in the previous Lab. The data can be found here:

https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv

Metadata

https://github.com/data-bootcamp-v4/data/blob/main/spaceship_titanic.md

In this Lab, you should try different ensemble methods to see if you can obtain a better model than before. For a fair comparison, apply the same feature scaling and feature engineering as in the previous Lab.

In [1]:
#Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
pd.set_option('display.max_columns', None)

In [2]:
spaceship = pd.read_csv("https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv")
spaceship.head()

,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported
0,0001_01,Europa,False,B/0/P,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,False
1,0002_01,Earth,False,F/0/S,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,549.0,44.0,Juanna Vines,True
2,0003_01,Europa,False,A/0/S,TRAPPIST-1e,58.0,True,43.0,3576.0,0.0,6715.0,49.0,Altark Susent,False
3,0003_02,Europa,False,A/0/S,TRAPPIST-1e,33.0,False,0.0,1283.0,371.0,3329.0,193.0,Solam Susent,False
4,0004_01,Earth,False,F/1/S,TRAPPIST-1e,16.0,False,303.0,70.0,151.0,565.0,2.0,Willy Santantines,True


In [3]:
# Check null values
spaceship.isnull().sum()

PassengerId       0
HomePlanet      201
CryoSleep       217
Cabin           199
Destination     182
Age             179
VIP             203
RoomService     181
FoodCourt       183
ShoppingMall    208
Spa             183
VRDeck          188
Name            200
Transported       0
dtype: int64

In [4]:
# Drop null values
spaceship_clean = spaceship.dropna()

In [5]:
# Re-check null values
spaceship_clean.isnull().sum()

PassengerId     0
HomePlanet      0
CryoSleep       0
Cabin           0
Destination     0
Age             0
VIP             0
RoomService     0
FoodCourt       0
ShoppingMall    0
Spa             0
VRDeck          0
Name            0
Transported     0
dtype: int64

In [6]:
# Check duplicated values
spaceship_clean.duplicated().sum()

0

In [7]:
# Drop unnecessary columns
spaceship_clean = spaceship_clean.drop(columns=['PassengerId', 'Name'])

In [8]:
spaceship_clean

,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Transported
0,Europa,False,B/0/P,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,0.0,0.0,False
1,Earth,False,F/0/S,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,549.0,44.0,True
2,Europa,False,A/0/S,TRAPPIST-1e,58.0,True,43.0,3576.0,0.0,6715.0,49.0,False
3,Europa,False,A/0/S,TRAPPIST-1e,33.0,False,0.0,1283.0,371.0,3329.0,193.0,False
4,Earth,False,F/1/S,TRAPPIST-1e,16.0,False,303.0,70.0,151.0,565.0,2.0,True
...,...,...,...,...,...,...,...,...,...,...,...,...
8688,Europa,False,A/98/P,55 Cancri e,41.0,True,0.0,6819.0,0.0,1643.0,74.0,False
8689,Earth,True,G/1499/S,PSO J318.5-22,18.0,False,0.0,0.0,0.0,0.0,0.0,False
8690,Earth,False,G/1500/S,TRAPPIST-1e,26.0,False,0.0,0.0,1872.0,1.0,0.0,True
8691,Europa,False,E/608/S,55 Cancri e,32.0,False,0.0,1049.0,0.0,353.0,3235.0,False


In [9]:
spaceship_clean.dtypes

HomePlanet       object
CryoSleep        object
Cabin            object
Destination      object
Age             float64
VIP              object
RoomService     float64
FoodCourt       float64
ShoppingMall    float64
Spa             float64
VRDeck          float64
Transported        bool
dtype: object

In [10]:
# Convert the object columns CryoSleep and VIP to boolean
spaceship_clean[['CryoSleep', 'VIP']] = spaceship_clean[['CryoSleep', 'VIP']].astype(bool)
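`astype(bool)` is safe here only because `dropna()` in In [4] already removed the missing values and the remaining entries are Python booleans. On an `object` column, `astype(bool)` uses truthiness, so `NaN` and a string such as `'False'` both become `True`. A more defensive sketch, assuming a column that might mix booleans, the strings `'True'`/`'False'` and missing values (the data below is hypothetical), maps the values explicitly:

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing booleans, strings and a missing value
col = pd.Series([True, False, "False", "True", np.nan], dtype=object)

# Truthiness-based conversion: NaN and the non-empty string "False" both become True
print(col.astype(bool).tolist())

# Explicit mapping: recognized values are converted, anything else (including NaN) stays NaN
mapping = {True: True, False: False, "True": True, "False": False}
safe = col.map(mapping)
print(safe.tolist())
```

With the explicit mapping, missing values stay visible and can be imputed or dropped deliberately instead of silently turning into `True`.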

In [11]:
spaceship_clean.dtypes

HomePlanet       object
CryoSleep          bool
Cabin            object
Destination      object
Age             float64
VIP                bool
RoomService     float64
FoodCourt       float64
ShoppingMall    float64
Spa             float64
VRDeck          float64
Transported        bool
dtype: object

In [12]:
# Convert the Boolean columns (CryoSleep, VIP, Transported) to integers
bool_cols = spaceship_clean.select_dtypes(include='bool').columns

# Convert the boolean columns to integer explicitly and assign back
spaceship_clean[bool_cols] = spaceship_clean[bool_cols].apply(lambda x: x.astype(int))

In [13]:
spaceship_clean

,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Transported
0,Europa,0,B/0/P,TRAPPIST-1e,39.0,0,0.0,0.0,0.0,0.0,0.0,0
1,Earth,0,F/0/S,TRAPPIST-1e,24.0,0,109.0,9.0,25.0,549.0,44.0,1
2,Europa,0,A/0/S,TRAPPIST-1e,58.0,1,43.0,3576.0,0.0,6715.0,49.0,0
3,Europa,0,A/0/S,TRAPPIST-1e,33.0,0,0.0,1283.0,371.0,3329.0,193.0,0
4,Earth,0,F/1/S,TRAPPIST-1e,16.0,0,303.0,70.0,151.0,565.0,2.0,1
...,...,...,...,...,...,...,...,...,...,...,...,...
8688,Europa,0,A/98/P,55 Cancri e,41.0,1,0.0,6819.0,0.0,1643.0,74.0,0
8689,Earth,1,G/1499/S,PSO J318.5-22,18.0,0,0.0,0.0,0.0,0.0,0.0,0
8690,Earth,0,G/1500/S,TRAPPIST-1e,26.0,0,0.0,0.0,1872.0,1.0,0.0,1
8691,Europa,0,E/608/S,55 Cancri e,32.0,0,0.0,1049.0,0.0,353.0,3235.0,0


In [14]:
# Keep only the deck (first letter) of Cabin
spaceship_clean['Cabin'] = spaceship_clean['Cabin'].str[0]
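In [14] keeps only the deck letter. The `Cabin` values shown above follow a deck/num/side pattern (e.g. `B/0/P`), so the number and side are also recoverable. A minimal sketch of splitting all three parts with `str.split(..., expand=True)`; the new column names are hypothetical, and whether the extra columns help these models was not tested here:

```python
import pandas as pd

# Cabin values taken from the table above, following the deck/num/side pattern
cabins = pd.Series(["B/0/P", "F/0/S", "G/1499/S"], name="Cabin")

# Split on "/" into three separate columns instead of keeping only the first letter
parts = cabins.str.split("/", expand=True)
parts.columns = ["Cabin_deck", "Cabin_num", "Cabin_side"]

# The cabin number is numeric, so convert it from string
parts["Cabin_num"] = parts["Cabin_num"].astype(int)
print(parts)
```

`Cabin_deck` matches what `str[0]` produces; `Cabin_side` could then be one-hot encoded like `HomePlanet` and `Destination`.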

In [15]:
# Use one-hot encoding for the HomePlanet, Destination and Cabin columns
spaceship_clean = pd.get_dummies(spaceship_clean, columns=['HomePlanet', 'Destination', 'Cabin'])

In [16]:
# Convert the one-hot-encoded Boolean columns (HomePlanet, Destination, Cabin) to integers
bool_cols = spaceship_clean.select_dtypes(include='bool').columns

# Convert the boolean columns to integer explicitly and assign back
spaceship_clean[bool_cols] = spaceship_clean[bool_cols].apply(lambda x: x.astype(int))
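`pd.get_dummies` accepts a `dtype` argument, so the dummy columns can be created as integers directly and the separate bool-to-int pass in In [16] becomes unnecessary. A small sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows with one categorical and one numeric column
df = pd.DataFrame({"HomePlanet": ["Earth", "Europa", "Mars"], "Age": [24.0, 39.0, 16.0]})

# dtype=int makes the dummy columns integers instead of booleans
encoded = pd.get_dummies(df, columns=["HomePlanet"], dtype=int)
print(encoded.dtypes)
print(encoded)
```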

In [17]:
spaceship_clean

,CryoSleep,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Transported,HomePlanet_Earth,HomePlanet_Europa,HomePlanet_Mars,Destination_55 Cancri e,Destination_PSO J318.5-22,Destination_TRAPPIST-1e,Cabin_A,Cabin_B,Cabin_C,Cabin_D,Cabin_E,Cabin_F,Cabin_G,Cabin_T
0,0,39.0,0,0.0,0.0,0.0,0.0,0.0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0
1,0,24.0,0,109.0,9.0,25.0,549.0,44.0,1,1,0,0,0,0,1,0,0,0,0,0,1,0,0
2,0,58.0,1,43.0,3576.0,0.0,6715.0,49.0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0
3,0,33.0,0,0.0,1283.0,371.0,3329.0,193.0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0
4,0,16.0,0,303.0,70.0,151.0,565.0,2.0,1,1,0,0,0,0,1,0,0,0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8688,0,41.0,1,0.0,6819.0,0.0,1643.0,74.0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0
8689,1,18.0,0,0.0,0.0,0.0,0.0,0.0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0
8690,0,26.0,0,0.0,0.0,1872.0,1.0,0.0,1,1,0,0,0,0,1,0,0,0,0,0,0,1,0
8691,0,32.0,0,0.0,1049.0,0.0,353.0,3235.0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0


Now perform the same steps as before:
- Feature Scaling
- Feature Selection


**Feature Standardization & Scaling**

In [18]:
# Feature Standardization
from sklearn.preprocessing import StandardScaler

# Select the numeric columns to scale
num_cols = spaceship_clean.select_dtypes(include=['int64', 'float64']).columns

# Initialize the scaler
scaler = StandardScaler()

# Scale the numeric columns
spaceship_clean_standard = spaceship_clean.copy()
spaceship_clean_standard[num_cols] = scaler.fit_transform(spaceship_clean_standard[num_cols])

In [19]:
spaceship_clean_standard

,CryoSleep,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Transported,HomePlanet_Earth,HomePlanet_Europa,HomePlanet_Mars,Destination_55 Cancri e,Destination_PSO J318.5-22,Destination_TRAPPIST-1e,Cabin_A,Cabin_B,Cabin_C,Cabin_D,Cabin_E,Cabin_F,Cabin_G,Cabin_T
0,-0.738664,0.695413,-0.158555,-0.345756,-0.285355,-0.309494,-0.273759,-0.269534,-1.007293,-1.083063,1.717147,-0.510811,-0.520220,-0.322689,0.666047,-0.179858,3.085305,-0.312289,-0.244975,-0.339578,-0.695098,-0.652578,-0.017402
1,-0.738664,-0.336769,-0.158555,-0.176748,-0.279993,-0.266112,0.206165,-0.230494,0.992760,0.923307,-0.582361,-0.510811,-0.520220,-0.322689,0.666047,-0.179858,-0.324117,-0.312289,-0.244975,-0.339578,1.438646,-0.652578,-0.017402
2,-0.738664,2.002842,6.306963,-0.279083,1.845163,-0.309494,5.596357,-0.226058,-1.007293,-1.083063,1.717147,-0.510811,-0.520220,-0.322689,0.666047,5.559950,-0.324117,-0.312289,-0.244975,-0.339578,-0.695098,-0.652578,-0.017402
3,-0.738664,0.282540,-0.158555,-0.345756,0.479034,0.334285,2.636384,-0.098291,-1.007293,-1.083063,1.717147,-0.510811,-0.520220,-0.322689,0.666047,5.559950,-0.324117,-0.312289,-0.244975,-0.339578,-0.695098,-0.652578,-0.017402
4,-0.738664,-0.887266,-0.158555,0.124056,-0.243650,-0.047470,0.220152,-0.267759,0.992760,0.923307,-0.582361,-0.510811,-0.520220,-0.322689,0.666047,-0.179858,-0.324117,-0.312289,-0.244975,-0.339578,1.438646,-0.652578,-0.017402
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8688,-0.738664,0.833037,6.306963,-0.345756,3.777285,-0.309494,1.162518,-0.203876,-1.007293,-1.083063,1.717147,-0.510811,1.922263,-0.322689,-1.501395,5.559950,-0.324117,-0.312289,-0.244975,-0.339578,-0.695098,-0.652578,-0.017402
8689,1.353795,-0.749641,-0.158555,-0.345756,-0.285355,-0.309494,-0.273759,-0.269534,-1.007293,0.923307,-0.582361,-0.510811,-0.520220,3.098956,-1.501395,-0.179858,-0.324117,-0.312289,-0.244975,-0.339578,-0.695098,1.532384,-0.017402
8690,-0.738664,-0.199145,-0.158555,-0.345756,-0.285355,2.938900,-0.272885,-0.269534,0.992760,0.923307,-0.582361,-0.510811,-0.520220,-0.322689,0.666047,-0.179858,-0.324117,-0.312289,-0.244975,-0.339578,-0.695098,1.532384,-0.017402
8691,-0.738664,0.213728,-0.158555,-0.345756,0.339621,-0.309494,0.034826,2.600774,-1.007293,-1.083063,1.717147,-0.510811,1.922263,-0.322689,-1.501395,-0.179858,-0.324117,-0.312289,-0.244975,2.944832,-0.695098,-0.652578,-0.017402


In [20]:
# Feature Scaling
from sklearn.preprocessing import MinMaxScaler

# Initialize the MinMax scaler
minmax_scaler = MinMaxScaler()

# Scale the numeric columns
spaceship_clean_scaling = spaceship_clean.copy()
spaceship_clean_scaling[num_cols] = minmax_scaler.fit_transform(spaceship_clean_scaling[num_cols])

In [21]:
spaceship_clean_scaling

,CryoSleep,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Transported,HomePlanet_Earth,HomePlanet_Europa,HomePlanet_Mars,Destination_55 Cancri e,Destination_PSO J318.5-22,Destination_TRAPPIST-1e,Cabin_A,Cabin_B,Cabin_C,Cabin_D,Cabin_E,Cabin_F,Cabin_G,Cabin_T
0,0.0,0.493671,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.303797,0.0,0.010988,0.000302,0.002040,0.024500,0.002164,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
2,0.0,0.734177,1.0,0.004335,0.119948,0.000000,0.299670,0.002410,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.417722,0.0,0.000000,0.043035,0.030278,0.148563,0.009491,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.202532,0.0,0.030544,0.002348,0.012324,0.025214,0.000098,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8688,0.0,0.518987,1.0,0.000000,0.228726,0.000000,0.073322,0.003639,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8689,1.0,0.227848,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
8690,0.0,0.329114,0.0,0.000000,0.000000,0.152779,0.000045,0.000000,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
8691,0.0,0.405063,0.0,0.000000,0.035186,0.000000,0.015753,0.159077,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0


**Perform Train Test Split**

In [22]:
# Train test split normal data
features = spaceship_clean.drop(columns='Transported')
target = spaceship_clean['Transported']
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)

In [23]:
# Train test split standard data
features_standard = spaceship_clean_standard.drop(columns='Transported')
target_standard = spaceship_clean_standard['Transported']
X_train_standard, X_test_standard, y_train_standard, y_test_standard = train_test_split(features_standard, target_standard, test_size=0.2, random_state=42)

In [24]:
# Train test split scaling data
features_scaling = spaceship_clean_scaling.drop(columns='Transported')
target_scaling = spaceship_clean_scaling['Transported']
X_train_scaling, X_test_scaling, y_train_scaling, y_test_scaling = train_test_split(features_scaling, target_scaling, test_size=0.2, random_state=42)
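In [18] and In [20] fit the scalers on the whole dataset before splitting, so statistics from the test rows leak into the transformation of the training rows. A sketch of the leakage-free order on synthetic stand-in data (column names borrowed from the notebook, values random): split first, optionally stratify on the target, then fit the scaler on the training split only and reuse it on the test split. Only the features are scaled; the target stays 0/1.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for spaceship_clean (hypothetical values)
rng = np.random.RandomState(42)
df = pd.DataFrame({
    "Age": rng.randint(0, 80, size=200).astype(float),
    "Spa": rng.exponential(500, size=200),
    "Transported": rng.randint(0, 2, size=200),
})

features = df.drop(columns="Transported")
target = df["Transported"]

# Split first; stratify keeps the class balance the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42, stratify=target
)

# Fit the scaler on the training split only, then apply the same transform to the test split
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Test values can fall slightly outside [0, 1] with this order, which is expected: the test set is treated as unseen data.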

**Model Selection**: now try different ensemble methods to find a better model

#### Bagging and Pasting

In [25]:
# Import modules for Bagging
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

# Make an instance of the model
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=42)
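The heading mentions pasting, but only bagging is run. Pasting is the same ensemble with rows sampled without replacement, which `BaggingClassifier` exposes through `bootstrap=False`. A sketch on synthetic data from `make_classification` (an assumption, to keep the example self-contained); `max_samples=0.8` is an illustrative choice, since with `bootstrap=False` and the default `max_samples=1.0` every tree would see the full training set:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the spaceship features
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Pasting: each tree is trained on 80% of the training rows drawn WITHOUT replacement
pasting = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=0.8,
    bootstrap=False,
    random_state=42,
)
pasting.fit(X_train, y_train)
pasting_acc = pasting.score(X_test, y_test)
print("Pasting accuracy:", pasting_acc)
```

In the notebook, the same constructor could be fitted on `X_train`, `y_train` and evaluated like the bagging model.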

##### Bagging Classifier Normal Data

In [26]:
# Fit the model for normal data
bagging.fit(X_train, y_train)

In [27]:
# Predict on the normal (unscaled) test data
y_pred = bagging.predict(X_test)

# Evaluate the model for normal data with accuracy, recall, precision, f1-score
accuracy_score = accuracy_score(y_test, y_pred)
recall_score = recall_score(y_test, y_pred)
precision_score = precision_score(y_test, y_pred)
f1_score = f1_score(y_test, y_pred)

# Print scores
print('Accuracy:', accuracy_score)
print('Recall:', recall_score)
print('Precision:', precision_score)
print('F1 Score:', f1_score)

Accuracy: 0.802571860816944
Recall: 0.796711509715994
Precision: 0.8100303951367781
F1 Score: 0.8033157498116051


##### Bagging Classifier Standard Data

In [28]:
# Fit the model for standard data
bagging.fit(X_train_standard, y_train_standard)

ValueError: Unknown label type: continuous. Maybe you are trying to fit a classifier, which expects discrete classes on a regression target with continuous values.

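Bagging cannot be fitted on the standardized data, but not because the features are continuous. `num_cols` in In [18] also selected the target `Transported` (it became an integer column in In [12]), so `StandardScaler` turned the 0/1 labels into continuous values (-1.007293 and 0.992760 in the table above). A classifier needs discrete labels, hence the `ValueError`. Excluding `Transported` from `num_cols` avoids this; the MinMax-scaled data does not fail because MinMax maps the target to exactly 0.0 and 1.0.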
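A minimal sketch of keeping the target out of the scaling step, on a small hypothetical frame with the same layout (column names taken from the notebook, values illustrative). In the notebook, the equivalent change is dropping `Transported` before `select_dtypes` in In [18], so both scalers only touch the features.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Small hypothetical frame: numeric features plus a 0/1 target
df = pd.DataFrame({
    "Age": [39.0, 24.0, 58.0, 33.0],
    "Spa": [0.0, 549.0, 6715.0, 3329.0],
    "Transported": [0, 1, 0, 1],
})

# Select numeric columns, but leave the target out of the scaling step
num_cols = df.drop(columns="Transported").select_dtypes(include="number").columns

df_standard = df.copy()
df_standard[num_cols] = StandardScaler().fit_transform(df_standard[num_cols])
print(df_standard)
```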

##### Bagging Classifier scaling data

In [109]:
# Fit the model for scaling data
bagging.fit(X_train_scaling, y_train_scaling)

In [119]:
# Predict the model for scaling data
y_pred_scaling = bagging.predict(X_test_scaling)

# Evaluate the model for scaling data with accuracy, recall, precision, f1-score
accuracy_score_scaling = accuracy_score(y_test_scaling, y_pred_scaling)
recall_score_scaling = recall_score(y_test_scaling, y_pred_scaling)
precision_score_scaling = precision_score(y_test_scaling, y_pred_scaling)
f1_score_scaling = f1_score(y_test_scaling, y_pred_scaling)

# Print scores
print('Accuracy:', accuracy_score_scaling)
print('Recall:', recall_score_scaling)
print('Precision:', precision_score_scaling)
print('F1 Score:', f1_score_scaling)

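TypeError: 'float' object is not callable

This error is caused by In [27], which assigned the scores to `accuracy_score`, `recall_score`, `precision_score` and `f1_score`, overwriting the imported scikit-learn functions with floats. Re-importing the functions and storing the scores under different names, as In [41] does, avoids it.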
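A small helper that returns the four metrics without rebinding the imported function names, sketched here with toy labels (the name `evaluate` is hypothetical):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def evaluate(y_true, y_pred):
    """Return the four metrics in a dict; the sklearn function names are never reassigned."""
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "F1 Score": f1_score(y_true, y_pred),
    }


# Toy labels for illustration only
scores = evaluate([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In the notebook this would be called as `evaluate(y_test_scaling, bagging.predict(X_test_scaling))`, and the same helper can be reused for every model below.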

#### Random Forests

In [29]:
# Import the Random Forest classifier
from sklearn.ensemble import RandomForestClassifier

# Make an instance of the model
random_forest = RandomForestClassifier(n_estimators=100, random_state=42)

In [30]:
# Fit the model for normal data
random_forest.fit(X_train, y_train)

In [41]:
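# Predict on the normal (unscaled) test data
# Caution: the execution counts show this cell ran after In [38], which refit `random_forest` on the
# MinMax-scaled data, so the scores below come from the scaled-data model applied to unscaled X_test.
# Re-run In [30] before this cell to get valid normal-data scores.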
y_pred = random_forest.predict(X_test)

from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

# Use different variable names so the metric functions are not overwritten
rf_norm_acc = accuracy_score(y_test, y_pred)
rf_norm_rec = recall_score(y_test, y_pred)
rf_norm_prec = precision_score(y_test, y_pred)
rf_norm_f1 = f1_score(y_test, y_pred)

# Print results
print('Accuracy:', rf_norm_acc)
print('Recall:', rf_norm_rec)
print('Precision:', rf_norm_prec)
print('F1 Score:', rf_norm_f1)

Accuracy: 0.7291981845688351
Recall: 0.6412556053811659
Precision: 0.7842778793418648
F1 Score: 0.7055921052631579


In [38]:
# Fit the model for scaling data
random_forest.fit(X_train_scaling, y_train_scaling)

In [43]:
# Predict on the scaled test data
y_pred_scaling = random_forest.predict(X_test_scaling)

# Evaluate the model for scaling data with accuracy, recall, precision, f1-score
rf_scal_acc = accuracy_score(y_test_scaling, y_pred_scaling)
rf_scal_rec = recall_score(y_test_scaling, y_pred_scaling)
rf_scal_prec = precision_score(y_test_scaling, y_pred_scaling)
rf_scal_f1 = f1_score(y_test_scaling, y_pred_scaling)

# Print scores
print('Accuracy:', rf_scal_acc)
print('Recall:', rf_scal_rec)
print('Precision:', rf_scal_prec)
print('F1 Score:', rf_scal_f1)


Accuracy: 0.8093797276853253
Recall: 0.7937219730941704
Precision: 0.8232558139534883
F1 Score: 0.8082191780821918
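To see which features drive the Random Forest, the fitted model exposes impurity-based `feature_importances_`. A sketch on hypothetical data where only one column carries signal; in the notebook the equivalent would be `pd.Series(random_forest.feature_importances_, index=X_train_scaling.columns)`. Impurity-based importances can favor high-cardinality numeric columns, so `sklearn.inspection.permutation_importance` is a useful cross-check.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: only "signal" determines the label, "noise" is random
rng = np.random.RandomState(0)
X = pd.DataFrame({"signal": rng.rand(500), "noise": rng.rand(500)})
y = (X["signal"] > 0.5).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# One impurity-based importance per column; they are normalized to sum to 1
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```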


In [44]:
# Fit the model for standard data
random_forest.fit(X_train_standard, y_train_standard)

ValueError: Unknown label type: continuous. Maybe you are trying to fit a classifier, which expects discrete classes on a regression target with continuous values.

#### Gradient Boosting

In [45]:
# Import the Gradient Boosting classifier
from sklearn.ensemble import GradientBoostingClassifier

# Make an instance of the model
gradient_boosting = GradientBoostingClassifier(n_estimators=100, random_state=42)

In [46]:
# Fit the model for normal data
gradient_boosting.fit(X_train, y_train)

In [47]:
# Predict on the normal (unscaled) test data
y_pred = gradient_boosting.predict(X_test)

# Evaluate the model for normal data with accuracy, recall, precision, f1-score
gb_norm_acc = accuracy_score(y_test, y_pred)
gb_norm_rec = recall_score(y_test, y_pred)
gb_norm_prec = precision_score(y_test, y_pred)
gb_norm_f1 = f1_score(y_test, y_pred)

# Print scores
print('Accuracy:', gb_norm_acc)
print('Recall:', gb_norm_rec)
print('Precision:', gb_norm_prec)
print('F1 Score:', gb_norm_f1)

Accuracy: 0.8101361573373677
Recall: 0.8609865470852018
Precision: 0.784741144414169
F1 Score: 0.8210976478973628
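`n_estimators=100` was not tuned. `GradientBoostingClassifier.staged_predict` yields predictions after each boosting stage, so a single fit shows how validation accuracy evolves with the number of stages. A sketch on synthetic data; the validation split should be carved out of the training data rather than reusing the test set. The estimator's `n_iter_no_change` and `validation_fraction` parameters offer built-in early stopping as an alternative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data; X_val plays the role of a validation split taken from the training data
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

gb = GradientBoostingClassifier(n_estimators=300, random_state=42).fit(X_train, y_train)

# staged_predict yields the predictions after 1, 2, ..., n_estimators stages
val_acc = [np.mean(pred == y_val) for pred in gb.staged_predict(X_val)]
best_n = int(np.argmax(val_acc)) + 1
print("Best number of stages:", best_n, "validation accuracy:", max(val_acc))
```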


In [48]:
# Fit the model for scaling data
gradient_boosting.fit(X_train_scaling, y_train_scaling)

In [49]:
# Predict on the scaled test data
y_pred_scaling = gradient_boosting.predict(X_test_scaling)

# Evaluate the model for scaling data with accuracy, recall, precision, f1-score
gb_scal_acc = accuracy_score(y_test_scaling, y_pred_scaling)
gb_scal_rec = recall_score(y_test_scaling, y_pred_scaling)
gb_scal_prec = precision_score(y_test_scaling, y_pred_scaling)
gb_scal_f1 = f1_score(y_test_scaling, y_pred_scaling)

# Print scores
print('Accuracy:', gb_scal_acc)
print('Recall:', gb_scal_rec)
print('Precision:', gb_scal_prec)
print('F1 Score:', gb_scal_f1)

Accuracy: 0.8101361573373677
Recall: 0.8609865470852018
Precision: 0.784741144414169
F1 Score: 0.8210976478973628


In [50]:
# Fit the model for standard data
gradient_boosting.fit(X_train_standard, y_train_standard)

ValueError: Unknown label type: continuous. Maybe you are trying to fit a classifier, which expects discrete classes on a regression target with continuous values.

#### Adaptive Boosting

In [51]:
# Import the AdaBoost classifier
from sklearn.ensemble import AdaBoostClassifier

# Make an instance of the model
adaboost = AdaBoostClassifier(n_estimators=100, random_state=42)

In [52]:
# Fit the model for normal data
adaboost.fit(X_train, y_train)



In [53]:
# Predict on the normal (unscaled) test data
y_pred = adaboost.predict(X_test)

# Evaluate the model for normal data with accuracy, recall, precision, f1-score
ab_norm_acc = accuracy_score(y_test, y_pred)
ab_norm_rec = recall_score(y_test, y_pred)
ab_norm_prec = precision_score(y_test, y_pred)
ab_norm_f1 = f1_score(y_test, y_pred)

# Print scores
print('Accuracy:', ab_norm_acc)
print('Recall:', ab_norm_rec)
print('Precision:', ab_norm_prec)
print('F1 Score:', ab_norm_f1)

Accuracy: 0.7972768532526475
Recall: 0.8325859491778774
Precision: 0.7812061711079944
F1 Score: 0.8060781476121563


In [54]:
# Fit the model for scaling data
adaboost.fit(X_train_scaling, y_train_scaling)



In [55]:
# Predict on the scaled test data
y_pred_scaling = adaboost.predict(X_test_scaling)

# Evaluate the model for scaling data with accuracy, recall, precision, f1-score
ab_scal_acc = accuracy_score(y_test_scaling, y_pred_scaling)
ab_scal_rec = recall_score(y_test_scaling, y_pred_scaling)
ab_scal_prec = precision_score(y_test_scaling, y_pred_scaling)
ab_scal_f1 = f1_score(y_test_scaling, y_pred_scaling)

# Print scores
print('Accuracy:', ab_scal_acc)
print('Recall:', ab_scal_rec)
print('Precision:', ab_scal_prec)
print('F1 Score:', ab_scal_f1)

Accuracy: 0.7972768532526475
Recall: 0.8325859491778774
Precision: 0.7812061711079944
F1 Score: 0.8060781476121563
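Gradient Boosting and AdaBoost give identical scores on the normal and MinMax-scaled data because tree splits depend only on the order of feature values, which a MinMax transform preserves. A small sketch on synthetic integer features (hypothetical data) compares a decision tree fitted on raw versus scaled features; the predictions should match. This also suggests that the gap between the two Random Forest runs above is not caused by scaling.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical integer-valued features and a noisy label depending on the first feature
rng = np.random.RandomState(0)
X_train = rng.randint(0, 100, size=(300, 3)).astype(float)
y_train = (X_train[:, 0] + rng.randint(0, 40, size=300) > 70).astype(int)
X_test = rng.randint(0, 100, size=(100, 3)).astype(float)

scaler = MinMaxScaler().fit(X_train)

# Same tree settings, fitted once on raw and once on MinMax-scaled features
raw_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
scaled_tree = DecisionTreeClassifier(random_state=42).fit(scaler.transform(X_train), y_train)

# The splits are the same up to a change of units, so the predictions agree
same = np.array_equal(raw_tree.predict(X_test), scaled_tree.predict(scaler.transform(X_test)))
print("Identical predictions:", same)
```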


In [56]:
# Fit the model for standard data
adaboost.fit(X_train_standard, y_train_standard)



ValueError: Unknown label type: continuous. Maybe you are trying to fit a classifier, which expects discrete classes on a regression target with continuous values.

### Which model is the best and why?

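In [57]:
# Gradient Boosting on normal data has the best F1 score (0.821) and accuracy (0.810)
print('Gradient Boosting with normal data:', gb_norm_f1)
print('Random Forest with scaling data:', rf_scal_f1)

Gradient Boosting with normal data: 0.8210976478973628
Random Forest with scaling data: 0.8082191780821918

Gradient Boosting is the best model on this split: it has the highest F1 score and accuracy, mainly thanks to its recall (0.861), while Random Forest on scaled data trades some recall for higher precision (0.823 vs 0.785). Scaling does not change the tree-based ensembles (Gradient Boosting and AdaBoost give identical scores on normal and scaled data). All scores come from a single 80/20 split and differ by only one to two percentage points, so the ranking between Gradient Boosting, Random Forest and Bagging should be treated as provisional.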
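The ranking above rests on one train/test split. Stratified k-fold cross-validation gives a mean and spread per model, which shows whether a gap such as 0.821 vs 0.808 F1 is larger than the split-to-split variation. A sketch on synthetic data; in the notebook, `X` and `y` would be the unscaled `features` and `target`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the spaceship features and target
X, y = make_classification(n_samples=600, n_features=10, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=42),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=42),
}

# Same shuffled, stratified folds for every model so the comparison is paired
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: F1 = {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the standard deviations are of the same order as the gaps between means, the single-split ranking is not conclusive.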
