# LAB | Ensemble Methods

**Load the data**

In this challenge, we will work with the same Spaceship Titanic data as in the previous Lab. The data can be found here:

https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv

Metadata

https://github.com/data-bootcamp-v4/data/blob/main/spaceship_titanic.md

In this Lab, you should try different ensemble methods to see whether you can obtain a better model than before. To make the comparison fair, apply the same feature scaling and feature engineering as in the previous Lab.

In [None]:
#Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier
from sklearn.ensemble import (
    BaggingRegressor, BaggingClassifier,
    RandomForestRegressor, RandomForestClassifier,
    GradientBoostingRegressor, GradientBoostingClassifier,
    AdaBoostRegressor, AdaBoostClassifier,
)

from sklearn.preprocessing import MinMaxScaler, StandardScaler
# accuracy_score is the metric used for this classification task
from sklearn.metrics import accuracy_score, r2_score, mean_absolute_error, mean_squared_error

In [None]:
spaceship = pd.read_csv("https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv")
spaceship.head()

Now perform the same steps as in the previous Lab:
- Feature Scaling
- Feature Selection
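
As a starting point, the snippet below is a minimal sketch of one possible preprocessing step, not necessarily what was done in the previous Lab. It assumes the `Cabin` column follows the `deck/num/side` layout described in the metadata, and it runs on a tiny hand-made frame so it works without downloading the data. The helper name `engineer_features` is hypothetical.

```python
import pandas as pd

# Tiny stand-in frame that mimics a few columns of spaceship_titanic.csv
sample = pd.DataFrame({
    "HomePlanet": ["Europa", "Earth", "Europa"],
    "CryoSleep": [False, True, False],
    "Cabin": ["B/0/P", "F/0/S", "A/0/S"],
    "Age": [39.0, 24.0, 58.0],
    "Transported": [False, True, False],
})


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: split Cabin and one-hot encode the categoricals."""
    out = df.copy()
    # Cabin is formatted deck/num/side; keep deck and side as categories.
    cabin_parts = out["Cabin"].str.split("/", expand=True)
    out["Deck"] = cabin_parts[0]
    out["Side"] = cabin_parts[2]
    out = out.drop(columns=["Cabin"])
    # Booleans become 0/1 so the column is numeric.
    out["CryoSleep"] = out["CryoSleep"].astype(int)
    # One indicator column per category value.
    return pd.get_dummies(out, columns=["HomePlanet", "Deck", "Side"], dtype=int)


features = engineer_features(sample)
print(features.columns.tolist())
```

On the real data the same function could be applied to `spaceship` after handling missing values, since `str.split` and `astype(int)` do not cope well with NaN entries.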


In [None]:
#your code here


**Perform Train Test Split**

In [None]:
#your code here
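
One option worth considering for the split is stratification, which keeps the class ratio the same in the train and test sets. The sketch below uses synthetic arrays (`X_demo`, `y_demo` are made-up stand-ins, not the Spaceship data) to show the `stratify` argument of `train_test_split`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 rows, 30% of them in the positive class
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([1] * 30 + [0] * 70)

# stratify=y_demo asks for the same class proportions in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("positive share in train:", y_tr.mean())
print("positive share in test: ", y_te.mean())
```

If the target is already close to balanced, stratification changes little, but it costs nothing and makes repeated experiments more comparable.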

**Model Selection** - now apply different ensemble methods to try to get a better model

- Bagging and Pasting

In [None]:
#your code here
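
The only difference between bagging and pasting is how the rows for each base learner are drawn. The sketch below illustrates this with plain NumPy on a made-up training-set size; in `BaggingClassifier` the same choice is controlled by `bootstrap=True` (bagging) or `bootstrap=False` (pasting).

```python
import numpy as np

rng = np.random.default_rng(42)
n_rows = 1000       # pretend training-set size (illustrative value)
sample_size = 800   # rows handed to each base learner (illustrative value)

# Bagging: draw row indices WITH replacement, so duplicates are expected.
bagging_idx = rng.choice(n_rows, size=sample_size, replace=True)

# Pasting: draw row indices WITHOUT replacement, so every index is unique.
pasting_idx = rng.choice(n_rows, size=sample_size, replace=False)

print("unique rows seen by a bagging learner:", len(np.unique(bagging_idx)))
print("unique rows seen by a pasting learner:", len(np.unique(pasting_idx)))
```

Because a bagging learner sees some rows several times and others not at all, its trees tend to differ more from one another than pasting trees trained on the same sample size.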

- Random Forests

In [None]:
#your code here
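
A random forest can also report an out-of-bag (OOB) accuracy estimate and per-feature importances without a separate validation set. The following is a minimal sketch on synthetic data from `make_classification`; the names `X_demo`, `y_demo` and `rf_demo` are placeholders, and the numbers it prints say nothing about the Spaceship data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the prepared feature matrix and target
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=42
)

# oob_score=True scores each row using only the trees that did not train on it
rf_demo = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
rf_demo.fit(X_demo, y_demo)

print(f"OOB accuracy estimate: {rf_demo.oob_score_:.4f}")
for idx, importance in enumerate(rf_demo.feature_importances_):
    print(f"feature {idx}: importance {importance:.3f}")
```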

- Gradient Boosting

In [None]:
#your code here
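
Gradient boosting adds trees one at a time, so it can be useful to see how accuracy evolves as trees are added. The sketch below uses `staged_predict` on synthetic data (all `*_demo` names are placeholders). In a real experiment the curve should be read from a validation set rather than the final test set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared data
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=42
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

gb_demo = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, random_state=42
)
gb_demo.fit(X_tr, y_tr)

# staged_predict yields the ensemble's predictions after 1, 2, ..., n trees
staged_acc = [accuracy_score(y_val, pred) for pred in gb_demo.staged_predict(X_val)]
best_n = staged_acc.index(max(staged_acc)) + 1

print(f"accuracy with all trees: {staged_acc[-1]:.4f}")
print(f"best accuracy {max(staged_acc):.4f} reached with {best_n} trees")
```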

- Adaptive Boosting

In [None]:
#your code here
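
AdaBoost combines many weak learners, by default decision stumps (trees of depth 1). As a rough illustration, the sketch below compares a single stump with an AdaBoost ensemble on synthetic data; the variable names are placeholders and the scores are not indicative of results on the Spaceship data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared data
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# A single weak learner on its own
stump = DecisionTreeClassifier(max_depth=1, random_state=42).fit(X_tr, y_tr)

# AdaBoost with its default base learner (a depth-1 decision tree)
ada_demo = AdaBoostClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

stump_acc = stump.score(X_te, y_te)
ada_acc = ada_demo.score(X_te, y_te)
print(f"single stump accuracy: {stump_acc:.4f}")
print(f"AdaBoost accuracy:     {ada_acc:.4f}")
```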

Which model is the best and why?

In [None]:
#comment here

spaceship.dropna(inplace=True)

In [None]:
from sklearn.metrics import accuracy_score  # Make sure to import accuracy_score


# Continue with the `spaceship` DataFrame loaded from the CSV above
# (a hand-typed five-row sample would leave a single test row after the split).
# Rows with missing values are dropped so every estimator can be fitted.
spaceship = spaceship.dropna().copy()

# Step 1: Preprocessing
# Encoding categorical variables (CryoSleep, VIP, and Transported)
spaceship['CryoSleep'] = spaceship['CryoSleep'].astype(int)
spaceship['VIP'] = spaceship['VIP'].astype(int)
spaceship['Transported'] = spaceship['Transported'].astype(int)

# Drop non-numeric columns
X = spaceship.drop(['PassengerId', 'HomePlanet', 'Cabin', 'Destination', 'Name', 'Transported'], axis=1)
y = spaceship['Transported']

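# Step 2: Train-Test Split (split first so the scaler never sees the test rows)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Feature Scaling (Standardization), fitted on the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)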

# Step 4: Create models

# 4.1 Bagging (Using Decision Trees as base learners)
bagging_model = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100, random_state=42)

# 4.2 Pasting (Like Bagging, but samples are drawn without replacement: bootstrap=False)
pasting_model = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100, max_samples=0.8, bootstrap=False, random_state=42)

# 4.3 Random Forest (Bagged Decision Trees that also consider a random subset of features at each split)
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)

# 4.4 Gradient Boosting (Boosting model)
gb_model = GradientBoostingClassifier(n_estimators=100, random_state=42)

# 4.5 AdaBoost (Adaptive Boosting)
ada_model = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1), n_estimators=100, random_state=42)

# Step 5: Train models and evaluate them

# Fit and evaluate Bagging Model
bagging_model.fit(X_train, y_train)
y_pred_bagging = bagging_model.predict(X_test)
accuracy_bagging = accuracy_score(y_test, y_pred_bagging)

# Fit and evaluate Pasting Model
pasting_model.fit(X_train, y_train)
y_pred_pasting = pasting_model.predict(X_test)
accuracy_pasting = accuracy_score(y_test, y_pred_pasting)

# Fit and evaluate Random Forest Model
rf_model.fit(X_train, y_train)
y_pred_rf = rf_model.predict(X_test)
accuracy_rf = accuracy_score(y_test, y_pred_rf)

# Fit and evaluate Gradient Boosting Model
gb_model.fit(X_train, y_train)
y_pred_gb = gb_model.predict(X_test)
accuracy_gb = accuracy_score(y_test, y_pred_gb)

# Fit and evaluate AdaBoost Model
ada_model.fit(X_train, y_train)
y_pred_ada = ada_model.predict(X_test)
accuracy_ada = accuracy_score(y_test, y_pred_ada)

# Step 6: Compare the accuracy of each model
print(f"Accuracy of Bagging: {accuracy_bagging:.4f}")
print(f"Accuracy of Pasting: {accuracy_pasting:.4f}")
print(f"Accuracy of Random Forest: {accuracy_rf:.4f}")
print(f"Accuracy of Gradient Boosting: {accuracy_gb:.4f}")
print(f"Accuracy of AdaBoost: {accuracy_ada:.4f}")

# Step 7: Determine the best model (ties go to the first model listed)
accuracies = {
    "Bagging": accuracy_bagging,
    "Pasting": accuracy_pasting,
    "Random Forest": accuracy_rf,
    "Gradient Boosting": accuracy_gb,
    "AdaBoost": accuracy_ada,
}
best_model_name = max(accuracies, key=accuracies.get)
best_accuracy = accuracies[best_model_name]

print(f"The best model is {best_model_name} with an accuracy of {best_accuracy:.4f}")
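
A single train/test split can rank models differently from one random seed to the next, so a cross-validated comparison is often a fairer basis for answering which model is best. The sketch below is one way to do that, shown on synthetic data so it runs on its own; `X_demo`, `y_demo` and the `candidates` dictionary are placeholders, and the estimators use 50 trees only to keep the run short.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the prepared features and target
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=42
)

# BaggingClassifier uses a decision tree as its default base learner
candidates = {
    "Bagging": BaggingClassifier(n_estimators=50, random_state=42),
    "Pasting": BaggingClassifier(
        n_estimators=50, bootstrap=False, max_samples=0.8, random_state=42
    ),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=42),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=42),
}

cv_means = {}
for name, model in candidates.items():
    # 5-fold cross-validation: each row is used for testing exactly once
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.4f} (std {scores.std():.4f})")

print("best by cross-validation:", max(cv_means, key=cv_means.get))
```

The standard deviation printed next to each mean helps judge whether the gap between two models is larger than the fold-to-fold noise.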