# LAB | Ensemble Methods

**Load the data**

In this challenge, we will work with the same Spaceship Titanic data as in the previous lab. The data can be found here:

https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv

Metadata

https://github.com/data-bootcamp-v4/data/blob/main/spaceship_titanic.md

In this lab, try different ensemble methods to see if you can obtain a better model than before. For a fair comparison, apply the same feature scaling and feature engineering as in the previous lab.

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier

from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.metrics import classification_report

In [2]:
spaceship = pd.read_csv("https://raw.githubusercontent.com/data-bootcamp-v4/data/main/spaceship_titanic.csv")
spaceship.head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported
0,0001_01,Europa,False,B/0/P,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,False
1,0002_01,Earth,False,F/0/S,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,549.0,44.0,Juanna Vines,True
2,0003_01,Europa,False,A/0/S,TRAPPIST-1e,58.0,True,43.0,3576.0,0.0,6715.0,49.0,Altark Susent,False
3,0003_02,Europa,False,A/0/S,TRAPPIST-1e,33.0,False,0.0,1283.0,371.0,3329.0,193.0,Solam Susent,False
4,0004_01,Earth,False,F/1/S,TRAPPIST-1e,16.0,False,303.0,70.0,151.0,565.0,2.0,Willy Santantines,True


Now perform the same preprocessing as before:
- Feature Scaling
- Feature Selection


In [9]:
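#your code here
# Drop rows with missing values
spaceship.dropna(inplace=True)
# Keep only the deck letter from Cabin (format: deck/num/side)
spaceship['Cabin'] = spaceship['Cabin'].str.split('/').str[0]
# Drop the identifier columns
spaceship = spaceship.drop(['PassengerId', 'Name'], axis = 1)
# Convert the boolean flags to 0/1
spaceship["CryoSleep"] = spaceship["CryoSleep"].astype(int)
spaceship["VIP"] = spaceship["VIP"].astype(int)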


In [None]:
df_space_transformed = pd.merge(left=spaceship,
                              right= pd.get_dummies(spaceship[['HomePlanet', 'Cabin', 'Destination']], dtype=int, drop_first=True),
                              left_index=True,
                              right_index=True)
df_space_transformed = df_space_transformed.drop(['HomePlanet', 'Cabin', 'Destination'], axis = 1)
df_space_transformed.dtypes
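The merge-then-drop pattern above can also be written as a single call, because `pd.get_dummies` accepts a `columns` argument and replaces the listed columns with their dummies. Below is a minimal sketch on a small hypothetical frame. The rows are made up, and `toy`, `cat_cols`, `two_step`, and `one_step` are illustrative names that are not part of the lab.

```python
import pandas as pd

# Hypothetical toy rows that mimic the categorical columns of the lab data
toy = pd.DataFrame({
    "HomePlanet": ["Earth", "Europa", "Mars", "Earth"],
    "Cabin": ["F", "B", "A", "F"],
    "Destination": ["TRAPPIST-1e", "55 Cancri e", "TRAPPIST-1e", "PSO J318.5-22"],
    "Age": [24.0, 39.0, 58.0, 16.0],
})
cat_cols = ["HomePlanet", "Cabin", "Destination"]

# Route used in the cell above: encode separately, merge on the index, drop the originals
two_step = pd.merge(left=toy,
                    right=pd.get_dummies(toy[cat_cols], dtype=int, drop_first=True),
                    left_index=True,
                    right_index=True).drop(cat_cols, axis=1)

# One-call alternative: get_dummies replaces the listed columns with their dummies
one_step = pd.get_dummies(toy, columns=cat_cols, dtype=int, drop_first=True)

print(one_step.columns.tolist())
```

Both routes should produce the same columns in the same order, so the choice is mostly about readability.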

**Perform Train Test Split**

In [10]:
#your code here
features = df_space_transformed.drop(columns=["Transported"])
target = df_space_transformed["Transported"].astype(int)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.20, random_state=0)

In [None]:
normalizer = MinMaxScaler()
normalizer.fit(X_train)
X_train_norm = normalizer.transform(X_train)
X_test_norm = normalizer.transform(X_test)

In [None]:
X_train_norm = pd.DataFrame(X_train_norm, columns = X_train.columns)
X_test_norm = pd.DataFrame(X_test_norm, columns = X_test.columns)


**Model Selection** - now apply different ensemble methods to try to get a better model

In [None]:
tree = DecisionTreeClassifier(max_depth=10)
tree.fit(X_train_norm, y_train)

In [None]:
pred = tree.predict(X_test_norm)

print(classification_report(y_test, pred))

In [None]:
tree_importance = dict(zip(X_train_norm.columns, tree.feature_importances_))
tree_importance
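A dictionary of importances is hard to scan when there are many columns. One possible way to rank them is to put them in a `pandas.Series` and sort it, which makes it easier to pick a subset of features. The sketch below uses synthetic data from `make_classification` as a stand-in for the scaled training set. The names `X_demo`, `demo_tree`, `importance_ranked`, and `top_features` are illustrative assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the Spaceship Titanic set)
X_demo, y_demo = make_classification(n_samples=500, n_features=8,
                                     n_informative=3, random_state=0)
X_demo = pd.DataFrame(X_demo, columns=[f"feature_{i}" for i in range(8)])

demo_tree = DecisionTreeClassifier(max_depth=10, random_state=0)
demo_tree.fit(X_demo, y_demo)

# Rank the features from most to least important
importance_ranked = pd.Series(demo_tree.feature_importances_,
                              index=X_demo.columns).sort_values(ascending=False)

# Keep the names of the five highest-ranked features
top_features = importance_ranked.head(5).index.tolist()

print(importance_ranked)
```

Impurity-based importances from a single tree can be unstable, so a ranking like this is a starting point for feature selection rather than a final answer.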

In [None]:
features_adjusted = df_space_transformed[['CryoSleep', 'VRDeck', 'RoomService', 'FoodCourt', 'Spa']]

In [None]:
X_train1, X_test1, y_train1, y_test1 = train_test_split(features_adjusted, target, test_size=0.20, random_state=0)

In [None]:
normalizer1 = StandardScaler()

normalizer1.fit(X_train1)
X_train1_norm = normalizer1.transform(X_train1)
X_test1_norm = normalizer1.transform(X_test1)

In [None]:
#full data
lr = LogisticRegression()
lr.fit(X_train_norm, y_train)
pred_lr = lr.predict(X_test_norm)

print(classification_report(y_test, pred_lr))

In [None]:
#selected features
lr = LogisticRegression()
lr.fit(X_train1_norm, y_train1)
pred_lr = lr.predict(X_test1_norm)

print(classification_report(y_test1, pred_lr))

- Bagging and Pasting

In [11]:
#your code here
bagging_cla = BaggingClassifier(DecisionTreeClassifier(max_depth=20),
                               n_estimators=100,
                               max_samples = 1000)

In [None]:
bagging_cla_boot = BaggingClassifier(DecisionTreeClassifier(max_depth=20),
                               n_estimators=100,
                               max_samples = 1000, bootstrap=False)

In [None]:
#bagging (bootstrap=True, the default): samples drawn with replacement
bagging_cla.fit(X_train_norm, y_train)
pred = bagging_cla.predict(X_test_norm)

print(classification_report(y_test, pred))

In [None]:
#pasting (bootstrap=False): samples drawn without replacement
bagging_cla_boot.fit(X_train_norm, y_train)
pred_boot = bagging_cla_boot.predict(X_test_norm)

print(classification_report(y_test, pred_boot))
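Because bagging draws samples with replacement, each base estimator leaves some training rows unseen, and scikit-learn can score the ensemble on those rows through `oob_score=True`. The sketch below illustrates this on synthetic data. `X_demo`, `y_demo`, and `bagging_oob` are illustrative names, and the hyperparameters are assumptions chosen to keep the example fast, not the values used in this lab.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the Spaceship Titanic set)
X_demo, y_demo = make_classification(n_samples=600, n_features=10, random_state=0)

# oob_score=True requires bootstrap=True (the default), so it applies to bagging only
bagging_oob = BaggingClassifier(DecisionTreeClassifier(max_depth=5),
                                n_estimators=50,
                                max_samples=200,
                                oob_score=True,
                                random_state=0)
bagging_oob.fit(X_demo, y_demo)

# Accuracy estimated on the rows each estimator did not see during training
print(f"Out-of-bag accuracy: {bagging_oob.oob_score_:.3f}")
```

The out-of-bag score gives a rough validation estimate without touching the test set. It is not available for pasting, since `bootstrap=False` is incompatible with `oob_score=True`.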

- Random Forests

In [None]:
#your code here

forest = RandomForestClassifier(n_estimators=100,
                             max_depth=20)
forest.fit(X_train_norm, y_train)
pred_forest = forest.predict(X_test_norm)

print(classification_report(y_test, pred_forest))

- Gradient Boosting

In [None]:
#your code here
gb_cla = GradientBoostingClassifier(max_depth=20,
                                   n_estimators=100)
gb_cla.fit(X_train_norm, y_train)
pred_gb = gb_cla.predict(X_test_norm)

print(classification_report(y_test, pred_gb))

- Adaptive Boosting

In [None]:
#your code here
ada_cla = AdaBoostClassifier(DecisionTreeClassifier(max_depth=20),
                            n_estimators=100)
ada_cla.fit(X_train_norm, y_train)
pred_ada = ada_cla.predict(X_test_norm)

print(classification_report(y_test, pred_ada))

Which model is the best and why?

In [None]:
#comment here
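One way to approach this question is to fit every candidate on the same split and collect a single metric per model in one table. The sketch below does that on synthetic data. The `models` dictionary, the `comparison` series, and all hyperparameters are illustrative assumptions, not the settings or results of this lab.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the Spaceship Titanic set)
X_demo, y_demo = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.20, random_state=0)

# Hypothetical candidates; small settings keep the example fast
models = {
    "bagging": BaggingClassifier(DecisionTreeClassifier(max_depth=5),
                                 n_estimators=30, random_state=0),
    "pasting": BaggingClassifier(DecisionTreeClassifier(max_depth=5),
                                 n_estimators=30, max_samples=200,
                                 bootstrap=False, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=30, max_depth=5,
                                            random_state=0),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=30, max_depth=3,
                                                    random_state=0),
    "ada_boost": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                    n_estimators=30, random_state=0),
}

# Fit each model on the same training split and score it on the same test split
scores = {name: accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
          for name, model in models.items()}

comparison = pd.Series(scores, name="test_accuracy").sort_values(ascending=False)
print(comparison)
```

Accuracy is only one possible criterion. The per-class precision and recall from `classification_report` and the training time of each model may also matter when deciding which one is best.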