# Ensemble Classifiers
## Bagging
* Bagging performs best with algorithms that have high variance. Decision trees, often constructed without pruning, are a popular example.
* Below is an example of using the **BaggingClassifier** with the Classification and Regression Trees algorithm (**DecisionTreeClassifier**). A total of 100 trees are created.

In [None]:
# Bagged Decision Trees for Classification
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class'] 
dataframe = read_csv(filename, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
kfold = KFold(n_splits=10)
cart = DecisionTreeClassifier()
num_trees = 100
# Pass the base learner positionally: scikit-learn 1.2 renamed the keyword base_estimator to estimator
model = BaggingClassifier(cart, n_estimators=num_trees)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
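
The two mechanics behind bagging, sampling the training rows with replacement and combining the members' predictions by vote, can be sketched with the standard library alone. This is a minimal illustration, not how **BaggingClassifier** is implemented; `bootstrap_sample`, `majority_vote`, and the ten stand-in rows are hypothetical names and data chosen for the example.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement (hypothetical helper)."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Return the most common label among the ensemble members' predictions."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(7)
rows = list(range(10))  # stand-in for the indices of 10 training rows

# Each ensemble member would be trained on its own bootstrap sample.
samples = [bootstrap_sample(rows, rng) for _ in range(3)]
for sample in samples:
    # Typically some rows repeat and others are left out of a given sample.
    print(sorted(sample))

# Three members predict a class for one instance; the ensemble takes the vote.
print(majority_vote([1, 0, 1]))  # prints 1
```

Because each tree sees a slightly different sample, the individual trees disagree in their errors, and voting averages much of that variance away.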


## Random Forest
* Random Forest is an extension of bagged decision trees. Samples of the training dataset are taken with replacement, but the trees are constructed in a way that reduces the correlation between individual classifiers.
* Specifically, rather than greedily choosing the best split point across all features when constructing each tree, only a random subset of features is considered for each split.
* You can construct a Random Forest model for classification using the **RandomForestClassifier** class. The example below demonstrates using Random Forest for classification with 100 trees and split points chosen from a random selection of 3 features.

In [None]:
# Random Forest Classification
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class'] 
dataframe = read_csv(filename, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
num_trees = 100
max_features = 3
kfold = KFold(n_splits=10)
model = RandomForestClassifier(n_estimators=num_trees, max_features=max_features) 
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
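
To make the role of `max_features` concrete, the sketch below draws the candidate features for a few splits, assuming 8 input features and 3 candidates per split as in the example above. The helper `candidate_features` is a hypothetical name for illustration and is not part of scikit-learn.

```python
import random

def candidate_features(n_features, max_features, rng):
    """Pick the feature indices a single split is allowed to consider."""
    return sorted(rng.sample(range(n_features), max_features))

rng = random.Random(1)
for split in range(4):
    # Every split gets a fresh random subset, so different trees
    # (and different nodes within a tree) end up using different features.
    print(split, candidate_features(8, 3, rng))
```

Restricting each split to a small subset keeps a single strong feature from dominating every tree, which is what lowers the correlation between trees.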

## AdaBoost
* AdaBoost was perhaps the first successful boosting ensemble algorithm. 
* It generally works by weighting instances in the dataset by how easy or difficult they are to classify, allowing the algorithm to pay more attention to the difficult instances in the construction of subsequent models.
* You can construct an AdaBoost model for classification using the **AdaBoostClassifier** class. The example below demonstrates the construction of 30 decision trees in sequence using the AdaBoost algorithm.

In [None]:
# AdaBoost Classification
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import AdaBoostClassifier
filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class'] 
dataframe = read_csv(filename, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
num_trees = 30
kfold = KFold(n_splits=10)
model = AdaBoostClassifier(n_estimators=num_trees)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
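
The instance reweighting can be illustrated with the textbook two-class AdaBoost update. This is a sketch under that assumption; the variant that **AdaBoostClassifier** uses internally may differ in detail, and `reweight` is a hypothetical helper name.

```python
import math

def reweight(weights, correct):
    """One round of the textbook AdaBoost weight update.

    weights: current instance weights, summing to 1
    correct: True where the current weak learner classified the instance correctly
    Assumes the weighted error is strictly between 0 and 1.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1.0 - err) / err)  # say of this weak learner
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha

# Five equally weighted instances; the weak learner gets only the last one wrong.
weights = [0.2] * 5
correct = [True, True, True, True, False]
new_weights, alpha = reweight(weights, correct)
print(new_weights)  # the misclassified instance now carries the largest weight
```

The next weak learner is trained against `new_weights`, so getting the previously misclassified instance right matters more than before.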

## XGBoost
* XGBoost is an implementation of gradient boosted decision trees designed for speed and performance that is dominant in competitive machine learning.
* You can construct an XGBoost model for classification using the **XGBClassifier** class. The example below demonstrates the construction of 10 decision trees in sequence using the XGBoost algorithm.

In [None]:
# XGBoost Classification
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier
filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class'] 
dataframe = read_csv(filename, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
num_trees = 10
kfold = KFold(n_splits=10)
model = XGBClassifier(n_estimators=num_trees, use_label_encoder=False)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
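
The idea of building trees in sequence can be sketched as plain gradient boosting on squared error, where each new one-split tree is fitted to the residuals of the ensemble so far. This is a simplified illustration on a made-up 1-D regression problem, not XGBoost's actual algorithm, which adds regularization and other refinements; `fit_stump`, `boost`, and the toy data are assumptions for the example.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals of a 1-D feature."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def boost(xs, ys, n_rounds, learning_rate):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(predict, xs, ys):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Toy data (hypothetical): a step from roughly 1 to roughly 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

mean_y = sum(ys) / len(ys)
baseline_mse = mse(lambda x: mean_y, xs, ys)
boosted_mse = mse(boost(xs, ys, n_rounds=10, learning_rate=0.5), xs, ys)
print(baseline_mse, boosted_mse)  # training error drops as stumps are added
```

In this sketch `n_rounds` plays the role that `n_estimators` plays in the example above: it sets how many trees are added in sequence.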