# The goal of ensemble methods is to combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability / robustness over a single estimator.

### Two families of ensemble methods are usually distinguished:

- In **averaging methods**, the driving principle is to build several estimators independently and then to average their predictions. On average, the combined estimator is usually better than any single base estimator because its variance is reduced.

  - Examples: Bagging methods, Forests of randomized trees, etc.

- By contrast, in **boosting methods**, base estimators are built sequentially and one tries to reduce the bias of the combined estimator. The motivation is to combine several weak models to produce a powerful ensemble.

  - Examples: AdaBoost, Gradient Tree Boosting, etc.
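
The variance argument behind averaging methods can be illustrated with a small simulation. This is only a sketch under a strong assumption: each base estimator's error is independent Gaussian noise around the true value, which real bagged models only approximate because their errors are correlated. The function names and the noise level are hypothetical.

```python
import random
import statistics

def noisy_estimate(rng, truth=1.0, noise=1.0):
    """One base estimator's prediction: the true value plus independent Gaussian noise."""
    return rng.gauss(truth, noise)

def averaged_estimate(rng, n_estimators=25):
    """An averaging ensemble: the mean of n independent noisy predictions."""
    return statistics.fmean(noisy_estimate(rng) for _ in range(n_estimators))

rng = random.Random(0)
single = [noisy_estimate(rng) for _ in range(2000)]
ensemble = [averaged_estimate(rng) for _ in range(2000)]

var_single = statistics.variance(single)
var_ensemble = statistics.variance(ensemble)
print(f"variance of a single estimator:       {var_single:.3f}")
print(f"variance of the 25-estimator average: {var_ensemble:.3f}")
```

With independent errors the variance of the average shrinks roughly in proportion to the number of estimators. Correlated estimators shrink it less, which is why bagging and random forests inject randomness to decorrelate their trees.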

# 2. Bagging/Averaging methods

- In ensemble algorithms, bagging methods form a class of algorithms which build several instances of a black-box estimator on random subsets of the original training set and then aggregate their individual predictions to form a final prediction.
- These methods are used as a way to reduce the variance of a base estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it.
- In many cases, bagging methods constitute a very simple way to improve with respect to a single model, without making it necessary to adapt the underlying base algorithm.
- As they provide a way to reduce overfitting, bagging methods work best with strong and complex models (e.g., fully developed decision trees), in contrast with boosting methods which usually work best with weak models (e.g., shallow decision trees).

- Bagging methods come in many flavours but mostly differ from each other by the way they draw random subsets of the training set:
  - When random subsets of the dataset are drawn as random subsets of the samples (without replacement), then the method is known as **Pasting**.
  - When samples are drawn with replacement, then the method is known as **Bagging.**
  - When random subsets of the dataset are drawn as random subsets of the features, then the method is known as **Random Subspaces.**
  - Finally, when base estimators are built on subsets of both samples and features, then the method is known as **Random Patches.**
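
The four sampling flavours can be sketched as index draws using only the standard library. The helper `draw_subset` and its parameter names are hypothetical and exist only for illustration. In scikit-learn's `BaggingClassifier` the corresponding knobs are `bootstrap`, `max_samples` and `max_features`, which appear in the estimator repr printed later in this notebook.

```python
import random

def draw_subset(n_samples, n_features, rng, sample_frac=1.0, feature_frac=1.0,
                with_replacement=False):
    """Return (row_indices, column_indices) for one base estimator."""
    n_rows = max(1, round(sample_frac * n_samples))
    n_cols = max(1, round(feature_frac * n_features))
    if with_replacement:
        rows = rng.choices(range(n_samples), k=n_rows)   # duplicates allowed
    else:
        rows = rng.sample(range(n_samples), n_rows)      # no duplicates
    # In this sketch, features are always drawn without replacement.
    cols = sorted(rng.sample(range(n_features), n_cols))
    return rows, cols

rng = random.Random(42)
n_samples, n_features = 10, 8

pasting = draw_subset(n_samples, n_features, rng, sample_frac=0.6)
bagging = draw_subset(n_samples, n_features, rng, with_replacement=True)
subspaces = draw_subset(n_samples, n_features, rng, feature_frac=0.5)
patches = draw_subset(n_samples, n_features, rng, sample_frac=0.6, feature_frac=0.5)

for name, (rows, cols) in [("Pasting", pasting), ("Bagging", bagging),
                           ("Random Subspaces", subspaces), ("Random Patches", patches)]:
    print(f"{name:17s} rows={rows} cols={cols}")
```

Each base estimator would then be trained on the selected rows and columns only.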

# Bagging algorithms:

- Bagging meta-estimator
- Random forest

## Averaging
- Similar to the max voting technique, multiple predictions are made for each data point in averaging. 
- In this method, we take an average of predictions from all the models and use it to make the final prediction.
- Averaging can be used for making predictions in regression problems or while calculating probabilities for classification problems.
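
A minimal sketch of averaging for a binary classification problem follows. The probabilities are made-up numbers for three hypothetical models, and the 0.5 decision threshold is an assumption rather than something prescribed above.

```python
# Hypothetical predicted probabilities of the positive class from three models:
# one row per model, one column per data point.
model_probabilities = [
    [0.9, 0.4, 0.2, 0.6],
    [0.8, 0.6, 0.1, 0.4],
    [0.7, 0.2, 0.3, 0.8],
]

n_models = len(model_probabilities)
# Average the models' predictions for each data point (column-wise mean).
averaged = [sum(column) / n_models for column in zip(*model_probabilities)]
# Turn the averaged probabilities into class labels with a 0.5 threshold.
labels = [int(p >= 0.5) for p in averaged]

print([round(p, 2) for p in averaged])
print(labels)  # [1, 0, 0, 1]
```

For a regression problem the same column-wise mean would be used directly as the final prediction, without the thresholding step.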

## Dataset: Pima Indians onset of diabetes
- It is a binary classification problem where all of the input variables are numeric and have differing scales.

# 2.1 Bagged Decision Trees
- Bagging performs best with algorithms that have high variance. A popular example is the decision tree, often constructed without pruning.
- The example below uses the BaggingClassifier with the Classification and Regression Trees algorithm (DecisionTreeClassifier).

In [1]:
# Import the required packages
import pandas as pd
import numpy as np

In [2]:
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"

In [3]:
# Bagged Decision Trees for Classification
from sklearn import model_selection
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

In [4]:
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

In [5]:
df = pd.read_csv(url, names=names)

In [6]:
df.head()

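| | preg | plas | pres | skin | test | mass | pedi | age | class |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |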


In [7]:
array = df.values

In [8]:
X = array[:,0:8]
Y = array[:,8]

In [15]:
# Split the dataset into training (70%) and test (30%) sets
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, Y, test_size=0.3, random_state=100)

In [16]:
seed = 7
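# random_state has no effect unless shuffle=True, and recent scikit-learn
# releases raise an error for that combination, so it is omitted here.
kfold = model_selection.KFold(n_splits=10)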

In [17]:
cart = DecisionTreeClassifier()

In [18]:
num_trees = 100

In [19]:
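# The base estimator is passed positionally because its keyword was renamed
# from base_estimator to estimator in newer scikit-learn releases.
model = BaggingClassifier(cart, n_estimators=num_trees, random_state=seed)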

In [20]:
model.fit(X_train, y_train)

BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best'),
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=1.0, n_estimators=100, n_jobs=1, oob_score=False,
         random_state=7, verbose=0, warm_start=False)

In [23]:
y_pred = model.predict(X_test) 
print("Predicted values:") 
print(y_pred)

Predicted values:
[ 0.  0.  1.  0.  0.  1.  1.  0.  1.  0.  0.  1.  1.  0.  1.  0.  0.  0.
  1.  0.  0.  0.  0.  1.  0.  1.  1.  1.  0.  0.  0.  1.  0.  0.  0.  0.
  1.  0.  0.  0.  0.  1.  1.  0.  0.  0.  0.  0.  1.  1.  0.  0.  0.  1.
  1.  1.  0.  0.  1.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  1.  1.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  1.
  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  1.  0.  1.  0.  0.  0.  1.  1.
  0.  1.  0.  0.  1.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  1.  1.  0.  0.  1.  0.  0.  0.  1.  1.  1.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  1.  0.  0.  0.
  0.  0.  1.  0.  1.  0.  0.  0.  0.  1.  0.  0.  1.  0.  1.  0.  1.  0.
  1.  1.  0.  0.  1.  0.  0.  1.  0.  0.  1.  0.  1.  1.  0.  0.  0.  0.
  1.  1.  1.  1.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.
  0.  1.  1.  0.  1.  1.  0.  0.  0.  0.  1.  0.  1.  1.  0.]


In [27]:
from sklearn.metrics import accuracy_score 
from sklearn.metrics import classification_report 
from sklearn.metrics import confusion_matrix 

In [28]:
print(confusion_matrix(y_test, y_pred))

[[124  26]
 [ 35  46]]


In [29]:
accuracy_score(y_test,y_pred)

0.73593073593073588
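
As a worked check, the accuracy reported above can be recomputed by hand from the confusion matrix of the bagged model. The precision and recall lines are an addition for illustration; they follow the standard definitions for the positive class (label 1) and assume scikit-learn's layout, with rows as true classes and columns as predicted classes.

```python
# Confusion matrix of the bagged decision trees: rows are true classes,
# columns are predicted classes.
matrix = [[124, 26],
          [35, 46]]

tn, fp = matrix[0]   # true class 0: predicted 0, predicted 1
fn, tp = matrix[1]   # true class 1: predicted 0, predicted 1
total = tn + fp + fn + tp

accuracy = (tn + tp) / total    # share of all test rows classified correctly
precision = tp / (tp + fp)      # share of predicted positives that are correct
recall = tp / (tp + fn)         # share of actual positives that were found

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
# accuracy=0.7359 precision=0.6389 recall=0.5679
```

The gap between accuracy and recall shows that the model misses a sizeable share of the positive cases, which the single accuracy figure hides.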

In [22]:
model.score(X_test,y_test)

0.73593073593073588

In [30]:
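# Note: this cross-validates on the held-out test split only (231 rows),
# refitting the model on 9 of the 10 folds each time.
results = model_selection.cross_val_score(model, X_test, y_test, cv=kfold)
print(results.mean())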

0.757789855072
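
To see what bagging adds over its base estimator, one option is to cross-validate a single tree and the bagged ensemble side by side. The sketch below uses a synthetic dataset from `make_classification` as a stand-in, so it runs without downloading anything; the dataset shape, fold count and number of trees are arbitrary assumptions, and the scores will differ from the Pima results above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a small numeric binary classification dataset.
X_demo, y_demo = make_classification(n_samples=500, n_features=8,
                                     n_informative=5, random_state=0)

# Shuffled folds, so random_state is meaningful here.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
# Base estimator passed positionally: its keyword name differs between
# scikit-learn releases (base_estimator in older ones, estimator in newer ones).
bagged = BaggingClassifier(tree, n_estimators=50, random_state=0)

tree_scores = cross_val_score(tree, X_demo, y_demo, cv=cv)
bagged_scores = cross_val_score(bagged, X_demo, y_demo, cv=cv)

print(f"single tree : {tree_scores.mean():.3f}")
print(f"bagged trees: {bagged_scores.mean():.3f}")
```

Both models are scored on the same folds, so any difference in the means comes from the ensemble rather than from the data split.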


# 2.2 Random Forest

- Random forest is an extension of bagged decision trees.

- Samples of the training dataset are taken with replacement, but the trees are constructed in a way that reduces the correlation between individual classifiers.
- Specifically, rather than searching every feature for the best split point when constructing a tree, only a random subset of the features is considered at each split.

- You can construct a Random Forest model for classification using the RandomForestClassifier class.
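
The per-split feature subsampling described above can be sketched in plain Python. This is an illustration of the idea on a tiny made-up dataset, not scikit-learn's implementation; the helpers `gini` and `best_split` are hypothetical names.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels, candidate_features):
    """Return (weighted impurity, feature, threshold) of the best split
    found among the candidate features only."""
    best = None
    for f in candidate_features:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # skip splits that leave one side empty
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or impurity < best[0]:
                best = (impurity, f, threshold)
    return best

# Toy data: features 0 and 1 separate the classes, feature 2 is noise.
rows = [[1, 5, 0], [2, 4, 1], [3, 3, 0], [4, 2, 1], [5, 1, 0], [6, 0, 1]]
labels = [0, 0, 0, 1, 1, 1]

rng = random.Random(0)
max_features = 2
candidates = rng.sample(range(3), max_features)  # features examined at this split

split_all = best_split(rows, labels, range(3))      # bagged tree: all features
split_subset = best_split(rows, labels, candidates)  # random forest: subset only

print("all features   :", split_all)
print("candidates     :", candidates)
print("random subset  :", split_subset)
```

Because every split of every tree draws a fresh subset, different trees end up using different features near the root, which is what lowers the correlation between them.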

In [44]:
from sklearn.ensemble import RandomForestClassifier

In [45]:
max_features = 3


In [46]:
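# random_state has no effect unless shuffle=True, and recent scikit-learn
# releases raise an error for that combination, so it is omitted here.
kfold = model_selection.KFold(n_splits=10)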


In [47]:
model = RandomForestClassifier(n_estimators=num_trees, max_features=max_features)


In [48]:
model.fit(X_train, y_train)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features=3, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)

In [49]:
y_pred = model.predict(X_test) 
print("Predicted values:") 
print(y_pred)

Predicted values:
[ 0.  0.  1.  0.  1.  1.  1.  0.  1.  0.  0.  1.  1.  0.  1.  0.  0.  0.
  1.  0.  0.  0.  0.  1.  0.  1.  1.  0.  0.  0.  0.  1.  0.  0.  0.  0.
  1.  0.  0.  0.  0.  1.  1.  0.  0.  1.  0.  0.  1.  1.  0.  0.  0.  1.
  1.  1.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  1.  0.  0.  0.  1.  0.  0.  0.  1.  0.  1.  0.  0.
  0.  0.  0.  0.  0.  1.  0.  1.  0.  0.  1.  0.  1.  0.  0.  0.  1.  1.
  0.  1.  0.  0.  1.  0.  0.  0.  1.  0.  0.  0.  0.  1.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  1.  0.  0.  1.  0.  0.  0.  1.  1.  1.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  1.  0.  0.  0.
  0.  0.  0.  0.  1.  0.  0.  0.  0.  1.  0.  0.  1.  0.  1.  0.  1.  0.
  1.  1.  0.  0.  1.  0.  0.  1.  1.  0.  1.  0.  1.  1.  0.  0.  0.  0.
  1.  1.  1.  1.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.
  0.  1.  1.  0.  1.  1.  0.  0.  0.  0.  1.  0.  1.  1.  0.]


In [50]:
print(confusion_matrix(y_test, y_pred))

[[123  27]
 [ 35  46]]


In [51]:
accuracy_score(y_test,y_pred)

0.73160173160173159

In [52]:
model.score(X_test,y_test)

0.73160173160173159

In [53]:
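# Note: this cross-validates on the held-out test split only (231 rows),
# refitting the model on 9 of the 10 folds each time.
results = model_selection.cross_val_score(model, X_test, y_test, cv=kfold)
print(results.mean())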

0.753079710145


# In-class lab: write a program using the decision tree classification algorithm
- Data set name: credit.csv
- Using the dataset, perform:
1. Random forest
2. Bagging

# Take-home assignment
- Data set name: Heart.csv
- Using the dataset, perform:
1. Random forest
2. Bagging