# 3. Boosting methods

# Boosting Algorithms
- Boosting ensemble algorithms create a sequence of models, each of which attempts to correct the mistakes of the models before it in the sequence.
- Boosting is a sequential learning technique.
- The first model is trained on the entire training set, and each subsequent model is built to correct the errors that remain. For example, it may fit the residual errors of the current ensemble (gradient boosting) or be trained on reweighted observations (AdaBoost).
- In this way, boosting gives higher weight to the observations that were poorly estimated by the previous models.
- Once the sequence of models is created, their predictions, which may be weighted by each model's demonstrated accuracy, are combined to create the final output prediction.
- Commonly used boosting algorithms include *GBM (Gradient Boosting Machine)* and *AdaBoost (Adaptive Boosting)*.

## Dataset: Pima Indians Onset of Diabetes
The code examples below use the Pima Indians Onset of Diabetes dataset. It is a binary classification problem in which all of the input variables are numeric and have differing scales.

# 3.1. AdaBoost

- Adaptive boosting or AdaBoost is one of the simplest boosting algorithms.
- Decision trees are usually used as the base models. Multiple models are created sequentially, each correcting the errors of the previous one.
- AdaBoost assigns higher weights to the observations that are incorrectly predicted, and the subsequent model focuses on predicting these observations correctly.

- The algorithm below describes the most widely used form of boosting, called **AdaBoost**, which stands for **adaptive boosting**.

In [2]:
from IPython.display import Image
Image('C:/Users/ROHAN/practice_notes/adaalgo.JPG')


- We see that the first base classifier $y_1(x)$ is trained using weighting coefficients that are all equal.
- In subsequent boosting rounds, the weighting coefficients are increased for data points that are misclassified and decreased for data points that are correctly classified.
- The quantity $\epsilon_m$ is the weighted error rate of the $m$-th base classifier.
- The coefficients $\alpha_m$ are computed from $\epsilon_m$ and are larger when $\epsilon_m$ is smaller, so they give greater weight to the more accurate classifiers when the outputs are combined.
- Here, each base learner is a decision tree of depth 1 (a decision stump). It classifies the data using a single feature threshold, which partitions the space into two regions separated by a linear decision boundary parallel to one of the axes.
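For reference, the discrete AdaBoost algorithm can be written out as follows, in the notation the bullets use (base classifiers $y_m(\mathbf{x})$, weighting coefficients $w_n^{(m)}$, weighted error $\epsilon_m$, classifier weights $\alpha_m$). This is a sketch of the standard textbook formulation, assuming $N$ training points with targets $t_n \in \{-1, +1\}$, $M$ boosting rounds and an indicator function $I(\cdot)$; it is not a transcription of the image, whose exact notation may differ.

```latex
\begin{aligned}
&\text{1. Initialise } w_n^{(1)} = 1/N \text{ for } n = 1,\dots,N.\\
&\text{2. For } m = 1,\dots,M:\\
&\quad\text{(a) fit } y_m(\mathbf{x}) \text{ by minimising } J_m = \sum_{n=1}^{N} w_n^{(m)}\, I\big(y_m(\mathbf{x}_n) \neq t_n\big)\\
&\quad\text{(b) } \epsilon_m = \frac{\sum_{n=1}^{N} w_n^{(m)}\, I\big(y_m(\mathbf{x}_n) \neq t_n\big)}{\sum_{n=1}^{N} w_n^{(m)}},
 \qquad \alpha_m = \ln\frac{1-\epsilon_m}{\epsilon_m}\\
&\quad\text{(c) } w_n^{(m+1)} = w_n^{(m)} \exp\big\{\alpha_m\, I\big(y_m(\mathbf{x}_n) \neq t_n\big)\big\}\\
&\text{3. Predict with } Y_M(\mathbf{x}) = \operatorname{sign}\Big(\sum_{m=1}^{M} \alpha_m\, y_m(\mathbf{x})\Big)
\end{aligned}
```

Because $\alpha_m > 0$ whenever $\epsilon_m < 1/2$, step (c) increases the weights of misclassified points, and after normalisation the correctly classified points carry relatively less weight.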

## Below are the steps for performing the AdaBoost algorithm:

- Initially, all observations in the dataset are given equal weights.
- A model is built on the weighted data (some implementations instead train on a sample drawn according to the weights).
- Using this model, predictions are made on the whole dataset.
- Errors are calculated by comparing the predictions and actual values.
- While creating the next model, higher weights are given to the data points which were predicted incorrectly.
- The weights are updated using the error value: observations with a larger error (those that were misclassified) are assigned a larger weight in the next round.
- This process is repeated until the error function no longer changes or the maximum number of estimators is reached.
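The steps above can be sketched from scratch in NumPy. This is a minimal illustration under stated assumptions, not scikit-learn's `AdaBoostClassifier` implementation: labels are assumed to be in {-1, +1}, the base learners are decision stumps found by exhaustive search, the weighted data is used directly rather than a resampled subset, and the helper names (`fit_stump`, `stump_predict`, `adaboost_fit`, `adaboost_predict`) are hypothetical.

```python
import numpy as np


def fit_stump(X, y, w):
    """Exhaustively search for the depth-1 tree with the lowest weighted error.

    Returns (weighted_error, feature_index, threshold, polarity).
    """
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                # Predict +polarity where x_j >= thr and -polarity elsewhere
                pred = np.where(X[:, j] >= thr, polarity, -polarity)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(X, j, thr, polarity):
    return np.where(X[:, j] >= thr, polarity, -polarity)


def adaboost_fit(X, y, n_estimators=10):
    """Discrete AdaBoost with decision stumps; y must contain -1/+1 labels."""
    n = len(y)
    w = np.full(n, 1.0 / n)                     # equal initial weights
    ensemble = []
    for _ in range(n_estimators):
        err, j, thr, pol = fit_stump(X, y, w)   # fit to the weighted data
        err = max(err / w.sum(), 1e-10)         # weighted error rate, kept > 0
        if err >= 0.5:                          # no better than chance: stop
            break
        alpha = np.log((1 - err) / err)         # lower error -> larger vote
        pred = stump_predict(X, j, thr, pol)
        w = w * np.exp(alpha * (pred != y))     # up-weight misclassified points
        w = w / w.sum()                         # renormalise to sum to 1
        ensemble.append((alpha, j, thr, pol))
    return ensemble


def adaboost_predict(ensemble, X):
    """Sign of the alpha-weighted vote of all stumps (ties go to +1)."""
    score = sum(alpha * stump_predict(X, j, thr, pol)
                for alpha, j, thr, pol in ensemble)
    return np.where(score >= 0, 1, -1)


# Toy 1-D data: the positive class sits at both ends of the range
X = np.arange(1, 7, dtype=float).reshape(-1, 1)
y = np.array([1, 1, -1, -1, 1, 1])
model = adaboost_fit(X, y, n_estimators=3)
print(adaboost_predict(model, X))  # matches y: all six points correct
```

On this toy data no single stump can separate the middle two points from the rest (the best one gets 4 of 6 right), but three boosting rounds are enough to classify all six training points correctly.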

In [7]:
# AdaBoost Classification
import pandas as pd 
from sklearn import model_selection
from sklearn.ensemble import AdaBoostClassifier


In [4]:
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']


In [8]:
dataframe = pd.read_csv(url, names=names)


In [9]:
array = dataframe.values


In [10]:
X = array[:,0:8]  # the 8 numeric input features
Y = array[:,8]    # the binary class label


In [11]:
seed = 7
num_trees = 30


In [12]:
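# random_state only has an effect when shuffle=True; recent scikit-learn
# versions raise a ValueError if it is set while shuffle=False
kfold = model_selection.KFold(n_splits=10)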


In [13]:
model = AdaBoostClassifier(n_estimators=num_trees, random_state=seed)


In [14]:
results = model_selection.cross_val_score(model, X, Y, cv=kfold)


In [15]:
print(results.mean())

0.76045796309


# 3.2. Gradient Boosting

- Gradient Boosting builds the model sequentially, combining a number of weak learners to form a strong learner.
- It is a generalization of boosting to arbitrary differentiable loss functions, and it can be used for both regression and classification problems.
- GBM is an accurate and effective off-the-shelf procedure, and it is proving to be perhaps one of the best techniques available for improving performance via ensembles.
- Gradient tree boosting models are used in a variety of areas, including web search ranking and ecology.
- Regression trees are used as the base learners: each subsequent tree in the series is fit to the errors of the current ensemble (more precisely, to the negative gradient of the loss, which for squared error is simply the residual).
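To make the residual-fitting idea concrete, here is a minimal from-scratch sketch in NumPy. It assumes squared loss, a single numeric feature, depth-1 regression trees (stumps) as base learners, and a fixed learning rate. For squared loss $\tfrac{1}{2}(y - F(x))^2$ the negative gradient with respect to $F(x)$ is just the residual $y - F(x)$, so each stage fits the current residuals. The function names are hypothetical, and this is not how scikit-learn's `GradientBoostingClassifier` is implemented (it uses deeper trees, `max_depth=3` by default, and a logistic loss for classification).

```python
import numpy as np


def fit_regression_stump(x, r):
    """Least-squares depth-1 regression tree on a single feature x.

    Returns (threshold, left_value, right_value); threshold is None if x
    has only one distinct value, in which case the stump predicts mean(r).
    """
    best_sse, best = np.inf, (None, r.mean(), r.mean())
    for thr in np.unique(x)[1:]:                # both sides stay non-empty
        left, right = r[x < thr], r[x >= thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (thr, left.mean(), right.mean())
    return best


def stump_value(x, stump):
    thr, left_val, right_val = stump
    if thr is None:
        return np.full(len(x), left_val)
    return np.where(x < thr, left_val, right_val)


def gradient_boost_fit(x, y, n_estimators=100, learning_rate=0.1):
    """Gradient boosting for squared loss 0.5 * (y - F)^2 with stump learners."""
    f0 = y.mean()                               # best constant model
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_estimators):
        residuals = y - pred                    # negative gradient of the loss
        stump = fit_regression_stump(x, residuals)
        pred = pred + learning_rate * stump_value(x, stump)  # shrunken step
        stumps.append(stump)
    return f0, learning_rate, stumps


def gradient_boost_predict(model, x):
    f0, learning_rate, stumps = model
    out = np.full(len(x), f0)
    for stump in stumps:
        out = out + learning_rate * stump_value(x, stump)
    return out


x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x)
for m in (1, 10, 100):
    model = gradient_boost_fit(x, y, n_estimators=m)
    mse = np.mean((y - gradient_boost_predict(model, x)) ** 2)
    print(m, round(mse, 4))   # training error shrinks as stages are added
```

Each stage moves the prediction only a fraction (`learning_rate`) of the way toward the residual fit; smaller steps typically need more stages but tend to generalise better, which is the usual shrinkage trade-off.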

In [19]:
from sklearn.ensemble import GradientBoostingClassifier

In [20]:
seed = 7


In [21]:
num_trees = 100


In [22]:
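# random_state only has an effect when shuffle=True; recent scikit-learn
# versions raise a ValueError if it is set while shuffle=False
kfold = model_selection.KFold(n_splits=10)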


In [23]:
model = GradientBoostingClassifier(n_estimators=num_trees, random_state=seed)


In [24]:
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.766900205058


# The advantages of GBM are:

- Natural handling of data of mixed type (= heterogeneous features)
- Predictive power
- Robustness to outliers in output space (via robust loss functions)
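The robustness point can be seen in the pseudo-residuals that each new tree is fit to, i.e. the negative gradient of the loss at the current prediction. A minimal NumPy illustration with made-up numbers: under squared loss an outlying target produces a huge pseudo-residual that dominates the next tree, while under absolute loss every pseudo-residual is bounded to ±1. (In scikit-learn, `GradientBoostingRegressor` offers robust options such as `loss='huber'`.)

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 100.0])     # made-up targets; the last is an outlier
F = np.full(4, 2.5)                       # current ensemble prediction (assumed)

# Pseudo-residuals = negative gradient of the loss with respect to F
squared_loss_residuals = y - F            # from 0.5 * (y - F)^2
absolute_loss_residuals = np.sign(y - F)  # from |y - F|

print(squared_loss_residuals)   # values -1.5, -0.5, 0.5, 97.5
print(absolute_loss_residuals)  # values -1, -1, 1, 1
```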

# The disadvantages of GBM are:

- Scalability: because of the sequential nature of boosting, the boosting stages can hardly be parallelized.

# Data set name: credit.csv. Using the dataset, perform:
1. Decision tree classification (restricting the depth of the tree to 5)
2. Using both the entropy and Gini index criteria
3. Report all the evaluation metrics
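As a starting point for item 2, the two split criteria can be computed by hand. The sketch below, in plain Python, computes Gini impurity, entropy, and the impurity decrease of a candidate split; the `good`/`bad` labels are hypothetical, since the columns of `credit.csv` are not described here. In scikit-learn the same choice is made with `DecisionTreeClassifier(criterion='gini', max_depth=5)` or `criterion='entropy'`.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: sum_k p_k * log2(1 / p_k)."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def impurity_decrease(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two children.

    With impurity=entropy this is the information gain of the split.
    """
    n = len(parent)
    children = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - children


labels = ["good"] * 6 + ["bad"] * 4           # hypothetical class labels
left, right = labels[:6], labels[6:]          # a split that separates them perfectly
print(round(gini(labels), 3), round(entropy(labels), 3))          # 0.48 0.971
print(round(impurity_decrease(labels, left, right, entropy), 3))  # 0.971
```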
