# Boosting
Boosting is an ensemble approach (it combines several models, here decision trees) that starts 
from a weak learner and keeps adding models sequentially, so that the final prediction is the 
weighted sum of all the weak learners. Each learner's weight is assigned based on its own 
performance.
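
To make the "weighted sum of weak learners" idea concrete, here is a minimal sketch of discrete AdaBoost for a binary problem. It is an illustration under stated assumptions, not the code scikit-learn runs internally: the toy data from `make_classification`, the 20 rounds, and the variable names are all choices made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy binary problem with labels recoded to -1/+1 (illustrative data only)
X, y01 = make_classification(n_samples=300, n_features=5, random_state=0)
y = np.where(y01 == 1, 1, -1)

n_rounds = 20
sample_w = np.full(len(y), 1.0 / len(y))  # start with uniform sample weights
stumps, alphas = [], []

for _ in range(n_rounds):
    # Weak learner: a depth-1 tree (decision stump) fit on the weighted data
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X, y, sample_weight=sample_w)
    pred = stump.predict(X)

    # Weighted error of this stump, clipped to avoid division by zero
    err = np.clip(np.sum(sample_w[pred != y]), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)  # lower error gives a larger weight

    # Increase the weight of misclassified samples, then renormalise
    sample_w *= np.exp(-alpha * y * pred)
    sample_w /= sample_w.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the weighted sum of the stumps' votes
scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
ensemble_pred = np.sign(scores)

first_stump_acc = np.mean(stumps[0].predict(X) == y)
ensemble_acc = np.mean(ensemble_pred == y)
print(f"first stump: {first_stump_acc:.3f}, ensemble: {ensemble_acc:.3f}")
```

Both accuracies are measured on the training data, so they only show how the weighted vote combines the stumps, not how well the ensemble generalises.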

# Iris Species Prediction 
The Iris dataset was used in R.A. Fisher's classic 1936 paper, *The Use of Multiple Measurements in Taxonomic Problems*, and can also be found on the UCI Machine Learning Repository.

It includes three iris species with 50 samples each, along with several properties of each flower. One species is linearly separable from the other two, but those two are not linearly separable from each other.
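
The separability claim can be checked directly. The sketch below assumes scikit-learn's bundled copy of the data; the 2.5 cm petal-length cut-off and the weakly regularised `LogisticRegression` are illustrative choices, not part of the original analysis.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# Setosa (class 0): a single threshold on petal length (column 2) isolates it.
# The 2.5 cm cut-off is an illustrative choice.
setosa_rule = X[:, 2] < 2.5
setosa_separable = bool(np.array_equal(setosa_rule, y == 0))

# Versicolor (1) vs. virginica (2): fit a weakly regularised linear model
# and check whether it can classify every training sample correctly.
mask = y != 0
linear = LogisticRegression(C=1e4, max_iter=10_000).fit(X[mask], y[mask])
two_class_acc = linear.score(X[mask], y[mask])

print("setosa separable by one threshold:", setosa_separable)
print("linear training accuracy, versicolor vs. virginica:", two_class_acc)
```

A training accuracy below 1.0 for the second pair is what non-separability looks like in practice: no linear boundary classifies all 100 samples correctly.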

The columns in this dataset are:

- Id
- SepalLengthCm
- SepalWidthCm
- PetalLengthCm
- PetalWidthCm
- Species
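
The column names listed above look like those of a CSV distribution of the dataset (an assumption based on the `Id` and `...Cm` naming). The code in this notebook loads scikit-learn's bundled copy instead, which has no `Id` column and names the features differently. A quick way to inspect what `load_iris` actually returns:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Four numeric features; the species is stored separately as an integer target
print(iris.feature_names)
# ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
print(list(iris.target_names))
# ['setosa', 'versicolor', 'virginica']
print(iris.data.shape)  # (150, 4)

# 50 samples per species
class_counts = np.bincount(iris.target)
print(class_counts)  # [50 50 50]
```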



In [None]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn import datasets
# Import train_test_split function
from sklearn.model_selection import train_test_split
# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics

In [None]:
iris = datasets.load_iris()
X = iris.data
y = iris.target

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

In [None]:
ada = AdaBoostClassifier(n_estimators=50,
                         learning_rate=1.0)
# Train the AdaBoost classifier
model = ada.fit(X_train, y_train)

# Predict the response for the test dataset
y_pred = model.predict(X_test)

In [None]:
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

Accuracy: 0.9111111111111111
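
With only 45 test samples and no fixed `random_state`, a single-split accuracy can change noticeably from run to run. One possible way to get a steadier estimate is cross-validation; the sketch below assumes 5 stratified folds and a seed of 0, both arbitrary choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Same AdaBoost settings as above, with a seed so the run is repeatable
ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)

# Stratified folds keep the three species balanced in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(ada, X, y, cv=cv, scoring="accuracy")

print("fold accuracies:", scores)
print("mean:", scores.mean(), "std:", scores.std())
```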


In [None]:
from sklearn.ensemble import GradientBoostingClassifier

In [None]:
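# Note: y_pred still holds the AdaBoost predictions from above, so this cell
# does not evaluate gradient boosting. The value differs from the accuracy
# printed earlier, probably because the split was regenerated in between
# (train_test_split was called without random_state).
metrics.accuracy_score(y_test, y_pred)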

0.24444444444444444

In [None]:
clf = GradientBoostingClassifier(n_estimators=50, learning_rate=1.0,
    max_depth=1, random_state=12).fit(X_train, y_train)

y_pred_g = clf.predict(X_test)

In [None]:
clf.score(X_train,y_train)

1.0

In [None]:
clf.score(X_test,y_test)

0.9111111111111111
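
The gap between a training score of 1.0 and a lower test score suggests the ensemble may be fitting the training data more closely than it needs to. One way to look at this is `staged_predict`, which yields the predictions after each boosting stage. The sketch below is self-contained and uses its own seeded, stratified split (an assumption), so its numbers will not match the outputs above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=12)

clf = GradientBoostingClassifier(n_estimators=50, learning_rate=1.0,
                                 max_depth=1, random_state=12)
clf.fit(X_train, y_train)

# Test accuracy after 1, 2, ..., 50 trees
test_acc = [accuracy_score(y_test, pred) for pred in clf.staged_predict(X_test)]

best_n = int(np.argmax(test_acc)) + 1  # first stage reaching the best accuracy
print("best number of trees:", best_n)
print("best accuracy:", max(test_acc), "final accuracy:", test_acc[-1])
```

Choosing the number of trees from test accuracy leaks test information into the model, so in practice a separate validation split would be the safer basis for that choice.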

In [None]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# The target is a class label, so search over the classifier
# (the same estimator type as the final model)
gb = GradientBoostingClassifier()

# Learning rate: shrinks the contribution of each tree
learning_rate = [0.001, 0.01, 0.1, 0.2]
# Number of trees in gradient boosting
n_estimators = list(range(500, 1000, 100))
# Maximum number of levels in a tree (yields [4, 8])
max_depth = list(range(4, 9, 4))
# Minimum number of samples required to split an internal node
min_samples_split = list(range(4, 9, 2))
# Minimum number of samples required to be at a leaf node
min_samples_leaf = [1, 2, 5, 7]
# Number of features to be considered at each split; None means all features
# ('auto' is no longer accepted by recent scikit-learn releases)
max_features = [None, 'sqrt']

# Hyperparameter search space
param_grid = {"learning_rate": learning_rate,
              "n_estimators": n_estimators,
              "max_depth": max_depth,
              "min_samples_split": min_samples_split,
              "min_samples_leaf": min_samples_leaf,
              "max_features": max_features}

gb_rs = RandomizedSearchCV(estimator=gb, param_distributions=param_grid)

In [None]:
hp=gb_rs.fit(X_train,y_train)

In [None]:
hp.best_params_

{'learning_rate': 0.01,
 'max_depth': 4,
 'max_features': 'sqrt',
 'min_samples_leaf': 7,
 'min_samples_split': 8,
 'n_estimators': 500}

In [None]:
hp.best_score_

0.9155835671146854
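
Rather than copying the best parameters by hand, the fitted search object can supply them directly: `best_estimator_` is already refit on the full training set (the default `refit=True`), and `best_params_` can be unpacked into a new estimator. The sketch below uses a deliberately small, hypothetical search space (`small_grid`), 3 folds, and fixed seeds so that it runs quickly; these are assumptions, not the settings used above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=12)

# Small illustrative search space (hypothetical values chosen for speed)
small_grid = {"learning_rate": [0.01, 0.1],
              "n_estimators": [50, 100],
              "max_depth": [1, 2, 4]}

search = RandomizedSearchCV(GradientBoostingClassifier(random_state=12),
                            param_distributions=small_grid,
                            n_iter=5, cv=3, scoring="accuracy",
                            random_state=12)
search.fit(X_train, y_train)

# Option 1: the search already refit the best configuration on X_train
tuned = search.best_estimator_

# Option 2: rebuild the model by unpacking the best parameters
rebuilt = GradientBoostingClassifier(random_state=12, **search.best_params_)
rebuilt.fit(X_train, y_train)

print(search.best_params_)
print("test accuracy:", tuned.score(X_test, y_test))
```

Either option avoids transcription mistakes when the search is re-run and picks a different configuration.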

In [None]:
clf = GradientBoostingClassifier(n_estimators=500, learning_rate=0.01,
    max_depth=4, max_features='sqrt', min_samples_leaf=7,
    min_samples_split=8, random_state=12).fit(X_train, y_train)


In [None]:
clf.score(X_train,y_train)

1.0

In [None]:
clf.score(X_test,y_test)

0.9555555555555556