# **Ensemble Learning and Random Forest Classifiers** #

- A group of predictors (classifiers or regressors) is called an ensemble. Combining them is known as Ensemble Learning, and an algorithm that does so is called an Ensemble method.
- The aggregated prediction is often more accurate than that of the best individual predictor.
- The simplest ensemble is a voting classifier: suppose you have trained several classifiers. Each one predicts a class for a new instance, the votes are counted, and the instance is assigned to the class with the most votes.
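
As a rough illustration of the vote counting described above, the sketch below contrasts hard voting (majority of predicted classes) with soft voting (averaging predicted class probabilities). The function names and the probability values are made up for this example and are not part of scikit-learn.

```python
from collections import Counter

def hard_vote(predicted_classes):
    """Return the class predicted by the most classifiers."""
    return Counter(predicted_classes).most_common(1)[0][0]

def soft_vote(predicted_probas):
    """Average per-class probabilities across classifiers and pick the largest."""
    n_classes = len(predicted_probas[0])
    n_clfs = len(predicted_probas)
    avg = [sum(p[k] for p in predicted_probas) / n_clfs for k in range(n_classes)]
    return max(range(n_classes), key=lambda k: avg[k])

# Hypothetical [P(class 0), P(class 1)] outputs from three classifiers
probas = [[0.55, 0.45], [0.55, 0.45], [0.05, 0.95]]
# Each classifier's own pick is its most probable class
classes = [max(range(2), key=lambda k: p[k]) for p in probas]

print("hard vote:", hard_vote(classes))
print("soft vote:", soft_vote(probas))
```

The two schemes can disagree: two lukewarm votes for class 0 win the hard vote, while one very confident vote for class 1 wins the soft vote.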

## ***Voting Classification*** ##

In [10]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier,VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [11]:
log_clf=LogisticRegression()
forst_clf=RandomForestClassifier()
svc=SVC(probability=True)
voting_clf=VotingClassifier(
    estimators=[('lc', log_clf) , ('fc' , forst_clf) , ('sv' , svc)],
    voting='soft'
)

X,y=make_moons(n_samples=10000,noise=0.4)
x_train,x_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42)
voting_clf.fit(x_train,y_train)

In [12]:
for clf in (log_clf,forst_clf,svc,voting_clf):
    clf.fit(x_train,y_train)
    y_pred=clf.predict(x_test)
    print(clf.__class__.__name__ , accuracy_score(y_test,y_pred))

LogisticRegression 0.838
RandomForestClassifier 0.8485
SVC 0.872
VotingClassifier 0.869


## ***Bagging and pasting*** ##

- Bagging and pasting take a different approach from voting classification: instead of combining different algorithms, the same algorithm is trained on different random subsets of the training set.
- In bagging, subsets of a predetermined size are sampled from the training set with replacement, and a separate decision tree is trained on each subset. Pasting is the same except that sampling is done without replacement.
- Once all trees are trained, the ensemble predicts a new instance by aggregating the individual predictions: the statistical mode for classification, or the average for regression.
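
The difference between the two sampling schemes can be sketched with the standard library alone. The helper `draw_subset` and the sizes used are hypothetical, chosen only to show that bagging may repeat an instance while pasting cannot.

```python
import random

def draw_subset(n_instances, subset_size, bootstrap, rng):
    """Return indices of the training instances one predictor would see."""
    indices = range(n_instances)
    if bootstrap:
        # Bagging: sample with replacement, so an index can appear more than once
        return rng.choices(indices, k=subset_size)
    # Pasting: sample without replacement, so every index is distinct
    return rng.sample(indices, k=subset_size)

rng = random.Random(42)
bagging_idx = draw_subset(1000, 100, bootstrap=True, rng=rng)
pasting_idx = draw_subset(1000, 100, bootstrap=False, rng=rng)

print("bagging: distinct instances =", len(set(bagging_idx)))
print("pasting: distinct instances =", len(set(pasting_idx)))
```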

In [13]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

bag_clf=BaggingClassifier(
    DecisionTreeClassifier(),n_estimators=500,max_samples=100,bootstrap=True,n_jobs=-1
)
bag_clf.fit(x_train,y_train)
pred=bag_clf.predict(x_test)
print("Score for bag classifier is " , accuracy_score(y_test,pred))

Score for bag classifier is  0.869


## **Out of bag evaluation** ##

- With bagging, some instances are sampled several times for a given predictor while others are not sampled at all; the latter are called out-of-bag (oob) instances. When each bootstrap sample is as large as the training set, around 63% of the training instances are sampled for each predictor and the remaining 37% or so are not (they are not the same 37% for every predictor). Since a predictor never sees its oob instances during training, it can be evaluated on them, with no need for a separate validation set or cross-validation.
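
The 63% figure can be checked with a short calculation, assuming each bootstrap sample has the same size m as the training set. The function name is hypothetical; 8000 is the size of `x_train` in this notebook.

```python
import math

def sampled_fraction(m):
    """Expected fraction of m training instances that appear at least once
    in a bootstrap sample of size m (drawn with replacement)."""
    # Each draw misses a given instance with probability (1 - 1/m)
    return 1 - (1 - 1 / m) ** m

for m in (10, 100, 8000):
    print(m, round(sampled_fraction(m), 4))

limit = 1 - math.exp(-1)  # value approached as m grows
print("limit:", round(limit, 4))
```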

In [14]:
bag_clf=BaggingClassifier(
    DecisionTreeClassifier(),n_estimators=500,bootstrap=True,n_jobs=-1,oob_score=True
)
bag_clf.fit(x_train,y_train)
bag_clf.oob_score_

0.832

In [15]:

pred=bag_clf.predict(x_test)
print("Score for bag classifier is " , accuracy_score(y_test,pred))

Score for bag classifier is  0.8445


## **Random Patches Method** ##

- *Features can be sampled in the same way as instances, so that each predictor is trained on a random subset of the input features. This is particularly useful for high-dimensional inputs such as images.*
- *Sampling features while keeping all training instances is known as the Random Subspaces method; sampling both instances and features is known as the Random Patches method.*
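
In scikit-learn, `BaggingClassifier` controls feature sampling through `max_features` and `bootstrap_features`, mirroring `max_samples` and `bootstrap` for instances. The following is a minimal sketch on synthetic data; the dataset, the variable names and the 0.5 fractions are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10 features (make_moons has only 2, too few to subsample)
Xp, yp = make_classification(n_samples=500, n_features=10, random_state=42)

patches_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=20,
    max_samples=0.5, bootstrap=True,            # sample instances
    max_features=0.5, bootstrap_features=True,  # sample features
    random_state=42,
)
patches_clf.fit(Xp, yp)

# Feature indices drawn for the first predictor
print(patches_clf.estimators_features_[0])
```

Keeping all instances (`bootstrap=False`, `max_samples=1.0`) while still sampling features would correspond to Random Subspaces instead.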

## **Random Forest** ##

- *A random forest is an ensemble of decision trees, generally trained using bagging (or sometimes pasting), typically with max_samples set to the size of the training set.*

In [19]:
from sklearn.ensemble import RandomForestClassifier
rnd_clf=RandomForestClassifier(n_estimators=500,max_leaf_nodes=16,n_jobs=-1)
rnd_clf.fit(x_train,y_train)
y_pred_rnd=rnd_clf.predict(x_test)
accuracy_score(y_test,y_pred_rnd)

0.8605

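- *It is possible to make trees even more random by using a random threshold for each feature instead of searching for the best possible threshold (as in scikit-learn's ExtraTreesClassifier). This trades more bias for lower variance.*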
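
Scikit-learn's `ExtraTreesClassifier` takes largely the same arguments as `RandomForestClassifier`, so trying such extremely randomized trees is mostly a matter of swapping the class. The sketch below regenerates a smaller moons dataset with a fixed seed so that it runs on its own; the variable names and sizes are assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

Xe, ye = make_moons(n_samples=2000, noise=0.4, random_state=42)
xe_train, xe_test, ye_train, ye_test = train_test_split(
    Xe, ye, test_size=0.2, random_state=42
)

# Same hyperparameters as the random forest above, different tree-building rule
ext_clf = ExtraTreesClassifier(n_estimators=200, max_leaf_nodes=16, random_state=42)
ext_clf.fit(xe_train, ye_train)
ext_acc = accuracy_score(ye_test, ext_clf.predict(xe_test))
print("Extra-Trees accuracy:", ext_acc)
```

Whether this beats a regular random forest is hard to tell in advance; comparing both with cross-validation is the usual way to find out.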

## **Feature Importance** ##
- *Scikit-learn measures a feature's importance by looking at how much the tree nodes that use that feature reduce impurity, on average across all trees in the forest.*
- It is a weighted average, where each node's weight is the number of training instances associated with it.
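
To make the weighting concrete, here is a small sketch of the per-node quantity for Gini impurity. The class counts describe a hypothetical tree, and the function names are not scikit-learn API.

```python
def gini(class_counts):
    """Gini impurity of a node given the number of instances per class."""
    total = sum(class_counts)
    return 1 - sum((c / total) ** 2 for c in class_counts)

def weighted_impurity_decrease(parent, left, right, n_total):
    """Impurity reduction of one split, weighted by the share of training
    instances that reach the parent node."""
    n_parent, n_left, n_right = sum(parent), sum(left), sum(right)
    children = (n_left * gini(left) + n_right * gini(right)) / n_parent
    return (n_parent / n_total) * (gini(parent) - children)

# Hypothetical tree trained on 100 instances of two classes
root_split = weighted_impurity_decrease([50, 50], [40, 10], [10, 40], n_total=100)
deep_split = weighted_impurity_decrease([40, 10], [40, 0], [0, 10], n_total=100)
print("root split:", root_split)
print("deeper split:", deep_split)
```

A feature's importance in one tree is the sum of these values over the nodes that split on it, normalised so that all importances sum to 1; the forest then averages over its trees.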

In [20]:
from sklearn.datasets import load_iris
iris=load_iris()
iX=iris['data']
iy=iris['target']
rndd_clf=RandomForestClassifier(n_estimators=500,n_jobs=-1)
rndd_clf.fit(iX,iy)

In [21]:
for name ,score in zip (iris['feature_names'],rndd_clf.feature_importances_):
    print(name,score)

sepal length (cm) 0.09982770364076039
sepal width (cm) 0.025214287764934038
petal length (cm) 0.42421688488267695
petal width (cm) 0.45074112371162856


## **Boosting** ##
- *Boosting refers to any ensemble method that combines several weak learners into a strong learner. Predictors are trained sequentially, each one trying to correct its predecessor.*

### **AdaBoost** ###

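- AdaBoost is one of the most popular boosting algorithms. Each new predictor pays more attention to the training instances that its predecessor misclassified, by increasing their weights.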
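
One round of the instance reweighting can be sketched as follows. This follows the commonly described AdaBoost update for classification and assumes the weighted error is strictly between 0 and 1; the function and variable names are hypothetical.

```python
import math

def adaboost_round(weights, misclassified, learning_rate=1.0):
    """One AdaBoost weight update.

    weights: current instance weights (summing to 1)
    misclassified: booleans, True where the current predictor was wrong
    Returns (predictor_weight, new_instance_weights).
    """
    # Weighted error rate of the current predictor
    error = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    # Predictor weight: larger when the predictor is more accurate
    alpha = learning_rate * math.log((1 - error) / error)
    # Boost the weights of the misclassified instances, then renormalise
    boosted = [w * math.exp(alpha) if wrong else w
               for w, wrong in zip(weights, misclassified)]
    total = sum(boosted)
    return alpha, [w / total for w in boosted]

# Four equally weighted instances, the first one misclassified
alpha, new_weights = adaboost_round([0.25] * 4, [True, False, False, False])
print(alpha, new_weights)
```

After the update, the misclassified instance carries half of the total weight, so the next predictor is pushed to get it right.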

In [23]:
from sklearn.ensemble import AdaBoostClassifier

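ada_clf=AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),n_estimators=200,
    # Note: "SAMME.R" is deprecated in recent scikit-learn releases; drop the
    # `algorithm` argument if your version rejects it.
    algorithm="SAMME.R",learning_rate=0.5
)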
ada_clf.fit(x_train,y_train)
y_pred_ada=ada_clf.predict(x_test)
accuracy_score(y_test,y_pred_ada)

0.8665

## **Gradient Boosting** ##
- *Gradient Boosting works in a similar way to AdaBoost: predictors are added sequentially, each one correcting its predecessor.*
- *Instead of tweaking the instance weights at every iteration, it fits each new predictor to the residual errors made by the previous predictor.*
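
In symbols, for squared-error loss, the idea can be written as below. The notation is introduced here and is not from the original notes: $F_m$ is the ensemble after $m$ predictors, $h_m$ is the $m$-th predictor, and $\eta$ is the learning rate.

$$
r_i^{(m)} = y_i - F_{m-1}(x_i), \qquad F_m(x) = F_{m-1}(x) + \eta \, h_m(x)
$$

Here $h_m$ is trained to predict the residuals $r^{(m)}$ from the inputs. Starting from $F_0 = 0$ with $\eta = 1$, the final prediction is simply the sum of the individual predictors' outputs.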

In [34]:
from sklearn.tree import DecisionTreeRegressor
rx=2*np.random.rand(100,1)
ry=4+3*rx+np.random.randn(100,1)
tree_reg=DecisionTreeRegressor(max_depth=2)
tree_reg.fit(rx,ry)


In [35]:
# ry has shape (100, 1) while predict() returns shape (100,); flatten ry so the
# subtraction gives 100 residuals instead of broadcasting to a 100x100 matrix
y2=ry.ravel()-tree_reg.predict(rx)
tree_reg2=DecisionTreeRegressor(max_depth=2)
tree_reg2.fit(rx,y2)

In [36]:
y3=y2-tree_reg2.predict(rx)
tree_reg3=DecisionTreeRegressor(max_depth=2)
tree_reg3.fit(rx,y3)

In [38]:
x_new=np.array([[0],[2]])
tree_regressors=[tree_reg,tree_reg2,tree_reg3]
# Predictions of each individual tree
for tree in tree_regressors:
    print(tree.predict(x_new))
# The ensemble's prediction is the sum of the trees' predictions
print(sum(tree.predict(x_new) for tree in tree_regressors))

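- Scikit-learn's GradientBoostingRegressor class trains such an ensemble directly. Like RandomForestRegressor, it has hyperparameters that control the growth of the trees (such as max_depth), plus hyperparameters that control the ensemble training (such as n_estimators and learning_rate).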

In [39]:
from sklearn.ensemble import GradientBoostingRegressor
gbrt=GradientBoostingRegressor(max_depth=2,n_estimators=3,learning_rate=1)

In [40]:
# Flatten the (100, 1) target to 1-D, which is the shape scikit-learn expects
gbrt.fit(rx,ry.ravel())


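- To find the optimal number of trees, train a large ensemble, measure the validation error at each stage of training with staged_predict(), and then train a new ensemble with the number of trees that gave the lowest error.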

In [41]:
from sklearn.metrics import mean_squared_error
gbrtt=GradientBoostingRegressor(max_depth=2,n_estimators=120)
# Flatten the target to 1-D to avoid a DataConversionWarning
xx_train,xx_val,yy_train,yy_val=train_test_split(rx,ry.ravel())
gbrtt.fit(xx_train,yy_train)

# Validation error after each boosting stage (1 tree, 2 trees, ...)
errors=[mean_squared_error(yy_val,y_predd) for y_predd in gbrtt.staged_predict(xx_val)]
# argmin is 0-based, so add 1 to get the number of trees
best_n_estimator=np.argmin(errors)+1


In [43]:
gbrt_best=GradientBoostingRegressor(max_depth=2,n_estimators=best_n_estimator)
gbrt_best.fit(xx_train,np.ravel(yy_train))


In [42]:
best_n_estimator

23
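
Training all 120 trees and then looking back for the minimum is one option. A variant is to stop as soon as the validation error has not improved for a few rounds. The sketch below shows that stopping rule on a plain list of errors; the function name, the `patience` parameter and the error values are hypothetical.

```python
def best_round_with_patience(errors, patience):
    """Return the 1-based round with the lowest error seen before the error
    failed to improve for `patience` consecutive rounds."""
    best_error = float("inf")
    best_round = 0
    rounds_without_improvement = 0
    for round_number, error in enumerate(errors, start=1):
        if error < best_error:
            best_error, best_round = error, round_number
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= patience:
                break  # stop early
    return best_round

# Hypothetical validation errors per number of trees
val_errors = [5.0, 3.0, 2.0, 2.1, 2.2, 2.3, 1.9]
print(best_round_with_patience(val_errors, patience=3))
```

A small `patience` saves training time but can stop before a later, lower minimum is reached, as the last value in the example list shows.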

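- Optimized implementations of these methods are available in the XGBoost library. Note that XGBRFRegressor, used below, is XGBoost's random forest variant; XGBRegressor is its gradient boosting regressor.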


In [46]:
from xgboost import XGBRFRegressor
xg_reg=XGBRFRegressor()
xg_reg.fit(xx_train,yy_train)
y_predict_xg=xg_reg.predict(xx_val)

In [48]:
xg_reg.fit(xx_train,yy_train,eval_set=[(xx_val,yy_val)])
y_pred2_xg=xg_reg.predict(xx_val)

[0]	validation_0-rmse:1.40350
