<a href="https://colab.research.google.com/github/0xs1d/pwskills/blob/main/Ensemble_Assignment_Solutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ensemble Learning Assignment — Bagging, Random Forest, Boosting

---

## 1. Can we use Bagging for regression problems?
Yes. Bagging works for both classification and regression. In regression, the final prediction is obtained by averaging predictions of all base regressors.


## 2. Difference between multiple model training and single model training
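Single model training fits one algorithm to the training data and relies on its predictions alone. Multiple model training (ensemble learning) fits several models and aggregates their predictions to improve stability and accuracy. The models can differ because they are trained on different bootstrap samples of the data, use different algorithms, or are trained one after another to correct earlier errors.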


## 3. Concept of feature randomness in Random Forest
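When growing each tree, Random Forest considers only a random subset of the features as candidates at every split (controlled by max_features in scikit-learn). This keeps a few strong features from dominating every tree, which decorrelates the trees and improves generalization.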
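A quick way to see feature randomness at work is to compare a forest that considers all features at every split (max_features=None, which reduces it to plain bagged trees) with the default square-root subset. This is a minimal sketch that assumes the breast cancer dataset and 5-fold cross-validation as illustrative choices; the size of the gap between the two settings varies by dataset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

results = {}
for max_features in [None, "sqrt"]:
    # None: every split sees all 30 features (equivalent to bagged trees)
    # "sqrt": every split sees a random subset of about sqrt(30), i.e. 5 features
    rf = RandomForestClassifier(n_estimators=100, max_features=max_features, random_state=0)
    results[str(max_features)] = cross_val_score(rf, X, y, cv=5).mean()

for setting, score in results.items():
    print(f"max_features={setting}: mean CV accuracy {score:.3f}")
```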


## 4. What is OOB (Out-of-Bag) score?
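Each tree is trained on a bootstrap sample, so on average about 36.8% of the training samples are left out of it (out of bag). The OOB score predicts each training sample using only the trees that did not see it, then scores those predictions (accuracy for classifiers and R² for regressors in scikit-learn). It acts like built-in cross-validation, without a separate validation set.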
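The OOB idea can be written out by hand: draw a bootstrap sample for each tree, remember which rows were left out, and let only those trees vote on those rows. This is a simplified sketch assuming hard majority voting over decision trees; scikit-learn's oob_score_ for random forests aggregates predicted probabilities instead, so its value will not match exactly.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
n_samples, n_classes = len(y), len(np.unique(y))
rng = np.random.default_rng(0)
n_trees = 50

# votes[i, c] counts how many out-of-bag trees predicted class c for sample i
votes = np.zeros((n_samples, n_classes), dtype=int)
for _ in range(n_trees):
    boot = rng.integers(0, n_samples, size=n_samples)  # bootstrap indices, drawn with replacement
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[boot] = False                              # rows never drawn are out-of-bag
    tree = DecisionTreeClassifier(random_state=0).fit(X[boot], y[boot])
    pred = tree.predict(X[oob_mask])                    # labels are 0/1, so they double as column indices
    votes[np.flatnonzero(oob_mask), pred] += 1

# Score only samples that were out-of-bag for at least one tree
scored = votes.sum(axis=1) > 0
oob_pred = votes[scored].argmax(axis=1)
oob_accuracy = accuracy_score(y[scored], oob_pred)
print(f"OOB accuracy: {oob_accuracy:.3f} on {scored.sum()} of {n_samples} samples")
```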


## 5. Measuring feature importance in Random Forest
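Two common measures are used. Impurity-based importance (mean decrease in impurity, called Gini importance for classification) sums how much each feature reduces impurity over all splits in all trees; scikit-learn exposes it as feature_importances_. Permutation importance measures how much the model's score drops when one feature's values are randomly shuffled. Impurity-based importance is cheap but can be biased toward high-cardinality features; permutation importance avoids that bias at a higher computational cost.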
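A short sketch comparing both measures in scikit-learn, assuming the breast cancer dataset as an example: feature_importances_ gives impurity-based scores learned from the training data, while sklearn.inspection.permutation_importance shuffles each feature on held-out data and reports the mean drop in accuracy.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
Xtr, Xte, ytr, yte = train_test_split(data.data, data.target, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(Xtr, ytr)

# Impurity-based (MDI) importance: computed from the training splits, normalized to sum to 1
mdi = rf.feature_importances_

# Permutation importance: mean drop in test accuracy over 10 shuffles of each feature
perm = permutation_importance(rf, Xte, yte, n_repeats=10, random_state=0)

# Five features with the largest permutation importance
top = np.argsort(perm.importances_mean)[::-1][:5]
for i in top:
    print(f"{data.feature_names[i]:25s} MDI={mdi[i]:.3f}  permutation={perm.importances_mean[i]:.3f}")
```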


## 6. Working principle of a Bagging Classifier
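It trains multiple copies of a base model, each on a different bootstrap sample of the training data, and combines their predictions by majority vote. scikit-learn's BaggingClassifier refines this: when the base model supports predict_proba, it averages the predicted class probabilities and picks the most probable class, falling back to voting otherwise.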
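The bootstrap-then-vote procedure fits in a few lines. This is a from-scratch sketch assuming decision trees as the base model and plain majority (hard) voting; it illustrates the principle and is not how BaggingClassifier is implemented internally.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

# 1. Train each tree on its own bootstrap sample (n rows drawn with replacement)
trees = []
for _ in range(25):
    idx = rng.integers(0, len(ytr), size=len(ytr))
    trees.append(DecisionTreeClassifier(random_state=0).fit(Xtr[idx], ytr[idx]))

# 2. Collect every tree's predictions: shape (n_trees, n_test_samples)
all_preds = np.array([t.predict(Xte) for t in trees])

# 3. Majority vote per test sample (labels are 0, 1, 2, so bincount can count votes)
majority = np.array([np.bincount(col, minlength=3).argmax() for col in all_preds.T])
print("Bagged accuracy:", accuracy_score(yte, majority))
```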


## 7. Evaluating a Bagging Classifier’s performance
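Measure accuracy, precision, recall, F1-score, or ROC-AUC on a held-out test set. For a more reliable estimate, average these metrics with k-fold cross-validation, or use the out-of-bag (OOB) score, which comes at no extra cost when bootstrap sampling is used.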
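One way to collect several metrics at once is sklearn.model_selection.cross_validate with a list of scorer names; the OOB estimate is available after a single fit when oob_score=True. The dataset, ensemble size, and metric list below are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
metrics = ["accuracy", "precision", "recall", "f1", "roc_auc"]
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

# 5-fold CV: every metric is computed on each held-out fold
cv = cross_validate(bag, X, y, cv=5, scoring=metrics)
for metric in metrics:
    print(f"{metric:9s} {cv['test_' + metric].mean():.3f}")

# Out-of-bag estimate from a single fit on all the data
bag_oob = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=50, oob_score=True, random_state=0
).fit(X, y)
print("OOB accuracy:", round(bag_oob.oob_score_, 3))
```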


## 8. How a Bagging Regressor works
Trains multiple regressors on bootstrap samples and averages their predictions.
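The same recipe for regression, written by hand as a sketch: fit one regression tree per bootstrap sample and average their predictions. It assumes the bundled diabetes dataset and 50 trees, and compares the average against a single tree trained on the full training set.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

preds = []
for _ in range(50):
    idx = rng.integers(0, len(ytr), size=len(ytr))  # one bootstrap sample
    tree = DecisionTreeRegressor(random_state=0).fit(Xtr[idx], ytr[idx])
    preds.append(tree.predict(Xte))

bagged = np.mean(preds, axis=0)  # average over the 50 trees
single = DecisionTreeRegressor(random_state=0).fit(Xtr, ytr).predict(Xte)

print("Single tree MSE:", mean_squared_error(yte, single))
print("Bagged MSE:     ", mean_squared_error(yte, bagged))
```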


## 9. Main advantage of ensemble techniques
They usually give more accurate and more stable predictions than a single model, by reducing variance (as in bagging), bias (as in boosting), or both.


## 10. Main challenge of ensemble methods
They can be computationally expensive and less interpretable.


## 11. Key idea behind ensemble techniques
Combine multiple diverse models to achieve better accuracy than a single model.


## 12. What is a Random Forest Classifier?
An ensemble of decision trees using bagging and random feature selection.


## 13. Main types of ensemble techniques
Bagging, Boosting, Stacking, Blending.


## 14. What is ensemble learning?
A machine learning technique in which multiple models are trained and their predictions combined to achieve better performance than a single model.


## 15. When should we avoid using ensemble methods?
When interpretability is required or computational resources are limited.


## 16. How Bagging helps reduce overfitting
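Averaging predictions from models trained on different bootstrap samples reduces variance, because the individual models' errors partly cancel out as long as the models are not perfectly correlated. This helps most with high-variance, low-bias learners such as fully grown decision trees, which overfit on their own.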
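The variance argument can be made precise. If each of B models has prediction variance sigma² and every pair has correlation rho, the variance of their average is rho·sigma² + (1 - rho)·sigma²/B: adding models shrinks the second term, and lowering correlation (the goal of bootstrap sampling and feature randomness) shrinks the first. The simulation below checks this formula on synthetic Gaussian "model errors"; the values of sigma, rho, and B are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, rho, B = 2.0, 0.3, 25  # assumed per-model std, pairwise correlation, ensemble size
n_trials = 200_000

# B correlated errors per trial: a shared component plus an independent one,
# scaled so each error has variance sigma**2 and each pair has correlation rho
shared = rng.standard_normal((n_trials, 1))
own = rng.standard_normal((n_trials, B))
errors = sigma * (np.sqrt(rho) * shared + np.sqrt(1 - rho) * own)

empirical = errors.mean(axis=1).var()
theory = rho * sigma**2 + (1 - rho) * sigma**2 / B

print(f"single model variance:         {sigma**2:.3f}")
print(f"ensemble variance (simulated): {empirical:.3f}")
print(f"ensemble variance (formula):   {theory:.3f}")
```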


## 17. Why Random Forest is better than a single Decision Tree
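Averaging many decorrelated trees reduces the variance of a single deep tree, so a Random Forest usually overfits less and gives more stable predictions when the training data changes slightly.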


## 18. Role of bootstrap sampling in Bagging
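Each model is trained on a bootstrap sample: n examples drawn with replacement from the n training examples, so some examples appear several times and others not at all. Because every model sees a slightly different dataset, the models make different errors, and combining them reduces variance.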
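A quick numerical check of what a bootstrap sample looks like: drawing n indices with replacement leaves out about (1 - 1/n)^n ≈ 1/e ≈ 36.8% of the rows, so each sample contains roughly 63.2% of the unique original rows. The value n = 10,000 and the 20 repetitions are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

fractions = []
for _ in range(20):
    sample = rng.integers(0, n, size=n)           # one bootstrap sample of row indices
    fractions.append(len(np.unique(sample)) / n)  # share of rows that appear at least once

mean_unique = float(np.mean(fractions))
print(f"unique rows per bootstrap sample: {mean_unique:.3f}")
print(f"theoretical 1 - (1 - 1/n)^n:      {1 - (1 - 1/n) ** n:.3f}")
```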


## 19. Real-world applications of ensemble techniques
Fraud detection, medical diagnosis, credit scoring, recommendation engines.


## 20. Difference between Bagging and Boosting
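Bagging builds models independently (so they can be trained in parallel) on bootstrap samples and gives them equal weight; Boosting builds models sequentially, each new model focusing on the errors of the ones before it. As a result, Bagging mainly reduces variance, while Boosting mainly reduces bias.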
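The sequential nature of boosting is visible in scikit-learn through staged_predict, which yields the ensemble's predictions after each boosting stage. The sketch below assumes GradientBoostingClassifier as the boosting method and the breast cancer dataset, and prints test accuracy after selected stages next to a bagging ensemble of the same size whose trees are built independently.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Bagging: 100 deep trees, each fit independently on its own bootstrap sample
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0).fit(Xtr, ytr)
bag_acc = accuracy_score(yte, bag.predict(Xte))
print("Bagging accuracy:", round(bag_acc, 3))

# Boosting: 100 shallow trees (max_depth=3 by default), each fit to the
# pseudo-residuals (loss gradient) of the ensemble built so far
gb = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)
staged = [accuracy_score(yte, pred) for pred in gb.staged_predict(Xte)]
for stage in [1, 10, 50, 100]:
    print(f"Boosting accuracy after {stage:3d} trees: {staged[stage - 1]:.3f}")
```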


## Train a Bagging Classifier using Decision Trees

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hold out 30% of the data for testing
X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=42)

# 20 decision trees, each fit on its own bootstrap sample of the training set
model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=20, random_state=42)
model.fit(Xtr, ytr)
print("Accuracy:", accuracy_score(yte, model.predict(Xte)))
```


## Train a Bagging Regressor and compute MSE

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# load_boston was removed in scikit-learn 1.2; the bundled diabetes dataset is used instead
X, y = load_diabetes(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=42)

# 20 regression trees on bootstrap samples; their predictions are averaged
reg = BaggingRegressor(DecisionTreeRegressor(), n_estimators=20, random_state=42)
reg.fit(Xtr, ytr)
print("MSE:", mean_squared_error(yte, reg.predict(Xte)))
```


## Random Forest Classifier — Feature Importance

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
rf = RandomForestClassifier(random_state=42).fit(data.data, data.target)

# Impurity-based importances (one per feature, summing to 1), shown for the first 10 features
for name, imp in zip(data.feature_names[:10], rf.feature_importances_[:10]):
    print(f"{name:25s} {imp:.4f}")
```


## Random Forest Regressor vs Decision Tree

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# load_boston was removed in scikit-learn 1.2; the bundled diabetes dataset is used instead
X, y = load_diabetes(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)

dt = DecisionTreeRegressor(random_state=42).fit(Xtr, ytr)
rf = RandomForestRegressor(random_state=42).fit(Xtr, ytr)
print("DT MSE:", mean_squared_error(yte, dt.predict(Xte)))
print("RF MSE:", mean_squared_error(yte, rf.predict(Xte)))
```


## Compute Random Forest OOB Score

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# oob_score=True scores each sample using only the trees whose bootstrap sample excluded it
# (bootstrap=True is the default and is required for OOB scoring)
rf = RandomForestClassifier(oob_score=True, bootstrap=True, random_state=42).fit(X, y)
print("OOB Score:", rf.oob_score_)
```


## Bagging Classifier using SVM

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)

# Any estimator can serve as the base model; here, 10 SVMs fit on bootstrap samples
model = BaggingClassifier(SVC(), n_estimators=10, random_state=42)
model.fit(Xtr, ytr)
print("Accuracy:", model.score(Xte, yte))
```


## Random Forest with different n_estimators

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)

# Evaluate on held-out data: on the training set a random forest typically scores
# close to 1.0 for any n_estimators, which says nothing about generalization
for n in [10, 50, 100, 200]:
    rf = RandomForestClassifier(n_estimators=n, random_state=42).fit(Xtr, ytr)
    print(n, rf.score(Xte, yte))
```


## Bagging + Logistic Regression — AUC Score

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)

# Scaling inside each base model helps logistic regression converge on these unscaled features
base = make_pipeline(StandardScaler(), LogisticRegression(max_iter=500))
bag = BaggingClassifier(base, n_estimators=10, random_state=42)
bag.fit(Xtr, ytr)
prob = bag.predict_proba(Xte)[:, 1]  # probability of the positive class (label 1)
print("AUC:", roc_auc_score(yte, prob))
```


## Random Forest Regressor — Feature Importance

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# load_boston was removed in scikit-learn 1.2; the bundled diabetes dataset is used instead
data = load_diabetes()
rf = RandomForestRegressor(random_state=42).fit(data.data, data.target)

# One impurity-based importance per feature (the diabetes data has 10 features)
for name, imp in zip(data.feature_names, rf.feature_importances_):
    print(f"{name:5s} {imp:.4f}")
```


## Bagging vs Random Forest — Accuracy Comparison

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)

# Bagged trees see all features at every split; Random Forest also subsamples features
bag = BaggingClassifier(DecisionTreeClassifier(), random_state=42).fit(Xtr, ytr)
rf = RandomForestClassifier(random_state=42).fit(Xtr, ytr)
print("Bagging:", bag.score(Xte, yte))
print("Random Forest:", rf.score(Xte, yte))
```


## Random Forest + GridSearchCV

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
params = {"n_estimators": [50, 100], "max_depth": [3, 5, 7]}

# 3-fold cross-validation over all 6 parameter combinations
grid = GridSearchCV(RandomForestClassifier(random_state=42), params, cv=3).fit(X, y)
print("Best Params:", grid.best_params_)
print("Best CV accuracy:", grid.best_score_)
```


## Bagging Regressor — Different Estimators

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# load_boston was removed in scikit-learn 1.2; the bundled diabetes dataset is used instead
X, y = load_diabetes(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)

# BaggingRegressor's default base estimator is a DecisionTreeRegressor
for n in [5, 10, 20]:
    br = BaggingRegressor(n_estimators=n, random_state=42).fit(Xtr, ytr)
    print(n, mean_squared_error(yte, br.predict(Xte)))
```


## Random Forest — Misclassified Samples

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)
rf = RandomForestClassifier(random_state=42).fit(Xtr, ytr)
pred = rf.predict(Xte)

# (test-set index, true label, predicted label) for every wrong prediction
wrong = np.flatnonzero(pred != yte)
print("Misclassified:", [(int(i), int(yte[i]), int(pred[i])) for i in wrong])
```


## Bagging Classifier vs Decision Tree

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)

# A single tree versus 10 bagged trees (BaggingClassifier's default n_estimators)
dt = DecisionTreeClassifier(random_state=42).fit(Xtr, ytr)
bag = BaggingClassifier(DecisionTreeClassifier(), random_state=42).fit(Xtr, ytr)
print("DT:", dt.score(Xte, yte))
print("Bagging:", bag.score(Xte, yte))
```


## Random Forest — Confusion Matrix

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)
rf = RandomForestClassifier(random_state=42).fit(Xtr, ytr)

# Rows are true classes, columns are predicted classes
print(confusion_matrix(yte, rf.predict(Xte)))
```


## Stacking Classifier — DT + SVM + LR

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)

# Base learners: decision tree and SVM. Meta-learner: logistic regression,
# trained on the base learners' cross-validated predictions
estimators = [
    ("dt", DecisionTreeClassifier(random_state=42)),
    ("svm", SVC(probability=True, random_state=42)),
]
stack = StackingClassifier(estimators, final_estimator=LogisticRegression())
stack.fit(Xtr, ytr)
print("Accuracy:", stack.score(Xte, yte))
```


## Random Forest — Top 5 Features

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
rf = RandomForestClassifier(random_state=42).fit(data.data, data.target)
fi = rf.feature_importances_

# Indices of the 5 largest importances, in descending order
idx = fi.argsort()[-5:][::-1]
for i in idx:
    print(data.feature_names[i], round(fi[i], 4))
```


## Bagging Classifier — Precision, Recall, F1

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)
bag = BaggingClassifier(DecisionTreeClassifier(), random_state=42).fit(Xtr, ytr)
pred = bag.predict(Xte)

# Iris has 3 classes, so per-class scores are averaged, weighted by class support
print("Precision:", precision_score(yte, pred, average="weighted"))
print("Recall:   ", recall_score(yte, pred, average="weighted"))
print("F1:       ", f1_score(yte, pred, average="weighted"))
```


## Random Forest — Effect of max_depth

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)

# Compare training and test accuracy: a growing gap as depth increases signals overfitting
for d in [2, 4, 6, 8, 10]:
    rf = RandomForestClassifier(max_depth=d, random_state=42).fit(Xtr, ytr)
    print(d, "train:", rf.score(Xtr, ytr), "test:", rf.score(Xte, yte))
```


## Bagging Regressor — DT vs KNN

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# load_boston was removed in scikit-learn 1.2; the bundled diabetes dataset is used instead
X, y = load_diabetes(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)

# Same bagging setup, two different base regressors
for base in [DecisionTreeRegressor(), KNeighborsRegressor()]:
    br = BaggingRegressor(base, random_state=42).fit(Xtr, ytr)
    print(type(base).__name__, mean_squared_error(yte, br.predict(Xte)))
```


## Random Forest — ROC-AUC Score

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)
rf = RandomForestClassifier(random_state=42).fit(Xtr, ytr)
prob = rf.predict_proba(Xte)[:, 1]  # probability of class 1 (benign in this dataset)
print("ROC-AUC:", roc_auc_score(yte, prob))
```


## Bagging — Cross-validation

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
bag = BaggingClassifier(DecisionTreeClassifier(), random_state=42)

# 5-fold cross-validated accuracy (folds are stratified for classifiers)
scores = cross_val_score(bag, X, y, cv=5)
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```


## Random Forest — Precision-Recall curve

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)
rf = RandomForestClassifier(random_state=42).fit(Xtr, ytr)
prob = rf.predict_proba(Xte)[:, 1]

# One (precision, recall) pair per decision threshold on the predicted probability
prec, rec, _ = precision_recall_curve(yte, prob)
plt.plot(rec, prec)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Random Forest: Precision-Recall curve")
plt.show()
```


## Stacking Classifier — RF + LR

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)

# Random forest as the base learner, logistic regression as the meta-learner
estimators = [("rf", RandomForestClassifier(random_state=42))]
stack = StackingClassifier(estimators, final_estimator=LogisticRegression())
stack.fit(Xtr, ytr)
print("Accuracy:", stack.score(Xte, yte))
```


## Bagging Regressor — Different bootstrap sample sizes

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# load_boston was removed in scikit-learn 1.2; the bundled diabetes dataset is used instead
X, y = load_diabetes(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)

# A float max_samples is the fraction of the training set drawn (with replacement) per estimator
for s in [0.4, 0.6, 1.0]:
    br = BaggingRegressor(max_samples=s, random_state=42).fit(Xtr, ytr)
    print(s, mean_squared_error(yte, br.predict(Xte)))
```
