# **Ensemble Learning Assignment**



## 🧠 Theoretical Questions

**Q1. Can we use Bagging for regression problems?**

Yes. Bagging applies to regression as well as classification. scikit-learn's `BaggingRegressor` trains several base regressors on bootstrap samples and averages their predictions.

**Q2. What is the difference between multiple model training and single model training?**

Single model training involves learning one model from the data. Multiple model training (ensemble) combines predictions from several models to improve generalization and accuracy.

**Q3. Explain the concept of feature randomness in Random Forest**

In Random Forest, feature randomness means that each tree considers only a random subset of the features at every split (controlled by `max_features` in sklearn), rather than all of them. This reduces the correlation among trees and makes the ensemble more robust.
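As a rough illustration of the parameter involved, the sketch below fits two small forests on the Breast Cancer dataset (an assumption made for the example, as are the variable names) and reads back how many features each tree may consider per split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)  # 30 features

# max_features="sqrt": each split draws a fresh random subset of int(sqrt(30)) = 5 features
rf_sqrt = RandomForestClassifier(n_estimators=20, max_features="sqrt", random_state=0).fit(X, y)
# max_features=None: every split sees all 30 features, so only the bootstrap adds randomness
rf_all = RandomForestClassifier(n_estimators=20, max_features=None, random_state=0).fit(X, y)

# Each fitted tree stores the resolved number of features per split in max_features_
features_per_split_sqrt = rf_sqrt.estimators_[0].max_features_
features_per_split_all = rf_all.estimators_[0].max_features_
print("Features considered per split (sqrt):", features_per_split_sqrt)
print("Features considered per split (None):", features_per_split_all)
```

With `max_features=None` the forest behaves like plain bagging of trees, which is one way to see what the feature subsampling adds.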

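**Q4. What is OOB (Out-of-Bag) Score?**

The Out-of-Bag (OOB) score is an internal validation estimate for Bagging and Random Forest. Each bootstrap sample leaves out roughly 37% of the training rows; every row is then predicted using only the models that did not see it during training, and the resulting score approximates test performance without a separate validation set.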
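The share of rows left out of a bootstrap sample can be checked numerically. The probability that a given row is never drawn in `n` draws with replacement is `(1 - 1/n)^n`, which approaches `1/e ≈ 0.368`. The sketch below simulates one bootstrap sample with NumPy; the sample size and variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 10_000

# One bootstrap sample: draw n_rows indices with replacement
sample_idx = rng.integers(0, n_rows, size=n_rows)

# Rows never drawn are "out of bag" for the model trained on this sample
in_bag = np.zeros(n_rows, dtype=bool)
in_bag[sample_idx] = True
oob_fraction = 1.0 - in_bag.mean()

print(f"OOB fraction: {oob_fraction:.3f} (theory: {np.exp(-1):.3f})")
```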

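**Q5. How can you measure the importance of features in a Random Forest model?**

The default measure is the mean decrease in impurity: how much the splits on each feature reduce impurity, averaged over all trees. sklearn exposes it as the `.feature_importances_` attribute. Because impurity-based scores can overstate high-cardinality features, permutation importance on held-out data is a common cross-check.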
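A minimal sketch comparing the impurity-based scores with permutation importance, assuming the Breast Cancer dataset and `sklearn.inspection.permutation_importance`; the dataset, split, and hyperparameters are illustrative choices, not part of the assignment.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Impurity-based importances come from the fitted trees and sum to 1
mdi = rf.feature_importances_

# Permutation importance: drop in test accuracy when one feature's values are shuffled
perm = permutation_importance(rf, X_test, y_test, n_repeats=5, random_state=0)

# Show the five features ranked highest by the impurity-based score
for i in mdi.argsort()[::-1][:5]:
    print(f"{data.feature_names[i]:25} MDI={mdi[i]:.3f}  permutation={perm.importances_mean[i]:.3f}")
```

The two rankings need not agree, especially when features are strongly correlated, so it is worth reading them side by side.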

**Q6. Explain the working principle of a Bagging Classifier**

A Bagging Classifier trains multiple copies of a base model (such as a Decision Tree), each on a different bootstrap sample of the training data, and combines their predictions by majority vote. (The regression counterpart averages the predictions instead.)
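The principle can be sketched by hand without `BaggingClassifier`. The example below is a simplified illustration on the Breast Cancer dataset (binary labels), not how sklearn implements it internally; the number of models and the variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_models = 25  # odd, so a binary majority vote cannot tie
trees = []
for _ in range(n_models):
    # Bootstrap sample: same size as the training set, drawn with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx]))

# Each tree predicts 0 or 1; the class chosen by more than half the trees wins
votes = np.stack([tree.predict(X_test) for tree in trees])
majority = (votes.mean(axis=0) > 0.5).astype(int)

bagged_accuracy = (majority == y_test).mean()
print("Hand-rolled bagging accuracy:", bagged_accuracy)
```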

**Q7. How do you evaluate a Bagging Classifier’s performance?**

A Bagging Classifier’s performance is evaluated using standard metrics like accuracy, precision, recall, F1-score, etc., often compared to base estimators.

**Q8. How does a Bagging Regressor work?**

A Bagging Regressor works by creating multiple base regressors trained on bootstrap samples and aggregating their predictions via averaging to reduce variance.

**Q9. What is the main advantage of ensemble techniques?**

Ensembles usually predict better than any single member: averaging methods such as Bagging reduce variance (and therefore overfitting), while Boosting reduces bias.

**Q10. What is the main challenge of ensemble methods?**

The main challenge of ensemble methods is increased computational complexity and reduced interpretability of the final model.

**Q11. Explain the key idea behind ensemble techniques**

Ensemble techniques combine the strengths of multiple models to produce a robust predictor that often outperforms individual models.

**Q12. What is a Random Forest Classifier?**

A Random Forest Classifier is an ensemble of decision trees, each trained on a bootstrap sample with a random subset of features considered at every split. Each tree votes, and the majority class is the final prediction.
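In scikit-learn the vote is implemented as an average of the per-tree class probabilities, with the class of highest mean probability winning. The sketch below reproduces that by hand on the Iris dataset; the dataset and forest size are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=25, random_state=0).fit(X_train, y_train)

# Average the class probabilities predicted by the individual trees
tree_probs = np.mean([tree.predict_proba(X_test) for tree in rf.estimators_], axis=0)
# Pick the class with the highest mean probability
manual_pred = rf.classes_[tree_probs.argmax(axis=1)]

probs_match = np.allclose(tree_probs, rf.predict_proba(X_test))
preds_match = np.array_equal(manual_pred, rf.predict(X_test))
print("Matches rf.predict_proba:", probs_match)
print("Matches rf.predict:", preds_match)
```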

**Q13. What are the main types of ensemble techniques?**

The main types of ensemble techniques are Bagging, Boosting, and Stacking.

**Q14. What is ensemble learning in machine learning?**

Ensemble learning is a method that combines predictions from multiple models to improve accuracy and robustness.

**Q15. When should we avoid using ensemble methods?**

We should avoid ensemble methods when interpretability is important or the computational cost outweighs performance gain.

**Q16. How does Bagging help in reducing overfitting?**

Bagging reduces overfitting by averaging multiple models trained on random subsets of data, thus reducing variance.
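The variance argument can be illustrated with a small simulation. It assumes the models' errors are independent with unit variance, which is an idealization: bagged models share training data and are correlated, so the real reduction is smaller than the `1/n_models` shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_models = 100_000, 25

# Each column is one model's prediction error: zero mean, unit variance, independent
errors = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n_models))

single_var = errors[:, 0].var()           # variance of one model alone
averaged_var = errors.mean(axis=1).var()  # variance after averaging 25 models

print(f"Single model variance:   {single_var:.3f}")
print(f"Averaged model variance: {averaged_var:.3f} (theory: {1 / n_models:.3f})")
```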

**Q17. Why is Random Forest better than a single Decision Tree?**

Random Forest aggregates multiple trees, reducing overfitting of a single Decision Tree and increasing generalization.

**Q18. What is the role of bootstrap sampling in Bagging?**

Bootstrap sampling draws each model's training set from the original data at random with replacement, so every model sees a slightly different dataset. This promotes diversity among the models' predictions.

**Q19. What are some real-world applications of ensemble techniques?**

Applications include fraud detection, recommendation systems, credit scoring, and medical diagnosis.

**Q20. What is the difference between Bagging and Boosting?**

Bagging reduces variance by averaging models; Boosting reduces bias by sequentially training models to correct predecessors.
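A hedged side-by-side sketch of the two strategies on the Breast Cancer dataset. `GradientBoostingClassifier` stands in for Boosting here as one possible choice; the dataset, depths, and estimator counts are assumptions for illustration, and the resulting accuracies say nothing general about which method is better.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: 50 fully grown trees trained independently on bootstrap samples, then combined
# (the base model is passed positionally, since its keyword name changed in scikit-learn 1.2)
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

# Boosting: 50 depth-1 trees trained one after another, each fitted to the errors left so far
boosting = GradientBoostingClassifier(max_depth=1, n_estimators=50, random_state=0)

bagging_acc = bagging.fit(X_train, y_train).score(X_test, y_test)
boosting_acc = boosting.fit(X_train, y_train).score(X_test, y_test)
print("Bagging accuracy: ", bagging_acc)
print("Boosting accuracy:", boosting_acc)
```

Note the contrast in base learners: Bagging starts from low-bias, high-variance trees and averages the variance away, while Boosting starts from high-bias stumps and reduces the bias step by step.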

## 🧪 Practical Questions

**Q21. Train a Bagging Classifier using Decision Trees on a sample dataset and print model accuracy**

In [None]:
# Q21: Bagging Classifier using Decision Trees
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
# `estimator` replaces `base_estimator`, which was deprecated in scikit-learn 1.2 and removed in 1.4
model = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

**Q22. Train a Bagging Regressor using Decision Trees and evaluate using Mean Squared Error (MSE)**

In [None]:
# Q22: Bagging Regressor using Decision Trees
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
# `estimator` replaces `base_estimator` (removed in scikit-learn 1.4)
model = BaggingRegressor(estimator=DecisionTreeRegressor(), n_estimators=10)
model.fit(X_train, y_train)
print("MSE:", mean_squared_error(y_test, model.predict(X_test)))

**Q23. Train a Random Forest Classifier on the Breast Cancer dataset and print feature importance scores**

In [None]:
# Q23: Random Forest Classifier on Breast Cancer dataset
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
model = RandomForestClassifier()
model.fit(X_train, y_train)
importances = model.feature_importances_
print("Feature importances:", importances)

**Q24. Train a Random Forest Regressor and compare its performance with a single Decision Tree**

In [None]:
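# Q24: Compare RF Regressor and Decision Tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

# Use a regression dataset; X_train/y_train from Q23 hold breast cancer class labels
Xr, yr = fetch_california_housing(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=42)
dt = DecisionTreeRegressor(random_state=42)
rf = RandomForestRegressor(random_state=42)
dt.fit(Xr_train, yr_train)
rf.fit(Xr_train, yr_train)
# .score() returns R^2 for regressors
print("Decision Tree R^2:", dt.score(Xr_test, yr_test))
print("Random Forest R^2:", rf.score(Xr_test, yr_test))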

**Q25. Compute the Out-of-Bag (OOB) Score for a Random Forest Classifier**

In [None]:
# Q25: Compute OOB Score
model = RandomForestClassifier(oob_score=True)
model.fit(X_train, y_train)
print("OOB Score:", model.oob_score_)

**Q26. Train a Bagging Classifier using SVM as a base estimator and print accuracy**

In [None]:
# Q26: Bagging Classifier with SVM
from sklearn.svm import SVC
# `estimator` replaces `base_estimator` (removed in scikit-learn 1.4)
model = BaggingClassifier(estimator=SVC(), n_estimators=10)
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))

**Q27. Train a Random Forest Classifier with different numbers of trees and compare accuracy**

In [None]:
# Q27: Random Forest Classifier with different n_estimators
for n in [10, 50, 100]:
    rf = RandomForestClassifier(n_estimators=n)
    rf.fit(X_train, y_train)
    print(f"{n} Trees Accuracy:", rf.score(X_test, y_test))

**Q28. Train a Bagging Classifier using Logistic Regression as a base estimator and print AUC score**

In [None]:
# Q28: Bagging with Logistic Regression and AUC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
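# `estimator` replaces `base_estimator` (removed in scikit-learn 1.4);
# a higher max_iter helps the solver converge on the unscaled features
model = BaggingClassifier(estimator=LogisticRegression(max_iter=10000), n_estimators=10)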
model.fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]
print("AUC Score:", roc_auc_score(y_test, probs))

**Q29. Train a Random Forest Regressor and analyze feature importance scores**

In [None]:
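# Q29: RF Regressor feature importance
from sklearn.datasets import fetch_california_housing
# Fit on regression data; X_train/y_train at this point hold breast cancer class labels
housing = fetch_california_housing()
model = RandomForestRegressor(random_state=42)
model.fit(housing.data, housing.target)
for name, score in sorted(zip(housing.feature_names, model.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")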

**Q30. Train an ensemble model using both Bagging and Random Forest and compare accuracy**

In [None]:
# Q30: Compare Bagging and RF accuracy
bag = BaggingClassifier()
rf = RandomForestClassifier()
bag.fit(X_train, y_train)
rf.fit(X_train, y_train)
print("Bagging Accuracy:", bag.score(X_test, y_test))
print("RF Accuracy:", rf.score(X_test, y_test))

**Q31. Train a Random Forest Classifier and tune hyperparameters using GridSearchCV**

In [None]:
# Q31: GridSearchCV for RF
from sklearn.model_selection import GridSearchCV
param_grid = {'n_estimators': [50, 100], 'max_depth': [None, 10]}
grid = GridSearchCV(RandomForestClassifier(), param_grid, cv=3)
grid.fit(X_train, y_train)
print("Best Parameters:", grid.best_params_)

**Q32. Train a Bagging Regressor with different numbers of base estimators and compare performance**

In [None]:
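# Q32: Bagging Regressor with different estimators
from sklearn.datasets import fetch_california_housing
# Use regression data; X_train/y_train at this point hold breast cancer class labels
Xr, yr = fetch_california_housing(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=42)
for n in [5, 10, 20]:
    model = BaggingRegressor(n_estimators=n, random_state=42)
    model.fit(Xr_train, yr_train)
    print(f"n_estimators={n}, R^2:", model.score(Xr_test, yr_test))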

**Q33. Train a Random Forest Classifier and analyze misclassified samples**

In [None]:
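# Q33: Analyze misclassified samples
import numpy as np
model = RandomForestClassifier()
model.fit(X_train, y_train)
preds = model.predict(X_test)
misclassified = np.flatnonzero(preds != y_test)  # positions in X_test where the prediction is wrong
print("Misclassified Samples:", len(misclassified))
for i in misclassified:
    print(f"index={i}, true={y_test[i]}, predicted={preds[i]}")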

**Q34. Train a Bagging Classifier and compare its performance with a single Decision Tree Classifier**

In [None]:
# Q34: Bagging vs Single Decision Tree
dt = DecisionTreeClassifier()
# `estimator` replaces `base_estimator` (removed in scikit-learn 1.4)
bag = BaggingClassifier(estimator=dt)
dt.fit(X_train, y_train)
bag.fit(X_train, y_train)
print("Decision Tree Accuracy:", dt.score(X_test, y_test))
print("Bagging Accuracy:", bag.score(X_test, y_test))

**Q35. Train a Random Forest Classifier and visualize the confusion matrix**

In [None]:
# Q35: RF Confusion Matrix
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
model = RandomForestClassifier()
model.fit(X_train, y_train)
cm = confusion_matrix(y_test, model.predict(X_test))
sns.heatmap(cm, annot=True, fmt="d")
plt.title("Confusion Matrix")
plt.show()

**Q36. Train a Stacking Classifier using Decision Trees, SVM, and Logistic Regression, and compare accuracy**

In [None]:
# Q36: Stacking Classifier
from sklearn.ensemble import StackingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
base_learners = [('dt', DecisionTreeClassifier()), ('svm', SVC(probability=True))]
stack = StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("Stacking Accuracy:", stack.score(X_test, y_test))

**Q37. Train a Random Forest Classifier and print the top 5 most important features**

In [None]:
# Q37: Top 5 Features from RF
import numpy as np
model = RandomForestClassifier()
model.fit(X_train, y_train)
# argsort is ascending, so reverse it to list the most important feature first
top_features = np.argsort(model.feature_importances_)[::-1][:5]
print("Top 5 Feature Indices:", top_features)

**Q38. Train a Bagging Classifier and evaluate performance using Precision, Recall, and F1-score**

In [None]:
# Q38: Bagging with precision, recall, F1
from sklearn.metrics import precision_score, recall_score, f1_score
model = BaggingClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Precision:", precision_score(y_test, y_pred, average='macro'))
print("Recall:", recall_score(y_test, y_pred, average='macro'))
print("F1 Score:", f1_score(y_test, y_pred, average='macro'))

**Q39. Train a Random Forest Classifier and analyze the effect of max_depth on accuracy**

In [None]:
# Q39: RF accuracy with different max_depth
for depth in [5, 10, None]:
    rf = RandomForestClassifier(max_depth=depth)
    rf.fit(X_train, y_train)
    print(f"max_depth={depth}, Accuracy:", rf.score(X_test, y_test))

**Q40. Train a Bagging Regressor using different base estimators (DecisionTree and KNeighbors) and compare performance**

In [None]:
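# Q40: Bagging Regressor with DT & KNN
from sklearn.neighbors import KNeighborsRegressor
from sklearn.datasets import fetch_california_housing
# Use regression data; X_train/y_train at this point hold breast cancer class labels
Xr, yr = fetch_california_housing(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=42)
for base in [DecisionTreeRegressor(), KNeighborsRegressor()]:
    # `estimator` replaces `base_estimator` (removed in scikit-learn 1.4)
    model = BaggingRegressor(estimator=base, n_estimators=10, random_state=42)
    model.fit(Xr_train, yr_train)
    print(f"{type(base).__name__} R^2:", model.score(Xr_test, yr_test))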

**Q41. Train a Random Forest Classifier and evaluate its performance using ROC-AUC Score**

In [None]:
# Q41: RF ROC-AUC
from sklearn.metrics import roc_auc_score
model = RandomForestClassifier()
model.fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, probs))

**Q42. Train a Bagging Classifier and evaluate its performance using cross-validation**

In [None]:
# Q42: Cross-validation for Bagging Classifier
from sklearn.model_selection import cross_val_score
scores = cross_val_score(BaggingClassifier(), X, y, cv=5)
print("CV Accuracy:", scores.mean())

**Q43. Train a Random Forest Classifier and plot the Precision-Recall curve**

In [None]:
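# Q43: RF Precision-Recall curve
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve
# Train the classifier here rather than relying on `model` left over from an earlier cell
model = RandomForestClassifier()
model.fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]
precision, recall, _ = precision_recall_curve(y_test, probs)
plt.plot(recall, precision)
plt.title("Precision-Recall Curve")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.show()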

**Q44. Train a Stacking Classifier with Random Forest and Logistic Regression and compare accuracy**

In [None]:
# Q44: Stacking with RF & LR
base_learners = [('rf', RandomForestClassifier())]
stack = StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("Stacked Accuracy:", stack.score(X_test, y_test))

**Q45. Train a Bagging Regressor with different levels of bootstrap samples and compare performance**

In [None]:
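# Q45: Bagging Regressor with bootstrap variation
from sklearn.datasets import fetch_california_housing
# Use regression data; X_train/y_train at this point hold breast cancer class labels
Xr, yr = fetch_california_housing(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=42)
for b in [True, False]:
    # bootstrap=False samples without replacement; with the default max_samples=1.0 each estimator sees every row
    model = BaggingRegressor(bootstrap=b, random_state=42)
    model.fit(Xr_train, yr_train)
    print(f"Bootstrap={b}, R^2:", model.score(Xr_test, yr_test))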