---
---
# Ensemble Techniques: Theoretical Answers
---
---

###1. Can we use Bagging for regression problems?
Yes, Bagging can be used for regression as well as classification problems. For regression, Bagging aggregates the base models' predictions by averaging them instead of taking a majority vote (scikit-learn provides `BaggingRegressor` for this).
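In standard notation (assumed here, not taken from the answer above), if $\hat{f}_1, \dots, \hat{f}_B$ are base models each trained on one of $B$ bootstrap samples, the two aggregation rules can be written as:

$$
\hat{f}_{\text{bag}}(x) = \frac{1}{B}\sum_{b=1}^{B} \hat{f}_b(x) \quad \text{(regression)}, \qquad
\hat{y}(x) = \underset{k}{\arg\max} \sum_{b=1}^{B} \mathbb{1}\big[\hat{f}_b(x) = k\big] \quad \text{(classification)}
$$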

---
###2. What is the difference between multiple model training and single model training?
- Single model training uses one algorithm to learn patterns.

- Multiple model training (ensemble) combines several models to improve accuracy and reduce overfitting/variance.

---
###3. Explain the concept of feature randomness in Random Forest.
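In Random Forest, every split in every tree considers only a random subset of the features. In scikit-learn the size of that subset is set by `max_features` (for example `"sqrt"`, i.e. √n_features candidates per split, which is the default for `RandomForestClassifier`). This feature randomness increases the diversity of the trees and reduces the correlation between them, which makes averaging their predictions more effective.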
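A minimal sketch, assuming scikit-learn and its Breast Cancer dataset (30 features), that makes the effect of `max_features` visible: it reads the resolved number of candidate features per split from a fitted tree and counts how many different features the trees use for their root split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)  # 30 features

# max_features="sqrt": each split draws int(sqrt(30)) = 5 candidate features at random
rf = RandomForestClassifier(n_estimators=50, max_features="sqrt", random_state=0).fit(X, y)

# Each fitted tree stores the resolved number of candidate features per split
per_split = rf.estimators_[0].max_features_
print("Candidate features per split:", per_split)

# Random candidates (plus bootstrap sampling) make trees pick different root-split features
root_features = {int(tree.tree_.feature[0]) for tree in rf.estimators_}
print("Distinct root-split features across trees:", sorted(root_features))
```

Setting `max_features=None` instead would let every split consider all 30 features, so most trees would tend to pick the same strong feature at the root.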

---
###4. What is OOB (Out-of-Bag) Score?
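The OOB score is a validation score computed on the training samples that were left out of each tree's bootstrap sample. Each training sample is predicted using only the trees that did not see it during training, and these predictions are compared with the true targets (scikit-learn reports accuracy for classifiers and R² for regressors). It provides an internal estimate of generalization performance without using a separate validation set.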
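Why out-of-bag data exists at all follows from how a bootstrap sample is drawn: $n$ draws with replacement from $n$ training points leave any particular point out with probability

$$
\left(1 - \frac{1}{n}\right)^{n} \;\longrightarrow\; e^{-1} \approx 0.368 \quad \text{as } n \to \infty
$$

so roughly 36.8% of the training set is available to evaluate each tree (for example, $n = 100$ gives $0.99^{100} \approx 0.366$).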

---
###5. How can you measure the importance of features in a Random Forest model?
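Feature importance is commonly measured as the mean decrease in impurity (MDI): for each feature, the total reduction in Gini impurity or entropy (or variance, for regression) produced by splits on that feature, weighted by the number of samples reaching those splits and averaged over all trees. In scikit-learn this is exposed as the `feature_importances_` attribute of a fitted model.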
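A sketch, assuming scikit-learn and the Breast Cancer dataset, that prints the impurity-based importances next to permutation importance (`sklearn.inspection.permutation_importance`, an alternative that measures how much the test score drops when one feature's values are shuffled):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.3, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# Impurity-based importances (MDI), normalised so they sum to 1
mdi = clf.feature_importances_
print("Sum of MDI importances:", mdi.sum())

# Permutation importance: mean drop in test accuracy over 10 shuffles of each feature
perm = permutation_importance(clf, X_test, y_test, n_repeats=10, random_state=42)

# Five features ranked highest by permutation importance
top = perm.importances_mean.argsort()[::-1][:5]
for i in top:
    print(f"{data.feature_names[i]}: MDI={mdi[i]:.4f}, permutation={perm.importances_mean[i]:.4f}")
```

The two rankings need not agree: MDI is computed on the training data, while permutation importance here is computed on held-out data.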

---
###6. Explain the working principle of a Bagging Classifier.
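- Trains multiple base models (e.g., decision trees), each on its own bootstrap sample of the training data.

- Each model makes a prediction.

- The final prediction is made by majority voting. (scikit-learn's `BaggingClassifier` averages the predicted class probabilities when the base estimator supports `predict_proba`, and falls back to voting otherwise.)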
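The steps above can be sketched by hand with NumPy and scikit-learn decision trees. This is an illustrative hard-voting version on the Iris dataset, not scikit-learn's own implementation; `n_estimators = 25` is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

rng = np.random.default_rng(0)
n_estimators = 25
models = []
for _ in range(n_estimators):
    # Bootstrap sample: draw len(X_train) row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    models.append(DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx]))

# One row of predictions per tree: shape (n_estimators, n_test_samples)
all_preds = np.array([m.predict(X_test) for m in models])

# Majority vote: most frequent class label in each column (one column per test sample)
y_pred = np.array([np.bincount(col).argmax() for col in all_preds.T])
accuracy = (y_pred == y_test).mean()
print("Manual bagging accuracy:", accuracy)
```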

---
###7. How do you evaluate a Bagging Classifier’s performance?
Use metrics such as accuracy, precision, recall, F1-score, and the confusion matrix, computed on a held-out test set, through cross-validation, or with the OOB score.

---
###8. How does a Bagging Regressor work?
It works like the Bagging Classifier, but instead of voting, it averages the predictions of multiple base regressors, each trained on a different bootstrap sample.

---
###9. What is the main advantage of ensemble techniques?
They usually improve accuracy and generalization and reduce overfitting, because combining diverse models lets their individual errors partially cancel out.

---
###10. What is the main challenge of ensemble methods?
They can be computationally expensive, are harder to interpret than a single model, and add complexity to deployment.

---
###11. Explain the key idea behind ensemble techniques.
Ensemble learning combines the strengths of multiple models to produce better performance than any single model.

---
###12. What is a Random Forest Classifier?
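It is an ensemble of decision trees built using bagging and feature randomness. It predicts by majority voting across the trees (scikit-learn's implementation averages the trees' predicted class probabilities instead).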

---
###13. What are the main types of ensemble techniques?
- Bagging (e.g., Random Forest)

- Boosting (e.g., AdaBoost, XGBoost)

- Stacking (combining models using a meta-model)

---
###14. What is ensemble learning in machine learning?
A technique where multiple models (weak or strong) are combined to produce a more accurate and robust model.

---
###15. When should we avoid using ensemble methods?
- When interpretability is crucial.

- When the dataset is small.

- When a single model already performs well.

---
###16. How does Bagging help in reducing overfitting?
Each model is trained on a different bootstrap sample, so their errors are partly independent, and averaging them reduces variance. This limits the overfitting typical of high-variance models such as deep, unpruned decision trees.
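A standard textbook result quantifies this, assuming each of the $B$ base models' predictions at a point $x$ has variance $\sigma^2$ and any two of them have correlation $\rho$:

$$
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat{f}_b(x)\right) = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
$$

Adding more models shrinks only the second term; the first term, $\rho\sigma^2$, remains, which is why Random Forest also decorrelates its trees through random feature selection.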

---
###17. Why is Random Forest better than a single Decision Tree?
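Because averaging many trees, each trained on a different bootstrap sample and decorrelated by random feature selection, reduces overfitting. As a result, a Random Forest usually achieves higher accuracy and is more robust to noise than a single tree.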

---
###18. What is the role of bootstrap sampling in Bagging?
Bootstrap sampling creates diverse training datasets by sampling with replacement. This helps in training diverse models, which is crucial for reducing variance.
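A small NumPy simulation of one bootstrap sample, assuming a hypothetical training set of 10,000 rows: only about 63.2% (≈ 1 − 1/e) of the rows appear in the sample, and the rest are duplicates of drawn rows, which is what makes each model's training set different.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000  # size of a hypothetical training set

# One bootstrap sample: n row indices drawn with replacement
boot_idx = rng.choice(n, size=n, replace=True)

unique_frac = np.unique(boot_idx).size / n
print(f"Unique rows in the bootstrap sample: {unique_frac:.3f}")
print(f"Rows left out (out-of-bag): {1 - unique_frac:.3f}")
```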

---
###19. What are some real-world applications of ensemble techniques?
- Medical diagnosis (cancer detection)

- Credit scoring in finance

- Spam detection

- Recommendation systems

- Stock market prediction

---
###20. What is the difference between Bagging and Boosting?
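| Aspect         | Bagging                               | Boosting                                                                |
| -------------- | ------------------------------------- | ----------------------------------------------------------------------- |
| Model Training | Parallel (models are independent)     | Sequential (each model corrects its predecessors)                       |
| Focus          | Mainly reduces variance               | Mainly reduces bias                                                     |
| Data Sampling  | Bootstrap sampling (with replacement) | Sample reweighting (AdaBoost) or fitting pseudo-residuals (gradient boosting) |
| Examples       | Random Forest                         | AdaBoost, XGBoost, LightGBM                                             |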

---
---
# Practical Answers

---
---
###21. Train a Bagging Classifier using Decision Trees on a sample dataset and print model accuracy




```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# `estimator` replaced `base_estimator` in scikit-learn 1.2 (the old name was removed in 1.4)
model = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10, random_state=1)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
```

###22. Train a Bagging Regressor using Decision Trees and evaluate using Mean Squared Error (MSE)

```python
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

data = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.3, random_state=42)

# Predictions of the 10 trees are averaged
reg = BaggingRegressor(estimator=DecisionTreeRegressor(), n_estimators=10, random_state=42)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

print("MSE:", mean_squared_error(y_test, y_pred))
```

###23. Train a Random Forest Classifier on the Breast Cancer dataset and print feature importance scores.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
X, y = cancer.data, cancer.target
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)

# Impurity-based (MDI) importances, one per feature
importances = clf.feature_importances_
for name, importance in zip(cancer.feature_names, importances):
    print(f"{name}: {importance:.4f}")
```


###24. Train a Random Forest Regressor and compare its performance with a single Decision Tree

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Reuses `data` (California housing) and DecisionTreeRegressor from question 22.
# Pass the target too; without it train_test_split returns only two arrays.
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.3, random_state=42)
tree = DecisionTreeRegressor(random_state=42)
rf = RandomForestRegressor(n_estimators=100, random_state=42)

tree.fit(X_train, y_train)
rf.fit(X_train, y_train)

tree_pred = tree.predict(X_test)
rf_pred = rf.predict(X_test)

print("Decision Tree R² Score:", r2_score(y_test, tree_pred))
print("Random Forest R² Score:", r2_score(y_test, rf_pred))
```

###25. Compute the Out-of-Bag (OOB) Score for a Random Forest Classifier

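```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer

# A classifier needs class labels, so use Breast Cancer
# (the California housing target from question 24 is continuous)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

rf_oob = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
rf_oob.fit(X_train, y_train)
print("OOB Score:", rf_oob.oob_score_)  # accuracy of the out-of-bag predictions on the training set
```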

###26. Train a Bagging Classifier using SVM as a base estimator and print accuracy

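```python
from sklearn.svm import SVC

# Bagging classifiers need class labels, so use the Breast Cancer dataset
X_train, X_test, y_train, y_test = train_test_split(*load_breast_cancer(return_X_y=True), test_size=0.3, random_state=42)

bag_svm = BaggingClassifier(estimator=SVC(), n_estimators=10, random_state=42)
bag_svm.fit(X_train, y_train)
y_pred = bag_svm.predict(X_test)
print("Bagging SVM Accuracy:", accuracy_score(y_test, y_pred))
```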

###27. Train a Random Forest Classifier with different numbers of trees and compare accuracy.

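```python
# Classification data (Breast Cancer), since RandomForestClassifier needs class labels
X_train, X_test, y_train, y_test = train_test_split(*load_breast_cancer(return_X_y=True), test_size=0.3, random_state=42)

for n in [10, 50, 100, 200]:
    model = RandomForestClassifier(n_estimators=n, random_state=42)
    model.fit(X_train, y_train)
    acc = model.score(X_test, y_test)
    print(f"{n} Trees: Accuracy = {acc:.4f}")
```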

###28. Train a Bagging Classifier using Logistic Regression as a base estimator and print AUC score


```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X_bin, y_bin = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X_bin, y_bin, test_size=0.3, random_state=42)

bag_log = BaggingClassifier(estimator=LogisticRegression(max_iter=1000), n_estimators=10, random_state=42)
bag_log.fit(X_train, y_train)
y_prob = bag_log.predict_proba(X_test)[:, 1]  # probability of class 1 (benign)

print("AUC Score:", roc_auc_score(y_test, y_prob))
```

###29. Train a Random Forest Regressor and analyze feature importance scores.

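```python
from sklearn.datasets import fetch_california_housing

# Regression data whose feature names match the model; new names keep the Breast Cancer split intact
housing = fetch_california_housing()
Xh_train, Xh_test, yh_train, yh_test = train_test_split(housing.data, housing.target, test_size=0.3, random_state=42)

rf_reg = RandomForestRegressor(n_estimators=100, random_state=42)
rf_reg.fit(Xh_train, yh_train)

ranked = sorted(zip(housing.feature_names, rf_reg.feature_importances_), key=lambda p: p[1], reverse=True)
for name, importance in ranked:  # most important first
    print(f"{name}: {importance:.4f}")
```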

###30. Train an ensemble model using both Bagging and Random Forest and compare accuracy.



```python
# Bagged full-depth trees vs. Random Forest (bagging plus random feature selection)
bag = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100, random_state=42)
rf = RandomForestClassifier(n_estimators=100, random_state=42)

bag.fit(X_train, y_train)
rf.fit(X_train, y_train)

bag_acc = bag.score(X_test, y_test)
rf_acc = rf.score(X_test, y_test)

print("Bagging Accuracy:", bag_acc)
print("Random Forest Accuracy:", rf_acc)
```

###31. Train a Random Forest Classifier and tune hyperparameters using GridSearchCV.

```python
from sklearn.model_selection import GridSearchCV

params = {
    'n_estimators': [50, 100],
    'max_depth': [5, 10, 20],
    'min_samples_split': [2, 5]
}

grid = GridSearchCV(RandomForestClassifier(random_state=42), params, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)
print("Best Params:", grid.best_params_)
print("Best CV Accuracy:", grid.best_score_)  # mean 5-fold accuracy of the best combination
```

###32. Bagging Regressor with different numbers of base estimators.

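```python
# Regression data (the Breast Cancer split used by the surrounding cells is a classification task)
housing = fetch_california_housing()
Xh_train, Xh_test, yh_train, yh_test = train_test_split(housing.data, housing.target, test_size=0.3, random_state=42)

for n in [10, 50, 100]:
    model = BaggingRegressor(n_estimators=n, random_state=42)
    model.fit(Xh_train, yh_train)
    mse = mean_squared_error(yh_test, model.predict(Xh_test))
    print(f"{n} Estimators - MSE: {mse:.4f}")
```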

###33. Random Forest Classifier: Analyze misclassified samples.

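```python
# "Misclassified" only makes sense for a classifier: regression outputs rarely equal the labels exactly
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_clf.fit(X_train, y_train)
y_pred = rf_clf.predict(X_test)

wrong = y_pred != y_test
misclassified = X_test[wrong]  # feature rows of the misclassified samples
print("Number of Misclassified Samples:", len(misclassified))
print("(true, predicted) labels:", list(zip(y_test[wrong], y_pred[wrong])))
```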


###34. Train a Bagging Classifier and compare its performance with a single Decision Tree Classifier

```python
dt = DecisionTreeClassifier(random_state=42)
# BaggingClassifier clones `dt`, so the single tree below is trained independently
bag = BaggingClassifier(estimator=dt, n_estimators=100, random_state=42)

dt.fit(X_train, y_train)
bag.fit(X_train, y_train)

print("Decision Tree Accuracy:", dt.score(X_test, y_test))
print("Bagging Accuracy:", bag.score(X_test, y_test))
```

###35. Train a Random Forest Classifier and visualize the confusion matrix.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)
ConfusionMatrixDisplay.from_estimator(rf, X_test, y_test)
plt.show()
```

###36. Train a Stacking Classifier using Decision Trees, SVM, and Logistic Regression, and compare accuracy.

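```python
from sklearn.ensemble import StackingClassifier
from sklearn.svm import SVC

base_models = [('dt', DecisionTreeClassifier(random_state=42)),
               ('svm', SVC(probability=True, random_state=42)),
               ('lr', LogisticRegression(max_iter=1000))]

# Accuracy of each base model on its own, for comparison
for name, model in base_models:
    print(f"{name} Accuracy:", model.fit(X_train, y_train).score(X_test, y_test))

# StackingClassifier clones the base models and trains a logistic-regression meta-model on their outputs
stack = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("Stacking Accuracy:", stack.score(X_test, y_test))
```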

###37. Train a Random Forest Classifier and print the top 5 most important features.

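```python
import numpy as np

# `rf` from question 35 was trained on Breast Cancer, so use that dataset's feature names
feature_names = load_breast_cancer().feature_names
importances = rf.feature_importances_
indices = np.argsort(importances)[::-1]  # feature indices, most important first
for i in indices[:5]:
    print(f"{feature_names[i]}: {importances[i]:.4f}")
```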

###38. Train a Bagging Classifier and evaluate performance using Precision, Recall, and F1-score.

```python
from sklearn.metrics import classification_report

# The default base estimator is a decision tree
bag = BaggingClassifier(n_estimators=100, random_state=42)
bag.fit(X_train, y_train)
y_pred = bag.predict(X_test)
print(classification_report(y_test, y_pred))  # precision, recall and F1 per class
```

###39. Train a Random Forest Classifier and analyze the effect of max_depth on accuracy.

```python
# max_depth=None lets trees grow until leaves are pure
for d in [3, 5, 10, None]:
    rf = RandomForestClassifier(max_depth=d, random_state=42)
    rf.fit(X_train, y_train)
    print(f"max_depth={d} -> Accuracy: {rf.score(X_test, y_test):.4f}")
```

###40. Train a Bagging Regressor using different base estimators (DecisionTree and KNeighbors) and compare performance.

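```python
from sklearn.neighbors import KNeighborsRegressor

# Regression data (the Breast Cancer split used by the surrounding cells is a classification task)
housing = fetch_california_housing()
Xh_train, Xh_test, yh_train, yh_test = train_test_split(housing.data, housing.target, test_size=0.3, random_state=42)

for model in [DecisionTreeRegressor(), KNeighborsRegressor()]:
    name = model.__class__.__name__
    bag = BaggingRegressor(estimator=model, n_estimators=10, random_state=42)
    bag.fit(Xh_train, yh_train)
    print(f"{name} - MSE:", mean_squared_error(yh_test, bag.predict(Xh_test)))
```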

###41. Train a Random Forest Classifier and evaluate its performance using ROC-AUC Score.

```python
from sklearn.metrics import roc_auc_score

rf.fit(X_train, y_train)
y_prob = rf.predict_proba(X_test)[:, 1]  # probability of the positive class
print("ROC-AUC Score:", roc_auc_score(y_test, y_prob))
```

###42. Train a Bagging Classifier and evaluate its performance using cross-validation.

```python
from sklearn.model_selection import cross_val_score

# X, y: the full Breast Cancer dataset loaded in question 23
bag = BaggingClassifier(n_estimators=50, random_state=42)
scores = cross_val_score(bag, X, y, cv=5, scoring='accuracy')
print("Cross-Val Accuracy:", scores.mean())
```

###43. Train a Random Forest Classifier and plot the Precision-Recall curve.

```python
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt

probs = rf.predict_proba(X_test)[:, 1]
prec, rec, _ = precision_recall_curve(y_test, probs)

plt.plot(rec, prec)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.show()
```

###44. Train a Stacking Classifier with Random Forest and Logistic Regression and compare accuracy.

```python
stack2 = StackingClassifier(
    estimators=[('rf', RandomForestClassifier()), ('lr', LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression()
)
stack2.fit(X_train, y_train)
print("Stacking Accuracy:", stack2.score(X_test, y_test))
```

###45. Train a Bagging Regressor with different levels of bootstrap samples and compare performance.

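```python
# Regression data (the Breast Cancer split used by the surrounding cells is a classification task)
housing = fetch_california_housing()
Xh_train, Xh_test, yh_train, yh_test = train_test_split(housing.data, housing.target, test_size=0.3, random_state=42)

# bootstrap=False: each estimator is trained on the whole training set (no resampling)
for b in [True, False]:
    reg = BaggingRegressor(n_estimators=10, bootstrap=b, random_state=42)
    reg.fit(Xh_train, yh_train)
    mse = mean_squared_error(yh_test, reg.predict(Xh_test))
    print(f"Bootstrap={b} -> MSE: {mse:.4f}")
```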