# Theoretical


1. Can we use Bagging for regression problems?
- Yes. Bagging can be used for regression problems (Bagging Regressor) as well as for classification problems (Bagging Classifier).

2. What is the difference between multiple model training and single model training?
- Single model training: One model is trained on the dataset.

- Multiple model training (Ensemble): Multiple models are trained and combined to improve performance and robustness.

3. Explain the concept of feature randomness in Random Forest.
- Random Forest considers only a random subset of the features at each split instead of all of them. This reduces the correlation between individual trees, so averaging their predictions removes more variance.
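
As a rough illustration of the per-split feature sampling described above, the sketch below draws the candidate features for a few splits. The helper name `candidate_features` is hypothetical, and the square-root subset size is an assumption that mirrors the common `max_features="sqrt"` setting for classification forests.

```python
import math
import random

def candidate_features(n_features, rng):
    """Pick the feature indices that one split is allowed to consider."""
    # Assumption: subset size is sqrt(n_features), a common choice for classification.
    k = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(42)
# Each split draws its own subset, so different trees end up splitting on different features.
for split in range(3):
    print(f"split {split}: candidate features {candidate_features(30, rng)}")
```

Only the sampled features compete for the best split, which is what keeps any single strong feature from dominating every tree.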

4. What is the OOB (Out-of-Bag) Score?
- The OOB score is an internal validation estimate. Each training sample is predicted using only the trees whose bootstrap sample did not include it, and those predictions are scored against the true labels.
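
To make the notion of out-of-bag samples concrete, here is a minimal sketch using only the standard library. The helper `bootstrap_indices` is a hypothetical name; the sketch only shows which rows a single tree would never see, not how scikit-learn computes the score internally.

```python
import random

def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one tree's bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

rng = random.Random(0)
n_samples = 1000
in_bag = set(bootstrap_indices(n_samples, rng))
# Rows that were never drawn are "out-of-bag" for this tree and can validate it.
out_of_bag = set(range(n_samples)) - in_bag
oob_fraction = len(out_of_bag) / n_samples
# For large n the expected fraction approaches 1/e, roughly 0.368.
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

Because roughly a third of the rows are left out of each tree, every row ends up out-of-bag for many trees, which is why the OOB score needs no separate validation set.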

5. How can you measure the importance of features in a Random Forest model?
- Feature importance can be measured by the mean decrease in impurity (how much each feature's splits reduce impurity, averaged across the forest) or by permutation importance (how much the score drops when a feature's values are shuffled).
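
Permutation importance can be sketched without any library. In the example below, `toy_model` is a hypothetical stand-in for a fitted model that only looks at feature 0, and the data is synthetic; both are assumptions made purely for illustration.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows the model labels correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_features, rng):
    """Score drop after shuffling each feature column in turn."""
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(n_features):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the target
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(baseline - accuracy(model, X_perm, y))
    return importances

def toy_model(row):
    # Hypothetical fitted model: it uses feature 0 and ignores feature 1.
    return int(row[0] > 0.5)

rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(500)]
y = [int(row[0] > 0.5) for row in X]
scores = permutation_importance(toy_model, X, y, 2, rng)
print("importance of feature 0:", scores[0])
print("importance of feature 1:", scores[1])
```

Shuffling the feature the model relies on hurts accuracy, while shuffling the ignored feature changes nothing, so its importance is zero.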

6. Explain the working principle of a Bagging Classifier.
- Bagging builds multiple versions of a predictor (classifier) using bootstrap samples and combines their predictions (majority vote for classification).
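
The procedure above can be sketched from scratch in a few lines. This is a minimal illustration, not scikit-learn's implementation: the base learner is a 1-nearest-neighbour classifier chosen only to keep the code short, and the names `fit_1nn`, `bagging_fit`, and `bagging_predict` are hypothetical.

```python
import random
from collections import Counter

def fit_1nn(X, y):
    """Base learner (assumption): 1-nearest-neighbour on the given sample."""
    def predict(row):
        nearest = min(
            range(len(X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], row)),
        )
        return y[nearest]
    return predict

def bagging_fit(X, y, n_estimators, rng):
    """Train one base learner per bootstrap sample."""
    models = []
    n = len(X)
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        models.append(fit_1nn([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, row):
    """Combine the base learners by majority vote."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]

# Two well-separated toy clusters
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
models = bagging_fit(X, y, 25, random.Random(0))
print(bagging_predict(models, [0.2, 0.2]))
print(bagging_predict(models, [5.5, 5.5]))
```

For regression the only change would be in the combining step: average the base learners' outputs instead of counting votes.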

7. How do you evaluate a Bagging Classifier's performance?
- You can use accuracy, precision, recall, F1-score, confusion matrix, cross-validation, and ROC-AUC depending on the problem.


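8. How does a Bagging Regressor work?
- It works like the Bagging Classifier: each base regressor is trained on its own bootstrap sample, but the final prediction is the average of the base predictions instead of a majority vote.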

9. What is the main advantage of ensemble techniques?
- Combining models reduces variance (bagging) or bias (boosting), which curbs overfitting and typically improves accuracy and robustness over a single model.



10. What is the main challenge of ensemble methods?
- They can be computationally expensive, difficult to interpret, and require careful tuning.



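11. Explain the key idea behind ensemble techniques.
- Combine several base models whose individual errors differ into one predictor that generalizes better than any of them alone. Boosting, in particular, combines weak learners into a strong learner.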



12. What is a Random Forest Classifier?
- An ensemble of Decision Trees trained with bagging and feature randomness.



13. What are the main types of ensemble techniques?
- Bagging

- Boosting

- Stacking

14. What is ensemble learning in machine learning?
- A technique that combines the predictions of multiple models to achieve better performance than any single model.



15. When should we avoid using ensemble methods?
- When interpretability is critical or computational resources are limited.

16. How does Bagging help in reducing overfitting?
- By averaging predictions of multiple models trained on different bootstrap samples, reducing variance.
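
The variance argument can be checked with a small simulation. The sketch assumes each model's prediction is the true value plus independent Gaussian noise; real bagged models are correlated, so the reduction in practice is smaller than what this idealized setup shows.

```python
import random
import statistics

rng = random.Random(0)
TRUE_VALUE = 10.0

def noisy_prediction():
    """One model's prediction: the true value plus independent noise (assumption)."""
    return TRUE_VALUE + rng.gauss(0, 1)

# Spread of a single model's predictions
single = [noisy_prediction() for _ in range(2000)]
# Spread of an ensemble that averages 25 such models
averaged = [
    statistics.fmean([noisy_prediction() for _ in range(25)])
    for _ in range(2000)
]

var_single = statistics.pvariance(single)
var_avg = statistics.pvariance(averaged)
print(f"variance of one model:      {var_single:.3f}")
print(f"variance of 25-model mean:  {var_avg:.3f}")
```

With independent errors the variance of the average shrinks roughly in proportion to the number of models, while the mean prediction stays centred on the true value.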

17. Why is Random Forest better than a single Decision Tree?
- It reduces overfitting and variance, leading to better generalization.



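18. What is the role of bootstrap sampling in Bagging?
- Each model is trained on a sample drawn with replacement from the training set, so the models see different data and make partly different errors, which improves the robustness of the combined prediction.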



19. What are some real-world applications of ensemble techniques?
- Fraud detection, medical diagnosis, stock market prediction, recommendation systems, etc.

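20. What is the difference between Bagging and Boosting?
- Bagging: models are trained independently (in parallel) on bootstrap samples; it mainly reduces variance.

- Boosting: models are trained sequentially, each one focusing on the errors of the previous ones; it mainly reduces bias and can also reduce variance.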



# Practical

21. Train a Bagging Classifier using Decision Trees on a sample dataset and print model accuracy.

        from sklearn.ensemble import BaggingClassifier
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.datasets import load_iris
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import accuracy_score

        # Load data
        X, y = load_iris(return_X_y=True)

        # Split data
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

        # Build the Bagging Classifier. The `estimator` parameter was named
        # `base_estimator` before scikit-learn 1.2.
        model = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100, random_state=42)
        model.fit(X_train, y_train)

        # Predict
        y_pred = model.predict(X_test)

        # Accuracy
        print("Accuracy:", accuracy_score(y_test, y_pred))


22. Train a Bagging Regressor using Decision Trees and evaluate using Mean Squared Error (MSE).

        from sklearn.ensemble import BaggingRegressor
        from sklearn.tree import DecisionTreeRegressor
        from sklearn.datasets import load_diabetes
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import mean_squared_error

        # Load and split the Diabetes regression dataset
        X, y = load_diabetes(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

        # `estimator` was named `base_estimator` before scikit-learn 1.2
        model = BaggingRegressor(estimator=DecisionTreeRegressor(), n_estimators=100, random_state=42)
        model.fit(X_train, y_train)

        y_pred = model.predict(X_test)
        print("MSE:", mean_squared_error(y_test, y_pred))


23. Train a Random Forest Classifier on the Breast Cancer dataset and print feature importance scores.

        from sklearn.ensemble import RandomForestClassifier
        from sklearn.datasets import load_breast_cancer
        from sklearn.model_selection import train_test_split

        # Keep the dataset object so the feature names are available
        data = load_breast_cancer()
        X, y = data.data, data.target
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

        model = RandomForestClassifier(n_estimators=100, random_state=42)
        model.fit(X_train, y_train)

        # Impurity-based importances, one per feature (they sum to 1)
        for name, imp in zip(data.feature_names, model.feature_importances_):
            print(f"{name}: {imp:.4f}")


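24. Train a Random Forest Regressor and compare its performance with a single Decision Tree.

        from sklearn.ensemble import RandomForestRegressor
        from sklearn.tree import DecisionTreeRegressor
        from sklearn.datasets import load_diabetes
        from sklearn.metrics import mean_squared_error

        # Diabetes dataset, kept in separate *_reg variables so the
        # classification split from question 23 is not overwritten
        X_reg, y_reg = load_diabetes(return_X_y=True)
        X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, random_state=42)

        model_rf = RandomForestRegressor(n_estimators=100, random_state=42)
        model_dt = DecisionTreeRegressor(random_state=42)

        model_rf.fit(X_train_reg, y_train_reg)
        model_dt.fit(X_train_reg, y_train_reg)

        y_pred_rf = model_rf.predict(X_test_reg)
        y_pred_dt = model_dt.predict(X_test_reg)

        print("RF MSE:", mean_squared_error(y_test_reg, y_pred_rf))
        print("DT MSE:", mean_squared_error(y_test_reg, y_pred_dt))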


25. Compute the Out-of-Bag (OOB) Score for a Random Forest Classifier.

        # oob_score=True scores each training sample with the trees that did not see it
        model = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
        model.fit(X_train, y_train)

        print("OOB Score:", model.oob_score_)


26. Train a Bagging Classifier using SVM as a base estimator and print accuracy.

        from sklearn.svm import SVC

        # probability=True lets the ensemble average class probabilities;
        # `estimator` was named `base_estimator` before scikit-learn 1.2
        model = BaggingClassifier(estimator=SVC(probability=True), n_estimators=50, random_state=42)
        model.fit(X_train, y_train)

        y_pred = model.predict(X_test)
        print("Accuracy:", accuracy_score(y_test, y_pred))


27. Train a Random Forest Classifier with different numbers of trees and compare accuracy.

        for n in [10, 50, 100, 200]:
            model = RandomForestClassifier(n_estimators=n, random_state=42)
            model.fit(X_train, y_train)
            acc = model.score(X_test, y_test)
            print(f"Trees: {n}, Accuracy: {acc}")


28. Train a Bagging Classifier using Logistic Regression as a base estimator and print AUC score.

        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score

        # max_iter is raised to help the solver converge on unscaled features
        model = BaggingClassifier(estimator=LogisticRegression(max_iter=10000), n_estimators=50, random_state=42)
        model.fit(X_train, y_train)

        # Probability of the positive class (assumes a binary target such as Breast Cancer)
        y_pred_prob = model.predict_proba(X_test)[:, 1]
        auc = roc_auc_score(y_test, y_pred_prob)
        print("AUC Score:", auc)


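29. Train a Random Forest Regressor and analyze feature importance scores.

        # Regression needs a continuous target, so reload the Diabetes dataset
        X_reg, y_reg = load_diabetes(return_X_y=True)
        X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, random_state=42)

        model = RandomForestRegressor(n_estimators=100, random_state=42)
        model.fit(X_train_reg, y_train_reg)

        importances = model.feature_importances_
        for i, imp in enumerate(importances):
            print(f"Feature {i}: Importance {imp}")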


30. Train an ensemble model using both Bagging and Random Forest and compare accuracy.

        # Random Forest
        rf = RandomForestClassifier(n_estimators=100, random_state=42)
        rf.fit(X_train, y_train)
        acc_rf = rf.score(X_test, y_test)

        # Bagging (`estimator` was named `base_estimator` before scikit-learn 1.2)
        bag = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100, random_state=42)
        bag.fit(X_train, y_train)
        acc_bag = bag.score(X_test, y_test)

        print("Random Forest Accuracy:", acc_rf)
        print("Bagging Accuracy:", acc_bag)


31. Train a Random Forest Classifier and tune hyperparameters using GridSearchCV.

        from sklearn.model_selection import GridSearchCV

        param_grid = {
            'n_estimators': [50, 100, 200],
            'max_depth': [None, 10, 20],
        }

        # 5-fold cross-validation over every combination in the grid
        grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
        grid.fit(X_train, y_train)
        print("Best params:", grid.best_params_)
        print("Best mean CV accuracy:", grid.best_score_)


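32. Train a Bagging Regressor with different numbers of base estimators and compare performance.

        # Regression needs a continuous target, so reload the Diabetes dataset
        X_reg, y_reg = load_diabetes(return_X_y=True)
        X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, random_state=42)

        for n in [10, 50, 100, 200]:
            model = BaggingRegressor(n_estimators=n, random_state=42)
            model.fit(X_train_reg, y_train_reg)
            y_pred = model.predict(X_test_reg)
            print(f"n_estimators: {n}, MSE: {mean_squared_error(y_test_reg, y_pred)}")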


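33. Train a Random Forest Classifier and analyze misclassified samples.

        import numpy as np

        # `rf` is the Random Forest fitted in question 30
        y_pred = rf.predict(X_test)
        misclassified = np.where(y_test != y_pred)[0]
        print("Misclassified sample indices:", misclassified)
        for i in misclassified:
            print(f"index {i}: true={y_test[i]}, predicted={y_pred[i]}")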


34. Train a Bagging Classifier and compare its performance with a single Decision Tree Classifier.

        dt = DecisionTreeClassifier(random_state=42)
        dt.fit(X_train, y_train)
        acc_dt = dt.score(X_test, y_test)

        # `estimator` was named `base_estimator` before scikit-learn 1.2
        bag = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100, random_state=42)
        bag.fit(X_train, y_train)
        acc_bag = bag.score(X_test, y_test)

        print("DT Accuracy:", acc_dt)
        print("Bagging Accuracy:", acc_bag)


35. Train a Random Forest Classifier and visualize the confusion matrix.

        import matplotlib.pyplot as plt
        from sklearn.metrics import ConfusionMatrixDisplay

        ConfusionMatrixDisplay.from_estimator(rf, X_test, y_test)
        plt.show()


36. Train a Stacking Classifier using Decision Trees, SVM, and Logistic Regression, and compare accuracy.

        from sklearn.ensemble import StackingClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.svm import SVC

        # Base models; Logistic Regression is the meta-learner that combines them
        estimators = [
            ('dt', DecisionTreeClassifier(random_state=42)),
            ('svm', SVC(probability=True, random_state=42))
        ]

        stack_model = StackingClassifier(
            estimators=estimators,
            final_estimator=LogisticRegression(),
            cv=5
        )

        stack_model.fit(X_train, y_train)
        print("Stacking Classifier Accuracy:", stack_model.score(X_test, y_test))

        # Accuracy of each base model on its own, for comparison
        for name, est in estimators:
            print(name, "Accuracy:", est.fit(X_train, y_train).score(X_test, y_test))


37. Train a Random Forest Classifier and print the top 5 most important features.

        import numpy as np

        importances = rf.feature_importances_
        # Feature indices sorted from most to least important
        indices = np.argsort(importances)[::-1]

        for i in range(5):
            print(f"Feature {indices[i]}: Importance {importances[indices[i]]}")


38. Train a Bagging Classifier and evaluate performance using Precision, Recall, and F1-score.

        from sklearn.metrics import precision_score, recall_score, f1_score

        y_pred = bag.predict(X_test)

        # average='macro' gives every class equal weight
        print("Precision:", precision_score(y_test, y_pred, average='macro'))
        print("Recall:", recall_score(y_test, y_pred, average='macro'))
        print("F1 Score:", f1_score(y_test, y_pred, average='macro'))


39. Train a Random Forest Classifier and analyze the effect of max_depth on accuracy.

        for depth in [None, 5, 10, 15, 20]:
            model = RandomForestClassifier(max_depth=depth, n_estimators=100, random_state=42)
            model.fit(X_train, y_train)
            acc = model.score(X_test, y_test)
            print(f"max_depth: {depth}, Accuracy: {acc}")


40. Train a Bagging Regressor using different base estimators (DecisionTree and KNeighbors) and compare performance.

        from sklearn.neighbors import KNeighborsRegressor

        # Regression needs a continuous target, so reload the Diabetes dataset
        X_reg, y_reg = load_diabetes(return_X_y=True)
        X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, random_state=42)

        # DecisionTree Regressor (`estimator` was named `base_estimator` before scikit-learn 1.2)
        dt_model = BaggingRegressor(estimator=DecisionTreeRegressor(), n_estimators=50, random_state=42)
        dt_model.fit(X_train_reg, y_train_reg)
        dt_mse = mean_squared_error(y_test_reg, dt_model.predict(X_test_reg))

        # KNeighbors Regressor
        kn_model = BaggingRegressor(estimator=KNeighborsRegressor(), n_estimators=50, random_state=42)
        kn_model.fit(X_train_reg, y_train_reg)
        kn_mse = mean_squared_error(y_test_reg, kn_model.predict(X_test_reg))

        print("DecisionTree Regressor MSE:", dt_mse)
        print("KNeighbors Regressor MSE:", kn_mse)


41. Train a Random Forest Classifier and evaluate its performance using ROC-AUC Score.

        from sklearn.metrics import roc_auc_score

        # Probability of the positive class (assumes a binary target)
        y_pred_prob = rf.predict_proba(X_test)[:, 1]
        roc_auc = roc_auc_score(y_test, y_pred_prob)
        print("ROC-AUC Score:", roc_auc)


42. Train a Bagging Classifier and evaluate its performance using cross-validation.

        from sklearn.model_selection import cross_val_score

        # 5-fold cross-validation on the training split
        scores = cross_val_score(bag, X_train, y_train, cv=5)
        print("Cross-validation scores:", scores)
        print("Mean CV accuracy:", scores.mean())


43. Train a Random Forest Classifier and plot the Precision-Recall curve.

        import matplotlib.pyplot as plt
        from sklearn.metrics import PrecisionRecallDisplay

        PrecisionRecallDisplay.from_estimator(rf, X_test, y_test)
        plt.show()


44. Train a Stacking Classifier with Random Forest and Logistic Regression and compare accuracy.

        # Random Forest is the base model; Logistic Regression is the meta-learner
        estimators = [
            ('rf', RandomForestClassifier(n_estimators=100, random_state=42))
        ]

        stack_model = StackingClassifier(
            estimators=estimators,
            final_estimator=LogisticRegression(),
            cv=5
        )

        stack_model.fit(X_train, y_train)
        print("Stacking Classifier Accuracy:", stack_model.score(X_test, y_test))
        # `rf` is the plain Random Forest fitted in question 30
        print("Random Forest Accuracy:", rf.score(X_test, y_test))


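45. Train a Bagging Regressor with different levels of bootstrap samples and compare performance.

        # Regression needs a continuous target, so reload the Diabetes dataset
        X_reg, y_reg = load_diabetes(return_X_y=True)
        X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, random_state=42)

        # max_samples is the fraction of the training set drawn for each base estimator
        for bootstrap_frac in [0.5, 0.7, 0.9, 1.0]:
            model = BaggingRegressor(n_estimators=100, max_samples=bootstrap_frac, random_state=42)
            model.fit(X_train_reg, y_train_reg)
            mse = mean_squared_error(y_test_reg, model.predict(X_test_reg))
            print(f"Bootstrap frac: {bootstrap_frac}, MSE: {mse}")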
