<a href="https://colab.research.google.com/github/Tushar-rancy/ensemble_learning-Assignment/blob/main/ensemble_learning_assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Theoretical Questions

**1. Can we use Bagging for regression problems?**

Yes. Bagging applies to regression as well as classification. scikit-learn provides `BaggingRegressor` for this: it trains several regressors on bootstrap samples and averages their predictions.
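As a minimal sketch of this, the snippet below fits a `BaggingRegressor` on synthetic data. The dataset from `make_regression`, the 50 estimators, and the 70/30 split are illustrative assumptions, not choices taken from the assignment.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (an assumption made for illustration only).
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The base model is passed positionally so the call does not depend on the
# keyword name, which differs between scikit-learn releases.
reg = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50, random_state=0)
reg.fit(X_train, y_train)

mse = mean_squared_error(y_test, reg.predict(X_test))
print("Test MSE:", mse)
```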

**2. What is the difference between multiple model training and single model training?**

Single model training fits one model and uses its predictions directly. Multiple model training (ensemble learning) fits several models and combines their predictions, which typically improves accuracy and robustness.

**3. Explain the concept of feature randomness in Random Forest.**

Random Forest introduces feature randomness by selecting a random subset of features at each split, which helps to de-correlate the trees and reduce overfitting.
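In scikit-learn, the size of that random subset is controlled by the `max_features` parameter. The sketch below compares `max_features="sqrt"` (a random subset at each split) with `max_features=None` (all features considered at each split). The Breast Cancer dataset and the other settings are assumptions chosen for illustration, and the resulting accuracies will vary with the data and seed.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

scores = {}
for max_features in ("sqrt", None):
    # "sqrt": each split considers sqrt(n_features) randomly chosen features.
    # None:   each split considers every feature (no feature randomness).
    forest = RandomForestClassifier(
        n_estimators=100, max_features=max_features, random_state=0
    )
    forest.fit(X_train, y_train)
    scores[max_features] = forest.score(X_test, y_test)
    print(f"max_features={max_features!r}: accuracy={scores[max_features]:.3f}")
```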

**4. What is OOB (Out-of-Bag) Score?**

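The OOB score estimates how well a Random Forest generalizes without needing a separate validation set. Each tree is trained on a bootstrap sample, so some training rows are left out of that sample; each row is then predicted using only the trees that never saw it. In scikit-learn, setting `oob_score=True` exposes the result as `oob_score_` (accuracy for classifiers, R² for regressors).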
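A short sketch of computing the OOB score with scikit-learn follows. The dataset and the number of trees are illustrative assumptions; note that no train/test split is needed, because the left-out rows act as the validation data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# oob_score=True requires bootstrap=True (the default for random forests).
rf = RandomForestClassifier(
    n_estimators=200, oob_score=True, bootstrap=True, random_state=0
)
rf.fit(X, y)

oob = rf.oob_score_  # accuracy measured on out-of-bag rows
print(f"OOB score: {oob:.3f}")
```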

**5. How can you measure the importance of features in a Random Forest model?**

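Feature importance can be measured by how much each feature reduces impurity across the splits where it is used. In scikit-learn, the `feature_importances_` attribute reports this impurity-based importance, averaged over the trees and normalized to sum to 1. Impurity-based importances can be biased toward high-cardinality features, so permutation importance is a common alternative.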
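The sketch below reads `feature_importances_` from a fitted forest and lists the five largest values. The Breast Cancer dataset and the forest settings are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(data.data, data.target)

importances = rf.feature_importances_        # one value per feature
top5 = np.argsort(importances)[::-1][:5]     # indices of the 5 largest values

for idx in top5:
    print(f"{data.feature_names[idx]:<25} {importances[idx]:.4f}")
```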

**6. Explain the working principle of a Bagging Classifier.**

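A Bagging Classifier draws multiple bootstrap samples (sampled with replacement) from the training set, trains a copy of the base estimator on each, and aggregates the predictions by majority vote or, when the base estimator outputs class probabilities, by averaging those probabilities.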
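To make the steps concrete, here is a hand-rolled sketch of bagging with hard majority voting. It is a simplified illustration, not scikit-learn's implementation: `BaggingClassifier` averages predicted probabilities when the base estimator supports `predict_proba`. The variable names and the choice of 25 trees are assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

rng = np.random.default_rng(0)
n_estimators = 25
all_preds = []

for _ in range(n_estimators):
    # 1. Bootstrap: draw len(X_train) row indices with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # 2. Train one base estimator on the resampled rows.
    tree = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
    all_preds.append(tree.predict(X_test))

# 3. Aggregate: majority vote over the trees for every test row.
all_preds = np.array(all_preds)              # shape: (n_estimators, n_test)
n_classes = len(np.unique(y))
voted = np.array(
    [np.bincount(col, minlength=n_classes).argmax() for col in all_preds.T]
)

manual_accuracy = (voted == y_test).mean()
print("Manual bagging accuracy:", manual_accuracy)
```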

**7. How do you evaluate a Bagging Classifier’s performance?**

You can evaluate it using accuracy, precision, recall, F1-score, cross-validation, or confusion matrix.
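A sketch of these evaluation options applied to one model follows. The Breast Cancer dataset, the 50 estimators, and the choice of F1 as the cross-validation metric are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Precision, recall, and F1 per class, plus overall accuracy.
print(classification_report(y_test, y_pred))

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# 5-fold cross-validation refits a fresh copy of the model on each fold.
cv_scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
print("Mean CV F1:", cv_scores.mean())
```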

**8. How does a Bagging Regressor work?**

It trains multiple regressors on bootstrapped subsets and averages their predictions to reduce variance.
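The averaging step can be checked directly: the sketch below recomputes the ensemble prediction as the mean of the individual regressors' predictions and compares it with `predict`. The synthetic data and settings are assumptions; `estimators_` and `estimators_features_` are the fitted attributes that scikit-learn exposes on the model.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=1)

bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=20, random_state=1)
bag.fit(X, y)

# Predict with every fitted regressor, using the feature columns it was given.
per_model = np.array(
    [
        est.predict(X[:, feats])
        for est, feats in zip(bag.estimators_, bag.estimators_features_)
    ]
)

# The ensemble prediction is the plain mean over the individual predictions.
manual_average = per_model.mean(axis=0)
matches = np.allclose(manual_average, bag.predict(X))
print("Manual average matches predict():", matches)
```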

**9. What is the main advantage of ensemble techniques?**

Improved performance, reduced variance, and increased robustness.

**10. What is the main challenge of ensemble methods?**

Increased computational cost and reduced interpretability.

**11. Explain the key idea behind ensemble techniques.**

Combine predictions from multiple models to improve generalization and reduce overfitting.

**12. What is a Random Forest Classifier?**

An ensemble of decision trees where each tree is trained on a bootstrap sample with random feature selection at each split.

**13. What are the main types of ensemble techniques?**

Bagging, Boosting, Stacking, and Voting.

**14. What is ensemble learning in machine learning?**

A technique that combines predictions from multiple models to improve accuracy and robustness.

**15. When should we avoid using ensemble methods?**

When interpretability and simplicity matter more than a small gain in accuracy, or when the dataset is too small for the extra model complexity and training cost to pay off.

**16. How does Bagging help in reducing overfitting?**

By averaging multiple models trained on bootstrapped samples, Bagging reduces variance.

**17. Why is Random Forest better than a single Decision Tree?**

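A single deep decision tree has high variance and tends to overfit. A Random Forest averages many de-correlated trees, which lowers that variance and usually generalizes better.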
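One way to check this on a given dataset is to compare cross-validated accuracy for both models, as sketched below. The dataset and settings are assumptions, and the size of the gap (or whether there is one) depends on the data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Same data and folds for both models, so the scores are comparable.
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
forest_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=5
)

print(f"Decision tree: {tree_scores.mean():.3f} +/- {tree_scores.std():.3f}")
print(f"Random forest: {forest_scores.mean():.3f} +/- {forest_scores.std():.3f}")
```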

**18. What is the role of bootstrap sampling in Bagging?**

Bootstrap sampling creates different subsets for training individual models, helping reduce variance.
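A small NumPy sketch shows what a bootstrap sample looks like. Drawing n rows with replacement from n rows leaves roughly 1 - 1/e (about 63%) of the distinct rows in the sample, and the rest are "out of bag". The sample size of 10,000 is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 10_000

# Draw n_samples row indices with replacement from range(n_samples).
bootstrap_idx = rng.integers(0, n_samples, size=n_samples)

unique_fraction = np.unique(bootstrap_idx).size / n_samples
oob_fraction = 1.0 - unique_fraction

print(f"Distinct rows in the bootstrap sample: {unique_fraction:.3f}")
print(f"Rows left out (out-of-bag):            {oob_fraction:.3f}")
print(f"Theoretical limit 1 - 1/e:             {1 - np.exp(-1):.3f}")
```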

**19. What are some real-world applications of ensemble techniques?**

Spam detection, fraud detection, credit scoring, medical diagnosis, and recommendation systems.

**20. What is the difference between Bagging and Boosting?**

Bagging trains models in parallel to reduce variance, while Boosting trains models sequentially to reduce bias.
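The contrast can be sketched with two scikit-learn estimators. `GradientBoostingClassifier` is used here as one example of a boosting method; that choice, the dataset, and the hyperparameters are assumptions for illustration, and neither approach is guaranteed to win on a given problem.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Bagging: independent deep trees on bootstrap samples. Because the members
# do not depend on each other, they can be fit in parallel (see `n_jobs`).
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

# Boosting: shallow trees fit one after another, each one correcting the
# errors of the ensemble built so far.
boosting = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)

bagging_acc = bagging.fit(X_train, y_train).score(X_test, y_test)
boosting_acc = boosting.fit(X_train, y_train).score(X_test, y_test)

print(f"Bagging accuracy:  {bagging_acc:.3f}")
print(f"Boosting accuracy: {boosting_acc:.3f}")
```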


## Practical Questions

1. Train a Bagging Classifier using Decision Trees on a sample dataset and print model accuracy
2. Train a Bagging Regressor using Decision Trees and evaluate using Mean Squared Error (MSE)
3. Train a Random Forest Classifier on the Breast Cancer dataset and print feature importance scores
4. Train a Random Forest Regressor and compare its performance with a single Decision Tree
5. Compute the Out-of-Bag (OOB) Score for a Random Forest Classifier
6. Train a Bagging Classifier using SVM as a base estimator and print accuracy
7. Train a Random Forest Classifier with different numbers of trees and compare accuracy
8. Train a Bagging Classifier using Logistic Regression as a base estimator and print AUC score
9. Train a Random Forest Regressor and analyze feature importance scores
10. Train an ensemble model using both Bagging and Random Forest and compare accuracy
11. Train a Random Forest Classifier and tune hyperparameters using GridSearchCV
12. Train a Bagging Regressor with different numbers of base estimators and compare performance
13. Train a Random Forest Classifier and analyze misclassified samples
14. Train a Bagging Classifier and compare its performance with a single Decision Tree Classifier
15. Train a Random Forest Classifier and visualize the confusion matrix
16. Train a Stacking Classifier using Decision Trees, SVM, and Logistic Regression, and compare accuracy
17. Train a Random Forest Classifier and print the top 5 most important features
18. Train a Bagging Classifier and evaluate performance using Precision, Recall, and F1-score
19. Train a Random Forest Classifier and analyze the effect of max_depth on accuracy
20. Train a Bagging Regressor using different base estimators (DecisionTree and KNeighbors) and compare performance
21. Train a Random Forest Classifier and evaluate its performance using ROC-AUC Score
22. Train a Bagging Classifier and evaluate its performance using cross-validation
23. Train a Random Forest Classifier and plot the Precision-Recall curve
24. Train a Stacking Classifier with Random Forest and Logistic Regression and compare accuracy
25. Train a Bagging Regressor with different levels of bootstrap samples and compare performance

```python
# Bagging Classifier Example (Practical Question 1)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset and hold out 30% of the rows for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# The base model is passed positionally because the keyword was renamed from
# `base_estimator` to `estimator` in scikit-learn 1.2.
model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
```
