In [None]:
# Bagging in Machine Learning

## Q1. How does bagging reduce overfitting in decision trees?
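Bagging reduces overfitting by training multiple decision trees on different bootstrap samples of the data (samples drawn with replacement) and aggregating their predictions, by averaging for regression or voting for classification. Each fully grown tree overfits its own sample, but the trees' errors are only partly correlated, so aggregation cancels much of that noise. The result is lower variance with little change in bias, and therefore better generalization.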
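
The mechanism can be sketched by hand. The example below is a minimal illustration, not a reference implementation: the synthetic sine data, the noise level, and the choice of 50 trees are all assumptions made for the demo. It fits one fully grown regression tree, then averages 50 trees that were each trained on a bootstrap sample, and compares their errors against the noise-free target.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic 1-D regression problem (illustrative data, not from the text)
X_train = rng.uniform(0, 6, size=(200, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(0, 0.3, size=200)
X_test = np.linspace(0, 6, 500).reshape(-1, 1)
y_true = np.sin(X_test[:, 0])  # noise-free target used for evaluation

# A single fully grown tree memorizes the noise in its training set
single_tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
single_mse = np.mean((single_tree.predict(X_test) - y_true) ** 2)

# Bagging by hand: resample with replacement, fit a tree, average the predictions
n_trees = 50  # assumed ensemble size for this demo
all_preds = []
for i in range(n_trees):
    idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap indices
    tree = DecisionTreeRegressor(random_state=i).fit(X_train[idx], y_train[idx])
    all_preds.append(tree.predict(X_test))
bagged_pred = np.mean(all_preds, axis=0)
bagged_mse = np.mean((bagged_pred - y_true) ** 2)

print(f"single tree MSE:  {single_mse:.3f}")
print(f"bagged trees MSE: {bagged_mse:.3f}")
```

On data like this, the averaged prediction is usually much smoother than the single tree's step function, which is where the variance reduction shows up.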

## Q2. What are the advantages and disadvantages of using different types of base learners in bagging?
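**Advantages:**
- **Diversity**: Different model families make different kinds of errors, so combining them can capture patterns that a single family would miss.
- **Flexibility**: The base learner can be matched to the data. Unstable, high-variance models such as deep decision trees benefit most from bagging.

**Disadvantages:**
- **Complexity**: Each type of base learner needs its own tuning, and training and maintaining many models increases cost.
- **Interpretability**: An ensemble is harder to interpret than a single model, and more so when the base learners differ in type.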

## Q3. How does the choice of base learner affect the bias-variance tradeoff in bagging?
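- **High-bias base learners** (e.g., shallow trees or linear models): Bagging leaves bias largely unchanged, so the ensemble stays about as biased as a single learner. Because such models are already stable, there is also little variance to remove.
- **High-variance base learners** (e.g., fully grown decision trees): These benefit most. Averaging over bootstrap samples reduces variance while keeping their low bias, which helps control overfitting.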
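
The two bullets can be summarized with a standard identity. The notation is introduced here for illustration and is not from the text above: assume $B$ base learners $f_1, \dots, f_B$ that are identically distributed, each with variance $\sigma^2$ at a point $x$ and pairwise correlation $\rho$, and let $\bar{f}(x) = \frac{1}{B}\sum_{b=1}^{B} f_b(x)$ be the bagged prediction. Under those assumptions:

$$
\mathbb{E}\left[\bar{f}(x)\right] = \mathbb{E}\left[f_b(x)\right],
\qquad
\operatorname{Var}\left[\bar{f}(x)\right] = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
$$

The expectation, and hence the bias, is the same as for a single learner, while the variance shrinks toward $\rho\,\sigma^2$ as $B$ grows. This is an idealized model; real bagged learners only approximately satisfy the assumptions.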

## Q4. Can bagging be used for both classification and regression tasks? How does it differ in each case?
- **Classification**: Bagging combines the predictions of multiple classifiers, typically using majority voting for final classification.
- **Regression**: Bagging combines the predictions of multiple regressors by averaging their predictions to get the final output.
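
The two aggregation rules can be sketched in a few lines of NumPy. The prediction arrays below are made-up numbers standing in for the outputs of already-trained base models, and `majority_vote` is a hypothetical helper written for this example.

```python
import numpy as np

# Hypothetical class predictions: 5 base classifiers (rows) on 4 samples (columns)
class_preds = np.array([
    [0, 1, 2, 1],
    [0, 1, 1, 1],
    [0, 2, 2, 1],
    [1, 1, 2, 0],
    [0, 1, 2, 1],
])

def majority_vote(preds, n_classes):
    """Return the most frequently predicted class for each sample (column)."""
    # counts has shape (n_classes, n_samples): votes per class for each sample
    counts = np.apply_along_axis(np.bincount, 0, preds, minlength=n_classes)
    return counts.argmax(axis=0)

voted = majority_vote(class_preds, n_classes=3)
print(voted)  # [0 1 2 1]

# Hypothetical regression predictions: 3 base regressors on 2 samples
reg_preds = np.array([
    [2.0, 3.0],
    [2.5, 2.0],
    [1.5, 4.0],
])
averaged = reg_preds.mean(axis=0)
print(averaged)  # [2. 3.]
```

Note that scikit-learn's `BaggingClassifier` averages predicted class probabilities when the base estimator provides `predict_proba`, and falls back to voting otherwise, so its output can differ slightly from a pure hard vote.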

## Q5. What is the role of ensemble size in bagging? How many models should be included in the ensemble?
- **Role**: Larger ensembles average out more noise and reduce variance, but with diminishing returns: accuracy typically plateaus once enough models are included, while training and prediction costs keep growing roughly linearly.
- **Number of models**: There is no fixed number. Ensembles of roughly 50 to 100 models are a common starting point; the right size depends on the problem and the available computational resources.
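
A small simulation can illustrate why returns diminish. It is a sketch under idealized assumptions, not a model of any real ensemble: each "model" is simulated as a noisy estimate with variance `sigma**2`, and any two models share a correlation `rho` (both values are chosen arbitrarily). In that setting the variance of the averaged prediction is `rho * sigma**2 + (1 - rho) * sigma**2 / n`, which flattens out at `rho * sigma**2` no matter how many models are added.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, rho = 1.0, 0.3   # assumed per-model std and pairwise correlation
n_trials = 5000         # number of simulated ensembles per size

def ensemble_variance(n_models):
    """Empirical variance of the mean of n_models correlated noisy predictions."""
    # A shared noise term induces correlation rho between any two models
    shared = np.sqrt(rho) * sigma * rng.standard_normal((n_trials, 1))
    own = np.sqrt(1 - rho) * sigma * rng.standard_normal((n_trials, n_models))
    return (shared + own).mean(axis=1).var()

sizes = (1, 10, 50, 100, 500)
measured = {n: ensemble_variance(n) for n in sizes}
theory = {n: rho * sigma**2 + (1 - rho) * sigma**2 / n for n in sizes}

for n in sizes:
    print(f"n={n:>3}  simulated={measured[n]:.3f}  theory={theory[n]:.3f}")
```

Most of the reduction happens in the first few dozen models; beyond that, the remaining variance is dominated by what the models have in common.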

## Q6. Can you provide an example of a real-world application of bagging in machine learning?
A widely used method built on bagging is the **random forest**, which bags decision trees and additionally considers only a random subset of features at each split. Random forests are applied in many fields, such as credit scoring in finance, disease prediction in healthcare, and object classification in image recognition.
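
As a minimal sketch of how a random forest is typically used in scikit-learn, the example below trains one on the Iris dataset. The dataset, split, and hyperparameter values are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small benchmark dataset and hold out 30% for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Each tree is trained on a bootstrap sample; max_features="sqrt" makes every
# split consider only a random subset of the features
rf_clf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf_clf.fit(X_train, y_train)

rf_accuracy = accuracy_score(y_test, rf_clf.predict(X_test))
print(f"random forest accuracy: {rf_accuracy:.3f}")
```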

### Example Code for Bagging with Decision Trees in Python


In [1]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

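# Initialize Bagging Classifier
# The base learner is passed positionally: its keyword was `base_estimator` in older
# scikit-learn releases and was renamed to `estimator` in version 1.2.
bagging_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)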

# Train model
bagging_clf.fit(X_train, y_train)

# Make predictions
y_pred = bagging_clf.predict(X_test)

# Evaluate performance
accuracy = accuracy_score(y_test, y_pred)
accuracy

0.9777777777777777