# Ensemble Learning – Questions and Answers (1–10)
Each question is followed by its answer and corresponding Python code where required.

### Question 1: What is Ensemble Learning?
**Answer:** Ensemble learning combines the predictions of multiple machine learning models to produce a stronger predictive model. The key idea is that the errors of several weak or moderately strong learners partly cancel out when combined, which can reduce variance, bias, or both, and so improve overall predictive performance.
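
As a rough illustration of why combining learners helps, the sketch below computes how often a majority vote is correct when every model is right with the same probability. It assumes the models make independent errors, which real ensembles only approximate, and the function name `majority_vote_accuracy` is made up for this example.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n independent models is correct.

    Assumes every model is correct with probability p and that errors are
    independent (an idealisation; real models are correlated).
    """
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


# Each individual model is only 60% accurate; the vote improves as models are added.
for n in (1, 5, 25, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

In practice the gain is smaller than this idealised calculation suggests, because models trained on the same data tend to make correlated mistakes.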

### Question 2: Difference between Bagging and Boosting
**Answer:** Bagging trains models independently on different bootstrap samples and averages their predictions (or takes a majority vote), which mainly reduces variance. Boosting trains models sequentially, with each model focusing on the errors of its predecessors; it mainly reduces bias and can also reduce variance.
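
The contrast can be sketched with scikit-learn, here pairing bagging with fully grown trees and boosting with depth-1 stumps. The dataset, the choice of AdaBoost as the boosting method, and the hyperparameters are assumptions made for illustration, not a recommendation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    # Bagging: fully grown (low-bias, high-variance) trees, each fitted
    # independently on its own bootstrap sample.
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=50, random_state=0
    ),
    # Boosting: depth-1 stumps (high-bias) fitted one after another, each
    # reweighting the samples the previous ones misclassified.
    "boosting": AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=0
    ),
}

# 5-fold cross-validated accuracy for each ensemble.
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```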

### Question 3: What is bootstrap sampling?
**Answer:** Bootstrap sampling draws points with replacement from the dataset, typically as many as the dataset contains, to create multiple training sets. In Bagging and Random Forest, this gives each model a slightly different training set, which decorrelates the models and improves robustness.
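
A minimal sketch of one bootstrap draw using only the standard library. The helper name `bootstrap_indices` and the sample size are illustrative assumptions.

```python
import random


def bootstrap_indices(n_samples: int, rng: random.Random) -> list[int]:
    """Draw n_samples row indices uniformly at random, with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


rng = random.Random(42)  # fixed seed so the draw is reproducible
n = 10_000
idx = bootstrap_indices(n, rng)

# Because of replacement, some rows repeat and others are never drawn.
unique_fraction = len(set(idx)) / n
print(f"Distinct rows in the bootstrap sample: {unique_fraction:.1%}")
```

For large datasets the fraction of distinct rows approaches 1 - 1/e, about 63.2%, so every model sees a different subset of the data with some rows duplicated.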

### Question 4: What are OOB samples?
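**Answer:** Out-of-Bag (OOB) samples are the data points not selected in a given bootstrap sample, on average about 36.8% (roughly 1/e) of the dataset for each model. Each point can be scored by the models that never saw it during training, which gives an estimate of model performance without a separate validation set.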
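
In scikit-learn, this estimate is available by passing `oob_score=True` to a Random Forest. The sketch below assumes the breast cancer dataset and 200 trees purely for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# oob_score=True scores every training point using only the trees whose
# bootstrap sample did not contain that point.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)
rf.fit(X, y)

oob_accuracy = rf.oob_score_  # accuracy estimated from out-of-bag predictions
print(f"OOB accuracy: {oob_accuracy:.3f}")
```

Note that no train/test split is made here: the whole dataset is used for fitting, and the OOB predictions play the role of the held-out set.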

### Question 5: Feature importance: Decision Tree vs Random Forest
**Answer:** A single Decision Tree computes importance from the splits in that one tree, so small changes in the training data can change the ranking. Random Forest averages impurity-based importance across many trees, which produces more stable and reliable feature importance estimates.
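
One way to see the difference in stability is to refit each model on several bootstrap resamples and measure how much the importances move. This is a rough sketch: the helper `importance_spread`, the number of repeats, and the dataset are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

X, y = load_breast_cancer(return_X_y=True)


def importance_spread(make_model, n_repeats=10):
    """Refit on bootstrap resamples; return the per-feature std of importances."""
    runs = []
    for seed in range(n_repeats):
        X_b, y_b = resample(X, y, random_state=seed)  # sample with replacement
        runs.append(make_model(seed).fit(X_b, y_b).feature_importances_)
    return np.std(runs, axis=0)


tree_spread = importance_spread(lambda s: DecisionTreeClassifier(random_state=s))
forest_spread = importance_spread(
    lambda s: RandomForestClassifier(n_estimators=100, random_state=s)
)

# A larger value means the importances change more from one resample to the next.
print("Mean std, single tree:  ", tree_spread.mean())
print("Mean std, random forest:", forest_spread.mean())
```

The single tree typically shows a much larger spread, because correlated features can swap places at the top splits from one resample to the next.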

### Question 6: Random Forest Feature Importance

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

# Load the features, target, and feature names.
data = load_breast_cancer()
X, y = data.data, data.target

# Fit a Random Forest; random_state makes the result reproducible.
model = RandomForestClassifier(random_state=42)
model.fit(X, y)

# Rank features by impurity-based importance and show the top five.
importance = pd.Series(model.feature_importances_, index=data.feature_names)
print(importance.sort_values(ascending=False).head(5))
```


### Question 7: Bagging vs Single Decision Tree

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score

# Hold out 20% of the Iris data for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a single tree and a bagged ensemble of 50 trees; seeds fixed for reproducibility.
# The `estimator` keyword requires scikit-learn 1.2 or later.
dt = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(), n_estimators=50, random_state=42
).fit(X_train, y_train)

# Compare test-set accuracy.
print("Decision Tree:", accuracy_score(y_test, dt.predict(X_test)))
print("Bagging:", accuracy_score(y_test, bag.predict(X_test)))
```


### Question 8: Random Forest GridSearchCV

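```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

# Search over the number of trees and the maximum tree depth (6 combinations).
params = {'n_estimators': [50, 100], 'max_depth': [None, 5, 10]}

# 3-fold cross-validation; random_state keeps the forests reproducible.
grid = GridSearchCV(RandomForestClassifier(random_state=42), params, cv=3)
grid.fit(X, y)

# best_score_ is the mean cross-validated accuracy of the best combination.
print("Best Params:", grid.best_params_)
print("Best Score:", grid.best_score_)
```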


### Question 9: Bagging vs Random Forest Regressor

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# The dataset is downloaded on first use; hold out 20% for testing.
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Note: the two ensembles use different numbers of trees (50 vs 100).
bag = BaggingRegressor(
    estimator=DecisionTreeRegressor(), n_estimators=50, random_state=42
).fit(X_train, y_train)
rf = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_train, y_train)

# Lower mean squared error on the test set is better.
print("Bagging MSE:", mean_squared_error(y_test, bag.predict(X_test)))
print("Random Forest MSE:", mean_squared_error(y_test, rf.predict(X_test)))
```



### Question 10: Real-world Ensemble Strategy
**Answer:**  
1. Choose Bagging if variance is high and data is noisy; choose Boosting if bias is high.  
2. Handle overfitting using cross-validation, early stopping (boosting), and limiting tree depth.  
3. Select decision trees as base models because they capture non-linear relationships and feature interactions with little preprocessing.  
4. Evaluate using K-fold cross-validation with metrics like ROC-AUC.  
5. Ensemble learning improves decision-making by producing more stable predictions, reducing risk in financial decisions such as loan default prediction.
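
Points 2 and 4 above can be sketched as follows. The breast cancer dataset is only a stand-in for a binary risk dataset such as loan defaults, and the depth limit, early-stopping patience, and number of folds are assumed values rather than tuned settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in binary classification data (assumption: real loan data is not available here).
X, y = load_breast_cancer(return_X_y=True)

# Stratified folds keep the class ratio the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

candidates = {
    # Bagging-style ensemble with limited tree depth to curb overfitting.
    "random forest": RandomForestClassifier(
        n_estimators=200, max_depth=5, random_state=42
    ),
    # Boosting with early stopping: training halts once the score on an
    # internal 10% validation split fails to improve for 10 rounds.
    "gradient boosting": GradientBoostingClassifier(
        n_estimators=500,
        max_depth=3,
        n_iter_no_change=10,
        validation_fraction=0.1,
        random_state=42,
    ),
}

# Mean ROC-AUC across the five folds for each candidate.
auc = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
for name, score in auc.items():
    print(f"{name}: ROC-AUC = {score:.3f}")
```

Comparing both families under the same folds and metric keeps the choice between bagging and boosting an empirical one for the dataset at hand.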
