# Boosting Techniques – Questions and Answers (1–10)
Each question is followed by its answer and, where required, a Python implementation.

### Question 1: What is Boosting in Machine Learning?
**Answer:** Boosting is an ensemble technique that trains multiple weak learners sequentially, with each new model focusing on the errors of the models before it. Combining them into a single strong learner improves predictive accuracy, primarily by reducing bias and often variance as well.
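
As a rough illustration of the idea, the sketch below compares one weak learner (a depth-1 decision tree, or "stump") with 50 such stumps boosted in sequence. The synthetic `make_moons` dataset, the sample size, and the number of estimators are assumptions chosen only for the demonstration.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data (an illustrative assumption, not from the questions).
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

# A single decision stump: one split, so it underfits the curved boundary.
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
stump_acc = stump.score(X, y)

# AdaBoost's default weak learner is also a depth-1 tree; 50 are fitted in sequence.
boosted = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
boosted_acc = boosted.score(X, y)

print(f"Single stump training accuracy:   {stump_acc:.3f}")
print(f"Boosted stumps training accuracy: {boosted_acc:.3f}")
```

Training accuracy is used here on purpose: it shows the bias reduction directly. A held-out split would be needed to judge generalization.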

### Question 2: Difference between AdaBoost and Gradient Boosting
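**Answer:** AdaBoost reweights the training samples after each round so that the next weak learner concentrates on previously misclassified samples, and it weights each learner's vote by its accuracy. Gradient Boosting instead fits each new model to the negative gradient of a differentiable loss function (the residuals, for squared error), which amounts to gradient descent in function space.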
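
The two update rules can be sketched side by side. This is a minimal from-scratch illustration, not how scikit-learn implements either algorithm: `adaboost_reweight` and `fit_gradient_boosting` are hypothetical helper names, the AdaBoost step uses the textbook discrete form with labels in {-1, +1}, and the gradient boosting loop assumes squared-error loss.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def adaboost_reweight(weights, y_true, y_pred):
    """One textbook AdaBoost round: return the learner's vote and new sample weights."""
    miss = y_true != y_pred
    # Weighted error, clipped to avoid division by zero or log(0).
    err = np.clip(weights[miss].sum() / weights.sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)
    # Misclassified samples are scaled up, correct ones scaled down.
    new_weights = weights * np.exp(np.where(miss, alpha, -alpha))
    return alpha, new_weights / new_weights.sum()


def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    """Fit shallow trees to residuals, the negative gradient of squared error."""
    baseline = float(np.mean(y))            # initial constant prediction
    prediction = np.full(len(y), baseline)
    trees = []
    for _ in range(n_rounds):
        residuals = y - prediction          # what the current ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return baseline, trees, prediction


# AdaBoost step on a toy example: four samples, the last one misclassified.
weights = np.full(4, 0.25)
y_true = np.array([1, 1, -1, -1])
y_pred = np.array([1, 1, -1, 1])
alpha, new_weights = adaboost_reweight(weights, y_true, y_pred)
print("Learner vote:", alpha)
print("Updated sample weights:", new_weights)

# Gradient boosting on a toy regression problem.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])
baseline, trees, fitted = fit_gradient_boosting(X, y)
mse_before = np.mean((y - baseline) ** 2)
mse_after = np.mean((y - fitted) ** 2)
print("Training MSE before/after boosting:", mse_before, mse_after)
```

In the AdaBoost step the misclassified sample ends up carrying half of the total weight, which is what forces the next learner to attend to it. In the gradient boosting loop nothing is reweighted; the targets themselves change each round.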

### Question 3: How does regularization help in XGBoost?
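**Answer:** XGBoost adds regularization terms to its training objective: L1 (`reg_alpha`) and L2 (`reg_lambda`) penalties on the leaf weights, plus a per-leaf complexity penalty (`gamma`). These shrink leaf values and discourage splits with little gain, which encourages simpler trees and reduces overfitting.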
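
To make the effect of the penalties concrete, here is a small sketch of the regularized leaf weight and split gain as they are usually presented for XGBoost, where `G` and `H` are the sums of first- and second-order gradients in a leaf. The helper names `soft_threshold`, `leaf_weight`, `leaf_score`, and `split_gain` are hypothetical and not part of the XGBoost API, and treating the L1 penalty as a soft threshold on `G` is an assumption about the formulation rather than a statement about the library's internals.

```python
def soft_threshold(g, alpha):
    """Shrink g toward zero by alpha (the effect of an L1 penalty)."""
    if g > alpha:
        return g - alpha
    if g < -alpha:
        return g + alpha
    return 0.0


def leaf_weight(G, H, reg_lambda=1.0, reg_alpha=0.0):
    """Optimal leaf value: larger lambda or alpha pulls it toward zero."""
    return -soft_threshold(G, reg_alpha) / (H + reg_lambda)


def leaf_score(G, H, reg_lambda=1.0, reg_alpha=0.0):
    """Quality of a leaf; appears in the split gain."""
    return soft_threshold(G, reg_alpha) ** 2 / (H + reg_lambda)


def split_gain(GL, HL, GR, HR, reg_lambda=1.0, reg_alpha=0.0, gamma=0.0):
    """Gain of splitting a node into left/right children, minus the per-leaf penalty."""
    children = leaf_score(GL, HL, reg_lambda, reg_alpha) + leaf_score(GR, HR, reg_lambda, reg_alpha)
    parent = leaf_score(GL + GR, HL + HR, reg_lambda, reg_alpha)
    return 0.5 * (children - parent) - gamma


# Same leaf statistics, increasing regularization.
print(leaf_weight(-4.0, 3.0, reg_lambda=0.0))                 # no penalty
print(leaf_weight(-4.0, 3.0, reg_lambda=1.0))                 # L2 only
print(leaf_weight(-4.0, 3.0, reg_lambda=1.0, reg_alpha=1.0))  # L1 and L2

# The same candidate split is kept with gamma=0 but rejected with gamma=3.
print(split_gain(-4.0, 3.0, 2.0, 2.0, gamma=0.0))
print(split_gain(-4.0, 3.0, 2.0, 2.0, gamma=3.0))
```

A split whose gain is negative after subtracting `gamma` is not worth making, which is how the complexity penalty keeps trees small.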

### Question 4: Why is CatBoost efficient for categorical data?
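**Answer:** CatBoost handles categorical variables natively using ordered target statistics (a form of target encoding): each row's category is encoded from the target values of rows that precede it in a random permutation. This reduces preprocessing effort and limits target leakage.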
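
The ordering idea can be sketched in a few lines. This is a simplified illustration and not CatBoost's implementation: `ordered_target_encode` is a hypothetical helper, it uses a single permutation, and the smoothing `(sum + prior) / (count + 1)` with `prior=0.5` is an assumed choice.

```python
import numpy as np


def ordered_target_encode(categories, targets, prior=0.5, order=None, seed=0):
    """Encode each row from the targets of earlier rows with the same category."""
    n = len(categories)
    if order is None:
        # Visit rows in a random order so "earlier" does not depend on file order.
        order = np.random.default_rng(seed).permutation(n)
    sums, counts = {}, {}
    encoded = np.empty(n, dtype=float)
    for idx in order:
        cat = categories[idx]
        s, c = sums.get(cat, 0.0), counts.get(cat, 0)
        # The row's own target is not used for its own encoding.
        encoded[idx] = (s + prior) / (c + 1)
        # Only now does this row's target become visible to later rows.
        sums[cat] = s + targets[idx]
        counts[cat] = c + 1
    return encoded


categories = ["a", "a", "b", "a"]
targets = [1, 0, 1, 1]
# Fixed visiting order so the result is easy to follow by hand.
encoded = ordered_target_encode(categories, targets, order=[0, 1, 2, 3])
print(encoded)  # [0.5, 0.75, 0.5, 0.5]
```

Because a row never sees its own label, the encoded feature cannot simply memorize the target, which is the leakage that plain target encoding suffers from.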

### Question 5: Real-world applications where boosting is preferred
**Answer:** Boosting is preferred in credit risk prediction, fraud detection, recommendation systems, and ranking problems where high predictive accuracy is essential.

### Question 6: AdaBoost Classifier

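```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

# Load the breast cancer dataset and hold out 20% for testing.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit AdaBoost with its default weak learner (a depth-1 decision tree).
# random_state makes the result reproducible.
model = AdaBoostClassifier(random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out set.
preds = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, preds))
```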


### Question 7: Gradient Boosting Regressor

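```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

# Load the California housing dataset (downloaded on first use) and hold out 20%.
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a gradient boosting regressor with default settings.
# random_state makes the result reproducible.
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out set.
preds = model.predict(X_test)
print("R2 Score:", r2_score(y_test, preds))
```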


### Question 8: XGBoost with GridSearchCV

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

# Load the breast cancer dataset and hold out 20% for testing.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Search over the learning rate with 3-fold cross-validation on the training set.
params = {'learning_rate': [0.01, 0.1, 0.2]}
grid = GridSearchCV(XGBClassifier(eval_metric='logloss', random_state=42), params, cv=3)
grid.fit(X_train, y_train)

# best_estimator_ is refitted on the full training set with the best parameters.
best = grid.best_estimator_
print("Best Params:", grid.best_params_)
print("Accuracy:", accuracy_score(y_test, best.predict(X_test)))
```


### Question 9: CatBoost Classifier with Confusion Matrix

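```python
from catboost import CatBoostClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Load the breast cancer dataset and hold out 20% for testing.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit CatBoost silently; random_seed makes the result reproducible.
model = CatBoostClassifier(verbose=0, random_seed=42)
model.fit(X_train, y_train)

# Rows of the confusion matrix are true labels, columns are predicted labels.
cm = confusion_matrix(y_test, model.predict(X_test))
sns.heatmap(cm, annot=True, fmt="d")
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.show()
```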



### Question 10: Boosting Pipeline for Loan Default Prediction
**Answer:**  
1. Preprocess data by imputing missing values and encoding categorical variables (CatBoost can take them directly). Scaling numerical features is generally unnecessary for tree-based boosters.  
2. Choose CatBoost if categorical features dominate, XGBoost for high performance and flexibility, and AdaBoost for simpler problems.  
3. Tune hyperparameters using GridSearchCV or Bayesian optimization.  
4. Evaluate using ROC-AUC, precision-recall AUC, and F1-score rather than accuracy, because defaults are typically the minority class.  
5. The business benefits from improved default prediction accuracy, better risk management, and more informed lending decisions.
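
The steps above can be sketched end to end as follows. Everything about the data is an assumption: the loan dataset is synthetic and the column names (`income`, `loan_amount`, `employment_type`, `default`) are hypothetical. scikit-learn's `GradientBoostingClassifier` is used as a stand-in booster so the example needs no extra packages; CatBoost or XGBoost could take its place in the pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import average_precision_score, f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for a loan dataset; all column names are hypothetical.
rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "income": rng.normal(50_000, 15_000, n),
    "loan_amount": rng.normal(20_000, 8_000, n),
    "employment_type": rng.choice(
        ["salaried", "self_employed", "unemployed"], n, p=[0.6, 0.3, 0.1]
    ),
})
# Default risk rises with the loan-to-income ratio and for unemployed applicants.
ratio = df["loan_amount"] / df["income"].clip(lower=10_000)
logit = -3.5 + 4.0 * ratio + 2.0 * (df["employment_type"] == "unemployed")
df["default"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
# Remove some incomes so the imputation step has work to do.
df.loc[rng.random(n) < 0.05, "income"] = np.nan

# Step 1: impute numeric columns, one-hot encode the categorical column (no scaling).
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["income", "loan_amount"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["employment_type"]),
])
# Step 2: the booster (a stand-in for CatBoost or XGBoost).
pipeline = Pipeline([
    ("prep", preprocess),
    ("model", GradientBoostingClassifier(random_state=42)),
])

X = df.drop(columns="default")
y = df["default"]
# Stratify so both splits keep the same share of defaults.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Step 3: tune a small grid, selecting on ROC-AUC instead of accuracy.
param_grid = {"model__learning_rate": [0.05, 0.1], "model__n_estimators": [100, 200]}
grid = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=3)
grid.fit(X_train, y_train)

# Step 4: evaluate with metrics suited to imbalanced classes.
proba = grid.predict_proba(X_test)[:, 1]
preds = (proba >= 0.5).astype(int)  # 0.5 is an arbitrary threshold for illustration
roc_auc = roc_auc_score(y_test, proba)
pr_auc = average_precision_score(y_test, proba)
f1 = f1_score(y_test, preds, zero_division=0)

print("Best params:", grid.best_params_)
print(f"ROC-AUC: {roc_auc:.3f}  PR-AUC: {pr_auc:.3f}  F1: {f1:.3f}")
```

Keeping preprocessing inside the `Pipeline` means the imputer and encoder are refitted on each cross-validation fold, so no information from validation folds leaks into training. In practice the decision threshold would be chosen from the precision-recall trade-off and the cost of a missed default rather than fixed at 0.5.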
