## 🧠 Theoretical Questions

**Q1. What is Boosting in Machine Learning?**

Boosting is an ensemble technique that combines multiple weak learners to form a strong learner by focusing more on the errors made by previous models.

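**Q2. How does Boosting differ from Bagging?**

Bagging trains models independently (and therefore in parallel) on bootstrap samples and averages them, which mainly reduces variance; Boosting trains models sequentially, each one correcting the errors of its predecessors, which mainly reduces bias.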

**Q3. What is the key idea behind AdaBoost?**

AdaBoost reweights the training observations after each round: points the current learner misclassified receive larger weights, so the next weak learner concentrates on them.

**Q4. Explain the working of AdaBoost with an example**

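AdaBoost starts with equal weights on all training instances and trains a sequence of weak learners (typically decision stumps). After each round it increases the weights of misclassified instances, decreases the weights of correctly classified ones, and gives the learner a vote weight that grows as its weighted error shrinks; the final prediction is a weighted majority vote. For example, if the first stump misclassifies 3 of 10 equally weighted points, those 3 points carry half of the total weight in the next round.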
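
The weight update for a single round can be sketched numerically. This is a minimal illustration of the classic binary (discrete) AdaBoost formulas, not scikit-learn's internal implementation; the ten points and the choice of which three are misclassified are assumptions made up for the example.

```python
import math

# Ten training points, equally weighted at the start of round 1.
weights = [0.1] * 10
# Assumption for illustration: the first stump gets points 0, 1 and 2 wrong.
misclassified = [True, True, True] + [False] * 7

# Weighted error of the stump and its vote weight (alpha).
error = sum(w for w, bad in zip(weights, misclassified) if bad)
alpha = 0.5 * math.log((1 - error) / error)

# Up-weight mistakes, down-weight correct points, then renormalize to sum to 1.
updated = [
    w * math.exp(alpha if bad else -alpha)
    for w, bad in zip(weights, misclassified)
]
total = sum(updated)
updated = [w / total for w in updated]

print(f"error={error:.2f}, alpha={alpha:.3f}")
print("weight of a misclassified point:", round(updated[0], 4))
print("weight of a correct point:", round(updated[-1], 4))
```

After the update each misclassified point weighs about 0.1667 and each correct point about 0.0714, so the three mistakes together hold half of the total weight.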

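**Q5. What is Gradient Boosting, and how is it different from AdaBoost?**

Gradient Boosting also builds models sequentially, but instead of reweighting observations as AdaBoost does, it fits each new learner to the negative gradient of a differentiable loss function (the pseudo-residuals) evaluated at the current predictions. This amounts to gradient descent in function space; AdaBoost can be viewed as the special case that uses exponential loss.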
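
To make the sequential fitting concrete, here is a minimal from-scratch sketch for squared-error loss, where the negative gradient is simply the residual. The one-dimensional toy data, the helper names `fit_stump` and `predict_stump`, and the learning rate are all assumptions chosen for illustration; real libraries use deeper trees and many optimizations.

```python
def fit_stump(xs, targets):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]


def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy regression data (made up for this example).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8, 7.1, 8.0]

learning_rate = 0.5
n_rounds = 20

# Start from a constant model: the mean of the targets.
preds = [sum(ys) / len(ys)] * len(ys)
history = [mse(ys, preds)]

for _ in range(n_rounds):
    # Negative gradient of 0.5 * (y - F)^2 with respect to F is the residual.
    residuals = [y - p for y, p in zip(ys, preds)]
    stump = fit_stump(xs, residuals)
    # Take a damped step in the direction the stump suggests.
    preds = [p + learning_rate * predict_stump(stump, x) for x, p in zip(xs, preds)]
    history.append(mse(ys, preds))

print(f"MSE before boosting: {history[0]:.3f}")
print(f"MSE after {n_rounds} rounds: {history[-1]:.3f}")
```

Note that no observation weights appear anywhere: the only signal passed from one round to the next is the residual vector.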

**Q6. What is the loss function in Gradient Boosting?**

Gradient Boosting works with any differentiable loss function. Typical choices are squared error for regression and log loss for classification; at each stage the new tree is fit to the negative gradient of the chosen loss.
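
As a small sketch of what "minimized using gradients" means in practice, the snippet below writes out the negative gradient of the two losses named above with respect to the model's raw prediction, and checks each against a finite-difference estimate. The function names and sample values are illustrative assumptions.

```python
import math


def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))


def squared_error(y, f):
    """Half squared error for a regression target y and prediction f."""
    return 0.5 * (y - f) ** 2


def log_loss(y, f):
    """Binary log loss for a label y in {0, 1} and a raw score f (log-odds)."""
    p = sigmoid(f)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# Analytic negative gradients with respect to the prediction f.
def neg_grad_squared_error(y, f):
    return y - f  # the ordinary residual


def neg_grad_log_loss(y, f):
    return y - sigmoid(f)  # label minus predicted probability


def numeric_neg_grad(loss, y, f, eps=1e-6):
    """Central finite difference, used only as a sanity check."""
    return -(loss(y, f + eps) - loss(y, f - eps)) / (2 * eps)


y_reg, f_reg = 3.0, 2.5
y_clf, f_clf = 1.0, 0.3

print("squared error:", neg_grad_squared_error(y_reg, f_reg),
      "numeric:", round(numeric_neg_grad(squared_error, y_reg, f_reg), 6))
print("log loss:", round(neg_grad_log_loss(y_clf, f_clf), 6),
      "numeric:", round(numeric_neg_grad(log_loss, y_clf, f_clf), 6))
```

In both cases the target for the next tree has the form "observed minus currently predicted", which is why these quantities are often called pseudo-residuals.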

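**Q7. How does XGBoost improve over traditional Gradient Boosting?**

XGBoost extends Gradient Boosting with L1/L2 regularization on leaf weights, a second-order approximation of the loss, pruning of splits whose gain falls below a threshold, parallelized split finding within each tree, and sparsity-aware handling of missing values.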

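**Q8. What is the difference between XGBoost and CatBoost?**

XGBoost traditionally expects categorical features to be encoded numerically (for example, one-hot) and exposes many tuning parameters, while CatBoost handles categorical variables natively and uses ordered boosting to reduce overfitting.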

**Q9. What are some real-world applications of Boosting techniques?**

Boosting is used in fraud detection, customer churn prediction, text classification, medical diagnostics, and recommendation systems.

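**Q10. How does regularization help in XGBoost?**

Regularization in XGBoost controls overfitting by adding penalty terms for model complexity to the loss function: a per-leaf penalty (`gamma`) that discourages extra splits, and L1 (`reg_alpha`) and L2 (`reg_lambda`) penalties that shrink leaf weights.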
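
The effect of the penalty terms can be sketched with the closed-form leaf weight and split gain from the XGBoost objective, where `G` and `H` are the sums of first and second derivatives of the loss over the rows in a leaf. The helper names and the gradient statistics below are assumptions made up for the example, and only the L2 and per-leaf penalties are shown.

```python
def leaf_weight(G, H, reg_lambda):
    """Optimal leaf value under an L2 penalty: w* = -G / (H + lambda)."""
    return -G / (H + reg_lambda)


def split_gain(G_left, H_left, G_right, H_right, reg_lambda, gamma):
    """Loss reduction from splitting a leaf, minus the per-leaf penalty gamma."""
    def score(G, H):
        return G * G / (H + reg_lambda)

    return 0.5 * (
        score(G_left, H_left)
        + score(G_right, H_right)
        - score(G_left + G_right, H_left + H_right)
    ) - gamma


# Hypothetical gradient statistics for a candidate split.
G_left, H_left = -4.0, 3.0
G_right, H_right = 2.0, 2.0

for lam in (0.0, 1.0):
    print(f"lambda={lam}: left leaf weight = {leaf_weight(G_left, H_left, lam):.3f}")

for lam, gam in ((0.0, 0.0), (1.0, 0.0), (1.0, 3.0)):
    gain = split_gain(G_left, H_left, G_right, H_right, lam, gam)
    verdict = "keep split" if gain > 0 else "prune split"
    print(f"lambda={lam}, gamma={gam}: gain = {gain:.3f} -> {verdict}")
```

A larger `reg_lambda` pulls leaf values toward zero, and a larger `gamma` raises the bar a split must clear, so both make the fitted trees more conservative.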

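**Q11. What are some hyperparameters to tune in Gradient Boosting models?**

Common hyperparameters include the learning rate, the number of estimators, the maximum tree depth, the minimum samples required to split a node, and the subsample ratio. Libraries such as XGBoost additionally expose explicit L1 and L2 regularization terms.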
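
As a rough sketch of how these names map onto scikit-learn, the snippet below fixes some of them on a `GradientBoostingClassifier` and searches over two others. The dataset, the grid values, and the fixed settings are arbitrary choices for illustration, not recommended defaults.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Hyperparameters held fixed for this sketch.
base_model = GradientBoostingClassifier(
    n_estimators=50,        # number of boosting stages (trees)
    min_samples_split=4,    # minimum samples needed to split a node
    subsample=0.8,          # fraction of rows sampled for each tree
    random_state=0,
)

# Hyperparameters searched over with 3-fold cross-validation.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

search = GridSearchCV(base_model, param_grid, cv=3)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("Best params:", search.best_params_)
print("Test accuracy:", round(test_accuracy, 3))
```

Learning rate and number of estimators interact strongly (a smaller learning rate usually needs more trees), so in practice they are often tuned together.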

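**Q12. What is the concept of Feature Importance in Boosting?**

Feature importance in Boosting measures how much each feature contributes to the ensemble. Two common measures are how frequently a feature is used for splits and how much those splits reduce the loss or impurity; scikit-learn's `feature_importances_` is of the second kind, while XGBoost's `plot_importance` counts splits by default.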
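
The two ingredients, split frequency and error reduction, can rank features differently. The sketch below computes both from a hypothetical log of splits; the feature names and gain values are invented for illustration and do not come from any real model.

```python
from collections import Counter, defaultdict

# Hypothetical split log: (feature used, loss reduction achieved by the split).
splits = [
    ("age", 12.0),
    ("income", 30.0),
    ("age", 3.0),
    ("tenure", 5.0),
    ("income", 10.0),
    ("age", 1.0),
]

# Frequency-based importance: how often each feature is chosen for a split.
split_counts = Counter(name for name, _ in splits)

# Gain-based importance: total loss reduction per feature, normalized to sum to 1.
total_gain = defaultdict(float)
for name, gain in splits:
    total_gain[name] += gain
gain_sum = sum(total_gain.values())
gain_importance = {name: g / gain_sum for name, g in total_gain.items()}

print("By split count:", split_counts.most_common())
print("By gain share:", {k: round(v, 3) for k, v in gain_importance.items()})
```

Here `age` is used most often, yet `income` accounts for most of the error reduction, so it is worth checking which measure a library reports before interpreting a ranking.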

**Q13. Why is CatBoost efficient for categorical data?**

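CatBoost encodes categorical features internally with ordered target statistics: each row's category is replaced by a target-based statistic computed only from rows that precede it in a random permutation, so a row's own label never leaks into its encoding.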
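
A minimal sketch of the ordered-statistics idea follows. It is a simplified illustration, not CatBoost's actual implementation: the function name `ordered_target_stats`, the smoothing weight `a`, and the toy data are assumptions, and CatBoost itself uses several permutations and further refinements.

```python
import random


def ordered_target_stats(categories, targets, prior, a=1.0, seed=0):
    """Encode each row using only the targets of rows seen earlier in a random order."""
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)

    sums, counts = {}, {}
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        # Smoothed mean of earlier targets for this category; equals the prior
        # the first time a category is seen.
        encoded[i] = (s + a * prior) / (n + a)
        # Only now does this row's own target become visible to later rows.
        sums[c] = s + targets[i]
        counts[c] = n + 1
    return encoded


categories = ["red", "blue", "red", "red", "blue", "green"]
targets = [1, 0, 1, 0, 1, 1]
prior = sum(targets) / len(targets)

encoded = ordered_target_stats(categories, targets, prior)
for cat, y, e in zip(categories, targets, encoded):
    print(f"{cat:>5}  target={y}  encoded={e:.3f}")
```

Because a row's encoding is computed before its own target is added to the running totals, flipping that row's label leaves its encoded value unchanged, which is the leakage-avoidance property.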

## 🧪 Practical Questions

**Q14. Train an AdaBoost Classifier on a sample dataset and print model accuracy**

In [None]:
# Q14: AdaBoost Classifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
model = AdaBoostClassifier(n_estimators=50)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

**Q15. Train an AdaBoost Regressor and evaluate performance using Mean Absolute Error (MAE)**

In [None]:
# Q15: AdaBoost Regressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
model = AdaBoostRegressor(n_estimators=50)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))

**Q16. Train a Gradient Boosting Classifier on the Breast Cancer dataset and print feature importance**

In [None]:
# Q16: Gradient Boosting Classifier Feature Importance
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
model = GradientBoostingClassifier()
model.fit(X_train, y_train)
print("Feature Importance:", model.feature_importances_)

**Q17. Train a Gradient Boosting Regressor and evaluate using R-Squared Score**

In [None]:
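# Q17: Gradient Boosting Regressor R2 Score
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Use a regression dataset; the split from Q16 holds classification labels.
# Separate names keep that classification split intact for later cells.
Xr, yr = fetch_california_housing(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=42)
model = GradientBoostingRegressor()
model.fit(Xr_train, yr_train)
print("R2 Score:", r2_score(yr_test, model.predict(Xr_test)))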

**Q18. Train an XGBoost Classifier on a dataset and compare accuracy with Gradient Boosting**

In [None]:
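# Q18: XGBoost vs. Gradient Boosting accuracy (same breast cancer split as Q16)
from xgboost import XGBClassifier

models = {"XGBoost": XGBClassifier(), "Gradient Boosting": GradientBoostingClassifier()}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name} Accuracy:", model.score(X_test, y_test))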

**Q19. Train a CatBoost Classifier and evaluate using F1-Score**

In [None]:
# Q19: CatBoost Classifier F1 Score
from catboost import CatBoostClassifier
from sklearn.metrics import f1_score

model = CatBoostClassifier(verbose=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("F1 Score:", f1_score(y_test, y_pred, average='macro'))

**Q20. Train an XGBoost Regressor and evaluate using Mean Squared Error (MSE)**

In [None]:
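# Q20: XGBoost Regressor MSE
from xgboost import XGBRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Regression data in separate names so the classification split from Q16 stays intact
Xr, yr = fetch_california_housing(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=42)
model = XGBRegressor()
model.fit(Xr_train, yr_train)
print("MSE:", mean_squared_error(yr_test, model.predict(Xr_test)))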

**Q21. Train an AdaBoost Classifier and visualize feature importance**

In [None]:
# Q21: AdaBoost Feature Importance Plot
import matplotlib.pyplot as plt

model = AdaBoostClassifier()
model.fit(X_train, y_train)
importance = model.feature_importances_
plt.bar(range(len(importance)), importance)
plt.title("Feature Importance")
plt.show()

**Q22. Train a Gradient Boosting Regressor and plot learning curves**

In [None]:
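# Q22: Gradient Boosting Regressor Learning Curves
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Regression data in separate names so the classification split from Q16 stays intact
Xr, yr = fetch_california_housing(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=42)

# Fit once, then score the ensemble after each boosting stage
model = GradientBoostingRegressor(n_estimators=50)
model.fit(Xr_train, yr_train)
train_scores = [r2_score(yr_train, p) for p in model.staged_predict(Xr_train)]
test_scores = [r2_score(yr_test, p) for p in model.staged_predict(Xr_test)]

n_trees = range(1, len(train_scores) + 1)
plt.plot(n_trees, train_scores, label='Train')
plt.plot(n_trees, test_scores, label='Test')
plt.xlabel("Number of trees")
plt.ylabel("R2 score")
plt.legend()
plt.title("Learning Curve")
plt.show()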

**Q23. Train an XGBoost Classifier and visualize feature importance**

In [None]:
# Q23: XGBoost Feature Importance Plot
from xgboost import plot_importance

model = XGBClassifier()
model.fit(X_train, y_train)
plot_importance(model)
plt.show()

**Q24. Train a CatBoost Classifier and plot the confusion matrix**

In [None]:
# Q24: CatBoost Confusion Matrix
from sklearn.metrics import confusion_matrix
import seaborn as sns

model = CatBoostClassifier(verbose=0)
model.fit(X_train, y_train)
cm = confusion_matrix(y_test, model.predict(X_test))
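sns.heatmap(cm, annot=True, fmt='d')  # fmt='d' prints integer counts, not 1e+02
plt.xlabel("Predicted label")
plt.ylabel("True label")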
plt.title("Confusion Matrix")
plt.show()

**Q25. Train an AdaBoost Classifier with different numbers of estimators and compare accuracy**

In [None]:
# Q25: AdaBoost varying estimators
for n in [10, 50, 100]:
    model = AdaBoostClassifier(n_estimators=n)
    model.fit(X_train, y_train)
    print(f"Estimators={n}, Accuracy:", model.score(X_test, y_test))

**Q26. Train a Gradient Boosting Classifier and visualize the ROC curve**

In [None]:
# Q26: Gradient Boosting ROC Curve
from sklearn.metrics import roc_curve

model = GradientBoostingClassifier()
model.fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, probs)
plt.plot(fpr, tpr)
plt.title("ROC Curve")
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.show()

**Q27. Train an XGBoost Regressor and tune the learning rate using GridSearchCV**

In [None]:
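# Q27: XGBoost GridSearchCV Learning Rate
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBRegressor

# Regression data in separate names so the classification split from Q16 stays intact
Xr, yr = fetch_california_housing(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=42)
grid = GridSearchCV(XGBRegressor(), {'learning_rate': [0.01, 0.1, 0.2]}, cv=3)
grid.fit(Xr_train, yr_train)
print("Best Params:", grid.best_params_)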

**Q28. Train a CatBoost Classifier on an imbalanced dataset and compare performance with class weighting**

In [None]:
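# Q28: CatBoost with and without class weighting
# (the breast cancer split from Q16 is only mildly imbalanced, roughly 63/37)
from sklearn.utils import class_weight
from sklearn.metrics import f1_score

weights = class_weight.compute_sample_weight('balanced', y_train)
for name, w in [("Unweighted", None), ("Balanced weights", weights)]:
    model = CatBoostClassifier(verbose=0)
    model.fit(X_train, y_train, sample_weight=w)
    # Macro F1 is more informative than accuracy when classes are imbalanced
    print(name, "F1:", f1_score(y_test, model.predict(X_test), average='macro'))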

**Q29. Train an AdaBoost Classifier and analyze the effect of different learning rates**

In [None]:
# Q29: AdaBoost - Effect of Learning Rate
for lr in [0.01, 0.1, 1.0]:
    model = AdaBoostClassifier(learning_rate=lr)
    model.fit(X_train, y_train)
    print(f"Learning Rate={lr}, Accuracy:", model.score(X_test, y_test))

**Q30. Train an XGBoost Classifier for multi-class classification and evaluate using log-loss**

In [None]:
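# Q30: XGBoost Multi-Class Classification
from sklearn.metrics import log_loss

# Iris has three classes; the breast cancer split used above is binary
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
model = XGBClassifier(objective='multi:softprob')  # num_class is inferred from y
model.fit(X_train, y_train)
print("Log Loss:", log_loss(y_test, model.predict_proba(X_test), labels=model.classes_))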