---

### 1. What is Boosting in Machine Learning?

Boosting is an ensemble machine learning technique that combines multiple weak learners (models that perform slightly better than random guessing, e.g., shallow decision trees) to create a strong predictive model. Unlike standalone models, boosting trains weak learners sequentially, where each learner focuses on correcting the errors of its predecessors, improving overall accuracy.

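- **Key Idea**: Boosting primarily reduces bias (and often variance as well) by iteratively learning from mistakes: each new learner concentrates on the instances its predecessors misclassified or predicted poorly, either by giving them higher weights (AdaBoost) or by fitting their residuals (Gradient Boosting).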
- **Example**: AdaBoost, Gradient Boosting, XGBoost, and CatBoost are popular boosting algorithms.

---

### 2. How does Boosting differ from Bagging?

Boosting and Bagging are both ensemble techniques, but they differ significantly in their approach. Since you’ve explored Bagging Classifiers and Regressors (e.g., with Decision Trees, SVM, Logistic Regression), here’s a direct comparison:

| **Aspect**              | **Bagging**                                                                 | **Boosting**                                                                |
|-------------------------|-----------------------------------------------------------------------------|-----------------------------------------------------------------------------|
| **Name**                | Short for Bootstrap Aggregating.                                            | Not an acronym; refers to “boosting” weak learners into a strong one.       |
| **Training**            | Trains multiple models independently in parallel on random bootstrap samples. | Trains models sequentially, with each model learning from previous errors. |
| **Instance Weighting**  | All instances have equal weight; uses random sampling with replacement.      | Adjusts weights of instances, giving more focus to misclassified ones.      |
| **Model Dependency**    | Models are independent; no interaction between them.                        | Models are dependent; each corrects the mistakes of the previous ones.      |
| **Objective**           | Reduces variance by averaging (e.g., Random Forest, Bagging Classifier).     | Primarily reduces bias (and often variance) by focusing on hard-to-predict instances. |
| **Example**             | Bagging Classifier with Decision Trees (as in your previous tasks).         | AdaBoost, Gradient Boosting, XGBoost.                                      |

- **Connection to Your Work**: In your Bagging Classifier tasks (e.g., using Decision Trees or SVM), models were trained independently on bootstrap samples and combined via majority voting or averaging. Boosting, however, would adjust weights to focus on misclassified instances, potentially improving accuracy on difficult cases.

---

### 3. What is the key idea behind AdaBoost?

**AdaBoost** (Adaptive Boosting) is a boosting algorithm that combines weak learners (typically decision stumps, i.e., single-level decision trees) into a strong classifier by focusing on misclassified instances.

- **Key Idea**: AdaBoost assigns weights to training instances, initially equal. After each weak learner is trained, it increases the weights of misclassified instances and decreases the weights of correctly classified ones. Subsequent learners focus more on the harder instances. The final prediction is a weighted combination of all weak learners’ predictions, where each learner’s weight depends on its accuracy.

---

### 4. Explain the working of AdaBoost with an example.

**How AdaBoost Works**:
1. **Initialize Weights**: Assign equal weights to all training instances (e.g., for \( N \) instances, each has weight \( 1/N \)).
2. **Train Weak Learner**: Train a weak learner (e.g., a decision stump) on the weighted dataset.
3. **Compute Error**: Calculate the weighted error rate of the learner.
4. **Assign Learner Weight**: Compute the learner’s weight based on its error (lower error → higher weight).
5. **Update Instance Weights**: Increase weights for misclassified instances and decrease for correctly classified ones, then normalize weights.
6. **Repeat**: Train subsequent learners on the updated weights, iterating for a fixed number of rounds or until convergence.
7. **Final Prediction**: Combine weak learners’ predictions via a weighted majority vote (classification) or a weighted median or average (regression; scikit-learn’s `AdaBoostRegressor` uses the weighted median).
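
The seven steps above can be sketched from scratch. This is a minimal illustration, not scikit-learn's implementation: it assumes binary labels encoded as -1/+1, uses exhaustive threshold search for the stumps, and the helper names (`fit_stump`, `adaboost_fit`, `adaboost_predict`) and the toy data are made up for this example.

```python
import numpy as np

def fit_stump(X, y, w):
    """Return (feature, threshold, polarity, weighted_error) of the best decision stump."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(X[:, j] > t, polarity, -polarity)
                err = np.sum(w[pred != y])          # step 3: weighted error
                if err < best[3]:
                    best = (j, t, polarity, err)
    return best

def stump_predict(X, feature, threshold, polarity):
    return np.where(X[:, feature] > threshold, polarity, -polarity)

def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost for labels in {-1, +1} (illustrative sketch)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                          # step 1: equal weights
    ensemble = []
    for _ in range(n_rounds):                        # step 6: repeat
        j, t, p, err = fit_stump(X, y, w)            # step 2: train weak learner
        err = np.clip(err, 1e-10, 1 - 1e-10)         # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)        # step 4: learner weight
        pred = stump_predict(X, j, t, p)
        w = w * np.exp(-alpha * y * pred)            # step 5: reweight instances
        w /= w.sum()                                 # ...and normalize
        ensemble.append((alpha, j, t, p))
    return ensemble

def adaboost_predict(X, ensemble):
    # step 7: weighted majority vote
    score = sum(a * stump_predict(X, j, t, p) for a, j, t, p in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy 1-D data that no single stump can separate (+ + - - + +)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([1, 1, -1, -1, 1, 1])

model = adaboost_fit(X, y, n_rounds=5)
train_accuracy = np.mean(adaboost_predict(X, model) == y)
print(f"Training accuracy after 5 rounds: {train_accuracy:.2f}")
```

A single stump gets at most 4 of these 6 points right, while the weighted vote of a few stumps can fit the middle band, which is the effect the steps describe.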

**Example**:
Suppose you’re classifying Iris flowers (Setosa vs. Non-Setosa) using AdaBoost with decision stumps:
- **Dataset**: 10 samples, initially each with weight \( 1/10 = 0.1 \).
- **Step 1**: Train a decision stump (e.g., “If petal length > 2.5, predict Non-Setosa”). It misclassifies 3 instances.
- **Step 2**: Compute the stump’s error (weighted error = 0.3) and assign it a weight (e.g., \( \alpha = 0.5 \cdot \ln((1-0.3)/0.3) \approx 0.424 \)).
- **Step 3**: Increase the weights of the 3 misclassified instances (from 0.1 to about 0.167 each after normalization) and decrease the others (to about 0.071 each), so the misclassified instances now carry half of the total weight.
- **Step 4**: Train the next stump on the new weights, focusing on the misclassified instances.
- **Step 5**: Repeat for, say, 10 stumps.
- **Final Model**: Predict by summing the weighted votes of all stumps (e.g., a sample gets a score based on each stump’s prediction and weight).

**Output**: The final model is more accurate than a single stump, as it emphasizes harder-to-classify samples.
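
The arithmetic in this example can be checked directly. The short sketch below assumes the standard discrete AdaBoost update (multiply by \( e^{\alpha} \) for misclassified and \( e^{-\alpha} \) for correctly classified instances, then normalize); the variable names are illustrative.

```python
import math

n_samples, n_misclassified = 10, 3
initial_weight = 1.0 / n_samples

# Weighted error and learner weight of the first stump
error = n_misclassified * initial_weight
alpha = 0.5 * math.log((1 - error) / error)

# Unnormalized weight updates
raw_wrong = initial_weight * math.exp(alpha)
raw_right = initial_weight * math.exp(-alpha)

# Normalize so that all weights sum to 1 again
total = n_misclassified * raw_wrong + (n_samples - n_misclassified) * raw_right
new_wrong = raw_wrong / total
new_right = raw_right / total

print(f"alpha = {alpha:.4f}")
print(f"misclassified: {raw_wrong:.4f} before, {new_wrong:.4f} after normalization")
print(f"correct:       {raw_right:.4f} before, {new_right:.4f} after normalization")
```

The misclassified weight rises to roughly 0.153 before normalization and 0.167 after it, while each correctly classified instance drops to about 0.071.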

---

### 5. What is Gradient Boosting, and how is it different from AdaBoost?

**Gradient Boosting** is a boosting algorithm that builds an ensemble of weak learners (typically decision trees) sequentially, where each learner corrects the errors of the previous ones by minimizing a loss function using gradient descent.

- **How It Works**: Instead of adjusting instance weights like AdaBoost, Gradient Boosting fits each new learner to the *residual errors* (negative gradients of the loss function) of the previous ensemble’s predictions. The final prediction is the sum of all learners’ predictions, scaled by a learning rate.
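
The residual-fitting loop can be sketched in a few lines. This is a simplified illustration for squared-error loss on a single feature, using hand-written regression stumps rather than any library's trees; the function names and the synthetic sine data are assumptions made for the example.

```python
import numpy as np

def fit_regression_stump(x, residuals):
    """Pick the threshold on a 1-D feature that minimizes squared error on the residuals."""
    best = None
    for t in np.unique(x)[:-1]:          # skip the last value so both sides are non-empty
        left, right = residuals[x <= t], residuals[x > t]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, left_value, right_value = best
    return t, left_value, right_value

def stump_predict(x, stump):
    t, left_value, right_value = stump
    return np.where(x <= t, left_value, right_value)

def gradient_boost_fit(x, y, n_rounds=50, learning_rate=0.1):
    base = y.mean()                                  # initial constant prediction
    pred = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - pred                         # negative gradient of 1/2 (y - pred)^2
        stump = fit_regression_stump(x, residuals)   # new learner fits the residuals
        pred += learning_rate * stump_predict(x, stump)
        stumps.append(stump)
    return base, stumps

def gradient_boost_predict(x, base, stumps, learning_rate=0.1):
    return base + learning_rate * sum(stump_predict(x, s) for s in stumps)

# Synthetic data: a noisy sine wave
rng = np.random.default_rng(0)
x = np.linspace(0, 6, 80)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

base, stumps = gradient_boost_fit(x, y)
mse_baseline = np.mean((y - base) ** 2)
mse_boosted = np.mean((y - gradient_boost_predict(x, base, stumps)) ** 2)
print(f"MSE of constant baseline: {mse_baseline:.4f}")
print(f"MSE after boosting:       {mse_boosted:.4f}")
```

Each round shrinks the training error a little, which is why a small learning rate needs more rounds to reach the same fit.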

**Differences from AdaBoost**:
| **Aspect**              | **AdaBoost**                                                       | **Gradient Boosting**                                              |
|-------------------------|--------------------------------------------------------------------|--------------------------------------------------------------------|
| **Error Correction**    | Adjusts instance weights based on misclassifications.              | Fits new learners to residuals (negative gradients) of the loss.   |
| **Loss Function**       | Implicitly uses exponential loss for classification.               | Explicitly optimizes a user-defined loss (e.g., MSE, log loss).    |
| **Learner Weighting**   | Weights learners based on their error rate.                        | Scales learners’ contributions with a learning rate.               |
| **Flexibility**         | Primarily for classification, less flexible loss functions.        | Supports various loss functions for regression and classification. |
| **Example**             | Focuses on misclassified points (e.g., in Iris classification).    | Fits trees to residuals (e.g., in California Housing regression). |

- **Connection to Your Work**: In your Bagging Regressor tasks (e.g., with Decision Trees), you averaged independent trees to reduce variance. Gradient Boosting, unlike Bagging, would sequentially fit trees to residuals, reducing bias and potentially achieving lower MSE.

---

### 6. What is the loss function in Gradient Boosting?

The loss function in Gradient Boosting measures the error between predicted and actual values, guiding the optimization process. Each new weak learner is trained to minimize this loss by following its negative gradient.

- **Common Loss Functions**:
  - **Regression**:
    - **Mean Squared Error (MSE)**: \( L(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2 \). Used in your Bagging Regressor tasks for California Housing.
    - **Mean Absolute Error (MAE)**: \( L(y, \hat{y}) = |y - \hat{y}| \).
  - **Classification**:
    - **Log Loss (Logistic Loss)**: For binary classification, \( L(y, \hat{y}) = -[y \log(\hat{y}) + (1-y) \log(1-\hat{y})] \).
    - **Exponential Loss**: Used implicitly in AdaBoost, also available in Gradient Boosting.
  - **Custom Losses**: Gradient Boosting allows custom loss functions, as long as they are differentiable.

- **Role**: The loss function determines the residuals (negative gradients) that each new learner fits. For example, in regression with MSE, residuals are \( y - \hat{y} \).
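
As a rough check of how the loss determines what each learner fits, the sketch below writes out the negative gradients for the three losses above and compares them with finite differences. It assumes the log-loss gradient is taken with respect to the raw score (log-odds) \( F \), with \( \hat{y} = \sigma(F) \), which is the usual parameterization in gradient boosting; the function names are illustrative.

```python
import numpy as np

def mse_loss(y, pred):
    return 0.5 * (y - pred) ** 2

def mae_loss(y, pred):
    return np.abs(y - pred)

def log_loss_from_score(y, raw_score):
    p = 1.0 / (1.0 + np.exp(-raw_score))             # sigmoid turns score into probability
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def neg_grad_mse(y, pred):
    return y - pred                                  # the ordinary residual

def neg_grad_mae(y, pred):
    return np.sign(y - pred)                         # only the direction of the residual

def neg_grad_log_loss(y, raw_score):
    return y - 1.0 / (1.0 + np.exp(-raw_score))      # label minus predicted probability

def numeric_neg_grad(loss, y, pred, eps=1e-6):
    """Central finite-difference estimate of -dL/dpred, element by element."""
    return -(loss(y, pred + eps) - loss(y, pred - eps)) / (2 * eps)

y_reg = np.array([3.0, -1.0, 2.0])
pred_reg = np.array([2.5, 0.0, 4.0])
y_cls = np.array([1.0, 0.0, 1.0])
score_cls = np.array([0.2, -1.0, 2.0])

checks = {
    "mse": np.allclose(neg_grad_mse(y_reg, pred_reg),
                       numeric_neg_grad(mse_loss, y_reg, pred_reg), atol=1e-5),
    "mae": np.allclose(neg_grad_mae(y_reg, pred_reg),
                       numeric_neg_grad(mae_loss, y_reg, pred_reg), atol=1e-5),
    "log_loss": np.allclose(neg_grad_log_loss(y_cls, score_cls),
                            numeric_neg_grad(log_loss_from_score, y_cls, score_cls), atol=1e-5),
}
print(checks)
```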

---

### 7. How does XGBoost improve over traditional Gradient Boosting?

**XGBoost** (Extreme Gradient Boosting) is an optimized implementation of Gradient Boosting with several enhancements:

1. **Regularization**: Adds L1 (Lasso) and L2 (Ridge) regularization to penalize complex models, reducing overfitting.
2. **Second-Order Optimization**: Uses second-order derivatives (Hessian) of the loss function for faster convergence, unlike traditional Gradient Boosting’s first-order approach.
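3. **Parallelization**: Parallelizes split finding across features and threads within each tree (the trees themselves are still built sequentially), making it faster than traditional Gradient Boosting implementations.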
4. **Handling Missing Values**: Automatically learns how to handle missing data by assigning them to the best branch.
5. **Tree Pruning**: Grows each tree up to `max_depth`, then prunes back splits whose gain falls below the `gamma` threshold, rather than stopping at the first split that fails to improve the loss.
6. **Weighted Quantile Sketch**: Efficiently handles large datasets by approximating split points.
7. **Sparsity Awareness**: Optimizes for sparse data (e.g., one-hot encoded features).
8. **Cross-Validation**: Provides a built-in cross-validation routine (`xgb.cv`) and early stopping, which help choose the number of boosting rounds and limit overfitting.

- **Connection to Your Work**: Compared to your Bagging Regressor (e.g., MSE of ~0.2558 on California Housing), XGBoost typically achieves lower MSE due to its sequential learning and regularization, especially on complex datasets.

---

### 8. What is the difference between XGBoost and CatBoost?

| **Aspect**              | **XGBoost**                                                                 | **CatBoost**                                                               |
|-------------------------|-----------------------------------------------------------------------------|---------------------------------------------------------------------------|
| **Categorical Features** | Traditionally requires manual encoding (e.g., one-hot encoding); recent versions add optional native categorical support. | Natively handles categorical features using ordered target encoding.       |
| **Training Speed**      | Fast due to parallelization, but encoding categorical data adds overhead.   | Often faster for datasets with categorical features due to native handling.|
| **Overfitting Control** | Uses L1/L2 regularization and tree pruning.                                | Uses symmetric trees and ordered boosting to reduce overfitting.           |
| **Ease of Use**         | Requires more preprocessing for categorical data.                          | Simpler to use with categorical data; less preprocessing needed.           |
| **Performance**         | Strong on numerical data; may overfit with small datasets.                 | Robust for mixed data types; performs well on small datasets.              |
| **Implementation**      | Open-source, widely used, with GPU support.                                | Open-source, with GPU support, optimized for categorical data.             |

- **Connection to Your Work**: If you used a dataset with categorical features in your Bagging tasks, CatBoost would be more efficient than XGBoost, as it avoids manual encoding.

---

### 9. What are some real-world applications of Boosting techniques?

Boosting techniques like AdaBoost, Gradient Boosting, XGBoost, and CatBoost are widely used in real-world applications due to their high accuracy:
1. **Finance**: Credit scoring, fraud detection (e.g., XGBoost for detecting fraudulent transactions).
2. **Healthcare**: Disease prediction (e.g., CatBoost for predicting cancer risk from medical records).
3. **Marketing**: Customer churn prediction, recommendation systems (e.g., Gradient Boosting for predicting user preferences).
4. **Natural Language Processing**: Text classification, sentiment analysis (e.g., XGBoost for spam detection).
5. **Retail**: Sales forecasting, inventory optimization (e.g., Gradient Boosting for demand prediction, similar to your California Housing regression tasks).
6. **Competitions**: Kaggle and other machine learning competitions, where XGBoost and CatBoost often dominate due to their performance.

---

### 10. How does regularization help in XGBoost?

Regularization in XGBoost prevents overfitting by adding penalties to the loss function, discouraging overly complex models:
- **L1 Regularization (Lasso)**: Adds the absolute value of leaf weights (\( \sum |w| \)) to the loss, promoting sparsity (fewer non-zero weights).
- **L2 Regularization (Ridge)**: Adds the squared value of leaf weights (\( \sum w^2 \)) to the loss, penalizing large weights to smooth predictions.
- **Benefits**:
  - Reduces overfitting by constraining tree complexity.
  - Improves generalization to unseen data.
  - Stabilizes predictions on noisy datasets (e.g., unlike your Bagging Regressor, which relies solely on averaging to reduce variance).
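- **Parameters**: Controlled by `lambda` (L2) and `alpha` (L1) in XGBoost (`reg_lambda` and `reg_alpha` in the scikit-learn wrapper). A related parameter, `gamma`, sets the minimum loss reduction required to make a split.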
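
To make the effect of these penalties concrete, the sketch below computes a leaf's weight and a split's gain from summed gradients \( G \) and Hessians \( H \), following the formulas in the XGBoost paper: \( w^* = -G / (H + \lambda) \), with L1 applied as soft-thresholding on \( G \). It is a simplified illustration, not XGBoost's source code. The function names are made up, `gamma` is XGBoost's minimum split loss parameter, and the example assumes squared-error loss, where each instance has gradient `pred - y` and Hessian 1.

```python
def soft_threshold(g, reg_alpha):
    """Shrink g toward zero by reg_alpha (the effect of the L1 penalty)."""
    if g > reg_alpha:
        return g - reg_alpha
    if g < -reg_alpha:
        return g + reg_alpha
    return 0.0

def leaf_weight(G, H, reg_lambda=1.0, reg_alpha=0.0):
    """Optimal leaf value given summed gradients G and Hessians H."""
    return -soft_threshold(G, reg_alpha) / (H + reg_lambda)

def leaf_score(G, H, reg_lambda, reg_alpha):
    return soft_threshold(G, reg_alpha) ** 2 / (H + reg_lambda)

def split_gain(G_left, H_left, G_right, H_right,
               reg_lambda=1.0, reg_alpha=0.0, gamma=0.0):
    """Loss reduction from splitting a node; a negative value means the split is not worth it."""
    parent = leaf_score(G_left + G_right, H_left + H_right, reg_lambda, reg_alpha)
    children = (leaf_score(G_left, H_left, reg_lambda, reg_alpha)
                + leaf_score(G_right, H_right, reg_lambda, reg_alpha))
    return 0.5 * (children - parent) - gamma

# A leaf holding residuals (y - pred) of 1, 2 and 3: G = -(1 + 2 + 3) = -6, H = 3
w_unregularized = leaf_weight(-6.0, 3.0, reg_lambda=0.0)            # plain mean residual
w_l2 = leaf_weight(-6.0, 3.0, reg_lambda=1.0)                       # shrunk by L2
w_l1_l2 = leaf_weight(-6.0, 3.0, reg_lambda=1.0, reg_alpha=1.0)     # shrunk further by L1
print(w_unregularized, w_l2, w_l1_l2)

# Splitting off a second group with residuals -2 and -2: G = 4, H = 2
gain = split_gain(-6.0, 3.0, 4.0, 2.0, reg_lambda=1.0)
gain_with_gamma = split_gain(-6.0, 3.0, 4.0, 2.0, reg_lambda=1.0, gamma=10.0)
print(gain, gain_with_gamma)
```

With no regularization the leaf predicts the mean residual (2.0); L2 shrinks it to 1.5 and adding L1 shrinks it to 1.25. A large enough `gamma` turns a useful split into one with negative gain, which is how it limits tree growth.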

---

### 11. What are some hyperparameters to tune in Gradient Boosting models?

Key hyperparameters to tune in Gradient Boosting models (e.g., Gradient Boosting, XGBoost, CatBoost):
1. **n_estimators**: Number of boosting iterations (trees). More trees usually improve the fit but increase computation and, without a low learning rate or early stopping, can overfit (e.g., you used 10 in Bagging tasks; 100–1000 is common for boosting).
2. **learning_rate**: Scales each tree’s contribution. Lower values (e.g., 0.01–0.1) improve accuracy but require more trees.
3. **max_depth**: Maximum depth of each tree. Deeper trees (e.g., 3–10) capture complex patterns but risk overfitting.
4. **min_child_weight** (XGBoost): Minimum sum of instance weights (Hessians) required in a child node. Higher values make the model more conservative and help prevent overfitting.
5. **subsample**: Fraction of samples used per tree (e.g., 0.5–1.0). Similar to `max_samples` in your Bagging tasks but applied sequentially.
6. **colsample_bytree**: Fraction of features used per tree (e.g., 0.5–1.0), similar to Random Forest’s feature randomness.
7. **lambda/alpha** (XGBoost): L2/L1 regularization strength to control overfitting.
8. **depth** (CatBoost): Controls tree depth, similar to `max_depth`.
9. **iterations** (CatBoost): Equivalent to `n_estimators`.

- **Tuning Strategy**: Use grid search or random search with cross-validation (like in your Bagging cross-validation task) to find optimal values.

---

### 12. What is the concept of Feature Importance in Boosting?

**Feature Importance** in boosting measures how much each feature contributes to the model’s predictions:
- **How It’s Calculated**:
  - **Gain**: The improvement in loss (e.g., MSE in regression) from splits using a feature, averaged across all splits on that feature (XGBoost’s `gain`; scikit-learn’s Gradient Boosting reports a similar impurity-based importance in `feature_importances_`).
  - **Weight**: Number of times a feature is used in splits (XGBoost’s `weight`).
  - **Cover**: Average number of samples affected by splits on a feature (XGBoost’s `cover`).
- **Interpretation**: Higher importance indicates a feature is more influential in reducing the loss or improving predictions.
- **Use Cases**: Feature selection, understanding model behavior, and identifying key predictors (e.g., in your California Housing tasks, median income might have high importance).

- **Connection to Your Work**: Unlike Bagging, where feature importance is less straightforward, boosting models like XGBoost provide clear importance scores to analyze which features drive predictions.
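
The three importance types can rank features differently. The sketch below aggregates them from a hand-written list of splits; the split records and their numbers are hypothetical (only the feature names come from the California Housing dataset), and the `importance` helper is illustrative rather than a library function.

```python
from collections import defaultdict

# Hypothetical splits collected from all trees of a boosted model
splits = [
    {"feature": "MedInc", "gain": 12.0, "cover": 100},
    {"feature": "MedInc", "gain": 4.0, "cover": 60},
    {"feature": "AveRooms", "gain": 3.0, "cover": 40},
    {"feature": "MedInc", "gain": 2.0, "cover": 30},
    {"feature": "Latitude", "gain": 5.0, "cover": 70},
]

def importance(splits, kind):
    """Aggregate per-feature importance as 'weight', 'gain' or 'cover'."""
    count = defaultdict(int)
    total_gain = defaultdict(float)
    total_cover = defaultdict(float)
    for s in splits:
        count[s["feature"]] += 1
        total_gain[s["feature"]] += s["gain"]
        total_cover[s["feature"]] += s["cover"]
    if kind == "weight":                       # how often the feature is split on
        return dict(count)
    if kind == "gain":                         # average loss reduction per split
        return {f: total_gain[f] / count[f] for f in count}
    if kind == "cover":                        # average number of samples per split
        return {f: total_cover[f] / count[f] for f in count}
    raise ValueError(f"unknown importance type: {kind}")

for kind in ("weight", "gain", "cover"):
    scores = importance(splits, kind)
    top = max(scores, key=scores.get)
    print(f"{kind:>6}: {scores} -> top feature: {top}")
```

In this toy data `MedInc` leads by weight and gain, but `Latitude` leads by cover, so it is worth stating which importance type a reported ranking uses.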

---

### 13. Why is CatBoost efficient for categorical data?

**CatBoost** (Categorical Boosting) is designed to handle categorical features efficiently:
1. **Native Categorical Handling**: Uses ordered target encoding, converting categories to numerical values based on target statistics (e.g., mean target value per category), avoiding manual one-hot encoding required in XGBoost.
2. **Ordered Boosting**: Reduces overfitting by using a permutation-based approach to calculate target statistics, ensuring unbiased gradient estimates.
3. **Symmetric Trees**: Builds balanced trees, reducing memory usage and speeding up predictions compared to traditional Gradient Boosting.
4. **Robust to Overfitting**: Handles high-cardinality categorical features (e.g., many unique values) without exploding feature dimensionality.
5. **Efficiency**: Faster training and prediction for datasets with categorical features, as it skips preprocessing steps.

- **Connection to Your Work**: If your Bagging Classifier tasks involved datasets with categorical features, CatBoost would simplify preprocessing and potentially outperform Bagging by leveraging categorical data effectively.
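
The idea behind ordered target encoding can be sketched as follows. This is a simplified illustration and not CatBoost's actual implementation: it processes rows in the given order (CatBoost uses random permutations), and the `prior` and `prior_weight` smoothing parameters and the function name are assumptions made for the example. The key property is that each row is encoded using only the targets of rows that came before it, so a row's own label never leaks into its encoding.

```python
def ordered_target_encode(categories, targets, prior, prior_weight=1.0):
    """Encode each category using target statistics from earlier rows only."""
    sums, counts = {}, {}
    encoded = []
    for category, target in zip(categories, targets):
        seen_sum = sums.get(category, 0.0)
        seen_count = counts.get(category, 0)
        # Smoothed mean of the targets seen so far for this category
        encoded.append((seen_sum + prior_weight * prior) / (seen_count + prior_weight))
        # Only now add the current row's target to the running statistics
        sums[category] = seen_sum + target
        counts[category] = seen_count + 1
    return encoded

categories = ["red", "blue", "red", "red", "blue"]
targets = [1, 0, 1, 0, 1]

encoded = ordered_target_encode(categories, targets, prior=0.5)
for category, target, value in zip(categories, targets, encoded):
    print(f"{category:>4}  target={target}  encoded={value:.4f}")
```

The first occurrence of each category falls back to the prior (0.5 here), and later occurrences move toward the running mean of earlier targets, which keeps a single numeric column regardless of how many distinct categories exist.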

---



In [None]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train AdaBoost Classifier
adaboost_clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1, random_state=42),
    n_estimators=50,
    random_state=42
)
adaboost_clf.fit(X_train, y_train)

# Predict and calculate accuracy
y_pred = adaboost_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# Print accuracy
print("AdaBoost Classifier Accuracy: {:.4f}".format(accuracy))

AdaBoost Classifier Accuracy: 0.9333


In [None]:
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Load the California Housing dataset
data = fetch_california_housing()
X, y = data.data, data.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train AdaBoost Regressor
adaboost_reg = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=3, random_state=42),
    n_estimators=50,
    random_state=42
)
adaboost_reg.fit(X_train, y_train)

# Predict and calculate MAE
y_pred = adaboost_reg.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)

# Print MAE
print("AdaBoost Regressor Mean Absolute Error: {:.4f}".format(mae))

AdaBoost Regressor Mean Absolute Error: 0.6498


In [None]:
# 16. Train a Gradient Boosting Classifier on the Breast Cancer dataset and print feature importance.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import pandas as pd

# Load the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train Gradient Boosting Classifier
gb_clf = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb_clf.fit(X_train, y_train)

# Get feature importance
feature_importance = pd.DataFrame({
    'Feature': data.feature_names,
    'Importance': gb_clf.feature_importances_
}).sort_values(by='Importance', ascending=False)

# Print feature importance
print("Gradient Boosting Classifier Feature Importance:")
print(feature_importance)

Gradient Boosting Classifier Feature Importance:
                    Feature  Importance
7       mean concave points    0.450528
27     worst concave points    0.240103
20             worst radius    0.075589
22          worst perimeter    0.051408
21            worst texture    0.039886
23               worst area    0.038245
1              mean texture    0.027805
26          worst concavity    0.018725
16          concavity error    0.013068
13               area error    0.008415
10             radius error    0.006870
24         worst smoothness    0.004811
19  fractal dimension error    0.004224
11            texture error    0.003604
5          mean compactness    0.002996
15        compactness error    0.002511
4           mean smoothness    0.002467
17     concave points error    0.002038
28           worst symmetry    0.001478
12          perimeter error    0.001157
6            mean concavity    0.000922
18           symmetry error    0.000703
14         smoothness error    

In [None]:
# 17. Train a Gradient Boosting Regressor and evaluate using R-Squared Score.

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load the California Housing dataset
data = fetch_california_housing()
X, y = data.data, data.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train Gradient Boosting Regressor
gb_reg = GradientBoostingRegressor(n_estimators=100, random_state=42)
gb_reg.fit(X_train, y_train)

# Predict and calculate R² score
y_pred = gb_reg.predict(X_test)
r2 = r2_score(y_test, y_pred)

# Print R² score
print("Gradient Boosting Regressor R² Score: {:.4f}".format(r2))

Gradient Boosting Regressor R² Score: 0.7756


In [9]:
# 18. Train an XGBoost Classifier on a dataset and compare accuracy with Gradient Boosting.

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import xgboost as xgb

# Load the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train Gradient Boosting Classifier
gb_clf = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb_clf.fit(X_train, y_train)
gb_pred = gb_clf.predict(X_test)
gb_accuracy = accuracy_score(y_test, gb_pred)

# Initialize and train XGBoost Classifier
xgb_clf = xgb.XGBClassifier(n_estimators=100, eval_metric='logloss', random_state=42)
xgb_clf.fit(X_train, y_train)
xgb_pred = xgb_clf.predict(X_test)
xgb_accuracy = accuracy_score(y_test, xgb_pred)

# Print accuracies
print("Gradient Boosting Classifier Accuracy: {:.4f}".format(gb_accuracy))
print("XGBoost Classifier Accuracy: {:.4f}".format(xgb_accuracy))

Gradient Boosting Classifier Accuracy: 0.9561
XGBoost Classifier Accuracy: 0.9561


In [11]:
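# 19. Train a CatBoost Classifier and evaluate using F1-score.
# Requires the catboost package (e.g., `pip install catboost`); the run recorded below failed because it was not installed.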

from catboost import CatBoostClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train CatBoost Classifier
cat_clf = CatBoostClassifier(iterations=100, random_state=42, verbose=0)
cat_clf.fit(X_train, y_train)

# Predict and calculate F1-score
y_pred = cat_clf.predict(X_test)
f1 = f1_score(y_test, y_pred, average='macro')

# Print F1-score
print("CatBoost Classifier F1-Score (macro): {:.4f}".format(f1))

ModuleNotFoundError: No module named 'catboost'

In [12]:
# 20. Train an XGBoost Regressor and evaluate using Mean Squared Error (MSE).

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import xgboost as xgb

# Load the California Housing dataset
data = fetch_california_housing()
X, y = data.data, data.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train XGBoost Regressor
xgb_reg = xgb.XGBRegressor(n_estimators=100, random_state=42)
xgb_reg.fit(X_train, y_train)

# Predict and calculate MSE
y_pred = xgb_reg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)

# Print MSE
print("XGBoost Regressor Mean Squared Error: {:.4f}".format(mse))


XGBoost Regressor Mean Squared Error: 0.2226


In [14]:
# 21. Train an AdaBoost Classifier and visualize feature importance.

import matplotlib.pyplot as plt
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import pandas as pd

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train AdaBoost Classifier
adaboost_clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1, random_state=42),
    n_estimators=50,
    random_state=42
)
adaboost_clf.fit(X_train, y_train)

# Get feature importance
feature_importance = pd.DataFrame({
    'Feature': data.feature_names,
    'Importance': adaboost_clf.feature_importances_
}).sort_values(by='Importance', ascending=False)

# Print feature importance
print("AdaBoost Classifier Feature Importance:")
print(feature_importance)

# Visualize feature importance
plt.figure(figsize=(8, 6))
plt.barh(feature_importance['Feature'], feature_importance['Importance'], color='#1f77b4')
plt.xlabel('Importance')
plt.title('AdaBoost Classifier Feature Importance')
plt.gca().invert_yaxis()
plt.savefig('adaboost_feature_importance.png')
plt.close()

AdaBoost Classifier Feature Importance:
             Feature  Importance
3   petal width (cm)    0.423293
2  petal length (cm)    0.374844
1   sepal width (cm)    0.159977
0  sepal length (cm)    0.041886


In [17]:
# 22. Train a Gradient Boosting Regressor and plot learning curves.

import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import learning_curve

# Load the California Housing dataset
data = fetch_california_housing()
X, y = data.data, data.target

# Initialize Gradient Boosting Regressor
gb_reg = GradientBoostingRegressor(n_estimators=100, random_state=42)

# Compute learning curves with 5-fold cross-validation on the full dataset
# (learning_curve creates its own train/validation splits, so no separate split is needed)
train_sizes, train_scores, test_scores = learning_curve(
    gb_reg, X, y, cv=5, scoring='neg_mean_squared_error', train_sizes=np.linspace(0.1, 1.0, 10)
)

# Negate the scores and average across folds to get the mean MSE per training size
train_scores_mean = -np.mean(train_scores, axis=1)
test_scores_mean = -np.mean(test_scores, axis=1)

# Plot learning curves
plt.figure(figsize=(8, 6))
plt.plot(train_sizes, train_scores_mean, label='Training MSE', color='#1f77b4')
plt.plot(train_sizes, test_scores_mean, label='Validation MSE', color='#ff7f0e')
plt.xlabel('Training Set Size')
plt.ylabel('Mean Squared Error')
plt.title('Gradient Boosting Regressor Learning Curves')
plt.legend()
plt.grid(True)
plt.savefig('gb_learning_curves.png')
plt.close()

In [18]:
# 23. Train an XGBoost Classifier and visualize feature importance.

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import xgboost as xgb

# Load the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train XGBoost Classifier
xgb_clf = xgb.XGBClassifier(n_estimators=100, eval_metric='logloss', random_state=42)
xgb_clf.fit(X_train, y_train)

# Get feature importance
feature_importance = pd.DataFrame({
    'Feature': data.feature_names,
    'Importance': xgb_clf.feature_importances_
}).sort_values(by='Importance', ascending=False)

# Print feature importance
print("XGBoost Classifier Feature Importance:")
print(feature_importance)

# Visualize feature importance
plt.figure(figsize=(8, 6))
plt.barh(feature_importance['Feature'], feature_importance['Importance'], color='#1f77b4')
plt.xlabel('Importance')
plt.title('XGBoost Classifier Feature Importance')
plt.gca().invert_yaxis()
plt.savefig('xgb_feature_importance.png')
plt.close()

XGBoost Classifier Feature Importance:
                    Feature  Importance
27     worst concave points    0.285641
7       mean concave points    0.235738
22          worst perimeter    0.174253
20             worst radius    0.075987
23               worst area    0.057031
21            worst texture    0.021676
26          worst concavity    0.018667
12          perimeter error    0.018553
1              mean texture    0.014433
0               mean radius    0.012819
3                 mean area    0.012336
16          concavity error    0.011444
13               area error    0.008527
6            mean concavity    0.007317
10             radius error    0.005298
24         worst smoothness    0.005140
19  fractal dimension error    0.004664
4           mean smoothness    0.004639
11            texture error    0.004528
15        compactness error    0.004081
9    mean fractal dimension    0.003935
28           worst symmetry    0.003510
14         smoothness error    0.003320
1

In [19]:
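# 24. Train a CatBoost Classifier and plot the confusion matrix.
# Requires the catboost package (e.g., `pip install catboost`); the run recorded below failed because it was not installed.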

import matplotlib.pyplot as plt
from catboost import CatBoostClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train CatBoost Classifier
cat_clf = CatBoostClassifier(iterations=100, random_state=42, verbose=0)
cat_clf.fit(X_train, y_train)

# Predict and compute confusion matrix
y_pred = cat_clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

# Plot confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=data.target_names)
disp.plot(cmap='Blues')
plt.title('CatBoost Classifier Confusion Matrix')
plt.savefig('catboost_confusion_matrix.png')
plt.close()

ModuleNotFoundError: No module named 'catboost'

In [None]:
# 25. Train an AdaBoost Classifier with different numbers of estimators and compare accuracy.

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Test different numbers of estimators
n_estimators_list = [10, 50, 100, 200]

# Store accuracies
accuracies = []

for n in n_estimators_list:
    adaboost_clf = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1, random_state=42),
        n_estimators=n,
        random_state=42
    )
    adaboost_clf.fit(X_train, y_train)
    y_pred = adaboost_clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"AdaBoost Classifier (n_estimators={n}) Accuracy: {accuracy:.4f}")

In [None]:
# 26. Train a Gradient Boosting Classifier and visualize the ROC curve.

import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc

# Load the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train Gradient Boosting Classifier
gb_clf = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb_clf.fit(X_train, y_train)

# Predict probabilities
y_pred_proba = gb_clf.predict_proba(X_test)[:, 1]

# Compute ROC curve and AUC
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
roc_auc = auc(fpr, tpr)

# Plot ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='#1f77b4', label=f'ROC curve (AUC = {roc_auc:.4f})')
plt.plot([0, 1], [0, 1], color='gray', linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Gradient Boosting Classifier ROC Curve')
plt.legend()
plt.grid(True)
plt.savefig('gb_roc_curve.png')
plt.close()
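
To make the AUC value in the plot legend easier to interpret, here is a minimal pure-Python sketch of an equivalent definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name `pairwise_auc` and the toy data are illustrative assumptions; `sklearn.metrics.auc` instead integrates the ROC curve with the trapezoidal rule, which gives the same number.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 3 of the 4 positive/negative pairs are ordered correctly
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(toy_labels, toy_scores))  # 0.75
```

The double loop is quadratic in the number of samples, so this is meant for intuition rather than for large datasets.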

In [21]:
# 27. Train an XGBoost Regressor and tune the learning rate using GridSearchCV.

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error
import xgboost as xgb

# Load the California Housing dataset
data = fetch_california_housing()
X, y = data.data, data.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize XGBoost Regressor
xgb_reg = xgb.XGBRegressor(random_state=42)

# Define parameter grid
param_grid = {'learning_rate': [0.01, 0.1, 0.3]}

# Perform GridSearchCV
grid_search = GridSearchCV(xgb_reg, param_grid, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)
grid_search.fit(X_train, y_train)

# Get best model and predict
best_xgb = grid_search.best_estimator_
y_pred = best_xgb.predict(X_test)
mse = mean_squared_error(y_test, y_pred)

# Print results
print(f"Best Learning Rate: {grid_search.best_params_['learning_rate']}")
print(f"XGBoost Regressor Mean Squared Error: {mse:.4f}")

Best Learning Rate: 0.3
XGBoost Regressor Mean Squared Error: 0.2226
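
The MSE above is in squared target units, which makes it hard to read directly. A small sketch of the conversion to RMSE follows; it hard-codes the MSE printed above and relies on the California Housing target being the median house value in units of $100,000.

```python
import math

mse = 0.2226  # test MSE copied from the output above
rmse = math.sqrt(mse)

# Target unit is $100,000, so scale the RMSE to dollars for readability
print(f"RMSE: {rmse:.4f} (about ${rmse * 100_000:,.0f})")
```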


In [23]:
# 28. Train a CatBoost Classifier on an imbalanced dataset and compare performance with class weighting.

# Requires the catboost package (install with: pip install catboost)
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Generate imbalanced dataset
X, y = make_classification(n_classes=2, class_sep=2, weights=[0.9, 0.1], n_samples=1000, random_state=42)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train CatBoost without class weights
cat_clf = CatBoostClassifier(iterations=100, random_state=42, verbose=0)
cat_clf.fit(X_train, y_train)
y_pred = cat_clf.predict(X_test)
f1_no_weight = f1_score(y_test, y_pred)

# Train CatBoost with class weights
cat_clf_weighted = CatBoostClassifier(iterations=100, auto_class_weights='Balanced', random_state=42, verbose=0)
cat_clf_weighted.fit(X_train, y_train)
y_pred_weighted = cat_clf_weighted.predict(X_test)
f1_weighted = f1_score(y_test, y_pred_weighted)

# Print F1-scores
print(f"CatBoost Classifier F1-Score (No Weights): {f1_no_weight:.4f}")
print(f"CatBoost Classifier F1-Score (Balanced Weights): {f1_weighted:.4f}")

ModuleNotFoundError: No module named 'catboost'
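
For intuition about what `auto_class_weights='Balanced'` does, the sketch below computes per-class weights in pure Python. It assumes the formula given in the CatBoost documentation for the `Balanced` mode (the size of the largest class divided by the size of each class, with unit sample weights); the helper name `balanced_class_weights` is hypothetical and not a CatBoost function.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by (largest class count) / (its own count).

    Assumption: this mirrors CatBoost's 'Balanced' mode with unit sample weights.
    """
    counts = Counter(labels)
    largest = max(counts.values())
    return {cls: largest / n for cls, n in sorted(counts.items())}

# Toy labels with the same 90/10 imbalance as the generated dataset
toy_labels = [0] * 90 + [1] * 10
print(balanced_class_weights(toy_labels))  # {0: 1.0, 1: 9.0}
```

With these weights, each minority-class example counts nine times as much in the loss, so both classes contribute equally in total.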

In [25]:
# 29. Train an AdaBoost Classifier and analyze the effect of different learning rates.

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Test different learning rates
learning_rates = [0.1, 0.5, 1.0, 2.0]

# Train and evaluate one model per learning rate
for lr in learning_rates:
    adaboost_clf = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1, random_state=42),
        n_estimators=50,
        learning_rate=lr,
        random_state=42
    )
    adaboost_clf.fit(X_train, y_train)
    y_pred = adaboost_clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"AdaBoost Classifier (learning_rate={lr}) Accuracy: {accuracy:.4f}")

AdaBoost Classifier (learning_rate=0.1) Accuracy: 1.0000
AdaBoost Classifier (learning_rate=0.5) Accuracy: 0.9667
AdaBoost Classifier (learning_rate=1.0) Accuracy: 0.9333
AdaBoost Classifier (learning_rate=2.0) Accuracy: 0.8667
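
The test set here has only 30 samples, so each misclassification moves accuracy by about 0.033 and the differences above should be read with caution. To show how the learning rate enters the algorithm, the sketch below computes the vote weight of a single weak learner under the multi-class SAMME formulation, where the learning rate simply scales that weight. The function name `samme_estimator_weight` and the example error of 0.2 are illustrative assumptions, and the exact update inside scikit-learn may differ between versions and algorithm variants.

```python
import math

def samme_estimator_weight(error, n_classes, learning_rate=1.0):
    """Vote weight of one weak learner under SAMME, given its weighted error rate."""
    if not 0.0 < error < 1.0:
        raise ValueError("error must lie strictly between 0 and 1")
    return learning_rate * (
        math.log((1.0 - error) / error) + math.log(n_classes - 1.0)
    )

# Same weak learner (weighted error 0.2, 3 classes) under the learning rates used above
for lr in (0.1, 0.5, 1.0, 2.0):
    weight = samme_estimator_weight(error=0.2, n_classes=3, learning_rate=lr)
    print(f"learning_rate={lr}: estimator weight = {weight:.4f}")
```

A larger weight also means a more aggressive re-weighting of misclassified samples in the next round, which is one plausible reason a rate of 2.0 does worse here.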


In [26]:
# 30. Train an XGBoost Classifier for multi-class classification and evaluate using log-loss.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss
import xgboost as xgb

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train XGBoost Classifier
xgb_clf = xgb.XGBClassifier(n_estimators=100, objective='multi:softprob', eval_metric='mlogloss', random_state=42)
xgb_clf.fit(X_train, y_train)

# Predict probabilities and calculate log-loss
y_pred_proba = xgb_clf.predict_proba(X_test)
logloss = log_loss(y_test, y_pred_proba)

# Print log-loss
print(f"XGBoost Classifier Log-Loss: {logloss:.4f}")

XGBoost Classifier Log-Loss: 0.0093
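
Log-loss is the average negative log of the probability assigned to the true class, so a value of 0.0093 corresponds to a geometric-mean probability of about 0.99 on the correct label. The pure-Python sketch below spells out the computation; the function name `multiclass_log_loss` and the toy data are illustrative, and it assumes labels are integer column indices into each probability row. The clipping constant `eps` is an assumption meant to avoid `log(0)`.

```python
import math

def multiclass_log_loss(y_true, proba, eps=1e-15):
    """Mean negative log-probability of the true class.

    Assumes each label is the column index of its class in the probability row.
    """
    total = 0.0
    for label, row in zip(y_true, proba):
        p = min(max(row[label], eps), 1.0 - eps)  # clip to keep log() finite
        total -= math.log(p)
    return total / len(y_true)

# Toy example: true-class probabilities are 0.7 and 0.8
toy_labels = [0, 1]
toy_proba = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
print(f"{multiclass_log_loss(toy_labels, toy_proba):.4f}")  # 0.2899
```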
