*1.  What is Boosting in Machine Learning?*

Boosting is an ensemble machine learning technique that aims to improve the performance of weak learners (typically simple models like decision trees) by combining them into a single strong learner. The core idea is to train models sequentially, where each new model focuses on correcting the errors made by the previous ones.

*2.  How does Boosting differ from Bagging?*

Boosting and Bagging are both ensemble learning techniques used to improve the performance of machine learning models by combining multiple learners. However, they differ significantly in how they build and combine these learners.

Key Differences Between Boosting and Bagging

| Feature                 | **Bagging**                                                                     | **Boosting**                                                            |
| ----------------------- | ------------------------------------------------------------------------------- | ----------------------------------------------------------------------- |
| **Goal**                | Reduce variance (prevent overfitting)                                           | Reduce bias (improve model accuracy)                                    |
| **Learning Process**    | Learners are trained **independently** in **parallel**                          | Learners are trained **sequentially**, each depending on the previous   |
| **Data Sampling**       | Uses **bootstrap sampling** (random samples with replacement)                   | No bootstrap sampling; each model focuses on errors from previous model |
| **Model Focus**         | All models get equal weight                                                     | Later models focus more on previously misclassified examples            |
| **Combining Models**    | Typically uses **averaging** (regression) or **majority vote** (classification) | Uses **weighted sum** of predictions                                    |
| **Common Algorithms**   | Random Forest (bagging of decision trees)                                       | AdaBoost, Gradient Boosting, XGBoost, LightGBM                          |
| **Risk of Overfitting** | Lower (good for high variance models)                                           | Higher (can overfit if not tuned properly)                              |
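
The contrast in the table can be tried out directly. The sketch below, assuming scikit-learn is installed, cross-validates a bagged ensemble of fully grown trees against a boosted ensemble of stumps. The dataset, ensemble sizes, and variable names are illustrative choices, not part of the comparison above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Bagging: independent deep trees (high variance), each fit on a bootstrap sample.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)

# Boosting: sequential stumps (high bias), each one reweighting earlier mistakes.
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=42)

scores = {}
for name, model in [("bagging", bagging), ("boosting", boosting)]:
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {scores[name]:.4f}")
```

Which ensemble scores higher depends on the dataset and settings, so the numbers are best read as a starting point for experimentation.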


*3.  What is the key idea behind AdaBoost?*

Key Idea Behind AdaBoost (Adaptive Boosting):
AdaBoost improves the performance of weak learners by focusing on the mistakes made by previous models and adjusting the weights of the training data accordingly. Each new model tries to correct the errors of the combined previous models.

Core Concepts of AdaBoost:
Weak Learners:

Typically shallow decision trees (stumps).

Each performs only slightly better than random guessing.

Weighted Training:

Initially, all data points are equally weighted.

After each round, weights of misclassified points increase, so the next model focuses more on them.

Model Combination:

After all learners are trained, their predictions are combined using a weighted vote, where more accurate learners have more influence.

Adaptivity:

The name “Adaptive Boosting” comes from the way the algorithm adapts to the mistakes of earlier learners.

*4.  Explain the working of AdaBoost with an example?*

AdaBoost learns from mistakes and builds a strong classifier by combining weak ones that focus on different subsets of the data.

AdaBoost Step-by-Step Example
✅ Problem: Simple binary classification
Imagine we have the following dataset:

| Instance | Feature (X) | Label (Y) |
| -------- | ----------- | --------- |
| 1        | Red         | +1        |
| 2        | Red         | +1        |
| 3        | Blue        | -1        |
| 4        | Blue        | -1        |
| 5        | Blue        | +1        |

We’ll train weak learners (decision stumps) using AdaBoost for 3 rounds.

🧩 Step 1: Initialize Weights
Since we have 5 instances, each gets an equal weight:

$$w_i = \frac{1}{5} = 0.2$$

🧩 Step 2: Train First Weak Learner
Let’s say the first decision stump is:

If color == Red → Predict +1
Else → Predict -1
Predictions:

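| Instance | X    | True Y | Predicted Y | Correct? |
| -------- | ---- | ------ | ----------- | -------- |
| 1        | Red  | +1     | +1          | ✅        |
| 2        | Red  | +1     | +1          | ✅        |
| 3        | Blue | -1     | -1          | ✅        |
| 4        | Blue | -1     | -1          | ✅        |
| 5        | Blue | +1     | -1          | ❌        |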

Total error (ε₁) = weight of misclassified = 0.2

Model weight (α₁):

$$\alpha_1 = \frac{1}{2}\ln\left(\frac{1-\epsilon_1}{\epsilon_1}\right) = \frac{1}{2}\ln\left(\frac{0.8}{0.2}\right) = 0.5 \cdot \ln(4) \approx 0.693$$

🧩 Step 3: Update Weights
Misclassified point (instance 5) → increase its weight.

Correctly classified points → decrease their weights.

Update rule:

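$$w_i^{\text{new}} = w_i \cdot e^{-\alpha_t \, y_i \, h_t(x_i)}$$

Here $h_t$ is the weak learner trained in round $t$ and $\alpha_t$ is its model weight. The product $y_i \, h_t(x_i)$ is +1 for a correct prediction and -1 for a wrong one.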

Normalize all weights so they sum to 1.

Result: instance 5 gets higher weight in the next round.
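
The first round can be checked numerically. The following is a minimal sketch in plain Python that recomputes the error, the model weight, and the normalized weights for the stump from Step 2; the variable names are illustrative.

```python
import math

labels      = [+1, +1, -1, -1, +1]   # true labels y_i
predictions = [+1, +1, -1, -1, -1]   # first stump: Red -> +1, Blue -> -1
weights     = [1 / 5] * 5            # uniform initial weights

# Weighted error: total weight of the misclassified instances.
epsilon = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)

# Model weight of the stump.
alpha = 0.5 * math.log((1 - epsilon) / epsilon)

# Reweight: y * h is +1 for correct and -1 for wrong predictions.
updated = [w * math.exp(-alpha * y * h) for w, y, h in zip(weights, labels, predictions)]
total = sum(updated)
updated = [w / total for w in updated]

print(f"epsilon = {epsilon:.3f}, alpha = {alpha:.3f}")
print("normalized weights:", [round(w, 3) for w in updated])
```

After normalization, instance 5 carries a weight of 0.5 and each of the other four instances carries 0.125.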

🧩 Step 4: Train Second Weak Learner
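The next learner focuses more on instance 5, which now carries half of the total weight (0.5, against 0.125 for each of the other four instances). Suppose it predicts +1 for Blue this time (and still +1 for Red).

Predictions:

This stump classifies instance 5 correctly but misclassifies the two Blue instances labeled -1 (instances 3 and 4), so its weighted error is ε₂ = 0.125 + 0.125 = 0.25.

Repeat the steps above: compute the model weight (α₂) from ε₂ and update the weights again.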

🧩 Step 5: Final Model
After T rounds, the final prediction is:

$$H(x) = \text{sign}\left(\sum_{t=1}^{T} \alpha_t \cdot h_t(x)\right)$$

Each weak learner’s vote is weighted by its accuracy (α).

More accurate learners have more influence on the final result.
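
The whole procedure can be sketched end to end on the toy dataset. The code below is an illustrative from-scratch loop, not a library implementation: it assumes the weak learner is chosen from the four stumps that a single binary color feature allows, and it runs for 3 rounds.

```python
import math

colors = ["Red", "Red", "Blue", "Blue", "Blue"]
labels = [+1, +1, -1, -1, +1]

# With one binary feature there are only four possible stumps,
# each described by (prediction for Red, prediction for Blue).
stumps = [(+1, -1), (-1, +1), (+1, +1), (-1, -1)]

def stump_predict(stump, color):
    return stump[0] if color == "Red" else stump[1]

def weighted_error(stump, weights):
    return sum(w for w, c, y in zip(weights, colors, labels) if stump_predict(stump, c) != y)

weights = [1 / len(labels)] * len(labels)
ensemble = []  # list of (alpha_t, stump_t)

for t in range(3):
    # Pick the stump with the lowest weighted error under the current weights.
    best = min(stumps, key=lambda s: weighted_error(s, weights))
    eps = weighted_error(best, weights)
    alpha = 0.5 * math.log((1 - eps) / eps)
    ensemble.append((alpha, best))

    # Reweight and normalize.
    weights = [w * math.exp(-alpha * y * stump_predict(best, c))
               for w, c, y in zip(weights, colors, labels)]
    total = sum(weights)
    weights = [w / total for w in weights]
    print(f"round {t + 1}: stump={best}, eps={eps:.3f}, alpha={alpha:.3f}")

def H(color):
    # Final classifier: sign of the alpha-weighted vote.
    score = sum(alpha * stump_predict(stump, color) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

final = [H(c) for c in colors]
print("final predictions:", final)
```

Because a single color feature cannot separate instance 5 from instances 3 and 4, the combined model still predicts -1 for every Blue point. The example illustrates the mechanics of reweighting and voting rather than a perfectly separable problem.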

*5.  What is Gradient Boosting, and how is it different from AdaBoost?*

Gradient Boosting is an ensemble machine learning technique that builds a strong model by sequentially adding weak learners (usually decision trees), where each new learner is trained to minimize the errors (residuals) of the current model using gradient descent.

Instead of adjusting sample weights (as in AdaBoost), Gradient Boosting fits each new model to the negative gradient of the loss function with respect to the current model’s predictions. These values are called pseudo-residuals; for squared-error loss they equal the ordinary residuals.
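
The idea can be sketched from scratch for squared-error loss. The code below assumes scikit-learn's `DecisionTreeRegressor` as the weak learner and uses synthetic data; the learning rate, tree depth, and number of rounds are arbitrary illustrative values.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)   # synthetic target

learning_rate = 0.1
prediction = np.full_like(y, y.mean())   # F_0: constant model
trees = []
mse_history = []

for _ in range(100):
    residuals = y - prediction            # negative gradient of 1/2 * (y - F)^2
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)   # F_m = F_{m-1} + lr * h_m
    trees.append(tree)
    mse_history.append(np.mean((y - prediction) ** 2))

print(f"training MSE after 1 tree:    {mse_history[0]:.4f}")
print(f"training MSE after 100 trees: {mse_history[-1]:.4f}")
```

Each tree only has to model what the current ensemble still gets wrong, which is why many shallow trees can add up to an accurate model.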

Gradient Boosting vs. AdaBoost: Key Differences


| Feature                 | **AdaBoost**                                | **Gradient Boosting**                         |
| ----------------------- | ------------------------------------------- | --------------------------------------------- |
| **Error Handling**      | Adjusts weights of misclassified samples    | Fits the new model to the residual errors     |
| **Loss Function**       | Typically exponential loss (binary)         | Any differentiable loss (e.g., MSE, log loss) |
| **Optimization Method** | Heuristic (sample reweighting)              | Gradient descent in function space            |
| **Flexibility**         | Mostly for classification                   | Works well for regression & classification    |
| **Model Focus**         | Learner focuses on hard-to-classify samples | Learner fits to gradient of loss (residuals)  |
| **Interpretability**    | Slightly more intuitive                     | More flexible but harder to interpret         |
| **Performance**         | Fast and effective for simple tasks         | Often better for complex tasks (with tuning)  |


*6.  What is the loss function in Gradient Boosting?*

In Gradient Boosting, the loss function quantifies how far the model's predictions are from the targets. At each step the algorithm computes the gradient of this loss with respect to the current predictions and fits the next learner to move the predictions in the direction that reduces it, much like regular gradient descent.

Common Loss Functions in Gradient Boosting
1. For Regression Tasks:
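| Loss Function        | Formula                                          | Negative Gradient (used as residual)                 |
| -------------------- | ------------------------------------------------ | ---------------------------------------------------- |
| Squared Error (MSE)  | $L(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2$     | $-\frac{\partial L}{\partial \hat{y}} = y - \hat{y}$ |
| Absolute Error (MAE) | $L(y, \hat{y}) = \lvert y - \hat{y} \rvert$      | $\text{sign}(y - \hat{y})$                           |
| Huber Loss           | Combines MSE and MAE for robustness              | Varies depending on residual size                    |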
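
To make the regression losses concrete, here is a small NumPy sketch of the negative gradient each one hands to the next weak learner. The function names and the Huber threshold `delta=1.0` are assumptions made for illustration.

```python
import numpy as np

def neg_grad_squared(y, y_hat):
    # Squared error: the negative gradient is the plain residual.
    return y - y_hat

def neg_grad_absolute(y, y_hat):
    # Absolute error: only the sign of the residual survives.
    return np.sign(y - y_hat)

def neg_grad_huber(y, y_hat, delta=1.0):
    # Huber: residual for small errors, clipped to +/- delta for large ones.
    r = y - y_hat
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

y     = np.array([3.0, 3.0, 3.0])
y_hat = np.array([2.5, 3.0, 8.0])   # small error, exact, large error

print("squared :", neg_grad_squared(y, y_hat))
print("absolute:", neg_grad_absolute(y, y_hat))
print("huber   :", neg_grad_huber(y, y_hat))
```

The large error on the third point dominates the squared-error gradient (-5) but is capped at -1 under the absolute and Huber losses, which is where their robustness to outliers comes from.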

2. For Classification Tasks:
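| Problem Type              | Loss Function                                                                                                 | Notes                                               |
| ------------------------- | ------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
| Binary Classification     | Log Loss (Binary Cross-Entropy): $L(y, \hat{p}) = -[y \log \hat{p} + (1 - y)\log(1 - \hat{p})]$               | Most common; smooth, differentiable                 |
| Multiclass Classification | Multiclass Log Loss                                                                                           | Uses softmax and cross-entropy for multiple classes |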
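
For binary log loss, boosting implementations usually work on the raw score (log-odds) $z$ with $\hat{p} = \sigma(z)$, in which case the negative gradient with respect to $z$ simplifies to $y - \hat{p}$. The NumPy sketch below checks that identity against a finite-difference estimate; the sample values are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, z):
    # Binary cross-entropy expressed in terms of the raw score z.
    p = sigmoid(z)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1.0, 0.0, 1.0])
z = np.array([2.0, -1.0, -0.5])   # raw scores (log-odds) before the sigmoid

analytic_neg_grad = y - sigmoid(z)

# Central finite difference as an independent check.
h = 1e-6
numeric_neg_grad = -(log_loss(y, z + h) - log_loss(y, z - h)) / (2 * h)

print("analytic:", np.round(analytic_neg_grad, 4))
print("numeric :", np.round(numeric_neg_grad, 4))
```

So for classification, the "residual" each tree fits is the gap between the label and the currently predicted probability.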

🧠 Intuition:
Loss function tells you how wrong your model is.

Gradient of that loss tells you how to fix it.

Each weak learner is trained to predict the negative gradient (which is the residual when the loss is squared error).

🔄 Example (Squared Error for Regression):
Suppose the current prediction is $\hat{y}_i$ and the true value is $y_i$.

Loss:

$$L(y_i, \hat{y}_i) = \frac{1}{2}(y_i - \hat{y}_i)^2$$

Negative gradient (residual):

$$r_i = y_i - \hat{y}_i$$

The weak learner is trained to fit these residuals, then the model is updated.

🛠 Custom Loss Functions
Advanced gradient boosting frameworks like XGBoost, LightGBM, and CatBoost allow you to:

Use custom loss functions (as long as they’re differentiable).

Control training with regularization and other constraints.
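
A custom objective for these frameworks is typically a function that returns the first and second derivatives of the loss with respect to the prediction. The sketch below writes one for the pseudo-Huber loss in NumPy and verifies the gradient numerically. The function names and `delta` are illustrative, and the `(y_true, y_pred)` argument order is an assumption modeled on the callable that XGBoost's scikit-learn wrapper documents for its `objective` parameter; check the documentation of your library and version before plugging it in.

```python
import numpy as np

def pseudo_huber_loss(y_true, y_pred, delta=1.0):
    r = y_pred - y_true
    return delta ** 2 * (np.sqrt(1.0 + (r / delta) ** 2) - 1.0)

def pseudo_huber_objective(y_true, y_pred, delta=1.0):
    """Return (gradient, hessian) of the pseudo-Huber loss w.r.t. y_pred."""
    r = y_pred - y_true
    scale = 1.0 + (r / delta) ** 2
    grad = r / np.sqrt(scale)
    hess = 1.0 / scale ** 1.5
    return grad, hess

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 10.0])
grad, hess = pseudo_huber_objective(y_true, y_pred)

# Finite-difference check of the analytic gradient.
h = 1e-6
numeric_grad = (pseudo_huber_loss(y_true, y_pred + h)
                - pseudo_huber_loss(y_true, y_pred - h)) / (2 * h)

print("gradient:", np.round(grad, 4))
print("hessian :", np.round(hess, 4))
```

Checking a hand-derived gradient against finite differences like this is a cheap way to catch sign errors before training with a custom loss.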

*7.  How does XGBoost improve over traditional Gradient Boosting?*

XGBoost (Extreme Gradient Boosting) is a powerful and widely used implementation of gradient boosting. It significantly improves over traditional Gradient Boosting (like scikit-learn's GradientBoostingClassifier) in speed, accuracy, and efficiency, thanks to several technical innovations.

Key Improvements of XGBoost over Traditional Gradient Boosting

| Feature                       | Traditional Gradient Boosting    | XGBoost                                                                                        |
| ----------------------------- | -------------------------------- | ---------------------------------------------------------------------------------------------- |
| **Optimization Method**       | Uses first-order gradients only  | Uses a **second-order approximation** of the loss (both gradient and Hessian) for better convergence |
| **Regularization**            | Usually not included             | Includes **L1 & L2 regularization** on leaf weights to reduce overfitting                      |
| **Handling Missing Data**     | Often needs preprocessing        | **Built-in handling** of missing values                                                        |
| **Tree Construction**         | Greedy split search, typically without gain-based pruning | Grows trees depth-wise up to `max_depth`, prunes splits whose gain falls below `gamma`, and supports **histogram-based** split finding |
| **Parallelization**           | Typically not parallelized       | **Parallel split finding** within each tree (the trees themselves are still built sequentially) |
| **Shrinkage (Learning Rate)** | Supported (`learning_rate`)      | Supported (`eta` / `learning_rate`)                                                            |
| **Feature Importance**        | Basic support                    | Detailed **gain/cover/frequency-based** importance metrics                                     |
| **Early Stopping**            | Limited                          | Fully supported                                                                                |
| **Cross-validation**          | External (manual with `sklearn`) | **Built-in CV** support                                                                        |
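
The second-order idea can be illustrated with the split-gain formula from the XGBoost paper, which scores a candidate split using the summed gradients $G$ and Hessians $H$ on each side. The NumPy sketch below is a simplified illustration under that formula, not the library's actual code path; the sample data and function name are made up.

```python
import numpy as np

def split_gain(g, h, left_mask, reg_lambda=1.0, gamma=0.0):
    """Gain of splitting a node into left/right children (XGBoost-style)."""
    def score(G, H):
        return G ** 2 / (H + reg_lambda)
    G_L, H_L = g[left_mask].sum(), h[left_mask].sum()
    G_R, H_R = g[~left_mask].sum(), h[~left_mask].sum()
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - score(G_L + G_R, H_L + H_R)) - gamma

# Log loss with a current predicted probability p = 0.5 for every sample:
# gradient g = p - y, Hessian h = p * (1 - p).
y = np.array([1, 1, 1, 0, 0, 0])
p = np.full(6, 0.5)
g, h = p - y, p * (1 - p)

good_split = np.array([True, True, True, False, False, False])   # separates the classes
bad_split  = np.array([True, False, True, False, True, False])   # mixes them

gain_good = split_gain(g, h, good_split)
gain_bad = split_gain(g, h, bad_split)
print(f"gain (good split): {gain_good:.4f}")
print(f"gain (bad split):  {gain_bad:.4f}")
```

The split that separates the classes earns a much larger gain. Raising `reg_lambda` or `gamma` lowers every gain, which is how regularization makes the tree more reluctant to split.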


*8.  What is the difference between XGBoost and CatBoost?*

 XGBoost and CatBoost are both gradient boosting frameworks designed for performance and accuracy — but they differ in several important ways, especially in how they handle categorical data, efficiency, and ease of use.

 XGBoost vs CatBoost – Key Differences

 | Feature                  | **XGBoost**                                                | **CatBoost**                                                            |
| ------------------------ | ---------------------------------------------------------- | ----------------------------------------------------------------------- |
| **Developer**            | DMLC (Distributed ML Community)                            | Yandex (Russian tech company)                                           |
| **Categorical Features** | Traditionally encoded manually (e.g., one-hot or label encoding); recent versions add optional native support | **Natively handles** categorical variables with automatic encoding      |
| **Training Speed**       | Fast, but may need tuning                                  | Often faster **out-of-the-box** for categorical data                    |
| **Default Accuracy**     | High, needs preprocessing                                  | **Very high**, minimal preprocessing needed                             |
| **Missing Values**       | Handled well (auto-learn split directions)                 | Also handled automatically                                              |
| **Overfitting Control**  | L1/L2 regularization, tree pruning                         | Similar regularization + **ordered boosting** for better generalization |
| **Ease of Use**          | Requires manual preprocessing                              | Very **user-friendly**, less data prep                                  |
| **Support for Text**     | Not natively                                               | **Supports text** features (tokenization, embeddings)                   |
| **GPU Support**          | Yes (with proper config)                                   | Yes, **optimized for GPU** training                                     |
| **Interpretability**     | SHAP, feature importance                                   | SHAP, built-in visualization tools                                      |


*9. What are some real-world applications of Boosting techniques?*

Boosting techniques — especially Gradient Boosting, XGBoost, LightGBM, and CatBoost — are widely used in real-world machine learning due to their high predictive accuracy, robustness, and versatility. Here are some common and impactful real-world applications:

Real-World Applications of Boosting
1. 🏦 Finance
Credit scoring: Predicting creditworthiness of loan applicants.

Fraud detection: Identifying anomalous transactions.

Stock price prediction: Modeling time-series behavior for short-term forecasting.

✅ Why Boosting?
Handles complex, non-linear relationships and class imbalance very well (e.g., fraud detection).

2. 🏥 Healthcare
Disease diagnosis: Predicting diseases based on symptoms, lab results, or imaging data.

Patient risk prediction: Estimating the likelihood of readmission or complications.

Drug discovery: Modeling chemical properties and biological activity.

✅ Why Boosting?
Delivers high accuracy and works well with structured medical records and lab data.

3. 🛒 E-Commerce & Retail
Customer churn prediction: Identifying customers likely to stop using a service.

Product recommendation systems: Personalizing shopping experience.

Demand forecasting: Predicting future sales based on historical patterns.

✅ Why Boosting?
Excels with tabular data like customer history, product details, and transaction logs.

4. 🎯 Marketing & Advertising
Click-through rate (CTR) prediction: Optimizing which ads to show.

Lead scoring: Identifying which prospects are most likely to convert.

Campaign optimization: Predicting outcomes of different ad strategies.

✅ Why Boosting?
CatBoost and LightGBM are especially strong for categorical-heavy ad and user data.

5. 🎓 Education
Student performance prediction: Identifying students at risk of failing.

Personalized learning: Adapting material based on predicted learning pace.

6. 🏢 Human Resources
Resume screening: Predicting job fit or success probability.

Attrition prediction: Forecasting which employees are likely to leave.

7. 🚗 Transportation & Logistics
ETA prediction: Estimating delivery or arrival times.

Route optimization: Predicting traffic congestion or delivery delays.

8. 📱 Telecommunications
Network failure prediction: Proactively identifying infrastructure issues.

Customer segmentation: Understanding usage behavior and tailoring plans.

9. 🏛️ Government & Public Policy
Tax fraud detection

Predictive policing

Welfare eligibility models

*10.  How does regularization help in XGBoost?*

Regularization in XGBoost plays a crucial role in improving generalization and reducing the risk of overfitting — especially when working with complex models or noisy data.

Regularization adds a penalty to the model's complexity in the objective function, discouraging overly complex models (e.g., very deep trees or high weights). This leads to simpler, more generalizable models.

XGBoost’s objective function is:

$$\text{Obj} = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$

Where:

$L(y_i, \hat{y}_i)$: loss function (e.g., log loss, MSE)

$\Omega(f_k)$: regularization term for tree $f_k$

How Regularization Helps:

1.  Reduces Overfitting
Penalizing large trees or large weights prevents the model from fitting noise in the data.

2.  Improves Generalization
Encourages simpler models that perform better on unseen data.

3.  Controls Model Complexity
Parameters like max_depth, gamma, lambda, and alpha explicitly regulate how complex trees can grow.

4.  Encourages Sparsity
L1 regularization (alpha) can zero out unnecessary leaf weights, leading to sparse models.
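
The effect of `lambda` and `alpha` on a single leaf can be sketched with the closed-form leaf weight used in second-order boosting: L2 enlarges the denominator, and L1 soft-thresholds the summed gradient. The function below is a simplified illustration with hypothetical values for the gradient sum `G` and Hessian sum `H`, not XGBoost's own code.

```python
def leaf_weight(G, H, reg_lambda=0.0, reg_alpha=0.0):
    """Optimal leaf weight given summed gradients G and summed Hessians H."""
    # L1 soft-thresholding: a leaf whose |G| is below alpha gets weight zero.
    if abs(G) <= reg_alpha:
        return 0.0
    G_shrunk = G - reg_alpha if G > 0 else G + reg_alpha
    # L2 enters the denominator and shrinks the weight toward zero.
    return -G_shrunk / (H + reg_lambda)

G, H = -4.0, 10.0   # hypothetical sums over the samples in one leaf
for lam, alpha in [(0.0, 0.0), (10.0, 0.0), (0.0, 2.0), (0.0, 5.0)]:
    print(f"lambda={lam:>4}, alpha={alpha:>3} -> weight = {leaf_weight(G, H, lam, alpha):.3f}")
```

With no penalty the leaf weight is 0.4. Setting `lambda=10` halves it, `alpha=2` also halves it, and `alpha=5` exceeds the gradient's magnitude and zeroes the leaf out entirely.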

*11.  What are some hyperparameters to tune in Gradient Boosting models?*

Tuning hyperparameters in Gradient Boosting models is key to getting the best performance while avoiding overfitting or underfitting. Here are some of the most important hyperparameters to tune, applicable across popular implementations like scikit-learn’s GradientBoosting, XGBoost, LightGBM, and CatBoost (though names may vary slightly).

 Key Hyperparameters to Tune in Gradient Boosting Models:

 | Hyperparameter                               | Description                                                                                                                     | Effect on Model                                                                       |
| -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------- |
| **n\_estimators**                            | Number of boosting rounds (trees)                                                                                               | More trees can improve performance but risk overfitting                               |
| **learning\_rate (eta)**                     | Step size shrinkage applied to each tree                                                                                        | Smaller values require more trees; controls overfitting                               |
| **max\_depth**                               | Maximum depth of each tree                                                                                                      | Controls tree complexity; deeper trees fit more complex patterns but risk overfitting |
| **min\_samples\_split / min\_child\_weight** | Minimum samples required to split an internal node (scikit-learn) / Minimum sum of instance weights needed in a child (XGBoost) | Prevents splitting on noisy data; higher values make trees more conservative          |
| **min\_samples\_leaf**                       | Minimum samples required to be at a leaf node                                                                                   | Prevents leaves with very few samples                                                 |
| **subsample**                                | Fraction of samples used for fitting each tree                                                                                  | Introduces randomness; helps reduce overfitting                                       |
| **colsample\_bytree / max\_features**        | Fraction of features used for each tree                                                                                         | Random feature selection; helps reduce overfitting                                    |
| **gamma / min\_split\_loss**                 | Minimum loss reduction required to make a split (XGBoost)                                                                       | Controls tree pruning; larger values make splitting stricter                          |
| **reg\_alpha**                               | L1 regularization on leaf weights (XGBoost, LightGBM)                                                                           | Encourages sparsity by pushing small leaf weights to zero                             |
| **reg\_lambda**                              | L2 regularization on leaf weights (XGBoost, LightGBM)                                                                           | Smooths leaf weights; helps prevent overfitting                                       |
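
A small grid search shows how several of these hyperparameters can be tuned together. This is a minimal sketch using scikit-learn's `GridSearchCV` with `GradientBoostingClassifier`; the grid values are arbitrary and kept deliberately small so the search finishes quickly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Illustrative grid: 2 values for each of 4 hyperparameters (16 combinations).
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
    "subsample": [0.8, 1.0],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.4f}")
print(f"test accuracy:    {search.score(X_test, y_test):.4f}")
```

For larger grids, a randomized search or a lower learning rate combined with early stopping is usually a more economical way to explore the space.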

*12.  What is the concept of Feature Importance in Boosting?*

Feature Importance in boosting models helps you understand which input features contribute the most to the model’s predictions. It’s a key tool for interpreting complex models like Gradient Boosting, XGBoost, LightGBM, and CatBoost.

Feature importance measures how valuable each feature is for making accurate predictions in the model. It answers questions like:

Which features influence the outcome most?

Which features can be safely ignored or removed?

How to better understand model decisions?

Feature Importance in Popular Boosting Libraries:

| Library                      | How to Get Feature Importance                                                                       |
| ---------------------------- | --------------------------------------------------------------------------------------------------- |
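| **XGBoost**                  | `model.get_booster().get_score(importance_type='gain')` (or `'weight'`, `'cover'`) on the scikit-learn wrapper; `get_score` directly on a native `Booster` |
| **LightGBM**                 | `booster.feature_importance(importance_type='gain')` (or `'split'`) on a native `Booster`; `model.feature_importances_` on the scikit-learn wrapper |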
| **CatBoost**                 | `model.get_feature_importance()` with types like `'PredictionValuesChange'`, `'LossFunctionChange'` |
| **sklearn GradientBoosting** | `model.feature_importances_` (based on mean decrease impurity)                                      |
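
Impurity-based importances are computed from the training process, so it can help to compare them with a measure taken on held-out data. The sketch below contrasts `feature_importances_` with scikit-learn's `permutation_importance` for a `GradientBoostingClassifier`; the dataset and the choice of 5 repeats are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

model = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

# Impurity-based importance: derived from the splits chosen during training.
impurity = model.feature_importances_

# Permutation importance: drop in test accuracy when one feature is shuffled.
perm = permutation_importance(model, X_test, y_test, n_repeats=5, random_state=42)

top = impurity.argsort()[::-1][:5]
for i in top:
    print(f"{data.feature_names[i]:<25} impurity={impurity[i]:.3f}  "
          f"permutation={perm.importances_mean[i]:.3f}")
```

When the two rankings disagree strongly, correlated features or overfitting are common explanations worth investigating.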


*13.  Why is CatBoost efficient for categorical data?*

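CatBoost is specifically designed to handle categorical features efficiently and effectively. This sets it apart from boosting libraries that traditionally expect categorical data to be encoded manually, such as XGBoost. LightGBM can also consume categorical columns directly, but it uses a different splitting strategy rather than CatBoost's target-statistics encoding.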

Why CatBoost Is Efficient for Categorical Data:

1. Native Categorical Feature Support
CatBoost directly accepts categorical features without needing manual encoding like one-hot or label encoding.

This avoids the curse of dimensionality and sparseness issues common with one-hot encoding.

2. Ordered Target Statistics (Ordered Boosting)
CatBoost encodes categories with ordered target statistics, a technique closely related to its ordered boosting scheme:

It replaces categories with statistics computed on the training target (like the average target value for each category).

To avoid target leakage, the statistic for each example is computed in an online fashion, using only the examples that precede it in a random permutation of the training data.

3. Efficient Handling of High-Cardinality Categories
Unlike naive encoding, CatBoost can efficiently handle categorical features with many unique values (high cardinality).

It learns useful representations without exploding feature space.

4. Reduced Overfitting on Categorical Data
The ordered boosting approach reduces overfitting that can occur when naive target encoding is used.

This leads to better generalization.

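5. Built-in Support for Missing Values
Missing numerical values are handled automatically without explicit imputation. Missing categorical values get no special treatment and are typically passed as their own category (for example, a placeholder string).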
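
The ordered target statistics idea from point 2 can be sketched in a few lines of plain Python. This is a simplified illustration, not CatBoost's implementation: it processes rows in the given order instead of a random permutation, and the `prior` and `smoothing` parameters are assumed names for the smoothing terms.

```python
def ordered_target_statistics(categories, targets, prior, smoothing=1.0):
    """Encode each row using only the targets of earlier rows with the same category."""
    sums, counts, encoded = {}, {}, []
    for cat, target in zip(categories, targets):
        s, n = sums.get(cat, 0.0), counts.get(cat, 0)
        encoded.append((s + smoothing * prior) / (n + smoothing))
        # Update the running statistics only after encoding the current row,
        # so a row never sees its own target.
        sums[cat] = s + target
        counts[cat] = n + 1
    return encoded

categories = ["Red", "Blue", "Red", "Red", "Blue"]
targets    = [1, 0, 1, 0, 1]
prior = sum(targets) / len(targets)   # global target mean, 0.6

encoded = ordered_target_statistics(categories, targets, prior)
print([round(v, 3) for v in encoded])
```

The first occurrence of each category falls back to the prior (0.6), and later rows blend the prior with the targets seen so far, so no row's encoding depends on its own label.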

In [None]:
# 14.  Train an AdaBoost Classifier on a sample dataset and print model accuracy

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize AdaBoost classifier with default parameters
model = AdaBoostClassifier(random_state=42)

# Train the model
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

print(f"AdaBoost Classifier Accuracy: {accuracy:.4f}")


In [None]:
# 15.  Train an AdaBoost Regressor and evaluate performance using Mean Absolute Error (MAE)

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_absolute_error

# Load dataset (load_boston was removed in scikit-learn 1.2)
data = fetch_california_housing()
X, y = data.data, data.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize AdaBoost regressor
model = AdaBoostRegressor(random_state=42)

# Train the model
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Calculate Mean Absolute Error
mae = mean_absolute_error(y_test, y_pred)

print(f"AdaBoost Regressor Mean Absolute Error: {mae:.4f}")


In [None]:
# 16.  Train a Gradient Boosting Classifier on the Breast Cancer dataset and print feature importance

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target
feature_names = data.feature_names

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train the Gradient Boosting Classifier
model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)

# Get feature importances
importances = model.feature_importances_

# Create a DataFrame for better visualization
feat_imp_df = pd.DataFrame({
    'Feature': feature_names,
    'Importance': importances
}).sort_values(by='Importance', ascending=False)

print("Feature Importances:")
print(feat_imp_df)

# Optional: plot feature importance
plt.figure(figsize=(10,6))
plt.barh(feat_imp_df['Feature'], feat_imp_df['Importance'])
plt.xlabel('Feature Importance')
plt.title('Feature Importance in Gradient Boosting Classifier')
plt.gca().invert_yaxis()
plt.show()


In [None]:
# 17.  Train a Gradient Boosting Regressor and evaluate using R-Squared Score

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

# Load dataset
data = fetch_california_housing()
X, y = data.data, data.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize Gradient Boosting Regressor
model = GradientBoostingRegressor(random_state=42)

# Train the model
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Calculate R-squared score
r2 = r2_score(y_test, y_pred)

print(f"Gradient Boosting Regressor R-squared Score: {r2:.4f}")


In [None]:
# 18.  Train an XGBoost Classifier on a dataset and compare accuracy with Gradient Boosting

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize models
gb_model = GradientBoostingClassifier(random_state=42)
# use_label_encoder is deprecated in recent XGBoost releases, so it is omitted
xgb_model = XGBClassifier(eval_metric='logloss', random_state=42)

# Train Gradient Boosting
gb_model.fit(X_train, y_train)
gb_preds = gb_model.predict(X_test)
gb_accuracy = accuracy_score(y_test, gb_preds)

# Train XGBoost
xgb_model.fit(X_train, y_train)
xgb_preds = xgb_model.predict(X_test)
xgb_accuracy = accuracy_score(y_test, xgb_preds)

print(f"Gradient Boosting Classifier Accuracy: {gb_accuracy:.4f}")
print(f"XGBoost Classifier Accuracy: {xgb_accuracy:.4f}")


In [None]:
# 19.  Train a CatBoost Classifier and evaluate using F1-Score

from catboost import CatBoostClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize CatBoost Classifier (silent=True to suppress training output)
model = CatBoostClassifier(random_state=42, silent=True)

# Train the model
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Calculate F1-Score
f1 = f1_score(y_test, y_pred)

print(f"CatBoost Classifier F1-Score: {f1:.4f}")


In [None]:
# 20.  Train an XGBoost Regressor and evaluate using Mean Squared Error (MSE)

from xgboost import XGBRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load dataset
data = fetch_california_housing()
X, y = data.data, data.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize XGBoost Regressor
model = XGBRegressor(random_state=42)

# Train the model
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)

print(f"XGBoost Regressor Mean Squared Error: {mse:.4f}")


In [None]:
# 21.  Train an AdaBoost Classifier and visualize feature importance

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target
feature_names = data.feature_names

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train AdaBoost Classifier
model = AdaBoostClassifier(random_state=42)
model.fit(X_train, y_train)

# Get feature importances
importances = model.feature_importances_

# Sort feature importances in descending order
indices = np.argsort(importances)[::-1]

# Plot feature importances
plt.figure(figsize=(10,6))
plt.title("Feature Importances from AdaBoost Classifier")
plt.bar(range(len(importances)), importances[indices], align="center")
plt.xticks(range(len(importances)), feature_names[indices], rotation=90)
plt.xlabel("Features")
plt.ylabel("Importance")
plt.tight_layout()
plt.show()


In [None]:
# 22.  Train a Gradient Boosting Regressor and plot learning curves

import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load dataset
data = fetch_california_housing()
X, y = data.data, data.target

# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit once with 200 trees, then use staged_predict to score the ensemble
# after each boosting iteration (avoids refitting from scratch 200 times)
model = GradientBoostingRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Record training and validation errors at each iteration
train_errors = [
    mean_squared_error(y_train, pred) for pred in model.staged_predict(X_train)
]
val_errors = [
    mean_squared_error(y_val, pred) for pred in model.staged_predict(X_val)
]

# Plot learning curves
plt.figure(figsize=(10,6))
plt.plot(range(1, 201), train_errors, label='Training MSE')
plt.plot(range(1, 201), val_errors, label='Validation MSE')
plt.xlabel('Number of Trees (Estimators)')
plt.ylabel('Mean Squared Error')
plt.title('Learning Curves for Gradient Boosting Regressor')
plt.legend()
plt.show()
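
An alternative way to obtain the same per-tree error curves is scikit-learn's `staged_predict`, which yields predictions after each boosting stage from a single fitted model. The sketch below is illustrative only: the synthetic dataset from `make_regression`, the 50-tree ensemble, and the name `best_n` are assumptions made to keep it small and fast.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration; any regression dataset works)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit once with the full number of trees
model = GradientBoostingRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after 1, 2, ..., n_estimators trees
train_errors = [mean_squared_error(y_train, p) for p in model.staged_predict(X_train)]
val_errors = [mean_squared_error(y_val, p) for p in model.staged_predict(X_val)]

# Number of trees with the lowest validation error (1-based)
best_n = val_errors.index(min(val_errors)) + 1
print(f"Lowest validation MSE at {best_n} trees")
```

The two lists can be passed to `plt.plot` exactly like the error lists in the cell above.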


In [None]:
# 23.  Train an XGBoost Classifier and visualize feature importance

import matplotlib.pyplot as plt
from xgboost import XGBClassifier, plot_importance
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target
feature_names = data.feature_names

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train XGBoost Classifier
model = XGBClassifier(eval_metric='logloss', random_state=42)  # use_label_encoder is deprecated and no longer needed
model.fit(X_train, y_train)

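# Plot feature importance on an explicit Axes so the figure size is honored
# (without ax=, plot_importance creates its own figure and leaves an empty one)
fig, ax = plt.subplots(figsize=(10, 8))
plot_importance(model, ax=ax, max_num_features=15, height=0.8, importance_type='gain', xlabel='Gain', show_values=False)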
plt.title('Feature Importance - XGBoost Classifier')
plt.show()


In [None]:
# 24.  Train a CatBoost Classifier and plot the confusion matrix

from catboost import CatBoostClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train CatBoost Classifier
model = CatBoostClassifier(random_state=42, silent=True)
model.fit(X_train, y_train)

# Predict on test set
y_pred = model.predict(X_test)

# Compute confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=data.target_names)
disp.plot(cmap=plt.cm.Blues)
plt.title('CatBoost Classifier Confusion Matrix')
plt.show()


In [None]:
# 25.  Train an AdaBoost Classifier with different numbers of estimators and compare accuracy

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Different numbers of estimators to try
estimators_list = [10, 50, 100, 200, 300]

accuracies = []

for n_estimators in estimators_list:
    model = AdaBoostClassifier(n_estimators=n_estimators, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    accuracies.append(acc)
    print(f"Estimators: {n_estimators} - Accuracy: {acc:.4f}")

# Plot the results
plt.figure(figsize=(8,5))
plt.plot(estimators_list, accuracies, marker='o')
plt.title('AdaBoost Accuracy vs Number of Estimators')
plt.xlabel('Number of Estimators')
plt.ylabel('Accuracy')
plt.grid(True)
plt.show()


In [None]:
# 26.  Train a Gradient Boosting Classifier and visualize the ROC curve

import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Gradient Boosting Classifier
model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict probabilities for the positive class
y_probs = model.predict_proba(X_test)[:, 1]

# Compute ROC curve and AUC
fpr, tpr, thresholds = roc_curve(y_test, y_probs)
roc_auc = auc(fpr, tpr)

# Plot ROC curve
plt.figure(figsize=(8,6))
plt.plot(fpr, tpr, color='blue', lw=2, label=f'ROC curve (AUC = {roc_auc:.3f})')
plt.plot([0, 1], [0, 1], color='gray', lw=1, linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve - Gradient Boosting Classifier')
plt.legend(loc='lower right')
plt.grid(True)
plt.show()
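
To make clear what `roc_curve` and `auc` compute, here is a minimal pure-Python sketch of the same idea: sweep a threshold down through the sorted scores, record the false and true positive rates, and integrate with the trapezoidal rule. The helper names `roc_points` and `trapezoid_auc` and the four-sample toy data are hypothetical, and the sketch assumes binary 0/1 labels with no tied scores.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists, assuming binary 0/1 labels and no tied scores."""
    # Visit samples from the highest score to the lowest
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    tp = fp = 0
    fpr, tpr = [0.0], [0.0]
    for i in order:
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear ROC curve via the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


# Toy data (illustrative only)
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr = roc_points(y_true, scores)
roc_auc_manual = trapezoid_auc(fpr, tpr)
print(f"Manual AUC: {roc_auc_manual:.2f}")
```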


In [None]:
# 27.  Train an XGBoost Regressor and tune the learning rate using GridSearchCV

from xgboost import XGBRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error

# Load dataset
data = fetch_california_housing()
X, y = data.data, data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize XGBoost Regressor
xgb = XGBRegressor(random_state=42)

# Define parameter grid to tune learning rate
param_grid = {
    'learning_rate': [0.01, 0.05, 0.1, 0.2, 0.3]
}

# Setup GridSearchCV
grid_search = GridSearchCV(estimator=xgb, param_grid=param_grid, cv=3, scoring='neg_mean_squared_error', verbose=1)

# Run grid search
grid_search.fit(X_train, y_train)

# Best parameters and best score
print(f"Best learning rate: {grid_search.best_params_['learning_rate']}")
print(f"Best CV MSE: {-grid_search.best_score_:.4f}")

# Evaluate on test set with the best model
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
test_mse = mean_squared_error(y_test, y_pred)
print(f"Test set Mean Squared Error: {test_mse:.4f}")


In [None]:
# 28.  Train a CatBoost Classifier on an imbalanced dataset and compare performance with class weighting

from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import numpy as np

# Create imbalanced dataset
X, y = make_classification(
    n_samples=5000,
    n_features=20,
    n_classes=2,
    weights=[0.9, 0.1],  # 90% of class 0, 10% of class 1 (minority)
    flip_y=0,
    random_state=42
)

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train CatBoost without class weights
model_no_weights = CatBoostClassifier(random_state=42, silent=True)
model_no_weights.fit(X_train, y_train)
pred_no_weights = model_no_weights.predict(X_test)

# Compute class weights manually
# Weight for class i = total_samples / (num_classes * samples_in_class_i)
class_counts = np.bincount(y_train)
total = len(y_train)
num_classes = len(class_counts)
class_weights = {i: total / (num_classes * count) for i, count in enumerate(class_counts)}

# Train CatBoost with class weights
model_with_weights = CatBoostClassifier(class_weights=class_weights, random_state=42, silent=True)
model_with_weights.fit(X_train, y_train)
pred_with_weights = model_with_weights.predict(X_test)

# Compare performance using classification report (includes precision, recall, F1)
print("Without Class Weights:")
print(classification_report(y_test, pred_no_weights))

print("With Class Weights:")
print(classification_report(y_test, pred_with_weights))
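
The class-weight formula used above, `total_samples / (num_classes * samples_in_class_i)`, can be checked by hand on a tiny example. The sketch below uses only the standard library; the ten-label toy list and the helper name `balanced_class_weights` are assumptions for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight for class i = total_samples / (num_classes * samples_in_class_i)."""
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


# Toy labels: 9 samples of class 0, 1 sample of class 1
labels = [0] * 9 + [1]
weights = balanced_class_weights(labels)

for cls in sorted(weights):
    print(f"class {cls}: weight {weights[cls]:.4f}")
```

The minority class receives the larger weight, so its errors count more heavily during training.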


In [None]:
# 29.  Train an AdaBoost Classifier and analyze the effect of different learning rates

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Different learning rates to try
learning_rates = [0.01, 0.05, 0.1, 0.2, 0.5, 1]

accuracies = []

for lr in learning_rates:
    model = AdaBoostClassifier(learning_rate=lr, n_estimators=50, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    accuracies.append(acc)
    print(f"Learning Rate: {lr} - Accuracy: {acc:.4f}")

# Plot the results
plt.figure(figsize=(8,5))
plt.plot(learning_rates, accuracies, marker='o')
plt.title('AdaBoost Accuracy vs Learning Rate')
plt.xlabel('Learning Rate')
plt.ylabel('Accuracy')
plt.grid(True)
plt.show()


In [None]:
# 30.  Train an XGBoost Classifier for multi-class classification and evaluate using log-loss.

from xgboost import XGBClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize XGBoost Classifier for multi-class (use softprob for probabilities)
model = XGBClassifier(
    objective='multi:softprob',
    num_class=3,
    eval_metric='mlogloss',
    random_state=42
)

# Train model
model.fit(X_train, y_train)

# Predict probabilities for test set
y_proba = model.predict_proba(X_test)

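# Calculate log-loss (pass labels explicitly so the columns of y_proba are
# matched to classes even if a class is missing from y_test)
loss = log_loss(y_test, y_proba, labels=model.classes_)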
print(f"Multi-class Log Loss: {loss:.4f}")
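
For reference, multi-class log-loss is the mean negative log of the probability assigned to each sample's true class. The following standard-library sketch computes it by hand; the three-sample probability table and the function name `multiclass_log_loss` are hypothetical and only meant to illustrate the definition.

```python
import math


def multiclass_log_loss(y_true, y_proba):
    """Mean negative log-probability assigned to the true class of each sample."""
    total = 0.0
    for label, row in zip(y_true, y_proba):
        total += -math.log(row[label])
    return total / len(y_true)


# Toy predictions: each row holds the probabilities for classes 0, 1, 2
y_true_toy = [0, 1, 2]
y_proba_toy = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]

manual_loss = multiclass_log_loss(y_true_toy, y_proba_toy)
print(f"Manual log-loss: {manual_loss:.4f}")
```

Confident correct predictions contribute values near zero, while a low probability on the true class is penalized heavily.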
