<h1 style="background-color: #f8f0fa;
            border-left: 5px solid #1b4332;
            font-family: 'Trebuchet MS', sans-serif;
            border-right: 5px solid #1b4332;
            padding: 12px;
            border-radius: 50px 50px;
            color: #1b4332;
            text-align:center;
            font-size:45px;"><strong>😊XGBoost Algorithm🌟</strong></h1>
<hr style="border-top: 5px solid #264653;">

This notebook demonstrates **XGBoost (Extreme Gradient Boosting)**, 
a gradient boosting library widely used for structured/tabular data. 
It builds an ensemble of decision trees one after another, with each new tree correcting the errors of the trees before it.

In this example, we’ll build a simple classification model using XGBoost on a small dataset, 
demonstrating data preparation, model training, and evaluation.



## Key Concepts in XGBoost

1. **Tree Building**: XGBoost adds decision trees one at a time; each new tree is fit to the errors (the gradient of the loss) left by the trees already in the ensemble.

2. **Regularization**: L1 and L2 penalties on the leaf weights (`reg_alpha` and `reg_lambda` in the scikit-learn API) help control overfitting.

3. **Learning Rate**: Scales the contribution of each new tree; smaller values make training more gradual but usually need more trees.

4. **Hyperparameters**: Parameters such as the maximum tree depth (`max_depth`) and the row subsampling fraction (`subsample`) trade model flexibility against overfitting.

5. **Early Stopping**: Halts training once performance on a validation set has stopped improving.
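
To make the first three concepts concrete, here is a minimal from-scratch sketch of boosting. It is not how XGBoost is implemented: it assumes squared-error loss, a single feature, and depth-1 trees (stumps), and it picks splits by plain squared error rather than XGBoost's gain formula. The function names (`leaf_weight`, `fit_stump`, `boost`) and the toy data are made up for illustration; only `learning_rate` and `reg_lambda` borrow their names from XGBoost parameters.

```python
# From-scratch boosting sketch: squared-error loss, one feature, depth-1 trees.
# Illustrative only; names and data are hypothetical, not XGBoost internals.

def leaf_weight(residuals, reg_lambda):
    """L2-regularized leaf value for squared error: sum(residuals) / (count + lambda)."""
    return sum(residuals) / (len(residuals) + reg_lambda)


def fit_stump(x, residuals, reg_lambda):
    """Return (threshold, left_value, right_value) with the lowest squared error.

    Assumes x contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:  # splits that leave both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        lv, rv = leaf_weight(left, reg_lambda), leaf_weight(right, reg_lambda)
        error = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, lv, rv)
    return best[1:]


def boost(x, y, n_rounds=10, learning_rate=0.1, reg_lambda=1.0):
    """Fit stumps sequentially; return the stumps and the training MSE after each round."""
    preds = [0.0] * len(y)
    stumps, history = [], []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of 1/2 * squared error
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        threshold, lv, rv = fit_stump(x, residuals, reg_lambda)
        stumps.append((threshold, lv, rv))
        # Shrink the new tree's contribution by the learning rate
        preds = [p + learning_rate * (lv if xi <= threshold else rv)
                 for xi, p in zip(x, preds)]
        history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y))
    return stumps, history


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy data, made up for illustration
y = [1.2, 0.9, 1.1, 3.0, 3.2, 2.9]
stumps, history = boost(x, y)
print([round(mse, 3) for mse in history])
```

The printed training error shrinks a little each round because every stump is fit to what the previous stumps left unexplained. Raising `learning_rate` speeds this up, while raising `reg_lambda` pulls each leaf value toward zero.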



## Example Dataset

This example uses a small classification dataset with **10 samples** and **3 features**.

| Sample | Feature 1 | Feature 2 | Feature 3 | Target |
|--------|-----------|-----------|-----------|--------|
| 1      | 1.0       | 2.0       | 3.0       | 1      |
| 2      | 2.0       | 1.0       | 2.0       | 0      |
| 3      | 3.0       | 2.0       | 1.0       | 1      |
| 4      | 1.0       | 3.0       | 2.0       | 0      |
| 5      | 2.0       | 1.0       | 1.0       | 1      |
| 6      | 3.0       | 3.0       | 2.0       | 0      |
| 7      | 1.0       | 1.0       | 1.0       | 1      |
| 8      | 2.0       | 2.0       | 3.0       | 0      |
| 9      | 3.0       | 1.0       | 2.0       | 1      |
| 10     | 1.0       | 3.0       | 3.0       | 0      |


In [None]:

import pandas as pd

# Define the dataset
data = {
    'Sample': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Feature_1': [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0],
    'Feature_2': [2.0, 1.0, 2.0, 3.0, 1.0, 3.0, 1.0, 2.0, 1.0, 3.0],
    'Feature_3': [3.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 3.0, 2.0, 3.0],
    'Target': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
}

df = pd.DataFrame(data).set_index('Sample')
df



## Train-Test Split

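Split the data into training and testing sets to evaluate model performance on unseen data. With only 10 samples, a 30% test split leaves just 3 test rows, so the resulting score is a rough indication only.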


In [None]:

from sklearn.model_selection import train_test_split

# Split data into features (X) and target (y)
X = df[['Feature_1', 'Feature_2', 'Feature_3']]
y = df['Target']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
X_train, X_test, y_train, y_test
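
On a dataset this small, a purely random split can leave the test set with a lopsided class mix. One possible variation, sketched below, passes `stratify=y` to `train_test_split` so that both classes are represented in each subset. The arrays repeat the values from the dataset table as plain NumPy arrays so the snippet runs on its own; this is an optional alternative, not a required change.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same feature values and labels as the example dataset, as plain arrays
X = np.array([
    [1.0, 2.0, 3.0], [2.0, 1.0, 2.0], [3.0, 2.0, 1.0], [1.0, 3.0, 2.0],
    [2.0, 1.0, 1.0], [3.0, 3.0, 2.0], [1.0, 1.0, 1.0], [2.0, 2.0, 3.0],
    [3.0, 1.0, 2.0], [1.0, 3.0, 3.0],
])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])

# stratify=y asks for the class proportions to be kept in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print("Train class counts:", np.bincount(y_train))
print("Test class counts: ", np.bincount(y_test))
```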



## Step 1: Train XGBoost Model

Train an `XGBClassifier` on the training data with a binary logistic objective, a maximum tree depth of 3, a learning rate of 0.1, and 10 boosting rounds.


In [None]:

import xgboost as xgb
from sklearn.metrics import accuracy_score

# Initialize the XGBoost classifier
xgb_model = xgb.XGBClassifier(objective='binary:logistic', max_depth=3, learning_rate=0.1, n_estimators=10)

# Train the model
xgb_model.fit(X_train, y_train)



## Step 2: Predict and Evaluate

Predict labels for the test set and compare them with the true labels using accuracy, the fraction of predictions that are correct.


In [None]:

# Predict on the test set
y_pred = xgb_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Test Set Accuracy:", accuracy)
