# XGBoost
🎉 Congratulations! You’ve reached the final step of your machine learning journey in this course.  
And we’re finishing strong with one of the **most widely used, high-performing algorithms** for tabular data: **XGBoost**.

Let’s see if this model can beat everything we’ve built so far! 💪

---

## 🚀 What is XGBoost?

**XGBoost (Extreme Gradient Boosting)** is an optimized, open-source implementation of gradient-boosted decision trees, designed for **speed and performance**. It works very well for both **classification** and **regression** problems, especially on structured (tabular) data.

Why is XGBoost so powerful?

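- Built-in regularization (helps reduce overfitting)
- Fast and scalable (parallel split finding within each tree, distributed training)
- Handles missing values automatically (learns a default branch for them at each split)
- Frequently used in winning solutions of machine learning competitions such as Kaggle!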

---

> 🔄 **XGBoost stands for "Extreme Gradient Boosting"**, a highly optimized implementation of the **Boosting** technique in Ensemble Learning.

### 🧩 Why is XGBoost an Ensemble Model?

XGBoost belongs to the family of **Ensemble Learning** algorithms. Specifically, it falls under:

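* ✅ **Boosting**: A method where multiple weak learners (typically decision trees) are trained sequentially. Each new model tries to **fix the errors** made by the ensemble built so far; in gradient boosting, each new tree is fit using the gradient of the loss, which measures how wrong the current predictions are.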

This makes XGBoost **different from Bagging methods** like Random Forest:

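| Feature           | Random Forest (Bagging)                                 | XGBoost (Boosting)                                                         |
| ----------------- | ------------------------------------------------------- | -------------------------------------------------------------------------- |
| Learning Strategy | Trees trained independently (parallel)                  | Trees trained one after another (sequential)                               |
| Combining Trees   | Voting/Averaging                                        | Weighted sum; each new tree focuses on previous errors                     |
| Regularization    | No L1/L2 penalty (relies on limits such as tree depth)  | Built-in L1 & L2 penalties on leaf weights                                 |
| Speed             | Trees are independent, so training parallelizes easily  | Sequential across trees, parallel within each tree; often more accurate after tuning |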

> 🧠 In short: XGBoost = multiple smart trees + error correction + regularization = **high-performance model** 💪
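To make "error correction + regularization" concrete, here is a toy, numpy-only sketch of XGBoost-style boosting for regression with squared error, assuming a single input feature and depth-1 trees (stumps). Each round computes the gradient `g = pred - y` of the current predictions, fits a stump to it, and uses the regularized leaf value from the XGBoost paper, w = -G / (H + λ), with an optional L1 term α that shrinks G toward zero. The names `eta`, `lam`, and `alpha` are chosen to mirror XGBoost's `learning_rate`, `reg_lambda`, and `reg_alpha`, but this is an illustration, not the library's implementation.

```python
import numpy as np

def soft_threshold(G, alpha):
    """L1 (reg_alpha-style) shrinkage of a gradient sum toward zero."""
    return np.sign(G) * max(abs(G) - alpha, 0.0)

def leaf_weight(G, H, lam, alpha):
    """Regularized optimal leaf value: -soft_threshold(G) / (H + lambda)."""
    return -soft_threshold(G, alpha) / (H + lam)

def fit_stump(x, g, h, lam, alpha):
    """Pick the threshold on a 1-D feature that maximizes the regularized score."""
    best = None
    # Every distinct value except the smallest keeps both sides non-empty
    for t in np.unique(x)[1:]:
        left = x < t
        GL, HL = g[left].sum(), h[left].sum()
        GR, HR = g[~left].sum(), h[~left].sum()
        score = (soft_threshold(GL, alpha) ** 2 / (HL + lam)
                 + soft_threshold(GR, alpha) ** 2 / (HR + lam))
        if best is None or score > best[0]:
            best = (score, t, leaf_weight(GL, HL, lam, alpha),
                    leaf_weight(GR, HR, lam, alpha))
    if best is None:
        raise ValueError("x needs at least two distinct values")
    return best[1:]  # (threshold, left weight, right weight)

def boost(x, y, n_rounds=50, eta=0.3, lam=1.0, alpha=0.0):
    """Add stumps one after another, each fit to the current errors."""
    pred = np.full(len(y), y.mean())  # constant starting prediction
    h = np.ones(len(y))               # hessian of 0.5 * (pred - y)^2 is 1
    stumps, losses = [], []
    for _ in range(n_rounds):
        g = pred - y                  # gradient: pred - y (negative residual)
        t, w_left, w_right = fit_stump(x, g, h, lam, alpha)
        # Shrink each new tree's output by the learning rate eta
        pred = pred + eta * np.where(x < t, w_left, w_right)
        stumps.append((t, w_left, w_right))
        losses.append(float(np.mean((y - pred) ** 2)))
    return stumps, losses

rng = np.random.default_rng(0)
x_demo = np.sort(rng.uniform(0, 10, 200))
y_demo = np.sin(x_demo) + rng.normal(0, 0.1, 200)
stumps, losses = boost(x_demo, y_demo)
print(f"training MSE after round 1: {losses[0]:.3f}, after round 50: {losses[-1]:.3f}")
```

Raising `lam` or `alpha` pulls every leaf value toward zero, so each tree corrects less of the remaining error; that is the sense in which the penalty regularizes the model.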

---

## ⭐ Importing the libraries

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
```

## ⭐ Importing the dataset

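```python
from sklearn.preprocessing import LabelEncoder
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values  # features: every column except the last
y = dataset.iloc[:, -1].values   # target: the last column
# XGBClassifier needs labels 0..n_classes-1; remap any other class codes
y = LabelEncoder().fit_transform(y)
```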
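Recent XGBoost releases expect classification targets to be the integers 0 to K-1, so labels stored as other codes (for example 2 and 4, or strings) need remapping before `fit`. scikit-learn's `LabelEncoder` does this; the same idea in plain numpy, with hypothetical labels, looks like this:

```python
import numpy as np

# Hypothetical labels that use arbitrary class codes
y_raw = np.array([2, 4, 4, 2, 2, 4])

# classes: sorted unique labels; y_encoded: position of each label in `classes`
classes, y_encoded = np.unique(y_raw, return_inverse=True)
print(classes)    # [2 4]
print(y_encoded)  # [0 1 1 0 0 1]

# Indexing `classes` maps encoded predictions back to the original codes
y_back = classes[y_encoded]
```

`LabelEncoder` keeps the same sorted `classes_` array, and its `inverse_transform` performs the final mapping step.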

## ⭐ Splitting the dataset into the Training set and Test set

```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
```
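`train_test_split` shuffles the rows and holds out a fraction of them for testing. A rough numpy sketch of that idea follows; the helper and demo names are hypothetical, it will not reproduce scikit-learn's exact rows for `random_state = 0`, and, like scikit-learn, it rounds the test count up.

```python
import math
import numpy as np

def simple_train_test_split(X, y, test_size=0.2, seed=0):
    """Shuffle row indices once, then hold out the first ceil(test_size * n)."""
    n = len(X)
    n_test = math.ceil(test_size * n)
    perm = np.random.default_rng(seed).permutation(n)
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X_demo = np.arange(20).reshape(10, 2)  # 10 hypothetical rows, 2 features
y_demo = np.arange(10) % 2             # hypothetical 0/1 labels
X_tr, X_te, y_tr, y_te = simple_train_test_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)
```

Splitting indices rather than arrays keeps each row's features and label together.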

## ⭐ Training XGBoost on the Training set

```python
from xgboost import XGBClassifier
classifier = XGBClassifier()
classifier.fit(X_train, y_train)
```

## ⭐ Making the Confusion Matrix

```python
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(accuracy_score(y_test, y_pred))
```
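`confusion_matrix` counts how often each true class (rows) was predicted as each class (columns), with classes in sorted order, and `accuracy_score` is the share of predictions that land on the diagonal. A from-scratch numpy sketch of both, on hypothetical labels (the function name is hypothetical):

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred):
    """Rows = true class, columns = predicted class, classes in sorted order."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm

y_true = np.array([0, 0, 1, 1, 1, 0])  # hypothetical test labels
y_hat = np.array([0, 1, 1, 1, 0, 0])   # hypothetical predictions
cm_demo = confusion_matrix_np(y_true, y_hat)
print(cm_demo)   # [[2 1]
                 #  [1 2]]
accuracy = np.trace(cm_demo) / cm_demo.sum()  # correct / all predictions
print(accuracy)  # 0.6666666666666666
```

For 0/1 labels the layout is `[[TN, FP], [FN, TP]]`, so the off-diagonal cells show which kind of mistake the model makes.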

## ⭐ Applying k-Fold Cross Validation

```python
from sklearn.model_selection import cross_val_score
# 10-fold CV on the training set (scikit-learn uses stratified folds for classifiers)
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
print("Accuracy: {:.2f} %".format(accuracies.mean()*100))
print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))
```
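`cross_val_score` with `cv = 10` trains and scores the model 10 times, each time holding out a different tenth of the training set, and the cell reports the mean and (population) standard deviation of those scores. The numpy sketch below shows the mechanics with plain, unshuffled folds and a hypothetical majority-class model standing in for XGBoost; for classifiers scikit-learn actually uses stratified folds, which keep class proportions similar in every fold. All function names here are hypothetical.

```python
import numpy as np

def k_fold_indices(n_samples, k):
    """Split 0..n_samples-1 into k contiguous folds; the first n % k get one extra."""
    sizes = np.full(k, n_samples // k)
    sizes[: n_samples % k] += 1
    folds, start = [], 0
    for size in sizes:
        folds.append(np.arange(start, start + size))
        start += size
    return folds

def cross_val_accuracy(fit_predict, X, y, k=10):
    """Train on k-1 folds, score on the held-out fold, repeat for every fold."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        train_mask = np.ones(len(X), dtype=bool)
        train_mask[test_idx] = False
        y_pred = fit_predict(X[train_mask], y[train_mask], X[test_idx])
        scores.append(np.mean(y_pred == y[test_idx]))
    return np.array(scores)

def majority_class(X_tr, y_tr, X_te):
    """Hypothetical stand-in model: always predict the most common training label."""
    values, counts = np.unique(y_tr, return_counts=True)
    return np.full(len(X_te), values[np.argmax(counts)])

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(25, 3))               # hypothetical features
y_demo = (rng.random(25) < 0.7).astype(int)     # hypothetical 0/1 labels
scores = cross_val_accuracy(majority_class, X_demo, y_demo, k=5)
print("Accuracy: {:.2f} %".format(scores.mean() * 100))
print("Standard Deviation: {:.2f} %".format(scores.std() * 100))
```

Swapping `majority_class` for a function that fits and predicts with `XGBClassifier` would give the same kind of estimate as the cell above, apart from the stratification.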