<font color="red" size="6">Ensemble methods</font>
<p><font color="Yellow" size="5"><b>4_XGBoost</b></font></p>

XGBoost (Extreme Gradient Boosting) is an efficient, scalable implementation of the gradient boosting framework. It is one of the most widely used algorithms in machine learning competitions because of its training speed and predictive performance. It is designed for both accuracy and computational efficiency, and it includes regularization to reduce overfitting.

<font color="pink" size=4>Key Features of XGBoost:</font>
<ol>
    <li><font color="orange">Gradient Boosting:</font> Like traditional gradient boosting, XGBoost builds an ensemble of weak learners (usually decision trees) sequentially. Each new tree is fit to the gradient of the loss with respect to the current ensemble's predictions, which for squared-error loss is simply the residual (error) of the ensemble built so far.</li>
     <li><font color="orange">Regularization:</font> XGBoost adds L1 (Lasso) and L2 (Ridge) penalties on the leaf weights, plus a penalty on the number of leaves, to reduce overfitting and improve generalization.</li>
     <li><font color="orange">Handling Missing Data:</font> XGBoost accepts missing values directly during training. At each split it learns a default direction (left or right) for samples whose feature value is missing.</li>
     <li><font color="orange">Parallelization:</font> XGBoost parallelizes the search for split points within each tree across CPU threads (the trees themselves are still built one after another) and also supports distributed training.</li>
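     <li><font color="orange">Tree Pruning:</font> Rather than stopping at the first split that fails to improve the loss, XGBoost grows each tree up to <code>max_depth</code> and then prunes back splits whose gain falls below a threshold (<code>gamma</code>). A weak split can therefore be kept when it enables useful splits deeper in the tree.</li>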
     <li><font color="orange">Custom Objective Functions and Evaluation Metrics:</font> XGBoost allows users to define custom objective functions and evaluation metrics for specific use cases.</li></ol>

In [2]:
import xgboost as xgb
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# 1. Load the Wine dataset
data = load_wine()
X = data.data
y = data.target

# 2. Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 3. Create the XGBoost classifier
xgb_model = xgb.XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)

# 4. Train the model
xgb_model.fit(X_train, y_train)

# 5. Make predictions on the test set
y_pred = xgb_model.predict(X_test)

# 6. Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

# Display the classification report and confusion matrix
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))


Accuracy: 0.9815

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       0.95      1.00      0.98        21
           2       1.00      0.93      0.96        14

    accuracy                           0.98        54
   macro avg       0.98      0.98      0.98        54
weighted avg       0.98      0.98      0.98        54


Confusion Matrix:
[[19  0  0]
 [ 0 21  0]
 [ 0  1 13]]
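
As a sanity check, the headline numbers in the classification report can be recomputed from the confusion matrix alone. The short sketch below copies the matrix printed above (in scikit-learn's convention, rows are true classes and columns are predicted classes) and derives accuracy, per-class recall, and per-class precision from it.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
cm = np.array([[19, 0, 0],
               [0, 21, 0],
               [0, 1, 13]])

# Accuracy: correctly classified samples (the diagonal) over all samples.
accuracy = np.trace(cm) / cm.sum()

# Recall per class: diagonal over row sums (all samples truly in that class).
recall = np.diag(cm) / cm.sum(axis=1)

# Precision per class: diagonal over column sums (all samples predicted as that class).
precision = np.diag(cm) / cm.sum(axis=0)

print(f"Accuracy: {accuracy:.4f}")  # 53 / 54
print("Recall:", np.round(recall, 2))
print("Precision:", np.round(precision, 2))
```

The single off-diagonal entry is a class 2 sample predicted as class 1, which is why class 2 recall drops to 13/14 and class 1 precision drops to 21/22, matching the 0.93 and 0.95 shown in the report.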
