# XGBoost - Extreme Gradient Boosting

XGBoost is designed primarily for supervised learning on tabular data, where it excels at regression and classification. It implements gradient boosting with decision trees and is not intended for tasks usually tackled with neural networks, such as image recognition, natural language processing (NLP), or other deep-learning applications. In the broader machine-learning toolbox, it is best seen as a strong default for structured, tabular problems rather than a replacement for neural networks.
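To make "gradient boosting on decision trees" concrete, here is a minimal sketch of second-order boosting for binary classification, assuming only NumPy. It uses depth-1 trees (stumps), an exhaustive split search, and the split-gain and leaf-weight formulas XGBoost documents; it is not the library's implementation (no histogram binning, column sampling, missing-value handling or `gamma` pruning), and the function names `fit_stump`, `boost` and `predict_proba` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_stump(X, g, h, lam=1.0):
    """Greedy search for the single split that maximizes the second-order gain
    GL^2/(HL+lam) + GR^2/(HR+lam) - G^2/(H+lam) (XGBoost's gain without the
    1/2 factor and the gamma penalty, which do not change the argmax).
    Returns (feature, threshold, left_weight, right_weight)."""
    G, H = g.sum(), h.sum()
    parent = G**2 / (H + lam)
    best = None
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j], kind="stable")
        xs = X[order, j]
        GL, HL = np.cumsum(g[order]), np.cumsum(h[order])
        for k in range(len(xs) - 1):
            if xs[k] == xs[k + 1]:
                continue  # cannot split between equal feature values
            gl, hl = GL[k], HL[k]
            gr, hr = G - gl, H - hl
            gain = gl**2 / (hl + lam) + gr**2 / (hr + lam) - parent
            if best is None or gain > best[0]:
                thr = (xs[k] + xs[k + 1]) / 2
                # Newton-step leaf weights: w = -G / (H + lambda)
                best = (gain, j, thr, -gl / (hl + lam), -gr / (hr + lam))
    if best is None:  # every feature is constant: fall back to a single leaf
        w = -G / (H + lam)
        return 0, np.inf, w, w
    _, j, thr, w_left, w_right = best
    return j, thr, w_left, w_right


def predict_stump(stump, X):
    j, thr, w_left, w_right = stump
    return np.where(X[:, j] <= thr, w_left, w_right)


def boost(X, y, n_rounds=50, eta=0.3, lam=1.0):
    """Binary log-loss boosting: each round fits a stump to the gradients
    and hessians of the current raw scores (log-odds)."""
    f = np.zeros(len(y))  # raw scores start at 0, i.e. p = 0.5
    stumps = []
    for _ in range(n_rounds):
        p = sigmoid(f)
        g = p - y          # first derivative of log loss w.r.t. f
        h = p * (1.0 - p)  # second derivative
        stump = fit_stump(X, g, h, lam)
        stumps.append(stump)
        f += eta * predict_stump(stump, X)  # shrink each tree by eta
    return {"eta": eta, "stumps": stumps}


def predict_proba(model, X):
    f = np.zeros(len(X))
    for stump in model["stumps"]:
        f += model["eta"] * predict_stump(stump, X)
    return sigmoid(f)


# Toy, hypothetical data: one feature, label 1 when x >= 4
X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
model = boost(X_toy, y_toy)
print((predict_proba(model, X_toy) > 0.5).astype(int))  # [0 0 0 1 1 1]
```

Each round only corrects what the previous rounds got wrong (through `g` and `h`), and `eta` shrinks every tree's contribution; XGBoost exposes the same idea through parameters such as `n_estimators`, `learning_rate` and `max_depth`.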

## Importing the libraries

```python
!pip install xgboost
```

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
```

## Importing the dataset

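```python
# Load the dataset: every column except the last is a feature,
# the last column is the class label
dataset = pd.read_csv('../../datasets/breast-cancer.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
```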

## Splitting the dataset into the Training set and Test set

```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

print("Unique values in y_train:", np.unique(y_train))
# XGBClassifier expects class labels 0..n_classes-1, so map the
# dataset's labels 2 and 4 to 0 and 1
y_train = np.where(y_train == 2, 0, np.where(y_train == 4, 1, y_train))
# Verify the conversion
print("Unique values after conversion:", np.unique(y_train))
```

```
Unique values in y_train: [2 4]
Unique values after conversion: [0 1]
```
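The nested `np.where` handles exactly two known labels. A more general sketch, assuming only NumPy: `np.unique(..., return_inverse=True)` maps any set of labels to codes 0..k-1, and `np.searchsorted` applies the same mapping to the test labels so both splits stay consistent. scikit-learn's `LabelEncoder` (`fit_transform`, `transform`, `inverse_transform`) does the same job. The arrays below are made-up examples, not rows from the dataset.

```python
import numpy as np

# Hypothetical raw labels in the dataset's 2/4 coding
y_train_raw = np.array([2, 4, 4, 2, 4])
y_test_raw = np.array([4, 2])

# Sorted distinct labels, plus each element's index into them (codes 0..k-1)
classes, y_train_enc = np.unique(y_train_raw, return_inverse=True)

# Reuse the training classes for the test labels; searchsorted returns the
# position of each value in the sorted `classes` array (assumes every test
# label also occurs in the training set)
y_test_enc = np.searchsorted(classes, y_test_raw)

# Map encoded predictions back to the original labels by indexing
y_test_decoded = classes[y_test_enc]

print(classes)      # [2 4]
print(y_train_enc)  # [0 1 1 0 1]
print(y_test_enc)   # [1 0]
```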


## Training XGBoost on the Training set

```python
# Now fit the model
from xgboost import XGBClassifier

classifier = XGBClassifier()
classifier.fit(X_train, y_train)
```

```
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric=None, feature_types=None,
              gamma=None, grow_policy=None, importance_type=None,
              interaction_constraints=None, learning_rate=None, max_bin=None,
              max_cat_threshold=None, max_cat_to_onehot=None,
              max_delta_step=None, max_depth=None, max_leaves=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              multi_strategy=None, n_estimators=None, n_jobs=None,
              num_parallel_tree=None, random_state=None, ...)
```
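For reference, the XGBoost documentation's "Introduction to Boosted Trees" tutorial describes each boosting round as adding a tree $f_t$, scaled by the learning rate $\eta$ so that $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta f_t(x_i)$, that minimizes a second-order approximation of a regularized loss. Here $g_i$ and $h_i$ are the first and second derivatives of the loss $l(y_i, \hat{y}_i)$ with respect to the previous prediction $\hat{y}_i^{(t-1)}$, $T$ is the number of leaves, $w_j$ the weight of leaf $j$, and $I_j$ the set of samples that land in leaf $j$:

$$
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2\right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2
$$

Minimizing leaf by leaf, with $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$, gives the optimal leaf weight and the gain of splitting a node into children $L$ and $R$:

$$
w_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad
\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma
$$

For binary log loss on raw scores, $g_i = p_i - y_i$ and $h_i = p_i(1 - p_i)$. In `XGBClassifier`, $\lambda$, $\gamma$ and $\eta$ are set through `reg_lambda`, `gamma` and `learning_rate`; parameters shown as `None` in the repr fall back to the library defaults.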

## Making the Confusion Matrix

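```python
from sklearn.metrics import confusion_matrix, accuracy_score
# y_test still holds the original labels 2 and 4; map them to 0 and 1 the
# same way as y_train, otherwise no true label can match a prediction
y_test = np.where(y_test == 2, 0, np.where(y_test == 4, 1, y_test))
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print("Accuracy: {:.2f} %".format(accuracy_score(y_test, y_pred)*100))
```

```
[[85  2]
 [ 1 49]]
Accuracy: 97.81 %
```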
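scikit-learn's `confusion_matrix` puts true labels on rows and predicted labels on columns, in sorted label order, so for labels 0 and 1 the matrix reads `[[TN, FP], [FN, TP]]` with class 1 as the positive class. A small sketch of deriving the usual metrics from it, assuming NumPy; `binary_metrics` is a hypothetical helper and the counts are made up (`sklearn.metrics.classification_report` reports precision and recall directly).

```python
import numpy as np


def binary_metrics(cm):
    """Metrics from a 2x2 confusion matrix laid out as scikit-learn returns it
    for labels [0, 1]: rows = true, columns = predicted, i.e. [[TN, FP], [FN, TP]]."""
    cm = np.asarray(cm)
    tn, fp, fn, tp = cm.ravel()
    return {
        "accuracy": (tp + tn) / cm.sum(),
        "precision": tp / (tp + fp),    # of predicted positives, how many are right
        "recall": tp / (tp + fn),       # of true positives, how many are found
        "specificity": tn / (tn + fp),  # of true negatives, how many are found
    }


# Hypothetical counts for illustration only
cm_example = [[40, 10], [5, 45]]
print(binary_metrics(cm_example))
```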

## Applying k-Fold Cross Validation

```python
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
print("Accuracy: {:.2f} %".format(accuracies.mean()*100))
print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))
```

```
Accuracy: 96.71 %
Standard Deviation: 2.28 %
```
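`cross_val_score` trains the model on k-1 folds of the training set and scores it on the held-out fold, k times, returning one score per fold; the mean and standard deviation summarize them. The sketch below, assuming only NumPy, shows the plain unshuffled split; note that for a classifier with an integer `cv`, scikit-learn actually uses `StratifiedKFold`, which also keeps the class proportions roughly equal in every fold. `kfold_indices` is a hypothetical helper and the fold scores are made up.

```python
import numpy as np


def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for plain, unshuffled k-fold CV.
    The first n_samples % k folds get one extra sample."""
    fold_sizes = np.full(k, n_samples // k)
    fold_sizes[: n_samples % k] += 1
    indices = np.arange(n_samples)
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = indices[start:stop]
        train_idx = np.concatenate([indices[:start], indices[stop:]])
        yield train_idx, val_idx
        start = stop


for train_idx, val_idx in kfold_indices(10, 3):
    print(len(train_idx), val_idx)

# Aggregating hypothetical per-fold accuracies the same way as the cell above;
# ndarray.std defaults to the population standard deviation (ddof=0)
fold_scores = np.array([0.95, 0.97, 1.00, 0.93, 0.96])
print("Accuracy: {:.2f} %".format(fold_scores.mean() * 100))
print("Standard Deviation: {:.2f} %".format(fold_scores.std() * 100))
```

Because every sample is used for validation exactly once, the k-fold mean is a less noisy estimate than a single train/test split, while the test set itself stays untouched.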
