### XGBoost

XGBoost (eXtreme Gradient Boosting) is an efficient and scalable implementation of the gradient boosting framework. It is widely used for both classification and regression tasks in machine learning. XGBoost is known for its training speed, strong predictive performance, and built-in regularization, which have made it a popular choice in data science competitions.

**How XGBoost Works:**

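1. Initialize Model: Start with a constant base prediction (the `base_score`), for example the mean of the target variable for squared-error regression.

2. Compute Gradients: For every training example, compute the first derivative (gradient) and second derivative (Hessian) of the loss with respect to the current prediction. For squared-error loss, the negative gradient is exactly the residual: the actual target value minus the predicted value.

3. Fit a Tree to the Gradient Statistics: Grow a decision tree (the weak learner) whose splits and leaf values are chosen to reduce a second-order approximation of the loss built from those gradients and Hessians. Because the negative gradient points in the direction of steepest decrease in the loss, the new tree corrects the mistakes of the current model.

4. Update the Model with Shrinkage: Add the new tree's predictions to the model, scaled by the learning rate (`eta`, exposed as `learning_rate` in the scikit-learn API). The learning rate is a fixed hyperparameter chosen before training, not something computed during it; it controls how much each tree contributes, so smaller values usually require more trees.

5. Regularization: XGBoost builds regularization into the objective used while each tree is grown. An L2 penalty (`lambda`) and an optional L1 penalty (`alpha`) shrink leaf values, a minimum loss reduction per split (`gamma`) prunes splits that do not help enough, and limits such as `max_depth`, row subsampling, and column subsampling further control overfitting. (Feature importance is a diagnostic computed from the trained model, not a regularization technique.)

6. Repeat: Repeat steps 2-5 for a fixed number of boosting rounds (`n_estimators`) or until a stopping criterion is met, such as no further improvement on a validation set (early stopping).

7. Final Prediction: The raw prediction is the base score plus the learning-rate-scaled sum of the outputs of all trees. For classification, this raw score (a log-odds margin under binary logistic loss) is passed through a link function such as the sigmoid to produce a probability.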
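Steps 2 to 5 can be written out as the regularized objective XGBoost minimizes when it adds tree $f_t$ in round $t$. This follows the formulation in the original XGBoost paper, shown here without the optional L1 term: $T$ is the number of leaves, $w_j$ the value of leaf $j$, $\eta$ the learning rate, and $G_j = \sum_{i \in I_j} g_i$, $H_j = \sum_{i \in I_j} h_i$ the gradient and Hessian sums over the examples $I_j$ that fall in leaf $j$.

$$
\begin{aligned}
\mathcal{L}^{(t)} &= \sum_{i=1}^{n} \ell\bigl(y_i,\ \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\bigr) + \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \\
&\approx \sum_{i=1}^{n} \Bigl[ g_i\, f_t(\mathbf{x}_i) + \tfrac{1}{2} h_i\, f_t(\mathbf{x}_i)^2 \Bigr] + \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 + \text{const}, \\
g_i &= \left.\frac{\partial \ell(y_i, \hat{y})}{\partial \hat{y}}\right|_{\hat{y} = \hat{y}_i^{(t-1)}}, \qquad
h_i = \left.\frac{\partial^2 \ell(y_i, \hat{y})}{\partial \hat{y}^2}\right|_{\hat{y} = \hat{y}_i^{(t-1)}}, \\
w_j^{*} &= -\frac{G_j}{H_j + \lambda}, \\
\text{Gain} &= \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma, \\
\hat{y}_i^{(t)} &= \hat{y}_i^{(t-1)} + \eta\, f_t(\mathbf{x}_i).
\end{aligned}
$$

A split is kept only when its gain is positive, which is how $\gamma$ prunes the tree. As a worked case, for squared-error loss $\ell = \tfrac{1}{2}(y - \hat{y})^2$ we get $g_i = \hat{y}_i - y_i$ and $h_i = 1$, so $w_j^{*} = \sum_{i \in I_j}(y_i - \hat{y}_i) / (|I_j| + \lambda)$: the mean residual in the leaf, shrunk toward zero by $\lambda$.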

**Simple Explanation:**

1. XGBoost builds a series of decision trees sequentially.
2. Each tree corrects the errors of the previous one.
3. Each tree is fit using the gradient (and Hessian) of the loss function.
4. Regularization techniques are used to prevent overfitting.
5. The final prediction is the base score plus the learning-rate-scaled sum of the predictions from all trees.
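The five points above can be sketched as a tiny, pure-Python booster for regression. This is a hypothetical teaching implementation, not the xgboost library: the names `TinyBooster`, `build_tree`, `leaf_weight`, and `split_score` are ours. It assumes squared-error loss (so every Hessian is 1), uses exact greedy split search, and omits XGBoost's L1 penalty, subsampling, sparsity handling, and histogram-based splitting.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """A tree node: a leaf if `feature` is None, otherwise a binary split."""
    weight: float = 0.0
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def predict(self, x: Sequence[float]) -> float:
        if self.feature is None:
            return self.weight
        child = self.left if x[self.feature] < self.threshold else self.right
        return child.predict(x)


def leaf_weight(G: float, H: float, lam: float) -> float:
    # Optimal leaf value under an L2 penalty: w* = -G / (H + lambda)
    return -G / (H + lam)


def split_score(G: float, H: float, lam: float) -> float:
    # Structure-score term G^2 / (H + lambda) used in the split gain
    return G * G / (H + lam)


def build_tree(X, g, h, idx, depth, max_depth, lam, gamma) -> Node:
    """Greedy exact split search on the gradient statistics of rows `idx`."""
    G = sum(g[i] for i in idx)
    H = sum(h[i] for i in idx)
    node = Node(weight=leaf_weight(G, H, lam))
    if depth >= max_depth or len(idx) < 2:
        return node
    best_gain, best = 0.0, None
    for f in range(len(X[0])):
        order = sorted(idx, key=lambda i: X[i][f])
        GL = HL = 0.0
        for k in range(len(order) - 1):
            GL += g[order[k]]
            HL += h[order[k]]
            lo, hi = X[order[k]][f], X[order[k + 1]][f]
            if lo == hi:  # only split between distinct feature values
                continue
            gain = 0.5 * (split_score(GL, HL, lam)
                          + split_score(G - GL, H - HL, lam)
                          - split_score(G, H, lam)) - gamma
            if gain > best_gain:
                best_gain = gain
                best = (f, (lo + hi) / 2, order[: k + 1], order[k + 1 :])
    if best is None:  # no split has positive gain after gamma: stay a leaf
        return node
    node.feature, node.threshold, left_idx, right_idx = best
    node.left = build_tree(X, g, h, left_idx, depth + 1, max_depth, lam, gamma)
    node.right = build_tree(X, g, h, right_idx, depth + 1, max_depth, lam, gamma)
    return node


class TinyBooster:
    """Hypothetical XGBoost-style regressor for squared-error loss."""

    def __init__(self, n_estimators=50, learning_rate=0.3, max_depth=3,
                 reg_lambda=1.0, gamma=0.0):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.reg_lambda = reg_lambda
        self.gamma = gamma
        self.base_score = 0.0
        self.trees: List[Node] = []

    def fit(self, X, y):
        self.base_score = sum(y) / len(y)  # start from the target mean
        pred = [self.base_score] * len(y)
        rows = list(range(len(y)))
        for _ in range(self.n_estimators):
            # For l = 0.5 * (y - pred)^2: g = pred - y and h = 1
            g = [p - t for p, t in zip(pred, y)]
            h = [1.0] * len(y)
            # Regularized tree (lambda shrinks leaves, gamma prunes splits)
            tree = build_tree(X, g, h, rows, 0, self.max_depth,
                              self.reg_lambda, self.gamma)
            self.trees.append(tree)
            # Shrink the new tree's output by the learning rate
            pred = [p + self.learning_rate * tree.predict(x) for p, x in zip(pred, X)]
        return self

    def predict(self, X):
        # Base score plus the learning-rate-scaled sum of all trees
        return [self.base_score + self.learning_rate * sum(t.predict(x) for t in self.trees)
                for x in X]


X = [[float(i)] for i in range(20)]
y = [0.0 if i < 10 else 5.0 for i in range(20)]
model = TinyBooster(n_estimators=30, learning_rate=0.3, max_depth=2).fit(X, y)
preds = model.predict(X)
mse = sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)
print(f"training MSE: {mse:.6f}")
```

On this toy step function, every tree ends up as a single split at 9.5: inside each half the gradients are identical, and the $\lambda$ term makes any further split score worse than leaving the leaf alone. Setting a very large `gamma` suppresses even that first split, which is the pruning behaviour described in step 5.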

```python
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load breast cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an XGBoost classifier
xgb_classifier = xgb.XGBClassifier(n_estimators=50, learning_rate=0.1, random_state=42)

# Train the classifier
xgb_classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = xgb_classifier.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

```
Accuracy: 0.956140350877193
```
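For a binary target like the breast cancer labels, `XGBClassifier` uses logistic loss (the `binary:logistic` objective) by default, so the trees are summed in log-odds (margin) space and the sigmoid turns that sum into a probability. The sketch below is pure Python and independent of the xgboost library; the helper names and the example leaf values are hypothetical. It shows the gradient and Hessian that steps 2 and 3 would use for this loss, $p - y$ and $p(1 - p)$ where $p$ is the sigmoid of the margin, and how step 7 maps the summed margin to a probability.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_grad_hess(margins: List[float], labels: List[int]) -> Tuple[List[float], List[float]]:
    """Gradient and Hessian of log loss with respect to the raw margin (log-odds)."""
    probs = [sigmoid(m) for m in margins]
    grad = [p - y for p, y in zip(probs, labels)]
    hess = [p * (1.0 - p) for p in probs]
    return grad, hess


def predict_proba(base_margin: float, tree_outputs: List[float], learning_rate: float) -> float:
    """Sum hypothetical per-tree leaf values in margin space, then map to a probability."""
    margin = base_margin + learning_rate * sum(tree_outputs)
    return sigmoid(margin)


# Two examples, both starting at margin 0 (probability 0.5)
grad, hess = logistic_grad_hess([0.0, 0.0], [1, 0])
print(grad, hess)  # [-0.5, 0.5] [0.25, 0.25]
print(predict_proba(0.0, [1.2, 0.8, 0.5], 0.1))
```

The Hessian $p(1 - p)$ is largest for uncertain predictions near 0.5 and shrinks toward 0 for confident ones, so under this loss the leaf values $-G_j/(H_j + \lambda)$ are driven mostly by the examples the model is still unsure about.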
