# XGBoost
### What is XGBoost?
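XGBoost stands for Extreme Gradient Boosting. It is a highly optimized, distributed, and scalable implementation of gradient-boosted decision trees, with optional row and column subsampling borrowed from bagging. On structured (tabular) data it is often among the most accurate tree-based models, although it is not the best choice in every situation. It is also relatively recent: the library was first released in 2014, and the algorithm was described in this 2016 paper: https://browse.arxiv.org/pdf/1603.02754.pdf.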

![A graph depicting model prediction power and training time between models. XGBoost is the most powerful and the second easiest to train, only slower than a simple logistic regression.](assets/tree-performance.png)

### How does XGBoost work?
XGBoost is significantly more complicated than random forest and plain gradient boosting. This is one of its few downsides: it is more of a black box. Like other boosting methods, it trains a sequence of shallow decision trees, where each new tree is fit to the errors left by the trees before it, so that the combined ensemble cancels out the weaknesses of its individual members. XGBoost adds regularization to this process. Combined with hardware optimizations and a distributed algorithm, this makes XGBoost both faster and more accurate than earlier tree methods for most workloads.
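To make the boosting idea concrete, here is a minimal sketch of plain gradient boosting for regression, written in pure Python. It is not XGBoost's actual algorithm (which also uses second-order gradient information, regularization, and many systems optimizations); it only illustrates how shallow trees trained on the remaining errors add up to a stronger model. The function names, the toy dataset, and the values of `n_rounds` and `learning_rate` are all assumptions chosen for illustration.

```python
# Plain gradient boosting with squared-error loss on a single feature.
# Each weak learner is a depth-1 tree (a "stump"): one threshold, two leaf values.

def fit_stump(xs, residuals):
    """Find the threshold split that best fits the residuals (least squares)."""
    best = None
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current errors."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the residuals are the negative gradient of the loss.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((predict_boosted(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data: a curve that a single stump cannot capture.
xs = [i / 10 for i in range(40)]
ys = [x ** 2 for x in xs]

one_round = fit_boosted(xs, ys, n_rounds=1)
many_rounds = fit_boosted(xs, ys, n_rounds=50)
print(f"Training MSE after 1 round:   {mean_squared_error(one_round, xs, ys):.4f}")
print(f"Training MSE after 50 rounds: {mean_squared_error(many_rounds, xs, ys):.4f}")
```

Each stump on its own is a very weak model, but because every round targets what the previous rounds got wrong, the training error shrinks as rounds are added. The learning rate scales down each tree's contribution so that no single tree dominates.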

![A history of decision trees and their development](assets/tree-history.png)

### How is XGBoost used?

XGBoost can be used in any situation where a decision tree, random forest, or boosted tree can be used. It handles both classification and regression tasks.

# XGBoost Exercise
Below is the basic code required to run XGBoost. Using what you've learned in the previous modules, expand that code with the following functionalities:
- Train the model on two different datasets and compare which one the model predicts better
- Change the model parameters
- Run hyperparameter tuning


In [2]:
# importing
# run 'pip install xgboost' so you have access to the library
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_digits

# loading and splitting
x, y = load_digits(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.2, random_state=2)

# instantiation
model = XGBClassifier()

# training
model.fit(train_x, train_y)

# testing
predictions = model.predict(test_x)  # predicted digit labels for the test set
print(f"The model has an accuracy of {model.score(test_x, test_y)*100:.3f}%")

The model has an accuracy of 95.833%
