### Extreme Gradient Boosting with XGBoost

#### Chapter 1. Classification with XGBoost

##### Part 1.1 Introducing XGBoost

What makes XGBoost so popular?

1. speed and power
2. core algorithm is parallelizable
3. consistently outperforms single-algorithm methods
4. state-of-the-art performance in many machine learning tasks

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
import xgboost as xgb
import numpy as np

# load data
churn_data = pd.read_csv('')

# create arrays for features and target: X, y
X, y = churn_data.iloc[:, :-1], churn_data.iloc[:, -1]

# create the training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.2, 
                                                    random_state=123)

# instantiate an XGBClassifier
xg_cl = xgb.XGBClassifier(objective='binary:logistic',
                          n_estimators=10,
                          random_state=123)

# fit the classifier to the training set
xg_cl.fit(X_train, y_train)

# predict the labels of the test set
preds = xg_cl.predict(X_test)

# compute the accuracy
accuracy = float(np.sum(preds==y_test))/y_test.shape[0]
print('Accuracy: %f' %(accuracy))


##### Part 1.2 Decision Tree

A decision tree is XGBoost's base learner. Each node asks a question with two possible answers, and each leaf at the bottom of the tree holds a single decision. The tree is built iteratively, one split at a time, until a stopping criterion is met.

Individual decision trees are generally low-bias, high-variance models: they fit the training data well but tend to generalize poorly. XGBoost uses **CART** (classification and regression trees) as its base learners; each leaf always contains a real-valued score, which can be converted into categories if necessary.
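To make the idea of real-valued leaf scores concrete, here is a minimal sketch of a hand-written tree whose leaves return scores, followed by one common way to turn a score into a category (a sigmoid plus a threshold). The feature names `age` and `uses_app_daily`, the leaf values, and the 0.5 threshold are all made up for illustration; this is not how XGBoost stores or evaluates its trees internally.

```python
import math

def tree_score(sample):
    """Hypothetical depth-2 tree: every leaf returns a real-valued score."""
    if sample["age"] < 30:
        # second question, asked only on the left branch
        return 0.8 if sample["uses_app_daily"] else 0.3
    return -0.6

def score_to_class(score, threshold=0.5):
    """Squash the score into (0, 1) with a sigmoid, then threshold it."""
    prob = 1.0 / (1.0 + math.exp(-score))
    return int(prob >= threshold), prob

label, prob = score_to_class(tree_score({"age": 25, "uses_app_daily": True}))
print(label, round(prob, 3))
```

A positive score maps to a probability above 0.5 and therefore to class 1; a negative score, such as the `-0.6` leaf, maps to class 0.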

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# load data
breast_cancer = pd.read_csv('')

# create arrays for the features and target: X, y
X, y = breast_cancer.iloc[:, :-1], breast_cancer.iloc[:, -1]

# create the training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.1,
                                                    random_state=123)

# instantiate the classifier
dt_clf_4 = DecisionTreeClassifier(max_depth=4)

# fit the classifier
dt_clf_4.fit(X_train, y_train)

# predict the labels of test set
y_pred_4 = dt_clf_4.predict(X_test)

# compute the accuracy
accuracy = float(np.sum(y_pred_4==y_test))/y_test.shape[0]
print('Accuracy: %f' %(accuracy))

##### Part 1.3 Boosting

Boosting is not a specific machine learning algorithm but a concept that can be applied to a set of machine learning models, which is why it can be called a **"meta-algorithm"**.

In short, boosting can convert a collection of weak learners into a stronger learner.

How does it work?

1. Iteratively learn a set of weak models on subsets of the data
2. Weight each weak prediction according to each weak learner's performance
3. Combine the weighted predictions to obtain a single weighted prediction
4. The combined result is much better than any of the individual predictions
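The steps above can be sketched in plain Python. The sketch below uses AdaBoost-style boosting with one-split "decision stumps" as weak learners; note that it reweights the samples between rounds rather than drawing subsets, and that it is not the gradient boosting XGBoost itself performs. The toy dataset and all function names are made up for illustration.

```python
import math

# Toy 1-D dataset, made up for illustration; labels are -1 or +1.
X = list(range(10))
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

def stump_predict(x, threshold, polarity):
    """Weak learner: a single split on x."""
    return polarity if x < threshold else -polarity

def fit_stump(X, y, weights):
    """Return (weighted error, threshold, polarity) of the best stump."""
    best = None
    for threshold in [x + 0.5 for x in X]:
        for polarity in (1, -1):
            error = sum(w for xi, yi, w in zip(X, y, weights)
                        if stump_predict(xi, threshold, polarity) != yi)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def boost(X, y, n_rounds):
    weights = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_rounds):
        # step 1: learn a weak model on the currently weighted data
        error, threshold, polarity = fit_stump(X, y, weights)
        error = max(error, 1e-10)  # guard against log of a zero error
        # step 2: weight the learner by its performance
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # give misclassified samples more weight in the next round
        weights = [w * math.exp(-alpha * yi * stump_predict(xi, threshold, polarity))
                   for xi, yi, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_predict(ensemble, x):
    # step 3: combine the weighted predictions into a single prediction
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

def accuracy(ensemble):
    hits = sum(ensemble_predict(ensemble, xi) == yi for xi, yi in zip(X, y))
    return hits / len(X)

ensemble = boost(X, y, n_rounds=3)
single_acc = accuracy(ensemble[:1])   # first weak learner on its own
boosted_acc = accuracy(ensemble)      # all three learners combined
print('Single stump: %f, boosted: %f' % (single_acc, boosted_acc))
```

On this toy data no single stump can separate the labels, yet the weighted combination of three stumps classifies every training point correctly, which is step 4 in miniature.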

In [None]:

# load data
churn_data = pd.read_csv('')

# create arrays for the features and the target: X, y
X, y = churn_data.iloc[:, :-1], churn_data.iloc[:, -1]

# create the DMatrix from X and y
churn_dmatrix = xgb.DMatrix(data=X, label=y)

# create the parameter dictionary for cross-validation
params = {'objective': 'reg:logistic', 'max_depth':3}

# perform cross-validation
cv_results = xgb.cv(params=params,
                    dtrain=churn_dmatrix, nfold=3, 
                    num_boost_round=5, metrics='error',
                    as_pandas=True, seed=123)
print(cv_results)

# print the accuracy
print((1-cv_results['test-error-mean']).iloc[-1])

##### Part 1.4 When should I use XGBoost?

When to use:

1. You have a large number of training samples (> 1000)
2. You have a mixture of categorical and numerical features or just numerical features

When not to use:

1. Image recognition
2. Computer vision
3. Natural Language Processing (NLP) and understanding problems
4. When the number of training samples is smaller than the number of features

#### Chapter 2. Regression with XGBoost

##### Part 2.1 Regression review

What is a regression problem?
The outcome is real-valued.

Common regression metrics:

1. Root mean squared error (RMSE)
2. Mean absolute error (MAE)
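
As a small worked example, both metrics can be computed by hand. The target and prediction values below are made up purely for illustration.

```python
import math

# Made-up targets and predictions, for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 4.0]

errors = [p - t for p, t in zip(y_pred, y_true)]

# RMSE: square the errors, average them, then take the square root.
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))

# MAE: average the absolute errors.
mae = sum(abs(e) for e in errors) / len(errors)

print('RMSE: %f' % rmse)
print('MAE: %f' % mae)
```

RMSE comes out larger than MAE here because squaring gives the single largest error (an absolute error of 3) more influence on the result.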

Loss functions and base learners:
A loss function quantifies how far off a prediction is from the actual result. The goal is to minimize the total loss over all of the data points we pass in.

The loss function names in XGBoost:
1. reg:squarederror (named reg:linear in older XGBoost releases) -> use for regression problems
2. reg:logistic -> use for classification problems when you want just the decision
3. binary:logistic -> use for classification problems when you want a probability rather than a decision
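
To illustrate the two loss families behind these names, here is a minimal sketch of a squared-error loss for regression and a logistic (log) loss for binary classification, each evaluated on a single data point. The helper names and the input values are made up for illustration; XGBoost computes its objectives internally and does not expose functions with these names.

```python
import math

def squared_error(y_true, y_pred):
    """Regression loss for one data point."""
    return (y_true - y_pred) ** 2

def logistic_loss(y_true, prob):
    """Log loss for one binary label (0 or 1) and a predicted P(class 1)."""
    return -(y_true * math.log(prob) + (1 - y_true) * math.log(1 - prob))

reg_loss = squared_error(3.0, 2.5)

# True label is 1 in both cases; only the predicted probability differs.
confident_right = logistic_loss(1, 0.9)
confident_wrong = logistic_loss(1, 0.1)

print('Squared error: %f' % reg_loss)
print('Log loss (right): %f, log loss (wrong): %f' % (confident_right, confident_wrong))
```

The logistic loss grows sharply as a confident prediction turns out to be wrong, which is what pushes a classifier toward well-calibrated probabilities.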

Base learners are learners that are only slightly better than random guessing.

In [None]:
# method 1.
# load data
df = pd.read_csv('')

# create features and target: X, y
X, y = df.iloc[:, :-1], df.iloc[:, -1]

# create the training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=123)

# instantiate the XGBRegressor
# ('reg:squarederror' replaces the deprecated 'reg:linear' name)
xg_reg = xgb.XGBRegressor(objective='reg:squarederror',
                          n_estimators=10,
                          random_state=123)

# fit the regressor
xg_reg.fit(X_train, y_train)

# predict the target values of the test set
preds = xg_reg.predict(X_test)

# compute the rmse
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test, preds))
print('RMSE: %f' %(rmse))