### Objective or Loss Function

- QUANTIFIES how far off a prediction is from an actual result.

The aim is to minimize this error (the loss) over ALL of the data points that we pass through the model.
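
To make "how far off a prediction is" concrete, here is a minimal sketch in plain Python. The function names, targets, and predictions are made up for illustration; they are not XGBoost's internals. Squared error is the usual regression loss and log loss the usual binary classification loss, and each is averaged over all data points.

```python
import math


def squared_error(y_true, y_pred):
    """Squared-error loss for a single regression prediction."""
    return (y_true - y_pred) ** 2


def log_loss(y_true, p_pred, eps=1e-15):
    """Logistic loss for a single binary prediction.

    y_true is 0 or 1; p_pred is the predicted probability of class 1.
    """
    p = min(max(p_pred, eps), 1 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def mean_loss(loss_fn, targets, predictions):
    """Average a per-point loss over ALL data points."""
    return sum(loss_fn(t, p) for t, p in zip(targets, predictions)) / len(targets)


# Hypothetical targets and predictions, for illustration only
reg_loss = mean_loss(squared_error, [3.0, 5.0, 2.0], [2.5, 5.0, 4.0])
clf_loss = mean_loss(log_loss, [1, 0, 1], [0.9, 0.2, 0.6])

print("mean squared error:", reg_loss)
print("mean log loss:", clf_loss)
```

A perfect prediction contributes zero loss, and the loss grows as predictions move away from the actual result.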

### Common Loss Functions for XGBoost

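Regression Problems:
- reg:linear (deprecated in newer XGBoost releases in favor of reg:squarederror)

Classification Problems:
- reg:logistic (when only the decision is needed, not the probability)
- binary:logistic (when the predicted probability is needed)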


### Base Learners

XGBoost involves creating a meta-model that is composed of many individual models that combine to give a final prediction.

THE INDIVIDUAL MODELS ARE the BASE LEARNERS

### Base Learners in XGBoost

Two kinds of base learners:
- tree and linear

Linear Base Learners:
- sum of linear terms
- boosted model is weighted sum of linear models
- rarely used
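
One way to see why linear base learners are rarely used: a weighted sum of linear models is itself just one linear model, so boosting adds no extra expressive power. The sketch below checks this numerically in plain Python. The three learners and the weight `eta` are made up for illustration and do not come from a trained XGBoost model.

```python
# Three hypothetical linear base learners, each stored as (weights, bias)
learners = [([0.5, -1.0], 0.2), ([0.1, 0.3], -0.1), ([-0.2, 0.4], 0.05)]
eta = 0.5  # hypothetical weight applied to every base learner


def predict_linear(weights, bias, x):
    """Prediction of one linear model: sum of linear terms plus a bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict_boosted(x):
    """Boosted prediction: weighted sum of all linear base learners."""
    return sum(eta * predict_linear(w, b, x) for w, b in learners)


# Collapse the ensemble into a single linear model by summing coefficients
n_features = 2
combined_weights = [eta * sum(w[j] for w, _ in learners) for j in range(n_features)]
combined_bias = eta * sum(b for _, b in learners)

x = [2.0, 3.0]
boosted = predict_boosted(x)
collapsed = predict_linear(combined_weights, combined_bias, x)
print(boosted, collapsed)  # the two values agree up to floating-point rounding
```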

Tree Base Learners:
- Decision Tree
- Boosted Model is weighted sum of decision trees (nonlinear)
- almost exclusively used in XGBoost
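
The "weighted sum of decision trees" idea can be sketched without any library. The example below boosts depth-1 trees (stumps) on the residuals of a squared-error loss, using plain Python and made-up data. It is a simplified illustration only: XGBoost itself uses gradient and Hessian statistics, regularization, and much deeper trees.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 tree: choose the threshold that minimizes squared error."""
    best = None
    # The largest value is skipped because it would leave the right side empty
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_value, right_value)


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_rounds, learning_rate=0.5):
    """Return (base_prediction, learning_rate, stumps) after n_rounds of boosting."""
    base = sum(ys) / len(ys)  # start from the mean of the targets
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)  # each new tree fits what is still unexplained
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    """Final prediction: base value plus the weighted sum of all trees."""
    base, learning_rate, stumps = model
    return base + sum(learning_rate * stump_predict(s, x) for s in stumps)


def training_mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical one-feature data set, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.0, 3.1, 2.9, 6.0, 6.2]

for n_rounds in (0, 1, 10):
    model = boost(xs, ys, n_rounds)
    print(n_rounds, "rounds -> training MSE:", round(training_mse(model, xs, ys), 4))
```

Each stump on its own is a weak, piecewise-constant model; the sum of many of them gives the nonlinear fit.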

In [None]:
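# example: train an XGBoost regressor and report RMSE on a held-out test set
# (assumes `data` is an already-loaded pandas DataFrame whose last column is the target)

import xgboost as xgb
import pandas as pd
import numpy as np

from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split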

X, y = data.iloc[:, :-1], data.iloc[:, -1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

# regressor object ("reg:squarederror" is the current name for the deprecated "reg:linear")
xg_reg = xgb.XGBRegressor(objective='reg:squarederror', n_estimators=10, seed=123)

xg_reg.fit(X_train, y_train)

preds = xg_reg.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, preds))
print("RMSE: %f" % (rmse))

#### Linear Base Learners, need the Learning API

In [None]:
# after the train/test split,
# the train and test sets need to be converted into DMatrix objects

# this is required by the Learning API
DM_train = xgb.DMatrix(data=X_train, label=y_train)
DM_test = xgb.DMatrix(data=X_test, label=y_test)

# specifying the base learner we want
# ("reg:squarederror" is the current name for the deprecated "reg:linear")
params = {"booster":"gblinear", 
          "objective":"reg:squarederror"}

xg_reg = xgb.train(params = params,
                  dtrain=DM_train,
                  num_boost_round=10)

preds = xg_reg.predict(DM_test)

rmse = np.sqrt(mean_squared_error(y_test, preds))
print("RMSE: %f" % (rmse))


In [None]:
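# performing 4-fold cross-validation with 5 boosting rounds and "mae" as the metric
# (assumes housing_dmatrix is an existing xgb.DMatrix holding the features and labels)

# Create the parameter dictionary: params
# ("reg:squarederror" is the current name for the deprecated "reg:linear")
params = {"objective":"reg:squarederror", "max_depth":4}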

# Perform cross-validation: cv_results
cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=4, 
                    num_boost_round=5, metrics="mae", as_pandas=True, seed=123)

# Print cv_results
print(cv_results)

# Extract and print the metric from the final boosting round
print((cv_results["test-mae-mean"]).tail(1))
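
To show what a column such as `test-mae-mean` summarizes, here is a minimal sketch of k-fold cross-validation in plain Python. The targets are made up, and the "model" is a stand-in that always predicts the mean of the training folds; `xgb.cv` trains real boosted models and reports such statistics for every boosting round.

```python
import statistics


def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous test folds (no shuffling)."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def mean_absolute_error(targets, predictions):
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


# Hypothetical targets, for illustration only
y = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0]

fold_maes = []
for test_idx in k_fold_indices(len(y), n_folds=4):
    train = [y[i] for i in range(len(y)) if i not in test_idx]
    test = [y[i] for i in test_idx]
    baseline = sum(train) / len(train)  # stand-in model: predict the training mean
    fold_maes.append(mean_absolute_error(test, [baseline] * len(test)))

# One MAE per held-out fold, then summarized across folds
test_mae_mean = statistics.mean(fold_maes)
test_mae_spread = statistics.pstdev(fold_maes)
print(fold_maes)
print(test_mae_mean, test_mae_spread)
```

Every data point is used for testing exactly once, so the averaged metric is a less noisy estimate than a single train/test split.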

#### Regularization

It is a form of control over model complexity.

#### Regularization parameters in XGBoost:

- **gamma**: minimum loss reduction required for a split to occur; larger values mean fewer splits
- **alpha**: L1 regularization on leaf weights; larger values mean more regularization
- **lambda**: L2 regularization on leaf weights; a smoother penalty than L1 (it shrinks weights gradually rather than pushing them to zero)
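
The effect of the three parameters can be sketched with the textbook leaf-weight and split-gain formulas from the XGBoost paper, extended with an L1 soft-threshold. This is a simplified illustration in plain Python, not the library's internal code. The function names and the gradient/Hessian sums are made up, and `lam` stands in for lambda because `lambda` is a reserved word in Python.

```python
def soft_threshold(g, alpha):
    """Shrink g toward zero by alpha; return 0 when |g| <= alpha (the L1 effect)."""
    if g > alpha:
        return g - alpha
    if g < -alpha:
        return g + alpha
    return 0.0


def leaf_weight(grad_sum, hess_sum, alpha=0.0, lam=1.0):
    """Optimal leaf weight given the summed gradients and Hessians in the leaf."""
    return soft_threshold(-grad_sum, alpha) / (hess_sum + lam)


def leaf_score(grad_sum, hess_sum, alpha, lam):
    return soft_threshold(grad_sum, alpha) ** 2 / (hess_sum + lam)


def split_gain(left, right, alpha=0.0, lam=1.0, gamma=0.0):
    """Loss reduction from splitting a node, minus the gamma penalty per split."""
    (gl, hl), (gr, hr) = left, right
    gain = 0.5 * (leaf_score(gl, hl, alpha, lam)
                  + leaf_score(gr, hr, alpha, lam)
                  - leaf_score(gl + gr, hl + hr, alpha, lam))
    return gain - gamma


# Hypothetical leaf: gradient sum -8, Hessian sum 4
print(leaf_weight(-8.0, 4.0, alpha=0.0, lam=0.0))   # no regularization
print(leaf_weight(-8.0, 4.0, alpha=0.0, lam=4.0))   # L2 shrinks the weight smoothly
print(leaf_weight(-8.0, 4.0, alpha=3.0, lam=0.0))   # L1 subtracts a fixed amount
print(leaf_weight(-8.0, 4.0, alpha=10.0, lam=0.0))  # large L1 sets the weight to zero

# Hypothetical split: the same candidate is accepted or rejected depending on gamma
left, right = (-6.0, 2.0), (4.0, 2.0)
print(split_gain(left, right, gamma=0.0))   # positive: the split is worth making
print(split_gain(left, right, gamma=10.0))  # negative: the split is pruned
```

In this sketch lambda scales the weight down by enlarging the denominator, alpha can zero it out entirely, and gamma decides whether a split happens at all.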

In [None]:
#L1 regularization in XGBoost, example

#define X and y
X, y = data.iloc[:, :-1], data.iloc[:, -1]

#create the matrix
DMatrix = xgb.DMatrix(data=X, label=y)

#defining params ("reg:squarederror" is the current name for the deprecated "reg:linear")
params={"objective":"reg:squarederror", "max_depth":4}

#creating three alpha L1 values
l1_params = [1,10,100]

#list to store the rmse for each alpha value
rmses_l1=[]

#loop to iterate over each entry in l1_params list and do the following:
for reg in l1_params:
    params["alpha"] = reg
    cv_results = xgb.cv(dtrain=DMatrix, params=params, nfold=4,
                       num_boost_round=10, metrics="rmse", as_pandas=True, seed=123)
    rmses_l1.append(cv_results["test-rmse-mean"].tail(1).values[0])
    
print("Best rmse as a function of L1:")
print(pd.DataFrame(list(zip(l1_params, rmses_l1)), columns=["l1", "rmse"]))