# Boosting with xgboost

This example comes from https://www.datacamp.com/community/tutorials/xgboost-in-python, whose analysis is worth reading. We use the Boston House Prices data to show xgboost applied to a regression problem.

In [None]:
!pip install xgboost

In [None]:
import xgboost as xgb
import pandas as pd
import numpy as np

# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older version.
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

In [None]:
boston = load_boston()

In [None]:
data = pd.DataFrame(boston.data, columns=boston.feature_names)

We first take a look at the data:

In [None]:
data.describe()

The column names are not very descriptive. To find out what they mean, see the dataset description below:

In [None]:
print(boston.DESCR)

The objective is to predict property prices, so we add the target as a new column:

In [None]:
data['PRICE'] = boston.target

In [None]:
# Features are every column except the last; the target is the last column (PRICE).
X, y = data.iloc[:, :-1], data.iloc[:, -1]

Next, we split the data into training and test sets, holding out 20% of the rows for evaluation:

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
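To make the effect of `test_size=0.2` concrete, here is a minimal sketch on hypothetical toy data (not the Boston data). The `*_demo` names are made up for illustration so that they do not overwrite the notebook's variables.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows, 3 features. Row i holds [3i, 3i+1, 3i+2].
X_demo = np.arange(300).reshape(100, 3)
y_demo = np.arange(100)  # target i belongs to row i

X_demo_train, X_demo_test, y_demo_train, y_demo_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# 20% of the rows go to the test set, the rest to the training set.
print(X_demo_train.shape, X_demo_test.shape)  # (80, 3) (20, 3)

# Rows are shuffled, but each row stays paired with its own target.
print((X_demo_train[:, 0] // 3 == y_demo_train).all())  # True
```

Fixing `random_state` makes the shuffle reproducible, so rerunning the cell yields the same split.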

Create the model and train it. The API is similar to sklearn's :)

In [None]:
xg_reg = xgb.XGBRegressor(objective='reg:squarederror', colsample_bytree=0.3, learning_rate=0.1,
                          max_depth=5, alpha=10, n_estimators=10)

In [None]:
xg_reg.fit(X_train,y_train)

preds = xg_reg.predict(X_test)

Evaluate the model:

In [None]:
rmse = np.sqrt(mean_squared_error(y_test, preds))
print("RMSE: %f" % (rmse))
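As a sanity check on what the metric means, the sketch below computes the RMSE by hand on a few made-up numbers and compares it with the `mean_squared_error` route used above. The arrays are hypothetical and unrelated to the model's predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true values and predictions, for illustration only.
y_true_demo = np.array([3.0, 5.0, 2.0, 7.0])
y_pred_demo = np.array([2.0, 5.0, 4.0, 8.0])

# RMSE = square root of the mean of the squared errors.
errors = y_true_demo - y_pred_demo            # [1, 0, -2, -1]
rmse_manual = np.sqrt(np.mean(errors ** 2))   # sqrt((1 + 0 + 4 + 1) / 4) = sqrt(1.5)

# Same quantity via scikit-learn, as in the cell above.
rmse_sklearn = np.sqrt(mean_squared_error(y_true_demo, y_pred_demo))

print("manual:  %f" % rmse_manual)
print("sklearn: %f" % rmse_sklearn)
```

The RMSE is expressed in the same units as the target, so it can be read directly against the scale of the `PRICE` column.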

# What about classification?

In [None]:
from sklearn.metrics import classification_report
from sklearn.datasets import load_iris

In [None]:
iris = load_iris()

In [None]:
X = iris["data"]
y = iris["target"]

The `objective` parameter selects the loss function to minimize. For multiclass classification, `multi:softmax` is a common choice. See the [documentation](https://xgboost.readthedocs.io/en/latest/parameter.html#learning-task-parameters) for other objectives.
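For intuition about the name, the sketch below shows in plain NumPy what a softmax followed by an argmax does: it turns per-class scores into probabilities and then picks the most likely class. This is an illustration of the idea, not xgboost's internal code, and the scores are made up.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax: turn raw scores into probabilities that sum to 1."""
    # Subtracting the row maximum avoids overflow without changing the result.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical raw scores for 2 samples and 3 classes.
raw_scores = np.array([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])

probs = softmax(raw_scores)
labels = probs.argmax(axis=1)  # the class with the highest probability

print(probs.sum(axis=1))  # each row sums to 1
print(labels)             # [0 2]
```

In xgboost, `multi:softmax` yields predicted class labels, while the related `multi:softprob` objective yields the per-class probabilities instead.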

In [None]:
xg_clf = xgb.XGBClassifier(objective="multi:softmax", colsample_bytree=0.3, learning_rate=0.1,
                           max_depth=5, alpha=10, n_estimators=10)

In [None]:
xg_clf.fit(X,y)
preds = xg_clf.predict(X)

In [None]:
print(classification_report(y, preds))
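Note that this report is computed on the same rows the classifier was trained on, so it is likely to look better than performance on unseen data. A minimal sketch of a held-out split for the iris data follows. The `*_clf_*` names are made up for illustration, and `stratify` is used here as one reasonable choice to keep the three classes balanced in both parts.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_demo = load_iris()

# Hold out 20% of the rows, keeping class proportions equal in both parts.
X_clf_train, X_clf_test, y_clf_train, y_clf_test = train_test_split(
    iris_demo["data"], iris_demo["target"],
    test_size=0.2, random_state=42, stratify=iris_demo["target"],
)

# Count the samples per class in each part.
print(np.bincount(y_clf_train))  # [40 40 40]
print(np.bincount(y_clf_test))   # [10 10 10]
```

With such a split, the classifier would be trained with `xg_clf.fit(X_clf_train, y_clf_train)` and the report computed as `classification_report(y_clf_test, xg_clf.predict(X_clf_test))`.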