## Gradient Boosting Models with XGBoost in Python (Save Model)

In [2]:
pwd

'/Users/medamin/_Projets/_DataScience/AWS_sageMaker/Lab_1'

In [1]:
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [5]:
# load data
df = loadtxt('data/diabetes.csv', delimiter=",")


In [6]:
# split data into X and y
X = df[:,0:8]
Y = df[:,8]

In [12]:
# split data into train and test sets
seed = 7
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
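
To make the roles of `test_size` and `random_state` concrete, here is a minimal pure-Python sketch of a seeded shuffle-and-split. It is not scikit-learn's implementation: the helper name `simple_split` is hypothetical, it uses Python's `random` module, and it assumes the test share is rounded up, so it will not select the same rows as `train_test_split`.

```python
import math
import random


def simple_split(rows, test_size, seed):
    """Shuffle row indices with a fixed seed, then cut off a test portion."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed -> same order on every run
    n_test = math.ceil(test_size * len(rows))  # assumption: round the test share up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [rows[i] for i in train_idx]
    test = [rows[i] for i in test_idx]
    return train, test


rows = list(range(100))  # stand-in for 100 data rows
train, test = simple_split(rows, test_size=0.33, seed=7)
print(len(train), len(test))
```

Fixing the seed is what makes the evaluation repeatable: calling the function again with the same arguments returns the same partition.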


## Train the XGBoost Model

In [13]:
# fit model on training data
model = XGBClassifier()
model.fit(X_train, y_train)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bynode=1, colsample_bytree=1, gamma=0, learning_rate=0.1,
       max_delta_step=0, max_depth=3, min_child_weight=1, missing=None,
       n_estimators=100, n_jobs=1, nthread=None,
       objective='binary:logistic', random_state=0, reg_alpha=0,
       reg_lambda=1, scale_pos_weight=1, seed=None, silent=None,
       subsample=1, verbosity=1)

In [14]:
print(model)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bynode=1, colsample_bytree=1, gamma=0, learning_rate=0.1,
       max_delta_step=0, max_depth=3, min_child_weight=1, missing=None,
       n_estimators=100, n_jobs=1, nthread=None,
       objective='binary:logistic', random_state=0, reg_alpha=0,
       reg_lambda=1, scale_pos_weight=1, seed=None, silent=None,
       subsample=1, verbosity=1)


### Make Predictions with XGBoost Model

In [15]:
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]

In [16]:
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))

Accuracy: 77.95%
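
For reference, classification accuracy is simply the fraction of predictions that match the true labels. The sketch below computes it by hand on toy labels; the function name `accuracy` and the label lists are illustrative and are not taken from the diabetes data.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels, not taken from the diabetes data
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
print("Accuracy: %.2f%%" % (accuracy(y_true, y_pred) * 100.0))  # Accuracy: 75.00%
```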


### Summary

In this post you discovered how to develop your first XGBoost model in Python.

Specifically, you learned:

- How to install XGBoost on your system ready for use with Python.
- How to prepare data and train your first XGBoost model on a standard machine learning dataset.
- How to make predictions and evaluate the performance of a trained XGBoost model using scikit-learn.

### Serialize Your XGBoost Model with Pickle

Pickle is the standard way of serializing objects in Python.

You can use the Python pickle API to serialize a trained model and save the serialized bytes to a file, for example:

In [23]:
import pickle

In [24]:
# save model to file (the with-block closes the file once the write finishes)
with open("pima.pickle.dat", "wb") as f:
    pickle.dump(model, f)

In [26]:
# some time later...

# load model from file (the with-block closes the file after reading)
with open("pima.pickle.dat", "rb") as f:
    loaded_model = pickle.load(f)


In [27]:
# make predictions for test data
y_pred = loaded_model.predict(X_test)
predictions = [round(value) for value in y_pred]


In [28]:
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))

Accuracy: 77.95%
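
The save-and-reload pattern above can be checked end to end without XGBoost installed. The sketch below is only an illustration: `model_state` and `predict` are hypothetical stand-ins for the trained classifier, the rows are toy values, and the file is written to a temporary directory rather than to `pima.pickle.dat`.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: plain data that pickle can store
model_state = {"feature_index": 1, "threshold": 120.0}


def predict(state, rows):
    """Label a row 1 when the chosen feature exceeds the threshold, else 0."""
    idx = state["feature_index"]
    return [1 if row[idx] > state["threshold"] else 0 for row in rows]


rows = [[6, 148.0], [1, 85.0], [8, 183.0]]  # toy rows, not the real dataset

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pickle.dat")
    with open(path, "wb") as f:  # the with-block closes the file after writing
        pickle.dump(model_state, f)
    with open(path, "rb") as f:
        loaded_state = pickle.load(f)

# The reloaded object should behave exactly like the original
assert predict(loaded_state, rows) == predict(model_state, rows)
print(predict(loaded_state, rows))  # [1, 0, 1]
```

Because unpickling can execute arbitrary code, only load pickle files that come from a source you trust.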
