# Forest Inference Library (FIL)
The Forest Inference Library (FIL) loads saved forest models trained with XGBoost or LightGBM and runs inference on them. It supports both classification and regression. In this notebook, we begin by fitting a model with XGBoost and saving it. We then load the saved model into FIL and use it to run inference on new data.

FIL works the same way with LightGBM models.

The model accepts both NumPy arrays and cuDF DataFrames. To convert your dataset to cuDF format, see the cuDF documentation at https://docs.rapids.ai/api/cudf/stable

For additional information on the Forest Inference Library, refer to the documentation at https://docs.rapids.ai/api/cuml/stable/api.html#forest-inferencing

In [1]:
import cupy
import os

from cuml.test.utils import array_equal
from cuml.common.import_utils import has_xgboost

from cuml.datasets import make_classification
from cuml.metrics import accuracy_score
from cuml.model_selection import train_test_split
    
from cuml import ForestInference

## Check for XGBoost
Check whether XGBoost is installed; if it is not, raise an `ImportError`.
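
A check of this kind can be sketched with the standard library alone. The helper names `is_module_available` and `require_module` below are hypothetical, and this is not necessarily how cuML's `has_xgboost` is implemented; it only illustrates the idea of testing for a package before importing it.

```python
import importlib.util


def is_module_available(name):
    """Return True if the top-level module `name` can be located.

    Hypothetical helper for illustration; not part of cuML.
    The module is looked up but not imported.
    """
    return importlib.util.find_spec(name) is not None


def require_module(name, hint):
    """Raise ImportError with an actionable hint if `name` is missing."""
    if not is_module_available(name):
        raise ImportError(f"Missing module '{name}'. {hint}")
```

Calling `require_module("xgboost", "Install it with: conda install -c conda-forge xgboost")` would then fail early with a readable message when the package is absent.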

In [2]:
if has_xgboost():
    import xgboost as xgb
else:
    # note the trailing space so the two string literals join cleanly
    raise ImportError("Please install xgboost using the conda package, "
                      "e.g.: conda install -c conda-forge xgboost")

## Define parameters

In [3]:
# synthetic data size
n_rows = 10000
n_columns = 100
n_categories = 2
random_state = cupy.random.RandomState(43210)

# fraction of data used for model training
train_size = 0.8

# trained model output filename
model_path = 'xgb_2000.model'

# number of boosting rounds used to train xgboost
num_rounds = 2000

# maximum tree depth in each training round
max_depth = 15

## Generate data

In [4]:
# create the dataset
X, y = make_classification(
    n_samples=n_rows,
    n_features=n_columns,
    n_informative=int(n_columns/5),
    n_classes=n_categories,
    random_state=42
)

# convert the dataset to float32
X = X.astype('float32')
y = y.astype('float32')

# split the dataset into training and validation splits
X_train, X_validation, y_train, y_validation = train_test_split(X, y, train_size=train_size)
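
As a quick sanity check on the sizes implied by the parameters above, the arithmetic can be sketched in plain Python. The exact row counts returned by `train_test_split` may differ slightly depending on how it rounds, so treat the split sizes as approximate.

```python
# values copied from the "Define parameters" cell
n_rows = 10000
n_columns = 100
train_size = 0.8

# number of informative features passed to make_classification
n_informative = int(n_columns / 5)

# approximate number of rows in each split
n_train = int(n_rows * train_size)
n_validation = n_rows - n_train

print(n_informative, n_train, n_validation)  # prints: 20 8000 2000
```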

## Train helper function
Define a simple function that trains the XGBoost model, saves it to `model_path`, and returns the trained model.

For additional information on the XGBoost parameters, refer to the documentation at
https://xgboost.readthedocs.io/en/latest/parameter.html

In [5]:
def train_xgboost_model(
    X_train, 
    y_train,
    model_path='xgb.model',
    num_rounds=100, 
    max_depth=20
):
    
    # set the xgboost model parameters
    params = {
        'verbosity': 0, 
        'eval_metric':'error',
        'objective':'binary:logistic',
        'max_depth': max_depth,
        'tree_method': 'gpu_hist'
    }
    
    # convert training data into DMatrix
    dtrain = xgb.DMatrix(X_train, label=y_train)
    
    # train the xgboost model
    trained_model = xgb.train(params, dtrain, num_rounds)

    # save the trained xgboost model
    trained_model.save_model(model_path)

    return trained_model

## Predict helper function
Use the trained XGBoost model to predict on the validation data and return class labels.

In [6]:
def predict_xgboost_model(X_validation, y_validation, xgb_model):

    # predict using the xgboost model
    dvalidation = xgb.DMatrix(X_validation, label=y_validation)
    predictions = xgb_model.predict(dvalidation)

    # convert the predicted values from xgboost into class labels
    predictions = cupy.around(predictions)
    
    return predictions
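
With the `binary:logistic` objective, `predict` returns the probability of the positive class, so rounding amounts to thresholding at roughly 0.5. The sketch below uses plain Python and made-up probabilities to show the idea. Python's built-in `round` sends an exact 0.5 to 0 (round half to even); NumPy's `around` documents the same convention, and CuPy is assumed here to follow it.

```python
# made-up probabilities standing in for the output of xgb_model.predict
probabilities = [0.03, 0.49, 0.5, 0.51, 0.97]

# rounding, as in the helper above (an exact 0.5 rounds to 0)
labels_rounded = [float(round(p)) for p in probabilities]

# explicit threshold, which makes the cut-off visible and adjustable
threshold = 0.5
labels_thresholded = [1.0 if p > threshold else 0.0 for p in probabilities]

print(labels_rounded)      # prints: [0.0, 0.0, 0.0, 1.0, 1.0]
print(labels_thresholded)  # prints: [0.0, 0.0, 0.0, 1.0, 1.0]
```

Writing the threshold explicitly is a design choice worth considering when the classes are imbalanced and a cut-off other than 0.5 is wanted.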

## Train the model and predict
Invoke the helper functions to train the model and get predictions so that we can validate them.

In [7]:
%%time
# train the xgboost model
xgboost_model = train_xgboost_model(
    X_train, 
    y_train, 
    model_path,
    num_rounds,
    max_depth
)

CPU times: user 7.58 s, sys: 204 ms, total: 7.78 s
Wall time: 7.79 s


In [8]:
%%time
# test the xgboost model
trained_model_preds = predict_xgboost_model(
    X_validation,
    y_validation,
    xgboost_model
)

CPU times: user 16.7 ms, sys: 0 ns, total: 16.7 ms
Wall time: 16.1 ms
