References: <br>
https://github.com/rapidsai/cuml/blob/branch-0.17/notebooks/forest_inference_demo.ipynb

# Forest Inference Library (FIL)
The Forest Inference Library (FIL) loads saved forest models trained with XGBoost or LightGBM and runs inference on them (classification and regression).

**Plan**
<br>
- Fit a model with XGBoost
- Save model
- Load saved model into FIL
- Use model to infer on new data

FIL works the same way with a LightGBM model.

The loaded model accepts both NumPy arrays and cuDF DataFrames as input.

[forest inference library](https://docs.rapids.ai/api/cuml/stable/api.html#forest-inferencing)

In [1]:
import numpy as np
import os

# cuML
# from cuml.test.utils import array_equal
from cuml.common.import_utils import has_xgboost
from cuml import ForestInference

# sklearn
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

## Check for xgboost
Check whether xgboost is installed, and raise an error if it is not.

In [2]:
if has_xgboost():
    import xgboost as xgb
else:
    raise ImportError("xgboost is not installed. Install it with: "
                      "conda install -c conda-forge xgboost")
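
A presence check like the one above can be approximated with the standard library alone. The sketch below is an assumption about how such a check might work, not cuML's actual `has_xgboost` implementation; `module_available` is a hypothetical helper name introduced only for illustration.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be located."""
    # find_spec returns None when no top-level module with that name is found
    return importlib.util.find_spec(name) is not None


# 'json' ships with Python, so this check succeeds on any installation
print(module_available("json"))
```

Checking for the module before importing it lets the notebook fail early with an installation hint instead of a bare `ModuleNotFoundError`.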

## Train helper function
Define a function that trains an XGBoost model, saves it to disk, and returns the trained model.

[xgboost library](https://xgboost.readthedocs.io/en/latest/parameter.html)

In [3]:
def train_xgboost_model(X_train, y_train, num_rounds, model_path):

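    # set xgboost model parameters
    # note: 'silent' is deprecated in recent XGBoost releases in favor of
    # 'verbosity' and triggers the "might not be used" warning during training
    params = {'silent': 1, 'eval_metric': 'error',
              'objective': 'binary:logistic',
              'max_depth': 25}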
    # process inputs
    dtrain = xgb.DMatrix(X_train, label=y_train)

    # train xgboost model
    bst = xgb.train(params, dtrain, num_rounds)

    # save trained xgboost model
    bst.save_model(model_path)

    return bst

## Predict helper function
Use the trained XGBoost model to predict on the validation data and return class labels.

In [4]:
def predict_xgboost_model(X_validation, y_validation, xgb_model):

    # process input
    dvalidation = xgb.DMatrix(X_validation, label=y_validation)
    # predict using xgboost model
    xgb_preds = xgb_model.predict(dvalidation)

    # convert predicted values from xgboost into class labels
    xgb_preds = np.around(xgb_preds)
    
    return xgb_preds

## Define parameters

In [5]:
n_rows = 10000
n_columns = 100
n_categories = 2

# seeded random number generator, so the generated dataset is reproducible
random_state = np.random.RandomState(43210)

# path to the file where the trained model will be saved
model_path = 'xgb.model'

# number of boosting rounds used to train the model
num_rounds = 15

## Generate data

In [6]:
# create dataset (n-class classification problem)
X, y = make_classification(n_samples=n_rows,
                           n_features=n_columns,
                           n_informative=int(n_columns/5),
                           n_classes=n_categories,
                           random_state=random_state)
train_size = 0.8

# convert dataset to np.float32
X = X.astype(np.float32)
y = y.astype(np.float32)

# split dataset into training and validation splits
X_train, X_validation, y_train, y_validation = train_test_split(X, y, train_size=train_size)
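
To make the sizes of the two splits concrete, here is a minimal standard-library sketch of an 80/20 shuffle-and-split over row indices. It is an illustration under simple assumptions, not scikit-learn's `train_test_split` implementation, and the names `train_idx` and `validation_idx` are hypothetical.

```python
import random

n_rows = 10000
train_size = 0.8

# shuffle the row indices with a fixed seed so the split is reproducible
indices = list(range(n_rows))
random.Random(43210).shuffle(indices)

# the first 80% of the shuffled indices go to training, the rest to validation
n_train = int(n_rows * train_size)
train_idx, validation_idx = indices[:n_train], indices[n_train:]

print(len(train_idx), len(validation_idx))  # 8000 2000
```

With `n_rows = 10000` and `train_size = 0.8`, the validation split holds 2000 rows, which is why the prediction arrays in the evaluation step have shape `(2000,)`.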

## Train and Predict model
Call the helper functions to train the model and to obtain its predictions on the validation set, so they can be compared with FIL's predictions.

In [7]:
# train xgboost model
xgboost_model = train_xgboost_model(X_train, y_train, num_rounds, model_path)

Parameters: { "silent" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.




In [8]:
%%time
# test xgboost model
trained_model_preds = predict_xgboost_model(X_validation, y_validation, xgboost_model)

CPU times: user 11.5 ms, sys: 0 ns, total: 11.5 ms
Wall time: 5.7 ms


## Load Forest Inference Library (FIL)
The load function of the ForestInference class accepts the following parameters:

filename : str <br>
    Path to saved model file in a treelite-compatible format
    <br>
    (See https://treelite.readthedocs.io/en/latest/treelite-api.html)
    

output_class : bool <br>
    If True, return 1 or 0 depending on whether the raw prediction
    exceeds the threshold.
    <br>
    If False, just return the raw prediction.
    

threshold : float <br>
    Cutoff value above which a prediction is set to 1.0
    <br>
    Only used if the model is classification and `output_class` is `True`


algo : str
<br>
    Name of the inference algorithm to use (from the algo_t enum):

- 'NAIVE' - simple inference using shared memory
- 'TREE_REORG' - similar to NAIVE but with trees rearranged to be more coalescing-friendly
- 'BATCH_TREE_REORG' - similar to TREE_REORG but predicting multiple rows per thread block

<br>
model_type : str
<br>
    Format of saved treelite model to load.
    <br>
    Can be 'xgboost' or 'lightgbm'
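
The interaction between `output_class` and `threshold` described above can be sketched in plain Python. This is an illustration of the documented behaviour on a list of floats, not FIL's GPU implementation; `postprocess` is a hypothetical name, and treating a value exactly equal to the threshold as class 0 is an assumption based on the wording "exceeds the threshold".

```python
def postprocess(raw_predictions, output_class=True, threshold=0.5):
    """Mimic the documented output_class/threshold behaviour on a list of floats."""
    if not output_class:
        # raw predictions are returned unchanged
        return list(raw_predictions)
    # 1.0 when the raw prediction exceeds the threshold, otherwise 0.0
    return [1.0 if p > threshold else 0.0 for p in raw_predictions]


raw = [0.12, 0.50, 0.73]
print(postprocess(raw))                      # [0.0, 0.0, 1.0]
print(postprocess(raw, output_class=False))  # [0.12, 0.5, 0.73]
```

With `threshold=0.5`, this gives the same labels as the `np.around` call in the XGBoost predict helper, since NumPy rounds a value of exactly 0.5 to the nearest even value, which is 0.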


## Load saved model
Use FIL to load the saved XGBoost model.

In [9]:
fm = ForestInference.load(filename=model_path,
                          algo='BATCH_TREE_REORG',
                          output_class=True,
                          threshold=0.50,
                          model_type='xgboost')

## Predict using FIL

In [14]:
%%time
# perform prediction on the model loaded from path
fil_preds = fm.predict(X_validation)

CPU times: user 93 µs, sys: 2.56 ms, total: 2.65 ms
Wall time: 1.75 ms
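
For a rough sense of scale, the two wall times reported by `%%time` in this notebook can be turned into a ratio. The numbers below are copied from the cell outputs; they come from single runs on 2000 rows, and the XGBoost figure also includes building the `DMatrix` and rounding the predictions, so treat the result as an indication rather than a benchmark.

```python
# wall times copied from the two %%time outputs in this notebook, in milliseconds
xgboost_wall_ms = 5.7
fil_wall_ms = 1.75

# ratio of the two single-run timings
speedup = xgboost_wall_ms / fil_wall_ms
print(f"FIL was about {speedup:.1f}x faster in this run")
```

A more reliable comparison would repeat each prediction several times (for example with `%%timeit`) and use a larger batch of rows.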


## Evaluate results
Verify that the predictions from the original XGBoost model and from the FIL model match.

In [19]:
print("The shape of predictions obtained from xgboost : ", trained_model_preds.shape)
print("The shape of predictions obtained from FIL : ", fil_preds.shape)
# print("Are the predictions for xgboost and FIL the same : ", array_equal(trained_model_preds, fil_preds))
print("Are the predictions for xgboost and FIL the same : ", (trained_model_preds == fil_preds).all())

The shape of predictions obtained from xgboost :  (2000,)
The shape of predictions obtained from FIL :  (2000,)
Are the predictions for xgboost and FIL the same :  True
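
The `.all()` check only reports whether every prediction matches. When the two models disagree on some rows, the fraction of matching labels is more informative. The sketch below computes it for plain Python lists; `agreement_rate` is a hypothetical helper, and the labels are made-up values for illustration, not results from this notebook.

```python
def agreement_rate(a, b):
    """Fraction of positions at which two equal-length label sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if len(a) == 0:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# hypothetical labels for illustration only
xgb_labels = [1.0, 0.0, 1.0, 1.0]
fil_labels = [1.0, 0.0, 0.0, 1.0]
print(agreement_rate(xgb_labels, fil_labels))  # 0.75
```

Applied to a model's predictions and the true labels, the same computation gives classification accuracy, which is what the imported `accuracy_score` function from scikit-learn reports.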
