# MLhandlers Tutorial

This notebook function handles training and logging of XGBoost models **only**, exposing both the scikit-learn and low-level APIs.<br>
More information about XGBoost is available [here](https://en.wikipedia.org/wiki/XGBoost).

Multiple model types can be estimated using the XGBoost Scikit-Learn API. <br>
The input can be either a predefined JSON model configuration or one
of the five XGBoost model types. <br>
In either case, you can pass in a params dict to modify the default values.<br>
Based on `mlutils.models.gen_sklearn_model`; see the function
`sklearn_classifier` in the function-marketplace repository. <br>
> **param model_type:**
  one of "classifier", "regressor",
  "ranker", "rf_classifier", or
  "rf_regressor"<br>
> **param xgb_params:** class init parameters
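
As a rough illustration of the two parameters above, the sketch below validates `model_type` and overlays a user-supplied `xgb_params` dict on a set of defaults. The helper name `resolve_model_config`, the `DEFAULTS` values, and the mapping table are assumptions made for this example and are not the actual `mlutils` implementation; only the five type names and the XGBoost scikit-learn class names are taken as given.

```python
# Hypothetical helper: not part of mlutils or MLRun.
# Maps each documented model_type to the matching XGBoost scikit-learn class name.
XGB_MODEL_CLASSES = {
    "classifier": "xgboost.XGBClassifier",
    "regressor": "xgboost.XGBRegressor",
    "ranker": "xgboost.XGBRanker",
    "rf_classifier": "xgboost.XGBRFClassifier",
    "rf_regressor": "xgboost.XGBRFRegressor",
}

# Illustrative defaults only; the real defaults are whatever each class defines.
DEFAULTS = {"n_estimators": 100, "max_depth": 6}


def resolve_model_config(model_type, xgb_params=None):
    """Return (class_path, init_params) for a model_type and optional overrides."""
    if model_type not in XGB_MODEL_CLASSES:
        valid = ", ".join(sorted(XGB_MODEL_CLASSES))
        raise ValueError(f"unknown model_type {model_type!r}; expected one of: {valid}")
    params = dict(DEFAULTS)          # copy so the shared defaults are never mutated
    params.update(xgb_params or {})  # user-supplied values win over defaults
    return XGB_MODEL_CLASSES[model_type], params


class_path, params = resolve_model_config("classifier", {"max_depth": 3})
print(class_path, params)
```

Keeping the class names as strings means the lookup can be checked without importing XGBoost; instantiating the class is a separate step.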

## Steps
1. [Fetching a Dataset](#Fetching-a-Dataset)
2. [Importing the MLhandlers functions from the Marketplace](#Importing-the-MLhandlers-functions-from-the-Marketplace)
3. [Training a model](#Training-a-model)
4. [Evaluating a model](#Evaluating-a-model)
5. [Predicting on new data](#Predicting-on-new-data)
6. [Getting the model](#Getting-the-model)
7. [Some plotting](#Some-plotting)
8. [Running the function remotely](#Running-the-function-remotely)

In [17]:
import mlrun
mlrun.set_environment(project='ml-handlers')

> 2022-01-27 18:09:54,377 [info] loaded project ml-handlers from MLRun DB


('ml-handlers', 'v3io:///projects/{{run.project}}/artifacts')

### **Fetching a Dataset**

To generate the dataset we used the "gen_class_data" function from the hub, 
which wraps scikit-learn's [make_classification](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html#sklearn-datasets-make-classification).<br> 
See the link for a description of all parameters.

In [18]:
import pandas as pd
dataset_url = 'https://s3.wasabisys.com/iguazio/data/function-marketplace-data/xgb_trainer/classifier-data.csv'

In [19]:
df = pd.read_csv(dataset_url)
df.head()

| Unnamed: 0 | feat_0 | feat_1 | feat_2 | feat_3 | feat_4 | labels |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | -0.265115 | -1.93226 | 0.303992 | -1.863833 | -1.045635 | 1 |
| 1 | -3.135479 | -2.835548 | 1.338381 | -1.385303 | -2.276456 | 0 |
| 2 | -1.519005 | -1.807549 | 0.697304 | -1.11886 | 1.1049 | 0 |
| 3 | -0.632087 | -0.345659 | 0.244329 | -0.046066 | 0.44728 | 0 |
| 4 | -1.405883 | -1.746045 | 0.653617 | -1.110985 | -1.675466 | 0 |


### **Importing the MLhandlers functions from the Marketplace**

In [20]:
# If GPU is available - set to True
# GPU = False

# fn = mlrun.import_function("hub://xgb_trainer")
# fn.image = "mlrun/ml-models" if not GPU else "mlrun/ml-models-gpu"
# fn.apply(mlrun.auto_mount())

### **Training a model**

#### Using the train handler

In [21]:
import mlrun
from mlrun import code_to_function, import_function

fn = code_to_function(name='train', filename="./guy_branch/MLhandlers.py", kind='job', handler='train')
fn.export(target='./guy_branch/MLhandlers.yaml')
fn = mlrun.import_function("./guy_branch/MLhandlers.yaml")

> 2022-01-27 18:09:56,627 [info] function spec saved to path: ./guy_branch/MLhandlers.yaml


#### Define task parameters

In [22]:
train_params = {"label_column": "labels",
              "model_class": "sklearn.linear_model.LinearRegression"}

#### Running the Training job

In [23]:
train_run = fn.run(params=train_params,
                   inputs={"dataset": dataset_url},
                   local=True)

> 2022-01-27 18:09:58,015 [info] starting run train-train uid=96e5337c21364df9a29975472b76e78c DB=http://mlrun-api:8080


| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ml-handlers | ...2b76e78c | 0 | Jan 27 18:09:58 | completed | train-train | v3io_user=alexz<br>kind=<br>owner=alexz<br>host=alexz-jupyter-ff89ccdb5-wqw97 | dataset | label_column=labels<br>model_class=sklearn.linear_model.LinearRegression | r2_score=0.5642779199333172<br>mean_absolute_error=0.26963481295297687<br>mean_squared_error=0.33003567756136554 | test_set<br>LinearRegression |





> 2022-01-27 18:09:59,245 [info] run executed, status=completed


#### How about other frameworks?
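
The `model_class` parameter is a string. In the first run it is a full dotted import path (`sklearn.linear_model.LinearRegression`), while the next run uses the shorter name `xgb.regressor`. How `MLhandlers.py` turns these strings into estimator classes is not shown in this notebook; the sketch below only illustrates the general technique of resolving a full dotted path with `importlib`. The helper name `load_class` is hypothetical, and short aliases such as `xgb.regressor` would need an extra lookup step that is not covered here.

```python
from importlib import import_module


def load_class(class_path):
    """Import and return the object named by a dotted path such as 'pkg.module.Name'."""
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {class_path!r}")
    return getattr(import_module(module_name), class_name)


# A standard-library class is used so the example runs without scikit-learn;
# "sklearn.linear_model.LinearRegression" would resolve the same way when
# scikit-learn is installed.
cls = load_class("collections.OrderedDict")
instance = cls(a=1)
print(cls.__name__, dict(instance))
```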

In [24]:
task_params = {"label_column": "labels",
              "model_class": "xgb.regressor"}

In [25]:
xgb_run = fn.run(params=task_params,
                 inputs={"dataset": dataset_url},
                 local=True)

> 2022-01-27 18:09:59,372 [info] starting run train-train uid=a9686560711f4b4585d4525397511b2d DB=http://mlrun-api:8080


| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ml-handlers | ...97511b2d | 0 | Jan 27 18:09:59 | completed | train-train | v3io_user=alexz<br>kind=<br>owner=alexz<br>host=alexz-jupyter-ff89ccdb5-wqw97 | dataset | label_column=labels<br>model_class=xgb.regressor | r2_score=0.8655049655624085<br>mean_absolute_error=0.07196490722894669<br>mean_squared_error=0.18335462030763802 | test_set<br>XGBRegressor |





> 2022-01-27 18:10:03,741 [info] run executed, status=completed


### **Evaluating a model**
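
The training runs above report `r2_score`, `mean_absolute_error`, and `mean_squared_error` in their results. As a reminder of what these names conventionally mean, here is a minimal pure-Python sketch of the textbook definitions. The function name `regression_metrics` and the sample values are made up for illustration, and the handler's own computation (for example, whether it reports the squared error or its root) is not shown in this notebook.

```python
def regression_metrics(y_true, y_pred):
    """Compute R^2, mean absolute error and mean squared error for two sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mse = ss_res / n

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R^2 = 1 - SS_res / SS_tot; undefined when every target is identical
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")

    return {"r2_score": r2, "mean_absolute_error": mae, "mean_squared_error": mse}


# Made-up labels and predictions, purely for illustration
metrics = regression_metrics([0, 1, 1, 0], [0.1, 0.8, 0.6, 0.3])
print(metrics)
```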

#### Use the evaluate handler

In [26]:
import mlrun
from mlrun import code_to_function, import_function

fn = code_to_function(name='evaluate', filename="./guy_branch/MLhandlers.py", kind='job', handler='evaluate')
fn.export(target='./guy_branch/MLhandlers.yaml')
fn = mlrun.import_function("./guy_branch/MLhandlers.yaml")

> 2022-01-27 18:10:03,812 [info] function spec saved to path: ./guy_branch/MLhandlers.yaml


#### Define task parameters

In [27]:
eval_params = {"label_column": "labels",
              "model_path": train_run.outputs["LinearRegression"]}

#### Running our Evaluation job

In [28]:
eval_run = fn.run(
                params = eval_params,
                inputs = {'dataset': dataset_url},
                local=True
                )

> 2022-01-27 18:10:03,855 [info] starting run evaluate-evaluate uid=1174f564a4b44e81ad892b033db11128 DB=http://mlrun-api:8080


| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ml-handlers | ...3db11128 | 0 | Jan 27 18:10:04 | completed | evaluate-evaluate | v3io_user=alexz<br>kind=<br>owner=alexz<br>host=alexz-jupyter-ff89ccdb5-wqw97 | dataset | label_column=labels<br>model_path=store://artifacts/ml-handlers/LinearRegression:96e5337c21364df9a29975472b76e78c | r2_score=0.5563201595484754<br>mean_absolute_error=0.2715118044214969<br>mean_squared_error=0.33304605711190044 | test_set |





> 2022-01-27 18:10:04,942 [info] run executed, status=completed
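
In the run summary above, `model_path` is a `store://` URI, and the part after the colon matches the uid of the training run (`96e5337c21364df9a29975472b76e78c`). The sketch below splits such a URI into the pieces that are visible in this notebook. The function `split_store_uri` and the field names it returns are assumptions for illustration; MLRun has its own logic for resolving store URIs, which this does not reproduce.

```python
def split_store_uri(uri):
    """Split 'store://<kind>/<project>/<key>[:<suffix>]' into its visible parts."""
    scheme, sep, rest = uri.partition("://")
    if scheme != "store" or not sep:
        raise ValueError(f"not a store URI: {uri!r}")

    kind, _, remainder = rest.partition("/")
    project, _, key_part = remainder.partition("/")
    key, _, suffix = key_part.partition(":")  # suffix is empty when no ':' is present
    return {"kind": kind, "project": project, "key": key, "suffix": suffix}


# The URI below is copied from the evaluation run's parameters
parts = split_store_uri(
    "store://artifacts/ml-handlers/LinearRegression:96e5337c21364df9a29975472b76e78c"
)
print(parts)
```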


### **Predicting on new data**

#### Fetching a new dataset

In [29]:
new_dataset_url = 'https://s3.wasabisys.com/iguazio/data/function-marketplace-data/xgb_trainer/classifier-data.csv'

#### Use the predict handler

In [32]:
fn = code_to_function(name='predict', filename="./guy_branch/MLhandlers.py", kind='job', handler='predict')
fn.export(target='MLhandlers.yaml')
fn = mlrun.import_function("./MLhandlers.yaml")

> 2022-01-27 18:10:25,834 [info] function spec saved to path: MLhandlers.yaml


#### Define task parameters

In [33]:
predict_params = {"label_column": "labels",
              "model_path": train_run.outputs["LinearRegression"]}

#### Running our Predict job

In [34]:
predict_run = fn.run(
                params = predict_params,
                inputs = {'dataset': new_dataset_url},
                local=True
                )

> 2022-01-27 18:10:32,779 [info] starting run predict-predict uid=bacfba553d92427ab99896b620602f4e DB=http://mlrun-api:8080


| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ml-handlers | ...20602f4e | 0 | Jan 27 18:10:32 | completed | predict-predict | v3io_user=alexz<br>kind=<br>owner=alexz<br>host=alexz-jupyter-ff89ccdb5-wqw97 | dataset | label_column=labels<br>model_path=store://artifacts/ml-handlers/LinearRegression:96e5337c21364df9a29975472b76e78c | | |





> 2022-01-27 18:10:33,335 [info] run executed, status=completed


### **Getting the model**

In [None]:
from mlrun.artifacts import get_model
import pickle

# The training run above logged its model under the key "LinearRegression"
model_file, model_obj, _ = get_model(train_run.outputs["LinearRegression"])
with open(model_file, 'rb') as f:
    model = pickle.load(f)

In [None]:
test_df = train_run.artifact('test_set').as_df()
print(f"model score : {model.score(test_df.drop(['labels'], axis=1), test_df['labels'])}")

### **Some plotting**

#### Display the probability calibration

In [9]:
train_run.artifact('probability-calibration').show()

In [10]:
train_run.artifact('feature-importances').show()

### **Running the function remotely**

In [11]:
fn.spec.build.commands=['pip install xgboost==1.3.1']
fn.deploy(with_mlrun=False)

> 2021-10-13 10:10:24,882 [info] Started building image: .mlrun/func-function-marketplace-xgb-trainer:latest
INFO[0000] Retrieving image manifest mlrun/mlrun:0.7.1
INFO[0000] Retrieving image manifest mlrun/mlrun:0.7.1
INFO[0000] Built cross stage deps: map[]
INFO[0000] Retrieving image manifest mlrun/mlrun:0.7.1
INFO[0001] Retrieving image manifest mlrun/mlrun:0.7.1
INFO[0002] Executing 0 build triggers
INFO[0002] Unpacking rootfs as cmd RUN pip install xgboost==1.3.1 requires it.
INFO[0024] RUN pip install xgboost==1.3.1
INFO[0024] Taking snapshot of full filesystem...
INFO[0035] cmd: /bin/sh
INFO[0035] args: [-c pip install xgboost==1.3.1]
INFO[0035] Running: [/bin/sh -c pip install xgboost==1.3.1]
Collecting xgboost==1.3.1
  Downloading xgboost-1.3.1-py3-none-manylinux

True

In [12]:
fn.run(inputs={"dataset": 'https://s3.wasabisys.com/iguazio/data/function-marketplace-data/xgb_trainer/classifier-data.csv'},
       params=task_params)

> 2021-10-13 10:11:39,577 [info] starting run xgb-trainer-train_model uid=7332ff5d727948c89221d4645b84d028 DB=http://mlrun-api:8080
> 2021-10-13 10:11:39,764 [info] Job is running in the background, pod: xgb-trainer-train-model-4scfq
> 2021-10-13 10:11:55,207 [info] run executed, status=completed
final state: completed


| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| function-marketplace | ...5b84d028 | 0 | Oct 13 10:11:51 | completed | xgb-trainer-train_model | v3io_user=dani<br>kind=job<br>owner=dani<br>host=xgb-trainer-train-model-4scfq | dataset | model_type=classifier<br>CLASS_tree_method=hist<br>CLASS_objective=binary:logistic<br>CLASS_booster=gbtree<br>FIT_verbose=0<br>label_column=labels | accuracy=0.9552<br>test-error=0.0448<br>rocauc=0.9799618829687036<br>brier_score=0.038984999293145965<br>f1-score=0.954983922829582<br>precision_score=0.965679190751445<br>recall_score=0.9445229681978798 | test_set<br>probability-calibration<br>confusion-matrix<br>feature-importances<br>precision-recall-binary<br>roc-binary<br>model |





> 2021-10-13 10:11:58,969 [info] run executed, status=completed


<mlrun.model.RunObject at 0x7f08906c3210>
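
Several parameter names in the remote run summary above carry a `CLASS_` or `FIT_` prefix (`CLASS_tree_method`, `FIT_verbose`, and so on). A plausible reading, assumed here rather than confirmed by this notebook, is that `CLASS_` values are meant for the model constructor and `FIT_` values for the `fit` call. The sketch below shows one way such a flat dict could be split; the function name `split_prefixed_params` is hypothetical.

```python
def split_prefixed_params(params, class_prefix="CLASS_", fit_prefix="FIT_"):
    """Split a flat params dict into (class_params, fit_params, other) by key prefix."""
    class_params, fit_params, other = {}, {}, {}
    for name, value in params.items():
        if name.startswith(class_prefix):
            class_params[name[len(class_prefix):]] = value  # strip the prefix
        elif name.startswith(fit_prefix):
            fit_params[name[len(fit_prefix):]] = value
        else:
            other[name] = value  # e.g. model_type, label_column
    return class_params, fit_params, other


# Parameter values copied from the remote run summary
run_params = {
    "model_type": "classifier",
    "CLASS_tree_method": "hist",
    "CLASS_objective": "binary:logistic",
    "CLASS_booster": "gbtree",
    "FIT_verbose": 0,
    "label_column": "labels",
}

class_params, fit_params, other = split_prefixed_params(run_params)
print(class_params)
print(fit_params)
print(other)
```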

[Back to the top](#MLhandlers-Tutorial)