# Importing MLflow models in DSS

This notebook shows, through a simple example, how to import a machine learning model trained *entirely outside of DSS* as a SavedModel in a project's Flow. We use the [CatBoost]() framework to perform a binary classification task on the [UCI Bank dataset]().

## Step 1: create the code env in DSS

In the *Administration > Code envs* section of DSS, create a new **Python 3.6 or above** code environment, add the following packages, then build the code env:

```
mlflow
mlflow[extras]
catboost==0.26.1
pandas>=1.0,<1.1
```

> **This notebook must run using that code env!**

Write down the name of that code env; you will need it when calling `import_mlflow_version_from_path()`.
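
Before going further, it can help to confirm that the notebook kernel actually sees the packages from Step 1. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and is not part of DSS or MLflow.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Importable module names for the packages listed in Step 1
required = ["mlflow", "catboost", "pandas"]
missing = missing_packages(required)
if missing:
    print("Missing from this code env: {}".format(", ".join(missing)))
else:
    print("All required packages are importable.")
```

If anything is reported missing, the notebook is probably running on a different code env than the one built in Step 1.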

## Step 2: import packages

In [0]:
import os
from datetime import datetime

import dataiku
import dataikuapi
from dataikuapi.dss.ml import DSSPredictionMLTaskSettings

import pandas as pd
import mlflow.catboost
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

## Step 3: train your model

We will train a CatBoost model on our training dataset.

In [0]:
# Import training data
training_data = dataiku.Dataset("training_data")
df = training_data.get_dataframe()
cat_cols = ["job", "marital", "education", "default", "housing", "loan", "month"]
cont_cols = ["age", "balance", "day", "duration", "campaign"]
target = ["y"]

# Separate the features from the binarized target
X = df.drop(target, axis=1)
y = LabelBinarizer().fit_transform(df[target[0]]).ravel()

# Compute categorical column positions on X (not df): dropping the target can shift them
cat_col_idx = [X.columns.get_loc(c) for c in cat_cols]

# Train a CatBoost model on the training data
X_train, X_val, y_train, y_val = train_test_split(X, y, train_size=0.7, random_state=1337)
model = CatBoostClassifier(iterations=100, learning_rate=0.05, depth=15, eval_metric="AUC")
model.fit(X_train, y_train,
          cat_features=cat_col_idx,
          eval_set=(X_val, y_val))

# Save the model to a managed folder, in a timestamped subdirectory
catboost_models_folder = dataiku.Folder("catboost_models")
catboost_models_folder_dir = catboost_models_folder.get_path()
ts = datetime.now().strftime("%Y%m%d-%H%M%S")
model_dir = os.path.join(catboost_models_folder_dir, "catboost-uci-bank-{}".format(ts))

mlflow.catboost.save_model(model, model_dir)
print("Model saved at {}!".format(os.path.abspath(model_dir)))
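
CatBoost's `cat_features` argument refers to column positions within the feature matrix passed to `fit`, so the positions have to be those obtained once the target column is removed. Here is a minimal sketch of that bookkeeping with plain lists; the column order is a made-up example (not the actual dataset layout) and `categorical_indices` is a hypothetical helper.

```python
def categorical_indices(all_columns, target, categorical):
    """Return the positions of categorical columns once the target is removed."""
    feature_columns = [c for c in all_columns if c != target]
    return [feature_columns.index(c) for c in categorical]


# Hypothetical column order with the target placed first
columns = ["y", "age", "job", "balance", "marital"]
idx = categorical_indices(columns, target="y", categorical=["job", "marital"])
naive_idx = [columns.index(c) for c in ["job", "marital"]]

print(idx)        # positions within the feature matrix
print(naive_idx)  # positions within the original table, shifted by the target
```

When the target happens to be the last column, both computations agree, which is why the difference is easy to overlook.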

## Step 4: get a handle on a SavedModel

In [0]:
client = dataiku.api_client()
project = client.get_default_project()

# Get or create the SavedModel
sm_name = "catboost-uci-bank"
sm_id = None
for sm_item in project.list_saved_models():
    if sm_item["name"] == sm_name:
        sm_id = sm_item["id"]
        print("Found SavedModel {} with id {}".format(sm_name, sm_id))
        break

if sm_id:
    sm = project.get_saved_model(sm_id)
else:
    sm = project.create_mlflow_pyfunc_model(name=sm_name,
                                            prediction_type=DSSPredictionMLTaskSettings.PredictionTypes.BINARY)
    sm_id = sm.id
    print("SavedModel not found, created new one with id {}".format(sm_id))
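
The lookup part of this cell can be factored into a small function. This is a sketch that assumes, as in the loop above, that each item returned by `project.list_saved_models()` is a dict with `"name"` and `"id"` keys; `find_saved_model_id` is a hypothetical helper and the sample data is invented for illustration.

```python
def find_saved_model_id(saved_models, name):
    """Return the id of the first saved model called `name`, or None if absent."""
    return next((sm["id"] for sm in saved_models if sm["name"] == name), None)


# Invented sample standing in for project.list_saved_models()
sample = [
    {"id": "a1B2c3", "name": "other-model"},
    {"id": "x9Y8z7", "name": "catboost-uci-bank"},
]
found = find_saved_model_id(sample, "catboost-uci-bank")
absent = find_saved_model_id(sample, "not-there")
print(found, absent)
```

A `None` result corresponds to the branch where the SavedModel has to be created.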

## Step 5: import the MLflow model into a SavedModel version

In [0]:
# Change the following values to match your setup!
version_id = "v01"  # Change this to iterate to a new version
code_env_name = "mlflow_catboost"  # Name of the code env created in Step 1

# Fail early if the version already exists in the SavedModel
for v in sm.list_versions():
    if v["id"] == version_id:
        raise Exception("SavedModel version already exists! Choose a new version name.")

# Create the version in the SavedModel from the saved MLflow model
sm_version = sm.import_mlflow_version_from_path(version_id=version_id,
                                                path=model_dir,
                                                code_env_name=code_env_name)

# Declare the target and class labels, then evaluate the version on the "eval_data" dataset
sm_version.set_core_metadata(target_column_name="y",
                             class_labels=["no", "yes"],
                             get_features_from_dataset="eval_data")
sm_version.evaluate("eval_data")

On the SavedModel's version screen, all the "Performance" visualizations should now be available.
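
Instead of editing `version_id` by hand before each import in Step 5, an unused id can be derived from the existing ones. This is a minimal sketch, assuming version ids follow the `v01`, `v02`, ... pattern used above; `next_version_id` is a hypothetical helper, and in the notebook `existing_ids` would come from `[v["id"] for v in sm.list_versions()]`.

```python
import re


def next_version_id(existing_ids, prefix="v", width=2):
    """Return the id following the highest `<prefix><number>` id found."""
    pattern = re.compile(r"^{}(\d+)$".format(re.escape(prefix)))
    # Ids that do not match the pattern are ignored
    numbers = [int(m.group(1)) for m in map(pattern.match, existing_ids) if m]
    return "{}{:0{}d}".format(prefix, max(numbers, default=0) + 1, width)


# Invented list of existing version ids
existing_ids = ["v01", "v02", "initial"]
new_id = next_version_id(existing_ids)
print(new_id)
```

Ids that do not match the pattern, such as `"initial"` here, are skipped rather than treated as errors.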