# Onboarding a Model by Uploading the Model Artifact

In this notebook we present the steps for onboarding a model together with its model artifact. When Fiddler is given the actual model artifact, it can produce high-fidelity explanations. Models in Fiddler that rely on a surrogate model provide only approximate explanations, and models with no artifact at all provide none.

Fiddler is the pioneer in enterprise Model Performance Management (MPM), offering a unified platform that enables Data Science, MLOps, Risk, Compliance, Analytics, and LOB teams to **monitor, explain, analyze, and improve ML deployments at enterprise scale**. 
Obtain contextual insights at any stage of the ML lifecycle, improve predictions, increase transparency and fairness, and optimize business revenue.

---

You can experience Fiddler's model monitoring and explainability ***in minutes*** by following these five quick steps:

1. Connect to Fiddler
2. Upload a baseline dataset
3. Upload a model package directory containing the **1) model.yaml, 2) package.py and 3) model artifact**
4. Publish production events
5. Get insights (including high-fidelity XAI!)

# 0. Imports

In [None]:
!pip install -q fiddler-client

import fiddler as fdl
import pandas as pd
import yaml
import datetime
import time
from IPython.display import clear_output

print(f"Running Fiddler client version {fdl.__version__}")

# 1. Connect to Fiddler

Before you can add information about your model to Fiddler, you'll need to connect using our API client.

---

**We need a few pieces of information to get started.**
1. The URL you're using to connect to Fiddler
2. Your organization ID
3. Your authorization token

The latter two of these can be found by pointing your browser to your Fiddler URL and navigating to the **Settings** page.

In [None]:
URL = ''  # Your Fiddler URL, including https:// (for example, https://yourorg.fiddler.ai)
ORG_ID = ''  # Your organization ID, from the Settings page
AUTH_TOKEN = ''  # Your authorization token, from the Settings page (don't share or commit it)
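Hard-coding an authorization token in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming you export the three settings as environment variables before starting Jupyter; the names `FIDDLER_URL`, `FIDDLER_ORG_ID`, and `FIDDLER_AUTH_TOKEN` are hypothetical conventions for this example, not variables the Fiddler client reads on its own.

```python
import os

# Hypothetical environment variable names chosen for this sketch
ENV_NAMES = {
    'url': 'FIDDLER_URL',
    'org_id': 'FIDDLER_ORG_ID',
    'auth_token': 'FIDDLER_AUTH_TOKEN',
}


def load_fiddler_credentials(env=None):
    """Read the connection settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_NAMES.values() if not env.get(var)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    creds = {key: env[var] for key, var in ENV_NAMES.items()}
    # Drop a trailing slash so later string joins don't produce '//'
    creds['url'] = creds['url'].rstrip('/')
    if not creds['url'].startswith('https://'):
        raise ValueError('FIDDLER_URL must include the https:// scheme')
    return creds
```

Because the returned keys match the keyword arguments used in the connection cell, the client could then be created with `client = fdl.FiddlerApi(**load_fiddler_credentials())`.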

Now just run the following code block to connect to the Fiddler API!

In [None]:
client = fdl.FiddlerApi(
    url=URL,
    org_id=ORG_ID,
    auth_token=AUTH_TOKEN
)

Once you connect, you can create a new project by specifying a unique project ID in the client's `create_project` function.

In [None]:
PROJECT_ID = 'example_model_upload_for_xai'

if PROJECT_ID not in client.list_projects():
    print(f'Creating project: {PROJECT_ID}')
    client.create_project(PROJECT_ID)
else:
    print(f'Project: {PROJECT_ID} already exists')

# 2. Upload a baseline dataset

In this example, we're a bank with **a model that predicts churn for our customers**.
We want to know when our model's predictions start to drift, that is, **when churn starts to increase** within our customer base.

To get insights into the model's performance, **Fiddler needs a small sample of data that can serve as a baseline** for comparisons with data in production.
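To make "comparing production data with a baseline" concrete, here is a minimal sketch of one common drift measure, the Jensen-Shannon divergence between a column's category frequencies in the baseline and in production. This is a generic illustration of the technique, not a description of Fiddler's internal computation, and the category labels `A`, `B`, `C` are made up.

```python
import math
from collections import Counter


def distribution(values, categories):
    """Return the share of `values` falling into each category, in order."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


def js_divergence(p, q):
    """Jensen-Shannon divergence with log base 2, so the result lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i >= a_i / 2 > 0 otherwise
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


categories = ['A', 'B', 'C']
baseline = ['A'] * 50 + ['B'] * 25 + ['C'] * 25      # made-up baseline sample
production = ['A'] * 20 + ['B'] * 60 + ['C'] * 20    # made-up production sample

p = distribution(baseline, categories)
q = distribution(production, categories)
print(f'JS divergence: {js_divergence(p, q):.3f}')
```

A value of 0 means the two distributions are identical and 1 means they share no categories at all, which is why a small but representative baseline is enough to detect a shift.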


---


*For more information on how to design a baseline dataset, [click here](https://docs.fiddler.ai/docs/designing-a-baseline-dataset).*

In [None]:
PATH_TO_BASELINE_CSV = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/master/content_root/tutorial/quickstart/hawaii_baseline_dataset.csv'

baseline_df = pd.read_csv(PATH_TO_BASELINE_CSV)
baseline_df

Fiddler uses this baseline dataset to keep track of important information about your data, including **data types**, **data ranges**, and **unique values** for categorical variables.

---

You can construct a `DatasetInfo` object to be used as **a schema for keeping track of this information** by running the following code block.

In [None]:
dataset_info = fdl.DatasetInfo.from_dataframe(baseline_df, max_inferred_cardinality=100)
dataset_info
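As we understand the client documentation, `max_inferred_cardinality` means that a string column with fewer than that many unique values is treated as categorical. The sketch below is a simplified, standard-library illustration of that kind of schema inference over a list of row dicts; it is not the client's actual implementation, and the column names in the test are only examples.

```python
from numbers import Number


def infer_column_schema(rows, max_inferred_cardinality=100):
    """Summarize each column of `rows`, a list of dicts sharing the same keys."""
    schema = {}
    for col in rows[0]:
        # Ignore missing values when deciding a column's type
        values = [row[col] for row in rows if row[col] is not None]
        if all(isinstance(v, bool) for v in values):
            # Check bool before Number: bool is a subclass of int in Python
            schema[col] = {'type': 'bool', 'values': sorted(set(values))}
        elif all(isinstance(v, Number) for v in values):
            is_int = all(isinstance(v, int) for v in values)
            schema[col] = {'type': 'int' if is_int else 'float',
                           'range': (min(values), max(values))}
        else:
            uniques = sorted(set(map(str, values)))
            # Low-cardinality text becomes categorical, with its allowed values
            if len(uniques) < max_inferred_cardinality:
                schema[col] = {'type': 'category', 'values': uniques}
            else:
                schema[col] = {'type': 'str'}
    return schema
```

Ranges and category lists are what make later comparisons possible: a production value outside the baseline range, or a category never seen in the baseline, can be flagged.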

Then use the client's `upload_dataset` function to send this information to Fiddler!
  
*Just include:*
1. A unique dataset ID
2. The baseline dataset as a pandas DataFrame
3. The `DatasetInfo` object you just created

In [None]:
DATASET_ID = 'churn_data'

client.upload_dataset(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    dataset={
        'baseline': baseline_df
    },
    info=dataset_info
)

If you click on your project, you should now be able to see the newly created dataset in the UI.

<table>
    <tr>
        <td>
            <img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_new_dataset.png" />
        </td>
    </tr>
</table>

# 3. Upload your model package

Now it's time to upload your model package to Fiddler.  To complete this step, we need to ensure we have 3 assets in a subdirectory.  

*(Note: it doesn't matter what this directory is called, but for this example we will call it **model**)*.

In [None]:
import os
os.makedirs("model", exist_ok=True)  # exist_ok lets the cell be re-run safely

***Your model package directory needs to contain:***
1. A **model.yaml** file which describes the model's properties to Fiddler
2. A **package.py** file which explains to Fiddler how to invoke your model's prediction endpoint
3. And the **model artifact** itself
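Before uploading, it can help to confirm locally that the directory holds all three assets. A minimal sketch, assuming the artifact is recognized by its file extension; the `ARTIFACT_SUFFIXES` list and the `check_model_package` helper are hypothetical, not part of the Fiddler client, so adjust them to your framework.

```python
from pathlib import Path

REQUIRED_FILES = ('model.yaml', 'package.py')
# Assumption: these extensions identify a model artifact; adjust as needed
ARTIFACT_SUFFIXES = ('.pkl', '.joblib', '.onnx', '.h5')


def check_model_package(package_dir):
    """Return a list of problems in `package_dir`; an empty list means it looks complete."""
    package_dir = Path(package_dir)
    if not package_dir.is_dir():
        return [f'{package_dir} is not a directory']
    problems = [f'missing {name}' for name in REQUIRED_FILES
                if not (package_dir / name).is_file()]
    artifacts = [p for p in package_dir.iterdir()
                 if p.is_file() and p.suffix in ARTIFACT_SUFFIXES]
    if not artifacts:
        problems.append('no model artifact found')
    elif any(p.stat().st_size == 0 for p in artifacts):
        # A zero-byte file usually means a failed download or save
        problems.append('model artifact is empty')
    return problems
```

Once all three files are in place, `check_model_package('model')` should return an empty list.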

---

### 3.1 Create the **model.yaml** file

This is done by first creating our [model_info](https://docs.fiddler.ai/reference/fdlmodelinfo) object and then writing it out to a **model.yaml** file.


In [None]:
metadata_cols = ['gender']
decision_cols = ['decisions']
feature_columns = ['creditscore', 'geography', 'age', 'tenure',
                   'balance', 'numofproducts', 'hascrcard', 'isactivemember',
                   'estimatedsalary']

model_info = fdl.ModelInfo.from_dataset_info(
    dataset_info=client.get_dataset_info(PROJECT_ID, DATASET_ID),
    target='churn',
    features=feature_columns,
    decision_cols=decision_cols,
    metadata_cols=metadata_cols,
    outputs=['probability_churn'],
    display_name='Random Forest Model',
    description='This model predicts customer churn for a bank'
)

model_info
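The `ModelInfo` above assigns each column a role (target, feature, metadata, decision). A quick local consistency check, sketched under the assumption that every role column should exist in the baseline and that no column should have two roles; `check_column_roles` is a hypothetical helper, and Fiddler performs its own validation on upload.

```python
def check_column_roles(dataset_columns, roles):
    """Check that every role column exists and that no column has two roles.

    `roles` maps a role name (e.g. 'features') to a list of column names.
    Returns a list of human-readable problems; empty means consistent.
    """
    problems = []
    seen = {}  # column name -> first role it was assigned
    for role, columns in roles.items():
        for col in columns:
            if col not in dataset_columns:
                problems.append(f'{role}: {col!r} is not in the dataset')
            if col in seen:
                problems.append(f'{col!r} is both {seen[col]} and {role}')
            else:
                seen[col] = role
    return problems
```

In this notebook it could be called as `check_column_roles(set(baseline_df.columns), {'target': ['churn'], 'features': feature_columns, 'metadata': metadata_cols, 'decisions': decision_cols})`.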

In [None]:
# Saving the model.yaml file using the model info dict

with open('model/model.yaml', 'w') as yaml_file:
    yaml.dump({'model': model_info.to_dict()}, yaml_file)

### 3.2 Create the **package.py** file

The contents of the cell below will be written to our **package.py** file. This is the step that varies most with model type, framework, and use case. The *package.py* file can also apply preprocessing transformations and other processing before the model's prediction endpoint is called. For more information on how to create the *package.py* file, see [Uploading a Model Artifact](https://docs.fiddler.ai/docs/uploading-a-model-artifact#packagepy-script) in the product documentation.

In [None]:
%%writefile model/package.py

import os
import pickle as pkl
from pathlib import Path

import pandas as pd

# Directory containing this package.py and the model.pkl artifact
PACKAGE_PATH = Path(__file__).parent
TARGET = 'churn'
PREDICTION = 'probability_churn'


class RandomForestModel:

    def __init__(self, model_path, output_column=None):
        """
        :param model_path: The directory where the model is saved.
        :param output_column: List of column name(s) for the output.
        """
        self.model_path = model_path
        self.output_column = output_column

        # Load the pickled scikit-learn random forest classifier
        file_path = os.path.join(self.model_path, 'model.pkl')
        with open(file_path, 'rb') as file:
            self.model = pkl.load(file)

    def predict(self, input_df):
        # Drop the target column if present, then return the
        # positive-class (churn) probability for each row
        features = input_df.drop(columns=[TARGET], errors='ignore')
        return pd.DataFrame(
            self.model.predict_proba(features)[:, 1],
            columns=self.output_column)


def get_model():
    """Entry point used by Fiddler to load the model."""
    return RandomForestModel(model_path=PACKAGE_PATH, output_column=[PREDICTION])
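Because *package.py* only runs once Fiddler loads it, a local smoke test can catch errors before upload. A minimal sketch using the standard library's `importlib` to import the file by path and call its `get_model()`; `load_package_model` is a hypothetical helper, not part of the Fiddler client.

```python
import importlib.util
from pathlib import Path


def load_package_model(package_dir):
    """Import package.py from `package_dir` and return the object from get_model()."""
    package_py = Path(package_dir) / 'package.py'
    spec = importlib.util.spec_from_file_location(
        'fiddler_package_under_test', str(package_py))
    module = importlib.util.module_from_spec(spec)
    # Executing the module sets __file__, so PACKAGE_PATH resolves as it would on upload
    spec.loader.exec_module(module)
    return module.get_model()
```

Assuming scikit-learn is installed locally and `model.pkl` has been downloaded, `load_package_model('model').predict(baseline_df[feature_columns].head())` should return a one-column DataFrame of churn probabilities. Selecting `feature_columns` matters because this `predict` drops only the target column.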

### 3.3 Ensure your model's artifact is in the **model** directory

Make sure your model artifact (*e.g. the model .pkl file*) is also present in the model package directory. Once it is there along with the **model.yaml** and **package.py** files, the model package directory can be uploaded to Fiddler.

In [None]:
import urllib.request
urllib.request.urlretrieve("https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/master/content_root/tutorial/quickstart/model/model.pkl", "model/model.pkl")

### 3.4 Upload the model package directory

In [None]:
from pathlib import Path

MODEL_ID = 'customer_churn_rf'

client.upload_model_package(Path('model'), project_id=PROJECT_ID, model_id=MODEL_ID)

On the project page, you should now be able to see the newly created model.

<table>
    <tr>
        <td>
            <img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_new_model.png" />
        </td>
    </tr>
</table>

# 4. Publish production events

Your model artifact is uploaded and registered. Now it's time to start publishing some production data!  
Fiddler will **monitor this data and compare it to your baseline to generate powerful insights into how your model is behaving**.

Because Fiddler has the model artifact, high-fidelity explanations are also available.


---


Each record sent to Fiddler is called **an event**. An event is just **a dictionary that maps column names to column values**.

Let's load some sample events from a CSV file. Then we can assign an artificial timestamp to each event and publish the events to Fiddler one by one, in a streaming fashion.

In [None]:
PATH_TO_EVENTS_CSV = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/master/content_root/tutorial/quickstart/hawaii_drift_demo_large.csv'

event_log = pd.read_csv(PATH_TO_EVENTS_CSV)
event_log

In [None]:
NUM_EVENTS_TO_SEND = min(11500, len(event_log))  # never request more rows than the CSV holds

FIVE_MINUTES_MS = 5 * 60 * 1000  # spacing between consecutive event timestamps
ONE_DAY_MS = 24 * 60 * 60 * 1000
NUM_DAYS_BACK_TO_START = 39  # start the event timestamps this many days in the past

start_date = round(time.time() * 1000) - ONE_DAY_MS * NUM_DAYS_BACK_TO_START
print(datetime.datetime.fromtimestamp(start_date / 1000.0, tz=datetime.timezone.utc))
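A quick check of the timestamp arithmetic: with events spaced five minutes apart, 11,500 events span 11,499 × 5 = 57,495 minutes, about 39.93 days. Starting 39 days back therefore places the last roughly 0.93 days of events after the current time (assuming the CSV has at least 11,500 rows). Whether that matters depends on how Fiddler treats future timestamps, which this notebook doesn't cover; the sketch below just computes the window and the minimum number of whole days back. `event_window` and `days_back_needed` are hypothetical helpers, and `now_ms` is a fixed value for reproducibility.

```python
FIVE_MINUTES_MS = 5 * 60 * 1000
ONE_DAY_MS = 24 * 60 * 60 * 1000


def event_window(num_events, interval_ms, start_ms):
    """Return (first, last) timestamps in ms for events spaced `interval_ms` apart."""
    return start_ms, start_ms + (num_events - 1) * interval_ms


def days_back_needed(num_events, interval_ms):
    """Smallest whole number of days back so the last event is not in the future."""
    span_ms = (num_events - 1) * interval_ms
    return -(-span_ms // ONE_DAY_MS)  # integer ceiling division


now_ms = 1_700_000_000_000  # fixed "now" so the output is reproducible
start_ms = now_ms - 39 * ONE_DAY_MS
first, last = event_window(11500, FIVE_MINUTES_MS, start_ms)
print(f'last event is {(last - now_ms) / ONE_DAY_MS:.2f} days after now')  # → 0.93
print(f'days back needed: {days_back_needed(11500, FIVE_MINUTES_MS)}')     # → 40
```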

In [None]:
def event_generator_df():
    for ind, row in event_log.iterrows():
        event_dict = dict(row)
        event_id = event_dict.pop('event_id')
        event_time = start_date + ind * FIVE_MINUTES_MS #publish an event every FIVE_MINUTES_MS
        yield event_id, event_dict, event_time
        
event_queue_df = event_generator_df()

def get_next_event_df():
    return next(event_queue_df)
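The generator above turns each pandas row into an event dictionary and pops out its `event_id`. The same idea with only the standard library's `csv` module is sketched below, to show that an event really is just a column-name-to-value mapping. The number coercion in `_coerce` is a simplification assumed for this sketch; pandas' `read_csv` does its own, richer type inference.

```python
import csv
import io


def _coerce(value):
    """Convert a CSV string to int or float where possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def iter_events(csv_text, id_column='event_id'):
    """Yield (event_id, event_dict) pairs from CSV text, one per row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        event = {key: _coerce(value) for key, value in row.items()}
        # The ID travels separately from the event's column values
        yield event.pop(id_column), event
```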

In [None]:
for ind in range(NUM_EVENTS_TO_SEND):
    event_id_tmp, event_dict, event_time = get_next_event_df()

    client.publish_event(PROJECT_ID,
                         MODEL_ID,
                         event_dict,
                         event_timestamp=event_time,
                         event_id=event_id_tmp,
                         update_event=False)

    # Timestamps are epoch milliseconds; render them explicitly in UTC
    readable_timestamp = datetime.datetime.fromtimestamp(event_time / 1000.0, tz=datetime.timezone.utc)
    clear_output(wait=True)

    print(f'Sending {ind+1} / {NUM_EVENTS_TO_SEND} \n{readable_timestamp:%Y-%m-%d %H:%M:%S} UTC: \n{event_dict}')
    time.sleep(0.001)

# 5. Get insights

**You're all done!**
  
Now just head to your Fiddler URL and start getting enhanced observability into your model's performance.

Run the following code block to get your URL.

In [None]:
print('/'.join([URL, 'projects', PROJECT_ID, 'models', MODEL_ID, 'monitor']))
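If `URL` was entered with a trailing slash, the plain `'/'.join` above yields a `//` in the path. A small sketch that builds the same `projects/<project>/models/<model>/monitor` path (the layout is taken from the cell above) while tolerating that, and percent-encoding the IDs; `monitor_url` is a hypothetical helper.

```python
from urllib.parse import quote


def monitor_url(base_url, project_id, model_id):
    """Build the model's monitoring page URL, tolerating a trailing slash on base_url."""
    parts = ['projects', project_id, 'models', model_id, 'monitor']
    return base_url.rstrip('/') + '/' + '/'.join(quote(p, safe='') for p in parts)
```

For example, `monitor_url(URL, PROJECT_ID, MODEL_ID)` returns the same link as the cell above when `URL` has no trailing slash.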

*Please allow 3-5 minutes for monitoring data to populate the charts.*
  
The following screen will be available to you upon completion.
<table>
    <tr>
        <td>
            <img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_monitoring.png" />
        </td>
    </tr>
</table>



---


**Questions?**  
  
Check out [our docs](https://docs.fiddler.ai/) for a more detailed explanation of what Fiddler has to offer.

If you're still looking for answers, fill out a ticket on [our support page](https://fiddlerlabs.zendesk.com/) and we'll get back to you shortly.