# Fiddler Quick Start Guide

Fiddler is a powerful observability tool for monitoring the health of your ML models in production.  
With the ability to set **real-time alerts** on metrics like **model performance**, **data drift**, and **data integrity**, Fiddler generates value at every stage of the production lifecycle.


---


You can start using Fiddler ***in minutes*** by following these five quick steps:

1. Connect to Fiddler
2. Upload a baseline dataset
3. Register your model with Fiddler
4. Publish production events
5. Get insights

## 0. Imports

In [None]:
!pip install -q fiddler-client;

import numpy as np
import pandas as pd
import fiddler as fdl

print(f"Running client version {fdl.__version__}")

Running client version 0.7.5


## 1. Connect to Fiddler

Before you can register your model with Fiddler, you'll need to connect using our API client.


---


**We need a few pieces of information to get started.**
1. The URL you're using to connect to Fiddler

In [None]:
URL = ''  # Replace with the URL you use to access Fiddler

2. Your organization ID
3. Your authorization token

You can find both by opening the URL you entered and navigating to the **Settings** page.

<table>
    <tr>
        <td><img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_settings_page_numbered.png" /></td>
        <td><img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_org_id_numbered.png" /></td>
    </tr>
    <tr>
        <td><img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_new_key_numbered.png" /></td>
        <td><img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_auth_token_numbered.png" /></td>
    </tr>
</table>

In [None]:
ORG_ID = ''      # Replace with your organization ID
AUTH_TOKEN = ''  # Replace with your authorization token

Now just run the following code block to connect to the Fiddler API!

In [None]:
client = fdl.FiddlerApi(
    url=URL,
    org_id=ORG_ID,
    auth_token=AUTH_TOKEN
)
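If you'd rather not paste credentials into the notebook, one option is to read them from environment variables. The sketch below is only an illustration: the function name and the variable names `FIDDLER_URL`, `FIDDLER_ORG_ID`, and `FIDDLER_AUTH_TOKEN` are hypothetical and not something the Fiddler client defines.

```python
import os


def load_fiddler_credentials(env=os.environ):
    """Read connection settings from environment variables.

    The variable names are hypothetical; use whatever names your
    deployment or secrets manager provides.
    """
    names = {
        "url": "FIDDLER_URL",
        "org_id": "FIDDLER_ORG_ID",
        "auth_token": "FIDDLER_AUTH_TOKEN",
    }
    # Fail early with a clear message instead of connecting with blanks
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in names.items()}
```

Because the returned keys match the keyword arguments used above, the result could be passed along as `fdl.FiddlerApi(**load_fiddler_credentials())`.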

Once you connect, you can create a new project by specifying a unique project ID in the client's `create_project` function.

In [None]:
PROJECT_ID = 'quickstart_example'

client.create_project(PROJECT_ID)

{'project_name': 'quickstart_example'}

You should now be able to see the newly created project on the UI.

<table>
    <tr>
        <td>
            <img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_new_project.png" />
        </td>
    </tr>
</table>

## 2. Upload a baseline dataset

In this example, we're a bank with **a model that predicts churn for our customers**.  
We want to know when the model's predictions start to drift, that is, **when churn starts to increase** within our customer base.
  
To get insights into the model's performance, **Fiddler needs a small sample of data that can serve as a baseline** for making comparisons with data in production.
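To make "comparing against a baseline" concrete, here is a minimal sketch of one common drift measure, the Jensen-Shannon divergence between two categorical distributions. It is an illustration only and not necessarily the metric Fiddler computes; the function names and the sample label counts are made up.

```python
import math
from collections import Counter


def category_distribution(values, categories):
    """Return the relative frequency of each category, in a fixed order."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Returns 0.0 for identical distributions and 1.0 for distributions
    that share no probability mass.
    """
    def kl(a, b):
        # Terms with zero probability in `a` contribute nothing
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Made-up churn labels, for illustration only
baseline = ["no"] * 80 + ["yes"] * 20
production = ["no"] * 50 + ["yes"] * 50
categories = ["no", "yes"]

drift = jensen_shannon_divergence(
    category_distribution(baseline, categories),
    category_distribution(production, categories),
)
print(f"JSD between baseline and production: {drift:.3f}")
```

The further the production distribution moves from the baseline, the closer the value gets to 1.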


---


*For more information on how to design a baseline dataset, [click here](http://internal.docs.fiddler.ai.s3-website-us-west-1.amazonaws.com/pages/user-guide/data-science-concepts/monitoring/constructing-a-baseline-dataset/).*

In [None]:
PATH_TO_BASELINE_CSV = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/new-quickstart/content_root/tutorial/quickstart/churn_baseline.csv'

baseline_df = pd.read_csv(PATH_TO_BASELINE_CSV)
baseline_df

Unnamed: 0,creditscore,geography,gender,age,tenure,balance,numofproducts,hascrcard,isactivemember,estimatedsalary,churn,predicted_churn,decision
0,545,Texas,Male,37,9,110483.86,1,1,1,127394.67,yes,0.897202,low_risk
1,497,Texas,Female,55,7,131778.66,1,1,1,9972.64,yes,0.997441,low_risk
2,509,New York,Female,29,0,107712.57,2,1,1,92898.17,yes,0.920563,low_risk
3,743,Hawaii,Nonbinary,39,6,0.00,2,1,0,44265.28,yes,0.779282,low_risk
4,699,Florida,Female,25,8,0.00,2,1,1,52404.47,yes,0.825474,low_risk
...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,686,Texas,Male,39,3,129626.19,2,1,1,103220.56,yes,0.760645,low_risk
19996,446,Massachusetts,Female,45,10,125191.69,1,1,1,128260.86,no,0.216093,high_risk
19997,794,California,Male,35,6,0.00,2,1,1,68730.91,yes,0.982021,low_risk
19998,832,California,Male,61,2,0.00,1,0,1,127804.66,no,0.071598,high_risk


Fiddler uses this baseline dataset to keep track of important information about your data.
  
This includes **data types**, **data ranges**, and **unique values** for categorical variables.
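As a rough picture of what such a summary contains, the sketch below infers a type, a value range, and the unique values for a few rows copied from the baseline preview. It is not how Fiddler builds its schema; `summarize_columns`, its `max_cardinality` argument, and the `STRING` fallback label are assumptions made for illustration.

```python
def summarize_columns(rows, max_cardinality=100):
    """Infer a rough per-column summary from a list of row dictionaries."""
    def is_int(value):
        return isinstance(value, int) and not isinstance(value, bool)

    def is_number(value):
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    summary = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        if all(is_int(v) for v in values):
            summary[name] = {"dtype": "INTEGER", "range": (min(values), max(values))}
        elif all(is_number(v) for v in values):
            summary[name] = {"dtype": "FLOAT", "range": (min(values), max(values))}
        else:
            unique = sorted(set(map(str, values)))
            if len(unique) <= max_cardinality:
                # Few distinct values: treat the column as categorical
                summary[name] = {"dtype": "CATEGORY", "possible_values": unique}
            else:
                summary[name] = {"dtype": "STRING"}
    return summary


# Three rows taken from the baseline preview above
rows = [
    {"creditscore": 545, "geography": "Texas", "balance": 110483.86},
    {"creditscore": 743, "geography": "Hawaii", "balance": 0.0},
    {"creditscore": 699, "geography": "Florida", "balance": 0.0},
]
for column, info in summarize_columns(rows).items():
    print(column, info)
```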

---

You can construct a `DatasetInfo` object to be used as **a schema for keeping track of this information** by running the following code block.

In [None]:
dataset_info = fdl.DatasetInfo.from_dataframe(baseline_df, max_inferred_cardinality=100)
dataset_info

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,creditscore,INTEGER,,False,350 - 850
1,geography,CATEGORY,6.0,False,
2,gender,CATEGORY,3.0,False,
3,age,INTEGER,,False,18 - 92
4,tenure,INTEGER,,False,0 - 10
5,balance,FLOAT,,False,"0.0 - 250,900.0"
6,numofproducts,INTEGER,,False,1 - 4
7,hascrcard,INTEGER,,False,0 - 1
8,isactivemember,INTEGER,,False,0 - 1
9,estimatedsalary,FLOAT,,False,"11.58 - 200,000.0"


Then use the client's `upload_dataset` function to send this information to Fiddler!
  
*Just include:*
1. A unique dataset ID
2. The baseline dataset as a pandas DataFrame
3. The `DatasetInfo` object you just created

In [None]:
DATASET_ID = 'churn_data'

client.upload_dataset(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    dataset={
        'baseline': baseline_df
    },
    info=dataset_info
)

Uploading the dataset churn_data ...


{'col_count': 13,
 'log': ['Importing dataset churn_data',
  'Creating table for churn_data',
  'Importing data file: baseline.csv'],
 'row_count': 20000}

If you click on your project, you should now be able to see the newly created dataset on the UI.

<table>
    <tr>
        <td>
            <img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_new_dataset.png" />
        </td>
    </tr>
</table>

## 3. Register your model

Now it's time to register your model with Fiddler.


---


You'll need to specify some more **information about how your model operates**.
  
*Just include:*
1. The **task** your model is performing (regression, binary classification, etc.)
2. The **target** (ground truth) column
3. The **output** (prediction) column
4. The **feature** columns
5. Any **metadata** columns
6. Any **decision** columns (these capture the direct business decisions made as a result of the model's prediction)


In [None]:
# Specify task
model_task = 'binary'

if model_task == 'regression':
    model_task = fdl.ModelTask.REGRESSION
    
elif model_task == 'binary':
    model_task = fdl.ModelTask.BINARY_CLASSIFICATION

elif model_task == 'multiclass':
    model_task = fdl.ModelTask.MULTICLASS_CLASSIFICATION

    
# Specify column types
target = 'churn'
outputs = ['predicted_churn']
decision_cols = ['decision']
features = ['geography', 'gender', 'age', 'tenure', 'balance', 'numofproducts', 'hascrcard', 'isactivemember', 'estimatedsalary']
    
# Generate ModelInfo
model_info = fdl.ModelInfo.from_dataset_info(
    dataset_info=dataset_info,
    dataset_id=DATASET_ID,
    model_task=model_task,
    target=target,
    outputs=outputs,
    decision_cols=decision_cols,
    features=features
)
model_info

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,churn,CATEGORY,2,False,

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,geography,CATEGORY,6.0,False,
1,gender,CATEGORY,3.0,False,
2,age,INTEGER,,False,18 - 92
3,tenure,INTEGER,,False,0 - 10
4,balance,FLOAT,,False,"0.0 - 250,900.0"
5,numofproducts,INTEGER,,False,1 - 4
6,hascrcard,INTEGER,,False,0 - 1
7,isactivemember,INTEGER,,False,0 - 1
8,estimatedsalary,FLOAT,,False,"11.58 - 200,000.0"

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,predicted_churn,FLOAT,,False,0.0 - 1.0

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,decision,CATEGORY,2,False,
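When you adapt this step to your own data, it can help to check the column roles against the baseline columns first. The helper below is a hypothetical sketch (it is not part of the Fiddler client) that reports names missing from the dataset, names assigned more than once, and dataset columns left without a role.

```python
def check_column_roles(columns, target, outputs, features,
                       decision_cols=(), metadata_cols=()):
    """Compare the columns assigned to model roles with the dataset columns."""
    assigned = [target, *outputs, *features, *decision_cols, *metadata_cols]
    return {
        # Assigned names that do not exist in the dataset
        "missing": [c for c in assigned if c not in columns],
        # Names that were given more than one role
        "duplicated": sorted({c for c in assigned if assigned.count(c) > 1}),
        # Dataset columns that were not given any role
        "unassigned": [c for c in columns if c not in assigned],
    }


# Column names from the baseline dataset and the roles used in this notebook
baseline_columns = [
    "creditscore", "geography", "gender", "age", "tenure", "balance",
    "numofproducts", "hascrcard", "isactivemember", "estimatedsalary",
    "churn", "predicted_churn", "decision",
]
report = check_column_roles(
    baseline_columns,
    target="churn",
    outputs=["predicted_churn"],
    features=["geography", "gender", "age", "tenure", "balance",
              "numofproducts", "hascrcard", "isactivemember", "estimatedsalary"],
    decision_cols=["decision"],
)
print(report)
```

With the lists used in this notebook, nothing is missing or duplicated, and the only unassigned column is `creditscore`, which is not part of `features`.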


This information allows Fiddler to build a **surrogate model** on the backend that can provide more insight into your model's performance.
  
*For more information on surrogate models, [click here](http://internal.docs.fiddler.ai.s3-website-us-west-1.amazonaws.com/pages/user-guide/data-science-concepts/explainability/surrogate-models/).*


---

Almost done! Now just specify a unique model ID and use the client's `register_model` function to send this information to Fiddler.

In [None]:
MODEL_ID = 'churn_classifier'

client.register_model(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    model_id=MODEL_ID,
    model_info=model_info
)

Loading dataset info ...
Validating model info ...
Generating a model using the baseline dataset ...
Running tests ...
All tests passed ..
Model output provided in the baseline dataset
Beginning to precache for dataset churn_data with model churn_classifier...

--- Beginning PDP Caching ---
100.0% 9/9 Global PDPs Cached
--- Finished PDP Caching ---

--- Beginning Impact/Importance Caching ---
100.0% Global Features Cached
--- Finished Impact/Importance Caching ---

Successfully precached for dataset churn_data with model churn_classifier


'Model successfully registered on Fiddler. \n Visit https://app.fiddler.ai/projects/quickstart_example '

On the project page, you should now be able to see the newly created model.

<table>
    <tr>
        <td>
            <img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_new_model.png" />
        </td>
    </tr>
</table>

## 4. Publish production events

Your model is registered and now it's time to start publishing some production data!  
Fiddler will **monitor this data and compare it to your baseline to generate powerful insights into how your model is behaving**.


---


Each record sent to Fiddler is called **an event**.  
An event is just **a dictionary that maps column names to column values**.
  
Let's load in some sample events from a CSV file.

In [None]:
PATH_TO_EVENTS_CSV = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/new-quickstart/content_root/tutorial/quickstart/churn_events.csv'

production_df = pd.read_csv(PATH_TO_EVENTS_CSV)
production_df

Unnamed: 0,creditscore,geography,gender,age,tenure,balance,numofproducts,hascrcard,isactivemember,estimatedsalary,churn,predicted_churn,decision,timestamp
0,559,California,Male,52,2,0.00,1,1,0,129013.59,no,0.007896,low_risk,1628264512281
1,482,California,Male,55,5,97318.25,1,0,1,78416.14,yes,0.885725,high_risk,1628266931481
2,651,Florida,Female,46,4,89743.05,1,1,0,156425.57,no,0.031816,low_risk,1628269350681
3,611,Hawaii,Male,38,7,0.00,1,1,1,63202.00,yes,0.930061,high_risk,1628271769881
4,696,California,Female,33,4,0.00,2,1,1,73371.65,yes,0.999726,high_risk,1628274189081
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
245,781,Hawaii,Female,48,0,57098.96,6,1,0,85644.06,no,0.000916,low_risk,1628857216281
246,797,Hawaii,Female,55,10,0.00,9,1,1,49418.87,no,0.043374,low_risk,1628859635481
247,554,Hawaii,Male,31,1,0.00,7,0,1,192660.55,yes,0.624521,high_risk,1628862054681
248,701,Hawaii,Nonbinary,37,1,0.00,7,1,0,163457.55,yes,0.101539,low_risk,1628864473881
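Notice that some of these sample events fall outside the ranges recorded for the baseline: `numofproducts` takes values such as 6 and 9, while the baseline range was 1 - 4. The sketch below shows the idea behind a simple range check of that kind. It is an illustration with a hypothetical helper, not Fiddler's data integrity implementation.

```python
def find_range_violations(events, ranges):
    """Return (event index, column, value) for values outside their range."""
    violations = []
    for index, event in enumerate(events):
        for column, (low, high) in ranges.items():
            value = event.get(column)
            if value is not None and not (low <= value <= high):
                violations.append((index, column, value))
    return violations


# Ranges copied from the baseline schema output
baseline_ranges = {"numofproducts": (1, 4), "age": (18, 92), "tenure": (0, 10)}

# A few values copied from the sample events (rows 0, 245 and 246)
events = [
    {"age": 52, "tenure": 2, "numofproducts": 1},
    {"age": 48, "tenure": 0, "numofproducts": 6},
    {"age": 55, "tenure": 10, "numofproducts": 9},
]
print(find_range_violations(events, baseline_ranges))
```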


Then convert the pandas DataFrame into a list of dictionaries, one per event.

In [None]:
production_events = production_df.to_dict(orient='records')
production_events[0]

{'age': 52,
 'balance': 0.0,
 'churn': 'no',
 'creditscore': 559,
 'decision': 'low_risk',
 'estimatedsalary': 129013.59,
 'gender': 'Male',
 'geography': 'California',
 'hascrcard': 1,
 'isactivemember': 0,
 'numofproducts': 1,
 'predicted_churn': 0.00789562591605295,
 'tenure': 2,
 'timestamp': 1628264512281}
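The `timestamp` values in the sample file look like Unix epoch times in milliseconds. That reading is an assumption based on their magnitude, so it is worth a quick sanity check before publishing. The helper name below is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def epoch_ms_to_utc(ms):
    """Convert integer epoch milliseconds to a timezone-aware UTC datetime."""
    # Split with integer arithmetic to avoid floating-point rounding
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


first_event_time = epoch_ms_to_utc(1628264512281)
print(first_event_time.isoformat())  # 2021-08-06T15:41:52.281000+00:00
```

If the result lands decades away from when your events actually occurred, the unit is probably seconds rather than milliseconds.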

You can use the client's `publish_event` function to start pumping data into Fiddler!
  
*Just include:*
1. The event dictionary
2. A timestamp for when the event occurred

In [None]:
from tqdm import tqdm

# Publish each event with its own timestamp so Fiddler knows when it occurred
for event in tqdm(production_events):
    client.publish_event(
        project_id=PROJECT_ID,
        model_id=MODEL_ID,
        event=event,
        event_timestamp=event['timestamp']
    )

100%|██████████| 250/250 [00:30<00:00,  8.17it/s]
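The loop above stops at the first event that raises an error. For larger batches, you may prefer to retry and collect failures instead. The sketch below shows one way to do that with a hypothetical `publish_all` helper. `publish_fn` stands in for a call such as `client.publish_event`, and the retry behavior is an assumption, not something the Fiddler client provides.

```python
import time


def publish_all(events, publish_fn, retries=2, delay_seconds=0.0):
    """Publish every event, retrying failures and returning what still failed."""
    failed = []
    for index, event in enumerate(events):
        for attempt in range(retries + 1):
            try:
                publish_fn(event)
                break
            except Exception as exc:
                if attempt == retries:
                    # Give up on this event but keep going with the rest
                    failed.append((index, repr(exc)))
                else:
                    time.sleep(delay_seconds)
    return failed


sent = []


def fake_publish(event):
    """Stand-in for a real publish call; rejects events without a timestamp."""
    if "timestamp" not in event:
        raise ValueError("event has no timestamp")
    sent.append(event)


failures = publish_all(
    [{"age": 52, "timestamp": 1628264512281}, {"age": 55}],
    fake_publish,
)
print(f"published={len(sent)} failed={failures}")
```

In this notebook, `publish_fn` could be a small wrapper that calls `client.publish_event` with `PROJECT_ID`, `MODEL_ID`, the event, and `event['timestamp']`.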


## 5. Get insights

**You're all done!**
  
Now just head to your Fiddler URL and start getting enhanced observability into your model's performance.

Run the following code block to get your URL.

In [None]:
# Strip any trailing slash from URL so the joined path has no double slash
print('/'.join([URL.rstrip('/'), 'projects', PROJECT_ID, 'models', MODEL_ID, 'monitor']))

https://app.fiddler.ai/projects/quickstart_example/models/churn_classifier/monitor


*Please allow 3-5 minutes for monitoring data to populate the charts.*
  
The following screen will be available to you upon completion.
<table>
    <tr>
        <td>
            <img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_monitoring.png" />
        </td>
    </tr>
</table>



---


**Questions?**  
  
Check out [our docs](https://docs.fiddler.ai/) for a more detailed explanation of what Fiddler has to offer.

If you're still looking for answers, submit a ticket on [our support page](https://fiddlerlabs.zendesk.com/) and we'll get back to you shortly.