# Fiddler Quick Start Guide for Explainability (XAI) with Surrogate Models

Fiddler is not only a powerful observability tool for monitoring the health of your ML models in production but also an explainability tool that lets you peek into your black-box models. With the ability to compute **point explanations** (for individual predictions) and **global explanations** (for the model as a whole), Fiddler provides visualizations that help explain your model's behavior. 


---


You can start exploring Fiddler's XAI capabilities by following these five quick steps:

1. Connect to Fiddler
2. Upload a baseline dataset
3. Add your model details to Fiddler
4. Either upload a model artifact or use a Fiddler-generated surrogate model
5. Get insights

## 0. Imports

```python
!pip install -q fiddler-client

import numpy as np
import pandas as pd
import fiddler as fdl
import time

print(f"Running client version {fdl.__version__}")
```

```
Running client version 1.5.0
```


## 1. Connect to Fiddler

Before you can register your model with Fiddler, you'll need to connect using our API client.


---


**We need a few pieces of information to get started.**
1. The URL you're using to connect to Fiddler
2. Your organization ID
3. Your authorization token

```python
URL = ''  # Make sure to include the full URL (including https://).
ORG_ID = ''
AUTH_TOKEN = ''
```

You can find your organization ID and authorization token by opening your Fiddler URL in a browser and navigating to the **Settings** page.

<table>
    <tr>
        <td><img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_settings_page_numbered.png" /></td>
        <td><img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_org_id_numbered.png" /></td>
    </tr>
    <tr>
        <td><img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_new_key_numbered.png" /></td>
        <td><img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_auth_token_numbered.png" /></td>
    </tr>
</table>

Now just run the following code block to connect to the Fiddler API!

```python
client = fdl.FiddlerApi(
    url=URL,
    org_id=ORG_ID,
    auth_token=AUTH_TOKEN
)
```
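If the connection fails, two common causes are a URL without its `https://` prefix and an empty credential. As a minimal sketch, the hypothetical helper below (`check_connection_settings` is not part of the Fiddler client) uses only the standard library to flag those problems before the values reach `fdl.FiddlerApi`. Passing the check only means the values are well-formed, not that the credentials are valid.

```python
from urllib.parse import urlparse


def check_connection_settings(url: str, org_id: str, auth_token: str) -> list[str]:
    """Hypothetical pre-flight check: return a list of problems (empty if none found)."""
    problems = []
    parsed = urlparse(url)
    # The guide asks for the full URL, including the https:// scheme.
    if parsed.scheme != "https":
        problems.append("URL should start with https://")
    # Without a scheme, urlparse puts the host in `path`, so netloc is empty.
    if not parsed.netloc:
        problems.append("URL is missing a host name")
    if not org_id.strip():
        problems.append("ORG_ID is empty")
    if not auth_token.strip():
        problems.append("AUTH_TOKEN is empty")
    return problems


# Placeholder values, not real credentials.
print(check_connection_settings("example.fiddler.ai", "my_org", ""))
```

Here the URL has no scheme (so no host is parsed either) and the token is empty, so three problems are reported.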

Once you connect, you can create a new project by specifying a unique project ID in the client's `create_project` function.

```python
PROJECT_ID = 'quickstart_xai'

client.create_project(PROJECT_ID)
```

```
{'project_name': 'quickstart_xai100'}
```

You should now be able to see the newly created project in the UI.

<table>
    <tr>
        <td>
            <img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/xai_project_list.png" />
        </td>
    </tr>
</table>

## 2. Upload a baseline dataset

In this example, we're a bank and we have **a model that predicts churn for our customers**.  
We want to explain our model's predictions and **understand which features impact them** the most.
  
To produce explainability insights, **Fiddler needs to fiddle with your model**, so it needs details about the model, including the data the model uses. We start by uploading a small sample of data that serves as a baseline.


---


*For more information on how to design a baseline dataset, [click here](https://docs.fiddler.ai/pages/user-guide/data-science-concepts/monitoring/designing-a-baseline-dataset/).*

```python
PATH_TO_BASELINE_CSV = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/master/content_root/tutorial/quickstart/churn_baseline.csv'

baseline_df = pd.read_csv(PATH_TO_BASELINE_CSV)
baseline_df
```

| | customer_id | creditscore | geography | gender | age | tenure | balance | numofproducts | hascrcard | isactivemember | estimatedsalary | predicted_churn | decision | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 27acd1c2 | 545 | Texas | Male | 37 | 9 | 110483.86 | 1 | 1 | 1 | 127394.67 | 0.897202 | high_risk | yes |
| 1 | 27b36d0c | 497 | Texas | Female | 55 | 7 | 131778.66 | 1 | 1 | 1 | 9972.64 | 0.997441 | high_risk | yes |
| 2 | 27b5360a | 509 | New York | Female | 29 | 0 | 107712.57 | 2 | 1 | 1 | 92898.17 | 0.920563 | high_risk | yes |
| 3 | 27b5d650 | 743 | Hawaii | Nonbinary | 39 | 6 | 0.00 | 2 | 1 | 0 | 44265.28 | 0.779282 | high_risk | yes |
| 4 | 27b236a8 | 699 | Florida | Female | 25 | 8 | 0.00 | 2 | 1 | 1 | 52404.47 | 0.825474 | high_risk | yes |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19995 | 27b409ba | 686 | Texas | Male | 39 | 3 | 129626.19 | 2 | 1 | 1 | 103220.56 | 0.760645 | high_risk | yes |
| 19996 | 27aaff96 | 446 | Massachusetts | Female | 45 | 10 | 125191.69 | 1 | 1 | 1 | 128260.86 | 0.216093 | low_risk | no |
| 19997 | 27ad3162 | 794 | California | Male | 35 | 6 | 0.00 | 2 | 1 | 1 | 68730.91 | 0.982021 | high_risk | yes |
| 19998 | 27b076ce | 832 | California | Male | 61 | 2 | 0.00 | 1 | 0 | 1 | 127804.66 | 0.071598 | low_risk | no |


Fiddler uses this baseline dataset to keep track of important information about your data.
  
This includes **data types**, **data ranges**, and **unique values** for categorical variables.
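To make this concrete, here is a simplified, pure-Python sketch of the kind of per-column inference involved. It is an illustration under stated assumptions, not Fiddler's implementation: `infer_column_schema` is a hypothetical name, and the rule that non-numeric columns with at most `max_inferred_cardinality` distinct values become categories is modeled on the parameter of the same name passed to `DatasetInfo.from_dataframe` in the following cell.

```python
def infer_column_schema(values, max_inferred_cardinality=100):
    """Hypothetical sketch: infer a dtype plus a range or value set for one column.

    Numeric columns get a (min, max) range; non-numeric columns become
    CATEGORY when they have at most `max_inferred_cardinality` distinct values,
    otherwise STRING. Booleans and dates are not handled, for brevity.
    """
    non_null = [v for v in values if v is not None]
    schema = {"is_nullable": len(non_null) < len(values)}
    if not non_null:
        schema["dtype"] = "UNKNOWN"
    elif all(isinstance(v, int) for v in non_null):
        schema.update(dtype="INTEGER", value_range=(min(non_null), max(non_null)))
    elif all(isinstance(v, (int, float)) for v in non_null):
        schema.update(dtype="FLOAT", value_range=(min(non_null), max(non_null)))
    else:
        distinct = sorted({str(v) for v in non_null})
        if len(distinct) <= max_inferred_cardinality:
            schema.update(dtype="CATEGORY", possible_values=distinct)
        else:
            schema.update(dtype="STRING")
    return schema


# A few values copied from the baseline preview, plus one None to show nullability.
columns = {
    "geography": ["Texas", "Texas", "New York", "Hawaii"],
    "age": [37, 55, 29, 39],
    "balance": [110483.86, 131778.66, 0.0, None],
}
for name, values in columns.items():
    print(name, infer_column_schema(values))
```

A high-cardinality identifier such as `customer_id` would exceed the cardinality limit and fall through to `STRING`, which matches how that column is typed in the `DatasetInfo` output.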

---

You can construct a `DatasetInfo` object to be used as **a schema for keeping track of this information** by running the following code block.

```python
dataset_info = fdl.DatasetInfo.from_dataframe(baseline_df, max_inferred_cardinality=100)
dataset_info
```

| | column | dtype | count(possible_values) | is_nullable | value_range |
|---|---|---|---|---|---|
| 0 | customer_id | STRING | | False | |
| 1 | creditscore | INTEGER | | False | 350 - 850 |
| 2 | geography | CATEGORY | 6 | False | |
| 3 | gender | CATEGORY | 3 | False | |
| 4 | age | INTEGER | | False | 18 - 92 |
| 5 | tenure | INTEGER | | False | 0 - 10 |
| 6 | balance | FLOAT | | False | 0.0 - 250,900.0 |
| 7 | numofproducts | INTEGER | | False | 1 - 4 |
| 8 | hascrcard | INTEGER | | False | 0 - 1 |
| 9 | isactivemember | INTEGER | | False | 0 - 1 |


Then use the client's `upload_dataset` function to send this information to Fiddler!
  
*Just include:*
1. A unique dataset ID
2. The baseline dataset as a pandas DataFrame
3. The `DatasetInfo` object you just created

```python
DATASET_ID = 'churn_data'

client.upload_dataset(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    dataset={
        'baseline': baseline_df
    },
    info=dataset_info
)
```

```
{'uuid': '0f030210-bac6-4cc7-bdec-dbf3eaa427d8',
 'name': 'Ingestion dataset Upload',
 'info': {'project_name': 'quickstart_xai100',
  'resource_name': 'churn_data',
  'resource_type': 'DATASET'},
 'status': 'SUCCESS',
 'progress': 100.0,
 'error_message': None}
```
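If you script these steps end to end, you may want to stop as soon as an upload does not succeed. A minimal sketch, assuming the call returns a plain dict with the `status` and `error_message` keys shown in the output above; the helper name `ensure_success` is hypothetical.

```python
def ensure_success(result: dict) -> dict:
    """Hypothetical helper: raise if an upload job summary does not report success."""
    status = result.get("status")
    if status != "SUCCESS":
        message = result.get("error_message") or "no error message returned"
        raise RuntimeError(f"Upload failed with status {status!r}: {message}")
    return result


# The summary printed above, trimmed to the keys the helper reads.
ensure_success({"status": "SUCCESS", "progress": 100.0, "error_message": None})
```

In the notebook you would wrap the call directly, e.g. `ensure_success(client.upload_dataset(...))`.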

If you click on your project, you should now be able to see the newly created dataset in the UI.

<!-- <table>
    <tr>
        <td>
            <img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_new_dataset.png" />
        </td>
    </tr>
</table> -->

## 3. Add information about your model

Now it's time to add your model's details to Fiddler. We do so by first creating a **`ModelInfo` object** that helps Fiddler understand **how your model operates**.
  
*Just include:*
1. The **task** your model is performing (regression, binary classification, etc.)
2. The **target** (ground truth) column
3. The **output** (prediction) column
4. The **feature** columns
5. Any **metadata** columns
6. Any **decision** columns (these capture the direct business decisions made as a result of the model's prediction)
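Before building `ModelInfo`, it can help to check that the column roles you are about to declare are consistent with the baseline data. The following is a hypothetical pre-check (`check_column_roles` is not a Fiddler function) that reports columns missing from the dataset and columns assigned to more than one role; the column names come from this guide's churn example.

```python
from collections import defaultdict


def check_column_roles(dataset_columns, roles):
    """Hypothetical pre-check for ModelInfo column roles.

    `roles` maps a role name (e.g. "target", "features") to a list of columns.
    Returns (missing, duplicated): columns absent from the dataset, and columns
    that appear under more than one role.
    """
    available = set(dataset_columns)
    roles_per_column = defaultdict(set)
    for role, columns in roles.items():
        for column in columns:
            roles_per_column[column].add(role)
    missing = sorted(c for c in roles_per_column if c not in available)
    duplicated = sorted(c for c, r in roles_per_column.items() if len(r) > 1)
    return missing, duplicated


baseline_columns = ["customer_id", "creditscore", "geography", "gender", "age", "tenure",
                    "balance", "numofproducts", "hascrcard", "isactivemember",
                    "estimatedsalary", "predicted_churn", "decision", "churn"]
roles = {
    "target": ["churn"],
    "outputs": ["predicted_churn"],
    "decision_cols": ["decision"],
    "features": ["geography", "gender", "age", "tenure", "balance", "numofproducts",
                 "hascrcard", "isactivemember", "estimatedsalary"],
}
print(check_column_roles(baseline_columns, roles))  # → ([], [])
```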


```python
# Specify task
model_task = 'binary'

if model_task == 'regression':
    model_task = fdl.ModelTask.REGRESSION
elif model_task == 'binary':
    model_task = fdl.ModelTask.BINARY_CLASSIFICATION
elif model_task == 'multiclass':
    model_task = fdl.ModelTask.MULTICLASS_CLASSIFICATION

# Specify column types
target = 'churn'
outputs = ['predicted_churn']
decision_cols = ['decision']
features = ['geography', 'gender', 'age', 'tenure', 'balance', 'numofproducts', 'hascrcard', 'isactivemember', 'estimatedsalary']

# Generate ModelInfo
model_info = fdl.ModelInfo.from_dataset_info(
    dataset_info=dataset_info,
    dataset_id=DATASET_ID,
    model_task=model_task,
    target=target,
    outputs=outputs,
    decision_cols=decision_cols,
    features=features
)
model_info
```

```
No `binary_classification_threshold` specified, defaulting to 0.5
```

| | column | dtype | count(possible_values) | is_nullable | value_range |
|---|---|---|---|---|---|
| 0 | churn | CATEGORY | 2 | False | |

| | column | dtype | count(possible_values) | is_nullable | value_range |
|---|---|---|---|---|---|
| 0 | geography | CATEGORY | 6 | False | |
| 1 | gender | CATEGORY | 3 | False | |
| 2 | age | INTEGER | | False | 18 - 92 |
| 3 | tenure | INTEGER | | False | 0 - 10 |
| 4 | balance | FLOAT | | False | 0.0 - 250,900.0 |
| 5 | numofproducts | INTEGER | | False | 1 - 4 |
| 6 | hascrcard | INTEGER | | False | 0 - 1 |
| 7 | isactivemember | INTEGER | | False | 0 - 1 |
| 8 | estimatedsalary | FLOAT | | False | 11.58 - 200,000.0 |

| | column | dtype | count(possible_values) | is_nullable | value_range |
|---|---|---|---|---|---|
| 0 | predicted_churn | FLOAT | | False | 0.0 - 1.0 |

| | column | dtype | count(possible_values) | is_nullable | value_range |
|---|---|---|---|---|---|
| 0 | decision | CATEGORY | 2 | False | |
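The log line above shows that, with no `binary_classification_threshold` supplied, Fiddler defaults to 0.5. As an illustration of what such a threshold does, the sketch below turns a predicted churn probability into a class label. The tie-breaking choice (`>=`) and the `'yes'`/`'no'` labels are assumptions based on the `churn` column in the baseline data, not documented Fiddler behavior.

```python
def apply_threshold(probability: float, threshold: float = 0.5,
                    positive: str = "yes", negative: str = "no") -> str:
    """Map a predicted probability to a class label (ties go to the positive class)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    return positive if probability >= threshold else negative


# predicted_churn values from the baseline preview
for p in [0.897202, 0.216093, 0.071598]:
    print(p, apply_threshold(p))
```

For these three rows the labels agree with the `churn` column shown in the baseline preview (`yes`, `no`, `no`).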


Once the `ModelInfo` object is created, use the client's `add_model` function to register these model details with Fiddler.

**Note:** You will need to specify a unique model ID.


<!-- alows Fiddler to build a **surrogate model** on the backend that can provide more insight into your model's performance.
  
*For more information on surrogate models, [click here](https://docs.fiddler.ai/docs/surrogate-models).*


---

Almost done! Now just specify a unique model ID and use the client's `register_model` function to send this information to Fiddler. -->

```python
MODEL_ID = 'churn_classifier'

client.add_model(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    model_id=MODEL_ID,
    model_info=model_info
)
```

```
Start initialize monitoring
Init monitoring succeeded:JOB UUID: 8ef70fca-8f11-4873-a1ab-0d239b0ba86a task id: d31c46f3-d302-4d4c-a26d-c5723cb549f2 result: {'result': 'SKETCH GENERATION RESULTS: \nNo event-weighted HISTOGRAM sketch generated, which is only applicable for data with class imbalance.\n No event-weighted FREQUENCY sketch generated, which is only applicable for data with class imbalance.\n No event-weighted NULL_COUNT sketch generated, which is only applicable for data with class imbalance.'}
```


On the project page, you should now be able to see the newly created model.

<!-- <table>
    <tr>
        <td>
            <img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_new_model.png" />
        </td>
    </tr>
</table> -->

## 4. Either upload your own model or generate a surrogate model

After the previous step, your model is registered with Fiddler: within the project `PROJECT_ID`, the model `MODEL_ID` now holds the `ModelInfo` describing your model.

However, `ModelInfo` only describes the model. To run the predictions that explainability analysis requires, Fiddler also needs the model itself, so you would normally upload your model artifact. If you just want to explore the XAI capabilities without providing your model to Fiddler, you can instead generate a surrogate model, which tries to mimic your model based on the details you provided.

In this quickstart, we generate a surrogate model based on the `ModelInfo` provided above.
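Conceptually, a global surrogate is a second model trained to reproduce the outputs of the original one, and it is only useful to the extent that it agrees with them. The sketch below illustrates that idea with the standard library alone: a hypothetical black-box churn score that depends only on `age`, a one-feature linear surrogate fitted by ordinary least squares to the black box's outputs, and an R² "fidelity" score. This is a deliberately simple stand-in; it does not describe how Fiddler builds its surrogate.

```python
import math


def black_box(age: float) -> float:
    """Hypothetical opaque model: we can query it but not inspect it."""
    return 1.0 / (1.0 + math.exp(-(age - 45.0) / 10.0))


def fit_linear_surrogate(xs, ys):
    """Ordinary least squares for y ≈ intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return mean_y - slope * mean_x, slope


def fidelity_r2(ys, preds):
    """R² of surrogate predictions measured against the black-box outputs."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


ages = list(range(18, 93))              # the age range seen in the baseline data
outputs = [black_box(a) for a in ages]  # targets are the model's outputs, not ground truth
intercept, slope = fit_linear_surrogate(ages, outputs)
surrogate = [intercept + slope * a for a in ages]
print(f"slope={slope:.4f}, fidelity R^2={fidelity_r2(outputs, surrogate):.3f}")
```

The key design point is that the surrogate is fitted to the original model's predictions rather than to the true labels, so its fidelity score says how faithfully it mimics the model, not how accurate either model is.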

```python
client.add_model_surrogate(
    project_id=PROJECT_ID,
    model_id=MODEL_ID
)
```

```
Validating model info...
Generating surrogate model...
Testing the deployed model with sample events...
Dataset already has output columns; predictions already imported...
Beginning to precache for dataset churn_data with model churn_classifier...

--- Beginning Impact/Importance Caching ---

 |██████████████████████████████████████████████████| 100.0% Global Features Cached
--- Finished Impact/Importance Caching ---

Successfully precached for dataset churn_data with model churn_classifier
Pre-caching completed
```


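Next, publish a batch of production events so there is traffic to monitor alongside the explanations. The sample events are loaded from CSV, and their timestamps are shifted so that the most recent event falls at the current time.

```python
PATH_TO_EVENTS_CSV = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/master/content_root/tutorial/quickstart/churn_events.csv'

production_df = pd.read_csv(PATH_TO_EVENTS_CSV)
# Shift the timestamps (epoch milliseconds) so the most recent event is stamped with the current time
production_df['timestamp'] = production_df['timestamp'] + (int(time.time() * 1000) - production_df['timestamp'].max())
```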
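The timestamp shift above is plain arithmetic: compute one offset, `now - max(timestamp)`, and add it to every event. The newest event then lands at the current time while the spacing between events is unchanged. A list-based sketch of the same computation, with timestamps in epoch milliseconds as in the DataFrame version (`shift_to_now` is a hypothetical name):

```python
import time


def shift_to_now(timestamps_ms, now_ms=None):
    """Shift epoch-millisecond timestamps so the latest one equals now_ms."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    offset = now_ms - max(timestamps_ms)
    return [t + offset for t in timestamps_ms]


shifted = shift_to_now([1000, 4000, 10000], now_ms=50000)
print(shifted)  # → [41000, 44000, 50000]
```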

```python
client.publish_events_batch(
    project_id=PROJECT_ID,
    model_id=MODEL_ID,
    batch_source=production_df,
    timestamp_field='timestamp',
    id_field='customer_id'  # Optional
)
```

```
{'status': 202,
 'job_uuid': '69b6e8e6-ef6a-4e32-9339-4febf54c1f46',
 'files': ['tmpz_59p642.csv'],
 'message': 'Successfully received the event data. Please allow time for the event ingestion to complete in the Fiddler platform.'}
```

## 5. Get insights

**You're all done!**
  
You can head to your Fiddler URL and start getting enhanced monitoring and explainability for the surrogate model.

Run the following code block to get your URL.

```python
print('/'.join([URL, 'projects', PROJECT_ID, 'models', MODEL_ID, 'explain']))
```

```
https://mainbuild.dev.fiddler.ai/projects/quickstart_xai100/models/churn_classifier/explain
```
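Note that `'/'.join([...])` produces a double slash if `URL` ends with `/`. A slightly more defensive variant, sketched below with a hypothetical helper name, strips trailing slashes first; the `projects/<project>/models/<model>/explain` path layout is taken from the printed URL above.

```python
def explain_page_url(base_url: str, project_id: str, model_id: str) -> str:
    """Build the Explain page URL, tolerating a trailing slash on base_url."""
    return "/".join([base_url.rstrip("/"), "projects", project_id,
                     "models", model_id, "explain"])


print(explain_page_url("https://example.fiddler.ai/", "quickstart_xai", "churn_classifier"))
# → https://example.fiddler.ai/projects/quickstart_xai/models/churn_classifier/explain
```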


The following screen will be available to you upon completion.
<table>
    <tr>
        <td>
            <img src="https://fiddler-nb-assets.s3.us-west-1.amazonaws.com/qs_monitoring.png" />
        </td>
    </tr>
</table>

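You can also run explanations directly from the client. First, select the event to explain:

```python
# Slice a single row (kept as a one-row DataFrame) to run the explanation on
explain_df = production_df[1:2]
explain_df
```

| | customer_id | creditscore | geography | gender | age | tenure | balance | numofproducts | hascrcard | isactivemember | estimatedsalary | predicted_churn | decision | churn | timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 27c35cee | 482 | California | Male | 55 | 5 | 97318.25 | 1 | 0 | 1 | 78416.14 | 0.804852 | high_risk | yes | 1670888548757 |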


```python
explanation = client.run_explanation(
    project_id=PROJECT_ID,
    model_id=MODEL_ID,
    dataset_id=DATASET_ID,
    df=explain_df
)
```

```python
explanation
```

```
AttributionExplanation(algorithm='shap', inputs=['age', 'balance', 'estimatedsalary', 'gender', 'geography', 'hascrcard', 'isactivemember', 'numofproducts', 'tenure'], attributions=[-0.22829719896470613, 0.05395405345442672, 0.05821722432631347, 0.02592202192643008, 0.04889003726989818, 0.02747556207772711, 0.1840908533648324, -0.049035284568693584, 0.012210490163388729], misc={'background_dataset_size': 48, 'explanation_lower': {'age': -0.243328429925663, 'balance': 0.04439245878936571, 'estimatedsalary': 0.04956526927024543, 'gender': 0.02319085058945426, 'geography': 0.03921942993948388, 'hascrcard': 0.022864587965403943, 'isactivemember': 0.17853620456961852, 'numofproducts': -0.056895896255301254, 'tenure': 0.008863213995403421}, 'explanation_std': {'age': 0.015850668593051072, 'balance': 0.010129514348764366, 'estimatedsalary': 0.009736111895193158, 'gender': 0.003908115664891616, 'geography': 0.008640506622363618, 'hascrcard': 0.004068259835999008, 'isactivemember': 0.0092625444
```
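Each attribution is the SHAP value for one input: its sign says whether that feature pushed this customer's score above or below the average prediction over the background dataset, and its magnitude says by how much. A quick way to read the result is to sort features by absolute attribution. The sketch below copies the `inputs` and `attributions` lists from the output above, rounded to four decimals. With the live object you would presumably read `explanation.inputs` and `explanation.attributions`, but those attribute names are inferred from the printed repr and are an assumption.

```python
# Values copied (rounded) from the AttributionExplanation output above.
inputs = ["age", "balance", "estimatedsalary", "gender", "geography",
          "hascrcard", "isactivemember", "numofproducts", "tenure"]
attributions = [-0.2283, 0.0540, 0.0582, 0.0259, 0.0489,
                0.0275, 0.1841, -0.0490, 0.0122]

# Rank features by the size of their contribution, regardless of sign.
ranked = sorted(zip(inputs, attributions), key=lambda pair: abs(pair[1]), reverse=True)
for name, value in ranked:
    direction = "raises" if value > 0 else "lowers"
    print(f"{name:>16} {value:+.4f} ({direction} the score)")
```

For this event, `age` and `isactivemember` dominate, pulling the score in opposite directions.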

```python
feature_importance = client.run_feature_importance(
    project_id=PROJECT_ID,
    model_id=MODEL_ID,
    dataset_id=DATASET_ID
)
```

```python
feature_importance
```

```
FeatureImportanceResults(all_obs_input_df_size=10035, all_obs_reference_df_size=10035, ci_level=0.95, feature_names=['geography', 'gender', 'age', 'tenure', 'balance', 'numofproducts', 'hascrcard', 'isactivemember', 'estimatedsalary'], fixed_sample_ci=None, loss='pointwise_logloss', mean_loss=0.09957577539672471, mean_loss_ci=0.004857541008375612, mean_loss_increase_importance=[0.049955023407920345, 0.02807683033154858, 0.3966555059081438, 0.07513816389882265, 0.23996714417410486, 0.24323690839422388, 0.015668615204968435, 0.10479268500307345, 0.1999466006826271], n_references=10035, non_null_input_df_size=10035, non_null_reference_df_size=10035, random_sample_ci=[0.005341348769419479, 0.003268551190327656, 0.020154451032073955, 0.005777392606402853, 0.014228543199846922, 0.015509464497990658, 0.0020680140934179582, 0.009650673987729818, 0.012346292654041692])
```
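The `mean_loss_increase_importance` values above are reported alongside `loss='pointwise_logloss'` and `mean_loss`, which suggests the usual reading: how much the average log loss grows when a feature's information is removed. The sketch below illustrates that idea with a simple permutation scheme on synthetic data. It is an assumption, not Fiddler's documented procedure: the toy model, the data, and the choice to shuffle a column (rather than, for example, substitute values from reference rows) are all illustrative.

```python
import math
import random


def log_loss(y: int, p: float, eps: float = 1e-12) -> float:
    """Pointwise binary log loss, clipping p to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def toy_model(row: dict) -> float:
    """Hypothetical churn model that only looks at age."""
    return 1.0 / (1.0 + math.exp(-(row["age"] - 45.0) / 5.0))


def mean_loss(rows, labels):
    return sum(log_loss(y, toy_model(r)) for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(rows, labels, features, seed=0):
    """Mean log-loss increase when each feature's column is shuffled."""
    rng = random.Random(seed)
    base = mean_loss(rows, labels)
    importance = {}
    for feature in features:
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        permuted = [{**r, feature: v} for r, v in zip(rows, column)]
        importance[feature] = mean_loss(permuted, labels) - base
    return importance


rng = random.Random(42)
rows = [{"age": rng.randint(18, 92), "tenure": rng.randint(0, 10)} for _ in range(500)]
labels = [1 if r["age"] > 45 else 0 for r in rows]
print(permutation_importance(rows, labels, ["age", "tenure"]))
```

Because the toy model ignores `tenure`, shuffling it leaves every prediction unchanged and its importance is exactly zero, while shuffling `age` sharply increases the loss.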



---


**Questions?**  
  
Check out [our docs](https://docs.fiddler.ai/) for a more detailed explanation of what Fiddler has to offer.

If you're still looking for answers, fill out a ticket on [our support page](https://fiddlerlabs.zendesk.com/) and we'll get back to you shortly.