# Fiddler working notebook for the churn use case



1. Connect to Fiddler
2. Upload a baseline dataset
3. Register your model with Fiddler
4. Publish production events, then run a prediction and explanation on a sample data point
5. Get insights

## 0. Imports

In [85]:
# !pip3 install -q fiddler-client;

import numpy as np
import pandas as pd
import fiddler as fdl

print(f"Running client version {fdl.__version__}")

Running client version 1.0.2


## 1. Connect to Fiddler

In [86]:
URL = ''
ORG_ID = ''
AUTH_TOKEN = ''

client = fdl.FiddlerApi(
    url=URL,
    org_id=ORG_ID,
    auth_token=AUTH_TOKEN
)

fiddler.connection INFO client_version = 1.0.2 > server_version = 1.0.1
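
Hard-coding an auth token in a notebook makes it easy to leak when the notebook is shared. One option is to read the three connection values from environment variables. The sketch below is a minimal example; the variable names `FIDDLER_URL`, `FIDDLER_ORG_ID`, and `FIDDLER_AUTH_TOKEN` and the helper `load_fiddler_credentials` are assumptions for illustration, not names defined by the Fiddler client.

```python
import os


def load_fiddler_credentials(env=None):
    """Read connection settings from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    Raises KeyError naming every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    names = {
        'url': 'FIDDLER_URL',
        'org_id': 'FIDDLER_ORG_ID',
        'auth_token': 'FIDDLER_AUTH_TOKEN',
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in names.items()}
```

The keys of the returned dictionary match the keyword arguments used in the cell above, so it could be passed as `fdl.FiddlerApi(**load_fiddler_credentials())`.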


In [87]:
PROJECT_ID = ''

client.create_project(PROJECT_ID)

{'project_name': 'quickstart_example_2'}

In [88]:
MODEL_ID = ''
DATASET_ID = ''

## 2. Upload a baseline dataset

*For more information on how to design a baseline dataset, [click here](https://docs.fiddler.ai/pages/user-guide/data-science-concepts/monitoring/designing-a-baseline-dataset/).*

In [89]:
PATH_TO_BASELINE_CSV = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/new-quickstart/content_root/tutorial/quickstart/churn_baseline.csv'

baseline_df = pd.read_csv(PATH_TO_BASELINE_CSV)
baseline_df

|  | creditscore | geography | gender | age | tenure | balance | numofproducts | hascrcard | isactivemember | estimatedsalary | churn | predicted_churn | decision |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 545 | Texas | Male | 37 | 9 | 110483.86 | 1 | 1 | 1 | 127394.67 | yes | 0.897202 | low_risk |
| 1 | 497 | Texas | Female | 55 | 7 | 131778.66 | 1 | 1 | 1 | 9972.64 | yes | 0.997441 | low_risk |
| 2 | 509 | New York | Female | 29 | 0 | 107712.57 | 2 | 1 | 1 | 92898.17 | yes | 0.920563 | low_risk |
| 3 | 743 | Hawaii | Nonbinary | 39 | 6 | 0.00 | 2 | 1 | 0 | 44265.28 | yes | 0.779282 | low_risk |
| 4 | 699 | Florida | Female | 25 | 8 | 0.00 | 2 | 1 | 1 | 52404.47 | yes | 0.825474 | low_risk |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19995 | 686 | Texas | Male | 39 | 3 | 129626.19 | 2 | 1 | 1 | 103220.56 | yes | 0.760645 | low_risk |
| 19996 | 446 | Massachusetts | Female | 45 | 10 | 125191.69 | 1 | 1 | 1 | 128260.86 | no | 0.216093 | high_risk |
| 19997 | 794 | California | Male | 35 | 6 | 0.00 | 2 | 1 | 1 | 68730.91 | yes | 0.982021 | low_risk |
| 19998 | 832 | California | Male | 61 | 2 | 0.00 | 1 | 0 | 1 | 127804.66 | no | 0.071598 | high_risk |


In [90]:
dataset_info = fdl.DatasetInfo.from_dataframe(baseline_df, max_inferred_cardinality=100)
dataset_info

|  | column | dtype | count(possible_values) | is_nullable | value_range |
|---|---|---|---|---|---|
| 0 | creditscore | INTEGER |  | False | 350 - 850 |
| 1 | geography | CATEGORY | 6.0 | False |  |
| 2 | gender | CATEGORY | 3.0 | False |  |
| 3 | age | INTEGER |  | False | 18 - 92 |
| 4 | tenure | INTEGER |  | False | 0 - 10 |
| 5 | balance | FLOAT |  | False | 0.0 - 250,900.0 |
| 6 | numofproducts | INTEGER |  | False | 1 - 4 |
| 7 | hascrcard | INTEGER |  | False | 0 - 1 |
| 8 | isactivemember | INTEGER |  | False | 0 - 1 |
| 9 | estimatedsalary | FLOAT |  | False | 11.58 - 200,000.0 |
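
The table above lists, for each column, a data type, the number of possible values for categorical columns, and a value range for numeric ones. A rough pandas approximation of that summary is sketched below to make the columns concrete. It is an illustration only, not Fiddler's implementation: `summarize_columns` is a hypothetical helper, and treating every non-numeric column as a category is an assumption.

```python
import pandas as pd


def summarize_columns(df):
    """Summarize each column: coarse type, distinct-value count, nullability, range."""
    rows = []
    for name in df.columns:
        col = df[name]
        if pd.api.types.is_integer_dtype(col):
            dtype, n_values = 'INTEGER', None
            value_range = f'{col.min()} - {col.max()}'
        elif pd.api.types.is_float_dtype(col):
            dtype, n_values = 'FLOAT', None
            value_range = f'{col.min()} - {col.max()}'
        else:
            # Assumption: anything non-numeric is treated as categorical.
            dtype, n_values, value_range = 'CATEGORY', int(col.nunique()), None
        rows.append({
            'column': name,
            'dtype': dtype,
            'count(possible_values)': n_values,
            'is_nullable': bool(col.isna().any()),
            'value_range': value_range,
        })
    return pd.DataFrame(rows)


# Three rows copied from the baseline preview, restricted to three columns.
demo = pd.DataFrame({
    'creditscore': [545, 497, 509],
    'geography': ['Texas', 'Texas', 'New York'],
    'balance': [110483.86, 131778.66, 107712.57],
})
summary = summarize_columns(demo)
print(summary)
```

Calling `summarize_columns(baseline_df)` would give a comparable overview of the full baseline, which can be a quick sanity check before uploading.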


In [91]:
client.upload_dataset(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    dataset={
        'baseline': baseline_df
    },
    info=dataset_info
)

fiddler.utils.pandas INFO Writing df with shape (20000, 13) to /var/folders/sq/r44f5gd56nv30kbtz35ddlfr0000gn/T/tmpm32q98d7/baseline.csv.parquet
fiddler.fiddler_api INFO [churn_data] dataset upload: upload and import dataset files
fiddler.fiddler_api INFO Uploading the dataset churn_data ...
fiddler.fiddler_api INFO Dataset uploaded {'col_count': 13, 'row_count': 20000}


{'col_count': 13, 'row_count': 20000}

## 3. Register your model


In [92]:
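# Specify task: 'regression', 'binary', or 'multiclass'
model_task = 'binary'

if model_task == 'regression':
    model_task = fdl.ModelTask.REGRESSION

elif model_task == 'binary':
    model_task = fdl.ModelTask.BINARY_CLASSIFICATION

elif model_task == 'multiclass':
    model_task = fdl.ModelTask.MULTICLASS_CLASSIFICATION

else:
    # Fail early instead of passing an unrecognized string on to ModelInfo
    raise ValueError(f"Unsupported model task: {model_task!r}")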

    
# Specify column types
target = 'churn'
outputs = ['predicted_churn']
decision_cols = ['decision']
features = ['geography', 'gender', 'age', 'tenure', 'balance', 'numofproducts', 'hascrcard', 'isactivemember', 'estimatedsalary']
     
# Generate ModelInfo
model_info = fdl.ModelInfo.from_dataset_info(
    dataset_info=dataset_info,
    dataset_id=DATASET_ID,
    model_task=model_task,
    target=target,
    outputs=outputs,
    decision_cols=decision_cols,
    features=features
)
model_info

fiddler.core_objects INFO Using inferred positive class.


|  | column | dtype | count(possible_values) | is_nullable | value_range |
|---|---|---|---|---|---|
| 0 | churn | CATEGORY | 2 | False |  |

|  | column | dtype | count(possible_values) | is_nullable | value_range |
|---|---|---|---|---|---|
| 0 | geography | CATEGORY | 6.0 | False |  |
| 1 | gender | CATEGORY | 3.0 | False |  |
| 2 | age | INTEGER |  | False | 18 - 92 |
| 3 | tenure | INTEGER |  | False | 0 - 10 |
| 4 | balance | FLOAT |  | False | 0.0 - 250,900.0 |
| 5 | numofproducts | INTEGER |  | False | 1 - 4 |
| 6 | hascrcard | INTEGER |  | False | 0 - 1 |
| 7 | isactivemember | INTEGER |  | False | 0 - 1 |
| 8 | estimatedsalary | FLOAT |  | False | 11.58 - 200,000.0 |

|  | column | dtype | count(possible_values) | is_nullable | value_range |
|---|---|---|---|---|---|
| 0 | predicted_churn | FLOAT |  | False | 0.0 - 1.0 |

|  | column | dtype | count(possible_values) | is_nullable | value_range |
|---|---|---|---|---|---|
| 0 | decision | CATEGORY | 2 | False |  |


In [93]:
client.register_model(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    model_id=MODEL_ID,
    model_info=model_info
)

Validating model info...
Generating surrogate model...
Testing the deployed model with sample events...
Dataset already has output columns, importing predictions from dataset...
Beginning to precache for dataset churn_data with model churn_classifier...

--- Beginning Impact/Importance Caching ---

 |██████████████████████████████████████████████████| 100.0% Global Features Cached
--- Finished Impact/Importance Caching ---

Successfully precached for dataset churn_data with model churn_classifier
Beginning to cache dataset churn_data...
Pre-caching completed 


## 4. Publish production events

In [38]:
PATH_TO_EVENTS_CSV = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/new-quickstart/content_root/tutorial/quickstart/churn_events.csv'

production_df = pd.read_csv(PATH_TO_EVENTS_CSV)
production_df

|  | creditscore | geography | gender | age | tenure | balance | numofproducts | hascrcard | isactivemember | estimatedsalary | churn | predicted_churn | decision | timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 559 | California | Male | 52 | 2 | 0.00 | 1 | 1 | 0 | 129013.59 | no | 0.007448 | low_risk | 1628610458681 |
| 1 | 482 | California | Male | 55 | 5 | 97318.25 | 1 | 0 | 1 | 78416.14 | yes | 0.804852 | high_risk | 1628612877881 |
| 2 | 651 | Florida | Female | 46 | 4 | 89743.05 | 1 | 1 | 0 | 156425.57 | no | 0.012754 | low_risk | 1628615297081 |
| 3 | 611 | Hawaii | Male | 38 | 7 | 0.00 | 1 | 1 | 1 | 63202.00 | yes | 0.882252 | high_risk | 1628617716281 |
| 4 | 696 | California | Female | 33 | 4 | 0.00 | 2 | 1 | 1 | 73371.65 | yes | 0.999736 | high_risk | 1628620135481 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 245 | 781 | Hawaii | Female | 48 | 0 | 57098.96 | 6 | 1 | 0 | 85644.06 | no | 0.032330 | low_risk | 1629203162681 |
| 246 | 797 | Hawaii | Female | 55 | 10 | 0.00 | 9 | 1 | 1 | 49418.87 | no | 0.020316 | low_risk | 1629205581881 |
| 247 | 554 | Hawaii | Male | 31 | 1 | 0.00 | 7 | 0 | 1 | 192660.55 | yes | 0.269628 | low_risk | 1629208001081 |
| 248 | 701 | Hawaii | Nonbinary | 37 | 1 | 0.00 | 7 | 1 | 0 | 163457.55 | yes | 0.769625 | high_risk | 1629210420281 |


In [40]:
production_events = production_df.to_dict(orient='records')
production_events[0]

{'creditscore': 559,
 'geography': 'California',
 'gender': 'Male',
 'age': 52,
 'tenure': 2,
 'balance': 0.0,
 'numofproducts': 1,
 'hascrcard': 1,
 'isactivemember': 0,
 'estimatedsalary': 129013.59,
 'churn': 'no',
 'predicted_churn': 0.0074475368963339075,
 'decision': 'low_risk',
 'timestamp': 1628610458681}
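
Before publishing, it can help to compare production values against the ranges inferred from the baseline. The production preview above ends with `numofproducts` values of 6, 9, and 7, while the baseline summary in section 2 reports a range of 1 - 4. The sketch below flags such events; `find_out_of_range` is a hypothetical helper, and the sample events are abbreviated copies of rows from the preview.

```python
def find_out_of_range(events, column, low, high):
    """Return the events whose `column` value falls outside [low, high]."""
    return [event for event in events if not (low <= event[column] <= high)]


# Abbreviated copies of three production rows (indices 0, 245, and 246).
sample_events = [
    {'geography': 'California', 'numofproducts': 1},
    {'geography': 'Hawaii', 'numofproducts': 6},
    {'geography': 'Hawaii', 'numofproducts': 9},
]

# Baseline range for numofproducts as reported by dataset_info: 1 - 4.
outliers = find_out_of_range(sample_events, 'numofproducts', low=1, high=4)
print(len(outliers), 'of', len(sample_events), 'sample events are out of range')
```

The same call on the full `production_events` list would show how many events carry values the baseline never saw.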

In [41]:
from tqdm import tqdm

for event in tqdm(production_events):

    client.publish_event(
        project_id=PROJECT_ID,
        model_id=MODEL_ID,
        event=event,
        event_timestamp=event['timestamp']
    )

100%|██████████| 250/250 [00:29<00:00,  8.34it/s]
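
The `timestamp` values passed as `event_timestamp` look like Unix epoch times in milliseconds. That is an inference from their magnitude; the notebook does not state the unit. Under that assumption, a quick standard-library conversion shows which period the events cover. `epoch_ms_to_utc` is a hypothetical helper.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(ms):
    """Convert a Unix epoch timestamp in milliseconds to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# First and last timestamps visible in the production events preview.
first = epoch_ms_to_utc(1628610458681)
last = epoch_ms_to_utc(1629210420281)

print(first.date(), 'to', last.date())
```

If the millisecond assumption holds, the visible events span roughly one week in August 2021.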


### Explanation

We filter the production events for false negatives from Hawaii (customers who churned but whose predicted churn probability is below 0.5), then run point explanations on them to check feature attribution.

In [51]:
# False Negative Events
fn_events = []
for event in production_events:
    if event['geography'] == 'Hawaii' and event['predicted_churn'] < 0.5 and event['churn'] == 'yes':
        fn_events.append(event)
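
The same filter can be written as a boolean mask on a DataFrame, which avoids the explicit loop. The sketch below uses a tiny stand-in frame with illustrative rows so that it runs on its own; `select_false_negatives` and its `geography` and `threshold` parameters are hypothetical names.

```python
import pandas as pd


def select_false_negatives(df, geography='Hawaii', threshold=0.5):
    """Rows where the customer churned but the predicted probability is below threshold."""
    mask = (
        (df['geography'] == geography)
        & (df['predicted_churn'] < threshold)
        & (df['churn'] == 'yes')
    )
    return df[mask]


# Stand-in rows for illustration only (not taken from the dataset).
demo_df = pd.DataFrame({
    'geography': ['Hawaii', 'Hawaii', 'Texas', 'Hawaii'],
    'predicted_churn': [0.07, 0.88, 0.10, 0.41],
    'churn': ['yes', 'yes', 'yes', 'no'],
})
demo_fn = select_false_negatives(demo_df)
print(demo_fn)
```

Applied to `production_df`, this should select the same events as the loop.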

In [60]:
fn_df = pd.DataFrame(fn_events)
fn_df

|  | creditscore | geography | gender | age | tenure | balance | numofproducts | hascrcard | isactivemember | estimatedsalary | churn | predicted_churn | decision | timestamp | __timestamp_format | __event_type | __occurred_at |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 611 | Hawaii | Female | 40 | 2 | 125879.29 | 4 | 1 | 0 | 93203.43 | yes | 0.070683 | low_risk | 1629060429881 | infer | execution_event | 1629060429881 |
| 1 | 725 | Hawaii | Male | 39 | 4 | 160652.45 | 5 | 1 | 0 | 57643.55 | yes | 0.127987 | low_risk | 1629074945081 | infer | execution_event | 1629074945081 |
| 2 | 567 | Hawaii | Male | 42 | 2 | 0.0 | 5 | 1 | 1 | 167984.61 | yes | 0.365708 | low_risk | 1629103975481 | infer | execution_event | 1629103975481 |
| 3 | 643 | Hawaii | Male | 62 | 9 | 0.0 | 5 | 0 | 0 | 155870.82 | yes | 0.015485 | low_risk | 1629108813881 | infer | execution_event | 1629108813881 |
| 4 | 718 | Hawaii | Female | 43 | 0 | 93143.39 | 4 | 1 | 0 | 167554.86 | yes | 0.030118 | low_risk | 1629111233081 | infer | execution_event | 1629111233081 |
| 5 | 556 | Hawaii | Female | 39 | 9 | 89588.35 | 4 | 1 | 1 | 94898.1 | yes | 0.236813 | low_risk | 1629116071481 | infer | execution_event | 1629116071481 |
| 6 | 773 | Hawaii | Nonbinary | 64 | 2 | 145578.28 | 4 | 0 | 1 | 186172.85 | yes | 0.07185 | low_risk | 1629128167481 | infer | execution_event | 1629128167481 |
| 7 | 541 | Hawaii | Female | 32 | 4 | 0.0 | 6 | 1 | 1 | 114951.42 | yes | 0.068371 | low_risk | 1629130586681 | infer | execution_event | 1629130586681 |
| 8 | 542 | Hawaii | Female | 39 | 4 | 109949.39 | 7 | 1 | 1 | 41268.65 | yes | 0.04149 | low_risk | 1629171713081 | infer | execution_event | 1629171713081 |
| 9 | 668 | Hawaii | Male | 38 | 10 | 86977.96 | 6 | 0 | 1 | 37094.75 | yes | 0.411313 | low_risk | 1629178970681 | infer | execution_event | 1629178970681 |


Sample a 'false negative' data point

In [80]:
sample_data_point = fn_df.iloc[[8]]

Run prediction for sample data point

In [82]:
# Run prediction for sample data point
client.run_model(
    project_id=PROJECT_ID,
    model_id=MODEL_ID,
    df=sample_data_point
)

|  | predicted_churn |
|---|---|
| 0 | 0.04149 |


Run explanation for sample 'false negative' data point

In [84]:
client.run_explanation(
    project_id=PROJECT_ID,
    model_id=MODEL_ID,
    dataset_id=DATASET_ID,
    df=sample_data_point
)

AttributionExplanation(algorithm='shap', inputs=['age', 'balance', 'estimatedsalary', 'gender', 'geography', 'hascrcard', 'isactivemember', 'numofproducts', 'tenure'], attributions=[0.00444427174387102, -0.03178619157561093, -0.07114516377051001, -0.032126974426231636, 0.043653841315491194, 0.010361995778015979, 0.0380248757805022, -0.7129060398907102, -0.0025929853325551053], misc={'explanation_lower': {'age': -0.0023765110133561196, 'balance': -0.04641219828679427, 'estimatedsalary': -0.08534051546136934, 'gender': -0.04151580127849941, 'geography': 0.042539187835521575, 'hascrcard': 0.006878595533772458, 'isactivemember': 0.026540324525484382, 'numofproducts': -0.7279162804602579, 'tenure': -0.012244372671043446}, 'explanation_std': {'age': 0.005974943957484252, 'balance': 0.012693312062022918, 'estimatedsalary': 0.016880004978495334, 'gender': 0.009498333308555174, 'geography': 0.0009701708329056174, 'hascrcard': 0.0030516087355113727, 'isactivemember': 0.011189199987703975, 'numof
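
To read the explanation output more easily, the attributions can be paired with their feature names and ranked by absolute size. The sketch below copies the `inputs` and `attributions` lists from the output above, rounded to four decimals, rather than reading them from the returned object, since attribute access on that object is not shown in this notebook.

```python
# Values copied from the AttributionExplanation output, rounded to 4 decimals.
inputs = ['age', 'balance', 'estimatedsalary', 'gender', 'geography',
          'hascrcard', 'isactivemember', 'numofproducts', 'tenure']
attributions = [0.0044, -0.0318, -0.0711, -0.0321, 0.0437,
                0.0104, 0.0380, -0.7129, -0.0026]

# Rank features by the magnitude of their attribution, largest first.
ranked = sorted(zip(inputs, attributions), key=lambda pair: abs(pair[1]), reverse=True)

for name, value in ranked[:3]:
    print(f'{name:>16}: {value:+.4f}')
```

In this ranking `numofproducts` dominates with a strongly negative attribution. That is consistent with the sample point's `numofproducts` value of 7, which lies outside the baseline range of 1 - 4 reported in section 2.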

Run global feature importance on the dataset (this call does not take a sample data point)

In [78]:
client.run_feature_importance(
    project_id=PROJECT_ID,
    model_id=MODEL_ID,
    dataset_id=DATASET_ID
)

FeatureImportanceResults(ci_confidence_level=0.95, feature_names=['geography', 'gender', 'age', 'tenure', 'balance', 'numofproducts', 'hascrcard', 'isactivemember', 'estimatedsalary'], fixed_sample_ci=None, loss='pointwise_logloss', mean_loss=0.07543815162504038, mean_loss_ci=0.003231594655949242, mean_loss_increase_importance=[0.07214915005755727, 0.044836352361263015, 0.48837756923295245, 0.09504251515246319, 0.31927190611468803, 0.2734322647223927, 0.026608276121039297, 0.12077811707004531, 0.26818795792984473], n_cycles=1, n_inputs=10000, n_references=10000, random_sample_ci=[0.006870845396021482, 0.004945157651884891, 0.024500068496164953, 0.007457963731120429, 0.01842551694132254, 0.017830363695645154, 0.003481093275980086, 0.010594410084035251, 0.01608175413927711], seed=0)
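
The result above is easier to scan once each feature name is paired with its `mean_loss_increase_importance` value and the pairs are sorted. The sketch below copies the numbers from the output, rounded to four decimals. The `random_sample_ci` values are printed alongside as-is; how they should be interpreted is not stated in the output, so no interpretation is assumed here.

```python
# Values copied from the FeatureImportanceResults output, rounded to 4 decimals.
feature_names = ['geography', 'gender', 'age', 'tenure', 'balance',
                 'numofproducts', 'hascrcard', 'isactivemember', 'estimatedsalary']
importance = [0.0721, 0.0448, 0.4884, 0.0950, 0.3193, 0.2734, 0.0266, 0.1208, 0.2682]
random_sample_ci = [0.0069, 0.0049, 0.0245, 0.0075, 0.0184, 0.0178, 0.0035, 0.0106, 0.0161]

# Sort features by mean loss increase, most important first.
rows = sorted(zip(feature_names, importance, random_sample_ci),
              key=lambda row: row[1], reverse=True)

for name, imp, ci_value in rows:
    print(f'{name:>16}: {imp:.4f} (random_sample_ci={ci_value:.4f})')
```

In this run `age`, `balance`, and `numofproducts` come out on top, with `hascrcard` contributing the least.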

## 5. Get insights

**You're all done!**

Run the following code block to get the URL of your model's monitoring page.

In [None]:
print('/'.join([URL, 'projects', PROJECT_ID, 'models', MODEL_ID, 'monitor']))

*Please allow 3-5 minutes for monitoring data to populate the charts.*
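
If `URL` ends with a trailing slash, joining the parts with `'/'` produces a double slash in the path. A small variant that normalizes this is sketched below; `monitoring_url` is a hypothetical helper, and the host, project, and model names in the example are placeholders.

```python
def monitoring_url(base_url, project_id, model_id):
    """Build the monitoring page URL, tolerating a trailing slash on base_url."""
    parts = [base_url.rstrip('/'), 'projects', project_id, 'models', model_id, 'monitor']
    return '/'.join(parts)


# Placeholder values for illustration only.
print(monitoring_url('https://fiddler.example.invalid/', 'my_project', 'my_model'))
```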



---


**Questions?**  
  
Check out [our docs](https://docs.fiddler.ai/) for a more detailed explanation of what Fiddler has to offer.

If you're still looking for answers, submit a ticket on [our support page](https://fiddlerlabs.zendesk.com/) and we'll get back to you shortly.