# Fiddler Customer Churn

## Model Registration

In [2]:
import numpy as np
import pandas as pd
import fiddler as fdl

print(f"Running client version {fdl.__version__}")

Running client version 1.0.2


## 1. Connect to Fiddler

Before you can register your model with Fiddler, you'll need to connect using our API client.


---


**We need a few pieces of information to get started.**
1. The URL you're using to connect to Fiddler

In [3]:
URL = ''

2. Your organization ID
3. Your authorization token

Both can be found by opening the URL you entered and navigating to the **Settings** page.

In [4]:
ORG_ID = ''
AUTH_TOKEN = ''

In [5]:
client = fdl.FiddlerApi(
    url=URL,
    org_id=ORG_ID,
    auth_token=AUTH_TOKEN
)

fiddler.connection INFO client_version = 1.0.2 > server_version = 1.0.1
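
If you'd rather not paste credentials into notebook cells, one option is to read them from environment variables. The sketch below is an illustration, not part of the Fiddler client: the helper `load_fiddler_credentials` and the variable names `FIDDLER_URL`, `FIDDLER_ORG_ID`, and `FIDDLER_AUTH_TOKEN` are hypothetical. Only the keyword names `url`, `org_id`, and `auth_token` come from the cell above.

```python
import os


def load_fiddler_credentials(env=os.environ):
    """Collect connection settings from environment variables.

    The variable names below are hypothetical; pick whatever your team uses.
    """
    names = {
        "url": "FIDDLER_URL",
        "org_id": "FIDDLER_ORG_ID",
        "auth_token": "FIDDLER_AUTH_TOKEN",
    }
    # Fail early with a clear message if anything is unset or empty.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in names.items()}
```

The returned dictionary uses the same keys as the connection call, so it could be unpacked as `fdl.FiddlerApi(**load_fiddler_credentials())`.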


In [27]:
PROJECT_ID = ''

client.create_project(PROJECT_ID)

{'project_name': 'airflow_churn_example_4'}

## 2. Upload a baseline dataset

In this example, we'll be considering the case where we're a bank and we have **a model that predicts churn for our customers**.  
We want to know when our model's predictions start to drift—that is, **when churn starts to increase** within our customer base.
  
To get insights into the model's performance, **Fiddler needs a small sample of data that can serve as a baseline** for making comparisons with data in production.
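
To make "comparing production data against a baseline" concrete, here is a minimal sketch of one common drift measure, the Jensen-Shannon divergence between two samples of prediction scores. It is an illustration only: this notebook does not say which metric Fiddler computes, and the function name, bin count, and value range are assumptions.

```python
import numpy as np


def jensen_shannon_divergence(baseline, production, bins=10, value_range=(0.0, 1.0)):
    """Jensen-Shannon divergence (base 2) between two samples of scores.

    Returns 0 for identical histograms and 1 for histograms with no overlap.
    Both samples must contain at least one value inside value_range.
    """
    # Histogram both samples on the same bin edges so they are comparable.
    p, _ = np.histogram(baseline, bins=bins, range=value_range)
    q, _ = np.histogram(production, bins=bins, range=value_range)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Skip empty bins of `a`; wherever a > 0 the mixture b is also > 0.
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

In this notebook's terms, `baseline` would be `baseline_df['predicted_churn']` and `production` the scores the model emits later. A value close to 0 suggests the two score distributions are similar.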


---


*For more information on how to design a baseline dataset, [click here](https://docs.fiddler.ai/pages/user-guide/data-science-concepts/monitoring/designing-a-baseline-dataset/).*

In [28]:
MODEL_ID = ''
DATASET_ID = ''

In [29]:
PATH_TO_BASELINE_CSV = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/new-quickstart/content_root/tutorial/quickstart/churn_baseline.csv'

baseline_df = pd.read_csv(PATH_TO_BASELINE_CSV)
baseline_df

Unnamed: 0,creditscore,geography,gender,age,tenure,balance,numofproducts,hascrcard,isactivemember,estimatedsalary,churn,predicted_churn,decision
0,545,Texas,Male,37,9,110483.86,1,1,1,127394.67,yes,0.897202,low_risk
1,497,Texas,Female,55,7,131778.66,1,1,1,9972.64,yes,0.997441,low_risk
2,509,New York,Female,29,0,107712.57,2,1,1,92898.17,yes,0.920563,low_risk
3,743,Hawaii,Nonbinary,39,6,0.00,2,1,0,44265.28,yes,0.779282,low_risk
4,699,Florida,Female,25,8,0.00,2,1,1,52404.47,yes,0.825474,low_risk
...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,686,Texas,Male,39,3,129626.19,2,1,1,103220.56,yes,0.760645,low_risk
19996,446,Massachusetts,Female,45,10,125191.69,1,1,1,128260.86,no,0.216093,high_risk
19997,794,California,Male,35,6,0.00,2,1,1,68730.91,yes,0.982021,low_risk
19998,832,California,Male,61,2,0.00,1,0,1,127804.66,no,0.071598,high_risk
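
Before uploading, it can help to run a few quick sanity checks on the baseline. The sketch below uses only pandas; `check_baseline` is a hypothetical helper, and the specific checks (no nulls, scores within [0, 1], exactly two target classes) are assumptions suited to this binary churn example rather than requirements stated by Fiddler.

```python
import pandas as pd


def check_baseline(df, target="churn", output="predicted_churn"):
    """Return a list of human-readable problems found in a baseline DataFrame."""
    problems = [f"missing column: {col}" for col in (target, output) if col not in df.columns]
    if problems:
        return problems  # the remaining checks need both columns

    # Any column containing nulls is reported by name.
    null_cols = df.columns[df.isna().any()].tolist()
    if null_cols:
        problems.append(f"columns with nulls: {null_cols}")

    # Predicted probabilities should fall inside [0, 1].
    if not df[output].between(0.0, 1.0).all():
        problems.append(f"{output} has values outside [0, 1]")

    # A binary target should have exactly two distinct labels.
    if df[target].nunique() != 2:
        problems.append(f"{target} should have exactly 2 classes")

    return problems
```

Calling `check_baseline(baseline_df)` returns an empty list when none of these checks fail.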


In [30]:
dataset_info = fdl.DatasetInfo.from_dataframe(baseline_df, max_inferred_cardinality=100)
dataset_info

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,creditscore,INTEGER,,False,350 - 850
1,geography,CATEGORY,6.0,False,
2,gender,CATEGORY,3.0,False,
3,age,INTEGER,,False,18 - 92
4,tenure,INTEGER,,False,0 - 10
5,balance,FLOAT,,False,"0.0 - 250,900.0"
6,numofproducts,INTEGER,,False,1 - 4
7,hascrcard,INTEGER,,False,0 - 1
8,isactivemember,INTEGER,,False,0 - 1
9,estimatedsalary,FLOAT,,False,"11.58 - 200,000.0"


In [31]:
client.upload_dataset(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    dataset={
        'baseline': baseline_df
    },
    info=dataset_info
)

fiddler.utils.pandas INFO Writing df with shape (20000, 13) to /var/folders/sq/r44f5gd56nv30kbtz35ddlfr0000gn/T/tmp9ivoyz61/baseline.csv.parquet
fiddler.fiddler_api INFO [churn_data] dataset upload: upload and import dataset files
fiddler.fiddler_api INFO Uploading the dataset churn_data ...
fiddler.fiddler_api INFO Dataset uploaded {'col_count': 13, 'row_count': 20000}


{'col_count': 13, 'row_count': 20000}

In [32]:
# Specify task: map a short name to the corresponding fdl.ModelTask value
MODEL_TASKS = {
    'regression': fdl.ModelTask.REGRESSION,
    'binary': fdl.ModelTask.BINARY_CLASSIFICATION,
    'multiclass': fdl.ModelTask.MULTICLASS_CLASSIFICATION,
}

# An unrecognized name raises KeyError instead of leaving model_task as a plain string
model_task = MODEL_TASKS['binary']

    
# Specify column types
target = 'churn'
outputs = ['predicted_churn']
decision_cols = ['decision']
features = ['geography', 'gender', 'age', 'tenure', 'balance', 'numofproducts', 'hascrcard', 'isactivemember', 'estimatedsalary']
    
# Generate ModelInfo
model_info = fdl.ModelInfo.from_dataset_info(
    dataset_info=dataset_info,
    dataset_id=DATASET_ID,
    model_task=model_task,
    target=target,
    outputs=outputs,
    decision_cols=decision_cols,
    features=features
)
model_info

fiddler.core_objects INFO Using inferred positive class.


Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,churn,CATEGORY,2,False,

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,geography,CATEGORY,6.0,False,
1,gender,CATEGORY,3.0,False,
2,age,INTEGER,,False,18 - 92
3,tenure,INTEGER,,False,0 - 10
4,balance,FLOAT,,False,"0.0 - 250,900.0"
5,numofproducts,INTEGER,,False,1 - 4
6,hascrcard,INTEGER,,False,0 - 1
7,isactivemember,INTEGER,,False,0 - 1
8,estimatedsalary,FLOAT,,False,"11.58 - 200,000.0"

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,predicted_churn,FLOAT,,False,0.0 - 1.0

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,decision,CATEGORY,2,False,


## 3. Register your model

Now it's time to register your model with Fiddler.


---


You'll need to specify some more **information about how your model operates**.
  
*Just include:*
1. The **task** your model is performing (regression, binary classification, etc.)
2. The **target** (ground truth) column
3. The **output** (prediction) column
4. The **feature** columns
5. Any **metadata** columns
6. Any **decision** columns (these measure the direct business decisions made as a result of the model's prediction)
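
The column roles listed above are easy to mistype, so a quick check before building the model info can catch problems early. The sketch below is plain Python; `validate_column_roles` is a hypothetical helper, and the rule that each column should have at most one role is an assumption made for illustration.

```python
def validate_column_roles(available_columns, target, outputs, features,
                          decision_cols=(), metadata_cols=()):
    """Return a list of problems with the column-role assignment (empty if none)."""
    roles = {
        "target": [target],
        "outputs": list(outputs),
        "features": list(features),
        "decision_cols": list(decision_cols),
        "metadata_cols": list(metadata_cols),
    }
    available = set(available_columns)
    seen = {}  # column name -> first role it was assigned to
    errors = []
    for role, cols in roles.items():
        for col in cols:
            if col not in available:
                errors.append(f"{role}: '{col}' is not in the dataset")
            if col in seen:
                errors.append(f"'{col}' is assigned to both {seen[col]} and {role}")
            seen.setdefault(col, role)
    return errors
```

With the names used in this notebook, a call might look like `validate_column_roles(baseline_df.columns, target, outputs, features, decision_cols)`.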


In [33]:
MODEL_ID = 'churn_classifier'

client.register_model(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    model_id=MODEL_ID,
    model_info=model_info
)

Validating model info...
Generating surrogate model...
Testing the deployed model with sample events...
Dataset already has output columns, importing predictions from dataset...
Beginning to precache for dataset churn_data with model churn_classifier...

--- Beginning Impact/Importance Caching ---

 |██████████████████████████████████████████████████| 100.0% Global Features Cached
--- Finished Impact/Importance Caching ---

Successfully precached for dataset churn_data with model churn_classifier
Beginning to cache dataset churn_data...
Pre-caching completed 
