# Initialize Fiddler Client

This Python client is a powerful way to:
- Upload the dataset and model to Fiddler
- Ingest production events to Fiddler

This can be done from a Jupyter Notebook or any Python editor that you use to load data and build models.

<img src="images/fiddler_client.png" width=600 height=600 />

First, we need to initialize the client object by specifying:
- The url: the Fiddler URL that you have been given access to, usually of the form ‘XXXXX.fiddler.ai’. Contact Fiddler if you don’t have it
- The org_id: organization id is an identifier for the account. See Fiddler_URL/settings/general to find this id (listed as "Organization ID")
<img src="images/org_id.png" width=800 height=800 />
- The auth_token: this token is used to authenticate access. See Fiddler_URL/settings/credentials to find, create, or change this token
<img src="images/auth_token.png" width=800 height=800 />

You can also save this config as a file called fiddler.ini in the same folder as the notebook/script. That saves you from specifying the parameters in every notebook and script.
<img src="images/fiddler_ini.png" width=800 height=800 />
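
As a rough illustration of what such a file holds, the sketch below parses INI-style text with Python's built-in `configparser`. The section name `FIDDLER` and the key names are assumptions chosen to mirror the three parameters above; check the screenshot or the Fiddler documentation for the exact layout the client expects.

```python
import configparser

# Hypothetical contents of fiddler.ini; section and key names are assumptions.
ini_text = """
[FIDDLER]
url = http://xxx.fiddler.ai
org_id = my_org_id
auth_token = my_token
"""

config = configparser.ConfigParser()
config.read_string(ini_text)  # for a real file: config.read('fiddler.ini')

# Collect the three connection parameters into a plain dict.
settings = dict(config['FIDDLER'])
print(settings)
```

Values read this way could be passed to `fdl.FiddlerApi(url=..., org_id=..., auth_token=...)` exactly as in the cell that follows.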


In [None]:
import fiddler as fdl

url = 'http://xxx.fiddler.ai'
token = 'my_token'
org_id = 'my_org_id'

client = fdl.FiddlerApi(url=url, org_id=org_id, auth_token=token)

Fiddler has three primary constructs, namely projects, datasets and models. This diagram illustrates the relationship between the three.
<img src="images/projects_data_models.png" width=600 height=600 />

The Fiddler client provides a number of methods.
- List datasets: ```client.list_datasets()``` List the ids of all datasets in the org.
- List projects: ```client.list_projects()``` List the ids of all projects in the org.
- List models: ```client.list_models()``` List the names of all models in a project.
- Create project: ```client.create_project()``` Create a new project.
- Create model: ```client.create_model()``` Trigger auto-modeling on a dataset already uploaded to Fiddler.
- Get dataset info: ```client.get_dataset_info()``` Get DatasetInfo for a dataset.
- Get model info: ```client.get_model_info()``` Get ModelInfo for a model in a certain project.
- Get dataset: ```client.get_dataset()``` Fetches data from a dataset on Fiddler.
- Get slice: ```client.get_slice()``` Fetches data from Fiddler via a slice query (SQL query).
- Delete dataset: ```client.delete_dataset()``` Permanently delete a dataset.
- Delete model: ```client.delete_model()``` Permanently delete a model.
- Delete model artifacts: ```client.delete_model_artifacts()``` Permanently delete a model's artifacts.
- Delete project: ```client.delete_project()``` Permanently delete a project.
- Upload dataset: ```client.upload_dataset()``` Uploads a dataset to the Fiddler engine.
- Upload dataset from a directory: ```client.upload_dataset_from_dir()``` Uploads a dataset from a directory to the Fiddler engine.
- Run model: ```client.run_model()``` Executes a model in the Fiddler engine on a DataFrame.
- Run explanation: ```client.run_explanation()``` Explains a model's prediction on a single instance.
- Run feature importance: ```client.run_feature_importance()``` Get global feature importance for a model over a dataset.
- Upload model sklearn: ```client.upload_model_sklearn()``` Uploads a subclass of sklearn.base.BaseEstimator to the Fiddler engine.
- Upload model package: ```client.upload_model_package()``` Uploads a custom model object to the Fiddler engine along with custom glue-code for running the model.
- Publish event: ```client.publish_event()``` Publishes an event to Fiddler Service.

In [None]:
project_id = 'tf_tabular'
dataset_id = 'heart_disease'
model_id = 'heart_disease_tf'

# Create Project

Here we will create a project, a convenient container for housing the models and datasets associated with a given ML use case.

In [None]:
# Creating our project using project_id
if project_id not in client.list_projects():
    client.create_project(project_id)

# Load dataset

Load the data you are going to use for training your model.

In [None]:
import pandas as pd
df = pd.read_csv('/app/fiddler_samples/samples/datasets/heart_disease/data.csv')

In [None]:
df.head()

# Upload dataset

Before uploading a model, you first need to upload a sample of data containing the model’s inputs, targets, and any additional metadata that might be useful for model analysis. This data sample helps us (among other things) to infer the model schema and the data type and value range of each feature.
- This sample has to be a flat table that can be loaded as a pandas DF (```upload_dataset()```) or saved as a csv (```upload_dataset_from_dir()```).
- In this example age, sex, trestbps, chol, fbs, thalach, exang, oldpeak, slope are input features, and target is the target column for the model.
- This input data sample is used for many downstream functions in Fiddler
    - Shapley value methods - background data used to simulate missing features
    - What-if (ICE) plots - background data
    - PDP plots - background data
    - Drift - to serve as a baseline
    - Outliers - to serve as a baseline
    - Data integrity - to serve as a baseline
- We suggest uploading a sample of the model’s training data, as it is the most meaningful for the tasks listed above. For example, model outliers should ideally be based on the training data, as that is the data the model has seen.
- You can upload multiple datasets with string identifiers, but we currently do not ascribe any meaning to those. For example: ```dataset={'data': df}``` or ```dataset={'train': train_df, 'test': test_df}```.
- Currently we support two input types:
    - Tabular
    - Single string text, meaning text data in a single column
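
To give a feel for what inferring data types and value ranges from a sample can look like, here is a small pandas sketch. It is not how `fdl.DatasetInfo.from_dataframe` is implemented; the helper `sketch_column_types`, the toy columns, and the rule that low-cardinality string columns become categories are all assumptions made for illustration.

```python
import pandas as pd


def sketch_column_types(df, max_inferred_cardinality=10):
    """Illustrative only: guess a coarse type and value range per column."""
    summary = {}
    for col in df.columns:
        series = df[col]
        if pd.api.types.is_numeric_dtype(series):
            # Numeric columns: record the observed value range.
            summary[col] = {'type': 'numeric',
                            'min': series.min(), 'max': series.max()}
        elif series.nunique() <= max_inferred_cardinality:
            # Few distinct non-numeric values: treat as categorical.
            summary[col] = {'type': 'category',
                            'values': sorted(series.unique().tolist())}
        else:
            summary[col] = {'type': 'string'}
    return summary


# Toy data; 'chest_pain' is a made-up column for this example.
toy = pd.DataFrame({
    'age': [29, 41, 54, 63],
    'chest_pain': ['typical', 'atypical', 'none', 'typical'],
})
summary = sketch_column_types(toy)
print(summary)
```

A representative sample matters because ranges and categories inferred like this can only reflect values that actually appear in the uploaded data.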

In [None]:
df_schema = fdl.DatasetInfo.from_dataframe(df, max_inferred_cardinality=10)

In [None]:
if dataset_id not in client.list_datasets(project_id):
    upload_result = client.upload_dataset(
        project_id=project_id,
        dataset={'data': df}, 
        dataset_id=dataset_id,
        info=df_schema)

# Create model schema

As you may have noticed, the dataset upload step did not ask for the model’s features and targets, or any other model-specific information. That’s because multiple models can be linked to a given dataset schema. Hence a separate model schema step is required, which tells Fiddler which features are relevant to the model and what the model task is. Here you can specify the input features, the target column, decision columns and metadata columns, and also the type of model.
- Currently we support only one target column. This is not to be confused with output columns, which can be more than one. 
- Decision columns specify the decisions made on the basis of the model’s predictions. For example, in a credit lending scenario, the business decision to give or not to give a loan based on the model’s output. This is helpful while monitoring models after deployment, to keep track of the business impact of the model.
- Metadata is data that is not used by the model, but can be relevant for understanding the model’s behavior on different segments of the data. For example, gender, race, age and other such sensitive features may not be used in the model, but we can analyze along these dimensions post facto to understand if the model is biased.
- We can infer the model task from the target column, or it can be set explicitly. Currently we support three model types:
    - Regression
    - Binary Classification
    - Multi-class Classification
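
The column roles described above can be sketched with plain pandas. The toy frame and the `zip_code` and `approved` columns are made up for illustration; how metadata and decision columns are passed to `fdl.ModelInfo.from_dataset_info` is not shown here, so check the client documentation for the exact parameter names.

```python
import pandas as pd

# Toy frame; 'zip_code' and 'approved' are hypothetical columns.
toy = pd.DataFrame({
    'age': [63, 37],
    'chol': [233, 250],
    'zip_code': ['94107', '10001'],  # metadata: not used by the model
    'approved': [1, 0],              # decision made from the model's output
    'target': [1, 0],
})

target = 'target'
metadata_columns = ['zip_code']
decision_columns = ['approved']

# Everything that is not target, metadata, or decision is a model input.
excluded = {target, *metadata_columns, *decision_columns}
feature_columns = [c for c in toy.columns if c not in excluded]

print(feature_columns)  # ['age', 'chol']
```

Listing the non-feature columns explicitly keeps metadata and decision columns from being passed to the model as inputs by accident.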

In [None]:
target = 'target'
feature_columns = list(df.drop(columns=['target']).columns)

model_info = fdl.ModelInfo.from_dataset_info(
    dataset_info=client.get_dataset_info(project_id, dataset_id),
    target=target, 
    features=feature_columns,
    display_name='Keras Tabular IG',
    description='this is a keras model using tabular data and IG enabled from tutorial',
    model_task=fdl.ModelTask.BINARY_CLASSIFICATION,
    preferred_explanation_method=fdl.ExplanationMethod.IG_FLEX
)
model_info

## Install TensorFlow if necessary

Currently, we support Sklearn version 0.21.2. This notebook uses TF version 2.5.0, which the cell below checks for.  
If you have another version, please contact Fiddler for assistance.

In [None]:
import tensorflow as tf

assert tf.__version__=='2.5.0', 'Please change tensorflow version to 2.5.0'

In [None]:
# !pip install tensorflow==2.5

# Train model

Build and train your model.

In [None]:
train_input = df.drop(columns=['target'])
train_target = df[target]

In [None]:
inputs = tf.keras.Input(shape=(train_input.shape[1], ))
activations = tf.keras.layers.Dense(32, activation='linear', use_bias=True)(inputs)
activations = tf.keras.layers.Dense(128, activation=tf.nn.relu, use_bias=True)(activations)
activations = tf.keras.layers.Dense(128, activation=tf.nn.relu, use_bias=True)(activations)
activations = tf.keras.layers.Dense(1, activation='sigmoid', use_bias=True)(activations)
model = tf.keras.Model(inputs=inputs, outputs=activations, name='keras_model')

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss='binary_crossentropy',
    metrics=['accuracy']
)

model.fit(train_input, train_target.values, batch_size=32, epochs=8)

In [None]:
model.evaluate(train_input, train_target) 

# Save model

Next, we need to save the model and any preprocessing steps applied to the input features (for example a categorical encoder or a tokenizer).  
We currently support the following stored model formats:
- For sklearn API based models: pickled models, or any storage format that you can load in `package.py` (details below).
- For TF, we support TF Saved Model and Keras .h5   
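
For the pickled-model case, the save and load round trip could look like the sketch below. It uses a plain dict as a stand-in for a fitted estimator, and the file name `model.pkl` and the directory name are hypothetical; a real `package.py` would load whatever file you actually saved.

```python
import pathlib
import pickle
import tempfile

# Stand-in for a fitted model or preprocessing object (hypothetical values).
model_state = {'weights': [0.4, -1.2], 'bias': 0.1}

with tempfile.TemporaryDirectory() as tmp:
    demo_dir = pathlib.Path(tmp) / 'my_model'
    demo_dir.mkdir()

    # Save: what the notebook would do before uploading the directory.
    with open(demo_dir / 'model.pkl', 'wb') as f:
        pickle.dump(model_state, f)

    # Load: what package.py would do, relative to its own directory.
    with open(demo_dir / 'model.pkl', 'rb') as f:
        restored = pickle.load(f)

print(restored == model_state)
```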

Note:
- Keras models must have a differentiable input tensor if Integrated Gradients support is desired
- We also need to save the data preprocessing pipeline code, if any. This will be accessed in the package.py
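
To see why differentiable inputs matter, here is a minimal NumPy sketch of Integrated Gradients on a toy logistic model. It is not Fiddler's implementation; the function `integrated_gradients`, the weights, and the step count are illustrative assumptions. The method averages the gradient of the output with respect to the input along a straight path from a baseline to the input, which is only possible if that gradient exists.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, steps=32):
    """Midpoint Riemann approximation of Integrated Gradients."""
    # Interpolation coefficients at the midpoint of each of `steps` intervals.
    alphas = (np.arange(steps) + 0.5) / steps
    # Points on the straight line from baseline to x, shape (steps, n_features).
    path = baseline + alphas[:, None] * (x - baseline)
    grads = np.stack([grad_fn(point) for point in path])
    # Average gradient along the path, scaled by the input difference.
    return (x - baseline) * grads.mean(axis=0)


# Toy differentiable model: logistic regression with made-up weights.
w = np.array([0.5, -1.0, 2.0])


def f(x):
    return 1.0 / (1.0 + np.exp(-(x @ w)))


def grad_f(x):
    p = f(x)
    return p * (1.0 - p) * w  # analytic gradient of the sigmoid output


x = np.array([1.0, 2.0, 0.5])
baseline = np.zeros_like(x)  # all-zeros baseline
attributions = integrated_gradients(grad_f, x, baseline, steps=256)

# Completeness check: attributions should sum to f(x) - f(baseline).
gap = abs(attributions.sum() - (f(x) - f(baseline)))
print(attributions, gap)
```

The gap between the summed attributions and the change in model output shrinks as the number of steps grows, which is why implementations usually expose step-count and error-tolerance settings.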

In [None]:
import pathlib
import shutil

# For demo purposes, let's save this model in Keras .h5 format.

# create temp dir
model_dir = pathlib.Path(model_id)
shutil.rmtree(model_dir, ignore_errors=True)
model_dir.mkdir()

# save model
model.save(str(model_dir / 'model.h5'), include_optimizer=False)

# Write package.py and related wrappers

#### Import related wrappers
We need to import the GEM wrapper for displaying the attributions. This file is stored in the utils directory.

In [None]:
import shutil
shutil.copy('utils/GEM.py', model_dir)

In [None]:
%%writefile heart_disease_tf/package.py

import pathlib
import pandas as pd
import numpy as np
from tensorflow.keras.models import load_model
import tensorflow as tf

from .GEM import GEMContainer, GEMSimple, GEMText


class MyModel:
    def __init__(self):

        self.model_dir = pathlib.Path(__file__).parent

        self.model = load_model(str(self.model_dir / 'model.h5'))
        self.output_columns = ['predicted_target']
        self.inputs = self.model.input.name
    
    def get_settings(self):
        return {'ig_start_steps': 32,  # 32
                'ig_max_steps': 4096,  # 2048
                'ig_min_error_pct':5.0 # 1.0
               }

    def _transform_input(self, input_df):
        return input_df

    def get_ig_baseline(self, input_df):
        """ This method is used to generate the baseline against which to compare the input. 
            It accepts a pandas DataFrame object containing rows of raw feature vectors that 
            need to be explained (in case e.g. the baseline must be sized according to the explain point).
            Must return a pandas DataFrame that can be consumed by the predict method described earlier.
        """
        return input_df*0

    def predict(self, input_df):
        transformed_input_df = self._transform_input(input_df)
        pred = self.model.predict(transformed_input_df)
        return pd.DataFrame(pred, columns=self.output_columns)
    
    def transform_to_attributable_input(self, input_df):
        """ This method is called by the platform and is responsible for transforming the input dataframe
            to the upstream-most representation of model inputs that belongs to a continuous vector-space.
            For this example, the model inputs themselves meet this requirement.  For models with embedding
            layers (esp. NLP models) the first attributable layer is downstream of that.
        """
        transformed_input = self._transform_input(input_df)

        return {self.inputs: transformed_input.values}
    
    def compute_gradients(self, attributable_input):
        """ This method computes gradients of the model output with respect to the differentiable input. 
            If there are embeddings, the attributable_input should be the output of the embedding 
            layer. In the backend, this method receives the output of the transform_to_attributable_input() 
            method. This must return an array of dictionaries, where each entry of the array is the attribution 
            for an output. As in the example provided, in case of single output models, this is an array with 
            single entry. For the dictionary, the key is the name of the input layer and the values are the 
            attributions.
        """
        gradients_by_output = []
        attributable_input_tensor = {k: tf.identity(v) for k, v in attributable_input.items()}
        gradients_dic_tf = self._gradients_input(attributable_input_tensor)
        gradients_dic_numpy = {key: np.asarray(value) for key, value in gradients_dic_tf.items()}
        gradients_by_output.append(gradients_dic_numpy)
        return gradients_by_output    
    
    def _gradients_input(self, x):
        """
        Function to Compute gradients.
        """
        with tf.GradientTape() as tape:
            tape.watch(x)
            preds = self.model(x)

        grads = tape.gradient(preds, x)

        return grads


    def project_attributions(self, input_df, attributions):
        att = []
        for ind, col in enumerate(input_df.columns):
            val = input_df[col].values[0]
            att.append(GEMSimple(display_name=col, feature_name=col,
                                 value=float(val),
                                 attribution=attributions[0][self.inputs][ind]))

        gem_container = GEMContainer(contents=att)

        explanations_by_output = {self.output_columns[0]: gem_container.render()}

        return explanations_by_output


def get_model():
    model = MyModel()
    return model

# Upload model

Now that we have all the parts that we need, we can go ahead and upload the model to the Fiddler platform. First add the model schema using `add_model`, then use `add_model_artifact` to upload the entire model directory in one shot. We need the following for uploading a model:
- The `path` to the directory
- The `project_id` to which the model belongs
- The `model_id`, which is the name you want to give the model. You can access it in Fiddler henceforth via this ID
- The `dataset_id` which the model is linked to (optional)  

In total, we will have a model file, the GEM.py wrapper and a `package.py` file within our model directory.
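
Before uploading, it can be worth checking that the directory really contains those three files. The helper below is a hypothetical pre-flight check written for this tutorial, not part of the Fiddler client, and the file names assume the Keras `.h5` layout used in this notebook.

```python
import pathlib
import tempfile

# File names assumed from this notebook's model directory layout.
REQUIRED_FILES = ('model.h5', 'GEM.py', 'package.py')


def missing_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from model_dir."""
    model_dir = pathlib.Path(model_dir)
    return [name for name in required if not (model_dir / name).is_file()]


# Demo on a temporary directory that deliberately lacks GEM.py.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = pathlib.Path(tmp)
    (demo_dir / 'model.h5').touch()
    (demo_dir / 'package.py').touch()
    missing = missing_files(demo_dir)

print(missing)  # ['GEM.py']
```

In the notebook itself, calling `missing_files(model_dir)` and confirming it returns an empty list gives a quick sanity check before the upload cell runs.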

In [None]:
# Let's first delete the model if it already exists in the project
if model_id in client.list_models(project_id):
    client.delete_model(project_id, model_id)
    print('Model deleted')
    
client.add_model(project_id=project_id, model_id=model_id, dataset_id=dataset_id, model_info=model_info)
client.add_model_artifact(model_dir=model_dir, project_id=project_id, model_id=model_id)

# Run model

In [None]:
prediction_input = train_input[:10]
client.run_model(project_id, model_id, prediction_input)

# Get explanation

In [None]:
selected_point = df.head(1)

In [None]:
client.run_explanation(
    project_id=project_id,
    model_id=model_id, 
    df=selected_point, 
    dataset_id=dataset_id,
    explanations='ig_flex')