# Fiddler examples have moved! [Deprecation Notice]

Thank you for using Fiddler. We have moved the examples to a new GitHub repository, located at the following link:


***
# [New fiddler-examples repo](https://github.com/fiddler-labs/fiddler-examples)
***

# Model Monitoring with Surrogate Model

This tutorial shows how to use Model Monitoring in Fiddler when you have only a dataset 
and no model artifacts. It has two parts. Part one uploads your dataset and uses 
Fiddler's surrogate model capability to generate a model. Part two ingests monitoring 
events with the `publish_event` API and uses the surrogate model to monitor 
your production traffic. 

## Initialize Fiddler Client
We begin this section as usual by establishing a connection to our
Fiddler instance. We can establish this connection either by specifying 
our credentials directly, or by utilizing our `fiddler.ini` file. More
information can be found in the [setup](https://github.com/fiddler-labs/fiddler-samples/blob/master/content_root/tutorial/00%20Setup.ipynb) section.


In [None]:
import fiddler as fdl

# client = fdl.FiddlerApi(url=url, org_id=org_id, auth_token=auth_token)
client = fdl.FiddlerApi()
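
As a rough illustration of the file-based option, the sketch below parses credentials from INI-formatted text with Python's standard `configparser`. The key names mirror the `url`, `org_id`, and `auth_token` arguments in the commented-out line above; the section name `FIDDLER`, the sample values, and the `read_credentials` helper are assumptions made for this example, so check the setup section for the layout your client version expects.

```python
import configparser

# Hypothetical file contents. The section name "FIDDLER" and all values
# are assumptions for illustration, not a confirmed fiddler.ini layout.
SAMPLE_INI = """
[FIDDLER]
url = https://fiddler.example.com
org_id = my_org
auth_token = REPLACE_ME
"""


def read_credentials(ini_text, section="FIDDLER"):
    """Return the url, org_id and auth_token stored in INI-formatted text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {key: parser[section][key] for key in ("url", "org_id", "auth_token")}


credentials = read_credentials(SAMPLE_INI)
# The resulting dict could then be passed on explicitly:
# client = fdl.FiddlerApi(**credentials)
```

Keeping credentials in a file rather than in the notebook avoids committing an auth token alongside the code.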

## Load Baseline

Here we load our baseline dataset from a CSV file called `p2p_loans.csv`. We then
infer a dataset schema from the resulting dataframe.

In [None]:
import pandas as pd
baseline_df = pd.read_csv(
    'https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/master/content_root/samples/datasets/p2p_loans/p2p_loans.csv'
)
baseline_schema = fdl.DatasetInfo.from_dataframe(baseline_df, max_inferred_cardinality=1000)

## Set Up Monitoring Using Surrogate Model
Now we will set up a project and use Fiddler's surrogate model capability to generate a model. 
Projects are one of the key entities of Fiddler. They are convenient containers 
for housing the models and datasets associated with a given ML use case. Specifics about
projects can be found [here](https://docs.fiddler.ai/components/#project).

In [None]:
project_id = 'tutorial'
model_id = 'loan_status_surrogate'
target = 'loan_status'
features = ['loan_amnt', 'int_rate', 'annual_inc', 'dti', 'fico_range_low', 'total_acc', 'acc_open_past_24mths']

# If the project already exists, remove the earlier copy of this model and its dataset; otherwise create the project
if project_id in client.list_projects():
    client.delete_model(project_id, model_id)
    client.delete_dataset(model_id)
else:
    client.create_project(project_id)


client.create_surrogate_model(
    project_id,
    model_id,
    baseline_df,
    baseline_schema,
    target,
    features
)
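
Before generating the surrogate model, it can help to confirm that the target and every feature name actually appear in the baseline data. The helper below is a minimal sketch of such a check; `missing_columns` is a hypothetical name introduced here, not part of the Fiddler client, and the example column list is a shortened stand-in for `list(baseline_df.columns)`.

```python
def missing_columns(columns, target, features):
    """Return the requested column names that are absent from `columns`, in order."""
    available = set(columns)
    return [name for name in [target, *features] if name not in available]


# Stand-in for list(baseline_df.columns); the real dataset has more columns.
example_columns = ['loan_amnt', 'int_rate', 'annual_inc', 'loan_status']

missing = missing_columns(example_columns, 'loan_status', ['loan_amnt', 'dti'])
print(missing)  # ['dti']
```

In the notebook, the equivalent call would be `missing_columns(baseline_df.columns, target, features)`, and an empty list means every requested column is present.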


## Send Monitoring Events

### First Option
In this step, we simulate production traffic by sending events with 
[publish_event](https://docs.fiddler.ai/api-reference/python-package/#publish-event). 
This mirrors running the model outside Fiddler and either sending each prediction 
to Fiddler as it is made, or saving predictions to a log and sending them later.

This demonstration takes the log-based approach. 
Each row of the log contains the model inputs and the prediction. 
To make the events look like a time series, we generate timestamps spaced five minutes apart and attach one to each event. 
Real events should carry the time at which they actually occurred; if no timestamp is supplied, the current 
time is used.

We can send the inputs, outputs, and targets, as well as decision variables.

**Note**: The timestamp must be in UTC milliseconds. See 
[here](https://docs.fiddler.ai/api-reference/python-package/#publish-event) for more details.
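
To make the "UTC milliseconds" requirement concrete, here is a small sketch that converts between timezone-aware `datetime` objects and integer milliseconds since the Unix epoch. The helper names `to_utc_ms` and `from_utc_ms` are introduced for this example only; they use the Python standard library and are not part of the Fiddler client.

```python
import datetime


def to_utc_ms(moment):
    """Convert a timezone-aware datetime to integer milliseconds since the Unix epoch."""
    if moment.tzinfo is None:
        # A naive datetime would be interpreted in local time, which is easy to get wrong.
        raise ValueError("expected a timezone-aware datetime")
    return round(moment.timestamp() * 1000)


def from_utc_ms(ms):
    """Convert epoch milliseconds back to a timezone-aware UTC datetime."""
    return datetime.datetime.fromtimestamp(ms / 1000, tz=datetime.timezone.utc)


example = datetime.datetime(2021, 1, 1, 12, 0, tzinfo=datetime.timezone.utc)
example_ms = to_utc_ms(example)
print(example_ms)  # 1609502400000
```

Rejecting naive datetimes is a deliberate choice in this sketch: it forces the caller to state the timezone instead of silently inheriting the machine's local setting.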

In [None]:
import datetime
import time
from IPython.display import clear_output

NUM_EVENTS_TO_SEND = 50

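FIVE_MINUTES_MS = 5 * 60 * 1000
FIFTEEN_MINUTES_MS = FIVE_MINUTES_MS * 3
ONE_DAY_MS = 24 * 60 * 60 * 1000  # integer, so timestamps stay whole milliseconds
# Start the simulated traffic eight days in the past
start_date = round(time.time() * 1000) - (ONE_DAY_MS * 8)
print(datetime.datetime.fromtimestamp(start_date / 1000.0, tz=datetime.timezone.utc))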

In [None]:
result_df = pd.read_csv('/app/fiddler_samples/samples/datasets/p2p_loans/p2p_production.log')
result_df = result_df.rename(columns={'probability_fully_paid': 'probability_Fully Paid'})

# Convert this dataframe into a list of dictionary events, where each event is its own dictionary
event_list_dict = result_df.sample(n=NUM_EVENTS_TO_SEND, random_state=42).to_dict(orient='records') 

for ind, event_dict in enumerate(event_list_dict):
    event_time = start_date + ind * FIVE_MINUTES_MS
    result = client.publish_event(project_id,
                                  model_id,
                                  event_dict,
                                  event_time_stamp=event_time,
                                  event_id=str(ind + 100),
                                  update_event=False)
    
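    # Render the timestamp in UTC (fromtimestamp without tz would show local time)
    readable_timestamp = datetime.datetime.fromtimestamp(event_time / 1000.0, tz=datetime.timezone.utc)
    clear_output(wait=True)

    print(f'Sending {ind+1} / {NUM_EVENTS_TO_SEND} \n{readable_timestamp}: \n{event_dict}')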
    time.sleep(0.1)

**Note**: If we want to update the events later, we need to specify an `event_id`. To update an event, we need to call `publish_event` again with the same `event_id` and `update_event=True`.
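
The update flow described in the note can be sketched as a thin wrapper around `publish_event`, using only the arguments already shown in the loop above. `republish_event` is a hypothetical helper, and `RecordingClient` is a stand-in that only records calls so the sketch can run without a Fiddler instance; with a real connection, pass the `client` created earlier instead.

```python
def republish_event(client, project_id, model_id, event_id, event_dict):
    """Re-send an event under an existing event_id, flagged as an update."""
    return client.publish_event(project_id,
                                model_id,
                                event_dict,
                                event_id=event_id,
                                update_event=True)


class RecordingClient:
    """Stand-in for the Fiddler client: records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def publish_event(self, *args, **kwargs):
        self.calls.append((args, kwargs))
        return 'recorded'


demo_client = RecordingClient()
demo_result = republish_event(demo_client, 'tutorial', 'loan_status_surrogate',
                              '100', {'loan_amnt': 5000})
```

The `event_id` must match the one used when the event was first published, which is why the loop above assigns an explicit `event_id` to every event.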

### Second Option
As an alternative, we can send the whole log dataframe at once by using [publish_events_log](https://docs.fiddler.ai/api-reference/python-package/#publish-events-log).

We can embed the event timestamp as a column in the input dataframe and then use the `ts_column` argument to specify which column holds it. If no timestamp is provided, the current time will be used.

We can send the inputs, outputs, and targets, as well as decision variables.

**Note**: The timestamp must be in UTC milliseconds. See 
[here](https://docs.fiddler.ai/api-reference/python-package/#publish-event) for more details.
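
As one way to build a timestamp column that matches the note, the sketch below generates evenly spaced UTC epoch-millisecond values with the standard library. `spaced_timestamps_ms` is a hypothetical helper introduced here. Whether `publish_events_log` is best given this integer form or formatted timestamp strings is not confirmed by this tutorial, so treat it as an illustration of the spacing logic only.

```python
import datetime


def spaced_timestamps_ms(start, count, step_minutes=5):
    """Return `count` UTC epoch-millisecond timestamps, `step_minutes` apart."""
    step = datetime.timedelta(minutes=step_minutes)
    return [round((start + step * i).timestamp() * 1000) for i in range(count)]


# A fixed, timezone-aware start keeps the example reproducible.
start = datetime.datetime(2021, 1, 1, tzinfo=datetime.timezone.utc)
stamps = spaced_timestamps_ms(start, 3)
print(stamps)  # [1609459200000, 1609459500000, 1609459800000]
```

The resulting list could be attached to the log as a column, for example with `pd.Series(stamps, name='timestamp')`.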

In [None]:
import datetime
import time

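# Use UTC, since event timestamps are expected in UTC; tzinfo is dropped so isoformat() adds no offset suffix
now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
start_date = now - datetime.timedelta(days=2)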

list_timestamp = [start_date + datetime.timedelta(minutes=5) * ind for ind in range(NUM_EVENTS_TO_SEND)]
list_timestamp = [x.isoformat(' ') for x in list_timestamp]

Optionally, we can also embed the `event_id` as a field in the input data if we want to update those events later. 

In [None]:
event_id = [str(x) for x in range(NUM_EVENTS_TO_SEND)]

In [None]:
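# drop=True keeps the old row index from being added to the log as an extra 'index' column
event_log = pd.concat([result_df.sample(n=NUM_EVENTS_TO_SEND, random_state=42).reset_index(drop=True),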
                       pd.Series(list_timestamp, name='timestamp'),
                       pd.Series(event_id, name='__event_id')], axis=1)

In [None]:
event_log.head()

In [None]:
client.publish_events_log(project_id,
                          model_id,
                          event_log,
                          ts_column='timestamp'
                         )