In [1]:
import os, sys

# Add the notebook's parent directory to sys.path so the local `lib` package can be imported
nb_dir = os.path.split(os.getcwd())[0]
if nb_dir not in sys.path:
    sys.path.append(nb_dir)

In [2]:
from lib.methods import ips

import pandas as pd

### Example: using IPS (inverse propensity scoring) to evaluate a new fraud policy offline

#### 1 - Assume we have a fraud model in production that blocks transactions if P(fraud) > 0.05

Let's build some sample logs from that policy running in production. Note that the production logs need some basic exploration (e.g. epsilon-greedy with ε = 0.1): 10% of the time we take the action the model would not have chosen, so the model's preferred action is logged with probability 0.9 and the other action with probability 0.1 (the `action_prob` column). Rewards represent revenue gained from allowing the transaction; a blocked transaction earns 0. A negative reward indicates the transaction was fraud and resulted in a chargeback.
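A minimal sketch of how such a log entry and its propensity could be produced. It assumes exploration "flips" the threshold policy's action with probability ε, which is what gives the 0.9 / 0.1 propensities in the logs below; `log_decision` and its parameters are hypothetical names, not part of `lib`.

```python
import random

def log_decision(p_fraud, u, threshold=0.05, epsilon=0.1):
    """Return (action, action_prob) for a threshold fraud policy with exploration.

    `u` is a uniform draw in [0, 1). Assumption: with probability epsilon we take
    the non-greedy action, so the greedy action is logged with prob 1 - epsilon.
    """
    greedy = "blocked" if p_fraud > threshold else "allowed"
    other = "allowed" if greedy == "blocked" else "blocked"
    if u < epsilon:
        return other, epsilon        # exploration: take the other action
    return greedy, 1 - epsilon       # exploitation: follow the model

# Simulate logging decisions for the four example contexts
rng = random.Random(0)
for p_fraud in [0.08, 0.03, 0.01, 0.09]:
    action, action_prob = log_decision(p_fraud, rng.random())
    print(p_fraud, action, round(action_prob, 2))
```

Recording the propensity at decision time is what makes IPS possible later: the estimator needs to know how likely the logged action was.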

In [3]:
logs_df = pd.DataFrame([
    {"id": 0, "context": {"p_fraud": 0.08}, "action": "blocked", "action_prob": 0.90, "reward": 0},
    {"id": 1, "context": {"p_fraud": 0.03}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"id": 2, "context": {"p_fraud": 0.01}, "action": "allowed", "action_prob": 0.90, "reward": 10},    
    {"id": 3, "context": {"p_fraud": 0.09}, "action": "allowed", "action_prob": 0.10, "reward": -20}, # only allowed due to exploration 
])

logs_df

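|   | id | context | action | action_prob | reward |
|---|----|---------|--------|-------------|--------|
| 0 | 0 | {'p_fraud': 0.08} | blocked | 0.9 | 0 |
| 1 | 1 | {'p_fraud': 0.03} | allowed | 0.9 | 20 |
| 2 | 2 | {'p_fraud': 0.01} | allowed | 0.9 | 10 |
| 3 | 3 | {'p_fraud': 0.09} | allowed | 0.1 | -20 |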


#### 2 - Now let's use IPS to score a more lenient fraud model that blocks transactions only if P(fraud) > 0.10

We will use the same production logs above and run them through the new policy.

In [4]:
new_df = pd.DataFrame([
    {"id": 0, "new_action": "allowed", "new_action_prob": 0.9}, # we don't know what would have happened here :(
    {"id": 1, "new_action": "allowed", "new_action_prob": 0.9},
    {"id": 2, "new_action": "allowed", "new_action_prob": 0.9},    
    {"id": 3, "new_action": "allowed", "new_action_prob": 0.9},
])
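The hard-coded `new_df` in the cell above can be derived by running each logged context through the lenient policy. This sketch assumes the new policy uses the same ε = 0.1 flip exploration, so its chosen action has probability 0.9; `new_policy` is a hypothetical helper.

```python
def new_policy(p_fraud, threshold=0.10, epsilon=0.1):
    """Greedy action of the lenient policy and the probability it is taken."""
    action = "blocked" if p_fraud > threshold else "allowed"
    return action, 1 - epsilon

# Contexts copied from the production logs (id -> p_fraud)
contexts = {0: 0.08, 1: 0.03, 2: 0.01, 3: 0.09}

new_rows = []
for id_, p_fraud in contexts.items():
    action, prob = new_policy(p_fraud)
    new_rows.append({"id": id_, "new_action": action, "new_action_prob": round(prob, 2)})

print(new_rows)  # can be passed to pd.DataFrame(new_rows)
```

Every context has P(fraud) ≤ 0.10, so the lenient policy allows all four transactions.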

We see that the new policy would let through a fraudulent transaction (`id: 3`) with much higher probability (0.9 vs. 0.1 in production), so the new model should be penalized in offline evaluation. For `id: 0`, the new model allows the transaction, but we lack the counterfactual: production blocked it, so we never observed whether it was legitimate or fraud. This is one of the drawbacks of offline policy evaluation; with more data, exploration would ideally give us logs where a different action was taken in a similar situation.
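Written out, a matched-action form of the IPS estimate is shown below. The internals of `ips.evaluate` are not shown in this notebook, so this form is an assumption, though it reproduces the results printed further down. Here $a_i$, $p_i$, $r_i$ are `action`, `action_prob`, `reward`, and $a^{\text{new}}_i$, $p^{\text{new}}_i$ are `new_action`, `new_action_prob`:

$$
\hat{V}_{\text{IPS}}(\pi_{\text{new}}) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\left[a^{\text{new}}_i = a_i\right] \, r_i \, \frac{p^{\text{new}}_i}{p_i}
$$

Plugging in the four logs:

$$
\hat{V}_{\text{IPS}}(\pi_{\text{new}}) = \frac{1}{4}\left(0 + 20 \cdot \frac{0.9}{0.9} + 10 \cdot \frac{0.9}{0.9} + (-20) \cdot \frac{0.9}{0.1}\right) = \frac{20 + 10 - 180}{4} = -37.5
$$

Row `id: 0` contributes nothing because the actions disagree (the missing counterfactual), while the chargeback in `id: 3` is up-weighted by a factor of 9 because it was logged with probability 0.1 but would be taken with probability 0.9.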

#### 3 - Now we will score the new model using IPS

In [5]:
df = logs_df.merge(new_df, on='id')
df

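|   | id | context | action | action_prob | reward | new_action | new_action_prob |
|---|----|---------|--------|-------------|--------|------------|-----------------|
| 0 | 0 | {'p_fraud': 0.08} | blocked | 0.9 | 0 | allowed | 0.9 |
| 1 | 1 | {'p_fraud': 0.03} | allowed | 0.9 | 20 | allowed | 0.9 |
| 2 | 2 | {'p_fraud': 0.01} | allowed | 0.9 | 10 | allowed | 0.9 |
| 3 | 3 | {'p_fraud': 0.09} | allowed | 0.1 | -20 | allowed | 0.9 |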


In [6]:
results = ips.evaluate(df)

In [7]:
print(results)

{'expected_reward_logging_policy': 2.5, 'expected_reward_new_policy': -37.5}
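To check these numbers without `lib`, here is a standalone sketch. It assumes `ips.evaluate` reports the plain mean of observed rewards for the logging policy and a matched-action IPS estimate for the new policy; `ips_estimate` is a hypothetical name, not the library's function.

```python
# (action, action_prob, reward, new_action, new_action_prob) for ids 0..3
logs = [
    ("blocked", 0.9, 0, "allowed", 0.9),
    ("allowed", 0.9, 20, "allowed", 0.9),
    ("allowed", 0.9, 10, "allowed", 0.9),
    ("allowed", 0.1, -20, "allowed", 0.9),
]

def ips_estimate(rows):
    """Average of reward * (new prob / logged prob) over rows where actions match."""
    total = 0.0
    for action, action_prob, reward, new_action, new_action_prob in rows:
        if new_action == action:  # no counterfactual reward when actions differ
            total += reward * new_action_prob / action_prob
    return total / len(rows)

logging_value = sum(row[2] for row in logs) / len(logs)  # on-policy average reward
new_value = ips_estimate(logs)
print(logging_value, new_value)
```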


The expected reward per observation for the new policy (-37.5) is much worse than for the logging policy (2.5), driven by the fraudulent transaction it would have allowed (`id: 3`). So we wouldn't roll this new policy out to an A/B test or production; instead, we should test some different policies offline.