In [1]:
import os, sys

nb_dir = os.path.split(os.getcwd())[0]
if nb_dir not in sys.path:
    sys.path.append(nb_dir)

In [2]:
import pandas as pd

from lib.methods import doubly_robust

### Example using the doubly robust (DR) method to evaluate a new fraud policy offline

#### 1 - Assume we have a fraud model in production that blocks transactions if P(fraud) > 0.05

Let's build some sample logs from that policy running in production. Note that the production logs need some basic exploration (e.g. epsilon-greedy with ε = 0.1). Here that means the policy takes the non-preferred action 10% of the time, so each logged `action_prob` is either 0.90 or 0.10. Rewards represent revenue gained from allowing the transaction. A negative reward indicates the transaction was fraudulent and resulted in a chargeback.

In [3]:
logs_df = pd.DataFrame([
    {"id": 0, "context": {"p_fraud": 0.08}, "action": "blocked", "action_prob": 0.90, "reward": 0},
    {"id": 1, "context": {"p_fraud": 0.03}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"id": 2, "context": {"p_fraud": 0.02}, "action": "allowed", "action_prob": 0.90, "reward": 10},
    {"id": 3, "context": {"p_fraud": 0.01}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"id": 4, "context": {"p_fraud": 0.09}, "action": "allowed", "action_prob": 0.10, "reward": -20},  # only allowed due to exploration
    {"id": 5, "context": {"p_fraud": 0.40}, "action": "allowed", "action_prob": 0.10, "reward": -10},  # only allowed due to exploration
])

logs_df

,id,context,action,action_prob,reward
0,0,{'p_fraud': 0.08},blocked,0.9,0
1,1,{'p_fraud': 0.03},allowed,0.9,20
2,2,{'p_fraud': 0.02},allowed,0.9,10
3,3,{'p_fraud': 0.01},allowed,0.9,20
4,4,{'p_fraud': 0.09},allowed,0.1,-20
5,5,{'p_fraud': 0.4},allowed,0.1,-10
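
As a sanity check on the logs above, here is a minimal sketch of the production policy's action probabilities. The helper name `logging_action_probabilities` is hypothetical (it is not part of `lib`), and it assumes that exploration means taking the non-preferred action with probability ε, which is what the logged `action_prob` values suggest.

```python
EPSILON = 0.10    # exploration rate from the text
THRESHOLD = 0.05  # production blocking threshold from the text


def logging_action_probabilities(context):
    """Hypothetical helper: P(action | context) under the production policy."""
    if context["p_fraud"] > THRESHOLD:
        return {"allowed": EPSILON, "blocked": 1 - EPSILON}
    return {"allowed": 1 - EPSILON, "blocked": EPSILON}


# (p_fraud, action, action_prob) copied from the logs above
logged = [
    (0.08, "blocked", 0.90),
    (0.03, "allowed", 0.90),
    (0.02, "allowed", 0.90),
    (0.01, "allowed", 0.90),
    (0.09, "allowed", 0.10),
    (0.40, "allowed", 0.10),
]

# Collect any row whose logged probability disagrees with the sketch
mismatches = [
    (p_fraud, action)
    for p_fraud, action, action_prob in logged
    if abs(logging_action_probabilities({"p_fraud": p_fraud})[action] - action_prob) > 1e-9
]
print(mismatches)  # prints [] because every logged row matches
```

Logging these probabilities accurately matters because off-policy estimators divide by them.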


#### 2 - Now let's use the doubly robust method to score a more lenient fraud model that blocks transactions only if P(fraud) > 0.10

The doubly robust method requires a function that computes `P(action | context)` for all possible actions under our new policy. We can define that for our new policy easily here:

In [4]:
def action_probabilities(context):
    epsilon = 0.10
    if context["p_fraud"] > 0.10:
        return {"allowed": epsilon, "blocked": 1 - epsilon}

    return {"allowed": 1 - epsilon, "blocked": epsilon}
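
To see how the new policy relates to the logged data, the sketch below computes an importance weight for each logged transaction: the probability the new policy assigns to the logged action, divided by the logged `action_prob`. This is a standard ingredient of off-policy estimators, shown here for illustration only. It is not taken from `lib.methods`, and the plain-list `logs` variable is an assumption used to keep the example self-contained.

```python
def action_probabilities(context):
    # Copied from the cell above: the new, more lenient policy
    epsilon = 0.10
    if context["p_fraud"] > 0.10:
        return {"allowed": epsilon, "blocked": 1 - epsilon}
    return {"allowed": 1 - epsilon, "blocked": epsilon}


# Same rows as logs_df, as plain dicts (rewards omitted; not needed here)
logs = [
    {"id": 0, "context": {"p_fraud": 0.08}, "action": "blocked", "action_prob": 0.90},
    {"id": 1, "context": {"p_fraud": 0.03}, "action": "allowed", "action_prob": 0.90},
    {"id": 2, "context": {"p_fraud": 0.02}, "action": "allowed", "action_prob": 0.90},
    {"id": 3, "context": {"p_fraud": 0.01}, "action": "allowed", "action_prob": 0.90},
    {"id": 4, "context": {"p_fraud": 0.09}, "action": "allowed", "action_prob": 0.10},
    {"id": 5, "context": {"p_fraud": 0.40}, "action": "allowed", "action_prob": 0.10},
]

# weight = P_new(logged action | context) / P_logging(logged action | context)
weights = {
    row["id"]: action_probabilities(row["context"])[row["action"]] / row["action_prob"]
    for row in logs
}

for row_id, weight in weights.items():
    print(row_id, round(weight, 3))
```

Transaction 4 (`p_fraud` = 0.09) gets a weight of about 9: the production policy allowed it only through exploration, while the new policy would allow it 90% of the time. Its -20 reward therefore counts heavily against the new policy.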

We will use the same production logs above and run them through the new policy.

In [5]:
doubly_robust.evaluate(logs_df, action_probabilities)

{'expected_reward_logging_policy': 3.33, 'expected_reward_new_policy': -28.47}

The doubly robust method estimates that the expected reward per observation under the new policy (-28.47) is much worse than under the logging policy (3.33). We therefore wouldn't roll this policy out to an A/B test or production, and should instead test other policies offline.
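
For intuition about where an estimate like this comes from, here is a minimal sketch of a textbook doubly robust estimator. It is not the implementation in `lib.methods.doubly_robust`, whose internals are not shown here. In particular, the reward model below (the mean logged reward per action, ignoring context) is an assumption chosen for simplicity, so the result is close to, but not the same as, the value reported above.

```python
from collections import defaultdict


def action_probabilities(context):
    # Copied from the cell above: the new, more lenient policy
    epsilon = 0.10
    if context["p_fraud"] > 0.10:
        return {"allowed": epsilon, "blocked": 1 - epsilon}
    return {"allowed": 1 - epsilon, "blocked": epsilon}


# Same rows as logs_df, as plain dicts
logs = [
    {"id": 0, "context": {"p_fraud": 0.08}, "action": "blocked", "action_prob": 0.90, "reward": 0},
    {"id": 1, "context": {"p_fraud": 0.03}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"id": 2, "context": {"p_fraud": 0.02}, "action": "allowed", "action_prob": 0.90, "reward": 10},
    {"id": 3, "context": {"p_fraud": 0.01}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"id": 4, "context": {"p_fraud": 0.09}, "action": "allowed", "action_prob": 0.10, "reward": -20},
    {"id": 5, "context": {"p_fraud": 0.40}, "action": "allowed", "action_prob": 0.10, "reward": -10},
]

# Assumed reward model: mean logged reward for each action, ignoring context
rewards_by_action = defaultdict(list)
for row in logs:
    rewards_by_action[row["action"]].append(row["reward"])
reward_model = {a: sum(r) / len(r) for a, r in rewards_by_action.items()}


def doubly_robust_value(logs, policy, reward_model):
    """Average of (model-based estimate + importance-weighted residual)."""
    total = 0.0
    for row in logs:
        probs = policy(row["context"])
        # Direct-method term: expected modelled reward under the new policy
        direct = sum(p * reward_model[a] for a, p in probs.items())
        # Correction term: reweighted error of the model on the logged action
        weight = probs[row["action"]] / row["action_prob"]
        correction = weight * (row["reward"] - reward_model[row["action"]])
        total += direct + correction
    return total / len(logs)


# On-policy value of the logging policy is just the mean logged reward
logging_value = sum(row["reward"] for row in logs) / len(logs)
new_value = doubly_robust_value(logs, action_probabilities, reward_model)
print(round(logging_value, 2), round(new_value, 2))  # prints 3.33 -28.93
```

The sum over all actions in the direct-method term is why the method needs `P(action | context)` for every possible action, not just the logged one. With only six rows, the estimate is dominated by transaction 4, so it should be read as an illustration rather than a reliable measurement.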