In [3]:
import pandas as pd
from ope.methods import doubly_robust


### Example: using the doubly robust (DR) method to evaluate a new fraud policy offline

#### 1 - Assume we have a fraud model in production that blocks transactions if P(fraud) > 0.05

Let's build some sample logs from that policy running in production. Note that the production logs need some basic exploration (here, an epsilon-greedy-style scheme with ε = 0.1): 10% of the time, we take the opposite of the action the model recommends, which is why the `action_prob` column holds 0.90 for the model's choice and 0.10 for the exploratory one. Rewards represent the revenue gained from allowing a transaction (blocked transactions earn 0); a negative reward indicates that the transaction was fraudulent and resulted in a chargeback.
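
Exploration matters because off-policy estimators reweight each logged row by π(a | x) / μ(a | x): the new policy's probability of the logged action divided by the production policy's. If production never allowed a transaction with P(fraud) > 0.05, there would be no logged reward for "allowed" in that region, and no reweighting could recover it. As a sanity check, the hypothetical helper `logging_action_probabilities` below is a minimal sketch of the production policy as described (threshold 0.05, non-greedy action taken 10% of the time); it is not part of the `ope` package. It confirms that each logged `action_prob` is the probability this policy assigned to the logged action.

```python
import math


# Hypothetical helper (not from `ope`): the production policy as described,
# i.e. block when P(fraud) > 0.05, but take the other action 10% of the time.
def logging_action_probabilities(context, threshold=0.05, epsilon=0.10):
    if context["p_fraud"] > threshold:
        # Greedy action is "blocked"; "allowed" happens only via exploration.
        return {"allowed": epsilon, "blocked": 1 - epsilon}
    # Greedy action is "allowed"; "blocked" happens only via exploration.
    return {"allowed": 1 - epsilon, "blocked": epsilon}


# The same six log rows as in the DataFrame, as plain dicts.
logs = [
    {"context": {"p_fraud": 0.08}, "action": "blocked", "action_prob": 0.90, "reward": 0},
    {"context": {"p_fraud": 0.03}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.02}, "action": "allowed", "action_prob": 0.90, "reward": 10},
    {"context": {"p_fraud": 0.01}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.09}, "action": "allowed", "action_prob": 0.10, "reward": -20},
    {"context": {"p_fraud": 0.40}, "action": "allowed", "action_prob": 0.10, "reward": -10},
]

# Each logged action_prob should match the policy's probability of the logged action.
consistent = all(
    math.isclose(
        logging_action_probabilities(row["context"])[row["action"]],
        row["action_prob"],
    )
    for row in logs
)
print(consistent)  # True
```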

In [None]:
logs_df = pd.DataFrame([
    {"context": {"p_fraud": 0.08}, "action": "blocked", "action_prob": 0.90, "reward": 0},
    {"context": {"p_fraud": 0.03}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.02}, "action": "allowed", "action_prob": 0.90, "reward": 10},
    {"context": {"p_fraud": 0.01}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.09}, "action": "allowed", "action_prob": 0.10, "reward": -20},  # allowed only via exploration
    {"context": {"p_fraud": 0.40}, "action": "allowed", "action_prob": 0.10, "reward": -10},  # allowed only via exploration
])

logs_df

#### 2 - Now let's use the doubly robust method to score a more lenient fraud model that blocks transactions only if P(fraud) > 0.10

The doubly robust method requires a function that computes `P(action | context)` for every possible action under the new policy. For our new policy, this is easy to define:

In [None]:
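def action_probabilities(context):
    # New, more lenient policy: block only if P(fraud) > 0.10,
    # keeping the same epsilon = 0.10 exploration as production.
    epsilon = 0.10
    if context["p_fraud"] > 0.10:
        # Greedy action is "blocked"; "allowed" only via exploration.
        return {"allowed": epsilon, "blocked": 1 - epsilon}

    # Greedy action is "allowed"; "blocked" only via exploration.
    return {"allowed": 1 - epsilon, "blocked": epsilon}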
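
For reference, the textbook form of the doubly robust estimator combines a reward model with importance weighting; we have not checked exactly which variant the `ope` package implements. Writing $\mu$ for the logging policy (the `action_prob` column), $\pi$ for the new policy (`action_probabilities`), and $\hat{r}(x, a)$ for a learned reward model, the estimate over $n$ logged rows $(x_i, a_i, r_i)$ is:

$$
\hat{V}_{\mathrm{DR}}(\pi) = \frac{1}{n} \sum_{i=1}^{n} \left[ \sum_{a} \pi(a \mid x_i)\, \hat{r}(x_i, a) + \frac{\pi(a_i \mid x_i)}{\mu(a_i \mid x_i)} \bigl( r_i - \hat{r}(x_i, a_i) \bigr) \right]
$$

The first term is the reward model's prediction averaged over the new policy's actions; the second corrects the model's error on the logged action, reweighted by how much more (or less) likely the new policy is to take that action. The estimate stays unbiased if either the logged propensities $\mu$ or the reward model $\hat{r}$ is correct, hence "doubly robust".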

We reuse the production logs from above to evaluate the new policy.

In [2]:
doubly_robust.evaluate(logs_df, action_probabilities, num_bootstrap_samples=50)

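
To make the mechanics concrete, here is a minimal pure-Python sketch of a doubly robust estimate on the same six logs. It is not the `ope` package's implementation: `fit_reward_model` and `doubly_robust_estimate` are hypothetical names, and the reward model is an assumption chosen for simplicity (it predicts each action's mean logged reward and ignores the context), so its number will differ from whatever `doubly_robust.evaluate` reports.

```python
# The six log rows from above, as plain dicts.
logs = [
    {"context": {"p_fraud": 0.08}, "action": "blocked", "action_prob": 0.90, "reward": 0},
    {"context": {"p_fraud": 0.03}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.02}, "action": "allowed", "action_prob": 0.90, "reward": 10},
    {"context": {"p_fraud": 0.01}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.09}, "action": "allowed", "action_prob": 0.10, "reward": -20},
    {"context": {"p_fraud": 0.40}, "action": "allowed", "action_prob": 0.10, "reward": -10},
]


# The new, more lenient policy: block only if P(fraud) > 0.10.
def action_probabilities(context):
    epsilon = 0.10
    if context["p_fraud"] > 0.10:
        return {"allowed": epsilon, "blocked": 1 - epsilon}
    return {"allowed": 1 - epsilon, "blocked": epsilon}


# ASSUMPTION: a deliberately crude reward model that predicts, for each action,
# the mean logged reward of that action and ignores the context entirely.
def fit_reward_model(logs):
    totals, counts = {}, {}
    for row in logs:
        totals[row["action"]] = totals.get(row["action"], 0.0) + row["reward"]
        counts[row["action"]] = counts.get(row["action"], 0) + 1
    means = {action: totals[action] / counts[action] for action in totals}
    return lambda context, action: means.get(action, 0.0)


def doubly_robust_estimate(logs, target_probs, reward_model):
    total = 0.0
    for row in logs:
        context, action = row["context"], row["action"]
        probs = target_probs(context)
        # Direct-method term: predicted reward averaged over the new policy's actions.
        direct = sum(p * reward_model(context, a) for a, p in probs.items())
        # Importance-weighted correction for the model's error on the logged action.
        weight = probs[action] / row["action_prob"]
        correction = weight * (row["reward"] - reward_model(context, action))
        total += direct + correction
    return total / len(logs)


reward_model = fit_reward_model(logs)
dr_value = doubly_robust_estimate(logs, action_probabilities, reward_model)
print(f"{dr_value:.2f}")  # -28.93
```

With this toy reward model the estimate is about -28.93 per observation, versus an average logged reward of 20/6 ≈ 3.33 under the production policy. Most of the gap comes from the row with P(fraud) = 0.09: the new policy allows it with probability 0.9 while production allowed it with probability 0.1, so its error term (-20 - 4) is multiplied by an importance weight of 9. A single high-weight row dominating the estimate is also why results on six rows are so noisy.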

The doubly robust method estimates that the expected reward per observation is much worse under the new policy than under the logging policy, so we wouldn't roll this policy out to an A/B test or to production; instead, we should evaluate other candidate policies offline.

However, the confidence intervals around the expected rewards for the old and new policies overlap heavily. To be confident that the difference is signal rather than noise, it might be best to gather more data. Fortunately, in this case we have strong reason to suspect that the new policy is worse, but these confidence intervals can be important when we have less prior certainty.
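
The `num_bootstrap_samples=50` argument suggests the intervals come from bootstrapping. We have not verified how `ope` builds them, so the following is only a minimal sketch of a percentile bootstrap, assuming that approach: resample the logged rows with replacement, recompute the estimate on each resample, and read off the empirical 2.5th and 97.5th percentiles. For brevity it bootstraps the plain average of the logged rewards (the production policy's on-policy value); for a DR estimate you would recompute the DR average on each resample instead. `bootstrap_ci` is a hypothetical helper.

```python
import random


def bootstrap_ci(values, num_bootstrap_samples=50, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(num_bootstrap_samples)
    )
    # Nearest-rank percentiles of the bootstrap distribution.
    lo = means[int((alpha / 2) * (num_bootstrap_samples - 1))]
    hi = means[round((1 - alpha / 2) * (num_bootstrap_samples - 1))]
    return lo, hi


rewards = [0, 20, 10, 20, -20, -10]  # logged rewards from the production logs
lo, hi = bootstrap_ci(rewards, num_bootstrap_samples=50)
print(f"mean={sum(rewards) / len(rewards):.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

With only six rows, each resample's mean swings a lot depending on how often the negative-reward rows are drawn, so the interval is wide. Fifty resamples is also few for estimating 95% tails; more resamples give steadier endpoints.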