### BASELINE DETECTOR EVALUATION (MERCHANT)

Evaluate the simple threshold rule for detecting velocity spikes in the Sparkov dataset with synthetic spikes added.

## Method

 - Count number of unique cards per merchant in 30s buckets
 - Raise a flag if the count is greater than the set threshold
 - Compare predictions to the ground truth

## Streamed Data Evaluation

Stream the dataset through once, updating the per-merchant unique card count on each transaction. Flag a bucket once its count is greater than the threshold, then calculate the confusion matrix over all merchant/bucket pairs.
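As a rough illustration of the streaming update described above, here is a minimal sketch. `StreamingCardCounter` and `floor_to_bucket` are hypothetical names, not the contents of `baseline_detector.py`; the real `MerchantBaseline` may differ. The strict `>` comparison follows the Method list, while the Caixa cell further down compares with `>=`, so treat the comparison operator as an assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta

BUCKET_SECONDS = 30  # bucket width taken from the Method section


def floor_to_bucket(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 30-second bucket."""
    return ts - timedelta(seconds=ts.second % BUCKET_SECONDS,
                          microseconds=ts.microsecond)


class StreamingCardCounter:
    """Hypothetical stand-in for MerchantBaseline, for illustration only."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        # (merchant_id, bucket) -> set of distinct card ids seen so far
        self.cards = defaultdict(set)

    def update(self, merchant_id: str, ts: datetime, card_id: str):
        bucket = floor_to_bucket(ts)
        seen = self.cards[(merchant_id, bucket)]
        seen.add(card_id)
        # Assumption: strict comparison, as the Method list words it
        flag = len(seen) > self.threshold
        return flag, {"bucket": bucket, "count": len(seen)}


# Four distinct cards hit merchant "m1" inside one 30-second bucket
det = StreamingCardCounter(threshold=2)
t0 = datetime(2024, 1, 1, 12, 0, 5)
flags = [det.update("m1", t0 + timedelta(seconds=i), f"card{i}")[0]
         for i in range(4)]
print(flags)  # [False, False, True, True]
```

This sketch never evicts old buckets, so memory grows with the number of merchant/bucket pairs; a production detector would likely drop buckets once they close.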

In [None]:
import sys
import os
import pandas as pd

#build absolute path to src in the notebook's parent directory
module_path = os.path.abspath(os.path.join('..', 'src'))
#add to sys.path if not already there
if module_path not in sys.path:
    sys.path.append(module_path)

#import MerchantBaseline class from baseline_detector.py
from baseline_detector import MerchantBaseline
#import the merchant set from truth tables
from truth_tables import MERCHANT_SET
from eval_funcs import per_bucket_confusion, precision_recall_f1

#set static baseline threshold of 10
THRESHOLD = 10

#read in spiked dataset, parsing timestamp as a time object, setting merchant and card ids to strings
#set low memory false to avoid incorrect data type inferences
df = pd.read_csv("../data/processed/sparkov_spikes.csv",
        parse_dates=["timestamp"],
        dtype={"merchant_id": str, "card_id": str},
        low_memory=False,)

#create instance of merchant baseline class
mb   = MerchantBaseline(threshold=THRESHOLD)
#create empty sets for predictions and coverage
predicted = set() #set of (merchant_id, bucket) that we flag
all_buckets = set() #set of all (merchant_id, buckets) streamed in


for row in df.itertuples():
    #call the baseline wrapper class' update function
    flag, info = mb.update(row.merchant_id, row.timestamp, row.card_id)
    bucket = info["bucket"]
    #add each pair to the all_buckets set
    all_buckets.add((row.merchant_id, bucket))
    #add to flagged set if over threshold
    if flag:
        predicted.add((row.merchant_id, bucket))

#find confusion matrix
cm = per_bucket_confusion(predicted, MERCHANT_SET, all_buckets)
#calc precision, recall, and f1 scores
m  = precision_recall_f1(cm["tp"], cm["fp"], cm["fn"])

#print results
print(f"TP {cm['tp']} FP {cm['fp']} FN {cm['fn']} TN {cm['tn']}")
print(f"precision {m['precision']:.3f} recall {m['recall']:.3f} F1 {m['f1']:.3f}")


TP 50 FP 0 FN 0 TN 1555588
precision 1.000 recall 1.000 F1 1.000
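The counts printed above can be reproduced with plain set arithmetic. The following is a sketch of what `per_bucket_confusion` and `precision_recall_f1` might compute; `confusion_from_sets` and `prf` are hypothetical names, and the real helpers in `eval_funcs` may handle edge cases differently.

```python
def confusion_from_sets(predicted: set, truth: set, universe: set) -> dict:
    """Confusion counts over (merchant_id, bucket) pairs."""
    tp = len(predicted & truth)   # flagged and truly a spike
    fp = len(predicted - truth)   # flagged but not a spike
    fn = len(truth - predicted)   # spike that was missed
    tn = len(universe - predicted - truth)  # everything else
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def prf(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall and F1, returning 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example: four merchant/bucket pairs, two true spikes, two flags
universe = {("m1", 0), ("m1", 1), ("m2", 0), ("m2", 1)}
truth = {("m1", 0), ("m2", 1)}
predicted = {("m1", 0), ("m2", 0)}
cm_toy = confusion_from_sets(predicted, truth, universe)
print(cm_toy)  # {'tp': 1, 'fp': 1, 'fn': 1, 'tn': 1}
print(prf(cm_toy["tp"], cm_toy["fp"], cm_toy["fn"]))
```

True negatives dominate here (over 1.5 million buckets against 50 spikes), which is why precision, recall and F1 are reported rather than accuracy.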


## Varied Threshold Test

Precompute the unique card count for each merchant/bucket pair, then sweep thresholds from 2 to 16 and compare the results at each threshold.

In [None]:
#add a bucket column to the dataframe (flooring same as detector/truth table)
df["bucket"] = df["timestamp"].dt.floor("30s")

#count unique cards per merchant/bucket pair
counts = df.groupby(["merchant_id", "bucket"])["card_id"].nunique()

#convert to a dictionary of key: (merchant, bucket), value: count of unique cards
counts_dict = {k: int(v) for k, v in counts.items()}
#all merchant/bucket pairs (used to find number of true negatives)
all_buckets = set(counts_dict.keys())

from eval_funcs import sweep_thresholds

results = sweep_thresholds(counts_dict, MERCHANT_SET, all_buckets, start=2, stop=16)
for r in results:
    print(r)

{'th': 2, 'tp': 50, 'fp': 2297, 'fn': 0, 'tn': 1553291, 'precision': 0.021303792074989347, 'recall': 1.0, 'f1': 0.04171881518564872}
{'th': 3, 'tp': 50, 'fp': 2, 'fn': 0, 'tn': 1555586, 'precision': 0.9615384615384616, 'recall': 1.0, 'f1': 0.9803921568627451}
{'th': 4, 'tp': 50, 'fp': 0, 'fn': 0, 'tn': 1555588, 'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
{'th': 5, 'tp': 50, 'fp': 0, 'fn': 0, 'tn': 1555588, 'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
{'th': 6, 'tp': 50, 'fp': 0, 'fn': 0, 'tn': 1555588, 'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
{'th': 7, 'tp': 50, 'fp': 0, 'fn': 0, 'tn': 1555588, 'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
{'th': 8, 'tp': 50, 'fp': 0, 'fn': 0, 'tn': 1555588, 'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
{'th': 9, 'tp': 50, 'fp': 0, 'fn': 0, 'tn': 1555588, 'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
{'th': 10, 'tp': 50, 'fp': 0, 'fn': 0, 'tn': 1555588, 'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
{'th': 11, 'tp': 50, 'fp': 0, 'fn': 0, 'tn': 1555588, 'preci
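For readers without access to `eval_funcs`, the sweep can be sketched as below. `sweep_sketch` is a hypothetical stand-in for `sweep_thresholds`, with two assumptions: `stop` is treated as inclusive, and a bucket is flagged when its count is `>=` the threshold, mirroring the `counts_c >= THRESHOLD` comparison in the Caixa cell. Unlike the real call, it derives the set of all buckets from the keys of `counts`.

```python
def sweep_sketch(counts: dict, truth: set, start: int, stop: int) -> list:
    """Evaluate every threshold in start..stop (inclusive, by assumption)."""
    universe = set(counts)
    rows = []
    for th in range(start, stop + 1):
        # Assumption: flag when the unique-card count reaches the threshold
        predicted = {key for key, n in counts.items() if n >= th}
        tp = len(predicted & truth)
        fp = len(predicted - truth)
        fn = len(truth - predicted)
        tn = len(universe - predicted - truth)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append({"th": th, "tp": tp, "fp": fp, "fn": fn, "tn": tn,
                     "precision": precision, "recall": recall, "f1": f1})
    return rows


# Toy data: one true spike (5 cards) and one busy but legitimate bucket (2 cards)
toy_counts = {("m1", "b1"): 5, ("m1", "b2"): 1,
              ("m2", "b1"): 2, ("m2", "b2"): 1}
toy_truth = {("m1", "b1")}
toy_rows = sweep_sketch(toy_counts, toy_truth, start=2, stop=6)
for row in toy_rows:
    print(row["th"], row["tp"], row["fp"], row["fn"], row["tn"])
```

Because the counts are precomputed once, each threshold costs a single pass over the dictionary rather than a new pass over the raw transactions.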

## Choose Threshold Going Forward

Choose the lowest threshold that still gives perfect results. Here, every threshold from 4 to 16 gives the same perfect scores, so we will use 4 as the baseline rule.
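The selection rule can also be applied programmatically. This is a small sketch, assuming result rows shaped like the sweep output (dicts with `th` and `f1` keys); `lowest_best_threshold` is a hypothetical helper, and the F1 values below are the first four rows of the sweep rounded to four decimals.

```python
def lowest_best_threshold(results: list) -> int:
    """Return the smallest threshold that achieves the maximum F1."""
    best_f1 = max(r["f1"] for r in results)
    return min(r["th"] for r in results if r["f1"] == best_f1)


# First rows of the sweep output, F1 rounded for brevity
sample_rows = [
    {"th": 2, "f1": 0.0417},
    {"th": 3, "f1": 0.9804},
    {"th": 4, "f1": 1.0},
    {"th": 5, "f1": 1.0},
]
chosen = lowest_best_threshold(sample_rows)
print(chosen)  # 4
```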

## Caixa Dataset FP Test

Run the real-world Caixa dataset through the baseline detector, counting every flagged merchant/bucket pair as a false positive.
Keep or change the current threshold rule based on the results.

In [3]:
#read in Caixa dataset
dfc = pd.read_csv("../data/processed/caixa_pos_sorted.csv",
                  parse_dates=["timestamp"],
                  dtype={"merchant_id": str, "card_id": str},
                  low_memory=False)

#add a bucket column to the dataframe (flooring same as detector/truth table)
dfc["bucket"] = dfc["timestamp"].dt.floor("30s")
#count unique cards per merchant/bucket pair
counts_c = dfc.groupby(["merchant_id","bucket"])["card_id"].nunique()

#for various threshold values
for THRESHOLD in [4,5,6,7,8]:
    #count number of merchant/bucket pairs flagged as fraud
    num_flagged   = int((counts_c >= THRESHOLD).sum())
    #number of total merchant/bucket pairs
    total_buckets = int(counts_c.size)
    #calculate rate of false positives
    rate = num_flagged / total_buckets if total_buckets else 0.0
    #print results
    print(f"th={THRESHOLD}: {num_flagged}/{total_buckets}  ({rate:.5%})")


th=4: 254/6692434  (0.00380%)
th=5: 10/6692434  (0.00015%)
th=6: 1/6692434  (0.00001%)
th=7: 0/6692434  (0.00000%)
th=8: 0/6692434  (0.00000%)
