## Credit Card Transaction Demo

In this notebook, we demonstrate how the `moco` optimization library can accelerate a pre-trained binary classifier that flags credit card transactions as fraudulent.

It does this by reducing the average number of FLOPs (floating-point operations) the model needs per transaction when running inference over the dataset. Critically, `moco` finds subspaces of the input space where the prediction is a simple linear or constant function. At runtime, the derived model determines which subspace a transaction belongs to and then executes the map associated with that subspace.

This results in lower energy use, lower latency, higher throughput, and less hardware.

| experiment   |   Latency per transaction, raced (s) |        Number of FLOPs per transaction |   Precision |   Recall |
|:-------------|--------------------------------------:|---------------------------------------:|------------:|---------:|
| baseline     |                           4.80942e-05 |                                 3104   |    0.916667 | 0.785714 |
| optimized    |                           2.18402e-05 |                                 1627.6 |    0.916667 | 0.785714 |

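To make the routing idea from the introduction concrete, here is a minimal sketch in plain Python. It is not `moco`'s API: `make_routed_predictor`, the region test, and the toy `full_model` are all hypothetical names chosen for illustration. Each region pairs a membership test with a cheap map, and the full model runs only when no region claims the input.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
# A region pairs a membership test with the cheap map used inside it.
Region = Tuple[Callable[[Vector], bool], Callable[[Vector], float]]


def make_routed_predictor(
    regions: List[Region],
    fallback: Callable[[Vector], float],
) -> Callable[[Vector], float]:
    """Return a predictor that tries cheap regional maps before the full model."""

    def predict(x: Vector) -> float:
        for contains, cheap_map in regions:
            if contains(x):
                return cheap_map(x)  # early exit: the full model is skipped
        return fallback(x)  # no region matched: pay the full cost

    return predict


# Hypothetical setup: inputs with a small first feature always score 0.0.
calls = {"full": 0}


def full_model(x: Vector) -> float:
    calls["full"] += 1  # count how often the expensive path runs
    return 1.0 if sum(x) > 3.0 else 0.0


routed = make_routed_predictor(
    regions=[(lambda x: x[0] < 0.5, lambda x: 0.0)],
    fallback=full_model,
)

outputs = [routed([0.1, 0.2]), routed([2.0, 2.0])]
```

Only the second input reaches `full_model`, so the expensive path runs once for two predictions.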
### Imports

In [None]:
import pandas as pd
from typing import Callable
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support
import time
import matplotlib.pyplot as plt
import numpy as np

### Step 1: Audit the initial model

In [None]:

def load_dataset(path: str):
    df = pd.read_csv(path)
    X = df[[col for col in df.columns if col.startswith('V')]].to_numpy()
    y = df['Class'].to_numpy()
    return X, y

Load the dataset. The dataset is heavily imbalanced -- it consists of mostly non-fraudulent transactions (only 0.17% of the 280k+ transactions are fraudulent).

In [None]:
# Load Dataset
X, y = load_dataset('/Users/samrandall/Downloads/creditcard.csv')

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 2)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size = 0.2, random_state = 2)

X.shape, pd.Series(y).value_counts().to_dict()


In [None]:
# Train initial model
mlp = MLPClassifier(random_state = 4)
mlp.fit(X_train, y_train)

In [None]:
from sklearn.metrics import PrecisionRecallDisplay

display = PrecisionRecallDisplay.from_estimator(
    mlp, X_val, y_val, name="MLP", plot_chance_level=True, despine=True
)
_ = display.ax_.set_title("2-class Precision-Recall curve")

## Choose a threshold based on precision-recall curve on validation set.
- I chose `threshold = {}`, achieving `precision = {}` and `recall = {}`.

In [None]:
threshold = 0.9
y_hat_t_train = mlp.predict_proba(X_train)[:, 1] > threshold
y_hat_t_train.shape

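Raising the threshold trades recall for precision. The sketch below illustrates that trade-off with a hand-rolled helper and made-up validation labels and scores; `precision_recall_at`, `y_val_demo`, and `scores_demo` are hypothetical and are not part of the notebook's data or of scikit-learn.

```python
from typing import Sequence, Tuple


def precision_recall_at(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when predicting positive for score > threshold."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s > threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s > threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s <= threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up validation labels and scores, for illustration only.
y_val_demo = [0, 0, 0, 1, 0, 1, 1, 0]
scores_demo = [0.05, 0.20, 0.60, 0.55, 0.10, 0.95, 0.92, 0.30]

# Compare a moderate threshold with a strict one.
sweep = {t: precision_recall_at(y_val_demo, scores_demo, t) for t in (0.5, 0.9)}
```

On this toy data the stricter threshold removes the one false positive but also drops one true positive.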
## Benchmark its latency and accuracy.

In [None]:
y_hat_t_test = mlp.predict_proba(X_test)[:, 1] > threshold
# scikit-learn expects (y_true, y_pred); reversing them exchanges precision and recall.
p, r, _, _ = precision_recall_fscore_support(y_test, y_hat_t_test, average = 'binary')
p, r

On the test set, this gives a precision of 100% and a recall of 70%.

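Precision and recall depend on which array is treated as ground truth. scikit-learn's `precision_recall_fscore_support` takes its arguments as `(y_true, y_pred)`, and passing them the other way round exchanges the two metrics. The sketch below shows the effect with a hand-rolled helper (hypothetical, not scikit-learn's implementation) and made-up labels.

```python
from typing import Sequence, Tuple


def precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Binary precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels: the classifier misses one fraud and raises no false alarms.
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_demo = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

correct_order = precision_recall(y_true_demo, y_pred_demo)
reversed_order = precision_recall(y_pred_demo, y_true_demo)
```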
In [None]:
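class MyMLP:
    """NumPy re-implementation of the trained MLP's forward pass."""
    def __init__(self, w, b):
        self.w = w
        self.b = b

    def predict(self, x):
        # ReLU is an elementwise maximum with 0, so use np.maximum.
        # np.max(a, 0) would instead reduce along axis 0.
        h = np.maximum(x @ self.w[0] + self.b[0], 0)
        for W, b in zip(self.w[1:-1], self.b[1:-1]):
            h = np.maximum(h @ W + b, 0)
        out = h @ self.w[-1] + self.b[-1]
        # Sigmoid turns the output logit into a probability.
        prob = 1 / (1 + np.exp(-out))
        return prob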



In [None]:
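def forward_mlp(x, weights, biases):
    # ReLU is an elementwise maximum with 0 (np.maximum), not a reduction (np.max).
    h = np.maximum(x @ weights[0] + biases[0], 0)
    for W, b in zip(weights[1:-1], biases[1:-1]):
        h = np.maximum(h @ W + b, 0)
    out = h @ weights[-1] + biases[-1]
    # Sigmoid turns the output logit into a probability.
    prob = 1 / (1 + np.exp(-out))
    return prob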

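ReLU clamps each pre-activation at zero elementwise. In NumPy that is `np.maximum(z, 0)`, whereas `np.max(z, 0)` is a reduction along axis 0 and, for a batch of one row, returns that row with its negative values intact. A minimal check with made-up values:

```python
import numpy as np

# Pre-activations for a batch containing a single row (made-up values).
z = np.array([[-1.0, 2.0, -3.0]])

# Reduction: maximum over axis 0. With one row, nothing is clamped.
reduced = np.max(z, 0)

# Elementwise maximum against 0: this is ReLU.
relu = np.maximum(z, 0)
```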
In [None]:
original_timings = []
# Warm up the same function that is timed below.
for i in range(100):
    forward_mlp(X_test[i: i + 1], mlp.coefs_, mlp.intercepts_)

for i in range(X_test.shape[0]):
    x = X_test[i: i + 1]
    s = time.perf_counter()

    # Time three back-to-back forward passes and record their average.
    for j in range(3):
        forward_mlp(x, mlp.coefs_, mlp.intercepts_)
    e = time.perf_counter()
    original_timings.append((e - s) / 3)



On average, the NumPy forward pass takes 6.41E-6 seconds, or about 6 microseconds (µs), per transaction.

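Microsecond-scale timings are noisy, so it can help to report a median and a tail value next to the mean. The following is a sketch using only the standard library; `time_calls` and `summarize` are hypothetical helpers, and the 99th percentile uses a simple index-based approximation.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(
    fn: Callable[[], object], n_calls: int = 200, n_warmup: int = 20
) -> List[float]:
    """Time n_calls invocations of fn after n_warmup untimed invocations."""
    for _ in range(n_warmup):
        fn()
    timings = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: List[float]) -> Dict[str, float]:
    """Mean, median, and an approximate 99th percentile of the timings."""
    ordered = sorted(timings)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p99": ordered[p99_index],
    }


# A stand-in workload; in the notebook this would be one forward pass.
stats = summarize(time_calls(lambda: sum(range(100))))
```

A mean well above the median usually points to a few slow outliers rather than a uniformly slow call.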
## FLOPs (Baseline)
Next we establish the baseline FLOP count. The model is a two-layer MLP with a ReLU activation followed by a sigmoid activation.
Roughly, the number of FLOPs for the first layer to process one data point is (28 + 1) * 100 = 2900.
The number of FLOPs for the second layer is (100 + 1) * 1 = 101. In both cases we account for the bias term.
The ReLU adds 100 FLOPs and the sigmoid adds 3.
In total, there are (28 + 1) * 100 + 100 + (100 + 1) * 1 + 3 = **3104 FLOPs**.

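The same count can be derived from the layer shapes. This sketch follows the counting convention used above, in which each dense layer costs `(n_in + 1) * n_out`, each hidden unit adds one FLOP for its ReLU, and the sigmoid adds 3; `mlp_flops` is a hypothetical helper, not part of `moco` or scikit-learn.

```python
from typing import List, Tuple


def mlp_flops(layer_shapes: List[Tuple[int, int]], sigmoid_flops: int = 3) -> int:
    """Count FLOPs per input for an MLP given (n_in, n_out) per layer."""
    total = 0
    last = len(layer_shapes) - 1
    for i, (n_in, n_out) in enumerate(layer_shapes):
        total += (n_in + 1) * n_out  # weights plus bias term
        if i < last:
            total += n_out  # one ReLU per hidden unit
    return total + sigmoid_flops  # output activation


# Shapes of the model described above: 28 inputs, 100 hidden units, 1 output.
baseline_flops = mlp_flops([(28, 100), (100, 1)])
```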
In [None]:
for i, (W, b) in enumerate(zip(mlp.coefs_, mlp.intercepts_)):
    print(f"Layer {i}: {W.shape} {b.shape}")

## Accelerating the model with `moco`

In [None]:
from moco.partition import Partition, RoutedModel

In [None]:
p_train = mlp.predict_proba(X_train)[:, 1] > threshold
partition = Partition()
C = 2
partition.find_sufficient_groups(X_train, p_train, min_group_size = X_train.shape[0] / (C * 10))
partition.summary_table

The summary table above lists the groups that `moco` found. We now build a routed model from the first group only (`is_active == True` for that group, but not for the second). The system generated a group of 182011 transactions, all of them non-fraudulent. We fit a model to that group, and that model identified 89757 transactions. We use that model in the new routed model.

In [None]:
mymlp = MyMLP(mlp.coefs_, mlp.intercepts_)

In [None]:
rm = RoutedModel.from_partition(partition, mlp)

## Evaluation of the `RoutedModel`

On the test set, the routed model shows no drop in precision or recall relative to the MLP: precision is `87.5%` and recall is `83.3%` for both.

In [None]:

p_train_new = rm.predict(X_train)
mlp_train_new = mlp.predict(X_train)
print("train")
# scikit-learn expects (y_true, y_pred); reversing them exchanges precision and recall.
p, r, _, _ = precision_recall_fscore_support(y_train, mlp_train_new, average = 'binary')
print("MLP", p, r)
p, r, _, _ = precision_recall_fscore_support(y_train, p_train_new, average = 'binary')
print("Routed Model", p, r)

p_test_new = rm.predict(X_test)
mlp_test = mlp.predict(X_test)

print("test")
p, r, _, _ = precision_recall_fscore_support(y_test, mlp_test, average = 'binary')
print("MLP", p, r)

mlp_precision = p
mlp_recall = r
p, r, _, _ = precision_recall_fscore_support(y_test, p_test_new, average = 'binary')
print("Routed Model", p, r)

routed_precision = p
routed_recall = r

In terms of FLOPs, we can do a theoretical analysis, and then we will get to the latency analysis.

In [None]:
np.isnan(partition.transform(X_test)).mean()

## FLOPs in the New Model
The original path, computed above, costs 3104 FLOPs per transaction.
On the test set, that path runs whenever `partition.transform` returns NaN, which happens 51.5% of the time.
The early-exit model must be evaluated for every transaction, whether or not the early exit is taken, so
`total_flops = (1 * Flops(early_exit)) + (g / N) * Flops(full model)`,
where `g / N` is the fraction of transactions that fall through to the full model.
The FLOPs of the early exit (LogisticClassifier) are `28 * 1 + 1 = 29`.
`(1 * 29) + (0.515) * 3104 = 1627.6` FLOPs
This is a **48% reduction** in FLOPs.

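The arithmetic above can be reproduced in a few lines. This is a sketch of the expected-cost formula only; `expected_flops` and `break_even_rate` are hypothetical names, and the inputs are the figures quoted in this section.

```python
def expected_flops(
    early_exit_flops: float, full_model_flops: float, fallthrough_rate: float
) -> float:
    """Average FLOPs per transaction when the early exit always runs first."""
    return early_exit_flops + fallthrough_rate * full_model_flops


def break_even_rate(early_exit_flops: float, full_model_flops: float) -> float:
    """Fall-through rate above which routing costs more than the full model."""
    return 1 - early_exit_flops / full_model_flops


total = expected_flops(29, 3104, 0.515)
reduction = 1 - total / 3104
limit = break_even_rate(29, 3104)
```

Because the early exit is so cheap relative to the full model, routing stays worthwhile on this convention unless nearly every transaction falls through.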
## Real Time Latency Analysis

In [None]:
rm = RoutedModel.from_partition(partition, mlp)

In [None]:

def profile(prediction_method: Callable,
    X: np.ndarray,
    n_trials = 100,
    n_samples = 1000
):
    """Return the mean single-row latency in seconds for n_samples random rows of X."""
    times = np.zeros(n_samples)
    b = 1  # batch size: one transaction per call
    # Sample n_samples distinct row indices.
    eyes = np.random.choice(len(X), n_samples, replace = False)

    # Warm-up passes; nothing is recorded.
    for j in range(10):
        for i in eyes:
            prediction_method(X[i: i + b])

    # Timed passes: accumulate per-row latency over n_trials repetitions.
    for j in range(n_trials):
        for i, idx in enumerate(eyes):
            x = X[idx: idx + b]
            s = time.perf_counter()
            prediction_method(x)
            e = time.perf_counter()
            times[i] += e - s

    return times / n_trials

original_times = profile(mlp.predict, X_test)
rm_times_raced = profile(rm.predict_race, X_test)
rm_seq_times = profile(rm.predict, X_test)
df = pd.DataFrame({"original": original_times, "optimized_best_of": rm_times_raced, "optimized_sequential": rm_seq_times})

In [None]:
df.mean(axis = 0).to_dict()

In [None]:
speedup = df.mean(axis = 0).to_dict()
ratio = speedup['optimized_best_of'] / speedup['original']
ratio


## Latency Result
The optimized model now averages 2.82E-5 seconds per transaction, compared with 4.92E-5 seconds for the original model, a reduction of about 43%.

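The latency figures quoted above can be turned into a ratio and a speedup factor as follows. This is a small sketch; `latency_summary` is a hypothetical helper and the inputs are the two averages from this section.

```python
from typing import Dict


def latency_summary(baseline_s: float, optimized_s: float) -> Dict[str, float]:
    """Relate two mean latencies, both given in seconds."""
    return {
        "ratio": optimized_s / baseline_s,          # below 1 means faster
        "reduction": 1 - optimized_s / baseline_s,  # fraction of time saved
        "speedup": baseline_s / optimized_s,        # how many times faster
    }


summary = latency_summary(baseline_s=4.92e-5, optimized_s=2.82e-5)
```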
In [None]:
benchmark_analysis = {
    "experiment": ["baseline", "optimized"],
    "Latency per transaction, raced (s)": [speedup['original'], speedup['optimized_best_of']],
    "Number of FLOPs per transaction": [3104, 1627.6],
    "Precision": [mlp_precision, routed_precision],
    "Recall" : [mlp_recall, routed_recall]
}

benchmark_df = pd.DataFrame(benchmark_analysis)
s = benchmark_df.to_markdown()

# Put this above as summary!
s

In [None]:
benchmark_df