Based on:

@book{leborgne2022fraud,

title={Reproducible Machine Learning for Credit Card Fraud Detection - Practical Handbook},

author={Le Borgne, Yann-A{\"e}l and Siblini, Wissam and Lebichot, Bertrand and Bontempi, Gianluca},

url={https://github.com/Fraud-Detection-Handbook/fraud-detection-handbook},

year={2022},

publisher={Universit{\'e} Libre de Bruxelles}

}

Covered subchapters:
* 2.3 Credit card fraud detection system
* 7.2.1, 7.2.2 Feed-forward neural network

In [1]:
import datetime
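import sklearn
# import the submodules used below explicitly instead of relying on shared_functions.py to load them
import sklearn.ensemble
import sklearn.linear_model
import sklearn.tree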
import xgboost
import torch
import time
import numpy as np
import pickle
import os

import wandb
from wandb.integration.xgboost import WandbCallback

Testing different models on a baseline feature transformation and a simple train-test split

In [2]:
# !curl -O https://raw.githubusercontent.com/Fraud-Detection-Handbook/fraud-detection-handbook/main/Chapter_References/shared_functions.py
%run shared_functions.py

In [15]:
%run my_shared_functions.py

In [4]:
# 1. create 'fraud-detection-handbook' folder one folder above
# 2. cd to the folder
# 3. git clone https://github.com/Fraud-Detection-Handbook/simulated-data-transformed
DIR_INPUT = '../fraud-detection-handbook/simulated-data-transformed/data/'

BEGIN_DATE = "2018-07-25"
END_DATE = "2018-08-14"

%time transactions_df=read_from_files(DIR_INPUT, BEGIN_DATE, END_DATE)
print("{0} transactions loaded, containing {1} fraudulent transactions".format(len(transactions_df),transactions_df.TX_FRAUD.sum()))

CPU times: total: 78.1 ms
Wall time: 166 ms
201295 transactions loaded, containing 1792 fraudulent transactions


In [5]:
transactions_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 201295 entries, 0 to 201294
Data columns (total 23 columns):
 #   Column                               Non-Null Count   Dtype         
---  ------                               --------------   -----         
 0   TRANSACTION_ID                       201295 non-null  int64         
 1   TX_DATETIME                          201295 non-null  datetime64[ns]
 2   CUSTOMER_ID                          201295 non-null  int64         
 3   TERMINAL_ID                          201295 non-null  int64         
 4   TX_AMOUNT                            201295 non-null  float64       
 5   TX_TIME_SECONDS                      201295 non-null  int64         
 6   TX_TIME_DAYS                         201295 non-null  int64         
 7   TX_FRAUD                             201295 non-null  int64         
 8   TX_FRAUD_SCENARIO                    201295 non-null  int64         
 9   TX_DURING_WEEKEND                    201295 non-null  int64         
 

Columns 0-8: simulator data.

Columns 9+: baseline feature transformation.

Columns 11-16 "keep track of the average spending amount and number of transactions for each customer and for three window sizes". For example, CUSTOMER_ID_NB_TX_7DAY_WINDOW is the "number of transactions by the customer in the last 7 days".

Columns 17-22 "characterize the 'risk' associated with the terminal. The risk will be defined as the average number of frauds that were observed on the terminal for three window sizes".
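
The customer window features described above can be illustrated with a small standard-library sketch. The function name `customer_window_features`, the tuple layout, and the choice to count the current transaction in its own window are assumptions made for illustration; the handbook computes these columns with pandas and may differ in details.

```python
from datetime import datetime, timedelta


def customer_window_features(transactions, window_days):
    """Return (nb_tx, avg_amount) for each transaction over a trailing window.

    transactions: list of (tx_datetime, amount) for ONE customer, sorted by time.
    """
    window = timedelta(days=window_days)
    features = []
    for i, (tx_time, _) in enumerate(transactions):
        # Amounts of this customer's transactions in the trailing window,
        # including the current transaction (an assumption of this sketch).
        in_window = [amount for t, amount in transactions[: i + 1]
                     if tx_time - t < window]
        features.append((len(in_window), sum(in_window) / len(in_window)))
    return features


# Hypothetical history for a single customer
history = [
    (datetime(2018, 7, 25, 10), 20.0),
    (datetime(2018, 7, 26, 9), 40.0),
    (datetime(2018, 8, 5, 12), 60.0),
]
features_7d = customer_window_features(history, window_days=7)
print(features_7d)
```

The third transaction falls more than 7 days after the first two, so its window contains only itself.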

In [6]:
output_feature="TX_FRAUD"

input_features=[
       'TX_AMOUNT',
       'TX_DURING_WEEKEND',
       'TX_DURING_NIGHT',
       'CUSTOMER_ID_NB_TX_1DAY_WINDOW',
       'CUSTOMER_ID_AVG_AMOUNT_1DAY_WINDOW',
       'CUSTOMER_ID_NB_TX_7DAY_WINDOW',
       'CUSTOMER_ID_AVG_AMOUNT_7DAY_WINDOW',
       'CUSTOMER_ID_NB_TX_30DAY_WINDOW',
       'CUSTOMER_ID_AVG_AMOUNT_30DAY_WINDOW',
       'TERMINAL_ID_NB_TX_1DAY_WINDOW',
       'TERMINAL_ID_RISK_1DAY_WINDOW',
       'TERMINAL_ID_NB_TX_7DAY_WINDOW',
       'TERMINAL_ID_RISK_7DAY_WINDOW',
       'TERMINAL_ID_NB_TX_30DAY_WINDOW',
       'TERMINAL_ID_RISK_30DAY_WINDOW'
       ]

In [7]:
start_date_training = datetime.datetime.strptime(BEGIN_DATE, "%Y-%m-%d")
delta_train = delta_delay = delta_test = 7

end_date_training = start_date_training+datetime.timedelta(days=delta_train-1)

start_date_test = start_date_training+datetime.timedelta(days=delta_train+delta_delay)
end_date_test = start_date_training+datetime.timedelta(days=delta_train+delta_delay+delta_test-1)


In [8]:
end_date_training

datetime.datetime(2018, 7, 31, 0, 0)
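
The date arithmetic of the split can be checked in isolation. The sketch below repeats the computation from the cell above in a helper; the name `split_boundaries` and the returned dictionary keys are hypothetical. With the notebook's values, the delay week separates the last training day from the first test day, and the last test day coincides with END_DATE.

```python
import datetime


def split_boundaries(begin_date, delta_train=7, delta_delay=7, delta_test=7):
    """Return the first and last day of the training and test periods."""
    start_train = datetime.datetime.strptime(begin_date, "%Y-%m-%d")
    # Periods are inclusive, hence the "- 1" on the end dates.
    end_train = start_train + datetime.timedelta(days=delta_train - 1)
    start_test = start_train + datetime.timedelta(days=delta_train + delta_delay)
    end_test = start_train + datetime.timedelta(
        days=delta_train + delta_delay + delta_test - 1)
    return {"start_train": start_train, "end_train": end_train,
            "start_test": start_test, "end_test": end_test}


bounds = split_boundaries("2018-07-25")
for name, day in bounds.items():
    print(name, day.date())
```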

In [9]:
(train_df, test_df) = get_train_test_set(transactions_df,start_date_training,
                                       delta_train=delta_train,delta_delay=delta_delay,delta_test=delta_test)

Random model

In [21]:
predictions_df=test_df
predictions_df['predictions']=0.5

# AUC ROC - based on sklearn's implementation
#
# Average precision - based on sklearn's implementation
#    
# CardPrecision@k - "takes into account the fact that investigators can only check a maximum of k potentially fraudulent
# cards per day. It is computed by ranking, for every day in the test set, the most fraudulent transactions, and selecting
# the k cards whose transactions have the highest fraud probabilities. The precision (proportion of actual compromised cards
# out of predicted compromised cards) is then computed for each day. The Card Precision top-k is the average of these daily
# precisions. (...) The metric provides values in the interval [0, 1], and the higher value means better performance."
performance_assessment_f1_included(predictions_df, top_k_list=[100])

AUC ROC,Average precision,F1 score,Card Precision@100
0.5,0.007,0.0,0.017


For a random model (here, a constant score of 0.5 for every transaction):
* Average precision - equals the proportion of frauds in the test set (0.7%)
* Card Precision@100 - averaged over the test days, 1.7% of the 100 cards ranked highest each day were indeed compromised
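
The Card Precision@k definition quoted above can be sketched in plain Python. This is a simplified illustration, not the `shared_functions.py` implementation: the function name `card_precision_top_k`, the tuple layout, and dividing by k even when fewer than k cards are seen on a day are assumptions, and any special handling of cards already detected on earlier days is omitted.

```python
from collections import defaultdict


def card_precision_top_k(transactions, k):
    """Average, over days, of the precision among the k highest-scored cards.

    transactions: iterable of (day, card_id, score, is_fraud) tuples.
    """
    per_day = defaultdict(dict)
    for day, card_id, score, is_fraud in transactions:
        # A card's score is its highest transaction score that day; it is
        # compromised if any of its transactions that day is fraudulent.
        best, compromised = per_day[day].get(card_id, (float("-inf"), 0))
        per_day[day][card_id] = (max(best, score), max(compromised, is_fraud))

    daily_precisions = []
    for day in sorted(per_day):
        ranked = sorted(per_day[day].values(), key=lambda c: c[0], reverse=True)
        top_k = ranked[:k]
        daily_precisions.append(sum(flag for _, flag in top_k) / k)
    return sum(daily_precisions) / len(daily_precisions)


# Hypothetical scores for three cards over two days
toy = [
    (0, "a", 0.9, 1), (0, "a", 0.2, 0), (0, "b", 0.8, 0), (0, "c", 0.1, 1),
    (1, "a", 0.3, 0), (1, "b", 0.7, 1), (1, "c", 0.6, 1),
]
cp_at_2 = card_precision_top_k(toy, k=2)
print(cp_at_2)
```

On day 0 only one of the two top-ranked cards is compromised (precision 0.5); on day 1 both are (precision 1.0), so the average is 0.75.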

In [53]:
config = dict(
    dataset_id = 'fraud-detection-handbook-transformed',
    validation = 'train test split',
    begin_date = BEGIN_DATE,
    delta_train = 7,
    delta_delay = 7,
    delta_test = 7,
    random_state = 0,
    max_depth=2,
    scale=False
)
wandb.init(project="mgr-anomaly-tsxai-project", config=config, tags=['baseline', 'decision-tree'])
config = wandb.config


In [54]:
classifier = sklearn.tree.DecisionTreeClassifier(max_depth=config.max_depth, random_state=config.random_state)

model_and_predictions_dictionary = fit_model_and_get_predictions(classifier, train_df, test_df, 
                                                                 input_features, output_feature,
                                                                 scale=config.scale)

In [55]:
model_and_predictions_dictionary

{'classifier': DecisionTreeClassifier(max_depth=2, random_state=0),
 'predictions_test': array([0.00353643, 0.00353643, 0.00353643, ..., 0.00353643, 0.00353643,
        0.00353643]),
 'predictions_train': array([0.00353643, 0.00353643, 0.00353643, ..., 0.00353643, 0.00353643,
        0.00353643]),
 'training_execution_time': 0.11499667167663574,
 'prediction_execution_time': 0.014999866485595703}

In [56]:
wandb.log({'Training execution time': model_and_predictions_dictionary['training_execution_time']})
wandb.log({'Prediction execution time':  model_and_predictions_dictionary['prediction_execution_time']})

In [57]:
predictions_df=test_df
predictions_df['predictions']=model_and_predictions_dictionary['predictions_test']

performance_df = performance_assessment_f1_included(predictions_df, top_k_list=[100])
performance_df

AUC ROC,Average precision,F1 score,Card Precision@100
0.763,0.496,0.64,0.241


In [58]:
wandb.log({'AUC ROC': performance_df.loc[0,'AUC ROC']})
wandb.log({'Average precision': performance_df.loc[0,'Average precision']})
wandb.log({'F1 score': performance_df.loc[0,'F1 score']})
wandb.log({'Card Precision@100': performance_df.loc[0,'Card Precision@100']})

In [51]:
with open('models/baseline/dt_maxdepth2/dt_maxdepth2_model.sav', 'wb') as f:
    pickle.dump(classifier, f)

In [61]:
classifier_artifact = wandb.Artifact('dt_maxdepth2', type='decision_tree', description='trained baseline decision tree with max_depth=2')
classifier_artifact.add_dir('models/baseline/dt_maxdepth2')
wandb.log_artifact(classifier_artifact)
wandb.finish()

wandb: Adding directory to artifact (.\models\baseline\dt_maxdepth2)... Done. 0.0s


Run history:
AUC ROC,▁
Average precision,▁
Card Precision@100,▁
F1 score,▁
Prediction execution time,▁
Training execution time,▁

Run summary:
AUC ROC,0.763
Average precision,0.496
Card Precision@100,0.241
F1 score,0.64
Prediction execution time,0.015
Training execution time,0.115


In [62]:
classifiers_dictionary={'Logistic regression':sklearn.linear_model.LogisticRegression(random_state=0), 
                        'Decision tree with depth of two':sklearn.tree.DecisionTreeClassifier(max_depth=2,random_state=0), 
                        'Decision tree - unlimited depth':sklearn.tree.DecisionTreeClassifier(random_state=0), 
                        'Random forest':sklearn.ensemble.RandomForestClassifier(random_state=0,n_jobs=-1),
                        'XGBoost':xgboost.XGBClassifier(random_state=0,n_jobs=-1),
                       }

fitted_models_and_predictions_dictionary={}

for classifier_name in classifiers_dictionary:
    
    model_and_predictions = fit_model_and_get_predictions(classifiers_dictionary[classifier_name], train_df, test_df,
                                                          input_features=input_features,
                                                          output_feature=output_feature)
    fitted_models_and_predictions_dictionary[classifier_name]=model_and_predictions

In [64]:
df_performances=performance_assessment_model_collection_f1_included(fitted_models_and_predictions_dictionary, test_df, 
                                                        type_set='test', 
                                                        top_k_list=[100])
df_performances

Model,AUC ROC,Average precision,F1 score,Card Precision@100
Logistic regression,0.871,0.606,0.616,0.291
Decision tree with depth of two,0.763,0.496,0.64,0.241
Decision tree - unlimited depth,0.788,0.309,0.553,0.243
Random forest,0.867,0.658,0.67,0.287
XGBoost,0.862,0.639,0.688,0.273


In [65]:
df_execution_times=execution_times_model_collection(fitted_models_and_predictions_dictionary)
df_execution_times

Model,Training execution time (s),Prediction execution time (s)
Logistic regression,0.108001,0.011999
Decision tree with depth of two,0.110353,0.012
Decision tree - unlimited depth,1.050994,0.015058
Random forest,2.162526,0.108001
XGBoost,3.503231,0.062


In [68]:
with open('models/baseline/lr/lr_model.sav', 'wb') as f:
    pickle.dump(classifiers_dictionary['Logistic regression'], f)
with open('models/baseline/dt_maxdepth_unlim/dt_maxdepth_unlim_model.sav', 'wb') as f:
    pickle.dump(classifiers_dictionary['Decision tree - unlimited depth'], f)
with open('models/baseline/rf/rf_model.sav', 'wb') as f:
    pickle.dump(classifiers_dictionary['Random forest'], f)
with open('models/baseline/xgb/xgb_model.sav', 'wb') as f:
    pickle.dump(classifiers_dictionary['XGBoost'], f)


In [70]:
config_lr = dict(
    dataset_id = 'fraud-detection-handbook-transformed',
    validation = 'train test split',
    begin_date = BEGIN_DATE,
    delta_train = 7,
    delta_delay = 7,
    delta_test = 7,
    random_state = 0,
    scale=True
)
wandb.init(project="mgr-anomaly-tsxai-project", config=config_lr, tags=['baseline', 'logistic-regression'])
wandb.log({'Training execution time': df_execution_times.iloc[0]['Training execution time']})
wandb.log({'Prediction execution time':  df_execution_times.iloc[0]['Prediction execution time']})
wandb.log({'AUC ROC': df_performances.iloc[0]['AUC ROC']})
wandb.log({'Average precision': df_performances.iloc[0]['Average precision']})
wandb.log({'F1 score': df_performances.iloc[0]['F1 score']})
wandb.log({'Card Precision@100': df_performances.iloc[0]['Card Precision@100']})

lr_artifact = wandb.Artifact('lr', type='logistic_regression', description='trained baseline logistic regression with default values')
lr_artifact.add_dir('models/baseline/lr')
wandb.log_artifact(lr_artifact)
wandb.finish()

wandb: Adding directory to artifact (.\models\baseline\lr)... Done. 0.0s


Run history:
AUC ROC,▁
Average precision,▁
Card Precision@100,▁
F1 score,▁
Prediction execution time,▁
Training execution time,▁

Run summary:
AUC ROC,0.871
Average precision,0.606
Card Precision@100,0.291
F1 score,0.616
Prediction execution time,0.012
Training execution time,0.108
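
The logging cells for the individual models repeat the same six `wandb.log` calls and select rows by position with `iloc`. One possible way to reduce the repetition is to look each model up by name and build a single metrics dictionary. The helper name `collect_run_metrics` is hypothetical, and the plain dictionaries stand in for what `df_performances.to_dict(orient="index")` and `df_execution_times.to_dict(orient="index")` would return.

```python
def collect_run_metrics(model_name, performances, execution_times):
    """Merge one model's timing and performance rows into a single flat dict.

    performances / execution_times: {model_name: {metric_name: value}}.
    """
    metrics = dict(execution_times[model_name])
    metrics.update(performances[model_name])
    return metrics


# Values copied from the tables above for the logistic regression row
performances = {
    "Logistic regression": {"AUC ROC": 0.871, "Average precision": 0.606,
                            "F1 score": 0.616, "Card Precision@100": 0.291},
}
execution_times = {
    "Logistic regression": {"Training execution time": 0.108001,
                            "Prediction execution time": 0.011999},
}

lr_metrics = collect_run_metrics("Logistic regression", performances, execution_times)
print(lr_metrics)
# In the notebook, this dictionary could be passed to one wandb.log(lr_metrics) call.
```

Looking rows up by model name rather than by position guards against logging the wrong row if the order of `classifiers_dictionary` changes.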


In [71]:
config_dt = dict(
    dataset_id = 'fraud-detection-handbook-transformed',
    validation = 'train test split',
    begin_date = BEGIN_DATE,
    delta_train = 7,
    delta_delay = 7,
    delta_test = 7,
    random_state = 0,
    max_depth=None,
    scale=True
)
wandb.init(project="mgr-anomaly-tsxai-project", config=config_dt, tags=['baseline', 'decision-tree'])
wandb.log({'Training execution time': df_execution_times.iloc[2]['Training execution time']})
wandb.log({'Prediction execution time':  df_execution_times.iloc[2]['Prediction execution time']})
wandb.log({'AUC ROC': df_performances.iloc[2]['AUC ROC']})
wandb.log({'Average precision': df_performances.iloc[2]['Average precision']})
wandb.log({'F1 score': df_performances.iloc[2]['F1 score']})
wandb.log({'Card Precision@100': df_performances.iloc[2]['Card Precision@100']})

dt_artifact = wandb.Artifact('dt_maxdepth_unlim', type='decision_tree', description='trained baseline decision tree with default values (max depth unlimited)')
dt_artifact.add_dir('models/baseline/dt_maxdepth_unlim')
wandb.log_artifact(dt_artifact)
wandb.finish()

wandb: Adding directory to artifact (.\models\baseline\dt_maxdepth_unlim)... Done. 0.0s


Run history:
AUC ROC,▁
Average precision,▁
Card Precision@100,▁
F1 score,▁
Prediction execution time,▁
Training execution time,▁

Run summary:
AUC ROC,0.788
Average precision,0.309
Card Precision@100,0.243
F1 score,0.553
Prediction execution time,0.01506
Training execution time,1.05099


In [72]:
config_rf = dict(
    dataset_id = 'fraud-detection-handbook-transformed',
    validation = 'train test split',
    begin_date = BEGIN_DATE,
    delta_train = 7,
    delta_delay = 7,
    delta_test = 7,
    random_state = 0,
    n_jobs=-1,
    scale=True
)
wandb.init(project="mgr-anomaly-tsxai-project", config=config_rf, tags=['baseline', 'random-forest'])
wandb.log({'Training execution time': df_execution_times.iloc[3]['Training execution time']})
wandb.log({'Prediction execution time':  df_execution_times.iloc[3]['Prediction execution time']})
wandb.log({'AUC ROC': df_performances.iloc[3]['AUC ROC']})
wandb.log({'Average precision': df_performances.iloc[3]['Average precision']})
wandb.log({'F1 score': df_performances.iloc[3]['F1 score']})
wandb.log({'Card Precision@100': df_performances.iloc[3]['Card Precision@100']})

rf_artifact = wandb.Artifact('rf', type='random_forest', description='trained baseline random forest')
rf_artifact.add_dir('models/baseline/rf')
wandb.log_artifact(rf_artifact)
wandb.finish()

wandb: Adding directory to artifact (.\models\baseline\rf)... Done. 0.0s


Run history:
AUC ROC,▁
Average precision,▁
Card Precision@100,▁
F1 score,▁
Prediction execution time,▁
Training execution time,▁

Run summary:
AUC ROC,0.867
Average precision,0.658
Card Precision@100,0.287
F1 score,0.67
Prediction execution time,0.108
Training execution time,2.16253


In [73]:
config_xgb = dict(
    dataset_id = 'fraud-detection-handbook-transformed',
    validation = 'train test split',
    begin_date = BEGIN_DATE,
    delta_train = 7,
    delta_delay = 7,
    delta_test = 7,
    random_state = 0,
    n_jobs=-1,
    scale=True
)
wandb.init(project="mgr-anomaly-tsxai-project", config=config_xgb, tags=['baseline', 'xgboost'])
wandb.log({'Training execution time': df_execution_times.iloc[4]['Training execution time']})
wandb.log({'Prediction execution time':  df_execution_times.iloc[4]['Prediction execution time']})
wandb.log({'AUC ROC': df_performances.iloc[4]['AUC ROC']})
wandb.log({'Average precision': df_performances.iloc[4]['Average precision']})
wandb.log({'F1 score': df_performances.iloc[4]['F1 score']})
wandb.log({'Card Precision@100': df_performances.iloc[4]['Card Precision@100']})

xgb_artifact = wandb.Artifact('xgb', type='xgboost', description='trained baseline xgboost')
xgb_artifact.add_dir('models/baseline/xgb')
wandb.log_artifact(xgb_artifact)
wandb.finish()

wandb: Adding directory to artifact (.\models\baseline\xgb)... Done. 0.0s


Run history:
AUC ROC,▁
Average precision,▁
Card Precision@100,▁
F1 score,▁
Prediction execution time,▁
Training execution time,▁

Run summary:
AUC ROC,0.862
Average precision,0.639
Card Precision@100,0.273
F1 score,0.688
Prediction execution time,0.062
Training execution time,3.50323


In [74]:
if torch.cuda.is_available():
    DEVICE = "cuda" 
else:
    DEVICE = "cpu"
print("Selected device is",DEVICE)

Selected device is cuda


In [75]:
config_mlp = dict(
    dataset_id = 'fraud-detection-handbook-transformed',
    validation = 'train test split',
    seed = 42,
    begin_date = '2018-07-25',
    delta_train = 7,
    delta_delay = 7,
    delta_test = 7,
    batch_size=64,
    num_workers=0,
    hidden_size = 1000,
    lr=0.07,
    max_epochs=25,
    scale=False,
    criterion='bce'
)
wandb.init(project="mgr-anomaly-tsxai-project", config=config_mlp, tags=['baseline', 'mlp'])
config_mlp = wandb.config


In [76]:
# SEED = 42
seed_everything(config_mlp.seed)

In [77]:
x_train = torch.FloatTensor(train_df[input_features].values)
x_test = torch.FloatTensor(test_df[input_features].values)
y_train = torch.FloatTensor(train_df[output_feature].values)
y_test = torch.FloatTensor(test_df[output_feature].values)

In [79]:
train_loader_params = {'batch_size': config_mlp.batch_size,
          'shuffle': True,
          'num_workers': config_mlp.num_workers}
test_loader_params = {'batch_size': config_mlp.batch_size,
          'num_workers': config_mlp.num_workers}

training_set = FraudDataset(x_train.to(DEVICE), y_train.to(DEVICE))

testing_set = FraudDataset(x_test.to(DEVICE), y_test.to(DEVICE))


training_generator = torch.utils.data.DataLoader(training_set, **train_loader_params)
testing_generator = torch.utils.data.DataLoader(testing_set, **test_loader_params)

In [81]:
model = SimpleFraudMLP(len(input_features), config_mlp.hidden_size).to(DEVICE)

In [82]:
criterion = torch.nn.BCELoss().to(DEVICE)
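
For reference, the quantity that `torch.nn.BCELoss` computes with its default mean reduction can be reproduced by hand. The sketch below is a plain-Python illustration under that assumption; the function name `binary_cross_entropy` and the toy probabilities are made up for the example.

```python
import math


def binary_cross_entropy(probabilities, targets):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for p, y in zip(probabilities, targets)]
    return sum(losses) / len(losses)


# One fraudulent (1.0) and one genuine (0.0) transaction
targets = [1.0, 0.0]
confident_and_right = binary_cross_entropy([0.99, 0.01], targets)
confident_and_wrong = binary_cross_entropy([0.01, 0.99], targets)
print(confident_and_right, confident_and_wrong)
```

Confident wrong predictions are penalized far more heavily than confident correct ones, which is why the loss values logged during training are small once the model assigns low probabilities to the large majority of genuine transactions.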

In [83]:
model.eval()

SimpleFraudMLP(
  (fc1): Linear(in_features=15, out_features=1000, bias=True)
  (relu): ReLU()
  (fc2): Linear(in_features=1000, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)

In [84]:
optimizer = torch.optim.SGD(model.parameters(), lr = config_mlp.lr)

In [85]:
model.train()

wandb.watch(model, criterion, log='all', log_freq=100)

start_time=time.time()
epochs_train_losses = []
epochs_test_losses = []
for epoch in range(config_mlp.max_epochs):
    model.train()
    train_loss=[]
    for x_batch, y_batch in training_generator:
        # set the gradients to zero before starting to do backprop as by default PyTorch
        # accumulates gradients on subsequent backward passes (i.e. for every mini-batch
        # in the training phase)
        optimizer.zero_grad()
        y_pred = model(x_batch)
        loss = criterion(y_pred.squeeze(), y_batch)
        # compute gradient for every trainable parameter in a model
        loss.backward()
        # update value of every trainable parameter
        optimizer.step()
        train_loss.append(loss.item())
    
    epochs_train_losses.append(np.mean(train_loss))
    print('Epoch {}: train loss: {}'.format(epoch, np.mean(train_loss)))
    
    val_loss = evaluate_model_no_grad(model,testing_generator,criterion)    
    epochs_test_losses.append(val_loss)
    print('test loss: {}'.format(val_loss))   
    print("")

    wandb.log({'train loss': np.mean(train_loss), 'val loss': val_loss}, step=epoch)
    
training_execution_time=time.time()-start_time

Epoch 0: train loss: 0.03482447893932093
test loss: 0.02212812714421821

Epoch 1: train loss: 0.026304942334636153
test loss: 0.020956096490620177

Epoch 2: train loss: 0.02477469021703899
test loss: 0.020735017180904893

Epoch 3: train loss: 0.024022467529443717
test loss: 0.021015695675631982

Epoch 4: train loss: 0.023468064743609875
test loss: 0.020646726651373806

Epoch 5: train loss: 0.023047099678824123
test loss: 0.019709601263545297

Epoch 6: train loss: 0.022711212656856803
test loss: 0.01983951722781872

Epoch 7: train loss: 0.022619226127091996
test loss: 0.019601698620699787

Epoch 8: train loss: 0.02191990295894263
test loss: 0.020056967769849648

Epoch 9: train loss: 0.021763283022316962
test loss: 0.01953357877695379

Epoch 10: train loss: 0.021666503598763796
test loss: 0.02038441432545904

Epoch 11: train loss: 0.021089430717878988
test loss: 0.020437601461248015

Epoch 12: train loss: 0.020934559022927998
test loss: 0.019799922216856395

Epoch 13: train loss: 0.02095

In [86]:
wandb.log({'Training execution time': training_execution_time})
training_execution_time

529.2248146533966

In [87]:
start_time=time.time()
# no need to set model in eval mode since there are no BN, Dropout layers
# no_grad() skips building the autograd graph, which is not needed for inference
with torch.no_grad():
    predictions_test = model(x_test.to(DEVICE))
prediction_execution_time=time.time()-start_time
wandb.log({'Prediction execution time': prediction_execution_time})

In [89]:
predictions_df=test_df
predictions_df['predictions']=predictions_test.detach().cpu().numpy()
    
performance_df = performance_assessment_f1_included(predictions_df, top_k_list=[100])
performance_df

AUC ROC,Average precision,F1 score,Card Precision@100
0.872,0.624,0.673,0.28


In [None]:
torch.save(model.state_dict(), 'models/baseline/mlp/simple_mlp_model.pt')

In [90]:
wandb.log({'AUC ROC': performance_df.loc[0,'AUC ROC']})
wandb.log({'Average precision': performance_df.loc[0,'Average precision']})
wandb.log({'F1 score': performance_df.loc[0,'F1 score']})
wandb.log({'Card Precision@100': performance_df.loc[0,'Card Precision@100']})

mlp_artifact = wandb.Artifact('simple_mlp', type='mlp', description='trained baseline simple multilayer perceptron with 1 hidden layer')
mlp_artifact.add_dir('models/baseline/mlp')
wandb.log_artifact(mlp_artifact)
wandb.finish()

wandb: Adding directory to artifact (.\models\baseline\mlp)... Done. 0.0s


Run history:
AUC ROC,▁
Average precision,▁
Card Precision@100,▁
F1 score,▁
Prediction execution time,▁
Training execution time,▁
train loss,█▄▃▃▃▃▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁
val loss,█▅▅▅▅▂▃▂▃▂▄▄▃▂▄▅▂▃▄▄▁▁▂▁▃

Run summary:
AUC ROC,0.872
Average precision,0.624
Card Precision@100,0.28
F1 score,0.673
Prediction execution time,0.011
Training execution time,529.22481
train loss,0.01945
val loss,0.02019


In [89]:
# testing if saved parameters above can be used to restore
# the model and run inference

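model = SimpleFraudMLP(len(input_features), 1000).to(DEVICE)
# map_location keeps loading working even if the weights were saved on another device
model.load_state_dict(torch.load('models/baseline/mlp/simple_mlp_model.pt', map_location=DEVICE))
model.eval()
with torch.no_grad():
    predictions_test = model(x_test.to(DEVICE))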
predictions_df=test_df
predictions_df['predictions']=predictions_test.detach().cpu().numpy()
    
performance_assessment_f1_included(predictions_df, top_k_list=[100])

AUC ROC,Average precision,F1 score,Card Precision@100
0.872,0.624,0.673,0.28
