# Credit Card Fraud Detection

```{article-info}
:avatar: https://avatars.githubusercontent.com/u/25820201?v=4
:avatar-link: https://github.com/PhotonicGluon/
:author: "[Ryan Kan](https://github.com/PhotonicGluon/)"
:date: "Jul 1, 2024"
:read-time: "{sub-ref}`wordcount-minutes` min read"
```

*This notebook is largely inspired by the Keras code example [Imbalanced classification: credit card fraud detection](https://keras.io/examples/structured_data/imbalanced_classification/) by [fchollet](https://twitter.com/fchollet).*

<center>
    <img alt="Credit Cards" style="width: 75%" src="https://storage.googleapis.com/kaggle-datasets-images/310/684/3503c6c827ca269cc00ffa66f2a9c207/dataset-cover.jpg">
</center>

It is essential that credit card companies can detect fraudulent transactions so that customers are not charged for items that they did not buy. This example uses the [Kaggle Credit Card Fraud Detection](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud) dataset to demonstrate how to train a classification model on data with highly imbalanced classes.

:::{note}
We will use the `jax` backend for faster execution of the code. Feel free to ignore the cell below.
:::

```python
import os
os.environ["KERAS_BACKEND"] = "jax"
```

## Preparing the Data

### Loading the Data

The dataset we will be using is the [Kaggle Credit Card Fraud Detection](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud) dataset. To access it, you will need a Kaggle account.

```{button-link} https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
:color: primary
:shadow:

Download Data
```

The dataset contains credit card transactions made by European cardholders over two days in September 2013. Only 492 of the 284,807 transactions are fraudulent, so the dataset is highly imbalanced: fraudulent transactions account for just 0.172% of the total. Despite this class imbalance, we will try to build a model that detects fraud.
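
This imbalance is why plain accuracy is a poor yardstick here. As a quick sanity check, the following sketch uses only the counts quoted above. A trivial classifier that labels every transaction as legitimate scores almost perfectly on accuracy while catching no fraud at all:

```python
# Counts quoted above for the full dataset
num_transactions = 284_807
num_frauds = 492

# A trivial "classifier" that predicts every transaction is legitimate (class 0)
true_negatives = num_transactions - num_frauds  # every legitimate transaction is "correct"
true_positives = 0                              # no fraud is ever flagged

accuracy = (true_negatives + true_positives) / num_transactions
recall = true_positives / num_frauds

print(f"Accuracy: {accuracy:.2%}")  # Accuracy: 99.83%
print(f"Recall:   {recall:.2%}")    # Recall:   0.00%
```

This is why the rest of the notebook tracks confusion-matrix counts, precision and recall instead of accuracy.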

All input variables are numerical. Apart from `Time` and `Amount`, they are the principal components `V1` to `V28` obtained from a [Principal Component Analysis (PCA)](https://en.wikipedia.org/wiki/Principal_component_analysis) transformation. The original features they were derived from are not provided. The two untransformed columns are:
- `Time`, the number of seconds elapsed between each transaction and the first transaction in the dataset.
- `Amount`, the transaction amount.

Our aim is to predict the `Class` label, where `1` reflects a fraudulent transaction and `0` otherwise.

The dataset is saved as `creditcard.csv` in the `data` folder. We first vectorize the data by parsing each row into a list of feature values and a class label.

```python
FILE_NAME = "data/creditcard.csv"
```

```python
import numpy as np

all_features = []
all_targets = []

with open(FILE_NAME) as f:
    for i, line in enumerate(f):
        # Skip the first line, which is the header
        if i == 0:
            print("HEADER:", line.strip())
            continue

        # Split the row into fields and strip the quotes around values
        fields = line.strip().split(",")
        all_features.append([float(v.replace('"', "")) for v in fields[:-1]])
        all_targets.append([int(fields[-1].replace('"', ""))])

        # Print the first data row as an example of the features
        if i == 1:
            print("EXAMPLE FEATURES:", all_features[-1])

features = np.array(all_features, dtype="float32")
targets = np.array(all_targets, dtype="uint8")
print("Shape of features:", features.shape)
print("Shape of targets: ", targets.shape)
```

```
HEADER: "Time","V1","V2","V3","V4","V5","V6","V7","V8","V9","V10","V11","V12","V13","V14","V15","V16","V17","V18","V19","V20","V21","V22","V23","V24","V25","V26","V27","V28","Amount","Class"
EXAMPLE FEATURES: [0.0, -1.3598071336738, -0.0727811733098497, 2.53634673796914, 1.37815522427443, -0.338320769942518, 0.462387777762292, 0.239598554061257, 0.0986979012610507, 0.363786969611213, 0.0907941719789316, -0.551599533260813, -0.617800855762348, -0.991389847235408, -0.311169353699879, 1.46817697209427, -0.470400525259478, 0.207971241929242, 0.0257905801985591, 0.403992960255733, 0.251412098239705, -0.018306777944153, 0.277837575558899, -0.110473910188767, 0.0669280749146731, 0.128539358273528, -0.189114843888824, 0.133558376740387, -0.0210530534538215, 149.62]
Shape of features: (284807, 30)
Shape of targets:  (284807, 1)
```
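
The loop above strips the double quotes by hand. As an alternative sketch (not the approach this notebook uses), Python's built-in `csv` module can handle the quoted fields itself. The helper name `load_features_and_targets` is hypothetical. The example runs on a tiny, made-up in-memory sample with the same layout as the real file (quoted header, numeric features, quoted label in the last column) but only two feature columns plus `Amount`:

```python
import csv
import io

import numpy as np


def load_features_and_targets(file_obj):
    """Parse a CSV whose last column is the class label; csv handles the quotes."""
    reader = csv.reader(file_obj)
    header = next(reader)  # first row holds the column names
    rows = list(reader)
    features = np.array([[float(v) for v in row[:-1]] for row in rows], dtype="float32")
    targets = np.array([[int(row[-1])] for row in rows], dtype="uint8")
    return header, features, targets


# Made-up two-row sample with a quoted header and quoted labels
sample = io.StringIO(
    '"Time","V1","Amount","Class"\n'
    '0,-1.36,149.62,"0"\n'
    '1,1.19,2.69,"1"\n'
)
header, x, y = load_features_and_targets(sample)
print(header)   # ['Time', 'V1', 'Amount', 'Class']
print(x.shape)  # (2, 3)
print(y[:, 0])  # [0 1]
```

On the real file, calling it with `open(FILE_NAME, newline="")` should produce the same arrays as the loop above.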


### Preprocessing the Data

First, we split the data into training and validation sets. `VAL_SPLIT` sets the fraction of samples held out for validation. The split is not shuffled, so the validation set is simply the last 20% of the rows.

```python
VAL_SPLIT = 0.2
```

```python
num_val_samples = int(len(features) * VAL_SPLIT)
train_features = features[:-num_val_samples]
train_targets = targets[:-num_val_samples]
val_features = features[-num_val_samples:]
val_targets = targets[-num_val_samples:]

print("Number of training samples:", len(train_features))
print("Number of validation samples:", len(val_features))
```

```
Number of training samples: 227846
Number of validation samples: 56961
```
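
The slicing above can be wrapped in a small reusable helper. This is a sketch, and the name `tail_split` is hypothetical. It adds a guard the original cell does not have: if `int()` truncates the validation size to zero, `x[:-0]` evaluates to `x[:0]`, which would silently produce an empty training set.

```python
import numpy as np


def tail_split(x, y, val_split):
    """Hold out the last `val_split` fraction of rows for validation, without shuffling."""
    num_val = int(len(x) * val_split)  # int() truncates, e.g. 56961.4 -> 56961
    if num_val == 0:
        # x[:-0] would be x[:0], i.e. an empty training set
        raise ValueError("val_split is too small for this many samples")
    return (x[:-num_val], y[:-num_val]), (x[-num_val:], y[-num_val:])


# Toy check on 10 rows: the last 2 rows become the validation set
x = np.arange(10).reshape(-1, 1)
y = (np.arange(10) % 2).reshape(-1, 1)
(train_x, train_y), (val_x, val_y) = tail_split(x, y, 0.2)
print(len(train_x), len(val_x))  # 8 2
print(val_x[:, 0])               # [8 9]
```

With the full dataset, `int(284807 * 0.2)` is 56961, which matches the 227,846 / 56,961 split printed above.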


Let's now check how severe the class imbalance is in the training dataset.

```python
counts = np.bincount(train_targets[:, 0])
print(
    f"Number of fraudulent samples in training data: {counts[1]} ({100 * float(counts[1]) / len(train_targets):.2f}% of total)"
)
```

```
Number of fraudulent samples in training data: 417 (0.18% of total)
```


We weight each class by the inverse of its count in the training dataset. This way, the rare fraudulent class carries the same total weight in the loss as the abundant normal class.

```python
weight_for_0 = 1.0 / counts[0]
weight_for_1 = 1.0 / counts[1]

print("Weight for normal transactions:     ", weight_for_0)
print("Weight for fraudulent transactions: ", weight_for_1)
```

```
Weight for normal transactions:      4.396976638863118e-06
Weight for fraudulent transactions:  0.002398081534772182
```
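
Inverse-count weighting gives each class the same total weight, because summing `1 / count` over all samples of a class gives exactly 1. The sketch below checks this using the training counts: 227,429 normal samples (227,846 minus 417) and 417 fraudulent ones. For comparison, it also computes the rescaled variant `n_samples / (n_classes * count)`, which is the formula scikit-learn uses for `class_weight="balanced"`. Both schemes give the same ratio of roughly 545 between the two classes. The rescaled version simply keeps the weights, and therefore the weighted loss, closer to the scale of an unweighted loss.

```python
import numpy as np

# Training-set class counts (normal, fraudulent) from the outputs above
counts = np.array([227_429, 417])
n_samples = counts.sum()  # 227,846
n_classes = len(counts)

# Inverse-count weights, as used in this notebook
inverse_weights = 1.0 / counts

# Rescaled "balanced" weights: n_samples / (n_classes * count)
balanced_weights = n_samples / (n_classes * counts)

# Under both schemes, each class contributes the same total weight (count * weight)
print(counts * inverse_weights)   # [1. 1.]
print(counts * balanced_weights)  # [113923. 113923.]

# The ratio between the two classes is identical
print(inverse_weights[1] / inverse_weights[0])    # ~545.39
print(balanced_weights[1] / balanced_weights[0])  # ~545.39
```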


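Lastly, we standardize each feature to zero mean and unit variance. The mean and standard deviation are computed on the training data only, so that no information from the validation set leaks into the preprocessing.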

```python
mean = np.mean(train_features, axis=0)
std = np.std(train_features, axis=0)

train_features = (train_features - mean) / std
val_features = (val_features - mean) / std
```
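
Because the statistics come from the training set, the standardized training features have exactly zero mean and unit variance per column, while the validation features only approximately do. Here is a minimal sketch of the same step as a reusable function, checked on synthetic data. The function name `standardize` and the `eps` floor on the standard deviation are additions not present in the cell above. The floor guards against division by zero should a column ever be constant.

```python
import numpy as np


def standardize(train, other, eps=1e-8):
    """Scale both arrays using the per-column mean and std of `train` only."""
    mean = train.mean(axis=0)
    std = np.maximum(train.std(axis=0), eps)  # avoid dividing by zero for constant columns
    return (train - mean) / std, (other - mean) / std


# Synthetic data standing in for the real features
rng = np.random.default_rng(0)
train = rng.normal(loc=100.0, scale=20.0, size=(1000, 3))
val = rng.normal(loc=100.0, scale=20.0, size=(200, 3))

train_std, val_std = standardize(train, val)
print(train_std.mean(axis=0).round(6))  # zero mean per column, up to rounding
print(train_std.std(axis=0).round(6))   # unit standard deviation per column
```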

## Creating the Model

The model architecture we use here is nothing special. It is a standard fully-connected network with a classification head at the end. The hidden layers are `DenseMML` layers, while the classification head is a standard `Dense` layer with a sigmoid activation that outputs the predicted probability of fraud. `Dropout` layers are added to reduce overfitting.

```python
import keras
import keras_mml


model = keras.Sequential(
    [
        keras.Input(shape=train_features.shape[1:]),
        keras_mml.layers.DenseMML(256, activation="relu"),
        keras_mml.layers.DenseMML(256, activation="relu"),
        keras.layers.Dropout(0.3),
        keras_mml.layers.DenseMML(256, activation="relu"),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(1, activation="sigmoid"),
    ]
)
model.summary()
```

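
The `model.summary()` table is not reproduced here, so a back-of-the-envelope count can help gauge the model's size. The sketch below counts the trainable parameters of a *hypothetical* all-`Dense` version of the same architecture. Each layer has a weight matrix plus a bias vector, and `Dropout` has no parameters. The count says nothing about how `DenseMML` stores its weights, which may differ.

```python
def dense_params(n_in, n_out, use_bias=True):
    """Parameters of a fully-connected layer: an n_in x n_out weight matrix plus a bias."""
    return n_in * n_out + (n_out if use_bias else 0)


# Layer widths of the model above: 30 inputs -> 256 -> 256 -> 256 -> 1
widths = [30, 256, 256, 256, 1]
per_layer = [dense_params(a, b) for a, b in zip(widths[:-1], widths[1:])]

print(per_layer)       # [7936, 65792, 65792, 257]
print(sum(per_layer))  # 139777
```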


We compile the model to minimise the binary cross-entropy loss (`binary_crossentropy`) using the Adam optimizer. For our metrics, we will monitor

- the number of false negatives;
- the number of false positives;
- the number of true negatives;
- the number of true positives;
- the precision of the model, which is given by $$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$$ where $\mathrm{TP}$ is the number of true positives and $\mathrm{FP}$ is the number of false positives; and
- the recall of the model, which is given by $$\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$ where $\mathrm{TP}$ is the number of true positives and $\mathrm{FN}$ is the number of false negatives.
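
To make these definitions concrete, the sketch below counts TP, FP, TN and FN from true labels and predicted probabilities, using a 0.5 decision threshold (the default threshold of the Keras confusion-matrix metrics). It then applies the two formulas. The helper names are hypothetical. The last lines plug in the validation counts reported at the end of this notebook (67 TP, 580 FP, 8 FN).

```python
import numpy as np


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) for binary labels and predicted probabilities."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_prob) > threshold  # predicted positive above the threshold
    tp = int(np.sum(y_pred & y_true))
    fp = int(np.sum(y_pred & ~y_true))
    tn = int(np.sum(~y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))
    return tp, fp, tn, fn


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN), with 0 for empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy example: 2 frauds followed by 3 legitimate transactions
print(confusion_counts([1, 1, 0, 0, 0], [0.9, 0.2, 0.7, 0.1, 0.4]))  # (1, 1, 2, 1)

# Validation counts reported at the end of the notebook
precision, recall = precision_recall(tp=67, fp=580, fn=8)
print(f"{precision:.3%} {recall:.3%}")  # 10.355% 89.333%
```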

```python
model.compile(
    loss="binary_crossentropy",
    optimizer="adam",
    metrics=[
        keras.metrics.FalseNegatives(name="fn"),
        keras.metrics.FalsePositives(name="fp"),
        keras.metrics.TrueNegatives(name="tn"),
        keras.metrics.TruePositives(name="tp"),
        keras.metrics.Precision(name="precision"),
        keras.metrics.Recall(name="recall"),
    ]
)
```

We will weight the classes based on the `weight_for_0` and `weight_for_1` calculated in the previous section.

```python
class_weight = {0: weight_for_0, 1: weight_for_1}
```
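
Roughly speaking, Keras turns `class_weight` into a per-sample weight. Each sample's loss term is multiplied by the weight of its class before the batch average is taken. The NumPy sketch below illustrates that idea for binary cross-entropy. It is a simplified model of the computation, not Keras's actual implementation, and the function name is hypothetical. One visible consequence is that, with weights on the order of 1e-6 to 1e-3, the weighted training loss in the training log is tiny (around 1e-6). The validation loss, by contrast, is computed without class weights and is on a much larger scale.

```python
import numpy as np


def weighted_binary_crossentropy(y_true, y_prob, class_weight, eps=1e-7):
    """Mean of per-sample BCE terms, each scaled by the weight of its true class."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    per_sample = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    weights = np.where(y_true == 1, class_weight[1], class_weight[0])
    return float(np.mean(weights * per_sample))


y_true = [0, 0, 0, 1]
y_prob = [0.1, 0.2, 0.1, 0.6]

# Equal weights of 1 reproduce the ordinary (unweighted) mean BCE
unweighted = weighted_binary_crossentropy(y_true, y_prob, {0: 1.0, 1: 1.0})
# Weights of the same magnitude as weight_for_0 and weight_for_1 shrink the loss
tiny = weighted_binary_crossentropy(y_true, y_prob, {0: 4.4e-06, 1: 2.4e-03})
print(unweighted, tiny)
```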

With all these defined, we can start the model training!

```python
model.fit(
    train_features,
    train_targets,
    batch_size=2048,
    epochs=30,
    validation_data=(val_features, val_targets),
    class_weight=class_weight,
)
```

```
Epoch 1/30
112/112 ━━━━━━━━━━━━━━━━━━━━ 7s 48ms/step - fn: 50.9911 - fp: 37535.1875 - loss: 5.3293e-06 - precision: 0.0041 - recall: 0.7579 - tn: 78943.3047 - tp: 161.3186 - val_fn: 8.0000 - val_fp: 7554.0000 - val_loss: 0.4601 - val_precision: 0.0088 - val_recall: 0.8933 - val_tn: 49332.0000 - val_tp: 67.0000
Epoch 2/30
112/112 ━━━━━━━━━━━━━━━━━━━━ 3s 18ms/step - fn: 30.9469 - fp: 9665.1680 - loss: 3.0677e-06 - precision: 0.0167 - recall: 0.8398 - tn: 106811.8438 - tp: 182.8407 - val_fn: 10.0000 - val_fp: 1416.0000 - val_loss: 0.2439 - val_precision: 0.0439 - val_recall: 0.8667 - val_tn: 55470.0000 - val_tp: 65.0000
Epoch 3/30
112/112 ━━━━━━━━━━━━━━━━━━━━ 2s 18ms/step - fn: 28.1947 - fp: 4317.9468 - loss: 2.2399e-06 - precision: 0.0414 - recall: 0.8734 - tn: 112158.9062 - tp: 185.7522 - val_fn: 11.0000 - val_fp: 1057.0000 - val_loss: 0.1491 - val_precision: 0.0571 - val_recall: 0.8533
```


With the model trained, how does it do on the validation dataset?

```python
# `evaluate` returns the loss followed by the metrics, in the order passed to `compile`
val_loss, val_fn, val_fp, val_tn, val_tp, val_precision, val_recall = model.evaluate(val_features, val_targets)

print("--- Validation Statistics ---")
print("Loss:           ", val_loss)
print("False Negatives:", int(val_fn))
print("False Positives:", int(val_fp))
print("True Negatives: ", int(val_tn))
print("True Positives: ", int(val_tp))
print("Precision:      ", f"{val_precision * 100:.3f}%")
print("Recall:         ", f"{val_recall * 100:.3f}%")
```

```
1781/1781 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - fn: 3.1655 - fp: 298.5135 - loss: 0.0364 - precision: 0.1158 - recall: 0.9160 - tn: 28185.0996 - tp: 41.1689
--- Validation Statistics ---
Loss:            0.03248622640967369
False Negatives: 8
False Positives: 580
True Negatives:  56306
True Positives:  67
Precision:       10.355%
Recall:          89.333%
```


## Conclusion

At the end of training, out of the 56,961 validation transactions, we
- correctly identify 67 as fraudulent;
- miss 8 fraudulent transactions; and
- incorrectly flag 580 legitimate transactions as fraudulent.

In practice, one would put an even higher weight on class 1 (i.e., the fraudulent class) to reflect that false negatives are more costly than false positives.
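
One way to express this is to multiply the fraud weight by an extra cost factor on top of the inverse-count weights. The helper `make_class_weight` and the factor of 5 below are hypothetical and chosen only for illustration. In a real setting, the factor would have to come from the actual costs of missed fraud versus false alarms. A larger factor would be expected to trade more false positives for fewer false negatives.

```python
import numpy as np


def make_class_weight(train_targets, fraud_cost_factor=1.0):
    """Inverse-count class weights, with class 1 (fraud) scaled by an extra cost factor."""
    counts = np.bincount(np.asarray(train_targets).ravel(), minlength=2)
    return {0: float(1.0 / counts[0]), 1: float(fraud_cost_factor / counts[1])}


# Toy labels: 8 legitimate, 2 fraudulent
toy_targets = np.array([[0]] * 8 + [[1]] * 2)

print(make_class_weight(toy_targets))                         # {0: 0.125, 1: 0.5}
print(make_class_weight(toy_targets, fraud_cost_factor=5.0))  # {0: 0.125, 1: 2.5}
```

The resulting dictionary has the same shape as `class_weight` above, so it can be passed to `model.fit` in its place.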

Regardless, this example shows how `DenseMML` can be used as a replacement for `Dense` layers in classification models, even if the classes present are imbalanced.