# Build Fraud Detection App using OpenAI

## Motivation

- Thought experiment
  - GPT getting a lot of airtime
  - Could AI be used beyond just chat?
- Investigate fraud detection
  - Victim of fraud
  - Simple low-effort evaluations against some existing technologies (scikit-learn, Apache Spark)
- Found the following quote:
  > *Building AI models from scratch is difficult and time-consuming, but with GPT-3, even a 10 year child can create well performing Deep Learning models.* Source: [Blotout experimenting with Open AI](https://blotout.io/blog/open-ai)
  - Challenge accepted!

## Fraud dataset selection

We need some data for our use case. We can find actual credit card data on [Kaggle](https://www.kaggle.com/mlg-ulb/creditcardfraud). If you don’t have an account at Kaggle, [create one](https://www.kaggle.com/account/login?phase=startRegisterTab) and download the **creditcard.csv** file. The Kaggle website states that this file is 143.84 MB in size.

The data are anonymised credit card transactions containing genuine and fraudulent cases.

The transactions occurred over two days during September 2013, and the dataset includes a total of 284,807 transactions, of which 492 are fraudulent, representing just 0.172% of the total.

This dataset, therefore, presents some challenges for analysis as it is highly unbalanced. There is a good article called [Imbalanced Classification with the Fraudulent Credit Card Transactions Dataset](https://machinelearningmastery.com/imbalanced-classification-with-the-fraudulent-credit-card-transactions-dataset/) by Jason Brownlee.
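
To see why the imbalance matters, the short calculation below uses only the counts quoted above. It suggests that a classifier labelling every transaction as genuine would already score about 99.83% accuracy while catching no fraud at all, which is why accuracy alone is a poor guide for this dataset. The variable names are illustrative.

```python
# Counts quoted in the dataset description
total_transactions = 284_807
fraud_transactions = 492

fraud_rate = fraud_transactions / total_transactions

# A trivial "model" that predicts class 0 (genuine) for every row
baseline_accuracy = (total_transactions - fraud_transactions) / total_transactions
baseline_recall = 0 / fraud_transactions  # no fraudulent case is ever flagged

print(f"Fraud rate:        {fraud_rate:.4%}")
print(f"Baseline accuracy: {baseline_accuracy:.2%}")
print(f"Baseline recall:   {baseline_recall:.2%}")
```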

The dataset consists of the following fields:

- **Time:** The number of seconds elapsed between a transaction and the first transaction in the dataset
- **V1 to V28:** Details not available due to confidentiality reasons
- **Amount:** The monetary value of the transaction
- **Class:** The response variable (0 = no fraud, 1 = fraud)

One method to prepare data for analysis is described below. However, use whatever method is convenient for you.

- Create a Spark Dataframe
  ```
  df = spark.read.csv("/path/to/creditcard.csv",
                      header = True,
                      inferSchema = True
  )
  ```
- Separate fraudulent and non-fraudulent transactions
  ```
  is_fraud = df.select("*").filter("Class == 1")
  no_fraud = df.select("*").filter("Class == 0")
  ```
- Keep all the fraudulent transactions and randomly sample 1% of non-fraudulent transactions without replacement
  ```
  no_fraud = no_fraud.sample(False, 0.01, seed = 123)
  ```
- Concatenate the two Dataframes and sort on the "Time" column
  ```
  df_concat = no_fraud.union(is_fraud)
  df = df_concat.sort("Time")
  df.count()
  ```
- Result is a reduced dataset with 3265 rows, which is what we will use
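
If Spark is not to hand, the same steps can be sketched in pandas. This is an assumed equivalent rather than the method used to produce the 3265-row file: the function name `downsample_genuine` is hypothetical, and the small synthetic frame only stands in for `creditcard.csv`. Spark's `sample` treats the fraction as a per-row probability, so its row count is approximate, whereas pandas returns a fixed number of rows; the two will therefore not yield identical datasets.

```python
import pandas as pd


def downsample_genuine(df: pd.DataFrame, fraction: float = 0.01, seed: int = 123) -> pd.DataFrame:
    """Keep every fraudulent row plus a random fraction of the genuine rows."""
    is_fraud = df[df["Class"] == 1]
    # Sample the genuine rows without replacement
    no_fraud = df[df["Class"] == 0].sample(frac=fraction, replace=False, random_state=seed)
    # Concatenate the two parts and sort on the "Time" column
    return pd.concat([no_fraud, is_fraud]).sort_values("Time").reset_index(drop=True)


# Tiny synthetic frame standing in for creditcard.csv (hypothetical values)
demo = pd.DataFrame({
    "Time": range(1000),
    "Amount": [1.0] * 1000,
    "Class": [1 if i % 100 == 0 else 0 for i in range(1000)],
})

reduced = downsample_genuine(demo, fraction=0.1)
print(reduced["Class"].value_counts())
```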

We'll evaluate each model using a confusion matrix and the metrics derived from it:
```
                       Predicted 
                | Positive | Negative |
  Actual        |          |          |
----------------+----------+----------+
  Positive      |    TP    |    FN    |
----------------+----------+----------+
  Negative      |    FP    |    TN    |
----------------+----------+----------+
```

- Accuracy = (TP + TN) / (TP + TN + FP + FN)
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
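
The four formulas can be written as a small helper. This is a minimal sketch: the function name and the made-up counts are illustrative, and returning 0.0 when a denominator is zero is an assumed convention rather than part of the definitions.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * (precision * recall) / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts, for illustration only
metrics = classification_metrics(tp=80, fp=10, fn=20, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```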



## Create a SingleStoreDB Cloud account

A [previous article]() showed the steps required to create a free SingleStoreDB Cloud account. We'll use **GPT Demo Group** as our Workspace Group Name and **gpt-demo** as our Workspace Name. We'll make a note of our **password** and **host** name.

We'll use the **SQL Editor** to create a new database, as follows:

```
CREATE DATABASE IF NOT EXISTS creditcard_db;
```

## Create a Deepnote account 

We'll create a [free](https://deepnote.com/sign-up) account on the Deepnote website. Once logged in, we'll create a new Deepnote project to give us a new notebook. We'll also create several new directories called `data`, `images` and `models`.

## Load data from CSV

In [None]:
import ibis
import pandas as pd

ibis.options.interactive = True

pdf = pd.read_csv("data/creditcard.csv")

## Connect to SingleStoreDB and save data

In [None]:
conn = ibis.singlestoredb.connect(
    "admin:<password>@<host>:3306/creditcard_db"
)

We'll replace the `<password>` and `<host>` with the values from our SingleStoreDB Cloud account.
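
Rather than pasting the password into the notebook, one option is to read the values from environment variables. The sketch below assumes two hypothetical variable names, `SINGLESTORE_PASSWORD` and `SINGLESTORE_HOST`, neither of which is defined by SingleStoreDB or Ibis. It only builds a string with the same shape as the one passed to `ibis.singlestoredb.connect` above.

```python
import os


def build_connection_string(env=os.environ, database="creditcard_db", port=3306):
    """Build 'admin:<password>@<host>:<port>/<database>' from environment values."""
    password = env["SINGLESTORE_PASSWORD"]  # hypothetical variable name
    host = env["SINGLESTORE_HOST"]          # hypothetical variable name
    return f"admin:{password}@{host}:{port}/{database}"


# Demonstration with an explicit mapping instead of the real environment
example = build_connection_string(
    env={"SINGLESTORE_PASSWORD": "secret", "SINGLESTORE_HOST": "example-host"}
)
print(example)
```

A missing variable raises a `KeyError`, which fails early instead of attempting a connection with an incomplete string.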

In [None]:
creditcard_tbl = conn.create_table(
    "creditcard",
    pdf,
    force = True
)

## Read data back from SingleStoreDB

The data are safely stored in SingleStoreDB and we could perform further analysis in the cloud environment. However, we'll read the data back just to be sure that we can retrieve the stored data using Ibis.

In [None]:
new_creditcard_tbl = conn.table("creditcard")

new_creditcard_tbl.head(5)

We'll create a Pandas Dataframe from the retrieved data.

In [None]:
pdf = new_creditcard_tbl.execute(limit = None)

## Check the data

First, let's check the number of rows.

In [None]:
pdf.shape[0]

Next, let's check the number of rows for the two values of the response variable.

In [None]:
pdf.groupby("Class").size()

Now let's take a look at the Dataframe.

In [None]:
pdf

Let's get more information on the **Amount** column.

In [None]:
pdf["Amount"].describe()

A visualisation can also be helpful to see the distribution of values.

In [None]:
import plotly.express as px

fig = px.scatter(
    pdf,
    y = "Amount",
    color = pdf["Class"].astype(str),
    hover_data = ["Amount"]
)

fig.update_layout(
    # yaxis_type = "log",
    title = "Amount and Class"
)

fig.show()

Another way to look at the data is to use a histogram.

In [None]:
fig = px.histogram(
    pdf,
    x = "Amount",
    nbins = 50
)

fig.show()

## 1. Logistic Regression with scikit-learn

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# Split the data into features and labels
features = pdf.iloc[:, 1:30]
labels = pdf.iloc[:, 30]

# Split the data into training and testing sets
train_features, test_features, train_labels, test_labels = train_test_split(
    features,
    labels,
    test_size = 0.3,
    random_state = 42
)

# Train the logistic regression model
train_model = LogisticRegression(max_iter = 1000)
train_model.fit(train_features, train_labels)

# Make predictions on the test set
predicted_labels = train_model.predict(test_features)

# Generate and plot the confusion matrix
cm = confusion_matrix(test_labels, predicted_labels)

fig = px.imshow(
    cm,
    x = ["Genuine (0)", "Fraudulent (1)"],
    y = ["Genuine (0)", "Fraudulent (1)"],
    color_continuous_scale = "Reds",
    labels = dict(x = "Predicted Label", y = "True Label")
)

# Add annotations to the heatmap
for i in range(len(cm)):
    for j in range(len(cm)):
        fig.add_annotation(
            x = j,
            y = i,
            text = str(cm[i][j]),
            font = dict(color = "white" if cm[i][j] > cm.max() / 2 else "black"),
            showarrow = False
        )

fig.update_layout(
    title = "Confusion Matrix - Logistic Regression (scikit-learn)"
)

fig.show()

In [None]:
# Calculate and print the accuracy, precision, recall and f1 of the model
report = classification_report(test_labels, predicted_labels)
print(report)
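
When reading the heatmap, note that scikit-learn orders the rows and columns of the confusion matrix by label value. For labels 0 and 1 the layout is `[[TN, FP], [FN, TP]]`, which puts the genuine class first, unlike the table shown earlier. The toy labels below are made up to illustrate how the four counts and the fraud-class scores can be pulled out.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels, for illustration only (1 = fraud, 0 = genuine)
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0]

# Rows are true labels, columns are predicted labels, both ordered [0, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

# Precision and recall for the fraud class (label 1)
fraud_precision = precision_score(y_true, y_pred, pos_label=1)
fraud_recall = recall_score(y_true, y_pred, pos_label=1)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Fraud precision={fraud_precision:.4f} recall={fraud_recall:.4f}")
```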

## Install Apache Spark

In [None]:
!sudo apt-get update -qq > /dev/null 2>&1
!sudo mkdir -p /usr/share/man/man1 > /dev/null 2>&1
!sudo apt-get install -yqq openjdk-11-jdk > /dev/null 2>&1
!pip -q install pyspark > /dev/null 2>&1

## 2. Logistic Regression with Apache Spark

In [None]:
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

# Create the spark session
spark = SparkSession.builder.appName("FraudDetection").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

sdf = spark.createDataFrame(pdf)

# Select features and labels
features = sdf.columns[1:30]
labels = "Class"

# Assemble features into vector
assembler = VectorAssembler(inputCols = features, outputCol = "features")
sdf = assembler.transform(sdf).select("features", labels)

# Use scikit-learn's train_test_split (imported earlier) with the same
# random_state, rather than Spark's randomSplit:
# train, test = sdf.cache().randomSplit([0.7, 0.3], seed = 42)

# Split the data into training and testing sets
pandas_df = sdf.toPandas()
train_df, test_df = train_test_split(
    pandas_df,
    test_size = 0.3,
    random_state = 42
)

# Convert the training and testing sets back to Spark Dataframes
train = spark.createDataFrame(train_df)
test = spark.createDataFrame(test_df)

# Initialise logistic regression model
lr = LogisticRegression(
    maxIter = 1000,
    featuresCol = "features",
    labelCol = labels
)

# Train the logistic regression model
train_model = lr.fit(train)

# Make predictions on the test set
predictions = train_model.transform(test)

# Calculate the accuracy, precision, recall and f1 of the model
accuracy = predictions.filter(predictions.Class == predictions.prediction).count() / float(test.count())

# metricLabel = 1.0 reports each metric for the fraud class
# (the evaluator's default is label 0.0, the genuine class)
evaluator = MulticlassClassificationEvaluator(
    labelCol = labels,
    predictionCol = "prediction",
    metricName = "precisionByLabel",
    metricLabel = 1.0
)
precision = evaluator.evaluate(predictions)

evaluator = MulticlassClassificationEvaluator(
    labelCol = labels,
    predictionCol = "prediction",
    metricName = "recallByLabel",
    metricLabel = 1.0
)
recall = evaluator.evaluate(predictions)

evaluator = MulticlassClassificationEvaluator(
    labelCol = labels,
    predictionCol = "prediction",
    metricName = "fMeasureByLabel",
    metricLabel = 1.0
)
f1 = evaluator.evaluate(predictions)

# Create confusion matrix
cm = predictions.select("Class", "prediction")
cm = cm.groupBy("Class", "prediction").count()
cm = cm.toPandas()

# Pivot the confusion matrix
cm = cm.pivot(
    index = "Class",
    columns = "prediction",
    values = "count"
)

# Generate and plot the confusion matrix
fig = px.imshow(
    cm,
    x = ["Genuine (0)", "Fraudulent (1)"],
    y = ["Genuine (0)", "Fraudulent (1)"],
    color_continuous_scale = "Reds",
    labels = dict(x = "Predicted Label", y = "True Label")
)

# Add annotations to the heatmap
for i in range(len(cm)):
    for j in range(len(cm)):
        fig.add_annotation(
            x = j, 
            y = i,
            text = str(cm.iloc[i, j]),
            font = dict(color = "white" if cm.iloc[i, j] > cm.values.max() / 2 else "black"),
            showarrow = False
        )

fig.update_layout(title_text = "Confusion Matrix - Logistic Regression (Spark)")

fig.show()

In [None]:
# Print the accuracy, precision, recall and f1 of the model
print("Accuracy: %.4f" % accuracy)
print("Precision: %.4f" % precision)
print("Recall: %.4f" % recall)
print("F1: %.4f" % f1)

## 3. OpenAI

Initially, it may be a good idea just to test the OpenAI API with a very small sample of 100 transactions (50 fraudulent and 50 non-fraudulent). This can be achieved, as follows:

```
new_pdf = pdf.groupby("Class").sample(n = 50)
```

The cost for this should be approximately US$0.39 (39 cents).

Subsequently, if you decide to use the full 3265 rows, you can just copy the Dataframe, as follows:

```
new_pdf = pdf.copy()
```

The cost for this should be approximately US$13 (13 dollars).
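
The two figures are consistent with a simple linear estimate. The sketch below assumes that cost scales in proportion to the number of rows, which ignores any variation in prompt length; the function name is illustrative and the per-row rate is derived only from the 100-row figure quoted above.

```python
# Figures quoted above for the 100-row sample
SAMPLE_COST_USD = 0.39
SAMPLE_ROWS = 100


def estimate_cost(rows: int) -> float:
    """Estimate the cost in US dollars, assuming cost is proportional to row count."""
    return SAMPLE_COST_USD / SAMPLE_ROWS * rows


full_cost = estimate_cost(3265)
print(f"Estimated cost for 3265 rows: US${full_cost:.2f}")
```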

In [None]:
# Cost US$0.39 for 100
new_pdf = pdf.groupby("Class").sample(n = 50)

# Cost US$13.00 for 3265
# new_pdf = pdf.copy()

In [None]:
my_key = "<Add your OpenAI Key>"

The following code retries requests that hit rate limits, which can occur when using the free credits. The code is adapted from the OpenAI Cookbook notebook [How to handle rate limits](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_handle_rate_limits.ipynb).

In [None]:
import csv
import pickle
import random

import openai
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
)

# Set OpenAI API key
openai.api_key = my_key

# Retry with random exponential backoff, giving up after 6 attempts.
# Note: openai.Completion belongs to the legacy (pre-1.0) openai package.
@retry(
    wait = wait_random_exponential(min = 1, max = 60),
    stop = stop_after_attempt(6)
)
def completion_with_backoff(**kwargs):
    return openai.Completion.create(**kwargs)
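
To illustrate the idea behind the retry decorator, here is a standard-library sketch of retrying with randomised exponential backoff. It is not tenacity's implementation: the function name `call_with_backoff` and its parameters are hypothetical, and the `sleep` argument exists only so that the demonstration can run without actually waiting.

```python
import random
import time


def call_with_backoff(func, max_attempts=6, min_wait=1.0, max_wait=60.0, sleep=time.sleep):
    """Call func(), retrying on any exception with a growing random wait."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # The upper bound doubles with each attempt, capped at max_wait
            upper = min(max_wait, min_wait * 2 ** attempt)
            sleep(random.uniform(min_wait, upper))


# Demonstration: a function that fails twice, then succeeds
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


waits = []  # record the waits instead of sleeping
result = call_with_backoff(flaky, sleep=waits.append)
print(result, attempts["count"], len(waits))
```

The random component spreads retries out, so that several clients hitting the same limit do not all retry at the same moment.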

In [None]:
data = new_pdf.values.tolist()

# Split the data into features and labels
features = [[float(cell) for cell in row[1:29]] + [float(row[29])] for row in data]
labels = [int(row[-1]) for row in data]

# Split the data into training and testing sets
train_features, test_features, train_labels, test_labels = train_test_split(
    features,
    labels,
    test_size = 0.3,
    random_state = 42
)

The next code block begins with a line that deliberately stops execution. To run the code, comment out that line, as follows:

```
# raise KeyboardInterrupt("Execution stopped manually.")
```

In [None]:
raise KeyboardInterrupt("Execution stopped manually.")

# Train the model
train_model = "text-davinci-002"
train_prompt = (
    f"The goal of this task is to train a model to classify transactions \n"
    f"as fraudulent or not based on historical data. Each transaction is \n"
    f"represented by 28 features (the details of which are not available) \n"
    f"and the monetary value of the transaction in the last column. The \n"
    f"label for each transaction is either 0 indicating that it is not \n"
    f"fraudulent, or 1 indicating that it is fraudulent. Your task is to use \n"
    f"the OpenAI GPT-3 API to train a model to classify transactions as \n"
    f"fraudulent or not. Please classify the following transactions as either \n"
    f"not fraudulent or fraudulent."
)

train_model_response = completion_with_backoff(
    engine = train_model,
    prompt = train_prompt,
    temperature = 0.5,
    max_tokens = 30,
    n = 1,
    stop = None,
    timeout = 30,
)

train_model_id = train_model_response.model

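# Send each labelled training example to the model as a prompt.
# Note: Completion requests are stateless, so these calls do not
# update the model's weights, and the responses are not used.
for i in range(len(train_features)):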
    prompt = (f"Train the model to classify the transaction with the following \n"
              f"label: {train_labels[i]}, with features: {train_features[i]}"
    )
    response = completion_with_backoff(
        engine = train_model_id,
        prompt = prompt,
        temperature = 0.5,
        max_tokens = 30,
        n = 1,
        stop = None,
        timeout = 30,
    )
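
Each Completion request is independent of the others, so examples sent in earlier requests are not remembered by later ones. One way to put labelled examples in front of the model is few-shot prompting, where a handful of labelled transactions are included in the same prompt as the transaction to classify. This is a different technique from the loop above, sketched under assumptions: the function name, the prompt wording and the three-value rows are hypothetical, and real rows with 29 values each would limit how many examples fit in the model's context window.

```python
def build_few_shot_prompt(examples, query_features, max_examples=5):
    """Build one prompt containing labelled examples followed by the query row."""
    lines = ["Classify each transaction as 'fraudulent' or 'not fraudulent'.", ""]
    for features, label in examples[:max_examples]:
        verdict = "fraudulent" if label == 1 else "not fraudulent"
        lines.append(f"Features: {features}")
        lines.append(f"Label: {verdict}")
        lines.append("")
    # The query row is left unlabelled for the model to complete
    lines.append(f"Features: {query_features}")
    lines.append("Label:")
    return "\n".join(lines)


# Hypothetical, shortened rows for illustration only
examples = [([0.1, -1.2, 10.0], 0), ([2.3, 0.4, 250.0], 1)]
prompt = build_few_shot_prompt(examples, [1.1, 0.2, 99.0])
print(prompt)
```

The returned string could then be passed as the `prompt` argument of `completion_with_backoff`.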

In [None]:
# Save the train model to disk
train_model = openai.Model(train_model_id)
with open("models/train_model.pkl", "wb") as f:
    pickle.dump(train_model, f)

In [None]:
# Check that the train model can be read back from disk
with open("models/train_model.pkl", "rb") as f:
    train_model = pickle.load(f)

The next code block also begins with a line that deliberately stops execution. To run the code, comment out that line, as follows:

```
# raise KeyboardInterrupt("Execution stopped manually.")
```

In [None]:
raise KeyboardInterrupt("Execution stopped manually.")

# Evaluate the model
test_model = train_model
test_model_prompt = (
    f"Classify whether the transaction with the following features is \n"
    f"either not fraudulent or fraudulent."
)

test_model_response = completion_with_backoff(
    engine = train_model.id,
    prompt = test_model_prompt,
    temperature = 0.5,
    max_tokens = 30,
    n = 1,
    stop = None,
    timeout = 30,
)

test_model_id = test_model_response.model

predicted_labels = []
for i in range(len(test_features)):
    prompt = (f"Classify whether the transaction with the following features is \n"
              f"either not fraudulent or fraudulent: {test_features[i]}"
    )
    response = completion_with_backoff(
        engine = test_model_id,
        prompt = prompt,
        temperature = 0.5,
        max_tokens = 30,
        n = 1,
        stop = None,
        timeout = 30,
    )

    predicted_label = response.choices[0].text.strip().lower().replace(".", "")
    binary_label = 1 if predicted_label == "fraudulent" else 0
    predicted_labels.append(binary_label)
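
The exact-match check above maps any unexpected completion, such as a full sentence, to 0 (not fraudulent). A more tolerant parser might look for the phrases anywhere in the text and report when neither is found. This is a sketch only; the function name and the choice to return `None` for unrecognised text are assumptions.

```python
def parse_label(completion_text: str):
    """Map completion text to 1 (fraudulent), 0 (not fraudulent) or None (unclear)."""
    text = completion_text.strip().lower()
    # Check the negative phrases first, because they contain the word "fraudulent"
    if "not fraudulent" in text or "non-fraudulent" in text:
        return 0
    if "fraudulent" in text:
        return 1
    return None


for sample in ["Fraudulent.", "not fraudulent", "The transaction is fraudulent", "Unsure"]:
    print(repr(sample), "->", parse_label(sample))
```

Rows that come back as `None` can be counted separately, so that unparseable answers are not silently scored as genuine.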

In [None]:
# Save the test model to disk
test_model = openai.Model(test_model_id)
with open("models/test_model.pkl", "wb") as f:
    pickle.dump(test_model, f)

In [None]:
# Check that the test model can be read back from disk
with open("models/test_model.pkl", "rb") as f:
    test_model = pickle.load(f)

In [None]:
# Generate and plot the confusion matrix
cm = confusion_matrix(test_labels, predicted_labels)
fig = px.imshow(
    cm,
    x = ["Genuine (0)", "Fraudulent (1)"],
    y = ["Genuine (0)", "Fraudulent (1)"],
    color_continuous_scale = "Reds",
    labels = dict(x = "Predicted Label", y = "True Label")
)

# Add annotations to the heatmap
for i in range(len(cm)):
    for j in range(len(cm)):
        fig.add_annotation(
            x = j,
            y = i,
            text = str(cm[i][j]),
            font = dict(color = "white" if cm[i][j] > cm.max() / 2 else "black"),
            showarrow = False
        )

fig.update_layout(title = "Confusion Matrix - OpenAI Model")

fig.show()

In [None]:
# Calculate and print the accuracy, precision, recall and f1 of the model
report = classification_report(test_labels, predicted_labels)
print(report)

## Summary

- GPT and similar technologies can augment existing ML/DL
  - Useful to analyse text, such as email messages, to detect potential fraud
- Fine-tuning could help
  - [Fine tuning classification example](https://github.com/openai/openai-cookbook/blob/main/examples/Fine-tuned_classification.ipynb)
- Finding working examples could be a challenge
  - Technology is moving very fast
  - Older examples may no longer work
  - [OpenAI Cookbook](https://github.com/openai/openai-cookbook)
- Care with privacy and personal information
  - Use fake/mock data
- Think carefully about prompt design
  - Test initially on small scale
  - Save your models
  - Watch the costs and manage your budget
  ![Used_Punchcard](images/Used_Punchcard.jpg)
  Source: [Wikipedia](https://en.wikipedia.org/wiki/Punched_card)