# Credit Card Fraud Detection Using Daimensions

In this notebook, we will be using a dataset from Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles). The dataset has 30 attribute columns that describe each credit card transaction and one target column that indicates whether the transaction is fraudulent. It can be found on Kaggle:
https://www.kaggle.com/mlg-ulb/creditcardfraud

Below is a sample of the data. All of the features that start with "V" are the result of a PCA transformation on the sensitive data relevant to the transaction. We are trying to predict the "Class" column, and it has the labels "1" for fraudulent transactions and "0" for regular ones. Also, the dataset is highly unbalanced, with only 0.17% of the transactions being fraudulent.

In [6]:
! head creditcard.csv
# file needs to be unzipped

"Time","V1","V2","V3","V4","V5","V6","V7","V8","V9","V10","V11","V12","V13","V14","V15","V16","V17","V18","V19","V20","V21","V22","V23","V24","V25","V26","V27","V28","Amount","Class"
0,-1.3598071336738,-0.0727811733098497,2.53634673796914,1.37815522427443,-0.338320769942518,0.462387777762292,0.239598554061257,0.0986979012610507,0.363786969611213,0.0907941719789316,-0.551599533260813,-0.617800855762348,-0.991389847235408,-0.311169353699879,1.46817697209427,-0.470400525259478,0.207971241929242,0.0257905801985591,0.403992960255733,0.251412098239705,-0.018306777944153,0.277837575558899,-0.110473910188767,0.0669280749146731,0.128539358273528,-0.189114843888824,0.133558376740387,-0.0210530534538215,149.62,"0"
0,1.19185711131486,0.26615071205963,0.16648011335321,0.448154078460911,0.0600176492822243,-0.0823608088155687,-0.0788029833323113,0.0851016549148104,-0.255425128109186,-0.166974414004614,1.61272666105479,1.06523531137287,0.48909501589608,-0.143772296441519,0.635558093258208,0.4639170410
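The class imbalance described above can be checked directly from the "Class" column. The following is a minimal sketch using only the Python standard library; the helper name `class_balance` and the four-row sample are made up for illustration and stand in for the real file.

```python
import csv
import io
from collections import Counter


def class_balance(csv_file, target="Class"):
    """Return {label: fraction of rows} for the target column of an open CSV file."""
    counts = Counter(row[target] for row in csv.DictReader(csv_file))
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Tiny made-up sample standing in for creditcard.csv (illustrative values only).
sample = io.StringIO(
    '"Time","V1","Amount","Class"\n'
    '0,-1.36,149.62,"0"\n'
    '1,1.19,2.69,"0"\n'
    '2,-0.97,378.66,"0"\n'
    '3,0.35,1.00,"1"\n'
)
balance = class_balance(sample)
print(balance)
```

To run it on the real data, pass an open file instead of the sample, for example `with open("creditcard.csv", newline="") as f: print(class_balance(f))`.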

For this dataset, our objective is to understand which attributes are most important and then build a model that detects credit card fraud. Daimensions has an option to enable attribute ranking, which is extremely helpful in finding the features that are most correlated with the target class.

## 1. Get Measurements
Before we build the predictor for the dataset, it is wise to measure the data. Measuring lets us identify the most suitable model type without having to build one first. For more information about how to use Daimensions and why we want to measure our data beforehand, check out the Titanic notebook.

In [7]:
! btc creditcard.csv -measureonly


Brainome Daimensions(tm) 0.99 Copyright (c) 2019 - 2021 by Brainome, Inc. All Rights Reserved.
Licensed to:              Alexander Makhratchev  (Evaluation)
Expiration Date:          2021-04-30   56 days left
Number of Threads:        1
Maximum File Size:        30 GB
Maximum Instances:        unlimited
Maximum Attributes:       unlimited
Maximum Classes:          unlimited
Connected to:             daimensions.brainome.ai  (local execution)



Command:
    btc creditcard.csv -measureonly

Start Time:                 03/05/2021, 02:28


Data:
    Input:                      creditcard.csv
    Target Column:              Class
    Number of instances:        284807
    Number of attributes:       30
    Number of classes:          2
    Class Balance:              0: 99.83%, 1: 0.17%

Learnability:
    Best guess accuracy:          99.83%
    Data Sufficiency:            Maybe enough data to generalize. [yellow]

Capacity Progression:            at [ 5%, 10%, 20%, 40%, 80%, 100% ]
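The "Best guess accuracy" figure above is the accuracy of always predicting the most frequent class. The sketch below reproduces it under one assumption that is not in the output: the count of 492 fraudulent transactions comes from the Kaggle dataset description. The function name `best_guess_accuracy` is illustrative.

```python
def best_guess_accuracy(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(class_counts.values()) / sum(class_counts.values())


total = 284807  # "Number of instances" from the btc output
fraud = 492     # assumption: taken from the Kaggle dataset description
counts = {"0": total - fraud, "1": fraud}

acc = best_guess_accuracy(counts)
print(f"{acc:.2%}")  # 99.83%
```

Any model with a validation accuracy below this value is, in terms of raw accuracy, doing worse than labeling every transaction as regular.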
    

## 2. Neural Network with -O 
From the Daimensions measurements, we can see that the best model for this dataset would be a neural network. It has the highest generalization and the lowest memory equivalent capacity. However, the neural network has a much higher risk of overfitting. Because the dataset is so unbalanced, we will use the -O command line option to optimize the true positive rate (TPR). After the -O, we specify the label to focus on, which in our case is "1", the fraudulent charges.

In [9]:
! btc creditcard.csv -f NN -O 1 --yes 


Brainome Daimensions(tm) 0.99 Copyright (c) 2019 - 2021 by Brainome, Inc. All Rights Reserved.
Licensed to:              Alexander Makhratchev  (Evaluation)
Expiration Date:          2021-04-30   56 days left
Number of Threads:        1
Maximum File Size:        30 GB
Maximum Instances:        unlimited
Maximum Attributes:       unlimited
Maximum Classes:          unlimited
Connected to:             daimensions.brainome.ai  (local execution)



Command:
    btc creditcard.csv -f NN -O 1 --yes

Start Time:                 03/05/2021, 03:37


Data:
    Input:                      creditcard.csv
    Target Column:              Class
    Number of instances:        284807
    Number of attributes:       30
    Number of classes:          2
    Class Balance:              0: 99.83%, 1: 0.17%

Learnability:
    Best guess accuracy:          99.83%
    Data Sufficiency:            Maybe enough data to generalize. [yellow]

Capacity Progression:            at [ 5%, 10%, 20%, 40%, 80%, 100% ]


The neural network had a very poor overall accuracy on the validation set. However, the true positive rate is 100%, signifying that every transaction that was fraudulent was identified. 
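A high true positive rate and a poor overall accuracy can coexist when a model raises many false alarms on an unbalanced dataset. The sketch below illustrates this with hypothetical confusion-matrix counts; they are not taken from the btc output, and the function names are illustrative.

```python
def true_positive_rate(tp, fn):
    """Fraction of actual positives (fraud) that the model flagged."""
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical validation counts, chosen only to illustrate the effect.
tp, fn = 50, 0         # every fraudulent transaction is caught
fp, tn = 20000, 9950   # but most regular transactions are flagged as well

tpr = true_positive_rate(tp, fn)
acc = accuracy(tp, tn, fp, fn)
print(f"TPR: {tpr:.2%}, accuracy: {acc:.2%}")
```

With these counts the true positive rate is 100% while only about a third of all predictions are correct.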

Now we will rerun the previous command, but this time we will add the -e option to increase the training effort of the model.

In [4]:
! btc creditcard.csv -f NN -O 1 --yes -e 5


Brainome Table Compiler 0.99
Copyright (c) 2019-2021 Brainome, Inc. All Rights Reserved.
Licensed to:                 Alexander Makhratchev  (Evaluation)
Expiration Date:             2021-04-30   55 days left
Maximum File Size:           30 GB
Maximum Instances:           unlimited
Maximum Attributes:          unlimited
Maximum Classes:             unlimited
Connected to:                daimensions.brainome.ai  (local execution)

Command:
    btc creditcard.csv -f NN -O 1 --yes -e 5

Start Time:                 03/06/2021, 00:49 UTC

Data:
    Input:                      creditcard.csv
    Target Column:              Class
    Number of instances:     284807
    Number of attributes:        30
    Number of classes:            2
    Class Balance:                0: 99.83%, 1: 0.17%

Learnability:
    Best guess accuracy:          99.83%
    Data Sufficiency:             Maybe enough data to generalize. [yellow]

Capacity Progression:             at [ 5%, 10%, 20%, 40%, 80%, 100% ]
   

## 3. Decision Tree with -O
We can also try a decision tree for the dataset by simply replacing the NN argument with DT. This time we also add -rank to enable attribute ranking.

In [10]:
! btc creditcard.csv -rank -f DT -O 1 --yes


Brainome Daimensions(tm) 0.99 Copyright (c) 2019 - 2021 by Brainome, Inc. All Rights Reserved.
Licensed to:              Alexander Makhratchev  (Evaluation)
Expiration Date:          2021-04-30   56 days left
Number of Threads:        1
Maximum File Size:        30 GB
Maximum Instances:        unlimited
Maximum Attributes:       unlimited
Maximum Classes:          unlimited
Connected to:             daimensions.brainome.ai  (local execution)



Command:
    btc creditcard.csv -rank -f DT -O 1 --yes

Start Time:                 03/05/2021, 04:37

Attribute Ranking:
    Important columns:          V17, V14, V10, V9, V25
    Overfit risk:                   0.0%
    Ignoring columns:           Time, V1, V2, V3, V4, V5, V6, V7, V8, V11, V12, V13, V15, V16, V18, V19, V20, V21, V22, V23, V24, V26, V27, V28, Amount



Data:
    Input:                      creditcard.csv
    Target Column:              Class
    Number of instances:        284807
    Number of attributes:       5
    Number of

The decision tree was able to predict most of the fraudulent charges, with 99.98% accuracy. Attribute ranking significantly reduces the noise in a dataset and improves accuracy.
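To see what the reduced dataset looks like, the five columns reported as important can be selected by hand. This sketch only mimics the column filter. It does not reproduce how Daimensions ranks attributes. The helper name `keep_columns` and the sample values are made up for illustration.

```python
import csv
import io

# Columns listed under "Important columns" in the Attribute Ranking output.
IMPORTANT = ["V17", "V14", "V10", "V9", "V25"]


def keep_columns(csv_file, columns, target="Class"):
    """Return the rows of an open CSV file restricted to `columns` plus the target."""
    keep = list(columns) + [target]
    return [{name: row[name] for name in keep} for row in csv.DictReader(csv_file)]


# Made-up two-row sample with a couple of extra columns that get dropped.
sample = io.StringIO(
    '"Time","V9","V10","V14","V17","V25","Amount","Class"\n'
    '0,0.36,0.09,-0.31,0.21,0.13,149.62,"0"\n'
    '1,-0.26,-0.17,-0.14,-0.11,0.17,2.69,"0"\n'
)
rows = keep_columns(sample, IMPORTANT)
print(rows[0])
```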

## 4. Neural Network with -balance
Now we will try the -balance option, which optimizes the true positive rate for every class instead of for one specific class.
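One common way to summarize the true positive rate of every class is balanced accuracy, the mean of the per-class recalls. The sketch below shows that standard metric on made-up labels. Whether btc uses exactly this quantity internally is an assumption, not something the output confirms.

```python
from collections import defaultdict


def per_class_recall(y_true, y_pred):
    """Return {label: fraction of that label's instances predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


# Made-up labels: 8 regular (0) and 2 fraudulent (1) transactions.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 6 + [1] * 2 + [1, 0]

recalls = per_class_recall(y_true, y_pred)
balanced_accuracy = sum(recalls.values()) / len(recalls)
print(recalls, balanced_accuracy)
```

Here plain accuracy is 70%, while balanced accuracy is 62.5% because the minority class counts as much as the majority class.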

In [11]:
! btc creditcard.csv -f NN -balance --yes


Brainome Daimensions(tm) 0.99 Copyright (c) 2019 - 2021 by Brainome, Inc. All Rights Reserved.
Licensed to:              Alexander Makhratchev  (Evaluation)
Expiration Date:          2021-04-30   56 days left
Number of Threads:        1
Maximum File Size:        30 GB
Maximum Instances:        unlimited
Maximum Attributes:       unlimited
Maximum Classes:          unlimited
Connected to:             daimensions.brainome.ai  (local execution)



Command:
    btc creditcard.csv -f NN -balance --yes

Start Time:                 03/05/2021, 04:57


Data:
    Input:                      creditcard.csv
    Target Column:              Class
    Number of instances:        284807
    Number of attributes:       30
    Number of classes:          2
    Class Balance:              0: 99.83%, 1: 0.17%

Learnability:
    Best guess accuracy:          99.83%
    Data Sufficiency:            Maybe enough data to generalize. [yellow]

Capacity Progression:            at [ 5%, 10%, 20%, 40%, 80%, 100

Unfortunately, our model performs slightly worse than best guess on the dataset, but the true positive rate is 99.89%. 

Now we will rerun the previous command, but with the -e option to increase the amount of effort spent training the model.

In [5]:
! btc creditcard.csv -f NN -balance --yes -e 5


Brainome Table Compiler 0.99
Copyright (c) 2019-2021 Brainome, Inc. All Rights Reserved.
Licensed to:                 Alexander Makhratchev  (Evaluation)
Expiration Date:             2021-04-30   55 days left
Maximum File Size:           30 GB
Maximum Instances:           unlimited
Maximum Attributes:          unlimited
Maximum Classes:             unlimited
Connected to:                daimensions.brainome.ai  (local execution)

Command:
    btc creditcard.csv -f NN -balance --yes -e 5

Start Time:                 03/06/2021, 01:24 UTC

Data:
    Input:                      creditcard.csv
    Target Column:              Class
    Number of instances:     284807
    Number of attributes:        30
    Number of classes:            2
    Class Balance:                0: 99.83%, 1: 0.17%

Learnability:
    Best guess accuracy:          99.83%
    Data Sufficiency:             Maybe enough data to generalize. [yellow]

Capacity Progression:             at [ 5%, 10%, 20%, 40%, 80%, 100% ]

From the results, it looks like our model did not perform well. The validation accuracy was very low because the model simply guessed that all of the charges were fraudulent.

## 5. Random Forest

In the newest version of the Brainome Table Compiler, the random forest model is included. We can run it on the dataset and increase the effort level to improve the accuracy.

In [13]:
! btc creditcard.csv -f RF --yes -e 5


Brainome Table Compiler 0.99
Copyright (c) 2019-2021 Brainome, Inc. All Rights Reserved.
Licensed to:                 Alexander Makhratchev  (Evaluation)
Expiration Date:             2021-04-30   53 days left
Maximum File Size:           30 GB
Maximum Instances:           unlimited
Maximum Attributes:          unlimited
Maximum Classes:             unlimited
Connected to:                daimensions.brainome.ai  (local execution)

Command:
    btc creditcard.csv -f RF --yes -e 5

Start Time:                 03/08/2021, 21:38 UTC

Data:
    Input:                      creditcard.csv
    Target Column:              Class
    Number of instances:     284807
    Number of attributes:        30
    Number of classes:            2
    Class Balance:                0: 99.83%, 1: 0.17%

Learnability:
    Best guess accuracy:          99.83%
    Data Sufficiency:             Maybe enough data to generalize. [yellow]

Capacity Progression:             at [ 5%, 10%, 20%, 40%, 80%, 100% ]
    Idea

The Random Forest model did better than best guess on the validation data. Additionally, the true positive rate is close to 100%, which indicates that nearly all of the fraudulent transactions were detected.

## 6. Random Forest with -O and -rank

We can run the same command as above, but now we will add the -O option to optimize the true positive rate and the -rank option to enable attribute ranking.

In [11]:
! btc creditcard.csv -f RF --yes -e 5 -O 1 -rank


Brainome Table Compiler 0.99
Copyright (c) 2019-2021 Brainome, Inc. All Rights Reserved.
Licensed to:                 Alexander Makhratchev  (Evaluation)
Expiration Date:             2021-04-30   55 days left
Maximum File Size:           30 GB
Maximum Instances:           unlimited
Maximum Attributes:          unlimited
Maximum Classes:             unlimited
Connected to:                daimensions.brainome.ai  (local execution)

Command:
    btc creditcard.csv -f RF --yes -e 5 -O 1 -rank

Start Time:                 03/06/2021, 21:20 UTC


Attribute Ranking:
    Important columns:          V17, V14, V10, V9, V25
    Overfit risk:                  0.0%
    Ignoring columns:           Time, V1, V2, V3, V4, V5, V6, V7, V8, V11, V12, V13, V15, V16, V18, V19, V20, V21, V22, V23, V24, V26, V27, V28, Amount

Data:
    Input:                      creditcard.csv
    Target Column:              Class
    Number of instances:     284807
    Number of attributes:         5
    Number of classes:

The validation score is higher than best guess, and 99.98% of fraudulent transactions were identified. However, only 89.02% of the regular transactions were classified correctly.
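To get a feel for what an 89.02% rate on regular transactions means in practice, the sketch below extrapolates the two reported rates to the full dataset. This is a rough illustration under two assumptions: that the validation rates would hold across all 284807 instances, and that 492 of them are fraudulent (the count given in the Kaggle dataset description, not in the output above).

```python
def expected_confusion(n_pos, n_neg, tpr, tnr):
    """Expected (tp, fn, fp, tn) counts for given class sizes and per-class rates."""
    tp = tpr * n_pos
    fn = n_pos - tp
    tn = tnr * n_neg
    fp = n_neg - tn
    return tp, fn, fp, tn


n_total = 284807  # "Number of instances" from the btc output
n_fraud = 492     # assumption: from the Kaggle dataset description

tp, fn, fp, tn = expected_confusion(
    n_fraud, n_total - n_fraud, tpr=0.9998, tnr=0.8902
)
precision = tp / (tp + fp)
print(f"false alarms: {fp:.0f}, precision: {precision:.2%}")
```

Under these assumptions, roughly 31,000 regular transactions would be flagged, so only about 1.5% of flagged transactions would actually be fraudulent.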