# Creating a Neural Network to predict credit card transaction fraud

## Introduction

Cybersecurity is an important topic in today's world. A key part of our digital lives is securing our funds and financial assets, including credit/debit cards and bank account credentials. In this project, I prototype and test a neural network that detects whether a transaction is fraudulent. I use Keras (the high-level API of TensorFlow) for the model and basic pandas for data manipulation. Since every transaction is either fraudulent or legitimate, this is a binary classification problem.
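For a binary classifier like this one, a Keras model typically ends in a single `Dense(1, activation="sigmoid")` unit trained with the `"binary_crossentropy"` loss. The sketch below reproduces what that output layer and loss compute, in plain NumPy. The logits, labels and the 0.5 decision threshold are made-up assumptions for illustration, not values from this project.

```python
import numpy as np

def sigmoid(z):
    # Maps any real-valued logit to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, p, eps=1e-7):
    # Clip probabilities to avoid log(0), then average the loss over the batch
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# Hypothetical raw outputs of the last layer for four transactions
logits = np.array([2.0, -1.0, 0.0, 3.5])
y_true = np.array([1, 0, 0, 1])  # 1 = fraudulent, 0 = legitimate

probs = sigmoid(logits)
preds = (probs >= 0.5).astype(int)  # assumed threshold of 0.5
loss = binary_cross_entropy(y_true, probs)
print(preds, round(loss, 4))
```

A transaction is flagged as fraud when its predicted probability reaches the threshold; the threshold itself could be tuned on validation data.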

## Dataset

The dataset was downloaded from Kaggle: **[dataset link](https://www.kaggle.com/datasets/nelgiriyewithana/credit-card-fraud-detection-dataset-2023)**

It contains an **id**, **28 numeric columns** (V1 to V28) representing various **transaction details**, the **amount of funds** transferred, and whether the transaction was a **scam or not** (the **Class** column: 1 = fraudulent, 0 = legitimate). Before training, we need to scale the feature values with a standard scaler and extract the label from the "Class" column.
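A minimal sketch of the label/feature separation on a tiny, hypothetical frame with the same column layout. The values are invented, and only `V1` and `V2` stand in for the 28 detail columns.

```python
import pandas as pd

# Hypothetical toy frame mimicking the Kaggle layout (values are invented)
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "V1": [-0.5, 1.0, 0.2, -1.3],
    "V2": [0.3, -0.7, 1.1, 0.0],
    "Amount": [120.0, 5400.0, 980.5, 15000.0],
    "Class": [0, 0, 1, 1],
})

# The label lives in the "Class" column
y = df["Class"]

# Features: everything except the row identifier and the label;
# drop() returns a new frame, so df itself is left unchanged
X = df.drop(columns=["id", "Class"])
print(list(X.columns), y.tolist())
```

Dropping `id` is an assumption on my part here: it identifies a row rather than describing the transaction, so it is usually excluded from the features.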

## Imports and overview

Here I import the necessary modules, read the dataset with pandas and take a first look at the data.

```python
import pandas as pd
import numpy as np
from tensorflow import keras
from sklearn.preprocessing import StandardScaler
from sklearn.compose import make_column_transformer
from sklearn.model_selection import train_test_split

CreditDataRaw = pd.read_csv('archive/creditcard_2023.csv')

CreditDataRaw.head()
```

|   | id | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | -0.260648 | -0.469648 | 2.496266 | -0.083724 | 0.129681 | 0.732898 | 0.519014 | -0.130006 | 0.727159 | ... | -0.110552 | 0.217606 | -0.134794 | 0.165959 | 0.12628 | -0.434824 | -0.08123 | -0.151045 | 17982.1 | 0 |
| 1 | 1 | 0.9851 | -0.356045 | 0.558056 | -0.429654 | 0.27714 | 0.428605 | 0.406466 | -0.133118 | 0.347452 | ... | -0.194936 | -0.605761 | 0.079469 | -0.577395 | 0.19009 | 0.296503 | -0.248052 | -0.064512 | 6531.37 | 0 |
| 2 | 2 | -0.260272 | -0.949385 | 1.728538 | -0.457986 | 0.074062 | 1.419481 | 0.743511 | -0.095576 | -0.261297 | ... | -0.00502 | 0.702906 | 0.945045 | -1.154666 | -0.605564 | -0.312895 | -0.300258 | -0.244718 | 2513.54 | 0 |
| 3 | 3 | -0.152152 | -0.508959 | 1.74684 | -1.090178 | 0.249486 | 1.143312 | 0.518269 | -0.06513 | -0.205698 | ... | -0.146927 | -0.038212 | -0.214048 | -1.893131 | 1.003963 | -0.51595 | -0.165316 | 0.048424 | 5384.44 | 0 |
| 4 | 4 | -0.20682 | -0.16528 | 1.527053 | -0.448293 | 0.106125 | 0.530549 | 0.658849 | -0.21266 | 1.049921 | ... | -0.106984 | 0.729727 | -0.161666 | 0.312561 | -0.414116 | 1.071126 | 0.023712 | 0.419117 | 14278.97 | 0 |


## Checking the shape

It is useful to know how many records we are working with, and that's what the `shape` attribute is for: it returns the number of rows and columns.

```python
CreditDataRaw.shape
```

```
(568630, 31)
```
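Since `shape` is a `(rows, columns)` tuple, it can be unpacked directly. A small sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical three-row frame, just to show what shape returns
df = pd.DataFrame({
    "V1": [0.1, 0.2, 0.3],
    "Amount": [10.0, 20.0, 30.0],
    "Class": [0, 1, 0],
})

n_rows, n_cols = df.shape  # (number of records, number of columns)
print(n_rows, n_cols, len(df))  # len(df) is another way to get the row count
```

Applied to the real file, `(568630, 31)` means 568,630 transactions and 31 columns: `id`, `V1` to `V28`, `Amount` and `Class`.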

## Checking the class balance

In most classification tasks, we have to take into account how many rows belong to each label: if one class heavily outnumbers the other, the model can learn to favor it. So let's check how many transactions in the dataset were fraudulent and how many were not.

```python
CreditDataRaw["Class"].value_counts()
```

```
Class
0    284315
1    284315
Name: count, dtype: int64
```
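Proportions make the comparison explicit, and if the classes had been unbalanced, per-class weights would be one common remedy. The sketch below uses an invented, deliberately unbalanced label series; `compute_class_weight` with `"balanced"` gives each class the weight `n_samples / (n_classes * count_of_class)`.

```python
import numpy as np
import pandas as pd
from sklearn.utils.class_weight import compute_class_weight

# Invented, deliberately unbalanced labels: five 0s and three 1s
labels = pd.Series([0, 0, 0, 1, 1, 0, 1, 0], name="Class")

# Share of each class instead of raw counts
proportions = labels.value_counts(normalize=True)

# "balanced" weights: n_samples / (n_classes * count_of_class)
classes = np.unique(labels)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
class_weight = dict(zip(classes.tolist(), weights.tolist()))
print(proportions.to_dict(), class_weight)
```

A dict like `class_weight` can be passed to Keras via `model.fit(..., class_weight=...)`. For this dataset the classes are exactly 50/50, so both weights would come out as 1.0 and no reweighting is needed.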

## Checking the null values

Okay, now that we know the classes are equally represented in the dataset (284,315 transactions each), it is good practice to check for null fields to make sure our data is clean.

```python
CreditDataRaw.isnull().sum()
```

```
id        0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
Amount    0
Class     0
dtype: int64
```
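With 31 columns, the per-column listing is long; collapsing it to a single number or boolean is a quicker check. A sketch on a hypothetical frame that does contain missing values, together with `dropna()` as one possible way to handle them:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing values (np.nan and None both become NaN)
df = pd.DataFrame({
    "V1": [0.1, np.nan, 0.3],
    "Amount": [10.0, 20.0, None],
    "Class": [0, 1, 0],
})

nulls_per_column = df.isnull().sum()          # same view as in the cell above
total_nulls = int(nulls_per_column.sum())     # one number for the whole frame
has_any_null = bool(df.isnull().values.any())  # quick yes/no check
clean = df.dropna()                            # drop every row with a missing value
print(total_nulls, has_any_null, len(clean))
```

Dropping rows is only one option (imputation is another); for this dataset it is moot, since every count above is 0.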

## Transforming data in the dataset

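Wonderful: no column contains null values, so our data is clean and ready to be transformed into a form suitable for a neural network. I will apply **StandardScaler()** to the columns containing the transaction details and the amount, split the data into training and validation sets, and fit the scaler on the training set only.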
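`StandardScaler` rescales each column to zero mean and unit variance with `z = (x - mean) / std`, where `std` is the population standard deviation (`ddof=0`). The sketch below checks that formula against scikit-learn on a few invented amounts:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented amounts, shaped as a single feature column
amounts = np.array([[100.0], [200.0], [300.0], [400.0]])

# Manual standardization: subtract the column mean, divide by the population std
manual = (amounts - amounts.mean(axis=0)) / amounts.std(axis=0)

# scikit-learn's version of the same transformation
scaled = StandardScaler().fit_transform(amounts)
print(np.round(scaled.ravel(), 4))
```

Scaling matters for a neural network because `Amount` (thousands) and the `V` columns (around 0 to 2 in the preview above) would otherwise sit on very different scales.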

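```python
# Columns to standardize: the 28 transaction-detail features plus the amount
ColumnsToScale = [f"V{i}" for i in range(1, 29)] + ["Amount"]

# Any column not listed above is dropped (remainder="drop" is the default)
PreProcessor = make_column_transformer(
    (StandardScaler(), ColumnsToScale))

# Label: 1 = fraudulent, 0 = legitimate
y = CreditDataRaw["Class"]

# Features: everything except the row identifier and the label
X = CreditDataRaw.drop(columns=["id", "Class"])

# Stratified split keeps the 50/50 class ratio in both subsets
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, stratify=y, train_size=0.75)

# Fit the scaler on the training data only, then reuse its statistics for validation
X_train = PreProcessor.fit_transform(X_train)
X_valid = PreProcessor.transform(X_valid)
```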
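A self-contained sketch of the same preprocessing on synthetic data (random values, 200 rows, two feature columns; `random_state=0` is an addition of mine so the split is reproducible). It shows three things: the stratified split keeps the 50/50 ratio, the scaler is fitted on the training rows only, and `make_column_transformer` drops any column it was not told about, because `remainder="drop"` is the default.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: alternating labels give a 50/50 balance
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "id": np.arange(n),
    "V1": rng.normal(size=n),
    "Amount": rng.uniform(0, 25000, size=n),
    "Class": np.tile([0, 1], n // 2),
})

y = df["Class"]
X = df.drop(columns=["id", "Class"])

# 75/25 stratified split (random_state is an assumption, for reproducibility)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, stratify=y, train_size=0.75, random_state=0
)

pre = make_column_transformer((StandardScaler(), ["V1", "Amount"]))
X_train_s = pre.fit_transform(X_train)  # statistics come from training rows only
X_valid_s = pre.transform(X_valid)      # validation reuses those statistics

# Passing the full frame (with id and Class) still yields only the 2 listed columns
full_out = make_column_transformer((StandardScaler(), ["V1", "Amount"])).fit_transform(df)
print(X_train_s.shape, X_valid_s.shape, y_train.mean(), full_out.shape)
```

The last line means that even if `id` or `Class` were left in `X`, the transformer would discard them; dropping them explicitly just makes the intent visible.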