# Detecting Credit Card Fraud Using TensorFlow

In this project, we'll use deep learning to tackle a key problem that credit card companies have to address: detecting fraudulent transactions.

## Importing Python Libraries

In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import sklearn
import scipy
from sklearn.preprocessing import StandardScaler
import sklearn.model_selection as model_selection
from imblearn.over_sampling import SMOTE
import tensorflow as tf
from tensorflow import keras


## Importing the Dataset

In [None]:
CreditCard = pd.read_csv("creditcard.csv")

In [None]:
print("Total number of records in the dataset:", CreditCard.shape[0])
print("Total number of columns in the dataset:", CreditCard.shape[1])
CreditCard.head()

In [None]:
## Check for missing values in the dataset
CreditCard.isnull().values.any()

The output shows that our dataset has no missing data.

In [None]:
## Rename the target column 'Class' to 'isFraud'
CreditCard.rename(columns = {'Class': "isFraud"}, inplace = True)
## Strip any single quotes from string values so the target can be parsed as a number
CreditCard = CreditCard.applymap(lambda x: x.replace("'", "") if isinstance(x, str) else x)
CreditCard['isFraud'] = pd.to_numeric(CreditCard['isFraud'])

## Percentage of fraudulent transactions (the mean of a 0/1 column is the fraction of 1s)
fraud_per = CreditCard.isFraud.mean() * 100
print("Percentage of fraudulent transactions in the dataset: {:.2f} %".format(fraud_per))

We obtained the dataset from Kaggle; it contains two days' worth of transactions made by European cardholders. Due to the confidential nature of the data, 28 of the features (V1 to V28) are the result of a PCA transformation, and we have no information on what they originally represented. The only columns that have not undergone this transformation, and that we can therefore identify, are 'Time', 'Amount', and 'Class'.

'Time' is the number of seconds elapsed between each transaction and the first transaction in the dataset. 'Amount' is the transaction amount, and 'Class' (renamed to 'isFraud' above) is our target variable: 0 denotes a normal transaction and 1 a fraudulent one.

Note that the target classes are highly imbalanced: only 0.17% of transactions are fraudulent.
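To see why this imbalance matters, here is a minimal sketch on synthetic labels (randomly generated at roughly the same 0.17% fraud rate, not the Kaggle data). A trivial "classifier" that labels every transaction as normal scores almost perfect accuracy while catching no fraud at all:

```python
import numpy as np

# Synthetic labels only: roughly 0.17% positives, NOT the real dataset.
rng = np.random.default_rng(0)
n = 100_000
y_true = (rng.random(n) < 0.0017).astype(int)

# A "model" that always predicts the majority class (normal transaction).
y_pred = np.zeros(n, dtype=int)

accuracy = (y_pred == y_true).mean()
true_pos = np.sum((y_pred == 1) & (y_true == 1))
recall = true_pos / y_true.sum()  # fraction of frauds that were caught

print(f"fraud rate: {y_true.mean():.4%}")
print(f"accuracy:   {accuracy:.4%}")
print(f"recall:     {recall:.1%}")
```

Accuracy here is simply one minus the fraud rate, which is why the evaluation later in this notebook also looks at precision and recall.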

In [None]:
## Do fraudulent transactions occur more often at certain times?
fraud = CreditCard[CreditCard.isFraud == 1]
normal = CreditCard[CreditCard.isFraud == 0]

f, (ax1, ax2) = plt.subplots(2, 1, sharex = True)
f.suptitle('Time of transaction vs Amount by class')

ax1.scatter(fraud.Time, fraud.Amount)
ax1.set_title('Fraud')
ax1.set_ylabel('Amount')

ax2.scatter(normal.Time, normal.Amount)
ax2.set_title('Normal')
ax2.set_xlabel('Time in Seconds')
ax2.set_ylabel('Amount')

plt.show()

From these plots, the time of a transaction does not appear to separate fraudulent transactions from normal ones.
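Scatter plots of raw points can hide differences in rates, because normal transactions vastly outnumber fraudulent ones. One possible follow-up check, sketched here on a small synthetic DataFrame that only borrows the notebook's column names, is to bin 'Time' into hours and compare the fraud rate per bin. The hour is counted from the first transaction, and the dataset does not tell us what clock time that corresponds to, so hour 0 is not necessarily midnight.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in with the notebook's column names; values are random, not real data.
rng = np.random.default_rng(1)
n = 10_000
df = pd.DataFrame({
    "Time": rng.uniform(0, 2 * 24 * 3600, n),      # seconds since first transaction, ~2 days
    "isFraud": (rng.random(n) < 0.01).astype(int),
})

# Hours elapsed since the first transaction, folded into a 24-hour cycle.
# Assumption: hour 0 is the hour of the first transaction, NOT necessarily midnight.
df["hour"] = (df["Time"] // 3600).astype(int) % 24

# Fraud rate (%) and number of transactions in each hour bin
grouped = df.groupby("hour")["isFraud"]
summary = pd.DataFrame({
    "fraud_pct": grouped.mean() * 100,
    "n_transactions": grouped.size(),
})
print(summary.head())
```

On the real data, the same groupby would be applied to `CreditCard`; a bar plot of `fraud_pct` makes rate differences easier to see than the raw scatter.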

## Data Preprocessing

In [None]:
## Defining x (features) and y (target)
## isFraud is the last column, so every preceding column is a feature
x = CreditCard.iloc[:, :-1].values
y = CreditCard.iloc[:, -1].values

In [None]:
## Train-test split: 90% training, 10% test
X_train, X_test, y_train, y_test = model_selection.train_test_split(x, y, test_size =0.1, random_state = 100)

In [None]:
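## Standardising the features (zero mean, unit variance) helps gradient-based training converge faster.
## Fit the scaler on the training set only and reuse its statistics for the test set,
## so that no information from the test data leaks into preprocessing.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)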
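The key point is that the scaler learns its mean and standard deviation from the training data only, and the test data is transformed with those same statistics. A minimal sketch on synthetic data (the shifted test distribution is deliberate):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic features; the test set is deliberately shifted by +5
rng = np.random.default_rng(2)
X_train = rng.normal(loc=50.0, scale=10.0, size=(1000, 3))
X_test = rng.normal(loc=55.0, scale=10.0, size=(200, 3))

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)         # reuse the training statistics

# Equivalent by hand: z = (x - mean_train) / std_train
manual = (X_test - scaler.mean_) / scaler.scale_
print(np.allclose(X_test_scaled, manual))        # True
print(X_train_scaled.mean(axis=0))               # close to 0 for every column
print(X_test_scaled.mean(axis=0))                # around 0.5: the shift is preserved
```

Calling `fit_transform` on the test set instead would re-centre it on its own statistics, hiding the shift and putting the test features on a different scale from the one the model was trained on.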


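As mentioned previously, this dataset is highly imbalanced. We'll address this with the Synthetic Minority Oversampling Technique (SMOTE). Rather than simply duplicating minority samples, SMOTE creates new synthetic ones: it picks a minority-class sample, chooses one of its k nearest minority-class neighbours, and generates a new point at a random position on the line segment between the two. Here it generates synthetic fraud instances until both classes in the training set are the same size. We apply it to the training set only, so the test set keeps the real class distribution.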
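The interpolation step can be sketched in a few lines of NumPy. This is a simplified illustration of the idea, not imbalanced-learn's implementation; the function name `smote_sketch` and its parameters are hypothetical, and the 20 "fraud" points are random synthetic data.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Simplified SMOTE-style oversampling of one minority class (illustrative only).

    For each synthetic point: pick a random minority sample, pick one of its
    k nearest minority neighbours, and interpolate a random fraction of the way
    between them.
    """
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between minority samples
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                   # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]     # indices of the k nearest neighbours
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])

X_fraud = np.random.default_rng(3).normal(size=(20, 2))
synthetic = smote_sketch(X_fraud, n_new=100)
print(synthetic.shape)  # (100, 2)
```

Because every new point lies between two real minority samples, the synthetic data stays inside the region the minority class already occupies, instead of stacking exact copies on top of existing points.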

In [None]:
## SMOTE: oversample the minority (fraud) class in the training set only
oversample = SMOTE(random_state = 2)
X_train_SMOTE, y_train_SMOTE = oversample.fit_resample(X_train, y_train)

## Check that the training classes are now balanced
pd.Series(y_train_SMOTE).value_counts().plot(kind = "bar")
plt.title("Balanced Dataset")
plt.show()

## Building and Training the ANN

We arrived at the number of layers and neurons in our network by trial and error. We use ReLU as the activation function for the hidden layers and a sigmoid for the output layer, so the network outputs the probability that a transaction is fraudulent. We also add several dropout layers to reduce overfitting.
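Dropout randomly zeroes a fraction of a layer's outputs, during training only. Keras's `Dropout` layer uses the "inverted" variant, which scales the surviving units by 1 / (1 - rate) so that the expected activation is the same at training and inference time. A minimal NumPy sketch of that behaviour (the helper `dropout` is hypothetical, not the Keras API):

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate` during training
    and scale the survivors by 1 / (1 - rate) so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return x                      # at inference the layer is the identity
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(4)
a = np.ones((1000, 64))
out = dropout(a, rate=0.2, training=True, rng=rng)
print((out == 0).mean())   # roughly 0.2 of the units are dropped
print(out.mean())          # roughly 1.0, the same as the input mean
print(np.array_equal(dropout(a, 0.2, training=False, rng=rng), a))  # True
```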


In [None]:
## DNN: 30 input features -> ReLU hidden layers with dropout -> 1 sigmoid output
model = keras.Sequential([
    keras.Input(shape = (X_train.shape[1],)),
    keras.layers.Dense(units = 128, activation = "relu"),
    keras.layers.Dense(units = 64, activation = "relu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units = 32, activation = "relu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units = 32, activation = "relu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units = 16, activation = "relu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units = 1, activation = "sigmoid")])
model.summary()
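As a sanity check on `model.summary()`, the trainable parameter count of a stack of Dense layers can be computed by hand: each layer has n_in × n_out weights plus n_out biases, and Dropout layers add none. A short sketch, assuming 30 input features as in the model above:

```python
# Layer widths taken from the model definition: 30 inputs -> hidden layers -> 1 output
widths = [30, 128, 64, 32, 32, 16, 1]

def dense_params(n_in, n_out):
    # Weight matrix (n_in x n_out) plus one bias per output unit
    return n_in * n_out + n_out

per_layer = [dense_params(a, b) for a, b in zip(widths, widths[1:])]
print(per_layer)       # [3968, 8256, 2080, 1056, 528, 17]
print(sum(per_layer))  # 15905
```

The total should match the "Trainable params" line reported by `model.summary()`.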

## Model Evaluation

In [None]:
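## Metrics
## BinaryAccuracy thresholds the sigmoid output at 0.5; plain Accuracy would compare
## raw probabilities with the 0/1 labels for exact equality and almost never match.
metrics = [
    tf.keras.metrics.BinaryAccuracy(name = "Accuracy"),
    tf.keras.metrics.Precision(name = "Precision"),
    tf.keras.metrics.Recall(name = "Recall")]

## Compiling and fitting the model
model.compile(optimizer = "adam", loss = "binary_crossentropy",
              metrics = metrics)
model.fit(X_train_SMOTE, y_train_SMOTE, batch_size = 32, epochs = 50)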

print("Evaluate on test data")
score = model.evaluate(X_test, y_test)
print("test loss, test accuracy, test precision, test recall:", score)

We used ‘adam’ as our optimizer because it is computationally efficient and well suited to problems with many parameters, and ‘binary_crossentropy’ as our loss function because it is the standard choice for binary classification with a sigmoid output. For evaluation, we don't rely on accuracy alone: with such a severe class imbalance, accuracy can look high even for a model that misses most fraud. We therefore also track precision (the fraction of flagged transactions that are actually fraudulent) and recall (the fraction of fraudulent transactions that are caught). Now let's look at how the last 10 epochs went and how well our model performed on the test data.
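To make the loss and the two extra metrics concrete, here is a small NumPy sketch on made-up labels and predicted probabilities. It computes the mean binary cross-entropy and, after thresholding the probabilities at 0.5 (the default threshold of Keras's `Precision` and `Recall` metrics), precision and recall from the true-positive, false-positive and false-negative counts. The helper names are hypothetical.

```python
import numpy as np

def binary_crossentropy(y_true, p, eps=1e-7):
    # Mean of -[y log p + (1 - y) log(1 - p)], with p clipped away from 0 and 1
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def precision_recall(y_true, p, threshold=0.5):
    y_pred = (p >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # frauds correctly flagged
    fp = np.sum((y_pred == 1) & (y_true == 0))  # normal transactions flagged as fraud
    fn = np.sum((y_pred == 0) & (y_true == 1))  # frauds missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labels and predicted probabilities (made up for illustration)
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])
p      = np.array([0.1, 0.2, 0.05, 0.3, 0.6, 0.1, 0.9, 0.4, 0.8, 0.2])

bce = binary_crossentropy(y_true, p)
prec, rec = precision_recall(y_true, p)
print(bce)
print(prec, rec)  # both 2/3 here: 2 TP, 1 FP, 1 FN
```

Raising the threshold generally trades recall for precision, so the threshold is worth revisiting once the test results are in.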