## Importing the necessary libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

## Loading the Dataset

The dataset can be found [here](http://www.kaggle.com/mlg-ulb/creditcardfraud).

In [None]:
file = '/kaggle/input/creditcardfraud/creditcard.csv'

df = pd.read_csv(file)
df.head()

## Data Preprocessing

* Null value check

In [None]:
df.isnull().values.any()

* Data balance check

In [None]:
# Percentage share of each class (0 = non-fraud, 1 = fraud)
class_pct = df['Class'].value_counts(normalize=True) * 100

print('Fraud Percentage: {}'.format(round(class_pct[1], 2)))

print('Non Fraud Percentage: {}'.format(round(class_pct[0], 2)))

Thus, our dataset is highly imbalanced. A model trained on it can reach very high accuracy simply by predicting that every transaction is legitimate.
That makes it very difficult to detect fraud accurately, which is precisely our task!
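
To make this concrete, here is a minimal sketch on made-up labels (the counts below are hypothetical and are not taken from the dataset). A "model" that always predicts the majority class scores very high accuracy while catching no fraud at all:

```python
import numpy as np

# Hypothetical labels: 10,000 transactions, 20 of them fraud.
y_true = np.zeros(10_000, dtype=int)
y_true[:20] = 1

# A trivial "model" that always predicts the majority class (no fraud).
y_pred = np.zeros_like(y_true)

# Accuracy: fraction of labels predicted correctly.
baseline_accuracy = (y_true == y_pred).mean()

# Recall on the fraud class: caught frauds / all frauds.
true_positives = ((y_true == 1) & (y_pred == 1)).sum()
baseline_recall = true_positives / (y_true == 1).sum()

print('Accuracy:', baseline_accuracy)
print('Fraud recall:', baseline_recall)
```

This is why accuracy alone is a poor yardstick here, and why the confusion matrix and per-class report are worth reading closely.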

* Scaling

All the features have already been scaled except for Time and Amount. So, we scale these two.

In [None]:
from sklearn.preprocessing import RobustScaler

ss1 = RobustScaler()
df['Amount']= ss1.fit_transform(df['Amount'].values.reshape(-1, 1))

ss2 = RobustScaler()
df['Time']= ss2.fit_transform(df['Time'].values.reshape(-1, 1))

df.head()

Note the changed values of Time and Amount.
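
For reference, `RobustScaler` with its default settings centers each feature on its median and divides by the interquartile range, which keeps a few extreme values from dominating the scale. The sketch below reproduces that arithmetic with NumPy on hypothetical amounts (the numbers are illustrative, not from the dataset):

```python
import numpy as np

# Hypothetical transaction amounts with one large outlier.
amounts = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

median = np.median(amounts)                 # center
q1, q3 = np.percentile(amounts, [25, 75])   # 25th and 75th percentiles
iqr = q3 - q1                               # interquartile range

# Same formula RobustScaler applies by default: (x - median) / IQR
robust_scaled = (amounts - median) / iqr

print(robust_scaled)
```

The outlier stays large after scaling, but it does not distort the scale applied to the typical values.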

* Splitting the Dataset

1. Into X and y

In [None]:
X = df.drop('Class', axis=1)
y = df['Class']

2. Into training and test sets

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=3)

print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)

## Random Sampling

Since our dataset is highly imbalanced, we need to resample it so that fraud and non-fraud cases are equally represented.
There are two ways to do this:
* Undersampling - **remove** information (drop majority-class samples)
* Oversampling - **synthesize** information (create new minority-class samples)

Here, we will use SMOTE oversampling (SMOTE: Synthetic Minority Oversampling Technique).
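
The two options can be sketched with NumPy alone. This is a simplified illustration, not imblearn's implementation: the data, the `synthesize` helper, and all variable names are hypothetical, and the oversampling step interpolates toward the single nearest minority neighbour, whereas SMOTE chooses among several nearest neighbours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 95 majority rows and 5 minority rows, 2 features each.
X_major = rng.normal(loc=0.0, scale=1.0, size=(95, 2))
X_minor = rng.normal(loc=3.0, scale=1.0, size=(5, 2))

# Undersampling: keep a random subset of majority rows and discard the rest.
keep = rng.choice(len(X_major), size=len(X_minor), replace=False)
X_major_under = X_major[keep]


def synthesize(X, n_new, rng):
    """Create n_new points by interpolating between minority samples."""
    new_rows = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf               # exclude the point itself
        j = np.argmin(dists)            # nearest minority neighbour
        gap = rng.random()              # position along the segment, in [0, 1)
        new_rows.append(X[i] + gap * (X[j] - X[i]))
    return np.array(new_rows)


# Oversampling: add synthetic minority rows until the classes are balanced.
X_minor_over = np.vstack([X_minor, synthesize(X_minor, 90, rng)])

print(X_major_under.shape, X_minor_over.shape)
```

Undersampling balances the classes by shrinking the majority to 5 rows, while oversampling grows the minority to 95 rows without discarding anything.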

## Training and Testing

We will train and evaluate a **neural network** classifier.

In [None]:
import keras
from keras.models import Sequential
from keras.layers import Dense

import warnings
warnings.filterwarnings('ignore')

In [None]:
classifier = Sequential()
classifier.add(Dense(15, activation='relu',kernel_initializer='uniform',input_shape=(30,)))
classifier.add(Dense(15, activation='relu',kernel_initializer='uniform'))
classifier.add(Dense(1, activation='sigmoid', kernel_initializer='uniform' ))

In [None]:
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [None]:
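from imblearn.over_sampling import SMOTE

sm = SMOTE(random_state=42)

# Resample only the training split, so that no test rows (or synthetic
# points derived from them) leak into training.
Xsm_train, ysm_train = sm.fit_resample(x_train, y_train)

classifier.fit(Xsm_train, ysm_train, batch_size=200, epochs=100)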

In [None]:
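# predict_classes is no longer available in recent Keras/TensorFlow releases,
# so threshold the sigmoid output at 0.5 to get class labels.
oversample_prob = classifier.predict(x_test, batch_size=200, verbose=0)
oversample_pred = (oversample_prob > 0.5).astype('int32').ravel()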

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
confusion_matrix(y_test, oversample_pred)

In [None]:
accuracy_score(y_test, oversample_pred)

In [None]:
print(classification_report(y_test, oversample_pred, target_names=['No Fraud', 'Fraud']))