#Scenario: **Credit Card Fraud Detection Using CNN**

###**Dataset Description**
The dataset contains transactions made with credit cards in September 2013 by European cardholders.

It covers transactions that occurred over two days, with **492** frauds out of **284,807** transactions.

- **Time** - Number of seconds elapsed between this transaction and the first transaction in the dataset
- **V1-V28** - Anonymized attributes (columns), obtained through a PCA transformation, that protect user identities and sensitive features
- **Amount** - Transaction Amount
- **Class** - **1** for fraudulent transactions, **0** otherwise

###**Tasks to be performed**

- Import the required libraries and load the dataset
- Perform Exploratory Data Analysis (EDA) on the data set
  - Plot **Univariate Distributions**
    - What is the distribution of the **Time**, **Amount** & **Class** columns in the data set?
    
- Pre-process the data set for modeling
  - Handle Missing values present in the data set
  - Scale the data set using **RobustScaler()**
  - Split the data into training and testing set using sklearn's **train_test_split** function
- Build a CNN Model Using Tensorflow 2.0
- Compile and fit the model
- Plot the Training History
  - Make a plot for the Loss Function to visualize the change in Loss at every epoch
  - Make a plot for the Accuracy Metric to visualize the accuracy at every epoch
- Build CNN Model 2 with MaxPooling Layers
- Plot the Training History
  - Make a plot for the Loss Function to visualize the change in Loss at every epoch
  - Make a plot for the Accuracy Metric to visualize the accuracy at every epoch


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler

In [None]:
df = pd.read_csv('creditcard.csv')
df.head()

###**Exploratory Data Analysis**

####**Plotting Univariate Distributions**
A **univariate distribution** is the probability distribution of a single random variable.

**Note:** You have already seen this in Pandas Profiling. Still, if you want to write the code, you can do so.

What is the distribution of the **Time** & **Amount** columns in the data set?

In [None]:
import plotly.express as px

fig = px.histogram(df, x = 'Time')
fig.show()

In [None]:
fig = px.histogram(df, x = 'Amount')
fig.show()

####**Checking Imbalanced Dataset**

In [None]:
import seaborn as sns
%matplotlib inline

plt.figure(figsize=(12,8))
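ax = sns.countplot(x="Class", data=df, color='green')
for p in ax.patches:
    # Each bar's height is the number of transactions in that class
    count = p.get_height()
    x_center = p.get_x() + p.get_width() / 2
    # Label each bar with its share of all transactions, to two decimal places
    ax.annotate('{:.2f}%'.format(100. * count / len(df)), (x_center, count), ha='center', va='bottom')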
plt.show()

___
**Observations:**

The data set is **Highly Unbalanced** with only **0.17%** of transactions being classified as **Fraudulent**. 
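
To make the imbalance concrete, the short sketch below uses only the two counts quoted in the dataset description. It works out the fraud rate and the accuracy that a trivial model would reach by labelling every transaction as legitimate. The variable names are illustrative and not part of the original notebook.

```python
# Counts taken from the dataset description above
n_total = 284_807
n_fraud = 492

# Share of fraudulent transactions
fraud_rate = n_fraud / n_total

# Accuracy of a trivial model that always predicts Class 0
baseline_accuracy = 1 - fraud_rate

print(f"Fraud rate: {fraud_rate:.2%}")
print(f"Accuracy of always predicting 'not fraud': {baseline_accuracy:.2%}")
```

Any accuracy reported later should be read against this baseline, because a model can score very high without detecting a single fraud.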

Several ways to approach this imbalanced classification problem:

- **Acquire More Data**
- **Changing the performance metric:**
  - Use the **Confusion Matrix**
  - **F1-Score** (harmonic mean of **Precision** & **Recall**)
  - **ROC Curves**

- **Re-sampling the dataset:** This processes the data so that the two classes have an approximately 50-50 ratio.

  - **Over-sampling**, which adds copies of the under-represented class (better when you have little data)

  - **Under-sampling**, which deletes instances from the over-represented class (better when you have lots of data)
___
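
The two re-sampling strategies listed above can be sketched with plain NumPy. This is a minimal illustration on made-up toy data (`X_toy`, `y_toy` are hypothetical names), not the approach used later in this notebook, which trains on the original imbalanced data. In practice, re-sampling is usually applied to the training split only, so that the test set keeps the real class ratio.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced data: 990 legitimate (0) and 10 fraudulent (1) rows
y_toy = np.array([0] * 990 + [1] * 10)
X_toy = rng.normal(size=(len(y_toy), 3))

minority_idx = np.flatnonzero(y_toy == 1)
majority_idx = np.flatnonzero(y_toy == 0)

# Under-sampling: keep only as many majority rows as there are minority rows
kept_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
under_idx = rng.permutation(np.concatenate([kept_majority, minority_idx]))
X_under, y_under = X_toy[under_idx], y_toy[under_idx]

# Over-sampling: draw minority rows with replacement until the classes match
drawn_minority = rng.choice(minority_idx, size=len(majority_idx), replace=True)
over_idx = rng.permutation(np.concatenate([majority_idx, drawn_minority]))
X_over, y_over = X_toy[over_idx], y_toy[over_idx]

print("Under-sampled class counts:", np.bincount(y_under))
print("Over-sampled class counts:", np.bincount(y_over))
```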

###**Data Manipulation**

In [None]:
#RobustScaler centers each feature on its median and scales it by the interquartile range (IQR), so it is robust to outliers

from sklearn.preprocessing import RobustScaler
rs = RobustScaler()

#Fit and transform the Amount and Time columns into new scaled_amount and scaled_time columns

df['scaled_amount'] = rs.fit_transform(df['Amount'].values.reshape(-1,1)).ravel()
df['scaled_time'] = rs.fit_transform(df['Time'].values.reshape(-1,1)).ravel()
df.drop(['Time', 'Amount'], axis=1, inplace=True) #Dropping the original Time and Amount columns from the data set
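
As a rough illustration of what the scaling above does, the sketch below reproduces the calculation by hand with NumPy on a few made-up amounts. It assumes the default `RobustScaler` settings (centering on the median and scaling by the 25th to 75th percentile range). The values in `amounts` are hypothetical.

```python
import numpy as np

# Hypothetical transaction amounts with one extreme outlier
amounts = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])

median = np.median(amounts)
q1, q3 = np.percentile(amounts, [25, 75])
iqr = q3 - q1

# Same formula as RobustScaler with default settings: (x - median) / IQR
scaled = (amounts - median) / iqr

print("median:", median, "IQR:", iqr)
print("scaled:", scaled)
```

The outlier barely affects the median and the IQR, so the typical values stay in a narrow range around zero, whereas a mean and standard deviation would be dominated by the single large amount.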

In [None]:
scaled_amount = df['scaled_amount']
scaled_time = df['scaled_time']
df.drop(['scaled_amount', 'scaled_time'], axis=1, inplace=True)
df.insert(0, 'scaled_amount', scaled_amount)
df.insert(0, 'scaled_time', scaled_time)
df.head()

In [None]:

from sklearn.model_selection import train_test_split as holdout #Importing the train_test_split from sklearn library
x = np.array(df.iloc[:, df.columns != 'Class']) #Predictors 
y = np.array(df.iloc[:, df.columns == 'Class']) #Target Column
x_train, x_test, y_train, y_test = holdout(x, y, test_size=0.2, random_state=0) #Splitting the data set into training and testing set

In [None]:
x_train.shape

In [None]:
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], 1)
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], 1)

x_train.shape, x_test.shape

###**Build a CNN Model Using Tensorflow 2.0**

In [None]:
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.layers import Conv1D, MaxPool1D

model = Sequential()

model.add(Conv1D(32, 2, activation='relu', input_shape = x_train[0].shape))
model.add(BatchNormalization())
model.add(Dropout(0.2))

model.add(Conv1D(64, 2, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(1, activation='sigmoid'))
#The output layer: Dense layer with 1 neuron. We are predicting a single value as this is a binary classification problem
#Using the sigmoid function because its output lies between 0 and 1, so it can be read as the probability of the positive class
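
The role of the sigmoid output can be sketched in a few lines of plain Python. The raw scores in `logits` are made-up values, and the 0.5 decision threshold is an assumption chosen for illustration. The notebook itself does not fix a threshold.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical raw scores produced by the last Dense layer before activation
logits = [-4.0, 0.0, 2.5]

# Sigmoid turns each score into a value that can be read as P(Class = 1)
probs = [sigmoid(z) for z in logits]

# Assumed threshold of 0.5 converts probabilities into class labels
labels = [int(p >= 0.5) for p in probs]

print(probs)
print(labels)
```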

In [None]:
model.summary()

- Each layer has an output and its shape is shown in the **Output Shape** column
- Each layer’s output becomes the input for the next layer
- The **Param #** column shows the number of parameters that are trained for each layer
- The total number of parameters is shown at the end; it is the sum of the **trainable** and **non-trainable parameters**
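
The numbers in the **Param #** column can be checked by hand. The sketch below recomputes them for the first model in plain Python, assuming the input has 30 features and 1 channel (the 31 original columns minus `Class`) and that the convolutions use the default `'valid'` padding. The helper functions are illustrative and not part of Keras.

```python
def conv1d_params(kernel_size, in_channels, filters):
    # One weight per kernel position and input channel, plus one bias, per filter
    return (kernel_size * in_channels + 1) * filters

def dense_params(in_units, out_units):
    # One weight per input unit, plus one bias, per output unit
    return (in_units + 1) * out_units

def batchnorm_params(channels):
    # gamma and beta are trainable; moving mean and variance are not
    return 2 * channels, 2 * channels

steps, channels = 30, 1  # assumed input shape
trainable, non_trainable = 0, 0

for filters in (32, 64):
    trainable += conv1d_params(2, channels, filters)
    # Kernel size 2 with 'valid' padding shortens the sequence by 1
    steps, channels = steps - 1, filters
    bn_trainable, bn_frozen = batchnorm_params(channels)
    trainable += bn_trainable
    non_trainable += bn_frozen

flat_units = steps * channels              # Flatten has no parameters
trainable += dense_params(flat_units, 64)  # hidden Dense layer
trainable += dense_params(64, 1)           # output layer

print("trainable:", trainable)
print("non-trainable:", non_trainable)
print("total:", trainable + non_trainable)
```

Dropout and Flatten layers contribute no parameters, which is why they do not appear in the calculation.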

###**Compile and fit the model**

In [None]:
from tensorflow.keras.optimizers import Adam

In [None]:
# Define the model optimizer, loss function and metrics

model.compile(
    optimizer=Adam(learning_rate=0.0003),
    loss = 'binary_crossentropy', 
    metrics=['accuracy']
    )

####**EarlyStopping**

We will use a Keras callback to implement a regularisation technique called **Early Stopping**.

**EarlyStopping** monitors the performance of the network on a held-out validation set after every epoch and terminates the training run once the validation performance stops improving.
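
The stopping rule can be sketched in plain Python. This is a simplified illustration of the patience and minimum-improvement logic for a metric that should increase, not the actual Keras implementation. The function name and the validation scores are hypothetical.

```python
def early_stopping_epoch(val_scores, patience=5, min_delta=0.01):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("-inf")
    wait = 0  # epochs since the last improvement that counted
    for epoch, score in enumerate(val_scores, start=1):
        if score - min_delta > best:
            # Improvement larger than min_delta: remember it and reset the counter
            best = score
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation accuracies that improve by less than 0.01 per epoch
val_accuracy = [0.990, 0.998, 0.9985, 0.9986, 0.9987, 0.9988, 0.9989]

print(early_stopping_epoch(val_accuracy))
print(early_stopping_epoch([0.5, 0.6, 0.7]))
```

Because validation accuracy on this data set is already close to 1 after the first epoch, later gains are much smaller than a `min_delta` of 0.01, so a rule like this tends to stop training after `patience` further epochs.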



In [None]:
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_accuracy', patience = 5, min_delta = 0.01, mode='max')

#In this case, training terminates only if the monitored metric shows no improvement for 5 epochs in a row
#min_delta = 0.01 means that val_accuracy has to improve by at least 0.01 for it to count as an improvement


In [None]:
#model.fit returns a Tensorflow History Object
#It contains a record of the progress of the Network during training in terms of the Loss and the Metrics

history = model.fit(x_train, y_train, epochs=40, validation_data=(x_test, y_test), verbose=2, callbacks=[early_stopping])

In [None]:
#history.history is a dictionary that records the values of the loss function and the metrics after each epoch
df = pd.DataFrame(history.history)

df
#This is quite useful to check how the Training is going

###**Plot Training History** 

In [None]:
# Make a plot for the loss
#This Shows how the Loss Function decreases after Every Epoch

loss_plot = df.plot(y = 'loss', title = 'Loss vs Epochs', legend = False)
loss_plot.set(xlabel = 'Epochs', ylabel = 'Loss')

In [None]:
# Make a plot for the accuracy

acc_plot = df.plot(y = 'accuracy', title = 'Accuracy vs Epochs', legend = False)
acc_plot.set(xlabel = 'Epochs', ylabel = 'Accuracy')

###**Add MaxPooling Layers and create a CNN Model 2**

In [None]:
model = Sequential()
model.add(Conv1D(32, 2, activation='relu', input_shape = x_train[0].shape))
model.add(BatchNormalization())
model.add(MaxPool1D(2))
model.add(Dropout(0.2))

model.add(Conv1D(64, 2, activation='relu'))
model.add(BatchNormalization())
model.add(MaxPool1D(2))
model.add(Dropout(0.5))

model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(1, activation='sigmoid'))
#The output layer: Dense layer with 1 neuron. We are predicting a single value as this is a binary classification problem
#Using the sigmoid function because its output lies between 0 and 1, so it can be read as the probability of the positive class

In [None]:
model.compile(optimizer=Adam(learning_rate=0.0001),
              loss = 'binary_crossentropy', 
              metrics=['accuracy'])

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_accuracy', patience = 5, min_delta = 0.01, mode='max')

#In this case, training terminates only if the monitored metric shows no improvement for 5 epochs in a row
#min_delta = 0.01 means that val_accuracy has to improve by at least 0.01 for it to count as an improvement


In [None]:
history = model.fit(x_train, y_train, epochs=80, validation_data=(x_test, y_test), verbose=2, callbacks=[early_stopping])

In [None]:
#history.history is a dictionary that records the values of the loss function and the metrics after each epoch
df = pd.DataFrame(history.history)

df
#This is quite useful to check how the Training is going

###**Plot the training history**


In [None]:
# Make a plot for the loss
#This Shows how the Loss Function decreases after Every Epoch

loss_plot = df.plot(y = 'loss', title = 'Loss vs Epochs', legend = False)
loss_plot.set(xlabel = 'Epochs', ylabel = 'Loss')

In [None]:
# Make a plot for the accuracy

acc_plot = df.plot(y = 'accuracy', title = 'Accuracy vs Epochs', legend = False)
acc_plot.set(xlabel = 'Epochs', ylabel = 'Accuracy')

###**Benefit of Using CNN for this project:**

- A CNN can achieve better accuracy than traditional Machine Learning algorithms such as SVM, Random Forest, etc.
- Datasets available for training are highly imbalanced, with far fewer fraudulent transactions than legitimate ones, so a model can become biased towards the majority class. Oversampling the minority class is one approach to mitigate this problem, but it has its own drawbacks
- A Deep Learning based CNN model may fare better, and it did so in our case
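
Because a model biased towards the majority class can still score a very high accuracy, it helps to look at the confusion matrix and the F1-score as well. The sketch below computes them in plain Python on made-up labels and predictions. The numbers are hypothetical and are not results from the models trained above.

```python
# Hypothetical labels for 1,000 transactions: 10 frauds (1) and 990 legitimate (0)
y_true = [1] * 10 + [0] * 990
# Hypothetical predictions: 6 frauds caught, 4 missed, 2 false alarms
y_pred = [1] * 6 + [0] * 4 + [1] * 2 + [0] * 988

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # frauds correctly flagged
fn = sum(t == 1 and p == 0 for t, p in pairs)  # frauds missed
fp = sum(t == 0 and p == 1 for t, p in pairs)  # legitimate flagged as fraud
tn = sum(t == 0 and p == 0 for t, p in pairs)  # legitimate correctly passed

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the accuracy is above 99% even though 4 of the 10 frauds are missed, which is exactly the situation the F1-score is meant to expose.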

