# Credit Card Fraud Detection using Single Perceptron Model

#### Author:  Ratan Madankumar Singh    
#### 3rd March, 2018

In this document we implement a single-neuron model to detect credit card fraud. The data has been taken from www.kaggle.com. The underlying dataset contains 284807 rows and 30 attributes describing the characteristics of each transaction. A fraudulent transaction is labelled 1 and a non-fraudulent transaction is labelled 0. The biggest advantage of this dataset is that it is already cleaned and preprocessed and can therefore be used directly for analysis. Before applying the machine learning model, let's create some basic nuts and bolts to implement the machine learning functions.

In [9]:
import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt

def sigmoid(z):
    y_hat = 1.0/(1+np.exp(-z))
    return y_hat

def weightInit(n):
    W = np.random.random((1,n))*0.01
    b = 0
    return [W,b]

def forwardPropagation(W,X,b):
    z = np.dot(W,X)+b
    y_hat = sigmoid(z)
    return y_hat

def computeCost(y_hat,y,lambd,W):
    # Mean cross-entropy cost. epsilon keeps log() finite when y_hat is exactly 0 or 1.
    # lambd and W are accepted but unused: the L2 penalty is not added to the reported cost.
    m = len(np.squeeze(y_hat))
    epsilon = 1e-8
    logprobs = np.multiply(np.log(y_hat+epsilon),y)+np.multiply(np.log(1-y_hat+epsilon),(1-y))
    cost = -np.sum(logprobs)/m
    return cost

def computeGradient(y_hat,y,X,lambd,W):
    # Gradient of the mean cross-entropy cost plus an L2 penalty lambd/(2m)*sum(W**2).
    m = len(np.squeeze(y_hat))
    dW = np.dot((y_hat-y),X.T)/m + lambd*np.array(W)/m
    db = np.sum(y_hat-y)/m
    return [dW,db]

def predictLogit(W,b,X):
    z = np.dot(W,X)+b
    y_hat = sigmoid(z)
    return y_hat

def accuracy(y_hat,y):
    y_hat = (y_hat > 0.5)*1.0
    acc = np.sum(y_hat == y)*100.0/len(np.squeeze(y))
    return(acc)

def initAdam(W,b):
    Sw = np.zeros(W.shape)
    Sb = 0
    Vw = np.zeros(W.shape)
    Vb = 0
    return [Sw,Sb,Vw,Vb]

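def adam(W,b,dW,db,Sw,Sb,Vw,Vb,beta1,beta2,alpha,epsilon = 1e-8):
    # Note: here beta1 decays the squared-gradient averages (Sw, Sb) and beta2 decays
    # the gradient averages (Vw, Vb), the reverse of the usual Adam naming.
    # No bias correction is applied to either average.
    Sw = beta1*Sw + (1-beta1)*np.square(dW)
    Sb = beta1*Sb + (1-beta1)*np.square(db)
    Vw = beta2*Vw + (1-beta2)*dW
    Vb = beta2*Vb + (1-beta2)*db
    W = W - alpha*Vw/(np.sqrt(Sw)+epsilon)
    b = b - alpha*Vb/(np.sqrt(Sb)+epsilon)
    return [W,b,Sw,Sb,Vw,Vb]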
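
For comparison, the commonly published form of Adam also divides both running averages by a bias-correction factor, which matters most in the first few iterations when the averages still start from zero. The sketch below is only an illustration of that variant: the function name `adam_step`, its default hyperparameters, and the toy quadratic objective are assumptions and are not part of this notebook.

```python
import numpy as np

def adam_step(param, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam update with bias correction (hypothetical helper, not from the notebook)."""
    m = beta1 * m + (1 - beta1) * grad             # running average of the gradient
    v = beta2 * v + (1 - beta2) * np.square(grad)  # running average of the squared gradient
    m_hat = m / (1 - beta1 ** t)                   # bias correction, t counts from 1
    v_hat = v / (1 - beta2 ** t)
    param = param - alpha * m_hat / (np.sqrt(v_hat) + epsilon)
    return param, m, v

# Toy usage: minimise f(w) = sum(w**2), whose gradient is 2*w.
w = np.array([[1.0, -2.0]])
m = np.zeros_like(w)
v = np.zeros_like(w)

# After the very first step each weight moves by roughly alpha towards zero.
w_first, _, _ = adam_step(w, 2 * w, m, v, t=1, alpha=0.05)

for t in range(1, 501):
    grad = 2 * w
    w, m, v = adam_step(w, grad, m, v, t, alpha=0.05)

final_loss = float(np.sum(w ** 2))
print("Loss after 500 steps:", final_loss)
```

With bias correction the first update has a magnitude close to `alpha` regardless of the gradient scale, which is one reason this form is often preferred when the features are not normalised.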

Our data is stored on my local machine. Let's load it and shuffle it randomly. After shuffling, I split it into a training dataset with 200000 observations and a test dataset with 84807 rows. To optimize the cost function, I have used the Adam algorithm, which typically converges faster than batch gradient descent and is less prone to slowing down near saddle points.

In [10]:
DataFrame = pd.read_csv('C:\\Users\\Ratan Singh\\Documents\\R Markdown Files\\credit card fraud\\creditcard.csv')
print("Dimensions of file is "+str(DataFrame.shape[0])+" rows and "+str(DataFrame.shape[1])+" columns" )

order = np.random.permutation(DataFrame.shape[0])
DataFrame = DataFrame.iloc[order,:]
cutOff = 200000

X_trainData = DataFrame.iloc[range(0,cutOff),range(0,30)].T
X_testData = DataFrame.iloc[range(cutOff,DataFrame.shape[0]),range(0,30)].T

Y_trainData = (DataFrame.iloc[range(0,cutOff),30].T).values.reshape(1,cutOff)
Y_testData = (DataFrame.iloc[range(cutOff,DataFrame.shape[0]),30].T).values.reshape(1,DataFrame.shape[0]-cutOff)

Y_trainData = (Y_trainData > 0)*1.0
Y_testData = (Y_testData > 0)*1.0

num_iter = 15000
lambd = 0.1
alpha = 0.1
beta1 = 0.99
beta2 = 0.9

J = []
[W,b] = weightInit(30)

[Sw,Sb,Vw,Vb] = initAdam(W,b)


for i in range(0,num_iter):
    
    y_hat = forwardPropagation(W,X_trainData,b)
    if(i%1000 == 0):
        # Compute the cost once, then report and record it.
        cost = computeCost(y_hat,Y_trainData,lambd,W)
        print("Iteration Number - "+str(i)+ " Cost::"+str(cost))
        J.append(cost)
            
    [dW,db] = computeGradient(y_hat,Y_trainData,X_trainData,lambd,W)
    [W,b,Sw,Sb,Vw,Vb] = adam(W,b,dW,db,Sw,Sb,Vw,Vb,beta1,beta2,alpha)
        
print("Training the model is completed ::")

Dimensions of file is 284807 rows and 31 columns
Iteration Number - 0 Cost::18.235161709
Iteration Number - 1000 Cost::0.0350230249403
Iteration Number - 2000 Cost::0.0425708822919
Iteration Number - 3000 Cost::0.0182366973248
Iteration Number - 4000 Cost::0.0332493187622
Iteration Number - 5000 Cost::0.0316415030608
Iteration Number - 6000 Cost::0.0246549501354
Iteration Number - 7000 Cost::0.017996047407
Iteration Number - 8000 Cost::0.0331716918875
Iteration Number - 9000 Cost::0.0314965932636
Iteration Number - 10000 Cost::0.0192749779762
Iteration Number - 11000 Cost::0.0317312487701
Iteration Number - 12000 Cost::0.0198944613009
Iteration Number - 13000 Cost::0.0327106688306
Iteration Number - 14000 Cost::0.0299536099718
Training the model is completed ::


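The small epsilon inside the logarithms in computeCost keeps the cost from evaluating to NaN when a prediction saturates at exactly 0 or 1. The printed cost falls from about 18.2 to below 0.05 within the first 1000 iterations and then oscillates between about 0.018 and 0.043 instead of decreasing steadily.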
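
To see why a cross-entropy cost can evaluate to NaN, and how a small epsilon avoids it, here is a minimal sketch on two hand-made predictions. The values are illustrative and are not taken from the dataset.

```python
import numpy as np

# Two predictions that are exactly right and fully saturated.
y_hat = np.array([[0.0, 1.0]])
y = np.array([[0.0, 1.0]])
m = y.shape[1]

# Without epsilon: log(0) is -inf, and 0 * -inf is NaN, so the sum is NaN.
with np.errstate(divide="ignore", invalid="ignore"):
    raw_cost = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m

# With epsilon, as in computeCost: every logarithm stays finite.
epsilon = 1e-8
safe_cost = -np.sum(y * np.log(y_hat + epsilon) + (1 - y) * np.log(1 - y_hat + epsilon)) / m

print("Without epsilon:", raw_cost)
print("With epsilon:   ", safe_cost)
```

The cost with epsilon is not exactly zero, because epsilon shifts every logarithm slightly, but the shift is negligible next to the costs printed during training.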

In [11]:
y_hat = predictLogit(W,b,X_testData)
print("Accuracy due to Adam :: "+str(accuracy(y_hat,Y_testData)))

Accuracy due to Adam :: 99.7087504569
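
Accuracy can look high whenever one class is rare, because a model that always predicts the majority class is already right most of the time. If fraud is a small minority of the rows, as is usual for this kind of data, precision and recall on the fraud class give a more informative picture. The helper below is a minimal sketch: the name `precisionRecall` and the toy labels are assumptions and are not part of this notebook.

```python
import numpy as np

def precisionRecall(y_hat, y, threshold=0.5):
    """Precision and recall for the positive (fraud) class; hypothetical helper."""
    y_pred = (np.asarray(y_hat) > threshold) * 1.0
    y = np.asarray(y)
    tp = np.sum((y_pred == 1) & (y == 1))  # frauds correctly flagged
    fp = np.sum((y_pred == 1) & (y == 0))  # legitimate transactions flagged as fraud
    fn = np.sum((y_pred == 0) & (y == 1))  # frauds that were missed
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return float(precision), float(recall)

# Toy labels: 8 legitimate transactions and 2 frauds.
y_true = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 1, 1]], dtype=float)

# A "model" that never predicts fraud still scores high on accuracy.
all_zero = np.zeros_like(y_true)
accuracy_all_zero = np.sum((all_zero > 0.5) == y_true) * 100.0 / y_true.size
pr_all_zero = precisionRecall(all_zero, y_true)

# Scores that flag one fraud, miss one fraud, and raise one false alarm.
y_scores = np.array([[0.1, 0.2, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.8, 0.3]])
pr_scores = precisionRecall(y_scores, y_true)

print("All-zero accuracy:", accuracy_all_zero, "precision/recall:", pr_all_zero)
print("Scored precision/recall:", pr_scores)
```

In this notebook the same helper could be called as `precisionRecall(y_hat, Y_testData)` after the prediction cell, since `y_hat` and `Y_testData` have the same `(1, m)` shape.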


In [12]:
plt.plot(np.squeeze(J))
plt.ylabel('cost')
plt.xlabel('iterations (per thousands)')
plt.title("Learning rate =" + str(alpha))
plt.show()

Thus our model reaches an accuracy of about 99.7% on the test dataset. Much of this can be attributed to the dataset being already cleaned and preprocessed.