# Challenge - Credit Card Fraud Detection

![neurons](https://www.gannett-cdn.com/media/USATODAY/USATODAY/2013/02/05/afp-517027843-16_9.jpg?width=3200&height=1680&fit=crop)

## Background information

Many credit card companies and banks aim to develop sophisticated algorithms that recognize fraudulent credit card transactions. For this task, we will look at a reduced dataset containing the transaction time, the transaction amount and a number of undescribed variables (kept anonymous due to confidentiality issues).

Your task is to build a working neural network (**from scratch**) that predicts fraudulent transactions.


## Data

For this week's challenge, we are again going to use a Kaggle InClass competition, which you can find [here](https://www.kaggle.com/c/ucl-ai-society-card-fraud-detection/data).

As in the previous InClass competition, you will need to train your model on the ```train.csv``` dataset, while the submission file should be produced from ```test.csv```.

As you might notice, some parts of the notebook (**including the model**) are already written for you to save time. Therefore, just by following this notebook, you should be able to produce a decent submission file. However, feel free to change the activation functions, the number of layers and neurons, and the data splitting or standardization functions.

In [None]:
#Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler, StandardScaler
from math import log2

In [None]:
#Read train data
PATH = 
data = 

### EDA

Similar to the previous challenges, we will start with exploratory data analysis (EDA). As usual, we will look at the first rows of our data, check whether there are any null values and do some initial visualization.
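As a rough illustration of these checks, the sketch below builds a tiny synthetic DataFrame and runs the usual pandas calls for the first rows, the column names, missing values and class counts. The values are made up. Only the column names `Time`, `V2`, `Amount` and `Class` mirror the dataset.

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for train.csv (made-up values, NOT the real data)
df = pd.DataFrame({
    "Time": [0.0, 1.0, 2.0, 3.0, 4.0],
    "V2": [0.5, -1.2, np.nan, 0.3, 2.1],
    "Amount": [10.0, 25.5, 3.2, 100.0, 7.9],
    "Class": [0, 0, 1, 0, 0],
})

print(df.head())                            # first rows of the table
print(list(df.columns))                     # column names
null_counts = df.isnull().sum()             # number of missing values per column
print(null_counts)
class_counts = df["Class"].value_counts()   # rows per class (1 = fraudulent)
print(class_counts)
```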

In [None]:
#Have a look at the top rows of the dataset


In [None]:
#Look at the columns


In [None]:
#Check if there are any null values


In [None]:
#Check the number of values for each category (1 - fraudulent, 0 - non-fraudulent)


As can be seen from the cell above, the classes are imbalanced: the numbers of fraudulent and non-fraudulent cases are not equal (the original dataset had an even larger difference). Keep this in mind when choosing a splitting function.
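Here is one way to see why the imbalance matters. A "model" that always predicts the majority class already reaches a high accuracy. The sketch below uses made-up labels with 5% positives, which is an assumption and not the real class ratio of this dataset.

```python
import numpy as np

# Made-up labels: 95 non-fraudulent (0) and 5 fraudulent (1) cases
y = np.array([0] * 95 + [1] * 5)

# Fraction of each class
proportions = np.bincount(y) / len(y)
print(proportions)

# A trivial baseline that always predicts "non-fraudulent"
baseline_pred = np.zeros_like(y)
baseline_accuracy = np.mean(baseline_pred == y)
print(baseline_accuracy)  # high accuracy, yet it catches no fraud at all
```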

Now, let's plot some initial graphs to look for correlations.

In [None]:
#Just run this cell
#(sns.pairplot creates its own figure, so no plt.figure() call is needed)
sns.pairplot(data[["Time", "V2", "Amount"]])

As we can see, some of the graphs (*Time-V2*, *Time-Amount*) show a linear relationship, except for a few outliers. As the fraudulent cases will most probably be outliers, let's look at one of these graphs in more depth.

In [None]:
#Just run this cell
plt.figure(figsize=(20,5))
#Newer seaborn versions only accept x and y as keyword arguments
sns.scatterplot(x=data["Time"], y=data["V2"], hue=data["Class"])

## Preprocessing

Before building our model, we need to preprocess the data. First, we extract the features and labels. We also drop some of the less relevant features, which could distort model training (feel free to experiment with which columns to drop).
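Below is a minimal sketch of extracting features and labels with pandas, using a synthetic frame. Which columns to drop is up to you. Dropping `Time` here is only an illustrative assumption, not a recommendation from the challenge.

```python
import pandas as pd

# Synthetic stand-in for the training data (made-up values)
df = pd.DataFrame({
    "Time": [0, 1, 2, 3],
    "V1": [0.1, -0.4, 1.3, 0.0],
    "V2": [0.5, -1.2, 0.7, 0.3],
    "Amount": [10.0, 25.5, 3.2, 100.0],
    "Class": [0, 0, 1, 0],
})

# Labels: the target column
y = df["Class"]

# Features: everything except the label and any columns we choose to drop
# (dropping "Time" is just an example choice)
columns_to_drop = ["Class", "Time"]
X = df.drop(columns=columns_to_drop)
print(X.columns.tolist())  # ['V1', 'V2', 'Amount']
```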

In [None]:
#Extracting features-labels and dropping some of the columns
X = 
y = 

As discussed at the start of the lecture, scaling the data can be very beneficial, especially when dealing with outliers and a large number of features.
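`RobustScaler` centers each feature on its median and divides by its interquartile range (IQR). As a result, a single extreme value barely affects the scaling. `StandardScaler` instead uses the mean and standard deviation, and outliers pull both of those strongly. The comparison below uses a made-up column containing one outlier.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# One feature column with a single large outlier (made-up values)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [1000.0]])

robust = RobustScaler().fit_transform(X)
standard = StandardScaler().fit_transform(X)

# RobustScaler: (x - median) / IQR, so the bulk of the data stays well spread
print(robust.ravel())
# StandardScaler: (x - mean) / std, so the outlier squashes the other points together
print(standard.ravel())
```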

In [None]:
#Scaling features (use RobustScaler)
X = 

Next, convert your features and labels to NumPy arrays.
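Once `y` is a plain NumPy array of shape `(m,)`, calling `np.expand_dims(y, axis=-1)` turns it into a column of shape `(m, 1)`. After transposing, it becomes the `(1, m)` row that the model's cost function reads `m` from. The shape check below uses made-up labels.

```python
import numpy as np
import pandas as pd

# Made-up labels standing in for data["Class"]
labels = pd.Series([0, 1, 0, 0])

y = labels.to_numpy()             # shape (4,)
y = np.expand_dims(y, axis=-1)    # shape (4, 1): one column
print(y.shape)    # (4, 1)
print(y.T.shape)  # (1, 4): the layout used after transposing
```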

In [None]:
#Convert labels to an array and expand its dimensions along axis -1
y = 

### Dimensionality reduction

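As we saw in the EDA section, our data contains 28 undescribed features that are not very informative on their own. It is therefore reasonable to reduce the dimensionality of our feature set, which can be achieved with principal component analysis (PCA). Here, the features are projected onto 2 principal components.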
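The sketch below shows what `PCA(2)` does. It uses random data as a stand-in for the scaled features; the shape of 100 rows by 28 columns is an arbitrary assumption that mirrors the 28 features mentioned above. `fit_transform` projects every row onto the 2 directions of largest variance. `explained_variance_ratio_` reports how much of the total variance each component keeps, which is worth checking before committing to 2 components.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Random stand-in for the scaled feature matrix: 100 rows, 28 features
X = rng.normal(size=(100, 28))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (100, 2)
print(pca.explained_variance_ratio_)        # share of variance kept by each component
print(pca.explained_variance_ratio_.sum())  # total share of variance kept
```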

In [None]:
#Pass features through PCA
pca = PCA(2)
X = pca.fit_transform(X)

### Data splitting

Now, split the dataset into training and test sets. Since the classes are imbalanced, a stratified split is a sensible choice.
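Passing `stratify=...` to `train_test_split` keeps the fraud/non-fraud ratio the same in both splits. The sketch uses made-up data. `test_size=0.2` and `random_state=42` are arbitrary choices, and the labels are flattened for stratification.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # e.g. features after PCA(2)
y = np.array([0] * 90 + [1] * 10)        # made-up imbalanced labels (10% positive)
y = np.expand_dims(y, axis=-1)           # shape (100, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y.ravel(), random_state=42
)

print(X_train.shape, X_test.shape)       # (80, 2) (20, 2)
print(y_train.mean(), y_test.mean())     # same positive rate in both splits
```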

In [None]:
#Split data


Before building our model, we need to transpose the train and test arrays so that each row corresponds to a feature and each column to an example, which is the layout the model below expects.
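The model below computes `np.dot(self.weights, A)` and reads the number of examples from `A.shape[1]`. For that to work, every array must have one column per example. Here is a shape check with made-up arrays.

```python
import numpy as np

# Made-up split arrays: 80 training examples, 2 features (e.g. after PCA(2))
X_train = np.zeros((80, 2))
y_train = np.zeros((80, 1))

# Transpose so rows are features (or the label) and columns are examples
X_train, y_train = X_train.T, y_train.T
print(X_train.shape)  # (2, 80)
print(y_train.shape)  # (1, 80)
```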

In [None]:
#Transpose the split arrays


## Creating the model

Now we can build our neural network. To make it easier to use, the whole model is implemented as a class. Since the problem itself is quite complex, we are going to use multiple dense layers with several nodes each. The hidden layers use the *relu* activation function, while the last layer uses the *sigmoid* function, which outputs a value between 0 and 1 that can be read as the probability of fraud.
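For reference, here is the computation that `DenseLayer.forward` performs for layer $l$, written with the $m$ examples stacked as columns ($g^{[l]}$ is the layer's activation, and $b^{[l]}$ is broadcast across the columns). The two activation functions used are shown alongside it.

$$
Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, \qquad A^{[l]} = g^{[l]}\big(Z^{[l]}\big)
$$

$$
\mathrm{relu}(z) = \max(0, z), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$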

As we are dealing with an imbalanced binary classification problem, we are going to use the *binary cross-entropy loss*.
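As a worked check of the two loss methods in the class below, take labels $Y$ and sigmoid outputs $A$. The binary cross-entropy is $-\frac{1}{m}\sum \big(Y \log A + (1-Y)\log(1-A)\big)$. Chaining its gradient with the sigmoid derivative $\sigma(Z)(1-\sigma(Z))$ simplifies to $A - Y$. The standalone sketch below confirms both with made-up numbers. It re-implements the formulas instead of importing the class.

```python
import numpy as np

Y = np.array([[1.0, 0.0]])          # labels, shape (1, m)
Z = np.array([[2.0, -1.0]])         # made-up pre-activations of the output layer
A = 1 / (1 + np.exp(-Z))            # sigmoid outputs

m = Y.shape[1]
# Binary cross-entropy, averaged over the m examples
cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

# Gradient of the loss w.r.t. A, then chained through the sigmoid derivative
dA = -(Y / A - (1 - Y) / (1 - A))
dZ = dA * A * (1 - A)

print(cost)
print(dZ, A - Y)  # the two are equal: dZ simplifies to A - Y
```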

**You only need to fill in the commented fields**. In other words, your code should go wherever you see ___.

In [None]:
class DenseLayer:
    def __init__(self, input_dimension, units, activation='', multiplier=0.01):
        
        self.weights, self.bias = self.initialize(input_dimension, units, multiplier)
        
        if activation == 'sigmoid':
            self.activation = activation
            self.activation_forward = self.sigmoid
            self.activation_backward = self.sigmoid_grad
        elif activation == 'relu':
            self.activation = activation
            self.activation_forward = self.relu
            self.activation_backward = self.relu_grad
        else:
            #Fail early: any other value would leave the layer without an activation function
            raise ValueError("activation must be 'sigmoid' or 'relu', got %r" % activation)

    def initialize(self, input_size, nodes, multiplier):
        #Initialize weights and biases
        weights = multiplier * np.random.randn(___, ___)
        bias = np.zeros([___, 1])
        return weights, bias
    
    def sigmoid(self, Z):
        
        #Define sigmoid function
        A = ___
        return A
        
    def sigmoid_grad(self, dA):

        s = 1 / (1 + np.exp(-self.prevZ))
        dZ = dA * s * (1 - s)
        return dZ
    
    
    def relu(self, Z):
        
        #Define relu function
        
        A = ___
        return A
        
    def relu_grad(self, dA):

        s = np.maximum(0, self.prevZ)
        dZ = (s>0) * 1 * dA
        return dZ 
        
    
    def forward(self, A):

        Z = np.dot(self.weights, A) + self.bias
        self.prevZ = Z
        self.prevA = A
        A = self.activation_forward(Z)
        return A
    
    
    def backward(self, dA):

        dZ = self.activation_backward(dA)
        m = self.prevA.shape[1]
        self.dW = 1 / m * np.dot(dZ, self.prevA.T)
        self.db = 1 / m * np.sum(dZ, axis=1, keepdims=True)
        prevdA = np.dot(self.weights.T, dZ)
        return prevdA
    
    
    def update(self, lr):

        self.weights = self.weights - lr * self.dW
        
        #Similarly to the weights update, define next bias value
        self.___ = self.___ - lr * self.___

        
    def output_dimension(self):

        return len(self.bias)


class NeuralNetwork:
    
    def __init__(self, multiplier = 0.01):

        self.layers=[]
        self.multiplier = multiplier
        self.loss_function = self.cross_entropy_loss
        self.loss_backward = self.cross_entropy_loss_grad
        
    def add_layer(self, input_dimension=None, units=1, activation=''):
        
        if (input_dimension is None):
            input_dimension=self.layers[-1].output_dimension()
        layer = DenseLayer(input_dimension, units, activation, multiplier= self.multiplier)
        self.layers.append(layer)

    def cross_entropy_loss(self, Y, A, epsilon=1e-15):

        m = Y.shape[1]
        loss = -1 * (Y * np.log(A + epsilon) + (1 - Y) * np.log(1 - A + epsilon))
        cost = 1 / m * np.sum(loss)
        return np.squeeze(cost)
            
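    def cross_entropy_loss_grad(self, Y, A, epsilon=1e-15):

        #Epsilon avoids division by zero when the sigmoid output saturates to exactly 0 or 1
        dA = -(np.divide(Y, A + epsilon) - np.divide(1 - Y, 1 - A + epsilon))
        return dA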

    
    def cost(self, Y, A):

        return self.loss_function(Y, A)

        
    def forward(self, X):

        x = np.copy(X)
        for layer in self.layers:
            x = layer.forward(x)
        return x
                
    def backward(self, A, Y):

        dA = self.loss_backward(Y, A)
        for layer in reversed(self.layers):
            dA = layer.backward(dA)
    
    
    def update(self, lr=0.03):

        for layer in self.layers:
            layer.update(lr)
    

Now that we have defined our model, it's time to use it. Try different combinations of layers and nodes to maximize accuracy.

In [None]:
#Extracting input dimension
input_size = ___

#Creating model object
model = NeuralNetwork()

#Adding layers
model.add_layer(input_dimension=___, units=___, activation=___)

#Adding final layer
model.add_layer(units=___, activation=___)

After creating the model, train it by running the cell below. It prints the training cost and accuracy every 5 iterations.
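The sketch below shows the forward/backward/update loop in isolation. It applies the same gradient-descent pattern to a single sigmoid unit (plain logistic regression) on made-up data. The learning rate of 0.5 and the 100 iterations are arbitrary choices. The cost should decrease over the iterations.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up, roughly separable data: 2 features x 200 examples, labels as a (1, m) row
X = rng.normal(size=(2, 200))
Y = (X[0:1] + X[1:2] > 0).astype(float)

# A single sigmoid unit, trained with the same forward -> backward -> update pattern
W = 0.01 * rng.normal(size=(1, 2))
b = np.zeros((1, 1))
lr = 0.5
m = X.shape[1]

def cost_fn(A, Y, eps=1e-15):
    # Binary cross-entropy averaged over the examples
    return -np.mean(Y * np.log(A + eps) + (1 - Y) * np.log(1 - A + eps))

costs = []
for _ in range(100):
    A = 1 / (1 + np.exp(-(W @ X + b)))         # forward pass
    costs.append(cost_fn(A, Y))
    dZ = A - Y                                 # sigmoid + cross-entropy gradient
    dW = dZ @ X.T / m                          # backward pass
    db = np.sum(dZ, axis=1, keepdims=True) / m
    W -= lr * dW                               # gradient-descent update
    b -= lr * db

print(costs[0], costs[-1])
```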

In [None]:
#State the number of iterations
num_iterations = ___

def round_value(A):
    return np.uint8(A > 0.5)

def accuracy(yhat, Y):
    return round(np.sum(yhat==Y) / len(yhat.flatten()) * 1000) / 10

for idx in range(1, num_iterations+1):
    A = model.forward(X_train)
    model.backward(A, y_train)
    model.update(lr=0.03)
    if idx % 5 == 0:
        yhat = round_value(A)
        print('cost:', model.cost(y_train, A), f'\taccuracy: {accuracy(yhat, y_train)}%')
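Because the classes are imbalanced, training accuracy can look good even when few frauds are caught. The sketch below evaluates predictions with scikit-learn's metrics. `y_true` and `y_pred` are made-up 1-D arrays. In the notebook, they would stand in for `y_test` and the rounded model output on `X_test`, both flattened.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels and predictions (1 = fraudulent)
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])

accuracy = np.mean(y_true == y_pred)
precision = precision_score(y_true, y_pred)  # of predicted frauds, how many are real
recall = recall_score(y_true, y_pred)        # of real frauds, how many were caught
cm = confusion_matrix(y_true, y_pred)        # rows: true class, columns: predicted class

print(accuracy, precision, recall)
print(cm)
```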