# Credit Card Fraud Detection

## Hyperparameter Tuning

In the model selection stage I tested multiple classification algorithms: more "classic" models as well as models that are often used for anomaly detection, since this problem can also be framed as an anomaly detection task.

After some exploration, the Random Forest and the MLP showed good recall, which is the important metric here because I am trying to minimize False Negatives (fraudulent transactions classified as legitimate). This did not come at the expense of precision. Some other algorithms produced fewer False Negatives but a large number of False Positives. I did not choose them, as tuning them toward even better recall, and thus fewer False Negatives, would have driven up the number of False Positives considerably.
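To make the trade-off concrete, here is a minimal sketch of how recall and precision follow from confusion-matrix counts. The counts and the names `model_a` and `model_b` are hypothetical and chosen only for illustration; they are not results from this notebook.

```python
# Hypothetical confusion-matrix counts, for illustration only
# (these are NOT results from this notebook).

def recall(tp, fn):
    """Share of actual frauds that were caught: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of fraud alerts that were real frauds: TP / (TP + FP)."""
    return tp / (tp + fp)


# Model A: balanced. Model B: misses fewer frauds but raises many false alarms.
model_a = {"tp": 80, "fn": 20, "fp": 10}
model_b = {"tp": 90, "fn": 10, "fp": 400}

for name, m in [("A", model_a), ("B", model_b)]:
    r = recall(m["tp"], m["fn"])
    p = precision(m["tp"], m["fp"])
    print(f"model {name}: recall={r:.3f}, precision={p:.3f}")
```

In this made-up case model B has the better recall, but fewer than one in five of its alerts would be a real fraud, which is the kind of model I decided against.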

In [1]:
# import the needed libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#sklearn will be imported partially when necessary

#read in the data and make a copy of it in case anything goes wrong
path = "C:/Users/ms101/OneDrive/datasets"
credit_data = pd.read_csv(path + "/creditcard.csv")

data = credit_data.copy()

data.drop_duplicates(inplace = True)
data.reset_index(inplace = True, drop = True)
assert data.shape == (283726, 31)  # check that duplicates were removed correctly
# keep only a reduced set of features plus the target column
data_red = data[["V17","V14","V12","V10","V16","V3","V7","V11","Class"]].copy()

In [2]:
data_red.to_csv(path + "/creditcard_reduced.csv")

In [3]:
X = data_red.drop("Class", axis = 1)
y = data_red["Class"]
assert X.shape == (283726,8)
assert y.shape == (283726,)

In [4]:
from sklearn.model_selection import StratifiedShuffleSplit

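# note: n_splits defaults to 10, so the loop below keeps only the last split generated
strat_split = StratifiedShuffleSplit(test_size = 0.2, random_state = 13)
for train_index, test_index in strat_split.split(X,y):
    # split() yields positional indices, so index with .iloc
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]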

In [5]:
# import the metrics
from sklearn.metrics import confusion_matrix, precision_score, recall_score, precision_recall_curve


In [6]:
# Scaling the data
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [7]:
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

### Tuning the Random Forest Classifier

I will start with the Random Forest classifier.
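As a rough sketch of how a recall-oriented search over Random Forest hyperparameters could be set up, the block below runs `RandomizedSearchCV` with `scoring="recall"` on a small synthetic, imbalanced dataset. The synthetic data (`X_demo`, `y_demo`), the search space and all parameter values are assumptions made for illustration; they are not the settings used in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in for the scaled training data (assumption)
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=13
)

# Hypothetical search space; the values are illustrative only
param_distributions = {
    "n_estimators": [20, 50],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 3],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=13),
    param_distributions=param_distributions,
    n_iter=4,              # number of sampled parameter combinations
    scoring="recall",      # optimize for few False Negatives
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=13),
    random_state=13,
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(search.best_score_)  # mean cross-validated recall of the best candidate
```

Stratified folds keep the share of the rare positive class roughly equal in every fold, which matters when recall is computed on very few positive samples.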

In [None]:
from sklearn.ensemble import RandomForestClassifier

forest_clf