## Credit Card Fraud Detection Project
### Project Summary
This project aims to train a classifier that can robustly detect fraudulent credit card transactions.
### Dataset
The project uses an open-source dataset of credit card transactions made by European cardholders in September 2013.

### Notes:
- The dataset is highly imbalanced: only 0.172% of the transactions are fraudulent

- The dataset is anonymised: the original features are confidential and have been transformed with PCA

- Features V1, ..., V28 are the principal components obtained from PCA

- Because of the class imbalance, users are recommended to evaluate models with the Area Under the Precision-Recall Curve (AUPRC) rather than plain accuracy, which is not meaningful on such data
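
To illustrate the last note, the sketch below compares accuracy and AUPRC for a trivial model that labels every transaction as normal. It runs on synthetic labels, not the real dataset; the sample size, the random seed and the variable names are assumptions made for the illustration, and scikit-learn's `average_precision_score` is used as one common estimate of the AUPRC.

```python
import numpy as np
from sklearn.metrics import accuracy_score, average_precision_score

# Synthetic labels with roughly the same fraud rate as the dataset (0.172%).
# The sample size and seed are arbitrary choices for this illustration.
rng = np.random.default_rng(0)
n_samples = 100_000
y_true = (rng.random(n_samples) < 0.00172).astype(int)

# A trivial "model" that scores every transaction as normal.
y_pred = np.zeros(n_samples, dtype=int)
y_score = np.zeros(n_samples)

accuracy = accuracy_score(y_true, y_pred)
auprc = average_precision_score(y_true, y_score)

print(f"Accuracy of the trivial model: {accuracy:.4f}")
print(f"AUPRC of the trivial model:    {auprc:.4f}")
```

The trivial model reaches an accuracy above 99% while catching no fraud at all, whereas its average precision collapses to the fraud rate itself. This is why accuracy alone says little about a fraud classifier.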

### Acknowledgements
This project was completed with help from a tutorial by Pranjal Saxena, found __[here](https://towardsdatascience.com/credit-card-fraud-detection-using-machine-learning-python-5b098d4a8edc)__.

### 1. Necessary library imports


In [4]:
#Packages related to general operating system & warnings
import os 
import warnings
warnings.filterwarnings('ignore')
#Packages related to data importing, manipulation, exploratory data analysis, data understanding
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
from termcolor import colored as cl # text customization
#Packages related to data visualization
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
#Setting plot sizes and type of plot
plt.rc("font", size=14)
plt.rcParams['axes.grid'] = True
plt.figure(figsize=(6,3))
plt.gray()
from matplotlib.backends.backend_pdf import PdfPages
import sklearn
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn import metrics
from sklearn.impute import MissingIndicator, SimpleImputer
from sklearn.preprocessing import  PolynomialFeatures, KBinsDiscretizer, FunctionTransformer
from sklearn.preprocessing import StandardScaler, MinMaxScaler, MaxAbsScaler
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, LabelBinarizer, OrdinalEncoder
import statsmodels.formula.api as smf
import statsmodels.tsa as tsa
from sklearn.linear_model import LogisticRegression, LinearRegression, ElasticNet, Lasso, Ridge
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, export_graphviz, export_text
from sklearn.ensemble import BaggingClassifier, BaggingRegressor,RandomForestClassifier,RandomForestRegressor
from sklearn.ensemble import GradientBoostingClassifier,GradientBoostingRegressor, AdaBoostClassifier, AdaBoostRegressor 
from sklearn.svm import LinearSVC, LinearSVR, SVC, SVR
from xgboost import XGBClassifier
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix

# Dataset import
data = pd.read_csv("creditcard.csv")

<Figure size 600x300 with 0 Axes>
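
Before exploring the data, it can help to confirm that the loaded file has the expected layout. The sketch below is one possible sanity check, not part of the original tutorial. The `Time` and `Amount` column names are assumptions based on the public description of the dataset, `check_schema` is a hypothetical helper, and a two-row in-memory CSV stands in for `creditcard.csv` so the example runs on its own.

```python
import io
import pandas as pd

# Expected layout: Time, V1..V28, Amount, Class (Time and Amount are assumed names).
EXPECTED_COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]

def check_schema(df):
    """Return a list of human-readable problems found in the frame (empty if none)."""
    problems = []
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if df.isnull().any().any():
        problems.append("the frame contains missing values")
    if "Class" in df.columns:
        labels = set(df["Class"].dropna().unique().tolist())
        if not labels <= {0, 1}:
            problems.append(f"unexpected class labels: {sorted(labels)}")
    return problems

# Tiny stand-in for creditcard.csv: one normal row and one fraudulent row.
header = ",".join(EXPECTED_COLUMNS)
normal_row = ",".join(["0.0"] * 30 + ["0"])
fraud_row = ",".join(["1.0"] * 30 + ["1"])
demo = pd.read_csv(io.StringIO("\n".join([header, normal_row, fraud_row])))

problems = check_schema(demo)
print(problems if problems else "Schema looks as expected")
```

With the real file, the same check would be called as `check_schema(data)` right after `pd.read_csv("creditcard.csv")`.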

### 2. Investigating the distribution of data (biased towards normal transactions)

In [8]:
Total_transactions = len(data)
normal = len(data[data.Class == 0])
fraudulent = len(data[data.Class == 1])
fraud_percent = round(fraudulent/Total_transactions*100, 2)  # fraud share of all transactions
print(cl('Total num of transactions: {}'.format(Total_transactions)))
print(cl('Total num of normal transactions: {}'.format(normal)))
print(cl('Total num of fraudulent transactions: {}'.format(fraudulent)))
print(cl('Percentage of fraudulent transactions: {}'.format(fraud_percent)))


Total num of transactions: 284807
Total num of normal transactions: 284315
Total num of fraudulent transactions: 492
Percentage of fraudulent transactions: 0.17
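
The same class breakdown can be computed in one step with pandas' `value_counts`. The sketch below is an alternative formulation rather than the tutorial's own code. `class_summary` is a hypothetical helper, and the demo frame is rebuilt from the counts printed above instead of being read from `creditcard.csv`.

```python
import pandas as pd

def class_summary(df, label_col="Class"):
    """Count each class and express it as a percentage of all rows."""
    counts = df[label_col].value_counts().sort_index()
    percent = (counts / len(df) * 100).round(3)
    return pd.DataFrame({"count": counts, "percent": percent})

# Stand-in frame with the class counts reported above (284315 normal, 492 fraudulent).
demo = pd.DataFrame({"Class": [0] * 284315 + [1] * 492})

summary = class_summary(demo)
imbalance_ratio = summary.loc[0, "count"] / summary.loc[1, "count"]

print(summary)
print(f"Normal transactions per fraudulent one: {imbalance_ratio:.0f}")
```

On the real data the call would be `class_summary(data)`. Dividing by `len(df)` expresses each class as a share of all transactions, and the ratio of normal to fraudulent transactions gives a quick sense of how skewed the classes are.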
