## Data Camp Project

# Credit Card Fraud Detection 

Authors:
  * Yuhe Bai
  * Jérôme Bonnin
  * Louis Mercier
  * Dan Allouche
  * Lotfi Kobrosly

## Introduction

The number of credit card transactions is increasing in line
with technological developments and the rise of e-commerce. Although credit card payments facilitate all kinds of business activities, credit card fraud is a significant problem.

Credit card fraud not only brings huge economic losses to
financial institutions and banks, but also causes trouble and stress
for the individuals who are affected. Recent statistics
show that in 2018 the global economic loss caused by credit
card fraud was 27.85 billion dollars, an increase of 16.2%
compared with 23.97 billion dollars in 2017. If this trend
continues, by 2023 the economic losses caused by credit card
fraud will exceed 35 billion dollars.

The ratio of fraudulent transactions to normal transactions is approximately 0.006%
worldwide. Although this rate may seem insignificant, every
fraudulent transaction hurts the reputation of banks. For this
reason, banks are investing in fraud detection. The number of fraudulent activities increases every day, and
their methods keep changing. It is very difficult and costly to detect
fraudulent activities only by examining the transactions. Fast
and accurate fraud detection is crucial to maintaining customer
satisfaction and trust. Therefore, banks need to identify these
transactions as quickly as possible and in the least harmful
way for the customer.


(Today, fraudulent activities using social engineering are
predominantly performed through the Internet. Malware and
phishing methods are engineered for this purpose. The most popular types of fraud include altering customer information
through call centers and branches, ATM fraud, credit card
application fraud, card account theft, lost or stolen cards, fake credit
cards, and card duplication [3].)


The design of efficient fraud detection algorithms is key to reducing these losses, and more and more algorithms rely on advanced machine learning techniques to assist fraud investigators.
In addition, effective fraud detection applications can increase customer confidence and
reduce customer complaints. Most credit card fraud detection
approaches make use of machine learning, especially supervised learning methods.

However, in credit fraud situations, the number of positive (fraudulent)
cases is much smaller than the number of negative cases.
This creates a problem of imbalanced classification, where
one class is much smaller than the other.

## Imports

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
```

```python
# Load the transactions and drop the 'Time' column
data = pd.read_csv('creditcard.csv').drop(columns=['Time'])
```
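The class imbalance described in the introduction can be checked directly once the data is loaded. The sketch below is only illustrative: the helper `class_balance` is hypothetical, and it assumes the label column is named `Class` with 1 marking a fraudulent transaction, which this notebook does not confirm. A small toy frame with made-up values stands in for `data` so the snippet runs on its own.

```python
import pandas as pd


def class_balance(df: pd.DataFrame, label_col: str = "Class") -> pd.DataFrame:
    """Return the count and share of each label value, most frequent first.

    `label_col` defaults to "Class", which is an assumption about the CSV layout.
    """
    counts = df[label_col].value_counts()
    return pd.DataFrame({"count": counts, "share": counts / len(df)})


# Toy frame with made-up values standing in for `data`: 2 frauds in 1000 rows
toy = pd.DataFrame({"Amount": [1.0] * 1000, "Class": [0] * 998 + [1] * 2})
balance = class_balance(toy)
print(balance)
```

If the assumption about the column name holds, calling `class_balance(data)` on the loaded frame would give the same kind of summary for the real transactions.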

Researchers have proposed many metrics to evaluate the
performance of imbalanced classification models.
These metrics include recall rate [17], [38], specificity [17],
precision [17], [38], F-measure [17], [38], and accuracy. In
this paper, we focus on two indicators: precision and F-measure.
In credit card fraud, these are considered the most
important indicators.

Precision is the ratio of the number of correctly predicted
positive cases (true positives) to the number of all predicted
positive cases. In the detection of credit card fraud, the first
goal is for the alerts raised to be genuine, that is, to reach the
highest precision. In practice, a false alarm can lead to a poor
customer experience and potentially to the loss of customers, so
the precision of the model is considered the most important
indicator of the fraud system.

On the other hand, although precision can be increased at the
expense of the model’s recall rate, this trade-off cannot be
pushed without limit. Recall rate is the ratio of the number of
correctly predicted positive cases to the number of all real
positive cases. Real fraud has real negative economic
implications for the enterprise, so the recall rate is also
worthy of our attention.

Another indicator is the F-measure, which takes values in the
range [0, 1]. The F-measure captures both precision and recall
rate, measuring improvements in both indicators simultaneously.
In machine learning, it is often used to compare the advantages
and disadvantages of various algorithms, because it evaluates
precision and recall rate in combination.

Accuracy is a commonly used indicator that represents the
proportion of correctly classified cases among all cases. If the
accuracy of a model is too low, it cannot be applied in practice.
Specificity is the ratio of the number of real negative cases
predicted to be negative to the number of all real negative
cases. In the experiment, we consider these five indicators to
measure the performance of the model.
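The five indicators described above can all be derived from the four confusion-matrix counts. The following is a minimal sketch, not the evaluation code of this project: the function name `classification_metrics` and the example counts are hypothetical, and the F-measure is assumed to be the balanced F1 score (the harmonic mean of precision and recall), since the text does not state a weighting.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the five indicators from confusion-matrix counts.

    tp/fp: frauds correctly / wrongly flagged; tn/fn: normal transactions
    correctly passed / frauds missed.
    """
    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of failing when a denominator is empty
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)        # true positives / predicted positives
    recall = ratio(tp, tp + fn)           # true positives / real positives
    specificity = ratio(tn, tn + fp)      # true negatives / real negatives
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    # Assumption: F-measure taken as F1, the harmonic mean of precision and recall
    f_measure = ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Made-up counts: 1000 transactions, 10 of them fraudulent,
# 6 frauds caught, 4 missed, 2 false alarms
metrics = classification_metrics(tp=6, fp=2, tn=988, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts the accuracy is 0.994 even though the recall is only 0.6, which illustrates why accuracy alone says little on imbalanced data and why precision, recall, and the F-measure are examined alongside it.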
