Fraud is a major problem in banking, life insurance, health insurance, and many other industries. Much of it depends on a person who is trying to sell us fake products or services.

If we can judge what looks wrong, we can avoid many of these fraudulent transactions. One kind of fraud that has been increasing a lot lately, however, is payment fraud. In this notebook, we will build a simple approach to fraud detection with machine learning.

The dataset we are using here is transaction data for online purchases, collected from an e-commerce retailer.
* The dataset contains more than **39000** transactions.
* Each transaction has 5 features that describe the nature of the transaction.

So let’s start by importing the libraries we need for fraud detection:

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

In [3]:
# from google.colab import files
# uploaded = files.upload()
df = pd.read_csv('payment_fraud.csv')
df.head()

Unnamed: 0,accountAgeDays,numItems,localTime,paymentMethod,paymentMethodAgeDays,label
0,29,1,4.745402,paypal,28.204861,0
1,725,1,4.742303,storecredit,0.0,0
2,845,1,4.921318,creditcard,0.0,0
3,503,1,4.886641,creditcard,0.0,0
4,2000,1,5.040929,creditcard,0.0,0


In [4]:
# Split dataset up into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    df.drop('label', axis=1), df['label'], test_size=0.33, random_state=17
)
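Fraud is rare in this data (the confusion matrix further down shows 190 fraud cases among 12943 test transactions), so a purely random split can leave the train and test sets with slightly different fraud rates. One option, sketched here on hypothetical stand-in data rather than the real CSV, is to pass `stratify=` to `train_test_split` so both parts keep roughly the same class proportions. The arrays `X` and `y` below are made up for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 1000 rows, 15 of them labelled as fraud (1)
y = np.array([1] * 15 + [0] * 985)
X = np.arange(len(y)).reshape(-1, 1)

# stratify=y keeps the share of fraud cases similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=17, stratify=y
)

print("fraud rate in train:", y_train.mean())
print("fraud rate in test: ", y_test.mean())
```

With the notebook's own data, the equivalent change would be adding `stratify=df['label']` to the call above; whether it changes the reported results has not been checked here.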

Because this is a binary classification problem, we will use the **Logistic Regression** algorithm, a simple and widely used baseline for binary classification.

Let’s train a fraud detection model using logistic regression and look at the accuracy score it achieves. The categorical `paymentMethod` column is dropped so that all inputs are numeric:

In [6]:
# Train on the numeric columns only (paymentMethod is categorical)
clf = LogisticRegression().fit(X_train.drop("paymentMethod", axis=1), y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test.drop("paymentMethod", axis=1))
# accuracy_score takes the true labels first, then the predictions
print(accuracy_score(y_test, y_pred))

1.0


Our fraud detection model reached an accuracy of **100 percent** on the test set using the **logistic regression** algorithm.
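The model above simply drops `paymentMethod`, because logistic regression needs numeric inputs. An alternative that keeps the information is one-hot encoding. The following is a minimal sketch using only the five rows shown by `df.head()`; the variable names `sample` and `encoded` are illustrative and not part of the original notebook.

```python
import pandas as pd

# The five rows shown by df.head() above, without the label column
sample = pd.DataFrame({
    "accountAgeDays": [29, 725, 845, 503, 2000],
    "numItems": [1, 1, 1, 1, 1],
    "localTime": [4.745402, 4.742303, 4.921318, 4.886641, 5.040929],
    "paymentMethod": ["paypal", "storecredit", "creditcard", "creditcard", "creditcard"],
    "paymentMethodAgeDays": [28.204861, 0.0, 0.0, 0.0, 0.0],
})

# Replace the categorical column with one indicator column per payment method
encoded = pd.get_dummies(sample, columns=["paymentMethod"])
print(encoded.columns.tolist())
```

On the full dataset, the encoding would normally be applied before `train_test_split`, so that the train and test sets end up with the same columns.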

### Evaluating the Fraud Detection Model

Now, let’s evaluate the performance of our model. We will use a **confusion matrix**, which counts, for each true class, how many test transactions were predicted as each class. It takes only one line of code:

In [7]:
# Compare test set predictions with ground truth labels
print(confusion_matrix(y_test, y_pred))

[[12753     0]
 [    0   190]]


So out of the 12943 transactions in the test set,
* **190 transactions** are correctly recognized as fraud, and
* **12753 transactions** are correctly recognized as not fraudulent.

The zeros off the diagonal mean there are no false positives and no false negatives.
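Because fraud is so rare here, accuracy alone can look impressive even for a useless model. As a rough sketch, the numbers in the confusion matrix above can be turned into precision, recall, and the accuracy that a trivial "never fraud" baseline would reach. The matrix is typed in by hand from the output above, and the variable names are illustrative.

```python
import numpy as np

# Confusion matrix reported above: rows are true classes, columns are predictions
cm = np.array([[12753, 0],
               [0, 190]])
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)   # share of flagged transactions that are really fraud
recall = tp / (tp + fn)      # share of real fraud cases that were flagged
fraud_rate = (tp + fn) / cm.sum()
# Accuracy of a baseline that labels every transaction as not fraudulent
baseline_accuracy = (tn + fp) / cm.sum()

print(f"precision={precision:.3f}  recall={recall:.3f}")
print(f"fraud rate={fraud_rate:.4f}  baseline accuracy={baseline_accuracy:.4f}")
```

The baseline already scores roughly 98.5 percent accuracy on this test set, which is why precision and recall on the fraud class are usually more informative than accuracy for this kind of problem.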