
# Detecting Fraud Online Payments Using Ensembling Machine Learning

## Objective:
### In this project I used Machine learning models to detect fraudlent behavior.
#### The Models used were: Logistic Regression, Decision Tree Classifier, and KNeighbor Classifier 

We propose a general ensemble-based machine-learning detector that enables the security system to detect fraudlent behavior. To do that, we first train several machine learning models. Then, we use the best-performing models and use them in our ensemble-based detector.


###### Authors: Omar Mohamed Abdelsalam, and Magdy Abdullah Eissa


In [150]:
#Importing Libraries
import random
import pandas as pd 
from sklearn import model_selection
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn import datasets
plt.style.use("ggplot")

#Importing Data
data = pd.read_csv('/Users/abdel/OneDrive/Desktop/Projects/Detecting Online Payment Fraud 11-22-2022/credit card.csv')
print(data.head)

<bound method NDFrame.head of          step      type      amount     nameOrig  oldbalanceOrg  \
0           1   PAYMENT     9839.64  C1231006815      170136.00   
1           1   PAYMENT     1864.28  C1666544295       21249.00   
2           1  TRANSFER      181.00  C1305486145         181.00   
3           1  CASH_OUT      181.00   C840083671         181.00   
4           1   PAYMENT    11668.14  C2048537720       41554.00   
...       ...       ...         ...          ...            ...   
6362615   743  CASH_OUT   339682.13   C786484425      339682.13   
6362616   743  TRANSFER  6311409.28  C1529008245     6311409.28   
6362617   743  CASH_OUT  6311409.28  C1162922333     6311409.28   
6362618   743  TRANSFER   850002.52  C1685995037      850002.52   
6362619   743  CASH_OUT   850002.52  C1280323807      850002.52   

         newbalanceOrig     nameDest  oldbalanceDest  newbalanceDest  isFraud  \
0             160296.36  M1979787155            0.00            0.00        0   
1  

In [151]:
print(data.isnull().sum())

step              0
type              0
amount            0
nameOrig          0
oldbalanceOrg     0
newbalanceOrig    0
nameDest          0
oldbalanceDest    0
newbalanceDest    0
isFraud           0
isFlaggedFraud    0
dtype: int64


In [152]:
print(data.type.value_counts())

CASH_OUT    2237500
PAYMENT     2151495
CASH_IN     1399284
TRANSFER     532909
DEBIT         41432
Name: type, dtype: int64


In [153]:
type = data["type"].value_counts()
transactions = type.index
quantity = type.values

import plotly.express as px
figure = px.pie(data, 
             values=quantity, 
             names=transactions,hole = 0.5, 
             title="Distribution of Transaction Type")
figure.show()

In [154]:
# Checking correlation
correlation = data.corr()
print(correlation["isFraud"].sort_values(ascending=False))

isFraud           1.000000
amount            0.076688
isFlaggedFraud    0.044109
step              0.031578
oldbalanceOrg     0.010154
newbalanceDest    0.000535
oldbalanceDest   -0.005885
newbalanceOrig   -0.008148
Name: isFraud, dtype: float64


In [155]:
data["type"] = data["type"].map({"CASH_OUT": 1, "PAYMENT": 2, 
                                 "CASH_IN": 3, "TRANSFER": 4,
                                 "DEBIT": 5})
data["isFraud"] = data["isFraud"].map({0: "No Fraud", 1: "Fraud"})
print(data.head())


   step  type    amount     nameOrig  oldbalanceOrg  newbalanceOrig  \
0     1     2   9839.64  C1231006815       170136.0       160296.36   
1     1     2   1864.28  C1666544295        21249.0        19384.72   
2     1     4    181.00  C1305486145          181.0            0.00   
3     1     1    181.00   C840083671          181.0            0.00   
4     1     2  11668.14  C2048537720        41554.0        29885.86   

      nameDest  oldbalanceDest  newbalanceDest   isFraud  isFlaggedFraud  
0  M1979787155             0.0             0.0  No Fraud               0  
1  M2044282225             0.0             0.0  No Fraud               0  
2   C553264065             0.0             0.0     Fraud               0  
3    C38997010         21182.0             0.0     Fraud               0  
4  M1230701703             0.0             0.0  No Fraud               0  


In [156]:
# splitting the data
from sklearn.model_selection import train_test_split
x = np.array(data[["type", "amount", "oldbalanceOrg", "newbalanceOrig"]])
y = np.array(data[["isFraud"]])

In [157]:
 #Training the machine learning model

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10, random_state=42)
model = DecisionTreeClassifier()
model.fit(xtrain, ytrain)
print('Model 1')
print(model.score(xtest, ytest))

#from sklearn.ensemble import RandomForestClassifier
#xtrain,xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10, random_state=42)
#model_2 = RandomForestClassifier()
#model_2.fit(xtrain, ytrain)
#print('Model 2' ) 
#print(model_2.score(xtest, ytest))



xtrain,xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10, random_state=42)
model_2 =LogisticRegression()
model_2.fit(xtrain, ytrain)
print('Model 2' ) 
print(model_2.score(xtest, ytest))




xtrain,xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10, random_state=42)
model_3 = KNeighborsClassifier()
model_3.fit(xtrain, ytrain)
print('Model 3' ) 
print(model_3.score(xtest, ytest))





Model 1
0.9997391011878755



A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().



Model 2
0.9995049209287997



A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().



Model 3






0.9996652322470931


In [159]:
# prediction
#features = [type, amount, oldbalanceOrg, newbalanceOrig]
features = np.array([[1, 181, 181, 0]])
model.fit(xtrain, ytrain)
model2.fit(xtrain, ytrain)
model3.fit(xtrain, ytrain)

print(model.predict(features))

print(model_2.predict(features))

print(model_3.predict(features))



A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().


A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().



['Fraud']
['Fraud']
['No Fraud']






In [162]:
estimators = []
model = DecisionTreeClassifier(); estimators.append(("Decison Tree", model))
#model2 = RandomForestClassifier(); estimators.append(("Random Forest", model2))
model2 = LogisticRegression(); estimators.append(("Logistic Regression", model2))
model3 = KNeighborsClassifier(); estimators.append(("K Neighbor", model3))

ensemble = VotingClassifier(estimators)
results = model_selection.cross_val_score(ensemble, xtrain, ytrain, )
print(results.mean())
#Voting  Testing 
ensemble.fit(xtrain, ytrain)
print(ensemble.predict(features))


A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().


A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().




A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().


A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().




A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().


A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().




A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().


A column-vector y was passed when a 1d array was

0.999716049884827



A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().


A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().



['Fraud']






# Finish


## Credit
Aman Kharwal (https://thecleverprogrammer.com/2022/02/22/online-payments-fraud-detection-with-machine-learning/)
