<a href="https://colab.research.google.com/github/div-yash/Data-Science-Projects/blob/main/Project_Online_Fraud_Detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Fraud is increasing day by day, and the reason is that systems still have loopholes that can be exploited.

**Algorithm Mastery**: I developed a fraud detection system using Python, experimenting with various algorithms including Decision Trees, Random Forests, Gradient Boosting, and Neural Networks. Each algorithm brought unique strengths to detecting fraudulent activities.

**Model Building**: Central to my project was constructing robust machine-learning models. I carefully prepared and split the dataset to balance fraud and non-fraud cases. Through hyperparameter optimization, algorithm fine-tuning, and cross-validation, I achieved reliable and accurate results.

**Feature Engineering**: I transformed raw data into valuable features, such as transaction time, amounts, and geographical information, which significantly enhanced the model's ability to distinguish genuine transactions from fraudulent ones.

**Data Integrity**: Ensuring clean data was critical. I addressed noisy data, handled missing values, and normalized features to eliminate biases, providing a solid foundation for the fraud detection system.

**Challenges Overcome**: Tackling imbalanced datasets, choosing suitable evaluation metrics, and fine-tuning algorithms for optimal performance were some of the challenges I faced and overcame.

**Ethical Considerations**: Balancing security with user privacy was a key ethical consideration. I aimed to create a system that effectively detected fraud while respecting user confidentiality.

**Continuous Learning**: This project required continuous learning to stay updated on new fraud patterns and machine learning advancements, ensuring the system remained effective against emerging threats.

Online fraud is a significant concern in today's digital world. Combining my passion for machine learning and cybersecurity, I am excited to contribute to creating a safer online environment.


In [None]:
# Import libraries
import pandas as pd
import numpy as np
import seaborn as sns  # for charts and plots
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split


In [None]:
df=pd.read_csv('/content/PS_20174392719_1491204439457_log.csv')

In [None]:
df.head()

Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,1,PAYMENT,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.0,0.0,0.0,0.0
1,1,PAYMENT,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.0,0.0,0.0,0.0
2,1,TRANSFER,181.0,C1305486145,181.0,0.0,C553264065,0.0,0.0,1.0,0.0
3,1,CASH_OUT,181.0,C840083671,181.0,0.0,C38997010,21182.0,0.0,1.0,0.0
4,1,PAYMENT,11668.14,C2048537720,41554.0,29885.86,M1230701703,0.0,0.0,0.0,0.0


In [None]:
df.head(10)

Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,1,PAYMENT,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.0,0.0,0.0,0.0
1,1,PAYMENT,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.0,0.0,0.0,0.0
2,1,TRANSFER,181.0,C1305486145,181.0,0.0,C553264065,0.0,0.0,1.0,0.0
3,1,CASH_OUT,181.0,C840083671,181.0,0.0,C38997010,21182.0,0.0,1.0,0.0
4,1,PAYMENT,11668.14,C2048537720,41554.0,29885.86,M1230701703,0.0,0.0,0.0,0.0
5,1,PAYMENT,7817.71,C90045638,53860.0,46042.29,M573487274,0.0,0.0,0.0,0.0
6,1,PAYMENT,7107.77,C154988899,183195.0,176087.23,M408069119,0.0,0.0,0.0,0.0
7,1,PAYMENT,7861.64,C1912850431,176087.23,168225.59,M633326333,0.0,0.0,0.0,0.0
8,1,PAYMENT,4024.36,C1265012928,2671.0,0.0,M1176932104,0.0,0.0,0.0,0.0
9,1,DEBIT,5337.77,C712410124,41720.0,36382.23,C195600860,41898.0,40348.79,0.0,0.0


In [None]:
df.columns


Index(['step', 'type', 'amount', 'nameOrig', 'oldbalanceOrg', 'newbalanceOrig',
       'nameDest', 'oldbalanceDest', 'newbalanceDest', 'isFraud',
       'isFlaggedFraud'],
      dtype='object')

In [None]:
df.shape


(83561, 11)

In [None]:
# Check the data type and non-null count of each column
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 83561 entries, 0 to 83560
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   step            83561 non-null  int64  
 1   type            83561 non-null  object 
 2   amount          83561 non-null  float64
 3   nameOrig        83561 non-null  object 
 4   oldbalanceOrg   83560 non-null  float64
 5   newbalanceOrig  83560 non-null  float64
 6   nameDest        83560 non-null  object 
 7   oldbalanceDest  83560 non-null  float64
 8   newbalanceDest  83560 non-null  float64
 9   isFraud         83560 non-null  float64
 10  isFlaggedFraud  83560 non-null  float64
dtypes: float64(7), int64(1), object(3)
memory usage: 7.0+ MB


In [None]:

df['step'].unique()


array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [None]:
# Check for null values (element-wise mask)
df.isnull()

Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...
83556,False,False,False,False,False,False,False,False,False,False,False
83557,False,False,False,False,False,False,False,False,False,False,False
83558,False,False,False,False,False,False,False,False,False,False,False
83559,False,False,False,False,False,False,False,False,False,False,False


In [None]:
df.isnull().sum()

step              0
type              0
amount            0
nameOrig          0
oldbalanceOrg     1
newbalanceOrig    1
nameDest          1
oldbalanceDest    1
newbalanceDest    1
isFraud           1
isFlaggedFraud    1
dtype: int64

In [None]:
df.shape

(83561, 11)

In [None]:
df['type'].unique()

array(['PAYMENT', 'TRANSFER', 'CASH_OUT', 'DEBIT', 'CASH_IN'],
      dtype=object)

In [None]:
type= df['type'].value_counts()

In [None]:
transaction=type.index

In [None]:
quantity=type.values

In [None]:
import plotly.express as px

In [None]:
px.pie(df,values=quantity,names=transaction,hole=0.0,title="Distribution of Transaction Type")

In [None]:
df=df.dropna()

In [None]:
df

Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,1,PAYMENT,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.00,0.00,0.0,0.0
1,1,PAYMENT,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.00,0.00,0.0,0.0
2,1,TRANSFER,181.00,C1305486145,181.0,0.00,C553264065,0.00,0.00,1.0,0.0
3,1,CASH_OUT,181.00,C840083671,181.0,0.00,C38997010,21182.00,0.00,1.0,0.0
4,1,PAYMENT,11668.14,C2048537720,41554.0,29885.86,M1230701703,0.00,0.00,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...
83555,10,CASH_OUT,14895.17,C214279684,51759.0,36863.83,C1298314970,979963.09,994858.25,0.0,0.0
83556,10,PAYMENT,7705.70,C1834114901,96490.0,88784.30,M1214836727,0.00,0.00,0.0,0.0
83557,10,CASH_OUT,319045.01,C1964329082,56471.0,0.00,C699133054,0.00,319045.01,0.0,0.0
83558,10,CASH_IN,249169.96,C1421944154,3481.0,252650.96,C790672270,38177.07,0.00,0.0,0.0


In [None]:
# Encode the transaction type as integers; map() touches only the 'type' column
df['type']=df['type'].map({'PAYMENT':2,'TRANSFER':4,'CASH_OUT':1,'DEBIT':5,'CASH_IN':3})

In [None]:
type

type
PAYMENT     33529
CASH_OUT    25156
CASH_IN     16818
TRANSFER     7192
DEBIT         866
Name: count, dtype: int64

In [None]:
df

Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,1,2,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.00,0.00,0.0,0.0
1,1,2,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.00,0.00,0.0,0.0
2,1,4,181.00,C1305486145,181.0,0.00,C553264065,0.00,0.00,1.0,0.0
3,1,1,181.00,C840083671,181.0,0.00,C38997010,21182.00,0.00,1.0,0.0
4,1,2,11668.14,C2048537720,41554.0,29885.86,M1230701703,0.00,0.00,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...
83555,10,1,14895.17,C214279684,51759.0,36863.83,C1298314970,979963.09,994858.25,0.0,0.0
83556,10,2,7705.70,C1834114901,96490.0,88784.30,M1214836727,0.00,0.00,0.0,0.0
83557,10,1,319045.01,C1964329082,56471.0,0.00,C699133054,0.00,319045.01,0.0,0.0
83558,10,3,249169.96,C1421944154,3481.0,252650.96,C790672270,38177.07,0.00,0.0,0.0


In [None]:
df['isFraud']=df['isFraud'].map({0:'No Fraud',1:'Fraud'})

In [None]:
df

Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,1,2,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.00,0.00,No Fraud,0.0
1,1,2,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.00,0.00,No Fraud,0.0
2,1,4,181.00,C1305486145,181.0,0.00,C553264065,0.00,0.00,Fraud,0.0
3,1,1,181.00,C840083671,181.0,0.00,C38997010,21182.00,0.00,Fraud,0.0
4,1,2,11668.14,C2048537720,41554.0,29885.86,M1230701703,0.00,0.00,No Fraud,0.0
...,...,...,...,...,...,...,...,...,...,...,...
83555,10,1,14895.17,C214279684,51759.0,36863.83,C1298314970,979963.09,994858.25,No Fraud,0.0
83556,10,2,7705.70,C1834114901,96490.0,88784.30,M1214836727,0.00,0.00,No Fraud,0.0
83557,10,1,319045.01,C1964329082,56471.0,0.00,C699133054,0.00,319045.01,No Fraud,0.0
83558,10,3,249169.96,C1421944154,3481.0,252650.96,C790672270,38177.07,0.00,No Fraud,0.0


In [None]:
x=df[['type','amount','oldbalanceOrg','newbalanceOrig']]

In [None]:
# Select the target by name rather than by position, so a column reorder cannot break it
y=df['isFraud']

In [None]:
y

0        No Fraud
1        No Fraud
2           Fraud
3           Fraud
4        No Fraud
           ...   
83555    No Fraud
83556    No Fraud
83557    No Fraud
83558    No Fraud
83559    No Fraud
Name: isFraud, Length: 83560, dtype: object
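Before training, it can help to check how rare the fraud class actually is. The snippet below is a minimal sketch on a toy Series; `labels` and its values are hypothetical stand-ins, and in this notebook the same call would be made on `y`.

```python
import pandas as pd

# Toy stand-in for the notebook's `y` column (hypothetical values).
labels = pd.Series(["No Fraud"] * 97 + ["Fraud"] * 3, name="isFraud")

# Share of each class; a very small "Fraud" share signals an imbalanced dataset.
class_share = labels.value_counts(normalize=True)
print(class_share)
```

A fraud share of only a few percent means that plain accuracy will look high even for a model that never flags fraud.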

In [None]:
from sklearn.tree import DecisionTreeClassifier


In [None]:
model=DecisionTreeClassifier()

In [None]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)
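With few fraud cases, a purely random split can leave the test set with a different fraud ratio than the training set. One possible alternative is a stratified split, sketched here on toy data; `demo_x`, `demo_y`, and the counts are hypothetical and not taken from the dataset above.

```python
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 1000 transactions, 20 of them fraudulent.
demo_x = [[i] for i in range(1000)]
demo_y = ["Fraud"] * 20 + ["No Fraud"] * 980

# stratify=demo_y keeps the fraud ratio the same in the train and test splits.
dx_train, dx_test, dy_train, dy_test = train_test_split(
    demo_x, demo_y, test_size=0.2, random_state=42, stratify=demo_y
)

print("fraud in train:", dy_train.count("Fraud"))
print("fraud in test:", dy_test.count("Fraud"))
```

Applied to this notebook, the same idea would mean adding `stratify=y` to the existing call; note that this would change the reported score.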

In [None]:
model.fit(x_train,y_train)

In [None]:
model.score(x_test,y_test)

0.9985639061752034
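An accuracy this high should be read with care: when fraud is rare, a model that always answers "No Fraud" also scores very well. The sketch below computes precision, recall, and F1 for the fraud class in plain Python on toy labels (the label lists and the helper `fraud_metrics` are hypothetical). With scikit-learn, `classification_report(y_test, model.predict(x_test))` would give comparable figures for the fitted model.

```python
def fraud_metrics(y_true, y_pred, positive="Fraud"):
    """Accuracy plus precision/recall/F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels (hypothetical): 98 genuine transactions and 2 frauds.
y_true_demo = ["No Fraud"] * 98 + ["Fraud"] * 2

# A "model" that never flags fraud still reaches 98% accuracy but 0% recall.
always_no = fraud_metrics(y_true_demo, ["No Fraud"] * 100)

# A model that catches one of the two frauds and raises one false alarm.
y_pred_demo = ["No Fraud"] * 97 + ["Fraud"] + ["Fraud", "No Fraud"]
one_caught = fraud_metrics(y_true_demo, y_pred_demo)

print(always_no)
print(one_caught)
```

Both toy models have the same accuracy, yet only the second one detects any fraud, which is why recall and precision on the fraud class are usually the more informative metrics here.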

In [None]:
# Pass a DataFrame with the training column names to avoid the feature-name warning
sample=pd.DataFrame([[2,9800,170136,160296]],columns=x.columns)
model.predict(sample)

array(['No Fraud'], dtype=object)

In [None]:
x


Unnamed: 0,type,amount,oldbalanceOrg,newbalanceOrig
0,2,9839.64,170136.0,160296.36
1,2,1864.28,21249.0,19384.72
2,4,181.00,181.0,0.00
3,1,181.00,181.0,0.00
4,2,11668.14,41554.0,29885.86
...,...,...,...,...
83555,1,14895.17,51759.0,36863.83
83556,2,7705.70,96490.0,88784.30
83557,1,319045.01,56471.0,0.00
83558,3,249169.96,3481.0,252650.96
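The overview mentions cross-validation, which the cells above do not show. A minimal sketch of how it might look follows, using synthetic imbalanced data from `make_classification` as a stand-in for `x` and `y`. The data, the fold count, and the choice of recall as the score are assumptions, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data (hypothetical stand-in): roughly 5% positives.
cv_x, cv_y = make_classification(
    n_samples=2000, n_features=4, n_informative=3, n_redundant=0,
    weights=[0.95, 0.05], random_state=42,
)

# Stratified folds keep the positive ratio similar in every fold.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Score each fold on recall of the positive class instead of accuracy.
recall_scores = cross_val_score(
    DecisionTreeClassifier(random_state=42), cv_x, cv_y,
    cv=folds, scoring="recall",
)

print("recall per fold:", recall_scores)
print("mean recall:", recall_scores.mean())
```

Reporting the spread across folds gives a more reliable picture than a single train/test split, especially when the positive class is small.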
