# Online Fraud Detection using ANN (Artificial Neural Networks)

### About the Dataset
The dataset is available on Kaggle: <a href="https://www.kaggle.com/datasets/jainilcoder/online-payment-fraud-detection">https://www.kaggle.com/datasets/jainilcoder/online-payment-fraud-detection</a>
    
To detect online payment fraud with machine learning, we train a model that classifies payments as fraudulent or non-fraudulent. This requires historical transactions labelled as fraud or not, so the model can learn which kinds of transactions tend to be fraudulent. The Kaggle dataset linked above provides these records. Its columns are:

- `step`: unit of time, where 1 step equals 1 hour
- `type`: type of online transaction
- `amount`: amount of the transaction
- `nameOrig`: customer who initiated the transaction
- `oldbalanceOrg`: originator's balance before the transaction
- `newbalanceOrig`: originator's balance after the transaction
- `nameDest`: recipient of the transaction
- `oldbalanceDest`: recipient's balance before the transaction
- `newbalanceDest`: recipient's balance after the transaction
- `isFraud`: target label, 1 for a fraudulent transaction and 0 otherwise

The file also contains an `isFlaggedFraud` column, which the preprocessing step below drops.
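The notebook later oversamples with SMOTE, which suggests fraud is the minority class. It is worth measuring that imbalance before choosing a resampling strategy. The sketch below runs on a hypothetical toy frame whose rows are illustrative, not taken from the Kaggle file. On the real data the same calls would be `data['isFraud'].value_counts(normalize=True)` and `data.groupby('type')['isFraud'].mean()`.

```python
import pandas as pd

# Hypothetical toy rows using a few of the columns listed above (not real data).
toy = pd.DataFrame({
    "type": ["PAYMENT", "PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT", "PAYMENT", "CASH_IN", "DEBIT"],
    "amount": [120.0, 80.5, 900.0, 900.0, 45.0, 310.0, 500.0, 60.0],
    "isFraud": [0, 0, 1, 1, 0, 0, 0, 0],
})

# Fraction of rows in each class of the target.
class_share = toy["isFraud"].value_counts(normalize=True)
print(class_share)

# Fraud rate within each transaction type.
fraud_rate_by_type = toy.groupby("type")["isFraud"].mean()
print(fraud_rate_by_type)
```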

```python
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.metrics import accuracy_score, f1_score
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
import joblib
```

```python
# Load the dataset
file_path = 'onlinefraud.csv'
data = pd.read_csv(file_path)
```

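```python
data.head()
```

The output below was captured after the encoding and scaling cells had run. As a result, `type`, `nameOrig` and `nameDest` already hold integer codes, the amount and balance columns are standardized, and `isFlaggedFraud` has been dropped.

|   | step | type | amount | nameOrig | oldbalanceOrg | newbalanceOrig | nameDest | oldbalanceDest | newbalanceDest | isFraud |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | -0.28156 | 757869 | -0.22981 | -0.237622 | 1662094 | -0.323814 | -0.333411 | 0 |
| 1 | 1 | 3 | -0.294767 | 2188998 | -0.281359 | -0.285812 | 1733924 | -0.323814 | -0.333411 | 0 |
| 2 | 1 | 4 | -0.297555 | 1002156 | -0.288654 | -0.292442 | 439685 | -0.323814 | -0.333411 | 1 |
| 3 | 1 | 1 | -0.297555 | 5828262 | -0.288654 | -0.292442 | 391696 | -0.317582 | -0.333411 | 1 |
| 4 | 1 | 3 | -0.278532 | 3445981 | -0.274329 | -0.282221 | 828919 | -0.323814 | -0.333411 | 0 |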


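```python
# Encode categorical features as integer codes (LabelEncoder assigns them in sorted order)
le_type = LabelEncoder()
data['type'] = le_type.fit_transform(data['type'])

# nameOrig and nameDest are account identifiers: their codes are arbitrary integers
# (into the millions) and are not rescaled by the StandardScaler in the next cell
le_nameOrig = LabelEncoder()
data['nameOrig'] = le_nameOrig.fit_transform(data['nameOrig'])
le_nameDest = LabelEncoder()
data['nameDest'] = le_nameDest.fit_transform(data['nameDest'])
```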
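`LabelEncoder` assigns integer codes in alphabetical order of the labels. The codes therefore imply an ordering that carries no meaning as a neural-network input. The same effect applies to `nameOrig` and `nameDest`, whose codes run into the millions in the output above. For the low-cardinality `type` column, one-hot encoding with `pd.get_dummies` is a common alternative.

The sketch assumes the transaction types are `CASH_IN`, `CASH_OUT`, `DEBIT`, `PAYMENT` and `TRANSFER`. It only compares the two encodings and does not change the notebook's pipeline.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Transaction types assumed for illustration.
types = pd.Series(["PAYMENT", "PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT", "CASH_IN", "DEBIT"])

# LabelEncoder numbers the classes in sorted (alphabetical) order.
le = LabelEncoder()
codes = le.fit_transform(types)
mapping = {str(label): i for i, label in enumerate(le.classes_)}
print(mapping)

# One-hot encoding: one 0/1 column per type, with no implied order between types.
one_hot = pd.get_dummies(types, prefix="type")
print(one_hot.columns.tolist())
```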

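```python
# Handle missing values (if any)
data.fillna(0, inplace=True)

# isFlaggedFraud is not used as a feature; errors="ignore" makes the cell safe to re-run
data.drop(columns="isFlaggedFraud", inplace=True, errors="ignore")

# Standardize the amount and balance columns.
# Note: fitting on the full dataset, before the cross-validation split below,
# lets validation rows influence the scaling statistics.
num_cols = ['amount', 'oldbalanceOrg', 'newbalanceOrig', 'oldbalanceDest', 'newbalanceDest']
scaler = StandardScaler()
data[num_cols] = scaler.fit_transform(data[num_cols])
```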
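The cell above fits `StandardScaler` on the whole dataset before the cross-validation split. As a result, each validation fold's values influence the mean and standard deviation used to scale the training data. A leakage-free variant computes `z = (x - mean_train) / std_train` from the training split only and reuses those statistics on held-out data. The sketch uses hypothetical random features standing in for the amount and balance columns.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical skewed numeric features standing in for amount/balance columns.
X = rng.lognormal(mean=8.0, sigma=1.5, size=(200, 3))
# Hypothetical labels: 20 positives out of 200.
y = np.zeros(200, dtype=int)
y[:20] = 1

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit mean/std on the training split only, then reuse them for the validation split.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)

# Training columns now have mean 0 and std 1; validation columns are only approximately so.
print(X_train_s.mean(axis=0).round(6), X_train_s.std(axis=0).round(6))
```

Inside the cross-validation loop, the same idea means calling `fit_transform` on each fold's `X_train` and only `transform` on its `X_val`.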

```python
# Define features and target
X = data.drop(columns=['isFraud'])
y = data['isFraud']

# Initialize StratifiedKFold and SMOTE
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
smote = SMOTE(random_state=42)
```
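`StratifiedKFold` keeps the class proportions roughly equal in every fold. This matters when positives are scarce, because a plain `KFold` could leave a fold with very few fraud cases. A quick check on hypothetical labels (6 positives out of 300) shows the positives spread evenly across three folds.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical labels: 300 rows, 6 of them positive (2%).
y = np.zeros(300, dtype=int)
y[:6] = 1
X = np.arange(300).reshape(-1, 1)  # placeholder feature; only y drives the stratification

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
positives_per_fold = [int(y[val_idx].sum()) for _, val_idx in skf.split(X, y)]
fold_sizes = [len(val_idx) for _, val_idx in skf.split(X, y)]
print(positives_per_fold, fold_sizes)
```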

```python
# Initialize lists to store results
accuracy_scores = []
f1_scores = []

for train_index, val_index in skf.split(X, y):
    X_train, X_val = X.iloc[train_index], X.iloc[val_index]
    y_train, y_val = y.iloc[train_index], y.iloc[val_index]

    # Apply SMOTE to the training data
    X_train_res, y_train_res = smote.fit_resample(X_train, y_train)
    
    # Build the model
    model = Sequential()
    model.add(Dense(64, input_dim=X_train_res.shape[1], activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

    # Train the model
    model.fit(X_train_res, y_train_res, epochs=2, batch_size=32, verbose=0)
    
    # Evaluate the model on the validation set
    y_val_pred = (model.predict(X_val) > 0.5).astype("int32")
    accuracy = accuracy_score(y_val, y_val_pred)
    f1 = f1_score(y_val, y_val_pred)
    
    accuracy_scores.append(accuracy)
    f1_scores.append(f1)
```

```python
# Calculate average scores
average_accuracy = np.mean(accuracy_scores)
average_f1_score = np.mean(f1_scores)

print(f'Average Accuracy: {average_accuracy}')
print(f'Average F1 Score: {average_f1_score}')
```
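Accuracy is a weak guide on imbalanced data. Take hypothetical labels where 10 of 1,000 transactions are fraud:

- A model that never flags fraud scores 0.99 accuracy with an F1 of 0.
- A model that catches every fraud but also flags 660 legitimate transactions scores only 0.34 accuracy, with an F1 near 0.03.

The second pattern (low accuracy, F1 near zero) resembles the averages this notebook reports. That is why F1, or precision and recall, is the more informative number here.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical validation labels: 1,000 transactions, 10 of them fraud.
y_true = np.zeros(1000, dtype=int)
y_true[:10] = 1

# Model A never flags fraud.
never_flag = np.zeros(1000, dtype=int)
acc_never = accuracy_score(y_true, never_flag)
f1_never = f1_score(y_true, never_flag, zero_division=0)

# Model B flags all 10 frauds plus 660 legitimate transactions.
flag_most = np.zeros(1000, dtype=int)
flag_most[:670] = 1
acc_most = accuracy_score(y_true, flag_most)
prec_most = precision_score(y_true, flag_most)
rec_most = recall_score(y_true, flag_most)
f1_most = f1_score(y_true, flag_most)

print(acc_never, f1_never)
print(acc_most, prec_most, rec_most, f1_most)
```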

```python
# Train the final model on the full dataset (oversampled with SMOTE) and save it
smote = SMOTE(random_state=42)
X_res, y_res = smote.fit_resample(X, y)

model = Sequential()
model.add(Dense(64, input_dim=X_res.shape[1], activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X_res, y_res, epochs=2, batch_size=32)

# include_optimizer=False stores the architecture and weights only, which is enough for inference.
# Recent Keras versions warn that HDF5 (.h5) is a legacy format; the native format uses a .keras extension.
model.save('ann_fraud_detection_model.h5', include_optimizer=False)
```
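The final cell oversamples the full dataset with SMOTE. SMOTE builds each synthetic minority row in three steps:

1. Pick a minority sample `x_i`.
2. Pick one of its `k` nearest minority-class neighbours, `x_nn`.
3. Draw a random `lam` in [0, 1) and set `x_new = x_i + lam * (x_nn - x_i)`.

The simplified function below (a hypothetical helper, `smote_like_samples`, not imblearn's implementation) shows just that interpolation step.

One consequence for this notebook: the label-encoded `type`, `nameOrig` and `nameDest` columns are interpolated like any other number. Synthetic rows can therefore carry fractional codes that match no real category. imblearn's `SMOTENC` is intended for such mixed categorical and continuous features.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote_like_samples(X_min, n_new, k=5, seed=0):
    """Simplified SMOTE: interpolate between minority rows and their minority-class neighbours."""
    rng = np.random.default_rng(seed)
    # Ask for k + 1 neighbours because each point is returned as its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, neighbours = nn.kneighbors(X_min)
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X_min))
        nb = neighbours[base, rng.integers(1, k + 1)]  # skip column 0 (the point itself)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic[i] = X_min[base] + lam * (X_min[nb] - X_min[base])
    return synthetic


# Usage on hypothetical minority-class rows.
X_min = np.random.default_rng(1).normal(size=(20, 2))
synthetic = smote_like_samples(X_min, n_new=50, k=5)
print(synthetic.shape)
```

Because every synthetic row lies on a segment between two real minority rows, it always stays inside their bounding box.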

```python
import os

# Save the scaler and label encoders; joblib.dump does not create missing directories
os.makedirs('model', exist_ok=True)
joblib.dump(scaler, 'model/scaler.pkl')
joblib.dump(le_type, 'model/label_encoder_type.pkl')
joblib.dump(le_nameOrig, 'model/label_encoder_nameOrig.pkl')
joblib.dump(le_nameDest, 'model/label_encoder_nameDest.pkl')
```

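Output (collected from the evaluation and final-training cells above):

```
Average Accuracy: 0.3337853482478774
Average F1 Score: 0.0017188267783042326
Epoch 1/2
Epoch 2/2
```

Fraud is the minority class. An average accuracy of about 0.33 therefore means the model labels most validation transactions as fraudulent, and an F1 score of about 0.002 suggests very few of those flags are correct. The unscaled `nameOrig`/`nameDest` codes and the two-epoch training budget are possible contributors.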
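At prediction time, load the saved encoders and scaler and apply them with `transform`; do not refit them. This way new transactions get the same codes and scaling as the training data. Note that `LabelEncoder.transform` raises `ValueError` for labels it has not seen. In this notebook that would affect any new `nameOrig` or `nameDest` value.

The sketch below uses toy artifacts fitted on hypothetical values and stored in a temporary directory rather than the notebook's `model/` folder.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical training-time artifacts, fitted on toy values.
le_type = LabelEncoder().fit(["CASH_IN", "CASH_OUT", "DEBIT", "PAYMENT", "TRANSFER"])
scaler = StandardScaler().fit(np.array([[100.0], [250.0], [1000.0], [5000.0]]))

with tempfile.TemporaryDirectory() as model_dir:
    joblib.dump(le_type, os.path.join(model_dir, "label_encoder_type.pkl"))
    joblib.dump(scaler, os.path.join(model_dir, "scaler.pkl"))

    # At inference time, load the fitted objects instead of refitting them.
    le_loaded = joblib.load(os.path.join(model_dir, "label_encoder_type.pkl"))
    scaler_loaded = joblib.load(os.path.join(model_dir, "scaler.pkl"))

# Encode and scale a new transaction with the training-time statistics.
type_code = le_loaded.transform(["TRANSFER"])[0]
amount_scaled = scaler_loaded.transform(np.array([[250.0]]))[0, 0]

# LabelEncoder raises ValueError for labels it never saw during fitting.
try:
    le_loaded.transform(["UNKNOWN_TYPE"])
    unseen_ok = True
except ValueError:
    unseen_ok = False

print(type_code, amount_scaled, unseen_ok)
```

With the notebook's actual scaler, `transform` expects all five numeric columns (`amount`, `oldbalanceOrg`, `newbalanceOrig`, `oldbalanceDest`, `newbalanceDest`) in the order used for fitting.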
