<a href="https://colab.research.google.com/github/Maedeabm/Fraud-Detection-Using-Neural-Network/blob/main/Fraud_Detection_improved_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

To further improve the model's performance, we can explore a combination of strategies:

  Model Complexity: Add more layers or nodes to the LSTM network to increase its capacity to learn more intricate patterns. However, be careful with overfitting. Always monitor the validation set performance.

  Regularization: If overfitting becomes an issue after making the model more complex, consider introducing L1 or L2 regularization in the model layers.

  Hyperparameter Tuning: Systematically tune hyperparameters, such as the learning rate, dropout rates, batch sizes, etc. You can use tools like Keras Tuner or Scikit-learn's GridSearchCV.

  Feature Engineering: This approach takes more time, but new features can often be derived from the existing ones, for example ratios between balances or aggregated statistics.

  Ensemble Techniques: Combine the LSTM model with other models like Random Forest, Gradient Boosting Machines, or even simpler linear models. An ensemble method like stacking or blending could provide improved performance.

  Advanced Resampling Techniques: Besides SMOTE, there are other resampling techniques, such as ADASYN or Borderline-SMOTE, that could potentially give better results.

  Learning Rate Annealing: Decrease the learning rate over epochs. Keras provides learning rate schedules or callbacks for this purpose.

  Use Bi-directional LSTMs: A bi-directional LSTM reads each sequence both forwards and backwards, so its output at every timestep combines information from earlier and later timesteps.
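
To make the ensemble idea from the list above more concrete, here is a minimal blending sketch. It runs on synthetic data from `make_classification` rather than PaySim, and the two base models, their settings, and the plain averaging rule are all assumptions chosen for illustration. The LSTM's predicted probabilities could be added as one more column in the same way.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the transaction data (illustrative only)
X_syn, y_syn = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=42
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_syn, y_syn, test_size=0.25, stratify=y_syn, random_state=42
)

# Two base models trained on the same training split
rf = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_tr, y_tr)
gbm = GradientBoostingClassifier(random_state=42).fit(X_tr, y_tr)

# Positive-class probabilities from each base model on the validation split
p_rf = rf.predict_proba(X_val)[:, 1]
p_gbm = gbm.predict_proba(X_val)[:, 1]

# Blend by simple averaging; an LSTM's model.predict(...).ravel() output
# could be stacked in as a third column
blend = np.column_stack([p_rf, p_gbm]).mean(axis=1)

for name, probs in [("random forest", p_rf), ("gradient boosting", p_gbm), ("blend", blend)]:
    print(name, "PR-AUC:", average_precision_score(y_val, probs))
```

Averaging is the simplest form of blending. Stacking would instead train a small meta-model on the base models' out-of-fold predictions.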

In [None]:
!pip install tensorflow
!pip install imbalanced-learn

import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from imblearn.over_sampling import SMOTE



In [None]:
from google.colab import files

uploaded = files.upload()

# Assuming the dataset is named "paysim.csv"
data = pd.read_csv('paysim.csv')

Saving paysim.csv to paysim.csv




Data Preprocessing:

This is a basic preprocessing pass to get started:


In [None]:
# Dropping columns that may not be required for this basic model
data = data.drop(['nameOrig', 'nameDest', 'isFlaggedFraud'], axis=1)

# Convert categorical columns to numerical values
data = pd.get_dummies(data, columns=['type'], drop_first=True)

# Normalize the monetary features to the [0, 1] range
num_cols = ['amount', 'oldbalanceOrg', 'newbalanceOrig', 'oldbalanceDest', 'newbalanceDest']
scaler = MinMaxScaler()
data[num_cols] = scaler.fit_transform(data[num_cols])

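# Split the data into features and target variable.
# The cast to float32 guards against newer pandas versions, where get_dummies
# returns boolean columns and .values would otherwise yield an object array.
X = data.drop('isFraud', axis=1).values.astype('float32')
y = data['isFraud'].values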

# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Reshape input to be 3D for LSTM [samples, timesteps, features]
X_train = X_train.reshape((X_train.shape[0], 1, X_train.shape[1]))
X_test = X_test.reshape((X_test.shape[0], 1, X_test.shape[1]))
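
As a sketch of the feature-engineering strategy mentioned at the start, the snippet below derives a few extra columns from PaySim's balance fields. The sample values are made up, and the new column names (`errorBalanceOrig`, `errorBalanceDest`, `amountToBalanceRatio`, `emptiesOrigin`) are hypothetical choices for illustration. Features like these would need to be computed on the raw values, before the scaling step.

```python
import pandas as pd

# Tiny hand-made sample using PaySim's column names (values are illustrative)
df = pd.DataFrame({
    'amount':         [100.0, 500.0, 250.0],
    'oldbalanceOrg':  [1000.0, 500.0, 0.0],
    'newbalanceOrig': [900.0, 0.0, 0.0],
    'oldbalanceDest': [0.0, 200.0, 50.0],
    'newbalanceDest': [100.0, 200.0, 300.0],
})

# Gap between the recorded balances and what the amount implies
df['errorBalanceOrig'] = df['newbalanceOrig'] + df['amount'] - df['oldbalanceOrg']
df['errorBalanceDest'] = df['oldbalanceDest'] + df['amount'] - df['newbalanceDest']

# Share of the origin balance moved by the transaction (+1 avoids division by zero)
df['amountToBalanceRatio'] = df['amount'] / (df['oldbalanceOrg'] + 1.0)

# Flag transactions that empty a previously funded origin account
df['emptiesOrigin'] = ((df['oldbalanceOrg'] > 0) & (df['newbalanceOrig'] == 0)).astype(int)

print(df[['errorBalanceOrig', 'errorBalanceDest', 'amountToBalanceRatio', 'emptiesOrigin']])
```

Whether any of these columns helps is an empirical question, so each one is best added and evaluated on its own.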

1. Handling Class Imbalance with SMOTE:

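First, we'll apply SMOTE to the training split only, so the held-out test set (X_test, y_test) keeps its original class distribution. The resampled data is then split once more. Note that the resulting X_test_resampled is class-balanced and contains synthetic samples, so it does not reflect real-world fraud rates.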

In [None]:
from imblearn.over_sampling import SMOTE

# Make sure X_train is 2D
if len(X_train.shape) == 3:
    X_train = X_train.reshape(X_train.shape[0], X_train.shape[2])

smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)

# Splitting the resampled data
X_train_resampled, X_test_resampled, y_train_resampled, y_test_resampled = train_test_split(X_resampled, y_resampled, test_size=0.2, random_state=42)

# Now, reshape the data for LSTM
X_train_resampled = X_train_resampled.reshape((X_train_resampled.shape[0], 1, X_train_resampled.shape[1]))
X_test_resampled = X_test_resampled.reshape((X_test_resampled.shape[0], 1, X_test_resampled.shape[1]))



2. Adjusting LSTM Model:

We'll modify the LSTM model structure, introducing more layers and nodes.

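Here's an example using Bi-directional LSTMs and a more complex model structure. Because each transaction is reshaped into a sequence of length 1, the bidirectional wrappers mainly add capacity here rather than temporal context: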

In [None]:
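# Import from tensorflow.keras to match the other layers; mixing the standalone
# keras package with tensorflow.keras can cause incompatibilities
from tensorflow.keras.layers import Bidirectional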

model = Sequential()

model.add(Bidirectional(LSTM(128, return_sequences=True), input_shape=(X_train_resampled.shape[1], X_train_resampled.shape[2])))
model.add(Dropout(0.4))
model.add(Bidirectional(LSTM(64, return_sequences=True)))
model.add(Dropout(0.3))
model.add(Bidirectional(LSTM(32)))
model.add(Dropout(0.3))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
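
If the larger model starts to overfit, L2 regularization can be attached directly to the layer weights, as suggested in the strategy list. The following is a minimal sketch rather than a tuned configuration. The penalty strength of `1e-4`, the layer sizes, and the `n_features` placeholder are assumptions for illustration.

```python
import tensorflow as tf
from tensorflow.keras import regularizers
from tensorflow.keras.layers import LSTM, Bidirectional, Dense, Dropout

n_features = 10  # placeholder; use X_train_resampled.shape[2] in the notebook

reg_model = tf.keras.Sequential([
    tf.keras.Input(shape=(1, n_features)),
    # L2 penalties on both the input and the recurrent weights of the LSTM
    Bidirectional(LSTM(
        64,
        kernel_regularizer=regularizers.l2(1e-4),
        recurrent_regularizer=regularizers.l2(1e-4),
    )),
    Dropout(0.3),
    Dense(1, activation='sigmoid', kernel_regularizer=regularizers.l2(1e-4)),
])

# The regularization terms are added to the training loss automatically
reg_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
reg_model.summary()
```

Swapping `regularizers.l2` for `regularizers.l1` gives the L1 variant, which tends to push individual weights to exactly zero.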


3. Training with Early Stopping:

To avoid overfitting and ensure the model stops training once the validation loss stops improving, we'll use Early Stopping.

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=10)
history = model.fit(X_train_resampled, y_train_resampled, epochs=100, batch_size=64, validation_split=0.2, verbose=2, callbacks=[es])


Epoch 1/100
101671/101671 - 2026s - loss: 0.2559 - accuracy: 0.8826 - val_loss: 0.2242 - val_accuracy: 0.8988 - 2026s/epoch - 20ms/step
Epoch 2/100
101671/101671 - 1962s - loss: 0.1883 - accuracy: 0.9200 - val_loss: 0.1567 - val_accuracy: 0.9328 - 1962s/epoch - 19ms/step
Epoch 3/100
101671/101671 - 1942s - loss: 0.1699 - accuracy: 0.9289 - val_loss: 0.1214 - val_accuracy: 0.9525 - 1942s/epoch - 19ms/step
Epoch 4/100
101671/101671 - 1964s - loss: 0.1578 - accuracy: 0.9348 - val_loss: 0.1503 - val_accuracy: 0.9312 - 1964s/epoch - 19ms/step
Epoch 5/100
101671/101671 - 1929s - loss: 0.1487 - accuracy: 0.9391 - val_loss: 0.1357 - val_accuracy: 0.9448 - 1929s/epoch - 19ms/step
Epoch 6/100
101671/101671 - 1958s - loss: 0.1431 - accuracy: 0.9416 - val_loss: 0.1475 - val_accuracy: 0.9397 - 1958s/epoch - 19ms/step
Epoch 7/100
101671/101671 - 1953s - loss: 0.1370 - accuracy: 0.9445 - val_loss: 0.0827 - val_accuracy: 0.9707 - 1953s/epoch - 19ms/step
Epoch 8/100
101671/101671 - 1965s - loss: 0.1334
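
The learning-rate annealing idea from the strategy list can be combined with early stopping through Keras callbacks. The sketch below is a minimal, self-contained illustration on random data. The tiny model, the `demo_` names, and the specific `factor` and `patience` values are assumptions chosen for the example, not tuned settings.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Random stand-in data shaped like the notebook's input: [samples, timesteps, features]
rng = np.random.default_rng(42)
demo_X = rng.random((256, 1, 10)).astype('float32')
demo_y = rng.integers(0, 2, size=256)

demo_model = tf.keras.Sequential([
    tf.keras.Input(shape=(1, 10)),
    tf.keras.layers.LSTM(8),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
demo_model.compile(optimizer='adam', loss='binary_crossentropy')

demo_callbacks = [
    # Halve the learning rate when val_loss has not improved for 2 epochs
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, min_lr=1e-5, verbose=1),
    # Stop after 5 stagnant epochs and roll back to the best weights seen
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True, verbose=1),
]

demo_history = demo_model.fit(
    demo_X, demo_y,
    epochs=5, batch_size=32, validation_split=0.2,
    verbose=0, callbacks=demo_callbacks,
)
print("epochs run:", len(demo_history.history['val_loss']))
```

Keeping the `ReduceLROnPlateau` patience shorter than the `EarlyStopping` patience gives the model a chance to improve at the lower learning rate before training stops.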

4. Model Evaluation:

Once the model has been trained, evaluate its performance using the test set.

In [None]:
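from sklearn.metrics import classification_report

# Evaluate on the original held-out test set (X_test, y_test), which contains no
# SMOTE-generated samples. X_test_resampled is class-balanced and partly synthetic,
# so scores on it would overstate real-world performance.
y_pred = model.predict(X_test)
y_pred = (y_pred > 0.5).astype(int).flatten()

print(classification_report(y_test, y_pred))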
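
With heavily imbalanced fraud data, accuracy alone says little, so it can help to look at the confusion matrix and at threshold-independent scores alongside the classification report. The sketch below uses a small hand-made set of labels and predicted probabilities. The `y_true` and `y_prob` arrays are illustrative stand-ins for the test labels and the raw output of `model.predict`.

```python
import numpy as np
from sklearn.metrics import average_precision_score, confusion_matrix, roc_auc_score

# Hand-made labels and predicted fraud probabilities (illustrative only)
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 0, 1])
y_prob = np.array([0.05, 0.10, 0.20, 0.30, 0.60, 0.15, 0.90, 0.40, 0.25, 0.70])

# Apply the same 0.5 threshold used in the evaluation cell
y_hat = (y_prob > 0.5).astype(int)

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_hat)
print(cm)

# Threshold-independent scores computed from the raw probabilities
print("ROC-AUC:", roc_auc_score(y_true, y_prob))
print("PR-AUC (average precision):", average_precision_score(y_true, y_prob))
```

The precision-recall score is usually the more informative of the two when the positive class is rare, since it ignores the large pool of true negatives.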


Remember, machine learning is iterative. Each change can be evaluated for its impact on performance. Make one adjustment at a time and see how it affects your metrics to understand which strategies are most effective for your dataset.