<a href="https://colab.research.google.com/github/Maedeabm/Fraud-Detection-Using-Neural-Network/blob/main/Fraud_Detection_improved_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

To improve the detection of those sneaky unicorns (fraudulent transactions), let's deploy a series of enhanced techniques and strategies. Here's a plan of action:

1. Class Imbalance Solutions:

Your dataset is heavily imbalanced, which is a classic problem in fraud detection. We've used SMOTE before, but there are other techniques too:

    Under-sampling: Reduce the number of regular transactions (horses) to match the number of frauds. However, be cautious: you might lose vital information this way.
    Combined SMOTE and Under-sampling: First, increase the frauds using SMOTE and then reduce the genuine transactions to balance.
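
Here is a minimal sketch of random under-sampling in plain NumPy, assuming binary labels where 1 marks fraud. The helper name `random_undersample` and its `ratio` parameter are illustrative and not part of the notebook; imbalanced-learn's `RandomUnderSampler` fills the same role. For the combined approach, one would first run SMOTE to raise the fraud count part of the way, then pass the result through a helper like this one.

```python
import numpy as np

def random_undersample(X, y, ratio=1.0, seed=42):
    """Keep every fraud row and a random subset of genuine rows.

    ratio = genuine rows kept per fraud row (1.0 gives a 50/50 split).
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    fraud_idx = np.flatnonzero(y == 1)
    genuine_idx = np.flatnonzero(y == 0)
    n_keep = min(len(genuine_idx), int(round(ratio * len(fraud_idx))))
    kept_genuine = rng.choice(genuine_idx, size=n_keep, replace=False)
    keep = np.sort(np.concatenate([fraud_idx, kept_genuine]))
    return X[keep], y[keep]

# Toy data: 1,000 transactions, 10 of them fraudulent
X_demo = np.arange(2000, dtype=float).reshape(1000, 2)
y_demo = np.zeros(1000, dtype=int)
y_demo[:10] = 1

X_bal, y_bal = random_undersample(X_demo, y_demo, ratio=1.0)
print(np.bincount(y_bal))  # [10 10]
```

A `ratio` above 1.0 keeps more genuine rows, which trades some balance for less information loss.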

2. Feature Engineering:

Let's expand our set of clues for hunting those unicorns. Other parameters, or combinations of parameters, may give clearer signals of fraud.

  Transaction aggregations: For example, calculating the average transaction amount for a user over a day and comparing it with the current transaction.

  Time-based features: For example, are frauds more likely at a specific time of day?
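
Both ideas can be sketched with pandas. This assumes PaySim's `step` column counts hours since the start of the simulation (as the dataset description states) and that the features are built before `nameOrig` is dropped. The function name and the new column names are made up for illustration.

```python
import pandas as pd

def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add hour-of-day and amount-vs-daily-average features (illustrative sketch)."""
    out = df.copy()
    # Time-based features: hour within the day and day index, derived from `step`
    out["hour_of_day"] = out["step"] % 24
    out["day"] = out["step"] // 24
    # Transaction aggregation: the sender's mean amount on that day
    daily_mean = out.groupby(["nameOrig", "day"])["amount"].transform("mean")
    # Ratio of this transaction to the daily mean; guard against a zero mean
    out["amount_vs_daily_mean"] = out["amount"] / daily_mean.where(daily_mean > 0, 1.0)
    return out

toy = pd.DataFrame({
    "step": [1, 2, 25],
    "nameOrig": ["C1", "C1", "C1"],
    "amount": [10.0, 30.0, 50.0],
})
print(add_engineered_features(toy))
```

Note that the daily mean here also includes transactions that happen later the same day. A live system would only have the earlier ones, so a running (expanding) mean would be the stricter choice.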

3. More Advanced Models:

LSTM is powerful, but there are other architectures and algorithms worth trying:

  GRU (Gated Recurrent Units): Similar to LSTM but can be faster and just as effective in some cases.
  
  1D Convolutional Neural Networks: Great for sequence data, and sometimes combined with LSTM/GRU layers.
  
  Ensemble methods: Like combining predictions from Random Forest, Gradient Boosting, and Neural Networks to get a more robust result.
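
As a rough illustration of the ensemble idea, the sketch below soft-votes over a Random Forest, Gradient Boosting, and a small scikit-learn `MLPClassifier`. The MLP is a stand-in for the Keras network, and the data is synthetic from `make_classification`, not PaySim. The estimator sizes are arbitrary small values chosen so the example runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic imbalanced data (about 5% positives) standing in for real transactions
X_toy, y_toy = make_classification(
    n_samples=2000, n_features=8, weights=[0.95, 0.05], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, stratify=y_toy, random_state=42
)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, class_weight="balanced", random_state=42)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=42)),
        ("nn", MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=42)),
    ],
    voting="soft",  # average the predicted probabilities of the three models
)
ensemble.fit(X_tr, y_tr)

# Averaged fraud probability for each held-out row
fraud_prob = ensemble.predict_proba(X_te)[:, 1]
print(fraud_prob[:5])
```

Soft voting keeps a probability output, so the threshold tuning described later still applies to the ensemble.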

4. Hyperparameter Tuning:

  Use tools like GridSearchCV or RandomizedSearchCV to find the optimal parameters for your model. Think of it as adjusting the focus on a telescope to get the clearest view of stars (or in our case, unicorns).
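
A small sketch of `RandomizedSearchCV` follows. It tunes a scikit-learn Random Forest on synthetic data as a stand-in, because plugging the Keras model into these tools needs an extra wrapper that is not shown here. The parameter grid and the choice of `average_precision` as the score are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic imbalanced data (about 10% positives)
X_toy, y_toy = make_classification(
    n_samples=1500, n_features=8, weights=[0.9, 0.1], random_state=0
)

# Candidate values to sample from (illustrative, not tuned for PaySim)
param_distributions = {
    "n_estimators": [25, 50, 100],
    "max_depth": [3, 5, 8, None],
    "min_samples_leaf": [1, 5, 20],
}

search = RandomizedSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                      # try 5 random combinations
    scoring="average_precision",   # imbalance-aware score instead of accuracy
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X_toy, y_toy)
print(search.best_params_, round(search.best_score_, 3))
```

Stratified folds keep the fraud rate roughly constant across folds, which matters when positives are rare.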

5. Evaluation Metric:

Given our problem, accuracy isn't the best metric. We should prioritize Recall (to catch as many frauds as possible) but also keep an eye on Precision (to avoid too many false alarms).
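
To see why accuracy misleads here, consider a toy set with 1% fraud. The two prediction vectors below are invented for illustration: a "lazy" model that never flags anything, and an "alert" model that catches most frauds at the cost of some false alarms.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# 990 genuine transactions followed by 10 frauds
y_true = np.array([0] * 990 + [1] * 10)

# Lazy model: predicts "genuine" for everything
y_lazy = np.zeros_like(y_true)

# Alert model: 20 false alarms, 2 missed frauds, 8 caught
y_alert = y_true.copy()
y_alert[:20] = 1
y_alert[-2:] = 0

for name, y_hat in [("lazy", y_lazy), ("alert", y_alert)]:
    print(
        name,
        "accuracy=%.3f" % accuracy_score(y_true, y_hat),
        "precision=%.3f" % precision_score(y_true, y_hat, zero_division=0),
        "recall=%.3f" % recall_score(y_true, y_hat, zero_division=0),
    )
```

The lazy model scores 0.990 accuracy with zero recall, while the alert model scores a lower 0.978 accuracy but catches 8 of the 10 frauds.
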

6. Threshold Tuning:

After training, instead of using the default 0.5 threshold for classification, adjust it to find the sweet spot where Recall and Precision are balanced. Imagine this as adjusting the sensitivity of our unicorn detector.
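
One way to pick the threshold is to scan the precision-recall curve and take the point with the best F1 score. This is a sketch on made-up scores. The helper name `best_f1_threshold` is hypothetical, and F1 is only one possible definition of "balanced".

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_f1_threshold(y_true, y_prob):
    """Return (threshold, precision, recall) at the highest F1 on the PR curve."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
    # The final precision/recall pair has no matching threshold, so drop it
    p, r = precision[:-1], recall[:-1]
    denom = p + r
    f1 = np.divide(2 * p * r, denom, out=np.zeros_like(denom), where=denom > 0)
    best = int(np.argmax(f1))
    return thresholds[best], p[best], r[best]

# Made-up labels and predicted fraud probabilities
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
y_prob = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.90])

threshold, precision, recall = best_f1_threshold(y_true, y_prob)
print(threshold, precision, recall)  # 0.35 0.75 1.0

# Apply the tuned threshold instead of the default 0.5
y_pred_tuned = (y_prob >= threshold).astype(int)
```

On this toy data the default 0.5 cutoff would miss the fraud scored at 0.35. Choose the threshold on a validation split rather than the final test set, so the reported metrics stay honest.
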

7. Regularization:

Adding L1 or L2 regularization can prevent overfitting and make the model generalize better on unseen data.
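
Written out, regularization adds a weight penalty to the binary cross-entropy loss the model already minimizes. The symbols below are notation chosen for this sketch: $w_j$ for the trainable weights, $\hat{y}_i$ for the predicted fraud probability of transaction $i$, and $\lambda_1$, $\lambda_2$ for the penalty strengths.

$$
\mathcal{L}(w) = -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \Big] + \lambda_1 \sum_j |w_j| + \lambda_2 \sum_j w_j^2
$$

Setting $\lambda_2 = 0$ gives pure L1, which tends to push some weights to exactly zero. Setting $\lambda_1 = 0$ gives pure L2, which shrinks all weights smoothly. In Keras, layers accept such a penalty through their `kernel_regularizer` argument.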

Remember, in the wild world of machine learning, especially with fraud detection, there's no one-size-fits-all solution. It's a cycle of trying, learning, and refining. Let's give these steps a shot and see how close we get to becoming the ultimate unicorn hunter! 🦄🔍

In [None]:
!pip install tensorflow
!pip install imbalanced-learn

import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from imblearn.over_sampling import SMOTE



In [None]:
from google.colab import files

uploaded = files.upload()

# Assuming the dataset is named "paysim.csv"
data = pd.read_csv('paysim.csv')

Saving paysim.csv to paysim.csv




Data Preprocessing:

This will be a basic preprocessing to get started:


In [None]:
# Dropping columns that may not be required for this basic model
data = data.drop(['nameOrig', 'nameDest', 'isFlaggedFraud'], axis=1)

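# Convert categorical columns to numerical values
# (dtype=float keeps the dummy columns numeric; recent pandas versions default to bool,
# which would turn the .values array below into dtype=object)
data = pd.get_dummies(data, columns=['type'], drop_first=True, dtype=float)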

# Normalize the features
scaler = MinMaxScaler()
data[['amount', 'oldbalanceOrg', 'newbalanceOrig', 'oldbalanceDest', 'newbalanceDest']] = scaler.fit_transform(data[['amount', 'oldbalanceOrg', 'newbalanceOrig', 'oldbalanceDest', 'newbalanceDest']])

# Splitting data into features and target variable
X = data.drop('isFraud', axis=1).values
y = data['isFraud'].values

# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Reshape input to be 3D for LSTM [samples, timesteps, features]
X_train = X_train.reshape((X_train.shape[0], 1, X_train.shape[1]))
X_test = X_test.reshape((X_test.shape[0], 1, X_test.shape[1]))
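
One caveat with the cell above: the scaler is fitted on the full dataset before the split, so test-set minima and maxima leak into training, and the split is not stratified. A possible variant, sketched here on synthetic data rather than PaySim, splits first with `stratify` and then fits the scaler on the training rows only. The array names are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in: 5 skewed numeric features, about 2% positives
rng = np.random.default_rng(42)
X_raw = rng.exponential(scale=1000.0, size=(5000, 5))
y_raw = (rng.random(5000) < 0.02).astype(int)

# Split first, keeping the fraud rate equal in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.2, stratify=y_raw, random_state=42
)

scaler = MinMaxScaler()
X_tr = scaler.fit_transform(X_tr)  # learn min/max from training rows only
X_te = scaler.transform(X_te)      # reuse them; test values may fall outside [0, 1]

# Reshape to 3D for LSTM: [samples, timesteps, features]
X_tr = X_tr.reshape((X_tr.shape[0], 1, X_tr.shape[1]))
X_te = X_te.reshape((X_te.shape[0], 1, X_te.shape[1]))

print("fraud rate train/test:", y_tr.mean(), y_te.mean())
```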


1. Handling Class Imbalance with SMOTE:

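First, we'll apply SMOTE to the training split to address the class imbalance, then split the resampled data again into training and hold-out portions. Keep in mind that this second hold-out set is itself balanced and contains synthetic fraud samples.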

In [None]:
from imblearn.over_sampling import SMOTE

# Make sure X_train is 2D
if len(X_train.shape) == 3:
    X_train = X_train.reshape(X_train.shape[0], X_train.shape[2])

smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)

# Splitting the resampled data
X_train_resampled, X_test_resampled, y_train_resampled, y_test_resampled = train_test_split(X_resampled, y_resampled, test_size=0.2, random_state=42)

# Now, reshape the data for LSTM
X_train_resampled = X_train_resampled.reshape((X_train_resampled.shape[0], 1, X_train_resampled.shape[1]))
X_test_resampled = X_test_resampled.reshape((X_test_resampled.shape[0], 1, X_test_resampled.shape[1]))



2. Adjusting LSTM Model:

We'll modify the LSTM model structure, introducing more layers and nodes.

In [None]:
model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(X_train_resampled.shape[1], X_train_resampled.shape[2])))
model.add(Dropout(0.3))
model.add(LSTM(64, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


3. Training with Early Stopping:

To avoid overfitting and ensure the model stops training once the validation loss stops improving, we'll use Early Stopping.

In [None]:
# Import from tensorflow.keras to match the model imports above
from tensorflow.keras.callbacks import EarlyStopping

es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=10)
history = model.fit(X_train_resampled, y_train_resampled, epochs=100, batch_size=64, validation_split=0.2, verbose=2, callbacks=[es])


Epoch 1/100
101671/101671 - 1019s - loss: 0.2431 - accuracy: 0.8897 - val_loss: 0.1526 - val_accuracy: 0.9399 - 1019s/epoch - 10ms/step
Epoch 2/100
101671/101671 - 995s - loss: 0.1779 - accuracy: 0.9249 - val_loss: 0.1661 - val_accuracy: 0.9257 - 995s/epoch - 10ms/step
Epoch 3/100
101671/101671 - 990s - loss: 0.1560 - accuracy: 0.9352 - val_loss: 0.1128 - val_accuracy: 0.9562 - 990s/epoch - 10ms/step
Epoch 4/100
101671/101671 - 965s - loss: 0.1420 - accuracy: 0.9415 - val_loss: 0.0960 - val_accuracy: 0.9617 - 965s/epoch - 9ms/step
Epoch 5/100
101671/101671 - 949s - loss: 0.1313 - accuracy: 0.9466 - val_loss: 0.0989 - val_accuracy: 0.9619 - 949s/epoch - 9ms/step
Epoch 6/100
101671/101671 - 952s - loss: 0.1230 - accuracy: 0.9501 - val_loss: 0.1002 - val_accuracy: 0.9623 - 952s/epoch - 9ms/step
Epoch 7/100
101671/101671 - 959s - loss: 0.1176 - accuracy: 0.9526 - val_loss: 0.1020 - val_accuracy: 0.9645 - 959s/epoch - 9ms/step
Epoch 8/100
101671/101671 - 967s - loss: 0.1171 - accuracy: 0.95

4. Model Evaluation:

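Once the model has been trained, evaluate its performance on the hold-out portion of the resampled data. Because SMOTE balanced this set, the report below shows roughly equal support for both classes and does not reflect the real fraud rate; the untouched X_test from the preprocessing step is the stricter benchmark.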

In [None]:
y_pred = model.predict(X_test_resampled)
y_pred = (y_pred > 0.5).astype(int).flatten()

from sklearn.metrics import classification_report
print(classification_report(y_test_resampled, y_pred))


              precision    recall  f1-score   support

           0       0.83      1.00      0.90   1017028
           1       1.00      0.79      0.88   1016374

    accuracy                           0.89   2033402
   macro avg       0.91      0.89      0.89   2033402
weighted avg       0.91      0.89      0.89   2033402
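
For a check against the real class ratio, the same kind of report can be produced on the original, untouched test split. The helper below is a hypothetical sketch that only needs true labels and predicted probabilities. The demo values are invented, and in the notebook the probabilities would come from something like `model.predict(X_test)`.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    classification_report,
    confusion_matrix,
)

def report_on_real_split(y_true, y_prob, threshold=0.5):
    """Print a classification report; return the confusion matrix and PR-AUC."""
    y_prob = np.asarray(y_prob).ravel()       # Keras returns shape (n, 1)
    y_hat = (y_prob >= threshold).astype(int)
    print(classification_report(y_true, y_hat, digits=4, zero_division=0))
    cm = confusion_matrix(y_true, y_hat, labels=[0, 1])
    pr_auc = average_precision_score(y_true, y_prob)  # threshold-free summary
    return cm, pr_auc

# Invented example: 8 genuine and 2 fraudulent transactions
y_true_demo = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_prob_demo = np.array([0.1, 0.2, 0.1, 0.7, 0.3, 0.2, 0.1, 0.4, 0.9, 0.45])

cm, pr_auc = report_on_real_split(y_true_demo, y_prob_demo)
print(cm)      # rows: true class, columns: predicted class
print(pr_auc)
```

With real fraud rates far below 1%, precision on this split is usually much lower than on the balanced hold-out, which is why both the confusion matrix and the PR-AUC are worth reading alongside the report.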



These are just a few steps to help improve your model. Depending on the results, further refinements might include additional feature engineering, hyperparameter tuning, or even integrating other models into an ensemble. Remember, the aim is to boost the recall for the fraudulent class while keeping precision at an acceptable level.