#Fraudulent Transaction Detection using Deep Learning and TensorFlow

This notebook builds a deep learning model for detecting fraudulent transactions in a financial dataset. We use TensorFlow and Keras to build a neural network that learns to identify potentially fraudulent transactions from a given set of features. The dataset contains various transaction-related attributes and a binary target variable indicating whether each transaction is fraudulent.

##Dependencies

Ensure you have the following libraries installed before running the code:

*   NumPy
*   Pandas
*   TensorFlow
*   Keras (included with TensorFlow, so it needs no separate install)
*   Scikit-learn

In [1]:
!pip install numpy pandas tensorflow scikit-learn


In [2]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras




##Loading and Preprocessing the Data

In [3]:
data = pd.read_csv('Fraud.csv')

We start by loading the dataset from a CSV file with Pandas. It contains features such as the transaction amount and type. Because neural networks require numerical inputs, we convert the categorical 'type' column into one-hot encoded features. We then split the data into training and test sets so the model can be evaluated on transactions it has not seen during training.
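
To make the one-hot step concrete, here is a minimal sketch in plain Python. The helper `one_hot` is hypothetical and written only for illustration; the notebook itself uses scikit-learn's OneHotEncoder. The sample values are the transaction types of the five rows shown by data.head().

```python
def one_hot(values):
    """Encode categorical strings as 0/1 indicator columns, one per category."""
    categories = sorted(set(values))  # alphabetical column order
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0.0] * len(categories)
        row[index[value]] = 1.0  # exactly one column is "hot" per row
        rows.append(row)
    return categories, rows


# Transaction types of the five rows shown by data.head().
types = ["PAYMENT", "PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"]
categories, encoded = one_hot(types)

print(categories)
for transaction_type, row in zip(types, encoded):
    print(transaction_type, row)
```

Each row ends up with a single 1.0 in the column of its category, so the network never sees an artificial ordering between transaction types.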

In [4]:
data.head()

Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,1,PAYMENT,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.0,0.0,0,0
1,1,PAYMENT,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.0,0.0,0,0
2,1,TRANSFER,181.0,C1305486145,181.0,0.0,C553264065,0.0,0.0,1,0
3,1,CASH_OUT,181.0,C840083671,181.0,0.0,C38997010,21182.0,0.0,1,0
4,1,PAYMENT,11668.14,C2048537720,41554.0,29885.86,M1230701703,0.0,0.0,0,0


In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6362620 entries, 0 to 6362619
Data columns (total 11 columns):
 #   Column          Dtype  
---  ------          -----  
 0   step            int64  
 1   type            object 
 2   amount          float64
 3   nameOrig        object 
 4   oldbalanceOrg   float64
 5   newbalanceOrig  float64
 6   nameDest        object 
 7   oldbalanceDest  float64
 8   newbalanceDest  float64
 9   isFraud         int64  
 10  isFlaggedFraud  int64  
dtypes: float64(5), int64(3), object(3)
memory usage: 534.0+ MB


In [6]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Convert the 'type' column to one-hot encoded features
encoder = OneHotEncoder()
type_encoded = encoder.fit_transform(data[['type']]).toarray()

# Drop the label, the raw 'type' column, and the identifier columns, then
# append the one-hot encoded features to the remaining numeric columns
numeric_features = data.drop(
    ['type', 'isFraud', 'nameOrig', 'nameDest', 'isFlaggedFraud'], axis=1
).values
features = np.concatenate((numeric_features, type_encoded), axis=1)

# Labels (output): the binary 'isFraud' target
labels = data["isFraud"].values

# Split the data into training and test sets
train_features, test_features, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.2, random_state=42
)

# Normalize/Standardize the features (optional but recommended)
scaler = StandardScaler()
train_features = scaler.fit_transform(train_features)
test_features = scaler.transform(test_features)
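
The scaling step above fits the scaler on the training set only and reuses those statistics for the test set. The following is a minimal sketch of the same idea in plain Python, assuming a single feature column; the helper names and the sample amounts are hypothetical and are not part of the notebook.

```python
from statistics import fmean, pstdev


def fit_standardizer(column):
    """Return the mean and population standard deviation of a column."""
    mean = fmean(column)
    std = pstdev(column)
    # Guard against constant columns to avoid dividing by zero.
    return mean, (std if std > 0 else 1.0)


def standardize(column, mean, std):
    """Apply z-score scaling with previously fitted statistics."""
    return [(x - mean) / std for x in column]


# Hypothetical transaction amounts, for illustration only.
train_amounts = [100.0, 200.0, 300.0]
test_amounts = [200.0, 400.0]

# Statistics come from the training data only...
mean, std = fit_standardizer(train_amounts)
train_scaled = standardize(train_amounts, mean, std)
# ...and are reused unchanged on the test data.
test_scaled = standardize(test_amounts, mean, std)

print(train_scaled)
print(test_scaled)
```

Fitting on the training data alone keeps information about the test set from leaking into preprocessing, which would otherwise make the evaluation look better than it should.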

##Building the Model

In [7]:
# Build the model
input_dim = features.shape[1]
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(input_dim,)),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(1, activation='sigmoid')  # Output layer with 1 unit and a sigmoid activation for binary classification
])





The neural network is built with Keras' Sequential API. It has three dense layers: two hidden layers, each followed by a dropout layer to reduce overfitting, and an output layer. The first hidden layer has 128 neurons and the second has 64, both with ReLU activation and a dropout rate of 0.2. The output layer is a single neuron with a sigmoid activation, which maps the result to a value between 0 and 1 that can be read as the probability of fraud, a natural fit for binary classification tasks like fraud detection.
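
As a rough check on the size of this architecture, the sketch below counts trainable parameters by hand. Each dense layer has one weight per input-unit pair plus one bias per unit; dropout layers add none. The helper names are hypothetical, and `example_input_dim = 11` is an assumption (6 numeric columns plus 5 one-hot columns); the real value depends on how many distinct transaction types the dataset contains.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


def count_params(input_dim, layer_units=(128, 64, 1)):
    """Total trainable parameters of a stack of dense layers."""
    total = 0
    n_inputs = input_dim
    for n_units in layer_units:
        total += dense_params(n_inputs, n_units)
        n_inputs = n_units  # this layer's output feeds the next layer
    return total


# Assumed input width, for illustration only.
example_input_dim = 11
print(count_params(example_input_dim))
```

If the model from the notebook is available, calling model.summary() should report the same kind of per-layer breakdown for the actual input width.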

##Compiling and Training the Model

We compile the model with the Adam optimizer and binary cross-entropy loss, which is well suited to binary classification problems, and track accuracy as a metric during training. The model is then trained on the training data for 10 epochs with a batch size of 32.
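
For intuition about the loss, here is a minimal plain-Python sketch of binary cross-entropy averaged over a batch. It is an illustration of the formula, not Keras' implementation; the function name, the clipping constant `eps`, and the sample probabilities are assumptions chosen for the example.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical labels and predicted fraud probabilities.
labels = [0, 0, 1, 0]
confident = [0.1, 0.2, 0.9, 0.1]  # close to the true labels
hesitant = [0.4, 0.5, 0.6, 0.4]   # correct side, but unsure

print(binary_cross_entropy(labels, confident))
print(binary_cross_entropy(labels, hesitant))
```

The confident predictions produce the lower loss, which is the behavior the optimizer rewards during training.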

In [8]:
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(train_features, train_labels, epochs=10, batch_size=32)


Epoch 1/10


Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x21791da9370>

##Model Evaluation

In [9]:
# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(test_features, test_labels)
print("Test accuracy:", test_acc)

Test accuracy: 0.9995222091674805


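After training, we evaluate the model on the held-out test set. The test loss and accuracy are computed, and the accuracy printed above is about 99.95%. Because fraudulent transactions are usually a small minority of all transactions, accuracy alone can look high even when fraud cases are missed, so it is worth also checking metrics such as precision and recall.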
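
When interpreting an accuracy figure on fraud data, it can help to look at precision and recall as well. The sketch below uses made-up labels (1000 transactions, 2 of them fraudulent) and a hypothetical model that never flags fraud; none of these numbers come from the notebook's dataset.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means fraud."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Return accuracy, precision, and recall (0.0 when undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up data: 2 fraudulent transactions out of 1000.
y_true = [1] * 2 + [0] * 998
# A hypothetical model that predicts "not fraud" for everything.
y_pred = [0] * 1000

accuracy, precision, recall = summarize(y_true, y_pred)
print("accuracy:", accuracy)
print("precision:", precision)
print("recall:", recall)
```

In this made-up case the accuracy is very high while recall is zero, since no fraudulent transaction is caught. For the real model, scikit-learn's classification_report on thresholded predictions would give the same kind of breakdown.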

##Saving the Model

In [10]:
# Specify the filename or directory where you want to save the model
model_filename = "trained_model.h5"

# Save the model
model.save(model_filename)



Finally, the trained model is saved in the Hierarchical Data Format (HDF5) under the file name "trained_model.h5". The saved model can later be loaded to make predictions on new data without retraining.
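
A minimal sketch of reloading the saved model and scoring new data might look like the following. The helper `load_and_score` and the 0.5 threshold are assumptions made for illustration. New data has to pass through the same one-hot encoding and the same fitted scaler as the training data before it is given to the model.

```python
def load_and_score(model_path, features, threshold=0.5):
    """Load a saved Keras model; return fraud probabilities and 0/1 flags."""
    # Imported here so the helper can be defined without loading TensorFlow.
    from tensorflow import keras

    model = keras.models.load_model(model_path)
    # The sigmoid output has shape (n_samples, 1); flatten it to 1-D.
    probabilities = model.predict(features).ravel()
    # Flag a transaction as fraud when its probability reaches the threshold.
    flags = (probabilities >= threshold).astype(int)
    return probabilities, flags


# Example call, assuming `test_features` was encoded and scaled as in the
# preprocessing step above:
# probabilities, flags = load_and_score("trained_model.h5", test_features[:5])
```

The threshold is a design choice: lowering it flags more transactions and tends to catch more fraud at the cost of more false alarms.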