<a href="https://colab.research.google.com/github/amrahmani/NN/blob/main/Ch6_AutoencoderFraudDetection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Using PyTorch, build an autoencoder for credit card fraud detection, then train and test it on the following dataset: https://raw.githubusercontent.com/amrahmani/Pythorch/main/CreditDataset.csv

The dataset has 31 columns: 30 input features ('Time', 'V1', 'V2', ..., 'V28', 'Amount') and the label 'Class', which marks each transaction as fraudulent (1) or legitimate (0). All input features are numeric and can take both positive and negative values.
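As a quick sanity check on this description, a small helper like the hypothetical `summarize_credit_data` below counts the feature columns and the share of fraudulent rows. On the real file you would call `summarize_credit_data(data)` after loading it. The demo runs on a randomly generated DataFrame with the same column names, so its fraud rate says nothing about the actual dataset.

```python
import numpy as np
import pandas as pd


def summarize_credit_data(df: pd.DataFrame, label: str = "Class"):
    """Return (number of feature columns, fraction of rows whose label is 1)."""
    n_features = df.drop(columns=label).shape[1]
    fraud_rate = float((df[label] == 1).mean())
    return n_features, fraud_rate


# Synthetic stand-in with the same column layout as the CSV (random values, NOT the real data)
rng = np.random.default_rng(0)
columns = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount"]
demo = pd.DataFrame(rng.normal(size=(1000, len(columns))), columns=columns)
demo["Class"] = (rng.random(1000) < 0.02).astype(int)  # roughly 2% "fraud", an arbitrary choice

n_features, fraud_rate = summarize_credit_data(demo)
print(n_features)           # 30
print(f"{fraud_rate:.1%}")  # share of rows labelled as fraud in the synthetic demo
```

On real fraud data the second number is typically small, which matters later when interpreting accuracy.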


**Load and preprocess the dataset**

In [None]:
# Import necessary libraries
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import recall_score, precision_score, f1_score, precision_recall_curve, accuracy_score
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Load the dataset directly from its URL (pandas.read_csv accepts HTTP(S) URLs)
url = "https://raw.githubusercontent.com/amrahmani/Pythorch/main/CreditDataset.csv"
data = pd.read_csv(url)  # Read the CSV data into a pandas DataFrame

# Separate features and labels
X = data.drop('Class', axis=1)  # Features: every column except 'Class' (axis=1 drops a column; axis=0 would drop a row)
y = data['Class']  # Labels (only 'Class' column)

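# Standardize the features to zero mean and unit variance per column
# Note: the scaler is fitted on all rows, so the test rows held out below also influence the scaling statistics
scaler = StandardScaler()  # Create a StandardScaler object
X_scaled = scaler.fit_transform(X)  # Fit and transform the features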

# Convert the standardized features and labels to PyTorch tensors
X_tensor = torch.tensor(X_scaled, dtype=torch.float32)  # Convert features to tensor
y_tensor = torch.tensor(y.values, dtype=torch.float32)  # Convert labels to tensor

# Hold out the last 20% of the rows as test data (rows keep their original order; nothing is shuffled)
num_test_rows = int(len(data) * 0.2)

# Split data using the calculated number of test rows
X_test = X_tensor[-num_test_rows:]
y_test = y_tensor[-num_test_rows:]

# Use the remaining data as training data
X_train = X_tensor[:-num_test_rows]
# y_train = y_tensor[:-num_test_rows]  # Not needed: the autoencoder learns without labels by reconstructing its own input

# Convert training data to DataLoader for PyTorch
train_data = torch.utils.data.TensorDataset(X_train, X_train)  # Input and target are both X_train, since the autoencoder learns to reproduce its input
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)  # Create a DataLoader for training
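In the cell above, `StandardScaler` is fitted on all rows before the last 20% are held out, so the test rows contribute to the mean and standard deviation used for scaling. A minimal sketch of a leakage-free order, assuming the same row-order split: hold out the test rows first, then fit the scaler on the training rows only and reuse it to transform the test rows. `split_and_scale` is a hypothetical helper, and the demo runs on random numbers rather than the credit card data.

```python
import numpy as np
import torch
from sklearn.preprocessing import StandardScaler


def split_and_scale(X, y, test_frac=0.2):
    """Hold out the last `test_frac` of rows, then standardize with training-row statistics only."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    num_test = int(len(X) * test_frac)
    if num_test < 1:
        raise ValueError("test_frac is too small for this many rows")
    X_train_raw, X_test_raw = X[:-num_test], X[-num_test:]
    y_train, y_test = y[:-num_test], y[-num_test:]

    scaler = StandardScaler().fit(X_train_raw)  # mean/std computed on training rows only

    def to_tensor(a):
        return torch.tensor(a, dtype=torch.float32)

    return (to_tensor(scaler.transform(X_train_raw)), to_tensor(scaler.transform(X_test_raw)),
            to_tensor(y_train), to_tensor(y_test), scaler)


# Synthetic demo: 500 rows x 30 features with a non-zero mean and non-unit spread
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(500, 30))
y_demo = (rng.random(500) < 0.02).astype(int)

X_tr, X_te, y_tr, y_te, demo_scaler = split_and_scale(X_demo, y_demo)
print(X_tr.shape, X_te.shape)  # torch.Size([400, 30]) torch.Size([100, 30])
```

On the real data this could be called as `X_train, X_test, y_train, y_test, scaler = split_and_scale(X.values, y.values)`. The training columns then have mean 0 exactly, while the test columns are only approximately centred, which is the honest situation at deployment time.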

**Define and train the autoencoder**
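The decoder in the cell below ends with `nn.Tanh()`, which squashes every output into [-1, 1]. After standardization, values outside that range are common (for normally distributed data, roughly a third of values lie beyond one standard deviation), so the model cannot reconstruct them exactly even for legitimate rows. A minimal sketch of one alternative, assuming the same 30-20-10-20-30 layout, leaves the final layer linear. `LinearOutputAutoencoder` is a hypothetical name, and this variant is not what produced the outputs shown in this notebook.

```python
import torch
import torch.nn as nn


class LinearOutputAutoencoder(nn.Module):
    """Hypothetical variant: same 30-20-10-20-30 layout, but no activation on the last layer."""

    def __init__(self, n_features: int = 30, bottleneck: int = 10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 20), nn.ReLU(),
            nn.Linear(20, bottleneck), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 20), nn.ReLU(),
            nn.Linear(20, n_features))  # linear output: values are not limited to [-1, 1]

    def forward(self, x):
        return self.decoder(self.encoder(x))


# tanh outputs never exceed 1 in magnitude, however large the input
print(torch.tanh(torch.tensor([0.5, 3.0, 50.0])))

ae = LinearOutputAutoencoder()
out = ae(torch.randn(8, 30))
n_params = sum(p.numel() for p in ae.parameters())
print(out.shape)  # torch.Size([8, 30])
print(n_params)   # 1680
```

Since `nn.Tanh` has no parameters, the original `FraudAutoencoder` has the same 1,680 weights and biases; only the range of its outputs differs.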

In [None]:
# Define the autoencoder architecture using PyTorch's nn.Module
class FraudAutoencoder(nn.Module):
    def __init__(self):
        super(FraudAutoencoder, self).__init__()
        # Encoder: compresses the 30 input features into a 10-value code (input size = output size = 30)
        self.encoder = nn.Sequential(
            nn.Linear(30, 20),  # Input size: 30, Output size: 20
            nn.ReLU(True),  # ReLU activation function
            nn.Linear(20, 10),  # Input size: 20, Output size: 10
            nn.ReLU(True))  # ReLU activation function
        # Decoder layers
        self.decoder = nn.Sequential(
            nn.Linear(10, 20),  # Input size: 10, Output size: 20
            nn.ReLU(True),  # ReLU activation function
            nn.Linear(20, 30),  # Input size: 20, Output size: 30
            nn.Tanh())  # Tanh bounds outputs to [-1, 1]; standardized features outside this range cannot be reconstructed exactly

    def forward(self, x):
        x = self.encoder(x)  # Pass input through encoder layers
        x = self.decoder(x)  # Pass output through decoder layers
        return x

# Instantiate the autoencoder model, define loss function and optimizer
model = FraudAutoencoder()  # Create an instance of the FraudAutoencoder class
criterion = nn.MSELoss()  # Mean Squared Error loss function
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam optimizer with learning rate 0.001

# Train the autoencoder
num_epochs = 20  # Number of epochs for training
for epoch in range(num_epochs):
    for inputs, _ in train_loader:  # Each batch is (input, target); the target is a copy of the input, so it is ignored
        optimizer.zero_grad()  # Clear gradients
        outputs = model(inputs)  # Forward pass

        loss = criterion(outputs, inputs)  # Calculate loss (error)
        loss.backward()  # Backward pass
        optimizer.step()  # Update weights
    if (epoch+1) % 10 == 0:
        # Prints the loss of the last mini-batch only: a noisy estimate, not the average loss over the epoch
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

Epoch [10/20], Loss: 0.4345
Epoch [20/20], Loss: 0.6813
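The training cell above fits the autoencoder on every training row, frauds included. A common variant in autoencoder-based anomaly detection trains on legitimate transactions only, so the model never learns to reconstruct fraud; the labels are then used only to filter the training rows. The sketch below also reports the average loss over each epoch instead of the last mini-batch loss. `train_on_legitimate` is a hypothetical helper, and the demo uses random tensors and a small stand-in model rather than the credit card data.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_on_legitimate(model, X_train, y_train, epochs=20, batch_size=64, lr=1e-3):
    """Train `model` to reconstruct only the label-0 rows; return the mean loss of each epoch."""
    X_legit = X_train[y_train == 0]  # drop known fraud rows from the training set
    loader = DataLoader(TensorDataset(X_legit), batch_size=batch_size, shuffle=True)
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    model.train()
    for _ in range(epochs):
        total, count = 0.0, 0
        for (inputs,) in loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), inputs)  # the target is the input itself
            loss.backward()
            optimizer.step()
            total += loss.item() * inputs.size(0)  # weight each batch by its size
            count += inputs.size(0)
        history.append(total / count)  # average over the whole epoch, not just the last batch
    return history


# Synthetic demo with a small stand-in model (random data, not the credit card dataset)
torch.manual_seed(0)
X_demo = torch.randn(512, 30)
y_demo = (torch.rand(512) < 0.05).float()
demo_model = nn.Sequential(nn.Linear(30, 10), nn.ReLU(), nn.Linear(10, 30))
history = train_on_legitimate(demo_model, X_demo, y_demo, epochs=5)
print([round(h, 4) for h in history])
```

With the notebook's tensors it could be called as `train_on_legitimate(FraudAutoencoder(), X_train, y_tensor[:-num_test_rows])`.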


**Evaluate the autoencoder using accuracy, recall, precision, and F-measure**
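When frauds are rare, accuracy is dominated by the legitimate majority, which is why high accuracy can coexist with low precision in the results below. A toy example with made-up counts (not the notebook's data) shows how the four scores follow from true positives, false positives, and false negatives:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy example (made-up numbers): 1,000 transactions, 10 of them fraudulent
y_true = [1] * 10 + [0] * 990
# A detector that flags 40 rows: 8 of the 10 frauds plus 32 legitimate rows
y_pred = [1] * 8 + [0] * 2 + [1] * 32 + [0] * 958

acc = accuracy_score(y_true, y_pred)    # (8 + 958) / 1000 = 0.966
rec = recall_score(y_true, y_pred)      # 8 / 10 = 0.8
prec = precision_score(y_true, y_pred)  # 8 / 40 = 0.2
f1 = f1_score(y_true, y_pred)           # 2 * 0.2 * 0.8 / (0.2 + 0.8) = 0.32
print(f"Accuracy {acc:.3f}  Recall {rec:.3f}  Precision {prec:.3f}  F1 {f1:.3f}")

# A "detector" that never flags anything still reaches 99% accuracy, with zero recall
baseline_acc = accuracy_score(y_true, [0] * 1000)
baseline_rec = recall_score(y_true, [0] * 1000)
print(f"Always-legitimate accuracy: {baseline_acc:.3f}, recall: {baseline_rec:.3f}")
```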

In [None]:
# Evaluate the autoencoder on the test set
model.eval()  # Switch to evaluation mode (no dropout or batch norm here, but good practice)
with torch.no_grad():
    # The reconstruction error is the key quantity for anomaly detection with autoencoders:
    # it measures how far the model's reconstruction is from the original test row.
    # Rows the model reconstructs poorly (large error) are treated as likely fraud.
    reconstruction_error = torch.mean((X_test - model(X_test)) ** 2, dim=1)  # Per-row mean squared error over the 30 features
    # Precision and recall at every possible threshold (e.g. for plotting a PR curve or computing AUPRC)
    precision_curve, recall_curve, _ = precision_recall_curve(y_test, reconstruction_error)

# Convert reconstruction errors to binary predictions (1 = fraud, 0 = legitimate):
# a row is flagged when its error exceeds the mean plus one standard deviation of the test-set errors
predictions = (reconstruction_error > reconstruction_error.mean() + reconstruction_error.std()).float()

# Calculate accuracy, recall, precision, and F-measure
accuracy = accuracy_score(y_test, predictions)    # Fraction of all test rows classified correctly
recall = recall_score(y_test, predictions)        # Fraction of actual frauds that were flagged
precision = precision_score(y_test, predictions)  # Fraction of flagged rows that are actually fraud
f1 = f1_score(y_test, predictions)                # Harmonic mean of precision and recall

print('Accuracy: {:.4f}'.format(accuracy))
print('Recall: {:.4f}'.format(recall))
print('Precision: {:.4f}'.format(precision))
print('F-measure: {:.4f}'.format(f1))

Accuracy: 0.9858
Recall: 0.8444
Precision: 0.2197
F-measure: 0.3486
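The evaluation cell above reports only threshold-based metrics, and its mean + std threshold is computed from the test-set errors themselves. A minimal sketch of two alternatives, under stated assumptions: summarize the precision-recall trade-off without any threshold via `average_precision_score` (or the trapezoidal `auc` of the PR curve, which scikit-learn's documentation notes can be overly optimistic), and fix the threshold in advance as a quantile of the training reconstruction errors. The 0.99 quantile and the helper name `evaluate_errors` are assumptions, and the demo errors are synthetic.

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve


def evaluate_errors(train_errors, test_errors, y_test, quantile=0.99):
    """Threshold-free PR summaries on the test errors, plus a threshold fixed from training errors."""
    ap = average_precision_score(y_test, test_errors)       # step-wise AUPRC (average precision)
    precision, recall, _ = precision_recall_curve(y_test, test_errors)
    pr_auc = auc(recall, precision)                         # trapezoidal area under the PR curve
    threshold = float(np.quantile(train_errors, quantile))  # chosen without looking at test rows
    predictions = (np.asarray(test_errors) > threshold).astype(int)
    return ap, pr_auc, threshold, predictions


# Synthetic errors (made up): legitimate rows ~ Exp(1); the 20 fraud rows are shifted up by 5
rng = np.random.default_rng(0)
train_err = rng.exponential(1.0, size=2000)
test_err = np.concatenate([rng.exponential(1.0, size=980), rng.exponential(1.0, size=20) + 5.0])
y_te = np.concatenate([np.zeros(980), np.ones(20)])

ap, pr_auc, thr, preds = evaluate_errors(train_err, test_err, y_te)
print(f"AP {ap:.3f}  PR-AUC {pr_auc:.3f}  threshold {thr:.2f}  flagged {preds.sum()}")
```

With the notebook's tensors, the inputs would be `torch.mean((X_train - model(X_train)) ** 2, dim=1).numpy()` (computed under `torch.no_grad()`), `reconstruction_error.numpy()`, and `y_test.numpy()`.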
