

---

# Intro

**Plan**: Import credit card fraud data, then use an encoder-only transformer network to classify time-series credit card transactions.

**Purpose/Intro**: The task is to develop a proof-of-concept transformer architecture for a potential application at work: detecting fraud. In a typical data science project it is considered best practice to begin with more interpretable models, but this project exists solely to assess the viability of a transformer for this task.

**Hypothesis**: The attention mechanism of the transformer, when combined with an appropriate positional embedding method, is able to capture both long-term and short-term dependencies in time series credit-card fraud data.
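The hypothesis does not name a positional embedding method. One common choice is the fixed sinusoidal encoding from the original Transformer paper. The sketch below is a minimal version of that idea; the function name and the `seq_len` and `d_model` values are illustrative assumptions, not settings taken from this notebook.

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Return a (seq_len, d_model) tensor of fixed sinusoidal position encodings."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even for this sketch")
    # Column vector of positions 0..seq_len-1
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    # One frequency per pair of embedding dimensions
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe

# Illustrative sizes only
pe = sinusoidal_positional_encoding(seq_len=16, d_model=128)
print(pe.shape)
```

The encoding would be added to the embedded sequence before it enters the encoder, which only matters once each sample contains more than one time step.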

**Methodology**: Use cross-validation techniques to calculate appropriate performance metrics on the test dataset (adjusting for the dataset's significant class imbalance), with the aim of assessing the viability of transformer networks for fraud classification.
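As a rough illustration of the methodology, the sketch below runs stratified k-fold cross-validation and reports two metrics that are less misleading than plain accuracy under class imbalance. The synthetic data and the logistic regression classifier are stand-ins chosen for this example; they are assumptions, not the notebook's dataset or model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the fraud data, with a small positive class
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.98, 0.02], random_state=0
)

# Stratified folds keep the class ratio roughly equal in every split
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
pr_aucs, roc_aucs = [], []
for train_idx, test_idx in skf.split(X, y):
    clf = LogisticRegression(max_iter=1000)  # placeholder, not the transformer
    clf.fit(X[train_idx], y[train_idx])
    scores = clf.predict_proba(X[test_idx])[:, 1]
    pr_aucs.append(average_precision_score(y[test_idx], scores))
    roc_aucs.append(roc_auc_score(y[test_idx], scores))

print(f"PR-AUC:  {np.mean(pr_aucs):.3f} +/- {np.std(pr_aucs):.3f}")
print(f"ROC-AUC: {np.mean(roc_aucs):.3f} +/- {np.std(roc_aucs):.3f}")
```

Average precision (PR-AUC) focuses on the rare positive class, so it tends to be more informative than ROC-AUC when frauds are a tiny fraction of the rows.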





---

# Data Sourcing and Processing



In [4]:

# Import packages

import os
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'

# Detect Colab; importing inside the try block keeps the cell usable elsewhere
try:
  from google.colab import drive
  IN_COLAB = True
except ImportError:
  IN_COLAB = False

# Mount Google Drive only if the mount point does not exist yet
if IN_COLAB and not os.path.exists('/content/drive'):
  drive.mount('/content/drive')

# Basics
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# TableOne
!pip install tableone
from tableone import TableOne

# torch
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# sklearn
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import (
    StratifiedKFold,
    RepeatedStratifiedKFold,
    GridSearchCV,
    cross_val_score,
    train_test_split,
)
from sklearn.metrics import roc_auc_score

# imbalanced-learn
from imblearn.over_sampling import RandomOverSampler

Collecting tableone
  Downloading tableone-0.9.1-py3-none-any.whl.metadata (8.5 kB)
Downloading tableone-0.9.1-py3-none-any.whl (41 kB)
Installing collected packages: tableone
Successfully installed tableone-0.9.1


In [8]:
data_set_filepath = '/content/drive/MyDrive/Colab_Notebooks/Data/creditcard.feather'

df = pd.read_feather(data_set_filepath)

columns = df.columns.tolist()

print(f"The dataset length is {len(df)}")
print(f"The number of columns is {len(columns)}")
print(f"The column names are {columns}")
df.head(10)

table1 = TableOne(df, columns=columns, groupby='Class', pval=True)
print(table1)

data = df


The dataset length is 284807
The number of columns is 31
The column names are ['Time', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10', 'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19', 'V20', 'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28', 'Amount', 'Class']


KeyboardInterrupt: 
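
The run above was interrupted before the TableOne summary printed. If the full table proves too slow on every column, a lighter first look at the class balance can be had with plain pandas. The sketch below uses a tiny hypothetical frame that only borrows the `Amount` and `Class` column names; the values are made up for illustration.

```python
import pandas as pd

# Tiny hypothetical frame; only the column names mirror the real dataset
toy = pd.DataFrame({
    "Amount": [10.0, 25.5, 3.2, 980.0, 14.1, 7.7],
    "Class": [0, 0, 0, 1, 0, 0],
})

class_counts = toy["Class"].value_counts()               # rows per class
class_share = toy["Class"].value_counts(normalize=True)  # fraction per class
amount_by_class = toy.groupby("Class")["Amount"].agg(["count", "mean", "median"])

print(class_counts)
print(class_share)
print(amount_by_class)
```

On the real data the same three lines would run against `df` instead of `toy`.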



---
# Transformer Model



**No HP Tuning**: First we implement the model without hyperparameter tuning and try to overfit. This shows that the model has enough capacity to fit the data and confirms that the architecture can be set up and run end to end.
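
A common way to run this kind of sanity check is to train on a very small batch and confirm that the loss falls steadily. The sketch below does that with a small stand-in network and random data. The network, the sizes and the use of `BCEWithLogitsLoss` are all assumptions made for this example and differ from the model in the next cell.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny batch: 32 samples, 8 features, random binary labels
X_small = torch.randn(32, 8)
y_small = torch.randint(0, 2, (32,)).float()

# Small stand-in network; the real model could be swapped in for the same check
net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-2)
criterion = nn.BCEWithLogitsLoss()  # expects raw logits, so no sigmoid layer

losses = []
for step in range(300):
    optimizer.zero_grad()
    loss = criterion(net(X_small).squeeze(-1), y_small)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

If the loss refuses to drop even on a batch this small, the problem usually lies in the wiring (shapes, loss, optimizer) rather than in the hyperparameters.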

In [None]:


# Set device for GPU acceleration
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Data preparation
# `data` is the pandas DataFrame loaded above; the last column is the label
X = data.iloc[:, :-1].values  # Features
y = data.iloc[:, -1].values  # Labels (fraud or not)

# Split data: 70% train, 15% validation, 15% test
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

# Convert to PyTorch tensors
# unsqueeze(1) adds a sequence dimension of length 1: (batch, 1, features)
train_data = TensorDataset(torch.tensor(X_train, dtype=torch.float32).to(device).unsqueeze(1),
                           torch.tensor(y_train, dtype=torch.float32).to(device))
val_data = TensorDataset(torch.tensor(X_val, dtype=torch.float32).to(device).unsqueeze(1),
                         torch.tensor(y_val, dtype=torch.float32).to(device))
test_data = TensorDataset(torch.tensor(X_test, dtype=torch.float32).to(device).unsqueeze(1),
                          torch.tensor(y_test, dtype=torch.float32).to(device))

# DataLoader for batching
batch_size = 64
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size)

# Transformer model with batch normalization
class FraudDetectionTransformer(nn.Module):
    def __init__(self, input_dim, embed_dim, num_heads, ff_dim, num_layers, dropout=0.1):
        super().__init__()
        self.embedding = nn.Linear(input_dim, embed_dim)  # Linear projection into the embedding space
        self.batch_norm = nn.BatchNorm1d(input_dim)  # Batch normalization applied before the embedding

        # Define the Transformer encoder
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim,  # Embedding dimension
            nhead=num_heads,  # Number of attention heads
            dim_feedforward=ff_dim,  # Feed-forward network dimension
            dropout=dropout  # Dropout rate
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)  # Stack layers

        self.pooling = nn.AdaptiveAvgPool1d(1)  # Global average pooling over the sequence
        self.fc = nn.Linear(embed_dim, 1)  # Fully connected layer for classification
        self.sigmoid = nn.Sigmoid()  # Sigmoid activation for binary classification

    def forward(self, x):
        # x: (batch, 1, input_dim); each transaction is a sequence of length 1
        x = self.batch_norm(x.squeeze(1))  # (batch, input_dim)
        x = self.embedding(x).unsqueeze(1)  # (batch, 1, embed_dim)
        x = x.permute(1, 0, 2)  # (seq_len, batch, embed_dim), the encoder's default layout
        x = self.transformer(x)  # Encoder-only pass
        # squeeze(-1) drops only the pooled axis, so a batch of size 1 keeps its batch dimension
        x = self.pooling(x.permute(1, 2, 0)).squeeze(-1)  # (batch, embed_dim)
        x = self.fc(x)  # (batch, 1)
        return self.sigmoid(x)  # Probability of fraud

# Training function
def train_model(model, train_loader, val_loader, epochs, lr):
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCELoss()

    train_losses = []
    val_losses = []

    for epoch in range(epochs):
        # Training phase
        model.train()
        train_loss = 0.0
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            outputs = model(inputs).reshape(-1)  # Flatten to (batch,); also safe when the batch has one sample
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()
        train_losses.append(train_loss / len(train_loader))

        # Validation phase
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for inputs, labels in val_loader:
                outputs = model(inputs).reshape(-1)  # Flatten to (batch,); also safe when the batch has one sample
                loss = criterion(outputs, labels)
                val_loss += loss.item()
        val_losses.append(val_loss / len(val_loader))

        print(f"Epoch {epoch + 1}/{epochs} - Train Loss: {train_losses[-1]:.4f} - Val Loss: {val_losses[-1]:.4f}")

    return train_losses, val_losses

# Plotting function
def plot_losses(train_losses, val_losses):
    plt.figure(figsize=(10, 6))
    plt.plot(range(1, len(train_losses) + 1), train_losses, label="Training Loss")
    plt.plot(range(1, len(val_losses) + 1), val_losses, label="Validation Loss")
    plt.xlabel("Epochs")
    plt.ylabel("Loss")
    plt.title("Training and Validation Loss")
    plt.legend()
    plt.grid(True)
    plt.show()

# Model Initialization and Training
input_dim = X_train.shape[1]
embed_dim = 128
num_heads = 4
ff_dim = 256
num_layers = 2
dropout = 0.1  # Set dropout rate

model = FraudDetectionTransformer(input_dim=input_dim, embed_dim=embed_dim, num_heads=num_heads,
                                   ff_dim=ff_dim, num_layers=num_layers, dropout=dropout)

# Train the model and plot results
train_losses, val_losses = train_model(model, train_loader, val_loader, epochs=20, lr=1e-3)
plot_losses(train_losses, val_losses)




Epoch 1/20 - Train Loss: 0.0055 - Val Loss: 0.0042
Epoch 2/20 - Train Loss: 0.0043 - Val Loss: 0.0045
Epoch 3/20 - Train Loss: 0.0040 - Val Loss: 0.0038
Epoch 4/20 - Train Loss: 0.0039 - Val Loss: 0.0039
Epoch 5/20 - Train Loss: 0.0039 - Val Loss: 0.0041
Epoch 6/20 - Train Loss: 0.0038 - Val Loss: 0.0038
Epoch 7/20 - Train Loss: 0.0037 - Val Loss: 0.0039
Epoch 8/20 - Train Loss: 0.0036 - Val Loss: 0.0037
Epoch 9/20 - Train Loss: 0.0036 - Val Loss: 0.0037
Epoch 10/20 - Train Loss: 0.0036 - Val Loss: 0.0036
Epoch 11/20 - Train Loss: 0.0035 - Val Loss: 0.0038
Epoch 12/20 - Train Loss: 0.0034 - Val Loss: 0.0038
Epoch 13/20 - Train Loss: 0.0035 - Val Loss: 0.0033
Epoch 14/20 - Train Loss: 0.0036 - Val Loss: 0.0035
Epoch 15/20 - Train Loss: 0.0034 - Val Loss: 0.0034
Epoch 16/20 - Train Loss: 0.0035 - Val Loss: 0.0033
Epoch 17/20 - Train Loss: 0.0034 - Val Loss: 0.0033
Epoch 18/20 - Train Loss: 0.0033 - Val Loss: 0.0034
Epoch 19/20 - Train Loss: 0.0034 - Val Loss: 0.0036
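
In the cell above each transaction enters the encoder as a sequence of length 1, so self-attention has only a single position to attend to. One possible way to give it temporal context, assuming the rows are ordered by `Time`, is to group consecutive transactions into sliding windows. The sketch below shows the reshaping on a toy array. The helper name `make_windows` and the window size are hypothetical, and labelling each window by its last row is an assumption of this example.

```python
import numpy as np

def make_windows(features: np.ndarray, labels: np.ndarray, window: int):
    """Stack each row with its (window - 1) predecessors; label each window by its last row."""
    if window < 1 or window > len(features):
        raise ValueError("window must be between 1 and the number of rows")
    # sliding_window_view yields (n_windows, n_features, window)
    views = np.lib.stride_tricks.sliding_window_view(features, window, axis=0)
    # Reorder to (n_windows, window, n_features) and copy out of the read-only view
    X_seq = np.ascontiguousarray(np.transpose(views, (0, 2, 1)))
    y_seq = labels[window - 1:]
    return X_seq, y_seq

# Toy data: 6 rows, 2 features
features = np.arange(12, dtype=np.float32).reshape(6, 2)
labels = np.array([0, 0, 1, 0, 0, 1])

X_seq, y_seq = make_windows(features, labels, window=3)
print(X_seq.shape, y_seq.shape)
```

The column list printed earlier contains no cardholder identifier, so windows built this way would mix transactions from different cards. Whether such windows carry useful signal is an open question for this dataset.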




---

# Auto-commit to GitHub

In [3]:
import datetime
import os

# Navigate to the repository directory (if not already there)
%cd /content/drive/MyDrive/Colab_Notebooks/Deep_Learning_Practice

# Read the personal access token from Drive
with open('/content/drive/MyDrive/IAM/PAT.txt', 'r') as file:
    github_pat = file.read().strip()
os.environ['GITHUB_PAT'] = github_pat

!git remote add origin "https://github.com/archiegoodman2/machine_learning_practice"

# Replace with your actual username and email (or configure globally)
USERNAME="archiegoodman2"
EMAIL="archiegoodman2011@gmail.com"

# Set global username and email configuration
!git config --global user.name "$USERNAME"
!git config --global user.email "$EMAIL"

now = datetime.datetime.now()
current_datetime = now.strftime("%Y-%m-%d %H:%M")

# Set remote URL using the PAT from environment variable
!git remote set-url origin https://{os.environ['GITHUB_PAT']}@github.com/archiegoodman2/machine_learning_practice.git

# Replace with your desired commit message
COMMIT_MESSAGE = str(current_datetime) + " debugged and applied default pytorch embeddings layer instead of the custom one. "

# Stage all changes
!git add .

# Commit the changes
!git commit -m "$COMMIT_MESSAGE"

# Push to origin
!git push origin master


/content/drive/MyDrive/Colab_Notebooks/Deep_Learning_Practice
error: remote origin already exists.
[master 593ccc6] 2024-12-03 20:23 commit
 2 files changed, 2 insertions(+), 2 deletions(-)
 rewrite nn_transformer_creditcardfraud.ipynb (80%)
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 2 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 1.77 KiB | 139.00 KiB/s, done.
Total 4 (delta 3), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
To https://github.com/archiegoodman2/machine_learning_practice.git
   cea3757..593ccc6  master -> master
