**Fraud Detection using Graph Neural Networks (GNN)**

Notebook Overview
This notebook implements a **Graph Convolutional Network (GCN)**, a type of graph neural network, to detect fraudulent financial transactions by modeling
users and transactions as a graph.

Nodes represent users and transactions
Edges represent money flow between users via transactions
Fraud detection is performed by classifying transaction nodes

Key Steps Covered
1. Data loading and exploration
2. Transaction graph construction using NetworkX
3. Conversion to PyTorch Geometric format
4. GCN model implementation
5. Model training with class imbalance handling
6. Evaluation using Precision, Recall, and ROC-AUC

How to Run This Notebook
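1. Ensure 'PaySim.csv' is present in the same directory as the notebook
2. Install the required dependencies (the packages imported in the cells below): pandas, networkx, torch, torch_geometric, and scikit-learn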


In [1]:
import pandas as pd

df = pd.read_csv('PaySim.csv')
df.head()


Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,1,PAYMENT,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.0,0.0,0,0
1,1,PAYMENT,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.0,0.0,0,0
2,1,TRANSFER,181.0,C1305486145,181.0,0.0,C553264065,0.0,0.0,1,0
3,1,CASH_OUT,181.0,C840083671,181.0,0.0,C38997010,21182.0,0.0,1,0
4,1,PAYMENT,11668.14,C2048537720,41554.0,29885.86,M1230701703,0.0,0.0,0,0


In [2]:
df['isFraud'].value_counts()


isFraud
0    6354407
1       8213
Name: count, dtype: int64
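
The counts above imply a heavily skewed label distribution. As a quick illustration, the arithmetic below uses the two counts printed by `value_counts()`; the variable names are chosen for this sketch only.

```python
# Counts copied from the value_counts() output above
legit_count = 6_354_407
fraud_count = 8_213

total = legit_count + fraud_count
fraud_rate = fraud_count / total             # fraction of transactions labeled fraud
imbalance_ratio = legit_count / fraud_count  # legitimate transactions per fraud case

print(f"Fraud rate: {fraud_rate:.4%}")
print(f"Legit-to-fraud ratio: {imbalance_ratio:.1f}")
```

At roughly 0.13% fraud, a classifier that always predicts "legitimate" would be almost perfectly accurate, which is why the notebook evaluates with Precision, Recall, and ROC-AUC instead of accuracy.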

**Transaction Graph Construction**

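We construct a graph with two node types (heterogeneous in meaning, but stored as a single undirected NetworkX graph with a 'node_type' attribute):
User nodes -> senders and receivers
Transaction nodes -> individual transactions
Edges connect each transaction to its sender and its receiver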
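
To make the schema concrete, here is a small sketch that builds the same user, transaction, user pattern for two made-up transactions. It uses plain dictionaries instead of NetworkX so that it runs without dependencies; the account IDs and amounts are invented for illustration.

```python
from collections import defaultdict

# Two invented transactions: (sender, receiver, amount, is_fraud)
toy_transactions = [
    ("C1", "M1", 100.0, 0),
    ("C1", "C2", 250.0, 1),
]

node_attrs = {}               # node name -> attribute dict
adjacency = defaultdict(set)  # undirected adjacency sets

for idx, (sender, receiver, amount, is_fraud) in enumerate(toy_transactions):
    s, r, tx = f"user_{sender}", f"user_{receiver}", f"tx_{idx}"
    node_attrs[s] = {"node_type": "user"}
    node_attrs[r] = {"node_type": "user"}
    node_attrs[tx] = {"node_type": "transaction", "amount": amount, "label": is_fraud}
    # Each transaction node links its sender and its receiver
    for a, b in ((s, tx), (tx, r)):
        adjacency[a].add(b)
        adjacency[b].add(a)

print(sorted(adjacency["user_C1"]))  # transactions that user C1 takes part in
print(sorted(adjacency["tx_1"]))     # the two users joined by transaction 1
```

Because users never connect to each other directly, two transactions that share an account are exactly two hops apart, which is the neighborhood a two-layer GCN can reach.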


In [3]:
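import networkx as nx
import pandas as pd

def build_graph(csv_path, limit=50000):
    """Build an undirected user/transaction graph from the first `limit` rows."""
    # nrows reads only the rows we keep instead of loading the full file
    df = pd.read_csv(csv_path, nrows=limit)
    G = nx.Graph()

    for idx, row in df.iterrows():
        sender = f"user_{row['nameOrig']}"
        receiver = f"user_{row['nameDest']}"
        tx = f"tx_{idx}"

        # user nodes (add_node is a no-op if the user already exists)
        G.add_node(sender, node_type="user")
        G.add_node(receiver, node_type="user")

        # transaction node carries the features and the fraud label
        G.add_node(
            tx,
            node_type="transaction",
            amount=float(row['amount']),
            step=int(row['step']),
            label=int(row['isFraud'])
        )

        # edges: sender -- transaction -- receiver
        G.add_edge(sender, tx)
        G.add_edge(tx, receiver)

    return G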


**Graph Conversion to PyTorch Geometric Format**

The NetworkX graph is converted into a 'torch_geometric.data.Data' object.
Transaction nodes use '[amount, step]' as features; user nodes get the placeholder '[1.0, 0.0]'
Labels are assigned only to transaction nodes
User nodes are marked with label '-1' and excluded from training
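
The conversion steps above can be sketched on a three-node toy graph. This is only an illustration: plain Python lists stand in for the tensors, and the node names and values are invented.

```python
# Toy graph: two users joined by one transaction (names are illustrative)
nodes = {
    "user_A": {"node_type": "user"},
    "tx_0":   {"node_type": "transaction", "amount": 181.0, "step": 1, "label": 1},
    "user_B": {"node_type": "user"},
}
edges = [("user_A", "tx_0"), ("tx_0", "user_B")]

# Map each node name to a row index
node_map = {name: i for i, name in enumerate(nodes)}

x, y = [], []
for name, attrs in nodes.items():
    if attrs["node_type"] == "transaction":
        x.append([attrs["amount"], attrs["step"]])
        y.append(attrs["label"])
    else:
        x.append([1.0, 0.0])  # placeholder features for users
        y.append(-1)          # -1 marks "no label"

# Store every undirected edge in both directions, in [2, num_edges] layout
sources, targets = [], []
for u, v in edges:
    sources += [node_map[u], node_map[v]]
    targets += [node_map[v], node_map[u]]
edge_index = [sources, targets]

# Only nodes with a non-negative label take part in training and evaluation
labeled = [i for i, label in enumerate(y) if label >= 0]
print(edge_index)
print(labeled)
```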


In [12]:
import torch
from torch_geometric.data import Data

def graph_to_pyg(G):
    # Map each node name to a consecutive integer index
    node_map = {node: i for i, node in enumerate(G.nodes())}

    x = []
    y = []
    edge_index = []

    for node, data in G.nodes(data=True):
        if data["node_type"] == "transaction":
            x.append([data["amount"], data["step"]])
            y.append(data["label"])
        else:
            x.append([1.0, 0.0])  # placeholder features for user nodes
            y.append(-1)          # -1 marks unlabeled nodes

    # Add each undirected edge in both directions
    for u, v in G.edges():
        edge_index.append([node_map[u], node_map[v]])
        edge_index.append([node_map[v], node_map[u]])

    return Data(
        x=torch.tensor(x, dtype=torch.float),
        # shape [2, num_edges], stored contiguously
        edge_index=torch.tensor(edge_index, dtype=torch.long).t().contiguous(),
        y=torch.tensor(y, dtype=torch.long)
    )


**Graph Convolutional Network (GCN) Model**

We use a two-layer **GCN architecture**:
First layer learns 32-dimensional node embeddings
Second layer maps them to two class logits (legitimate vs. fraud)
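
For reference, the two-layer computation can be written with the standard GCN propagation rule, which is what `GCNConv` applies with its default settings (self-loops added, symmetric normalization). Bias terms are omitted for brevity, and the symbols below are introduced for this note only: $A$ is the adjacency matrix, $X$ the $N \times 2$ feature matrix, $W^{(0)}$ a $2 \times 32$ weight matrix, and $W^{(1)}$ a $32 \times 2$ weight matrix.

```latex
\hat{A} = \tilde{D}^{-1/2} (A + I)\, \tilde{D}^{-1/2},
\qquad
\tilde{D}_{ii} = \sum_{j} (A + I)_{ij}

Z = \hat{A}\, \mathrm{ReLU}\!\left( \hat{A} X W^{(0)} \right) W^{(1)}
```

Each row of $Z$ holds the two logits for one node. Since every multiplication by $\hat{A}$ mixes in information from direct neighbors, two layers let a transaction node see nodes up to two hops away.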


In [13]:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class FraudGCN(torch.nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.conv1 = GCNConv(input_dim, 32)  # features -> 32-dim embeddings
        self.conv2 = GCNConv(32, 2)          # embeddings -> 2 class logits

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = self.conv2(x, edge_index)
        return x  # raw logits; softmax is applied at evaluation time


**Model Training**

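Labeled transaction nodes are split 80/20 into train and test sets using boolean masks
Class imbalance is handled with a class-weighted cross-entropy loss (fraud weight = legit / fraud)
The model is optimized with Adam (learning rate 0.001, weight decay 5e-4) for 100 epochs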
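
To show what the class weighting does, here is a minimal pure-Python sketch of a weighted cross-entropy. It follows the weighted-mean convention that `F.cross_entropy` uses when a `weight` is passed with the default reduction, but the helper name `weighted_cross_entropy` and the toy logits are invented for illustration.

```python
import math

def weighted_cross_entropy(logits, labels, class_weights):
    """Weighted mean of per-sample cross-entropy: sum(w_i * loss_i) / sum(w_i)."""
    weighted_sum, weight_total = 0.0, 0.0
    for row, label in zip(logits, labels):
        # Numerically stable log-sum-exp over the class logits
        m = max(row)
        log_norm = m + math.log(sum(math.exp(v - m) for v in row))
        loss = log_norm - row[label]  # negative log-probability of the true class
        w = class_weights[label]
        weighted_sum += w * loss
        weight_total += w
    return weighted_sum / weight_total

# Invented logits for three legitimate samples and one fraud sample
logits = [[2.0, 0.0], [1.5, 0.5], [3.0, -1.0], [1.0, 0.0]]
labels = [0, 0, 0, 1]

unweighted = weighted_cross_entropy(logits, labels, [1.0, 1.0])
weighted = weighted_cross_entropy(logits, labels, [1.0, 3.0])  # 3 legit / 1 fraud
print(f"unweighted: {unweighted:.4f}, weighted: {weighted:.4f}")
```

In this toy batch the misclassified fraud sample dominates the weighted loss, so the optimizer is pushed harder to fix errors on the rare class.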


In [15]:

import torch
import torch.nn.functional as F

G = build_graph('PaySim.csv')
data = graph_to_pyg(G)
num_nodes = data.y.size(0)

# Only transaction nodes carry labels (user nodes are marked -1)
indices = torch.where(data.y >= 0)[0]

# Random 80/20 split of the labeled nodes
perm = torch.randperm(len(indices))
split = int(0.8 * len(indices))
train_idx = indices[perm[:split]]
test_idx  = indices[perm[split:]]

train_mask = torch.zeros(num_nodes, dtype=torch.bool)
test_mask  = torch.zeros(num_nodes, dtype=torch.bool)
train_mask[train_idx] = True
test_mask[test_idx] = True

# Standardize each feature column (statistics are taken over all nodes, users included)
data.x = (data.x - data.x.mean(dim=0)) / (data.x.std(dim=0) + 1e-6)

model = FraudGCN(input_dim=data.x.shape[1])

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.001,
    weight_decay=5e-4
)

# Weight the fraud class by the legit-to-fraud ratio (counted over all labeled nodes)
fraud = (data.y == 1).sum().item()
legit = (data.y == 0).sum().item()

class_weights = torch.tensor(
    [1.0, legit / fraud],
    dtype=torch.float
)

for epoch in range(100):
    model.train()
    optimizer.zero_grad()

    # Full-batch forward pass over the whole graph; the loss uses training nodes only
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(
        out[train_mask],
        data.y[train_mask],
        weight=class_weights
    )

    loss.backward()
    optimizer.step()

    if epoch % 10 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")


Epoch 0, Loss: 0.6807
Epoch 10, Loss: 0.6668
Epoch 20, Loss: 0.6592
Epoch 30, Loss: 0.6537
Epoch 40, Loss: 0.6490
Epoch 50, Loss: 0.6447
Epoch 60, Loss: 0.6406
Epoch 70, Loss: 0.6366
Epoch 80, Loss: 0.6327
Epoch 90, Loss: 0.6289


**Model Evaluation**

We evaluate the model using:
**Precision**
**Recall**
**ROC-AUC**

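Metrics are reported separately for the training set (decision threshold 0.9) and the test set (decision threshold 0.5), so the precision and recall figures are not directly comparable across the two splits.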
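
Precision and recall both depend on the probability threshold used to call a transaction fraudulent. The sketch below illustrates this on eight invented scores; `precision_recall` is a hypothetical helper written for this example, not part of scikit-learn.

```python
def precision_recall(probs, labels, threshold):
    """Precision and recall for the positive (fraud) class at a given threshold."""
    preds = [1 if p > threshold else 0 for p in probs]
    tp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, labels) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented fraud probabilities and true labels for eight transactions
probs  = [0.95, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.10]
labels = [1,    0,    1,    0,    0,    1,    0,    0]

for threshold in (0.5, 0.9):
    p, r = precision_recall(probs, labels, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

Raising the threshold trades recall for precision, which matches the pattern in the outputs below: high precision with low recall at 0.9, and the reverse at 0.5.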


In [23]:
from sklearn.metrics import precision_score, recall_score, roc_auc_score

model.eval()
with torch.no_grad():
    out = model(data.x, data.edge_index)

# Training-set metrics; fraud is predicted only when P(fraud) > 0.9
probs = torch.softmax(out[train_mask], dim=1)[:, 1]
pred = (probs > 0.9).long()

true = data.y[train_mask]

print("Precision:", precision_score(true, pred))
print("Recall:", recall_score(true, pred))
# Note: the AUC here is computed from the raw class-1 logit, not from `probs`
print("AUC:", roc_auc_score(true, out[train_mask][:, 1].detach()))


Precision: 0.6666666666666666
Recall: 0.05333333333333334
AUC: 0.7165859319557504


In [21]:
from sklearn.metrics import precision_score, recall_score, roc_auc_score

model.eval()
with torch.no_grad():
    out = model(data.x, data.edge_index)

# Test-set metrics; fraud is predicted when P(fraud) > 0.5
probs = torch.softmax(out[test_mask], dim=1)[:, 1]
pred = (probs > 0.5).long()
true = data.y[test_mask]

print("Precision:", precision_score(true, pred))
print("Recall:", recall_score(true, pred))
print("AUC:", roc_auc_score(true, probs.detach()))


Precision: 0.007368421052631579
Recall: 0.56
AUC: 0.7081142857142858
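
Unlike precision and recall, ROC-AUC does not depend on a threshold: it only measures how well the scores rank fraud above legitimate transactions. The sketch below computes it directly from that pairwise definition on invented scores. The `roc_auc` helper is hypothetical and quadratic in the number of samples, so it is for illustration only; `roc_auc_score` remains the practical choice.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for s, t in zip(scores, labels) if t == 1]
    negatives = [s for s, t in zip(scores, labels) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))

# Invented scores and true labels for eight transactions
scores = [0.95, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.10]
labels = [1,    0,    1,    0,    0,    1,    0,    0]

auc = roc_auc(scores, labels)
# A strictly increasing transform keeps the ranking, and so the AUC, unchanged
auc_scaled = roc_auc([10 * s - 3 for s in scores], labels)
print(auc, auc_scaled)
```

Read this way, AUC values near 0.71 on both splits mean that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one about 71% of the time.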
