# Stage 1: Baseline Models

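This notebook trains and evaluates baseline models on the Elliptic++ dataset: three GNNs (GCN, GraphSAGE and a simplified RGCN) and two classical classifiers (Logistic Regression and Random Forest).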

In [56]:
# Install dependencies
# Note: This assumes you are in a Colab/Kaggle environment.
# For local execution, prefer the requirements.txt file.
# The PyG wheel index below is built for torch 2.8.0 (CPU); change the URL if the
# installed torch version or CUDA build differs.
import sys
!"{sys.executable}" -m pip install -q numpy pandas torch torchvision scikit-learn pyyaml
!"{sys.executable}" -m pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-2.8.0+cpu.html

Looking in links: https://data.pyg.org/whl/torch-2.8.0+cpu.html

## Prepare Data

First, ensure your `ellipticpp.pt` file is accessible. If you are in Colab, you might need to upload it or mount your Google Drive.

In [57]:
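# Path to the dataset relative to the `notebooks/` directory. The classical-baseline
# cell further down loads it from the project root as 'data/ellipticpp/ellipticpp.pt',
# so this path assumes the same layout. Adjust it if your copy lives elsewhere
# (in Colab, for example, it might be at '/content/ellipticpp.pt').
DATA_PATH = '../data/ellipticpp/ellipticpp.pt'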
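Because the notebook may be run from `notebooks/`, from the project root, or from Colab, the dataset can sit at different relative paths. A minimal sketch, assuming the candidate locations below (collected from this notebook's cells and comments; the helper name `resolve_data_path` is hypothetical), picks the first one that exists:

```python
from pathlib import Path

def resolve_data_path(candidates):
    """Return the first existing file among `candidates` (hypothetical helper)."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return str(path)
    # Nothing found: fail loudly and list every location that was tried
    raise FileNotFoundError(
        "ellipticpp.pt not found; tried: " + ", ".join(map(str, candidates))
    )

# Candidate locations taken from this notebook (assumed; extend as needed)
CANDIDATES = [
    '../data/ellipticpp/ellipticpp.pt',  # running from notebooks/
    '../data/ellipticpp.pt',             # flat layout under the parent data/ dir
    'data/ellipticpp/ellipticpp.pt',     # running from the project root
    '/content/ellipticpp.pt',            # uploaded to Colab
]
```

Usage: `DATA_PATH = resolve_data_path(CANDIDATES)`. Raising instead of returning `None` surfaces a missing file here rather than as a confusing error inside `torch.load` later.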

## Run Baseline Training (Lite Mode)

Here we run the training script for 5 epochs on a sample of the data to quickly verify that the pipeline works end to end. We test GCN and GraphSAGE with `--sample 1000`, and a simplified RGCN with the larger `--sample 50000`.

In [59]:
# Run the training script from the project root:
# switch the working directory there if we are inside notebooks/
import os
import sys
if os.path.basename(os.getcwd()) == 'notebooks':
    os.chdir('..')
# Make `src` importable from this notebook (used by the classical-baseline cell)
if os.getcwd() not in sys.path:
    sys.path.append(os.getcwd())

# The scripts are launched with the notebook's own interpreter (sys.executable)
# so they see the packages installed above.

# Train GCN
print("--- Training GCN ---")
!"{sys.executable}" src/train_baseline.py --model gcn --out_dir experiments/baseline/lite_gcn --epochs 5 --sample 1000

# Train GraphSAGE
print("\n--- Training GraphSAGE ---")
!"{sys.executable}" src/train_baseline.py --model graphsage --out_dir experiments/baseline/lite_graphsage --epochs 5 --sample 1000

# Train RGCN
# Note: The RGCN baseline is simplified and may not perform well without proper heterogeneous handling.
print("\n--- Training RGCN ---")
!"{sys.executable}" src/train_baseline.py --model rgcn --out_dir experiments/baseline/lite_rgcn --epochs 5 --sample 50000

--- Training GCN ---
Starting training for gcn...
Epoch 0 loss 19.5084 val_auc 0.7498
Epoch 1 loss 16.0915 val_auc 0.7750
Epoch 2 loss 16.1378 val_auc 0.7710
Epoch 3 loss 10.2079 val_auc 0.7356
Epoch 4 loss 8.5931 val_auc 0.6339
Final Test Metrics: {'auc': 0.5586854460093896, 'pr_auc': 0.9310536871586619, 'f1': 0.9594594594594594, 'precision': 0.922077922077922, 'recall': 1.0}

--- Training GraphSAGE ---
Starting training for graphsage...
Epoch 0 loss 3.1348 val_auc 0.8831
Epoch 1 loss 2.8140 val_auc 0.8921
Epoch 2 loss 1.3252 val_auc 0.8975
Epoch 3 loss 1.4437 val_auc 0.8923
Epoch 4 loss 1.2803 val_auc 0.
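
The three training calls above differ only in model name, output directory and sample size. A minimal sketch, assuming the same `src/train_baseline.py` flags used above, builds the commands in a loop and runs them through `subprocess` with the notebook's interpreter (`sys.executable`). The names `build_train_cmd`, `run_baselines` and `LITE_RUNS` are hypothetical:

```python
import subprocess
import sys

# (model, --sample value) pairs used in the lite runs above
LITE_RUNS = [("gcn", 1000), ("graphsage", 1000), ("rgcn", 50000)]

def build_train_cmd(model, sample, epochs=5):
    """Build the train_baseline.py command for one lite run (flags as used above)."""
    return [
        sys.executable, "src/train_baseline.py",
        "--model", model,
        "--out_dir", f"experiments/baseline/lite_{model}",
        "--epochs", str(epochs),
        "--sample", str(sample),
    ]

def run_baselines(runs=LITE_RUNS, dry_run=True):
    """Print every command; execute them in order only when dry_run is False."""
    cmds = [build_train_cmd(model, sample) for model, sample in runs]
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop as soon as one training run fails
            subprocess.run(cmd, check=True)
    return cmds
```

Calling `run_baselines()` only prints the commands; `run_baselines(dry_run=False)`, run from the project root, actually trains the models.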

## Evaluate the Trained Models

In [60]:
# Evaluate each checkpoint with the same --sample value used for training
print("--- GCN Evaluation ---")
!"{sys.executable}" src/eval.py --model gcn --ckpt experiments/baseline/lite_gcn/ckpt.pth --sample 1000

print("\n--- GraphSAGE Evaluation ---")
!"{sys.executable}" src/eval.py --model graphsage --ckpt experiments/baseline/lite_graphsage/ckpt.pth --sample 1000

print("\n--- RGCN Evaluation ---")
!"{sys.executable}" src/eval.py --model rgcn --ckpt experiments/baseline/lite_rgcn/ckpt.pth --sample 50000

--- GCN Evaluation ---
Evaluation Metrics:
{
    "auc": 0.5661764705882353,
    "pr_auc": 0.8922645627227593,
    "f1": 0.9347079037800687,
    "precision": 0.8774193548387097,
    "recall": 1.0
}

--- GraphSAGE Evaluation ---
Evaluation Metrics:
{
    "auc": 0.9629090909090909,
    "pr_auc": 0.9967456720839039,
    "f1": 0.9578544061302682,
    "precision": 0.9191176470588235,
    "recall": 1.0
}

--- RGCN Evaluation ---
Evaluation Metrics:
{
    "auc": 0.8524355828220858,
    "pr_auc": 0.9753117857999312,
    "f1": 0.9484666455896301,
    "precision": 0.9019843656043295,
    "recall": 1.0
}
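
Every GNN evaluation above reports a recall of exactly 1.0. That pattern is consistent with a model that predicts (almost) every node as positive: recall is then 1.0 by construction, and precision equals the fraction of positive labels. A small sketch on synthetic labels (not Elliptic++ data; the 90% positive rate is an assumption chosen to resemble the precision values above) shows this with scikit-learn's metric functions:

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Synthetic labels: 900 positives and 100 negatives (90% prevalence, assumed)
y_true = np.array([1] * 900 + [0] * 100)
# A degenerate classifier that predicts the positive class for every node
y_pred = np.ones_like(y_true)

precision = precision_score(y_true, y_pred)  # equals the prevalence, 0.9
recall = recall_score(y_true, y_pred)        # 1.0 by construction
f1 = f1_score(y_true, y_pred)                # 2 * 0.9 * 1.0 / 1.9
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# prints: precision=0.900 recall=1.000 f1=0.947
```

With a majority positive class, an F1 around 0.95 comes essentially for free, so ROC-AUC (whose chance level is 0.5 regardless of class balance) is the more telling number here; the GCN's AUC of about 0.56 is close to chance.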

## Check the Results

In [61]:
import json

def print_metrics(model_name, path):
    try:
        with open(path, 'r') as f:
            metrics = json.load(f)
        print(f"Metrics for {model_name}:")
        print(json.dumps(metrics, indent=4))
    except FileNotFoundError:
        print(f"metrics.json not found for {model_name}. Did the training script run successfully?")

print_metrics("GCN", "experiments/baseline/lite_gcn/metrics.json")
print_metrics("GraphSAGE", "experiments/baseline/lite_graphsage/metrics.json")
print_metrics("RGCN", "experiments/baseline/lite_rgcn/metrics.json")

Metrics for GCN:
{
    "auc": 0.5586854460093896,
    "pr_auc": 0.9310536871586619,
    "f1": 0.9594594594594594,
    "precision": 0.922077922077922,
    "recall": 1.0
}
Metrics for GraphSAGE:
{
    "auc": 0.8251141552511414,
    "pr_auc": 0.9698756500436133,
    "f1": 0.9511400651465798,
    "precision": 0.906832298136646,
    "recall": 1.0
}
Metrics for RGCN:
{
    "auc": 0.8777068415149686,
    "pr_auc": 0.9856827209766473,
    "f1": 0.958743842364532,
    "precision": 0.9207569485511532,
    "recall": 1.0
}
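
To compare the three runs side by side, the same `metrics.json` files can be collected into one table. A minimal sketch, assuming each file holds a flat JSON object mapping metric names to numbers (as printed above) and using pandas (installed in the first cell); `load_metrics_table` and `METRICS_PATHS` are hypothetical names:

```python
import json
from pathlib import Path

import pandas as pd

def load_metrics_table(paths):
    """Load {model_name: path to metrics.json} into a DataFrame, one row per model.

    Models whose metrics file is missing are reported and skipped.
    """
    rows = {}
    for name, path in paths.items():
        path = Path(path)
        if not path.is_file():
            print(f"metrics.json not found for {name}: {path}")
            continue
        rows[name] = json.loads(path.read_text())
    return pd.DataFrame.from_dict(rows, orient="index")

METRICS_PATHS = {
    "GCN": "experiments/baseline/lite_gcn/metrics.json",
    "GraphSAGE": "experiments/baseline/lite_graphsage/metrics.json",
    "RGCN": "experiments/baseline/lite_rgcn/metrics.json",
}
```

From the project root, `load_metrics_table(METRICS_PATHS).round(4)` gives one row per model and one column per metric, which makes differences such as GraphSAGE's and RGCN's higher AUC easier to spot than in the JSON dumps.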


## Stage 1.2: Classical Baselines

Next, we train and evaluate two classical classifiers, Logistic Regression and Random Forest, on the transaction node features alone (no graph structure) to establish a non-GNN baseline.
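
The cell below drops unknown labels (`y == 3`) and maps the remaining two classes to 0/1, so which raw class becomes "positive" decides what precision, recall and F1 measure. In the raw Elliptic/Elliptic++ label files class 1 is illicit and class 2 is licit; whether `ellipticpp.pt` keeps that encoding is an assumption worth checking. A minimal sketch (the helper name `binarize_labels` is hypothetical) makes the positive class an explicit argument and reports its prevalence:

```python
import numpy as np

def binarize_labels(y, positive_class, unknown_class=3):
    """Drop unknown labels; map `positive_class` to 1 and every other class to 0.

    Returns (known_mask, y_binary, prevalence), where prevalence is the
    fraction of known labels that are positive.
    """
    y = np.asarray(y)
    known_mask = y != unknown_class
    y_binary = (y[known_mask] == positive_class).astype(np.int64)
    prevalence = float(y_binary.mean()) if y_binary.size else float("nan")
    return known_mask, y_binary, prevalence

# Toy labels, not Elliptic++ data: 1 = illicit, 2 = licit, 3 = unknown (assumed)
toy_y = np.array([1, 2, 2, 2, 3, 2, 3, 1])
mask, y_bin, prev = binarize_labels(toy_y, positive_class=1)
print(y_bin.tolist(), prev)  # prints: [1, 0, 0, 0, 0, 1] 0.3333333333333333
```

On the real labels, a prevalence near 0.9 would indicate that the majority (licit) class is being scored as positive.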

In [63]:
import torch
import numpy as np
import json
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from src.metrics import compute_metrics
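import warnings
# Suppress library warnings for cleaner output
# (note: this also hides LogisticRegression convergence warnings)
warnings.filterwarnings('ignore')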

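# --- 1. Load and Prepare Data ---
# Reuse the train/validation/test masks stored in the dataset, if any, so the
# classical models are scored on the same splits as the GNN scripts (assuming
# those scripts also use the stored masks).

# Path relative to the project root (the working directory was changed above)
DATA_PATH = 'data/ellipticpp/ellipticpp.pt'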

# Load the full dataset
full_data = torch.load(DATA_PATH, weights_only=False)
tx_data = full_data['transaction']

# Ensure split masks exist. If the dataset has none, draw a random 70/15/15 split.
# The generator is seeded so reruns give the same split, but this split is not
# guaranteed to match one drawn inside the GNN scripts.
if getattr(tx_data, 'train_mask', None) is None:
    num_tx_nodes = tx_data.num_nodes
    perm = torch.randperm(num_tx_nodes, generator=torch.Generator().manual_seed(42))
    train_end, val_end = int(0.7 * num_tx_nodes), int(0.85 * num_tx_nodes)
    tx_data.train_mask = torch.zeros(num_tx_nodes, dtype=torch.bool)
    tx_data.train_mask[perm[:train_end]] = True
    tx_data.val_mask = torch.zeros(num_tx_nodes, dtype=torch.bool)
    tx_data.val_mask[perm[train_end:val_end]] = True
    tx_data.test_mask = torch.zeros(num_tx_nodes, dtype=torch.bool)
    tx_data.test_mask[perm[val_end:]] = True

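# Filter out unknown labels (y == 3)
known_mask = tx_data.y != 3

# Get features and impute NaNs with 0
X = tx_data.x[known_mask].numpy()
X[np.isnan(X)] = 0

# Remap labels to binary: class 1 -> 0, class 2 -> 1.
# Caution: in the raw Elliptic/Elliptic++ files class 1 is illicit and class 2 is
# licit, so unless ellipticpp.pt re-encodes the labels, this makes *licit* the
# positive class for the metrics below. To score illicit as positive, use
# y = (y == 1).astype(np.int64) instead.
y = tx_data.y[known_mask].numpy()
y[y == 1] = 0
y[y == 2] = 1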

# Get masks
train_mask = tx_data.train_mask[known_mask].numpy()
test_mask = tx_data.test_mask[known_mask].numpy()

X_train, y_train = X[train_mask], y[train_mask]
X_test, y_test = X[test_mask], y[test_mask]

# --- 2. Train and Evaluate Models ---

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
}

results = {}

for name, model in models.items():
    print(f"--- Training {name} ---")
    model.fit(X_train, y_train)
    
    print(f"--- Evaluating {name} ---")
    y_pred_proba = model.predict_proba(X_test)[:, 1]
    
    metrics = compute_metrics(y_test, y_pred_proba)
    results[name] = metrics
    
    print(f"Metrics for {name}:")
    print(json.dumps(metrics, indent=4))
    print("\n")

--- Training Logistic Regression ---
--- Evaluating Logistic Regression ---
Metrics for Logistic Regression:
{
    "auc": 0.9675189304784065,
    "pr_auc": 0.9962562593245183,
    "f1": 0.9774386197743862,
    "precision": 0.970995385629532,
    "recall": 0.9839679358717435
}


--- Training Random Forest ---
--- Evaluating Random Forest ---
Metrics for Random Forest:
{
    "auc": 0.9960144001649832,
    "pr_auc": 0.9995579850479792,
    "f1": 0.9904005296259517,
    "precision": 0.9816272965879265,
    "recall": 0.9993319973279893
}
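
The mask-creation fallback in the data-preparation cell draws a 70/15/15 random split when the dataset has no stored masks. A minimal sketch of the same logic as a standalone function with an explicit seed (the name `random_split_masks` and the default seed are assumptions, not part of the project's `src/` code) makes the split reproducible across reruns and easy to check for overlap:

```python
import torch

def random_split_masks(num_nodes, train_frac=0.7, val_frac=0.15, seed=42):
    """Return disjoint boolean (train, val, test) masks that together cover all nodes."""
    generator = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_nodes, generator=generator)
    train_end = int(train_frac * num_nodes)
    val_end = int((train_frac + val_frac) * num_nodes)

    masks = []
    # Consecutive slices of one permutation cannot overlap
    for idx in (perm[:train_end], perm[train_end:val_end], perm[val_end:]):
        mask = torch.zeros(num_nodes, dtype=torch.bool)
        mask[idx] = True
        masks.append(mask)
    return tuple(masks)

train_mask, val_mask, test_mask = random_split_masks(1000)
```

Because the three masks come from consecutive slices of a single permutation, they are disjoint by construction, and the same seed always reproduces the same split.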


