# NetGuard-GNN

NetGuard-GNN is an insider threat detection framework that integrates graph-based deep learning and statistical modeling.  
This notebook uses pre-collected or mock datasets to emulate the full system workflow; in the real system, operational data comes from hardware-level monitoring modules.

**Workflow Overview:**
1. Load heterogeneous graph and tabular datasets.
2. Train a Graph Neural Network (GNN) to capture structural patterns in user–resource interactions.
3. Train a One-Class SVM (OC-SVM) on tabular features to detect statistical anomalies.
4. Fuse model outputs to generate ranked lists of high-risk users and resources.


## Mock Data Generation

### Data Generation for GNN

This step generates a synthetic activity log when no hardware-captured data is available:

- **Purpose:** Provide test input for the pipeline during development.  
- **Contents:**  
  - Timestamps (hourly intervals)  
  - User IDs and Resource IDs  
  - Actions (`login`, `read`, `download`)  
  - Data size in bytes  
  - Operation success flag (0 or 1)  
- **Output:** Saved as `verilog_output.csv` in the `data` directory for downstream processing.


In [2]:
import os
import pandas as pd
import numpy as np

DATA_DIR = "data"
CSV_FILE = os.path.join(DATA_DIR, "verilog_output.csv")

def mock_logs(path):
    """Generate a small mock CSV if no Verilog output exists yet."""
    df = pd.DataFrame({
        "timestamp": pd.date_range("2025-01-01", periods=20, freq="H"),
        "user_id": ["U001","U002","U003","U001","U002"] * 4,
        "resource_id": ["R01","R02","R03","R01","R03"] * 4,
        "action": np.random.choice(["login","read","download"], size=20),
        "bytes": np.random.randint(500, 5000, size=20),
        "success": np.random.choice([0, 1], size=20, p=[0.1, 0.9])
    })
    df.to_csv(path, index=False)
    print(f"[MOCK] Created mock log file at {path}")

### Data Loading

This step ensures the required activity log is available and loads it for processing:

- **Check & Create:**  
  - If `verilog_output.csv` does not exist, create the `data` directory (if missing) and generate mock logs.  
- **Load Data:**  
  - Read the CSV file into a DataFrame.  
  - Parse the `timestamp` column as datetime objects.  
- **Info Output:**  
  - Display the total number of loaded events.


In [3]:
if not os.path.exists(CSV_FILE):
    os.makedirs(DATA_DIR, exist_ok=True)
    mock_logs(CSV_FILE)

df = pd.read_csv(CSV_FILE, parse_dates=["timestamp"])
print(f"[INFO] Loaded {len(df)} events from {CSV_FILE}")

[INFO] Loaded 20 events from data\verilog_output.csv


### Graph Construction

This step transforms the event log into a heterogeneous graph structure for GNN processing:

- **Node Index Mapping:**  
  - Assign unique indices to each user (`user_id`) and resource (`resource_id`).  

- **Edge Creation:**  
  - Construct `edge_index` tensors representing user → resource access events.  
  - Attach edge attributes:  
    - Data size in bytes  
    - Operation success flag  
    - Encoded action type  

- **Node Feature Extraction:**  
  - **Users:** Mean bytes transferred and mean success rate.  
  - **Resources:** Mean bytes transferred.  

- **HeteroData Assembly:**  
  - Create `user` and `resource` node types with corresponding features.  
  - Add `accessed` edges and attributes between users and resources.  

- **Save Graph:**  
  - Store as `graph_data.pt` for downstream GNN training.


In [4]:
import torch
from torch_geometric.data import HeteroData

# Map user_id -> user node index
user_ids = df["user_id"].unique()
user_map = {uid: i for i, uid in enumerate(user_ids)}

# Map resource_id -> resource node index
resource_ids = df["resource_id"].unique()
res_map = {rid: i for i, rid in enumerate(resource_ids)}

# Build edge index (user -> resource)
src_nodes = df["user_id"].map(user_map).to_numpy()
dst_nodes = df["resource_id"].map(res_map).to_numpy()
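# Stack into one (2, num_edges) array first; building a tensor from a list of arrays is slow and triggers a warning
edge_index = torch.tensor(np.stack([src_nodes, dst_nodes]), dtype=torch.long)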

# Edge attributes (bytes, success, action_code)
edge_attr = torch.tensor(
    np.stack([
        df["bytes"].to_numpy(),
        df["success"].to_numpy(),
        pd.Categorical(df["action"]).codes
    ], axis=1),
    dtype=torch.float
)

# Node features (simple averages)
user_feats = np.zeros((len(user_ids), 2))
res_feats = np.zeros((len(resource_ids), 1))

for uid, idx in user_map.items():
    subset = df[df["user_id"] == uid]
    user_feats[idx, 0] = subset["bytes"].mean()
    user_feats[idx, 1] = subset["success"].mean()

for rid, idx in res_map.items():
    subset = df[df["resource_id"] == rid]
    res_feats[idx, 0] = subset["bytes"].mean()

user_x = torch.tensor(user_feats, dtype=torch.float)
res_x = torch.tensor(res_feats, dtype=torch.float)

# Create HeteroData object for GNN
data = HeteroData()
data["user"].x = user_x
data["resource"].x = res_x
data["user", "accessed", "resource"].edge_index = edge_index
data["user", "accessed", "resource"].edge_attr = edge_attr

# Save graph data
torch.save(data, os.path.join(DATA_DIR, "graph_data.pt"))
print(f"[INFO] Saved GNN graph data to {os.path.join(DATA_DIR, 'graph_data.pt')}")

[INFO] Saved GNN graph data to data\graph_data.pt



**Note:** The GNN graph data has been successfully saved and is available at `data\graph_data.pt`.
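
The index-mapping and edge-construction steps can be illustrated without any libraries. The sketch below is a simplified stand-in for the cell above, not a replacement: `events` is a made-up three-row log and `build_index` is a hypothetical helper that numbers IDs in order of first appearance, which is also the order `unique()` produces.

```python
def build_index(ids):
    """Number each distinct ID in order of first appearance."""
    mapping = {}
    for item in ids:
        if item not in mapping:
            mapping[item] = len(mapping)
    return mapping

# Hypothetical events: (user_id, resource_id)
events = [("U001", "R01"), ("U002", "R02"), ("U001", "R02")]

user_map = build_index(u for u, _ in events)
res_map = build_index(r for _, r in events)

# Two parallel rows: source (user) indices and destination (resource) indices
edge_index = [
    [user_map[u] for u, _ in events],
    [res_map[r] for _, r in events],
]
print(edge_index)  # [[0, 1, 0], [0, 1, 1]]
```

Each column of `edge_index` is one access event, so repeated accesses by the same user to the same resource appear as repeated columns.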

### Tabular Feature Generation

This step prepares event-level features for the One-Class SVM (OC-SVM) anomaly detection model:

- **Feature Extraction:**  
  - `bytes` – Size of data transferred.  
  - `success` – Operation success flag (0 or 1).  
  - `action_code` – Encoded action type (`login`, `read`, `download`).  
  - `hour` – Hour of the event timestamp.  
  - `dayofweek` – Day of the week of the event.  

- **Output:**  
  - Save features to `tabular_features.csv` in the `data` directory for OC-SVM training.  

- **Progress Update:**  
  - Indicate completion of Step 1 (data preparation) and readiness for Step 2 (model training).


In [5]:
tabular_df = pd.DataFrame({
    "bytes": df["bytes"],
    "success": df["success"],
    "action_code": pd.Categorical(df["action"]).codes,
    "hour": df["timestamp"].dt.hour,
    "dayofweek": df["timestamp"].dt.dayofweek
})
tabular_df.to_csv(os.path.join(DATA_DIR, "tabular_features.csv"), index=False)
print(f"[INFO] Saved OC-SVM features to {os.path.join(DATA_DIR, 'tabular_features.csv')}")

print("[DONE] Step 1 complete — ready for Step 2 training")

[INFO] Saved OC-SVM features to data\tabular_features.csv
[DONE] Step 1 complete — ready for Step 2 training


**Note:** The OC-SVM features have been successfully saved and Step 1 is complete. Features are available at `data\tabular_features.csv`. Ready to proceed with Step 2 training.
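
To make the encoded columns concrete, here is a small library-free sketch of how `action_code`, `hour`, and `dayofweek` come about. The four events are invented for illustration. The sketch assumes, as `pd.Categorical` does when it infers categories from string labels, that the distinct labels are sorted before being numbered.

```python
from datetime import datetime

# Hypothetical events: (timestamp, action)
events = [
    (datetime(2025, 1, 1, 0), "read"),
    (datetime(2025, 1, 1, 1), "login"),
    (datetime(2025, 1, 1, 2), "download"),
    (datetime(2025, 1, 1, 3), "read"),
]

# Sort the distinct labels, then number them from 0
categories = sorted({action for _, action in events})
code_of = {label: code for code, label in enumerate(categories)}

rows = [
    {
        "action_code": code_of[action],
        "hour": ts.hour,
        "dayofweek": ts.weekday(),  # Monday = 0, the same convention as dt.dayofweek
    }
    for ts, action in events
]
print(code_of)  # {'download': 0, 'login': 1, 'read': 2}
print(rows[0])  # {'action_code': 2, 'hour': 0, 'dayofweek': 2}
```

Because the codes are derived from the labels that are present, a log that lacks one of the actions would number the remaining ones differently.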

---

## Training GNN and One-Class SVM (OC-SVM) Models

### Data File Paths Initialization

This code snippet sets up the directory and file paths for storing graph data, tabular features, and Verilog output in the `data` folder.


In [6]:
import os
import numpy as np

DATA_DIR = "data"
GRAPH_FILE = os.path.join(DATA_DIR, "graph_data.pt")
TAB_FILE = os.path.join(DATA_DIR, "tabular_features.csv")
CSV_FILE = os.path.join(DATA_DIR, "verilog_output.csv")

### Min-Max Scaling of Scores

- **Function:** `scale_scores`
- **Purpose:** Normalize a vector of scores to the range `[0, 1]` so that the outputs of different models are on a comparable scale.
- **Input:** `x` — NumPy array (`np.ndarray`) containing the values to be scaled.
- **Methodology:** 
  - Reshape the input array to 2D to ensure compatibility with `MinMaxScaler`.
  - Fit the `MinMaxScaler` to the data.
  - Transform the data and flatten it back to 1D.
- **Output:** Scaled NumPy array (`np.ndarray`) with values constrained between 0 and 1.


In [7]:
def scale_scores(x: np.ndarray) -> np.ndarray:
    from sklearn.preprocessing import MinMaxScaler
    scaler = MinMaxScaler(feature_range=(0, 1))
    x_2d = x.reshape(-1, 1)
    scaler.fit(x_2d)
    return scaler.transform(x_2d).ravel()
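
The arithmetic behind min-max scaling is simple enough to write out by hand. The following is a minimal sketch in plain Python, with `min_max` as a hypothetical helper name. It maps a constant input to all zeros, which is also what `MinMaxScaler` returns for a zero-range column.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

print(min_max([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
print(min_max([3.0, 3.0, 3.0]))  # [0.0, 0.0, 0.0]
```

The second call is worth keeping in mind: when every input score is identical, the scaled scores carry no ranking information.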

### Load Heterogeneous Graph Data

- **Function:** `load_graph`  
- **Purpose:** Load graph data from disk and move it to GPU if available.  
- **Dependencies:** `torch`, `torch_geometric.data.HeteroData`, `BaseStorage`, `NodeStorage`, `EdgeStorage`  
- **Process:**  
  - Register safe globals for PyTorch serialization  
  - Load `HeteroData` from `GRAPH_FILE`  
  - Transfer data to CUDA if available, otherwise CPU  
- **Output:** `HeteroData` object ready for GNN processing  


In [8]:
def load_graph():
    import torch
    from torch_geometric.data import HeteroData
    from torch_geometric.data.storage import BaseStorage, NodeStorage, EdgeStorage


    torch.serialization.add_safe_globals([BaseStorage, NodeStorage, EdgeStorage])
    data: HeteroData = torch.load(GRAPH_FILE)
    return data.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))

data = load_graph()

### Edge Splitting Function for Graph Data

This code defines a function `split_edges` that splits the edges of a graph into training and validation sets. It uses **scikit-learn's `train_test_split`** to randomly partition the edges based on a specified training ratio. The function:

- Converts the edge index from a PyTorch tensor to a NumPy array for splitting.
- Splits edges into training and validation sets according to the `train_ratio`.
- Converts the resulting splits back to PyTorch tensors on the same device as the input.
- Returns the training and validation edge indices.  

The last line splits the `'user' → 'resource'` edges of the heterogeneous graph and keeps only the training split; the validation split is discarded.


In [9]:
def split_edges(edge_index, train_ratio=0.85):
    from sklearn.model_selection import train_test_split
    import torch
    
    edges = edge_index.cpu().numpy().T  # shape: (num_edges, 2)
    train_edges, val_edges = train_test_split(edges,
                                            train_size=train_ratio,
                                            shuffle=True,
                                            random_state=42)
    
    train_edges = torch.tensor(train_edges.T, dtype=torch.long, device=edge_index.device)
    val_edges = torch.tensor(val_edges.T, dtype=torch.long, device=edge_index.device)
    
    return train_edges, val_edges

train_pos, _ = split_edges(data[('user', 'accessed', 'resource')].edge_index)
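
The same idea can be sketched with the standard library only. This is an illustration under assumptions rather than a drop-in equivalent: `split_columns` is a hypothetical helper, the 20-edge graph is made up, and Python's `random` shuffles differently from `train_test_split`, so the exact edges selected will differ.

```python
import random

def split_columns(edge_index, train_ratio=0.85, seed=42):
    """Shuffle edge columns, then cut them into train and validation parts."""
    src, dst = edge_index
    order = list(range(len(src)))
    random.Random(seed).shuffle(order)
    n_train = int(train_ratio * len(order))

    def take(ids):
        # Keep each (source, destination) pair together
        return [[src[i] for i in ids], [dst[i] for i in ids]]

    return take(order[:n_train]), take(order[n_train:])

# Hypothetical graph with 20 edges: user i % 3 -> resource i % 2
edge_index = [[i % 3 for i in range(20)], [i % 2 for i in range(20)]]
train_edges, val_edges = split_columns(edge_index)
print(len(train_edges[0]), len(val_edges[0]))  # 17 3
```

With 20 events and a ratio of 0.85, 17 edges go to training and 3 to validation, the same sizes the notebook's split produces on the mock log.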

### Heterogeneous Graph Neural Network (HeteroGNN) Definition

This code defines a **Heterogeneous Graph Neural Network (HeteroGNN)** using PyTorch Geometric. Key components:

- **HeteroGNN class**:
  - Inherits from `torch.nn.Module`.
  - Uses `HeteroConv` to handle multiple types of edges in a heterogeneous graph.
  - Applies `SAGEConv` on:
    - `('user', 'accessed', 'resource')` edges
    - `('resource', 'rev_accessed', 'user')` edges
  - Combines the outputs of relations that target the same node type using `'mean'` aggregation (`aggr='mean'`).
  - Applies ReLU activation to node embeddings after convolution.

- **edge_score function**:
  - Computes a similarity score between user and resource embeddings using a dot product.
  - Useful for predicting edge existence or link strength in the graph.


In [10]:
from torch_geometric.nn import SAGEConv, HeteroConv
import torch.nn as nn

class HeteroGNN(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.convs = HeteroConv({
            ('user', 'accessed', 'resource'): SAGEConv((-1, -1), hidden),
            ('resource', 'rev_accessed', 'user'): SAGEConv((-1, -1), hidden)
        }, aggr='mean')

    def forward(self, x_dict, edge_index_dict):
        x_dict = self.convs(x_dict, edge_index_dict)
        x_dict = {k: v.relu() for k, v in x_dict.items()}
        return x_dict

def edge_score(u_emb, r_emb):
    return (u_emb * r_emb).sum(dim=-1)
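
A small worked example may help show what the dot-product score means once it is passed through a sigmoid. The vectors below are invented, and `dot_score` and `sigmoid` are plain-Python stand-ins for `edge_score` and `torch.sigmoid`.

```python
import math

def dot_score(u_emb, r_emb):
    """Dot product of two equal-length embedding vectors (a raw logit)."""
    return sum(a * b for a, b in zip(u_emb, r_emb))

def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical 3-dimensional embeddings
user_vec = [1.0, 0.0, 2.0]
aligned_res = [0.5, 1.0, 1.0]     # overlaps with the user's active dimensions
orthogonal_res = [0.0, 3.0, 0.0]  # shares no active dimension with the user

print(dot_score(user_vec, aligned_res))                     # 2.5
print(round(sigmoid(dot_score(user_vec, aligned_res)), 3))  # 0.924
print(sigmoid(dot_score(user_vec, orthogonal_res)))         # 0.5
```

Because the notebook's embeddings pass through ReLU, their entries are non-negative, so the dot product cannot be negative and the resulting probability is at least 0.5.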

### Training HeteroGNN on Heterogeneous Graph Data

This code defines the `train_gnn` function to train the previously defined **HeteroGNN** on a heterogeneous graph dataset. Key steps:

- **Model and Loss Initialization**:
  - Instantiate `HeteroGNN` with 64 hidden units.
  - Use `BCEWithLogitsLoss` for edge prediction.
  - Optimizer: Adam with learning rate `1e-3` and weight decay `1e-4`.

- **Training Loop** (`epochs=50` by default):
  - Forward pass: Compute node embeddings for `'user'` and `'resource'`.
  - Positive edges: Compute logits for existing edges using `edge_score`.
  - Negative edges: Sample non-existent edges using `negative_sampling` and compute their logits.
  - Concatenate positive and negative logits; compute BCE loss and update model parameters.

- **Saving Model Weights** (after `train_gnn` returns):
  - Extract the trained parameters and save them with `pickle` to `gnn_model_manual.pth`.

This function trains the GNN to distinguish existing edges from randomly sampled non-existent edges in the heterogeneous graph.
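
To illustrate what "sampling non-existent edges" means on a bipartite user-resource graph, here is a minimal sketch. It is not how PyTorch Geometric's `negative_sampling` is implemented: it simply enumerates every absent pair, which is only practical for tiny graphs. `sample_negatives` and the example edges are hypothetical.

```python
import random

def sample_negatives(pos_edges, num_users, num_resources, k, seed=0):
    """Draw up to k distinct (user, resource) pairs that are absent from pos_edges."""
    existing = set(pos_edges)
    candidates = [
        (u, r)
        for u in range(num_users)
        for r in range(num_resources)
        if (u, r) not in existing
    ]
    # A small graph may have fewer non-edges than requested
    return random.Random(seed).sample(candidates, min(k, len(candidates)))

pos_edges = [(0, 0), (1, 1), (2, 2), (0, 2)]
neg_edges = sample_negatives(pos_edges, num_users=3, num_resources=3, k=4)
print(len(neg_edges))  # 4
```

On a 3 × 3 graph with four observed pairs there are only five possible negatives, so a request for more than five would return fewer samples than asked for.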


In [11]:
from torch import nn
import torch
from torch_geometric.utils import negative_sampling
import os
import pickle

def train_gnn(data, train_pos, epochs=50):
    device = data['user'].x.device
    gnn = HeteroGNN(hidden=64).to(device)
    bce = nn.BCEWithLogitsLoss()
    opt = torch.optim.Adam(gnn.parameters(), lr=1e-3, weight_decay=1e-4)

    for _ in range(epochs):
        gnn.train()
        opt.zero_grad()
        out = gnn(
            {'user': data['user'].x, 'resource': data['resource'].x},
            {
                ('user', 'accessed', 'resource'): train_pos,
                ('resource', 'rev_accessed', 'user'): train_pos.flip(0)
            }
        )

        user_z, res_z = out['user'], out['resource']
        u_pos, r_pos = train_pos[0], train_pos[1]
        pos_logits = edge_score(user_z[u_pos], res_z[r_pos])

        neg_edges = negative_sampling(
            edge_index=train_pos,
            num_nodes=(data['user'].x.size(0), data['resource'].x.size(0)),
            num_neg_samples=u_pos.size(0),
            method='sparse'
        )
        u_neg, r_neg = neg_edges[0], neg_edges[1]
        neg_logits = edge_score(user_z[u_neg], res_z[r_neg])

        logits = torch.cat([pos_logits, neg_logits], dim=0)
        labels = torch.cat([torch.ones_like(pos_logits), torch.zeros_like(neg_logits)], dim=0)
        loss = bce(logits, labels)
        loss.backward()
        opt.step()

    return gnn

gnn = train_gnn(data, train_pos)
weights = {name: param.detach().cpu() for name, param in gnn.named_parameters()}

path = os.path.join(DATA_DIR, "gnn_model_manual.pth")
with open(path, "wb") as f:
    pickle.dump(weights, f)
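
The loss used in the training loop can be checked by hand on a few numbers. The sketch below computes binary cross-entropy directly from logits in plain Python, using the standard numerically stable form. The logits are made up, and `bce_with_logits` is a hypothetical helper rather than PyTorch's implementation.

```python
import math

def bce_with_logits(logits, labels):
    """Mean binary cross-entropy computed directly from raw logits."""
    total = 0.0
    for z, y in zip(logits, labels):
        # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

pos_logits = [2.0, 0.5]   # scores of observed edges (label 1)
neg_logits = [-1.0, 0.0]  # scores of sampled non-edges (label 0)

logits = pos_logits + neg_logits
labels = [1.0] * len(pos_logits) + [0.0] * len(neg_logits)
loss = bce_with_logits(logits, labels)
print(round(loss, 4))
```

The loss shrinks as positive logits grow and negative logits fall, which is the pressure that pushes observed user-resource pairs toward higher scores than sampled ones.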


### Training One-Class SVM (OC-SVM) on Tabular Features

This code defines the `train_ocsvm` function to train a **One-Class SVM** for anomaly detection on tabular features. Key steps:

- **Data Loading**:
  - Load tabular features from `TAB_FILE` into a DataFrame.
  - Use all columns as features for training.

- **Pipeline Setup**:
  - **StandardScaler**: Standardizes features to zero mean and unit variance.
  - **OneClassSVM**: RBF kernel, `gamma='scale'`, and `nu=0.05` to detect anomalies.

- **Model Training and Saving**:
  - Fit the pipeline on the tabular features.
  - Save the trained pipeline using `joblib` to `ocsvm.joblib`.

- **Return Value**:
  - Returns the trained OC-SVM pipeline for later use in anomaly detection tasks.


In [12]:
def train_ocsvm():
    from sklearn.svm import OneClassSVM
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import Pipeline
    from joblib import dump

    tab = pd.read_csv(TAB_FILE)
    feat_cols = tab.columns.tolist()
    
    pipe = Pipeline([
        ("scaler", StandardScaler()),
        ("ocsvm", OneClassSVM(kernel="rbf", gamma="scale", nu=0.05))
    ])
    pipe.fit(tab[feat_cols])
    
    dump(pipe, os.path.join(DATA_DIR, "ocsvm.joblib"))
    return pipe

ocsvm_model = train_ocsvm()
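
The scaling step matters because the raw features live on very different scales (`bytes` in the thousands, `success` in {0, 1}), and an RBF kernel is distance-based. The sketch below shows only the standardization arithmetic in plain Python. `standardize` is a hypothetical helper, and it uses the population standard deviation, which is the convention `StandardScaler` follows.

```python
from statistics import mean, pstdev

def standardize(column):
    """Shift a column to zero mean and scale it to unit (population) variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]

bytes_col = [1000.0, 2000.0, 3000.0]  # hypothetical byte counts
z = standardize(bytes_col)
print([round(v, 4) for v in z])  # [-1.2247, 0.0, 1.2247]
```

After this step, a one-unit difference means "one standard deviation" in every column, so no single feature dominates the kernel distance purely because of its units.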

## Anomaly Score Computation

### Computing GNN-Based Anomaly Scores

This code defines the `get_gnn_anomaly_scores` function to compute anomaly scores for nodes in a heterogeneous graph using a trained **HeteroGNN**. Key steps:

- **Input and Setup**:
  - Receives a trained `gnn` model and `data` containing nodes and edges.
  - Ensures node types `'user'` and `'resource'` exist in the data.
  - Prepares the edge index dictionary for message passing.

- **Forward Pass**:
  - Computes node embeddings using the GNN in evaluation mode (`torch.no_grad()`).
  - Checks that embeddings for both `'user'` and `'resource'` nodes are returned.

- **Anomaly Scoring**:
  - Computes edge probabilities using the `edge_score` and a sigmoid function.
  - Calculates a "surprisal" score (`1 - probability`) for each edge.
  - Aggregates scores per node and normalizes by the number of connected edges.

- **Output**:
  - Returns two arrays: `u_scores` for users and `r_scores` for resources, representing node-level anomaly scores.
  - Example usage: `u_gnn, r_gnn = get_gnn_anomaly_scores(gnn, data)` and normalized scores with `u_gnn_norm = scale_scores(u_gnn)`.


In [13]:
def get_gnn_anomaly_scores(gnn, data):
    import torch

    device = next(gnn.parameters()).device
    gnn.eval()

    print("Node types in data:", data.node_types)
    print("Edge types in data:", data.edge_types)

    try:
        x_dict = {
            'user': data['user'].x.to(device),
            'resource': data['resource'].x.to(device)
        }
    except KeyError as e:
        raise ValueError(f"Missing expected node type in data: {e}")

    edge_index_dict = {}
    if ('user', 'accessed', 'resource') in data.edge_types:
        eidx = data[('user', 'accessed', 'resource')].edge_index.to(device)
        edge_index_dict[('user', 'accessed', 'resource')] = eidx
        edge_index_dict[('resource', 'rev_accessed', 'user')] = eidx.flip(0)
    else:
        raise ValueError("Expected edge type ('user','accessed','resource') not found in data.")

    with torch.no_grad():
        emb = gnn(x_dict, edge_index_dict)

    if not isinstance(emb, dict):
        raise ValueError(f"GNN output is not a dict. Got type: {type(emb)}")

    user_z = emb.get('user', None)
    res_z = emb.get('resource', None)
    if user_z is None or res_z is None:
        raise ValueError(
            f"GNN output missing 'user' or 'resource' embeddings. Got keys: {list(emb.keys())}"
        )

    u_all = edge_index_dict[('user', 'accessed', 'resource')][0]
    r_all = edge_index_dict[('user', 'accessed', 'resource')][1]

    with torch.no_grad():
        lp = torch.sigmoid(edge_score(user_z[u_all], res_z[r_all]))

    u_scores = torch.zeros(user_z.size(0), device=device)
    u_count = torch.zeros_like(u_scores)
    r_scores = torch.zeros(res_z.size(0), device=device)
    r_count = torch.zeros_like(r_scores)

    for i in range(u_all.size(0)):
        u, r = u_all[i], r_all[i]
        surprisal = 1.0 - lp[i]
        u_scores[u] += surprisal
        u_count[u] += 1
        r_scores[r] += surprisal
        r_count[r] += 1

    u_scores = (u_scores / torch.clamp(u_count, min=1)).cpu().numpy()
    r_scores = (r_scores / torch.clamp(r_count, min=1)).cpu().numpy()

    return u_scores, r_scores

u_gnn, r_gnn = get_gnn_anomaly_scores(gnn, data)
u_gnn_norm = scale_scores(u_gnn)


Node types in data: ['user', 'resource']
Edge types in data: [('user', 'accessed', 'resource')]


- **Node types in data:** Lists all node categories present in the heterogeneous graph. Here, the graph contains `'user'` and `'resource'` nodes.  
- **Edge types in data:** Lists all edge relationships between node types. Here, the graph has a single edge type: `'user' → 'resource'` via `'accessed'`.  

This confirms that the input data matches the expected structure for the GNN anomaly scoring function.
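
The per-node aggregation in `get_gnn_anomaly_scores` can be traced on a toy graph. The sketch below is a plain-Python rendering of the same averaging. The edge probabilities are invented, and `node_surprisal` is a hypothetical name.

```python
def node_surprisal(edge_index, edge_probs, num_users, num_resources):
    """Average (1 - p) over the edges incident to each user and each resource."""
    u_sum, u_cnt = [0.0] * num_users, [0] * num_users
    r_sum, r_cnt = [0.0] * num_resources, [0] * num_resources
    for u, r, p in zip(edge_index[0], edge_index[1], edge_probs):
        surprisal = 1.0 - p
        u_sum[u] += surprisal
        u_cnt[u] += 1
        r_sum[r] += surprisal
        r_cnt[r] += 1
    # max(count, 1) avoids dividing by zero for nodes without edges
    u_scores = [s / max(c, 1) for s, c in zip(u_sum, u_cnt)]
    r_scores = [s / max(c, 1) for s, c in zip(r_sum, r_cnt)]
    return u_scores, r_scores

edge_index = [[0, 0, 1], [0, 1, 1]]  # three user -> resource edges
edge_probs = [0.75, 0.5, 0.25]       # hypothetical predicted edge probabilities
u_scores, r_scores = node_surprisal(edge_index, edge_probs, num_users=3, num_resources=2)
print(u_scores)  # [0.375, 0.75, 0.0]
print(r_scores)  # [0.25, 0.625]
```

User 2 has no edges and therefore receives a score of 0.0, which mirrors the effect of `torch.clamp(u_count, min=1)` in the notebook's function.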


### Computing OC-SVM-Based Anomaly Scores

This code defines the `get_ocsvm_anomaly_scores` function to calculate anomaly scores using a trained **One-Class SVM (OC-SVM)** pipeline. Key steps:

- **Data Loading**:
  - Reads the original raw CSV (`CSV_FILE`) and the tabular features (`TAB_FILE`).

- **Anomaly Scoring**:
  - Applies the OC-SVM `decision_function` to the tabular features.
  - Converts the decision scores into anomaly scores and normalizes them using `scale_scores`.

- **Aggregation**:
  - Assigns anomaly scores to individual events in the raw data.
  - Computes per-user anomaly scores by averaging event-level anomalies for each `user_id`.
  - Normalizes the user-level scores and handles missing users by filling with 0.0.

- **Return Value**:
  - `ocsvm_user`: Normalized anomaly scores for each user.
  - `user_ids`: List of all user IDs corresponding to the scores.

- **Example Usage**:
  ```python
  ocsvm_model = train_ocsvm()
  ocsvm_user, user_ids = get_ocsvm_anomaly_scores(ocsvm_model)
  ```


In [14]:
def get_ocsvm_anomaly_scores(pipe):
    raw = pd.read_csv(CSV_FILE, parse_dates=["timestamp"])
    tab = pd.read_csv(TAB_FILE)
    
    dfn = pipe.decision_function(tab)
    ocsvm_event_anom = scale_scores(-dfn)
    
    raw["ocsvm_event_anom"] = ocsvm_event_anom
    
    # Use first-appearance order so the scores line up with the graph's user
    # indices, which were built from df["user_id"].unique()
    user_order = raw["user_id"].unique()
    ocsvm_user = (
        raw.groupby("user_id")["ocsvm_event_anom"].mean()
        .reindex(user_order)
        .fillna(0.0)
        .to_numpy()
    )

    ocsvm_user = scale_scores(ocsvm_user)
    user_ids = user_order.tolist()
    
    return ocsvm_user, user_ids

ocsvm_model = train_ocsvm()
ocsvm_user, user_ids = get_ocsvm_anomaly_scores(ocsvm_model)
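
The path from event-level decision values to user-level scores can be followed on a handful of numbers. In the sketch below the decision values are invented. It relies on the scikit-learn convention that `decision_function` is negative on the outlier side, and it lists users in order of first appearance.

```python
from collections import defaultdict

def min_max(values):
    """Rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 for _ in values] if span == 0 else [(v - lo) / span for v in values]

# Hypothetical decision_function outputs, one per event
decision = [0.5, -1.5, 0.5, 0.0]
users = ["U001", "U002", "U001", "U003"]

# Flip the sign so that larger means more anomalous, then rescale
event_anom = min_max([-d for d in decision])

per_user = defaultdict(list)
for user, score in zip(users, event_anom):
    per_user[user].append(score)

user_ids = list(per_user)
user_mean = [sum(per_user[u]) / len(per_user[u]) for u in user_ids]
user_score = min_max(user_mean)
print(dict(zip(user_ids, user_score)))  # {'U001': 0.0, 'U002': 1.0, 'U003': 0.25}
```

Since the user-level means are min-max scaled again, the least anomalous user always ends at 0.0 and the most anomalous at 1.0, so these scores are relative within a run rather than absolute.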

### Saving and Aggregating Anomaly Scores

This code defines the `save_results` function to combine, rank, and save anomaly scores for users and resources. Key steps:

- **Ensemble Scoring**:
  - Combines normalized GNN (`u_gnn_norm`) and OC-SVM (`ocsvm_user`) user scores using a simple average (`0.5 * u_gnn_norm + 0.5 * ocsvm_user`).

- **DataFrames Creation**:
  - **User Scores**: Includes individual GNN and OC-SVM scores, the ensemble score, and is sorted in descending order of anomaly.
  - **Resource Scores**: Includes GNN-based resource scores (normalized) and sorted by descending anomaly.

- **Saving Results**:
  - Saves user scores to `user_scores.csv`.
  - Saves resource scores to `resource_scores.csv`.

- **Return Value**:
  - Returns the sorted user and resource score DataFrames.

- **Example Usage**:
  ```python
  resource_ids = pd.read_csv(CSV_FILE)["resource_id"].unique().tolist()
  user_scores, _ = save_results(u_gnn_norm, ocsvm_user, user_ids, r_gnn, resource_ids)
  print(user_scores.head(5))  # Display top suspicious users
  ```


In [15]:
def save_results(u_gnn_norm, ocsvm_user, user_ids, r_gnn, resource_ids):
    ensemble = 0.5 * u_gnn_norm + 0.5 * ocsvm_user
    user_scores = pd.DataFrame({
        "user_id": user_ids,
        "gnn_score": u_gnn_norm,
        "ocsvm_score": ocsvm_user,
        "ensemble_score": ensemble
    }).sort_values("ensemble_score", ascending=False)
    resource_scores = pd.DataFrame({
        "resource_id": resource_ids,
        "gnn_score": scale_scores(r_gnn)
    }).sort_values("gnn_score", ascending=False)
    user_scores.to_csv(os.path.join(DATA_DIR, "user_scores.csv"), index=False)
    resource_scores.to_csv(os.path.join(DATA_DIR, "resource_scores.csv"), index=False)
    return user_scores, resource_scores

# First-appearance order matches the resource indices assigned when the graph was built
resource_ids = pd.read_csv(CSV_FILE)["resource_id"].unique().tolist()
user_scores, _ = save_results(u_gnn_norm, ocsvm_user, user_ids, r_gnn, resource_ids)
print("\nTop suspicious users:")
print(user_scores.head(min(5, len(user_scores))))


Top suspicious users:
  user_id  gnn_score  ocsvm_score  ensemble_score
2    U003        0.0     1.000000        0.500000
0    U001        0.0     0.188167        0.094083
1    U002        0.0     0.000000        0.000000


**Output Explanation:**

The table shows the top suspicious users ranked by their ensemble anomaly scores. Higher scores indicate higher anomaly likelihood. The columns represent:

- **gnn_score**: Node-level anomaly score predicted by the GNN.  
- **ocsvm_score**: Anomaly score predicted by the OC-SVM.  
- **ensemble_score**: Combined score (average of GNN and OC-SVM) used for ranking users.

In this example, **U003** is flagged as the most suspicious user. Every `gnn_score` is 0.0 in this run, so the ranking is driven entirely by the OC-SVM score.
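
The fusion step is a weighted average followed by a sort. A minimal sketch in plain Python follows. The scores are invented, and `w_gnn` is a hypothetical parameter added for illustration; the notebook fixes both weights at 0.5.

```python
def fuse_and_rank(user_ids, gnn_scores, ocsvm_scores, w_gnn=0.5):
    """Weighted average of two normalized scores, sorted from most to least suspicious."""
    rows = [
        (uid, w_gnn * g + (1.0 - w_gnn) * o)
        for uid, g, o in zip(user_ids, gnn_scores, ocsvm_scores)
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical normalized scores in [0, 1]
ranking = fuse_and_rank(
    ["U001", "U002", "U003"],
    gnn_scores=[0.0, 1.0, 0.5],
    ocsvm_scores=[0.5, 0.0, 1.0],
)
print(ranking)  # [('U003', 0.75), ('U002', 0.5), ('U001', 0.25)]
```

With equal weights, a user who is moderately unusual to both models can outrank one who is extreme for a single model, which is the intended effect of averaging.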

---
