## GNN for fraud detection
Building a multigraph from transaction data and applying a Graph Neural Network (GNN) to its edge list can be done in the following steps:

1. Prepare the transaction data: Collect and organize the transaction data into a format that can be used to create the edges of the multigraph. For example, each transaction could be represented as a tuple (node1, node2, attributes), where node1 and node2 represent the sender and receiver of the transaction, and attributes is a dictionary containing properties such as the amount, timestamp, and transaction type.

2. Create the multigraph: Use the transaction data to create a multigraph using the NetworkX library. The add_edge() method can be used to add edges to the multigraph, where each edge represents a transaction.

3. Extract the edge list and its features: Call the edges() method of the multigraph with keys=True and data=True to get each edge together with its key and attribute dictionary. These edges and their features are the input to the GNN.

4. Apply a GNN on the edge list: Use a GNN library such as PyTorch Geometric, Deep Graph Library (DGL) or Spektral to apply a GNN on the edge list. The GNN will learn representations of the edges in the multigraph and use them to classify the edges as fraudulent or non-fraudulent.

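5. Evaluation: Split the data into train and test sets and evaluate the model on the test set using accuracy, precision, recall, and F1-score. Because fraudulent transactions are usually a small minority, accuracy alone can look high for a model that misses most fraud, so precision, recall, and F1-score are the more informative metrics.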
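
The evaluation step can be sketched in plain Python. This is a minimal illustration, not the project's evaluation code: the `binary_metrics` helper and the label lists are hypothetical, and a library such as scikit-learn would normally compute these metrics.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test-set labels: 2 fraudulent edges out of 100.
y_true = [1, 1] + [0] * 98

# A model that never flags fraud still scores high on accuracy.
always_legit = [0] * 100
metrics = binary_metrics(y_true, always_legit)
print(metrics)
```

In this toy case the accuracy is high while recall and F1 are zero, which is why the fraud-class metrics deserve the most attention.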

### Graph construction 

When constructing a graph with transaction edges between card_id and merchant_name, the first step is to identify the nodes in the graph. In this case, the card_id and merchant_name represent the nodes in the graph. Each card_id represents a unique credit card and each merchant_name represents a unique merchant. These nodes can be created by extracting the card_id and merchant_name information from the tabular data and storing them in separate lists.

Once the nodes have been identified, the next step is to create the edges. Each edge represents one transaction between a card_id and a merchant_name, so iterating over the list of transactions and adding one edge per transaction produces the full edge set.


First, an empty multigraph object called G is created with the nx.MultiGraph() function from the NetworkX library. Nodes are then added to the graph for each unique card_id and merchant name in the dataframe df.

The add_nodes_from() method adds the nodes: it takes an iterable as input and creates a node for each element. df["card_id"].unique() returns an array of the unique card_ids in the dataframe, and df["Merchant Name"].unique() returns an array of the unique merchant names.

A type attribute is added to each node to differentiate card_id nodes from merchant_name nodes. This helps later when analyzing the graph.
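
The NetworkX construction described above might look like the following sketch. The toy dataframe and its values are hypothetical; only the column names card_id and Merchant Name and the type attribute come from the text, and the Amount edge attribute is an assumed example of a transaction feature.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy transactions; real data would come from the project's dataframe.
df = pd.DataFrame({
    "card_id": ["card_1", "card_1", "card_2", "card_1"],
    "Merchant Name": ["merchant_A", "merchant_A", "merchant_A", "merchant_B"],
    "Amount": [12.50, 40.00, 7.25, 99.00],
})

G = nx.MultiGraph()

# One node per unique card and per unique merchant, tagged with its type.
G.add_nodes_from(df["card_id"].unique(), type="card_id")
G.add_nodes_from(df["Merchant Name"].unique(), type="merchant_name")

# One edge per transaction; repeated card/merchant pairs become parallel edges.
for card, merchant, amount in zip(df["card_id"], df["Merchant Name"], df["Amount"]):
    G.add_edge(card, merchant, amount=amount)

# Each entry is (card, merchant, edge_key, attribute_dict).
edge_list = list(G.edges(keys=True, data=True))

print(G.number_of_nodes(), G.number_of_edges())
print(G.nodes["card_1"]["type"])
```

Note that this approach assumes card identifiers and merchant names never share a value. If they could, the two node sets would need distinct prefixes or tuple keys.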

**Why did we use a multigraph and not graph?**

The same user (card_id) can buy from the same merchant (Merchant Name) multiple times, so there can be multiple edges between the same pair of nodes. A plain graph stores at most one edge per node pair, which is why a multigraph is used instead.
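
A small comparison makes the difference concrete. The node names and amounts below are hypothetical.

```python
import networkx as nx

simple = nx.Graph()
multi = nx.MultiGraph()

# Three hypothetical purchases by the same card at the same merchant.
for amount in (12.50, 40.00, 7.25):
    simple.add_edge("card_1", "merchant_A", amount=amount)
    multi.add_edge("card_1", "merchant_A", amount=amount)

# nx.Graph keeps one edge per node pair; later attributes overwrite earlier ones.
simple_amount = simple["card_1"]["merchant_A"]["amount"]
print(simple.number_of_edges(), simple_amount)

# nx.MultiGraph keeps every transaction as its own edge.
multi_amounts = [data["amount"] for _, _, data in multi.edges(data=True)]
print(multi.number_of_edges(), multi_amounts)
```

With nx.Graph, two of the three transactions are silently lost, which would discard exactly the per-transaction features a fraud model needs.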


In [1]:
import os
os.chdir("../")
%pwd

'd:\\Final-Year-Project\\Credit-Card-Fraud-Detection-Using-GNN'

In [2]:
# Entity

from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class GraphConstructionConfig:
    root_dir: Path
    transformed_data_path: Path

In [3]:
from Credit_Card_Fraud_Detection.constants import *
from Credit_Card_Fraud_Detection.utils.common import read_yaml, create_directories

In [4]:
# Configuration

class ConfigurationManager:
    def __init__(
        self,
        config_filepath = CONFIG_FILE_PATH,
        params_filepath = PARAMS_FILE_PATH,
        schema_filepath = SCHEMA_FILE_PATH ):

        self.config = read_yaml(config_filepath)
        self.params = read_yaml(params_filepath)
        self.schema = read_yaml(schema_filepath)

        create_directories([self.config.artifacts_root])


# Module-level helper (not a method): it receives the ConfigurationManager explicitly.
def get_graph_construction_config(manager: ConfigurationManager) -> GraphConstructionConfig:
    config = manager.config.graph_construction

    create_directories([config.root_dir])

    graph_construction_config = GraphConstructionConfig(
        root_dir=config.root_dir,
        transformed_data_path=config.transformed_data_path,
    )

    return graph_construction_config

In [5]:
import os
import torch
import pandas as pd
from Credit_Card_Fraud_Detection import logger
from torch_geometric.data import Data

In [6]:
class GraphConstruction:
    def __init__(self, config: GraphConstructionConfig):
        self.config = config
        self.graph = None  # PyG Graph

    def construct_graph(self):
        print("Loading transformed data...")
        dataset = pd.read_csv(self.config.transformed_data_path)

        # Create unique ID mappings for cards and merchants
        card_ids = {card: idx for idx, card in enumerate(dataset['card_id'].unique())}
        merchant_ids = {merchant: idx + len(card_ids) for idx, merchant in enumerate(dataset["Merchant Name"].unique())}

        # Convert card_id and Merchant Name to node indices
        src_nodes = torch.tensor(dataset['card_id'].map(card_ids).values, dtype=torch.long)
        dst_nodes = torch.tensor(dataset["Merchant Name"].map(merchant_ids).values, dtype=torch.long)

        # Stack to create edge_index (2, num_edges) for PyG
        edge_index = torch.stack([src_nodes, dst_nodes], dim=0)

        # Edge attributes, one column per feature.
        # Every column is cast to float32 so the stacked tensor has a single,
        # explicit dtype instead of relying on int32/float32 type promotion.
        edge_attr = torch.stack([
            torch.tensor(dataset["Year"].values, dtype=torch.float32),
            torch.tensor(dataset["Month"].values, dtype=torch.float32),
            torch.tensor(dataset["Day"].values, dtype=torch.float32),
            torch.tensor(dataset["Hour"].values, dtype=torch.float32),
            torch.tensor(dataset["Minute"].values, dtype=torch.float32),
            torch.tensor(dataset["Amount"].values, dtype=torch.float32),
            torch.tensor(dataset["Use Chip"].values, dtype=torch.float32),
            torch.tensor(dataset["Merchant City"].factorize()[0], dtype=torch.float32),  # Integer-encode city strings
            torch.tensor(dataset["Errors?"].values, dtype=torch.float32),
            torch.tensor(dataset["MCC"].values, dtype=torch.float32)
        ], dim=1)  # Shape: (num_edges, num_features)

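        # Create PyG Graph Data Object.
        # num_nodes is set explicitly because the graph has no node feature matrix
        # from which PyG could read the node count.
        self.graph = Data(
            edge_index=edge_index,
            edge_attr=edge_attr,
            num_nodes=len(card_ids) + len(merchant_ids),
        )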

        print(f"Graph constructed with {self.graph.num_nodes} nodes and {self.graph.num_edges} edges.")

    def save_graph(self):
        graph_path = os.path.join(self.config.root_dir, "fraud_graph.pt")
        torch.save(self.graph, graph_path)
        print(f"Graph saved at {graph_path}")
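
The index layout used in construct_graph can be checked on a toy example. This is a sketch under stated assumptions: the dataframe values are hypothetical, and NumPy stands in for torch so the snippet runs without PyTorch installed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same two key columns as the dataset.
df = pd.DataFrame({
    "card_id": [101, 101, 202, 101],
    "Merchant Name": ["A", "B", "A", "A"],
})

# Cards take indices 0..n_cards-1; merchants are offset to follow them.
card_ids = {card: idx for idx, card in enumerate(df["card_id"].unique())}
merchant_ids = {
    merchant: idx + len(card_ids)
    for idx, merchant in enumerate(df["Merchant Name"].unique())
}

src = df["card_id"].map(card_ids).to_numpy()
dst = df["Merchant Name"].map(merchant_ids).to_numpy()
edge_index = np.stack([src, dst])  # shape (2, num_edges)

num_nodes = len(card_ids) + len(merchant_ids)
print(edge_index)
print(num_nodes)
```

Because the merchant indices start where the card indices end, a card and a merchant never share a node ID, and the largest index plus one equals the total node count.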

In [8]:
# Pipeline

try:
    config_manager = ConfigurationManager()
    graph_config = get_graph_construction_config(config_manager)
    graph_builder = GraphConstruction(config=graph_config)
    graph_builder.construct_graph()
    graph_builder.save_graph()
except Exception as e:
    logger.error(f"Error in graph construction: {e}")
    raise  # re-raise the original exception unchanged

[2025-03-17 12:06:49,171: INFO: common: yaml file: config\config.yaml loaded successfully]
[2025-03-17 12:06:49,172: INFO: common: yaml file: params.yaml loaded successfully]
[2025-03-17 12:06:49,173: INFO: common: yaml file: schema.yaml loaded successfully]
[2025-03-17 12:06:49,175: INFO: common: created directory at: artifacts]
[2025-03-17 12:06:49,176: INFO: common: created directory at: artifacts/data_transformation]
Loading transformed data...




Graph constructed with 106482 nodes and 24386900 edges.
Graph saved at artifacts/data_transformation\fraud_graph.pt
