## GNN for fraud detection
Creating a multigraph for fraud detection using transaction data and applying a Graph Neural Network (GNN) on the edge list can be done in the following steps:

1. Prepare the transaction data: Collect and organize the transaction data into a format that can be used to create the edges of the multigraph. For example, each transaction could be represented as a tuple (node1, node2, attributes), where node1 and node2 represent the sender and receiver of the transaction, and attributes is a dictionary containing properties such as the amount, timestamp, and transaction type.

2. Create the multigraph: Use the transaction data to create a multigraph using the NetworkX library. The add_edge() method can be used to add edges to the multigraph, where each edge represents a transaction.

3. Extract the edge list and its features: Call the multigraph's edges() method with data=True (and keys=True to tell parallel edges apart) to get each edge together with its attribute dictionary. These edges and attributes are the input to the GNN.

4. Apply a GNN on the edge list: Use a GNN library such as PyTorch Geometric, Deep Graph Library (DGL) or Spektral to apply a GNN on the edge list. The GNN will learn representations of the edges in the multigraph and use them to classify the edges as fraudulent or non-fraudulent.

5. Evaluate the model: Split the data into train and test sets, then report accuracy, precision, recall, and F1-score on the test set. Fraud labels are typically heavily imbalanced, so precision and recall are more informative than accuracy on its own.
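
The metrics in step 5 can be computed directly from the confusion counts. The following is a minimal sketch in plain Python; the `binary_metrics` helper and the `y_true` / `y_pred` labels are made up for illustration and are not results from this project.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate, passed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative labels only: 10 transactions, 2 of them fraudulent.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On these toy labels the accuracy is 0.8 even though only one of the two fraudulent transactions is caught (recall 0.5), which is why the per-class metrics matter.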

### Graph construction 

When constructing a graph with transaction edges between card_id and merchant_name, the first step is to identify the nodes in the graph. In this case, the card_id and merchant_name represent the nodes in the graph. Each card_id represents a unique credit card and each merchant_name represents a unique merchant. These nodes can be created by extracting the card_id and merchant_name information from the tabular data and storing them in separate lists.

Once the nodes have been identified, the next step is to create edges between them. Each edge represents one transaction between a card_id and a merchant_name, so we iterate over the transactions and add one edge per transaction.


We create an empty multigraph object called G using nx.MultiGraph() from the NetworkX library, then add a node to the graph for each unique card_id and merchant name in the dataframe df.

The add_nodes_from method takes an iterable and creates a node for each element in it. df["card_id"].unique() returns an array of the unique card IDs in the dataframe, and df["Merchant Name"].unique() returns an array of the unique merchant names.

A type attribute is added to each node to differentiate card_id nodes from merchant_name nodes. This helps later when we analyze the graph.
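
The construction described above can be sketched as follows. This is an illustration under assumptions rather than the project's actual code: the DataFrame contents are made up and the `amount` edge attribute is an assumed column. Only `card_id`, `Merchant Name`, and the `type` node attribute come from the text.

```python
import networkx as nx
import pandas as pd

# Toy transactions (illustrative values only).
df = pd.DataFrame({
    "card_id": ["c1", "c1", "c2"],
    "Merchant Name": ["m1", "m1", "m2"],
    "amount": [10.0, 25.5, 7.2],  # assumed edge attribute
})

G = nx.MultiGraph()

# One node per unique card and per unique merchant, tagged with its type.
G.add_nodes_from(df["card_id"].unique(), type="card_id")
G.add_nodes_from(df["Merchant Name"].unique(), type="merchant_name")

# One edge per transaction; repeated card/merchant pairs become parallel edges.
for card, merchant, amount in zip(df["card_id"], df["Merchant Name"], df["amount"]):
    G.add_edge(card, merchant, amount=amount)

# Edge list with keys and attribute dicts: (u, v, key, data) tuples.
edges = list(G.edges(keys=True, data=True))
print(G.number_of_nodes(), G.number_of_edges())
```

Each entry of `edges` carries the edge key, so the two `c1`/`m1` transactions remain distinguishable.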

**Why did we use a multigraph and not graph?**

The same user (card_id) can buy from the same merchant (Merchant Name) multiple times, so there can be multiple edges between the same user and merchant. A plain graph keeps at most one edge per pair of nodes, so we use a multigraph instead.
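
A small sketch of the difference, using two made-up transactions between the same card and merchant (the node names and amounts are illustrative):

```python
import networkx as nx

# Two transactions between the same card and merchant (illustrative values).
transactions = [("card_1", "merchant_A", 20.0), ("card_1", "merchant_A", 35.0)]

simple = nx.Graph()
multi = nx.MultiGraph()
for card, merchant, amount in transactions:
    simple.add_edge(card, merchant, amount=amount)  # second call overwrites the attributes
    multi.add_edge(card, merchant, amount=amount)   # second call adds a parallel edge

print(simple.number_of_edges())  # the two transactions collapse into one edge
print(multi.number_of_edges())   # both transactions are kept
```

In the plain graph only the last amount survives on the single edge, so per-transaction information is lost.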


In [1]:
import os
os.chdir("../")
%pwd

'd:\\Final-Year-Project\\Credit-Card-Fraud-Detection-Using-GNN'

In [2]:
# Entity

from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class GraphConstructionConfig:
    root_dir: Path
    transformed_data_path: Path
    graph_data_path: Path

In [3]:
from Credit_Card_Fraud_Detection.constants import CONFIG_FILE_PATH, PARAMS_FILE_PATH, SCHEMA_FILE_PATH
from Credit_Card_Fraud_Detection.utils.common import read_yaml, create_directories

In [4]:
class ConfigurationManager:
    def __init__(
        self,
        config_filepath=CONFIG_FILE_PATH,
        params_filepath=PARAMS_FILE_PATH,
        schema_filepath=SCHEMA_FILE_PATH
    ):
        self.config = read_yaml(config_filepath)
        self.params = read_yaml(params_filepath)
        self.schema = read_yaml(schema_filepath)
        create_directories([self.config.artifacts_root])

    def get_graph_construction_config(self) -> GraphConstructionConfig:
        print("get_graph_construction_config method called")
        config = self.config.graph_construction
        create_directories([config.root_dir])
        
        graph_construction_config = GraphConstructionConfig(
            root_dir=config.root_dir,
            transformed_data_path=config.transformed_data_path,
            graph_data_path=config.graph_data_path,
        )
        return graph_construction_config


In [5]:
import torch
import pandas as pd
from torch_geometric.data import Data
from Credit_Card_Fraud_Detection import logger

In [6]:

class GraphConstructor:
    def __init__(self, config):
        self.config = config

    def load_transformed_data(self):
        try:
            df = pd.read_csv(self.config.transformed_data_path)
            logger.info("Transformed data loaded successfully.")
            return df
        except FileNotFoundError:
            logger.error(f"File not found: {self.config.transformed_data_path}")
            return None

    def create_node_ids(self, df):
        # Use whichever card identifier column is present.
        if "card_id" in df.columns:
            card_col = "card_id"
        elif "customer_id" in df.columns:  # example of a renamed column
            card_col = "customer_id"
        else:
            logger.error("card_id or customer_id was not found")
            return None, None, None, None

        card_nodes = pd.Series(df[card_col].unique())
        merchant_nodes = pd.Series(df["merchant"].unique())

        # Map each original value to a consecutive integer node ID (0..n-1).
        card_node_mapping = pd.Series(card_nodes.index, index=card_nodes)
        merchant_node_mapping = pd.Series(merchant_nodes.index, index=merchant_nodes)

        df["card_node"] = df[card_col].map(card_node_mapping)
        df["merchant_node"] = df["merchant"].map(merchant_node_mapping)
        # One transaction node per row, numbered by row position.
        df["trans_node"] = range(len(df))

        logger.info("Node IDs created.")
        return df, len(card_nodes), len(merchant_nodes), len(df)

    def create_edge_indices(self, df):
        try:
            transaction_to_card_edges = torch.tensor([df["trans_node"].tolist(), df["card_node"].tolist()], dtype=torch.long)
            transaction_to_merchant_edges = torch.tensor([df["trans_node"].tolist(), df["merchant_node"].tolist()], dtype=torch.long)
            card_to_transaction_edges = torch.tensor([df["card_node"].tolist(), df["trans_node"].tolist()], dtype=torch.long)
            merchant_to_transaction_edges = torch.tensor([df["merchant_node"].tolist(), df["trans_node"].tolist()], dtype=torch.long)

            return transaction_to_card_edges, transaction_to_merchant_edges, card_to_transaction_edges, merchant_to_transaction_edges

        except RuntimeError as e:
            logger.error(f"Error creating edge indices: {e}")
            return None, None, None, None

    def construct_graph(self):
        df = self.load_transformed_data()
        if df is None:
            return None

        df, num_card_nodes, num_merchant_nodes, num_transaction_nodes = self.create_node_ids(df)

        if df is None:
            return None

        transaction_to_card_edges, transaction_to_merchant_edges, card_to_transaction_edges, merchant_to_transaction_edges = self.create_edge_indices(df)

        if transaction_to_card_edges is None:
            return None

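        # Transaction features (assumes every listed column is already numerically encoded).
        feature_cols = ["amt", "category", "state", "city_pop", "lat", "long", "merch_lat", "merch_long"]
        transaction_features = torch.tensor(df[feature_cols].values, dtype=torch.float)

        # Data uses a single node index space, but each node type was numbered from 0.
        # Shift card and merchant IDs so they do not collide with transaction IDs.
        card_offset = num_transaction_nodes
        merchant_offset = num_transaction_nodes + num_card_nodes
        transaction_to_card_edges[1] += card_offset
        card_to_transaction_edges[0] += card_offset
        transaction_to_merchant_edges[1] += merchant_offset
        merchant_to_transaction_edges[0] += merchant_offset

        # Card and merchant nodes have no features of their own here, so pad with zero rows.
        padding = torch.zeros(num_card_nodes + num_merchant_nodes, transaction_features.size(1))
        node_features = torch.cat([transaction_features, padding], dim=0)

        # Create the graph data object
        edge_index = torch.cat([
            transaction_to_card_edges,
            transaction_to_merchant_edges,
            card_to_transaction_edges,
            merchant_to_transaction_edges
        ], dim=1)

        data = Data(x=node_features, edge_index=edge_index)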

        # Save the graph data
        torch.save(data, self.config.graph_data_path)
        logger.info(f"Graph data saved to: {self.config.graph_data_path}")
        return data
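
The ID assignment in `create_node_ids` can be checked on a toy frame. The sketch below uses made-up values and compares the `unique()`-based mapping with `pd.factorize`, which yields the same first-appearance codes in one call. It is an illustration, not part of the pipeline.

```python
import pandas as pd

# Toy frame (illustrative values only).
df = pd.DataFrame({"card_id": [111, 222, 111, 333], "merchant": ["a", "b", "b", "a"]})

# Mapping as in create_node_ids: each unique value -> position of first appearance.
card_nodes = pd.Series(df["card_id"].unique())
card_mapping = pd.Series(card_nodes.index, index=card_nodes)
mapped = df["card_id"].map(card_mapping).tolist()

# pd.factorize returns integer codes plus the array of unique values.
codes, uniques = pd.factorize(df["card_id"])

print(mapped)
print(codes.tolist())
```

Both approaches give each distinct card one integer ID, and `len(uniques)` matches the node count returned by `create_node_ids`.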

In [7]:
# Pipeline Execution
try:
    config = ConfigurationManager()
    print(f"Config object module: {config.__class__.__module__}")
    print(f"Config object file: {config.__class__.__module__}")
    graph_construction_config = config.get_graph_construction_config()
    print(graph_construction_config)
    graph_constructor = GraphConstructor(config=graph_construction_config)
    graph_data = graph_constructor.construct_graph()
    if graph_data is not None:
        logger.info("Graph construction completed successfully.")

except Exception:
    logger.exception("An error occurred during graph construction.")
    raise

[2025-03-18 01:07:07,139: INFO: common: yaml file: config\config.yaml loaded successfully]
[2025-03-18 01:07:07,142: INFO: common: yaml file: params.yaml loaded successfully]
[2025-03-18 01:07:07,142: INFO: common: yaml file: schema.yaml loaded successfully]
[2025-03-18 01:07:07,144: INFO: common: created directory at: artifacts]
Config object module: __main__
Config object file: __main__
get_graph_construction_config method called
[2025-03-18 01:07:07,144: INFO: common: created directory at: artifacts/graph_construction]
GraphConstructionConfig(root_dir='artifacts/graph_construction', transformed_data_path='artifacts\\data_transformation\\transformed_dataset.csv', graph_data_path='artifacts/graph_construction/graph_data.pt')
[2025-03-18 01:07:09,696: INFO: 1737171361: Transformed data loaded successfully.]
[2025-03-18 01:07:09,897: INFO: 1737171361: Node IDs created.]
[2025-03-18 01:07:10,736: INFO: 1737171361: Graph data saved to: artifacts/graph_construction/graph_data.pt]
[2025-03-
