## GNN for fraud detection
Creating a multigraph for fraud detection using transaction data and applying a Graph Neural Network (GNN) on the edge list can be done in the following steps:

1. Prepare the transaction data: Collect and organize the transaction data into a format that can be used to create the edges of the multigraph. For example, each transaction could be represented as a tuple (node1, node2, attributes), where node1 and node2 represent the sender and receiver of the transaction, and attributes is a dictionary containing properties such as the amount, timestamp, and transaction type.

2. Create the multigraph: Use the transaction data to create a multigraph using the NetworkX library. The add_edge() method can be used to add edges to the multigraph, where each edge represents a transaction.

3. Extract the edge list and its features: Use the edges() method of the multigraph, with data=True to include the edge attributes and keys=True to tell parallel edges apart, to extract the edge list and its features, which will be used as input to the GNN.

4. Apply a GNN on the edge list: Use a GNN library such as PyTorch Geometric, Deep Graph Library (DGL) or Spektral to apply a GNN on the edge list. The GNN will learn representations of the edges in the multigraph and use them to classify the edges as fraudulent or non-fraudulent.

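5. Evaluation: To evaluate the performance of the GNN, split the data into train and test sets and use the test set to measure accuracy, precision, recall, and F1-score. Because fraudulent transactions are usually a small minority of all transactions, accuracy alone can look high for a model that rarely flags fraud, so precision, recall, and F1-score on the fraud class are the more informative metrics.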
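Step 5 can be made concrete with a small worked example. The sketch below computes the four metrics by hand on made-up labels (2 fraudulent transactions out of 100); the function name `binary_metrics` and the data are illustrative assumptions, not part of the project code.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged as fraud
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud that was missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate left alone

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up test set: 2 fraudulent transactions followed by 98 legitimate ones.
y_true = [1, 1] + [0] * 98
always_legit = [0] * 100      # a model that never flags fraud
one_hit = [1, 0] + [0] * 98   # catches one fraud case and misses the other

print(binary_metrics(y_true, always_legit))
print(binary_metrics(y_true, one_hit))
```

The model that never flags fraud still reaches 98% accuracy while its recall is zero, which is why the fraud-class metrics deserve more attention than accuracy.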

### Graph construction 

When constructing a graph with transaction edges between card_id and merchant_name, the first step is to identify the nodes in the graph. In this case, the card_id and merchant_name represent the nodes in the graph. Each card_id represents a unique credit card and each merchant_name represents a unique merchant. These nodes can be created by extracting the card_id and merchant_name information from the tabular data and storing them in separate lists.

Once the nodes have been identified, the next step is to create edges between them. These edges represent the transactions that have taken place between a card_id and a merchant_name. To create the edges, a list of transactions is created and for each transaction, an edge is created between the card_id and merchant_name.


We create an empty multigraph object called G using the nx.MultiGraph() constructor from the NetworkX library, then add a node to the graph for each unique card_id and merchant name in the dataframe df.

The add_nodes_from method adds nodes to the graph: it takes an iterable as input and creates a node for each element. df["card_id"].unique() returns a NumPy array of the unique card_ids in the dataframe, and df["Merchant Name"].unique() returns a NumPy array of the unique merchant names.

A type attribute is added to each node to differentiate between card_id and merchant_name nodes. This helps later when we analyze the graph.
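The construction described above can be sketched as follows. This is a minimal illustration on a made-up three-row dataframe, not the project's actual cell: the `amt` and `is_fraud` edge attributes and the `type` values `"card_id"` and `"merchant_name"` are assumptions chosen to match the surrounding text.

```python
import networkx as nx
import pandas as pd

# Made-up transactions; the column names follow the text above.
df = pd.DataFrame({
    "card_id": [101, 101, 202],
    "Merchant Name": ["Acme", "Acme", "Globex"],
    "amt": [12.5, 40.0, 7.25],
    "is_fraud": [0, 1, 0],
})

G = nx.MultiGraph()

# One node per unique card and per unique merchant, tagged with a `type` attribute.
G.add_nodes_from(df["card_id"].unique().tolist(), type="card_id")
G.add_nodes_from(df["Merchant Name"].unique().tolist(), type="merchant_name")

# One edge per transaction, carrying the transaction's attributes.
for card, merchant, amt, label in zip(
    df["card_id"], df["Merchant Name"], df["amt"], df["is_fraud"]
):
    G.add_edge(card, merchant, amt=amt, is_fraud=label)

# Edge list with features: (node1, node2, edge_key, attribute_dict) per transaction.
edge_list = list(G.edges(keys=True, data=True))

# The `type` attribute lets us select one kind of node later on.
cards = [n for n, attrs in G.nodes(data=True) if attrs["type"] == "card_id"]

print(G.number_of_nodes(), G.number_of_edges())
print(edge_list)
```

Passing `keys=True` to `edges()` matters here because two of the transactions share the same card and merchant, and the edge key is what tells them apart.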

**Why did we use a multigraph and not graph?**

The same user (card_id) can buy from the same merchant (Merchant Name) multiple times, so there can be several edges between the same pair of nodes. A simple graph allows at most one edge per node pair, which is why we use a multigraph instead.
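The difference is easy to see on a toy example. In the sketch below (node names and amounts are made up), the same three purchases are added to an `nx.Graph` and to an `nx.MultiGraph`.

```python
import networkx as nx

# Three made-up purchases by the same card at the same merchant.
purchases = [
    ("card_1", "merchant_A", 10.0),
    ("card_1", "merchant_A", 25.0),
    ("card_1", "merchant_A", 990.0),
]

simple = nx.Graph()
multi = nx.MultiGraph()
for card, merchant, amt in purchases:
    simple.add_edge(card, merchant, amt=amt)  # updates the single existing edge
    multi.add_edge(card, merchant, amt=amt)   # adds a new parallel edge

print(simple.number_of_edges())  # 1
print(multi.number_of_edges())   # 3

# Only the last amount survives in the simple graph.
print(simple["card_1"]["merchant_A"]["amt"])  # 990.0
# The multigraph keeps every transaction amount.
print(sorted(d["amt"] for _, _, d in multi.edges(data=True)))  # [10.0, 25.0, 990.0]
```

With a simple graph, earlier transactions are silently overwritten, so per-transaction fraud labels and amounts would be lost.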


In [1]:
import os
os.chdir("../")
%pwd

'd:\\Final-Year-Project\\Credit-Card-Fraud-Detection-Using-GNN'

In [2]:
import pandas as pd
df = pd.read_csv("artifacts/data_transformation/transformed_dataset.csv")

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1296675 entries, 0 to 1296674
Data columns (total 18 columns):
 #   Column                        Non-Null Count    Dtype  
---  ------                        --------------    -----  
 0   high_amt                      1296675 non-null  float64
 1   amt_ratio_merchant            1296675 non-null  float64
 2   sqrt_amt                      1296675 non-null  float64
 3   amt                           1296675 non-null  float64
 4   customer_avg_amt              1296675 non-null  float64
 5   amt_diff_customer_avg         1296675 non-null  float64
 6   hour_cos                      1296675 non-null  float64
 7   amt_per_city_pop              1296675 non-null  float64
 8   customer_min_amt              1296675 non-null  float64
 9   merchant_category_fraud_risk  1296675 non-null  float64
 10  merchant_avg_amt              1296675 non-null  float64
 11  merchant_min_amt              1296675 non-null  float64
 12  customer_amt_std            

In [4]:
# Entity

from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class GraphConstructionConfig:
    root_dir: Path
    transformed_data_path: Path
    graph_data_path: Path

In [5]:
from Credit_Card_Fraud_Detection.constants import *
from Credit_Card_Fraud_Detection.utils.common import read_yaml, create_directories

In [6]:
class ConfigurationManager:
    def __init__(
        self,
        config_filepath=CONFIG_FILE_PATH,
        params_filepath=PARAMS_FILE_PATH,
        schema_filepath=SCHEMA_FILE_PATH
    ):
        self.config = read_yaml(config_filepath)
        self.params = read_yaml(params_filepath)
        self.schema = read_yaml(schema_filepath)
        create_directories([self.config.artifacts_root])

    def get_graph_construction_config(self) -> GraphConstructionConfig:
        print("get_graph_construction_config method called")
        config = self.config.graph_construction
        create_directories([config.root_dir])
        
        graph_construction_config = GraphConstructionConfig(
            root_dir=config.root_dir,
            transformed_data_path=config.transformed_data_path,
            graph_data_path=config.graph_data_path,
        )
        return graph_construction_config


In [7]:
# import os
# import torch
# import pandas as pd
# import numpy as np
# from Credit_Card_Fraud_Detection import logger
# from torch_geometric.data import Data 
# from torch_geometric.data import HeteroData
# from sklearn.model_selection import train_test_split

In [8]:
# import pandas as pd
# import torch
# from torch_geometric.data import HeteroData
# import logging
# import os
# from sklearn.model_selection import train_test_split

# # Configure logging
# logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
# logger = logging.getLogger(__name__)

# class GraphConstructor:
#     def __init__(self, config):
#         self.config = config

#     def create_node_ids(self, df):
#         card_ids = {num: i for i, num in enumerate(df["card_id"].unique())}
#         merchant_ids = {name: i for i, name in enumerate(df["merchant"].unique())}
#         transaction_ids = {t: i for i, t in enumerate(df["transaction_unique"].unique())}

#         df["transaction_node"] = df["transaction_unique"].map(transaction_ids).astype(int)
#         df["card_node"] = df["card_id"].map(card_ids).astype(int)
#         df["merchant_node"] = df["merchant"].map(merchant_ids).astype(int)

#         logger.info(f"New max transaction_node in df: {df['transaction_node'].max()}")
#         logger.info(f"Total transaction nodes after remapping: {len(transaction_ids)}")
#         logger.info("Node indices created successfully.")

#         return df, len(card_ids), len(merchant_ids), len(transaction_ids)

#     def create_edge_indices(self, df):
#         card_to_transaction_edges = torch.tensor(df[["card_node", "transaction_node"]].values.T, dtype=torch.long)
#         transaction_to_merchant_edges = torch.tensor(df[["transaction_node", "merchant_node"]].values.T, dtype=torch.long)

#         logger.info(f"Card-to-Transaction edges shape: {card_to_transaction_edges.shape}")
#         logger.info(f"Transaction-to-Merchant edges shape: {transaction_to_merchant_edges.shape}")

#         return card_to_transaction_edges, transaction_to_merchant_edges

#     def create_node_features(self, df, num_card_nodes, num_merchant_nodes, num_transaction_nodes):
#         numerical_features = ["amt", "city_pop", "lat", "long", "merch_lat", "merch_long", "age"] # Hardcoded list

#         node_features_dim = len(numerical_features)

#         card_features = torch.zeros((num_card_nodes, node_features_dim), dtype=torch.float32)
#         merchant_features = torch.zeros((num_merchant_nodes, node_features_dim), dtype=torch.float32)

#         for card_id, group in df.groupby("card_node"):
#             if card_id < num_card_nodes:
#                 card_features[card_id] = torch.tensor(group[numerical_features].mean().values, dtype=torch.float32)

#         for merchant_id, group in df.groupby("merchant_node"):
#             if merchant_id < num_merchant_nodes:
#                 merchant_features[merchant_id] = torch.tensor(group[numerical_features].mean().values, dtype=torch.float32)

#         transaction_features = torch.tensor(df[numerical_features].values, dtype=torch.float32)

#         logger.info("Node features created correctly.")

#         return card_features, merchant_features, transaction_features

#     def create_transaction_labels(self, df):
#         y = torch.tensor(df["is_fraud"].values, dtype=torch.float32).view(-1, 1)
#         return y

#     def train_test_split_nodes(self, y, num_transaction_nodes, test_size=0.2, random_state=42):
#         train_mask, test_mask = train_test_split(
#             torch.arange(num_transaction_nodes),
#             test_size=test_size,
#             random_state=random_state,
#             stratify=y.squeeze().numpy()
#         )

#         transaction_train_mask = torch.zeros(num_transaction_nodes, dtype=torch.bool)
#         transaction_test_mask = torch.zeros(num_transaction_nodes, dtype=torch.bool)
#         transaction_train_mask[train_mask] = True
#         transaction_test_mask[test_mask] = True

#         logger.info("Train-test split applied.")

#         return transaction_train_mask, transaction_test_mask

#     def describe_data_structure(self, data, filepath):
#         with open(filepath, 'w') as f:
#             f.write("Data Object Structure:\n")
#             for node_type in data.node_types:
#                 f.write(f"  Node type: {node_type}\n")
#                 if hasattr(data[node_type], 'x'):
#                     f.write(f"    x: {data[node_type].x.shape}, dtype={data[node_type].x.dtype}\n")
#                 if hasattr(data[node_type], 'y'):
#                     f.write(f"    y: {data[node_type].y.shape}, dtype={data[node_type].y.dtype}\n")
#                 if hasattr(data[node_type], 'train_mask'):
#                     f.write(f"    train_mask: {data[node_type].train_mask.shape}, dtype={data[node_type].train_mask.dtype}\n")
#                 if hasattr(data[node_type], 'test_mask'):
#                     f.write(f"    test_mask: {data[node_type].test_mask.shape}, dtype={data[node_type].test_mask.dtype}\n")
#             for edge_type in data.edge_types:
#                 f.write(f"  Edge type: {edge_type}\n")
#                 f.write(f"    edge_index: {data[edge_type].edge_index.shape}, dtype={data[edge_type].edge_index.dtype}\n")

#         logger.info(f"Data structure description saved to: {filepath}")

#     def construct_graph(self):
#         df = pd.read_csv(self.config.transformed_data_path)

#         df, num_card_nodes, num_merchant_nodes, num_transaction_nodes = self.create_node_ids(df)
#         card_to_transaction_edges, transaction_to_merchant_edges = self.create_edge_indices(df)
#         card_features, merchant_features, transaction_features = self.create_node_features(df, num_card_nodes, num_merchant_nodes, num_transaction_nodes)
#         y = self.create_transaction_labels(df)
#         transaction_train_mask, transaction_test_mask = self.train_test_split_nodes(y, num_transaction_nodes)

#         data = HeteroData()
#         data["card"].x = card_features
#         data["merchant"].x = merchant_features
#         data["transaction"].x = transaction_features
#         data["card", "transacts", "transaction"].edge_index = card_to_transaction_edges
#         data["transaction", "occurs_at", "merchant"].edge_index = transaction_to_merchant_edges
#         data["transaction", "transacted_by", "card"].edge_index = card_to_transaction_edges.flip(0)
#         data["merchant", "related_to", "transaction"].edge_index = transaction_to_merchant_edges.flip(0)
#         data["transaction"].y = y
#         data["transaction"].train_mask = transaction_train_mask
#         data["transaction"].test_mask = transaction_test_mask

#         torch.save(data, self.config.graph_data_path)
#         logger.info(f"Graph data saved to: {self.config.graph_data_path}")

#         # Save graph structure description
#         structure_save_path = os.path.join(self.config.root_dir, "graph_structure.txt")
#         self.describe_data_structure(data, structure_save_path)

#         # Save updated DataFrame
#         node_mapped_data_path = os.path.join(self.config.root_dir, "node_mapped_data.csv")
#         df.to_csv(node_mapped_data_path, index=False)
#         logger.info(f"Updated data frame saved to: {node_mapped_data_path}")

#         return data


In [9]:
import pandas as pd
import torch
from torch_geometric.data import HeteroData
import logging
import os
from sklearn.model_selection import train_test_split

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

class GraphConstructor:
    def __init__(self, config):
        self.config = config

    def create_node_ids(self, df):
        customer_ids = {num: i for i, num in enumerate(df["customer_id"].unique())}
        merchant_ids = {num: i for i, num in enumerate(df["merchant_id"].unique())}
        transaction_ids = {t: i for i, t in enumerate(df["transaction_unique"].unique())}

        df["transaction_node"] = df["transaction_unique"].map(transaction_ids).astype(int)
        df["customer_node"] = df["customer_id"].map(customer_ids).astype(int)
        df["merchant_node"] = df["merchant_id"].map(merchant_ids).astype(int)

        logger.info(f"New max transaction_node in df: {df['transaction_node'].max()}")
        logger.info(f"Total transaction nodes after remapping: {len(transaction_ids)}")
        logger.info("Node indices created successfully.")

        return df, customer_ids, merchant_ids, transaction_ids

    def create_edge_indices(self, df):
        customer_to_transaction_edges = torch.tensor(df[["customer_node", "transaction_node"]].values.T, dtype=torch.long)
        transaction_to_merchant_edges = torch.tensor(df[["transaction_node", "merchant_node"]].values.T, dtype=torch.long)

        logger.info(f"Customer-to-Transaction edges shape: {customer_to_transaction_edges.shape}")
        logger.info(f"Transaction-to-Merchant edges shape: {transaction_to_merchant_edges.shape}")

        return customer_to_transaction_edges, transaction_to_merchant_edges

    def create_node_features(self, df, customer_ids, merchant_ids, transaction_ids):
        customer_features_list = ["customer_avg_amt", "customer_min_amt", "customer_amt_std"]
        merchant_features_list = ["merchant_avg_amt", "merchant_min_amt", "merchant_amt_std"]
        transaction_features_list = [
            "high_amt", "amt_ratio_merchant", "sqrt_amt", "amt", "amt_diff_customer_avg",
            "hour_cos", "amt_per_city_pop", "merchant_category_fraud_risk"
        ]

        customer_features_dim = len(customer_features_list)
        merchant_features_dim = len(merchant_features_list)
        transaction_features_dim = len(transaction_features_list)

        customer_features = torch.zeros((len(customer_ids), customer_features_dim), dtype=torch.float32)
        merchant_features = torch.zeros((len(merchant_ids), merchant_features_dim), dtype=torch.float32)
        transaction_features = torch.tensor(df[transaction_features_list].values, dtype=torch.float32)

        for customer_id, group in df.groupby("customer_node"):
            if customer_id < len(customer_ids):
                customer_features[customer_id] = torch.tensor(group[customer_features_list].mean().values, dtype=torch.float32)

        for merchant_id, group in df.groupby("merchant_node"):
            if merchant_id < len(merchant_ids):
                merchant_features[merchant_id] = torch.tensor(group[merchant_features_list].mean().values, dtype=torch.float32)

        logger.info("Node features created correctly.")

        return customer_features, merchant_features, transaction_features

    def create_transaction_labels(self, df, transaction_ids):
        """Creates one label per unique transaction node, ordered like transaction_ids."""
        # Keep the first is_fraud value seen for each transaction, then align the
        # labels with the node indices assigned in create_node_ids.
        first_labels = df.drop_duplicates("transaction_unique").set_index("transaction_unique")["is_fraud"]
        ordered_labels = first_labels.reindex(list(transaction_ids.keys())).to_numpy()

        y = torch.tensor(ordered_labels, dtype=torch.float32).view(-1, 1)
        return y

    def train_test_split_nodes(self, y, num_transaction_nodes, test_size=0.2, random_state=42):
        train_indices, test_indices = train_test_split(
            torch.arange(num_transaction_nodes),
            test_size=test_size,
            random_state=random_state,
            stratify=y.squeeze().numpy()
        )

        transaction_train_mask = torch.zeros(num_transaction_nodes, dtype=torch.bool)
        transaction_test_mask = torch.zeros(num_transaction_nodes, dtype=torch.bool)

        transaction_train_mask[train_indices] = True
        transaction_test_mask[test_indices] = True

        logger.info("Train-test split applied.")

        return transaction_train_mask, transaction_test_mask

    def describe_data_structure(self, data, filepath):
        with open(filepath, 'w') as f:
            f.write("Data Object Structure:\n")
            for node_type in data.node_types:
                f.write(f"  Node type: {node_type}\n")
                if hasattr(data[node_type], 'x'):
                    f.write(f"    x: {data[node_type].x.shape}, dtype={data[node_type].x.dtype}\n")
                if hasattr(data[node_type], 'y'):
                    f.write(f"    y: {data[node_type].y.shape}, dtype={data[node_type].y.dtype}\n")
                if hasattr(data[node_type], 'train_mask'):
                    f.write(f"    train_mask: {data[node_type].train_mask.shape}, dtype={data[node_type].train_mask.dtype}\n")
                if hasattr(data[node_type], 'test_mask'):
                    f.write(f"    test_mask: {data[node_type].test_mask.shape}, dtype={data[node_type].test_mask.dtype}\n")
                if hasattr(data[node_type], 'n_id'):
                    f.write(f"    n_id: {data[node_type].n_id.shape}, dtype={data[node_type].n_id.dtype}\n")
            for edge_type in data.edge_types:
                f.write(f"  Edge type: {edge_type}\n")
                f.write(f"    edge_index: {data[edge_type].edge_index.shape}, dtype={data[edge_type].edge_index.dtype}\n")

        logger.info(f"Data structure description saved to: {filepath}")

    def construct_graph(self):
        df = pd.read_csv(self.config.transformed_data_path)

        df, customer_ids, merchant_ids, transaction_ids = self.create_node_ids(df)
        customer_to_transaction_edges, transaction_to_merchant_edges = self.create_edge_indices(df)
        customer_features, merchant_features, transaction_features = self.create_node_features(df, customer_ids, merchant_ids, transaction_ids)
        y = self.create_transaction_labels(df, transaction_ids)
        transaction_train_mask, transaction_test_mask = self.train_test_split_nodes(y, len(transaction_ids))

        data = HeteroData()
        data["customer"].x = customer_features
        data["merchant"].x = merchant_features
        data["transaction"].x = transaction_features
        data["customer", "transacts", "transaction"].edge_index = customer_to_transaction_edges
        data["transaction", "occurs_at", "merchant"].edge_index = transaction_to_merchant_edges
        data["transaction", "transacted_by", "customer"].edge_index = customer_to_transaction_edges.flip(0)
        data["merchant", "related_to", "transaction"].edge_index = transaction_to_merchant_edges.flip(0)
        data["transaction"].y = y
        data["transaction"].train_mask = transaction_train_mask
        data["transaction"].test_mask = transaction_test_mask

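        # Assign n_id attributes.
        # Note: customer and transaction n_id hold the original IDs (dict keys),
        # while merchant n_id holds the remapped node indices (dict values).
        data["customer"].n_id = torch.tensor(list(customer_ids.keys()))
        data["merchant"].n_id = torch.tensor(list(merchant_ids.values()))
        data["transaction"].n_id = torch.tensor(list(transaction_ids.keys()))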

        torch.save(data, self.config.graph_data_path)
        logger.info(f"Graph data saved to: {self.config.graph_data_path}")

        # Save graph structure description
        structure_save_path = os.path.join(self.config.root_dir, "graph_structure.txt")
        self.describe_data_structure(data, structure_save_path)

        # Save updated DataFrame
        node_mapped_data_path = os.path.join(self.config.root_dir, "node_mapped_data.csv")
        df.to_csv(node_mapped_data_path, index=False)
        logger.info(f"Updated data frame saved to: {node_mapped_data_path}")

        return data
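
To make the indexing in `create_node_ids` and `create_edge_indices` easier to follow, here is a minimal sketch on three made-up rows. It uses NumPy arrays in place of torch tensors so that it runs without PyTorch; the data values and the helper name `contiguous_ids` are hypothetical and not part of the project.

```python
import numpy as np
import pandas as pd

# Made-up rows; the column names mirror those used by GraphConstructor.
df = pd.DataFrame({
    "customer_id": [7001, 7001, 9005],
    "merchant_id": [33, 48, 33],
    "transaction_unique": ["t-a", "t-b", "t-c"],
})


def contiguous_ids(series):
    """Map each distinct value to 0..n-1 in order of first appearance."""
    return {value: i for i, value in enumerate(series.unique())}


customer_ids = contiguous_ids(df["customer_id"])
merchant_ids = contiguous_ids(df["merchant_id"])
transaction_ids = contiguous_ids(df["transaction_unique"])

customer_node = df["customer_id"].map(customer_ids).to_numpy()
merchant_node = df["merchant_id"].map(merchant_ids).to_numpy()
transaction_node = df["transaction_unique"].map(transaction_ids).to_numpy()

# Row 0 holds source indices, row 1 holds destination indices: shape [2, num_edges].
customer_to_transaction = np.stack([customer_node, transaction_node])
transaction_to_merchant = np.stack([transaction_node, merchant_node])

# Reverse relation: swap the two rows, which is what edge_index.flip(0) does on a tensor.
transaction_to_customer = customer_to_transaction[::-1]

print(customer_to_transaction)
print(transaction_to_merchant)
print(transaction_to_customer)
```

Each node type has its own index space starting at 0, so index 0 in the customer row and index 0 in the transaction row refer to different nodes. This is why every edge type in `HeteroData` is keyed by a (source type, relation, destination type) triple.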

In [10]:
# Pipeline Execution

try:
    config = ConfigurationManager()
    graph_construction_config = config.get_graph_construction_config()
    graph_constructor = GraphConstructor(config=graph_construction_config)
    data = graph_constructor.construct_graph()

    if data is not None:
        logger.info("Graph construction completed successfully.")

except Exception:
    logger.exception("An error occurred during graph construction.")
    raise

[2025-03-22 22:05:00,625: INFO: common: yaml file: config\config.yaml loaded successfully]
[2025-03-22 22:05:00,625: INFO: common: yaml file: params.yaml loaded successfully]
[2025-03-22 22:05:00,625: INFO: common: yaml file: schema.yaml loaded successfully]
[2025-03-22 22:05:00,625: INFO: common: created directory at: artifacts]
get_graph_construction_config method called
[2025-03-22 22:05:00,625: INFO: common: created directory at: artifacts/graph_construction]
[2025-03-22 22:05:02,762: INFO: 1079044015: New max transaction_node in df: 1296674]
[2025-03-22 22:05:02,762: INFO: 1079044015: Total transaction nodes after remapping: 1296675]
[2025-03-22 22:05:02,762: INFO: 1079044015: Node indices created successfully.]
[2025-03-22 22:05:02,795: INFO: 1079044015: Customer-to-Transaction edges shape: torch.Size([2, 1296675])]
[2025-03-22 22:05:02,795: INFO: 1079044015: Transaction-to-Merchant edges shape: torch.Size([2, 1296675])]
[2025-03-22 22:05:04,056: INFO: 1079044015: Node features c

In [11]:
# import torch
# import networkx as nx
# import matplotlib.pyplot as plt
# import pandas as pd
# import random
# import logging

# def visualize_fraud_subgraph(graph_data_path, transformed_data_path, num_transactions=500, fraud_ratio=0.1):
#     """
#     Visualizes a subgraph containing customers, merchants, and transactions,
#     ensuring ~10% fraudulent transactions.
#     """
#     try:
#         # Load graph data
#         data = torch.load(graph_data_path)
#         df = pd.read_csv(transformed_data_path)
        
#         # Identify fraud and non-fraud transactions
#         fraud_transactions = df[df['is_fraud'] == 1]['transaction_node'].tolist()
#         non_fraud_transactions = df[df['is_fraud'] == 0]['transaction_node'].tolist()
        
#         # Determine how many fraud transactions to include
#         num_fraud = min(len(fraud_transactions), int(num_transactions * fraud_ratio))
#         num_non_fraud = min(len(non_fraud_transactions), num_transactions - num_fraud)
        
#         # Select transactions
#         selected_fraud = random.sample(fraud_transactions, num_fraud)
#         selected_non_fraud = random.sample(non_fraud_transactions, num_non_fraud)
#         selected_transactions = set(selected_fraud + selected_non_fraud)
        
#         # Extract relevant edges
#         edges = []
#         card_to_transaction = data["card", "transacts", "transaction"].edge_index.numpy()
#         transaction_to_merchant = data["transaction", "occurs_at", "merchant"].edge_index.numpy()
        
#         # Filter edges that involve selected transactions
#         for src, dst in zip(*card_to_transaction):
#             if dst in selected_transactions:
#                 edges.append((src, dst))
        
#         for src, dst in zip(*transaction_to_merchant):
#             if src in selected_transactions:
#                 edges.append((src, dst))
        
#         # Create the graph
#         G = nx.DiGraph()
#         G.add_edges_from(edges)
        
#         # Node types and sizes
#         node_colors = {}
#         node_sizes = {}
#         for node in G.nodes:
#             if node in df['card_node'].values:
#                 node_colors[node] = 'skyblue'  # Customers (Cards)
#                 node_sizes[node] = 800  # Large
#             elif node in df['merchant_node'].values:
#                 node_colors[node] = 'lightgreen'  # Merchants
#                 node_sizes[node] = 800  # Large
#             elif node in selected_transactions:
#                 node_colors[node] = 'red' if node in selected_fraud else 'orange'  # Transactions
#                 node_sizes[node] = 400  # Smaller
        
#         # Edge colors (fraud vs non-fraud)
#         edge_colors = ['red' if dst in selected_fraud else 'black' for src, dst in edges]
        
#         # Graph layout
#         pos = nx.spring_layout(G, k=0.3, iterations=50)
        
#         # Draw graph
#         plt.figure(figsize=(14, 10))
#         nx.draw(
#             G, pos, with_labels=True,
#             node_color=[node_colors[n] for n in G.nodes],
#             node_size=[node_sizes[n] for n in G.nodes],
#             edge_color=edge_colors, width=1.5,
#             font_size=8, arrows=True
#         )
        
#         # Legend
#         legend_labels = {
#             'skyblue': 'Customers',
#             'lightgreen': 'Merchants',
#             'orange': 'Non-Fraud Transactions',
#             'red': 'Fraud Transactions',
#             'black': 'Non-Fraud Edges',
#             'red': 'Fraud Edges'
#         }
#         legend_patches = [plt.Line2D([0], [0], marker='o', color='w', label=label,
#                                       markersize=10, markerfacecolor=color) for color, label in legend_labels.items()]
#         plt.legend(handles=legend_patches, loc="upper left")
#         plt.title("Fraudulent and Non-Fraudulent Transactions Subgraph")
#         plt.show()
    
#     except Exception as e:
#         logging.error(f"Error visualizing graph: {e}")

# # Example usage
# graph_data_path = 'artifacts/graph_construction/graph_data.pt'
# transformed_data_path = 'artifacts/graph_construction/node_mapped_data.csv'
# visualize_fraud_subgraph(graph_data_path, transformed_data_path, num_transactions=3, fraud_ratio=0.1)

In [12]:
# import torch
# import networkx as nx
# import matplotlib.pyplot as plt
# import pandas as pd
# import random
# import logging

# def visualize_fraud_subgraph(graph_data_path, transformed_data_path, num_transactions=500, fraud_ratio=0.1):
#     """
#     Visualizes a subgraph containing customers, merchants, and transactions,
#     ensuring ~10% fraudulent transactions.
#     """
#     try:
#         # Load graph data
#         data = torch.load(graph_data_path)
#         df = pd.read_csv(transformed_data_path)

#         # Identify fraud and non-fraud transactions
#         fraud_transactions = df[df['is_fraud'] == 1]['transaction_node'].tolist()
#         non_fraud_transactions = df[df['is_fraud'] == 0]['transaction_node'].tolist()

#         # Determine how many fraud transactions to include
#         num_fraud = min(len(fraud_transactions), int(num_transactions * fraud_ratio))
#         num_non_fraud = min(len(non_fraud_transactions), num_transactions - num_fraud)

#         # Select transactions
#         selected_fraud = random.sample(fraud_transactions, num_fraud)
#         selected_non_fraud = random.sample(non_fraud_transactions, num_non_fraud)
#         selected_transactions = set(selected_fraud + selected_non_fraud)

#         print(f"Selected Fraud Transactions: {selected_fraud}")
#         print(f"Selected Non-Fraud Transactions: {selected_non_fraud}")

#         # Extract relevant edges
#         edges = []
#         card_to_transaction = data["card", "transacts", "transaction"].edge_index.numpy()
#         transaction_to_merchant = data["transaction", "occurs_at", "merchant"].edge_index.numpy()

#         # Get node IDs for card, transaction, and merchant
#         card_n_ids = data["card"].n_id.numpy()
#         transaction_n_ids = data["transaction"].n_id.numpy()
#         merchant_n_ids = data["merchant"].n_id.numpy()

#         # Create a mapping from transaction_node in df to transaction_n_ids
#         transaction_map = dict(zip(df['transaction_node'], range(len(transaction_n_ids))))

#         # Filter edges that involve selected transactions
#         for src, dst in zip(*card_to_transaction):
#             if transaction_n_ids[dst] in [transaction_n_ids[transaction_map[t]] for t in selected_transactions]:
#                 edges.append((card_n_ids[src], transaction_n_ids[dst]))

#         for src, dst in zip(*transaction_to_merchant):
#             if transaction_n_ids[src] in [transaction_n_ids[transaction_map[t]] for t in selected_transactions]:
#                 edges.append((transaction_n_ids[src], merchant_n_ids[dst]))

#         print(f"Edges: {edges}")

#         # Create the graph
#         G = nx.DiGraph()
#         G.add_edges_from(edges)

#         # Node types and sizes
#         node_colors = {}
#         node_sizes = {}

#         # Get card and merchant node IDs from node_stores
#         card_nodes = set(card_n_ids)
#         merchant_nodes = set(merchant_n_ids)

#         print(f"Card Nodes: {card_nodes}")
#         print(f"Merchant Nodes: {merchant_nodes}")

#         for node in G.nodes:
#             if node in card_nodes:
#                 node_colors[node] = 'skyblue'  # Customers (Cards)
#                 node_sizes[node] = 800  # Large
#             elif node in merchant_nodes:
#                 node_colors[node] = 'lightgreen'  # Merchants
#                 node_sizes[node] = 800  # Large
#             elif node in transaction_n_ids:
#                 if df[df['transaction_node'] == list(transaction_map.keys())[list(transaction_map.values()).index(list(transaction_n_ids).index(node))]['is_fraud'].values[0] == 1]:
#                     node_colors[node] = 'red'
#                 else:
#                     node_colors[node] = 'orange'
#                 node_sizes[node] = 400

#         # Edge colors (fraud vs non-fraud)
#         edge_colors = []
#         for src, dst in edges:
#             if dst in transaction_n_ids:
#                 if df[df['transaction_node'] == list(transaction_map.keys())[list(transaction_map.values()).index(list(transaction_n_ids).index(dst))]['is_fraud'].values[0] == 1]:
#                     edge_colors.append('red')
#                 else:
#                     edge_colors.append('black')

#         print(f"Edge Colors: {edge_colors}")

#         # Graph layout
#         pos = nx.spring_layout(G, k=0.3, iterations=50)

#         # Draw graph
#         plt.figure(figsize=(14, 10))
#         nx.draw(
#             G, pos, with_labels=True,
#             node_color=[node_colors[n] for n in G.nodes],
#             node_size=[node_sizes[n] for n in G.nodes],
#             edge_color=edge_colors, width=1.5,
#             font_size=8, arrows=True
#         )

#         # Legend
#         # A list of (color, label) pairs: a dict would drop one of the two 'red' entries
#         legend_labels = [
#             ('skyblue', 'Customers'),
#             ('lightgreen', 'Merchants'),
#             ('orange', 'Non-Fraud Transactions'),
#             ('red', 'Fraud Transactions'),
#             ('black', 'Non-Fraud Edges'),
#             ('red', 'Fraud Edges'),
#         ]
#         legend_patches = [plt.Line2D([0], [0], marker='o', color='w', label=label,
#                                     markersize=10, markerfacecolor=color) for color, label in legend_labels]
#         plt.legend(handles=legend_patches, loc="upper left")
#         plt.title("Fraudulent and Non-Fraudulent Transactions Subgraph")
#         plt.show()

#     except Exception as e:
#         logging.error(f"Error visualizing graph: {e}")

# # Example usage
# graph_data_path = 'artifacts/graph_construction/graph_data.pt'
# transformed_data_path = 'artifacts/graph_construction/node_mapped_data.csv'
# visualize_fraud_subgraph(graph_data_path, transformed_data_path, num_transactions=10, fraud_ratio=0.1)
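
The fraud check in the cell above searches the mapping and filters the DataFrame once per node and once per edge. A minimal sketch of an alternative is to build a single lookup from graph id to fraud flag and reuse it. The toy `df` and `transaction_n_ids` below are hypothetical stand-ins, and the sketch assumes the graph ids are stored in the same order as the rows of the CSV.

```python
import pandas as pd

# Hypothetical toy data standing in for node_mapped_data.csv
df = pd.DataFrame({
    "transaction_node": ["t0", "t1", "t2"],
    "is_fraud": [0, 1, 0],
})

# Hypothetical graph-side ids for the same transactions.
# Assumption: they are aligned with the DataFrame rows.
transaction_n_ids = [100, 101, 102]

# Build the graph id -> fraud flag lookup once
fraud_by_graph_id = dict(zip(transaction_n_ids, df["is_fraud"].astype(int)))


def transaction_color(graph_id):
    """Return 'red' for a fraudulent transaction node and 'orange' otherwise."""
    return "red" if fraud_by_graph_id[graph_id] == 1 else "orange"


colors = {gid: transaction_color(gid) for gid in transaction_n_ids}
print(colors)
```

Each color lookup is then a single dictionary access, which matters once the subgraph grows beyond a handful of transactions.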

In [13]:
# import torch
# import networkx as nx
# import matplotlib.pyplot as plt
# import pandas as pd
# import logging
# import numpy as np
# import random  # needed for random.sample below

# def visualize_fraud_subgraph_small(graph_data_path, transformed_data_path, num_fraud=1, neighborhood_depth=1):
#     """
#     Visualizes a small subgraph with all arrows pointing to fraudulent transactions in red, excluding self-loops.
#     """
#     try:
#         data = torch.load(graph_data_path)
#         df = pd.read_csv(transformed_data_path)

#         fraud_transactions = df[df['is_fraud'] == 1]['transaction_node'].tolist()
#         selected_fraud = random.sample(fraud_transactions, min(num_fraud, len(fraud_transactions)))

#         edges = []
#         card_to_transaction = data["card", "transacts", "transaction"].edge_index.numpy()
#         transaction_to_merchant = data["transaction", "occurs_at", "merchant"].edge_index.numpy()

#         card_n_ids = data["card"].n_id.numpy()
#         transaction_n_ids = data["transaction"].n_id.numpy()
#         merchant_n_ids = data["merchant"].n_id.numpy()

#         transaction_map = dict(zip(transaction_n_ids, df['transaction_node']))

#         selected_nodes = set()
#         for transaction_node_id in selected_fraud:
#             transaction_graph_id = list(transaction_map.keys())[list(transaction_map.values()).index(transaction_node_id)]
#             selected_nodes.add(transaction_graph_id)

#             def expand_neighborhood(node_id, depth):
#                 if depth == 0:
#                     return
#                 for src, dst in zip(*card_to_transaction):
#                     if transaction_n_ids[dst] == node_id:
#                         selected_nodes.add(card_n_ids[src])
#                         selected_nodes.add(transaction_n_ids[dst])
#                         expand_neighborhood(card_n_ids[src], depth - 1)
#                 for src, dst in zip(*transaction_to_merchant):
#                     if transaction_n_ids[src] == node_id:
#                         selected_nodes.add(merchant_n_ids[dst])
#                         selected_nodes.add(transaction_n_ids[src])
#                         expand_neighborhood(merchant_n_ids[dst], depth - 1)

#             expand_neighborhood(transaction_graph_id, neighborhood_depth)

#         for src, dst in zip(*card_to_transaction):
#             if card_n_ids[src] in selected_nodes and transaction_n_ids[dst] in selected_nodes:
#                 edges.append((card_n_ids[src], transaction_n_ids[dst]))
#         for src, dst in zip(*transaction_to_merchant):
#             if transaction_n_ids[src] in selected_nodes and merchant_n_ids[dst] in selected_nodes:
#                 edges.append((transaction_n_ids[src], merchant_n_ids[dst]))

#         G = nx.DiGraph()
#         G.add_edges_from([(src, dst) for src, dst in edges if src != dst])  # Exclude self-loops

#         node_colors = {}
#         node_sizes = {}
#         card_nodes = set(card_n_ids)
#         merchant_nodes = set(merchant_n_ids)

#         for node in G.nodes:
#             if node in card_nodes:
#                 node_colors[node] = 'skyblue'
#                 node_sizes[node] = 800
#             elif node in merchant_nodes:
#                 node_colors[node] = 'lightgreen'
#                 node_sizes[node] = 800
#             elif node in transaction_n_ids:
#                 transaction_node_id = transaction_map[node]
#                 is_fraud = df[df['transaction_node'] == transaction_node_id]['is_fraud'].values[0]
#                 if is_fraud == 1:
#                     node_colors[node] = 'red'
#                     node_sizes[node] = 400
#                 else:
#                     node_colors[node] = 'lightcoral'
#                     node_sizes[node] = 400

#         for src, dst in G.edges():
#             if dst in G.nodes():
#                 if node_colors.get(dst) != 'red':
#                     node_colors[dst] = 'lightgreen'

#         fraudulent_node_ids = [node for node, color in node_colors.items() if color == 'red']
#         edge_colors = ['red' if dst in fraudulent_node_ids or src in fraudulent_node_ids else 'black' for src, dst in G.edges()]

#         # Debugging output
#         print("Fraudulent Node IDs:", fraudulent_node_ids)
#         print("Edges:", G.edges())
#         print("Node Colors:", node_colors)
#         print("Edge Colors:", edge_colors)

#         # Plotting code
#         plt.figure(figsize=(8, 6))
#         pos = nx.spring_layout(G)
#         nx.draw(G, pos, with_labels=True, node_color=[node_colors[node] for node in G.nodes],
#                 node_size=[node_sizes[node] for node in G.nodes], font_size=8, font_color='black', arrows=True,
#                 edge_color=edge_colors)

#         legend_elements = [plt.Line2D([0], [0], marker='o', color='w', markerfacecolor='skyblue', markersize=10, label='Customers'),
#                            plt.Line2D([0], [0], marker='o', color='w', markerfacecolor='lightgreen', markersize=10, label='Receivers'),
#                            plt.Line2D([0], [0], marker='o', color='w', markerfacecolor='red', markersize=10, label='Fraud Transactions'),
#                            plt.Line2D([0], [0], marker='o', color='w', markerfacecolor='lightcoral', markersize=10, label='Non-Fraud Transactions')]
#         plt.legend(handles=legend_elements)

#         plt.title("Small Fraudulent Transaction Subgraph")
#         plt.show()

#     except Exception as e:
#         logging.error(f"Error visualizing graph: {e}")

# graph_data_path = 'artifacts/graph_construction/graph_data.pt'
# transformed_data_path = 'artifacts/graph_construction/node_mapped_data.csv'
# visualize_fraud_subgraph_small(graph_data_path, transformed_data_path)

In [14]:
# import torch
# import networkx as nx
# import matplotlib.pyplot as plt
# import pandas as pd
# import logging
# import numpy as np
# import random  # needed for random.sample below

# def visualize_fraud_subgraph_small(graph_data_path, transformed_data_path, num_fraud=2, neighborhood_depth=5):
#     """
#     Visualizes a small subgraph with all arrows pointing to fraudulent transactions in red, excluding self-loops, with an expanded neighborhood and improved visualization.
#     """
#     try:
#         data = torch.load(graph_data_path)
#         df = pd.read_csv(transformed_data_path)

#         fraud_transactions = df[df['is_fraud'] == 1]['transaction_node'].tolist()
#         selected_fraud = random.sample(fraud_transactions, min(num_fraud, len(fraud_transactions)))

#         edges = []
#         card_to_transaction = data["card", "transacts", "transaction"].edge_index.numpy()
#         transaction_to_merchant = data["transaction", "occurs_at", "merchant"].edge_index.numpy()

#         card_n_ids = data["card"].n_id.numpy()
#         transaction_n_ids = data["transaction"].n_id.numpy()
#         merchant_n_ids = data["merchant"].n_id.numpy()

#         transaction_map = dict(zip(transaction_n_ids, df['transaction_node']))

#         selected_nodes = set()
#         for transaction_node_id in selected_fraud:
#             transaction_graph_id = list(transaction_map.keys())[list(transaction_map.values()).index(transaction_node_id)]
#             selected_nodes.add(transaction_graph_id)

#             def expand_neighborhood(node_id, depth):
#                 if depth == 0:
#                     return
#                 for src, dst in zip(*card_to_transaction):
#                     if transaction_n_ids[dst] == node_id:
#                         selected_nodes.add(card_n_ids[src])
#                         selected_nodes.add(transaction_n_ids[dst])
#                         expand_neighborhood(card_n_ids[src], depth - 1)
#                 for src, dst in zip(*transaction_to_merchant):
#                     if transaction_n_ids[src] == node_id:
#                         selected_nodes.add(merchant_n_ids[dst])
#                         selected_nodes.add(transaction_n_ids[src])
#                         expand_neighborhood(merchant_n_ids[dst], depth - 1)

#             expand_neighborhood(transaction_graph_id, neighborhood_depth)

#         for src, dst in zip(*card_to_transaction):
#             if card_n_ids[src] in selected_nodes and transaction_n_ids[dst] in selected_nodes:
#                 edges.append((card_n_ids[src], transaction_n_ids[dst]))
#         for src, dst in zip(*transaction_to_merchant):
#             if transaction_n_ids[src] in selected_nodes and merchant_n_ids[dst] in selected_nodes:
#                 edges.append((transaction_n_ids[src], merchant_n_ids[dst]))

#         G = nx.DiGraph()
#         G.add_edges_from([(src, dst) for src, dst in edges if src != dst])

#         node_colors = {}
#         node_sizes = {}
#         card_nodes = set(card_n_ids)
#         merchant_nodes = set(merchant_n_ids)

#         for node in G.nodes:
#             if node in card_nodes:
#                 node_colors[node] = 'skyblue'
#                 node_sizes[node] = 400  # Reduced node size
#             elif node in merchant_nodes:
#                 node_colors[node] = 'lightgreen'
#                 node_sizes[node] = 400  # Reduced node size
#             elif node in transaction_n_ids:
#                 transaction_node_id = transaction_map[node]
#                 is_fraud = df[df['transaction_node'] == transaction_node_id]['is_fraud'].values[0]
#                 if is_fraud == 1:
#                     node_colors[node] = 'red'
#                     node_sizes[node] = 200  # Reduced node size
#                 else:
#                     node_colors[node] = 'lightcoral'
#                     node_sizes[node] = 200  # Reduced node size

#         for src, dst in G.edges():
#             if dst in G.nodes():
#                 if node_colors.get(dst) != 'red':
#                     node_colors[dst] = 'lightgreen'

#         fraudulent_node_ids = [node for node, color in node_colors.items() if color == 'red']
#         edge_colors = ['red' if dst in fraudulent_node_ids or src in fraudulent_node_ids else 'black' for src, dst in G.edges()]

#         # Debugging output
#         print("Fraudulent Node IDs:", fraudulent_node_ids)
#         print("Edges:", G.edges())
#         print("Node Colors:", node_colors)
#         print("Edge Colors:", edge_colors)

#         # Plotting code
#         plt.figure(figsize=(18, 12))  # Increased figure size
#         pos = nx.kamada_kawai_layout(G)  # Use kamada_kawai_layout
#         nx.draw(G, pos, with_labels=False, node_color=[node_colors[node] for node in G.nodes],
#                 node_size=[node_sizes[node] for node in G.nodes], font_size=8, font_color='black', arrows=True,
#                 edge_color=edge_colors, width=0.5)  # Reduced edge width

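#         # Label only the fraudulent nodes and the successors of the first one (if there is one)
#         first_fraud_neighbors = set(G.neighbors(fraudulent_node_ids[0])) if fraudulent_node_ids else set()
#         labels = {node: node for node in G.nodes if node in fraudulent_node_ids or node in first_fraud_neighbors}
#         nx.draw_networkx_labels(G, pos, labels=labels, font_size=8)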

#         legend_elements = [plt.Line2D([0], [0], marker='o', color='w', markerfacecolor='skyblue', markersize=10, label='Customers'),
#                            plt.Line2D([0], [0], marker='o', color='w', markerfacecolor='lightgreen', markersize=10, label='Receivers'),
#                            plt.Line2D([0], [0], marker='o', color='w', markerfacecolor='red', markersize=10, label='Fraud Transactions'),
#                            plt.Line2D([0], [0], marker='o', color='w', markerfacecolor='lightcoral', markersize=10, label='Non-Fraud Transactions')]
#         plt.legend(handles=legend_elements)

#         plt.title("Expanded Fraudulent Transaction Subgraph (Improved Visualization)")
#         plt.show()

#     except Exception as e:
#         logging.error(f"Error visualizing graph: {e}")

# graph_data_path = 'artifacts/graph_construction/graph_data.pt'
# transformed_data_path = 'artifacts/graph_construction/node_mapped_data.csv'
# visualize_fraud_subgraph_small(graph_data_path, transformed_data_path)
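
The recursive `expand_neighborhood` helper above rescans every edge at each level. As a rough sketch of a different technique (not the method used in the cells above), the same kind of neighborhood can be collected with NetworkX's `ego_graph`, which does a hop-limited breadth-first search. The toy graph, the node names, and the `fraud_neighborhood` helper are hypothetical and only illustrate the idea.

```python
import networkx as nx

# Hypothetical toy graph: card -> transaction -> merchant
G = nx.DiGraph()
G.add_edges_from([
    ("card_A", "txn_1"), ("txn_1", "merchant_X"),
    ("card_A", "txn_2"), ("txn_2", "merchant_Y"),
    ("card_B", "txn_3"), ("txn_3", "merchant_Y"),
])
fraud_txns = {"txn_1"}


def fraud_neighborhood(graph, seeds, radius):
    """Return the directed subgraph of all nodes within `radius` hops of any seed.

    Hops are counted ignoring edge direction (undirected=True), so both the
    card behind a transaction and the merchant it points to are reached.
    """
    keep = set()
    for seed in seeds:
        keep.update(nx.ego_graph(graph, seed, radius=radius, undirected=True).nodes)
    return graph.subgraph(keep).copy()


sub = fraud_neighborhood(G, fraud_txns, radius=1)

# Same rule as above: an edge is red if either endpoint is a fraudulent transaction
edge_colors = ["red" if u in fraud_txns or v in fraud_txns else "black"
               for u, v in sub.edges()]
print(sorted(sub.nodes), edge_colors)
```

The resulting `sub` and `edge_colors` can be passed to `nx.draw` in the same way as `G` and `edge_colors` in the cells above.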