This script demonstrates the construction of an agentic application for analyzing a synthetic financial fraud dataset.
It integrates ArangoDB for graph storage, NVIDIA cuGraph and NetworkX for graph analytics, and LangChain/LangGraph for
natural language query processing. The application supports querying a fraud ring graph using AQL (ArangoDB Query Language)
and NetworkX algorithms, with results translated back into natural language.

### Step 0: Package Installation & Setup

In [1]:
# Install required packages for graph processing, database interaction, and language model integration
!pip install nx-arangodb nx-cugraph-cu12 --extra-index-url https://pypi.nvidia.com langchain langchain-community langchain-groq langgraph

Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com
Collecting nx-arangodb
  Downloading nx_arangodb-1.3.0-py3-none-any.whl.metadata (9.3 kB)
Collecting langchain-community
  Downloading langchain_community-0.3.19-py3-none-any.whl.metadata (2.4 kB)
Collecting langchain-groq
  Downloading langchain_groq-0.2.5-py3-none-any.whl.metadata (2.6 kB)
Collecting langgraph
  Downloading langgraph-0.3.5-py3-none-any.whl.metadata (17 kB)
Collecting networkx<=3.4,>=3.0 (from nx-arangodb)
  Downloading networkx-3.4-py3-none-any.whl.metadata (6.3 kB)
Collecting phenolrs~=0.5 (from nx-arangodb)
  Downloading phenolrs-0.5.9-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.7 kB)
Collecting python-arango~=8.1 (from nx-arangodb)
  Downloading python_arango-8.1.6-py3-none-any.whl.metadata (8.2 kB)
Collecting adbnx-adapter~=5.0.5 (from nx-arangodb)
  Downloading adbnx_adapter-5.0.6-py3-none-any.whl.metadata (21 kB)
Collecting langchain-core<1.0.0,>=0.3.35 (from langcha

In [2]:
# Verify NVIDIA GPU availability (optional, for cuGraph acceleration)
!nvidia-smi
!nvcc --version

Mon Mar 10 15:01:40 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   47C    P8             11W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [3]:
# Import necessary libraries
import os
import re
from random import randint
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from arango import ArangoClient
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import MemorySaver
from langchain_groq import ChatGroq
from langchain_community.graphs import ArangoGraph
from langchain_community.chains.graph_qa.arangodb import ArangoGraphQAChain
from langchain_core.tools import tool
from pydantic import BaseModel, Field
from typing import Literal
from langchain_core.prompts import ChatPromptTemplate

In [4]:
os.environ["NX_CUGRAPH_AUTOCONFIG"] = "True"
import nx_arangodb as nxadb  # Must import after setting environment variable

[15:01:46 +0000] [INFO]: NetworkX-cuGraph is available.
INFO:nx_arangodb:NetworkX-cuGraph is available.


In [None]:
# TODO: Replace with actual credentials

os.environ["GROQ_API_KEY"] = ""
os.environ["ARANGODB_URL"] = ""
os.environ["ARANGODB_USERNAME"] = ""
os.environ["ARANGODB_PASSWORD"] = ""
os.environ["DB_NAME"] = ""

In [13]:
# Connect to ArangoDB cloud database

arangodb_url = os.getenv("ARANGODB_URL")
arangodb_username = os.getenv("ARANGODB_USERNAME")
arangodb_password = os.getenv("ARANGODB_PASSWORD")
db_name = os.getenv("DB_NAME")

db = ArangoClient(hosts=arangodb_url).db(
    db_name,                    # Database name (previously read but never passed)
    username=arangodb_username,
    password=arangodb_password,
    verify=True
)
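
The credentials above are read from environment variables, so an empty or missing value only surfaces later as a connection error. A minimal sketch of an up-front check, assuming a hypothetical helper named `require_env` (not part of the original notebook):

```python
import os

def require_env(names):
    """Return the requested environment variables as a dict, failing fast on gaps.

    `require_env` is a hypothetical helper used for illustration only.
    """
    values = {name: os.environ.get(name, "") for name in names}
    missing = [name for name, value in values.items() if not value.strip()]
    if missing:
        raise RuntimeError(
            f"Missing or empty environment variables: {', '.join(missing)}"
        )
    return values
```

It could be called as `require_env(["ARANGODB_URL", "ARANGODB_USERNAME", "ARANGODB_PASSWORD", "DB_NAME"])` before the client is created.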

### Step 1: Choose & prepare your dataset for NetworkX

In [None]:
# Load synthetic fraud dataset from CSV
fraud_ring_graph = pd.read_csv(
    "/content/fraud_23pct_synthetic_dataset_fixed.csv",
)

fraud_ring_graph

Unnamed: 0,Transaction_ID,Sender_account,Sender_age,Sender_is_elderly,Receiver_account,Receiver_age,Receiver_is_elderly,Amount,Date,Is_fraud,Sender_gender,Receiver_gender,Type_of_fraud,Method_of_contact,Loss,Time_of_day,Resolution_status
0,TXN-ZGH4A9ZJ,6953697,38,0,206718272,21,0,457.46,2023-05-15,0,Female,Male,Legitimate,Direct,0.00,Morning,
1,TXN-CIJQN6C4,89029013,20,0,30635852,62,1,407.64,2023-03-22,0,Male,Male,Legitimate,Direct,0.00,Morning,
2,TXN-NNS9PZEM,674715057,69,1,453651788,44,0,657.94,2023-12-23,1,Female,Male,Investment Fraud,Email,657.94,Afternoon,Reported
3,TXN-UE6EU8UI,896619255,73,1,284968123,33,0,240.21,2023-11-19,0,Female,Female,Legitimate,Direct,0.00,Evening,
4,TXN-3LKEISJE,175484941,64,1,435131719,21,0,878.00,2023-11-08,0,Male,Female,Legitimate,Direct,0.00,Morning,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,TXN-44JCK2ZC,864406382,25,0,166950383,37,0,124.67,2023-08-30,0,Female,Male,Legitimate,Direct,0.00,Evening,
9996,TXN-LYFAY0XA,727600539,83,1,578909930,33,0,146.02,2023-09-07,1,Male,Male,Tech Support Scam,Phone,146.02,Morning,Reported
9997,TXN-X6ZLIM1B,505988121,29,0,463837684,62,1,57.42,2023-01-13,0,Female,Male,Legitimate,Direct,0.00,Afternoon,
9998,TXN-JACEN74I,149578017,41,0,596499669,21,0,69.78,2023-12-23,0,Female,Female,Legitimate,Direct,0.00,Night,


### Step 2: Convert and Load Graph Data into NetworkX

In [None]:
# Create a MultiDiGraph from the dataset, representing transactions between accounts

G = nx.from_pandas_edgelist(
    fraud_ring_graph,
    source='Sender_account',        # Source node (account)
    target='Receiver_account',      # Target node (account)
    edge_attr=['Transaction_ID', 'Amount', 'Date', 'Is_fraud', 'Type_of_fraud', 'Loss'],  # Edge properties
    create_using=nx.MultiDiGraph()  # Allows multiple directed edges between nodes
)

# Add node attributes (e.g., age, gender) from the dataset
node_attributes = {}
for _, row in fraud_ring_graph.iterrows():
    sender = row['Sender_account']
    receiver = row['Receiver_account']

    # Initialize sender node attributes if not already present
    if sender not in node_attributes:
        node_attributes[sender] = {
            'account': str(sender),         # Explicit account number as string
            'age': row['Sender_age'],
            'is_elderly': row['Sender_is_elderly'],
            'gender': row['Sender_gender']
        }

    # Initialize receiver node attributes if not already present
    if receiver not in node_attributes:
        node_attributes[receiver] = {
            'account': str(receiver),       # Explicit account number as string
            'age': row['Receiver_age'],
            'is_elderly': row['Receiver_is_elderly'],
            'gender': row['Receiver_gender']
        }

# Apply node attributes to the graph
nx.set_node_attributes(G, node_attributes)

# Display attributes of a sample node for verification
sample_node = list(G.nodes())[0]
print(f"Sample node ({sample_node}) attributes: {G.nodes[sample_node]}")
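
The loop above keeps the first attributes seen for each account, whether it appears as a sender or as a receiver. A small stand-alone sketch of that first-occurrence-wins rule, using plain dictionaries and made-up rows rather than the real dataset (`collect_node_attributes` is a hypothetical name):

```python
def collect_node_attributes(rows):
    """Map each account to the attributes from the first row that mentions it."""
    node_attributes = {}
    for row in rows:
        for role in ("Sender", "Receiver"):
            account = row[f"{role}_account"]
            # setdefault leaves an existing entry untouched, so the first row wins
            node_attributes.setdefault(account, {
                "account": str(account),
                "age": row[f"{role}_age"],
                "is_elderly": row[f"{role}_is_elderly"],
                "gender": row[f"{role}_gender"],
            })
    return node_attributes

# Illustrative rows (not taken from the dataset)
rows = [
    {"Sender_account": 1, "Sender_age": 38, "Sender_is_elderly": 0, "Sender_gender": "Female",
     "Receiver_account": 2, "Receiver_age": 62, "Receiver_is_elderly": 1, "Receiver_gender": "Male"},
    {"Sender_account": 2, "Sender_age": 63, "Sender_is_elderly": 1, "Sender_gender": "Male",
     "Receiver_account": 1, "Receiver_age": 38, "Receiver_is_elderly": 0, "Receiver_gender": "Female"},
]
attrs = collect_node_attributes(rows)
print(attrs[2]["age"])  # 62: the later row's age of 63 is ignored
```

If an account's attributes differ between rows, this rule silently keeps the earliest values, which is worth keeping in mind when the data is inconsistent.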

In [None]:
# Visualize the graph (optional)
plot_options = {"node_size": 10, "with_labels": False, "width": 0.15}
pos = nx.spring_layout(G, iterations=15, seed=1721)  # Layout for visualization
fig, ax = plt.subplots(figsize=(15, 9))
nx.draw_networkx(G, pos=pos, ax=ax, **plot_options)
plt.show()

### Step 3: Persist the Graph in ArangoDB

In [19]:
# Connect to the graph in the ArangoDB cloud database.
# The graph already exists here, so the upload arguments are commented out.
G_adb = nxadb.Graph(
    name="knowledge_graph",
    db=db,                     # ArangoDB connection
    # incoming_graph_data=G,   # Uncomment to upload the NetworkX graph
    # write_batch_size=500,    # Batch size for writing
    # overwrite_graph=True     # Overwrite existing graph if present
)

[15:06:41 +0000] [INFO]: Graph 'knowledge_graph' exists.
INFO:nx_arangodb:Graph 'knowledge_graph' exists.
[15:06:41 +0000] [INFO]: Default node type set to 'knowledge_graph_node'
INFO:nx_arangodb:Default node type set to 'knowledge_graph_node'


In [20]:
# Enable GPU acceleration for NetworkX algorithms run through the ArangoDB backend (if available)
nx.config.backends.arangodb.use_gpu = True

In [21]:
# Test the graph with a random AQL query (retrieve 3 random nodes)
result = G_adb.query("""
    FOR node IN knowledge_graph_node
        SORT RAND()
        LIMIT 3
        RETURN node
""")
print(list(result))

[{'_key': '62', '_id': 'knowledge_graph_node/62', '_rev': '_jVPJ2qu--c', 'account': '705436922', 'age': 53, 'is_elderly': 0, 'gender': 'Female'}, {'_key': '191', '_id': 'knowledge_graph_node/191', '_rev': '_jVPJ2qy--Z', 'account': '929363301', 'age': 41, 'is_elderly': 0, 'gender': 'Female'}, {'_key': '407', '_id': 'knowledge_graph_node/407', '_rev': '_jVPJ2q2-_d', 'account': '263863730', 'age': 80, 'is_elderly': 1, 'gender': 'Male'}]
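
The documents returned above carry ArangoDB system fields (`_key`, `_id`, `_rev`) alongside the account attributes. When only the domain attributes matter, for example before handing results to an LLM, the system fields can be filtered out. A minimal sketch, assuming a hypothetical helper `strip_system_fields`:

```python
def strip_system_fields(doc):
    """Return a copy of a document without keys that start with an underscore."""
    return {key: value for key, value in doc.items() if not key.startswith("_")}

# One of the documents from the output above
doc = {'_key': '62', '_id': 'knowledge_graph_node/62', '_rev': '_jVPJ2qu--c',
       'account': '705436922', 'age': 53, 'is_elderly': 0, 'gender': 'Female'}
print(strip_system_fields(doc))
```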


### Step 4: Build the Agentic App with LangChain & LangGraph

In [22]:
# Create an ArangoGraph wrapper for LangChain integration
arango_graph = ArangoGraph(db)

In [24]:
# Define a tool to convert natural language to AQL and back to text
@tool
def text_to_aql_to_text(query: str):
    """
    Translates a natural language query into an AQL query, executes it on the ArangoDB graph,
    and converts the result back to natural language.

    Args:
        query (str): Natural language query (e.g., "Show details of account 12345.")

    Returns:
        str: Result in natural language format

    Examples:
        "Show all transactions above $100."
        "Find all accounts linked to elderly individuals."
    """
    # Initialize a low-temperature LLM for more consistent (though not fully deterministic) AQL generation
    llm = ChatGroq(temperature=0.2, model_name="llama-3.3-70b-versatile")

    # Create an AQL query chain with examples for better performance
    chain = ArangoGraphQAChain.from_llm(
        llm=llm,
        graph=arango_graph,
        verbose=True,
        allow_dangerous_requests=True,
        top_k=20,                     # Limit to top 20 results
        max_aql_generation_attempts=5,  # Retry AQL generation up to 5 times
        aql_examples="""
        # Example: Transactions with loss > $100
        FOR startNode IN knowledge_graph_node
          FOR v, e IN 1..1 OUTBOUND startNode._id GRAPH 'knowledge_graph'
            FILTER e.Loss > 100
            RETURN {
              transaction_id: e.Transaction_ID,
              amount: e.Amount,
              date: e.Date,
              is_fraud: e.Is_fraud,
              type_of_fraud: e.Type_of_fraud,
              loss: e.Loss,
              from_account: startNode.account,
              to_account: v.account
            }
        # Example: Fraudulent transactions involving elderly
        FOR startNode IN knowledge_graph_node
            FILTER startNode.age > 60
            FOR v, e IN 1..1 OUTBOUND startNode._id GRAPH 'knowledge_graph'
                FILTER e.Is_fraud == 1
                RETURN {
                    transaction_id: e.Transaction_ID,
                    amount: e.Amount,
                    date: e.Date,
                    type_of_fraud: e.Type_of_fraud,
                    loss: e.Loss,
                    elderly_account: startNode.account
                }
        """
    )

    # Execute the query and return the result
    result = chain.invoke(query)
    return str(result["result"])


In [51]:
@tool
def text_to_nx_algorithm_to_text(query: str) -> str:
    """
    This tool invokes a NetworkX algorithm on the ArangoDB Graph.
    It accepts a natural language query, determines the best algorithm to execute,
    generates Python code to answer the query using the G_adb NetworkX graph, executes the code,
    and then synthesizes a concise natural language answer.

    IMPORTANT:
    - Use this tool only for queries that require graph analytics (e.g., centrality, shortest path, clustering).
    - For simple traversals solvable via AQL, do not use this tool.
    """
    import re

    # LLM instance for generating and refining Python code
    llm = ChatGroq(temperature=0.2, model_name="llama-3.3-70b-versatile")

    # Step 1: Fetch every edge together with its sender and receiver attributes
    aql_query = """
            FOR edge IN knowledge_graph_node_to_knowledge_graph_node
                FOR sender IN knowledge_graph_node
                    FILTER sender._id == edge._from
                FOR receiver IN knowledge_graph_node
                    FILTER receiver._id == edge._to
                    RETURN {
                        source: sender.account,
                        target: receiver.account,
                        edge_attrs: {
                            Transaction_ID: edge.Transaction_ID,
                            Amount: edge.Amount,
                            Date: edge.Date,
                            Is_fraud: edge.Is_fraud,
                            Type_of_fraud: edge.Type_of_fraud,
                            Loss: edge.Loss
                        },
                        sender_attrs: {
                            age: sender.age,
                            is_elderly: sender.is_elderly,
                            gender: sender.gender
                        },
                        receiver_attrs: {
                            age: receiver.age,
                            is_elderly: receiver.is_elderly,
                            gender: receiver.gender
                        }
                    }
        """
    result = G_adb.query(aql_query)
    edges_data = list(result)
    if not edges_data:
        return "No data found in the graph to analyze."

    # Convert to DataFrame and build NetworkX graph
    df_edges = pd.DataFrame(edges_data)
    G = nx.from_pandas_edgelist(
        df_edges,
        source='source',
        target='target',
        edge_attr='edge_attrs',
        create_using=nx.MultiDiGraph()
    )
    # Add node attributes
    node_attrs = {}
    for _, row in df_edges.iterrows():
        node_attrs[row['source']] = row['sender_attrs']
        node_attrs[row['target']] = row['receiver_attrs']
    nx.set_node_attributes(G, node_attrs)

    # Step 2: Generate NetworkX code using LLM
    prompt = f"""
    Given a NetworkX `MultiDiGraph` named `G` with the following schema:
    - Nodes: identified by account number (string), with attributes 'age' (int), 'is_elderly' (0/1), 'gender' (Male/Female)
    - Edges: 'Transaction_ID' (string), 'Amount' (float), 'Date' (YYYY-MM-DD), 'Is_fraud' (0/1), 'Type_of_fraud' (string), 'Loss' (float)

    Note: Edge attributes are nested within a dictionary under the key 'edge_attrs'.

    Write Python code using NetworkX to answer this query: "{query}"
    The code should:
    - Use appropriate NetworkX algorithms (e.g., nx.pagerank, nx.betweenness_centrality, nx.clustering).
    - Access edge attributes using `data["edge_attrs"]["attribute_name"]`.
    - Store the final answer in a variable named `result`, in a simple format (e.g., dictionary, list).
    - Assume `G` is already defined as a MultiDiGraph; do not redefine it.
    Provide only the code block, no explanations.
    """
    code_response = llm.invoke(prompt)
    text_to_nx_cleaned = re.sub(r"^```python\n|```$", "", code_response.content, flags=re.MULTILINE).strip()
    print(text_to_nx_cleaned)

    # Step 3: Execute the generated code
    # A single namespace lets functions defined by the generated code call each other.
    # Caution: exec runs arbitrary LLM-generated code without any sandboxing.
    exec_namespace = {'nx': nx, 'G': G}
    try:
        exec(text_to_nx_cleaned, exec_namespace)
        result = exec_namespace.get('result', None)  # The code is expected to define a 'result' variable
        if result is None:
            return "The generated code did not produce a valid result."
    except Exception as e:
        return f"Error executing NetworkX code: {str(e)}"

    # Step 4: Convert result to natural language using LLM
    result_prompt = f"""
    Analyze the following result obtained from a NetworkX graph analysis:

    Result:
    {result}

    Context:
    The result is derived from executing a graph algorithm to answer the query: "{query}".
    Ensure the analysis considers the significance of the output, the impact of the values, and any patterns or insights present.

    Task:
    1. Interpret the result by identifying key insights, such as the most influential nodes, anomalies, or trends.
    2. Provide a concise and informative natural language response that conveys these insights clearly.
    3. If applicable, suggest possible interpretations or actions based on the findings.

    Ensure the response is well-structured, informative, and easy to understand.
    """
    nl_response = llm.invoke(result_prompt)
    return nl_response.content.strip()
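
Two steps in the tool above deserve care: stripping the Markdown fence from the LLM reply, and running the generated code with `exec`. The sketch below shows one possible way to handle both, assuming hypothetical helpers `extract_code` and `run_generated_code`. It executes the code in a single namespace and reads the answer from a variable named `result`. It does not sandbox anything, so the generated code still runs with full privileges.

```python
import re

FENCE = "`" * 3  # Markdown code fence marker
CODE_BLOCK_RE = re.compile(
    re.escape(FENCE) + r"(?:python|py)?[ \t]*\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)

def extract_code(reply: str) -> str:
    """Return the first fenced code block in a reply, or the whole reply if unfenced."""
    match = CODE_BLOCK_RE.search(reply)
    return (match.group(1) if match else reply).strip()

def run_generated_code(code: str, **variables):
    """Execute code in one shared namespace and return its `result` variable (or None)."""
    namespace = dict(variables)
    exec(code, namespace)  # WARNING: executes arbitrary code, no sandboxing
    return namespace.get("result")

# A made-up reply in which one generated function calls another
reply = (
    "Here is the code:\n"
    + FENCE + "python\n"
    + "def double(x):\n"
    + "    return 2 * x\n"
    + "def quadruple(x):\n"
    + "    return double(double(x))\n"
    + "result = quadruple(n)\n"
    + FENCE
)
print(run_generated_code(extract_code(reply), n=3))  # 12
```

The single namespace matters here: when `exec` receives separate globals and locals dictionaries, top-level definitions land in the locals, and `quadruple` would fail to find `double` at call time.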


In [26]:
from pydantic import BaseModel, Field
from typing import Literal
from langchain_core.prompts import ChatPromptTemplate

# Define a Pydantic model for tool selection
class QueryTool(BaseModel):
    selected_tools: Literal[
        '[text_to_aql_to_text]',
        '[text_to_nx_algorithm_to_text]',
        '[text_to_aql_to_text, text_to_nx_algorithm_to_text]',
    ] = Field(..., description="Determines which tool(s) to use.")

# Prompt for classifying queries
query_classifier_prompt = ChatPromptTemplate.from_messages([
    ("system", """
You are an AI assistant responsible for classifying user queries about a financial fraud graph to determine which tools should be used.

**Available Tools:**
1. `text_to_aql_to_text`: For retrieving structured data from ArangoDB (e.g., listing transactions).
2. `text_to_nx_algorithm_to_text`: For graph analytics using NetworkX (e.g., centrality, clustering).
3. Both in sequence: For hybrid queries needing retrieval and analysis.

**Instructions:**
- Analyze the query and decide which tool(s) to use.
- Return your decision in the format: `[selected_tools=<tool_selection>]`
  - Examples: `[selected_tools="[text_to_aql_to_text]"]`, `[selected_tools="[text_to_nx_algorithm_to_text]"]`, `[selected_tools="[text_to_aql_to_text, text_to_nx_algorithm_to_text]"]`
- Provide reasoning before your decision.

**Examples:**
- Query: "Show transactions flagged as fraudulent."
  - Reasoning: This is a data retrieval task.
  - Output: `[selected_tools="[text_to_aql_to_text]"]`

- Query: "Find the most influential fraudsters in the network." or "Find the top 5 accounts that have the most influence in fraudulent transactions, considering both direct and indirect connections."
  - Reasoning: This requires graph analytics.
  - Output: `[selected_tools="[text_to_nx_algorithm_to_text]"]`

- Query: "Who are the most influential fraudsters connected to elderly accounts?"
  - Reasoning: This needs data retrieval (fraudsters connected to elderly) and analytics (influence ranking).
  - Output: `[selected_tools="[text_to_aql_to_text, text_to_nx_algorithm_to_text]"]`
"""),
    ("human", "Query: {query}")
])

In [27]:
# Initialize classifier LLM
classifier_llm = ChatGroq(model="mixtral-8x7b-32768")
classifier = query_classifier_prompt | classifier_llm.with_structured_output(QueryTool)

In [28]:
def execute_tools(query: str, classifier_output: str) -> str:
    selected_tools = classifier_output.split("selected_tools=")[1]

    # Tools created with @tool are invoked via .invoke(); calling them directly is deprecated
    if selected_tools == '[text_to_aql_to_text]':
        return text_to_aql_to_text.invoke({"query": query})
    elif selected_tools == '[text_to_nx_algorithm_to_text]':
        return text_to_nx_algorithm_to_text.invoke({"query": query})
    elif selected_tools == '[text_to_aql_to_text, text_to_nx_algorithm_to_text]':
        # Hybrid: retrieve with AQL first, then pass the result to the analytics tool
        aql_result = text_to_aql_to_text.invoke({"query": query})
        nx_query = f"{query} based on the following data: {aql_result}"
        return text_to_nx_algorithm_to_text.invoke({"query": nx_query})
    else:
        return "Invalid tool selection."

# Define the two-agent query_graph function
def query_graph(query: str) -> str:
    # Step 1: Classify the query
    classifier_result = classifier.invoke({"query": query})
    classifier_output = f"selected_tools={classifier_result.selected_tools}"
    print(f"Classifier Decision: {classifier_output}")

    # Step 2: Execute the tools based on classification
    result = execute_tools(query, classifier_output)
    return result
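
The if/elif chain in `execute_tools` can also be written as a lookup table, which makes adding another tool a one-line change. A minimal sketch with stand-in functions (`run_aql`, `run_nx`, `run_both`, and `route` are placeholders for illustration, not the real tool implementations):

```python
def run_aql(query: str) -> str:
    # Placeholder for text_to_aql_to_text
    return f"AQL({query})"

def run_nx(query: str) -> str:
    # Placeholder for text_to_nx_algorithm_to_text
    return f"NX({query})"

def run_both(query: str) -> str:
    # Feed the retrieval result into the analytics step, as in the hybrid branch
    aql_result = run_aql(query)
    return run_nx(f"{query} based on the following data: {aql_result}")

ROUTES = {
    "[text_to_aql_to_text]": run_aql,
    "[text_to_nx_algorithm_to_text]": run_nx,
    "[text_to_aql_to_text, text_to_nx_algorithm_to_text]": run_both,
}

def route(query: str, selected_tools: str) -> str:
    """Dispatch to the handler registered for the classifier's selection."""
    handler = ROUTES.get(selected_tools)
    return handler(query) if handler else "Invalid tool selection."

print(route("count fraud", "[text_to_aql_to_text]"))  # AQL(count fraud)
```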

In [29]:
# Example queries to experiment with
# Note: each assignment below overwrites `query`, so keep only the one you want to run.
# Some may work, some may not!

# simple queries
query = "Show transactions flagged as fraudulent."
query = "Show all transactions above $100."
query = "Find all direct connections of account 500326438."
query = "Show details of account 287501362."
query = "Find all accounts linked to elderly individuals."
query = "How many fraud transactions involved elderly people?"

# complex queries
query = "Find the top 5 accounts that have the most influence in fraudulent transactions, considering both direct and indirect connections."

# hybrid queries
query = "Who are the most influential fraudsters connected to elderly accounts?"
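
Because each assignment above overwrites `query`, only the last string survives. One way to keep all examples available is to group them by category and loop over them. A sketch, where `run_examples` is a hypothetical helper and `ask` stands for any callable such as `query_graph`:

```python
EXAMPLE_QUERIES = {
    "simple": [
        "Show transactions flagged as fraudulent.",
        "Show all transactions above $100.",
        "How many fraud transactions involved elderly people?",
    ],
    "complex": [
        "Find the top 5 accounts that have the most influence in fraudulent transactions, "
        "considering both direct and indirect connections.",
    ],
    "hybrid": [
        "Who are the most influential fraudsters connected to elderly accounts?",
    ],
}

def run_examples(ask, categories=("simple",)):
    """Call `ask` on every query in the chosen categories and collect the answers."""
    answers = {}
    for category in categories:
        for query in EXAMPLE_QUERIES[category]:
            try:
                answers[query] = ask(query)
            except Exception as exc:  # keep going if one query fails
                answers[query] = f"Error: {exc}"
    return answers

# Demo with a stand-in for query_graph
answers = run_examples(lambda q: q.upper(), categories=("hybrid",))
print(answers)
```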

In [31]:
# testing a simple AQL query
query_graph("Find all accounts linked to elderly individuals.")

Classifier Decision: selected_tools=[text_to_aql_to_text]


> Entering new ArangoGraphQAChain chain...

AQL Query (1):
WITH knowledge_graph_node
FOR node IN knowledge_graph_node
  FILTER node.is_elderly == 1
  RETURN {
    account: node.account,
    age: node.age,
    gender: node.gender
  }
AQL Result:
[{'account': '30635852', 'age': 62, 'gender': 'Male'}, {'account': '674715057', 'age': 69, 'gender': 'Female'}, {'account': '896619255', 'age': 73, 'gender': 'Female'}, {'account': '175484941', 'age': 64, 'gender': 'Male'}, {'account': '991307112', 'age': 80, 'gender': 'Female'}, {'account': '876106572', 'age': 61, 'gender': 'Male'}, {'account': '936575986', 'age': 61, 'gender': 'Male'}, {'account': '739328947', 'age': 63, 'gender': 'Male'}, {'account': '925602213', 'age': 65, 'gender': 'Male'}, {'account': '69212356', 'age': 87, 'gender': 'Male'}, {'account': '796285932', 'age': 65, 'gender': 'Female'}, {'account': '47654552', 'age': 73, 'gender': 'Male'}, {'account': '731448745', 'age': 80, 'gender': 'Male'}, {'account': '756420240', 'age': 63, 'gender': '

"Based on the provided information, I will create a summary that responds to the user's input.\n\nThe user asked to find all accounts linked to elderly individuals. After analyzing the data, we found a total of 20 accounts that belong to individuals who are considered elderly, with ages ranging from 61 to 87. The accounts are associated with both male and female individuals. The accounts and their corresponding details are as follows:\n\n- Account 30635852: Male, 62 years old\n- Account 674715057: Female, 69 years old\n- Account 896619255: Female, 73 years old\n- Account 175486941: Male, 64 years old\n- Account 991307112: Female, 80 years old\n- Account 876106572: Male, 61 years old\n- Account 936575986: Male, 61 years old\n- Account 739328947: Male, 63 years old\n- Account 925602213: Male, 65 years old\n- Account 69212356: Male, 65 years old\n- Account 796285932: Female, 65 years old\n- Account 47654552: Male, 73 years old\n- Account 731448745: Male, 65 years old\n- Account 756420240:

In [38]:
query_graph("Show transactions flagged as fraudulent.")

Classifier Decision: selected_tools=[text_to_aql_to_text]


> Entering new ArangoGraphQAChain chain...
AQL Query (1):
FOR startNode IN knowledge_graph_node
  FOR v, e IN 1..1 OUTBOUND startNode._id GRAPH 'knowledge_graph'
    FILTER e.Is_fraud == 1
    RETURN {
      transaction_id: e.Transaction_ID,
      amount: e.Amount,
      date: e.Date,
      type_of_fraud: e.Type_of_fraud,
      loss: e.Loss,
      from_account: startNode.account,
      to_account: v.account
    }
AQL Result:
[{'transaction_id': 'TXN-AY7OV50E', 'amount': 156.28, 'date': '2023-07-22', 'type_of_fraud': 'Investment Fraud', 'loss': 156.28, 'from_account': '674715057', 'to_account': '271874491'}, {'transaction_id': 'TXN-TT1BLOW2', 'amount': 133.44, 'date': '2023-10-30', 'type_of_fraud': 'Investment Fraud', 'loss': 133.44, 'from_account': '674715057', 'to_account': '88946147'}, {'transaction_id': 'TXN-YS04SJU6', 'amount': 210.59, 'date': '2023-06-21', 'type_of_fraud': 'Investmen

"Here is a summary based on the AQL Result:\n\nThe following transactions have been flagged as fraudulent: \n\nThere are 20 transactions in total, with the majority being classified as Investment Fraud, totaling 17 transactions, and the remaining 3 transactions being classified as Government Impersonation. \n\nThe transactions flagged as Investment Fraud have amounts ranging from $14.86 to $1,793.97, with a total loss of $3,716.02. The transactions flagged as Government Impersonation have amounts ranging from $32.83 to $454.36, with a total loss of $1,529.93.\n\nThe transactions involve multiple accounts, with the most frequent from_account being '674715057' and the most frequent to_account being '453651788' and '21812641' for Investment Fraud, and '29955006' and '669143326' for Government Impersonation.\n\nThese transactions occurred between January 2023 and December 2023, with the earliest transaction occurring on January 4, 2023, and the latest on December 23, 2023. \n\nPlease revie

In [52]:
# running complex query

query_graph("Find the top 5 accounts that have the most influence in fraudulent transactions, considering both direct and indirect connections.")

Classifier Decision: selected_tools=[text_to_nx_algorithm_to_text]
import networkx as nx

def find_influential_accounts(G):
    # Create a subgraph with only fraudulent transactions
    fraudulent_G = nx.DiGraph()
    for u, v, data in G.edges(data=True):
        if data["edge_attrs"]["Is_fraud"] == 1:
            fraudulent_G.add_edge(u, v, **data)

    # Calculate PageRank centrality
    pr = nx.pagerank(fraudulent_G)

    # Get the top 5 accounts with the highest PageRank
    top_accounts = sorted(pr.items(), key=lambda x: x[1], reverse=True)[:5]

    return dict(top_accounts)

# Example usage:
result = find_influential_accounts(G)
print(result)
{'32165166': 0.0076513877327570035, '974807454': 0.007430477453974335, '522454127': 0.007339077507822518, '56926488': 0.007182651325928198, '803638839': 0.007161080102042163}
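
The scores above are PageRank values computed on the subgraph of fraudulent transactions. Note that the generated code builds a plain `nx.DiGraph`, so parallel transactions between the same pair of accounts collapse into a single edge. For intuition about what the scores mean, here is a minimal pure-Python power-iteration sketch of PageRank. It is an illustration under simplifying assumptions (unweighted edges, uniform teleport, damping factor 0.85, a fixed number of iterations), not the NetworkX implementation:

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Approximate PageRank for a directed graph given as (source, target) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: set() for n in nodes}
    for source, target in edges:
        out_links[source].add(target)  # a set, so duplicate edges collapse

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {n: (1.0 - damping + damping * dangling) / len(nodes) for n in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank

# Toy graph: accounts "a" and "b" both send to "c"
scores = pagerank([("a", "c"), ("b", "c")])
print(max(scores, key=scores.get))  # c
```

In a directed transaction graph, rank flows from sender to receiver, so high scores indicate accounts that receive from many (or from highly ranked) senders.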


"**Analysis of Influential Accounts in Fraudulent Transactions**\n\nThe result obtained from the NetworkX graph analysis reveals the top 5 accounts with the most influence in fraudulent transactions, considering both direct and indirect connections. The output is a dictionary where the keys represent the account IDs and the values represent their respective influence scores.\n\n**Key Insights:**\n\n1. **Most Influential Nodes:** The account with the ID '32165166' has the highest influence score (0.0076513877327570035), indicating that it has the most significant impact on fraudulent transactions. The top 5 accounts, in order of their influence, are:\n\t* '32165166' (0.0076513877327570035)\n\t* '974807454' (0.007430477453974335)\n\t* '522454127' (0.007339077507822518)\n\t* '56926488' (0.007182651325928198)\n\t* '803638839' (0.007161080102042163)\n2. **Trends and Patterns:** The influence scores are relatively close to each other, suggesting that these top 5 accounts have a similar level