# Knowledge Graph (KG) Retrieval
This document outlines the process of retrieving information from a Neo4j-based Knowledge Graph (KG) using Python. The KG represents software code elements and their relationships (e.g., which methods call which, what class a method belongs to, etc.).

---

## Step 1: Connecting to Neo4j
To begin, we establish a connection to a Neo4j database that contains our Knowledge Graph. This is done using a Python client library such as neo4j, py2neo, or any other supported driver. The database stores nodes (e.g., methods, classes, variables) and relationships (e.g., CALLS, CALLED_BY, BELONGS_TO, USES) which we will query.

---

## Step 2: Retrieving Relationships from the KG
We will implement a Python function that takes the following parameters to extract specific relationships from the KG:

### Input Parameters

1. `method_name` (`str`): 

The name of the method node in the Knowledge Graph. The type of this node is `METHOD`.

2. `calls` (`int`): 

Specifies the depth of forward traversal using the `CALLS` relationship.
For example, if `calls = 2`, we retrieve all methods called by `method_name` within 2 levels.

3. `called_by` (`int`): 

Specifies the depth of backward traversal using the `CALLED_BY` relationship.
This retrieves methods that call the given `method_name`, up to the specified depth.


4. `belongs_to` (`bool`): 

If `True`, retrieves the class (or other container) to which the method belongs using the `BELONGS_TO` relationship.

5. `uses` (`bool`): 

If `True`, retrieves variables used or defined within the given method using the `USES` relationship.

---

## Step 3: Response Format

The function will return a Python dictionary (serializable to JSON) representing the retrieved data.

Each node in the result includes:

- `method_name` (or `variable_name` / `class_name`): the identifier of the node.

- `depth`: the depth of the node from the origin node.

- `parent`: the immediate predecessor in the traversal path.

- `level`: the relative level from the root `method_name`, useful for hierarchical visualizations or traceability.

### Sample Output
```json
{
  "method_name": "M1",
  "depth": 2,
  "level": 0,
  "CALLS": [
    {
      "method_name": "M2",
      "depth": 1,
      "parent": "M1",
      "level": 1
    },
    {
      "method_name": "M3",
      "depth": 0,
      "parent": "M2",
      "level": 2
    }
  ],
  "CALLED_BY": [
    {
      "method_name": "M4",
      "depth": 3,
      "parent": "M1",
      "level": 1
    }
  ],
  "BELONGS_TO": {
    "class_name": "C1",
    "depth": 3,
    "parent": "M1",
    "level": 1
  },
  "USES": [
    {
      "variable_name": "V1",
      "depth": 0,
      "parent": "M1",
      "level": 1
    }
  ]
}
```

### Notes: 

- The `depth` field can be used for understanding proximity in terms of traversal cost.

- The `level` field indicates the distance from the source method_name node in a readable way.

- The `parent` field helps trace the origin path of each node and is useful for reconstructing call chains.

If you are open to format suggestions, an alternative structure could involve grouping results by relationship type in a tree-like format to better support hierarchical visualization tools—but for most cases, the current format is clear and efficient.

---

## Step 4: Returning the Response

Once the graph traversal and relationship extraction are complete, the result should be returned as a Python dictionary structured as shown above. This dictionary can then be converted to JSON for use in APIs or visualizations.





In [None]:
# If not installed already
!pip install neo4j
!pip install openai

from neo4j import GraphDatabase
from typing import List, Dict, Any, Optional
import json



You should consider upgrading via the 'c:\python\python39\python.exe -m pip install --upgrade pip' command.
You should consider upgrading via the 'c:\python\python39\python.exe -m pip install --upgrade pip' command.


Collecting openai
  Downloading openai-1.97.1-py3-none-any.whl (764 kB)
Collecting distro<2,>=1.7.0
  Downloading distro-1.9.0-py3-none-any.whl (20 kB)
Collecting pydantic<3,>=1.9.0
  Downloading pydantic-2.11.7-py3-none-any.whl (444 kB)
Collecting jiter<1,>=0.4.0
  Downloading jiter-0.10.0-cp39-cp39-win_amd64.whl (208 kB)
Collecting tqdm>4
  Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
Collecting pydantic-core==2.33.2
  Downloading pydantic_core-2.33.2-cp39-cp39-win_amd64.whl (2.0 MB)
Collecting typing-inspection>=0.4.0
  Downloading typing_inspection-0.4.1-py3-none-any.whl (14 kB)
Collecting annotated-types>=0.6.0
  Downloading annotated_types-0.7.0-py3-none-any.whl (13 kB)
Installing collected packages: typing-inspection, pydantic-core, annotated-types, tqdm, pydantic, jiter, distro, openai
Successfully installed annotated-types-0.7.0 distro-1.9.0 jiter-0.10.0 openai-1.97.1 pydantic-2.11.7 pydantic-core-2.33.2 tqdm-4.67.1 typing-inspection-0.4.1


In [None]:
# Setup connection details
NEO4J_URI = "bolt://98.70.123.110:7687"  # Update if using remote DB
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "y?si+:qDV3DK"

# Establish a connection
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))


In [97]:
def fetch_all_calls_map(tx) -> Dict[str, List[str]]:
    query = """
    MATCH p=(caller:Method)-[r:CALLS]->(callee:Method) RETURN caller.name AS caller, callee.name AS callee;
    """
    result = tx.run(query)
    print(result)
    calls_map = {}
    for r in result:
        caller = r["caller"]
        callee = r["callee"]
        calls_map.setdefault(caller, []).append(callee)
    return calls_map

def fetch_all_called_by_map(tx) -> Dict[str, List[str]]:
    query = """
    MATCH p=(caller:Method)-[r:CALLED_BY]->(callee:Method) RETURN caller.name AS caller, callee.name AS callee;
    """
    result = tx.run(query)
    calls_map = {}
    for r in result:
        caller = r["caller"]
        callee = r["callee"]
        calls_map.setdefault(caller, []).append(callee)
    return calls_map

def dfs_calls(method_name: str, calls_map: Dict[str, List[str]], max_depth: int) -> List[Dict[str, Any]]:
    visited = set()
    result = []

    def dfs(node: str, level: int, parent: Optional[str]):
        if level > max_depth or node in visited:
            return
        visited.add(node)
        result.append({
            "method_name": node,
            "depth": level,
            "parent": parent,
            "level": level
        })
        for child in calls_map.get(node, []):
            dfs(child, level + 1, node)

    dfs(method_name, level=1, parent=None)
    return result

def fetch_calls_python(tx, method_name: str, depth: int) -> List[Dict[str, Any]]:
    calls_map = fetch_all_calls_map(tx)
    return dfs_calls(method_name, calls_map, depth)

def fetch_called_by_python(tx, method_name: str, depth: int) -> List[Dict[str, Any]]:
    called_by_map = fetch_all_called_by_map(tx)
    return dfs_calls(method_name, called_by_map, depth)


In [98]:
def fetch_belongs_to(tx, method_name: str) -> Optional[Dict[str, Any]]:
    query = """
    MATCH (m:Method {name: $method_name})-[:BELONGS_TO]->(c:Class)
    RETURN c.name AS class_name, c.depth AS depth
    """
    record = tx.run(query, method_name=method_name).single()
    if record:
        return {
            "class_name": record["class_name"],
            "depth": record["depth"] or 0,
            "parent": method_name,
            "level": 1
        }
    return None


In [99]:
def fetch_uses(tx, method_name: str) -> List[Dict[str, Any]]:
    query = """
    MATCH (m:Method {name: $method_name})-[:USES]->(v:Variable)
    RETURN v.name AS variable_name, v.depth AS depth
    """
    result = tx.run(query, method_name=method_name)
    return [
        {
            "variable_name": r["variable_name"],
            "depth": r["depth"] or 0,
            "parent": method_name,
            "level": 1
        } for r in result
    ]


In [100]:
def retrieve_kg_context(
    method_name: str,
    calls: int = 0,
    called_by: int = 0,
    belongs_to: bool = False,
    uses: bool = False
) -> Dict[str, Any]:
    result = {
        "method_name": method_name,
        "depth": max(calls, called_by),
        "level": 0,
        "CALLS": [],
        "CALLED_BY": [],
        "BELONGS_TO": {},
        "USES": [],
    }

    with driver.session() as session:
        if calls > 0:
            result["CALLS"] = session.execute_read(fetch_calls_python, method_name, calls)
        if called_by > 0:
            result["CALLED_BY"] = session.execute_read(fetch_called_by_python, method_name, called_by)
        if belongs_to:
            belongs = session.execute_read(fetch_belongs_to, method_name)
            if belongs:
                result["BELONGS_TO"] = belongs
        if uses:
            result["USES"] = session.execute_read(fetch_uses, method_name)

    return result


In [None]:
# Replace with actual method name in your Neo4j DB
method = "getPreferences"

kg_data = retrieve_kg_context(
    method_name=method,
    calls=3,
    called_by=1,
    belongs_to=True,
    uses=True
)

# Pretty print
print(json.dumps(kg_data, indent=2))


<neo4j._sync.work.result.Result object at 0x000001ED1832C730>
{
  "method_name": "getPreferences",
  "depth": 3,
  "level": 0,
  "CALLS": [
    {
      "method_name": "getPreferences",
      "depth": 1,
      "parent": null,
      "level": 1
    },
    {
      "method_name": "getName",
      "depth": 2,
      "parent": "getPreferences",
      "level": 2
    },
    {
      "method_name": "getLogger",
      "depth": 2,
      "parent": "getPreferences",
      "level": 2
    },
    {
      "method_name": "initConfig",
      "depth": 2,
      "parent": "getPreferences",
      "level": 2
    },
    {
      "method_name": "log",
      "depth": 3,
      "parent": "initConfig",
      "level": 3
    },
    {
      "method_name": "close",
      "depth": 3,
      "parent": "initConfig",
      "level": 3
    },
    {
      "method_name": "info",
      "depth": 2,
      "parent": "getPreferences",
      "level": 2
    }
  ],
  "CALLED_BY": [
    {
      "method_name": "getPreferences",
      "depth"

In [1]:
import os
from openai import AzureOpenAI

endpoint = "https://openai-hybrid-code-gen.openai.azure.com/"
model_name = "gpt-4.1"
deployment = "gpt-4.1"

subscription_key = ""
api_version = "2024-12-01-preview"

client = AzureOpenAI(
    api_version=api_version,
    azure_endpoint=endpoint,
    api_key=subscription_key,
)

response = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "user",
            "content": "I am going to Paris, what should I see?",
        }
    ],
    max_completion_tokens=800,
    temperature=1.0,
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    model=deployment
)

print(response.choices[0].message.content)

APIConnectionError: Connection error.