In [None]:
%load_ext autoreload
%autoreload 2

# Graph Debugger Agent

In this example notebook we demonstrate how Blar's `GraphTraversalAgent` can be used to debug a code repository. We'll download a mock repo from GitHub, use `GraphConstructor` to build a graph from it, and upload the graph to Neo4j.

This is an introductory example of what can be achieved with Blar.

**NOTE:** Currently, this pack is configured to only work with `OpenAI` LLMs and `Neo4j` database. But feel free to copy/download the source code and edit as needed!

## Installation and Import

### Core dependencies

In [None]:
%pip install blar-graph --quiet

Note: you may need to restart the kernel to use updated packages.


In [None]:
import uuid
from blar_graph.graph_construction.core.graph_builder import GraphConstructor
from blar_graph.db_managers import Neo4jManager
from blar_graph.agents_tools.agents_examples.debug import get_debug_agent

In [None]:
import os

os.environ["NEO4J_URI"] = "neo4j+s://YOUR_NEO4J.databases.neo4j.io"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "YOUR_NEO4J_PASSWORD"
os.environ["OPENAI_API_KEY"] = "sk-..."

### Download the example repo


We will download a previous version of our library that builds a graph from a repo. We purposely introduced a bug in `format_nodes.format_directory_node`.

In [None]:
!git clone https://github.com/blarApp/blar-example-repos.git

Cloning into 'blar-example-repos'...
remote: Enumerating objects: 88, done.
remote: Counting objects: 100% (88/88), done.
remote: Compressing objects: 100% (67/67), done.
remote: Total 88 (delta 25), reused 80 (delta 18), pack-reused 0
Receiving objects: 100% (88/88), 82.86 KiB | 1.18 MiB/s, done.
Resolving deltas: 100% (25/25), done.


### Visualization dependencies

In [None]:
%pip install yfiles_jupyter_graphs --quiet
%pip install neo4j graphdatascience --quiet

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


If you are running in Google Colab, run the following cell:

In [None]:
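try:
    from google.colab import output

    # Enable third-party widgets (such as GraphWidget) in Colab
    output.enable_custom_widget_manager()
except ImportError:
    # Not running in Colab: nothing to enable
    print("Not running in Google Colab")

Not running in Google Colab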


## Create the Graph

We initialize the `Neo4jManager` and pass it, together with the language of the code, to the `GraphConstructor`. Then we build the example repo's graph by calling the `build_graph` method with the path to the root directory.

The `build_graph` method scans the directory, recursively traverses its subdirectories, and creates the nodes and the relationships between them.

**NOTE:** Only Python is supported at the moment.
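The recursive directory scan can be pictured with a minimal sketch. This is not Blar's implementation: `scan_directories` is a hypothetical helper that only handles directories (no file parsing), and it assumes a directory counts as a package when it contains `__init__.py`. The node shape follows the corrected `format_directory_node` from the example repo's `format_nodes.py`.

```python
import os
import uuid


def format_directory_node(path: str, package: bool) -> dict:
    """Build a directory node (same shape as the corrected example-repo version)."""
    return {
        "attributes": {
            "path": path + "/",
            "name": os.path.basename(path),
            "node_id": str(uuid.uuid4()),  # call uuid4() so each node gets a fresh ID
        },
        "type": "PACKAGE" if package else "FOLDER",
    }


def scan_directories(root: str) -> tuple[list[dict], list[tuple[str, str]]]:
    """Hypothetical sketch: walk `root` recursively and return directory nodes
    plus (parent_id, child_id) edges. Files are not parsed here."""
    root = os.path.normpath(root)
    nodes: list[dict] = []
    edges: list[tuple[str, str]] = []
    ids_by_path: dict[str, str] = {}

    # os.walk is top-down by default, so a parent is always seen before its children
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        # Treat a directory containing __init__.py as a Python package
        node = format_directory_node(dirpath, package="__init__.py" in filenames)
        node_id = node["attributes"]["node_id"]
        nodes.append(node)
        ids_by_path[dirpath] = node_id

        # Link every directory except the root to its parent directory
        parent = os.path.dirname(dirpath)
        if dirpath != root and parent in ids_by_path:
            edges.append((ids_by_path[parent], node_id))

    return nodes, edges
```

For example, `scan_directories("blar-example-repos/debugger_agent")` would return one node per directory and one parent-to-child edge per subdirectory.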


### Create unique Repo id

First we need to create a unique ID for the repo. This matters when more than one repo is stored in the same Neo4j database: with an ID, we can always query the repo we are interested in and not the others.

In this case we use `uuid` to generate a unique `repoId`, which is later used to initialize the `Neo4jManager`.
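Scoping every query by the repo ID keeps one repo's nodes apart from another's. As a sketch, assuming (hypothetically) that each node stores the ID in a `repoId` property, which we have not verified against how `Neo4jManager` actually tags nodes, a parameterized Cypher query could look like this. `repo_scoped_query` is an illustrative helper, not part of Blar.

```python
import uuid


def repo_scoped_query(repo_id: str) -> tuple[str, dict]:
    """Return a Cypher query plus parameters that match only one repo's nodes.

    Hypothetical: assumes every node stores its repo ID in a `repoId` property.
    """
    # The ID is passed as a parameter instead of being pasted into the query string
    query = "MATCH (n {repoId: $repoId}) RETURN n"
    return query, {"repoId": repo_id}


example_repo_id = str(uuid.uuid4())
query, params = repo_scoped_query(example_repo_id)
print(query)
print(params)
```

With the official `neo4j` Python driver, both values could be passed to `session.run(query, params)`. Using a parameter rather than string formatting avoids Cypher injection and lets Neo4j reuse the query plan.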

In [None]:
repoId = str(uuid.uuid4())

### Create the graph

In [None]:
graph_manager = Neo4jManager(repoId)
graph_constructor = GraphConstructor(graph_manager, "python")
graph_constructor.build_graph("blar-example-repos/debugger_agent")

Processed blar-example-repos/debugger_agent/src/run.py
Processed blar-example-repos/debugger_agent/src/__init__.py
Processed blar-example-repos/debugger_agent/src/test_documents/test.py
Processed blar-example-repos/debugger_agent/src/graph_construction/db_manager.py
Processed blar-example-repos/debugger_agent/src/graph_construction/graph_file_parser.py
Processed blar-example-repos/debugger_agent/src/graph_construction/__init__.py
Processed blar-example-repos/debugger_agent/src/graph_construction/graph_builder.py
Processed blar-example-repos/debugger_agent/src/utils/language_extensions.py
Processed blar-example-repos/debugger_agent/src/utils/format_nodes.py
Processed blar-example-repos/debugger_agent/src/utils/__init__.py
Processed blar-example-repos/debugger_agent/src/utils/tree_parser.py
Created 47 nodes
Created 82 edges


### Visualize Graph

For visualization we'll use the [yFiles](https://www.yworks.com/products/yfiles) library. This is an awesome interactive library that helps you visualize and explore graph data. Big shoutout to them!

In [None]:
from yfiles_jupyter_graphs import GraphWidget

# Fetch the whole repo graph from Neo4j
graph = graph_manager.get_whole_graph(result_format="graph")

w = GraphWidget(graph=graph)


def custom_edge_label_mapping(edge):
    """Hide edge labels by returning an empty string."""
    return ""


w.set_edge_label_mapping(custom_edge_label_mapping)

Here you can see the graph generated from the example repo.


**Fun fact**: This was a previous version of our library.

In [None]:
w.show()

GraphWidget(layout=Layout(height='800px', width='100%'))

## Debugger

In this section we will use the Blar debugger agent to find a bug in the example repo we just loaded into Neo4j. The bug is in `blar-example-repos/debugger_agent/src/utils/format_nodes.py`.

The file looks like this:

```python
import os
import uuid


def format_directory_node(path: str, package: bool) -> dict:
    processed_node = {
        "attributes": {
            "path": path + "/",
            "name": os.path.basename(path),
            "node_id": str(uuid.uuid4),
        },
        "type": "PACKAGE" if package else "FOLDER",
    }

    return processed_node
```

But it should look like this:

```python
import os
import uuid


def format_directory_node(path: str, package: bool) -> dict:
    processed_node = {
        "attributes": {
            "path": path + "/",
            "name": os.path.basename(path),
            "node_id": str(uuid.uuid4()),
        },
        "type": "PACKAGE" if package else "FOLDER",
    }

    return processed_node
```

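Notice the missing parentheses in the `"node_id": str(uuid.uuid4)` line. Without them, `uuid.uuid4` is never called: `str()` converts the function object itself into a string such as `<function uuid4 at 0x...>`, so every directory node receives the same `node_id`.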

This has unintended consequences. The code doesn't raise an error; it runs without complaint but generates many unintended connections, since directories can no longer be told apart by their IDs. The graph generated by that code looks like a tangled mess.
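You can reproduce the difference in isolation with the standard library alone. This snippet only illustrates the `uuid` behavior; it does not touch the example repo.

```python
import uuid

# Buggy: str() is applied to the function object itself, not to a UUID it returns
buggy_ids = [str(uuid.uuid4) for _ in range(3)]

# Fixed: uuid.uuid4() is called, so each call yields a fresh random UUID
fixed_ids = [str(uuid.uuid4()) for _ in range(3)]

print(buggy_ids[0])         # something like "<function uuid4 at 0x7f...>"
print(len(set(buggy_ids)))  # 1: every directory node would share one ID
print(len(set(fixed_ids)))  # 3: every directory node gets its own ID
```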

**NOTE**: If you run `run.py` again, it will still generate an incorrect graph, but it will not be as tangled. That is because the original `graph.json` was produced differently: the graph was first saved to Neo4j, then we queried it and saved the result to a JSON file, whereas the code in the repo saves the graph directly to JSON. The difference comes from the way we queried Neo4j to create the edges.

In [None]:
import json

from IPython.display import display

# Load the graph that was generated with the buggy code
with open("blar-example-repos/debugger_agent/graph.json") as f:
    data = json.load(f)


w = GraphWidget()
w.nodes = data["nodes"]
w.edges = data["edges"]
w.directed = True
w.set_edge_label_mapping(custom_edge_label_mapping)

display(w)

GraphWidget(layout=Layout(height='800px', width='100%'))
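To put a number on the tangle instead of judging it visually, you can count how many edges touch each node. This is a minimal sketch that assumes `graph.json` follows the yFiles `GraphWidget` format, with each edge as a dict holding `start` and `end` node IDs (the cell above relies on that format when it assigns `data["edges"]` to the widget). `top_connected_nodes` is a hypothetical helper, and the tiny `example` graph is made up for illustration.

```python
from collections import Counter


def top_connected_nodes(data: dict, n: int = 5) -> list[tuple[object, int]]:
    """Count how many edges touch each node and return the `n` busiest nodes.

    Assumes the yFiles GraphWidget format: each edge is a dict with
    "start" and "end" keys that hold node IDs.
    """
    degree: Counter = Counter()
    for edge in data["edges"]:
        degree[edge["start"]] += 1
        degree[edge["end"]] += 1
    return degree.most_common(n)


# Made-up graph: node "dir" is wired to everything, as a shared ID would cause
example = {
    "nodes": [{"id": node_id} for node_id in ["dir", "a", "b", "c"]],
    "edges": [
        {"start": "dir", "end": "a"},
        {"start": "dir", "end": "b"},
        {"start": "dir", "end": "c"},
        {"start": "a", "end": "b"},
    ],
}
print(top_connected_nodes(example, n=2))
```

On the notebook's data, `top_connected_nodes(data)` would list the most connected nodes; if the shared-ID bug is at play, you would expect directory nodes near the top.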

Let's run our Blar agent on the code and see whether it finds the bug.

We initialize the agent with our DB manager, which is the information source the agent uses to traverse the graph.

In [None]:
agent = get_debug_agent(graph_manager)

In [None]:
repoId

'a785dc0b-f486-4d1e-8802-36022e9cc824'

We describe the problem we are seeing and run the agent. We know that the graph-generation flow starts at `run.py`. The agent will query multiple nodes and traverse the graph until it finds the problem.
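The log below shows the agent starting with a keyword search (`keword_search`) for `run`. The following sketch illustrates the general search-then-traverse idea on a made-up, in-memory call graph. Both helper functions and all node names are hypothetical, not Blar's tools. The real agent is an LLM-driven `AgentExecutor` that chooses which tool to call at each step, whereas this breadth-first walk is deterministic.

```python
from collections import deque


def keyword_search(nodes: dict[str, dict], query: str) -> list[str]:
    """Return IDs of nodes whose name contains `query` (case-insensitive)."""
    q = query.lower()
    return [node_id for node_id, node in nodes.items() if q in node["name"].lower()]


def traverse(edges: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first walk over outgoing edges, returning node IDs in visit order."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in edges.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                visited.append(neighbour)
                queue.append(neighbour)
    return visited


# Hypothetical miniature of a call graph; names are illustrative only
example_nodes = {
    "run": {"name": "run"},
    "build_graph": {"name": "build_graph"},
    "format_directory_node": {"name": "format_directory_node"},
    "format_file_node": {"name": "format_file_node"},
}
example_calls = {
    "run": ["build_graph"],
    "build_graph": ["format_directory_node", "format_file_node"],
}

start = keyword_search(example_nodes, "run")[0]
print(traverse(example_calls, start))
```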

In [None]:
list(
    agent.stream(
        {
            "input": "The directory nodes generate multiple connections; it doesn't distinguish between different directories. Can you fix it? The initial function is run."
        }
    )
)



> Entering new AgentExecutor chain...

Invoking: `keword_search` with `{'query': 'run'}`


IndexError: list index out of range

When the run completes, the agent traverses the graph, finds the node that contains the bug, and proposes the following solution:

```python
"node_id": str(uuid.uuid4()),
```

This indeed is the solution we were looking for. 

**Note**: Due to the generative nature of LLMs you may not get exactly the same results.