# Creating a Persistent, Dynamic Backend

Now that we have our graph triplets, we would like to do inference with them! But hold on. In real-world applications our knowledge graph may change over time, so we will want to handle triplets being both added and deleted. Additionally, a persistent database for this information will be crucial in the case of crashes or other unforeseen issues!

This notebook will show you how to connect a simple knowledge-graph RAG agent to a database that is being actively updated, and we will do this without sacrificing performance! Let's get started.

The first thing we need to do is take care of a great many imports.

In [1]:
# General imports.
import os 
import re
import time
import timeit
import subprocess
import numpy as np
import getpass

# Imports for both our database and inference.
import cugraph
import networkx as nx
import nx_arangodb as nxadb
import pandas as pd
from nx_arangodb.convert import nxadb_to_nxcg
from arango import ArangoClient

# Langchain related imports.
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from imports.qa_chain_overrides import NXCugraphEntityGraph, GPUGraphQAChain

# Import the threading and multiprocessing toolboxes for parallelism
import threading
import multiprocessing
from multiprocessing import Process

[00:18:27 +0000] [INFO]: NetworkX-cuGraph is available.


## Setting Up Our Backend

Now that we have all of those imports done, we can start with the good stuff. We will be using ArangoDB as a backend. ArangoDB is a graph database that works very well with NetworkX, a popular Python library for graph analysis. Better yet, ArangoDB can serve as a persistence layer for NetworkX graphs that run on cuGraph, which will allow us to accelerate NetworkX on the GPU! If you want to learn more about this, check out the blog [HERE](https://developer.nvidia.com/blog/accelerated-production-ready-graph-analytics-for-networkx-users/).

We will get started by launching the database! The following command will launch an ArangoDB instance on port 8529 with the username "root" and password "ilovekgrag".

In [2]:
!docker run -e ARANGO_ROOT_PASSWORD=ilovekgrag -d --name arangodb -p 8529:8529 arangodb

docker: Error response from daemon: Conflict. The container name "/arangodb" is already in use by container "08c1578a46e47dc4f5708c8e6f222eb6b367f0df6ff84960ddf78ebd80b38207". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.


Now we need to populate our database. I hope you will forgive us, but we took the liberty of creating some convenient CSVs for reading in the triples from the previous notebook. The details of this process can be found in the file "*data/getcsv.py*" if you are interested! We will first need to set up the structure of our graph database, and then populate it with our CSV files. Let's do that.

In [3]:
# First we will set some environment variables. These will come up a few times!
adb_host = "http://172.17.0.1:8529"
#adb_host = "http://localhost:8529"
adb_username = "root"
adb_password = "ilovekgrag"
adb_name = "arangodb"
os.environ["DATABASE_HOST"] = adb_host
os.environ["DATABASE_USERNAME"] = adb_username
os.environ["DATABASE_PASSWORD"] = adb_password
os.environ["DATABASE_NAME"] = adb_name

# Now we need to set up our database structure.
# Set our client.
client = ArangoClient(hosts=adb_host)

# Now we can log in to a user that can control system properties.
system = client.db("_system",username=adb_username,password=adb_password)

# We now want to create our database if it doesn't exist.
if not system.has_database(adb_name):
    system.create_database(adb_name)
    
# Now we can set our ADB database to that.
ADB = client.db(adb_name, username=adb_username, password=adb_password)

# Now that this is done, we need to define the components of our ArangoDB graph.
ADB_graph = None
ADB_vertices = None
ADB_edges = None
if not ADB.has_graph("graph_data"):
    ADB_graph = ADB.create_graph("graph_data")
else:
    ADB_graph = ADB.graph("graph_data")

if not ADB_graph.has_vertex_collection("vertices"):
    ADB_vertices = ADB_graph.create_vertex_collection("vertices")
else:
    ADB_vertices = ADB_graph.vertex_collection("vertices")

if not ADB_graph.has_edge_collection("edges"):
    edge_def = {
                'edge_collection': 'edges',
                'from_vertex_collections': ['vertices'],
                'to_vertex_collections': ['vertices']
            }
    ADB_edges = ADB_graph.create_edge_definition(**edge_def)
else:
    ADB_edges = ADB_graph.edge_collection("edges")

Now the backend is set up to accept our CSVs! Let's place our vertices and edges into the database. The insert commands get big, and the variables we need are already defined in Python, so we will run the commands from Python. This will also be useful later.

In [4]:
# Make sure the collections are cleared.
ADB_vertices.truncate()
ADB_edges.truncate()

# Get dataframes for the data.
vertices_df = pd.read_csv("../data/csvs/vertices.csv")
edges_df = pd.read_csv("../data/csvs/edges.csv")

# Insert vertices into the database
for index, row in vertices_df.iterrows():
    aql = """
        INSERT {
            _key: @key
        } INTO vertices
    """
    bind_vars = {
        "key": row.get("_key"), 
    }
    try:
        ADB.aql.execute(aql, bind_vars=bind_vars)
    except Exception as e:
        pass

# Insert edges into the database
for index, row in edges_df.iterrows():
    aql = """
        INSERT {
            _key: @key,
            _from: @from,
            _to: @to,
            predicate: @predicate,
        } INTO edges
    """
    bind_vars = {
        "key": str(index), 
        "from": row.get("_from"), 
        "to": row.get("_to"), 
        "predicate": row.get("predicate"),
    }
    try:
        ADB.aql.execute(aql, bind_vars=bind_vars)
    except Exception as e:
        pass
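
The loops above issue one AQL query per row, which is simple but costs a round trip per document. A minimal sketch of an alternative, assuming the CSV columns used above (`_from`, `_to`, `predicate`): build all of the edge documents up front so they can be handed to a bulk call in one go. The helper name `build_edge_documents` is hypothetical, and which bulk call to use (python-arango collections offer `insert_many`, for instance) is an assumption to check against your driver version.

```python
def build_edge_documents(rows):
    """Turn edge rows (dicts with _from, _to, predicate) into edge documents.

    Rows whose endpoints are None or empty are skipped, and the row position
    is used as the document key, mirroring the loop above.
    """
    documents = []
    for index, row in enumerate(rows):
        if not row.get("_from") or not row.get("_to"):
            continue  # an edge needs both endpoints
        documents.append({
            "_key": str(index),
            "_from": row["_from"],
            "_to": row["_to"],
            "predicate": row.get("predicate"),
        })
    return documents


# Hypothetical rows; in the notebook these could come from
# edges_df.to_dict("records").
rows = [
    {"_from": "vertices/A", "_to": "vertices/B", "predicate": "owns"},
    {"_from": "vertices/B", "_to": None, "predicate": "broken"},
    {"_from": "vertices/B", "_to": "vertices/C", "predicate": "sells"},
]
edge_documents = build_edge_documents(rows)
print(len(edge_documents))
```

Skipping malformed rows explicitly also makes it clearer which rows were dropped than a blanket `except: pass` does.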

Awesome, now we have a database we can really work with! But we still have a problem: this database does not live on the GPU. If we want to serve end users concurrently, that will limit how many inferences we can feasibly run. So we need to read the data from this backend into NetworkX, and then use cuGraph as our backend to perform RAG tasks on the GPU! Let's do that.

In [5]:
# First we snag our database as a NetworkX graph.
cpu_graph = nxadb.MultiDiGraph(name="graph_data")
    
# Now we can set this up to use cuGraph as well!
background_graph = nxadb_to_nxcg(cpu_graph)
    
#for edge in cpu_graph.edges():
#    background_graph[edge[0]][edge[1]]['predicate'] = cpu_graph[edge[0]][edge[1]]['predicate']

background_graph = NXCugraphEntityGraph(background_graph)

[00:18:43 +0000] [INFO]: Graph 'graph_data' exists.
[00:18:43 +0000] [INFO]: Default node type set to 'vertices'
[00:18:44 +0000] [INFO]: Graph 'graph_data' load took 0.08841562271118164s
[00:18:44 +0000] [INFO]: NXCG Graph construction took 0.15645480155944824s


You may be curious about the naming convention here. Why are we calling them *cpu_graph* and *background_graph*? Since our ArangoDB backend is not stored on the GPU, its NetworkX graph goes through an intermediate CPU phase, and is then placed on the GPU. This explains the *cpu_graph* name, but why *background_graph*? Well, it takes time to move memory around. And since we are trying to create a dynamic knowledge graph, it is beneficial to keep two copies of our graph at a time. One copy can be continuously updated as updates are streamed in, while the other can be used for inference in the meantime. This will require some multi-threading, but we will keep it as simple as possible, and let you imagine more complicated workloads with continuous, asynchronous user queries and updates!

To make this all work we will need a couple of functions: mainly, a function to handle database alterations, and a function to "swap" our *working_graph* and our *background_graph*. Let's build those.

In [6]:
# Now define a function for adding or removing a list of edges.
def changeADBEdges( edgelist:list[dict], remove:bool=False ):

    def updateDBEdge(subject_:str,relation_:str,object_:str,key_:str,remove:bool):
            
        # Make sure to strip out bad characters.
        subj = re.sub(r'[^a-zA-Z0-9_]', '', subject_)
        obj = re.sub(r'[^a-zA-Z0-9_]', '', object_)
        rel = re.sub(r'[^a-zA-Z0-9_]', '', relation_)
        edge_key = re.sub(r'[^a-zA-Z0-9_]', '', key_)

        # Define our edge_data and insert it.
        edge_data = {
            '_from': f'vertices/{subj}',
            '_to': f'vertices/{obj}',
            'predicate': f'{rel}',
            '_key': edge_key
        }

        # Set up our AQL query to see if the edge exists. TODO: also check for edge attributes.
        aql_query = """
        FOR edge IN edges
            FILTER edge._key == @edge_key
            RETURN edge
        """           

        # Check if the edge exists.
        edges = ADB.aql.execute(aql_query, bind_vars={"edge_key":edge_key})
        edges = list(edges)
        
        has_edge = False
        if edges:
            has_edge = True

        # Check if we are adding or removing this edge.
        if not remove:
            if not has_edge:
                try:
                    ADB_edges.insert(edge_data)
                except Exception as e:
                    # Covers some potential misformatting cases.
                    pass
        else:
            if has_edge:
                ADB_edges.delete(edge_data["_key"])
    
    # Apply the update for each edge in turn (this runs serially).
    for edge in edgelist:
        updateDBEdge(**edge,remove=remove)
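
The sanitization step inside `updateDBEdge` is worth seeing in isolation. A small sketch, using the same regular expression as above on made-up names, shows what survives and also a caveat: two different names can collapse to the same key. The helper name `sanitize_key` is hypothetical.

```python
import re


def sanitize_key(name: str) -> str:
    """Drop every character that is not a letter, digit, or underscore."""
    return re.sub(r"[^a-zA-Z0-9_]", "", name)


print(sanitize_key("Five9 Inc."))        # Five9Inc
print(sanitize_key("market_risk (US)"))  # market_riskUS

# Caveat: distinct inputs may map to the same key.
print(sanitize_key("A-B") == sanitize_key("AB"))  # True
```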

Now let's test it!

In [7]:
# First let's get all of our edges.
edge_collection = ADB.collection("edges")
edgelist = list(edge_collection.all())

# Define an edge-sampling function.
def sampleEdges(edgelist:list, sample_size:int=100):

    # First we sample the edges.
    sampled_indices = np.random.choice(np.arange(len(edgelist)), size=sample_size, replace=False)

    # Now turn them into a usable form to return.
    return_edges = []
    for index in sampled_indices:
        edge = edgelist[index]
        return_edges.append({"subject_":edge["_from"].split("/")[1], 
                             "relation_":edge["predicate"], 
                             "object_":edge["_to"].split("/")[1],
                             "key_":edge["_key"]})
    
    # Return the sampled edges.
    return(return_edges)

# Now let's sample!
sampled_edges = sampleEdges(edgelist)

# Let's test to see if it works!
changeADBEdges( edgelist = sampled_edges, remove = True )
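
`sampleEdges` draws a different sample on every run, and it raises an error if `sample_size` exceeds the number of edges, because sampling without replacement cannot return more items than exist. A minimal sketch of a seeded, size-clamped variant using only the standard library follows; the function name `sample_items` and the `seed` parameter are illustrative and not part of the notebook.

```python
import random


def sample_items(items, sample_size=100, seed=None):
    """Sample without replacement, never asking for more than is available."""
    rng = random.Random(seed)  # private generator; seed=None means unseeded
    size = min(sample_size, len(items))
    return rng.sample(items, size)


edges = [f"edge-{i}" for i in range(10)]
first = sample_items(edges, sample_size=4, seed=0)
second = sample_items(edges, sample_size=4, seed=0)
print(first == second)                           # True: same seed, same sample
print(len(sample_items(edges, sample_size=50)))  # 10: clamped to the population
```

A fixed seed is mostly useful for timing comparisons, where you want every run to add and remove the same edges.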

Great, it works! We will add those edges back later when we try to simulate a real workload. Now the bulk of our backend functionality is prepared. We should probably get around to adding our *working_graph*, as well as a way of exchanging it with *background_graph* when the time comes!

In [8]:
# Define our working graph as the datatype we need.
working_graph = NXCugraphEntityGraph()

# Print out information before the swap.
back_nodes = background_graph.get_number_of_nodes()
work_nodes = working_graph.get_number_of_nodes()
print("BEFORE SWAP..")
print(f"background_graph | number of vertices: {back_nodes}")
print(f"working_graph | number of vertices: {work_nodes}")

# Perform a swap.
working_graph, background_graph = background_graph, working_graph

# Print out the contents of our graphs after the swap!
back_nodes = background_graph.get_number_of_nodes()
work_nodes = working_graph.get_number_of_nodes()
print("AFTER SWAP...")
print(f"background_graph | number of vertices: {back_nodes}")
print(f"working_graph | number of vertices: {work_nodes}")

BEFORE SWAP..
background_graph | number of vertices: 6913
working_graph | number of vertices: 0
AFTER SWAP...
background_graph | number of vertices: 0
working_graph | number of vertices: 6913
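
The tuple swap above is fine in a single thread. Once a background thread also touches these names, it can help to keep both graphs behind one object so that readers and the swap never interleave. Below is a minimal sketch of that idea; the class `GraphBuffers` and its methods are hypothetical, and plain dictionaries stand in for the GPU graphs.

```python
import threading


class GraphBuffers:
    """Hold a working graph (served to queries) and a background graph."""

    def __init__(self, working, background):
        self._lock = threading.Lock()
        self._working = working
        self._background = background

    def working(self):
        # Readers take the lock briefly to get a consistent reference.
        with self._lock:
            return self._working

    def set_background(self, graph):
        # Called by the updater once a fresh graph has been built.
        with self._lock:
            self._background = graph

    def swap(self):
        # Exchange the two references in one protected step.
        with self._lock:
            self._working, self._background = self._background, self._working


buffers = GraphBuffers(working={"nodes": 0}, background={"nodes": 6913})
buffers.swap()
print(buffers.working()["nodes"])
```

Only the references are exchanged, so the swap itself stays cheap no matter how large the graphs are.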


The last step we need for our dynamic backend to work is the ability to rebuild the GPU graph after making ArangoDB updates! Let's make a function for it, and we can call that from an independent thread later.

In [9]:
# Make a general "updateBackend" function.
# We already did this earlier, but we will be doing it a lot in the future.
def updateBackend():

    # First we snag our database as a NetworkX graph.
    cpu_graph = nxadb.MultiDiGraph(name="graph_data")

    # Now we can set this up to use cuGraph as well!
    background_graph = nxadb_to_nxcg(cpu_graph)
    background_graph = NXCugraphEntityGraph(background_graph)

    # Return our results
    return( cpu_graph, background_graph )

# Check that it works.
cpu_graph, background_graph = updateBackend()
background_graph, working_graph = working_graph, background_graph

# Print out some info.
back_nodes = background_graph.get_number_of_nodes()
work_nodes = working_graph.get_number_of_nodes()
print(f"background_graph | number of vertices: {back_nodes}")
print(f"working_graph | number of vertices: {work_nodes}")

[00:18:44 +0000] [INFO]: Graph 'graph_data' exists.
[00:18:44 +0000] [INFO]: Default node type set to 'vertices'
[00:18:44 +0000] [INFO]: Graph 'graph_data' load took 0.08525204658508301s
[00:18:44 +0000] [INFO]: NXCG Graph construction took 0.0016293525695800781s


background_graph | number of vertices: 6913
working_graph | number of vertices: 6858


And that's that! Notice that our *working_graph* now has fewer nodes than the *background_graph*, since we removed a bunch of edges! We will get a chance to use this more shortly, but first we need to make an agent to actually perform our inference with.

## Building Our RAG Agent

Now we want to put a RAG agent on top of our backend to allow for user inferences. We will use some [NIM endpoints](https://www.nvidia.com/en-us/ai/) for this and connect to them through [LangChain](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/). 

Since this is just a simple example, our agent doesn't need to do too much. Given a knowledge graph and a user query, we want it to retrieve relevant relationships from the graph to return to the user. Fortunately there is a way to do this using the [GraphQAChain](https://python.langchain.com/api_reference/community/chains/langchain_community.chains.graph_qa.base.GraphQAChain.html) in LangChain. Since we are using the GPU, and not the CPU, some small changes need to be made to the QA chain. This has already been done, and the changes can be found within the *qa_chain_overrides.py* file. We have actually already used some of these earlier in the notebook!

In [10]:
if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    nvapi_key = getpass.getpass("Enter your NVIDIA API key: ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key

In [11]:
# Let's define a QA Chain to use.
llm_name = "mistralai/mixtral-8x22b-instruct-v0.1"
model = ChatNVIDIA(model=llm_name)
chain = GPUGraphQAChain.from_llm( llm=model, graph=working_graph, verbose=True )

# Let's define a function for performing a query.
def queryGraph( chain:GPUGraphQAChain, query:str ):

    result = chain._call( inputs = { 'query' : query } )

    return(result)

# Let's test out this function and see what we get!
response = queryGraph(chain,"What factors contribute to Uncertainty?")
print(response["result"])


Uncertainty can be contributed by various factors including:

1. Lack of predictability in events and outcomes.
2. Lack of information or limited data.
3. Incomplete understanding of causal relationships.
4. Ambiguity and vagueness.
5. Complexity of a situation or system.
6. Change and volatility in the environment.
7. Uncertainty in human behavior and decision-making.
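
Conceptually, a graph QA chain works in two steps: it identifies the entities mentioned in the question, then looks up the triples attached to those entities and hands them to the LLM as context. The sketch below imitates that flow with a plain list of triples and naive substring matching in place of LLM-based entity extraction. The triples and the names `find_entities` and `build_context` are made up for illustration and are not the code in *qa_chain_overrides.py*.

```python
# Hypothetical triples in (subject, predicate, object) form.
TRIPLES = [
    ("Uncertainty", "affects", "Market Risk"),
    ("Market Risk", "influences", "Revenue"),
    ("Five9 Inc", "sells", "Cloud Software"),
]


def find_entities(question, triples):
    """Return graph entities whose name appears in the question (case-insensitive)."""
    names = {t[0] for t in triples} | {t[2] for t in triples}
    lowered = question.lower()
    return sorted(name for name in names if name.lower() in lowered)


def build_context(entities, triples):
    """Collect every triple that touches one of the entities as a text line."""
    wanted = set(entities)
    return [f"{s} {p} {o}" for s, p, o in triples if s in wanted or o in wanted]


question = "What factors contribute to Uncertainty?"
entities = find_entities(question, TRIPLES)
context = build_context(entities, TRIPLES)
print(entities)  # ['Uncertainty']
print(context)   # ['Uncertainty affects Market Risk']
```

The context lines are what would be placed in the prompt alongside the question, which is why the quality of the answer depends on the graph that is being served at that moment.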


## Testing the System

Now that we have an agent for chat completion, and a backend for handling dynamic information, let's see how this comes together in a test environment. We will simulate this by looping and performing queries as often as we can. First let's get some baseline times for inference and background updates!

In [12]:
# Get a couple queries to test with.
queries = [
    "What does Sailpoint Technologies Holdings Inc do?",
    "What factors contribute to Market Risk?",
    "Where is the headquarters of Cornerstone OnDemand Inc?",
    "What does Five9 Inc sell?",
    "What industry does TWILIO INC work in?"
]

# Change our chain to not be verbose.
chain = GPUGraphQAChain.from_llm( llm=model, graph=working_graph, verbose=False )

# Time our queries in serial.
average_query_time = 0
for query in queries:

    start = time.monotonic()
    queryGraph(chain,query)
    end = time.monotonic()
    average_query_time += (end-start)

# Print the average.
print(f"Average query time: {average_query_time/len(queries)}")

# Compute the time to make a few backend changes.
average_backend_time = 0
sample_num = 5
remove_bool = False
for _ in range(sample_num):

    start = time.monotonic()
    changeADBEdges( edgelist = sampled_edges, remove = remove_bool )
    cpu_graph, background_graph = updateBackend()
    background_graph, working_graph = working_graph, background_graph
    end = time.monotonic()
    average_backend_time += (end-start)
    remove_bool = not remove_bool

print(f"Time for backend update and swap: {average_backend_time/sample_num}")


[00:18:57 +0000] [INFO]: Graph 'graph_data' exists.
[00:18:57 +0000] [INFO]: Default node type set to 'vertices'


Average query time: 2.1006909924002684


[00:18:57 +0000] [INFO]: Graph 'graph_data' load took 0.09360814094543457s
[00:18:57 +0000] [INFO]: NXCG Graph construction took 0.0019872188568115234s
[00:18:57 +0000] [INFO]: Graph 'graph_data' exists.
[00:18:57 +0000] [INFO]: Default node type set to 'vertices'
[00:18:57 +0000] [INFO]: Graph 'graph_data' load took 0.09091401100158691s
[00:18:57 +0000] [INFO]: NXCG Graph construction took 0.0013082027435302734s
[00:18:57 +0000] [INFO]: Graph 'graph_data' exists.
[00:18:57 +0000] [INFO]: Default node type set to 'vertices'
[00:18:58 +0000] [INFO]: Graph 'graph_data' load took 0.08577418327331543s
[00:18:58 +0000] [INFO]: NXCG Graph construction took 0.0018820762634277344s
[00:18:58 +0000] [INFO]: Graph 'graph_data' exists.
[00:18:58 +0000] [INFO]: Default node type set to 'vertices'
[00:18:58 +0000] [INFO]: Graph 'graph_data' load took 0.08767390251159668s
[00:18:58 +0000] [INFO]: NXCG Graph construction took 0.0012555122375488281s
[00:18:58 +0000] [INFO]: Graph 'graph_data' exists.

Time for backend update and swap: 0.2946968696000113
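
Both loops above repeat the same start/stop bookkeeping. One possible way to factor it out is a small helper that times a callable several times and reports the mean. The name `average_runtime` is hypothetical, and `time.perf_counter` is used here as a high-resolution clock (the notebook's `time.monotonic` would work as well).

```python
import time
from statistics import mean


def average_runtime(func, repeats=5):
    """Call func() `repeats` times and return the mean wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return mean(durations)


# A stand-in workload; in the notebook this could be a lambda that wraps
# queryGraph(chain, query) or the update-and-swap steps.
elapsed = average_runtime(lambda: sum(range(100_000)), repeats=3)
print(f"Average time: {elapsed:.6f} s")
```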


Wow! Our backend changes are pretty fast! Maybe we don't need to multithread anything. To make sure, let's see what happens when we increase the number of edges we are adding and removing.

In [None]:
# Sample a lot of edges.
sampled_edges = sampleEdges(edgelist,sample_size=len(edgelist)//2)

# Compute the time to make a few backend changes.
average_backend_time = 0
sample_num = 5
remove_bool = True
for _ in range(sample_num):

    start = time.monotonic()
    changeADBEdges( edgelist = sampled_edges, remove = remove_bool )
    cpu_graph, background_graph = updateBackend()
    background_graph, working_graph = working_graph, background_graph
    end = time.monotonic()
    average_backend_time += (end-start)
    remove_bool = not remove_bool

print(f"Average time for backend update and swap: {average_backend_time/sample_num}")

[00:19:07 +0000] [INFO]: Graph 'graph_data' exists.
[00:19:07 +0000] [INFO]: Default node type set to 'vertices'
[00:19:07 +0000] [INFO]: Graph 'graph_data' load took 0.08619809150695801s
[00:19:07 +0000] [INFO]: NXCG Graph construction took 0.0021889209747314453s


Unfortunately, having many seconds of downtime where we aren't serving user queries is not very good. So, let's parallelize this solution. The following code uses a mixture of threading and multiprocessing to achieve parallelism!

In [None]:
# Define the background update function (it will run in a thread).
def backgroundProcessFunction(sampled_edges, remove_bool, shared_data):

    global cpu_graph
    global background_graph

    start = time.monotonic()
    changeADBEdges(edgelist=sampled_edges, remove=remove_bool)
    cpu_graph, background_graph = updateBackend()
    end = time.monotonic()
    print(f"Time to update backend: {end - start}")
    
    # Mark swap as inactive
    shared_data['active_swap'] = False

# Define the query process function
def performQuery(chain, query, shared_data):

    start = time.monotonic()
    queryGraph(chain, query)
    end = time.monotonic()
    print(f"Time for query: {end - start}")
    
    # Mark query as inactive
    shared_data['active_query'] = False

# Create a manager for shared state between processes
manager = multiprocessing.Manager()

# Shared dictionary for global-like state
shared_data = manager.dict({
    'active_swap': False,
    'active_query': False
})

backends_swapped = 0
while backends_swapped < 5:
   
    if not shared_data['active_swap']:

        if not shared_data['active_query']:
            background_graph, working_graph = working_graph, background_graph
            
        shared_data['active_swap'] = True 
        background_process = threading.Thread(
            target=backgroundProcessFunction,
            args=(sampled_edges, remove_bool, shared_data,)
        )
        background_process.start()
        remove_bool = not remove_bool
        backends_swapped += 1

    elif not shared_data['active_query']:
        shared_data['active_query'] = True  
        query_process = multiprocessing.Process(
            target=performQuery,
            args=(chain, queries[0], shared_data)
        )
        query_process.start()

Awesome! That is way better! We are getting slightly longer update times, but achieving the same inference times we were getting before! Users will be far less upset than when we had long dead periods.
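
The loop above polls shared flags as fast as it can. A gentler pattern, sketched below under simplifying assumptions, is to start the rebuild in a thread, keep answering queries from the current graph while it runs, and swap only once the thread has finished. Everything here is a stand-in: `rebuild_graph` and `answer` are hypothetical functions, and dictionaries play the role of the graphs.

```python
import threading


def rebuild_graph(result, version):
    """Stand-in for the slow database read; stores the new graph in `result`."""
    result["graph"] = {"version": version}


def answer(graph, query):
    """Stand-in for a RAG query against whichever graph is currently served."""
    return f"{query} -> answered from graph v{graph['version']}"


working_graph = {"version": 1}
answers = []

for version in (2, 3):
    result = {}
    updater = threading.Thread(target=rebuild_graph, args=(result, version))
    updater.start()

    # Queries keep using the old graph while the rebuild is in flight.
    answers.append(answer(working_graph, "What does Five9 Inc sell?"))

    updater.join()                   # wait for the rebuild to finish
    working_graph = result["graph"]  # then swap in the fresh graph

answers.append(answer(working_graph, "What does Five9 Inc sell?"))
print(answers)
```

Because the swap happens only after `join()` returns, a query never sees a half-built graph. A real service would keep serving queries instead of blocking on `join()`, for example by checking `updater.is_alive()` between requests.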