# Using Open Source LLMs and a Knowledge Graph to Implement a RAG application

## Background
Retrieval Augmented Generation (RAG) has become one of the most popular ways to build applications on top of large language models (LLMs). This comes as no surprise, since RAG requires minimal code and helps build user trust in LLM-generated answers. The challenge when building a great RAG app or chatbot is handling structured text alongside unstructured text. Knowledge graphs can store both structured and unstructured text within a single database, reducing the work required to give the LLM the information it needs. [Neo4j](https://neo4j.com/developer-blog/knowledge-graph-rag-application/) has published a blog post demonstrating how to use a Neo4j-based knowledge graph and an OpenAI LLM to build a chatbot that answers questions about a microservices architecture and ongoing tasks. Since OpenAI LLMs are not open source and the OpenAI API can no longer be used for experimenting free of charge, here I will show you how to use the open source LLM llama-3.1 to do similar work. You can try llama-3.0 too, but it performs much worse on this task.

## Neo4j Environment setup
First, follow the instructions in the [neo4j blog](https://neo4j.com/developer-blog/knowledge-graph-rag-application/) to set up a Neo4j instance, version 5.11 or greater. The easiest way is to start a free cloud instance on [Neo4j Aura](https://neo4j.com/cloud/platform/aura-graph-database/): sign in with your Gmail or other email address and choose the free tier plan. The Neo4j cloud service will pop up a window with the username (usually 'neo4j') and the password; it will remind you that this is your only chance to save the password for later use. The free instance takes a few minutes to start. Once it is running, copy its connection URI into the code below as the value of url, and fill in your username and password as well. If you haven't installed the langchain_community Python package yet, you will probably want to create a virtual environment for this project and pip install langchain_community before running the code below.

In [17]:
from langchain_community.graphs import Neo4jGraph

url = "Your neo4j instance URL"
username = "neo4j"
password = "Your neo4j instance password"
graph = Neo4jGraph(
    url=url,
    username=username,
    password=password
)
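
Hard-coding the password in a notebook makes it easy to leak by accident. As a minimal sketch, the connection settings could instead be read from environment variables. The variable names NEO4J_URI, NEO4J_USERNAME and NEO4J_PASSWORD and the helper load_neo4j_credentials are assumptions made for this illustration; they are not part of the Neo4j blog or of LangChain.

```python
import os


def load_neo4j_credentials(env=os.environ):
    """Read connection settings from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    required = ("NEO4J_URI", "NEO4J_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "url": env["NEO4J_URI"],
        # Aura usually assigns the username 'neo4j', so use it as the fallback.
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env["NEO4J_PASSWORD"],
    }
```

The returned dictionary uses the same keyword names as the cell above, so it could be passed on as Neo4jGraph(**load_neo4j_credentials()).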

## Dataset
For the purpose of this demo, I will use the same dataset as the neo4j blog so that we can easily compare the performance of the open source LLM with that of the OpenAI LLM. In practice, knowledge graphs are excellent at connecting information from multiple data sources. When developing a DevOps RAG application, you can fetch information from cloud services, task management tools, and more.

This synthetic dataset is small, with only 100 nodes, but that is enough for this demo. The following code imports the sample graph into the Neo4j instance we started above.

In [18]:
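import requests

# Download the Cypher import statement for the sample graph and run it.
import_url = "https://gist.githubusercontent.com/tomasonjo/08dc8ba0e19d592c4c3cde40dd6abcc3/raw/e90b0c9386bf8be15b199e8ac8f83fc265a2ac57/microservices.json"
response = requests.get(import_url, timeout=30)
response.raise_for_status()  # stop early if the download failed
import_query = response.json()['query']
graph.query(import_query)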

[]

Once the graph is imported, click the 'Open' button in the interface of the running Neo4j cloud instance to open Neo4j Browser, where you can see the nodes, relationships and a visualisation of the graph.

## Neo4j Vector Index
We’ll begin with a simpler job: implementing a vector index search to find relevant tasks by their name and description. If you’re unfamiliar with vector similarity search, here’s a quick refresher. The key idea is to calculate a text embedding for each task from its description and name. Then, at query time, we find the tasks most similar to the user input using a similarity metric such as cosine similarity. The information retrieved from the vector index can then be passed as context to the LLM so it can generate accurate and up-to-date answers.
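
To make the refresher concrete, here is a small sketch of ranking by cosine similarity in plain Python. The three-dimensional vectors are made up for illustration; real embeddings come from an embedding model and have far more dimensions, and the vector index performs this ranking for us.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented toy "embeddings" keyed by task name (not produced by any model).
task_embeddings = {
    "RecommendationFeature": [0.9, 0.1, 0.0],
    "Refactor": [0.1, 0.9, 0.2],
    "Update": [0.5, 0.5, 0.1],
}
query_embedding = [1.0, 0.2, 0.0]

# Rank tasks from most to least similar to the query.
ranked = sorted(
    task_embeddings,
    key=lambda name: cosine_similarity(query_embedding, task_embeddings[name]),
    reverse=True,
)
print(ranked)  # ['RecommendationFeature', 'Update', 'Refactor']
```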

The tasks are already in our knowledge graph. However, we must calculate the embedding values and create the vector index. Here, we’ll use the from_existing_graph method. Before running the code below, follow the instructions in [OllamaEmbeddings](https://python.langchain.com/v0.2/docs/integrations/text_embedding/ollama/) to download and install Ollama on one of the supported platforms. Once it is installed, make sure the Ollama server is running (start it with ollama serve if it is not already running) and fetch a model by running ollama pull <name-of-model> from a terminal; the code below uses llama3. You also need to pip install langchain_ollama.

In this example, we use the following graph-specific parameters for the from_existing_graph method.
* index_name: name of the vector index.
* node_label: node label of relevant nodes.
* text_node_properties: properties to be used to calculate embeddings and retrieve from the vector index.
* embedding_node_property: node property in which the embedding values are stored.

In [19]:
from langchain_community.vectorstores.neo4j_vector import Neo4jVector
from langchain_ollama import OllamaEmbeddings

vector_index = Neo4jVector.from_existing_graph(
    embedding=OllamaEmbeddings(model="llama3"),
    url=url,
    username=username,
    password=password,
    index_name='tasks',
    node_label="Task",
    text_node_properties=['name', 'description', 'status'],
    embedding_node_property='embedding',
)

Now that the vector index is initialized, we can use it like any other vector index in LangChain.

In [20]:
response = vector_index.similarity_search(
    "How will RecommendationService be updated?"
)

In [21]:
for i in response:
    print(i.page_content)


name: Update
description: Update InventoryService to include real-time stock updates, ensuring accurate reflection of the inventory levels and aiding in the efficient management of stock.
status: in progress

name: RecommendationFeature
description: Add a new feature to RecommendationService to provide more personalized and accurate product recommendations to the users, leveraging user behavior and preference data.
status: in progress

name: FeatureAdd
description: Implement a new feature in OrderService to facilitate bulk orders, ensuring the features seamless integration with existing functionalities and maintaining the overall stability and performance of the service.
status: in progress

name: Refactor
description: Refactor the UserService codebase to enhance its readability, maintainability, and scalability, focusing primarily on modularization and optimization of existing functionalities.
status: completed


Each result's page_content is a map-like (dictionary-like) string built from the properties listed in the text_node_properties parameter.

Now we can easily build a chatbot by wrapping the vector index in a RetrievalQA chain. Before doing so, create a free account at [GroqCloud](https://console.groq.com/) and create an API key. You also need to pip install langchain_groq and insert your own GROQ_API_KEY in the code below.

In [22]:
import os
from langchain.chains import RetrievalQA
from langchain_groq import ChatGroq
os.environ['GROQ_API_KEY'] = 'Your Groq API key'

vector_qa = RetrievalQA.from_chain_type(
    llm=ChatGroq(model_name='llama-3.1-70b-versatile'),
    chain_type="stuff",
    retriever=vector_index.as_retriever(),
)

In [23]:
vector_qa.invoke(
    {"query": "How will recommendation service be updated?"}
)

{'query': 'How will recommendation service be updated?',
 'result': 'The Recommendation Service will be updated to provide more personalized and accurate product recommendations to users, leveraging user behavior and preference data.'}

A general limitation of vector indexes is that they cannot aggregate information the way a structured query language like Cypher can. Consider the following example:

In [24]:
vector_qa.invoke(
    "How many open tickets are there?"
)

{'query': 'How many open tickets are there?',
 'result': 'There are 3 open tickets: Update, RecommendationFeature, and FeatureAdd. The 4th ticket, Refactor, is completed.'}

The response seems valid, in part because the LLM uses assertive language. However, the answer is derived entirely from the documents retrieved from the vector index, of which there are four by default, as the similarity search output above shows. The LLM counts three of those four tasks as open and unquestioningly assumes there are no additional open tickets. We can check whether this answer is correct using a Cypher statement:

In [25]:
graph.query(
    "MATCH (t:Task {status:'open'}) RETURN count(*)"
)

[{'count(*)': 5}]

There are actually five open tasks in our toy graph. Vector similarity search is excellent for sifting through relevant information in unstructured text, but it lacks the capability to analyse and aggregate structured information. With Neo4j, this problem is easily solved by employing Cypher, a structured query language for graph databases. What remains to be seen is whether an LLM can translate our natural language question into a Cypher query and then give us the right answer.
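
The gap between the two answers can be reproduced without any database. The sketch below uses an invented task list (the names and statuses are not taken from the dataset) and treats the first k entries as a stand-in for the top-k results of a similarity search. Counting over the retrieved subset undercounts, while counting over the full collection, which is what the Cypher query does, gives the true total.

```python
# Invented toy data: 8 tasks, 5 of them open.
tasks = [
    {"name": "T1", "status": "open"},
    {"name": "T2", "status": "open"},
    {"name": "T3", "status": "completed"},
    {"name": "T4", "status": "open"},
    {"name": "T5", "status": "open"},
    {"name": "T6", "status": "in progress"},
    {"name": "T7", "status": "open"},
    {"name": "T8", "status": "completed"},
]


def count_open(docs):
    """Count the documents whose status is 'open'."""
    return sum(1 for doc in docs if doc["status"] == "open")


k = 4
retrieved = tasks[:k]  # stand-in for the k most similar documents

print(count_open(retrieved))  # 3, all the LLM can see in its context
print(count_open(tasks))      # 5, what an aggregation over the whole graph returns
```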

## Graph Cypher Search
The good news is LangChain provides a GraphCypherQAChain, which generates the Cypher queries for you, so you don’t have to learn Cypher syntax to retrieve information from a graph database like Neo4j.

The following code will refresh the graph schema and instantiate the Cypher chain. Generating a valid Cypher statement from a natural language prompt is a complex task, so it is recommended to use a state-of-the-art LLM like llama3.1. If you try llama3.0, you will see that it cannot understand the question properly and generates a wrong Cypher query.

In [26]:
from langchain.chains import GraphCypherQAChain

graph.refresh_schema()

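cypher_chain = GraphCypherQAChain.from_llm(
    # LLM that translates the question into a Cypher query
    cypher_llm=ChatGroq(temperature=0, model_name='llama-3.1-70b-versatile'),
    # LLM that phrases the final answer; no model_name is given, so ChatGroq's default model is used
    qa_llm=ChatGroq(temperature=0),
    graph=graph,
    verbose=True,
)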

Let's have a look at the results. The chain generates exactly the Cypher query we expect and thus returns the correct answer.

In [27]:
cypher_chain.invoke(
    "How many open tickets are there?"
)



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (t:Task {status: 'open'}) RETURN COUNT(t)
Full Context:
[{'COUNT(t)': 5}]

> Finished chain.


{'query': 'How many open tickets are there?',
 'result': 'There are 5 open tickets.'}

We can also ask the chain to aggregate the data using various grouping keys, like the following example.

In [28]:
cypher_chain.invoke(
    "Which Team has the most open Tasks?"
)



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (t:Team)-[:ASSIGNED_TO]-(ts:Task {status: 'open'})
RETURN t.name, COUNT(ts) AS count
ORDER BY count DESC
LIMIT 1
Full Context:
[{'t.name': 'TeamA', 'count': 3}]

> Finished chain.


{'query': 'Which Team has the most open Tasks?',
 'result': 'TeamA has the most open tasks, with a count of 3.'}

You might say these aggregations are not graph-based operations, and that’s correct. We can, of course, perform more graph-based operations like traversing the dependency graph of microservices.

In [29]:
cypher_chain.invoke(
    "Which services depend on Database directly?"
)



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Microservice {name: 'Database'})<-[:DEPENDS_ON]-(s:Microservice) RETURN s.name
Full Context:
[{'s.name': 'CatalogService'}, {'s.name': 'OrderService'}, {'s.name': 'UserService'}, {'s.name': 'PaymentService'}, {'s.name': 'InventoryService'}, {'s.name': 'AuthService'}]

> Finished chain.


{'query': 'Which services depend on Database directly?',
 'result': 'Based on the information provided, CatalogService, OrderService, UserService, PaymentService, InventoryService, and AuthService all depend on the Database directly.'}

Of course, you can also ask the chain to produce variable-length path traversals by asking questions like:

In [30]:
cypher_chain.invoke(
    "Which services depend on Database indirectly?"
)



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH p=(m:Microservice {name: 'Database'})<-[:DEPENDS_ON*2..]-(s:Microservice) RETURN DISTINCT s
Full Context:
[{'s': {'name': 'ShippingService', 'technology': 'Python'}}, {'s': {'name': 'OrderService', 'technology': 'Python'}}, {'s': {'name': 'PaymentService', 'technology': 'Node.js'}}, {'s': {'name': 'UserService', 'technology': 'Go'}}]

> Finished chain.


{'query': 'Which services depend on Database indirectly?',
 'result': "I don't have enough information to determine which services depend on a database indirectly. However, I can tell you that ShippingService and OrderService use Python as their technology, PaymentService uses Node.js, and UserService uses Go."}

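Some of the services listed are the same as in the answer to the direct dependency question. The reason is the structure of the dependency graph, not an invalid Cypher statement: a service can depend on Database directly and also through another service. Note that although the generated Cypher returns the expected services, the final answer does not use them; the LLM that phrases the answer says it lacks enough information and only lists each service's technology.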
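
The overlap between direct and indirect dependents is easy to see on a toy graph. The edges below are hypothetical and chosen only for illustration; they are not the real dataset. The sketch collects the lengths of all simple paths from each service to Database, then treats length 1 as direct and length 2 or more as indirect, which roughly mirrors the *2.. pattern in the generated Cypher.

```python
# Hypothetical edges: service -> services it depends on (not the real dataset).
depends_on = {
    "OrderService": ["Database", "PaymentService"],
    "PaymentService": ["Database"],
    "ShippingService": ["OrderService"],
    "Database": [],
}


def path_lengths(start, target, visited=()):
    """Yield the length of every simple dependency path from start to target."""
    seen = visited + (start,)
    for nxt in depends_on.get(start, []):
        if nxt in seen:
            continue  # do not walk in circles
        if nxt == target:
            yield 1
        else:
            for length in path_lengths(nxt, target, seen):
                yield length + 1


direct = {s for s in depends_on if 1 in set(path_lengths(s, "Database"))}
indirect = {s for s in depends_on if any(n >= 2 for n in path_lengths(s, "Database"))}

print(sorted(direct))             # ['OrderService', 'PaymentService']
print(sorted(indirect))           # ['OrderService', 'ShippingService']
print(sorted(direct & indirect))  # ['OrderService']
```

In this toy graph OrderService reaches Database both in one hop and through PaymentService, so it shows up in both result sets.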

## Knowledge Graph Agent
We’ve implemented separate tools for the structured and unstructured parts of the knowledge graph. Now we can add an agent that uses these tools to explore the knowledge graph. In the code below, it is critical to set up the prompt template properly. For example, in "Question" we have to add "ignore the question or query in the parse-able action", and in "Observation" we have to add "ignore the query part". Otherwise, the internal reasoning repeats the same step again and again, even though the agent can get the final answer after the first iteration. This happens because the agent thinks the question appearing in the parse-able action still needs to be answered.

In [77]:
from langchain.agents import AgentExecutor, Tool, create_react_agent

tools = [
    Tool(
        name="Tasks",
        func=vector_qa.invoke,
        description="""Useful when you need to answer questions about descriptions of tasks.
        Not useful for counting the number of tasks.
        Use full question as input.
        """,
    ),
    Tool(
        name="Graph",
        func=cypher_chain.invoke,
        description="""Useful when you need to answer questions about microservices,
        their dependencies or assigned people. Also useful for any sort of 
        aggregation like counting the number of tasks, etc.
        Use full question as input.
        """,
    ),
]

from langchain_core.prompts import PromptTemplate

template = '''Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer, ignore the question or query in the parse-able action
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action, ignore the query part
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}'''

prompt = PromptTemplate.from_template(template)

llm = ChatGroq(temperature=0, model_name='llama-3.1-70b-versatile')
agent = create_react_agent(llm=llm, tools=tools, prompt=prompt, stop_sequence=True)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, max_iterations=2)
response = agent_executor.invoke({"input": "Which team is assigned to maintain PaymentService?"})
print("response:", response)



> Entering new AgentExecutor chain...
Thought: To find the team assigned to maintain PaymentService, I need to look at the dependencies and assigned people for the microservice. This can be done using the Graph action.

Action: Graph
Action Input: {"question": "Which team is assigned to maintain PaymentService?"}

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (ms:Microservice {name: 'PaymentService'})-[:MAINTAINED_BY]->(t:Team) RETURN t.name
Full Context:
[{'t.name': 'TeamD'}]

> Finished chain.
{'query': '{"question": "Which team is assigned to maintain PaymentService?"}', 'result': 'TeamD is assigned to maintain PaymentService.'}
Final Answer: TeamD is assigned to maintain PaymentService.

> Finished chain.
response: {'input': 'Which team is assigned to maintain PaymentService?', 'output': 'TeamD is assigned to maintain PaymentService.'}
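
In the trace above, the agent passes a JSON-like string as the Action Input, and that whole string ends up as the 'query' of the underlying chain. One possible way to tidy this up is to unwrap the input before it reaches the tool. The helper below, extract_question, is a hypothetical function written for this illustration and is not part of LangChain; whether it removes the need for the prompt wording discussed earlier has not been tested here.

```python
import json


def extract_question(action_input):
    """Return the plain question from a raw Action Input string.

    If the input is a JSON object such as {"question": "..."} or
    {"query": "..."}, return the wrapped string; otherwise return the
    input unchanged (apart from surrounding whitespace).
    """
    text = action_input.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return text  # already a plain question
    if isinstance(parsed, dict):
        for key in ("question", "query", "input"):
            value = parsed.get(key)
            if isinstance(value, str):
                return value
    return text


print(extract_question('{"question": "Which team is assigned to maintain PaymentService?"}'))
# Which team is assigned to maintain PaymentService?
```

Under this assumption, a tool could be registered with func=lambda q: cypher_chain.invoke(extract_question(q)) in place of cypher_chain.invoke.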


Now let's try to invoke the Tasks tool.

In [80]:
response = agent_executor.invoke({"input": "What tasks have optimization in their description?"})
print("response:", response)



> Entering new AgentExecutor chain...
Thought: To answer this question, I need to find tasks that have the word "optimization" in their description. This requires searching through task descriptions, which is best handled by the Tasks function.

Action: Tasks
Action Input: {"query": "What tasks have optimization in their description?"}
{'query': '{"query": "What tasks have optimization in their description?"}', 'result': 'Based on the given context, the task with optimization in its description is:\n\n1. Refactor - Refactor the UserService codebase to enhance its readability, maintainability, and scalability, focusing primarily on modularization and optimization of existing functionalities.'}
Final Answer: Based on the given context, the task with optimization in its description is: Refactor - Refactor the UserService codebase to enhance its readability, maintainability, and scalability, focusing primarily on modularization and opt

If we compare the results above with those in the [neo4j blog](https://neo4j.com/developer-blog/knowledge-graph-rag-application/), we can see that the open source LLM gives either an equally good result (the agent using the Graph tool) or a better one (the agent using the Tasks tool). We consider the answer about the tasks better because the llama model matches the exact word "optimization", while the gpt-4 model matches the word "optimize" and ignores "optimization".

## Conclusion

In conclusion, this notebook has demonstrated how to use an open source LLM and a knowledge graph to build RAG applications and AI agents. In these examples, llama3.1 performed very well, matching or improving on the results reported in the neo4j blog.