# Graph RAG with OpenAI / Google Generative AI

Graph retrieval augmented generation (Graph RAG) is emerging as a powerful technique that lets generative AI applications use domain-specific knowledge and relevant information. Graph RAG is an alternative to vector search methods that use a vector database. Knowledge graphs are knowledge systems in which graph databases such as Neo4j or Amazon Neptune represent structured data. In a knowledge graph, the data points are called vertices (or nodes) and the relationships between them are called edges; the edges are as meaningful as the data points they connect. A knowledge graph makes it easy to traverse a network and process complex queries about connected data. Knowledge graphs are especially well suited for use cases involving chatbots, identity resolution, network analysis, recommendation engines, customer 360 and fraud detection.
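
To make the node and edge vocabulary concrete, here is a minimal, library-free sketch that stores a few facts from this tutorial's sample text as (node, relationship, node) triples and answers a two-hop question by traversal. The relationship names mirror the ones used later in the tutorial; the helper names are illustrative only. A real graph database does this with indexes and a query language such as Cypher rather than Python loops.

```python
from collections import defaultdict

# Each fact is an edge: (source node, relationship type, target node)
triples = [
    ("John", "COLLABORATES", "Jane"),
    ("Jane", "COLLABORATES", "Sharon"),
    ("John", "GROUP", "Digital Marketing Group"),
    ("Jane", "GROUP", "Executive Group"),
    ("Sharon", "GROUP", "Sales Group"),
]

# Adjacency list: node -> list of (relationship type, neighbor)
adjacency = defaultdict(list)
for source, rel, target in triples:
    adjacency[source].append((rel, target))


def neighbors(node, rel_type):
    """Return the nodes reachable from `node` over one edge of type `rel_type`."""
    return [target for rel, target in adjacency[node] if rel == rel_type]


def groups_of_collaborators(person):
    """Two-hop traversal: person -COLLABORATES-> colleague -GROUP-> group."""
    return sorted(
        group
        for colleague in neighbors(person, "COLLABORATES")
        for group in neighbors(colleague, "GROUP")
    )


print(groups_of_collaborators("John"))  # ['Executive Group']
```

The same question in a relational model would need a join per hop; in a graph, each hop is just following stored edges.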

A Graph RAG approach leverages the structured nature of graph databases to give retrieved information greater depth and context, especially for questions about networks or complex relationships. When a graph database is paired with a large language model (LLM), a developer can automate significant parts of the graph creation process from unstructured data like text. An LLM can process text data, identify entities, understand their relationships and represent them in a graph structure.
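
The extraction step can be pictured as turning free text into a structured payload. The sketch below assumes a hypothetical JSON shape (a `nodes` list and a `relationships` list) that an LLM might be asked to return; it is not the format any particular library uses. It parses the payload, de-duplicates nodes, and drops relationships that point at nodes the model never declared.

```python
import json

# Hypothetical LLM response; the JSON shape is an assumption for illustration
llm_output = """
{
  "nodes": [
    {"id": "John", "type": "Person"},
    {"id": "Jane", "type": "Person"},
    {"id": "John", "type": "Person"},
    {"id": "Executive Group", "type": "Group"}
  ],
  "relationships": [
    {"source": "John", "type": "COLLABORATES", "target": "Jane"},
    {"source": "Jane", "type": "GROUP", "target": "Executive Group"},
    {"source": "Jane", "type": "GROUP", "target": "Marketing"}
  ]
}
"""


def parse_extraction(raw):
    """Parse the JSON payload into a node dict (id -> type) and a list of edges."""
    data = json.loads(raw)
    # De-duplicate nodes by id; if an id repeats, the last type seen wins
    nodes = {node["id"]: node["type"] for node in data["nodes"]}
    edges = [
        (rel["source"], rel["type"], rel["target"])
        for rel in data["relationships"]
        # Skip edges whose endpoints were never declared as nodes
        if rel["source"] in nodes and rel["target"] in nodes
    ]
    return nodes, edges


nodes, edges = parse_extraction(llm_output)
print(len(nodes), edges)
# 3 [('John', 'COLLABORATES', 'Jane'), ('Jane', 'GROUP', 'Executive Group')]
```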

There are many ways to create a Graph RAG application, for instance Microsoft's GraphRAG, or pairing GPT-4 with LlamaIndex. **For this tutorial you'll use Memgraph, an open source graph database, to create a RAG system with a large language model from OpenAI or Google Generative AI.** Memgraph uses Cypher, a declarative query language. Cypher shares some similarities with SQL but focuses on nodes and relationships rather than tables and rows. You'll have the LLM both create and populate your graph database from unstructured text and query information in the database.

## Step 1

**This tutorial uses OpenAI or Google Generative AI. You will need to obtain an API key from the provider of your choice.**

a. Obtain an [OpenAI API Key](https://platform.openai.com/api-keys).

b. Or, obtain a [Google API Key](https://aistudio.google.com/app/apikey).

## Step 2

Next, install Docker Desktop from [https://www.docker.com/products/docker-desktop/](https://www.docker.com/products/docker-desktop/).

Once you've installed Docker, install Memgraph using its Docker container. On macOS or Linux, you can use this command in a terminal:

    curl https://install.memgraph.com | sh

On a Windows computer use:

    iwr https://windows.memgraph.com | iex

Follow the installation steps to get the Memgraph engine and Memgraph Lab up and running.

## Step 3

On your computer, create a fresh virtualenv for this project:

    python -m venv graphrag-env
    source graphrag-env/bin/activate  # On Windows: graphrag-env\Scripts\activate

In the Python environment for your notebook, install the required Python libraries:

    pip install -r requirements.txt
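
Before moving on, you can confirm that the packages imported later in this tutorial are available in the active environment. This is a minimal sketch using only the standard library; the module list is taken from the tutorial's own import statements and assumes `requirements.txt` installs them. If you only use one provider, you can drop the other provider's module from the list.

```python
import importlib.util

# Top-level modules imported by the notebook cells in this tutorial
REQUIRED_MODULES = [
    "langchain_core",
    "langchain_community",
    "langchain_experimental",
    "langchain_openai",
    "langchain_google_genai",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing or "none")
```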

Now you're ready to connect to Memgraph.

## Step 4 - Memgraph connection settings

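### 🔐 About authentication

**Important**: In this tutorial, Memgraph authentication is disabled.

- **No username or password is required**
- No account registration or sign-up is needed
- You can connect as-is with empty credentials (empty strings)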

This setup is intended for a local development environment that doesn't store sensitive data. It's not good practice for a production database, where you should enable authentication. If you've configured Memgraph to use a username and password, set them in the cell below (it reads the `MEMGRAPH_USERNAME` and `MEMGRAPH_PASSWORD` environment variables); otherwise the empty defaults work.

```python
import os

from langchain_community.chains.graph_qa.memgraph import MemgraphQAChain
from langchain_community.graphs import MemgraphGraph

url = os.environ.get("MEMGRAPH_URI", "bolt://localhost:7687")
username = os.environ.get("MEMGRAPH_USERNAME", "")
password = os.environ.get("MEMGRAPH_PASSWORD", "")

# initialize memgraph connection
graph = MemgraphGraph(
    url=url, username=username, password=password, refresh_schema=True
)
```
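
If creating `graph` fails with a connection error, the first thing to rule out is whether anything is listening at the Bolt address. The sketch below uses only the standard library to attempt a plain TCP connection to the host and port in a `bolt://` URI. It doesn't authenticate or speak the Bolt protocol, so `True` only means some server is accepting connections on that port. The fallback port 7687 matches the default URI used above.

```python
import socket
from urllib.parse import urlparse


def bolt_port_open(uri, timeout=2.0):
    """Return True if a TCP connection to the host and port in `uri` succeeds.

    This only shows that something is listening; it does not check the Bolt protocol.
    """
    parsed = urlparse(uri)
    host = parsed.hostname or "localhost"
    port = parsed.port or 7687  # default Bolt port used in this tutorial
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(bolt_port_open("bolt://localhost:7687"))
```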

## Step 5

Now create a sample string that describes a set of relationships you can use to test the graph-generating capabilities of your LLM system. You could use more complex data sources, but this simple example keeps the demonstration easy to follow.

```python
graph_text = """
John's title is Director of the Digital Marketing Group. John works with Jane whose title is Chief Marketing Officer. Jane works in the Executive Group. Jane works with Sharon whose title is the Director of Client Outreach. Sharon works in the Sales Group. John belongs to the Digital Marketing Group.
"""
```

Enter the API key you created in the first step. Choose either OpenAI or Google.

```python
from getpass import getpass
import os

# This cell uses OpenAI by default. To use Google instead, comment out
# the OpenAI lines and uncomment the Google lines.

# For OpenAI
api_key = getpass("Enter your OpenAI API Key: ")
os.environ["OPENAI_API_KEY"] = api_key

# For Google Generative AI
# api_key = getpass("Enter your Google API Key: ")
# os.environ["GOOGLE_API_KEY"] = api_key
```

Now configure an LLM instance to generate text. Set the temperature to 0 so the model's output is as deterministic as possible. This makes the model less likely to add entities or relationships that aren't present in the input text, although it doesn't rule out hallucination entirely.
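
The sketch below shows the textbook way a sampling temperature reshapes a model's next-token distribution: divide the raw scores (logits) by the temperature and apply softmax, treating a temperature of 0 as always picking the highest-scoring token. It's a simplified illustration with made-up scores, not how OpenAI or Google implement sampling internally.

```python
import math


def token_probabilities(logits, temperature):
    """Turn raw token scores (logits) into sampling probabilities at a given temperature.

    Temperature 0 is treated as greedy decoding: all probability goes to the top score.
    """
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    top = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0, 0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in token_probabilities(logits, t)])
```

Lower temperatures concentrate probability on the top-scoring token, which is why 0 is the usual choice for extraction tasks like this one.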

```python
# Import the LLM class for the provider you chose
from langchain_openai import ChatOpenAI
# from langchain_google_genai import ChatGoogleGenerativeAI

# Select and initialize the LLM for graph generation.
# A temperature of 0 makes the output as deterministic as possible.

# Using OpenAI's GPT-4o-mini (supports structured output)
llm_for_graph_generation = ChatOpenAI(temperature=0, model_name="gpt-4o-mini")

# Using Google's Gemini Pro
# llm_for_graph_generation = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0, convert_system_message_to_human=True)
```

The `LLMGraphTransformer` allows you to set what kinds of nodes and relationships you'd like the LLM to generate. In your case, the text describes employees at a company, the groups they work in and their job titles. Restricting the LLM to just those entities makes it more likely that you'll get a good representation of the knowledge in a graph.
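
Conceptually, the restriction works like a filter over whatever the model proposes. As we understand it, `LLMGraphTransformer` communicates the allowed types to the model and can also discard results outside them; the sketch below is not its implementation. It uses hypothetical (source, source label, relationship, target, target label) tuples to show which proposals an allowed-types filter would keep.

```python
ALLOWED_NODES = {"Person", "Title", "Group"}
ALLOWED_RELATIONSHIPS = {"TITLE", "COLLABORATES", "GROUP"}

# Hypothetical raw proposals: (source, source_label, relationship, target, target_label)
proposals = [
    ("John", "Person", "TITLE", "Director of the Digital Marketing Group", "Title"),
    ("John", "Person", "COLLABORATES", "Jane", "Person"),
    ("Jane", "Person", "MANAGES", "Sharon", "Person"),  # MANAGES is not an allowed type
    ("Sharon", "Person", "GROUP", "Sales Group", "Group"),
]


def keep_allowed(rows, allowed_nodes, allowed_relationships):
    """Keep only edges whose relationship type and both endpoint labels are allowed."""
    return [
        (source, source_label, rel, target, target_label)
        for source, source_label, rel, target, target_label in rows
        if rel in allowed_relationships
        and source_label in allowed_nodes
        and target_label in allowed_nodes
    ]


for source, _, rel, target, _ in keep_allowed(proposals, ALLOWED_NODES, ALLOWED_RELATIONSHIPS):
    print(f"{source} -[{rel}]-> {target}")
```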

The call to `convert_to_graph_documents` has the `LLMGraphTransformer` extract nodes and relationships from the text and return them as `GraphDocument` objects. In the next step, the graph's `add_graph_documents` method translates those objects into Cypher statements that insert the entities and relationships into the graph database.
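
To get a feel for what the loading step has to produce, here is a hypothetical helper that turns one extracted edge into a parameterized Cypher `MERGE` statement. It is not the Cypher that `add_graph_documents` actually runs; it only illustrates the general pattern of merging both endpoints by `id`, then the relationship, with values passed as parameters. Assuming the connection's `query` method accepts a parameters dictionary as its second argument, such a statement could be run as `graph.query(query, params)`.

```python
import re

# Labels and relationship types can't be passed as Cypher parameters, so they are
# checked against a conservative identifier pattern before going into the query text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def merge_statement(source_label, rel_type, target_label):
    """Build a parameterized MERGE for one edge (hypothetical helper, not LangChain's code)."""
    for name in (source_label, rel_type, target_label):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe label or relationship type: {name!r}")
    return (
        f"MERGE (s:{source_label} {{id: $source_id}}) "
        f"MERGE (t:{target_label} {{id: $target_id}}) "
        f"MERGE (s)-[:{rel_type}]->(t)"
    )


query = merge_statement("Person", "COLLABORATES", "Person")
params = {"source_id": "John", "target_id": "Jane"}
print(query)
print(params)
```

Using `MERGE` rather than `CREATE` means loading the same edge twice doesn't duplicate the nodes or the relationship.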

```python
from langchain_experimental.graph_transformers.llm import LLMGraphTransformer
from langchain_core.documents import Document

llm_transformer = LLMGraphTransformer(
    llm=llm_for_graph_generation,
    allowed_nodes=["Person", "Title", "Group"],
    allowed_relationships=["TITLE", "COLLABORATES", "GROUP"]
)
documents = [Document(page_content=graph_text)]
graph_documents = llm_transformer.convert_to_graph_documents(documents)
```

Now clear any old data out of the Memgraph database and insert the new nodes and edges.

```python
# Make sure the database is empty by deleting all nodes and their relationships
graph.query("MATCH (n) DETACH DELETE n")

# Create the knowledge graph from the extracted graph documents
graph.add_graph_documents(graph_documents)
```

The extracted nodes and relationships are stored in the `graph_documents` objects. You can inspect them by printing them.

```python
print(graph_documents)
```
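
To check the extraction more systematically, you can flatten the relationships into (source, type, target) triples and compare them with what you expect from `graph_text`. This sketch assumes the `GraphDocument` layout in LangChain's graph document model (a `relationships` list whose items have `source.id`, `type` and `target.id`). It uses stand-in objects so it runs without a database; in the notebook you would pass `graph_documents` instead. The expected triples are a hypothetical reading of the sample text, and the exact ids, casing and edge directions depend on the model, so treat differences as prompts to inspect rather than failures.

```python
from types import SimpleNamespace


def to_triples(graph_documents):
    """Flatten GraphDocument-like objects into a set of (source id, type, target id) triples."""
    return {
        (rel.source.id, rel.type, rel.target.id)
        for doc in graph_documents
        for rel in doc.relationships
    }


def diff_triples(extracted, expected):
    """Return (missing, unexpected) triples relative to a hand-written expectation."""
    return expected - extracted, extracted - expected


# Stand-in objects with the same attribute names as the assumed GraphDocument layout
john, jane = (SimpleNamespace(id=name) for name in ("John", "Jane"))
sample_docs = [
    SimpleNamespace(relationships=[SimpleNamespace(source=john, type="COLLABORATES", target=jane)])
]

# Hypothetical expectations based on a reading of graph_text
EXPECTED = {("John", "COLLABORATES", "Jane"), ("Jane", "COLLABORATES", "Sharon")}

missing, unexpected = diff_triples(to_triples(sample_docs), EXPECTED)
print("missing:", missing)
print("unexpected:", unexpected)
```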

You can see the schema and data types that were created in the graph's `get_schema` property.

```python
graph.refresh_schema()
print(graph.get_schema)
```

```python
# Debug: Let's see what data was actually created
print("🔍 Checking actual data in the graph...")

try:
    # Check Person nodes
    person_nodes = graph.query("MATCH (p:Person) RETURN p.id as person_id")
    print(f"👥 Person nodes: {person_nodes}")

    # Check Title nodes
    title_nodes = graph.query("MATCH (t:Title) RETURN t.id as title_id")
    print(f"💼 Title nodes: {title_nodes}")

    # Check Group nodes
    group_nodes = graph.query("MATCH (g:Group) RETURN g.id as group_id")
    print(f"🏢 Group nodes: {group_nodes}")

    # Check relationships
    relationships = graph.query("MATCH (n)-[r]->(m) RETURN type(r), n.id, m.id LIMIT 10")
    print(f"🔗 Relationships: {relationships}")

except Exception as e:
    print(f"❌ Debug queries failed: {e}")
```

You can also see the graph structure in the Memgraph Lab viewer by connecting to your local instance and running the query `MATCH (n)-[r]->(m) RETURN n,r,m`.

The LLM has done a reasonable job of creating the correct nodes and relationships. Now it's time to query the knowledge graph.

## Step 6

Prompting the LLM correctly requires some prompt engineering. Giving the LLM example questions paired with correct queries (few-shot prompting) helps it write correct and succinct Cypher syntax. LangChain provides a `FewShotPromptTemplate` for this; the following code instead builds the examples into a plain `PromptTemplate` string by hand to avoid template-variable conflicts. The prompt also constrains the model's output to the query alone: an overly chatty LLM might add extra information that would make the Cypher invalid, so the template asks the model to output only the query itself.

Adding an instructive prefix also helps to constrain the model's behavior and makes it more likely that the LLM will output correct Cypher syntax.
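
One detail matters when Cypher examples go into a prompt template. LangChain's default f-string templates follow Python's `str.format` rules, so a Cypher map literal such as `{id: 'John'}` is read as a placeholder named `id` unless its braces are doubled. This standard-library sketch shows the problem and the escape; the helper names are illustrative.

```python
from string import Formatter


def template_fields(template):
    """List the placeholder names that Python-style {} formatting would try to fill."""
    return [field for _, field, _, _ in Formatter().parse(template) if field is not None]


def escape_braces(text):
    """Double literal braces so a formatter leaves them as plain text."""
    return text.replace("{", "{{").replace("}", "}}")


raw = "MATCH (p:Person {id: 'John'})-[:TITLE]->(t:Title) RETURN t.id"
print(template_fields(raw))                 # ['id']
print(template_fields(escape_braces(raw)))  # []
print(escape_braces(raw).format())          # the original query, braces intact
```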

```python
from langchain_core.prompts import PromptTemplate

# Example questions paired with Cypher that matches the labels and relationship types created above
examples = [
    {
        "question": "What is John's title?",
        "query": "MATCH (p:Person {id: 'John'})-[:TITLE]->(t:Title) RETURN t.id",
    },
    {
        "question": "Who does John collaborate with?",
        "query": "MATCH (p:Person {id: 'John'})-[:COLLABORATES]->(c:Person) RETURN c.id",
    },
    {
        "question": "What group is Jane in?",
        "query": "MATCH (p:Person {id: 'Jane'})-[:GROUP]->(g:Group) RETURN g.id",
    }
]

# Build the examples string manually to avoid template-variable conflicts.
# Cypher map literals such as {id: 'John'} use braces, which PromptTemplate would
# otherwise read as template variables, so double them to keep them literal.
examples_text = ""
for example in examples:
    query = example["query"].replace("{", "{{").replace("}", "}}")
    examples_text += f"User input: {example['question']}\nCypher query: {query}\n\n"

# After the .format() call below, only {schema} and {question} remain as template variables
cypher_prompt_template = """You are a Cypher query expert. Given a schema and a question, you must create a syntactically correct Cypher query to answer the question.
You must respond with ONLY the query, with no other text, explanation, or context.
You must use the provided node and relationship labels and property names from the schema.

Here is the schema:
{{schema}}

Here are some examples:

{examples_text}User input: {{question}}
Cypher query: """

cypher_prompt = PromptTemplate(
    input_variables=["schema", "question"],
    template=cypher_prompt_template.format(examples_text=examples_text)
)
```

Next, you'll create a prompt to control how the LLM answers the question with the information returned from Memgraph. The prompt gives the LLM instructions on how to respond once it has context information back from the graph database.

```python
from langchain_core.prompts import PromptTemplate

qa_template = """
You are a helpful assistant that answers user questions based on the context provided.
If the context is empty, say you don't know the answer.
Use only the information provided in the context to answer the question.
Your answer should be concise and directly answer the question.

Context:
{context}

Question: {question}

Answer:
"""

qa_prompt = PromptTemplate.from_template(qa_template)
```

Now it's time to create the question answering chain. The `MemgraphQAChain` lets you set which LLM to use and which graph (and therefore which schema) to query. You can reuse the same LLM configuration or choose a different one for this chain.

```python
# Select and initialize the LLM for the QA part of the chain.
# It can be the same as the one used for graph generation.

# Using OpenAI's GPT-4o-mini (supports structured output)
llm_for_qa = ChatOpenAI(temperature=0, model_name="gpt-4o-mini")

# Using Google's Gemini Pro (requires the ChatGoogleGenerativeAI import)
# llm_for_qa = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0, convert_system_message_to_human=True)

# Note: allow_dangerous_requests=True is required for security acknowledgment
# This chain can potentially execute dangerous Cypher queries
chain = MemgraphQAChain.from_llm(
    llm=llm_for_qa,
    graph=graph,
    verbose=True,
    return_intermediate_steps=True,
    cypher_prompt=cypher_prompt,
    qa_prompt=qa_prompt,
    allow_dangerous_requests=True  # Required for LangChain security
)
```
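
Setting `allow_dangerous_requests=True` means the chain runs whatever Cypher the model writes, including queries that modify or delete data. The more robust protection is to connect with a database user that only has read access, where your Memgraph setup supports that. As an extra, admittedly crude, check you could screen generated queries for write keywords; the sketch below is a hypothetical helper, not part of LangChain, and keyword matching is not a security boundary.

```python
import re

# Clauses that write data or change the schema. This is a rough, incomplete heuristic:
# a keyword inside a string literal (e.g. 'Set') triggers a false positive, and
# procedure calls that write data are not detected.
WRITE_KEYWORDS = re.compile(r"\b(CREATE|MERGE|DELETE|SET|REMOVE|DROP)\b", re.IGNORECASE)


def looks_read_only(cypher):
    """Return True if none of the write keywords appear in the query text."""
    return WRITE_KEYWORDS.search(cypher) is None


print(looks_read_only("MATCH (p:Person {id: 'John'})-[:TITLE]->(t:Title) RETURN t.id"))  # True
print(looks_read_only("MATCH (n) DETACH DELETE n"))  # False
```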

Now you can invoke the chain with a natural language question. Your responses might differ slightly from the ones you see elsewhere, because LLM output is not guaranteed to be deterministic even at a temperature of 0, but they should be largely consistent.

```python
chain.invoke("What is John's title?")
```
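
Because the chain was built with `return_intermediate_steps=True`, `invoke` returns a dictionary rather than a bare string. We'd expect the answer under `result` and the generated Cypher under a `query` key in one of the `intermediate_steps` entries, as in LangChain's other graph QA chains; treat that layout as an assumption and print the returned dictionary to confirm it in your version. The hypothetical helper below pulls both out, so you can see which query produced which answer.

```python
def summarize_response(response):
    """Return (answer, generated_cypher) from a chain response dictionary.

    Assumes the answer is under "result" and the generated query is under a "query"
    key in one of the "intermediate_steps" entries; verify against your version.
    """
    answer = response.get("result")
    cypher = next(
        (step["query"] for step in response.get("intermediate_steps", []) if "query" in step),
        None,
    )
    return answer, cypher


# Stand-in dictionary with the assumed layout; in the notebook, pass the value
# returned by chain.invoke(...) instead.
sample_response = {
    "result": "(model answer)",
    "intermediate_steps": [
        {"query": "MATCH (p:Person {id: 'John'})-[:TITLE]->(t:Title) RETURN t.id"},
        {"context": []},
    ],
}

answer, cypher = summarize_response(sample_response)
print("Cypher:", cypher)
print("Answer:", answer)
```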

Next, ask the chain a slightly more complex question:

```python
chain.invoke("Who does John collaborate with?")
```

You can ask the Memgraph chain about Group relationships:

```python
chain.invoke("What group is Jane in?")
```

The chain correctly identifies the relationship.

## Conclusion

In this tutorial, you built a Graph RAG application using Memgraph and a large language model from OpenAI or Google Generative AI to generate the graph data structures and query them. Using an LLM, you extracted node and edge information from natural language source text and loaded it into a graph database with Cypher. You then used the same or another LLM to turn natural language questions about that source text into Cypher queries that extracted information from the graph database. Using prompt engineering, the LLM turned the results from the Memgraph database into natural language responses.