# An interesting knowledge graph + RAG implementation
https://blog.langchain.dev/enhancing-rag-based-applications-accuracy-by-constructing-and-leveraging-knowledge-graphs/

## Neo4j Environment Setup
You need a running Neo4j instance. You can set one up locally either by downloading the Neo4j Desktop application and creating a local database instance, or by starting the official Docker image with the command in the first cell below.

In [1]:
# Start a neo4j server 
# https://neo4j.com/docs/operations-manual/current/docker/introduction/
#!docker run --restart always --publish=7474:7474 --publish=7687:7687 --env NEO4J_AUTH=neo4j/admin1234 neo4j:5.19.0

In [2]:
# Install dependencies (tiktoken is required by TokenTextSplitter)
#!pip install graphdatascience openai neo4j wikipedia tiktoken langchain langchain-community langchain-openai langchain_experimental matplotlib networkx

In [3]:
import os
import neo4j
from neo4j import GraphDatabase
import openai

# Setting environment variables
os.environ["NEO4J_URI"] = "bolt://localhost:7687" #http://localhost:7474/browser/
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "admin1234"

# Retrieve the OpenAI API key from the environment (no client is created here)
openai_api_key = os.getenv("OPENAI_API_KEY")
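
The cell above hard-codes the connection settings into `os.environ`. As a minimal sketch, the same three variables could be collected and validated in one place before anything tries to connect. `Neo4jSettings` and `load_neo4j_settings` are hypothetical helper names introduced here for illustration; they are not part of the Neo4j driver or LangChain.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Neo4jSettings:
    """Hypothetical container for the three connection values used in this notebook."""
    uri: str
    username: str
    password: str


def load_neo4j_settings(env=None):
    """Read NEO4J_URI, NEO4J_USERNAME and NEO4J_PASSWORD, failing early if one is missing."""
    env = os.environ if env is None else env
    names = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return Neo4jSettings(env["NEO4J_URI"], env["NEO4J_USERNAME"], env["NEO4J_PASSWORD"])
```

Calling `load_neo4j_settings()` with no argument reads from `os.environ`; passing a plain dict makes the helper easy to check without touching the real environment.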

In [4]:
from langchain_community.graphs import Neo4jGraph

# Reads NEO4J_URI, NEO4J_USERNAME and NEO4J_PASSWORD from the environment
graph = Neo4jGraph()

For this demonstration, we will use Elizabeth I’s Wikipedia page. We can utilize LangChain loaders to fetch and split the documents from Wikipedia seamlessly.

In [5]:
import os
import networkx as nx
import matplotlib.pyplot as plt
from langchain.text_splitter import TokenTextSplitter
from langchain_community.graphs import Neo4jGraph
from langchain_experimental.graph_transformers import LLMGraphTransformer
from neo4j import GraphDatabase
from langchain_community.document_loaders import WikipediaLoader


# Read the Wikipedia article
raw_documents = WikipediaLoader(query="Elizabeth I").load()

# Define chunking strategy: 512-token chunks that overlap by 24 tokens,
# applied to the first three loaded documents only
text_splitter = TokenTextSplitter(chunk_size=512, chunk_overlap=24)
documents = text_splitter.split_documents(raw_documents[:3])
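
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal sketch of sliding-window chunking over a token list. It is an illustration under simplifying assumptions, not the splitter's actual implementation: `chunk_with_overlap` is a hypothetical helper, and the "tokens" can be any list (the real splitter works on tokenizer output).

```python
def chunk_with_overlap(tokens, chunk_size=512, chunk_overlap=24):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the current window already reaches the end of the input
    return chunks


# Ten toy tokens, windows of four, one token shared between neighbours
print(chunk_with_overlap(list(range(10)), chunk_size=4, chunk_overlap=1))
# → [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated tokens.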

Now it’s time to construct a graph based on the retrieved documents. For this purpose, LangChain provides the LLMGraphTransformer module (in langchain_experimental), which significantly simplifies constructing a knowledge graph from text; the result can then be stored in a graph database.

In [None]:
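from langchain_openai import ChatOpenAI

# temperature=0 keeps the extraction as deterministic as the model allows
llm = ChatOpenAI(temperature=0, model_name="gpt-4-0125-preview")
llm_transformer = LLMGraphTransformer(llm=llm)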

# Extract graph data
graph_documents = llm_transformer.convert_to_graph_documents(documents)

# Store to neo4j
graph.add_graph_documents(
  graph_documents, 
  baseEntityLabel=True, 
  include_source=True
)
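
Conceptually, each chunk yields a small set of nodes and relationships, and storing them merges those fragments into one graph. The sketch below illustrates that idea with simplified stand-in classes; `Node`, `Relationship`, and `merge_graph_fragments` here are hypothetical and written for this example, not the LangChain classes, and the sample entities are typed by hand rather than extracted by an LLM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str    # entity name, e.g. "Elizabeth I"
    type: str  # entity label, e.g. "Person"


@dataclass(frozen=True)
class Relationship:
    source: Node
    target: Node
    type: str  # relationship type, e.g. "PARENT_OF"


def merge_graph_fragments(fragments):
    """Union (nodes, relationships) pairs from several chunks, dropping duplicates."""
    nodes = {}
    relationships = []
    seen = set()
    for fragment_nodes, fragment_relationships in fragments:
        for node in fragment_nodes:
            nodes.setdefault((node.id, node.type), node)
        for rel in fragment_relationships:
            key = (rel.source.id, rel.type, rel.target.id)
            if key not in seen:
                seen.add(key)
                relationships.append(rel)
    return list(nodes.values()), relationships


# Two chunks that mention the same people and the same fact
elizabeth = Node("Elizabeth I", "Person")
henry = Node("Henry VIII", "Person")
parent_of = Relationship(henry, elizabeth, "PARENT_OF")
all_nodes, all_rels = merge_graph_fragments(
    [([elizabeth, henry], [parent_of]), ([elizabeth], [parent_of])]
)
print(len(all_nodes), len(all_rels))  # → 2 1
```

Deduplicating on the entity identifier is what lets facts from different chunks attach to the same node instead of creating one node per mention.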

In [None]:
# Connect to Neo4j database
neo4j_graph = Neo4jGraph(
    uri=os.environ["NEO4J_URI"],
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"]
)

# Add graph documents to Neo4j (redundant if they were already stored through `graph` above)
neo4j_graph.add_graph_documents(
    graph_documents, 
    baseEntityLabel=True, 
    include_source=True
)

# Query the graph data for visualization
driver = GraphDatabase.driver(
    os.environ["NEO4J_URI"],
    auth=(os.environ["NEO4J_USERNAME"], os.environ["NEO4J_PASSWORD"])
)

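def query_graph(tx):
    # Nodes written by add_graph_documents are keyed by `id`,
    # so fall back to it when a node has no `name` property
    query = """
    MATCH (a)-[r]->(b)
    RETURN coalesce(a.name, a.id) AS from, coalesce(b.name, b.id) AS to, type(r) AS relationship
    """
    result = tx.run(query)
    return list(result)

with driver.session() as session:
    # execute_read replaces the deprecated read_transaction in driver 5.x
    graph_data = session.execute_read(query_graph)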

# Create a NetworkX graph
G = nx.DiGraph()
for record in graph_data:
    G.add_edge(record["from"], record["to"], relationship=record["relationship"])

# Plot the graph using NetworkX and Matplotlib
pos = nx.spring_layout(G)
plt.figure(figsize=(10, 7))
nx.draw(G, pos, with_labels=True, node_color="lightblue", font_size=12, font_weight="bold", edge_color="gray")
edge_labels = nx.get_edge_attributes(G, "relationship")
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
plt.title("Neo4j Graph Visualization")
plt.show()

# Close Neo4j connection
driver.close()
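
The plot above draws every relationship returned by the query. A minimal sketch of a filtering step that could run before building the NetworkX graph is shown here: it drops rows with a missing endpoint and, on the assumption that `include_source=True` links source chunks to entities through `MENTIONS` relationships, skips those as well. `records_to_edges` is a hypothetical helper, and the records are assumed to behave like mappings with the `from`, `to`, and `relationship` keys used in the query.

```python
def records_to_edges(records, skip_types=("MENTIONS",)):
    """Turn query rows into (source, target, attributes) edge tuples."""
    edges = []
    for record in records:
        source = record.get("from")
        target = record.get("to")
        if source is None or target is None:
            continue  # node without a usable label
        if record["relationship"] in skip_types:
            continue  # assumed chunk-to-entity link, not a domain fact
        edges.append((source, target, {"relationship": record["relationship"]}))
    return edges
```

The resulting list has the shape accepted by `G.add_edges_from(...)`, so it can stand in for the `for record in graph_data` loop.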