Vector RAG limitations
- Document embeddings capture semantic meaning but struggle to capture themes and relationships between entities across the document corpus.
- As the database grows, retrieval can become less efficient, because the computational load increases with the size of the search space.
- Vector RAG systems don't easily accommodate structured or diverse data, which is harder to embed.
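
To make the second point concrete, here is a minimal sketch of brute-force vector retrieval in plain Python. The function names and the toy 3-dimensional "embeddings" are made up for illustration. Every stored vector is compared with the query, so the work grows linearly with the size of the collection; real vector stores usually reduce this cost with approximate nearest-neighbor indexes.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_search(query, vectors, k=2):
    """Score every stored vector against the query: one comparison per vector."""
    scored = [(cosine_similarity(query, v), doc_id) for doc_id, v in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Toy embeddings (hypothetical values, not produced by a real embedding model)
toy_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

top = brute_force_search([1.0, 0.0, 0.0], toy_vectors, k=2)
print(top)
```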

Graph databases
- Graphs are great at representing and storing diverse and interconnected information in a structured manner.
- Graphs consist of nodes and edges, which can capture complex relationships and hierarchies.
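
As a rough illustration of the node-and-edge model, the sketch below builds a tiny in-memory graph with plain dataclasses. The `Node`, `Edge`, and `neighbors` names are hypothetical and are not part of Neo4j or LangChain; a graph database stores the same kind of structure and adds indexing and a query language on top.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    id: str
    label: str  # the node's type, e.g. "Model" or "Organization"

@dataclass(frozen=True)
class Edge:
    source: Node
    type: str   # the relationship type, e.g. "DEVELOPED_BY"
    target: Node

# A toy graph with three nodes and two directed edges
gpt4 = Node("GPT-4", "Model")
openai = Node("OpenAI", "Organization")
transformer = Node("Transformer", "Architecture")

edges = [
    Edge(gpt4, "DEVELOPED_BY", openai),
    Edge(gpt4, "BASED_ON", transformer),
]

def neighbors(node, rel_type):
    """Follow outgoing edges of a given type from a node."""
    return [e.target for e in edges if e.source == node and e.type == rel_type]

developers = neighbors(gpt4, "DEVELOPED_BY")
print(developers)
```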

-----

Creating Graph Components

In [None]:
from langchain_community.document_loaders import WikipediaLoader
from langchain_text_splitters import TokenTextSplitter

# Load Wikipedia pages that match the query
raw_documents = WikipediaLoader(query="large language model").load()

# Split the first three pages into 100-token chunks that overlap by 20 tokens
text_splitter = TokenTextSplitter(chunk_size=100, chunk_overlap=20)
documents = text_splitter.split_documents(raw_documents[:3])

print(documents[0])
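
To show what `chunk_size` and `chunk_overlap` mean, here is a minimal sliding-window sketch in plain Python. It is an illustration under simplifying assumptions, not the `TokenTextSplitter` implementation: the hypothetical `chunk_tokens` helper works on a ready-made list of tokens (integers here), whereas the real splitter first tokenizes the text.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows that share `chunk_overlap` tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end
        start += step
    return chunks

tokens = list(range(10))  # stand-in for ten token ids
chunks = chunk_tokens(tokens, chunk_size=5, chunk_overlap=2)
print(chunks)
```

Each window starts `chunk_size - chunk_overlap` tokens after the previous one, so neighboring chunks repeat a little context at their boundary.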

In [None]:
from langchain_openai import ChatOpenAI
from langchain_experimental.graph_transformers import LLMGraphTransformer

llm = ChatOpenAI(api_key="...", temperature=0, model_name="gpt-4o-mini")
llm_transformer = LLMGraphTransformer(llm=llm)

# Use the LLM to extract nodes and relationships from each text chunk
graph_documents = llm_transformer.convert_to_graph_documents(documents)
print(graph_documents)

-----

Instantiating the Neo4j database 

In [None]:
from langchain_community.graphs import Neo4jGraph

# Option 1: pass the connection details directly
graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="...")

# Option 2: read the connection details from environment variables
import os

url = os.environ["NEO4J_URI"]
user = os.environ["NEO4J_USERNAME"]
password = os.environ["NEO4J_PASSWORD"]
graph = Neo4jGraph(url=url, username=user, password=password)

In [None]:
graph.add_graph_documents(
  graph_documents, 
  include_source=True, # Link nodes to their source documents with MENTIONS edge
  baseEntityLabel=True, # add __Entity__ label to each node
)

In [None]:
print(graph.get_schema)

Querying

<image src="./images/quering_with_neo4js.png" alt="RAG Workflow" width="600">

In [None]:
results = graph.query("""
MATCH (gpt4:Model {id: "Gpt-4"})-[:DEVELOPED_BY]->(org:Organization)
RETURN org
""")

print(results)

-----

Combining everything together

In [None]:
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain

chain = GraphCypherQAChain.from_llm(
  llm=llm,
  graph=graph, 
  verbose=True
)

result = chain.invoke({"query": "What is the most accurate model?"})

print(f"Final answer: {result['result']}")

Improving graph retrieval

In [None]:
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain

llm = ChatOpenAI(api_key="...", model="gpt-4o-mini", temperature=0)

chain = GraphCypherQAChain.from_llm(
  graph=graph, 
  llm=llm, 
  exclude_types=["Concept"], 
  verbose=True,
  # validate_cypher checks the generated Cypher against the graph schema:
  # it detects nodes and relationships, determines each relationship's direction,
  # and updates the direction where it does not match the schema
  validate_cypher=True
)

print(graph.get_schema)
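
The idea behind `validate_cypher=True` can be sketched with a toy example. The code below is not LangChain's implementation; the `SCHEMA` set and the `correct_direction` helper are hypothetical and only illustrate how a relationship pattern can be checked against known schema triples and flipped when the arrow points the wrong way.

```python
# Schema as (start_label, relationship_type, end_label) triples.
# These triples are illustrative assumptions, not the output of graph.get_schema.
SCHEMA = {("Model", "DEVELOPED_BY", "Organization")}

def correct_direction(left_label, rel_type, right_label):
    """Return the pattern oriented to match the schema, or None if nothing matches."""
    if (left_label, rel_type, right_label) in SCHEMA:
        # The left-to-right arrow already agrees with the schema
        return f"(:{left_label})-[:{rel_type}]->(:{right_label})"
    if (right_label, rel_type, left_label) in SCHEMA:
        # The arrow pointed the wrong way: flip it
        return f"(:{left_label})<-[:{rel_type}]-(:{right_label})"
    return None  # the relationship does not exist in the schema at all

# A generated query wrote (:Organization)-[:DEVELOPED_BY]->(:Model), which is backwards
fixed = correct_direction("Organization", "DEVELOPED_BY", "Model")
print(fixed)
```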