**Step 1: Load and preprocess text data**

The first step is to load and preprocess the text data from which we'll extract the knowledge
graph. In this example, we'll use a text snippet describing a technology company called
PrismaticAI, its employees, and their roles.


In [1]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

# Load the text file into a list of Document objects
file_path = "text.txt"  # Replace with your actual file path
loader = TextLoader(file_path)
documents = loader.load()

# Split the documents into overlapping chunks of roughly 200 characters
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=20)
texts = text_splitter.split_documents(documents)

1️⃣ documents = loader.load()
What it does:
loader.load() reads the contents of the text file and returns them as a list of Document objects.
Each Document has a .page_content attribute that holds the text and a .metadata attribute that holds metadata such as the file path.
Example output: [Document(page_content="This is the text from your file...", metadata={"source": "text.txt"})]

Why use loader.load()?
loader.load() reads the text file and wraps its contents in Document objects, the format that LangChain's splitters and transformers expect.
Without this call, there is no text data for the rest of the pipeline to work with.
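
To make the shape of the loader's output concrete, here is a minimal standard-library sketch. `SimpleDocument` and `load_text_file` are hypothetical stand-ins written for this illustration, not LangChain classes, and the sample sentence is made up; the sketch only mirrors the two attributes described above.

```python
import os
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document (illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_file(path: str) -> list[SimpleDocument]:
    """Read a whole text file into a one-element list of documents."""
    text = Path(path).read_text(encoding="utf-8")
    return [SimpleDocument(page_content=text, metadata={"source": path})]


# Write a made-up sample file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "text.txt")
    Path(sample_path).write_text("Sarah works at PrismaticAI.", encoding="utf-8")
    docs = load_text_file(sample_path)

print(docs[0].page_content)  # Sarah works at PrismaticAI.
print(sorted(docs[0].metadata))  # ['source']
```

The result is a list even for a single file, which is why the later steps iterate over documents rather than handling one string.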



2️⃣ text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=20)
What it does:
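Creates a CharacterTextSplitter object that will be used to split the text into chunks.
chunk_size=200 → The splitter aims for chunks of at most 200 characters. CharacterTextSplitter splits on a separator (a blank line by default), so a paragraph longer than 200 characters is kept as one oversized chunk.
chunk_overlap=20 → Consecutive chunks share up to 20 characters, so text near a boundary appears in both.
Why is chunking needed?
Large texts can exceed the input limit (context window) of the model that processes them.
Splitting keeps each input within that limit, and the overlap preserves some context across chunk boundaries.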

3️⃣ texts = text_splitter.split_documents(documents)
What it does:

Splits the text inside each Document into multiple smaller chunks.
Returns a list of new Document objects, each containing a chunk of text.

Example output:
Original text: "This is an example of a long text document that needs to be split..."

Chunk 1: "This is an example of a long text document that needs to be split..."
Chunk 2 (overlapping): "...document that needs to be split into smaller parts..."
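
The interplay of chunk size and overlap can be sketched with plain string slicing. This is a simplified illustration, not LangChain's algorithm: `split_with_overlap` is a hypothetical helper that cuts at fixed character offsets, whereas CharacterTextSplitter splits on a separator first.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, which is the same role chunk_overlap=20 plays in the cell above.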


**Step 2: Initialize language model and extract knowledge graph**

After loading and preprocessing the text data, the next step is to initialize a language model and use it to extract a knowledge graph from the text chunks.

In [2]:
from langchain_ollama import OllamaLLM
from langchain_experimental.graph_transformers.llm import LLMGraphTransformer

# Instantiate the Ollama LLM with Llama 3.2
model = OllamaLLM(model="llama3.2")

# Extract Knowledge Graph
llm_transformer = LLMGraphTransformer(llm=model)
graph_documents = llm_transformer.convert_to_graph_documents(texts)

**How it works**

1) Processing: LLM-Based Graph Extraction
LLMGraphTransformer reads the text and identifies entities (people, companies, roles, etc.).
It extracts relationships (who works where, what a company does, etc.).
It converts the extracted knowledge into graph documents.

2) Output: Graph Representation
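After transformation, graph_documents holds a list of GraphDocument objects, each with the nodes and relationships (edges) extracted from one chunk. The exact result depends on the model, but for the sample text it should look something like this: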

🔹 Nodes (Entities)
Sarah
Michael
PrismaticAI
Software Engineer
Data Scientist
AI Solutions
🔹 Edges (Relationships)
Sarah works at PrismaticAI
Michael works at PrismaticAI
Michael has role Data Scientist
Sarah has role Software Engineer
PrismaticAI develops AI Solutions
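
The nodes and edges listed above can be written down as (subject, relation, object) triples. The following is a minimal sketch of that representation in plain Python; it is not the GraphDocument class, and `subjects_of` is a hypothetical helper added for illustration.

```python
# The five relationships listed above, as (subject, relation, object) triples.
triples = [
    ("Sarah", "works at", "PrismaticAI"),
    ("Michael", "works at", "PrismaticAI"),
    ("Michael", "has role", "Data Scientist"),
    ("Sarah", "has role", "Software Engineer"),
    ("PrismaticAI", "develops", "AI Solutions"),
]

# Every name that appears as a subject or an object is a node.
nodes = sorted({name for s, _, o in triples for name in (s, o)})


def subjects_of(relation: str, target: str) -> list[str]:
    """Return all subjects connected to target by the given relation."""
    return sorted(s for s, r, o in triples if r == relation and o == target)


print(nodes)
# ['AI Solutions', 'Data Scientist', 'Michael', 'PrismaticAI', 'Sarah', 'Software Engineer']
print(subjects_of("works at", "PrismaticAI"))
# ['Michael', 'Sarah']
```

Deriving the node list from the triples gives exactly the six entities shown above, which is a quick consistency check on an extracted graph.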




**📊 Schema of the Knowledge Graph**

        +----------+        works at        +-------------+
        |  Sarah   | ---------------------> | PrismaticAI |
        +----------+                        +-------------+
             |                                      |
       has role                                develops
             |                                      |
    +----------------+                      +----------------+
    | Software Eng.  |                      |  AI Solutions  |
    +----------------+                      +----------------+

        +----------+        works at        +-------------+
        | Michael  | ---------------------> | PrismaticAI |
        +----------+                        +-------------+
             |
       has role
             |
    +----------------+
    | Data Scientist |
    +----------------+



**Step 3: Store knowledge graph in a database**

After extracting the knowledge graph from the text data, it's important to store it in a persistent and queryable format. In this tutorial, we'll use Neo4j to store the knowledge graph. 


In [None]:
from langchain_community.graphs.neo4j_graph import Neo4jGraph
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Read the Neo4j credentials from the environment
neo4j_username = os.getenv("NEO4J_USERNAME")
neo4j_password = os.getenv("NEO4J_PASSWORD")

# Connect to the Neo4j instance and store the knowledge graph
graph_store = Neo4jGraph(
    url="neo4j+s://a737671e.databases.neo4j.io",  # Replace with your own instance URI
    username=neo4j_username,
    password=neo4j_password,
)
graph_store.add_graph_documents(graph_documents)
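
To give a feel for what storing a graph in Neo4j involves, the sketch below turns one triple into a parameterised Cypher MERGE statement. It is an illustration only and not necessarily what add_graph_documents emits: the `Entity` label, the `id` property, and the helpers `to_rel_type` and `triple_to_cypher` are assumptions made for this example.

```python
import re


def to_rel_type(label: str) -> str:
    """Turn a free-text relation such as 'works at' into WORKS_AT."""
    return re.sub(r"[^A-Za-z0-9]+", "_", label.strip()).strip("_").upper()


def triple_to_cypher(subject: str, relation: str, obj: str) -> tuple[str, dict]:
    """Return a MERGE statement and its parameters for one triple.

    Node values are passed as parameters; the relationship type cannot be
    parameterised in Cypher, so it is sanitised and interpolated instead.
    """
    query = (
        "MERGE (a:Entity {id: $subject}) "
        "MERGE (b:Entity {id: $object}) "
        f"MERGE (a)-[:{to_rel_type(relation)}]->(b)"
    )
    return query, {"subject": subject, "object": obj}


query, params = triple_to_cypher("Sarah", "works at", "PrismaticAI")
print(query)
print(params)
```

MERGE rather than CREATE keeps the write idempotent: running the same statement twice does not duplicate Sarah or PrismaticAI.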

**Step 4: Retrieve knowledge for RAG**

Now that we have stored the knowledge graph in a database, we can set up the components for retrieving relevant knowledge from the graph based on user queries and generating responses using the retrieved context. This is the core functionality of a RAG application.


In [None]:
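from llama_index.core import StorageContext
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import KnowledgeGraphRAGRetriever
from llama_index.graph_stores.neo4j import Neo4jGraphStore

# Neo4jGraph (LangChain) has no storage_context attribute, so open the same database
# through LlamaIndex's own graph store. Assumes llama-index-graph-stores-neo4j is installed
# and that the store's node_label matches the labels written in Step 3.
# LlamaIndex falls back to its default LLM settings unless you configure one.
llama_graph_store = Neo4jGraphStore(username=neo4j_username, password=neo4j_password, url="neo4j+s://a737671e.databases.neo4j.io")
storage_context = StorageContext.from_defaults(graph_store=llama_graph_store)

# Retrieve Knowledge for RAG
graph_rag_retriever = KnowledgeGraphRAGRetriever(storage_context=storage_context, verbose=True)
query_engine = RetrieverQueryEngine.from_args(graph_rag_retriever)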
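
Independent of any library, the retrieval idea can be sketched in a few lines: find the entities a question mentions, then collect every stored fact that touches them. This is a simplified illustration under stated assumptions, not the KnowledgeGraphRAGRetriever implementation; `find_entities` and `retrieve_context` are hypothetical helpers, and entity matching here is a plain case-insensitive substring test.

```python
triples = [
    ("Sarah", "works at", "PrismaticAI"),
    ("Michael", "works at", "PrismaticAI"),
    ("Michael", "has role", "Data Scientist"),
    ("Sarah", "has role", "Software Engineer"),
    ("PrismaticAI", "develops", "AI Solutions"),
]


def find_entities(question: str, known_entities: list[str]) -> list[str]:
    """Return the known entities whose names occur in the question."""
    lowered = question.lower()
    return [entity for entity in known_entities if entity.lower() in lowered]


def retrieve_context(question: str, graph: list[tuple[str, str, str]]) -> list[str]:
    """Collect every fact whose subject or object is mentioned in the question."""
    entities = sorted({name for s, _, o in graph for name in (s, o)})
    mentioned = set(find_entities(question, entities))
    return [f"{s} {r} {o}" for s, r, o in graph if s in mentioned or o in mentioned]


context = retrieve_context("Where does Sarah work?", triples)
print(context)
# ['Sarah works at PrismaticAI', 'Sarah has role Software Engineer']
```

Only facts one hop away from a mentioned entity are returned here; a real retriever typically lets you configure how many hops to traverse.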

**Step 5: Query the knowledge graph and generate a response**

Finally, we can query the knowledge graph and generate responses using the retrieved context.


In [None]:
def query_and_synthesize(query):
    # RetrieverQueryEngine.query() retrieves the graph context and synthesizes
    # the answer in one call, so no separate response synthesizer is needed.
    response = query_engine.query(query)
    print(f"Query: {query}")
    print(f"Answer: {response}\n")

# Query 1
query_and_synthesize("Where does Sarah work?")

# Query 2
query_and_synthesize("Who works for prismaticAI?")

# Query 3
query_and_synthesize("Does Michael work for the same company as Sarah?")
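
"Generating a response using the retrieved context" comes down to placing the retrieved facts and the question in one prompt for the language model. The sketch below shows one way to assemble such a prompt; `build_prompt` and the template wording are assumptions made for this illustration, not the template LlamaIndex uses.

```python
def build_prompt(question: str, facts: list[str]) -> str:
    """Combine retrieved facts and the user question into a single prompt."""
    context = "\n".join(f"- {fact}" for fact in facts) or "- (no facts retrieved)"
    return (
        "Answer the question using only the facts below.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "Does Michael work for the same company as Sarah?",
    ["Sarah works at PrismaticAI", "Michael works at PrismaticAI"],
)
print(prompt)
```

Restricting the model to the listed facts is what grounds the answer in the graph rather than in whatever the model remembers from training.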