In [1]:
# A mini RAG (Retrieval-Augmented Generation) pipeline with LlamaIndex and OpenAI

In [2]:
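# Install everything the cells below import (core, OpenAI LLM and embeddings, dotenv)
%pip install llama-index llama-index-llms-openai llama-index-embeddings-openai python-dotenv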


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [3]:
%pip install --upgrade pip

Note: you may need to restart the kernel to use updated packages.


In [4]:
import os
from dotenv import load_dotenv

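# Load variables from a local .env file into os.environ
load_dotenv()

# Fail early with a clear message if the key is missing
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")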

In [5]:
import importlib.metadata
print(importlib.metadata.version("llama-index"))

0.12.35


In [6]:
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    Settings,
)
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import CompactAndRefine

In [7]:
# Set global configuration using Settings
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=100)
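
The `chunk_size=512, chunk_overlap=100` settings above mean that consecutive chunks share some text, so a passage cut at a chunk boundary still appears whole in one of them. The sketch below illustrates only that overlap idea with a plain sliding window over words. It is not how `SentenceSplitter` is implemented (which counts tokens and tries to respect sentence boundaries), and the helper name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a list into windows of at most chunk_size items,
    where consecutive windows share chunk_overlap items."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window reaches the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Toy input: words stand in for tokens (an assumption made for readability)
words = "the Zulu kingdom lasted only a little over six decades".split()
chunks = split_with_overlap(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

With a window of 4 and an overlap of 1, the last word of each chunk is repeated as the first word of the next one.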


In [15]:
# Load files from the "data" folder
documents = SimpleDirectoryReader("data").load_data()

# Loading does not split the text; the splitter defined in Settings is applied
# later, when the index is built. One file can yield several Document objects
# (for PDFs, typically one per page).
print(f"\n📄 Number of documents loaded: {len(documents)}")

# Preview the first 200 characters of the first document
if documents:
    print("\n🔍 Document Preview:\n")
    print(documents[0].text[:200])
else:
    print("⚠️ No documents loaded. Check file format and content.")




📄 Number of documents loaded: 432

🔍 Document Preview:

Historical Dictionary
of the Zulu Wars
John Laband
Historical Dictionaries of War, 
Revolution, and Civil Unrest, No. 37
The Scarecrow Press, Inc.
Lanham, Maryland • Toronto • Oxford
2009


In [9]:
# Build the index: from_documents splits the documents into nodes with
# Settings.node_parser and embeds them with Settings.embed_model
index = VectorStoreIndex.from_documents(documents)

# Create a query engine
query_engine = index.as_query_engine()

# Test query
response = query_engine.query("What is the document about?")
print("\nResponse:\n", response)



Response:
 The document is about the Zulu Kingdom.


In [16]:
# Manually split into nodes using the parser
nodes = Settings.node_parser.get_nodes_from_documents(documents)

# Preview a few chunks
for i, node in enumerate(nodes[:6]):
    print(f"\n--- Chunk {i+1} ---")
    print(node.text[:300])  # Show first 300 characters of the chunk



--- Chunk 1 ---
Historical Dictionary
of the Zulu Wars
John Laband
Historical Dictionaries of War, 
Revolution, and Civil Unrest, No. 37
The Scarecrow Press, Inc.
Lanham, Maryland • Toronto • Oxford
2009

--- Chunk 2 ---
vii
If you like your wars nice and neat, one side against the other, or just the 
“good guys” beating the “bad guys,” this is not the book for you. In its 
simplest form, the Zulu Wars can be regarded as a three-way struggle 
between the Zulus, the Boers, and the British, in various combinations 
an

--- Chunk 3 ---
about the Zulu Wars extensively, including several books and numer-
ous articles. He has also shown increasing interest in the Zulu people 
themselves, having coedited Zulu Identities: Being Zulu, Past and 
Present. This Historical Dictionary of the Zulu Wars thus benefits from 
Professor Laband’s c

--- Chunk 4 ---
ix
Acknowledgments
A historical dictionary covering 50 years of conflict in 19th-century Zu-
luland and its neighboring states must owe an eno

In [11]:
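# Rebuild the index. This is redundant if the earlier from_documents cell
# already ran in this session, and it re-embeds every chunk.
index = VectorStoreIndex.from_documents(documents)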


In [12]:
# Retrieve the 3 most similar chunks for each query
retriever = VectorIndexRetriever(index=index, similarity_top_k=3)

# Synthesizer that packs the retrieved chunks into as few LLM calls as it can
synthesizer = CompactAndRefine()

# Put everything together
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=synthesizer,
)
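
To make `similarity_top_k=3` more concrete: the retriever embeds the query, scores the stored chunk embeddings against it, and keeps the best three. The sketch below reproduces that ranking step on hand-written 3-dimensional vectors. It assumes cosine similarity as the scoring function, and the names `cosine_similarity`, `top_k`, and `toy_chunks` are hypothetical, not part of LlamaIndex.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k):
    """Return the k (score, name) pairs with the highest similarity."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in chunk_vecs.items()
    ]
    scored.sort(reverse=True)  # highest score first
    return scored[:k]

# Toy embeddings (made up for illustration; real ones have many more dimensions)
toy_chunks = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.9, 0.1, 0.0],
    "chunk_c": [0.5, 0.5, 0.0],
    "chunk_d": [0.0, 0.0, 1.0],
}
query = [1.0, 0.0, 0.0]

results = top_k(query, toy_chunks, k=3)
for score, name in results:
    print(f"{name}: {score:.3f}")
```

Here `chunk_d` is orthogonal to the query, so it is the one left out of the top 3.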


In [13]:
response = query_engine.query("tell me about the Zulu kingdom wars")
print(response)

The Zulu kingdom wars encompassed a series of conflicts between the Zulu kingdom and advancing colonial forces, starting with the Voortrekker invasion in 1838 and ending in 1888 with the failure of resistance to newly imposed British rule. These wars excluded earlier Zulu campaigns against neighboring African polities and the participation of Zulu people as British subjects in the Anglo-Boer War of 1899–1902. The Zulu Wars also involved civil wars within the Zulu kingdom triggered by destabilization, making it vulnerable to partition by colonial neighbors.
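
The answer above was produced by `CompactAndRefine`, which, broadly, packs the retrieved chunks into as few prompts as fit the model's context and refines the answer across prompts if more than one is needed. The sketch below shows only the packing idea, using a character budget instead of a token budget. It is a simplified illustration under those assumptions, not the library's implementation, and `compact_chunks` is a hypothetical helper.

```python
def compact_chunks(chunks, max_chars, separator="\n\n"):
    """Greedily pack text chunks into as few prompts as possible
    while keeping each prompt at or below max_chars."""
    packed, current = [], ""
    for chunk in chunks:
        candidate = chunk if not current else current + separator + chunk
        # Accept the chunk if it fits; a single oversized chunk is kept as is
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            packed.append(current)  # close the full prompt
            current = chunk         # start a new one
    if current:
        packed.append(current)
    return packed

# Three toy "retrieved chunks" of 40 characters each
retrieved = ["a" * 40, "b" * 40, "c" * 40]
prompts = compact_chunks(retrieved, max_chars=100)
print(len(prompts), [len(p) for p in prompts])
```

With a budget of 100 characters, the first two chunks share one prompt and the third goes into a second prompt, which corresponds to one initial answer followed by one refine step.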


In [14]:
for i, node in enumerate(response.source_nodes):
    print(f"\n🔖 Source {i+1}:\n{node.node.get_content()[:200]}")



🔖 Source 1:
ZULU KINGDOM. The Zulu kingdom lasted only a little over six 
decades in the 19th century before being overthrown in war, broken 
into pieces, consigned to civil war, and eventually annexed piecemeal 

🔖 Source 2:
xli
Introduction
The term “Zulu Wars” is very imprecise, there being no single, gen-
erally accepted understanding of what it encompasses. Most often it 
is sloppily applied to the Anglo-Zulu War of 1

🔖 Source 3:
Only 
about 2,000 of the iziGqoza warriors escaped to Natal. The uSuthu 
casualties are unknown, though their right horn suffered heavily 
from gunfire. The battle decided the Zulu succession in Cetsh
