In [1]:
# A mini-RAG (Retrieval-Augmented Generation) pipeline with LlamaIndex and OpenAI

In [2]:
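# llama-index bundles the core package plus the OpenAI LLM and embedding integrations;
# python-dotenv is needed for load_dotenv() below
!pip install llama-index python-dotenv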


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [3]:
pip install --upgrade pip

Note: you may need to restart the kernel to use updated packages.


In [4]:
import os
from dotenv import load_dotenv

load_dotenv()
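# load_dotenv() has already copied the values from .env into os.environ;
# fail early with a clear message if the key is missing
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file or environment")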

In [5]:
import importlib.metadata
print(importlib.metadata.version("llama-index"))

0.12.35


In [6]:
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    Settings,
)
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import CompactAndRefine

In [7]:
# Set global configuration using Settings
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=100)
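
To make the `chunk_size=512, chunk_overlap=100` setting concrete, here is a minimal sketch of overlapping windows over a token list. It is a simplification, not how `SentenceSplitter` is implemented: the real splitter counts tokens with a tokenizer and tries to respect sentence boundaries, while the hypothetical `chunk_with_overlap` helper below just slides a fixed window.

```python
def chunk_with_overlap(tokens, chunk_size=512, chunk_overlap=100):
    """Split a token list into fixed-size windows that share `chunk_overlap` tokens.

    Illustrative only: a plain sliding window, not SentenceSplitter's algorithm.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


# Toy input: 1000 placeholder "tokens"
tokens = [f"tok{i}" for i in range(1000)]
chunks = chunk_with_overlap(tokens)
print([len(c) for c in chunks])
# Each chunk starts with the last 100 tokens of the previous one
print(chunks[0][-100:] == chunks[1][:100])
```

The overlap means a sentence that straddles a chunk boundary is likely to appear whole in at least one chunk, at the cost of embedding some text twice.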


In [8]:
# Load files from the "data" folder
documents = SimpleDirectoryReader("data").load_data()

# Splitting into "nodes" (chunks of documents) happens later, when the index is built,
# using the splitter defined in Settings
# Check how many documents were loaded
print(f"\n📄 Number of documents loaded: {len(documents)}")

# Preview the first 200 characters of the first document
if documents:
    print("\n🔍 Document Preview:\n")
    print(documents[0].text[:200])
else:
    print("⚠️ No documents loaded. Check file format and content.")




📄 Number of documents loaded: 473

🔍 Document Preview:

Www.Medicalstudyzone.com


In [9]:
# Build the index: from_documents splits the documents with Settings.node_parser and embeds each chunk
index = VectorStoreIndex.from_documents(documents)

# Create a query engine
query_engine = index.as_query_engine()

# test query
response = query_engine.query("What is the document about?")
print("\nResponse:\n", response)



Response:
 The document is about "HC Verma Concepts of Physics Volume 1" and it is dedicated to Indian Philosophy & Way of Life, emphasizing the integral part that the author's parents played in it.


In [10]:
# Manually split into nodes using the parser
nodes = Settings.node_parser.get_nodes_from_documents(documents)

# Preview a few chunks
for i, node in enumerate(nodes[:6]):
    print(f"\n--- Chunk {i+1} ---")
    print(node.text[:300])  # Show first 300 characters of the chunk



--- Chunk 1 ---
Www.Medicalstudyzone.com

--- Chunk 2 ---
CONCEPTS OF PHYSICS
[VOLUME 1]
H C VERMA, PhD
Retired Professor
Department of Physics
IIT, Kanpur
Www.Medicalstudyzone.com

--- Chunk 3 ---
Dedicated to
Indian Philosophy & Way of Life
of which
my parents were
an integral part
Www.Medicalstudyzone.com

--- Chunk 4 ---
FOREWORD
A few years ago I had an occasion to go through the book Calculus  by L V Terasov. It unravels intricacies
of the subject through a dialogue between Teacher and Student. I thoroughly enjoyed reading it. For me this
seemed to be one of the few books which teach a difficult subject through in

--- Chunk 5 ---
An extremely methodical, sincere person as a student, he has devoted himself to the task
of educating young minds and inculcating scientific temper amongst them. The present venture in the form of
these two volumes is another attempt in that direction. I am sure that young minds who would like to le

--- Chunk 6 ---
P R E F A C E
Why a new book ?
Excel
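
The previews above show the line `Www.Medicalstudyzone.com` repeated in several chunks, and chunk 1 contains nothing else. A minimal sketch of one way to drop that line before indexing follows; the `strip_watermark` helper is hypothetical and not part of LlamaIndex, and it assumes the watermark always sits on a line of its own.

```python
WATERMARK = "Www.Medicalstudyzone.com"  # the repeated line seen in the chunk previews


def strip_watermark(text, watermark=WATERMARK):
    """Remove every line that consists only of the watermark (hypothetical helper)."""
    kept = [line for line in text.splitlines() if line.strip() != watermark]
    return "\n".join(kept).strip()


sample = "Dedicated to\nIndian Philosophy & Way of Life\nWww.Medicalstudyzone.com"
cleaned = strip_watermark(sample)
print(cleaned)

# A page that holds only the watermark becomes empty and could be skipped entirely
print(repr(strip_watermark("Www.Medicalstudyzone.com")))
```

Applied to each document's text before building the index, this would keep watermark-only chunks from taking up one of the top-k retrieval slots.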

In [11]:
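# Rebuild the index. This is optional: the index from In [9] is equivalent and could be reused,
# which would avoid embedding every chunk a second time
index = VectorStoreIndex.from_documents(documents)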


In [12]:
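# Retrieve the 3 chunks most similar to the query
retriever = VectorIndexRetriever(index=index, similarity_top_k=3)

# Pack the retrieved chunks into as few LLM prompts as possible and refine the answer across them
synthesizer = CompactAndRefine()
# Put everything together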
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=synthesizer,
)
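
What `similarity_top_k=3` asks for can be sketched in plain Python: score every chunk embedding against the query embedding and keep the three best. The sketch below assumes cosine similarity as the ranking metric and uses made-up 2-dimensional vectors; `cosine_similarity` and `top_k` are illustrative helpers, not LlamaIndex functions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=3):
    """Return the indices of the k chunk vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy embeddings (real ones from text-embedding-3-small have far more dimensions)
toy_chunks = [
    [1.0, 0.0],   # same direction as the query
    [0.9, 0.1],   # nearly the same direction
    [0.0, 1.0],   # orthogonal
    [0.7, 0.7],   # 45 degrees away
    [-1.0, 0.0],  # opposite direction
]
query = [1.0, 0.0]
top_ids = top_k(query, toy_chunks, k=3)
print(top_ids)
```

A larger `k` gives the synthesizer more context to draw on but also more text to send to the LLM, so it trades recall against cost and noise.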


In [13]:
response = query_engine.query("What is projectile motion?")
print(response)

Projectile motion is the motion of an object thrown or projected into the air, moving along a curved path under the influence of gravity. It involves constant acceleration, typically in the vertical downward direction, and can be analyzed separately for its horizontal and vertical components. The object follows a trajectory that is a combination of horizontal motion at a constant velocity and vertical motion under the influence of gravity.


In [14]:
# Show the first 200 characters of each retrieved chunk the answer was built from
for i, source in enumerate(response.source_nodes):
    print(f"\n🔖 Source {i+1}:\n{source.node.get_content()[:200]}")



🔖 Source 1:
The velocity makes an angle θ with the X-axis where
        t a n θ = 
vy
vx
 = 3.6 m/s
12.8 m/s = 9
32 ⋅
The x-coordinate at t = 4.0 s is
       x = ux t + 1
2 ax t 
2
        = (8.0 m/s) (4.0 s) + 1

🔖 Source 2:
Figure
(3.10) shows a particle projected from the point O with
an initial velocity u at an angle θ with the horizontal.
It goes through the highest point A and falls at B on
the horizontal surface thr

🔖 Source 3:
If the height
from which the goli is projected is 19 .6 cm from the
ground and the goli is to be projected horizontally, with
what speed should it be projected so that it directly hits
the stationary 
