<a href="https://colab.research.google.com/github/Khushwant-singh/sample-rag-learning/blob/main/first_rag_llamaindex.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

⭐ Minimal RAG Project — Summary

This notebook implements a simple Retrieval-Augmented Generation (RAG) pipeline using LlamaIndex, open-source embeddings, and a local HuggingFace LLM.

✅ Objective

Build a minimal end-to-end RAG system that can answer questions grounded in a small document corpus.

🧠 Pipeline Overview

The system follows this architecture:

Documents → Chunking → Embeddings → Vector Index → Retrieval → LLM → Answer

⚙️ Steps Implemented

1. Data ingestion

A text document (denmark.txt, a copy of the Wikipedia article on Denmark) was uploaded into a data/ folder

Documents loaded using SimpleDirectoryReader

2. Chunking configuration

Chunk size set to 512 tokens with a 50-token overlap, so text near a chunk boundary appears in both neighbouring chunks
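A minimal sketch of fixed-size chunking with overlap, assuming whitespace-separated words stand in for tokens. The `chunk_tokens` function is hypothetical; LlamaIndex's own splitter counts real tokens and tries to respect sentence boundaries, so its chunks will not match this exactly.

```python
def chunk_tokens(tokens: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Split `tokens` into windows of at most `chunk_size` items, where consecutive
    windows share `chunk_overlap` items."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):  # this window already reaches the end
            break
    return chunks

# Words stand in for tokens here (an assumption; real splitters use a tokenizer).
words = "Denmark is a Nordic country in Northern Europe".split()
print(chunk_tokens(words, chunk_size=4, chunk_overlap=1))
```

With chunk_size=512 and chunk_overlap=50, each window advances 462 tokens, so the last 50 tokens of one chunk reappear at the start of the next.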

3. Embedding generation

HuggingFace embedding model (BAAI/bge-small-en-v1.5) used

Each chunk converted into a vector representation
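To make "vector representation" concrete, here is a toy stand-in for an embedding model: a hashed bag-of-words vector normalised to unit length. It only captures word overlap, not meaning, and is not how BAAI/bge-small-en-v1.5 works (that model is a neural encoder producing 384-dimensional dense vectors); `toy_embed`, `cosine` and the 64-dimension size are illustrative assumptions.

```python
import hashlib
import math
import re

def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hashed bag-of-words vector, L2-normalised (a stand-in for a neural embedding model)."""
    vec = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        # Map each word to a fixed bucket using a stable (process-independent) hash.
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def cosine(a: list[float], b: list[float]) -> float:
    # For unit-length vectors the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))
```

Because the vectors are unit length, a plain dot product gives cosine similarity, which is the usual way such embeddings are compared.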

4. Index creation

VectorStoreIndex built from document chunks

This acts as an in-memory vector database
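A sketch of what an in-memory vector store does at query time, assuming brute-force cosine search over every stored vector. `InMemoryVectorIndex` is a hypothetical class, not LlamaIndex's implementation (its default store is `SimpleVectorStore`); it only illustrates the store-then-rank idea behind `VectorStoreIndex` and the `similarity_top_k` parameter used later.

```python
import heapq
import math

class InMemoryVectorIndex:
    """Minimal in-memory vector store: keeps (id, vector, text) rows and ranks them by cosine similarity."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, list[float], str]] = []

    def add(self, node_id: str, vector: list[float], text: str) -> None:
        self._rows.append((node_id, vector, text))

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def query(self, vector: list[float], top_k: int = 1) -> list[tuple[float, str, str]]:
        """Return the top_k (score, node_id, text) tuples, highest similarity first."""
        scored = ((self._cosine(vector, v), node_id, text) for node_id, v, text in self._rows)
        return heapq.nlargest(top_k, scored, key=lambda row: row[0])
```

Brute force is fine for a single Wikipedia article; dedicated vector databases exist mainly to avoid scanning every vector once the corpus grows.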

5. LLM configuration

TinyLlama chat model (TinyLlama/TinyLlama-1.1B-Chat-v1.0) loaded with transformers and wrapped in LlamaIndex's HuggingFaceLLM

GPU runtime enabled for faster inference

6. Query engine creation

Query engine constructed from index

Combines retriever + response synthesizer

7. Question answering

User query embedded

Similar chunks retrieved from vector index

Retrieved context passed to LLM

LLM generates grounded answer
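The "retrieved context passed to LLM" step amounts to filling a prompt template. A minimal sketch, assuming a hypothetical `QA_TEMPLATE` and `build_prompt` helper; LlamaIndex ships its own default QA prompt with different wording, and its default `compact` response mode packs as many chunks as fit into each LLM call.

```python
# Hypothetical template; LlamaIndex's built-in QA prompt is worded differently.
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Using only the context above, answer the question. "
    "If the context does not contain the answer, say you don't know.\n"
    "Question: {question}\n"
    "Answer: "
)

def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """'Stuff' the retrieved chunks into a single prompt, separated by blank lines."""
    context = "\n\n".join(chunk.strip() for chunk in retrieved_chunks)
    return QA_TEMPLATE.format(context=context, question=question.strip())
```

The instruction to answer only from the context is what makes the answer "grounded"; a small model such as TinyLlama may still ignore it.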

🎯 Key Learnings

RAG quality strongly depends on ingestion quality

Chunking affects retrieval effectiveness

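Retrieval selects the knowledge and the LLM synthesizes the answer (raising similarity_top_k from 1 to 3 changed the answer to "What is Denmark known for?")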

Hardware (GPU vs CPU) significantly impacts latency

LlamaIndex connects components via global Settings
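One way to act on the ingestion lesson: the notebook indexes the raw text, and the uploaded denmark.txt still contains page residue from the Wikipedia copy (lines such as "Show globe" and "Show map of Europe"), while its first line is missing the initial "F". A minimal cleaning pass, assuming a hand-written and hypothetical list of noise patterns, could be applied to each document's text before indexing:

```python
import re

# Hypothetical noise patterns, written by hand for this particular Wikipedia copy.
NOISE_PATTERNS = [
    re.compile(r"^f?rom Wikipedia, the free encyclopedia$", re.IGNORECASE),  # also matches the truncated "rom ..." line
    re.compile(r"^Show (globe|both|map of .+)$"),  # map/globe widget labels
    re.compile(r"^Duration: .*\d+:\d+$"),  # audio-player durations for the anthems
]

def clean_text(raw: str) -> str:
    """Drop lines that match any noise pattern; keep all other lines (including blank ones) in order."""
    kept = [
        line
        for line in raw.splitlines()
        if not any(pattern.match(line.strip()) for pattern in NOISE_PATTERNS)
    ]
    return "\n".join(kept)
```

Pattern lists like this are brittle and corpus-specific; the point is only that whatever gets indexed is what retrieval can return.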

🚀 Outcome

A fully functional minimal RAG system capable of answering questions using local documents.

In [1]:
#✅ Cell 1 — Install dependencies
!pip install llama-index
!pip install pypdf
!pip install llama-index-embeddings-huggingface
!pip install llama-index-llms-huggingface
!pip install transformers accelerate sentence-transformers

In [2]:
#✅ Cell 2 — Create data folder

In [3]:
!mkdir -p data

In [4]:
#✅ Cell 3 — Upload text file

In [5]:
from google.colab import files
uploaded = files.upload()  # assigning the result keeps Colab from echoing the raw file bytes

Saving denmark.txt to denmark.txt

In [6]:
!mv denmark.txt data/

✅ Cell 4 — Load documents

In [7]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
print(documents[0].text[:500])

rom Wikipedia, the free encyclopedia
This article is about metropolitan Denmark. For the sovereign state, see Danish Realm. For other uses, see Denmark (disambiguation).
Denmark
Danmark (Danish)
Constituent part of the Kingdom of Denmark
Flag of Denmark
Flag	Official seal of Denmark
National coat of arms
Motto: Forbundne, forpligtet, for kongeriget Danmark[a]
(United, committed, for the Kingdom of Denmark)
Anthem: Der er et yndigt land (Danish)
(English: "There is a lovely country")
Duration: 1 
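Roughly what SimpleDirectoryReader does for plain-text files: read each file and attach file-level metadata. This is a simplified, hypothetical `load_text_files` helper that returns plain dicts rather than LlamaIndex `Document` objects; the metadata keys mirror those visible on the retrieved node in the query output further down (file_path, file_name, file_type, file_size, last_modified_date).

```python
from datetime import date
from pathlib import Path

def load_text_files(folder: str) -> list[dict]:
    """Read every .txt file in `folder` into a {"text": ..., "metadata": ...} record."""
    records = []
    for path in sorted(Path(folder).glob("*.txt")):
        stat = path.stat()
        records.append({
            "text": path.read_text(encoding="utf-8"),
            "metadata": {
                "file_path": str(path.resolve()),
                "file_name": path.name,
                "file_type": "text/plain",  # hard-coded: this sketch only handles .txt files
                "file_size": stat.st_size,  # size on disk, in bytes
                "last_modified_date": date.fromtimestamp(stat.st_mtime).isoformat(),
            },
        })
    return records
```

In this notebook, `load_text_files("data")` would return a single record for denmark.txt.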


✅ Cell 5 — Configure chunking

In [8]:
from llama_index.core import Settings

Settings.chunk_size = 512
Settings.chunk_overlap = 50

✅ Cell 6 — Configure embeddings

In [9]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)

✅ Cell 7 — Configure LLM (TinyLlama)

In [10]:
from llama_index.llms.huggingface import HuggingFaceLLM
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_name)
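model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype=torch.float16,  # `torch_dtype` is deprecated in recent transformers releases
    device_map="auto"     # place weights on the GPU when one is available (needs accelerate)
)

Settings.llm = HuggingFaceLLM(
    model=model,
    tokenizer=tokenizer,
    context_window=2048,  # TinyLlama was trained with a 2048-token context
    max_new_tokens=256,
)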

✅ Cell 8 — Create index

In [11]:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)

✅ Cell 9 — Create query engine

In [12]:
query_engine = index.as_query_engine(similarity_top_k=1)

✅ Cell 10 — Query

In [13]:
response = query_engine.query("What is Denmark known for?")
print(response)


Denmark is known for its metropole, most populous constituent, and constitutionally unitary state, which includes the autonomous territories of the Faroe Islands and Greenland in the north Atlantic Ocean.


In [15]:
response = query_engine.query("Official language of Denmark?")
print(response)

83.7% Danish is the official language of Denmark.


In [16]:
query_engine.query("Who is the president of USA?")

Response(response='45th President of the United States, Joe Biden.', source_nodes=[NodeWithScore(node=TextNode(id_='0b9a82e5-57a9-4d91-ae65-1d50102fbc86', embedding=None, metadata={'file_path': '/content/data/denmark.txt', 'file_name': 'denmark.txt', 'file_type': 'text/plain', 'file_size': 96469, 'creation_date': '2026-02-26', 'last_modified_date': '2026-02-26'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='7a944d5b-898a-473f-a482-572b459747d4', node_type='4', metadata={'file_path': '/content/data/denmark.txt', 'file_name': 'denmark.txt', 'file_type': 'text/plain', 'file_size': 96469, 'creation_date': '2026-02-26', 'last_modified_date': '2026-02-26'}, hash='d90bac3c6b87d95ec77d08746d10231148763d661d5976f
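This answer does not come from the corpus: the only retrieved node is a chunk of denmark.txt, so the model fell back on its own parametric knowledge and got even that wrong (Joe Biden was the 46th president, not the 45th). A common guard is to discard retrieved chunks whose similarity score falls below a cutoff and to refuse when nothing is left. A minimal sketch, assuming hypothetical `gate_context` and `answer` helpers, an `llm` callable supplied by the caller, and an arbitrary cutoff of 0.7 that would need tuning against real scores:

```python
from typing import Callable

REFUSAL = "The provided documents do not answer this question."

def gate_context(scored_nodes: list[tuple[float, str]], cutoff: float) -> list[str]:
    """Return the texts of (score, text) pairs scoring at least `cutoff`, best first."""
    kept = [node for node in scored_nodes if node[0] >= cutoff]
    kept.sort(key=lambda node: node[0], reverse=True)
    return [text for _, text in kept]

def answer(
    question: str,
    scored_nodes: list[tuple[float, str]],
    llm: Callable[[str, list[str]], str],
    cutoff: float = 0.7,  # arbitrary illustrative value; tune it on real scores
) -> str:
    context = gate_context(scored_nodes, cutoff)
    if not context:
        return REFUSAL  # nothing relevant retrieved: skip the LLM call instead of letting it guess
    return llm(question, context)
```

LlamaIndex provides a comparable filter, `SimilarityPostprocessor(similarity_cutoff=...)` from `llama_index.core.postprocessor`, which can be passed to `as_query_engine` via `node_postprocessors`; check the documentation for your installed version.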

In [17]:
query_engine = index.as_query_engine(similarity_top_k=3)
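Raising `similarity_top_k` puts more text into the prompt, so it is worth checking the token budget. A back-of-the-envelope sketch, assuming TinyLlama's 2048-token context window, full 512-token chunks, and the 256 `max_new_tokens` configured earlier; `prompt_budget` is a hypothetical helper, and the estimate ignores that chunk sizes may be counted with a different tokenizer than TinyLlama's own.

```python
def prompt_budget(context_window: int, chunk_size: int, top_k: int, max_new_tokens: int) -> int:
    """Tokens left for the prompt template and the question after reserving space for
    top_k full-size chunks and the generated answer. A negative result means overflow."""
    return context_window - top_k * chunk_size - max_new_tokens

# Assumed values: TinyLlama's 2048-token window, 512-token chunks, max_new_tokens=256.
for k in (1, 3, 4):
    print(f"similarity_top_k={k}: {prompt_budget(2048, 512, k, 256)} tokens left")
```

With top_k=3 that leaves about 256 tokens for the prompt template and the question; a fourth full-size chunk would not fit.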

In [18]:
response = query_engine.query("What is Denmark known for?")
print(response)


Denmark is known for its history, culture, and society, including its democratic system, liberal values, and progressive policies.

