## Part 3: Build a Simple RAG System
Objective: Create a minimal RAG pipeline using Milvus as the vector database.

1. Load 3–5 small text files (or web paragraphs) as your dataset.
2. Generate embeddings using SentenceTransformer("all-MiniLM-L6-v2").
3. Store them in Milvus.
4. Write a query function that takes a user question, retrieves the top 3 most similar chunks, and prints them.
5. Pass the retrieved context to an OpenAI model (or a local LLM) and generate an answer.


In [4]:
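## 1. Read .txt files from a directory
import os

TXT_DIR = "documents"

def load_txt_files(txt_dir):
    """Return the contents of every .txt file in txt_dir, in filename order."""
    texts = []
    # sorted() makes the load order reproducible; os.listdir order is arbitrary
    for filename in sorted(os.listdir(txt_dir)):
        if filename.endswith(".txt"):
            with open(os.path.join(txt_dir, filename), "r", encoding="utf-8") as f:
                texts.append(f.read())
    return texts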

texts = load_txt_files(TXT_DIR)
print(f"Loaded {len(texts)} text files.")



Loaded 3 text files.
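
Retrieved chunks are easier to trace back to their source if each text keeps its filename. The following is a minimal sketch, assuming a hypothetical helper `load_txt_files_with_names` (not part of the notebook) that returns `(filename, text)` pairs; the demo runs on a throwaway directory rather than the real `documents` folder.

```python
from pathlib import Path
import tempfile

def load_txt_files_with_names(txt_dir):
    """Return (filename, text) pairs for every .txt file in txt_dir.

    Hypothetical variant of load_txt_files; sorting keeps the order reproducible.
    """
    pairs = []
    for path in sorted(Path(txt_dir).glob("*.txt")):
        pairs.append((path.name, path.read_text(encoding="utf-8")))
    return pairs

# Demo on a temporary directory so the sketch runs without the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "b.txt").write_text("second file", encoding="utf-8")
    Path(tmp, "a.txt").write_text("first file", encoding="utf-8")
    Path(tmp, "notes.md").write_text("ignored", encoding="utf-8")
    demo_pairs = load_txt_files_with_names(tmp)

print(demo_pairs)  # [('a.txt', 'first file'), ('b.txt', 'second file')]
```

The filename could later be stored in an extra Milvus field, which would require a matching change to the schema.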


In [5]:
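## 2. Chunk text into smaller pieces
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into chunk_size-character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        # otherwise the window would never advance and the loop would not end
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break  # the window reached the end; avoid a redundant tail chunk
        start += chunk_size - overlap
    return chunks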
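
The window arithmetic is easier to see on index pairs than on text. Here is a small sketch, assuming a hypothetical helper `chunk_spans` (not part of the notebook) that returns `(start, end)` pairs and stops once a window reaches the end of the input.

```python
def chunk_spans(text_len, chunk_size=500, overlap=50):
    """Return the (start, end) index pairs of a sliding window with overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    spans = []
    start = 0
    while start < text_len:
        end = min(start + chunk_size, text_len)
        spans.append((start, end))
        if end == text_len:
            break  # last window reached the end of the text
        start += chunk_size - overlap  # step forward, keeping `overlap` characters
    return spans

demo_spans = chunk_spans(1200)
print(demo_spans)  # [(0, 500), (450, 950), (900, 1200)]
```

Each window starts 450 characters after the previous one, so neighbouring chunks share 50 characters. Because the split is character-based, a chunk can start or end in the middle of a word.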

In [6]:
## 3. Prepare data for embedding
from sentence_transformers import SentenceTransformer

EMBED_MODEL = "all-MiniLM-L6-v2"  # produces 384-dimensional vectors
MILVUS_HOST = "127.0.0.1"
MILVUS_PORT = "19530"

embed_model = SentenceTransformer(EMBED_MODEL)

def embed_texts(texts):
    """Encode a list of strings into a NumPy array with one row per string."""
    return embed_model.encode(texts, convert_to_numpy=True)




In [11]:
## 4. Connect to Milvus
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
connections.connect("default", host=MILVUS_HOST, port=MILVUS_PORT)

In [12]:
## 5. Define schema and create collection
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    # dim must match the embedding model's output size (384 for all-MiniLM-L6-v2)
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=384),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535),
]
schema = CollectionSchema(fields, "Text embedding collection")
collection = Collection("text_embedding_collection", schema)
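
The schema fixes two limits that every inserted row has to respect: the vector dimension and the maximum text length. Below is a rough sketch of a pre-insert check, using a hypothetical helper `validate_row` (not a pymilvus function). It assumes the `VARCHAR` limit is counted in UTF-8 bytes, which is worth confirming against the Milvus documentation for your version.

```python
EMBED_DIM = 384         # mirrors dim=384 in the schema
MAX_TEXT_BYTES = 65535  # mirrors max_length=65535 in the schema

def validate_row(embedding, text):
    """Return a list of problems; an empty list means the row fits the schema."""
    problems = []
    if len(embedding) != EMBED_DIM:
        problems.append(f"embedding has {len(embedding)} dims, expected {EMBED_DIM}")
    # Assumption: the limit applies to the UTF-8 encoded size, not the character count
    n_bytes = len(text.encode("utf-8"))
    if n_bytes > MAX_TEXT_BYTES:
        problems.append(f"text is {n_bytes} bytes, limit is {MAX_TEXT_BYTES}")
    return problems

print(validate_row([0.0] * 384, "a short chunk"))  # []
print(validate_row([0.0] * 3, "x" * 70000))        # two problems reported
```

With 500-character chunks the text limit is unlikely to be hit, but a dimension mismatch is a common failure when the embedding model is swapped.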

In [13]:
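## 6. Insert data into Milvus
for text in texts:
    chunks = chunk_text(text)
    if not chunks:
        continue  # skip empty files
    embeddings = embed_texts(chunks)
    # Column-oriented data in schema order; "id" is omitted because auto_id=True
    entities = [
        embeddings.tolist(),
        chunks,
    ]
    collection.insert(entities)

# Flush so the inserted data is persisted before the index is built
collection.flush()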
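
The insert call above passes column-oriented data: one list per field, in schema order, with the auto-generated `id` left out. If your data is organised as rows instead, it can be transposed first. This is a minimal sketch with a hypothetical helper `rows_to_columns`, not a pymilvus function.

```python
def rows_to_columns(rows):
    """Turn [(embedding, text), ...] into [[embedding, ...], [text, ...]]."""
    if not rows:
        return [[], []]
    embeddings, texts = zip(*rows)  # transpose rows into two parallel tuples
    return [list(embeddings), list(texts)]

demo_rows = [([0.1, 0.2], "first chunk"), ([0.3, 0.4], "second chunk")]
demo_entities = rows_to_columns(demo_rows)
print(demo_entities)
# [[[0.1, 0.2], [0.3, 0.4]], ['first chunk', 'second chunk']]
```

The two inner lists must have the same length, since position `i` in each list describes the same entity.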

In [14]:
## 7. Create index and load collection into memory
index_params = {
    "index_type": "IVF_FLAT",
    "params": {"nlist": 128},
    "metric_type": "L2",
}
collection.create_index("embedding", index_params)
collection.load()
print("Index created and collection loaded")

Index created and collection loaded
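
With `metric_type` set to `L2`, a smaller distance means a closer match. For unit-length vectors the squared L2 distance equals `2 - 2 * cosine_similarity`, so ranking by L2 and ranking by cosine similarity give the same order. The sketch below checks that identity on toy vectors in plain Python. Whether the vectors from `embed_texts` are unit-length is an assumption here; `encode` accepts `normalize_embeddings=True` if you want to enforce it.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit-length."""
    return sum(x * y for x, y in zip(a, b))

a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 0.5, 1.0])

# The two printed values agree up to floating-point rounding
print(squared_l2(a, b), 2 - 2 * dot(a, b))
```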


In [15]:
## 8. Retrieve similar texts
def search_similar_texts(query, top_k=3):
    """Return the text of the top_k chunks closest to the query embedding."""
    query_embedding = embed_texts([query])[0].tolist()
    # metric_type must match the index; nprobe is the number of IVF clusters scanned
    search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
    results = collection.search(
        data=[query_embedding],
        anns_field="embedding",
        param=search_params,
        limit=top_k,
        output_fields=["text"],
    )
    # results[0] holds the hits for the first (and only) query vector
    return [hit.entity.get("text") for hit in results[0]]
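
An IVF index scans only `nprobe` of its `nlist` clusters, so the search is approximate. The exact result it approximates is a brute-force scan over every stored vector. This sketch shows that scan in plain Python on toy 2-dimensional vectors; `brute_force_top_k` and the demo data are hypothetical and not part of the notebook.

```python
import heapq

def brute_force_top_k(query_vec, stored, top_k=3):
    """Return the texts of the top_k stored (vector, text) pairs nearest by squared L2."""
    def dist(item):
        vec, _ = item
        return sum((q - v) ** 2 for q, v in zip(query_vec, vec))
    return [text for _, text in heapq.nsmallest(top_k, stored, key=dist)]

demo_store = [
    ([0.0, 0.0], "origin"),
    ([1.0, 1.0], "diagonal"),
    ([0.1, 0.0], "near origin"),
    ([5.0, 5.0], "far away"),
]
demo_hits = brute_force_top_k([0.0, 0.1], demo_store, top_k=2)
print(demo_hits)  # ['origin', 'near origin']
```

A scan like this is practical for a handful of chunks; an index starts to matter as the collection grows.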

In [20]:
## 9. Example query
query = "How is the market?"
results = search_similar_texts(query, top_k=3)
print("\nTop 3 similar chunks:")
for i, text in enumerate(results, 1):
    print(f"{i}. {text}")


Top 3 similar chunks:
1. ccording to the report.

Amazon stock inched higher by 0.3% in Friday’s premarket.

Get updates to this developing story directly on Stocktwits.
2. AI bubble won't burst for one or two years: Kirk Yang
Kirk Yang, Adjunct Finance Professor at National Taiwan University, says that he expect a strong correction in the AI markets in the next one to two years. He explains why he expects strong AI companies to survive the bubble burst, similar to the tech giants that surfaced after the Dot-com era.
3. Why Gen Z Graduates Are Facing a Crisis Explained
The curious minds at ColdFusion explain why Gen Z graduates are facing a crisis. This sheds light on structural challenges affecting employment, debt, and life stability.


In [21]:
## 10. Setup OpenAI client
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv(override=True, dotenv_path="../.env")
api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI(api_key=api_key)

print("OpenAI client initialized")

OpenAI client initialized
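
`os.getenv` returns `None` when the variable is missing, for example when `dotenv_path` points at the wrong file. Failing early with an explicit message makes that easier to spot. Here is a small sketch with a hypothetical helper `require_env`; the variable name `DEMO_API_KEY` is made up for the demo.

```python
import os

def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check the .env file path")
    return value

# Demo with a throwaway variable so the sketch runs anywhere
os.environ["DEMO_API_KEY"] = "sk-demo"
demo_key = require_env("DEMO_API_KEY")
print("key loaded:", bool(demo_key))
```

In the notebook this would be called as `require_env("OPENAI_API_KEY")` after `load_dotenv`.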


In [24]:
## 11. Function to pass retrieved context to OpenAI
def answer_query_with_context(query, top_k=3, model="gpt-4o-mini"):
    # Retrieve similar texts from Milvus
    retrieved_texts = search_similar_texts(query, top_k=top_k)
    
    # Combine retrieved texts as context
    context = "\n\n".join(retrieved_texts)
    
    # Create prompt with context
    prompt = f"""You are a helpful assistant. Use the following context extracted from documents to answer the user's question.
If the answer is not in the context, say you don't know.

Context:
{context}

Question:
{query}

Answer:"""
    
    # Call OpenAI API
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        max_completion_tokens=300,
        temperature=0.0
    )
    
    answer = response.choices[0].message.content
    return answer, retrieved_texts

print("Function defined: answer_query_with_context()")

Function defined: answer_query_with_context()
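
The function above joins all retrieved chunks into one context string. With larger chunks or a higher `top_k`, it can help to number the chunks and cap the total size. This is one possible sketch, assuming a hypothetical helper `build_context` and a character budget (`max_chars`) as a rough stand-in for a token limit.

```python
def build_context(chunks, max_chars=1500):
    """Join numbered chunks, stopping before the total would exceed max_chars."""
    parts = []
    used = 0
    for i, chunk in enumerate(chunks, 1):
        piece = f"[{i}] {chunk.strip()}"
        if used + len(piece) > max_chars:
            break  # adding this chunk would go over the budget
        parts.append(piece)
        used += len(piece) + 2  # +2 for the "\n\n" separator
    return "\n\n".join(parts)

demo_context = build_context(["alpha  ", "beta", "x" * 50], max_chars=20)
print(demo_context)
# [1] alpha
#
# [2] beta
```

Numbering the chunks also lets the prompt ask the model to cite which chunk supports its answer.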


In [25]:
## 12. Run full RAG pipeline (retrieve + answer)
query = "How is the market?"

print(f"Query: {query}\n")
print("=" * 60)

# Get answer with context
answer, context_chunks = answer_query_with_context(query, top_k=3)

print("\n📄 Retrieved Context (Top 3 chunks):")
print("-" * 60)
for i, chunk in enumerate(context_chunks, 1):
    print(f"\n{i}. {chunk[:200]}..." if len(chunk) > 200 else f"\n{i}. {chunk}")

print("\n" + "=" * 60)
print("\n🤖 AI Answer:")
print("-" * 60)
print(answer)

Query: How is the market?


📄 Retrieved Context (Top 3 chunks):
------------------------------------------------------------

1. ccording to the report.

Amazon stock inched higher by 0.3% in Friday’s premarket.

Get updates to this developing story directly on Stocktwits.

2. AI bubble won't burst for one or two years: Kirk Yang
Kirk Yang, Adjunct Finance Professor at National Taiwan University, says that he expect a strong correction in the AI markets in the next one to t...

3. Why Gen Z Graduates Are Facing a Crisis Explained
The curious minds at ColdFusion explain why Gen Z graduates are facing a crisis. This sheds light on structural challenges affecting employment, debt,...


🤖 AI Answer:
------------------------------------------------------------
The market is showing a slight increase, with Amazon stock inching higher by 0.3% in Friday’s premarket. However, there are concerns about a potential correction in the AI markets within the next one to two years, as noted b