# RAG (Retrieval-Augmented Generation)


# Document-Based Question Answering with Ollama & ChromaDB
## Overview
This project enables you to ask questions and get precise answers based on your own document files. It works by:

- Loading your text files from a folder.
- Splitting documents into smaller chunks.
- Generating a vector embedding for each chunk with an embedding model served locally by Ollama.
- Storing these embeddings in ChromaDB (a free vector database).
- Querying the database to find the document chunks most relevant to your question.
- Using the Llama 3.1 chat model (via Ollama) to generate a short, precise answer from the retrieved context.
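
The retrieval idea behind these steps can be sketched with the standard library alone. This is a toy illustration, not the notebook's implementation: the word-count `embed` function stands in for a real embedding model, the ranking uses cosine similarity purely for illustration, and all names (`embed`, `cosine`, `retrieve`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "The writers guild went on strike over AI in writers rooms.",
    "A startup raised funding for electric scooters.",
    "Studios and writers disagree on AI generated scripts.",
]
question = "AI and the writers strike"
top = retrieve(question, chunks, k=2)

# The retrieved chunks become the context handed to the chat model.
prompt = "Context:\n" + "\n\n".join(top) + "\n\nQuestion:\n" + question
print(prompt)
```

In the actual project the embedding and storage steps are handled by Ollama and ChromaDB; only the overall flow (embed, rank, build a prompt) carries over from this sketch.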

## Models Used
- Embedding model: __mxbai-embed-large:latest__ (pulled from Ollama for generating embeddings)

- Chat/Response model: __llama3.1__ (pulled from Ollama for generating answers)

Make sure these models are pulled and available locally in your Ollama environment.

Pinned package versions:

- langchain-ollama==0.3.3
- chromadb==1.0.9
- langchain==0.3.25
- ollama==0.4.8
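
If you want to confirm that your environment matches these pins before running the notebook, a small check along the following lines may help. It is a sketch using only `importlib.metadata` from the standard library; the `PINNED` dictionary and `check_versions` helper are names introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this document.
PINNED = {
    "langchain-ollama": "0.3.3",
    "chromadb": "1.0.9",
    "langchain": "0.3.25",
    "ollama": "0.4.8",
}


def check_versions(pins):
    """Compare installed package versions against the pinned ones."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if installed == wanted:
            report[name] = "ok"
        else:
            report[name] = f"installed {installed}, pinned {wanted}"
    return report


for name, status in check_versions(PINNED).items():
    print(f"{name}: {status}")
```

A mismatch does not necessarily mean the notebook will fail; it only tells you that your environment differs from the one listed here.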


In [18]:
import os
from langchain_ollama import ChatOllama
import chromadb
from chromadb.utils import embedding_functions
from ollama_embed_utils import OllamaEmbeddingFunction
from langchain.schema import HumanMessage, SystemMessage

In [19]:
ollama_ef = OllamaEmbeddingFunction()

chroma_client = chromadb.PersistentClient(path="chroma_persistent_storage")
collection = chroma_client.get_or_create_collection(
    name="document_qa_collection",
    embedding_function=ollama_ef
)
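
The `OllamaEmbeddingFunction` class comes from a local `ollama_embed_utils` module whose source is not shown here. The sketch below is a guess at what such a helper could look like, not the author's code: the class name, the `embed_one` hook, and the default model string are assumptions, and it relies on `ollama.embed(model=..., input=...)` returning an `embeddings` list. To hand a class like this to Chroma you would likely also need to subclass Chroma's `EmbeddingFunction` interface.

```python
class SketchOllamaEmbeddingFunction:
    """Hypothetical callable that maps text(s) to a list of embedding vectors."""

    def __init__(self, model="mxbai-embed-large:latest", embed_one=None):
        self.model = model
        # embed_one can be swapped out (for tests or another backend).
        self._embed_one = embed_one or self._embed_with_ollama

    def _embed_with_ollama(self, text):
        # Imported lazily so the class can be defined without Ollama installed.
        import ollama

        response = ollama.embed(model=self.model, input=text)
        return list(response["embeddings"][0])

    def __call__(self, input):
        # Accept a single string or a list of strings; always return List[List[float]].
        texts = [input] if isinstance(input, str) else list(input)
        return [self._embed_one(text) for text in texts]


# Usage with a fake embedder, so no Ollama server is needed for this demo.
fake_ef = SketchOllamaEmbeddingFunction(embed_one=lambda text: [float(len(text)), 0.0])
print(fake_ef("hello"))
print(fake_ef(["a", "abc"]))
```

Returning a list of vectors even for a single string matches how the embedding cell later in the notebook indexes the result with `[0]`.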


In [22]:
client = ChatOllama(
    model="llama3.1",
    temperature=0.0,
    streaming=True
)

In [23]:
for chunk in client.stream([
    {"role": "system", "content": "You are a helpful assistant. You have to give short and precise answers"},
    {"role": "user", "content": "What is human life expectancy in Pakistan?"}
]):
    print(chunk.content, end="", flush=True)

According to World Health Organization (WHO), average life expectancy at birth in Pakistan is approximately 67 years (2019 data).
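
The streaming loop above prints each piece as it arrives but does not keep the full answer. If you also want the complete text afterwards, one option is to accumulate the pieces while printing. The helper below is a sketch; `collect_stream` is a hypothetical name, and the fake stream stands in for `client.stream(...)` by yielding objects with a `.content` attribute.

```python
from types import SimpleNamespace


def collect_stream(chunks, echo=True):
    """Print streamed pieces as they arrive and return the joined text."""
    parts = []
    for chunk in chunks:
        if echo:
            print(chunk.content, end="", flush=True)
        parts.append(chunk.content)
    return "".join(parts)


# Stand-in for a model stream: each item exposes its text as .content.
fake_stream = (SimpleNamespace(content=piece) for piece in ["Short ", "and ", "precise."])
answer = collect_stream(fake_stream, echo=False)
print(answer)
```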

In [24]:
# Load .txt documents from a directory
def load_documents_from_directory(directory_path):
    print("==== Loading documents from directory ====")
    documents = []
    for filename in os.listdir(directory_path):
        if filename.endswith(".txt"):
            with open(
                os.path.join(directory_path, filename), "r", encoding="utf-8"
            ) as file:
                documents.append({"id": filename, "text": file.read()})
    return documents

In [25]:
# Split a document into fixed-size character chunks that overlap slightly
def split_text(text, chunk_size=1000, chunk_overlap=20):
    # The window must advance on every step, otherwise the loop never ends
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start = end - chunk_overlap
    return chunks

In [26]:
directory_path = "D:/AI/RAG/news_articles"
documents = load_documents_from_directory(directory_path)

print(f"Loaded {len(documents)} documents")

==== Loading documents from directory ====
Loaded 21 documents


In [27]:
#Split documents into chunks
chunked_documents = []
print("==== Splitting docs into chunks ====")
for doc in documents:
    chunks = split_text(doc["text"])
    for i, chunk in enumerate(chunks):
        chunked_documents.append({
            "id": f"{doc['id']}_{i}",
            "text": chunk
        })
print(f"Split documents into {len(chunked_documents)} chunks")

==== Splitting docs into chunks ====
Split documents into 184 chunks
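
To see how the sliding window in `split_text` behaves, it helps to run it on a tiny input. The block below copies the core loop and adds a hypothetical variant, `split_text_no_tail`, that stops once a chunk reaches the end of the text. The variant is an optional design alternative, not what the notebook uses.

```python
def split_text(text, chunk_size=1000, chunk_overlap=20):
    """Copy of the notebook's sliding-window chunking loop."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start = end - chunk_overlap
    return chunks


def split_text_no_tail(text, chunk_size=1000, chunk_overlap=20):
    """Variant that stops as soon as a chunk reaches the end of the text."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - chunk_overlap
    return chunks


original = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
no_tail = split_text_no_tail("abcdefghij", chunk_size=4, chunk_overlap=1)
print(original)  # ['abcd', 'defg', 'ghij', 'j']
print(no_tail)   # ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one, which is the overlap. With the original loop, the final chunk can consist only of overlap characters already present in the previous chunk, so adopting the variant would change the total chunk count reported above.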


In [9]:
from tqdm import tqdm

print("=== Generating embeddings... ====")
for doc in tqdm(chunked_documents, desc="Embedding progress"):
    emb = ollama_ef(doc["text"])  # this returns a list of lists, since __call__ returns List[List[float]]
    # emb looks like [[float, float, ...]], so get first element
    emb = emb[0]
    if hasattr(emb, "tolist"):
        emb = emb.tolist()  # convert numpy array to plain list
    doc["embedding"] = emb

print(chunked_documents[-1]["embedding"])


=== Generating embeddings... ====


Embedding progress: 100%|████████████████████████████████████████████████████████████████| 3/3 [00:08<00:00,  2.75s/it]

[-0.2743028402328491, 0.0571238212287426, -1.1270285844802856, 0.14380596578121185, ...]




In [10]:
# Upsert documents with embeddings into Chroma
print("==== Inserting chunks into db ====")
for doc in chunked_documents:
    collection.upsert(
        ids=[doc["id"]], documents=[doc["text"]], embeddings=[doc["embedding"]]
    )

==== Inserting chunks into db ====
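
The loop above issues one `upsert` call per chunk. Since `upsert` takes lists of ids, documents, and embeddings, the chunks could also be sent in groups. The sketch below shows one way to do that; `batched`, `upsert_in_batches`, the batch size, and the `RecordingCollection` stand-in are all hypothetical names used only so the example runs without a database.

```python
def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def upsert_in_batches(collection, docs, batch_size=64):
    """Upsert chunk dicts in groups; returns the number of upsert calls made."""
    calls = 0
    for batch in batched(docs, batch_size):
        collection.upsert(
            ids=[d["id"] for d in batch],
            documents=[d["text"] for d in batch],
            embeddings=[d["embedding"] for d in batch],
        )
        calls += 1
    return calls


class RecordingCollection:
    """Stand-in for a Chroma collection that only records the ids it receives."""

    def __init__(self):
        self.calls = []

    def upsert(self, ids, documents, embeddings):
        self.calls.append(list(ids))


docs = [{"id": f"doc_{i}", "text": f"text {i}", "embedding": [float(i)]} for i in range(5)]
fake_collection = RecordingCollection()
n_calls = upsert_in_batches(fake_collection, docs, batch_size=2)
print(n_calls, fake_collection.calls)
```

With the real `collection` object you would pass it in place of `fake_collection`; whether batching is noticeably faster depends on your data size and setup.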


In [43]:
# Query the collection for the chunks most similar to the question
def query_documents(question, n_results=2):
    # Chroma embeds the query text with the collection's embedding function
    results = collection.query(query_texts=[question], n_results=n_results)

    # results["documents"] holds one list of chunks per query; flatten them
    relevant_chunks = [doc for sublist in results["documents"] for doc in sublist]
    print("==== Returning relevant chunks ====")
    return relevant_chunks
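
The flattening step in `query_documents` is needed because Chroma returns nested lists, with one inner list per query text. The sketch below uses a hand-written dictionary shaped like a query result (the ids, texts, and distances are made up for illustration) and a hypothetical `flatten_hits` helper that keeps each chunk's id and distance next to its text, which can be handy when debugging retrieval.

```python
# Hand-written example shaped like a query result for a single query text.
# All values are invented for illustration.
sample_results = {
    "ids": [["a.txt_0", "b.txt_3"]],
    "documents": [["first chunk text", "second chunk text"]],
    "distances": [[0.21, 0.48]],
}


def flatten_hits(results):
    """Turn nested per-query lists into a flat list of hit dictionaries."""
    hits = []
    for ids, docs, dists in zip(results["ids"], results["documents"], results["distances"]):
        for chunk_id, text, dist in zip(ids, docs, dists):
            hits.append({"id": chunk_id, "text": text, "distance": dist})
    return hits


hits = flatten_hits(sample_results)
for hit in hits:
    print(f"{hit['id']}  distance={hit['distance']}  {hit['text']}")
```

Smaller distances mean closer matches, so inspecting them is a quick way to judge whether the retrieved chunks are actually related to the question.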

In [44]:
def generate_response(question, relevant_chunks, model="llama3.1"):
    # Note: `model` is not used here; the module-level `client` decides which model runs
    context = "\n\n".join(relevant_chunks)
    system_prompt = (
        "You are an assistant for question-answering tasks. Use the following pieces of "
        "retrieved context to answer the question. If you don't know the answer, say that you "
        "don't know. Use three sentences maximum and keep the answer concise."
        "\n\nContext:\n" + context + "\n\nQuestion:\n" + question
    )

    # The retrieved context travels in the system message, the question in the user message
    response = client.invoke([
        SystemMessage(content=system_prompt),
        HumanMessage(content=question)
    ])
    return response.content
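
The prompt assembly in `generate_response` can be tested separately from the model call. The sketch below is a variant, not the notebook's function: the hypothetical `build_messages` helper returns plain role/content dictionaries (the same message format used in the streaming example earlier) and places the question only in the user message rather than repeating it in the system prompt.

```python
def build_messages(question, relevant_chunks):
    """Assemble system and user messages from retrieved chunks and a question."""
    context = "\n\n".join(relevant_chunks)
    system_prompt = (
        "You are an assistant for question-answering tasks. Use the following pieces of "
        "retrieved context to answer the question. If you don't know the answer, say that you "
        "don't know. Use three sentences maximum and keep the answer concise."
        "\n\nContext:\n" + context
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


messages = build_messages(
    "What are the writers asking for?",
    ["chunk one about the strike", "chunk two about studios"],
)
for message in messages:
    print(message["role"], "->", message["content"][:60])
```

Keeping prompt construction in its own function makes it easy to print or assert on the exact text the model will receive before spending time on a generation call.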

In [45]:
# Example usage
question = "tell me about AI replacing TV writers strike."
relevant_chunks = query_documents(question)
answer = generate_response(question, relevant_chunks)

print(answer)


==== Returning relevant chunks ====
The Writers Guild of America is on strike, demanding regulation of AI use in writers' rooms due to concerns that it could undermine their working conditions and replace human writers. The Alliance of Motion Picture and Television Producers (AMPTP) has refused to engage with the proposal to regulate AI output as writers' work. This issue remains a legal grey area, with unclear copyright implications for studios.


In [36]:
# Example usage
question = "tell me about lahore pakistan"
relevant_chunks = query_documents(question) 
answer = generate_response(question, relevant_chunks)

print(answer)


==== Returning relevant chunks ====
I don't know. The context provided does not mention Lahore, Pakistan. It appears to be related to TechCrunch's events and investments in India.
