
# End-to-End RAG Pipeline: Retrieval + LLM for Contextual Question Answering




## RAG Architecture (High Level)

```
User Query
    │
    ▼
Retriever  ──>  Top‑K Relevant Chunks  ──┐
                                         │ (as context)
                                         ▼
                                   Prompt Builder
                                         │
                                         ▼
                                   Language Model
                                         │
                                         ▼
                                   Final Answer
```

**Key Components**
- **Embeddings:** turn text into vectors for similarity search.
- **Vector Store:** index + retrieve nearest neighbors fast.
- **LLM:** generates answers grounded in retrieved context.
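
To make the data flow in the diagram concrete, here is a minimal, dependency-free sketch of the three stages. It is not what this notebook runs: word overlap stands in for real embeddings, and the names `tokenize`, `retrieve`, `build_prompt`, and `answer`, along with the `llm` callable, are hypothetical helpers invented for illustration.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercased word set; a crude stand-in for a real embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Rank chunks by word overlap with the query and keep the top k."""
    query_words = tokenize(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_words & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, context: List[str]) -> str:
    """Place the retrieved chunks ahead of the user's question."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


def answer(query: str, chunks: List[str], llm: Callable[[str], str], k: int = 2) -> str:
    """Retriever -> prompt builder -> language model, as in the diagram."""
    return llm(build_prompt(query, retrieve(query, chunks, k)))


if __name__ == "__main__":
    # Invented toy chunks; `echo_llm` just returns the prompt it receives.
    toy_chunks = [
        "The store sells organic groceries online.",
        "The office is closed on public holidays.",
        "Organic groceries come from local farms.",
    ]
    echo_llm = lambda prompt: prompt
    print(answer("Where do organic groceries come from?", toy_chunks, echo_llm))
```

Passing the model in as a plain callable keeps the sketch testable without a running LLM; in the real pipeline below, LangChain's chain object plays the role of `answer`.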



## Contents
- Installation
- Imports & Configuration
- Data Loading
- Preprocessing & Chunking
- Embeddings
- Vector Store
- Retriever
- Generation
- Evaluation (if present)
- Demo App / UI (if present)
- Results & Next Steps


### Imports & Configuration

Imports Python libraries and sets up configuration (keys, paths, constants). Review and update any paths or environment variables here.

In [1]:
# imports
import os
import glob
import chromadb
from dotenv import load_dotenv
import gradio as gr

In [2]:
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.schema import Document
from langchain_chroma import Chroma
import numpy as np
from sklearn.manifold import TSNE
import plotly.graph_objects as go
from chromadb.utils.embedding_functions import OllamaEmbeddingFunction
from langchain_ollama import OllamaEmbeddings
from langchain_ollama import OllamaLLM
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

### Data Loading

Loads the source knowledge base (files or text). Replace with your own data as needed.

In [6]:
# Read in documents using LangChain's loaders
# Take everything in all the sub-folders of our knowledgebase

folders = glob.glob("pineapple/*")

# With thanks to CG and Jon R, students on the course, for this fix needed for some users 
text_loader_kwargs = {'encoding': 'utf-8'}
# If that doesn't work, some Windows users might need to uncomment the next line instead
# text_loader_kwargs={'autodetect_encoding': True}

documents = []
for folder in folders:
    doc_type = os.path.basename(folder)
    loader = DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
    folder_docs = loader.load()
    for doc in folder_docs:
        doc.metadata["doc_type"] = doc_type
        documents.append(doc)
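
For readers who want to see roughly what the loading loop above amounts to, here is a standard-library sketch under stated assumptions: each sub-folder name becomes the `doc_type`, and files that fail to decode as UTF-8 are retried with a second encoding. The function names and the `cp1252` fallback are illustrative choices, not LangChain's behavior.

```python
from pathlib import Path
from typing import Dict, List, Sequence


def read_text_with_fallback(path: Path, encodings: Sequence[str] = ("utf-8", "cp1252")) -> str:
    """Try each encoding in turn and return the first successful decode."""
    for encoding in encodings:
        try:
            return path.read_text(encoding=encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and replace undecodable bytes.
    return path.read_text(encoding="utf-8", errors="replace")


def load_markdown(root: str) -> List[Dict]:
    """Collect every .md file under each sub-folder, tagged with the folder name."""
    docs = []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        for md_file in sorted(folder.rglob("*.md")):
            docs.append({
                "page_content": read_text_with_fallback(md_file),
                "metadata": {"source": str(md_file), "doc_type": folder.name},
            })
    return docs
```

Calling `load_markdown("pineapple")` would return plain dictionaries rather than LangChain `Document` objects, so this is only a way to inspect the data, not a drop-in replacement for the cell above.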

In the next cell, we split the text into chunks.

Two students reported that the next cell crashed their computer.
They fixed it by raising chunk_size from 1,000 to 2,000 and chunk_overlap from 200 to 400.
This shouldn't be necessary, but if it happens to you, please make that change!
(LangChain may warn that a chunk is larger than 1,000 characters; this can be safely ignored.)

### Preprocessing & Chunking

Cleans and splits raw documents into bite-sized chunks to improve retrieval quality.

In [7]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

In [8]:
len(chunks)

12
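
To show what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-width sliding window. This is an assumption-laden illustration, not LangChain's algorithm: as far as I know, `CharacterTextSplitter` first splits on a separator and then merges pieces, which is why it can emit chunks longer than `chunk_size`. The `chunk_text` helper is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
    # -> ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.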

In [9]:
doc_types = set(chunk.metadata['doc_type'] for chunk in chunks)
print(f"Document types found: {', '.join(doc_types)}")

Document types found: contracts, products, company, employees


### Embeddings

Builds numeric vector representations of text chunks (the backbone of semantic search).

In [10]:
# Check if a Chroma Datastore already exists - if so, delete the collection to start from scratch
db_name = "db_name"
ollama_ef = OllamaEmbeddings(model="mxbai-embed-large")
if os.path.exists(db_name):
    Chroma(persist_directory=db_name, embedding_function=ollama_ef).delete_collection()

In [8]:
# # Set the OLLAMA_HOST environment variable
# os.environ["OLLAMA_HOST"] = "http://localhost:11434"

# # Assuming 'chunks' is a list of Document objects
# document_strings = [doc.page_content for doc in chunks]

# ollama_ef = OllamaEmbeddingFunction(
#     model_name="mxbai-embed-large"
# ) 
# # Initialize ChromaDB
# db_name = "db_name"
# vectorstore = chromadb.PersistentClient(path=db_name)

# vectorstore.delete_collection(name="my_collection")

# # Create a collection with the Ollama embedding function
# collection = vectorstore.get_or_create_collection(
#     name="my_collection",
#     embedding_function=ollama_ef,
# )

# # Add the extracted string documents to the collection
# collection.add(
#     documents=document_strings,
#     ids=[f"id_{i}" for i in range(len(document_strings))]
# )

# print(f"Vectorstore created with {collection.count()} documents")

In [11]:
os.environ["OLLAMA_HOST"] = "http://localhost:11434"

# Assuming 'chunks' is a list of Document objects
document_strings = [doc.page_content for doc in chunks]

# Initialize the updated embedding class from the new package
ollama_ef = OllamaEmbeddings(model="mxbai-embed-large")

# Define the database name
db_name = "db_name"

# Create the Chroma vectorstore directly from documents
vectorstore = Chroma.from_documents(documents=chunks, embedding=ollama_ef, persist_directory=db_name)

print(f"Vectorstore created with {vectorstore._collection.count()} documents")

Vectorstore created with 12 documents


In [13]:
# Get one vector and find how many dimensions it has
collection = vectorstore._collection
sample_embedding = collection.get(limit=1, include=["embeddings"])["embeddings"][0]
dimensions = len(sample_embedding)
print(f"The vectors have {dimensions:,} dimensions")

The vectors have 1,024 dimensions
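
Once every chunk is a vector, "similar meaning" becomes "nearby vectors". The sketch below computes cosine similarity, one common way to compare embeddings. It is illustrative only: the metric Chroma actually applies depends on how the collection is configured, and the short vectors here are invented stand-ins for the 1,024-dimensional ones above.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query_vec = [1.0, 0.0, 1.0]   # invented toy embedding
    close_vec = [0.9, 0.1, 0.8]   # points in nearly the same direction
    far_vec = [0.0, 1.0, 0.0]     # orthogonal to the query
    print(cosine_similarity(query_vec, close_vec))
    print(cosine_similarity(query_vec, far_vec))
```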


In [11]:
# # Assuming your collection was created with the name "my_collection"
# collection = vectorstore.get_collection(name="my_collection")

# # Get one vector and find how many dimensions it has
# sample_embedding = collection.get(limit=1, include=["embeddings"])["embeddings"][0]
# dimensions = len(sample_embedding)
# print(f"The vectors have {dimensions:,} dimensions")

## Visualizing the Vector Store

Let's take a minute to look at the documents and their embedding vectors to see what's going on.

In [15]:
# Prework

result = collection.get(include=['embeddings', 'documents', 'metadatas'])
vectors = np.array(result['embeddings'])
documents = result['documents']
doc_types = [metadata['doc_type'] for metadata in result['metadatas']]
colors = [['blue', 'green', 'red', 'orange'][['products', 'employees', 'contracts', 'company'].index(t)] for t in doc_types]

In [16]:
import numpy as np

# Retrieve the documents and their metadata
result = collection.get(include=['embeddings', 'documents', 'metadatas'])
vectors = np.array(result['embeddings'])
documents = result['documents']

# Safely extract 'doc_type' from metadata, handling None values and missing keys
doc_types = [
    metadata.get('doc_type') if metadata else None 
    for metadata in result['metadatas']
]

# Map doc_types to colors, handling None values gracefully
type_to_color = {
    'products': 'blue',
    'employees': 'green',
    'contracts': 'red',
    'company': 'orange',
    None: 'grey' # Assign a default color for documents with no doc_type
}

colors = [type_to_color.get(t, 'grey') for t in doc_types]

print("Documents processed and colors assigned.")

Documents processed and colors assigned.


In [30]:
# We humans find it easier to visualize things in 2D!
# Reduce the dimensionality of the vectors to 2D using t-SNE
# (t-distributed stochastic neighbor embedding)

tsne = TSNE(n_components=2, random_state=42, perplexity=5)
reduced_vectors = tsne.fit_transform(vectors)

# Create the 2D scatter plot
fig = go.Figure(data=[go.Scatter(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='2D Chroma Vector Store Visualization',
    xaxis_title='x', yaxis_title='y',  # `scene` only applies to 3D plots
    width=800,
    height=600,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

### Vector Store

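Indexes embeddings in a vector database for fast similarity search. The cells below first plot the stored vectors in 3D, then wire the store into a retriever and a conversation chain.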

In [26]:
# Let's try 3D!

tsne = TSNE(n_components=3, random_state=42, perplexity=5)
reduced_vectors = tsne.fit_transform(vectors)

# Create the 3D scatter plot
fig = go.Figure(data=[go.Scatter3d(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    z=reduced_vectors[:, 2],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='3D Chroma Vector Store Visualization',
    scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),
    width=900,
    height=700,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

In [19]:
# Make sure the model is pulled and running in your Ollama console
llm = OllamaLLM(model="gemma3:1b") 

# Set up the conversation memory for the chat
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

# The retriever is an abstraction over the VectorStore that will be used during RAG
retriever = vectorstore.as_retriever()

# Put it all together: set up the conversation chain
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)

print("Conversation chain successfully created.")

Conversation chain successfully created.
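
The memory object above keeps the running chat history so that follow-up questions can be read in context. As a rough mental model only, a buffer memory can be pictured as a growing list of messages. The `BufferMemory` class, `chat_turn` function, and `respond` callable below are hypothetical and are not LangChain's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class BufferMemory:
    """Stores every turn verbatim, oldest first."""
    messages: List[Tuple[str, str]] = field(default_factory=list)

    def save_turn(self, question: str, answer: str) -> None:
        self.messages.append(("human", question))
        self.messages.append(("ai", answer))

    def as_text(self) -> str:
        return "\n".join(f"{role}: {content}" for role, content in self.messages)


def chat_turn(question: str, memory: BufferMemory,
              respond: Callable[[str, str], str]) -> str:
    """Give the responder the history so far, then record the new turn."""
    answer = respond(memory.as_text(), question)
    memory.save_turn(question, answer)
    return answer


if __name__ == "__main__":
    memory = BufferMemory()
    shout = lambda history, question: question.upper()  # stand-in for an LLM
    chat_turn("hi", memory, shout)
    chat_turn("and again", memory, shout)
    print(memory.as_text())
```

Because the buffer grows without bound, a long conversation eventually overflows the model's context window; that is the trade-off of the simplest memory strategy.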




### Generation

Composes the final Retrieval-Augmented Generation (RAG) pipeline: retrieve context → prompt the LLM → generate grounded answers.

In [21]:
query = "Can you describe pineapple in a few sentences"
result = conversation_chain.invoke({"question":query})
print(result["answer"])

Pineapple is a technology company transforming the organic food industry by connecting farmers, retailers, and consumers. Their platform aims to create a more efficient and sustainable food system, focusing on making organic food accessible to everyone. They initially launched Grocellm, an organic grocery delivery service that partners with local farms to provide a wide selection of groceries online.


### Retriever

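Resets the conversation memory, rebuilds the chain around the retriever (which fetches the most relevant chunks for each user query), and exposes it through a Gradio chat interface.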

In [41]:
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)

In [42]:
def chat(message, history):
    result = conversation_chain.invoke({"question": message})
    return result["answer"]

In [43]:
# And in Gradio:

view = gr.ChatInterface(chat, type="messages").launch(inbrowser=True)

* Running on local URL:  http://127.0.0.1:7863
* To create a public link, set `share=True` in `launch()`.
