# Building a RAG System Locally with Ollama, LlamaIndex, and Chroma DB

## Exercise 0 - Install Workshop Dependencies

Before starting the workshop, ensure all necessary dependencies are installed in your Python environment. Use the following steps to set up your environment.

### Step 1: Create a Virtual Environment

Create and activate a virtual environment to isolate the workshop dependencies. This workshop uses **Python 3.11**. You can use either **venv** or **conda** (Mamba, a faster drop-in replacement for conda, works too).

##### Using `venv`

On Linux/Mac:
  ```bash
  python3.11 -m venv local-rag
  source local-rag/bin/activate
  ```
On Windows:
  ```bat
  py -3.11 -m venv local-rag
  local-rag\Scripts\activate
  ```

##### Using `conda`

   ```bash
   conda create -n local-rag python=3.11
   conda activate local-rag
   ```

### Step 2: Install Required Packages

Install all the required dependencies:

```bash
pip install -r requirements.txt
```

### Step 3: Verify Installation

Check that the key packages are installed correctly by importing them in Python:

```python
import chromadb
import llama_index
import ollama

print("Dependencies installed successfully!")
```

```text
Dependencies installed successfully!
```
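A successful import only shows that a package is importable. To also see which versions are installed (handy when comparing against `requirements.txt`), you can read the package metadata with the standard library. A minimal sketch, assuming the PyPI distribution names are `chromadb`, `llama-index`, and `ollama`; `installed_versions` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names (assumed to match the entries in requirements.txt)
PACKAGES = ["chromadb", "llama-index", "ollama"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:12} {ver or 'NOT INSTALLED'}")
```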


## Exercise 1 - Setting up Ollama

### Install Ollama

First, download and install Ollama from the official website: [https://ollama.com/download/](https://ollama.com/download/).

### Pull Required Models

Open a terminal and run the following commands to download the necessary models:

1. Pull the `llama3` model:
   ```bash
   ollama pull llama3
   ```

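2. (Optional) Pull the Nomic embedding model. You only need it if you want Ollama to compute embeddings; this workshop uses a Hugging Face embedding model instead:
   ```bash
   ollama pull nomic-embed-text
   ```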

### Run the Model

Once the models are downloaded, you can run the `llama3` model and test it with a few prompts. Use the following command:

```bash
ollama run llama3
```

Type a prompt and observe the output to ensure everything is working correctly.

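### Interact with Ollama in Python

The `ollama` Python package talks to the same local Ollama server. With `stream=True`, `ollama.generate` returns an iterator of partial responses, which we print as they arrive:
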
```python
import ollama

response = ollama.generate(model="llama3", prompt="What is EPFL?", stream=True)

for r in response:
    print(r["response"], end="")
```

```text
EPFL stands for École Polytechnique Fédérale de Lausanne, which is a Swiss federal institute of technology located in Lausanne, Switzerland. It is one of the two Swiss Federal Institutes of Technology (the other being ETH Zurich), and it is considered one of the best technical universities in the world.

EPFL was founded in 1853 as the École d'ingénieurs de l'état, and it has since grown to become a leading institution in Switzerland for education and research in science, technology, engineering, and mathematics (STEM). The university has a strong reputation for its programs in fields such as computer science, electrical engineering, mechanical engineering, physics, chemistry, biology, and more.

EPFL is known for its innovative teaching methods, which emphasize hands-on learning and collaboration between students and professors. The university also has a strong research focus, with many faculty members being world-renowned experts in their fields. EPFL has partnerships with other lead
```
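Under the hood, the Ollama server streams its answer from the `/api/generate` endpoint as newline-delimited JSON: one object per chunk, each with a `response` fragment and a `done` flag. The sketch below calls that endpoint with only the standard library. It assumes the server listens on its default address, `http://localhost:11434`; `join_stream` and `generate` are hypothetical helper names, not part of the `ollama` package:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local address (assumption)


def join_stream(lines):
    """Concatenate the `response` fragments of an NDJSON stream until `done`."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # ignore blank lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def generate(model, prompt, url=OLLAMA_URL):
    """POST a prompt to a running Ollama server and return the full answer."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as resp:
        # The HTTP response iterates line by line, as bytes
        return join_stream(line.decode("utf-8") for line in resp)
```

With the server running, `print(generate("llama3", "What is EPFL?"))` should print the whole answer at once instead of token by token.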

## Exercise 2 - Getting Started with LlamaIndex and ChromaDB

**LlamaIndex** ([official site](https://llamaindex.ai)) is a framework for connecting LLMs with data sources, enabling efficient retrieval and interaction with structured or unstructured data.

**Chroma** ([official site](https://www.trychroma.com)) is a vector database designed for managing embeddings and serving as a retrieval layer for LLM applications.

In this exercise, we’ll explore how to set up and use LlamaIndex to index and retrieve data in a **Chroma** database.

### Step 0: Let's download a PDF

You can start by adding documents to the `./docs` folder. If you don't know what to use, we suggest downloading the PDF at the following link:

https://observationofalostsoul.wordpress.com/wp-content/uploads/2011/05/the-gospel-of-the-flying-spaghetti-monster.pdf

### Step 1: Set Up Chroma as the Storage Backend

Initialize the Chroma database and configure it for use with LlamaIndex. Here, we create an **Ephemeral Client** and collection, which stores data temporarily in memory without persisting it. This is ideal for testing and experimentation.

```python
import chromadb

chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.get_or_create_collection("mydocs")
```

You can also create a **Persistent Client**, which preserves your database across sessions:

```python
chroma_client = chromadb.PersistentClient(path="/path/to/save/to")
```

### Step 2: Set Up LlamaIndex connectors

Configure LlamaIndex to connect with Chroma as the vector store and set up a storage context. A **storage context** is an abstraction that manages how data is stored and retrieved, enabling seamless integration with different storage backends like Chroma.

```python
from llama_index.core import StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
```

### Step 3: Load and explore documents

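We can use LlamaIndex's `SimpleDirectoryReader` to **ingest documents from a directory**. It reads every supported file in the directory (and, with `recursive=True`, in its subdirectories) into `Document` objects; a PDF is loaded as one document per page, as the `page_label` metadata below shows. Splitting the content into smaller chunks happens later, when the documents are indexed.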

```python
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./docs", recursive=True).load_data()
```
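To make the idea concrete, here is a stripped-down sketch of what a directory reader does: walk the folder recursively and turn each file into its text plus some file metadata. It only handles plain-text files and is not LlamaIndex's implementation, which dispatches to file-type specific readers; `load_text_documents` is a hypothetical name:

```python
from pathlib import Path


def load_text_documents(root, suffixes=(".txt", ".md")):
    """Read every matching file under `root` (recursively) into a dict.

    Each dict mimics a document: the raw text plus some file metadata.
    """
    documents = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            documents.append(
                {
                    "text": path.read_text(encoding="utf-8"),
                    "metadata": {
                        "file_name": path.name,
                        "file_path": str(path),
                        "file_size": path.stat().st_size,
                    },
                }
            )
    return documents
```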

Let's explore the content of the documents further with a dataframe.

```python
from typing import List

import pandas as pd
from llama_index.core.schema import MetadataMode, TextNode


def data_to_df(nodes: List[TextNode]):
    """Convert a list of TextNode objects to a pandas DataFrame."""
    return pd.DataFrame([node.dict() for node in nodes])
```

```python
document_df = data_to_df(documents)

document_df.head()
```

| | id_ | embedding | metadata | excluded_embed_metadata_keys | excluded_llm_metadata_keys | relationships | text | mimetype | start_char_idx | end_char_idx | text_template | metadata_template | metadata_seperator | class_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5cbb133c-4d4d-4d5b-88bd-5e7649397c22 | | {'page_label': '1', 'file_name': 'the-gospel-o... | [file_name, file_type, file_size, creation_dat... | [file_name, file_type, file_size, creation_dat... | {} | | text/plain | | | {metadata_str}\n\n{content} | {key}: {value} | \n | Document |
| 1 | 881a8f06-e8f3-492a-86b8-7e914ab4125e | | {'page_label': '2', 'file_name': 'the-gospel-o... | [file_name, file_type, file_size, creation_dat... | [file_name, file_type, file_size, creation_dat... | {} | BOBBY HENDERSON | text/plain | | | {metadata_str}\n\n{content} | {key}: {value} | \n | Document |
| 2 | f548c13a-455d-4950-b1b0-e69e688d8bd0 | | {'page_label': '3', 'file_name': 'the-gospel-o... | [file_name, file_type, file_size, creation_dat... | [file_name, file_type, file_size, creation_dat... | {} | A Villard Books Trade Paperback Original \nCop... | text/plain | | | {metadata_str}\n\n{content} | {key}: {value} | \n | Document |
| 3 | 794c260b-04c0-4900-b2ce-770330eb08a1 | | {'page_label': '4', 'file_name': 'the-gospel-o... | [file_name, file_type, file_size, creation_dat... | [file_name, file_type, file_size, creation_dat... | {} | In the beginning was the Word, \nand the Word ... | text/plain | | | {metadata_str}\n\n{content} | {key}: {value} | \n | Document |
| 4 | 1ba38dcb-b39f-4134-b7d7-0bd47baa4f15 | | {'page_label': '5', 'file_name': 'the-gospel-o... | [file_name, file_type, file_size, creation_dat... | [file_name, file_type, file_size, creation_dat... | {} | Ackn owl ed gm en ts \nDELIVERING A DIVINE MES... | text/plain | | | {metadata_str}\n\n{content} | {key}: {value} | \n | Document |


We observe several attributes, including `metadata`, `text`, `text_template`, and others. Let's focus on these three key categories:

- **`metadata`**: This attribute contains additional information about the document, such as its source, creation date, or tags that can be used for filtering or retrieval purposes.
- **`text`**: The main content of the document, representing the raw textual data that will be indexed and queried.
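- **`text_template`**: A format string (here `{metadata_str}\n\n{content}`) that defines how the metadata and the text are combined when the document's content is rendered for the embedding model or the LLM. `metadata_template` and `metadata_seperator` control how each metadata entry is formatted and joined.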

These attributes play distinct roles in organizing and interacting with your data. Feel free to explore the different attributes at this point.
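The three template columns work together: each metadata entry is rendered with `metadata_template`, the entries are joined with `metadata_seperator`, and the result is placed into `text_template` with the text. Keys listed in `excluded_embed_metadata_keys` or `excluded_llm_metadata_keys` are left out of the embedding or LLM view, respectively. A minimal sketch of that rendering in plain Python, assuming the default templates shown in the table; `render_content` is a hypothetical function, not LlamaIndex's code:

```python
def render_content(text, metadata, excluded_keys=(),
                   text_template="{metadata_str}\n\n{content}",
                   metadata_template="{key}: {value}",
                   metadata_separator="\n"):
    """Combine metadata and text the way the templates describe."""
    metadata_str = metadata_separator.join(
        metadata_template.format(key=key, value=value)
        for key, value in metadata.items()
        if key not in excluded_keys  # e.g. the excluded embed or LLM keys
    )
    if not metadata_str:
        return text  # nothing to prepend
    return text_template.format(metadata_str=metadata_str, content=text)
```

For example, `render_content("BOBBY HENDERSON", {"page_label": "2", "file_size": 123}, excluded_keys=["file_size"])` returns `"page_label: 2\n\nBOBBY HENDERSON"`.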

### Step 4: Embed and Index the Documents

To ingest documents into an index, we need an embedding model that converts the document content into vector representations. These embeddings enable efficient similarity search and retrieval.

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
```
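An embedding model maps each text to a fixed-length vector (768 dimensions for `BAAI/bge-base-en-v1.5`), and texts with similar meaning end up close together. A common way to measure closeness is cosine similarity. A minimal pure-Python sketch, using made-up short vectors in place of real embeddings:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0  # treat zero vectors as unrelated
```

For instance, `cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 0.0])` is about 0.707, while orthogonal vectors score 0.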

In LlamaIndex, we can create an index with the `VectorStoreIndex` class. `from_documents` splits the documents into nodes (chunks), embeds each node with the given embedding model, and stores the vectors in the configured backend. Here we pass the storage context built on the Chroma collection defined earlier, so the embeddings end up in Chroma.

```python
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embed_model,
    show_progress=True,
)
```

```text
Parsing nodes: 100%|██████████| 178/178 [00:00<00:00, 3107.49it/s]
Generating embeddings: 100%|██████████| 178/178 [00:10<00:00, 17.43it/s]
```


### Step 5: Query the Index for Retrieval

Once the documents are indexed, we can perform retrieval on them. This allows us to ask questions or search for relevant content based on the embeddings stored in the index.

```python
retriever = index.as_retriever(
    similarity_top_k=3,
)

nodes_with_score = retriever.retrieve("What is the Flying Spaghetti Monster?")
nodes = [n.node for n in nodes_with_score]
data_to_df(nodes)
```

| | id_ | embedding | metadata | excluded_embed_metadata_keys | excluded_llm_metadata_keys | relationships | text | mimetype | start_char_idx | end_char_idx | text_template | metadata_template | metadata_seperator | class_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 58aed9a8-f226-48cb-9e01-d245d2daf98d | | {'page_label': '66', 'file_name': 'the-gospel-... | [file_name, file_type, file_size, creation_dat... | [file_name, file_type, file_size, creation_dat... | {'NodeRelationship.SOURCE': {'node_id': '29abd... | Key Moments in FSM History • • 59 \nOriginally... | text/plain | 0 | 363 | {metadata_str}\n\n{content} | {key}: {value} | \n | TextNode |
| 1 | 125169e1-9b61-47d6-bfc7-88e444cde9d9 | | {'page_label': '138', 'file_name': 'the-gospel... | [file_name, file_type, file_size, creation_dat... | [file_name, file_type, file_size, creation_dat... | {'NodeRelationship.SOURCE': {'node_id': 'cb791... | 1 34« -The Gospel of the Flying Spaghetti Mons... | text/plain | 0 | 2198 | {metadata_str}\n\n{content} | {key}: {value} | \n | TextNode |
| 2 | d77af2c6-ff05-4215-b63c-a0a9067caeb2 | | {'page_label': '65', 'file_name': 'the-gospel-... | [file_name, file_type, file_size, creation_dat... | [file_name, file_type, file_size, creation_dat... | {'NodeRelationship.SOURCE': {'node_id': '2e43c... | 58 • 'Che Gospel of the Flying Spaghetti Monst... | text/plain | 0 | 580 | {metadata_str}\n\n{content} | {key}: {value} | \n | TextNode |


Congrats! You've retrieved your first data!
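Conceptually, `similarity_top_k=3` asks for the three stored vectors most similar to the query's embedding. The brute-force sketch below scores every vector by cosine similarity and keeps the best `k`; Chroma itself uses an approximate nearest-neighbour index (HNSW) instead of scanning everything. The store contents and the `top_k` name are illustrative assumptions:

```python
import heapq
import math


def top_k(query_vec, store, k=3):
    """Return the k (score, node_id) pairs most similar to query_vec.

    `store` maps node ids to embedding vectors; scores are cosine similarities.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norms = math.hypot(*a) * math.hypot(*b)
        return dot / norms if norms else 0.0

    scored = ((cosine(query_vec, vec), node_id) for node_id, vec in store.items())
    return heapq.nlargest(k, scored)  # highest scores first
```

With `store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}`, the call `top_k([1.0, 0.1], store, k=2)` ranks `"a"` first and `"c"` second.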

## Exercise 3 - Your First RAG!

For a Retrieval-Augmented Generation (RAG) system, you need a Large Language Model (LLM) that generates answers by combining the retrieved knowledge with its own reasoning capabilities. Here, Ollama serves as the LLM powering the RAG system; we connect it to LlamaIndex with the `Ollama` LLM class:

```python
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3", request_timeout=120.0)
```

Everything is ready for querying your data. Define a query engine and start asking it questions. Congrats, you have a working RAG system!

```python
query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=3,
    streaming=True,
)

response = query_engine.query("What is the Flying Spaghetti Monster?")
```

```python
response.print_response_stream()
```

```text
A creator deity who has touched every continent and culture, leaving His mark with His Noodly Appendage.
```
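Stripped of the framework, the query engine follows three steps: retrieve the most similar chunks, insert them into a prompt, and send that prompt to the LLM. The sketch below shows this loop with the retriever and the LLM passed in as plain functions. The prompt wording is only loosely modeled on LlamaIndex's default QA prompt and is an assumption, as are the names `answer` and `QA_TEMPLATE`:

```python
from typing import Callable, List

# Prompt in the spirit of a default QA prompt (exact wording is an assumption)
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def answer(query: str,
           retrieve: Callable[[str], List[str]],
           llm: Callable[[str], str],
           template: str = QA_TEMPLATE) -> str:
    """Retrieve chunks, put them into the prompt, and ask the LLM."""
    chunks = retrieve(query)              # e.g. the top-3 node texts
    context_str = "\n\n".join(chunks)     # concatenate the retrieved context
    prompt = template.format(context_str=context_str, query_str=query)
    return llm(prompt)
```

Passing the retriever and the LLM as functions keeps the sketch testable with fakes; in the notebook, these roles are played by the index retriever and the `Ollama` LLM.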

### Going further

Under the hood, the query engine uses a basic retriever, the same kind as the one created with `index.as_retriever` above, so the same nodes are returned:

```python
nodes_with_score = response.source_nodes
nodes = [n.node for n in nodes_with_score]
data_to_df(nodes)
```

| | id_ | embedding | metadata | excluded_embed_metadata_keys | excluded_llm_metadata_keys | relationships | text | mimetype | start_char_idx | end_char_idx | text_template | metadata_template | metadata_seperator | class_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 58aed9a8-f226-48cb-9e01-d245d2daf98d | | {'page_label': '66', 'file_name': 'the-gospel-... | [file_name, file_type, file_size, creation_dat... | [file_name, file_type, file_size, creation_dat... | {'NodeRelationship.SOURCE': {'node_id': '29abd... | Key Moments in FSM History • • 59 \nOriginally... | text/plain | 0 | 363 | {metadata_str}\n\n{content} | {key}: {value} | \n | TextNode |
| 1 | 125169e1-9b61-47d6-bfc7-88e444cde9d9 | | {'page_label': '138', 'file_name': 'the-gospel... | [file_name, file_type, file_size, creation_dat... | [file_name, file_type, file_size, creation_dat... | {'NodeRelationship.SOURCE': {'node_id': 'cb791... | 1 34« -The Gospel of the Flying Spaghetti Mons... | text/plain | 0 | 2198 | {metadata_str}\n\n{content} | {key}: {value} | \n | TextNode |
| 2 | d77af2c6-ff05-4215-b63c-a0a9067caeb2 | | {'page_label': '65', 'file_name': 'the-gospel-... | [file_name, file_type, file_size, creation_dat... | [file_name, file_type, file_size, creation_dat... | {'NodeRelationship.SOURCE': {'node_id': '2e43c... | 58 • 'Che Gospel of the Flying Spaghetti Monst... | text/plain | 0 | 580 | {metadata_str}\n\n{content} | {key}: {value} | \n | TextNode |


You can also customize the prompt that the query engine sends to the LLM by passing a `PromptTemplate` as `text_qa_template`. LlamaIndex fills `{context_str}` with the retrieved chunks and `{query_str}` with the question:

```python
from llama_index.core import PromptTemplate

# Custom prompt template
template = (
    "Imagine you are an advanced AI expert in cyber security laws, with access to all current and relevant legal documents, "
    "case studies, and expert analyses. Your goal is to provide insightful, accurate, and concise answers to questions in this domain.\n\n"
    "Here is some context related to the query:\n"
    "-----------------------------------------\n"
    "{context_str}\n"
    "-----------------------------------------\n"
    "Considering the above information, please respond to the following inquiry with detailed references to applicable laws, "
    "precedents, or principles where appropriate:\n\n"
    "Question: {query_str}\n\n"
    "Answer succinctly, starting with the phrase 'According to cyber security law,' and ensure your response is understandable to someone without a legal background."
)
qa_template = PromptTemplate(template)

query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=3,
    streaming=True,
    text_qa_template=qa_template,
)

response = query_engine.query("What is Red Teaming?")
```
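A custom template is only useful if it keeps both placeholders: without `{context_str}`, the model never sees the retrieved context. Since the placeholders use Python format-string syntax, the standard library's `string.Formatter` can list them. A small sanity-check sketch; `template_fields` and `missing_fields` are hypothetical helpers:

```python
from string import Formatter

REQUIRED_FIELDS = {"context_str", "query_str"}


def template_fields(template: str) -> set:
    """Return the names of the {placeholders} used in a format string."""
    return {
        field_name
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name  # None for trailing literal text, "" for a bare {}
    }


def missing_fields(template: str) -> set:
    """Placeholders the QA prompt needs but the template does not contain."""
    return REQUIRED_FIELDS - template_fields(template)
```

Running `missing_fields(template)` on the cyber security template above should return an empty set.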

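Instead of indexing whole pages, we can split the documents into finer-grained nodes with an `IngestionPipeline`. `SentenceWindowNodeParser` creates one node per sentence and stores the `window_size` sentences on either side of it in the node's metadata under `window` (and the sentence itself under `original_text`):

```python
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter, SentenceWindowNodeParser

# Fixed-size chunking alternative (not used in the pipeline below)
sentence_splitter = SentenceSplitter()

# One node per sentence, with the surrounding sentences kept as metadata
sentence_window_parser = SentenceWindowNodeParser.from_defaults(
    window_size=5,
    include_prev_next_rel=True,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

# Create the pipeline with its transformations
pipeline = IngestionPipeline(
    transformations=[
        sentence_window_parser,
    ]
)
```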
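The sentence-window idea itself is simple: embed single sentences for precise matching, but remember their neighbours for the LLM. A plain-Python sketch of the parsing step follows. It uses a naive regex sentence splitter as a stand-in for the proper sentence tokenizer LlamaIndex uses, and `sentence_window_nodes` is a hypothetical name:

```python
import re

# Naive sentence boundary: after ., ! or ? followed by whitespace (a stand-in)
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_window_nodes(text, window_size=1):
    """Split text into one node per sentence, each carrying its surrounding window."""
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    nodes = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window_size)
        hi = min(len(sentences), i + window_size + 1)
        nodes.append(
            {
                "text": sentence,  # what gets embedded and matched
                "metadata": {
                    "window": " ".join(sentences[lo:hi]),  # what the LLM will see
                    "original_text": sentence,
                },
            }
        )
    return nodes
```

With `window_size=1`, the third sentence of `"A one. B two! C three? D four."` gets the window `"B two! C three? D four."`.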

```python
nodes = pipeline.run(documents=documents[:10])
```

```python
data_to_df(nodes).head()
```

```python
node = nodes[49]
```

```python
node.text
```

```python
node.metadata
```

The parser excludes the `window` and `original_text` keys from both the embedding and the LLM views of a node, as the excluded keys show:

```python
node.excluded_embed_metadata_keys
```

```python
print(
    "The LLM sees this: \n",
    node.get_content(metadata_mode=MetadataMode.LLM),
)
```

```python
node.metadata["window"]
```

```python
print(
    "The Embedding model sees this: \n",
    node.get_content(metadata_mode=MetadataMode.EMBED),
)
```

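```python
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# At query time, replace each retrieved sentence with its surrounding window,
# e.g. sentence_index.as_query_engine(llm=llm, node_postprocessors=node_postprocessors)
node_postprocessors = [
    MetadataReplacementPostProcessor(target_metadata_key="window"),
]
```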
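The post-processor runs after retrieval: each matched sentence is swapped for the `window` stored in its metadata, so retrieval stays precise while the LLM receives more context. A sketch of that swap on plain dictionaries; it mirrors the idea, not LlamaIndex's code, and `replace_with_window` is a hypothetical name:

```python
def replace_with_window(nodes, target_metadata_key="window"):
    """Swap each retrieved node's text for the text stored in its metadata.

    Nodes without the key are returned unchanged; the inputs are not mutated.
    """
    replaced = []
    for node in nodes:
        new_text = node["metadata"].get(target_metadata_key, node["text"])
        replaced.append({**node, "text": new_text})
    return replaced
```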

### ChromaDB

We can now embed the sentence-window nodes and store them in the Chroma collection.

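```python
sentence_index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, embed_model=embed_model
)
```

```python
# The vector store wraps the same "mydocs" collection as `index`:
# clear the page-level embeddings before inserting the sentence nodes
sentence_index.vector_store.clear()
```

```python
sentence_index.insert_nodes(nodes)
```

```python
# Check that the window metadata was stored in Chroma
chroma_collection.get()["metadatas"][0]["window"]
```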

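```python
# Rebuild the page-level index (re-embeds the documents into the same
# collection, which still holds the sentence nodes)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embed_model,
    show_progress=True,
)
```

```python
# Caution: this empties the whole shared collection. Run it only to reset;
# afterwards, queries against `index` retrieve no context.
index.vector_store.clear()
```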


### Retrieve

Let's first look at how a document is retrieved based on its similarity to a query.

We can check which sources the last query identified as the most relevant:

```python
print(response.get_formatted_sources())
```

```python
response.source_nodes[0].node.metadata
```

```python
query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=3,
    streaming=True,
)

response = query_engine.query("What is Red Teaming?")
```

```python
response.print_response_stream()
```


Finally, let's prompt the LLM with a custom template into which the retrieved context is inserted.

```python
from llama_index.core import PromptTemplate

# Custom prompt template
template = (
    "Imagine you are an advanced AI expert in cyber security laws, with access to all current and relevant legal documents, "
    "case studies, and expert analyses. Your goal is to provide insightful, accurate, and concise answers to questions in this domain.\n\n"
    "Here is some context related to the query:\n"
    "-----------------------------------------\n"
    "{context_str}\n"
    "-----------------------------------------\n"
    "Considering the above information, please respond to the following inquiry with detailed references to applicable laws, "
    "precedents, or principles where appropriate:\n\n"
    "Question: {query_str}\n\n"
    "Answer succinctly, starting with the phrase 'According to cyber security law,' and ensure your response is understandable to someone without a legal background."
)
qa_template = PromptTemplate(template)

query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=3,
    streaming=True,
    text_qa_template=qa_template,
)

response = query_engine.query("What is Red Teaming?")
```

```python
response.print_response_stream()
```