# RAG from Scratch

The goal of this notebook is to implement a simple [[Retrieval Augmented Generation (RAG)]] pipeline using [[Vector#Dot Product|Cosine Similarity]].
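For reference, the cosine similarity between a query embedding $q$ and a chunk embedding $d$ (symbols chosen here for illustration) is the dot product divided by the product of the two norms:

$$
\operatorname{sim}(q, d) = \frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert}
$$

It ranges from $-1$ to $1$ and equals the plain dot product when both vectors have unit length.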

The pipeline goes like this: first you split the documents you want to search into [[Retrieval Augmented Generation (RAG)#Chunks|Chunks]] and create [[Embeddings]] for them. When the user asks a question, the pipeline embeds the question, uses a similarity score to retrieve the top $K$ chunks, and feeds them to the [[Language Models|LM]].

# Requirements

You must have Ollama installed. Run this to get the same model I am using.

```bash
ollama run gemma3:4b
```

You also must have authorization from this link: 

# Imports

In [1]:
import ollama
from sentence_transformers import SentenceTransformer
import numpy as np
from glob import glob
from IPython.display import Markdown, display

EMBEDDING_MODEL = SentenceTransformer("google/embeddinggemma-300m")
LM_VERSION = "gemma3:4b"


For this notebook I will use this Obsidian vault as the dataset. I will use each file as a single chunk, since none of the files are very large.
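Whole-file chunks work here because the notes are short. For longer notes, one option would be to split at Markdown headings before embedding. The sketch below is only an illustration and not part of the notebook: `split_by_headings` and `max_chars` are hypothetical names, and the 2000-character cap is an arbitrary assumption.

```python
import re

def split_by_headings(text, max_chars=2000):
    """Split a Markdown note into chunks at heading lines (hypothetical helper)."""
    # Split right before every line that starts with 1-6 '#' followed by a space.
    sections = re.split(r"^(?=#{1,6} )", text, flags=re.MULTILINE)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Fall back to fixed-size windows when a section is still too long.
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks

note = "Intro text.\n## Heuristics\nSome heuristics.\n### Epsilon greedy\nPick randomly."
for chunk in split_by_headings(note):
    print(repr(chunk))
```

Each returned chunk could then be passed to the same embedding step that is used for whole files.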

For the reader's context, EE stands for Exploration vs Exploitation in my notes.

In [2]:
question = "give me two types of EE heuristics"

In [3]:
VECTOR_DB = []

def add_chunk_to_database(chunk):
  embedding = EMBEDDING_MODEL.encode(chunk)
  VECTOR_DB.append((chunk, embedding))

In [4]:
filepath = glob("../../notes/**/*.md", recursive=True)
filepath

['../../notes/01-concepts/linear-algebra/Quadratic Form.md',
 '../../notes/01-concepts/linear-algebra/Vector Spaces.md',
 '../../notes/01-concepts/linear-algebra/Kernel and Image.md',
 '../../notes/01-concepts/linear-algebra/Linear Systems.md',
 '../../notes/01-concepts/linear-algebra/Matrix.md',
 '../../notes/01-concepts/linear-algebra/Linearity.md',
 '../../notes/01-concepts/linear-algebra/Orthonormal and Orthogonal.md',
 '../../notes/01-concepts/linear-algebra/Types of Matrices.md',
 '../../notes/01-concepts/linear-algebra/Base vectors.md',
 '../../notes/01-concepts/linear-algebra/Unit Vector.md',
 '../../notes/01-concepts/linear-algebra/Internal Product.md',
 '../../notes/01-concepts/linear-algebra/Vector.md',
 '../../notes/01-concepts/linear-algebra/Base change.md',
 '../../notes/01-concepts/linear-algebra/Projection.md',
 '../../notes/01-concepts/linear-algebra/Tensor.md',
 '../../notes/01-concepts/linear-algebra/Linear definitions.md',
 '../../notes/01-concepts/linear-algebra/Ei

In [5]:
# load each file and add it to the database
for file in filepath:
  with open(file, "r", encoding="utf-8") as f:
    content = f.read()
  # skip empty files
  if content == "":
    continue
  # prepend the file name to the chunk so it is part of the embedded text
  content = f"File: {file}\n{content}"
  add_chunk_to_database(content)

# turn the database into a numpy array
VECTOR_DB_NP = np.array([v[1] for v in VECTOR_DB])

In [6]:
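# Implement a simple cosine similarity function using numpy.
# `a` may be a single vector or a matrix with one embedding per row;
# axis=-1 normalizes each row separately instead of taking one norm over the whole matrix.
def cosine_similarity(a, b):
  return np.dot(a, b) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b))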

# Implement a simple retrieval function (the "R" in RAG)
def retrival(question, k=5):
  query_embedding = EMBEDDING_MODEL.encode(question)
  similarities = cosine_similarity(VECTOR_DB_NP, query_embedding)
  top_k_indices = np.argsort(similarities)[-k:][::-1]
  top_k_chunks = [VECTOR_DB[i][0] for i in top_k_indices]
  return top_k_chunks

# Test the retrieval function
top_k_chunks = retrival(question)
print(top_k_chunks)


['File: ../../notes/01-concepts/machine-learning/Reinforcement/Exploration vs Exploitation.md\nWhen Agents learn the practical problem is that the number of states and actions can grow indefinitely depending on the problem, and depending on the algorithm used, it might need more samples of the environment to learn, so the problem becomes, use a action already used or explore unknowns. \n## Heuristics\nTo try to mitigate this problem and bring some kind of balance there are several heuristics for saying if a agent should explore or should exploit. All of them can have some kind of decay, meaning it some how it can start decreasing the amount of exploration the agent does, this decay can be implemented in multiple forms like in each iteration decrease a value, or multiple by a value between 0 and 1 for faster decay. \n### $\\epsilon$ greedy\nThis is the most famous of them all. Which says that Agents should choose a random policy with probability $\\epsilon$ and should choose the best ac
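The retrieval step can be sanity-checked without the embedding model. The sketch below uses a toy 2-D "database" in place of real embeddings; `cosine_similarity_rows`, `top_k`, `toy_db` and `toy_query` are hypothetical names introduced only for this illustration.

```python
import numpy as np

def cosine_similarity_rows(matrix, query):
    """Cosine similarity between every row of `matrix` and the vector `query`."""
    # Normalize per row (axis=1) so each score is a true cosine in [-1, 1].
    row_norms = np.linalg.norm(matrix, axis=1)
    return (matrix @ query) / (row_norms * np.linalg.norm(query))

def top_k(similarities, k):
    """Indices of the k largest scores, best first."""
    return np.argsort(similarities)[-k:][::-1]

# Toy stand-ins for chunk embeddings and a query embedding.
toy_db = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
toy_query = np.array([2.0, 1.0])

scores = cosine_similarity_rows(toy_db, toy_query)
print(scores)
print(top_k(scores, k=2))
```

The row `[0.0, 2.0]` is longer than `[1.0, 0.0]`, yet it scores lower because it points further away from the query: cosine similarity ignores vector length and compares direction only.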

In [7]:

def RAG(question, k=5):
  top_k_chunks = retrival(question, k)
  print(top_k_chunks)
  k_str = '\n'.join([f' - {chunk}' for chunk in top_k_chunks])
  instruction_prompt = f'''You are a helpful chatbot.
Use only the following pieces of context to answer the question. Don't make up any new information:
{k_str}
'''
  response = ollama.generate(model=LM_VERSION, prompt=instruction_prompt)
  return response

response = RAG(question)
display(Markdown(response.response))

['File: ../../notes/01-concepts/machine-learning/Reinforcement/Exploration vs Exploitation.md\nWhen Agents learn the practical problem is that the number of states and actions can grow indefinitely depending on the problem, and depending on the algorithm used, it might need more samples of the environment to learn, so the problem becomes, use a action already used or explore unknowns. \n## Heuristics\nTo try to mitigate this problem and bring some kind of balance there are several heuristics for saying if a agent should explore or should exploit. All of them can have some kind of decay, meaning it some how it can start decreasing the amount of exploration the agent does, this decay can be implemented in multiple forms like in each iteration decrease a value, or multiple by a value between 0 and 1 for faster decay. \n### $\\epsilon$ greedy\nThis is the most famous of them all. Which says that Agents should choose a random policy with probability $\\epsilon$ and should choose the best ac

Okay, I understand. Based on the provided context, here’s a summary of how agents handle the exploration vs. exploitation dilemma in reinforcement learning:

**The Problem:** When dealing with complex environments (many states and actions), algorithms often need a lot of data to learn effectively. This creates a challenge: should the agent continue to explore new, potentially rewarding actions (exploration) or stick with actions it already knows are good (exploitation)?

**Heuristics for Balance:** To address this, several heuristics are used to guide the balance between exploration and exploitation. These heuristics often involve a "decay" mechanism – a way to gradually reduce exploration as the agent learns.

**Common Heuristics:**

*   **Epsilon-Greedy:** This is the most common method. The agent randomly chooses an action with probability ε (epsilon) and chooses the best-known action with probability 1 - ε.
*   **Boltzmann (or Softmax) Approach:** This method uses the Boltzmann distribution to assign probabilities to actions based on their Q-values (expected rewards). It relies on the Boltzmann equation:

    `P(s,a) =  (e^(Q(s,a) / T)) / sum_b (e^(Q(s,b) / T))`

    where:
    *   `Q(s,a)` is the Q-value for state `s` and action `a`.
    *   `T` is a temperature parameter (higher T = more exploration, lower T = more exploitation).

**Algorithm: Sarsa**

*   Sarsa is a specific "on-policy" algorithm based on Temporal Difference (TD) learning.
*   It updates the Q-value based on the immediate reward and the estimated Q-value of the *next* state and action.
*   The update rule is:

    `delta_t = r_{t+1} + gamma * Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)`
    `Q(s_t, a_t) = Q(s_t, a_t) + alpha * delta_t`

    Where:
    *   `alpha` is the learning rate.
    *   `gamma` is the discount factor.

**Important Note:**  The context also highlights that reasoning, as traditionally defined, is different from techniques like Chain of Thought, Tree of Thought, Graph of Thought, and Prompt Agentic.

Do you have a specific question you'd like me to answer, or would you like me to elaborate on a particular aspect of this information?
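The answer above summarises the retrieved notes and then asks for a question. That is consistent with the prompt built in `RAG`: it contains the retrieved chunks but never the question itself. A minimal sketch of a prompt builder that also appends the question follows; `build_rag_prompt` is a hypothetical helper, and the `Question:` label is an assumption rather than anything Ollama requires.

```python
def build_rag_prompt(question, chunks):
    """Combine retrieved chunks and the user question into one prompt string."""
    context = "\n".join(f" - {chunk}" for chunk in chunks)
    return (
        "You are a helpful chatbot.\n"
        "Use only the following pieces of context to answer the question. "
        "Don't make up any new information:\n"
        f"{context}\n\n"
        f"Question: {question}\n"
    )

# Placeholder chunks stand in for the output of the retrieval step.
prompt = build_rag_prompt("give me two types of EE heuristics", ["chunk A", "chunk B"])
print(prompt)
```

The resulting string could be passed to `ollama.generate(model=LM_VERSION, prompt=prompt)` in the same way as the cell above.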

In [8]:
# Without RAG: send the bare question to the LM
response = ollama.generate(model=LM_VERSION, prompt=question)
display(Markdown(response.response))

Okay, let's break down two key types of Enterprise Experience (EE) heuristics. These heuristics, developed by McKinsey, help organizations understand and address the complexities of transforming customer experiences across all touchpoints.

**1. The "5 Pillars" Heuristics:**

This is arguably the most well-known and foundational set of EE heuristics. It identifies five key areas that need to be addressed for successful EE transformation.

*   **Customer Journey:**  This pillar focuses on understanding the *actual* customer journey, not just the intended one. It involves mapping, analyzing, and optimizing the steps customers take when interacting with your brand, from initial awareness to post-purchase support. It's about uncovering pain points, moments of truth, and opportunities for improvement.
*   **Channels:**  This pillar looks at how customers interact with your brand across all available channels – website, mobile app, email, social media, physical store, call center, etc.  It’s not just about *having* channels, but about how they’re connected, integrated, and working *together* for a seamless experience.  Duplication and silos are major concerns.
*   **Digital Integration:**  This is about unifying digital experiences. It addresses how digital channels (like websites and apps) connect with offline channels (like stores and customer service). It's about creating a consistent and personalized experience regardless of where the customer interacts.
*   **Data & Analytics:**  This pillar highlights the critical role of data in driving EE.  It's about capturing, analyzing, and using customer data to understand behaviors, personalize experiences, and measure the impact of changes.  It’s not just about collecting data, but turning it into actionable insights.
*   **Organization & Culture:** This focuses on the people and processes within the organization. It recognizes that EE success requires a cultural shift – one where customer experience is truly embedded in everyone's thinking and actions, not just the responsibility of a single department. It includes things like empowered teams, cross-functional collaboration, and a customer-centric mindset.



**2. The "Four Dimensions" Heuristics:**

This heuristic focuses on the depth of the EE effort. It’s a more granular framework that builds upon the 5 pillars by organizing them into four distinct dimensions.

*   **Depth:** This dimension concerns *how deeply* you're exploring the customer experience.  It ranges from shallow (e.g., simply redesigning a website) to deep (e.g., fundamentally rethinking your business model around the customer).  This is about the level of analysis and the scope of changes.
*   **Breadth:** This dimension refers to *how many* customer journeys and touchpoints you are addressing. Are you focused on a core customer segment or attempting to optimize the entire customer experience?
*   **Speed:** This dimension assesses *how quickly* you're implementing changes. Rapid experimentation and iterative improvements are crucial for success. It's about balancing thoroughness with agility.
*   **Scope:** This dimension looks at *what* you are focusing on. Is it a specific channel, product, or customer segment?



**Key Differences & Relationship:**

*   The "5 Pillars" provide a broad strategic framework for EE initiatives.
*   The "Four Dimensions" provide a more tactical framework for planning and executing those initiatives, by breaking them down into measurable aspects.
*   You'd typically use the 5 pillars to *define* the strategic goals for your EE transformation, and then use the 4 dimensions to *plan* how to achieve those goals.

---

**Resources for Further Learning:**

*   **McKinsey & Company – Enterprise Experience:** [https://www.mckinsey.com/industries/retail/our-insights/enterprise-experience](https://www.mckinsey.com/industries/retail/our-insights/enterprise-experience) (This is the primary source for these heuristics).
*   **Harvard Business Review – Enterprise Experience:** [https://hbr.org/topic/enterprise-experience](https://hbr.org/topic/enterprise-experience)

Do you want me to delve deeper into a specific aspect of these heuristics, such as:

*   A particular pillar in more detail?
*   How to apply these heuristics in a specific industry?
*   How these heuristics compare to other customer experience frameworks?